Panda Hacks

  1. Make your own data using your brain, Google, or ChatGPT. It should look different from mine.
  2. Modify my code or write your own.
  3. Output your data as something other than a bar graph.

Note

All hacks are due Saturday night. The earlier you turn them in, the higher your score will be. If you miss the due date, you will get a 0. There will be no tolerance.

No questions answered

Tonight - 2.9

Friday Night - 2.8

Saturday Night - 2.7

Sunday Night - 0.0

Questions answered

Tonight - 3.0

Friday Night - 2.9

Saturday Night - 2.8

Sunday Night - 0.0


Hacks 1 and 2

Hack 1 is songs.csv

import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file
df = pd.read_csv('songs.csv')

# Sort the data by ranking
df_sorted = df.sort_values('Ranking')

# Print the rankings from best to worst
print(df_sorted[['Song Name', 'Ranking']].to_string(index=False))

# Create a line graph
plt.plot(df_sorted['Song Name'], df_sorted['Ranking'])

plt.xlabel('Song Name')
plt.ylabel('Ranking')
plt.title('Ranking of Popular Songs')

# Rotate the song names so they do not overlap, then fit them inside the figure
plt.xticks(rotation=90)
plt.tight_layout()

# Display the graph
plt.show()
                              Song Name  Ranking
                             Love Story       31
                           Shake It Off       61
                     You Belong With Me       76
                I Knew You Were Trouble       89
                            Blank Space       98
We Are Never Ever Getting Back Together      106
                              Bad Blood      128
                                     22      139
                               Delicate      149
                                  Style      178
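
The script above expects a songs.csv file with 'Song Name' and 'Ranking' columns. Here is a minimal sketch of one way such a file could be created, using the rows from the printed output above. Building it from a dictionary is an assumption for illustration; the original file may have been written by hand or made some other way.

```python
import pandas as pd

# Rows copied from the printed output above
songs = {
    "Song Name": [
        "Love Story",
        "Shake It Off",
        "You Belong With Me",
        "I Knew You Were Trouble",
        "Blank Space",
        "We Are Never Ever Getting Back Together",
        "Bad Blood",
        "22",
        "Delicate",
        "Style",
    ],
    "Ranking": [31, 61, 76, 89, 98, 106, 128, 139, 149, 178],
}

df = pd.DataFrame(songs)

# index=False keeps the row numbers out of the file,
# so pd.read_csv('songs.csv') gets back exactly two columns
df.to_csv("songs.csv", index=False)

# Read the file back to confirm it has the expected layout
check = pd.read_csv("songs.csv")
print(check.shape)
```

To use your own data, swap in different names and numbers and keep the two column names the same so the plotting script still works.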

Hack 3

  1. What are the two primary data structures in pandas and how do they differ?
  • The two primary data structures in pandas are Series and DataFrame. A Series is a one-dimensional labeled array that can hold any data type. A DataFrame is a two-dimensional labeled table with rows and columns, similar to a spreadsheet.
  2. How do you read a CSV file into a pandas DataFrame?
  • Use the read_csv function, for example pd.read_csv('songs.csv'), which returns a DataFrame.
  3. How do you select a single column from a pandas DataFrame?
  • Use the indexing operator [] with the column name, for example df['Ranking']. The result is a Series.
  4. How do you filter rows in a pandas DataFrame based on a condition?
  • Use boolean indexing: put a condition inside [], for example df[df['Ranking'] < 100], to keep only the rows where the condition is True.
  5. How do you group rows in a pandas DataFrame by a particular column?
  • Use the groupby method with the column name, for example df.groupby('column').
  6. How do you aggregate data in a pandas DataFrame using functions like sum and mean?
  • Call the function directly, for example df['Ranking'].sum(), or use the agg method to apply several functions at once, for example df['Ranking'].agg(['sum', 'mean']). Both also work on the result of groupby.
  7. How do you handle missing values in a pandas DataFrame?
  • Use the fillna method to fill in missing values with a specific value or method, or use the dropna method to remove rows with missing values.
  8. How do you merge two pandas DataFrames together?
  • Use the merge method and name the shared column with on, for example left.merge(right, on='key').
  9. How do you export a pandas DataFrame to a CSV file?
  • Use the to_csv method, for example df.to_csv('output.csv', index=False) to leave the row index out of the file.
  10. What is the difference between a Series and a DataFrame in Pandas?
  • A Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional table-like data structure. A Series can be thought of as a single column of a DataFrame, while a DataFrame can have multiple columns.
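
The answers above can be tried out together in one short script. This is only a sketch: the sales and regions tables, their column names, and the values in them are made up for illustration and are not part of the hacks.

```python
import pandas as pd

# Hypothetical data, invented for this example
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "units": [10, None, 7, 3, 5],   # None becomes a missing value (NaN)
})
regions = pd.DataFrame({
    "store": ["A", "B", "C"],
    "region": ["North", "South", "North"],
})

# Select a single column: the result is a Series
units = sales["units"]

# Handle missing values: fill NaN in 'units' with 0
# (sales.dropna() would remove that row instead)
sales_filled = sales.fillna({"units": 0})

# Filter rows with boolean indexing
big = sales_filled[sales_filled["units"] > 4]

# Group by a column, then aggregate with sum and mean
summary = sales_filled.groupby("store")["units"].agg(["sum", "mean"])

# Merge two DataFrames on their shared 'store' column
merged = sales_filled.merge(regions, on="store")

# Export: with no file name, to_csv returns the CSV text;
# passing a name such as 'merged.csv' would write a file instead
csv_text = merged.to_csv(index=False)

print(summary)
print(csv_text)
```

Changing the fill value or the filter condition and re-running is a quick way to see how each step affects the summary.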

Data analysis hacks

  1. Preprocessing data with NumPy and Pandas for predictive analysis: NumPy and Pandas are two popular Python libraries that can be used to preprocess data for predictive analysis. NumPy is primarily used for numerical computations, while Pandas is used for data manipulation and analysis. Some common preprocessing tasks include cleaning, scaling, and feature engineering. Cleaning involves removing missing values, outliers, and duplicates. Scaling involves normalizing data to improve performance and accuracy. Feature engineering involves selecting and transforming relevant features to improve model performance.

  2. Machine learning algorithms for predictive analysis: There are various machine learning algorithms that can be used for predictive analysis, including linear regression, logistic regression, decision trees, random forests, support vector machines, and neural networks. These algorithms differ in their complexity, interpretability, and performance. Linear and logistic regression are simple and interpretable, while decision trees and random forests are more complex but can handle non-linear relationships. Support vector machines are powerful for handling high-dimensional data, while neural networks are flexible and can learn complex patterns.

  3. Real-world applications of predictive analysis: Predictive analysis is used in various industries to improve decision-making and optimize business processes. For example, in finance, predictive analysis can be used to predict stock prices and detect fraud. In healthcare, it can be used to predict disease outcomes and personalize treatment plans. In marketing, it can be used to predict customer behavior and optimize advertising campaigns.

  4. Feature engineering in predictive analysis: Feature engineering is the process of selecting and transforming relevant features to improve model accuracy. This can involve creating new features from existing ones, encoding categorical variables, and reducing dimensionality. Feature engineering can be time-consuming, but it can greatly improve model performance.

  5. Deploying machine learning models in real-time applications: Machine learning models can be deployed in real-time applications using various techniques such as REST APIs, microservices, and serverless functions. These techniques allow models to be deployed on the cloud, which can improve scalability and reliability.

  6. Limitations of NumPy and Pandas: NumPy and Pandas are powerful tools for data analysis, but they have some limitations. Both load data into memory, so they can be memory-intensive and may not be suitable for datasets larger than the available RAM. They also have limited built-in support for parallel processing and distributed computing.

  7. Using predictive analysis to improve decision-making: Predictive analysis can be used to improve decision-making by providing insights and predictions based on data. This can help businesses optimize their operations, reduce costs, and improve customer satisfaction. For example, predictive analysis can be used to predict customer churn and identify the most effective retention strategies. It can also be used to optimize supply chain management and reduce inventory costs.
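
To make items 1, 2, and 4 above more concrete, here is a minimal sketch of cleaning, scaling, feature engineering, and a simple linear fit. The table, its column names, and its values are all hypothetical, and np.polyfit is used here as a simple stand-in for linear regression; a real project would more likely use a dedicated machine learning library.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for this example
raw = pd.DataFrame({
    "hours_studied": [1.0, 2.0, 2.0, 3.0, None, 5.0],
    "plan": ["basic", "basic", "basic", "pro", "pro", "pro"],
    "score": [52.0, 55.0, 55.0, 61.0, 64.0, 70.0],
})

# Cleaning: drop exact duplicate rows, then rows with missing values
clean = raw.drop_duplicates().dropna().reset_index(drop=True)

# Scaling: min-max scale the hours into the range 0 to 1
hours = clean["hours_studied"]
features = clean.copy()
features["hours_scaled"] = (hours - hours.min()) / (hours.max() - hours.min())

# Feature engineering: one-hot encode the categorical 'plan' column
features = pd.get_dummies(features, columns=["plan"], dtype=float)

# Simple linear model: least-squares line, score = slope * hours + intercept
slope, intercept = np.polyfit(clean["hours_studied"], clean["score"], 1)

# Predict the score for 4 hours of study
predicted = slope * 4.0 + intercept

print(features)
print(round(predicted, 1))
```

The fitted line is only as good as the data behind it. With four rows, this shows the mechanics rather than a trustworthy prediction.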

NumPy

Still working on it

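import matplotlib.pyplot as plt
from skimage import io

# Load the image; io.imread returns it as a NumPy array
photo = io.imread('images/waldo.jpg')
print(type(photo))

# Show the full image
plt.imshow(photo)
plt.show()

# The shape is (rows, columns, color channels)
print(photo.shape)
# (461, 700, 3)

# Crop the image by slicing rows 210-349 and columns 425-499
plt.imshow(photo[210:350, 425:500])
plt.show()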
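
The crop above is ordinary NumPy slicing. Here is a small sketch of what that slice does, using a blank array with the same shape as the photo in place of the real image, so it runs without the waldo.jpg file. The blank array is an assumption for illustration only.

```python
import numpy as np

# Stand-in for the photo: same shape as the output above, all black pixels
photo = np.zeros((461, 700, 3), dtype=np.uint8)

# Same slice as the crop: rows 210-349 and columns 425-499.
# The third axis (color channels) is left out, so all 3 channels are kept.
crop = photo[210:350, 425:500]
print(crop.shape)  # (140, 75, 3)

# A basic slice is a view, not a copy: changing the crop changes the photo
crop[:] = 255
print(photo[210, 425])  # [255 255 255]
print(photo[0, 0])      # [0 0 0]

# Use .copy() when the crop should be independent of the original
safe_crop = photo[210:350, 425:500].copy()
```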

Another use of NumPy

import numpy as np

# create a NumPy array
my_array = np.array([1, 2, 3, 4, 5, 3])

# use np.where to find the indices of the value 3 in the array
indices = np.where(my_array == 3)

# print the result
print(indices)
(array([2, 5]),)

The result is a tuple containing the indices where the condition my_array == 3 is True, which is (array([2, 5]),). This means that the value 3 is located at indices 2 and 5 within the array.
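
Because np.where returns a tuple, it usually needs to be unpacked before the indices are used. The sketch below reuses my_array from above and shows two related patterns. The grid array is made up for illustration.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4, 5, 3])

# The tuple has one index array per dimension; [0] unpacks the 1-D case
positions = np.where(my_array == 3)[0]
print(positions)  # [2 5]

# Three-argument form: where the condition is True take 0, otherwise keep the value
replaced = np.where(my_array == 3, 0, my_array)
print(replaced)  # [1 2 0 4 5 0]

# For a 2-D array the tuple holds row indices and column indices
grid = np.array([[3, 1],
                 [0, 3]])
rows, cols = np.where(grid == 3)
print(rows, cols)  # [0 1] [0 1]
```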