Warming up for data analysis – Ayten Yesim Semchenko, Ph.D.

Hello there,

This week, we will warm up for data analysis! I will be using the same data frame that I used in the previous blog post.

Let us first look at the descriptive statistics, such as the mean, standard deviation, minimum, and maximum values. To do that:

import pandas as pd
df = pd.read_csv("/Users/aytensemchenko/PycharmProjects/datapreparation/mydata.csv")
print(df.describe())
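
If you do not have the .csv file at hand, here is a minimal sketch of the same idea on a small made-up data frame. The values are hypothetical; only the column names (Age, Gender, Image1, Image2) follow this post. It also shows one way to ask for just the statistics we care about, using agg:

```python
import pandas as pd

# Hypothetical ratings; only the column names follow the post.
df = pd.DataFrame({
    "Age": [19, 19, 21, 22],
    "Gender": ["F", "M", "F", "M"],
    "Image1": [5, 3, 4, 2],
    "Image2": [4, 2, 5, 3],
})

# describe() summarizes numeric columns by default, so "Gender" is left out.
summary = df.describe()
print(summary)

# Request only the statistics of interest for the two rating columns.
selected = df[["Image1", "Image2"]].agg(["mean", "std", "min", "max"])
print(selected)
```

Note that describe() skips text columns such as Gender unless we pass include="all".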

Sometimes, when we prepare our data for analysis, we need to add a new variable. For instance, let us add the variable “Total”, which will be the sum of the ratings for Image 1 and Image 2. (Remember, participants rated the attractiveness of Image 1 and Image 2.) To create the variable “Total”:

import pandas as pd
df = pd.read_csv("/Users/aytensemchenko/PycharmProjects/datapreparation/mydata.csv")
df["Total"] = df["Image1"] + df["Image2"]
print(df)
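
One thing worth knowing when summing columns is how missing ratings behave. The sketch below uses made-up values (one participant has no rating for Image 1) and an extra column name, Total_skipna, that I introduce only for illustration. With +, a missing rating makes the total missing as well, while sum(axis=1) skips missing values by default:

```python
import pandas as pd

# Hypothetical data: the third participant did not rate Image 1.
df = pd.DataFrame({
    "Image1": [5, 3, None],
    "Image2": [4, 2, 5],
})

# "+" propagates missing values: the third total becomes NaN.
df["Total"] = df["Image1"] + df["Image2"]

# sum(axis=1) adds across each row and skips NaN by default.
df["Total_skipna"] = df[["Image1", "Image2"]].sum(axis=1)

print(df)
```

Which behavior is right depends on the analysis; for a rating total, a missing total is often the safer choice.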

We used “print” to see our new variable in the console. If we want to save the change (i.e., the new column “Total”) to a .csv file:

import pandas as pd
df = pd.read_csv("/Users/aytensemchenko/PycharmProjects/datapreparation/mydata.csv")
df["Total"] = df["Image1"] + df["Image2"]
df.to_csv("modified.csv", index=False)

print(df)
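
To check what was actually written, we can read the file back. This is a small sketch with made-up values, written to a temporary folder (my assumption, so that nothing in your project is overwritten). It also shows why index=False is useful: without it, the row index is saved as an extra unnamed column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical ratings, saved to a temporary folder for the demonstration.
df = pd.DataFrame({"Image1": [5, 3], "Image2": [4, 2]})
df["Total"] = df["Image1"] + df["Image2"]

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "modified.csv")
path_with_index = os.path.join(out_dir, "modified_with_index.csv")

# index=False keeps the row index out of the file.
df.to_csv(path, index=False)
reloaded = pd.read_csv(path)
print(list(reloaded.columns))

# Without index=False, the index comes back as an extra column.
df.to_csv(path_with_index)
reloaded_with_index = pd.read_csv(path_with_index)
print(list(reloaded_with_index.columns))
```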

We have now created a new .csv file named “modified.csv” that includes the variable “Total”; the original “mydata.csv” is left unchanged. If we want to drop this new variable, then:

import pandas as pd
df = pd.read_csv("/Users/aytensemchenko/PycharmProjects/datapreparation/mydata.csv")
df["Total"] = df["Image1"] + df["Image2"]
df = df.drop(columns=["Total"])
df.to_csv("modified.csv", index=False)
print(df)
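
A detail that is easy to miss: drop returns a new data frame and leaves the original untouched unless we assign the result back. The sketch below uses made-up values and the names without_total and again, which are mine. It also shows errors="ignore", which avoids an error when the column is already gone:

```python
import pandas as pd

# Hypothetical ratings.
df = pd.DataFrame({"Image1": [5, 3], "Image2": [4, 2]})
df["Total"] = df["Image1"] + df["Image2"]

# drop() returns a copy without the column; df itself still has "Total".
without_total = df.drop(columns=["Total"])
print(list(df.columns))
print(list(without_total.columns))

# Dropping a column that no longer exists would raise a KeyError;
# errors="ignore" skips it quietly instead.
again = without_total.drop(columns=["Total"], errors="ignore")
print(list(again.columns))
```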

Let us assume that we want to filter our data on multiple conditions (e.g., age and gender). For instance, we only want to see 19-year-old females, and we want to create a new data frame (new_df) that includes only these rows:

import pandas as pd
df = pd.read_csv("/Users/aytensemchenko/PycharmProjects/datapreparation/mydata.csv")
new_df = df.loc[(df["Age"] == 19) & (df["Gender"] == "F")]
print(new_df)
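
Here is the same filter on a small made-up data frame, so you can see which rows survive. Each condition needs its own parentheses, because & binds more tightly than ==. As an alternative, the sketch also uses query, which some people find easier to read; the name same_rows is mine.

```python
import pandas as pd

# Hypothetical participants.
df = pd.DataFrame({
    "Age": [19, 19, 21, 19],
    "Gender": ["F", "M", "F", "F"],
    "Image1": [5, 3, 4, 2],
})

# Boolean mask: both conditions must hold for a row to be kept.
new_df = df.loc[(df["Age"] == 19) & (df["Gender"] == "F")]
print(new_df)

# The same filter written as a query string.
same_rows = df.query('Age == 19 and Gender == "F"')
print(same_rows)
```

Both versions keep the original row labels (here 0 and 3); calling reset_index(drop=True) on the result renumbers the rows from 0 if that is preferred.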

Lastly, we may want to modify values in the data. For instance, instead of “F”, we may want to see “Female” in the data frame. To achieve that:

import pandas as pd
df = pd.read_csv("/Users/aytensemchenko/PycharmProjects/datapreparation/mydata.csv")
df.loc[df["Gender"] == "F", "Gender"] = "Female"
print(df)
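
If several labels need to change at once, one option is replace with a dictionary. This is a sketch on made-up values; I am assuming the other gender code in the data is “M”, which you should check against your own file.

```python
import pandas as pd

# Hypothetical participants; the "M" code is an assumption.
df = pd.DataFrame({
    "Gender": ["F", "M", "F"],
    "Age": [19, 20, 21],
})

# Map each short code to its full label in one step.
labels = {"F": "Female", "M": "Male"}
df["Gender"] = df["Gender"].replace(labels)

print(df)
```

Values that do not appear in the dictionary are left as they are.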

Cheers!
