Pandas in Python Part 2

Pandas requires three steps before you can display data: import the library, define a dictionary of data, and create a DataFrame from that dictionary. The following code shows the general process of using Pandas:

import pandas as pd

data = {'Names': ['On-Campus', 'Online'], 'Students': [1000, 1150]}

df = pd.DataFrame(data, columns=['Names', 'Students'])

print(df)

The first line imports the Pandas library under the alias pd, which saves typing every time we call a Pandas function. The second line creates a variable called data that holds a dictionary. Recall that a dictionary is a key-value store; here each key is a string and each value is a list. The first key, “Names”, holds the categories we want to track, and the second key, “Students”, holds the numerical value for each category. This DataFrame only needs two keys because they will become the x and y axes of our future graph. The third line builds a DataFrame from the dictionary, turning each key into a column. Lastly, we print the DataFrame and get a table showing how many students are associated with on-campus and online courses.
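To see how the dictionary maps onto the DataFrame, the short sketch below inspects the result. It is an illustrative addition rather than part of the original walkthrough: the dictionary keys become the column labels, and Pandas assigns a default row index of 0 and 1.

```python
import pandas as pd

# Same data as above: each key becomes a column, each list becomes that column's values
data = {'Names': ['On-Campus', 'Online'], 'Students': [1000, 1150]}
df = pd.DataFrame(data, columns=['Names', 'Students'])

print(df.columns.tolist())   # column labels come from the dictionary keys
print(df.shape)              # (number of rows, number of columns)
print(df['Students'].sum())  # total students across both rows
print(df.loc[1, 'Names'])    # look up a single cell by row index and column name
```

Running it should print ['Names', 'Students'], then (2, 2), then 2150, then Online.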

Graphing 

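In data analytics, we commonly use Matplotlib to graph Pandas DataFrames (the DataFrame plot method draws with Matplotlib behind the scenes). First, we import the library, then tell the plot which columns to use for the x and y axes, and finally show the plot. The following code creates a bar graph of the student data: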

import pandas as pd

import matplotlib.pyplot as plt

data = {'Names': ['On-Campus', 'Online'], 'Students': [1000, 1150]}

df = pd.DataFrame(data, columns=['Names', 'Students'])

# One bar per row: Names on the x axis, Students on the y axis
df.plot(x='Names', y='Students', kind='bar')

plt.show()
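plt.show() opens a window, which fails or does nothing when a script runs without a display (for example, on a server). The sketch below is one possible variation, assuming you want a labeled chart saved to an image file instead: it switches Matplotlib to its non-interactive "Agg" backend and uses the Axes object that df.plot returns. The output file name students_bar.png is only an example.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no display
import matplotlib.pyplot as plt
import pandas as pd

data = {'Names': ['On-Campus', 'Online'], 'Students': [1000, 1150]}
df = pd.DataFrame(data, columns=['Names', 'Students'])

# DataFrame.plot returns the Matplotlib Axes it drew on, so we can customize it
ax = df.plot(x='Names', y='Students', kind='bar', legend=False)
ax.set_xlabel("Course type")
ax.set_ylabel("Number of students")
ax.set_title("Students by course type")

# Save instead of calling plt.show(); this example path is in the system temp folder
out_path = Path(tempfile.gettempdir()) / "students_bar.png"
plt.savefig(out_path)
plt.close(ax.figure)  # free the figure's memory once it is saved
```

Saving to a file also makes the chart easy to embed in a report or web page.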

Line Chart

Using what we just learned, we can now create new DataFrames and plot types. For a line plot, each key in the dictionary is a metric name, and each value is a list of numerical observations for that metric. If we want to track error types over time, we need a DataFrame like the following:

data = {"400": [20, 22, 12, 28], "500": [12, 15, 11, 8]}

df = pd.DataFrame(data, index=["May 1", "May 3", "May 4", "May 8"])

data is a dictionary with two keys, and each key becomes one line when graphed. The keys represent 400-level and 500-level errors, respectively, and the values in each list record how many times that error occurred.

The DataFrame takes in the data, but this time we also pass an index. The index labels each row with the date on which its data points were recorded. To draw the line graph, use the following code:
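Because the index labels each row, you can use it to look up the error counts for a specific date. The sketch below is an illustrative addition (not from the original post) showing a lookup with df.loc and a per-column total with df.sum().

```python
import pandas as pd

# Each key becomes one column (one line on the chart); the index labels each row with a date
data = {"400": [20, 22, 12, 28], "500": [12, 15, 11, 8]}
df = pd.DataFrame(data, index=["May 1", "May 3", "May 4", "May 8"])

print(df.loc["May 3"])  # both error counts recorded on May 3
print(df.sum())         # total count for each error type across all dates
```

Here the 400-level errors add up to 82 and the 500-level errors to 46.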

import pandas as pd

import matplotlib.pyplot as plt

data = {"400": [20, 22, 12, 28], "500": [12, 15, 11, 8]}

df = pd.DataFrame(data, index=["May 1", "May 3", "May 4", "May 8"])

df.plot.line()

plt.show()
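A line chart is easier to read with axis labels and a titled legend. The following sketch is one possible variation, assuming the chart should be saved to a file rather than shown in a window, so it uses Matplotlib's non-interactive "Agg" backend. The label text and the file name errors_line.png are illustrative choices.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render without a display (e.g. on a server)
import matplotlib.pyplot as plt
import pandas as pd

data = {"400": [20, 22, 12, 28], "500": [12, 15, 11, 8]}
df = pd.DataFrame(data, index=["May 1", "May 3", "May 4", "May 8"])

# One line per column; marker="o" marks each recorded date
ax = df.plot.line(marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Error count")
ax.legend(title="Error level")  # legend entries come from the column names

out_path = Path(tempfile.gettempdir()) / "errors_line.png"  # example path
plt.savefig(out_path)
plt.close(ax.figure)
```

Note that with string labels like these, Pandas places the points at evenly spaced positions, so the four-day gap between "May 4" and "May 8" looks the same as the one-day gap between "May 3" and "May 4".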

Pie Charts 

The last graph covered in this practice is the pie chart. A pie chart needs a DataFrame with a metric key whose value is a list of numbers. The DataFrame looks like the following:

data = {"People":[20,35,45]}

df = pd.DataFrame(data, index=["Kids","Adults","Teens"])

Notice that the DataFrame has an index that labels what each value represents. In this case, the DataFrame shows people broken down into three categories. To show the pie chart, pass the column name as y, which selects the column that supplies the wedge sizes:
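A pie chart draws each value as its share of the column total. The sketch below, an illustrative addition, computes those shares directly with Pandas so you can check what each wedge will represent.

```python
import pandas as pd

data = {"People": [20, 35, 45]}
df = pd.DataFrame(data, index=["Kids", "Adults", "Teens"])

# Each wedge's share = value / total; multiply first so the division is exact here
share = df["People"] * 100 / df["People"].sum()
print(share)  # percentage of people in each category, labeled by the index
```

Since these values happen to add up to 100, the percentages equal the raw counts: 20.0, 35.0, and 45.0.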

import pandas as pd

import matplotlib.pyplot as plt

data = {"People":[20,35,45]}

df = pd.DataFrame(data, index=["Kids","Adults","Teens"])

df.plot.pie(y="People")

plt.show()
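By default the pie chart shows only category labels. The sketch below is one possible variation, assuming you want each wedge's percentage printed on it and the chart saved to a file instead of shown in a window. autopct is passed through to Matplotlib's pie function; the title text and the file name people_pie.png are illustrative.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import pandas as pd

data = {"People": [20, 35, 45]}
df = pd.DataFrame(data, index=["Kids", "Adults", "Teens"])

# autopct formats each wedge's percentage; the index already labels the wedges,
# so the legend is turned off
ax = df.plot.pie(y="People", autopct="%1.0f%%", legend=False)
ax.set_ylabel("")  # hide the default "People" axis label
ax.set_title("People by age group")

out_path = Path(tempfile.gettempdir()) / "people_pie.png"  # example path
plt.savefig(out_path)
plt.close(ax.figure)
```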

