
Data analysts and Machine Learning (ML) experts regularly use Python because of its powerful integration with ML and calculation libraries. Ask a data analyst or ML expert which libraries they use and they will undoubtedly say NumPy (not covered in this post) and pandas.
Python Pandas
Pandas is a third-party library for Python 3, not a built-in part of the language, so install it first (for example with “pip3 install pandas”). Once it is installed, you can copy and paste code directly from this page and run it in your favorite Python interpreter. We use VS Code. Before we provide examples, it is important to note that Pandas only provides the data columns and values; it does not draw graphs by itself. We need Pandas to create the structured data to be interpreted by our plotting engine. These data structures are called DataFrames, or data frames in this post.
Pandas Structures
A simple example is to create a column and value pair. We prefer dictionaries, but you can use lists if you like. There are three steps before Pandas data can be plotted or printed.
- import pandas
- define data dictionary
- create a data frame
Doing those three steps will allow you to print the data frame and see your structured data. Check out this example.
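import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
data = {'Names': ['Elite Hosting USA', 'Elite Code Academy', 'Twitter'],
        'Subscribers': [120, 25, 51]}
# The columns argument fixes the column order in the data frame.
df = pd.DataFrame(data, columns=['Names', 'Subscribers'])
print(df)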
If you copy and paste this code and run it, the output should look like:
                Names  Subscribers
0   Elite Hosting USA          120
1  Elite Code Academy           25
2             Twitter           51
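The same data frame can also be built from lists instead of a dictionary. The following is a minimal sketch, assuming each inner list holds one row; the variable names rows, df_from_rows, and df_from_dict are ours for illustration.

```python
import pandas as pd

# One inner list per row: [name, subscriber count].
rows = [
    ['Elite Hosting USA', 120],
    ['Elite Code Academy', 25],
    ['Twitter', 51],
]
# With a list of rows, the columns argument supplies the column names.
df_from_rows = pd.DataFrame(rows, columns=['Names', 'Subscribers'])

# The dictionary version, built the same way as in the first example.
data = {'Names': ['Elite Hosting USA', 'Elite Code Academy', 'Twitter'],
        'Subscribers': [120, 25, 51]}
df_from_dict = pd.DataFrame(data, columns=['Names', 'Subscribers'])

# equals() reports whether both frames hold the same labels and values.
print(df_from_rows.equals(df_from_dict))
```

Which form you pick is mostly a matter of taste: a dictionary keeps each column's values together, while a list of rows reads more like a table.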
Plotting/Graphing
When we plot graphs, we use a plotting engine. We recommend matplotlib. It is important to note that matplotlib is not part of the Python standard library, so you will need to use pip to install it. Since we work in a virtual environment, we used the command “pip3 install matplotlib”.
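If you are unsure whether the install worked in your current environment, a quick check from Python can help. This is a small sketch using the standard library's importlib; the helper name is_installed is hypothetical and not part of pandas or matplotlib.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Report on the two libraries this post relies on.
for name in ('pandas', 'matplotlib'):
    status = 'installed' if is_installed(name) else 'missing, use pip to install it'
    print(f'{name}: {status}')
```

Run it with the same interpreter you plan to use for plotting, since each virtual environment has its own set of installed packages.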
After installing matplotlib, you are ready to plot. There are two steps to seeing graphs.
- define your plot type and data assignment
- show the plot
It goes without saying that you will need to import your newly installed library. Running the following code will generate a bar chart using the previous data:
import pandas as pd
import matplotlib.pyplot as plt

data = {'Names': ['Elite Hosting USA', 'Elite Code Academy', 'Twitter'],
        'Subscribers': [120, 25, 51]}
df = pd.DataFrame(data, columns=['Names', 'Subscribers'])
# Step 1: define the plot type and assign columns to the axes.
df.plot(x='Names', y='Subscribers', kind='bar')
# Step 2: show the plot in a window.
plt.show()
The one important thing to note is that the plot function takes an x-axis column, a y-axis column, and a kind. Typical options for kind include bar, scatter, and line. Go on, run it!
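To see how the kind argument changes the result, here is a hedged sketch that draws the same data as a bar chart and as a line chart. The Agg backend and the file names are our assumptions so the script can run without a display; drop the matplotlib.use line and call plt.show() if you want a window instead.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: render to files, no display needed
import matplotlib.pyplot as plt
import pandas as pd

data = {'Names': ['Elite Hosting USA', 'Elite Code Academy', 'Twitter'],
        'Subscribers': [120, 25, 51]}
df = pd.DataFrame(data, columns=['Names', 'Subscribers'])

# df.plot returns the matplotlib Axes it drew on; each call here makes a new figure.
ax_bar = df.plot(x='Names', y='Subscribers', kind='bar')
ax_line = df.plot(x='Names', y='Subscribers', kind='line')

# Save each chart to a PNG file (hypothetical file names).
ax_bar.figure.savefig('subscribers_bar.png', bbox_inches='tight')
ax_line.figure.savefig('subscribers_line.png', bbox_inches='tight')
plt.close('all')
```

Only the kind value differs between the two calls, which makes it easy to try several chart types on the same data frame.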