First we need a dataset. Let us learn to use Rdatasets from Python: https://vincentarelbundock.github.io/Rdatasets/datasets.html
# import the packages required
import numpy as np
import scipy as sc
import statsmodels.api as sm
import matplotlib.pyplot as plt
# load the "cars" data set from R's "datasets" package via statsmodels.api (needs internet access)
cars_speed=sm.datasets.get_rdataset("cars", "datasets")
Each dataset in Rdatasets comes with a description, found in the doc file on the website. statsmodels stores this text in the __doc__ attribute of the returned object, so it can be printed from Python:
print(cars_speed.__doc__)
Next let us print the data to see its content.
print(cars_speed.data)
We can access the columns of this data set in the following way:
print(cars_speed.data['speed'])
print(cars_speed.data['dist'])
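The object in cars_speed.data is a pandas DataFrame, so the usual pandas ways of selecting columns apply. The following is a minimal sketch of that idea. To keep it runnable without a download it uses a small stand-in frame named `cars` (a hypothetical name) with the same column names; the values are made up for illustration and are not the real data.

```python
import pandas as pd

# Stand-in for cars_speed.data: same column names, made-up values.
cars = pd.DataFrame({"speed": [4, 7, 8, 9], "dist": [2, 4, 16, 10]})

speed = cars["speed"]            # a single name returns a pandas Series
both = cars[["speed", "dist"]]   # a list of names returns a DataFrame
first_rows = cars.head(2)        # only the first two rows

print(speed.mean())
print(both.shape)
print(first_rows)
```

With the real data set, replacing `cars` by cars_speed.data should work the same way.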
Next we will draw a scatter plot of stopping distance against speed using matplotlib.
# render matplotlib figures inline in the notebook
%matplotlib inline
#scatter plot
plt.scatter(cars_speed.data['speed'],cars_speed.data['dist'],c='b',s=60)
# xlabel of the scatter plot
plt.xlabel('Speed')
# ylabel of the scatter plot
plt.ylabel('Distance to stop')
# title of the scatter plot
plt.title('Distance cars took to stop in 1920s')
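The %matplotlib inline magic only works inside IPython or a notebook. As a rough sketch of how the same plot could be produced from a plain Python script, the example below uses matplotlib's object-oriented interface and writes the figure to a file. The lists `speed` and `dist` hold made-up stand-in values, and the file name cars_scatter.png is a hypothetical choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Made-up stand-in values; with the real data use
# cars_speed.data['speed'] and cars_speed.data['dist'] instead.
speed = [4, 7, 8, 9]
dist = [2, 4, 16, 10]

fig, ax = plt.subplots()
ax.scatter(speed, dist, c="b", s=60)
ax.set_xlabel("Speed")
ax.set_ylabel("Distance to stop")
ax.set_title("Distance cars took to stop in 1920s")

# Save the figure to disk instead of showing it inline.
fig.savefig("cars_scatter.png")
```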