Today we learn how to make our own template for plots
I have to confess something. I have a problem with plots, graphics, and visualizations. I have no trouble picturing the result or deciding what I need to create. My problem is with the execution.
Frankly speaking, there are so many ways to create plots in Python that I can't remember which one to use. Sure, it would be easier if I practiced more. Never mind!
Fortunately, somebody invented the computer, which can remember this pretty mess for me. I decided to create my own library of plots. This solution gave me independence.
I can prepare presentations faster because I don't have to think about colors or plot size. Every plot looks the same, because I have my own style prepared in advance.
Are you convinced? Let's make a template for plots!
Data preparation
As a first step, we import the needed libraries and load the data.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
## data source: https://s3.amazonaws.com/dq-blog-files/fortune500.csv
df = pd.read_csv('c:/2/fortune500.csv')
df.columns = ['year', 'rank', 'company', 'revenue', 'profit']
df.head(3)
We routinely check the data types of our columns. It turns out that the 'profit' column contains non-numeric data. The reason may be words or symbols in place of numbers. We have to find out what kind of contamination is there.
df.dtypes
df.profit.value_counts()
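On a long column, value_counts() can return thousands of rows, so the offending entries are easy to miss. A minimal sketch of one way to isolate them, using a made-up series as a stand-in for the 'profit' column (the values below are illustrative, not taken from the file):

```python
import pandas as pd

# Toy stand-in for the 'profit' column; the values are made up for illustration.
profit = pd.Series(['806', 'N.A.', '584.8', 'N.A.', '-12.5'])

# Try to convert everything; entries that cannot be parsed become NaN.
as_number = pd.to_numeric(profit, errors='coerce')

# Keep only the original strings that failed to convert and count them.
bad_values = profit[as_number.isna()].value_counts()
print(bad_values)
```

Only the strings that could not be read as numbers remain, which makes the contamination visible at a glance.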
I found the contamination: the string 'N.A.'. So I wipe it out and convert the column from str to float.
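# show the rows where profit is the text 'N.A.'
df.loc[df.profit=='N.A.']
# replace the marker with NaN and drop the incomplete rows
df['profit'] = df['profit'].replace('N.A.', np.nan)
df = df.dropna(how='any')
# convert the remaining strings to numbers
df['profit'] = pd.to_numeric(df['profit'])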
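As an alternative to cleaning after loading, the marker can be declared as missing when the file is read. This is not the approach used above, only a sketch of the `na_values` parameter of `pd.read_csv`. The rows and company names below are made up, and `io.StringIO` stands in for the real file:

```python
import io
import pandas as pd

# Made-up rows in a five-column layout like the one used above (illustration only).
csv_text = """Year,Rank,Company,Revenue,Profit
1955,1,Company A,9823.5,806
1955,2,Company B,5661.4,N.A.
1956,1,Company A,12443.3,1189.5
"""

# na_values tells read_csv to treat 'N.A.' as missing, so 'profit' is parsed as float.
toy = pd.read_csv(io.StringIO(csv_text), na_values=['N.A.'])
toy.columns = ['year', 'rank', 'company', 'revenue', 'profit']

# Drop the rows where profit is missing.
toy = toy.dropna(subset=['profit'])
print(toy.dtypes)
```

With this variant the column is numeric from the start, so no separate conversion step is needed.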
OK, the data is ready for the next steps!
Making a template for plots
I prepared a template for line plots. I use them most frequently because I am a financial analyst.
I keep this ready-to-use template in my repository.
Now we need suitably prepared data to put into the template.
def LinearPlot(x, y, ax, title, x_label, y_label):
    ax.set_title(title, color='darkred', alpha=1)
    ax.set_ylabel(y_label, color='grey', alpha=0.6)
    ax.set_xlabel(x_label, color='grey', alpha=0.6)
    ax.plot(x, y, color='black', alpha=0.6, linestyle='dashed')
    ax.grid(linewidth=0.85, alpha=0.2)
    ax.margins(x=0, y=0)
Pivot table is the best
To have a good line plot we need three things: an x axis, a y axis, and data. Additionally, a title and axis labels are useful. Now we create a pivot table and then convert it into a DataFrame. Then, with simple column selections, I separate out x and y.
Ewa = df.pivot_table(index='year', values=['revenue', 'profit'], aggfunc='mean')
df2 = Ewa.reset_index()
x = df2.year
y = df2.profit
title = 'Profit fortune500'
y_label = 'Profit (millions)'
x_label = 'Years'
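To see what the pivot table and reset_index() do, here is a small sketch on made-up numbers (two companies per year, values invented for illustration):

```python
import pandas as pd

# Made-up data: two companies per year (illustration only).
toy = pd.DataFrame({
    'year':    [2000, 2000, 2001, 2001],
    'revenue': [100.0, 300.0, 200.0, 400.0],
    'profit':  [10.0, 30.0, 20.0, 60.0],
})

# One row per year, holding the mean of each value column.
pivot = toy.pivot_table(index='year', values=['revenue', 'profit'], aggfunc='mean')

# reset_index turns the 'year' index back into an ordinary column.
flat = pivot.reset_index()
print(flat)
```

After reset_index(), 'year' is a regular column again, so it can be selected as x in the same way as the value columns are selected as y.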
Using the template
fig, ax = plt.subplots(figsize=(6, 2))
LinearPlot(x, y, ax, title, x_label, y_label)
x = df2.year
y = df2.revenue
title = 'Revenue fortune500'
y_label = 'Revenue (millions)'
x_label = 'Years'
fig, ax = plt.subplots(figsize=(6, 2))
LinearPlot(x, y, ax, title, x_label, y_label)
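Because the template takes an `ax` argument, several plots can share one figure. The sketch below is only one possible way to do it: the `panels` list and the numbers in `df2` are made up for illustration, and the template is repeated so the block runs on its own.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: script without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def LinearPlot(x, y, ax, title, x_label, y_label):
    # Same template as above, repeated so this sketch is self-contained.
    ax.set_title(title, color='darkred', alpha=1)
    ax.set_ylabel(y_label, color='grey', alpha=0.6)
    ax.set_xlabel(x_label, color='grey', alpha=0.6)
    ax.plot(x, y, color='black', alpha=0.6, linestyle='dashed')
    ax.grid(linewidth=0.85, alpha=0.2)
    ax.margins(x=0, y=0)


# Made-up yearly means standing in for df2 (illustration only).
df2 = pd.DataFrame({
    'year':    [2000, 2001, 2002],
    'profit':  [20.0, 40.0, 35.0],
    'revenue': [200.0, 300.0, 320.0],
})

# Hypothetical list of (column, title, y-axis label), one entry per plot.
panels = [
    ('profit', 'Profit fortune500', 'Profit (millions)'),
    ('revenue', 'Revenue fortune500', 'Revenue (millions)'),
]

# One row of axes per panel; every panel gets the same styling from the template.
fig, axes = plt.subplots(nrows=len(panels), figsize=(6, 4))
for ax, (column, title, y_label) in zip(axes, panels):
    LinearPlot(df2['year'], df2[column], ax, title, x_label='Years', y_label=y_label)
fig.tight_layout()
```

Keeping the titles and labels in one list also makes a copy-and-paste slip in a label easier to spot.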
I hope this is a good way to make a template for plots!
Entire code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
## data source: https://s3.amazonaws.com/dq-blog-files/fortune500.csv
df = pd.read_csv('c:/2/fortune500.csv')
df.columns = ['year', 'rank', 'company', 'revenue', 'profit']
df.head(3)
df.dtypes
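df.profit.value_counts()
# show the rows where profit is the text 'N.A.'
df.loc[df.profit=='N.A.']
# replace the marker with NaN and drop the incomplete rows
df['profit'] = df['profit'].replace('N.A.', np.nan)
df = df.dropna(how='any')
# convert the remaining strings to numbers
df['profit'] = pd.to_numeric(df['profit'])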
def LinearPlot(x, y, ax, title, x_label, y_label):
    ax.set_title(title, color='darkred', alpha=1)
    ax.set_ylabel(y_label, color='grey', alpha=0.6)
    ax.set_xlabel(x_label, color='grey', alpha=0.6)
    ax.plot(x, y, color='black', alpha=0.6, linestyle='dashed')
    ax.grid(linewidth=0.85, alpha=0.2)
    ax.margins(x=0, y=0)
Ewa = df.pivot_table(index='year', values=['revenue', 'profit'], aggfunc='mean')
df2 = Ewa.reset_index()
x = df2.year
y = df2.profit
title = 'Profit fortune500'
y_label = 'Profit (millions)'
x_label = 'Years'
fig, ax = plt.subplots(figsize=(6, 2))
LinearPlot(x, y, ax, title, x_label, y_label)
x = df2.year
y = df2.revenue
title = 'Revenue fortune500'
y_label = 'Revenue (millions)'
x_label = 'Years'
fig, ax = plt.subplots(figsize=(6, 2))
LinearPlot(x, y, ax, title, x_label, y_label)





