How to make my own template for plots
Wojciech Moszczyński, 5 September 2018
https://sigmaquality.pl/uncategorized/how-to-make-my-own-template-for-plots/

The article How to make my own template for plots comes from THE DATA SCIENCE LIBRARY.


Today we will learn how to make our own template for plots.

I have to confess something: I have a problem with plots, graphics and visualizations. I have no problem imagining them or deciding what I want to create. My problem is the implementation.

Frankly speaking, there are so many ways of creating plots in Python that I cannot remember which one to use. Sure, more practice would make it easier for me. Never mind!

Fortunately, somebody invented the computer, which can remember this mess for me. I decided to create my own library of plot templates. This solution gave me independence.

I can make presentations faster because I don't have to think about colors or plot size. Every plot looks the same, because I prepared my own style in advance.
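Storing a style so you never have to think about it again is also what matplotlib's own style mechanism does. As a side note, and not the approach used in this post, here is a minimal sketch of a house style kept as a dictionary of rcParams and applied with `plt.style.context`. The names `MY_STYLE` and `styled_line` and all the chosen values are hypothetical; they only loosely mirror the colours of the template built later.

```python
import matplotlib
matplotlib.use("Agg")  # file-only backend so the sketch runs without a display
import matplotlib.pyplot as plt
from cycler import cycler  # cycler is a required dependency of matplotlib

# Hypothetical house style: every key is a real matplotlib rcParam,
# the values loosely mirror the colours used in this post
MY_STYLE = {
    "figure.figsize": (6, 2),
    "axes.titlecolor": "darkred",
    "axes.labelcolor": "grey",
    "axes.prop_cycle": cycler(color=["black"]),  # ax.plot takes line colours from this cycle
    "lines.linestyle": "--",
    "axes.grid": True,
    "grid.alpha": 0.2,
    "grid.linewidth": 0.85,
    "axes.xmargin": 0,
    "axes.ymargin": 0,
}

def styled_line(x, y, title):
    """Draw one line plot with MY_STYLE; global rcParams are restored when the block exits."""
    with plt.style.context(MY_STYLE):
        fig, ax = plt.subplots()
        ax.plot(x, y)
        ax.set_title(title)
    return fig, ax

fig, ax = styled_line([2000, 2001, 2002], [1.0, 3.0, 2.0], "Demo")
```

The style only applies inside the `with` block, so it cannot leak into plots made elsewhere in the same notebook.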

Are you convinced? Let's make a template for plots!

Data preparation

In the first step we load the data and the required libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
## data source: https://s3.amazonaws.com/dq-blog-files/fortune500.csv
df = pd.read_csv('c:/2/fortune500.csv')   # local copy of the file above
df.columns = ['year', 'rank', 'company', 'revenue', 'profit']   # short column names
df.head(3)

As a routine, we check the data types of our columns. It turns out that the 'profit' column holds non-numeric data. The reason may be words or symbols in place of numbers. We have to find out what kind of contamination is there.

df.dtypes

df.profit.value_counts()
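`value_counts()` lists every distinct value, which for a column of numbers is a very long list. A shorter way to spot the contamination, sketched here on a made-up Series instead of the real file, is to let `pd.to_numeric(..., errors='coerce')` turn every unparseable string into NaN and then count the original strings at those positions.

```python
import pandas as pd

# Hypothetical stand-in for the 'profit' column (the real one comes from fortune500.csv)
profit = pd.Series(["89.0", "N.A.", "-23.5", "4.1", "N.A."], name="profit")

# Strings that cannot be parsed as numbers become NaN under errors="coerce"
parsed = pd.to_numeric(profit, errors="coerce")

# The contaminating tokens are the original strings where parsing failed
bad_tokens = profit[parsed.isna()].value_counts()
print(bad_tokens)
```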

I found the contamination: the string 'N.A.' stands in place of missing profits. So I remove those rows and convert the column from str to float.

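df.loc[df.profit == 'N.A.']                          # show the contaminated rows
df['profit'] = df['profit'].replace('N.A.', np.nan)  # mark 'N.A.' as missing
df = df.dropna(how='any')                            # drop rows with missing values
df['profit'] = pd.to_numeric(df['profit'])           # convert str to float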
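An alternative that the post does not use: `pd.read_csv` accepts an `na_values` parameter, so the 'N.A.' token can be treated as missing while the file is read and the column arrives as float straight away. A sketch on a small in-memory CSV with made-up rows (`csv_text` is hypothetical); with the real file the same `na_values=['N.A.']` argument would go into the `pd.read_csv('c:/2/fortune500.csv', ...)` call.

```python
import io
import pandas as pd

# Hypothetical excerpt with the same columns as the renamed fortune500 data
csv_text = """year,rank,company,revenue,profit
1955,1,Company A,100.0,10.5
1955,2,Company B,90.0,N.A.
1955,3,Company C,80.0,7.25
"""

# 'N.A.' is not one of pandas' default missing-value markers, so we add it explicitly
df = pd.read_csv(io.StringIO(csv_text), na_values=["N.A."])
df = df.dropna(how="any")   # same row filter as in the post
print(df.dtypes["profit"])  # float64: no string-to-number conversion needed afterwards
```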

OK, the data is ready for the next steps!

We make a template for plots

I prepared a template for line plots. I use them most often because I am a financial analyst.

I keep this ready-to-use template in my repository.

Next, we need properly prepared data to feed into the template.

def LinearPlot(x, y, ax, title, x_label, y_label):
    # dark-red title, faded grey axis labels
    ax.set_title(title, color='darkred', alpha=1)
    ax.set_ylabel(y_label, color='grey', alpha=0.6)
    ax.set_xlabel(x_label, color='grey', alpha=0.6)
    # semi-transparent black dashed line
    ax.plot(x, y, color='black', alpha=0.6, linestyle='dashed')
    # faint grid; no margins, so the line reaches the edges of the plot area
    ax.grid(linewidth=0.85, alpha=0.2)
    ax.margins(x=0, y=0)
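The template fixes every style choice inside the function. If you occasionally need a different colour, one possible variant (hypothetical, not the author's code; `line_plot` is a made-up name) keeps the same defaults but lets keyword arguments override the line properties and returns the drawn line.

```python
import matplotlib
matplotlib.use("Agg")  # render without a window
import matplotlib.pyplot as plt

def line_plot(ax, x, y, title="", x_label="", y_label="", **line_kwargs):
    """Hypothetical variant of LinearPlot: same look, but line properties can be overridden."""
    style = {"color": "black", "alpha": 0.6, "linestyle": "dashed"}  # defaults from LinearPlot
    style.update(line_kwargs)                                        # caller's overrides win
    ax.set_title(title, color="darkred")
    ax.set_ylabel(y_label, color="grey", alpha=0.6)
    ax.set_xlabel(x_label, color="grey", alpha=0.6)
    line, = ax.plot(x, y, **style)
    ax.grid(linewidth=0.85, alpha=0.2)  # passing style kwargs also switches the grid on
    ax.margins(x=0, y=0)
    return line

fig, ax = plt.subplots(figsize=(6, 2))
line = line_plot(ax, [1, 2, 3], [3, 1, 2], title="Demo", color="darkblue")
```

Here the call changes only the colour; the dashed, faded look of the original template stays.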

The pivot table is the best

To make a good line plot we need three things: the x axis, the y axis and the data. Additionally, a title and axis labels are useful. First we create a pivot table with the mean revenue and profit for each year, then we reset its index so that 'year' becomes an ordinary column. Finally, with simple column selections, we pick out x and y and define the labels.

Ewa = df.pivot_table(index='year', values=['revenue', 'profit'], aggfunc='mean')
df2 = Ewa.reset_index()

x = df2.year
y = df2.profit
title = 'Profit fortune500'
y_label = 'Profit (millions)'
x_label = 'Years'
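What the pivot step computes can be checked on a tiny made-up frame: `pivot_table(..., aggfunc='mean')` with `index='year'` gives the same numbers as a plain `groupby('year')` mean. The data below is hypothetical; only the column names follow the post.

```python
import pandas as pd

# Hypothetical mini data set with the same columns as the cleaned fortune500 frame
df = pd.DataFrame({
    "year":    [2000, 2000, 2001, 2001],
    "revenue": [100.0, 300.0, 200.0, 400.0],
    "profit":  [10.0, 30.0, -5.0, 15.0],
})

# Mean revenue and profit per year, as in the post
pivot = df.pivot_table(index="year", values=["revenue", "profit"], aggfunc="mean")
df2 = pivot.reset_index()  # 'year' becomes an ordinary column again

# The same aggregation written with groupby
grouped = df.groupby("year", as_index=False)[["profit", "revenue"]].mean()

print(df2)
```

Both routes work; the pivot table becomes more useful once you also want a `columns=` split, for example by company.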

Using the template

fig, ax = plt.subplots(figsize=(6, 2))
LinearPlot(x, y, ax, title, x_label, y_label)

 

x = df2.year
y = df2.revenue
title = 'Revenue fortune500'
y_label = 'Revenue (millions)'
x_label = 'Years'

fig, ax = plt.subplots(figsize=(6, 2))
LinearPlot(x, y, ax, title, x_label, y_label)
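Because `LinearPlot` draws on whatever `ax` it receives, both charts can also share one figure. A sketch with the template copied unchanged so the block runs on its own, made-up yearly numbers standing in for `df2`, and an arbitrary figure size; the PNG goes to an in-memory buffer instead of a file.

```python
import io
import matplotlib
matplotlib.use("Agg")  # render without a window
import matplotlib.pyplot as plt

def LinearPlot(x, y, ax, title, x_label, y_label):
    # template from the post, copied unchanged
    ax.set_title(title, color='darkred', alpha=1)
    ax.set_ylabel(y_label, color='grey', alpha=0.6)
    ax.set_xlabel(x_label, color='grey', alpha=0.6)
    ax.plot(x, y, color='black', alpha=0.6, linestyle='dashed')
    ax.grid(linewidth=0.85, alpha=0.2)
    ax.margins(x=0, y=0)

# Made-up yearly means standing in for df2.profit and df2.revenue
years = [2000, 2001, 2002, 2003]
profit = [1.5, 1.2, 0.9, 1.4]
revenue = [10.0, 10.5, 9.8, 11.2]

# Two stacked axes sharing the x axis; the template styles each one identically
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(6, 4), sharex=True)
LinearPlot(years, profit, ax_top, 'Profit fortune500', '', 'Profit (millions)')
LinearPlot(years, revenue, ax_bottom, 'Revenue fortune500', 'Years', 'Revenue (millions)')
fig.tight_layout()

# Save to an in-memory PNG (pass a file name instead to write to disk)
buf = io.BytesIO()
fig.savefig(buf, format='png')
```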

I hope this is a good way to make a template for plots!

Entire code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  
## data source: https://s3.amazonaws.com/dq-blog-files/fortune500.csv 
df = pd.read_csv('c:/2/fortune500.csv')
df.columns = ['year', 'rank', 'company', 'revenue', 'profit']
df.head(3)

df.dtypes

df.profit.value_counts()

df.loc[df.profit == 'N.A.']
df['profit'] = df['profit'].replace('N.A.', np.nan)
df = df.dropna(how='any')
df['profit'] = pd.to_numeric(df['profit'])

def LinearPlot(x, y, ax, title, x_label, y_label):
    ax.set_title(title, color='darkred', alpha=1)
    ax.set_ylabel(y_label, color='grey', alpha=0.6)
    ax.set_xlabel(x_label, color='grey', alpha=0.6)
    ax.plot(x, y, color='black', alpha=0.6, linestyle='dashed')
    ax.grid(linewidth=0.85, alpha=0.2)
    ax.margins(x=0, y=0)

Ewa = df.pivot_table(index='year', values=['revenue', 'profit'], aggfunc='mean')
df2 = Ewa.reset_index()

x = df2.year
y = df2.profit
title = 'Profit fortune500'
y_label = 'Profit (millions)'
x_label = 'Years'

fig, ax = plt.subplots(figsize=(6, 2))
LinearPlot(x, y, ax, title, x_label, y_label)

x = df2.year
y = df2.revenue
title = 'Revenue fortune500'
y_label = 'Revenue (millions)'
x_label = 'Years'

fig, ax = plt.subplots(figsize=(6, 2))
LinearPlot(x, y, ax, title, x_label, y_label)

 
