Perfect plot: Joyplot

In [1]:

import joypy
import pandas as pd
import matplotlib.pyplot as plt
In [3]:
df= pd.read_csv('/home/wojciech/Pulpit/1/autos.csv')
df.head()
Out[3]:
Unnamed: 0 symboling normalized_losses make fuel_type aspiration num_doors body_style drive_wheels engine_location engine_size fuel_system bore stroke compression_ratio horsepower peak_rpm city_mpg highway_mpg price
0 0 3 NaN alfa-romero gas std two convertible rwd front 130 mpfi 3.47 2.68 9.0 111.0 5000.0 21 27 13495.0
1 1 3 NaN alfa-romero gas std two convertible rwd front 130 mpfi 3.47 2.68 9.0 111.0 5000.0 21 27 16500.0
2 2 1 NaN alfa-romero gas std two hatchback rwd front 152 mpfi 2.68 3.47 9.0 154.0 5000.0 19 26 16500.0
3 3 2 164.0 audi gas std four sedan fwd front 109 mpfi 3.19 3.40 10.0 102.0 5500.0 24 30 13950.0
4 4 2 164.0 audi gas std four sedan 4wd front 136 mpfi 3.19 3.40 8.0 115.0 5500.0 18 22 17450.0

5 rows × 27 columns

In [4]:
def N_plots(df, x1, x2, by, title, x_title):

    # Note: joypy.joyplot() creates its own figure, so this call only opens an extra, empty one
    plt.figure(dpi=380)

    # One ridge per value of `by`, with the distributions of x1 and x2 overlaid
    fig, axes = joypy.joyplot(df, column=[x1, x2], by=by, ylim='own', figsize=(12,8),
                              legend=True, color=['#f4cccc', '#0c343d'], alpha=0.4)
    # Alternative palettes:
    # color=['#76a5af', '#134f5c']
    # color=['#f4cccc', '#0c343d']
    # color=['#a4c2f4', '#1c4587']
    # color=['#e06666', '#d9d9d9']
    # color=['#e06666', '#434343']
    # color=['#b6d7a8','#6aa84f']

    # Decoration
    plt.title(title, fontsize=32, color='#d0e0e3', alpha=0.9)
    plt.rc("font", size=20)
    plt.xlabel(x_title, fontsize=16, color='darkred', alpha=1)
    #plt.ylabel('Data Scientist', fontsize=26,  color='grey', alpha=0.8)

    plt.show()
In [5]:
df4 = df[['body_style','highway_mpg','city_mpg']]
df4.head()
Out[5]:
body_style highway_mpg city_mpg
0 convertible 27 21
1 convertible 27 21
2 hatchback 26 19
3 sedan 30 24
4 sedan 22 18
In [6]:
df=df
x1='highway_mpg'
x2='city_mpg'
by='body_style'
title = 'Fuel economy by body style'
x_title = 'Fuel economy (mpg)'

N_plots(df,x1,x2,by,title, x_title)
<Figure size 2280x1520 with 0 Axes>
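
The text line `<Figure size 2280x1520 with 0 Axes>` comes from the `plt.figure(dpi=380)` call rather than from the joyplot itself: that call opens a separate, empty figure, while `joypy.joyplot` returns a figure of its own (the `fig` in `fig, axes = ...`). The sketch below reproduces the numbers in that message with plain matplotlib. It assumes a 6x4 inch default figure size, which is inferred from 2280/380 and 1520/380 and is not stated in the notebook. It also shows one possible way to apply the dpi to an already created figure; `plt.subplots` is only a stand-in for the joyplot call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# An empty figure: 6x4 inches (assumed default) rendered at 380 dots per inch
empty = plt.figure(figsize=(6, 4), dpi=380)
width_px, height_px = (int(round(v)) for v in empty.get_size_inches() * empty.dpi)
n_axes = len(empty.axes)
print(f"<Figure size {width_px}x{height_px} with {n_axes} Axes>")

# Possible alternative: let the plotting call create the figure, then set its dpi.
# plt.subplots stands in for joypy.joyplot(...), which also returns (fig, axes).
fig, ax = plt.subplots(figsize=(12, 8))
fig.set_dpi(380)

plt.close("all")
```

With this pattern no extra empty figure is opened, and the resolution is set on the figure that actually holds the plot.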

Joyplot using a class

In [11]:
class N_plot:

    def __init__(self, df, x1, x2, by, title, x_title):
        self.df = df
        self.x1 = x1
        self.x2 = x2
        self.by = by
        self.title = title
        self.x_title = x_title

    def plot(self):
        plt.figure(dpi=380)
        # Use the values stored on the instance rather than notebook globals
        fig, axes = joypy.joyplot(self.df, column=[self.x1, self.x2], by=self.by, ylim='own',
                                  figsize=(12,8), legend=True, color=['#e06666', '#d9d9d9'], alpha=0.4)
        plt.title(self.title, fontsize=32, color='#d0e0e3', alpha=0.9)
        plt.rc("font", size=20)
        plt.xlabel(self.x_title, fontsize=16, color='darkred', alpha=1)

plt.figure(dpi=380)
# Alternative palettes:
# color=['#76a5af', '#134f5c']
# color=['#f4cccc', '#0c343d']
# color=['#a4c2f4', '#1c4587']
# color=['#e06666', '#d9d9d9']
# color=['#e06666', '#434343']
# color=['#b6d7a8','#6aa84f']
Out[11]:
<Figure size 2280x1520 with 0 Axes>
<Figure size 2280x1520 with 0 Axes>
In [12]:
df=df
x1='highway_mpg'
x2='city_mpg'
by='body_style'
title = 'Fuel economy by body style'
x_title = 'Fuel economy (mpg)'

kot = N_plot(df,x1,x2,by,title, x_title)
In [13]:
kot.plot()
<Figure size 2280x1520 with 0 Axes>

Titanic disaster

We want to find out which passengers had a chance of surviving, depending on the group they belonged to.

Source of data: https://www.kaggle.com/shivamp629/traincsv

In [14]:
df2 = pd.read_csv('/home/wojciech/Pulpit/1/kaggletrain.csv')
df2.head(3)
Out[14]:
Unnamed: 0 PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th… female 38.0 1 0 PC 17599 71.2833 C85 C
2 2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
In [5]:
df2['Age'].head()
Out[5]:
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64
In [16]:
AA = df2.pivot_table(index=['Name','Pclass'], columns='Sex', values='Age').reset_index()
AA.head()
Out[16]:
Sex Name Pclass female male
0 Abbing, Mr. Anthony 3 NaN 42.0
1 Abbott, Mr. Rossmore Edward 3 NaN 16.0
2 Abbott, Mrs. Stanton (Rosa Hunt) 3 35.0 NaN
3 Abelson, Mr. Samuel 2 NaN 30.0
4 Abelson, Mrs. Samuel (Hannah Wizosky) 2 28.0 NaN
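
The reshaping step can be checked on a few rows. This is a minimal sketch with made-up passengers (the names and ages are illustrative and not taken from the dataset). `pivot_table` spreads `Age` into one column per value of `Sex`, so each row has a value in one of the two columns and `NaN` in the other. That gives the joyplot two separate columns to draw for every class.

```python
import pandas as pd

# Toy rows only: one record per (hypothetical) passenger
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 30.0],
})

# Long -> wide: one Age column per sex, keeping Name and Pclass as ordinary columns
wide = toy.pivot_table(index=["Name", "Pclass"], columns="Sex", values="Age").reset_index()
print(wide)
```

Since every name appears once, the default mean aggregation of `pivot_table` leaves the ages unchanged.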
In [18]:
df=AA
x1='female'
x2='male'
by='Pclass'
title = 'Titanic disaster: age distribution of passengers by class'
x_title = 'Age of passengers'

pks = N_plot(df,x1,x2,by,title, x_title)
pks.plot()
<Figure size 2280x1520 with 0 Axes>
In [20]:
BB = df2.pivot_table(index=['Name','Survived'], columns='Sex', values='Age').reset_index()
BB.head()
Out[20]:
Sex Name Survived female male
0 Abbing, Mr. Anthony 0 NaN 42.0
1 Abbott, Mr. Rossmore Edward 0 NaN 16.0
2 Abbott, Mrs. Stanton (Rosa Hunt) 1 35.0 NaN
3 Abelson, Mr. Samuel 0 NaN 30.0
4 Abelson, Mrs. Samuel (Hannah Wizosky) 1 28.0 NaN
In [21]:
df=BB
x1='female'
x2='male'
by='Survived'
title = 'Titanic disaster: age distribution of passengers by survival'
x_title = 'Age of passengers'

ZHP = N_plot(df,x1,x2,by,title, x_title)
ZHP.plot()
<Figure size 2280x1520 with 0 Axes>
In [23]:
df3= pd.read_csv('/home/wojciech/Pulpit/1/drinksbycountry.csv')
df3.head()
Out[23]:
Unnamed: 0 country beer_servings spirit_servings wine_servings total_litres_of_pure_alcohol continent
0 0 Afghanistan 0 0 0 0.0 Asia
1 1 Albania 89 132 54 4.9 Europe
2 2 Algeria 25 0 14 0.7 Africa
3 3 Andorra 245 138 312 12.4 Europe
4 4 Angola 217 57 45 5.9 Africa
In [27]:
class N_plot3:

    def __init__(self, df, x1, x2, x3, by, title, x_title):
        self.df = df
        self.x1 = x1
        self.x2 = x2
        self.x3 = x3
        self.by = by
        self.title = title
        self.x_title = x_title

    def plot(self):
        plt.figure(dpi=380)
        # Use the values stored on the instance rather than notebook globals
        fig, axes = joypy.joyplot(self.df, column=[self.x1, self.x2, self.x3], by=self.by, ylim='own',
                                  figsize=(12,8), legend=True, color=['#b6d7a8','#1c4587', '#6aa84f'], alpha=0.4)
        plt.title(self.title, fontsize=32, color='#d0e0e3', alpha=0.9)
        plt.rc("font", size=20)
        plt.xlabel(self.x_title, fontsize=16, color='darkred', alpha=1)

plt.figure(dpi=380)
# Alternative palettes:
# color=['#76a5af', '#134f5c']
# color=['#f4cccc', '#0c343d']
# color=['#a4c2f4', '#1c4587']
# color=['#e06666', '#d9d9d9']
# color=['#e06666', '#434343']
# color=['#b6d7a8','#6aa84f']
Out[27]:
<Figure size 2280x1520 with 0 Axes>
<Figure size 2280x1520 with 0 Axes>
In [30]:
df=df3
x1='beer_servings'
x2='spirit_servings'
x3='wine_servings'
by='continent'
title = 'Alcohol consumption by continent'
x_title = 'Level of consumption'

PKO = N_plot3(df,x1,x2,x3,by,title, x_title)
PKO.plot()
<Figure size 2280x1520 with 0 Axes>
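
`N_plot` and `N_plot3` differ only in the number of columns they draw. One possible generalisation is sketched here with hypothetical helper names (`joyplot_args`, `n_plot`) that are not part of the notebook or of joypy. It takes a list of columns and passes the same keyword arguments as the calls above. `joypy` is imported inside `n_plot`, so the argument builder can be checked without it.

```python
def joyplot_args(columns, by, colors, alpha=0.4):
    """Build the joyplot keyword arguments used in this notebook for any number of columns."""
    columns = list(columns)
    if len(colors) < len(columns):
        raise ValueError("need at least one colour per column")
    return {
        "column": columns,
        "by": by,
        "ylim": "own",
        "figsize": (12, 8),
        "legend": True,
        "color": list(colors[:len(columns)]),  # one colour per plotted column
        "alpha": alpha,
    }


def n_plot(df, columns, by, title, x_title, colors):
    """Draw a joyplot of `columns` grouped by `by` (hypothetical helper, same styling as above)."""
    import joypy
    import matplotlib.pyplot as plt

    fig, axes = joypy.joyplot(df, **joyplot_args(columns, by, colors))
    plt.title(title, fontsize=32, color='#d0e0e3', alpha=0.9)
    plt.xlabel(x_title, fontsize=16, color='darkred', alpha=1)
    return fig, axes


args = joyplot_args(['beer_servings', 'spirit_servings', 'wine_servings'],
                    'continent', ['#b6d7a8', '#1c4587', '#6aa84f'])

# Usage with the drinks data loaded above:
# n_plot(df3, args["column"], 'continent', 'Alcohol consumption by continent',
#        'Level of consumption', ['#b6d7a8', '#1c4587', '#6aa84f'])
```

The same function would then cover the two-column plots as well, by passing a list of two columns and two colours.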

World Happiness Report

Source of data: https://data.world/promptcloud/world-happiness-report-2019

In [31]:
df4 = pd.read_csv('/home/wojciech/Pulpit/1/WorldHappinessReport.csv')
df4.head(3)
Out[31]:
Unnamed: 0 Country Region Happiness Rank Happiness Score Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual Year
0 0 Afghanistan Southern Asia 153.0 3.575 0.31982 0.30285 0.30335 0.23414 0.09719 0.36510 1.95210 2015.0
1 1 Albania Central and Eastern Europe 95.0 4.959 0.87867 0.80434 0.81325 0.35733 0.06413 0.14272 1.89894 2015.0
2 2 Algeria Middle East and Northern Africa 68.0 5.605 0.93929 1.07772 0.61766 0.28579 0.17383 0.07822 2.43209 2015.0
In [32]:
df4['Year'].value_counts()
Out[32]:
2017.0    164
2016.0    164
2015.0    164
Name: Year, dtype: int64
In [34]:
CC = df4[df4['Year']==2017]
CC.head(3)
Out[34]:
Unnamed: 0 Country Region Happiness Rank Happiness Score Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual Year
330 330 Afghanistan Southern Asia 141.0 3.794 0.401477 0.581543 0.180747 0.106180 0.061158 0.311871 2.150801 2017.0
331 331 Albania Central and Eastern Europe 109.0 4.644 0.996193 0.803685 0.731160 0.381499 0.039864 0.201313 1.490442 2017.0
332 332 Algeria Middle East and Northern Africa 53.0 5.872 1.091864 1.146217 0.617585 0.233336 0.146096 0.069437 2.567604 2017.0
In [36]:
df=CC
x1='Freedom'
x2='Trust (Government Corruption)'
by='Region'
title = 'World Happiness Report'
x_title = 'Indicator'

ZNP = N_plot(df,x1,x2,by,title, x_title)
ZNP.plot()
<Figure size 2280x1520 with 0 Axes>

Banking marketing

Analysis of the categorical results.
Source of data: https://archive.ics.uci.edu/ml/machine-learning-databases/00222/

In [37]:
df5 = pd.read_csv('/home/wojciech/Pulpit/1/bank.csv')
df5.head(3)
Out[37]:
Unnamed: 0 Unnamed: 0.1 age job marital education default housing loan contact campaign pdays previous poutcome emp_var_rate cons_price_idx cons_conf_idx euribor3m nr_employed y
0 0 0 44 blue-collar married basic.4y unknown yes no cellular 1 999 0 nonexistent 1.4 93.444 -36.1 4.963 5228.1 0
1 1 1 53 technician married unknown no no no cellular 1 999 0 nonexistent -0.1 93.200 -42.0 4.021 5195.8 0
2 2 2 28 management single university.degree no yes no cellular 3 6 2 success -1.7 94.055 -39.8 0.729 4991.6 1

3 rows × 23 columns

In [38]:
FF = df5.pivot_table(index=['Unnamed: 0','marital'], columns='y', values='age').reset_index()
FF.head()
Out[38]:
y Unnamed: 0 marital 0 1
0 0 married 44.0 NaN
1 1 married 53.0 NaN
2 2 single NaN 28.0
3 3 married 39.0 NaN
4 4 married NaN 55.0
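
Because `y` holds integers, the pivoted column labels are the integers `0` and `1`, not the strings `'0'` and `'1'`, so they have to be referenced without quotes in the next cell. A small sketch on made-up rows (the values are illustrative only):

```python
import pandas as pd

# Toy rows only: `y` is an integer outcome flag, as in the bank data
toy = pd.DataFrame({
    "id": [0, 1, 2],
    "marital": ["married", "married", "single"],
    "y": [0, 0, 1],
    "age": [44, 53, 28],
})

wide = toy.pivot_table(index=["id", "marital"], columns="y", values="age").reset_index()

has_int_label = 0 in wide.columns    # integer label exists
has_str_label = "0" in wide.columns  # string label does not
ages_y0 = wide[0].tolist()           # column selected by its integer label
print(list(wide.columns), has_int_label, has_str_label)
```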
In [40]:
df=FF
x1=0
x2=1
by='marital'
title = 'Customer age structure'
x_title = 'customer age'

KLD = N_plot(df,x1,x2,by,title, x_title)
KLD.plot()
<Figure size 2280x1520 with 0 Axes>