# Perfect Plots: Categorical Plot

This notebook draws categorical count plots with seaborn's `catplot` for three datasets.

In [1]:
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

## Titanic disaster

We want to find out how a passenger's chance of survival depends on the group they belong to, here the passenger class (`Pclass`).

Source of data: https://www.kaggle.com/shivamp629/traincsv

In [2]:
```python
df = pd.read_csv('c:/1/kaggletrain.csv')
df.head()  # preview the first five rows
```
Out[2]:

| | Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th… | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |

In [3]:
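```python
# Two shades of green for the two values of Survived (0 and 1)
Woj = ['#b6d7a8', '#6aa84f']

# Count plot of Survived, one panel per passenger class.
# Recent seaborn versions require x to be passed as a keyword argument.
g = sns.catplot(x="Survived", col="Pclass", col_wrap=4,
                data=df[df.Pclass.notnull()],
                kind="count", height=3.5, aspect=.8,
                palette=Woj)

plt.show()
```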
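
The count plot shows absolute numbers per class. As a rough numeric companion, the survival rate per class can be computed with a `groupby`. The sketch below is only an illustration: it uses the five preview rows shown above (not the full file), and the names `sample` and `survival_rate` are introduced here for the example.

```python
import pandas as pd

# The five preview rows from the table above (Survived and Pclass only).
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3],
})

# The mean of a 0/1 column is the share of ones,
# so this gives the survival rate within each passenger class.
survival_rate = sample.groupby("Pclass")["Survived"].mean()
print(survival_rate)
```

On the full dataset the same two lines would apply to `df` directly; with only five rows the rates are not meaningful, they just show the mechanics.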

## Banking marketing

Source of data: https://archive.ics.uci.edu/ml/machine-learning-databases/00222/

In [4]:
```python
df2 = pd.read_csv('c:/1/bank.csv')
df2.head(3)  # preview the first three rows
```
Out[4]:
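
| | Unnamed: 0 | Unnamed: 0.1 | age | job | marital | education | default | housing | loan | contact | campaign | pdays | previous | poutcome | emp_var_rate | cons_price_idx | cons_conf_idx | euribor3m | nr_employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 44 | blue-collar | married | basic.4y | unknown | yes | no | cellular | 1 | 999 | 0 | nonexistent | 1.4 | 93.444 | -36.1 | 4.963 | 5228.1 | 0 |
| 1 | 1 | 1 | 53 | technician | married | unknown | no | no | no | cellular | 1 | 999 | 0 | nonexistent | -0.1 | 93.200 | -42.0 | 4.021 | 5195.8 | 0 |
| 2 | 2 | 2 | 28 | management | single | university.degree | no | yes | no | cellular | 3 | 6 | 2 | success | -1.7 | 94.055 | -39.8 | 0.729 | 4991.6 | 1 |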

3 rows × 23 columns

In [5]:
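```python
Kot = ['grey', 'red']

# Set the font size before plotting so that it applies to this figure
plt.rc("font", size=15)

# catplot is a figure-level function and creates its own figure,
# so a separate plt.figure() call would only leave an empty figure behind.
g = sns.catplot(x="y", col="marital", col_wrap=4,
                data=df2[df2.marital.notnull()],
                kind="count", height=3.5, aspect=.8,
                palette=Kot, alpha=0.5, legend=True)

plt.show()
```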
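
To read the panels as proportions rather than counts, a normalized cross-tabulation is one option. This is a minimal sketch on the three preview rows shown above, not on the full file; `sample2` and `shares` are names made up for the example.

```python
import pandas as pd

# marital and y taken from the three preview rows above.
sample2 = pd.DataFrame({
    "marital": ["married", "married", "single"],
    "y":       [0, 0, 1],
})

# normalize="index" divides each row by its total,
# giving the share of y = 0 and y = 1 within each marital status.
shares = pd.crosstab(sample2["marital"], sample2["y"], normalize="index")
print(shares)
```

Applied to `df2["marital"]` and `df2["y"]`, the same call would give the shares behind each panel of the plot.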

## Clinical tests

Source of data: https://www.kaggle.com/saurabh00007/diabetescsv

In [6]:
```python
df3 = pd.read_csv('c:/1/diabetes.csv')
df3.head(3)  # preview the first three rows
```
Out[6]:
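
| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
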
In [10]:
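```python
kot = ['young patient', 'medium patient', 'senior patient']
# Split the patients into three equally sized age groups (terciles).
# The ages must come from df3 (diabetes data), not from df (Titanic data).
df3['Age group'] = pd.qcut(df3['Age'], 3, labels=kot)
```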
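
To make the binning step concrete, here is a small sketch of how `pd.qcut` assigns labels. The ages below are hypothetical values chosen for the illustration, not rows from the diabetes file, and `ages` and `age_group` are example names.

```python
import pandas as pd

# Hypothetical ages, used only to illustrate the binning.
ages = pd.Series([21, 22, 25, 29, 33, 41, 50, 57, 66])
labels = ['young patient', 'medium patient', 'senior patient']

# qcut places the bin edges at the 1/3 and 2/3 quantiles,
# so each label receives roughly the same number of rows.
age_group = pd.qcut(ages, 3, labels=labels)

print(age_group.value_counts())
```

Unlike `pd.cut`, which uses equally wide intervals, `qcut` uses quantiles, so the group boundaries depend on the age distribution in the data.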
In [11]:
```python
Kot = ['#ff9900', '#783f04']

# Set the font size before plotting so that it applies to this figure
plt.rc("font", size=14)

# Filter df3 with a mask built from df3 itself; a mask taken from df2
# has a different index and triggers a reindexing warning.
# catplot creates its own figure, so no plt.figure() call is needed.
g = sns.catplot(x="Outcome", col='Age group', col_wrap=4,
                data=df3[df3['Age group'].notnull()],
                kind="count", height=5.5, aspect=.7,
                palette=Kot, alpha=0.4)

plt.show()
```