Interpretation of SHAP charts for the Titanic case (Feature Selection Techniques)

https://github.com/slundberg/shap
https://slundberg.github.io/shap/notebooks/NHANES%20I%20Survival%20Model.html

In [1]:
import pandas as pd

df = pd.read_csv('/home/wojciech/Pulpit/1/tit_train.csv', na_values="-1")
df.head(2)
Out[1]:
  Unnamed: 0 PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th… female 38.0 1 0 PC 17599 71.2833 C85 C
 

Using a loop, I count the missing records against the full records in each column

In [2]:
## how many variables are there
a,b = df.shape     #<- number of columns
b
Out[2]:
13
In [3]:
print('NUMBER OF EMPTY RECORDS vs. FULL RECORDS')
print('----------------------------------------')
for i in range(1,b):
    i = df.columns[i]
    r = df[i].isnull().sum()
    h = df[i].count()
   
    if r > 0:
        print(i,"--------",r,"--------",h) 
NUMBER OF EMPTY RECORDS vs. FULL RECORDS
----------------------------------------
Age -------- 177 -------- 714
Cabin -------- 687 -------- 204
Embarked -------- 2 -------- 889
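For reference, the same summary can be produced with a vectorized pandas expression instead of the loop (a minimal sketch, equivalent in effect):

nulls = df.isnull().sum()        # missing count per column
print(nulls[nulls > 0])          # keep only columns that have gaps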
In [4]:
del df['Cabin']
df = df.dropna(how='any')
In [5]:
df.isnull().sum()
Out[5]:
Unnamed: 0     0
PassengerId    0
Survived       0
Pclass         0
Name           0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Embarked       0
dtype: int64
In [6]:
df.shape
Out[6]:
(712, 12)
 

Encoding discrete (categorical) variables

In [7]:
import numpy as np

a,b = df.shape     #<- number of columns
b


print('DISCRETE FUNCTIONS CODED')
print('------------------------')
for i in range(1,b):
    i = df.columns[i]
    f = df[i].dtypes
    if f == object:          # encode every object-dtype column with integer codes
        print(i,"---",f)
        df[i] = pd.Categorical(df[i]).codes
DISCRETE FUNCTIONS CODED
------------------------
Name --- object
Sex --- object
Ticket --- object
Embarked --- object
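The same encoding could also be written in a vectorized style (a sketch equivalent to the loop above; run one or the other, not both):

obj_cols = df.select_dtypes(include='object').columns             # Name, Sex, Ticket, Embarked
df[obj_cols] = df[obj_cols].apply(lambda s: pd.Categorical(s).codes)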
 

I run the RandomForestRegressor() model

In [8]:
y = df['Survived']
X = df.drop('Survived', axis=1)
In [9]:
from sklearn.ensemble import RandomForestRegressor


model = RandomForestRegressor()
model.fit(X, y) 
/home/wojciech/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/forest.py:245: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
  "10 in version 0.20 to 100 in 0.22.", FutureWarning)
Out[9]:
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                      max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=10,
                      n_jobs=None, oob_score=False, random_state=None,
                      verbose=0, warm_start=False)
In [10]:
#import xgboost
import shap
shap.initjs()

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
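A quick sanity check on what TreeExplainer returned (a sketch, assuming the objects above): for a regressor, shap_values is a single array of shape (n_samples, n_features), and the SHAP values plus the base value should reconstruct the model's predictions.

import numpy as np

print(shap_values.shape)                      # (712, 11) for this data
recon = explainer.expected_value + shap_values.sum(axis=1)
print(np.allclose(recon, model.predict(X)))   # expected: True, up to float tolerance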
 

Interpretation of SHAP diagnostic charts

Chart (1): overall assessment of feature usefulness

shap.summary_plot(shap_values, X)

Chart interpretation:
    • The x axis shows the SHAP value (for this random-forest model, in the units of the model's output). Analyzing the importance across all features, we can see which features strongly affect the model's predictions (e.g. 'Sex' and 'Pclass') and which influence them only slightly (e.g. 'Parch', 'Embarked'). Note that where the points do not fit on a horizontal line, they stack up vertically to show density.
    • Each dot is colored by the feature's value, from high (red) to low (blue). Each point is one person from the Titanic. One sex pushed the model's prediction firmly in one direction, while the other pulled it the opposite way, weakening the model's certainty.
    • A feature carries information where the blue and red enclaves are clean ('Sex' and 'Pclass'); the remaining features only mix together around zero. So it would be enough to somehow separate out the women and class 1 from the rest of the passengers.

  1. The most important feature in the model is 'Sex'; slightly less important is 'Pclass', i.e. the class in which the passenger traveled. (A numeric version of this ranking is sketched below the list.)
  2. Color encodes the feature's value (blue low, red high), and the position on the x axis shows how strongly that value pushed the model's prediction up or down.
  3. For 'Sex' and 'Pclass' there are large differences between the sexes and between the classes in which the travelers journeyed.
  4. For 'Name', 'PassengerId' and 'Ticket' there is no separation, just full randomness: the data is centered around 0 on the SHAP value axis.
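The ranking read off the plot can also be computed directly as the mean absolute SHAP value per feature, which is the ordering the summary plot is sorted by (a minimal sketch, assuming shap_values and X from the cells above):

import numpy as np

importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
print(importance.sort_values(ascending=False))   # 'Sex' and 'Pclass' should come out on top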


shap.dependence_plot("Sex", shap_values, X)

Chart interpretation:

  • This chart shows that women (coded as 0) differ from men in their effect on the model's prediction accuracy.
       Women are predicted accurately by the model; men are predicted inaccurately.
       SHAP has automatically added the nearest relevant feature, 'Pclass', as the color scale (this choice can also be pinned explicitly; see the sketch below).
  • The model estimates women from class 2 most accurately, and among women, least accurately those who traveled in third class. In general, values above zero on the y axis for women mean that the model predicted the fate of women well and the fate of men more weakly.

  • Men from class 3 (red) were predicted quite accurately (the model was confident that they would die). The model had a problem with men from classes 1 and 2 (it had difficulty predicting whether they would die).
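The coloring feature is chosen automatically, but it can be pinned explicitly with the interaction_index parameter of shap.dependence_plot (a sketch reproducing the same plot with 'Pclass' forced as the color scale):

shap.dependence_plot("Sex", shap_values, X, interaction_index="Pclass")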


shap.dependence_plot("Pclass", shap_values, X)

The figure above shows:

  • Class 1: half and half women = 0 (blue) and men = 1 (red). In the first class the model coped well with its forecasts, predicting what would happen to women much better than what would happen to men.
  • In class 3 the model had a very big problem predicting what would happen to women; each blue point is one woman. The fate of men in class 3 it predicted quite well – close to the baseline.
  • The conclusion suggests itself: if the class-3 women were removed from the data, the model's scores would improve (a rough empirical check is sketched below).
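That conclusion can be checked roughly by refitting on filtered data (a sketch, not part of the original analysis; it assumes Sex == 0 means female under the categorical coding used above, and an in-sample score is only a crude indicator):

mask = ~((X['Pclass'] == 3) & (X['Sex'] == 0))   # drop third-class women
model2 = RandomForestRegressor(n_estimators=100)
model2.fit(X[mask], y[mask])
print(model2.score(X[mask], y[mask]))            # in-sample R^2 on the filtered data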


shap.dependence_plot("Age", shap_values, X)

Interpretation:

  • The model coped well with predicting the fate of children, from zero to about 10 years old: their values on the y axis lie above zero.

  • Above that age the model could not indicate people's fate very well, and the forecasts get worse the older the people are.

  • You can see in the point cloud that the blue dots lie above the red ones, which means the model predicted the fate of class-1 passengers (blue dots) better than class-3 passengers (red dots). (A quick numeric check follows below.)
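A quick numeric check of the claim about children (a sketch, assuming the objects defined above): compare the average SHAP contribution of Age for passengers aged 10 or under against the rest.

age_idx = X.columns.get_loc('Age')
children = (X['Age'] <= 10).values
print(shap_values[children, age_idx].mean())    # expected above zero, per the plot
print(shap_values[~children, age_idx].mean())   # expected near or below zero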


shap.summary_plot(shap_values, X, plot_type="bar")

shap_interaction_values = shap.TreeExplainer(model).shap_interaction_values(X.iloc[:500,:])
shap.summary_plot(shap_interaction_values, X.iloc[:500,:])

Interpretation:

  • The 'Sex' vs 'Pclass' pairing shows that there are strong interaction differences affecting forecast quality.

  • 'Age' vs 'Sex' gives an unreadable result.

  • The rest of the features are piled up around zero and therefore have little effect, for better or worse, on forecast quality. (The 'Sex' x 'Pclass' interaction can also be plotted on its own; see the sketch below.)
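A single interaction can also be drawn on its own by passing a feature pair to shap.dependence_plot (a sketch using the objects already computed):

shap.dependence_plot(("Sex", "Pclass"), shap_interaction_values, X.iloc[:500, :])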


How to calculate the probability of survival of the Titanic catastrophe

practical use: predict_proba

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
In [2]:
from catboost.datasets import titanic

train_df, test_df = titanic()

#train_df.head()

df = train_df 
print(df.shape)
df.head(3)
(891, 12)
Out[2]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th… female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S

Using the classification model, let's check what happened to three randomly chosen passengers of the ill-fated voyage.

We choose passengers:

In [3]:
df.loc[df['PassengerId']==422]
Out[3]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
421 422 0 3 Charters, Mr. David male 21.0 0 0 A/5. 13032 7.7333 NaN Q
In [4]:
df.loc[df['PassengerId']==20]
Out[4]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
19 20 1 3 Masselmani, Mrs. Fatima female NaN 0 0 2649 7.225 NaN C
In [5]:
df.loc[df['PassengerId']==42]
Out[5]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
41 42 0 2 Turpin, Mrs. William John Robert (Dorothy Ann … female 27.0 1 0 11668 21.0 NaN S

Display variable types and unique values

In [6]:
import numpy as np
a,b = df.shape     #<- number of columns
b 

for i in range(1,b):
    i = df.columns[i]
    h = df[i].nunique()
    f = df[i].dtypes
          
    print(f,"---",h,"---", i)
int64 --- 2 --- Survived
int64 --- 3 --- Pclass
object --- 891 --- Name
object --- 2 --- Sex
float64 --- 88 --- Age
int64 --- 7 --- SibSp
int64 --- 7 --- Parch
object --- 681 --- Ticket
float64 --- 248 --- Fare
object --- 147 --- Cabin
object --- 3 --- Embarked

Showing missing values

In [7]:
print('NUMBER OF EMPTY RECORDS vs. FULL RECORDS')
print('----------------------------------------')
for i in range(1,b):
    i = df.columns[i]
    r = df[i].isnull().sum()
    h = df[i].count()
   
    if r > 0:
        print(i,"--------",r,"--------",h) 
NUMBER OF EMPTY RECORDS vs. FULL RECORDS
----------------------------------------
Age -------- 177 -------- 714
Cabin -------- 687 -------- 204
Embarked -------- 2 -------- 889

We put values out of range in place of missing values

In [8]:
df.fillna(-777, inplace=True)

Model results

H0: the passenger will be saved (marked as: 1)
H1: The passenger will not be saved (marked as: 0)

0 means the passenger drowned and died in the catastrophe.

We recode the Sex column numerically (female: 0, male: 1) so we know what sex the passenger was.

In [9]:
df['Sex'] = df.Sex.map({'female':0, 'male':1})

Display DISCRETE variables

In [10]:
a,b = df.shape     #<- number of columns
b 

print('ONLY DISCRETE FUNCTION')
print('----------------------')
for i in range(1,b):
    i = df.columns[i]
    f = df[i].dtypes
    print(i,f)
    
    if f == object:
        df[i] = pd.Categorical(df[i]).codes
        break    # stops at the first object column ('Name'); the rest are encoded below
ONLY DISCRETE FUNCTION
----------------------
Survived int64
Pclass int64
Name object

Digitizing and encoding the discrete, categorical variables

In [11]:
a,b = df.shape     #<- number of columns
b 

print('DISCRETE FUNCTIONS CODED')
print('------------------------')
for i in range(1,b):
    i = df.columns[i]
    f = df[i].dtypes
    if f == object:          # encode the remaining object-dtype columns
        print(i,"---",f)
        df[i] = pd.Categorical(df[i]).codes

df.head()
DISCRETE FUNCTIONS CODED
------------------------
Ticket --- object
Cabin --- object
Embarked --- object
Out[11]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 108 1 22.0 1 0 523 7.2500 0 3
1 2 1 1 190 0 38.0 1 0 596 71.2833 82 1
2 3 1 3 353 0 26.0 0 0 669 7.9250 0 3
3 4 1 1 272 0 35.0 1 0 49 53.1000 56 3
4 5 0 3 15 1 35.0 0 0 472 8.0500 0 3

Division into test and training variables

I have a DataFrame X with variables in numeric formats (e.g. int64) and the target y kept separately, 'in the air'; we split them into training and test sets. I could just as well have kept this target variable inside the DataFrame as an additional column.

In [12]:
X = df.drop('Survived', axis=1) 
y = df['Survived']  


train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=1)

Forecast of probability of survival on Titanic

We know that surviving the Titanic catastrophe was determined by several features, such as gender and age.
We select a random passenger, record no. 422.

In [13]:
pasażer_422 = test_X.loc[test_X['PassengerId']==422] 
pasażer_422
Out[13]:
PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
421 422 3 155 1 21.0 0 0 534 7.7333 0 2

What is the probability that the passenger will survive?

we will use: predict_proba

Passenger no. 422 Charters, Mr. David

A young man of 21, traveling in third class. The model gave him a 30% chance of survival.

In [14]:
model_RFC1 = RandomForestClassifier(random_state=0).fit(train_X, train_y)
model_RFC1.predict_proba(pasażer_422)
/home/wojciech/anaconda3/lib/python3.7/site-packages/sklearn/ensemble/forest.py:245: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
  "10 in version 0.20 to 100 in 0.22.", FutureWarning)
Out[14]:
array([[0.7, 0.3]])
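The column order of predict_proba follows model_RFC1.classes_, so here the first number is the probability of class 0 (died) and the second of class 1 (survived). A one-line check (a sketch):

print(model_RFC1.classes_)   # expected: array([0, 1])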

Passenger no. 20 Masselmani, Mrs. Fatima

A woman of unknown age, traveling in third class. The algorithm gave her an 80% chance of survival.

In [15]:

pasażer_20 = test_X.loc[test_X['PassengerId']==20]  
pasażer_20
Out[15]:
PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
19 20 3 512 0 -777.0 0 0 184 7.225 0 1
In [16]:
model_RFC1.predict_proba(pasażer_20)
Out[16]:
array([[0.2, 0.8]])

We draw another passenger

Our randomly drawn passenger, Mrs. Turpin (Mrs. William John Robert),

traveling in second class at the age of 27, also did not survive the disaster. The model gave her a 10% chance of survival.

In [17]:
passenger = 5
random_passenger = train_X.iloc[passenger] 
random_passenger
Out[17]:
PassengerId     42.0
Pclass           2.0
Name           827.0
Sex              0.0
Age             27.0
SibSp            1.0
Parch            0.0
Ticket          53.0
Fare            21.0
Cabin            0.0
Embarked         3.0
Name: 41, dtype: float64
In [18]:
data_array = random_passenger.values.reshape(1, -1)
data_array
Out[18]:
array([[ 42.,   2., 827.,   0.,  27.,   1.,   0.,  53.,  21.,   0.,   3.]])
In [19]:
model_RFC1.predict_proba(data_array)
Out[19]:
array([[0.9, 0.1]])

What characteristics determined the model’s classification?

Passenger no. 422 Charters, Mr. David

We will use the SHAP tool. This is an abbreviation of SHapley Additive exPlanations.

In [22]:
import shap
expl = shap.TreeExplainer(model_RFC1)
In [27]:
shap_values = expl.shap_values(pasażer_422)
shap.initjs()
shap.force_plot(expl.expected_value[1], shap_values[1],pasażer_422)

Interpretation

base value – the average output of the model over the training data; output value – the model's output for this particular passenger. Features pushing the prediction higher are shown in red, those pushing it lower in blue. Each feature is shown separately, and its width on the axis indicates how much it shifted the prediction up (red) or down (blue) relative to the base value. For passenger no. 422 the model's output fell below the base value mainly because of gender. Unfortunately, the model correctly predicted the passenger's death.
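The base value can be verified against the model itself (a sketch, assuming the objects above): the explainer's expected value for class 1 should be approximately the mean predicted survival probability over the training data.

print(expl.expected_value[1])
print(model_RFC1.predict_proba(train_X)[:, 1].mean())   # expected: roughly equal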

Passenger no. 20 Masselmani, Mrs. Fatima

In [29]:
shap_values = expl.shap_values(pasażer_20)
shap.initjs()
shap.force_plot(expl.expected_value[1], shap_values[1],pasażer_20)
