
EN201220191421
Source of data: poliaxid
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('c:/2/poliaxid.csv', index_col=0)
del df['nr.']
df.head(5)
1. We analyze the completeness of the data
df.isnull().sum()
The data is complete
2. Result variable analysis – Creating result classes
df['quality class'].value_counts().plot(kind='bar')
We interviewed the process owner, who wants to separate substances in quality classes 0 and 1 from those in the other quality classes. We therefore create two quality classes.
df['quality class'].dtypes
We map the five quality classes onto two: classes 0 and 1 become 1, and all other classes become 0.
df['quality_class2'] = df['quality class'].apply(lambda x: 1 if x <= 1 else 0)
df['quality_class2'].value_counts()
The two classes are balanced, so there is no need for class equalization (e.g. oversampling).
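The mapping and the balance check can be tried out on a small made-up series. This is only a sketch: the values below are invented for illustration and do not come from poliaxid.csv, and the names `quality`, `quality2` and `balance` are hypothetical.

```python
import pandas as pd

# Hypothetical quality classes 0..4 (invented data, not from poliaxid.csv)
quality = pd.Series([0, 1, 1, 2, 3, 4, 0, 2, 1, 3], name='quality class')

# Classes 0 and 1 become the positive class (1); everything else becomes 0
quality2 = (quality <= 1).astype(int)

# Relative class frequencies: shares close to 0.5 indicate a balanced binary target
balance = quality2.value_counts(normalize=True)
print(balance)
```

Looking at relative frequencies rather than raw counts makes the balance easier to judge when the data set grows.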
Correction of variable names.
df.columns = ['factorA', 'factorB', 'citric_catoda', 'residual_butanol', 'caroton',
              'stable_nodinol', 'sulfur_in_nodinol', 'density', 'pH', 'noracid',
              'lacapon', 'quality_class', 'quality_class2']
3. We divide variables into independent variables and a dependent variable
X = df.drop('quality_class2', axis=1)
y = df['quality_class2']
4. We divide the data into a test and training set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=123,stratify=y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
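The effect of `stratify=y` in the split can be seen on a toy example. This is a sketch on invented data (100 rows, 30% positive class); `X_demo`, `y_demo` and the other names are hypothetical and only the `train_test_split` arguments mirror the call above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 30% positive class (invented for illustration)
X_demo = pd.DataFrame({'feature': np.arange(100)})
y_demo = pd.Series([1] * 30 + [0] * 70)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=123, stratify=y_demo)

# With stratify, the training and test parts keep the same share of the positive class
print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

Without `stratify`, the class shares in a small test set can drift away from those of the full data set purely by chance.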
5. Initial data analysis
df.dtypes
df.head()
df.columns
CORREL = df.corr().sort_values('quality_class2')
CORREL['quality_class2'].to_frame()
plt.figure(figsize=(10,3))
CORREL['quality_class2'].plot(kind='barh', color='red')
plt.title('Correlation with the outcome variable', fontsize=20)
plt.xlabel('Correlation level')
plt.ylabel('Continuous independent variables')
Correlation of the variables with the output variable
import seaborn as sns
plt.figure(figsize=(8,8))
sns.heatmap(CORREL, cmap="YlGnBu", annot=True, cbar=True)
Correlation of variables
CORREL = df.corr()
plt.figure(figsize=(8,8))
sns.heatmap(CORREL, cmap="YlGnBu", annot=True, cbar=True)
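How the sorted correlation column is read can be shown on a tiny frame. This is a sketch with invented values; the column names `density` and `noracid` are borrowed from the data set, but the numbers and the `target` column are hypothetical.

```python
import pandas as pd

# Hypothetical frame (invented values): two features and a binary target
demo = pd.DataFrame({
    'density': [0.99, 1.01, 0.98, 1.02, 1.00, 0.97],
    'noracid': [5, 2, 6, 1, 3, 4],
    'target':  [1, 0, 1, 0, 0, 1],
})

# Pearson correlation of every feature with the target, sorted ascending
corr_with_target = demo.corr()['target'].drop('target').sort_values()
print(corr_with_target)
```

Features at the two ends of the sorted series are the ones most strongly related (negatively or positively) to the target, which is what the bar chart makes visible.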
import tensorflow as tf
feat_column = tf.contrib.layers.real_valued_column('features', dimension=13)
estimator = tf.estimator.LinearClassifier(feature_columns=[feat_column],
                                          n_classes=2,
                                          model_dir="kernel_e")
Step 3: Convert continuous variables to TensorFlow feature columns with the function:
tf.feature_column.numeric_column
A single variable converted to a TensorFlow feature column
caroton_column = tf.feature_column.numeric_column('caroton')
caroton_column
Converting all numeric variables to TensorFlow feature columns
COLUMNS = ['factorA', 'factorB', 'citric_catoda', 'residual_butanol', 'caroton',
           'stable_nodinol', 'sulfur_in_nodinol', 'density', 'pH', 'noracid',
           'lacapon', 'quality_class', 'quality_class2']
features = [tf.feature_column.numeric_column(k) for k in COLUMNS]
features
Step 5. Creating a linear classification model
5.1 Define the classifier
model = tf.estimator.LinearClassifier(
    n_classes=2,
    model_dir="ongoing/train5",
    feature_columns=features)
5.2 Create the input function
COLUMNS = ['factorA', 'factorB', 'citric_catoda', 'residual_butanol', 'caroton',
           'stable_nodinol', 'sulfur_in_nodinol', 'density', 'pH', 'noracid',
           'lacapon', 'quality_class', 'quality_class2']
LABEL = 'quality_class2'
def get_input_fn(data_set, num_epochs=None, n_batch=128, shuffle=True):
    # Wrap a pandas DataFrame in an input function for the estimator
    return tf.estimator.inputs.pandas_input_fn(
        x=pd.DataFrame({k: data_set[k].values for k in COLUMNS}),
        y=pd.Series(data_set[LABEL].values),
        batch_size=n_batch,
        num_epochs=num_epochs,
        shuffle=shuffle)
5.3 Train the model
Preparing the training and test data frames
df_train = pd.concat([X_train, y_train], axis=1, sort=False)
df_test = pd.concat([X_test, y_test], axis=1, sort=False)
Training the model on the training set
model.train(input_fn=get_input_fn(df_train,
                                  num_epochs=None,
                                  n_batch=128,
                                  shuffle=False),
            steps=1000)
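With `num_epochs=None` the input function keeps repeating the data, so the length of training is set by `steps` alone. The arithmetic below sketches how many passes over the data that amounts to. The batch size and step count come from the call above; the row count `n_rows` is a made-up number, since the size of the training set is not stated here.

```python
import math

# Only batch_size and steps come from the text; n_rows is a hypothetical value
n_rows = 1280          # assumed size of the training set
batch_size = 128       # n_batch passed to the input function
steps = 1000           # steps passed to model.train

steps_per_epoch = math.ceil(n_rows / batch_size)   # batches needed for one full pass
epochs_covered = steps * batch_size / n_rows       # full passes made in `steps` steps
print(steps_per_epoch, epochs_covered)
```

Replacing `n_rows` with `len(df_train)` gives the corresponding figure for the real training set.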
5.4 Evaluate the performance of the model
model.evaluate(input_fn=get_input_fn(df_test,
                                     num_epochs=1,
                                     n_batch=128,
                                     shuffle=False),
               steps=1000)
The model is perfect, and that worries me…
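One possible explanation worth checking: the feature list COLUMNS contains both quality_class, from which the label was derived, and quality_class2, the label itself, so the model may simply be reading the answer from its inputs. The sketch below illustrates this kind of leakage on invented data; the frame `demo` and the names `leaky`, `rule_accuracy` and `safe_columns` are hypothetical, and this is a diagnostic idea rather than a confirmed cause.

```python
import pandas as pd

# Hypothetical data (invented): quality_class2 is derived from quality_class
demo = pd.DataFrame({
    'density': [0.99, 1.01, 0.98, 1.02, 1.00, 0.97],
    'quality_class': [0, 3, 1, 4, 2, 0],
})
demo['quality_class2'] = (demo['quality_class'] <= 1).astype(int)

LABEL = 'quality_class2'
# Columns that encode the label: the label itself and the column it was built from
leaky = ['quality_class', LABEL]

# A threshold on quality_class alone reproduces the label for every row
rule_accuracy = ((demo['quality_class'] <= 1).astype(int) == demo[LABEL]).mean()
print(rule_accuracy)

# Candidate feature list with the leaking columns removed
safe_columns = [c for c in demo.columns if c not in leaky]
print(safe_columns)
```

If retraining on a feature list built this way gives a clearly lower but still reasonable score, the perfect result was most likely caused by the leaked columns.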