EN201220191421

Source of data: poliaxid

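import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the data, using the first CSV column as the index
df = pd.read_csv('c:/2/poliaxid.csv', index_col=0)
# Drop the 'nr.' column, which is not used in the analysis
del df['nr.']
df.head(5)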

1. We analyze the completeness of the data

df.isnull().sum()

factorA              0
factorB              0
citric catoda        0
residual butanol     0
caroton              0
stable nodinol       0
sulfur in nodinol    0
density              0
pH                   0
noracid              0
lacapon              0
quality class        0
dtype: int64

The data is complete: no column has missing values.
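
The completeness check above can also be expressed as a single yes/no answer. The following is a minimal sketch on a hypothetical toy frame (`toy` and its values are invented for illustration and are not taken from poliaxid.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
toy = pd.DataFrame({"density": [0.99, 1.01, np.nan], "pH": [3.2, 3.4, 3.1]})

missing_per_column = toy.isnull().sum()      # number of NaN values in each column
is_complete = not toy.isnull().values.any()  # True only if no value is missing

print(missing_per_column)
print("complete:", is_complete)
```

On the real data frame the same two lines would be applied to `df` instead of `toy`.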

2. Result variable analysis – Creating result classes

df['quality class'].value_counts().plot(kind='bar')

<matplotlib.axes._subplots.AxesSubplot at 0x219bca29cf8>

We interviewed the process owner, who wants to separate substances in quality classes 0 and 1 from those in the other classes. We therefore reduce the result variable to two classes.

df['quality class'].dtypes

dtype('int64')

We map the 5 classes into two classes: 1 for original classes 0 and 1, and 0 for all other classes.

df['quality_class2'] = df['quality class'].apply(lambda x: 1 if x <= 1 else 0)

df['quality_class2'].value_counts()

0    852
1    742
Name: quality_class2, dtype: int64
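
The mapping can also be written as a vectorized comparison instead of `apply` with a lambda. This is only a sketch; the class labels 0 to 4 used below are an assumption made for illustration, since the actual label values are not listed here.

```python
import pandas as pd

# Hypothetical labels for five quality classes (assumed to be 0..4).
quality = pd.Series([0, 1, 2, 3, 4], name="quality class")

# Row-by-row mapping, as in the notebook.
via_apply = quality.apply(lambda x: 1 if x <= 1 else 0)

# Vectorized alternative: compare the whole column, then cast booleans to integers.
via_compare = (quality <= 1).astype(int)

print(via_apply.tolist())    # → [1, 1, 0, 0, 0]
print(via_compare.tolist())  # → [1, 1, 0, 0, 0]
```

Both forms give the same labels; the vectorized one avoids a Python-level call per row.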

The two classes are roughly balanced (852 vs. 742 observations), so there is no need for class balancing (e.g. oversampling).
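
To put a number on the balance, the class shares can be computed directly. The sketch below rebuilds a label series from the two counts reported above rather than reading the original file; the variable names are illustrative.

```python
import pandas as pd

# Label series rebuilt from the reported counts: 852 zeros and 742 ones.
labels = pd.Series([0] * 852 + [1] * 742, name="quality_class2")

counts = labels.value_counts()
shares = labels.value_counts(normalize=True)    # relative frequency of each class
imbalance_ratio = counts.max() / counts.min()   # majority count divided by minority count

print(shares.round(3))
print(round(imbalance_ratio, 2))
```

The majority class makes up a little over half of the observations, which is far from the strong imbalance that would normally call for resampling.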

We correct the variable names, replacing spaces with underscores.

df.columns = ['factorA', 'factorB', 'citric_catoda', 'residual_butanol', 'caroton',
       'stable_nodinol', 'sulfur_in_nodinol', 'density', 'pH', 'noracid',
       'lacapon', 'quality_class','quality_class2']

3. We divide variables into independent variables and a dependent variable

X = df.drop('quality_class2', axis=1) 
y = df['quality_class2']
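
Note that after this step X still contains quality_class, the column from which quality_class2 was derived, so a model trained on X can read the target almost directly from its inputs. The sketch below shows, on a hypothetical toy frame, how both columns could be dropped. It is an alternative to consider and is not what the notebook does.

```python
import pandas as pd

# Hypothetical toy frame; values are invented for illustration.
toy = pd.DataFrame({
    "density": [0.99, 1.01, 0.98, 1.00],
    "pH": [3.2, 3.4, 3.1, 3.3],
    "quality_class": [0, 1, 3, 4],
})
toy["quality_class2"] = toy["quality_class"].apply(lambda x: 1 if x <= 1 else 0)

target = "quality_class2"
leaky_columns = ["quality_class"]  # columns that encode the target

# Drop the target and every column derived from (or deriving) it.
X_toy = toy.drop(columns=[target] + leaky_columns)
y_toy = toy[target]

print(list(X_toy.columns))
```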

4. We divide the data into a test and training set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=123, stratify=y)

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(1275, 12) (1275,) (319, 12) (319,)
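
The `stratify=y` argument keeps the class proportions the same in the training and test sets. A minimal sketch on synthetic data (the frame, the 60/40 class split and the names `X_demo`, `y_demo` are assumptions made for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 60 of class 0 and 40 of class 1.
rng = np.random.RandomState(0)
X_demo = pd.DataFrame({"feature": rng.normal(size=100)})
y_demo = pd.Series([0] * 60 + [1] * 40)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=123, stratify=y_demo
)

# With stratification the share of class 1 is the same in both parts.
print(len(X_tr), len(X_te))
print(y_tr.mean(), y_te.mean())
```

Without `stratify`, the class shares in a small test set can drift away from those in the full data purely by chance.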

5. Initial data analysis

df.dtypes

factorA              float64
factorB              float64
citric_catoda        float64
residual_butanol     float64
caroton              float64
stable_nodinol       float64
sulfur_in_nodinol    float64
density              float64
pH                   float64
noracid              float64
lacapon              float64
quality_class          int64
quality_class2         int64
dtype: object

df.head()

df.columns

Index(['factorA', 'factorB', 'citric_catoda', 'residual_butanol', 'caroton',
       'stable_nodinol', 'sulfur_in_nodinol', 'density', 'pH', 'noracid',
       'lacapon', 'quality_class', 'quality_class2'],
      dtype='object')

CORREL = df.corr().sort_values('quality_class2')
CORREL['quality_class2'].to_frame().sort_values('quality_class2')
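
The target's correlation with itself is always 1, so it can be dropped before ranking the remaining variables. A minimal sketch on a hypothetical toy frame (column names `a`, `b` and `target` are invented for illustration):

```python
import pandas as pd

# Hypothetical toy frame: "a" rises with the target, "b" falls with it.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [5.0, 4.0, 3.0, 2.0, 1.0],
    "target": [0, 0, 0, 1, 1],
})

# Pearson correlation of every column with the target,
# without the trivial self-correlation, sorted ascending.
corr_with_target = toy.corr()["target"].drop("target").sort_values()

print(corr_with_target)
```

The most negatively correlated variable comes first and the most positively correlated one last, which is the order a horizontal bar chart then displays.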

plt.figure(figsize=(10,3))
CORREL['quality_class2'].plot(kind='barh', color='red')
plt.title('Correlation with the result variable', fontsize=20)
plt.xlabel('Correlation level')
plt.ylabel('Continuous independent variables')

Text(0, 0.5, 'Continuous independent variables')

Correlation of variables, sorted by correlation with the result variable

import seaborn as sns

plt.figure(figsize=(8,8))
sns.heatmap(CORREL, cmap="YlGnBu", annot=True, cbar=True)

<matplotlib.axes._subplots.AxesSubplot at 0x219c4fedac8>

Correlation of variables

CORREL = df.corr()
plt.figure(figsize=(8,8))
sns.heatmap(CORREL, cmap="YlGnBu", annot=True, cbar=True)

<matplotlib.axes._subplots.AxesSubplot at 0x219c54034a8>

INFO:tensorflow:Starting evaluation at 2019-12-20-12:57:55
INFO:tensorflow:Restoring parameters from ongoing/train5model.ckpt-1000
INFO:tensorflow:Finished evaluation at 2019-12-20-12:57:56
INFO:tensorflow:Saving dict for global step 1000: accuracy = 1.0, accuracy_baseline = 0.5360502, auc = 1.0, auc_precision_recall = 1.0, average_loss = 0.01771097, global_step = 1000, label/mean = 0.46394983, loss = 1.8832666, prediction/mean = 0.46446657

{'accuracy': 1.0,
 'accuracy_baseline': 0.5360502,
 'auc': 1.0,
 'auc_precision_recall': 1.0,
 'average_loss': 0.01771097,
 'label/mean': 0.46394983,
 'loss': 1.8832666,
 'prediction/mean': 0.46446657,
 'global_step': 1000}

import tensorflow as tf
feat_column = tf.contrib.layers.real_valued_column('features', dimension=13)

C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\tensorflow\python\framework\dtypes.py:493: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\tensorflow\python\framework\dtypes.py:494: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\tensorflow\python\framework\dtypes.py:495: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\tensorflow\python\framework\dtypes.py:496: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\tensorflow\python\framework\dtypes.py:497: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\tensorflow\python\framework\dtypes.py:502: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])

estimator = tf.estimator.LinearClassifier(feature_columns=[feat_column],
                                          n_classes=2,
                                          model_dir="kernel_e")

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'kernel_e', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x00000219C989F470>, '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

Step 3: Convert the continuous variables into TensorFlow feature columns using the function:

tf.feature_column.numeric_column

A single variable converted into a TensorFlow feature column

age = tf.feature_column.numeric_column('caroton')
age

_NumericColumn(key='caroton', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)

Converting all numeric variables into TensorFlow feature columns

COLUMNS  = ['factorA', 'factorB', 'citric_catoda', 'residual_butanol', 'caroton',
       'stable_nodinol', 'sulfur_in_nodinol', 'density', 'pH', 'noracid',
       'lacapon', 'quality_class','quality_class2']

features = [tf.feature_column.numeric_column(k) for k in COLUMNS]
features

[_NumericColumn(key='factorA', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='factorB', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='citric_catoda', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='residual_butanol', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='caroton', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='stable_nodinol', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='sulfur_in_nodinol', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='density', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='pH', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='noracid', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='lacapon', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='quality_class', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='quality_class2', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

Step 5. Creating a linear classification model

5.1 Define the classifier

model = tf.estimator.LinearClassifier(
    n_classes = 2,
    model_dir="ongoing/train5", 
    feature_columns=features)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'ongoing/train5', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x00000219C989FA58>, '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
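
To make it clearer what a linear classifier with two classes does with the feature columns, here is a minimal sketch in plain Python. It assumes the usual logistic formulation (a weighted sum of the inputs passed through a sigmoid, scored with cross-entropy). The function names and the weights are hypothetical and chosen only for illustration; this is not the TensorFlow implementation.

```python
import math


def predict_proba(weights, bias, x):
    """Return P(class = 1) for one example under a linear (logistic) model."""
    logit = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-logit))


def log_loss(p, label):
    """Cross-entropy for a single example with a 0/1 label."""
    return -math.log(p) if label == 1 else -math.log(1.0 - p)


# Hypothetical weights for two features, for illustration only.
weights, bias = [0.8, -1.5], 0.1
p = predict_proba(weights, bias, [1.0, 0.5])
print(p, log_loss(p, 1))
```

Training amounts to adjusting the weights and the bias so that the average loss over the training examples goes down, which is what the falling loss values in the training log reflect.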

5.2 Create the input function

# Columns passed to the model as inputs, and the column used as the label
COLUMNS = ['factorA', 'factorB', 'citric_catoda', 'residual_butanol', 'caroton',
           'stable_nodinol', 'sulfur_in_nodinol', 'density', 'pH', 'noracid',
           'lacapon', 'quality_class', 'quality_class2']
LABEL = 'quality_class2'


def get_input_fn(data_set, num_epochs=None, n_batch=128, shuffle=True):
    # Wrap a pandas DataFrame in an input function that feeds batches to the estimator
    return tf.estimator.inputs.pandas_input_fn(
        x=pd.DataFrame({k: data_set[k].values for k in COLUMNS}),
        y=pd.Series(data_set[LABEL].values),
        batch_size=n_batch,
        num_epochs=num_epochs,  # None repeats the data until `steps` is reached
        shuffle=shuffle)
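
The roles of batch_size and num_epochs can be illustrated with a small pure-Python generator. This is a simplified sketch, not the pandas_input_fn implementation: it ignores shuffling and the queue machinery, and the name batches is hypothetical.

```python
from itertools import cycle, islice


def batches(rows, batch_size=128, num_epochs=None):
    """Yield lists of rows; num_epochs=None repeats the data indefinitely."""
    if num_epochs is None:
        stream = cycle(rows)
    else:
        stream = (r for _ in range(num_epochs) for r in rows)
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return  # data exhausted after the requested number of epochs
        yield batch


# One pass over five rows in batches of two: the last batch is smaller.
print([len(b) for b in batches(list(range(5)), batch_size=2, num_epochs=1)])

# With num_epochs=None the stream never ends, so the caller decides when to stop.
print(list(islice(batches(list(range(5)), batch_size=2), 3)))
```

This is why training is called with num_epochs=None together with a fixed number of steps, while evaluation uses num_epochs=1 so that every test row is seen once.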

5.3 Train the model

Preparing the training and test sets

df_train = pd.concat([X_train, y_train], axis=1, sort=False) 
df_test = pd.concat([X_test, y_test], axis=1, sort=False)

Correction of variable names

model.train(input_fn=get_input_fn(df_train,
                                  num_epochs=None,
                                  n_batch=128,
                                  shuffle=False),
            steps=1000)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into ongoing/train5model.ckpt.
INFO:tensorflow:loss = 88.72288, step = 1
INFO:tensorflow:global_step/sec: 287.186
INFO:tensorflow:loss = 15.2773905, step = 101 (0.348 sec)
INFO:tensorflow:global_step/sec: 330.966
INFO:tensorflow:loss = 8.641602, step = 201 (0.302 sec)
INFO:tensorflow:global_step/sec: 298.718
INFO:tensorflow:loss = 5.899028, step = 301 (0.335 sec)
INFO:tensorflow:global_step/sec: 289.405
INFO:tensorflow:loss = 4.651561, step = 401 (0.346 sec)
INFO:tensorflow:global_step/sec: 307.797
INFO:tensorflow:loss = 4.288885, step = 501 (0.325 sec)
INFO:tensorflow:global_step/sec: 270.935
INFO:tensorflow:loss = 3.759474, step = 601 (0.369 sec)
INFO:tensorflow:global_step/sec: 309.218
INFO:tensorflow:loss = 3.1157045, step = 701 (0.323 sec)
INFO:tensorflow:global_step/sec: 331.573
INFO:tensorflow:loss = 2.6179585, step = 801 (0.302 sec)
INFO:tensorflow:global_step/sec: 318.118
INFO:tensorflow:loss = 2.471261, step = 901 (0.314 sec)
INFO:tensorflow:Saving checkpoints for 1000 into ongoing/train5model.ckpt.
INFO:tensorflow:Loss for final step: 2.152443.

<tensorflow.python.estimator.canned.linear.LinearClassifier at 0x219c989f550>
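
As a rough check of how much data the 1000 training steps cover, the arithmetic below assumes that the training set has 1275 rows, as the shapes printed after the train/test split suggest.

```python
n_train = 1275     # training rows, taken from the split shapes shown earlier
batch_size = 128   # n_batch passed to get_input_fn
steps = 1000       # steps passed to model.train

examples_seen = steps * batch_size   # rows fed to the model in total
epochs = examples_seen / n_train     # approximate number of passes over the data
print(examples_seen, round(epochs, 1))
```

Under these assumptions the model sees the training set roughly a hundred times, which is only possible because num_epochs=None lets the input function repeat the data.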

5.4 Evaluate the performance of the model

model.evaluate(input_fn=get_input_fn(df_test,
                                     num_epochs=1,
                                     n_batch=128,
                                     shuffle=False),
               steps=1000)

INFO:tensorflow:Starting evaluation at 2019-12-20-12:57:55
INFO:tensorflow:Restoring parameters from ongoing/train5model.ckpt-1000
INFO:tensorflow:Finished evaluation at 2019-12-20-12:57:56
INFO:tensorflow:Saving dict for global step 1000: accuracy = 1.0, accuracy_baseline = 0.5360502, auc = 1.0, auc_precision_recall = 1.0, average_loss = 0.01771097, global_step = 1000, label/mean = 0.46394983, loss = 1.8832666, prediction/mean = 0.46446657

{'accuracy': 1.0,
 'accuracy_baseline': 0.5360502,
 'auc': 1.0,
 'auc_precision_recall': 1.0,
 'average_loss': 0.01771097,
 'label/mean': 0.46394983,
 'loss': 1.8832666,
 'prediction/mean': 0.46446657,
 'global_step': 1000}
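
Some of the reported numbers can be cross-checked by hand. The sketch below assumes a test set of 319 rows (from the split shapes shown earlier), that accuracy_baseline is the share of the majority class, and that loss is the per-batch sum of example losses averaged over the evaluation batches. These are my assumptions about how the metrics are defined; the values themselves are copied from the output above.

```python
import math

n_test = 319               # test rows, taken from the split shapes
batch_size = 128           # n_batch passed to get_input_fn
label_mean = 0.46394983    # 'label/mean' from the evaluation output
average_loss = 0.01771097  # 'average_loss' from the evaluation output

positives = round(label_mean * n_test)                  # rows labelled 1
baseline = max(positives, n_test - positives) / n_test  # always predict the majority class
n_batches = math.ceil(n_test / batch_size)              # batches in one epoch
loss_per_batch = average_loss * n_test / n_batches      # compare with 'loss'
print(positives, baseline, n_batches, loss_per_batch)
```

Under these assumptions the results agree with the reported accuracy_baseline of 0.5360502 and loss of 1.8832666. It also shows that steps=1000 is only an upper bound here, since one epoch over the test set fits in three batches.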

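The model is perfect, and that worries me: an accuracy and AUC of exactly 1.0 on the test set, against a baseline of about 0.54, looks too good to be true…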
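
One thing worth checking in a situation like this is whether the label has found its way into the inputs. The COLUMNS list used for both the feature columns and the input function contains quality_class2, which is also the LABEL, as well as quality_class. The sketch below shows a possible check. The names TARGET_LIKE and FEATURE_COLUMNS are hypothetical, and treating quality_class as target-related is an assumption based on quality_class2 apparently being derived from it.

```python
COLUMNS = ['factorA', 'factorB', 'citric_catoda', 'residual_butanol', 'caroton',
           'stable_nodinol', 'sulfur_in_nodinol', 'density', 'pH', 'noracid',
           'lacapon', 'quality_class', 'quality_class2']
LABEL = 'quality_class2'

# Assumption: quality_class is the raw class from which quality_class2 was built,
# so both columns carry the answer the model is supposed to predict.
TARGET_LIKE = {'quality_class', LABEL}

leaked = [c for c in COLUMNS if c in TARGET_LIKE]
FEATURE_COLUMNS = [c for c in COLUMNS if c not in TARGET_LIKE]

print('target-related columns among the inputs:', leaked)
print('inputs without them:', FEATURE_COLUMNS)
```

If the model is rebuilt on a list like FEATURE_COLUMNS, the accuracy should drop to a more believable level; if it stays at 1.0, the cause lies elsewhere.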

	factorA	factorB	citric catoda	residual butanol	caroton	stable nodinol	sulfur in nodinol	density	pH	noracid	lacapon	quality class
0	4.933333	0.466667	0.000000	1.266667	0.050667	7.333333	22.666667	0.665200	2.340000	0.373333	6.266667	1
1	5.200000	0.586667	0.000000	1.733333	0.065333	16.666667	44.666667	0.664533	2.133333	0.453333	6.533333	1
2	5.200000	0.506667	0.026667	1.533333	0.061333	10.000000	36.000000	0.664667	2.173333	0.433333	6.533333	1
3	7.466667	0.186667	0.373333	1.266667	0.050000	11.333333	40.000000	0.665333	2.106667	0.386667	6.533333	2
4	4.933333	0.466667	0.000000	1.266667	0.050667	7.333333	22.666667	0.665200	2.340000	0.373333	6.266667	1

	factorA	factorB	citric_catoda	residual_butanol	caroton	stable_nodinol	sulfur_in_nodinol	density	pH	noracid	lacapon	quality_class	quality_class2
0	4.933333	0.466667	0.000000	1.266667	0.050667	7.333333	22.666667	0.665200	2.340000	0.373333	6.266667	1	1
1	5.200000	0.586667	0.000000	1.733333	0.065333	16.666667	44.666667	0.664533	2.133333	0.453333	6.533333	1	1
2	5.200000	0.506667	0.026667	1.533333	0.061333	10.000000	36.000000	0.664667	2.173333	0.433333	6.533333	1	1
3	7.466667	0.186667	0.373333	1.266667	0.050000	11.333333	40.000000	0.665333	2.106667	0.386667	6.533333	2	0
4	4.933333	0.466667	0.000000	1.266667	0.050667	7.333333	22.666667	0.665200	2.340000	0.373333	6.266667	1	1

THE DATA SCIENCE LIBRARY

Wojciech Moszczyński

Tensorflow linear classifier – example 1