Standardizing data for machine learning models means transforming the raw data so that each column has a mean of 0 and a standard deviation of 1. The column mean is subtracted from every value in the column, and the result is then divided by the column's standard deviation. The process is applied to each column separately.
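The description above can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the code used later in the post: the helper name `standardize_columns` is made up, the sample is three rows and three columns copied from the array printed in this post, and the sketch assumes no column is constant (a zero standard deviation would cause a division by zero).

```python
from statistics import fmean, pstdev

def standardize_columns(rows):
    """Return a copy of rows with every column shifted to mean 0 and scaled to std 1."""
    columns = list(zip(*rows))                 # transpose: one tuple per column
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]    # population standard deviation (divides by n)
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Three rows and the first three columns of the data set used in this post
sample = [
    [2.6, 1360.0, 11.9],
    [2.0, 1292.0, 9.4],
    [2.2, 1402.0, 9.0],
]
scaled = standardize_columns(sample)
```

Each column of `scaled` now has a mean of 0 and a standard deviation of 1, up to floating-point rounding.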

import tensorflow as tf
import pandas as pd
from sklearn import model_selection
import numpy as np

df = pd.read_csv('c:/TF/AirQ_filled2.csv', usecols=['CO(GT)','PT08.S1(CO)','C6H6(GT)','PT08.S2(NMHC)','NOx(GT)','PT08.S3(NOx)','NO2(GT)','PT08.S4(NO2)','PT08.S5(O3)','T','RH', 'AH'
        ,'Month','Weekday','Hours'])
df.head(3)

In the previous post I built a linear regression model that had an R-squared of 90.

http://sigmaquality.pl/python/linear-regression-3/

I rename the columns so that they can be used with the TensorFlow model.

df.columns = ['CO_GT', 'PT08.S1_CO', 'C6H6_GT', 'PT08.S2_NMHC',
       'NOx_GT', 'PT08.S3_NOx', 'NO2_GT', 'PT08.S4_NO2', 'PT08.S5_O3',
       'T', 'RH', 'AH', 'Month', 'Weekday', 'Hours']
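The new names can also be derived from the original ones instead of being typed by hand. A minimal sketch, assuming the only change needed is to turn a parenthesised suffix such as `(GT)` into `_GT`; the helper `clean_name` is hypothetical and not part of the original notebook.

```python
import re

# Column names as they appear in the CSV file
original = ['CO(GT)', 'PT08.S1(CO)', 'C6H6(GT)', 'PT08.S2(NMHC)', 'NOx(GT)',
            'PT08.S3(NOx)', 'NO2(GT)', 'PT08.S4(NO2)', 'PT08.S5(O3)',
            'T', 'RH', 'AH', 'Month', 'Weekday', 'Hours']

def clean_name(name):
    """Turn 'CO(GT)' into 'CO_GT'; names without parentheses pass through unchanged."""
    return re.sub(r'\(([^)]*)\)', r'_\1', name)

renamed = [clean_name(name) for name in original]
```

With a DataFrame at hand, the result could be assigned with `df.columns = renamed`.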

Standardization of data in Sklearn

df.head(2)

Matrix created from the table above.

a = np.array(df)
a

array([[   2.6, 1360. ,   11.9, ...,    3. ,    2. ,   18. ],
       [   2. , 1292. ,    9.4, ...,    3. ,    2. ,   19. ],
       [   2.2, 1402. ,    9. , ...,    3. ,    2. ,   20. ],
       ...,
       [   2.4, 1142. ,   12.4, ...,    4. ,    0. ,   12. ],
       [   2.1, 1003. ,    9.5, ...,    4. ,    0. ,   13. ],
       [   2.2, 1071. ,   11.9, ...,    4. ,    0. ,   14. ]])

The average of the columns is:

np.mean(a, axis=0)

array([2.09193117e+00, 1.10273036e+03, 1.01903922e+01, 9.42548253e+02,
       2.34058566e+02, 8.32742225e+02, 1.09698942e+02, 1.45301453e+03,
       1.03051192e+03, 1.83173560e+01, 4.88174308e+01, 1.01738155e+00,
       6.31035588e+00, 3.00993908e+00, 1.14985572e+01])

The standard deviation of the columns is:

np.std(a, axis=0)

array([1.43839252e+00, 2.19576367e+02, 7.56536693e+00, 2.69566963e+02,
       2.04971518e+02, 2.55695758e+02, 4.75175481e+01, 3.47415518e+02,
       4.10894801e+02, 8.82141160e+00, 1.73533985e+01, 4.04807227e-01,
       3.43797585e+00, 2.00021575e+00, 6.92281165e+00])
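One detail worth keeping in mind: `np.std` divides by n by default (the population standard deviation), and `StandardScaler` does the same, whereas pandas `DataFrame.std` divides by n - 1 unless told otherwise. The sketch below shows the difference with the standard library only; the six values are the CO_GT entries visible in the printed array, used purely as an illustration.

```python
from statistics import pstdev, stdev

# The six CO_GT values visible in the printed array
values = [2.6, 2.0, 2.2, 2.4, 2.1, 2.2]

population_std = pstdev(values)   # divides by n, like np.std with its default ddof=0
sample_std = stdev(values)        # divides by n - 1, like np.std(..., ddof=1)

print(population_std, sample_std)
```

On a data set with thousands of rows the two values are almost identical, but they differ noticeably on small samples like this one.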

We rescale the data so that every column has a mean of 0 and a standard deviation of 1. Standardization only shifts and scales the values; it does not change the shape of their distribution.

from sklearn.preprocessing import StandardScaler

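scaler = StandardScaler()
scaled_data = scaler.fit_transform(a)   # learn each column's mean and std, then apply them
k = scaler.transform(a)                 # same values as scaled_data, reusing the fitted scaler
k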

array([[ 0.35321987,  1.17166361,  0.22597817, ..., -0.96287933,
        -0.50491507,  0.93913327],
       [-0.06391244,  0.86197636, -0.10447507, ..., -0.96287933,
        -0.50491507,  1.08358325],
       [ 0.07513167,  1.36294102, -0.15734759, ..., -0.96287933,
        -0.50491507,  1.22803323],
       ...,
       [ 0.21417577,  0.17884273,  0.29206882, ..., -0.6720105 ,
        -1.50480721,  0.0724334 ],
       [ 0.00560961, -0.45419443, -0.09125694, ..., -0.6720105 ,
        -1.50480721,  0.21688338],
       [ 0.07513167, -0.14450718,  0.22597817, ..., -0.6720105 ,
        -1.50480721,  0.36133336]])

The mean and standard deviation of the columns after standardization.

print("standard deviation: ", np.std(k, axis=0))
print()
print("mean: ", np.mean(k, axis=0))

standard deviation:  [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

mean:  [ 4.55622145e-17 -2.27811073e-16 -2.42998478e-17 -2.00473744e-16
  3.64497716e-17  1.64023972e-16 -7.28995433e-17  2.55148401e-16
 -1.15424277e-16 -2.06548706e-16  1.21499239e-16 -2.42998478e-16
  7.28995433e-17 -5.90410363e-17 -3.95821739e-17]

We put the standardized data into a DataFrame for TensorFlow.

conc = np.vstack(k)
conc

array([[ 0.35321987,  1.17166361,  0.22597817, ..., -0.96287933,
        -0.50491507,  0.93913327],
       [-0.06391244,  0.86197636, -0.10447507, ..., -0.96287933,
        -0.50491507,  1.08358325],
       [ 0.07513167,  1.36294102, -0.15734759, ..., -0.96287933,
        -0.50491507,  1.22803323],
       ...,
       [ 0.21417577,  0.17884273,  0.29206882, ..., -0.6720105 ,
        -1.50480721,  0.0724334 ],
       [ 0.00560961, -0.45419443, -0.09125694, ..., -0.6720105 ,
        -1.50480721,  0.21688338],
       [ 0.07513167, -0.14450718,  0.22597817, ..., -0.6720105 ,
        -1.50480721,  0.36133336]])

SKS = pd.DataFrame(conc)
SKS.columns = ['CO_GT', 'PT08.S1_CO', 'C6H6_GT', 'PT08.S2_NMHC',
       'NOx_GT', 'PT08.S3_NOx', 'NO2_GT', 'PT08.S4_NO2', 'PT08.S5_O3',
       'T', 'RH', 'AH', 'Month', 'Weekday', 'Hours']

SKS.head(3)

Now we add the result value, which has not been standardized.

Now that we have standardized data, we can create a new Tensorflow linear regression model

Tensorflow linear regression model without standardization

We take the original data without standardization

Step 1. Divide the data into a training set and a test set

df_train=df.sample(frac=0.8,random_state=200)
df_test=df.drop(df_train.index)

print(df_train.shape, df_test.shape)

(7486, 15) (1871, 15)
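The two shapes follow directly from the 80/20 split. The sketch below reproduces the arithmetic with the standard library instead of pandas, so it is a swapped-in illustration of the idea rather than what `df.sample` does internally; in particular, `random.Random(200)` will not select the same rows as `random_state=200` in pandas.

```python
import random

n_rows = 9357                      # 7486 + 1871, the two row counts printed above
n_train = round(0.8 * n_rows)      # size of the 80% training sample

rng = random.Random(200)           # fixed seed, so the split is repeatable
train_idx = set(rng.sample(range(n_rows), n_train))
test_idx = [i for i in range(n_rows) if i not in train_idx]   # everything not sampled

print(len(train_idx), len(test_idx))   # 7486 1871
```

Because the test set is defined as "all rows not in the training sample", the two sets never overlap and together cover the whole table.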

df_train.head(3)

Step 2. Convert the data to TensorFlow format

COL = ['CO_GT', 'PT08.S1_CO', 'C6H6_GT', 'PT08.S2_NMHC',
       'NOx_GT', 'PT08.S3_NOx', 'NO2_GT', 'PT08.S4_NO2', 'PT08.S5_O3',
       'T', 'RH', 'AH', 'Month', 'Weekday', 'Hours']


features = [tf.feature_column.numeric_column(k) for k in COL]
features

[_NumericColumn(key='CO_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S1_CO', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='C6H6_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S2_NMHC', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='NOx_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S3_NOx', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='NO2_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S4_NO2', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S5_O3', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='T', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='RH', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='AH', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Month', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Weekday', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Hours', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

Step 3. Tensorflow Linear Regression Estimator

Directory: train_Wojtek

model = tf.estimator.LinearRegressor(model_dir="train_Wojtek", feature_columns=features)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'train_Wojtek', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000001E130F139E8>, '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

Step 4. Defining how to feed the model and what is the result variable

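# Columns fed to the model. Note that this list also contains CO_GT,
# the same column that is used as the label below.
FEATURES = ['CO_GT', 'PT08.S1_CO', 'C6H6_GT', 'PT08.S2_NMHC',
       'NOx_GT', 'PT08.S3_NOx', 'NO2_GT', 'PT08.S4_NO2', 'PT08.S5_O3',
       'T', 'RH', 'AH', 'Month', 'Weekday', 'Hours']
LABEL = 'CO_GT'

def get_input_fn(data_set, num_epochs=None, n_batch=128, shuffle=True):
    # Build an input function that feeds the DataFrame to the estimator in batches
    return tf.estimator.inputs.pandas_input_fn(
        x=pd.DataFrame({k: data_set[k].values for k in FEATURES}),
        y=pd.Series(data_set[LABEL].values),
        batch_size=n_batch,
        num_epochs=num_epochs,   # None repeats the data indefinitely
        shuffle=shuffle)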
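To make the roles of `batch_size`, `num_epochs` and `shuffle` more concrete, here is a simplified plain-Python generator that mimics the behaviour. It is a sketch under my own assumptions, not the TensorFlow implementation: the name `batches` and the `seed` parameter are made up, and each epoch is batched separately, which the real input function does not necessarily do.

```python
import random

def batches(rows, batch_size=128, num_epochs=None, shuffle=True, seed=0):
    """Yield lists of rows; num_epochs=None repeats the data forever."""
    rng = random.Random(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        order = list(rows)
        if shuffle:
            rng.shuffle(order)                       # new row order in every epoch
        for start in range(0, len(order), batch_size):
            yield order[start:start + batch_size]    # the last batch may be smaller
        epoch += 1

# One pass over 1871 rows in batches of 356: five full batches and one partial batch
sizes = [len(b) for b in batches(range(1871), batch_size=356, num_epochs=1, shuffle=False)]
print(sizes)   # [356, 356, 356, 356, 356, 91]
```

With `num_epochs=None` the generator never stops on its own, which is why training needs a `steps` limit, while evaluation and prediction use `num_epochs=1` to go through the data exactly once.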

Step 5. Training the model

model.train(input_fn=get_input_fn(df_train,
                                  num_epochs=None,
                                  n_batch=128,
                                  shuffle=False),
            steps=1000)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from train_Wojtekmodel.ckpt-32000
INFO:tensorflow:Saving checkpoints for 32001 into train_Wojtekmodel.ckpt.
INFO:tensorflow:loss = 5.8126163, step = 32001
INFO:tensorflow:global_step/sec: 263.507
INFO:tensorflow:loss = 3.952513, step = 32101 (0.379 sec)
INFO:tensorflow:global_step/sec: 299.618
INFO:tensorflow:loss = 3.8751945, step = 32201 (0.349 sec)
INFO:tensorflow:global_step/sec: 282.839
INFO:tensorflow:loss = 4.1194134, step = 32301 (0.338 sec)
INFO:tensorflow:global_step/sec: 303.789
INFO:tensorflow:loss = 3.1714904, step = 32401 (0.329 sec)
INFO:tensorflow:global_step/sec: 298.869
INFO:tensorflow:loss = 4.1107397, step = 32501 (0.335 sec)
INFO:tensorflow:global_step/sec: 300.399
INFO:tensorflow:loss = 1.9511085, step = 32601 (0.349 sec)
INFO:tensorflow:global_step/sec: 294.727
INFO:tensorflow:loss = 4.6305227, step = 32701 (0.339 sec)
INFO:tensorflow:global_step/sec: 296.434
INFO:tensorflow:loss = 4.6675496, step = 32801 (0.322 sec)
INFO:tensorflow:global_step/sec: 296.704
INFO:tensorflow:loss = 3.619099, step = 32901 (0.337 sec)
INFO:tensorflow:Saving checkpoints for 33000 into train_Wojtekmodel.ckpt.
INFO:tensorflow:Loss for final step: 3.9190884.

<tensorflow.python.estimator.canned.linear.LinearRegressor at 0x1e130f13630>

Step 6. Model assessment

ev = model.evaluate(
    input_fn=get_input_fn(df_test,
                          num_epochs=1,
                          n_batch=356,
                          shuffle=False))

INFO:tensorflow:Starting evaluation at 2019-12-23-14:53:48
INFO:tensorflow:Restoring parameters from train_Wojtekmodel.ckpt-33000
INFO:tensorflow:Finished evaluation at 2019-12-23-14:53:49
INFO:tensorflow:Saving dict for global step 33000: average_loss = 0.03159822, global_step = 33000, loss = 9.853379
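The evaluation log above reports two numbers, `average_loss` and `loss`. They appear to be related as follows: `average_loss` is the mean squared error per row, while `loss` is the summed squared error averaged over the evaluation batches. This reading is my assumption, checked only against the figures in this log; the short calculation below reproduces it.

```python
import math

n_test = 1871              # rows in df_test
batch_size = 356           # n_batch passed to get_input_fn for evaluation
average_loss = 0.03159822  # value reported in the evaluation log

n_batches = math.ceil(n_test / batch_size)     # 6 batches, the last one partial
total_squared_error = average_loss * n_test    # sum of squared errors over the test set
loss_per_batch = total_squared_error / n_batches

print(n_batches, round(loss_per_batch, 3))     # 6 9.853
```

The result agrees with the `loss = 9.853379` printed in the log, so `average_loss` is the figure to compare across runs with different batch sizes.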

Calculation of R2

I make a prediction on a test set.

y = model.predict(    
         input_fn=get_input_fn(df_test,                          
         num_epochs=1,                          
         n_batch = 256,                          
         shuffle=False))

import itertools

predictions = list(p["predictions"] for p in itertools.islice(y, 3000))
#print("Predictions: {}".format(str(predictions)))

INFO:tensorflow:Restoring parameters from train_Wojtekmodel.ckpt-33000

I convert the result to a DataFrame

import numpy as np

conc = np.vstack(predictions)
ZHP = pd.DataFrame(conc)
ZHP.rename(columns={0:'y_pred'}, inplace=True)

kot = ZHP['y_pred'].values
kot = kot.astype('float32')
kot.dtype

dtype('float32')
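The stacking step above can be illustrated on toy data. This is only a minimal sketch, assuming that each prediction arrives as a one-element array; the values and the name toy_predictions are made up for the illustration.

```python
import numpy as np

# Hypothetical predictions: one array of shape (1,) per test row
toy_predictions = [np.array([2.61]), np.array([1.98]), np.array([2.23])]

# vstack puts each prediction on its own row, giving shape (3, 1)
stacked = np.vstack(toy_predictions)

# Take the single column and cast it, giving a 1-D float32 vector like `kot`
flat = stacked[:, 0].astype('float32')

print(stacked.shape, flat.shape, flat.dtype)
```

The DataFrame with the renamed column y_pred used in the text plays the same role as the column slice in this sketch.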

I align the data types of the theoretical (predicted) and empirical (observed) variables

y = df_test['CO_GT'].values
y = y.astype('float32')
y.dtype

dtype('float32')

PZU = pd.DataFrame({'y': y, 'y_pred': kot })
PZU.dtypes

y         float32
y_pred    float32
dtype: object

def R_squared(y, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    residual = tf.reduce_sum(tf.square(tf.subtract(y, y_pred)))          # SS_res
    total = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))  # SS_tot
    # tf.divide is used instead of the deprecated tf.div
    r2 = tf.subtract(1.0, tf.divide(residual, total))
    return r2

# Build the R^2 tensor for the observed values y and the predictions kot
r2 = R_squared(y, kot)
r2

<tf.Tensor 'Sub_2:0' shape=() dtype=float32>

sess = tf.Session()
a = sess.run(r2)
print('R Square parameter: ',a)

R Square parameter:  0.9838469
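The same quantity, R² = 1 - SS_res / SS_tot, can be cross-checked without a TensorFlow session. The sketch below is a NumPy version of the formula used above; the function name r_squared_np and the toy values are assumptions made for the illustration and are not part of the original notebook.

```python
import numpy as np

def r_squared_np(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_toy = [1.0, 2.0, 3.0, 4.0]
r2_perfect = r_squared_np(y_toy, y_toy)        # predictions equal to the data
r2_mean = r_squared_np(y_toy, [2.5] * 4)       # always predicting the mean
print(r2_perfect, r2_mean)
```

Calling r_squared_np(y, kot) on the two float32 vectors built above should give a value close to the one printed by the session, up to floating-point precision.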

Tensorflow linear regression model with standardization

We now substitute the standardized data.

WE STANDARDIZE ONLY THE INDEPENDENT VARIABLES, BECAUSE WE WANT THE RESULT OF THE LINEAR REGRESSION MODEL TO REMAIN EASY TO READ
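Standardizing only the independent variables can be sketched in NumPy as follows. The four rows are taken from the raw array shown earlier (first three columns only); the variable names are assumptions made for the illustration. The population standard deviation (ddof=0) is used, which is also what scikit-learn's StandardScaler uses.

```python
import numpy as np

# Column 0 is the target (CO), columns 1-2 are two of the features
data = np.array([[2.6, 1360.0, 11.9],
                 [2.0, 1292.0,  9.4],
                 [2.2, 1402.0,  9.0],
                 [2.4, 1142.0, 12.4]])

target = data[:, 0]          # left in its original units
features = data[:, 1:]

mean = features.mean(axis=0)             # per-column mean
std = features.std(axis=0)               # per-column population std (ddof=0)
features_std = (features - mean) / std   # each column now has mean 0, std 1

# Target first, standardized features after it, as in WKD below
result = np.column_stack([target, features_std])
print(result)
```

Because the target stays in its original units, predictions and errors can be read directly as CO values.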

Step 1. Divide the data into training and test sets

del SKS['CO_GT']
WKD = pd.concat([df['CO_GT'], SKS], axis=1, sort=False)

WKD.head(3)

df_trainS=WKD.sample(frac=0.8,random_state=200)
df_testS=WKD.drop(df_trainS.index)

print(df_trainS.shape, df_testS.shape)
df_trainS.head(3)

(7486, 15) (1871, 15)
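The printed shapes follow from the 80% fraction: 7486 + 1871 = 9357 rows, and 0.8 × 9357 rounds to 7486. The sketch below is a NumPy analogue of the sample/drop pattern, not a reproduction of it: the same seed is reused for illustration, but it will not select the same rows as pandas does.

```python
import numpy as np

n_rows = 7486 + 1871                   # 9357 rows, from the shapes printed above
n_train = int(round(0.8 * n_rows))     # 80% of the rows

rng = np.random.RandomState(200)       # a fixed seed makes the split repeatable
shuffled = rng.permutation(n_rows)

train_idx = np.sort(shuffled[:n_train])
# The test set is every row that was not drawn for training
test_idx = np.setdiff1d(np.arange(n_rows), train_idx)

print(n_train, len(test_idx))
```

Taking the test set as the complement of the training rows guarantees that the two sets do not overlap, which is the role of drop(df_trainS.index) in the code above.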

Step 2. Convert the data to Tensorflow format

FOL = ['CO_GT', 'PT08.S1_CO', 'C6H6_GT', 'PT08.S2_NMHC',
       'NOx_GT', 'PT08.S3_NOx', 'NO2_GT', 'PT08.S4_NO2', 'PT08.S5_O3',
       'T', 'RH', 'AH', 'Month', 'Weekday', 'Hours']


PCK = [tf.feature_column.numeric_column(k) for k in FOL]
PCK

[_NumericColumn(key='CO_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S1_CO', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='C6H6_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S2_NMHC', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='NOx_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S3_NOx', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='NO2_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S4_NO2', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S5_O3', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='T', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='RH', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='AH', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Month', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Weekday', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Hours', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]
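Note that FOL lists CO_GT, the same column that is later used as the observed value y. If the input function (not shown in this part) passes every column in FOL as a feature, the model sees the target among its inputs, which by itself could explain a loss near zero and an R² of 1.0. A minimal sketch of building the feature list without the label follows; LABEL and FEATURES are illustrative names, not taken from the original.

```python
LABEL = 'CO_GT'   # target column, the one used for y in the R2 calculation

FOL = ['CO_GT', 'PT08.S1_CO', 'C6H6_GT', 'PT08.S2_NMHC',
       'NOx_GT', 'PT08.S3_NOx', 'NO2_GT', 'PT08.S4_NO2', 'PT08.S5_O3',
       'T', 'RH', 'AH', 'Month', 'Weekday', 'Hours']

# Keep every column except the label as a model input
FEATURES = [name for name in FOL if name != LABEL]

print(len(FOL), len(FEATURES))
```

The feature columns could then be built from FEATURES instead of FOL, in the same way as PCK above.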

Step 3. Tensorflow Linear Regression Estimator

directory: train_Wojtek7

model = tf.estimator.LinearRegressor(model_dir="train_Wojtek7", feature_columns=PCK)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'train_Wojtek7', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000001E131AA1358>, '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

Step 4. Defining how to feed the model and what is the result variable

Step 5. Training the model

model.train(input_fn=get_input_fn(df_trainS, 
                                      num_epochs=None,
                                      n_batch = 128,
                                      shuffle=False),
                                      steps=1000)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from train_Wojtek7model.ckpt-5000
INFO:tensorflow:Saving checkpoints for 5001 into train_Wojtek7model.ckpt.
INFO:tensorflow:loss = 5.9656813e-09, step = 5001
INFO:tensorflow:global_step/sec: 275.63
INFO:tensorflow:loss = 4.005897e-09, step = 5101 (0.370 sec)
INFO:tensorflow:global_step/sec: 322.183
INFO:tensorflow:loss = 2.7887523e-09, step = 5201 (0.318 sec)
INFO:tensorflow:global_step/sec: 257.555
INFO:tensorflow:loss = 3.2279417e-09, step = 5301 (0.388 sec)
INFO:tensorflow:global_step/sec: 258.823
INFO:tensorflow:loss = 2.1228068e-09, step = 5401 (0.371 sec)
INFO:tensorflow:global_step/sec: 223.394
INFO:tensorflow:loss = 1.5789352e-09, step = 5501 (0.448 sec)
INFO:tensorflow:global_step/sec: 219.565
INFO:tensorflow:loss = 1.301308e-09, step = 5601 (0.455 sec)
INFO:tensorflow:global_step/sec: 204.645
INFO:tensorflow:loss = 1.0119683e-09, step = 5701 (0.501 sec)
INFO:tensorflow:global_step/sec: 191.976
INFO:tensorflow:loss = 7.569053e-10, step = 5801 (0.524 sec)
INFO:tensorflow:global_step/sec: 229.883
INFO:tensorflow:loss = 1.9050654e-09, step = 5901 (0.419 sec)
INFO:tensorflow:Saving checkpoints for 6000 into train_Wojtek7model.ckpt.
INFO:tensorflow:Loss for final step: 3.4729108e-10.

<tensorflow.python.estimator.canned.linear.LinearRegressor at 0x1e131438860>
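The log runs from step 5001 to step 6000, which matches steps=1000 continuing from the checkpoint at step 5000. How many passes over the training data this represents can be estimated with simple arithmetic. The sketch assumes that n_batch is the batch size used by get_input_fn, which is not shown in this part.

```python
n_train_rows = 7486    # from df_trainS.shape
n_batch = 128          # assumed to be the batch size
steps = 1000

batches_per_pass = n_train_rows / n_batch    # about 58.5 batches per pass
examples_seen = steps * n_batch              # rows processed during this call
approx_epochs = examples_seen / n_train_rows

print(round(batches_per_pass, 1), examples_seen, round(approx_epochs, 1))
```

With num_epochs=None the input data repeats for as long as needed, so it is the steps argument that ends training, after roughly 17 passes over the training set.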

Step 6. Model assessment

ev = model.evaluate(    
          input_fn=get_input_fn(df_testS,                          
          num_epochs=1,                          
          n_batch = 356,                          
          shuffle=False))

INFO:tensorflow:Starting evaluation at 2019-12-23-14:53:56
INFO:tensorflow:Restoring parameters from train_Wojtek7model.ckpt-6000
INFO:tensorflow:Finished evaluation at 2019-12-23-14:53:57
INFO:tensorflow:Saving dict for global step 6000: average_loss = 2.491106e-12, global_step = 6000, loss = 7.768099e-10
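The two loss values in the log can be related to each other. The sketch below assumes that average_loss is the mean squared error per example and that loss is the per-batch summed loss averaged over the batches, which appears to be how the canned estimators in TensorFlow 1.x report them; under that assumption the arithmetic reproduces the logged value.

```python
import math

n_test_rows = 1871             # from df_testS.shape
n_batch = 356                  # batch size passed to get_input_fn above
average_loss = 2.491106e-12    # per-example value, from the log
logged_loss = 7.768099e-10     # `loss` value, from the log

# One epoch over the test set; the last batch is smaller than the others
n_batches = math.ceil(n_test_rows / n_batch)

# Sum of all per-example losses, divided by the number of batches
loss_estimate = average_loss * n_test_rows / n_batches

print(n_batches, loss_estimate, logged_loss)
```

For comparing models, average_loss is the more convenient number because it does not depend on the batch size.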

Calculation of R2

I make predictions on the test set.

y = model.predict(    
         input_fn=get_input_fn(df_testS,                          
         num_epochs=1,                          
         n_batch = 256,                          
         shuffle=False))

import itertools

predictions = list(p["predictions"] for p in itertools.islice(y, 3000))

INFO:tensorflow:Restoring parameters from train_Wojtek7model.ckpt-6000
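The predict call returns a generator, and itertools.islice only caps how many items are read from it. Since the test set has 1871 rows and the input runs for one epoch, the cap of 3000 is never reached and all predictions are collected. A minimal sketch with a stand-in generator follows; fake_predict is hypothetical and only imitates the shape of the items.

```python
import itertools

def fake_predict(n_rows):
    """Stand-in for model.predict: yields one dict per input row."""
    for i in range(n_rows):
        yield {"predictions": [float(i)]}

# A cap below the number of rows truncates the result
first_three = [p["predictions"] for p in itertools.islice(fake_predict(5), 3)]

# A cap above the number of rows simply returns everything
everything = [p["predictions"] for p in itertools.islice(fake_predict(5), 3000)]

print(len(first_three), len(everything))
```

If the test set ever grew beyond the cap, the predictions list would silently become shorter than the vector of observed values, so the cap has to stay above the number of test rows.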

import numpy as np

conc = np.vstack(predictions)
ZHP = pd.DataFrame(conc)
ZHP.rename(columns={0:'y_pred'}, inplace=True)

kot = ZHP['y_pred'].values
kot = kot.astype('float32')
kot.dtype

dtype('float32')

y = df_testS['CO_GT'].values
y = y.astype('float32')
y.dtype

dtype('float32')

PZU = pd.DataFrame({'y': y, 'y_pred': kot })
PZU.dtypes

y         float32
y_pred    float32
dtype: object

def R_squared(y, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    residual = tf.reduce_sum(tf.square(tf.subtract(y, y_pred)))          # SS_res
    total = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))  # SS_tot
    # tf.divide is used instead of the deprecated tf.div
    r2 = tf.subtract(1.0, tf.divide(residual, total))
    return r2

# Build the R^2 tensor for the observed values y and the predictions kot
r2 = R_squared(y, kot)
r2

<tf.Tensor 'Sub_5:0' shape=() dtype=float32>

sess = tf.Session()
a = sess.run(r2)
print('R Square parameter: ',a)

R Square parameter:  1.0

	CO(GT)	PT08.S1(CO)	C6H6(GT)	PT08.S2(NMHC)	NOx(GT)	PT08.S3(NOx)	NO2(GT)	PT08.S4(NO2)	PT08.S5(O3)	T	RH	AH	Month	Weekday	Hours
0	2.6	1360.0	11.9	1046.0	166.0	1056.0	113.0	1692.0	1268.0	13.6	48.9	0.7578	3	2	18
1	2.0	1292.0	9.4	955.0	103.0	1174.0	92.0	1559.0	972.0	13.3	47.7	0.7255	3	2	19
2	2.2	1402.0	9.0	939.0	131.0	1140.0	114.0	1555.0	1074.0	11.9	54.0	0.7502	3	2	20

	CO_GT	PT08.S1_CO	C6H6_GT	PT08.S2_NMHC	NOx_GT	PT08.S3_NOx	NO2_GT	PT08.S4_NO2	PT08.S5_O3	T	RH	AH	Month	Weekday	Hours
0	0.353220	1.171664	0.225978	0.383770	-0.332039	0.873138	0.069470	0.687895	0.577978	-0.534762	0.004758	-0.641247	-0.962879	-0.504915	0.939133
1	-0.063912	0.861976	-0.104475	0.046192	-0.639399	1.334624	-0.372472	0.305068	-0.142401	-0.568770	-0.064393	-0.721038	-0.962879	-0.504915	1.083583
2	0.075132	1.362941	-0.157348	-0.013163	-0.502795	1.201654	0.090515	0.293555	0.105838	-0.727475	0.298649	-0.660022	-0.962879	-0.504915	1.228033

	CO_GT	PT08.S1_CO	C6H6_GT	PT08.S2_NMHC	NOx_GT	PT08.S3_NOx	NO2_GT	PT08.S4_NO2	PT08.S5_O3	T	RH	AH	Month	Weekday	Hours
6632	2.6	1099.0	10.4	994.0	401.0	715.0	117.0	1164.0	1186.0	6.8	57.8	0.5768	12	6	2
7123	2.2	1149.0	8.4	914.0	382.0	742.0	147.0	1072.0	1242.0	9.5	41.2	0.4908	1	5	13
7599	5.7	1578.0	29.0	1527.0	875.0	419.0	179.0	1761.0	2086.0	7.9	60.0	0.6406	1	4	9

	CO_GT	PT08.S1_CO	C6H6_GT	PT08.S2_NMHC	NOx_GT	PT08.S3_NOx	NO2_GT	PT08.S4_NO2	PT08.S5_O3	T	RH	AH	Month	Weekday	Hours
6632	2.6	-0.016989	0.027706	0.190868	0.814462	-0.460478	0.153650	-0.831899	0.378413	-1.305614	0.517626	-1.088374	1.654940	1.494869	-1.372066
7123	2.2	0.210722	-0.236656	-0.105904	0.721766	-0.354884	0.784995	-1.096711	0.514701	-0.999540	-0.438959	-1.300821	-1.544617	0.994923	0.216883
7599	5.7	2.164484	2.486278	2.168113	3.126978	-1.618104	1.458431	0.886505	2.568755	-1.180917	0.644402	-0.930768	-1.544617	0.494977	-0.360917

THE DATA SCIENCE LIBRARY

Wojciech Moszczyński

Data standardization – Tensorflow linear regression model
