Source of data: https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham

import pandas as pd

df = pd.read_csv('c:/TF/ParkingBirmingham.csv')
df.head(3)

df.LastUpdated = pd.to_datetime(df.LastUpdated)
df.dtypes

SystemCodeNumber            object
Capacity                     int64
Occupancy                    int64
LastUpdated         datetime64[ns]
dtype: object

df['month'] = df.LastUpdated.dt.month
df['hour'] = df.LastUpdated.dt.hour
df['weekday_name'] = df.LastUpdated.dt.day_name()  # dt.weekday_name was removed in pandas 1.0
df['weekday'] = df.LastUpdated.dt.weekday

df.head(4)

# Keep only the records for one car park
df = df.loc[df['SystemCodeNumber']=='BHMMBMMBX01']
df.shape

(1312, 8)
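To see what the derived columns contain, the sketch below applies the same accessors to two made-up timestamps (hypothetical values, not rows from the dataset). It uses dt.day_name(), which replaces dt.weekday_name in current pandas. Note that dt.weekday counts Monday as 0.

```python
import pandas as pd

# Two hypothetical timestamps in the same format as LastUpdated
demo = pd.DataFrame({'LastUpdated': pd.to_datetime(['2016-10-04 07:59:42',
                                                    '2016-10-08 16:30:35'])})

demo['month'] = demo.LastUpdated.dt.month
demo['hour'] = demo.LastUpdated.dt.hour
demo['weekday_name'] = demo.LastUpdated.dt.day_name()  # e.g. 'Tuesday'
demo['weekday'] = demo.LastUpdated.dt.weekday          # Monday=0 ... Sunday=6
print(demo)
```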

import tensorflow as tf

Step 1: Convert Data

We convert the numeric variables into the format Tensorflow expects. For continuous variables, Tensorflow provides tf.feature_column.numeric_column(), which declares each column as a numeric input (float32 by default, as the output below shows).

FEATURES = ['month', 'hour', 'weekday'] 
LABEL = 'Occupancy'

PKS = [tf.feature_column.numeric_column(k) for k in FEATURES] 
PKS

[_NumericColumn(key='month', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='hour', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='weekday', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

Step 2: Defining the estimator

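Tensorflow automatically creates a directory called "ABC" in your working directory and saves the model's checkpoints and summaries there. Use this path to open Tensorboard (for example, tensorboard --logdir ABC). The estimator receives the feature columns defined in Step 1, i.e. the independent variables.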

estimator = tf.estimator.LinearRegressor( feature_columns=PKS, model_dir="ABC")

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'ABC', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x00000147BB11B940>, '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
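LinearRegressor models Occupancy as a weighted sum of the three numeric feature columns plus a bias term, and learns the weights iteratively from batches. As a point of comparison only, the sketch below fits the same model form in closed form with ordinary least squares (np.linalg.lstsq) on synthetic, noise-free data shaped like month, hour and weekday. This swaps in a different fitting method and is not what the estimator does internally.

```python
import numpy as np

# Hypothetical synthetic data: 3 numeric features in the ranges of month, hour, weekday
rng = np.random.default_rng(0)
X = rng.integers(low=[10, 7, 0], high=[13, 17, 7], size=(200, 3)).astype(np.float64)
true_w = np.array([5.0, 20.0, -8.0])
true_b = 100.0
y = X @ true_w + true_b  # noise-free, so the fit should recover the weights

# Same model form as a linear regressor: y_hat = X @ w + b.
# Append a column of ones so the bias b is estimated together with w.
X1 = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
w, b = coef[:3], coef[3]
print(np.round(w, 3), round(b, 3))
```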

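To instruct Tensorflow how to feed the model, you can use tf.estimator.inputs.pandas_input_fn. Its main parameters are:
- x: the feature data (a pandas DataFrame)
- y: the label data (a pandas Series)
- batch_size: the batch size, default 128
- num_epochs: the number of passes over the data, default 1; None means the data is repeated indefinitely
- shuffle: whether to shuffle the data, default None, so it must be set explicitly to True or False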

def get_input_fn(data_set, num_epochs=None, n_batch=128, shuffle=True):
    # Wrap the feature columns (x) and the label (y) in a Tensorflow 1.x input function
    return tf.estimator.inputs.pandas_input_fn(
        x=pd.DataFrame({k: data_set[k].values for k in FEATURES}),
        y=pd.Series(data_set[LABEL].values),
        batch_size=n_batch, num_epochs=num_epochs, shuffle=shuffle)
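pandas_input_fn hides the batching logic. The sketch below is a simplified, hypothetical model of how batch_size, num_epochs and shuffle interact: iterate_batches is an illustrative name, it yields row indices rather than tensors, and it is not Tensorflow's implementation.

```python
import numpy as np

def iterate_batches(n_rows, batch_size=128, num_epochs=1, shuffle=False, seed=0):
    """Hypothetical helper: yield one array of row indices per batch.

    num_epochs=None repeats the data forever, so the caller must stop
    after a fixed number of steps (as estimator.train(steps=...) does).
    """
    rng = np.random.default_rng(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        # Reshuffle the row order at the start of every epoch if requested
        order = rng.permutation(n_rows) if shuffle else np.arange(n_rows)
        for start in range(0, n_rows, batch_size):
            yield order[start:start + batch_size]
        epoch += 1

# One pass over 262 rows with batches of 128: the last batch is smaller
print([len(b) for b in iterate_batches(262, batch_size=128, num_epochs=1)])  # [128, 128, 6]
```

With num_epochs=1 the generator ends on its own, which is why it suits evaluation and prediction; with num_epochs=None it never ends, which is why training needs a step count.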

Step 3: Model training

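To feed the model, use the get_input_fn function created above and instruct the model to run 1000 training steps (iterations).
Note that the number of epochs (num_epochs) is not fixed here: with num_epochs=None the input function cycles through the training data indefinitely, so the length of training is controlled by steps.
It is better to set the number of epochs to None and define the number of steps.
Because the model directory ABC already contained a checkpoint from earlier runs, the training log below resumes from step 20000 and stops at step 21000.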
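With num_epochs=None, the steps argument fixes how many batches the model sees. A quick back-of-the-envelope calculation, assuming every batch is full (128 rows) and using the 1050-row training set produced by the split below, shows how many passes over the data one call to train() amounts to:

```python
# Values used in this notebook: batch size 128, 1000 steps, 1050 training rows
batch_size = 128
steps = 1000
n_train = 1050

examples_seen = batch_size * steps   # examples processed in one train() call
epochs = examples_seen / n_train     # equivalent number of passes over the training set
print(examples_seen, round(epochs, 1))
```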

To test the model, we must divide the data set into a test set and a training set.

df_train=df.sample(frac=0.8,random_state=200) 
df_test=df.drop(df_train.index) 
print(df_train.shape, df_test.shape)

(1050, 8) (262, 8)
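df.sample(frac=0.8) draws round(0.8 × 1312) = 1050 rows, and drop() keeps the remaining 262, so the two sets never share a row. The sketch below checks this on a synthetic frame with the same number of rows (a stand-in, not the parking data):

```python
import pandas as pd

# Synthetic stand-in with the same number of rows as the filtered car park data
demo = pd.DataFrame({'Occupancy': range(1312)})
train_demo = demo.sample(frac=0.8, random_state=200)
test_demo = demo.drop(train_demo.index)

print(train_demo.shape, test_demo.shape)                     # (1050, 1) (262, 1)
print(train_demo.index.intersection(test_demo.index).empty)  # True: no shared rows
```

The split relies on the index labels being unique; after filtering with df.loc the original index is kept, so this holds here.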

estimator.train(input_fn=get_input_fn(df_train, num_epochs=None, n_batch = 128, shuffle=False), steps=1000)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from ABC\model.ckpt-20000
INFO:tensorflow:Saving checkpoints for 20001 into ABC\model.ckpt.
INFO:tensorflow:loss = 1604473.0, step = 20001
INFO:tensorflow:global_step/sec: 524.813
INFO:tensorflow:loss = 1890832.8, step = 20101 (0.191 sec)
INFO:tensorflow:global_step/sec: 595.828
INFO:tensorflow:loss = 1691072.0, step = 20201 (0.183 sec)
INFO:tensorflow:global_step/sec: 581.214
INFO:tensorflow:loss = 1660972.2, step = 20301 (0.172 sec)
INFO:tensorflow:global_step/sec: 577.628
INFO:tensorflow:loss = 1830299.8, step = 20401 (0.158 sec)
INFO:tensorflow:global_step/sec: 591.553
INFO:tensorflow:loss = 1564311.5, step = 20501 (0.169 sec)
INFO:tensorflow:global_step/sec: 659.048
INFO:tensorflow:loss = 1851407.0, step = 20601 (0.167 sec)
INFO:tensorflow:global_step/sec: 565.153
INFO:tensorflow:loss = 1717692.1, step = 20701 (0.161 sec)
INFO:tensorflow:global_step/sec: 597.055
INFO:tensorflow:loss = 1668234.1, step = 20801 (0.167 sec)
INFO:tensorflow:global_step/sec: 597.223
INFO:tensorflow:loss = 1785292.5, step = 20901 (0.167 sec)
INFO:tensorflow:Saving checkpoints for 21000 into ABC\model.ckpt.
INFO:tensorflow:Loss for final step: 1761262.6.

<tensorflow.python.estimator.canned.linear.LinearRegressor at 0x147bb11bd30>

Step 4: Model evaluation

To evaluate the model on the test set, use the following code (num_epochs=1, so each test row is used exactly once):

ev = estimator.evaluate( input_fn=get_input_fn(df_test, num_epochs=1, n_batch = 128, shuffle=False))

INFO:tensorflow:Starting evaluation at 2019-12-03-10:35:11
INFO:tensorflow:Restoring parameters from ABC\model.ckpt-21000
INFO:tensorflow:Finished evaluation at 2019-12-03-10:35:11
INFO:tensorflow:Saving dict for global step 21000: average_loss = 12334.496, global_step = 21000, loss = 1077212.6
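The two numbers in this log can be related to the sum of squared errors computed in the Pandas part of Step 5 (Sum_SSE = 3231638.2 over 262 test rows). The arithmetic below suggests that average_loss is the mean squared error per example and that loss is the mean of the per-batch sums over the three evaluation batches (128 + 128 + 6 rows). Treat this as an interpretation of the log that the numbers support, not as a statement about Tensorflow internals.

```python
import math

# Figures copied from this notebook's outputs
sum_sse = 3231638.2                   # Sum_SSE from the Pandas calculation in Step 5
n_test = 262                          # rows in df_test
n_batches = math.ceil(n_test / 128)   # evaluation batches with n_batch = 128

mse = sum_sse / n_test                # compare with average_loss = 12334.496
loss_per_batch = sum_sse / n_batches  # compare with loss = 1077212.6
rmse = math.sqrt(mse)                 # typical error, in occupied spaces

print(round(mse, 1), round(loss_per_batch, 1), round(rmse, 1))
```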

Step 5: Calculation of R Square

Calculation of R Square parameter using Tensorflow

First, I make predictions on the test set:

y = estimator.predict(
    input_fn=get_input_fn(df_test, num_epochs=1, n_batch=256, shuffle=False))

import itertools

# Take one prediction per test row from the generator returned by estimator.predict
predictions = list(p["predictions"] for p in itertools.islice(y, len(df_test)))
#print("Predictions: {}".format(str(predictions)))

INFO:tensorflow:Restoring parameters from ABC\model.ckpt-21000

predictions

[array([319.3249], dtype=float32),
 array([437.01642], dtype=float32),
 array([476.24692], dtype=float32),
 array([495.86215], dtype=float32),

estimator.predict returned a generator y, which I have turned into a list of one-element arrays. Now I stack this list into a single NumPy array.

import numpy as np

conc = np.vstack(predictions)
conc

array([[319.3249 ],
       [437.01642],
       [476.24692],
       [495.86215],
       [326.4933 ],
       [424.56955],
       [444.1848 ],
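np.vstack turns the list of one-element arrays into a column of shape (n, 1). The sketch below, using the first four predictions shown above, compares that with flattening to 1-D; np.concatenate gives the 1-D version directly, which is one alternative to going through the ZHP DataFrame.

```python
import numpy as np

# First four predictions from the output above, as returned by estimator.predict
predictions = [np.array([319.3249], dtype=np.float32),
               np.array([437.01642], dtype=np.float32),
               np.array([476.24692], dtype=np.float32),
               np.array([495.86215], dtype=np.float32)]

conc = np.vstack(predictions)        # 2-D column, shape (4, 1), as in the notebook
flat = conc.ravel()                  # 1-D, shape (4,), same values
same = np.concatenate(predictions)   # joins the 1-element arrays into the same 1-D array

print(conc.shape, flat.shape, np.array_equal(flat, same))
```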

ZHP = pd.DataFrame(conc)
ZHP.rename(columns={0:'y_pred'}, inplace=True)

kot = ZHP['y_pred'].values
kot = kot.astype('float32')
kot.dtype

dtype('float32')

Now I’m creating an array of the actual y values from the test set.

y = df_test['Occupancy'].values
y = y.astype('float32')
y.dtype

dtype('float32')

Now I create a DataFrame with the actual values (y) and the predicted values (y_pred).

PZU = pd.DataFrame({'y': y, 'y_pred': kot })
PZU.dtypes

y         float32
y_pred    float32
dtype: object

The R_squared function below follows the approach from: https://stackoverflow.com/questions/42351184/how-to-calculate-r2-in-tensorflow

def R_squared(y, y_pred):
    # Residual sum of squares (SSE)
    residual = tf.reduce_sum(tf.square(tf.subtract(y, y_pred)))
    # Total sum of squares around the mean of y (SST)
    total = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))
    r2 = tf.subtract(1.0, tf.div(residual, total))
    return r2
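Written out, the function computes the coefficient of determination as one minus the ratio of the residual sum of squares to the total sum of squares. The equivalent form SSR/SST, with SSR = SST minus SSE, is the one used in the Pandas calculation below; plugging in the sums obtained there gives the same value as the Tensorflow session.

```latex
R^2 = 1 - \frac{SSE}{SST} = \frac{SST - SSE}{SST} = \frac{SSR}{SST},
\qquad
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\quad
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2

R^2 = 1 - \frac{3231638.2}{3732746.8} \approx 0.1342 \qquad (n = 262)
```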

For help interpreting R square, see: https://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit

To use this function, both variables must have the same data type, because Tensorflow does not cast between dtypes automatically. Here both are float32:

y.dtype

dtype('float32')

kot.dtype

dtype('float32')

residual = tf.reduce_sum(tf.square(tf.subtract(y,kot)))

total = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))

r2 = tf.subtract(1.0, tf.div(residual, total))

r2

<tf.Tensor 'Sub_2:0' shape=() dtype=float32>

sess = tf.Session()
a = sess.run(r2)
print('R Square parameter: ',a)

R Square parameter:  0.13424665
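To cross-check the session result without Tensorflow, here is a NumPy sketch of the same formula (r_squared_np is a hypothetical name). It computes in float64; the small gap between the Tensorflow value 0.13424665 and the Pandas value 0.13424659 is plausibly just float32 rounding.

```python
import numpy as np

def r_squared_np(y, y_pred):
    """NumPy version of R_squared: 1 - SSE / SST (hypothetical helper name)."""
    y = np.asarray(y, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    residual = np.sum((y - y_pred) ** 2)   # SSE
    total = np.sum((y - y.mean()) ** 2)    # SST
    return 1.0 - residual / total

# Toy check: one prediction off by 1 and SST = 5, so R^2 = 1 - 1/5
print(r_squared_np([1, 2, 3, 4], [1, 2, 3, 5]))
```

Called as r_squared_np(PZU['y'], PZU['y_pred']) on the notebook's data, it should give approximately 0.1342.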

Calculation of R Square parameter using Pandas

PZU.head(5)

	y	y_pred
0	264.0	319.324890
1	651.0	437.016418
2	572.0	476.246918
3	471.0	495.862152
4	282.0	326.493286

Point 1. We calculate the squared differences between the empirical values y and the predicted values y_pred (SSE)

PZU['SSE'] = (PZU['y'] - PZU['y_pred'])**2
PZU.head(3)

	y	y_pred	SSE
0	264.0	319.324890	3060.843506
1	651.0	437.016418	45788.972656
2	572.0	476.246918	9168.652344

Point 2. We calculate the average empirical value of y

PZU['ave_y'] = PZU['y'].mean()
PZU.head(3)

	y	y_pred	SSE	ave_y
0	264.0	319.324890	3060.843506	463.973297
1	651.0	437.016418	45788.972656	463.973297
2	572.0	476.246918	9168.652344	463.973297

Point 3. We calculate the squared differences between the empirical values y and their average (SST)

PZU['SST'] = (PZU['y'] - PZU['ave_y'])**2
PZU.head(3)

	y	y_pred	SSE	ave_y	SST
0	264.0	319.324890	3060.843506	463.973297	39989.320312
1	651.0	437.016418	45788.972656	463.973297	34978.988281
2	572.0	476.246918	9168.652344	463.973297	11669.768555

Point 4. We calculate the difference between the sum of SST and the sum of SSE (SSR)

Sum_SST = PZU['SST'].sum()
print('Sum_SST :',Sum_SST)
Sum_SSE = PZU['SSE'].sum()
print('Sum_SSE :',Sum_SSE)
SSR = Sum_SST - Sum_SSE

Sum_SST : 3732746.8
Sum_SSE : 3231638.2

Point 5. We calculate the R Square parameter as SSR divided by the sum of SST

r2 = SSR/Sum_SST
print('R Square parameter: ',r2)

R Square parameter:  0.13424659
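Points 1 to 5 can be condensed into a few vectorised lines. The sketch below runs them on the five rows shown by PZU.head(5) only, so the value it prints is not the test-set R square of 0.1342; with the full PZU it should reproduce that value.

```python
import pandas as pd

# The five rows shown by PZU.head(5); the full test set has 262 rows
PZU = pd.DataFrame({'y': [264.0, 651.0, 572.0, 471.0, 282.0],
                    'y_pred': [319.324890, 437.016418, 476.246918,
                               495.862152, 326.493286]})

sse = ((PZU['y'] - PZU['y_pred']) ** 2).sum()    # Point 1, summed
sst = ((PZU['y'] - PZU['y'].mean()) ** 2).sum()  # Points 2 and 3, summed
r2 = (sst - sse) / sst                           # Points 4 and 5: SSR / SST

print(round(r2, 4))
```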

THE DATA SCIENCE LIBRARY

Wojciech Moszczyński

Tensorflow – Calculation of R square for linear regression