Tensorflow – Calculation of R square for linear regression
Wojciech Moszczyński, Tue, 03 Dec 2019

Source of data: https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham

Parking Birmingham occupancy

In [35]:
import pandas as pd

df = pd.read_csv('c:/TF/ParkingBirmingham.csv')
df.head(3)
Out[35]:
SystemCodeNumber Capacity Occupancy LastUpdated
0 BHMBCCMKT01 577 61 2016-10-04 07:59:42
1 BHMBCCMKT01 577 64 2016-10-04 08:25:42
2 BHMBCCMKT01 577 80 2016-10-04 08:59:42
In [2]:
df.LastUpdated = pd.to_datetime(df.LastUpdated)
df.dtypes
Out[2]:
SystemCodeNumber            object
Capacity                     int64
Occupancy                    int64
LastUpdated         datetime64[ns]
dtype: object
In [3]:
df['month'] = df.LastUpdated.dt.month
df['hour'] = df.LastUpdated.dt.hour
df['weekday_name'] = df.LastUpdated.dt.weekday_name
df['weekday'] = df.LastUpdated.dt.weekday
In [4]:
df.head(4)
Out[4]:
SystemCodeNumber Capacity Occupancy LastUpdated month hour weekday_name weekday
0 BHMBCCMKT01 577 61 2016-10-04 07:59:42 10 7 Tuesday 1
1 BHMBCCMKT01 577 64 2016-10-04 08:25:42 10 8 Tuesday 1
2 BHMBCCMKT01 577 80 2016-10-04 08:59:42 10 8 Tuesday 1
3 BHMBCCMKT01 577 107 2016-10-04 09:32:46 10 9 Tuesday 1
In [5]:
df = df.loc[df['SystemCodeNumber']=='BHMMBMMBX01'] 
df.shape
Out[5]:
(1312, 8)
In [6]:
import tensorflow as tf

Step 1: Convert Data

We convert the numeric variables into the format TensorFlow expects. TensorFlow provides tf.feature_column.numeric_column() for continuous variables.

In [7]:
FEATURES = ['month', 'hour', 'weekday'] 
LABEL = 'Occupancy'
In [8]:
PKS = [tf.feature_column.numeric_column(k) for k in FEATURES] 
PKS
Out[8]:
[_NumericColumn(key='month', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='hour', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='weekday', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

Step 2: Defining the estimator

Tensorflow will automatically create a directory called „ABC” in your working directory; point TensorBoard at this path (for example, tensorboard --logdir ABC) to inspect the training run. The estimator is built on the feature columns defined above.

In [9]:
estimator = tf.estimator.LinearRegressor( feature_columns=PKS, model_dir="ABC")
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'ABC', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x00000147BB11B940>, '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

To instruct Tensorflow how to feed the model, you can use tf.estimator.inputs.pandas_input_fn. It takes five main parameters:

  • x: feature data
  • y: label data
  • batch_size: batch size (default 128)
  • num_epochs: number of epochs (default 1)
  • shuffle: whether to shuffle the data (default None)

In [10]:
def get_input_fn(data_set, num_epochs=None, n_batch = 128, shuffle=True): 
    return tf.estimator.inputs.pandas_input_fn( x=pd.DataFrame({k: data_set[k].values for k in FEATURES}),
                                               y = pd.Series(data_set[LABEL].values), batch_size=n_batch, num_epochs=num_epochs, shuffle=shuffle)

Step 3: Model training

  • To feed the model, use the function created above: get_input_fn.
  • Then instruct the model to iterate 1000 times (steps=1000).
  • Note that the number of epochs (num_epochs) is not limited here.
  • It is better to leave num_epochs as None and control training through the number of steps.

To test the model, we must divide the data set into a test set and a training set.

In [11]:
df_train=df.sample(frac=0.8,random_state=200) 
df_test=df.drop(df_train.index) 
print(df_train.shape, df_test.shape)
(1050, 8) (262, 8)
In [12]:
estimator.train(input_fn=get_input_fn(df_train, num_epochs=None, n_batch = 128, shuffle=False), steps=1000)
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from ABC\model.ckpt-20000
INFO:tensorflow:Saving checkpoints for 20001 into ABC\model.ckpt.
INFO:tensorflow:loss = 1604473.0, step = 20001
INFO:tensorflow:global_step/sec: 524.813
INFO:tensorflow:loss = 1890832.8, step = 20101 (0.191 sec)
INFO:tensorflow:global_step/sec: 595.828
INFO:tensorflow:loss = 1691072.0, step = 20201 (0.183 sec)
INFO:tensorflow:global_step/sec: 581.214
INFO:tensorflow:loss = 1660972.2, step = 20301 (0.172 sec)
INFO:tensorflow:global_step/sec: 577.628
INFO:tensorflow:loss = 1830299.8, step = 20401 (0.158 sec)
INFO:tensorflow:global_step/sec: 591.553
INFO:tensorflow:loss = 1564311.5, step = 20501 (0.169 sec)
INFO:tensorflow:global_step/sec: 659.048
INFO:tensorflow:loss = 1851407.0, step = 20601 (0.167 sec)
INFO:tensorflow:global_step/sec: 565.153
INFO:tensorflow:loss = 1717692.1, step = 20701 (0.161 sec)
INFO:tensorflow:global_step/sec: 597.055
INFO:tensorflow:loss = 1668234.1, step = 20801 (0.167 sec)
INFO:tensorflow:global_step/sec: 597.223
INFO:tensorflow:loss = 1785292.5, step = 20901 (0.167 sec)
INFO:tensorflow:Saving checkpoints for 21000 into ABC\model.ckpt.
INFO:tensorflow:Loss for final step: 1761262.6.
Out[12]:
<tensorflow.python.estimator.canned.linear.LinearRegressor at 0x147bb11bd30>

Step 4. Model evaluation

To evaluate the model on the test set, use the following code:

In [13]:
ev = estimator.evaluate( input_fn=get_input_fn(df_test, num_epochs=1, n_batch = 128, shuffle=False))
INFO:tensorflow:Starting evaluation at 2019-12-03-10:35:11
INFO:tensorflow:Restoring parameters from ABC\model.ckpt-21000
INFO:tensorflow:Finished evaluation at 2019-12-03-10:35:11
INFO:tensorflow:Saving dict for global step 21000: average_loss = 12334.496, global_step = 21000, loss = 1077212.6
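The dictionary returned by estimator.evaluate() can be inspected directly. A minimal check of the reported loss, mirroring what the second tutorial below does (the values are those in the log above):

# Read the metrics from the evaluation dictionary (keys as in the log above).
loss_score = ev["loss"]
print("Loss: {0:f}".format(loss_score))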

Step 5. Calculation of R Square

Calculation of R Square parameter using Tensorflow

I make predictions on the test set.

In [14]:
y = estimator.predict(    
         input_fn=get_input_fn(df_test,                          
         num_epochs=1,                          
         n_batch = 256,                          
         shuffle=False))
In [15]:
import itertools

predictions = list(p["predictions"] for p in itertools.islice(y, len(df_test)))  # one prediction per test row (262 here)
#print("Predictions: {}".format(str(predictions)))
INFO:tensorflow:Restoring parameters from ABC\model.ckpt-21000
In [16]:
predictions
Out[16]:
[array([319.3249], dtype=float32),
 array([437.01642], dtype=float32),
 array([476.24692], dtype=float32),
 array([495.86215], dtype=float32),
 ...]

The model returned a generator of predictions, y. I now stack these results into a single array.
In [17]:
import numpy as np

conc = np.vstack(predictions)
conc
Out[17]:
array([[319.3249 ],
       [437.01642],
       [476.24692],
       [495.86215],
       [326.4933 ],
       [424.56955],
       [444.1848 ],
       ...], dtype=float32)
In [18]:
ZHP = pd.DataFrame(conc)
ZHP.rename(columns={0:'y_pred'}, inplace=True)

kot = ZHP['y_pred'].values
kot = kot.astype('float32')
kot.dtype
Out[18]:
dtype('float32')

Now I create an array of the real y values from the test set.

In [19]:
y = df_test['Occupancy'].values
y = y.astype('float32')
y.dtype
Out[19]:
dtype('float32')

Now I create a dataframe with the real and predicted y values.

In [20]:
PZU = pd.DataFrame({'y': y, 'y_pred': kot })
PZU.dtypes
Out[20]:
y         float32
y_pred    float32
dtype: object
In [21]:
def R_squared(y, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    residual = tf.reduce_sum(tf.square(tf.subtract(y, y_pred)))
    total = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))
    r2 = tf.subtract(1.0, tf.div(residual, total))
    return r2
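The helper above is not invoked directly in the cells that follow (they repeat the same computation step by step), but it could be used like this; a minimal sketch under the same TF1, session-based setup, reusing the y and kot arrays prepared above:

# Sketch only: build the R-squared tensor with the helper and evaluate it in a session.
r2_tensor = R_squared(y, kot)
with tf.Session() as sess:
    print('R Square parameter:', sess.run(r2_tensor))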

To use this function, both variables must have the same data type.

In [22]:
y.dtype
Out[22]:
dtype('float32')
In [23]:
kot.dtype
Out[23]:
dtype('float32')
In [24]:
residual = tf.reduce_sum(tf.square(tf.subtract(y,kot)))
In [25]:
total = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))
In [26]:
r2 = tf.subtract(1.0, tf.div(residual, total))
In [27]:
r2
Out[27]:
<tf.Tensor 'Sub_2:0' shape=() dtype=float32>
In [28]:
sess = tf.Session()
a = sess.run(r2)
print('R Square parameter: ',a)
R Square parameter:  0.13424665

Calculation of R Square parameter using Pandas

In [29]:
PZU.head(5)
Out[29]:
y y_pred
0 264.0 319.324890
1 651.0 437.016418
2 572.0 476.246918
3 471.0 495.862152
4 282.0 326.493286
In [30]:
PZU['SSE'] = (PZU['y'] - PZU['y_pred'])**2
PZU.head(3)
Out[30]:
y y_pred SSE
0 264.0 319.324890 3060.843506
1 651.0 437.016418 45788.972656
2 572.0 476.246918 9168.652344

Point 2. We calculate the average empirical value of y

In [31]:
PZU['ave_y'] = PZU['y'].mean()
PZU.head(3)
Out[31]:
y y_pred SSE ave_y
0 264.0 319.324890 3060.843506 463.973297
1 651.0 437.016418 45788.972656 463.973297
2 572.0 476.246918 9168.652344 463.973297

Point 3. We calculate the squared difference between the empirical values of y and their mean (SST)

In [32]:
PZU['SST'] = (PZU['y'] - PZU['ave_y'])**2
PZU.head(3)
Out[32]:
y y_pred SSE ave_y SST
0 264.0 319.324890 3060.843506 463.973297 39989.320312
1 651.0 437.016418 45788.972656 463.973297 34978.988281
2 572.0 476.246918 9168.652344 463.973297 11669.768555

Point 4. We calculate the difference between the sum of SST and the sum of SSE (this is SSR)

In [33]:
Sum_SST = PZU['SST'].sum()
print('Sum_SST :',Sum_SST)
Sum_SSE = PZU['SSE'].sum()
print('Sum_SSE :',Sum_SSE)
SSR = Sum_SST - Sum_SSE
Sum_SST : 3732746.8
Sum_SSE : 3231638.2

Point 5. We calculate the R Square parameter

In [34]:
r2 = SSR/Sum_SST
print('R Square parameter: ',r2)
R Square parameter:  0.13424659
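As a cross-check (assuming scikit-learn is available; it is not used elsewhere in this post), the same value can be obtained in a single call:

# Optional cross-check with scikit-learn's r2_score (assumes sklearn is installed).
from sklearn.metrics import r2_score
print('R Square parameter (sklearn):', r2_score(PZU['y'], PZU['y_pred']))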

Tutorial: Linear Regression – Tensorflow, calculation of R Square (#4/281120191525)
Thu, 28 Nov 2019


We continue to learn how to build multiple linear regression models. This time we will build a model using the Tensorflow library. As before, the data file AirQ_filled2.csv comes from the previous episodes of this series.

In [1]:
import tensorflow as tf
import pandas as pd

df = pd.read_csv('c:/TF/AirQ_filled2.csv', usecols=['CO(GT)','PT08.S1(CO)','C6H6(GT)','PT08.S2(NMHC)','NOx(GT)','PT08.S3(NOx)','NO2(GT)','PT08.S4(NO2)','PT08.S5(O3)','T','RH', 'AH'
        ,'Month','Weekday','Hours'])
df.head(3)

Out[1]:
CO(GT) PT08.S1(CO) C6H6(GT) PT08.S2(NMHC) NOx(GT) PT08.S3(NOx) NO2(GT) PT08.S4(NO2) PT08.S5(O3) T RH AH Month Weekday Hours
0 2.6 1360.0 11.9 1046.0 166.0 1056.0 113.0 1692.0 1268.0 13.6 48.9 0.7578 3 2 18
1 2.0 1292.0 9.4 955.0 103.0 1174.0 92.0 1559.0 972.0 13.3 47.7 0.7255 3 2 19
2 2.2 1402.0 9.0 939.0 131.0 1140.0 114.0 1555.0 1074.0 11.9 54.0 0.7502 3 2 20

Step 1: Convert Data

We convert the numeric variables into the format TensorFlow expects. TensorFlow provides tf.feature_column.numeric_column() for continuous variables.

We separate the columns into independent variables and a dependent variable.

In [2]:
df.columns
Out[2]:
Index(['CO(GT)', 'PT08.S1(CO)', 'C6H6(GT)', 'PT08.S2(NMHC)', 'NOx(GT)',
       'PT08.S3(NOx)', 'NO2(GT)', 'PT08.S4(NO2)', 'PT08.S5(O3)', 'T', 'RH',
       'AH', 'Month', 'Weekday', 'Hours'],
      dtype='object')
In [3]:
df.columns = ['CO_GT', 'PT08.S1_CO', 'C6H6_GT', 'PT08.S2_NMHC',
       'NOx_GT', 'PT08.S3_NOx', 'NO2_GT', 'PT08.S4_NO2', 'PT08.S5_O3',
       'T', 'RH', 'AH', 'Month', 'Weekday', 'Hours']
In [4]:
df.dtypes
Out[4]:
CO_GT           float64
PT08.S1_CO      float64
C6H6_GT         float64
PT08.S2_NMHC    float64
NOx_GT          float64
PT08.S3_NOx     float64
NO2_GT          float64
PT08.S4_NO2     float64
PT08.S5_O3      float64
T               float64
RH              float64
AH              float64
Month             int64
Weekday           int64
Hours             int64
dtype: object
In [5]:
FEATURES = ['PT08.S1_CO', 'C6H6_GT', 'PT08.S2_NMHC',
       'NOx_GT', 'PT08.S3_NOx', 'NO2_GT', 'PT08.S4_NO2', 'PT08.S5_O3',
       'T', 'RH', 'AH', 'Month', 'Weekday', 'Hours']
LABEL = 'CO_GT'
In [6]:
PKS = [tf.feature_column.numeric_column(k) for k in FEATURES]
PKS
Out[6]:
[_NumericColumn(key='PT08.S1_CO', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='C6H6_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S2_NMHC', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='NOx_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S3_NOx', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='NO2_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S4_NO2', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S5_O3', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='T', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='RH', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='AH', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Month', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Weekday', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Hours', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

Step 2: Defining the estimator

Tensorflow will automatically create a directory called „Air” in your working directory; point TensorBoard at this path (for example, tensorboard --logdir Air) to inspect the training run. The estimator is built on the feature columns defined above.

In [7]:
estimator = tf.estimator.LinearRegressor(    
        feature_columns=PKS,   
        model_dir="Air")
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'Air', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x0000017E850F7CC0>, '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

To instruct Tensorflow how to feed the model, you can use tf.estimator.inputs.pandas_input_fn. It takes five main parameters:

- x: feature data
- y: label data
- batch_size: batch size (default 128)
- num_epochs: number of epochs (default 1)
- shuffle: whether to shuffle the data (default None)

In [8]:
def get_input_fn(data_set, num_epochs=None, n_batch = 128, shuffle=True):    
         return tf.estimator.inputs.pandas_input_fn(       
         x=pd.DataFrame({k: data_set[k].values for k in FEATURES}),       
         y = pd.Series(data_set[LABEL].values),       
         batch_size=n_batch,          
         num_epochs=num_epochs,       
         shuffle=shuffle)

Step 3: Model training

- To feed the model, use the function created above: get_input_fn.
- Then instruct the model to iterate 1000 times (steps=1000).
- Note that the number of epochs (num_epochs) is not limited here.
- It is better to leave num_epochs as None and control training through the number of steps.

To test the model, we must divide the data set into a test set and a training set.

In [9]:
df_train=df.sample(frac=0.8,random_state=200)
df_test=df.drop(df_train.index)
print(df_train.shape, df_test.shape)
(7486, 15) (1871, 15)
In [10]:
estimator.train(input_fn=get_input_fn(df_train,                                       
                                           num_epochs=None,                                      
                                           n_batch = 128,                                      
                                           shuffle=False),                                      
                                           steps=1000)
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from Air\model.ckpt-10000
INFO:tensorflow:Saving checkpoints for 10001 into Air\model.ckpt.
INFO:tensorflow:loss = 27.90989, step = 10001
INFO:tensorflow:global_step/sec: 231.067
INFO:tensorflow:loss = 19.266008, step = 10101 (0.443 sec)
INFO:tensorflow:global_step/sec: 250.047
INFO:tensorflow:loss = 21.174185, step = 10201 (0.389 sec)
INFO:tensorflow:global_step/sec: 244.378
INFO:tensorflow:loss = 26.823406, step = 10301 (0.409 sec)
INFO:tensorflow:global_step/sec: 263.037
INFO:tensorflow:loss = 16.690845, step = 10401 (0.380 sec)
INFO:tensorflow:global_step/sec: 250.698
INFO:tensorflow:loss = 24.08421, step = 10501 (0.399 sec)
INFO:tensorflow:global_step/sec: 254.447
INFO:tensorflow:loss = 16.630123, step = 10601 (0.406 sec)
INFO:tensorflow:global_step/sec: 248.812
INFO:tensorflow:loss = 25.998842, step = 10701 (0.389 sec)
INFO:tensorflow:global_step/sec: 269.371
INFO:tensorflow:loss = 31.432064, step = 10801 (0.387 sec)
INFO:tensorflow:global_step/sec: 255.634
INFO:tensorflow:loss = 22.70269, step = 10901 (0.391 sec)
INFO:tensorflow:Saving checkpoints for 11000 into Air\model.ckpt.
INFO:tensorflow:Loss for final step: 24.21025.
Out[10]:
<tensorflow.python.estimator.canned.linear.LinearRegressor at 0x17e850f7828>

Step 4. Model evaluation

To evaluate the model on the test set, use the following code:

In [11]:
ev = estimator.evaluate(    
          input_fn=get_input_fn(df_test,                          
          num_epochs=1,                          
          n_batch = 356,                          
          shuffle=False))
INFO:tensorflow:Starting evaluation at 2019-11-28-13:40:17
INFO:tensorflow:Restoring parameters from Air\model.ckpt-11000
INFO:tensorflow:Finished evaluation at 2019-11-28-13:40:17
INFO:tensorflow:Saving dict for global step 11000: average_loss = 0.18934268, global_step = 11000, loss = 59.04336

Print the loss using the code below:

In [12]:
loss_score = ev["loss"]
print("Loss: {0:f}".format(loss_score))	
Loss: 59.043362
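The average_loss reported above is, by default, the per-example mean squared error, so its square root gives an error on the scale of the label CO_GT; a quick sketch (importing numpy here so the snippet is self-contained):

# Rough RMSE estimate from the evaluation metrics (average_loss = MSE per example).
import numpy as np
print('RMSE:', np.sqrt(ev["average_loss"]))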

Calculation of R Square parameter using Tensorflow

I make predictions on the test set.

In [13]:
y = estimator.predict(    
         input_fn=get_input_fn(df_test,                          
         num_epochs=1,                          
         n_batch = 256,                          
         shuffle=False))
In [14]:
import itertools

predictions = list(p["predictions"] for p in itertools.islice(y, 1871))
#print("Predictions: {}".format(str(predictions)))
INFO:tensorflow:Restoring parameters from Air\model.ckpt-11000
In [15]:
predictions
Out[15]:
[array([2.2904341], dtype=float32),
 array([1.4195127], dtype=float32),
 array([0.9917113], dtype=float32),
 array([1.4134599], dtype=float32),
 array([1.2086823], dtype=float32),
 array([1.4521222], dtype=float32),
 ...]

The model returned a generator of predictions, y. I now stack these results into a single array.

In [16]:
import numpy as np

conc = np.vstack(predictions)
conc
Out[16]:
array([[2.2904341],
       [1.4195127],
       [0.9917113],
       ...,
       [1.2040666],
       [0.4435346],
       [3.111309 ]], dtype=float32)
In [48]:
ZHP = pd.DataFrame(conc)
ZHP.rename(columns={0:'y_pred'}, inplace=True)

kot = ZHP['y_pred'].values
kot = kot.astype('float32')
kot.dtype
Out[48]:
dtype('float32')

Now I create an array of the real y values from the test set.

In [50]:
y = df_test['CO_GT'].values
y = y.astype('float32')
y.dtype
Out[50]:
dtype('float32')

Now I create a dataframe with the real and predicted y values.

In [47]:
PZU = pd.DataFrame({'y': y, 'y_pred': kot })
PZU.dtypes
Out[47]:
y         float64
y_pred    float64
dtype: object
In [63]:
def R_squared(y, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    residual = tf.reduce_sum(tf.square(tf.subtract(y, y_pred)))
    total = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))
    r2 = tf.subtract(1.0, tf.div(residual, total))
    return r2

To use this function, both variables must have the same data type.

In [51]:
y.dtype
Out[51]:
dtype('float32')
In [52]:
kot.dtype
Out[52]:
dtype('float32')
In [65]:
residual = tf.reduce_sum(tf.square(tf.subtract(y,kot)))
In [66]:
total = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))
In [67]:
r2 = tf.subtract(1.0, tf.div(residual, total))
In [68]:
r2
Out[68]:
<tf.Tensor 'Sub_27:0' shape=() dtype=float32>
In [77]:
sess = tf.Session()
a = sess.run(r2)
print('R Square parameter: ',a)
R Square parameter:  0.90320766

Calculation of R Square parameter using Pandas

In [78]:
PZU.head(5)
Out[78]:
y y_pred
0 2.2 2.290434
1 1.2 1.419513
2 1.0 0.991711
3 1.5 1.413460
4 1.6 1.471673
In [80]:
PZU['SSE'] = (PZU['y'] - PZU['y_pred'])**2
PZU.head(3)
Out[80]:
y y_pred SSE
0 2.2 2.290434 0.008178
1 1.2 1.419513 0.048186
2 1.0 0.991711 0.000069

Point 2. We calculate the average empirical value of y

In [81]:
PZU['ave_y'] = PZU['y'].mean()
PZU.head(3)
Out[81]:
y y_pred SSE ave_y
0 2.2 2.290434 0.008178 2.061304
1 1.2 1.419513 0.048186 2.061304
2 1.0 0.991711 0.000069 2.061304

Point 3. We calculate the squared difference between the empirical values of y and their mean (SST)

In [83]:
PZU['SST'] = (PZU['y'] - PZU['ave_y'])**2
PZU.head(3)
Out[83]:
y y_pred SSE ave_y SST
0 2.2 2.290434 0.008178 2.061304 0.019237
1 1.2 1.419513 0.048186 2.061304 0.741845
2 1.0 0.991711 0.000069 2.061304 1.126366

Point 4. We calculate the difference between the sum of SST and the sum of SSE (this is SSR)

In [84]:
Sum_SST = PZU['SST'].sum()
print('Sum_SST :',Sum_SST)
Sum_SSE = PZU['SSE'].sum()
print('Sum_SSE :',Sum_SSE)
SSR = Sum_SST - Sum_SSE
Sum_SST : 3659.9984179583107
Sum_SSE : 354.26016629427124

Point 5. We calculate the R Square parameter

In [85]:
r2 = SSR/Sum_SST
print('R Square parameter: ',r2)
R Square parameter:  0.903207562998923
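The step-by-step arithmetic above can be collapsed into a single expression using the columns already in the frame; a minimal sketch of the equivalent one-liner:

# Equivalent one-liner: R^2 = 1 - SSE/SST = SSR/SST.
print('R Square parameter:', 1 - PZU['SSE'].sum() / PZU['SST'].sum())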
