Exercises on mathematical operations in TensorFlow 1.4
https://sigmaquality.pl/tensorflow-3/exercises-on-mathematical-operations-in-tensorflow-1-4/


EN121220190807

Practice makes perfect

In [1]:
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt

Exercise 1: Evaluate the following expression on tensors.

$x = \frac{12}{4}$

In [2]:
R = tf.constant(12, tf.int16, name="twelve") 
print(R)
Tensor("twelve:0", shape=(), dtype=int16)
In [3]:
F = tf.constant(4, tf.int16, name="twelve") 
print(F)
Tensor("twelve_1:0", shape=(), dtype=int16)
In [4]:
KOT = tf.div (R, F)
print(KOT)
Tensor("div:0", shape=(), dtype=int16)
In [5]:
with tf.Session() as sess:    
    result_2 = KOT.eval()
print(result_2) 
3
In [6]:
sess = tf.Session()
print(sess.run(KOT))
3
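
Two evaluation styles appear above: a with-block combined with Tensor.eval(), and an explicit sess.run(). A third option in TensorFlow 1.x, convenient for interactive work, is tf.InteractiveSession, which installs itself as the default session; a minimal sketch:

sess = tf.InteractiveSession()
print(KOT.eval())  # 3, no with-block needed
sess.close()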

Exercise 2: Evaluate the following expression on tensors

$x = 8 \cdot 2$

In [7]:
F = tf.constant(8) 
R = tf.constant(2) 
FFT = tf.multiply (F, R)
In [8]:
sess = tf.Session()
print(sess.run(FFT))
16
In [9]:
with tf.Session() as sess:    
    result_2 = FFT.eval()
print(result_2) 
16

Exercise 3: Mathematical operations on tensors

$x = 5^2$

In [10]:
A = tf.constant(5) 
B = tf.constant(2) 
PKO = tf.pow(A, B)
In [11]:
sess = tf.Session()
print(sess.run(PKO))
25

Exercise 4: Mathematical operations on tensors

You do not need to specify the data type, because TensorFlow infers it from the value passed to the constant.

$x = 15.9 - 2.1$

In [12]:
A = tf.constant(15.9) 
B = tf.constant(2.1) 
SKO = tf.subtract(A, B)
In [13]:
sess = tf.Session()
print(sess.run(SKO))
13.799999

Exercise 5: Evaluate the following expression on tensors

TensorFlow requires all operands of an operation to share the same data type.
We create the mathematical formula in Python:

$x = (1.7 - 2.4) + 15$

In [14]:
A = tf.constant(1.7, tf.float32) 
B = tf.constant(2.4, tf.float32)
C = tf.constant(15, tf.float32)

SZK = tf.add(tf.subtract(A, B), C)
In [15]:
sess = tf.Session()
print(sess.run(SZK))
14.3

Exercise 6: Evaluate the following expression on tensors

TensorFlow requires all operands of an operation to share the same data type.
We create the mathematical formula in Python:

$x = (2.07 - 1.3)^{2.7}$

In [16]:
A = tf.constant(2.07) 
B = tf.constant(1.3)
C = tf.constant(2.7)

SSF = tf.subtract(A, B)
PKO = tf.pow(SSF,C)
In [17]:
with tf.Session() as sess:    
    result_2 = PKO.eval()
print(result_2) 
0.49377024

Exercise 7: Mathematical operations on tensors

We create the mathematical formula in Python:

$x = \sqrt{2.1 \pi}$

We can use the math library resources:
https://docs.python.org/3/library/math.html

In [18]:
import math
a = math.pi
a
Out[18]:
3.141592653589793
In [19]:
C = tf.constant(2.1) 
D = tf.constant(math.pi)

GG = tf.multiply(C, D)
PKK = tf.sqrt(GG)
In [20]:
with tf.Session() as sess:    
    result_2 = PKK.eval()
print("The x value sought is: ",result_2) 
The x value sought is:  2.5685296

Exercise 8: Operations on TensorFlow tensors

Please calculate the value of x in TensorFlow:

$x = \sqrt[3]{15}$

Let's first check the value in plain Python:

In [21]:
math.pow(15,(1/3))
Out[21]:
2.46621207433047

In TensorFlow, be careful about the data types: the constants below are created as float32 so that the division yields a fraction (see the sketch after this exercise).

In [22]:
C = tf.constant(15, tf.float32) 
A = tf.constant(3, tf.float32)
H = tf.constant(1, tf.float32)
G = tf.div(H,A)

PK = tf.pow(C,G)
In [23]:
with tf.Session() as sess:    
    result_2 = G.eval()
print(result_2) 
0.33333334
In [24]:
with tf.Session() as sess:    
    result_2 = PK.eval()
print(result_2) 
2.466212
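
The float32 constants matter here. With integer tensors, tf.div performs integer division, so 1/3 would evaluate to 0 and the whole expression would collapse to 15**0 = 1. A quick sketch of the pitfall (illustrative names):

H_int = tf.constant(1)
A_int = tf.constant(3)
with tf.Session() as sess:
    print(sess.run(tf.div(H_int, A_int)))  # 0 - integer division discards the remainder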

Exercise 9: Operations on TensorFlow tensors

Please calculate the value of x in TensorFlow:

$x = e^{1/4}$

We can use the math library resources: https://docs.python.org/3/library/math.html

In [25]:
math.e
Out[25]:
2.718281828459045
In [26]:
math.pow(math.e,(1/4))
Out[26]:
1.2840254166877414
In [27]:
C = tf.constant(math.e, tf.float32) 
A = tf.constant(4, tf.float32)
H = tf.constant(1, tf.float32)
G = tf.div(H,A)

ZHP = tf.pow(C,G)
In [28]:
with tf.Session() as sess:    
    result_2 = ZHP.eval()
print(result_2) 
1.2840254

Exercise 10. Changing the tensor type from float to int

TensorFlow automatically selects the data type when it is not specified at tensor creation; tf.cast changes the type explicitly.

In [29]:
PKP = tf.constant(3.123456789, tf.float32)
ZNP = tf.cast(PKP, dtype=tf.int32)

print(PKP.dtype)
print(ZNP.dtype)
<dtype: 'float32'>
<dtype: 'int32'>
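
Note that casting float to int does not round; it truncates toward zero. A minimal sketch (values chosen for illustration):

with tf.Session() as sess:
    print(sess.run(tf.cast(tf.constant(2.9), tf.int32)))   # 2, the fraction is discarded
    print(sess.run(tf.cast(tf.constant(-2.9), tf.int32)))  # -2, truncation is toward zero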

Exercises on matrix operations in TensorFlow 1.4
https://sigmaquality.pl/tensorflow-3/exercises-on-matrix-operations-in-tensorflow-1-4/

EN111220192126

Practice makes perfect

In [1]:
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt

Exercise 1. Please define a tensor: constant = vector [1, 2, 3] in format int16

image.png

$ begin{bmatrix}
1 \
2 \
3
end{bmatrix}$

In [2]:
vector1 = tf.constant([1, 2, 3], tf.int16)
print(vector1)
Tensor("Const:0", shape=(3,), dtype=int16)

Exercise 2. Please define a tensor: constant = matrix [1, 2, 3, 4] in format int16

$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$

In [3]:
matrix1 = tf.constant ([[1, 2],[3, 4]], tf.int16)
print (matrix1)
Tensor("Const_1:0", shape=(2, 2), dtype=int16)

Exercise 3. Please define a tensor: constant = matrix [1, 2, 3, 4, 5, 6] in format int16

$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$

In [4]:
matrix2 = tf.constant([[1, 2], [3, 4], [5, 6]], tf.int16)
print (matrix2)
Tensor("Const_2:0", shape=(3, 2), dtype=int16)
In [5]:
matrix2.shape
Out[5]:
TensorShape([Dimension(3), Dimension(2)])

Exercise 4. Create a tensor in the form of a vector of a specific shape 5, filled with zeros

$\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$

In [6]:
kot = tf.zeros(5)
print(kot)
kot.shape
Tensor("zeros:0", shape=(5,), dtype=float32)
Out[6]:
TensorShape([Dimension(5)])

Exercise 5. Create a tensor in the form of a matrix with a specific 4×4 shape, filled with only ones

$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$

In [7]:
fok = tf.ones ([4, 4])
print(fok)
fok.shape
Tensor("ones:0", shape=(4, 4), dtype=float32)
Out[7]:
TensorShape([Dimension(4), Dimension(4)])

Exercise 6. Matrix addition

$\begin{bmatrix} 2 & 5 \\ 4 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$

To add two matrices, their dimensions must be equal. That is, the number of rows and the number of columns of the first and second matrices must be equal.

In [8]:
matrix3 = tf.constant([[1, 3], [2, 4]], tf.int16)
matrix4 = tf.constant([[1, 2], [2, 1]], tf.int16)
In [9]:
PZU = tf.add(matrix3, matrix4)
In [10]:
sess = tf.Session()
print(sess.run(PZU))
[[2 5]
 [4 5]]
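
When the shapes do not match, TensorFlow rejects the addition already at graph-construction time; a minimal sketch with hypothetical shapes:

a = tf.ones([2, 2])
b = tf.ones([3, 2])
try:
    bad = tf.add(a, b)
except ValueError as e:
    print(e)  # shapes (2, 2) and (3, 2) are incompatible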

Exercise 7. Matrix subtraction

$\begin{bmatrix} 0 & 1 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} - \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$

To subtract two matrices their dimensions must be equal. That is, the number of rows and the number of columns of the first and second matrices must be equal.

In [11]:
PZU = tf.subtract(matrix3, matrix4)
In [12]:
sess = tf.Session()
print(sess.run(PZU))
[[0 1]
 [0 3]]

Exercise 8. Multiply the matrix by a number

$\begin{bmatrix} 5 & 10 \\ 15 & 20 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \cdot 5$

In [13]:
matrix1 = tf.constant ([[1, 2],[3, 4]], tf.int16)
In [14]:
FOK = tf.multiply (matrix1, 5)
In [15]:
sess = tf.Session()
print(sess.run(FOK))
[[ 5 10]
 [15 20]]

Exercise 9. Multiply a matrix by a number in TensorFlow

$\begin{bmatrix} 10 & 35 & 20 \\ 25 & 10 & 5 \\ 30 & 15 & 20 \end{bmatrix} = \begin{bmatrix} 2 & 7 & 4 \\ 5 & 2 & 1 \\ 6 & 3 & 4 \end{bmatrix} \cdot 5$

In [16]:
matrix5 = tf.constant ([[2,7,4],[5,2,1],[6,3,4]], tf.int16)
In [17]:
ZHP = tf.multiply (matrix5, 5)
In [18]:
sess = tf.Session()
print(sess.run(ZHP))
[[10 35 20]
 [25 10  5]
 [30 15 20]]

Exercise 10. Multiply a matrix by a matrix in TensorFlow

Matrix multiplication is not commutative.

$\begin{bmatrix} 33 & 40 \\ 86 & 66 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 6 \\ 8 & 2 & 4 \end{bmatrix} \cdot \begin{bmatrix} 9 & 6 \\ 3 & 7 \\ 2 & 1 \end{bmatrix}$

In [19]:
matrix6 = tf.constant([[1,4,6],[8,2,4]],tf.int32)
matrix7 = tf.constant([[9,6],[3,7],[2,1]],tf.int32)

KPU = tf.matmul(matrix6,matrix7)
In [20]:
sess = tf.Session()
print(sess.run(KPU))
[[33 40]
 [86 66]]

Exercise 11. Multiply a matrix by a matrix in TensorFlow

Matrix multiplication is not commutative: here we compute the product in the reverse order.

In [21]:
KPU = tf.matmul(matrix7,matrix6)
In [22]:
sess = tf.Session()
print(sess.run(KPU))
[[57 48 78]
 [59 26 46]
 [10 10 16]]
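
Even for square matrices, where both products are defined and have the same shape, A·B and B·A generally differ; a small sketch with illustrative 2×2 matrices:

P = tf.constant([[1, 2], [3, 4]])
Q = tf.constant([[0, 1], [1, 0]])
with tf.Session() as sess:
    print(sess.run(tf.matmul(P, Q)))  # [[2 1] [4 3]]
    print(sess.run(tf.matmul(Q, P)))  # [[3 4] [1 2]]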

Exercise 12. Multiplication of C⋅D square matrices

In [23]:
matrix8 = tf.constant([[5,6,1],[8,7,9],[1,5,2]],tf.int32)
matrix9 = tf.constant([[4,6,7],[2,5,1],[0,3,9]],tf.int32)
In [24]:
PKS = tf.matmul(matrix8,matrix9)
In [25]:
sess = tf.Session()
print(sess.run(PKS))
[[ 32  63  50]
 [ 46 110 144]
 [ 14  37  30]]

Exercise 13. Transpose the matrix

$A = \begin{bmatrix} 1 & 3 & 9 \\ 7 & 2 & 5 \end{bmatrix}$

$A^T = \begin{bmatrix} 1 & 7 \\ 3 & 2 \\ 9 & 5 \end{bmatrix}$

In [26]:
x = tf.constant([[1, 3, 9], [7, 2, 5]])
GAP = tf.transpose(x)
In [27]:
sess = tf.Session()
print(sess.run(GAP))
[[1 7]
 [3 2]
 [9 5]]

Exercise 14. Determinant of the matrix

$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$

$|A| = (1 \cdot 4) - (3 \cdot 2) = -2$

In [28]:
matrix1 = tf.constant ([[1, 2],[3, 4]], tf.float32)
In [29]:
PKO = tf.matrix_determinant(matrix1)
In [30]:
sess = tf.Session()
print(sess.run(PKO))
-2.0

Exercise 15. Diagonal matrix

A diagonal matrix is a matrix, usually square, in which all entries outside the main diagonal are zero. In other words, it is simultaneously an upper- and lower-triangular matrix.

In [31]:
PPS = tf.diag([1.2, 1.5, 1.0, 7.1, 2, 8.3])
In [32]:
sess = tf.Session()
print(sess.run(PPS))
[[1.2 0.  0.  0.  0.  0. ]
 [0.  1.5 0.  0.  0.  0. ]
 [0.  0.  1.  0.  0.  0. ]
 [0.  0.  0.  7.1 0.  0. ]
 [0.  0.  0.  0.  2.  0. ]
 [0.  0.  0.  0.  0.  8.3]]
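
For completeness, TensorFlow 1.x also provides the inverse operation, tf.diag_part, which extracts the main diagonal from a matrix; a short sketch using the tensor above:

with tf.Session() as sess:
    print(sess.run(tf.diag_part(PPS)))  # [1.2 1.5 1.  7.1 2.  8.3]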

Exercise 16. Outputs random values from a truncated normal distribution

https://docs.w3cub.com/tensorflow~python/tf/truncated_normal/

The generated values follow a normal distribution with specified mean and standard deviation, except that values whose magnitude is more than 2 standard deviations from the mean are dropped and re-picked.

In [33]:
ABC = tf.truncated_normal([2, 3])
In [34]:
sess = tf.Session()
print(sess.run(ABC))
[[ 0.05371307  1.4564506  -1.7267214 ]
 [-1.7192566   1.5986782   0.91717476]]
In [35]:
print(ABC.shape)
(2, 3)

Task: generate a 3 by 5 matrix of random values with mean 7 and standard deviation 2, in float32 format.

In [39]:
KSU = tf.truncated_normal([3,5],mean=7,stddev=2.0,dtype=tf.float32)
sess = tf.Session()
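
The cell above builds the op but never evaluates it; a sketch of running it (the output is random, so no fixed values are shown):

print(sess.run(KSU))  # 3x5 matrix; values drawn around mean 7, clipped to +/- 2 standard deviations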

Exercise 17. Creates a tensor filled with a scalar value.

This operation creates a tensor of shape dims and fills it with value.
https://docs.w3cub.com/tensorflow~python/tf/fill/

In [40]:
print(sess.run(tf.fill([4,2],4)))
[[4 4]
 [4 4]
 [4 4]
 [4 4]]

A 4×2 matrix filled with the value 4.

Exercise 18. Outputs random values from a uniform distribution.

The generated values follow a uniform distribution in the range [minval, maxval). The lower bound minval is included in the range, while the upper bound maxval is excluded.

https://docs.w3cub.com/tensorflow~python/tf/random_uniform/

In [41]:
GAD = tf.random_uniform([3,7], minval=4, maxval=8, dtype=tf.float32)
In [42]:
sess = tf.Session()
print(sess.run(GAD))
[[5.8656416 4.652042  4.3458786 5.7581544 6.0479655 6.6700726 6.035503 ]
 [5.729369  7.1762366 6.052859  6.8669724 6.278867  5.959975  7.3483486]
 [7.441143  7.3443627 6.4590683 5.6526484 5.656465  4.5448027 5.9519987]]

Exercise 19. Converting a NumPy matrix into a TensorFlow tensor

We have an existing NumPy matrix and want to use it in a TensorFlow model.

In [43]:
import numpy as np
KAT = np.array([[1., 2., 3.],[-3., -7., -1.],[0., 5., -2.]])
KAT
Out[43]:
array([[ 1.,  2.,  3.],
       [-3., -7., -1.],
       [ 0.,  5., -2.]])
In [44]:
DOK = tf.convert_to_tensor(KAT)
In [45]:
sess = tf.Session()
print(sess.run(DOK))
[[ 1.  2.  3.]
 [-3. -7. -1.]
 [ 0.  5. -2.]]

Tensorflow – Calculation of R square for linear regression
https://sigmaquality.pl/tensorflow-3/tensorflow-calculation-of-r-square-for-linear-regression/

Parking Birmingham occupancy

Source of data: https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham

In [35]:
import pandas as pd

df = pd.read_csv('c:/TF/ParkingBirmingham.csv')
df.head(3)
Out[35]:
SystemCodeNumber Capacity Occupancy LastUpdated
0 BHMBCCMKT01 577 61 2016-10-04 07:59:42
1 BHMBCCMKT01 577 64 2016-10-04 08:25:42
2 BHMBCCMKT01 577 80 2016-10-04 08:59:42
In [2]:
df.LastUpdated = pd.to_datetime(df.LastUpdated)
df.dtypes
Out[2]:
SystemCodeNumber            object
Capacity                     int64
Occupancy                    int64
LastUpdated         datetime64[ns]
dtype: object
In [3]:
df['month'] = df.LastUpdated.dt.month
df['hour'] = df.LastUpdated.dt.hour
df['weekday_name'] = df.LastUpdated.dt.weekday_name
df['weekday'] = df.LastUpdated.dt.weekday
In [4]:
df.head(4)
Out[4]:
SystemCodeNumber Capacity Occupancy LastUpdated month hour weekday_name weekday
0 BHMBCCMKT01 577 61 2016-10-04 07:59:42 10 7 Tuesday 1
1 BHMBCCMKT01 577 64 2016-10-04 08:25:42 10 8 Tuesday 1
2 BHMBCCMKT01 577 80 2016-10-04 08:59:42 10 8 Tuesday 1
3 BHMBCCMKT01 577 107 2016-10-04 09:32:46 10 9 Tuesday 1
In [5]:
df = df.loc[df['SystemCodeNumber']=='BHMMBMMBX01'] 
df.shape
Out[5]:
(1312, 8)
In [6]:
import tensorflow as tf
C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\tensorflow\python\framework\dtypes.py:493: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\tensorflow\python\framework\dtypes.py:494: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\tensorflow\python\framework\dtypes.py:495: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\tensorflow\python\framework\dtypes.py:496: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\tensorflow\python\framework\dtypes.py:497: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\tensorflow\python\framework\dtypes.py:502: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])

Step 1: Convert Data

We convert numeric variables in the correct Tensorflow format. Tensorflow provides a continuous variable conversion method: tf.feature_column.numeric_column ().

In [7]:
FEATURES = ['month', 'hour', 'weekday'] 
LABEL = 'Occupancy'
In [8]:
PKS = [tf.feature_column.numeric_column(k) for k in FEATURES] 
PKS
Out[8]:
[_NumericColumn(key='month', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='hour', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='weekday', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

Step 2: Defining the estimator

TensorFlow will automatically create a directory called "ABC" in your working directory; you use this path to access TensorBoard. The feature columns passed to the estimator describe the independent variables.

In [9]:
estimator = tf.estimator.LinearRegressor( feature_columns=PKS, model_dir="ABC")
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'ABC', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x00000147BB11B940>, '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

To instruct TensorFlow how to feed the model, you can use pandas_input_fn. This object needs 5 parameters: x: the feature data; y: the label data; batch_size: the batch size (default 128); num_epochs: the number of epochs (default 1); shuffle: whether to shuffle the data (default None).

In [10]:
def get_input_fn(data_set, num_epochs=None, n_batch = 128, shuffle=True): 
    return tf.estimator.inputs.pandas_input_fn( x=pd.DataFrame({k: data_set[k].values for k in FEATURES}),
                                               y = pd.Series(data_set[LABEL].values), batch_size=n_batch, num_epochs=num_epochs, shuffle=shuffle)

Step 3: Model training

  • To feed the model you can use the function created above: get_input_fn.
  • Then instruct the model to iterate 1000 times.
  • Note that you do not specify the number of epochs (num_epochs); it is better to leave it as None and define the number of steps instead.

To test the model, we must divide the data set into a test set and a training set.

In [11]:
df_train=df.sample(frac=0.8,random_state=200) 
df_test=df.drop(df_train.index) 
print(df_train.shape, df_test.shape)
(1050, 8) (262, 8)
In [12]:
estimator.train(input_fn=get_input_fn(df_train, num_epochs=None, n_batch = 128, shuffle=False), steps=1000)
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from ABC\model.ckpt-20000
INFO:tensorflow:Saving checkpoints for 20001 into ABC\model.ckpt.
INFO:tensorflow:loss = 1604473.0, step = 20001
INFO:tensorflow:global_step/sec: 524.813
INFO:tensorflow:loss = 1890832.8, step = 20101 (0.191 sec)
INFO:tensorflow:global_step/sec: 595.828
INFO:tensorflow:loss = 1691072.0, step = 20201 (0.183 sec)
INFO:tensorflow:global_step/sec: 581.214
INFO:tensorflow:loss = 1660972.2, step = 20301 (0.172 sec)
INFO:tensorflow:global_step/sec: 577.628
INFO:tensorflow:loss = 1830299.8, step = 20401 (0.158 sec)
INFO:tensorflow:global_step/sec: 591.553
INFO:tensorflow:loss = 1564311.5, step = 20501 (0.169 sec)
INFO:tensorflow:global_step/sec: 659.048
INFO:tensorflow:loss = 1851407.0, step = 20601 (0.167 sec)
INFO:tensorflow:global_step/sec: 565.153
INFO:tensorflow:loss = 1717692.1, step = 20701 (0.161 sec)
INFO:tensorflow:global_step/sec: 597.055
INFO:tensorflow:loss = 1668234.1, step = 20801 (0.167 sec)
INFO:tensorflow:global_step/sec: 597.223
INFO:tensorflow:loss = 1785292.5, step = 20901 (0.167 sec)
INFO:tensorflow:Saving checkpoints for 21000 into ABC\model.ckpt.
INFO:tensorflow:Loss for final step: 1761262.6.
Out[12]:
<tensorflow.python.estimator.canned.linear.LinearRegressor at 0x147bb11bd30>

Step 4. Model evaluation

To evaluate on the test set, use the following code:

In [13]:
ev = estimator.evaluate( input_fn=get_input_fn(df_test, num_epochs=1, n_batch = 128, shuffle=False))
INFO:tensorflow:Starting evaluation at 2019-12-03-10:35:11
INFO:tensorflow:Restoring parameters from ABC\model.ckpt-21000
INFO:tensorflow:Finished evaluation at 2019-12-03-10:35:11
INFO:tensorflow:Saving dict for global step 21000: average_loss = 12334.496, global_step = 21000, loss = 1077212.6

Step 5. Calculation of R Square

Calculation of R Square parameter using Tensorflow
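
The quantity computed below is the coefficient of determination:

$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$

where $\hat{y}_i$ are the model predictions and $\bar{y}$ is the mean of the observed values; the residual and total terms in the code below correspond to the numerator and the denominator.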

I make a prediction on a test set

In [14]:
y = estimator.predict(    
         input_fn=get_input_fn(df_test,                          
         num_epochs=1,                          
         n_batch = 256,                          
         shuffle=False))
In [15]:
import itertools

predictions = list(p["predictions"] for p in itertools.islice(y, 1871))
#print("Predictions: {}".format(str(predictions)))
INFO:tensorflow:Restoring parameters from ABC\model.ckpt-21000
In [16]:
predictions
Out[16]:
[array([319.3249], dtype=float32),
 array([437.01642], dtype=float32),
 array([476.24692], dtype=float32),
 array([495.86215], dtype=float32),

The model returned an iterator of prediction arrays. I now convert this sequence into a single array.
In [17]:
import numpy as np

conc = np.vstack(predictions)
conc
Out[17]:
array([[319.3249 ],
       [437.01642],
       [476.24692],
       [495.86215],
       [326.4933 ],
       [424.56955],
       [444.1848 ],
      
In [18]:
ZHP = pd.DataFrame(conc)
ZHP.rename(columns={0:'y_pred'}, inplace=True)

kot = ZHP['y_pred'].values
kot = kot.astype('float32')
kot.dtype
Out[18]:
dtype('float32')

Now I’m creating a list of real y values from the test set.

In [19]:
y = df_test['Occupancy'].values
y = y.astype('float32')
y.dtype
Out[19]:
dtype('float32')

Now I create a dataframe with y-real and y-predicted variables.

In [20]:
PZU = pd.DataFrame({'y': y, 'y_pred': kot })
PZU.dtypes
Out[20]:
y         float32
y_pred    float32
dtype: object
In [21]:
def R_squared(y, y_pred):
    residual = tf.reduce_sum(tf.square(tf.subtract(y, y_pred)))
    total = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))
    r2 = tf.subtract(1.0, tf.div(residual, total))
    return r2
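
The cells below inline the same three steps, but the helper can also be called directly; a minimal sketch, assuming y and kot are the float32 arrays built above:

with tf.Session() as sess:
    print('R Square parameter: ', sess.run(R_squared(y, kot)))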

To use this function, both variables must have the same data type.

In [22]:
y.dtype
Out[22]:
dtype('float32')
In [23]:
kot.dtype
Out[23]:
dtype('float32')
In [24]:
residual = tf.reduce_sum(tf.square(tf.subtract(y,kot)))
In [25]:
total = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))
In [26]:
r2 = tf.subtract(1.0, tf.div(residual, total))
In [27]:
r2
Out[27]:
<tf.Tensor 'Sub_2:0' shape=() dtype=float32>
In [28]:
sess = tf.Session()
a = sess.run(r2)
print('R Square parameter: ',a)
R Square parameter:  0.13424665

Calculation of R Square parameter using Pandas

In [29]:
PZU.head(5)
Out[29]:
y y_pred
0 264.0 319.324890
1 651.0 437.016418
2 572.0 476.246918
3 471.0 495.862152
4 282.0 326.493286
In [30]:
PZU['SSE'] = (PZU['y'] - PZU['y_pred'])**2
PZU.head(3)
Out[30]:
y y_pred SSE
0 264.0 319.324890 3060.843506
1 651.0 437.016418 45788.972656
2 572.0 476.246918 9168.652344

Point 2. We calculate the average empirical value of y

In [31]:
PZU['ave_y'] = PZU['y'].mean()
PZU.head(3)
Out[31]:
y y_pred SSE ave_y
0 264.0 319.324890 3060.843506 463.973297
1 651.0 437.016418 45788.972656 463.973297
2 572.0 476.246918 9168.652344 463.973297

Point 3. We calculate the squared difference between the empirical values of y and their mean

In [32]:
PZU['SST'] = (PZU['y'] - PZU['ave_y'])**2
PZU.head(3)
Out[32]:
y y_pred SSE ave_y SST
0 264.0 319.324890 3060.843506 463.973297 39989.320312
1 651.0 437.016418 45788.972656 463.973297 34978.988281
2 572.0 476.246918 9168.652344 463.973297 11669.768555

Point 4. We calculate SSR as the difference between the sum of SST and the sum of SSE

In [33]:
Sum_SST = PZU['SST'].sum()
print('Sum_SST :',Sum_SST)
Sum_SSE = PZU['SSE'].sum()
print('Sum_SSE :',Sum_SSE)
SSR = Sum_SST - Sum_SSE
Sum_SST : 3732746.8
Sum_SSE : 3231638.2

Point 5. We calculate the R Square parameter

In [34]:
r2 = SSR/Sum_SST
print('R Square parameter: ',r2)
R Square parameter:  0.13424659

Tutorial: Linear Regression – Tensorflow, calculation of R Square (#4/281120191525)
https://sigmaquality.pl/tensorflow-3/linear-regression-4/

We continue to learn how to build multiple linear regression models. This time we will build a model using the Tensorflow library. As before, the data file: AirQ_filled2.csv comes from previous episodes of this cycle.

In [1]:
import tensorflow as tf
import pandas as pd

df = pd.read_csv('c:/TF/AirQ_filled2.csv', usecols=['CO(GT)','PT08.S1(CO)','C6H6(GT)','PT08.S2(NMHC)','NOx(GT)','PT08.S3(NOx)','NO2(GT)','PT08.S4(NO2)','PT08.S5(O3)','T','RH', 'AH'
        ,'Month','Weekday','Hours'])
df.head(3)

Out[1]:
CO(GT) PT08.S1(CO) C6H6(GT) PT08.S2(NMHC) NOx(GT) PT08.S3(NOx) NO2(GT) PT08.S4(NO2) PT08.S5(O3) T RH AH Month Weekday Hours
0 2.6 1360.0 11.9 1046.0 166.0 1056.0 113.0 1692.0 1268.0 13.6 48.9 0.7578 3 2 18
1 2.0 1292.0 9.4 955.0 103.0 1174.0 92.0 1559.0 972.0 13.3 47.7 0.7255 3 2 19
2 2.2 1402.0 9.0 939.0 131.0 1140.0 114.0 1555.0 1074.0 11.9 54.0 0.7502 3 2 20

Step 1: Convert Data

We convert numeric variables in the correct Tensorflow format. Tensorflow provides a continuous variable conversion method: tf.feature_column.numeric_column ().

Separation of a column into an independent variable and a dependent variable.

In [2]:
df.columns
Out[2]:
Index(['CO(GT)', 'PT08.S1(CO)', 'C6H6(GT)', 'PT08.S2(NMHC)', 'NOx(GT)',
       'PT08.S3(NOx)', 'NO2(GT)', 'PT08.S4(NO2)', 'PT08.S5(O3)', 'T', 'RH',
       'AH', 'Month', 'Weekday', 'Hours'],
      dtype='object')
In [3]:
df.columns = ['CO_GT', 'PT08.S1_CO', 'C6H6_GT', 'PT08.S2_NMHC',
       'NOx_GT', 'PT08.S3_NOx', 'NO2_GT', 'PT08.S4_NO2', 'PT08.S5_O3',
       'T', 'RH', 'AH', 'Month', 'Weekday', 'Hours']
In [4]:
df.dtypes
Out[4]:
CO_GT           float64
PT08.S1_CO      float64
C6H6_GT         float64
PT08.S2_NMHC    float64
NOx_GT          float64
PT08.S3_NOx     float64
NO2_GT          float64
PT08.S4_NO2     float64
PT08.S5_O3      float64
T               float64
RH              float64
AH              float64
Month             int64
Weekday           int64
Hours             int64
dtype: object
In [5]:
FEATURES = ['PT08.S1_CO', 'C6H6_GT', 'PT08.S2_NMHC',
       'NOx_GT', 'PT08.S3_NOx', 'NO2_GT', 'PT08.S4_NO2', 'PT08.S5_O3',
       'T', 'RH', 'AH', 'Month', 'Weekday', 'Hours']
LABEL = 'CO_GT'
In [6]:
PKS = [tf.feature_column.numeric_column(k) for k in FEATURES]
PKS
Out[6]:
[_NumericColumn(key='PT08.S1_CO', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='C6H6_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S2_NMHC', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='NOx_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S3_NOx', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='NO2_GT', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S4_NO2', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PT08.S5_O3', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='T', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='RH', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='AH', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Month', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Weekday', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Hours', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

Step 2: Defining the estimator

TensorFlow will automatically create a directory called "Air" in your working directory; you use this path to access TensorBoard. The feature columns passed to the estimator describe the independent variables.

In [7]:
estimator = tf.estimator.LinearRegressor(    
        feature_columns=PKS,   
        model_dir="Air")
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'Air', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x0000017E850F7CC0>, '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

To instruct TensorFlow how to feed the model, you can use pandas_input_fn. This object needs 5 parameters: x: the feature data; y: the label data; batch_size: the batch size (default 128); num_epochs: the number of epochs (default 1); shuffle: whether to shuffle the data (default None).

In [8]:
def get_input_fn(data_set, num_epochs=None, n_batch = 128, shuffle=True):    
         return tf.estimator.inputs.pandas_input_fn(       
         x=pd.DataFrame({k: data_set[k].values for k in FEATURES}),       
         y = pd.Series(data_set[LABEL].values),       
         batch_size=n_batch,          
         num_epochs=num_epochs,       
         shuffle=shuffle)

Step 3: Model training

- To feed the model you can use the function created above: get_input_fn.
- Then instruct the model to iterate 1000 times.
- Note that you do not specify the number of epochs (num_epochs); it is better to leave it as None and define the number of steps instead.

To test the model, we must divide the data set into a test set and a training set.

In [9]:
df_train=df.sample(frac=0.8,random_state=200)
df_test=df.drop(df_train.index)
print(df_train.shape, df_test.shape)
(7486, 15) (1871, 15)
In [10]:
estimator.train(input_fn=get_input_fn(df_train,                                       
                                           num_epochs=None,                                      
                                           n_batch = 128,                                      
                                           shuffle=False),                                      
                                           steps=1000)
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from Air\model.ckpt-10000
INFO:tensorflow:Saving checkpoints for 10001 into Air\model.ckpt.
INFO:tensorflow:loss = 27.90989, step = 10001
INFO:tensorflow:global_step/sec: 231.067
INFO:tensorflow:loss = 19.266008, step = 10101 (0.443 sec)
INFO:tensorflow:global_step/sec: 250.047
INFO:tensorflow:loss = 21.174185, step = 10201 (0.389 sec)
INFO:tensorflow:global_step/sec: 244.378
INFO:tensorflow:loss = 26.823406, step = 10301 (0.409 sec)
INFO:tensorflow:global_step/sec: 263.037
INFO:tensorflow:loss = 16.690845, step = 10401 (0.380 sec)
INFO:tensorflow:global_step/sec: 250.698
INFO:tensorflow:loss = 24.08421, step = 10501 (0.399 sec)
INFO:tensorflow:global_step/sec: 254.447
INFO:tensorflow:loss = 16.630123, step = 10601 (0.406 sec)
INFO:tensorflow:global_step/sec: 248.812
INFO:tensorflow:loss = 25.998842, step = 10701 (0.389 sec)
INFO:tensorflow:global_step/sec: 269.371
INFO:tensorflow:loss = 31.432064, step = 10801 (0.387 sec)
INFO:tensorflow:global_step/sec: 255.634
INFO:tensorflow:loss = 22.70269, step = 10901 (0.391 sec)
INFO:tensorflow:Saving checkpoints for 11000 into Air\model.ckpt.
INFO:tensorflow:Loss for final step: 24.21025.
Out[10]:
<tensorflow.python.estimator.canned.linear.LinearRegressor at 0x17e850f7828>

Step 4. Model evaluation

To enter a test set, use the following code:

In [11]:
ev = estimator.evaluate(    
          input_fn=get_input_fn(df_test,                          
          num_epochs=1,                          
          n_batch = 356,                          
          shuffle=False))
INFO:tensorflow:Starting evaluation at 2019-11-28-13:40:17
INFO:tensorflow:Restoring parameters from Air\model.ckpt-11000
INFO:tensorflow:Finished evaluation at 2019-11-28-13:40:17
INFO:tensorflow:Saving dict for global step 11000: average_loss = 0.18934268, global_step = 11000, loss = 59.04336

Print the loss using the code below:

In [12]:
loss_score = ev["loss"]
print("Loss: {0:f}".format(loss_score))	
Loss: 59.043362

Calculation of R Square parameter using Tensorflow

I make a prediction on a test set

In [13]:
y = estimator.predict(    
         input_fn=get_input_fn(df_test,                          
         num_epochs=1,                          
         n_batch = 256,                          
         shuffle=False))
In [14]:
import itertools

predictions = list(p["predictions"] for p in itertools.islice(y, 1871))
#print("Predictions: {}".format(str(predictions)))
INFO:tensorflow:Restoring parameters from Air\model.ckpt-11000
In [15]:
predictions
Out[15]:
[array([2.2904341], dtype=float32),
 array([1.4195127], dtype=float32),
 array([0.9917113], dtype=float32),
 array([1.4134599], dtype=float32),
 array([1.2086823], dtype=float32),
 array([1.4521222], dtype=float32),
 ...]

The model returned an iterator of prediction arrays. I now convert this sequence into a single array.

In [16]:
import numpy as np

conc = np.vstack(predictions)
conc
Out[16]:
array([[2.2904341],
       [1.4195127],
       [0.9917113],
       ...,
       [1.2040666],
       [0.4435346],
       [3.111309 ]], dtype=float32)
In [48]:
ZHP = pd.DataFrame(conc)
ZHP.rename(columns={0:'y_pred'}, inplace=True)

kot = ZHP['y_pred'].values
kot = kot.astype('float32')
kot.dtype
Out[48]:
dtype('float32')

Now I’m creating a list of real y values from the test set.

In [50]:
y = df_test['CO_GT'].values
y = y.astype('float32')
y.dtype
Out[50]:
dtype('float32')

Now I create a dataframe with y-real and y-predicted variables.

In [47]:
PZU = pd.DataFrame({'y': y, 'y_pred': kot })
PZU.dtypes
Out[47]:
y         float64
y_pred    float64
dtype: object
In [63]:
def R_squared(y, y_pred):
    residual = tf.reduce_sum(tf.square(tf.subtract(y, y_pred)))
    total = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))
    r2 = tf.subtract(1.0, tf.div(residual, total))
    return r2

To use this function, both variables must have the same data type.

In [51]:
y.dtype
Out[51]:
dtype('float32')
In [52]:
kot.dtype
Out[52]:
dtype('float32')
In [65]:
residual = tf.reduce_sum(tf.square(tf.subtract(y,kot)))
In [66]:
total = tf.reduce_sum(tf.square(tf.subtract(y, tf.reduce_mean(y))))
In [67]:
r2 = tf.subtract(1.0, tf.div(residual, total))
In [68]:
r2
Out[68]:
<tf.Tensor 'Sub_27:0' shape=() dtype=float32>
In [77]:
sess = tf.Session()
a = sess.run(r2)
print('R Square parameter: ',a)
R Square parameter:  0.90320766

Calculation of R Square parameter using Pandas

In [78]:
PZU.head(5)
Out[78]:
y y_pred
0 2.2 2.290434
1 1.2 1.419513
2 1.0 0.991711
3 1.5 1.413460
4 1.6 1.471673
In [80]:
PZU['SSE'] = (PZU['y'] - PZU['y_pred'])**2
PZU.head(3)
Out[80]:
y y_pred SSE
0 2.2 2.290434 0.008178
1 1.2 1.419513 0.048186
2 1.0 0.991711 0.000069

Point 2. We calculate the average empirical value of y

In [81]:
PZU['ave_y'] = PZU['y'].mean()
PZU.head(3)
Out[81]:
y y_pred SSE ave_y
0 2.2 2.290434 0.008178 2.061304
1 1.2 1.419513 0.048186 2.061304
2 1.0 0.991711 0.000069 2.061304

Point 3. We calculate the squared difference between the empirical values of y and their mean

In [83]:
PZU['SST'] = (PZU['y'] - PZU['ave_y'])**2
PZU.head(3)
Out[83]:
y y_pred SSE ave_y SST
0 2.2 2.290434 0.008178 2.061304 0.019237
1 1.2 1.419513 0.048186 2.061304 0.741845
2 1.0 0.991711 0.000069 2.061304 1.126366

Point 4. We calculate SSR as the difference between the sum of SST and the sum of SSE

In [84]:
Sum_SST = PZU['SST'].sum()
print('Sum_SST :',Sum_SST)
Sum_SSE = PZU['SSE'].sum()
print('Sum_SSE :',Sum_SSE)
SSR = Sum_SST - Sum_SSE
Sum_SST : 3659.9984179583107
Sum_SSE : 354.26016629427124

Point 5. We calculate the R Square parameter

In [85]:
r2 = SSR/Sum_SST
print('R Square parameter: ',r2)
R Square parameter:  0.903207562998923

Tutorial: Linear Regression – preliminary data preparation (#1/271120191024)
https://sigmaquality.pl/tensorflow-3/tutorial_-linear-regression-in-tensorflow-part_1/

Part 1. Preliminary data preparation

AirQualityUCI. Source of data: https://archive.ics.uci.edu/ml/datasets/Air+Quality
In [1]:
import pandas as pd
df = pd.read_csv('c:/TS/AirQualityUCI.csv', sep=';')
df.head(3)
Out[1]:
  Date Time CO(GT) PT08.S1(CO) NMHC(GT) C6H6(GT) PT08.S2(NMHC) NOx(GT) PT08.S3(NOx) NO2(GT) PT08.S4(NO2) PT08.S5(O3) T RH AH Unnamed: 15 Unnamed: 16
0 10/03/2004 18.00.00 2,6 1360.0 150.0 11,9 1046.0 166.0 1056.0 113.0 1692.0 1268.0 13,6 48,9 0,7578 NaN NaN
1 10/03/2004 19.00.00 2 1292.0 112.0 9,4 955.0 103.0 1174.0 92.0 1559.0 972.0 13,3 47,7 0,7255 NaN NaN
2 10/03/2004 20.00.00 2,2 1402.0 88.0 9,0 939.0 131.0 1140.0 114.0 1555.0 1074.0 11,9 54,0 0,7502 NaN NaN
 

Data Set Information:


The dataset contains 9358 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The device was located on the field in a significantly polluted area, at road level, within an Italian city. Data were recorded from March 2004 to February 2005 (one year), representing the longest freely available recordings of on-field deployed air quality chemical sensor device responses. Ground truth hourly averaged concentrations for CO, Non-Methanic Hydrocarbons, Benzene, Total Nitrogen Oxides (NOx) and Nitrogen Dioxide (NO2) were provided by a co-located reference certified analyzer. Evidence of cross-sensitivities as well as both concept and sensor drifts is present, as described in De Vito et al., Sens. and Act. B, Vol. 129, 2, 2008 (citation required), eventually affecting the sensors' concentration estimation capabilities. Missing values are tagged with the value -200.
This dataset can be used exclusively for research purposes. Commercial purposes are fully excluded.

Supplementing data for further analysis

 

Attribute Information:

Date (DD/MM/YYYY)
Time (HH.MM.SS)
True hourly averaged concentration CO in mg/m^3 (reference analyzer)
PT08.S1 (tin oxide) hourly averaged sensor response (nominally CO targeted)
True hourly averaged overall Non Metanic HydroCarbons concentration in microg/m^3 (reference analyzer)
True hourly averaged Benzene concentration in microg/m^3 (reference analyzer)
PT08.S2 (titania) hourly averaged sensor response (nominally NMHC targeted)
True hourly averaged NOx concentration in ppb (reference analyzer)
PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally NOx targeted)
True hourly averaged NO2 concentration in microg/m^3 (reference analyzer)
PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally NO2 targeted)
PT08.S5 (indium oxide) hourly averaged sensor response (nominally O3 targeted)
Temperature in °C
Relative Humidity (%)
AH Absolute Humidity



 

Step 1. Data completeness check

In [2]:
df.isnull().sum()
Out[2]:
Date              114
Time              114
CO(GT)            114
PT08.S1(CO)       114
NMHC(GT)          114
C6H6(GT)          114
PT08.S2(NMHC)     114
NOx(GT)           114
PT08.S3(NOx)      114
NO2(GT)           114
PT08.S4(NO2)      114
PT08.S5(O3)       114
T                 114
RH                114
AH                114
Unnamed: 15      9471
Unnamed: 16      9471
dtype: int64
 

There are a lot of missing values. In addition, we learned that the value -200 means no data. We’ll deal with this in a moment. We will now check the statistics of variables in the database.

In [3]:
df.agg(['min', 'max', 'mean', 'median'])
C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\numpy\lib\nanfunctions.py:1112: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis, out=out, keepdims=keepdims)
Out[3]:
  PT08.S1(CO) NMHC(GT) PT08.S2(NMHC) NOx(GT) PT08.S3(NOx) NO2(GT) PT08.S4(NO2) PT08.S5(O3) Unnamed: 15 Unnamed: 16
min -200.000000 -200.000000 -200.000000 -200.000000 -200.000000 -200.000000 -200.000000 -200.000000 NaN NaN
max 2040.000000 1189.000000 2214.000000 1479.000000 2683.000000 340.000000 2775.000000 2523.000000 NaN NaN
mean 1048.990061 -159.090093 894.595276 168.616971 794.990168 58.148873 1391.479641 975.072032 NaN NaN
median 1053.000000 -200.000000 895.000000 141.000000 794.000000 96.000000 1446.000000 942.000000 NaN NaN
In [4]:
df.shape
Out[4]:
(9471, 17)
 

We delete two empty columns.

In [5]:
del df['Unnamed: 15']
del df['Unnamed: 16']
 

Step 2: Preliminary analysis of data gaps

One more look at how many NaN cells there are.

In [6]:
df.isnull().sum()
Out[6]:
Date             114
Time             114
CO(GT)           114
PT08.S1(CO)      114
NMHC(GT)         114
C6H6(GT)         114
PT08.S2(NMHC)    114
NOx(GT)          114
PT08.S3(NOx)     114
NO2(GT)          114
PT08.S4(NO2)     114
PT08.S5(O3)      114
T                114
RH               114
AH               114
dtype: int64
 

Now let's take a look at these empty rows.

In [7]:
df[df['NMHC(GT)'].isnull()]
Out[7]:
  Date Time CO(GT) PT08.S1(CO) NMHC(GT) C6H6(GT) PT08.S2(NMHC) NOx(GT) PT08.S3(NOx) NO2(GT) PT08.S4(NO2) PT08.S5(O3) T RH AH
9357 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9358 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9359 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9360 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9361 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9466 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9467 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9468 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9469 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9470 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

114 rows × 15 columns

 

These are completely empty rows of the time series. The device was probably cut off from the power supply, and no sensor was working.

In [8]:
df = df.dropna(how='all')
df.isnull().sum()
Out[8]:
Date             0
Time             0
CO(GT)           0
PT08.S1(CO)      0
NMHC(GT)         0
C6H6(GT)         0
PT08.S2(NMHC)    0
NOx(GT)          0
PT08.S3(NOx)     0
NO2(GT)          0
PT08.S4(NO2)     0
PT08.S5(O3)      0
T                0
RH               0
AH               0
dtype: int64
 

We are looking for cells with the value -200, because this means there is no data. The -200 values are stored in different formats, so the replacement has to be done in several ways.

In [9]:
import numpy as np

df = df.replace(-200,np.NaN)
df = df.replace('-200',np.NaN)
df = df.replace('-200.0',np.NaN)
df = df.replace('-200,0',np.NaN)
 

The value of -200 has been changed to NaN and we will see how many empty records there are now.

In [10]:
df.isnull().sum()
Out[10]:
Date                0
Time                0
CO(GT)           1683
PT08.S1(CO)       366
NMHC(GT)         8443
C6H6(GT)          366
PT08.S2(NMHC)     366
NOx(GT)          1639
PT08.S3(NOx)      366
NO2(GT)          1642
PT08.S4(NO2)      366
PT08.S5(O3)       366
T                 366
RH                366
AH                366
dtype: int64
 

Chart of missing data structure.

In [11]:
import seaborn as sns
import matplotlib.pyplot as plt

sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')
plt.show()
 

The NMHC(GT) variable is the most incomplete, so we eliminate it from the analysis.

In [12]:
del df['NMHC(GT)']
 

We display the records with missing data using the isna() function.

In [13]:
df1 = df[df.isna().any(axis=1)]
df1
Out[13]:
  Date Time CO(GT) PT08.S1(CO) C6H6(GT) PT08.S2(NMHC) NOx(GT) PT08.S3(NOx) NO2(GT) PT08.S4(NO2) PT08.S5(O3) T RH AH
9 11/03/2004 03.00.00 0,6 1010.0 1,7 561.0 NaN 1705.0 NaN 1235.0 501.0 10,3 60,2 0,7517
10 11/03/2004 04.00.00 NaN 1011.0 1,3 527.0 21.0 1818.0 34.0 1197.0 445.0 10,1 60,5 0,7465
33 12/03/2004 03.00.00 0,8 889.0 1,9 574.0 NaN 1680.0 NaN 1187.0 512.0 7,0 62,3 0,6261
34 12/03/2004 04.00.00 NaN 831.0 1,1 506.0 21.0 1893.0 32.0 1134.0 384.0 6,1 65,9 0,6248
39 12/03/2004 09.00.00 NaN 1545.0 22,1 1353.0 NaN 767.0 NaN 2058.0 1588.0 9,2 56,2 0,6561
9058 23/03/2005 04.00.00 NaN 993.0 2,3 604.0 85.0 848.0 65.0 1160.0 762.0 14,5 66,4 1,0919
9130 26/03/2005 04.00.00 NaN 1122.0 6,0 811.0 181.0 641.0 92.0 1336.0 1122.0 16,2 71,2 1,3013
9202 29/03/2005 04.00.00 NaN 883.0 1,3 530.0 63.0 997.0 46.0 1102.0 617.0 13,7 68,2 1,0611
9274 01/04/2005 04.00.00 NaN 818.0 0,8 473.0 47.0 1257.0 41.0 898.0 323.0 13,7 48,8 0,7606
9346 04/04/2005 04.00.00 NaN 864.0 0,8 478.0 52.0 1116.0 43.0 958.0 489.0 11,8 56,0 0,7743

2416 rows × 14 columns

 

Step 3: Check the level of direct correlation to complete the data

CO(GT) is missing a value every few measurements, so we have to check what this variable correlates with. First I check the data types so that the correlation can be computed.

In [14]:
df.dtypes
Out[14]:
Date              object
Time              object
CO(GT)            object
PT08.S1(CO)      float64
C6H6(GT)          object
PT08.S2(NMHC)    float64
NOx(GT)          float64
PT08.S3(NOx)     float64
NO2(GT)          float64
PT08.S4(NO2)     float64
PT08.S5(O3)      float64
T                 object
RH                object
AH                object
dtype: object
In [15]:
# df['CO(GT)'].astype(float)
 

ValueError: could not convert string to float: '2,6'

It turns out that it is not so easy to convert text to number format – the problem is in commas. We replace commas with dots.

In [16]:
df['CO(GT)'] = df['CO(GT)'].str.replace(',','.')
df['C6H6(GT)'] = df['C6H6(GT)'].str.replace(',','.')
df['T'] = df['T'].str.replace(',','.')
df['RH'] = df['RH'].str.replace(',','.')
df['AH'] = df['AH'].str.replace(',','.')
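
As a side note, the string replacement could be avoided entirely: pandas can parse decimal commas at load time via the decimal parameter. A sketch, assuming the same source file:

df_alt = pd.read_csv('c:/TS/AirQualityUCI.csv', sep=';', decimal=',')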
 

We change the format from object to float

In [17]:
df[['CO(GT)','C6H6(GT)', 'T','RH','AH']] = df[['CO(GT)','C6H6(GT)', 'T','RH','AH']].astype(float)
In [18]:
df.dtypes
Out[18]:
Date              object
Time              object
CO(GT)           float64
PT08.S1(CO)      float64
C6H6(GT)         float64
PT08.S2(NMHC)    float64
NOx(GT)          float64
PT08.S3(NOx)     float64
NO2(GT)          float64
PT08.S4(NO2)     float64
PT08.S5(O3)      float64
T                float64
RH               float64
AH               float64
dtype: object
 

We can now check the level of direct correlation.

In [19]:
df.corr()
Out[19]:
  CO(GT) PT08.S1(CO) C6H6(GT) PT08.S2(NMHC) NOx(GT) PT08.S3(NOx) NO2(GT) PT08.S4(NO2) PT08.S5(O3) T RH AH
CO(GT) 1.000000 0.879288 0.931078 0.915514 0.795028 -0.703446 0.683343 0.630703 0.854182 0.022109 0.048890 0.048556
PT08.S1(CO) 0.879288 1.000000 0.883795 0.892964 0.713654 -0.771938 0.641529 0.682881 0.899324 0.048627 0.114606 0.135324
C6H6(GT) 0.931078 0.883795 1.000000 0.981950 0.718839 -0.735744 0.614474 0.765731 0.865689 0.198956 -0.061681 0.167972
PT08.S2(NMHC) 0.915514 0.892964 0.981950 1.000000 0.704435 -0.796703 0.646245 0.777254 0.880578 0.241373 -0.090380 0.186933
NOx(GT) 0.795028 0.713654 0.718839 0.704435 1.000000 -0.655707 0.763111 0.233731 0.787046 -0.269683 0.221032 -0.149323
PT08.S3(NOx) -0.703446 -0.771938 -0.735744 -0.796703 -0.655707 1.000000 -0.652083 -0.538468 -0.796569 -0.145112 -0.056740 -0.232017
NO2(GT) 0.683343 0.641529 0.614474 0.646245 0.763111 -0.652083 1.000000 0.157360 0.708128 -0.186533 -0.091759 -0.335022
PT08.S4(NO2) 0.630703 0.682881 0.765731 0.777254 0.233731 -0.538468 0.157360 1.000000 0.591144 0.561270 -0.032188 0.629641
PT08.S5(O3) 0.854182 0.899324 0.865689 0.880578 0.787046 -0.796569 0.708128 0.591144 1.000000 -0.027172 0.124956 0.070751
T 0.022109 0.048627 0.198956 0.241373 -0.269683 -0.145112 -0.186533 0.561270 -0.027172 1.000000 -0.578621 0.656397
RH 0.048890 0.114606 -0.061681 -0.090380 0.221032 -0.056740 -0.091759 -0.032188 0.124956 -0.578621 1.000000 0.167971
AH 0.048556 0.135324 0.167972 0.186933 -0.149323 -0.232017 -0.335022 0.629641 0.070751 0.656397 0.167971 1.000000
In [20]:
sns.set(style="ticks")

corr = df.corr()

mask = np.zeros_like(corr, dtype=np.bool)
mask[np.triu_indices_from(mask)] = True

f, ax = plt.subplots(figsize=(22, 10))
cmap = sns.diverging_palette(180, 50, as_cmap=True)

sns.heatmap(corr, mask=mask, cmap=cmap, vmax=1.3, center=0.1,annot=True,
            square=True, linewidths=.9, cbar_kws={"shrink": 0.8})
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x18ad8b43390>
 

Step 4. Filling the gaps in variables based on other variables correlated with them

Filling gaps in the CO (GT) variable.

I check which variable it is strongly correlated with and fill the gaps based on that variable; where that fails, I fill in the previous or next value.

In [21]:
df.corr()
Out[21]:
  CO(GT) PT08.S1(CO) C6H6(GT) PT08.S2(NMHC) NOx(GT) PT08.S3(NOx) NO2(GT) PT08.S4(NO2) PT08.S5(O3) T RH AH
CO(GT) 1.000000 0.879288 0.931078 0.915514 0.795028 -0.703446 0.683343 0.630703 0.854182 0.022109 0.048890 0.048556
PT08.S1(CO) 0.879288 1.000000 0.883795 0.892964 0.713654 -0.771938 0.641529 0.682881 0.899324 0.048627 0.114606 0.135324
C6H6(GT) 0.931078 0.883795 1.000000 0.981950 0.718839 -0.735744 0.614474 0.765731 0.865689 0.198956 -0.061681 0.167972
PT08.S2(NMHC) 0.915514 0.892964 0.981950 1.000000 0.704435 -0.796703 0.646245 0.777254 0.880578 0.241373 -0.090380 0.186933
NOx(GT) 0.795028 0.713654 0.718839 0.704435 1.000000 -0.655707 0.763111 0.233731 0.787046 -0.269683 0.221032 -0.149323
PT08.S3(NOx) -0.703446 -0.771938 -0.735744 -0.796703 -0.655707 1.000000 -0.652083 -0.538468 -0.796569 -0.145112 -0.056740 -0.232017
NO2(GT) 0.683343 0.641529 0.614474 0.646245 0.763111 -0.652083 1.000000 0.157360 0.708128 -0.186533 -0.091759 -0.335022
PT08.S4(NO2) 0.630703 0.682881 0.765731 0.777254 0.233731 -0.538468 0.157360 1.000000 0.591144 0.561270 -0.032188 0.629641
PT08.S5(O3) 0.854182 0.899324 0.865689 0.880578 0.787046 -0.796569 0.708128 0.591144 1.000000 -0.027172 0.124956 0.070751
T 0.022109 0.048627 0.198956 0.241373 -0.269683 -0.145112 -0.186533 0.561270 -0.027172 1.000000 -0.578621 0.656397
RH 0.048890 0.114606 -0.061681 -0.090380 0.221032 -0.056740 -0.091759 -0.032188 0.124956 -0.578621 1.000000 0.167971
AH 0.048556 0.135324 0.167972 0.186933 -0.149323 -0.232017 -0.335022 0.629641 0.070751 0.656397 0.167971 1.000000
In [22]:
df.dtypes
Out[22]:
Date              object
Time              object
CO(GT)           float64
PT08.S1(CO)      float64
C6H6(GT)         float64
PT08.S2(NMHC)    float64
NOx(GT)          float64
PT08.S3(NOx)     float64
NO2(GT)          float64
PT08.S4(NO2)     float64
PT08.S5(O3)      float64
T                float64
RH               float64
AH               float64
dtype: object
In [23]:
print('missing value in CO(GT): ',df['CO(GT)'].isnull().sum())
missing value in CO(GT):  1683
 

CO (GT) correlation with other variables.

In [24]:
CORREL = df.corr()
CORREL['CO(GT)'].to_frame().sort_values('CO(GT)')
Out[24]:
  CO(GT)
PT08.S3(NOx) -0.703446
T 0.022109
AH 0.048556
RH 0.048890
PT08.S4(NO2) 0.630703
NO2(GT) 0.683343
NOx(GT) 0.795028
PT08.S5(O3) 0.854182
PT08.S1(CO) 0.879288
PT08.S2(NMHC) 0.915514
C6H6(GT) 0.931078
CO(GT) 1.000000
 

The strongest correlation with CO(GT) is with C6H6(GT), which is fairly complete. Based on this variable, I fill in the gaps in CO(GT).

In [25]:
df['CO(GT)'] = df.groupby('C6H6(GT)')['CO(GT)'].apply(lambda x: x.ffill().bfill())
In [26]:
print('missing value: ',df['CO(GT)'].isnull().sum())
missing value:  383
In [27]:
df['CO(GT)'] = df.groupby('PT08.S1(CO)')['CO(GT)'].apply(lambda x: x.ffill().bfill())
In [28]:
print('missing value: ',df['CO(GT)'].isnull().sum())
missing value:  370
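
What this groupby-based fill does can be illustrated on a hypothetical toy frame: within each group of the correlated key, gaps are forward-filled and then back-filled.

import numpy as np
import pandas as pd

toy = pd.DataFrame({'key': ['a', 'a', 'b', 'b'],
                    'val': [1.0, np.nan, np.nan, 2.0]})
# within group 'a' the NaN takes the previous value; within 'b' the next one
toy['val'] = toy.groupby('key')['val'].apply(lambda x: x.ffill().bfill())
print(toy['val'].tolist())  # [1.0, 1.0, 2.0, 2.0]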
 

Now I do a simple refill with the last valid value.

In [29]:
df['CO(GT)'].fillna(method='ffill', inplace=True)   
In [30]:
print('missing value: ',df['CO(GT)'].isnull().sum())
missing value:  0
 

Filling gaps in the variable 'C6H6 (GT)’

In [31]:
print('missing value: ',df['C6H6(GT)'].isnull().sum())
missing value:  366
In [32]:
df['C6H6(GT)'] = df.groupby('CO(GT)')['C6H6(GT)'].apply(lambda x: x.ffill().bfill())
In [33]:
print('missing value: ',df['C6H6(GT)'].isnull().sum())
missing value:  0
 

Filling gaps in the variable 'NOx(GT)’

In [34]:
print('missing value: ',df['NOx(GT)'].isnull().sum())
missing value:  1639
In [35]:
CORREL['NOx(GT)'].to_frame().sort_values('NOx(GT)')
Out[35]:
  NOx(GT)
PT08.S3(NOx) -0.655707
T -0.269683
AH -0.149323
RH 0.221032
PT08.S4(NO2) 0.233731
PT08.S2(NMHC) 0.704435
PT08.S1(CO) 0.713654
C6H6(GT) 0.718839
NO2(GT) 0.763111
PT08.S5(O3) 0.787046
CO(GT) 0.795028
NOx(GT) 1.000000
In [36]:
df['NOx(GT)'] = df.groupby('CO(GT)')['NOx(GT)'].apply(lambda x: x.ffill().bfill())
In [37]:
print('missing value: ',df['NOx(GT)'].isnull().sum())
missing value:  0
 

Filling gaps in the variable 'NO2(GT)'

In [38]:
print('missing value: ',df['NO2(GT)'].isnull().sum())
missing value:  1642
In [39]:
CORREL['NO2(GT)'].to_frame().sort_values('NO2(GT)')
Out[39]:
  NO2(GT)
PT08.S3(NOx) -0.652083
AH -0.335022
T -0.186533
RH -0.091759
PT08.S4(NO2) 0.157360
C6H6(GT) 0.614474
PT08.S1(CO) 0.641529
PT08.S2(NMHC) 0.646245
CO(GT) 0.683343
PT08.S5(O3) 0.708128
NOx(GT) 0.763111
NO2(GT) 1.000000
In [40]:
df['NO2(GT)'] = df.groupby('PT08.S5(O3)')['NO2(GT)'].apply(lambda x: x.ffill().bfill())
In [41]:
df['NO2(GT)'] = df.groupby('CO(GT)')['NO2(GT)'].apply(lambda x: x.ffill().bfill())
In [42]:
print('missing value: ',df['NO2(GT)'].isnull().sum())
missing value:  0
In [43]:
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='YlGnBu')
Out[43]:
<matplotlib.axes._subplots.AxesSubplot at 0x18ad8fea080>
 

I fill in the records where the entire measuring device was not working.

In the chart these appear as solid lines.

In [44]:
df.shape
Out[44]:
(9357, 14)
In [45]:
df.fillna(method='ffill', inplace=True)
In [46]:
df.shape
Out[46]:
(9357, 14)
In [47]:
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='Reds')
Out[47]:
<matplotlib.axes._subplots.AxesSubplot at 0x18ad95756d8>
In [48]:
df.isnull().sum()
Out[48]:
Date             0
Time             0
CO(GT)           0
PT08.S1(CO)      0
C6H6(GT)         0
PT08.S2(NMHC)    0
NOx(GT)          0
PT08.S3(NOx)     0
NO2(GT)          0
PT08.S4(NO2)     0
PT08.S5(O3)      0
T                0
RH               0
AH               0
dtype: int64
 

The data set is now complete! In the next parts of this tutorial, we will build a linear regression model in TensorFlow.

Let’s save the completed file to disk

df.to_csv('c:/TF/AirQ_filled.csv')
df2 = pd.read_csv('c:/TF/AirQ_filled.csv')
df2.head(3)
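One small caveat worth noting: to_csv writes the DataFrame index as an extra column by default, which is why an 'Unnamed: 0' column appears after reloading the file in the next part. Passing index=False avoids it:

# Writing without the index avoids the extra 'Unnamed: 0' column on reload
df.to_csv('c:/TF/AirQ_filled.csv', index=False)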

Tutorial: Supplementing data for further analysis

Tutorial: Linear Regression – Time variables and shifts. Use of offset in variable correlation (#2/271120191334)
https://sigmaquality.pl/uncategorized/linear-regression-2/

Part 2. Simple multifactorial linear regression

In the previous part of this tutorial, we cleaned the data file from the measuring station. A new, completed measurement data file was created, which we will now open.

We continue preparing the data for further analysis. One of the most important explanatory variables in linear regression is time: most man-made and natural phenomena follow hourly, daily and monthly cycles.

In [1]:
import pandas as pd
df = pd.read_csv('c:/TF/AirQ_filled.csv')
df.head(3)
Out[1]:
Unnamed: 0 Date Time CO(GT) PT08.S1(CO) C6H6(GT) PT08.S2(NMHC) NOx(GT) PT08.S3(NOx) NO2(GT) PT08.S4(NO2) PT08.S5(O3) T RH AH
0 0 10/03/2004 18.00.00 2.6 1360.0 11.9 1046.0 166.0 1056.0 113.0 1692.0 1268.0 13.6 48.9 0.7578
1 1 10/03/2004 19.00.00 2.0 1292.0 9.4 955.0 103.0 1174.0 92.0 1559.0 972.0 13.3 47.7 0.7255
2 2 10/03/2004 20.00.00 2.2 1402.0 9.0 939.0 131.0 1140.0 114.0 1555.0 1074.0 11.9 54.0 0.7502

Step 1. Launching the time variable

We check what the date format is

In [2]:
df[['Date','Time']].dtypes
Out[2]:
Date    object
Time    object
dtype: object

The date is not yet in a datetime format in the DataFrame. We concatenate the columns containing the date and the time.

In [3]:
df['DATE'] = df['Date']+' '+df['Time']
df['DATE'].head()
Out[3]:
0    10/03/2004 18.00.00
1    10/03/2004 19.00.00
2    10/03/2004 20.00.00
3    10/03/2004 21.00.00
4    10/03/2004 22.00.00
Name: DATE, dtype: object

We create a new column containing the date and time. Then we convert the object format to the date format.

In [4]:
df['DATE'] = pd.to_datetime(df.DATE, format='%d/%m/%Y %H.%M.%S')   # format matches e.g. '10/03/2004 18.00.00'
df.dtypes
Out[4]:
Unnamed: 0                int64
Date                     object
Time                     object
CO(GT)                  float64
PT08.S1(CO)             float64
C6H6(GT)                float64
PT08.S2(NMHC)           float64
NOx(GT)                 float64
PT08.S3(NOx)            float64
NO2(GT)                 float64
PT08.S4(NO2)            float64
PT08.S5(O3)             float64
T                       float64
RH                      float64
AH                      float64
DATE             datetime64[ns]
dtype: object

Step 2. We add more columns based on the time variable

In industry, the day of the week is very important, so in such models it is worth adding a column with the number of the day.

In [5]:
df['Month'] = df['DATE'].dt.month
df['Weekday'] = df['DATE'].dt.weekday
df['Weekday_name'] = df['DATE'].dt.weekday_name
df['Hours'] = df['DATE'].dt.hour
In [6]:
df[['DATE','Month','Weekday','Weekday_name','Hours']].sample(3)
Out[6]:
DATE Month Weekday Weekday_name Hours
6109 2004-11-20 07:00:00 11 5 Saturday 7
3537 2004-08-05 03:00:00 8 3 Thursday 3
8053 2005-02-09 07:00:00 2 2 Wednesday 7
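A version note: Series.dt.weekday_name used above was removed in pandas 0.25. On a newer pandas, the equivalent is dt.day_name():

# pandas >= 0.23 replacement for the removed .dt.weekday_name
df['Weekday_name'] = df['DATE'].dt.day_name()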

Graphical analysis of pollution according to time variables

In [7]:
df.pivot_table(index='Weekday_name', values='CO(GT)', aggfunc='mean').plot(kind='bar')
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x12f20bbdb38>
In [8]:
df.pivot_table(index='Month', values='CO(GT)', aggfunc='mean').plot(kind='bar')
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x12f20f02320>
In [9]:
df.pivot_table(index='Hours', values='CO(GT)', aggfunc='mean').plot(kind='bar')
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x12f210e77f0>

Step 3. Correlation analysis

we set the result variable as:

CO(GT) – actual hourly average CO concentration in mg / m^3 (reference analyzer)

In [10]:
del df['Unnamed: 0']
In [11]:
CORREL = df.corr()
PKP = CORREL['CO(GT)'].to_frame().sort_values('CO(GT)')
PKP
Out[11]:
CO(GT)
PT08.S3(NOx) -0.715683
Weekday -0.140231
RH 0.020122
AH 0.025227
T 0.025639
Month 0.112291
Hours 0.344071
PT08.S4(NO2) 0.631854
NO2(GT) 0.682774
NOx(GT) 0.773677
PT08.S5(O3) 0.858762
PT08.S1(CO) 0.886114
PT08.S2(NMHC) 0.918386
C6H6(GT) 0.932584
CO(GT) 1.000000
In [12]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10,8))
PKP.plot(kind='barh', color='red')
plt.title('Correlation with the resulting variable: CO ', fontsize=20)
plt.xlabel('Correlation level')
plt.ylabel('Continuous independent variables')
Out[12]:
Text(0, 0.5, 'Continuous independent variables')
<Figure size 720x576 with 0 Axes>

Variables based on time are not well correlated with the dependent variable CO(GT).
The temptation arises to use the better-correlated independent variables in the model. The problem is that these variables may themselves be part of the result: when pollution occurs, all of these substances are in the air at once.

Our task is to examine how weather and time affect the level of pollution. We’ll cover this task in the next part of the tutorial.

Step 4. We now check the shift (lag)

for independent variables with low direct correlation.
How does the weather affect CO levels?

Variable RH – Relative humidity (%)

We check a variable with a very low direct correlation with the resulting CO(GT) variable.

In [13]:
def cross_corr(x, y, lag=0):
    # Correlation of x with y shifted by 'lag' periods
    return x.corr(y.shift(lag))

def shift_Factor(x,y,R):
    # R is the number of shifts to be checked by the function
    x_corr = [cross_corr(x, y, lag=i) for i in range(R)]

    Kot = pd.DataFrame(list(x_corr)).reset_index()
    Kot.rename(columns={0:'Corr', 'index':'Shift_num'}, inplace=True)

    # We find the shift with the strongest (absolute) correlation
    Kot['abs'] = Kot['Corr'].abs()
    SF = Kot.loc[Kot['abs']==Kot['abs'].max(), 'Shift_num']
    p1 = SF.to_frame()
    SF = p1.Shift_num.max()

    return SF
In [14]:
x = df.RH       # independent variable
y = df['CO(GT)']    # dependent variable
R = 20           # number of shifts that will be checked
In [15]:
SKO = shift_Factor(x,y,R)
print('Optimal shift for RH: ',SKO)
Optimal shift for RH:  12
In [16]:
cross_corr(x, y, lag=SKO)
Out[16]:
0.39204313671898056
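Rather than looking only at the single best lag, it can be instructive to plot the correlation for every shift; a short sketch reusing cross_corr and the x, y, R defined above:

import matplotlib.pyplot as plt

# Correlation of RH with CO(GT) for every lag from 0 to R-1
profile = [cross_corr(x, y, lag=i) for i in range(R)]
plt.plot(range(R), profile, marker='o')
plt.xlabel('lag (hours)')
plt.ylabel('correlation with CO(GT)')
plt.title('Cross-correlation profile for RH')
plt.show()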

Variable AH – Absolute humidity

We check a variable with very low correlation with the resulting CO (GT) variable

In [17]:
x = df.AH       # independent variable
SKP = shift_Factor(x,y,R)
print('Optimal shift for AH: ',SKP)
Optimal shift for AH:  12
In [18]:
cross_corr(x, y, lag=SKP)
Out[18]:
0.043756364102677595

Absolute humidity (AH) does not correlate with CO(GT) even after shifting, so we eliminate it from the model.

Variable T – Temperature in °C

We check a variable with very low correlation with the resulting CO (GT) variable.

In [19]:
x = df['T']      # independent variable
PKP = shift_Factor(x,y,R)
print('Optimal shift for T: ',PKP)
Optimal shift for T:  12
In [20]:
cross_corr(x, y, lag=PKP)
Out[20]:
-0.22446569561762522

We now create a new DataFrame with a 12-hour shift

It turns out that temperature and humidity correlate most strongly with the CO level at a 12-hour shift.
Below is the data-shift creation function.

In [21]:
def df_shif(df, target=None, lag=0):
    # Shift every column by 'lag' periods except 'target',
    # which keeps its original alignment.
    if not lag and not target:
        return df
    new = {}
    for h in df.columns:
        if h == target:
            new[h] = df[target]
        else:
            new[h] = df[h].shift(periods=lag)
    return pd.DataFrame(data=new)

Our goal is to create a multiple regression model:

- The independent variables are: temperature (T) and relative humidity (RH, in %)
- The dependent variable is the level of CO(GT)
In [22]:
df2 = df[['DATE', 'CO(GT)','RH', 'T']]

We add a column with the date and time at which the temperature and humidity were recorded.

In [23]:
df2['weather_time'] = df2['DATE']
C:\ProgramData\Anaconda3\envs\OLD_TF\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.
In [24]:
df2.head(3)
Out[24]:
DATE CO(GT) RH T weather_time
0 2004-03-10 18:00:00 2.6 48.9 13.6 2004-03-10 18:00:00
1 2004-03-10 19:00:00 2.0 47.7 13.3 2004-03-10 19:00:00
2 2004-03-10 20:00:00 2.2 54.0 11.9 2004-03-10 20:00:00
In [25]:
df3 = df_shif(df2, 'weather_time', lag=12)
df3.rename(columns={'weather_time':'Shift_weather_time'}, inplace=True) 
df3.head(13)
Out[25]:
DATE CO(GT) RH T Shift_weather_time
0 NaT NaN NaN NaN 2004-03-10 18:00:00
1 NaT NaN NaN NaN 2004-03-10 19:00:00
2 NaT NaN NaN NaN 2004-03-10 20:00:00
3 NaT NaN NaN NaN 2004-03-10 21:00:00
4 NaT NaN NaN NaN 2004-03-10 22:00:00
5 NaT NaN NaN NaN 2004-03-10 23:00:00
6 NaT NaN NaN NaN 2004-03-11 00:00:00
7 NaT NaN NaN NaN 2004-03-11 01:00:00
8 NaT NaN NaN NaN 2004-03-11 02:00:00
9 NaT NaN NaN NaN 2004-03-11 03:00:00
10 NaT NaN NaN NaN 2004-03-11 04:00:00
11 NaT NaN NaN NaN 2004-03-11 05:00:00
12 2004-03-10 18:00:00 2.6 48.9 13.6 2004-03-11 06:00:00
In [26]:
df4 = df_shif(df3, 'RH', lag=12)
df4.rename(columns={'RH':'Shift_RH'}, inplace=True) 
In [27]:
df5 = df_shif(df4, 'T', lag=12)
df5.rename(columns={'T':'Shift_T'}, inplace=True) 

We drop rows with incomplete data.

In [28]:
df5 = df5.dropna(how ='any')
In [29]:
df5.head()
Out[29]:
DATE CO(GT) Shift_RH Shift_T Shift_weather_time
36 2004-03-10 18:00:00 2.6 58.1 10.5 2004-03-11 06:00:00
37 2004-03-10 19:00:00 2.0 59.6 10.2 2004-03-11 07:00:00
38 2004-03-10 20:00:00 2.2 57.4 10.8 2004-03-11 08:00:00
39 2004-03-10 21:00:00 2.2 60.6 10.5 2004-03-11 09:00:00
40 2004-03-10 22:00:00 1.6 58.4 10.8 2004-03-11 10:00:00

The table can be read as follows: a specific temperature at 6:00 is paired with a specific carbon monoxide concentration at 18:00.

Graphical analysis of relationships – humidity and temperature versus carbon monoxide

The relationship looks rather weak.

In [30]:
import matplotlib.pyplot as plt

df5.plot(x='Shift_T', y='CO(GT)', style='o')  
plt.title('Shift_T vs CO(GT)')  
plt.xlabel('Shift_T')  
plt.ylabel('CO(GT)')  
plt.show()
In [31]:
df5.plot(x='Shift_RH', y='CO(GT)', style='o')  
plt.title('Shift_RH vs CO(GT)')  
plt.xlabel('Shift_RH')  
plt.ylabel('CO(GT)')  
plt.show()

Step 5. Building a multiple linear regression model in Sklearn

We declare the X and y variables for the model.

In [32]:
X = df5[['Shift_RH', 'Shift_T']].values
y = df5['CO(GT)'].values

I split the data into a training set and a test set.

In [33]:
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LinearRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

I am building a regression model.

In [34]:
regressor = LinearRegression()  
regressor.fit(X_train, y_train)
Out[34]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
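To read off the fitted equation, we can inspect the intercept and the coefficients; a quick sketch:

# Fitted equation: CO(GT) ~ intercept + b1*Shift_RH + b2*Shift_T
print('Intercept:   ', regressor.intercept_)
print('Coefficients:', regressor.coef_)   # order follows X: [Shift_RH, Shift_T]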
In [35]:
import numpy as np

y_pred = regressor.predict(X_test)
y_pred = np.round(y_pred, decimals=2)

Comparison of the model's predictions with the actual values.

In [36]:
dfKK = pd.DataFrame({'CO(GT) Actual': y_test, 'CO(GT)_Predicted': y_pred})
dfKK.head(5)
Out[36]:
CO(GT) Actual CO(GT)_Predicted
0 0.5 1.63
1 1.9 1.91
2 3.4 2.40
3 1.2 1.45
4 2.4 2.40
In [37]:
from sklearn import metrics

dfKK.head(50).plot()
Out[37]:
<matplotlib.axes._subplots.AxesSubplot at 0x12f2b9a5898>
In [38]:
from sklearn import metrics

print('Mean Absolute Error:    ', metrics.mean_absolute_error(y_test, y_pred))  
print('Mean Squared Error:     ', metrics.mean_squared_error(y_test, y_pred))  
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
Mean Absolute Error:     1.0011099195710456
Mean Squared Error:      1.779567238605898
Root Mean Squared Error: 1.3340042123643756
In [39]:
print('R2 score:               ', metrics.r2_score(y_test, y_pred))
R2 score:                0.15437562015505324
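R² measures the share of the variance in CO(GT) that the model explains, so a score of about 0.15 means the two shifted weather variables account for only ~15% of the variance. Computed by hand, R² = 1 - SS_res/SS_tot:

ss_res = np.sum((y_test - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)   # total sum of squares
print('R2 by hand:', 1 - ss_res / ss_tot)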

With an R² of only 0.15, carbon monoxide pollution cannot be predicted from humidity and temperature alone.
In the next part, we will continue the analysis and preparation of data for linear regression.

Linear regression with TensorFlow
https://sigmaquality.pl/tensorflow-3/tensorflow-linearregression-1/
Part one: Numpy method
In [1]:
import pandas as pd
import tensorflow as tf
import itertools

Source of data: https://archive.ics.uci.edu/ml/datasets/combined+cycle+power+plant

Combined Cycle Power Plant Data Set

Data Set Information:

The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), when the power plant was set to work with full load. Features consist of hourly average ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V) to predict the net hourly electrical energy output (EP) of the plant.
A combined cycle power plant (CCPP) is composed of gas turbines (GT), steam turbines (ST) and heat recovery steam generators. In a CCPP, the electricity is generated by gas and steam turbines, which are combined in one cycle, and is transferred from one turbine to another. While the Vacuum is collected from and has an effect on the Steam Turbine, the other three ambient variables affect the GT performance.
For comparability with our baseline studies, and to allow 5×2-fold statistical tests to be carried out, we provide the data shuffled five times. For each shuffling, 2-fold CV is carried out and the resulting 10 measurements are used for statistical testing.
We provide the data both in .ods and in .xlsx formats.

Attribute Information:

Features consist of hourly average ambient variables

  • Temperature (T) in the range 1.81°C to 37.11°C,
  • Ambient Pressure (AP) in the range 992.89-1033.30 millibar,
  • Relative Humidity (RH) in the range 25.56% to 100.16%,
  • Exhaust Vacuum (V) in the range 25.36-81.56 cm Hg,
  • Net hourly electrical energy output (EP) in the range 420.26-495.76 MW
    The averages are taken from various sensors located around the plant that record the ambient variables every second. The variables are given without normalization.
 

Step 1: prepare the data

In [ ]:
df = pd.read_csv('c:/1/Folds5x2_pp.csv')
df.sample(3)
In [ ]:
del df['Unnamed: 0']
df.columns
In [ ]:
df.columns = ['Temperature', 'Exhaust_Vacuum', 'Ambient_Pressure', 'Relative_Humidity', 'Energy_output']
df.sample(3)
 

Step 2: Convert Data

We convert the numeric variables into the correct TensorFlow format. TensorFlow provides a method for converting continuous variables: tf.feature_column.numeric_column().

We separate the columns into independent variables (FEATURES) and the dependent variable (LABEL).

In [ ]:
FEATURES = ['Temperature', 'Exhaust_Vacuum', 'Ambient_Pressure', 'Relative_Humidity']
LABEL = 'Energy_output'
In [ ]:
Ewa = [tf.feature_column.numeric_column(k) for k in FEATURES]
Ewa
 

Step 3: Defining the estimator

TensorFlow will automatically create a directory called "train2" in your working directory. You must use this path to access TensorBoard. The estimator is fed with the feature columns of the independent variables.

In [ ]:
estimator = tf.estimator.LinearRegressor(    
        feature_columns=Ewa,   
        model_dir="train2")
 

To instruct TensorFlow how to feed the model, you can use pandas_input_fn. This object needs 5 parameters:

- x: feature data
- y: label data
- batch_size: the batch size, 128 by default
- num_epochs: the number of epochs, 1 by default
- shuffle: whether or not to shuffle the data, None by default

In [ ]:
def get_input_fn(data_set, num_epochs=None, n_batch = 128, shuffle=True):    
         return tf.estimator.inputs.pandas_input_fn(       
         x=pd.DataFrame({k: data_set[k].values for k in FEATURES}),       
         y = pd.Series(data_set[LABEL].values),       
         batch_size=n_batch,          
         num_epochs=num_epochs,       
         shuffle=shuffle)
 

Step 4: Model training

- To feed the model you can use the function created above: get_input_fn.
- Then you instruct the model to iterate 1000 times.
- Remember that you do not specify the number of epochs (num_epochs).
- It is better to set the number of epochs to none and define the number of iterations.

To test the model, we must divide the data set into a test set and a training set.

In [ ]:
df_train=df.sample(frac=0.8,random_state=200)
df_test=df.drop(df_train.index)
print(df_train.shape, df_test.shape)
In [ ]:
estimator.train(input_fn=get_input_fn(df_train,                                       
                                           num_epochs=None,                                      
                                           n_batch = 356,                                      
                                           shuffle=False),                                      
                                           steps=1000)
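Since num_epochs=None, the input function cycles through the data indefinitely and training stops only after the 1000 requested steps. With a batch of 356 and roughly 7,654 training rows (80% of 9,568), that is about 356 × 1000 / 7654 ≈ 46 passes over the training set.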
 

We launch TensorBoard from the command console (CMD):

tensorboard --logdir=.\train2

TensorBoard is then available at this URL: http://localhost:6006

It could also be located at the following location.

 

image.png

 

Step 5. Model assessment

To enter a test set, use the following code:

In [ ]:
ev = estimator.evaluate(    
          input_fn=get_input_fn(df_test,                          
          num_epochs=1,                          
          n_batch = 356,                          
          shuffle=False))
 

Print the loss using the code below:

In [ ]:
average_loss = ev["average_loss"]
print("average_loss: ",format(average_loss))
In [ ]:
loss_score = ev["loss"]
print("Loss: {0:f}".format(loss_score))	
 

The model has an average loss of about 26. You can check the summary statistics of the target variable to judge how big this error is.

In [ ]:
df_test['Energy_output'].describe()
In [ ]:
PKP=(average_loss/ df_test['Energy_output'].mean())*100
print('Average error in relation to the average value: ',PKP)
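One caveat about this ratio: average_loss reported by the estimator is a mean squared error, so it is in squared units (MW²), and dividing it by the mean output mixes units. Taking the square root first gives a relative error in the same units as the output; a sketch:

import numpy as np

rmse = np.sqrt(average_loss)   # RMSE, back in MW
print('RMSE [MW]:', rmse)
print('RMSE as % of mean output:', rmse / df_test['Energy_output'].mean() * 100)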
 

Step 6. Making a forecast

 

Making a forecast requires a trained model and a set of independent variables. We substitute the independent variables into the model and obtain the result. We will draw 4 random records and make a forecast for them.

We create a sample of 4 records with the output variable removed.
In [ ]:
import numpy as np

sample4 =df.sample(4)
result = sample4['Energy_output'].copy() ## <= to have a comparison later
sample4['Energy_output']=np.nan
sample4
In [ ]:
y = estimator.predict(    
         input_fn=get_input_fn(sample4,                          
         num_epochs=1,                          
         n_batch = 256,                          
         shuffle=False))
In [ ]:
predictions = list(p["predictions"] for p in itertools.islice(y, 4))
print("Predictions: {}".format(str(predictions)))
In [ ]:
predictions
 

I convert the array to a DataFrame

In [ ]:
conc = np.vstack(predictions)
conc
In [ ]:
newdf = pd.DataFrame(conc)
newdf
In [ ]:
result
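For a side-by-side check of the forecasts against the held-back actual values, one possible sketch:

# Predictions (flattened) next to the actual values kept aside earlier
comparison = pd.DataFrame({'Predicted': conc.ravel(),
                           'Actual': result.values})
comparison['Error'] = comparison['Actual'] - comparison['Predicted']
comparison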
