We have replaced the engine in our tank with a more universal one

https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset

import torch

I’m starting a GPU graphics card (which I don’t have)

Odpalam karte graficzną GPU (której nie mam)

device = torch.device('cpu') # obliczenia robie na CPU
#device = torch.device('cuda') # obliczenia robie na GPU

Output variables (3):¶

SLUMP (cm)
FLOW (cm)
28-day Compressive Strength (Mpa)

import pandas as pd

df = pd.read_csv('/home/wojciech/Pulpit/3/BikeSharing.csv')
print(df.shape)
df.head(3)

(17379, 17)

Text(0.5, 1, 'Macierz korelacji ze zmienną wynikową y')

Text(0, 0.5, 'Zmienne nezależne ciągłe')

NUMBER OF EMPTY RECORDS vs. FULL RECORDS
----------------------------------------

<matplotlib.axes._subplots.AxesSubplot at 0x7fb3439c2650>

instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

instant         int64
dteday         object
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
dtype: object

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

DISCRETE FUNCTIONS CODED
------------------------

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

cnt: count of total rental bikes including both casual and registered

I fill all holes with values out of range

Wypełniam wszystkie dziury wartościami z poza zakresu

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10,6))
CORREL =df.corr()
sns.heatmap(CORREL, annot=True, cbar=False, cmap="coolwarm")
plt.title('Macierz korelacji ze zmienną wynikową y', fontsize=20)

Text(0.5, 1, 'Macierz korelacji ze zmienną wynikową y')

import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))
CORREL['cnt'].plot(kind='barh', color='red')
plt.title('Korelacja ze zmienną wynikową', fontsize=20)
plt.xlabel('Poziom korelacji')
plt.ylabel('Zmienne nezależne ciągłe')

Text(0, 0.5, 'Zmienne nezależne ciągłe')

NUMBER OF EMPTY RECORDS vs. FULL RECORDS
----------------------------------------

<matplotlib.axes._subplots.AxesSubplot at 0x7fb3439c2650>

instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

instant         int64
dteday         object
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
dtype: object

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

DISCRETE FUNCTIONS CODED
------------------------

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

Index(['instant', 'season', 'yr', 'mnth', 'hr', 'holiday', 'weekday',
       'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed',
       'casual', 'registered', 'cnt', 'month', 'weekofyear'],
      dtype='object')

1.0 -0.0

Zmienne: ‘registered’,’casual’ są to też wyniki tylko inazej pokazane dlatego trzeba je usunąć z danych

a,b = df.shape     #<- ile mamy kolumn
b

print('NUMBER OF EMPTY RECORDS vs. FULL RECORDS')
print('----------------------------------------')
for i in range(1,b):
    i = df.columns[i]
    r = df[i].isnull().sum()
    h = df[i].count()
    pr = (r/h)*100
   
    if r > 0:
        print(i,"--------",r,"--------",h,"--------",pr)

NUMBER OF EMPTY RECORDS vs. FULL RECORDS
----------------------------------------

<matplotlib.axes._subplots.AxesSubplot at 0x7fb3439c2650>

instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

instant         int64
dteday         object
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
dtype: object

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

DISCRETE FUNCTIONS CODED
------------------------

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

Index(['instant', 'season', 'yr', 'mnth', 'hr', 'holiday', 'weekday',
       'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed',
       'casual', 'registered', 'cnt', 'month', 'weekofyear'],
      dtype='object')

1.0 -0.0

5      260
6      236
4      231
3      224
2      208
      ... 
725      1
709      1
661      1
629      1
887      1
Name: cnt, Length: 869, dtype: int64

import seaborn as sns

sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')

<matplotlib.axes._subplots.AxesSubplot at 0x7fb3439c2650>

instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

instant         int64
dteday         object
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
dtype: object

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

DISCRETE FUNCTIONS CODED
------------------------

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

Index(['instant', 'season', 'yr', 'mnth', 'hr', 'holiday', 'weekday',
       'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed',
       'casual', 'registered', 'cnt', 'month', 'weekofyear'],
      dtype='object')

1.0 -0.0

5      260
6      236
4      231
3      224
2      208
      ... 
725      1
709      1
661      1
629      1
887      1
Name: cnt, Length: 869, dtype: int64

1.81 1.89

#del df['Unnamed: 15']
#del df['Unnamed: 16']

df = df.dropna(how='any') # jednak je kasuje te dziury

# df.fillna(-777, inplace=True)
df.isnull().sum()

instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

print(df.dtypes)
df.head(3)

instant         int64
dteday         object
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
dtype: object

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

DISCRETE FUNCTIONS CODED
------------------------

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

Index(['instant', 'season', 'yr', 'mnth', 'hr', 'holiday', 'weekday',
       'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed',
       'casual', 'registered', 'cnt', 'month', 'weekofyear'],
      dtype='object')

1.0 -0.0

5      260
6      236
4      231
3      224
2      208
      ... 
725      1
709      1
661      1
629      1
887      1
Name: cnt, Length: 869, dtype: int64

1.81 1.89

tensor([[-1.7320, -1.3566, -1.0051, -1.6104, -1.6700, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.3346, -1.0933,  0.9474, -1.5539, -1.6104,  1.6913],
        [-1.7318, -1.3566, -1.0051, -1.6104, -1.5254, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913],
        [-1.7316, -1.3566, -1.0051, -1.6104, -1.3807, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913]],
       dtype=torch.float64)

tensor([[-1.7320, -1.3566, -1.0051, -1.6104, -1.6700, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.3346, -1.0933,  0.9474, -1.5539, -1.6104,  1.6913],
        [-1.7318, -1.3566, -1.0051, -1.6104, -1.5254, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913],
        [-1.7316, -1.3566, -1.0051, -1.6104, -1.3807, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913]])

to_datetime¶

df['dteday'] =  pd.to_datetime(df['dteday'])
df['weekday'] = df.dteday.dt.weekday
df['month'] =df.dteday.dt.month
df['weekofyear'] =df.dteday.dt.weekofyear

del df['dteday']

print(df.dtypes)
df.head(3)

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

DISCRETE FUNCTIONS CODED
------------------------

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

Index(['instant', 'season', 'yr', 'mnth', 'hr', 'holiday', 'weekday',
       'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed',
       'casual', 'registered', 'cnt', 'month', 'weekofyear'],
      dtype='object')

1.0 -0.0

5      260
6      236
4      231
3      224
2      208
      ... 
725      1
709      1
661      1
629      1
887      1
Name: cnt, Length: 869, dtype: int64

1.81 1.89

tensor([[-1.7320, -1.3566, -1.0051, -1.6104, -1.6700, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.3346, -1.0933,  0.9474, -1.5539, -1.6104,  1.6913],
        [-1.7318, -1.3566, -1.0051, -1.6104, -1.5254, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913],
        [-1.7316, -1.3566, -1.0051, -1.6104, -1.3807, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913]],
       dtype=torch.float64)

tensor([[-1.7320, -1.3566, -1.0051, -1.6104, -1.6700, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.3346, -1.0933,  0.9474, -1.5539, -1.6104,  1.6913],
        [-1.7318, -1.3566, -1.0051, -1.6104, -1.5254, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913],
        [-1.7316, -1.3566, -1.0051, -1.6104, -1.3807, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913]])

tensor([0.1600, 0.4000, 0.3200], dtype=torch.float64)

Encodes text values¶

Koduje wartości tekstowe

import numpy as np

a,b = df.shape     #<- ile mamy kolumn
b

print('DISCRETE FUNCTIONS CODED')
print('------------------------')
for i in range(1,b):
    i = df.columns[i]
    f = df[i].dtypes
    if f == np.object:
        print(i,"---",f)   
    
        if f == np.object:
        
            df[i] = pd.Categorical(df[i]).codes
        
            continue

DISCRETE FUNCTIONS CODED
------------------------

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

Index(['instant', 'season', 'yr', 'mnth', 'hr', 'holiday', 'weekday',
       'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed',
       'casual', 'registered', 'cnt', 'month', 'weekofyear'],
      dtype='object')

1.0 -0.0

5      260
6      236
4      231
3      224
2      208
      ... 
725      1
709      1
661      1
629      1
887      1
Name: cnt, Length: 869, dtype: int64

1.81 1.89

tensor([[-1.7320, -1.3566, -1.0051, -1.6104, -1.6700, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.3346, -1.0933,  0.9474, -1.5539, -1.6104,  1.6913],
        [-1.7318, -1.3566, -1.0051, -1.6104, -1.5254, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913],
        [-1.7316, -1.3566, -1.0051, -1.6104, -1.3807, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913]],
       dtype=torch.float64)

tensor([[-1.7320, -1.3566, -1.0051, -1.6104, -1.6700, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.3346, -1.0933,  0.9474, -1.5539, -1.6104,  1.6913],
        [-1.7318, -1.3566, -1.0051, -1.6104, -1.5254, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913],
        [-1.7316, -1.3566, -1.0051, -1.6104, -1.3807, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913]])

tensor([0.1600, 0.4000, 0.3200], dtype=torch.float64)

X: torch.Size([17379, 15])
y: torch.Size([17379])

df.dtypes

instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object

df.columns

Index(['instant', 'season', 'yr', 'mnth', 'hr', 'holiday', 'weekday',
       'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed',
       'casual', 'registered', 'cnt', 'month', 'weekofyear'],
      dtype='object')

I specify what is X and what is y¶

Określam co jest X a co y

X = df.drop(['cnt','registered','casual'],1)
y = df['cnt']

Scaling (normalization) of the X value¶

X should never be too big. Ideally, it should be in the range [-1, 1]. If this is not the case, normalize the input.

Skalowanie (normalizacja) wartości X

X nigdy nie powinien być zbyt duży. Idealnie powinien być w zakresie [-1, 1]. Jeśli tak nie jest, należy znormalizować dane wejściowe.

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X = sc.fit_transform(X)

print(np.round(X.std(), decimals=2), np.round(X.mean(), decimals=2))

1.0 -0.0

5      260
6      236
4      231
3      224
2      208
      ... 
725      1
709      1
661      1
629      1
887      1
Name: cnt, Length: 869, dtype: int64

1.81 1.89

tensor([[-1.7320, -1.3566, -1.0051, -1.6104, -1.6700, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.3346, -1.0933,  0.9474, -1.5539, -1.6104,  1.6913],
        [-1.7318, -1.3566, -1.0051, -1.6104, -1.5254, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913],
        [-1.7316, -1.3566, -1.0051, -1.6104, -1.3807, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913]],
       dtype=torch.float64)

tensor([[-1.7320, -1.3566, -1.0051, -1.6104, -1.6700, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.3346, -1.0933,  0.9474, -1.5539, -1.6104,  1.6913],
        [-1.7318, -1.3566, -1.0051, -1.6104, -1.5254, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913],
        [-1.7316, -1.3566, -1.0051, -1.6104, -1.3807, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913]])

tensor([0.1600, 0.4000, 0.3200], dtype=torch.float64)

X: torch.Size([17379, 15])
y: torch.Size([17379])

torch.Size([17379, 1])

X_train:  torch.Size([13904, 15])
X_test:   torch.Size([3475, 15])
----------------------------------------------------
y_train:  torch.Size([13904, 1])
y_test:   torch.Size([3475, 1])

tensor([[-0.0874],
        [-0.0926],
        [-0.0942],
        ...,
        [ 0.0814],
        [ 0.0851],
        [ 0.1024]], grad_fn=<AddmmBackward>)

y.value_counts()

5      260
6      236
4      231
3      224
2      208
      ... 
725      1
709      1
661      1
629      1
887      1
Name: cnt, Length: 869, dtype: int64

y = (y / 100)  # max test score is 100
#print(y.head(3))
print(np.round(y.std(), decimals=2), np.round(y.mean(), decimals=2))

1.81 1.89

tensor([[-1.7320, -1.3566, -1.0051, -1.6104, -1.6700, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.3346, -1.0933,  0.9474, -1.5539, -1.6104,  1.6913],
        [-1.7318, -1.3566, -1.0051, -1.6104, -1.5254, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913],
        [-1.7316, -1.3566, -1.0051, -1.6104, -1.3807, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913]],
       dtype=torch.float64)

tensor([[-1.7320, -1.3566, -1.0051, -1.6104, -1.6700, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.3346, -1.0933,  0.9474, -1.5539, -1.6104,  1.6913],
        [-1.7318, -1.3566, -1.0051, -1.6104, -1.5254, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913],
        [-1.7316, -1.3566, -1.0051, -1.6104, -1.3807, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913]])

tensor([0.1600, 0.4000, 0.3200], dtype=torch.float64)

X: torch.Size([17379, 15])
y: torch.Size([17379])

torch.Size([17379, 1])

X_train:  torch.Size([13904, 15])
X_test:   torch.Size([3475, 15])
----------------------------------------------------
y_train:  torch.Size([13904, 1])
y_test:   torch.Size([3475, 1])

tensor([[-0.0874],
        [-0.0926],
        [-0.0942],
        ...,
        [ 0.0814],
        [ 0.0851],
        [ 0.1024]], grad_fn=<AddmmBackward>)

0 5.711716175079346
200 0.14712589979171753
400 0.10049592703580856
600 0.08884098380804062
800 0.08310769498348236
1000 0.07949385792016983
1200 0.07695646584033966
1400 0.07494604587554932
1600 0.073524110019207
1800 0.07221145927906036

Loss train_set: 3875.08569336

Creates random input and output¶

Tworzy losowe dane wejściowe i wyjściowe

import numpy as np

#X = X.values       #- jak była normalizacja to to nie działa
X = torch.tensor(X)
print(X[:3])

tensor([[-1.7320, -1.3566, -1.0051, -1.6104, -1.6700, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.3346, -1.0933,  0.9474, -1.5539, -1.6104,  1.6913],
        [-1.7318, -1.3566, -1.0051, -1.6104, -1.5254, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913],
        [-1.7316, -1.3566, -1.0051, -1.6104, -1.3807, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913]],
       dtype=torch.float64)

tensor([[-1.7320, -1.3566, -1.0051, -1.6104, -1.6700, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.3346, -1.0933,  0.9474, -1.5539, -1.6104,  1.6913],
        [-1.7318, -1.3566, -1.0051, -1.6104, -1.5254, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913],
        [-1.7316, -1.3566, -1.0051, -1.6104, -1.3807, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913]])

tensor([0.1600, 0.4000, 0.3200], dtype=torch.float64)

X: torch.Size([17379, 15])
y: torch.Size([17379])

torch.Size([17379, 1])

X_train:  torch.Size([13904, 15])
X_test:   torch.Size([3475, 15])
----------------------------------------------------
y_train:  torch.Size([13904, 1])
y_test:   torch.Size([3475, 1])

tensor([[-0.0874],
        [-0.0926],
        [-0.0942],
        ...,
        [ 0.0814],
        [ 0.0851],
        [ 0.1024]], grad_fn=<AddmmBackward>)

0 5.711716175079346
200 0.14712589979171753
400 0.10049592703580856
600 0.08884098380804062
800 0.08310769498348236
1000 0.07949385792016983
1200 0.07695646584033966
1400 0.07494604587554932
1600 0.073524110019207
1800 0.07221145927906036

Loss train_set: 3875.08569336

tensor([[2.9172],
        [3.5864],
        [2.9354],
        [4.5793],
        [7.5099]])

X = X.type(torch.FloatTensor)
print(X[:3])

tensor([[-1.7320, -1.3566, -1.0051, -1.6104, -1.6700, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.3346, -1.0933,  0.9474, -1.5539, -1.6104,  1.6913],
        [-1.7318, -1.3566, -1.0051, -1.6104, -1.5254, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913],
        [-1.7316, -1.3566, -1.0051, -1.6104, -1.3807, -0.1721,  0.9933, -1.4669,
         -0.6652, -1.4385, -1.1817,  0.8955, -1.5539, -1.6104,  1.6913]])

tensor([0.1600, 0.4000, 0.3200], dtype=torch.float64)

X: torch.Size([17379, 15])
y: torch.Size([17379])

torch.Size([17379, 1])

X_train:  torch.Size([13904, 15])
X_test:   torch.Size([3475, 15])
----------------------------------------------------
y_train:  torch.Size([13904, 1])
y_test:   torch.Size([3475, 1])

tensor([[-0.0874],
        [-0.0926],
        [-0.0942],
        ...,
        [ 0.0814],
        [ 0.0851],
        [ 0.1024]], grad_fn=<AddmmBackward>)

0 5.711716175079346
200 0.14712589979171753
400 0.10049592703580856
600 0.08884098380804062
800 0.08310769498348236
1000 0.07949385792016983
1200 0.07695646584033966
1400 0.07494604587554932
1600 0.073524110019207
1800 0.07221145927906036

Loss train_set: 3875.08569336

tensor([[2.9172],
        [3.5864],
        [2.9354],
        [4.5793],
        [7.5099]])

Sequential(
  (0): Linear(in_features=15, out_features=100, bias=True)
  (1): LeakyReLU(negative_slope=0.01)
  (2): Linear(in_features=100, out_features=100, bias=True)
  (3): LeakyReLU(negative_slope=0.01)
  (4): Linear(in_features=100, out_features=1, bias=True)
)

y = y.values   # tworzymy macierz numpy - jak była normalizacja to to nie działa

y = torch.tensor(y)
print(y[:3])

tensor([0.1600, 0.4000, 0.3200], dtype=torch.float64)

X: torch.Size([17379, 15])
y: torch.Size([17379])

torch.Size([17379, 1])

X_train:  torch.Size([13904, 15])
X_test:   torch.Size([3475, 15])
----------------------------------------------------
y_train:  torch.Size([13904, 1])
y_test:   torch.Size([3475, 1])

tensor([[-0.0874],
        [-0.0926],
        [-0.0942],
        ...,
        [ 0.0814],
        [ 0.0851],
        [ 0.1024]], grad_fn=<AddmmBackward>)

0 5.711716175079346
200 0.14712589979171753
400 0.10049592703580856
600 0.08884098380804062
800 0.08310769498348236
1000 0.07949385792016983
1200 0.07695646584033966
1400 0.07494604587554932
1600 0.073524110019207
1800 0.07221145927906036

Loss train_set: 3875.08569336

tensor([[2.9172],
        [3.5864],
        [2.9354],
        [4.5793],
        [7.5099]])

Sequential(
  (0): Linear(in_features=15, out_features=100, bias=True)
  (1): LeakyReLU(negative_slope=0.01)
  (2): Linear(in_features=100, out_features=100, bias=True)
  (3): LeakyReLU(negative_slope=0.01)
  (4): Linear(in_features=100, out_features=1, bias=True)
)

0    29.0
1    36.0
2    29.0
Name: y_pred, dtype: float32

TRanspends the resulting vector to become a column¶

TRansponuje wektor wynikowy aby stał się kolumną¶

y = y.type(torch.FloatTensor)

print('X:',X.shape)
print('y:',y.shape)

X: torch.Size([17379, 15])
y: torch.Size([17379])

torch.Size([17379, 1])

X_train:  torch.Size([13904, 15])
X_test:   torch.Size([3475, 15])
----------------------------------------------------
y_train:  torch.Size([13904, 1])
y_test:   torch.Size([3475, 1])

tensor([[-0.0874],
        [-0.0926],
        [-0.0942],
        ...,
        [ 0.0814],
        [ 0.0851],
        [ 0.1024]], grad_fn=<AddmmBackward>)

0 5.711716175079346
200 0.14712589979171753
400 0.10049592703580856
600 0.08884098380804062
800 0.08310769498348236
1000 0.07949385792016983
1200 0.07695646584033966
1400 0.07494604587554932
1600 0.073524110019207
1800 0.07221145927906036

Loss train_set: 3875.08569336

tensor([[2.9172],
        [3.5864],
        [2.9354],
        [4.5793],
        [7.5099]])

Sequential(
  (0): Linear(in_features=15, out_features=100, bias=True)
  (1): LeakyReLU(negative_slope=0.01)
  (2): Linear(in_features=100, out_features=100, bias=True)
  (3): LeakyReLU(negative_slope=0.01)
  (4): Linear(in_features=100, out_features=1, bias=True)
)

0    29.0
1    36.0
2    29.0
Name: y_pred, dtype: float32

Text(0.5, 1.0, 'COURSE OF THE PROJECTING PROCESS ON THE TEST SET')

Dodanie jednego wymiaru do wektora wynikowego

y = y.view(y.shape[0],1)
y.shape

torch.Size([17379, 1])

Podział na zbiór testowy i zbiór treningowy¶

a,b = X.shape
a

total_records = a
test_records = int(a * .2)

X_train = X[:total_records-test_records]
X_test = X[total_records-test_records:total_records]

y_train = y[:total_records-test_records]
y_test = y[total_records-test_records:total_records]

print('X_train: ',X_train.shape)
print('X_test:  ',X_test.shape)
print('----------------------------------------------------')
print('y_train: ',y_train.shape)
print('y_test:  ',y_test.shape)

X_train:  torch.Size([13904, 15])
X_test:   torch.Size([3475, 15])
----------------------------------------------------
y_train:  torch.Size([13904, 1])
y_test:   torch.Size([3475, 1])

tensor([[-0.0874],
        [-0.0926],
        [-0.0942],
        ...,
        [ 0.0814],
        [ 0.0851],
        [ 0.1024]], grad_fn=<AddmmBackward>)

0 5.711716175079346
200 0.14712589979171753
400 0.10049592703580856
600 0.08884098380804062
800 0.08310769498348236
1000 0.07949385792016983
1200 0.07695646584033966
1400 0.07494604587554932
1600 0.073524110019207
1800 0.07221145927906036

Loss train_set: 3875.08569336

tensor([[2.9172],
        [3.5864],
        [2.9354],
        [4.5793],
        [7.5099]])

Sequential(
  (0): Linear(in_features=15, out_features=100, bias=True)
  (1): LeakyReLU(negative_slope=0.01)
  (2): Linear(in_features=100, out_features=100, bias=True)
  (3): LeakyReLU(negative_slope=0.01)
  (4): Linear(in_features=100, out_features=1, bias=True)
)

0    29.0
1    36.0
2    29.0
Name: y_pred, dtype: float32

Text(0.5, 1.0, 'COURSE OF THE PROJECTING PROCESS ON THE TEST SET')

<seaborn.axisgrid.FacetGrid at 0x7fb3400a0990>

<Figure size 432x288 with 0 Axes>

Definiowanie sieci neuronowej¶

Programowanie torch.nn.Module¶

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))      # activation function for hidden layer
        x = self.predict(x)             # linear output
        return x

Definicja krztałtu sieci¶

N, D_in = X.shape
N, D_out = y.shape

H = 100
device = torch.device('cpu')

net = torch.nn.Sequential(
        torch.nn.Linear(D_in,  H),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(H, H),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(H, D_out),
    ).to(device)

net(X_train)

tensor([[-0.0874],
        [-0.0926],
        [-0.0942],
        ...,
        [ 0.0814],
        [ 0.0851],
        [ 0.1024]], grad_fn=<AddmmBackward>)

Algorytm optymalizacji:¶

Optymalizator¶

lr: Szybkość uczenia się -> Szybkość, z jaką nasz model aktualizuje wagi w komórkach za każdym razem, gdy przeprowadzana jest wsteczna propagacja

#optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0, dampening=0, weight_decay=0, nesterov=False)
#optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
#optimizer = torch.optim.Adamax(net.parameters(), lr=0.01)
#optimizer = torch.optim.ASGD(net.parameters(), lr=0.01, lambd=0.0001, alpha=0.15, t0=000000.0)
#optimizer = torch.optim.LBFGS(net.parameters(), lr=0.01, max_iter=20, max_eval=None, tolerance_grad=1e-05, tolerance_change=1e-09, history_size=100, line_search_fn=None)
#optimizer = torch.optim.RMSprop(net.parameters(), lr=0.01, alpha=0.99, eps=1e-08)
#optimizer = torch.optim.Rprop(net.parameters(), lr=0.01, etas=(0.5, 1.2), step_sizes=(1e-06, 50))  #R2:0.77

Definicja funkcji straty¶

to jest R2 dla regresji

loss_func = torch.nn.MSELoss()

Definiowanie procesu nauki i nauka¶

inputs = X_train                          #1. deklarujemy x i y do nauki
outputs = y_train
for i in range(2000):                          #2. pętla 1050 powtórzeń (epok)
   prediction = net(inputs)
   loss = loss_func(prediction, outputs) 
   optimizer.zero_grad()
   loss.backward()        
   optimizer.step()       

   if i print(i, loss.item())     # <=# wartości y, a funkcja straty zwraca Tensor zawierający stratę.

0 5.711716175079346
200 0.14712589979171753
400 0.10049592703580856
600 0.08884098380804062
800 0.08310769498348236
1000 0.07949385792016983
1200 0.07695646584033966
1400 0.07494604587554932
1600 0.073524110019207
1800 0.07221145927906036

Loss train_set: 3875.08569336

tensor([[2.9172],
        [3.5864],
        [2.9354],
        [4.5793],
        [7.5099]])

Sequential(
  (0): Linear(in_features=15, out_features=100, bias=True)
  (1): LeakyReLU(negative_slope=0.01)
  (2): Linear(in_features=100, out_features=100, bias=True)
  (3): LeakyReLU(negative_slope=0.01)
  (4): Linear(in_features=100, out_features=1, bias=True)
)

0    29.0
1    36.0
2    29.0
Name: y_pred, dtype: float32

Text(0.5, 1.0, 'COURSE OF THE PROJECTING PROCESS ON THE TEST SET')

<seaborn.axisgrid.FacetGrid at 0x7fb3400a0990>

<Figure size 432x288 with 0 Axes>

-----two methods--------------
r2_score:           0.770
r2_score:           0.770

-------------------------------
Mean absolute error     MAE:  7.37 
Root mean squared error RMSE: 10.57 
Mean absolute error     MAPE: 143.06 
-------------------------------
Źle - Resztki modelu RÓŻNIĄ SIĘ OD ZERA - pvalue: 0.0000 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------
Źle Shapiro-Wilk - y NIE MA ROZKŁADU NORMALNEGO - pvalue: 0.0000 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------
Źle Shapiro-Wilk y_pred NIE MA ROZKŁADU NORMALNEGO - pvalue: 0.0000 < 0.01 (Odrzucamy H0)

--------------------------------------------------------------------------------------------
Shapiro-Wilk: Zmienne nie mają rozkładu normalnego! Nie można zrobić analizy ANOVA
Źle - Kruskal-Wallis: prognoza i obserwacje empir. NIE MAJĄ równych średnich - pvalue: 0.0087 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------

There are many potential reasons. Most likely exploding gradients. The two things to try first:¶

Normalize the inputs
Lower the learning rate

Istnieje wiele potencjalnych przyczyn. Najprawdopodobniej wybuchające gradienty. Dwie rzeczy do wypróbowania w pierwszej kolejności:

– Normalizuj wejścia
– Obniż tempo uczenia msię

Forecast based on the model¶

substitute the same equations that were in the model
The following loss result shows the last model sequence
Loss shows how much the model is wrong (loss = sum of error squares) after the last learning sequence

Prognoza na podstawie modelu

podstawiamy te same równania, które były w modelu
Poniższy wynik loss pokazuje ostatnią sekwencje modelu
Loss pokazuuje ile myli się model (loss = suma kwadratu błedów) po ostatniej sekwencji uczenia się

with torch.no_grad():
    y_pred = net(X_test)  
    loss = (y_pred - y_test).pow(2).sum()

    print(f'Loss train_set: {loss:.8f}')

Loss train_set: 3875.08569336

tensor([[2.9172],
        [3.5864],
        [2.9354],
        [4.5793],
        [7.5099]])

Sequential(
  (0): Linear(in_features=15, out_features=100, bias=True)
  (1): LeakyReLU(negative_slope=0.01)
  (2): Linear(in_features=100, out_features=100, bias=True)
  (3): LeakyReLU(negative_slope=0.01)
  (4): Linear(in_features=100, out_features=1, bias=True)
)

0    29.0
1    36.0
2    29.0
Name: y_pred, dtype: float32

Text(0.5, 1.0, 'COURSE OF THE PROJECTING PROCESS ON THE TEST SET')

<seaborn.axisgrid.FacetGrid at 0x7fb3400a0990>

<Figure size 432x288 with 0 Axes>

-----two methods--------------
r2_score:           0.770
r2_score:           0.770

-------------------------------
Mean absolute error     MAE:  7.37 
Root mean squared error RMSE: 10.57 
Mean absolute error     MAPE: 143.06 
-------------------------------
Źle - Resztki modelu RÓŻNIĄ SIĘ OD ZERA - pvalue: 0.0000 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------
Źle Shapiro-Wilk - y NIE MA ROZKŁADU NORMALNEGO - pvalue: 0.0000 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------
Źle Shapiro-Wilk y_pred NIE MA ROZKŁADU NORMALNEGO - pvalue: 0.0000 < 0.01 (Odrzucamy H0)

--------------------------------------------------------------------------------------------
Shapiro-Wilk: Zmienne nie mają rozkładu normalnego! Nie można zrobić analizy ANOVA
Źle - Kruskal-Wallis: prognoza i obserwacje empir. NIE MAJĄ równych średnich - pvalue: 0.0087 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------

Ponieważ ustaliliśmy, że nasza warstwa wyjściowa będzie zawierać 1 neuron, każda prognoza będzie zawierać 1 wartości. Przykładowo pierwsze 5 przewidywanych wartości wygląda następująco:

y_pred[:5]

tensor([[2.9172],
        [3.5864],
        [2.9354],
        [4.5793],
        [7.5099]])

We save the whole model¶

Zapisujemy cały model

torch.save(net,'/home/wojciech/Pulpit/7/byk15.pb')

We play the whole model¶

Odtwarzamy cały model

KOT = torch.load('/home/wojciech/Pulpit/7/byk15.pb')
KOT.eval()

Sequential(
  (0): Linear(in_features=15, out_features=100, bias=True)
  (1): LeakyReLU(negative_slope=0.01)
  (2): Linear(in_features=100, out_features=100, bias=True)
  (3): LeakyReLU(negative_slope=0.01)
  (4): Linear(in_features=100, out_features=1, bias=True)
)

By substituting other independent variables, you can get a vector of output variables¶

We choose a random record from the tensor

Podstawiając inne zmienne niezależne można uzyskać wektor zmiennych wyjściowych

Wybieramy sobie jakąś losowy rekord z tensora

y_pred = y_pred*10
foka = y_pred.cpu().detach().numpy()
df11 = pd.DataFrame(foka)
df11.columns = ['y_pred']
df11=np.round(df11.y_pred)
df11.head(3)

0    29.0
1    36.0
2    29.0
Name: y_pred, dtype: float32

y_test = y_test*10
foka = y_test.cpu().detach().numpy()
df_t = pd.DataFrame(foka)
df_t.columns = ['y']
df_t.head(3)

NOWA = pd.merge(df_t,df11, how='inner', left_index=True, right_index=True)
NOWA.head(3)

NOWA.to_csv('/home/wojciech/Pulpit/7/NOWA.csv')

fig, ax = plt.subplots( figsize=(16, 2))
for ewa in ['y', 'y_pred']:
    ax.plot(NOWA, label=ewa)
    
ax.set_xlim(1340, 1500)
#ax.legend()
ax.set_ylabel('Parameter')
ax.set_title('COURSE OF THE PROJECTING PROCESS ON THE TEST SET')

Text(0.5, 1.0, 'COURSE OF THE PROJECTING PROCESS ON THE TEST SET')

<seaborn.axisgrid.FacetGrid at 0x7fb3400a0990>

<Figure size 432x288 with 0 Axes>

-----two methods--------------
r2_score:           0.770
r2_score:           0.770

-------------------------------
Mean absolute error     MAE:  7.37 
Root mean squared error RMSE: 10.57 
Mean absolute error     MAPE: 143.06 
-------------------------------
Źle - Resztki modelu RÓŻNIĄ SIĘ OD ZERA - pvalue: 0.0000 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------
Źle Shapiro-Wilk - y NIE MA ROZKŁADU NORMALNEGO - pvalue: 0.0000 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------
Źle Shapiro-Wilk y_pred NIE MA ROZKŁADU NORMALNEGO - pvalue: 0.0000 < 0.01 (Odrzucamy H0)

--------------------------------------------------------------------------------------------
Shapiro-Wilk: Zmienne nie mają rozkładu normalnego! Nie można zrobić analizy ANOVA
Źle - Kruskal-Wallis: prognoza i obserwacje empir. NIE MAJĄ równych średnich - pvalue: 0.0087 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------

## marginesy
plt.subplots_adjust( left = None , bottom = None , right = None , top = None , wspace = None , hspace = None )
plt.figure(figsize=(16,5))
ax = plt.subplot(1, 2, 1)
NOWA.plot.kde(ax=ax, legend=True, title='Histogram: y vs. y_pred')
NOWA.plot.hist(density=True,bins=40, ax=ax, alpha=0.3)
ax.set_title("Dystributions")

ax = plt.subplot(1, 2, 2)
sns.boxplot(data = NOWA)
plt.xticks(rotation=-90)
ax.set_title("Boxes")


sns.lmplot(data=NOWA, x='y', y='y_pred')

<seaborn.axisgrid.FacetGrid at 0x7fb3400a0990>

<Figure size 432x288 with 0 Axes>

-----two methods--------------
r2_score:           0.770
r2_score:           0.770

-------------------------------
Mean absolute error     MAE:  7.37 
Root mean squared error RMSE: 10.57 
Mean absolute error     MAPE: 143.06 
-------------------------------
Źle - Resztki modelu RÓŻNIĄ SIĘ OD ZERA - pvalue: 0.0000 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------
Źle Shapiro-Wilk - y NIE MA ROZKŁADU NORMALNEGO - pvalue: 0.0000 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------
Źle Shapiro-Wilk y_pred NIE MA ROZKŁADU NORMALNEGO - pvalue: 0.0000 < 0.01 (Odrzucamy H0)

--------------------------------------------------------------------------------------------
Shapiro-Wilk: Zmienne nie mają rozkładu normalnego! Nie można zrobić analizy ANOVA
Źle - Kruskal-Wallis: prognoza i obserwacje empir. NIE MAJĄ równych średnich - pvalue: 0.0087 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------

Regression_Assessment¶

## Robi ocenę tylko dla jednej zmiennej

def Regression_Assessment(y, y_pred):
    
    from sklearn.metrics import r2_score 
    import scipy.stats as stats
    from statsmodels.graphics.gofplots import qqplot
    from matplotlib import pyplot
       
    print('-----two methods--------------')
    SS_Residual = sum((y-y_pred)**2)       
    SS_Total = sum((y-np.mean(y))**2)     
    r_squared = 1 - (float(SS_Residual))/SS_Total
    adjusted_r_squared = 1 - (1-r_squared)*(len(y)-1)/(len(y)-X.shape[1]-1)
    print('r2_score:           #print('adjusted_r_squared:
    #print('----r2_score------secound-method--------')  
    print('r2_score:           print()
    print('-------------------------------')
    MAE = (abs(y-y_pred)).mean()
    print('Mean absolute error     MAE:  RMSE = np.sqrt(((y-y_pred)**2).mean())
    print('Root mean squared error RMSE: pt = (100*(y-y_pred))/y
    MAPE = (abs(pt)).mean()
    print('Mean absolute error     MAPE: print('-------------------------------')
    
    stat,pvalue0 = stats.ttest_1samp(a=(y-y_pred),popmean=0.0)

    if pvalue0 > 0.01:
        print('t-test H0:suma reszt modelu wynosi zero--')
        print('OK! Resztki modelu nie różnią się od zera - pvalue: else:     
        print('Źle - Resztki modelu RÓŻNIĄ SIĘ OD ZERA - pvalue: print('--------------------------------------------------------------------------------------------') 
  
       
    stat,pvalue2_1 = stats.shapiro(y)
    stat,pvalue2_2 = stats.shapiro(y_pred)

    if pvalue2_1 > 0.01:
        #print('Shapiro-Wilk H0: y maj rozkład normalny?--------------------------------')
        print('OK Shapiro-Wilk! y maja rozkład normalny - pvalue: else:     
        print('Źle Shapiro-Wilk - y NIE MA ROZKŁADU NORMALNEGO - pvalue: print('--------------------------------------------------------------------------------------------')
    if pvalue2_2 > 0.01:
        #print('Shapiro-Wilk: y_pred maj rozkład normalny?--')
        print('OK Shapiro-Wilk! y_pred ma rozkład normalny - pvalue: else:     
        print('Źle Shapiro-Wilk y_pred NIE MA ROZKŁADU NORMALNEGO - pvalue: qqplot(y, line='s')
    pyplot.show()

    qqplot(y_pred, line='s')
    pyplot.show()
       
    print('--------------------------------------------------------------------------------------------')
        
    stat,pvalue3 = stats.kruskal(y_pred,y)
    stat,pvalue4 = stats.f_oneway(y_pred,y)

    if pvalue2_1 < 0.01 or pvalue2_2 < 0.01:
        print('Shapiro-Wilk: Zmienne nie mają rozkładu normalnego! Nie można zrobić analizy ANOVA')
     
        if pvalue3 > 0.01:
            print('Kruskal-Wallis NON-PARAMETRIC TEST: czy prognoza i obserwacje empir. mają równe średnie?')
            print('OK! Kruskal-Wallis H0: prognoza i obserwacje empir. mają równe średnie - pvalue: else:     
            print('Źle - Kruskal-Wallis: prognoza i obserwacje empir. NIE MAJĄ równych średnich - pvalue: else:

        if pvalue4 > 0.01:
            print('F-test (ANOVA): czy prognoza i obserwacje empir. mają równe średnie?--------------------------------')
            print('OK! prognoza i obserwacje empir. mają równe średnie - pvalue: else:     
            print('Źle - prognoza i obserwacje empir. NIE MAJĄ równych średnich - pvalue: print('--------------------------------------------------------------------------------------------')

y = NOWA['y']
y_pred = NOWA['y_pred']

Regression_Assessment(y, y_pred)

-----two methods--------------
r2_score:           0.770
r2_score:           0.770

-------------------------------
Mean absolute error     MAE:  7.37 
Root mean squared error RMSE: 10.57 
Mean absolute error     MAPE: 143.06 
-------------------------------
Źle - Resztki modelu RÓŻNIĄ SIĘ OD ZERA - pvalue: 0.0000 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------
Źle Shapiro-Wilk - y NIE MA ROZKŁADU NORMALNEGO - pvalue: 0.0000 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------
Źle Shapiro-Wilk y_pred NIE MA ROZKŁADU NORMALNEGO - pvalue: 0.0000 < 0.01 (Odrzucamy H0)

--------------------------------------------------------------------------------------------
Shapiro-Wilk: Zmienne nie mają rozkładu normalnego! Nie można zrobić analizy ANOVA
Źle - Kruskal-Wallis: prognoza i obserwacje empir. NIE MAJĄ równych średnich - pvalue: 0.0087 < 0.01 (Odrzucamy H0)
--------------------------------------------------------------------------------------------

THE DATA SCIENCE LIBRARY

Wojciech Moszczyński

Pytorch regression 3.1 [BikeSharing.csv]