PyTorch regression 3.7 [BikeSharing.csv]

030520201303

Work on diagnostic systems.

The tank prototype can really be checked in combat conditions!

https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset

In [1]:
import torch

Selecting the compute device: a GPU could be used, but I don't have one, so everything runs on the CPU.

In [2]:
device = torch.device('cpu') # run the computations on the CPU
#device = torch.device('cuda') # run the computations on the GPU
In [3]:
import pandas as pd

df = pd.read_csv('/home/wojciech/Pulpit/3/BikeSharing.csv')
print(df.shape)
df.head(3)
(17379, 17)
Out[3]:
instant dteday season yr mnth hr holiday weekday workingday weathersit temp atemp hum windspeed casual registered cnt
0 1 2011-01-01 1 0 1 0 0 6 0 1 0.24 0.2879 0.81 0.0 3 13 16
1 2 2011-01-01 1 0 1 1 0 6 0 1 0.22 0.2727 0.80 0.0 8 32 40
2 3 2011-01-01 1 0 1 2 0 6 0 1 0.22 0.2727 0.80 0.0 5 27 32

cnt: count of total rental bikes including both casual and registered

Gaps (missing values) could be filled with out-of-range values; in the end the rows with gaps are dropped in cell In [8] below.

In [4]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10,6))
CORREL = df.select_dtypes(include='number').corr()   # numeric columns only; 'dteday' is still a string here
sns.heatmap(CORREL, annot=True, cbar=False, cmap="coolwarm")
plt.title('Correlation matrix with the target variable y', fontsize=20)
Out[4]:
Text(0.5, 1, 'Correlation matrix with the target variable y')
In [5]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))
CORREL['cnt'].plot(kind='barh', color='red')
plt.title('Correlation with the target variable', fontsize=20)
plt.xlabel('Correlation level')
plt.ylabel('Continuous independent variables')
Out[5]:
Text(0, 0.5, 'Continuous independent variables')

The variables 'registered' and 'casual' are also outcomes, just expressed differently (cnt counts both casual and registered rentals), so they must be removed from the data.
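The point above can be checked on the three rows printed by df.head(3). This is only a small sketch on those three rows, not a check of the whole file; the helper name is_sum_of_parts is made up for the illustration.

```python
# (casual, registered, cnt) copied from the three rows shown by df.head(3)
rows = [(3, 13, 16), (8, 32, 40), (5, 27, 32)]

def is_sum_of_parts(casual, registered, cnt):
    """Return True when cnt equals casual + registered."""
    return casual + registered == cnt

checks = [is_sum_of_parts(*row) for row in rows]
print(checks)  # [True, True, True]
```

If both columns stayed in X, the model could reproduce cnt by simple addition and the score would say nothing about forecasting ability.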

In [6]:
a,b = df.shape     # b is the number of columns
b

print('NUMBER OF EMPTY RECORDS vs. FULL RECORDS')
print('----------------------------------------')
for i in range(1,b):
    i = df.columns[i]
    r = df[i].isnull().sum()
    h = df[i].count()
    pr = (r/h)*100
   
    if r > 0:
        print(i,"--------",r,"--------",h,"--------",pr) 
NUMBER OF EMPTY RECORDS vs. FULL RECORDS
----------------------------------------
In [7]:
import seaborn as sns

sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9e0102a110>
In [8]:
#del df['Unnamed: 15']
#del df['Unnamed: 16']

df = df.dropna(how='any') # in the end, rows with gaps are dropped

# df.fillna(-777, inplace=True)
df.isnull().sum()
Out[8]:
instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64
In [9]:
print(df.dtypes)
df.head(3)
instant         int64
dteday         object
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
dtype: object
Out[9]:
instant dteday season yr mnth hr holiday weekday workingday weathersit temp atemp hum windspeed casual registered cnt
0 1 2011-01-01 1 0 1 0 0 6 0 1 0.24 0.2879 0.81 0.0 3 13 16
1 2 2011-01-01 1 0 1 1 0 6 0 1 0.22 0.2727 0.80 0.0 8 32 40
2 3 2011-01-01 1 0 1 2 0 6 0 1 0.22 0.2727 0.80 0.0 5 27 32

to_datetime

In [10]:
df['dteday'] =  pd.to_datetime(df['dteday'])
df['weekday'] = df.dteday.dt.weekday
df['month'] =df.dteday.dt.month
df['weekofyear'] = df.dteday.dt.isocalendar().week.astype('int64')   # dt.weekofyear was removed in pandas 2.0
In [11]:
del df['dteday']
In [12]:
print(df.dtypes)
df.head(3)
instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object
Out[12]:
instant season yr mnth hr holiday weekday workingday weathersit temp atemp hum windspeed casual registered cnt month weekofyear
0 1 1 0 1 0 0 5 0 1 0.24 0.2879 0.81 0.0 3 13 16 1 52
1 2 1 0 1 1 0 5 0 1 0.22 0.2727 0.80 0.0 8 32 40 1 52
2 3 1 0 1 2 0 5 0 1 0.22 0.2727 0.80 0.0 5 27 32 1 52
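The table above shows weekday 5 and weekofyear 52 for 2011-01-01, while the raw file had weekday 6. A minimal standard-library sketch of where these numbers come from, assuming pandas follows the same conventions as Python's datetime (Monday = 0 for weekday, ISO 8601 for week numbers):

```python
from datetime import date

first_day = date(2011, 1, 1)           # dteday of the first rows

weekday = first_day.weekday()           # Monday = 0 ... Sunday = 6
iso_year, iso_week, iso_weekday = first_day.isocalendar()

print(weekday)              # 5 (Saturday)
print(iso_year, iso_week)   # 2010 52
```

So the first day of 2011 still belongs to the last ISO week of 2010, which is why January rows can carry week number 52.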

Encoding text values

In [13]:
import numpy as np

a,b = df.shape     # b is the number of columns

print('DISCRETE FEATURES ENCODED')
print('-------------------------')
for i in range(1,b):
    i = df.columns[i]
    f = df[i].dtypes
    if f == object:       # np.object was removed in NumPy 1.24; the builtin object is equivalent
        print(i,"---",f)
        df[i] = pd.Categorical(df[i]).codes
DISCRETE FEATURES ENCODED
-------------------------

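The same encoding for a single column (here a column named 'Time', which this dataset does not have):

df['Time'] = pd.Categorical(df['Time']).codes
df['Time'] = df['Time'].astype(int)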
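To make the encoding step concrete, here is a dependency-free sketch of what integer category codes look like. It assumes the codes follow the sorted order of the distinct values, which is how pd.Categorical behaves for sortable values; the function name category_codes and the weather labels are made up.

```python
def category_codes(values):
    """Map each distinct value to an integer code, in sorted order of the values."""
    categories = sorted(set(values))
    lookup = {value: code for code, value in enumerate(categories)}
    return [lookup[value] for value in values], categories

codes, categories = category_codes(["rain", "clear", "mist", "clear"])
print(categories)  # ['clear', 'mist', 'rain']
print(codes)       # [2, 0, 1, 0]
```

The codes carry an artificial order (clear < mist < rain), which a network will treat as a numeric distance.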

In [14]:
df.dtypes
Out[14]:
instant         int64
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
month           int64
weekofyear      int64
dtype: object
In [15]:
df.columns
Out[15]:
Index(['instant', 'season', 'yr', 'mnth', 'hr', 'holiday', 'weekday',
       'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed',
       'casual', 'registered', 'cnt', 'month', 'weekofyear'],
      dtype='object')

I set aside an iron-clad test reserve

I cut off the last 0.5% of the records; they will be used to check the model's forecasting ability.

In [16]:
R,C =df.shape
F = R*0.005
L = R - F
L
Out[16]:
17292.105
In [17]:
df5 = df[df.index>=L]
df2 = df[df.index<L]
print('Super test set df5:',df5.shape)
print('df2:               ',df2.shape) 
Super test set df5: (86, 18)
df2:                (17293, 18)
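The sizes printed above follow from comparing integer row positions with the non-integer cut point L = 17292.105. A small sketch of that arithmetic, assuming the index is the default 0..R-1 range; tail_holdout_sizes is a hypothetical helper, not part of the notebook.

```python
import math

def tail_holdout_sizes(n_rows, holdout_fraction):
    """Return (n_kept, n_holdout) when the last rows are cut off by position."""
    cut = n_rows - n_rows * holdout_fraction   # 17292.105 for the data above
    n_kept = math.ceil(cut)                    # positions 0 .. ceil(cut) - 1 are below the cut
    return n_kept, n_rows - n_kept

print(tail_holdout_sizes(17379, 0.005))  # (17293, 86)
```

Because the rows are ordered in time, the reserve is the last 86 hours of the data, not a random sample.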

I specify what is X and what is y

In [18]:
X = df2.drop(['cnt','registered','casual'], axis=1)   # the positional axis argument was removed in pandas 2.0
y = df2['cnt']
In [19]:
X_SuperT = df5.drop(['cnt','registered','casual'], axis=1)
y_SuperT = df5['cnt']

Scaling (normalization) of the X values

X should never be too large. Ideally its values should lie roughly in the range [-1, 1]. If they do not, normalize the input.

In [20]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X = sc.fit_transform(X)

print(np.round(X.std(), decimals=2), np.round(X.mean(), decimals=2))
1.0 0.0
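What the scaler does to one column can be sketched without scikit-learn. This assumes the usual z-score with the population standard deviation, which is what StandardScaler uses; the sample values are made up.

```python
from statistics import fmean, pstdev

def standardize(column):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = fmean(column)
    std = pstdev(column)
    return [(value - mean) / std for value in column]

temps = [0.24, 0.22, 0.22, 0.30, 0.26]   # made-up sample values
z = standardize(temps)
print([round(value, 3) for value in z])
```

After this transformation each column has mean 0 and standard deviation 1, but individual values are not confined to [-1, 1].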
In [21]:
y.value_counts()
Out[21]:
5      260
6      235
4      231
3      221
2      207
      ... 
725      1
709      1
661      1
629      1
887      1
Name: cnt, Length: 869, dtype: int64
In [22]:
y = (y / 100)  # scale the target down by a factor of 100 (cnt itself goes well above 100, e.g. 887 in the counts above)
#print(y.head(3))
print(np.round(y.std(), decimals=2), np.round(y.mean(), decimals=2))
1.82 1.9
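Since the target is divided by a constant here, predictions have to be multiplied by the same constant to get back to rentals. One way to keep both directions consistent is a single named constant; TARGET_SCALE and the two helpers below are hypothetical names for this sketch.

```python
TARGET_SCALE = 100.0   # assumption: the same factor as in y / 100 above

def to_model_units(rentals):
    """Scale a rental count down to the units the network is trained on."""
    return rentals / TARGET_SCALE

def to_rentals(model_value):
    """Scale a model output back to a rental count."""
    return model_value * TARGET_SCALE

scaled = to_model_units(16)     # first cnt value in the data
print(scaled)                   # 0.16
print(to_rentals(scaled))
```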

Converting the input and output data to tensors

In [23]:
import numpy as np

#X = X.values       # not needed after scaling: X is already a NumPy array, so .values would fail
X = torch.tensor(X)
print(X[:3])
tensor([[-1.7320, -1.3663, -1.0002, -1.6087, -1.6694, -0.1726,  0.9965, -1.4710,
         -0.6633, -1.3437, -1.1028,  0.9456, -1.5557, -1.6087,  1.7030],
        [-1.7318, -1.3663, -1.0002, -1.6087, -1.5247, -0.1726,  0.9965, -1.4710,
         -0.6633, -1.4477, -1.1914,  0.8938, -1.5557, -1.6087,  1.7030],
        [-1.7316, -1.3663, -1.0002, -1.6087, -1.3801, -0.1726,  0.9965, -1.4710,
         -0.6633, -1.4477, -1.1914,  0.8938, -1.5557, -1.6087,  1.7030]],
       dtype=torch.float64)
In [24]:
X = X.type(torch.FloatTensor)
print(X[:3])
tensor([[-1.7320, -1.3663, -1.0002, -1.6087, -1.6694, -0.1726,  0.9965, -1.4710,
         -0.6633, -1.3437, -1.1028,  0.9456, -1.5557, -1.6087,  1.7030],
        [-1.7318, -1.3663, -1.0002, -1.6087, -1.5247, -0.1726,  0.9965, -1.4710,
         -0.6633, -1.4477, -1.1914,  0.8938, -1.5557, -1.6087,  1.7030],
        [-1.7315, -1.3663, -1.0002, -1.6087, -1.3801, -0.1726,  0.9965, -1.4710,
         -0.6633, -1.4477, -1.1914,  0.8938, -1.5557, -1.6087,  1.7030]])
In [25]:
y = y.values   # convert the pandas Series to a NumPy array (this fails if y is already an array)
In [26]:
y = torch.tensor(y)
print(y[:3])
tensor([0.1600, 0.4000, 0.3200], dtype=torch.float64)

Reshaping the result vector so that it becomes a column

y = y.view(y.shape[0],1)
y[:5]

In [27]:
y = y.type(torch.FloatTensor)
In [28]:
print('X:',X.shape)
print('y:',y.shape)
X: torch.Size([17293, 15])
y: torch.Size([17293])

Adding one dimension to the result vector

In [29]:
y = y.view(y.shape[0],1)
y.shape
Out[29]:
torch.Size([17293, 1])

Split into a test set and a training set

In [30]:
a,b = X.shape
a

total_records = a
test_records = int(a * .2)

X_train = X[:total_records-test_records]
X_test = X[total_records-test_records:total_records]

y_train = y[:total_records-test_records]
y_test = y[total_records-test_records:total_records]
In [31]:
print('X_train: ',X_train.shape)
print('X_test:  ',X_test.shape)
print('----------------------------------------------------')
print('y_train: ',y_train.shape)
print('y_test:  ',y_test.shape)
X_train:  torch.Size([13835, 15])
X_test:   torch.Size([3458, 15])
----------------------------------------------------
y_train:  torch.Size([13835, 1])
y_test:   torch.Size([3458, 1])

Defining the neural network

Programming torch.nn.Module
In [32]:
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer

    def forward(self, x):
        x = torch.nn.functional.relu(self.hidden(x))      # activation function for hidden layer (F is a plain number in this notebook, not torch.nn.functional)
        x = self.predict(x)             # linear output
        return x
Defining the shape of the network
In [33]:
N, D_in = X.shape
N, D_out = y.shape

H = 100
device = torch.device('cpu')
In [34]:
net = torch.nn.Sequential(
        torch.nn.Linear(D_in,  H),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(H, H),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(H, D_out),
    ).to(device)  
In [35]:
net(X_train)
Out[35]:
tensor([[-0.1001],
        [-0.0887],
        [-0.0803],
        ...,
        [-0.0541],
        [-0.0833],
        [-0.0890]], grad_fn=<AddmmBackward>)

Optimization algorithm (optimizer)

lr: learning rate -> how strongly the model updates its weights each time backpropagation is carried out
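The effect of lr can be seen on a toy problem without any network. The sketch below runs plain gradient descent on f(w) = (w - 3)**2; it is not the Adam optimizer used in this notebook, only an illustration of step size.

```python
def gradient_descent(lr, steps=20, start=0.0):
    """Minimize f(w) = (w - 3)**2, whose gradient is 2 * (w - 3)."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w = w - lr * grad          # one update: step against the gradient
    return w

print(gradient_descent(lr=0.1))   # ends close to the minimum at 3
print(gradient_descent(lr=1.1))   # overshoots further on every step
```

With lr=0.1 the distance to the minimum shrinks by a factor of 0.8 per step; with lr=1.1 it grows by a factor of 1.2 per step, so the run diverges.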

In [36]:
#optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0, dampening=0, weight_decay=0, nesterov=False) #-2.401
#optimizer = torch.optim.SGD(net.parameters(), lr=0.1) #-4.086
optimizer = torch.optim.Adam(net.parameters(), lr=0.01) #-5.298
#optimizer = torch.optim.Adamax(net.parameters(), lr=0.01) #-6.610
#optimizer = torch.optim.ASGD(net.parameters(), lr=0.01, lambd=0.0001, alpha=0.15, t0=000000.0) #-2.315
#optimizer = torch.optim.LBFGS(net.parameters(), lr=0.01, max_iter=20, max_eval=None, tolerance_grad=1e-05, tolerance_change=1e-09, history_size=100, line_search_fn=None)
#optimizer = torch.optim.RMSprop(net.parameters(), lr=0.01, alpha=0.99, eps=1e-08) #-5.152
#optimizer = torch.optim.Rprop(net.parameters(), lr=0.01, etas=(0.5, 1.2), step_sizes=(1e-06, 50))  #R2:-7.388

Defining the loss function

For regression this is the mean squared error (MSE); R2 is computed separately in the assessment further below.

In [37]:
loss_func = torch.nn.MSELoss()

Defining the training process and training

In [38]:
inputs = X_train                          # 1. declare the x and y used for training
outputs = y_train
for i in range(2000):                          # 2. loop of 2000 iterations (epochs)
   prediction = net(inputs)
   loss = loss_func(prediction, outputs) 
   optimizer.zero_grad()
   loss.backward()        
   optimizer.step()       

   if i % 200 == 0:              # report every 200 iterations, matching the output below
       print(i, loss.item())     # the loss function returns a Tensor holding the loss; .item() extracts its value
0 5.6923956871032715
200 0.27917081117630005
400 0.12776944041252136
600 0.10168859362602234
800 0.11591815203428268
1000 0.11488667875528336
1200 0.08809500187635422
1400 0.08998128771781921
1600 0.07167605310678482
1800 0.07469654083251953

If the loss does not fall steadily (above it rises again at iterations 800 and 1000), there are many potential reasons. Exploding gradients are a likely one. The two things to try first:

  • Normalize the inputs
  • Lower the learning rate

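import matplotlib.pyplot as plt

# assumes `epochs` and `aggregated_losses` (one loss value per epoch) were recorded during training
plt.plot(range(epochs), aggregated_losses)
plt.ylabel('Loss')
plt.xlabel('epoch')
plt.show()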

Forecast based on the model

  • we substitute the same equations that were in the model
  • the loss below is computed with the model as it stands after the last training iteration
  • the loss shows how wrong the model is (loss = sum of squared errors) after the last training iteration
In [39]:
with torch.no_grad():
    y_pred = net(X_test)  
    loss = (y_pred - y_test).pow(2).sum()

    print(f'Loss test_set: {loss:.8f}')
Loss test_set: 3405.68505859
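The loss printed above is a sum over all 3458 test rows, while torch.nn.MSELoss used during training averages over the rows by default, so the two numbers are not directly comparable. A small sketch of the conversion, using the values printed in this notebook; mean_from_sum is a made-up helper name.

```python
def mean_from_sum(sum_squared_error, n_samples):
    """Convert a sum of squared errors into a mean squared error."""
    return sum_squared_error / n_samples

test_mse = mean_from_sum(3405.68505859, 3458)   # sum printed above, number of test rows
print(round(test_mse, 3))  # 0.985
```

Compared with the final training loss of about 0.07, the test MSE of about 0.985 is more than ten times higher.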

Since the output layer contains 1 neuron, each prediction contains 1 value. For example, the first 5 predicted values look like this:

In [40]:
y_pred[:5]
Out[40]:
tensor([[3.9657],
        [4.3590],
        [4.1216],
        [3.7958],
        [3.2563]])

We save the whole model

In [41]:
torch.save(net,'/home/wojciech/Pulpit/7/byk15.pb')

We load the whole model back

In [42]:
KOT = torch.load('/home/wojciech/Pulpit/7/byk15.pb')
KOT.eval()
Out[42]:
Sequential(
  (0): Linear(in_features=15, out_features=100, bias=True)
  (1): LeakyReLU(negative_slope=0.01)
  (2): Linear(in_features=100, out_features=100, bias=True)
  (3): LeakyReLU(negative_slope=0.01)
  (4): Linear(in_features=100, out_features=1, bias=True)
)

By substituting other independent variables, you can get a vector of output variables

We choose a random record from the tensor

In [43]:
y_pred = y_pred*10   # NOTE: y was divided by 100 in cell In [22], so *10 gives tens of rentals, not rentals
foka = y_pred.cpu().detach().numpy()
df11 = pd.DataFrame(foka)
df11.columns = ['y_pred']
df11=np.round(df11.y_pred)
df11.head(3)
Out[43]:
0    40.0
1    44.0
2    41.0
Name: y_pred, dtype: float32
In [44]:
y_test = y_test*10   # NOTE: same factor as y_pred, so both stay comparable (tens of rentals)
foka = y_test.cpu().detach().numpy()
df_t = pd.DataFrame(foka)
df_t.columns = ['y']
df_t.head(3)
Out[44]:
y
0 45.000000
1 49.200001
2 49.000000
In [45]:
NOWA = pd.merge(df_t,df11, how='inner', left_index=True, right_index=True)
NOWA.head(3)
Out[45]:
y y_pred
0 45.000000 40.0
1 49.200001 44.0
2 49.000000 41.0
In [46]:
NOWA.to_csv('/home/wojciech/Pulpit/7/NOWA.csv')
In [47]:
fig, ax = plt.subplots( figsize=(16, 2))
for ewa in ['y', 'y_pred']:
    ax.plot(NOWA[ewa], label=ewa)   # plot one column per pass instead of the whole frame twice
    
ax.set_xlim(1340, 1500)
#ax.legend()
ax.set_ylabel('Parameter')
ax.set_title('COURSE OF THE PROJECTING PROCESS ON THE TEST SET')
Out[47]:
Text(0.5, 1.0, 'COURSE OF THE PROJECTING PROCESS ON THE TEST SET')
In [48]:
## margins
plt.subplots_adjust( left = None , bottom = None , right = None , top = None , wspace = None , hspace = None )
plt.figure(figsize=(16,5))
ax = plt.subplot(1, 2, 1)
NOWA.plot.kde(ax=ax, legend=True, title='Histogram: y vs. y_pred')
NOWA.plot.hist(density=True,bins=40, ax=ax, alpha=0.3)
ax.set_title("Distributions")

ax = plt.subplot(1, 2, 2)
sns.boxplot(data = NOWA)
plt.xticks(rotation=-90)
ax.set_title("Boxes")


sns.lmplot(data=NOWA, x='y', y='y_pred')
Out[48]:
<seaborn.axisgrid.FacetGrid at 0x7f9df115b2d0>
<Figure size 432x288 with 0 Axes>

Regression_Assessment

In [49]:
## Evaluates a single target variable only

def Regression_Assessment(y, y_pred):
    
    from sklearn.metrics import r2_score 
    import scipy.stats as stats
    from statsmodels.graphics.gofplots import qqplot
    from matplotlib import pyplot

    # The printed texts are kept as they appear in the outputs below.
    # The '> 0.01 (We do not reject H0)' endings are reconstructed by analogy
    # with the '<0.01 (We reject H0)' endings visible in those outputs.
    line = '--------------------------------------------------------------------------------------------'
       
    print('-----two methods--------------')
    SS_Residual = sum((y-y_pred)**2)       
    SS_Total = sum((y-np.mean(y))**2)     
    r_squared = 1 - (float(SS_Residual))/SS_Total
    adjusted_r_squared = 1 - (1-r_squared)*(len(y)-1)/(len(y)-X.shape[1]-1)   # uses the global X
    print('r2_score:           %0.3f' % r_squared)
    #print('adjusted_r_squared: %0.3f' % adjusted_r_squared)
    #print('----r2_score------second-method--------')  
    print('r2_score:           %0.3f' % r2_score(y, y_pred))
    print()
    print('-------------------------------')
    MAE = (abs(y-y_pred)).mean()
    print('Mean absolute error     MAE:  %0.2f ' % MAE)
    RMSE = np.sqrt(((y-y_pred)**2).mean())
    print('Root mean squared error RMSE: %0.2f ' % RMSE)
    pt = (100*(y-y_pred))/y
    MAPE = (abs(pt)).mean()       # mean absolute percentage error
    print('Mean absolute error     MAPE: %0.2f ' % MAPE)
    print('-------------------------------')
    
    # t-test H0: the mean of the model residuals is zero
    stat,pvalue0 = stats.ttest_1samp(a=(y-y_pred),popmean=0.0)

    if pvalue0 > 0.01:
        print('t-test H0: the sum of the model residuals is zero')
        print('OKAY! Model remains do not differ from zero - pvalue: %0.4f > 0.01 (We do not reject H0)' % pvalue0)
    else:     
        print('Bad - Model remains DIFFERENT FROM ZERO - pvalue: %0.4f <0.01 (We reject H0)' % pvalue0)
    print(line) 
       
    stat,pvalue2_1 = stats.shapiro(y)
    stat,pvalue2_2 = stats.shapiro(y_pred)

    if pvalue2_1 > 0.01:
        #print('Shapiro-Wilk H0: y have normal distribution?--------------------------------')
        print('OK Shapiro-Wilk! y have normal distribution - pvalue: %0.4f > 0.01 (We do not reject H0)' % pvalue2_1)
    else:     
        print('Bad Shapiro-Wilk - y NO NORMAL DISTRIBUTION - pvalue: %0.4f <0.01 (We reject H0)' % pvalue2_1)
    print(line)
    if pvalue2_2 > 0.01:
        #print('Shapiro-Wilk: y_pred have a normal distribution?--')
        print('OK Shapiro-Wilk! y_pred has a normal distribution - pvalue: %0.4f > 0.01 (We do not reject H0)' % pvalue2_2)
    else:     
        print('Bad Shapiro-Wilk y_pred NO NORMAL DISTRIBUTION - pvalue: %0.4f <0.01 (We reject H0)' % pvalue2_2)
    qqplot(y, line='s')
    pyplot.show()

    qqplot(y_pred, line='s')
    pyplot.show()
       
    print(line)
        
    stat,pvalue3 = stats.kruskal(y_pred,y)
    stat,pvalue4 = stats.f_oneway(y_pred,y)

    if pvalue2_1 < 0.01 or pvalue2_2 < 0.01:
        # In English: "Shapiro-Wilk: the variables are not normally distributed! Cannot run the ANOVA analysis"
        print('Шапиро-Вилк: Переменные не имеют нормального распределения! Не могу сделать анализ ANOV')
     
        if pvalue3 > 0.01:
            print('Kruskal-Wallis NON-PARAMETRIC TEST: whether empirical forecast and observations. have equal means?')
            print('OKAY! Kruskal-Wallis H0: forecast and observations empir. have equal means - pvalue: %0.4f > 0.01 (We do not reject H0)' % pvalue3)
        else:     
            print('Bad - Kruskal-Wallis: forecast and observations empir. DO NOT HAVE EQUAL Averages - pvalue: %0.4f <0.01 (We reject H0)' % pvalue3)
    else:

        if pvalue4 > 0.01:
            print('F-test (ANOVA): whether empirical forecast and observations. have equal means?--------------------------------')
            print('OKAY! forecast and observations empir. have equal means - pvalue: %0.4f > 0.01 (We do not reject H0)' % pvalue4)
        else:     
            print('Bad - forecast and observations empir. DO NOT HAVE EQUAL Averages - pvalue: %0.4f <0.01 (We reject H0)' % pvalue4)
    print(line)
In [50]:
y = NOWA['y']
y_pred = NOWA['y_pred']

Regression_Assessment(y, y_pred)
-----two methods--------------
r2_score:           0.799
r2_score:           0.799

-------------------------------
Mean absolute error     MAE:  6.95 
Root mean squared error RMSE: 9.93 
Mean absolute error     MAPE: 91.06 
-------------------------------
Bad - Model remains DIFFERENT FROM ZERO - pvalue: 0.0000 <0.01 (We reject H0)
--------------------------------------------------------------------------------------------
Bad Shapiro-Wilk - y NO NORMAL DISTRIBUTION - pvalue: 0.0000 <0.01 (We reject H0)
--------------------------------------------------------------------------------------------
Bad Shapiro-Wilk y_pred NO NORMAL DISTRIBUTION - pvalue: 0.0000 <0.01 (We reject H0)
--------------------------------------------------------------------------------------------
Шапиро-Вилк: Переменные не имеют нормального распределения! Не могу сделать анализ ANOV
Bad - Kruskal-Wallis: forecast and observations empir. DO NOT HAVE EQUAL Averages - pvalue: 0.0000 <0.01 (We reject H0)
--------------------------------------------------------------------------------------------
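The report above combines a fairly high R2 (0.799) with a very large MAPE (91.06). A dependency-free sketch of the four metrics on made-up numbers shows how this can happen: MAPE divides each error by the observed value, so rows with a small observed y dominate it. The function name regression_metrics and the sample data are assumptions for this illustration.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Return R2, MAE, RMSE and MAPE (in percent) for two equally long lists."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "rmse": sqrt(ss_res / n),
        "mape": 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
    }

# made-up data: the first observation is small, so its relative error is huge
metrics = regression_metrics([1.0, 10.0, 20.0, 30.0], [3.0, 11.0, 19.0, 31.0])
for name, value in metrics.items():
    print(name, round(value, 3))
```

Here R2 is about 0.985 while MAPE is about 54.6, almost all of it from the first row, where an error of 2 is 200% of the observed value. Hourly rental counts close to zero have the same effect.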

The tank: Super Test in combat conditions!


In [51]:
print(X_SuperT.shape)
X_SuperT.head(3)
(86, 15)
Out[51]:
instant season yr mnth hr holiday weekday workingday weathersit temp atemp hum windspeed month weekofyear
17293 17294 1 1 12 10 0 4 1 2 0.26 0.2424 0.56 0.2537 12 52
17294 17295 1 1 12 11 0 4 1 2 0.28 0.2727 0.52 0.2239 12 52
17295 17296 1 1 12 12 0 4 1 2 0.30 0.3030 0.49 0.1343 12 52
In [52]:
y_SuperT.head(3)
Out[52]:
17293    162
17294    178
17295    222
Name: cnt, dtype: int64
In [53]:
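from sklearn.preprocessing import StandardScaler

# NOTE: this fits a new scaler on the 86 hold-out rows, so they are scaled with
# different means and variances than the training data; calling transform() on the
# scaler fitted on X in cell In [20] would keep the features comparable
sc = StandardScaler()
X_SuperT = sc.fit_transform(X_SuperT)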

print(np.round(X_SuperT.std(), decimals=2), np.round(X_SuperT.mean(), decimals=2))
0.82 -0.0
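Scaling the hold-out rows with their own statistics changes what a value means to the model. The sketch below contrasts reusing the training statistics with refitting on the hold-out rows; it uses made-up hour values and plain Python in place of StandardScaler, and the helper names are hypothetical.

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation of one column."""
    return fmean(column), pstdev(column)

def apply_scaler(column, mean, std):
    """Scale a column with statistics learned elsewhere."""
    return [(value - mean) / std for value in column]

train_hours = [0, 6, 12, 18, 23]     # made-up training values
holdout_hours = [10, 11, 12]         # made-up hold-out values

mean, std = fit_scaler(train_hours)
reused = apply_scaler(holdout_hours, mean, std)                       # training statistics
refit = apply_scaler(holdout_hours, *fit_scaler(holdout_hours))       # hold-out statistics

print([round(value, 3) for value in reused])
print([round(value, 3) for value in refit])
```

With refitting, hour 11 becomes exactly 0 and hours 10 and 12 look like extremes, although in training terms all three are ordinary mid-day values. Columns that are constant within the hold-out rows cannot be scaled from those rows at all, which is a likely reason for the 0.0000 entries in the tensor printed below.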
In [54]:
X_SuperT = torch.tensor(X_SuperT)
X_SuperT = X_SuperT.type(torch.FloatTensor)
print(X_SuperT[:3])
tensor([[-1.7120,  0.0000,  0.0000,  0.0000, -0.3405,  0.0000,  0.1161,  1.1239,
          0.6165,  0.3527,  0.1062, -0.2904,  0.3681,  0.0000,  0.6222],
        [-1.6717,  0.0000,  0.0000,  0.0000, -0.1934,  0.0000,  0.1161,  1.1239,
          0.6165,  0.8419,  0.9404, -0.5800,  0.1717,  0.0000,  0.6222],
        [-1.6315,  0.0000,  0.0000,  0.0000, -0.0462,  0.0000,  0.1161,  1.1239,
          0.6165,  1.3312,  1.7746, -0.7971, -0.4189,  0.0000,  0.6222]])
In [55]:
y_SuperT = (y_SuperT / 100)  # same factor of 100 as used for the training target
#print(y.head(3))
print(np.round(y_SuperT.std(), decimals=2), np.round(y_SuperT.mean(), decimals=2))
0.77 0.97
In [56]:
y_SuperT = y_SuperT.values
y_SuperT = torch.tensor(y_SuperT)
y_SuperT = y_SuperT.view(y_SuperT.shape[0],1)
y_SuperT.shape
Out[56]:
torch.Size([86, 1])
In [57]:
print('X_SuperT:',X_SuperT.shape)
print('y_SuperT:',y_SuperT.shape)
X_SuperT: torch.Size([86, 15])
y_SuperT: torch.Size([86, 1])
In [58]:
with torch.no_grad():
    y_predST = net(X_SuperT)  
    loss = (y_predST - y_SuperT).pow(2).sum()

    print(f'Loss super_test_set: {loss:.8f}')
Loss super_test_set: 232.59281707
In [59]:
y_predST = y_predST*100
foka = y_predST.cpu().detach().numpy()
df11 = pd.DataFrame(foka)
df11.columns = ['y_predST']
df11=np.round(df11.y_predST)
df11.head(3)
Out[59]:
0    157.0
1    132.0
2    105.0
Name: y_predST, dtype: float32
In [60]:
y_SuperT = y_SuperT*100
y_SuperT = np.round(y_SuperT)
foka = y_SuperT.cpu().detach().numpy()
df_t = pd.DataFrame(foka)
df_t.columns = ['y_ST']
df_t.head(3)
Out[60]:
y_ST
0 162.0
1 178.0
2 222.0
In [61]:
Super_NOWA = pd.merge(df_t,df11, how='inner', left_index=True, right_index=True)
Super_NOWA.head(3)
Out[61]:
y_ST y_predST
0 162.0 157.0
1 178.0 132.0
2 222.0 105.0
In [62]:
fig, ax = plt.subplots( figsize=(16, 2))
for ewa in ['y_ST', 'y_predST']:
    ax.plot(Super_NOWA[ewa], label=ewa)   # plot one column per pass instead of the whole frame twice
    
#ax.set_xlim(1340, 1500)
#ax.legend()
ax.set_ylabel('Parameter')
ax.set_title('COURSE OF THE PROJECTING PROCESS ON THE TEST SET')
Out[62]:
Text(0.5, 1.0, 'COURSE OF THE PROJECTING PROCESS ON THE TEST SET')
In [63]:
## margins
plt.subplots_adjust( left = None , bottom = None , right = None , top = None , wspace = None , hspace = None )
plt.figure(figsize=(16,5))
ax = plt.subplot(1, 2, 1)
Super_NOWA.plot.kde(ax=ax, legend=True, title='Histogram: y vs. y_pred')
Super_NOWA.plot.hist(density=True,bins=40, ax=ax, alpha=0.3)
ax.set_title("Distributions")

ax = plt.subplot(1, 2, 2)
sns.boxplot(data = Super_NOWA)
plt.xticks(rotation=-90)
ax.set_title("Boxes")


sns.lmplot(data=Super_NOWA, x='y_ST', y='y_predST')
Out[63]:
<seaborn.axisgrid.FacetGrid at 0x7f9df0d58750>
<Figure size 432x288 with 0 Axes>
In [64]:
y = Super_NOWA['y_ST']
y_pred = Super_NOWA['y_predST']

Regression_Assessment(y, y_pred)
-----two methods--------------
r2_score:           -3.593
r2_score:           -3.593

-------------------------------
Mean absolute error     MAE:  114.26 
Root mean squared error RMSE: 164.43 
Mean absolute error     MAPE: 192.44 
-------------------------------
Bad - Model remains DIFFERENT FROM ZERO - pvalue: 0.0000 <0.01 (We reject H0)
--------------------------------------------------------------------------------------------
Bad Shapiro-Wilk - y NO NORMAL DISTRIBUTION - pvalue: 0.0001 <0.01 (We reject H0)
--------------------------------------------------------------------------------------------
Bad Shapiro-Wilk y_pred NO NORMAL DISTRIBUTION - pvalue: 0.0000 <0.01 (We reject H0)
--------------------------------------------------------------------------------------------
Шапиро-Вилк: Переменные не имеют нормального распределения! Не могу сделать анализ ANOV
Bad - Kruskal-Wallis: forecast and observations empir. DO NOT HAVE EQUAL Averages - pvalue: 0.0044 <0.01 (We reject H0)
--------------------------------------------------------------------------------------------

The result is poor: methods for countering overfitting need to be implemented!
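One commonly used countermeasure is early stopping: track the loss on a validation set during training and keep the epoch where it was lowest. The sketch below shows only the stopping rule, on made-up validation losses; it is not wired into the training loop of this notebook, and early_stopping_epoch and patience are names chosen for the illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the best epoch seen before `patience` epochs pass without improvement."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss      # new best: remember it
        elif epoch - best_epoch >= patience:
            break                                    # no improvement for `patience` epochs
    return best_epoch

val_losses = [0.90, 0.60, 0.45, 0.47, 0.50, 0.55, 0.40]   # made-up values
print(early_stopping_epoch(val_losses))  # 2
```

The run stops at epoch 5 and reports epoch 2 as the best one; the later value 0.40 is never reached, which is the trade-off a small patience makes.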

[Figure: Mean absolute error (MAE) and RMSE]

[Figure: Percentage errors (MAPE)]