### Feature Selection Techniques – Recursive Feature Elimination and cross-validated selection (RFECV)

March 30, 2020

300320202100 RFECV differs from Recursive Feature Elimination (RFE) in the feature selection process in that it indicates the OPTIMAL NUMBER OF VARIABLES rather than the […]
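The excerpt is truncated, but the idea can be sketched in a few lines. A minimal, self-contained example of RFECV on synthetic data (the dataset and estimator here are assumptions, not the post's own):

```python
# RFECV: recursively eliminate features, using cross-validation to pick
# the optimal number of features automatically (assumed synthetic setup).
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)
selector = RFECV(LogisticRegression(max_iter=1000), step=1, cv=5)
selector.fit(X, y)
print("Optimal number of features:", selector.n_features_)
print("Selected mask:", selector.support_)
```

Unlike plain RFE, no `n_features_to_select` is passed: the cross-validation scores decide where to stop eliminating.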

### Feature Selection Techniques – Embedded Method (Lasso)

March 30, 2020

300320202027 Embedded methods are iterative in the sense that they take care of each iteration of the model training process and carefully extract those features which […]
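A minimal sketch of the embedded approach with Lasso, where the L1 penalty drives uninformative coefficients to zero during training (synthetic data assumed, not the post's dataset):

```python
# Embedded selection: Lasso's L1 penalty zeroes out weak coefficients,
# and SelectFromModel keeps only the features with non-zero weight.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
lasso = Lasso(alpha=1.0).fit(X, y)
selector = SelectFromModel(lasso, prefit=True)
mask = selector.get_support()
print("Features kept:", mask.sum())
```

The `alpha` value is arbitrary here; in practice it would be tuned (e.g. with `LassoCV`).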

### Feature Selection Techniques – Recursive Feature Elimination (RFE)

March 30, 2020

300320201719 It is a greedy optimization algorithm which aims to find the best-performing feature subset. It repeatedly creates models and sets aside the best […]
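A short self-contained sketch of RFE with a fixed target number of features (the estimator and data here are assumptions for illustration):

```python
# RFE: repeatedly fit a model and drop the weakest feature until
# n_features_to_select remain (greedy elimination).
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           random_state=0)
rfe = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=3)
rfe.fit(X, y)
print(rfe.ranking_)  # rank 1 marks the selected features
```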

### Feature Selection Techniques – Backward Elimination

March 30, 2020

300320201313 In backward elimination, we start with all the features and remove the least significant feature at each iteration, which improves the performance of the […]
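One cross-validated variant of backward elimination can be sketched with scikit-learn's `SequentialFeatureSelector` (the classic version drops the feature with the worst p-value instead; the data here is synthetic, not the post's):

```python
# Backward elimination (CV variant): start with all features and greedily
# remove the one whose removal hurts cross-validated score the least.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           random_state=0)
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=3,
                                direction='backward', cv=3)
sfs.fit(X, y)
print(sfs.get_support())
```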

### Feature Selection Techniques – Step Forward Selection

March 30, 2020

300320201248 Forward selection is an iterative method in which we start with no features in the model. In each iteration, we add the feature that […]
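The mirror image of backward elimination; a minimal sketch using scikit-learn's `SequentialFeatureSelector` with `direction='forward'` (synthetic data assumed):

```python
# Step-forward selection: start empty and greedily add the feature
# that most improves the cross-validated score.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           random_state=0)
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=3,
                                direction='forward', cv=3)
sfs.fit(X, y)
print(sfs.get_support())
```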

### Feature Selection Techniques – Variance Inflation Factor (VIF)

March 29, 2020

290320202006 Collinearity is the state where two variables are highly correlated and contain similar information about the variance within a given dataset. The Variance Inflation […]
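The VIF for feature *i* is 1 / (1 − R²), where R² comes from regressing feature *i* on all the others. A minimal sketch computing it from scratch on synthetic data (an assumed setup; `statsmodels` also ships a ready-made `variance_inflation_factor`):

```python
# VIF by hand: regress each column on the rest; a high VIF (often > 5-10)
# flags collinearity. x2 is built to be nearly a copy of x1.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # highly collinear with x1
x3 = rng.normal(size=200)                  # independent
X = np.column_stack([x1, x2, x3])

def vif(X, i):
    # R^2 of regressing column i on the remaining columns
    others = np.delete(X, i, axis=1)
    r2 = LinearRegression().fit(others, X[:, i]).score(others, X[:, i])
    return 1.0 / (1.0 - r2)

vifs = [vif(X, i) for i in range(X.shape[1])]
print([round(v, 1) for v in vifs])
```

The collinear pair should show very large VIFs, while the independent column stays near 1.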

### Feature Selection Techniques – Pearson correlation

March 29, 2020

290320201454 In [1]: import numpy as np import pandas as pd import seaborn as sns import matplotlib.pyplot as plt from sklearn.preprocessing import LabelEncoder, OneHotEncoder import warnings […]
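The notebook excerpt above is cut off; a minimal self-contained version of Pearson-correlation filtering (synthetic columns assumed, and the 0.5 threshold is an arbitrary choice):

```python
# Pearson correlation filter: keep features whose absolute correlation
# with the target column exceeds a chosen threshold.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.normal(size=100)})
df['b'] = 2 * df['a'] + rng.normal(scale=0.1, size=100)  # target, driven by 'a'
df['c'] = rng.normal(size=100)                           # pure noise

corr = df.corr(method='pearson')
print(corr.round(2))
selected = corr.index[corr['b'].abs() > 0.5].tolist()
print(selected)
```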

### Feature Selection Techniques (by filter methods): numerical input, categorical output

March 28, 2020

280320200940 Source of data: https://archive.ics.uci.edu/ml/datasets/Air+Quality In this case, statistical methods are used: We always have continuous and discrete variables in the data set. This procedure […]
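For numerical inputs and a categorical output, the usual filter statistic is the ANOVA F-test. A minimal sketch with `SelectKBest` on synthetic data (the Air Quality dataset from the post is not reproduced here):

```python
# Filter method for numerical input / categorical output:
# ANOVA F-test (f_classif) scores each feature against the class labels.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=300, n_features=6, n_informative=2,
                           random_state=0)
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print(selector.scores_.round(1))
print(selector.get_support())
```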

### Feature Selection Techniques (by filter methods): categorical input, categorical output

March 26, 2020

260320201223 In this case, statistical methods are used: we always have continuous and discrete variables in the data set. This […]
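For categorical input and categorical output, the standard filter statistic is the chi-squared test of independence on a contingency table. A minimal sketch with `scipy` (the categorical feature here is synthetic and constructed to depend on the class):

```python
# Chi-squared filter for categorical input / categorical output:
# build a contingency table of feature level vs. class and test independence.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300)                      # 2-class target
informative = (y + rng.integers(0, 2, size=300)) % 3  # depends on the class
noise = rng.integers(0, 3, size=300)                  # independent of the class

pvals = {}
for name, feat in [('informative', informative), ('noise', noise)]:
    table = np.zeros((3, 2))
    for f, c in zip(feat, y):
        table[f, c] += 1
    _, p, _, _ = chi2_contingency(table)
    pvals[name] = p
print(pvals)
```

A tiny p-value for the informative feature indicates dependence on the class; the noise feature typically does not reach significance.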

### Perfect model: Random forest classifier (1)

March 23, 2020

Part 1: Determining the depth of trees using visualization. 230320201052 In [1]: import numpy as np import matplotlib.pyplot as plt import seaborn as […]
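The depth comparison the post visualizes can be sketched numerically: sweep `max_depth` and compare cross-validated scores (synthetic data and the depth grid are assumptions; the post's own notebook plots these instead):

```python
# Sweep max_depth for a random forest and compare cross-validated accuracy;
# plotting scores vs. depth would reproduce the post's visualization.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
scores = {d: cross_val_score(RandomForestClassifier(max_depth=d, random_state=0),
                             X, y, cv=3).mean()
          for d in [2, 4, 8, None]}
print(scores)
```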

### How to use PCA in logistic regression?

March 23, 2020

230320200907 Principal component analysis (PCA) https://jakevdp.github.io/PythonDataScienceHandbook/05.08-random-forests.html https://www.geeksforgeeks.org/principal-component-analysis-with-python/ In [1]: import pandas as pd import numpy as np import seaborn as sns import matplotlib.pyplot as plt df= […]
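A minimal sketch of the combination: PCA as a dimensionality-reduction step feeding a logistic regression, chained in a pipeline (the iris dataset and two components are assumptions for illustration, not the post's setup):

```python
# PCA -> logistic regression: project onto 2 principal components,
# then classify on the reduced representation.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(PCA(n_components=2), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
print("Test accuracy:", model.score(X_te, y_te))
```

Putting PCA inside the pipeline ensures the projection is fitted on the training split only, avoiding leakage into the test set.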

### Part. 2 How to improve the classification model? Principal component analysis (PCA)

March 20, 2020

200320200904 In this case, the method did not improve the model. However, there are models for which the PCA method is a very important […]

### Feature Selection Techniques – Random Forest Classifier

March 20, 2020

200320200724 In [1]: import pandas as pd df = pd.read_csv('/home/wojciech/Pulpit/1/kaggletrain.csv') df = df.dropna(how='any') df.dtypes Out[1]: Unnamed: 0 int64 PassengerId int64 Survived int64 Pclass int64 Name […]
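The technique behind this post is ranking features by a random forest's impurity-based importances. A minimal self-contained sketch (synthetic data in place of the Titanic CSV the post loads):

```python
# Feature selection via random forest: feature_importances_ sums to 1
# and ranks features by how much they reduce impurity across the trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=6, n_informative=2,
                           n_redundant=0, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = rf.feature_importances_
print("Ranking (best first):", np.argsort(importances)[::-1])
```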

### Kruskal–Wallis Tests

March 16, 2020

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html Unless you have a large sample size and can clearly demonstrate that your data are normal, you should routinely use Kruskal–Wallis; they think it […]
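The linked `scipy.stats.kruskal` function takes the groups as separate sample arrays; a minimal sketch on synthetic groups (the shifted third group is an assumption for illustration):

```python
# Kruskal-Wallis H-test: non-parametric test that the population medians
# of all groups are equal; no normality assumption needed.
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
g1 = rng.normal(0.0, 1.0, size=50)
g2 = rng.normal(0.0, 1.0, size=50)  # same distribution as g1
g3 = rng.normal(2.0, 1.0, size=50)  # shifted median

stat, p = kruskal(g1, g2, g3)
print("H =", round(stat, 2), "p =", p)
```

A small p-value rejects the hypothesis of equal medians, driven here by the shifted third group.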