(B29) CatBoostClassifier #2

060420202152

https://github.com/catboost/tutorials/blob/master/python_tutorial.ipynb

CatBoostClassifier sam koduje sobie zmienne tekstowe kategoryczne na zmienne kategoryczne wyrażone numerycznie. Jeżeli sami przeprowadzimy codowanie i zakodujemy zmienne kategoryczne na format cyfrowy, wyniki naszych modeli będą takie same (przynajmniej takie jest moje doświadczenie). Aby przeprowadzić eksperyment i przetestować model CatBoostClassifier bez wskazania na zmienne kategoryczne (cat_features) oraz ze wskazaniem na zmienne musimy sami zakodoać tekstowe zmienne kategoryczne na format cyfrowy. W przeciwnym razie gdy będziemy mieli zmienne tekstowe a nie wskarzemy CatBoostClassifier że to zmienne kategoryczne, wyskoczy nam błąd.

In [1]:
##  colorful prints
def black(text):
     print('33[30m', text, '33[0m', sep='')  
def red(text):
     print('33[31m', text, '33[0m', sep='')  
def green(text):
     print('33[32m', text, '33[0m', sep='')  
def yellow(text):     
     print('33[33m', text, '33[0m', sep='')  
def blue(text):
     print('33[34m', text, '33[0m', sep='') 
def magenta(text):
     print('33[35m', text, '33[0m', sep='')  
def cyan(text):
     print('33[36m', text, '33[0m', sep='')  
def gray(text):
     print('33[90m', text, '33[0m', sep='')

1.2 Załadowanie danych

inny sposób na załadowanie tych samych danych o Tytaniku.

In [2]:
from catboost.datasets import titanic
import numpy as np
import pandas as pd

train_df, test_df = titanic()

train_df.head()
Out[2]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th… female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S

Sprawdzam kompletność zbioru

metoda pokazuje tylko te zmienna, w których brakuje danych.

In [3]:
null_value_stats = train_df.isnull().sum(axis=0)
null_value_stats[null_value_stats != 0]
Out[3]:
Age         177
Cabin       687
Embarked      2
dtype: int64

W miejcu gdzie były puste rekordy wstawiana jest wartość -777

In [4]:
train_df.fillna(-777, inplace=True)
train_df.fillna(-777, inplace=True)

Dzielimy na zmienne opisujące i wynikowe

In [5]:
X = train_df.drop('Survived', axis=1)
y = train_df.Survived

Szukamy zmiennych kategorycznych

Zostały wybrane takie kolumny jako kolumny zmiennych kategorycznych.

In [6]:
print(X.dtypes)

categorical_features_indices = np.where(X.dtypes != np.float)[0]
PassengerId      int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
In [7]:
categorical_features_indices
Out[7]:
array([ 0,  1,  2,  3,  5,  6,  7,  9, 10])

Wyświetlamy co to za kolumny

In [8]:
PPS = categorical_features_indices

KOT_MIC = dict(zip(train_df, PPS))
KOT_sorted_keys_MIC = sorted(KOT_MIC, key=KOT_MIC.get, reverse=True)

for r in KOT_sorted_keys_MIC:
    print (r, KOT_MIC[r])
Ticket 10
Parch 9
SibSp 7
Age 6
Sex 5
Name 3
Pclass 2
Survived 1
PassengerId 0

Można też użyć mojego sposobu na identyfikację zmiennych kategorycznych. Tutaj mamy nazwiska i kabiny więc ten sposób idetyfikacji zmiennych kategorycznych nie będzie właściwy.

In [9]:
import numpy as np

categorical_fuX = np.where(train_df.nunique() <8) [0]
categorical_fuX
Out[9]:
array([ 1,  2,  4,  6,  7, 11])

Dzielimy zbiór na zbiory treningowe i testowe

In [10]:
from sklearn.model_selection import train_test_split

X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size=0.75, random_state=42)
In [11]:
X_train.head(3)
Out[11]:
PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
298 299 1 Saalfeld, Mr. Adolphe male -777.0 0 0 19988 30.50 C106 S
884 885 3 Sutehall, Mr. Henry Jr male 25.0 0 0 SOTON/OQ 392076 7.05 -777 S
247 248 2 Hamalainen, Mrs. William (Anna) female 24.0 0 2 250649 14.50 -777 S

Poziom zbilansowania zbioru wynikowego

In [12]:
y_train.value_counts(dropna = False, normalize=True).plot(kind='pie')
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f84617d24d0>

2.1 Szkolenie modelowe

Teraz stwórzmy sam model: poszlibyśmy tutaj z parametrami domyślnymi (ponieważ zapewniają one naprawdę dobrą linię bazową prawie przez cały czas), jedyną rzeczą, którą chcielibyśmy tutaj określić, jest parametr custom_loss, ponieważ dałoby to nam możliwość zobaczenia co się dzieje pod względem tego wskaźnika konkurencji – dokładności, a także możliwości obserwowania utraty logów, ponieważ byłoby to bardziej płynne w przypadku zestawu danych o takim rozmiarze.

  • custom_loss metryka użyta podczas szkolenia, wybrane: [„accuracy”] https://catboost.ai/docs/search/?query=%27Accuracy%27
  • random_seed = 42 Losowe nasiona użyte do treningu. Te losowe wartości są za każdym razem takie same.
  • logging_level = ‘Silent’ Poziom logowania, aby przejść do standardowego wyjścia. „Cichy” – nie wysyłaj żadnych danych logowania na standardowe wyjście. „Verbose” – wyślij następujące dane na standardowe wyjście, a następnie pokaże w modelu. Dopasuj całą ścieżkę uczenia się. „Informacje” lub „Debugowanie” – wyświetlanie dodatkowych informacji i liczby drzew.
In [13]:
from catboost import CatBoostClassifier, Pool, cv
from sklearn.metrics import accuracy_score

Zdefiniowanie modelu bez deklarowania zmiennych kategorycznych

Optymalizacja pod kontem powierzchni AUC.

In [14]:
model = CatBoostClassifier(
    custom_loss=['Accuracy'],
    random_seed=42,
    logging_level='Silent'
)
In [15]:
model.fit(
    X_train, y_train,
    cat_features=categorical_features_indices,
    eval_set=(X_validation, y_validation),
#     logging_level='Verbose',  # you can uncomment this for text output
    plot=True
);

‘;
}

this.layout = $(‘

‘ +

‘ +

‘ +
‘ +

Learn’ +
‘ +

Eval’ +

‘ +

‘ +

‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
cvAreaControls +

‘ +

‘ +

‘ +

‘ +

‘ +

‘ +

‘);
$(parent).append(this.layout);

this.addTabEvents();
this.addControlEvents();
};

CatboostIpython.prototype.addTabEvents = function() {
var self = this;

$(‘.catboost-graph__tabs’, this.layout).click(function(e) {
if (!$(e.target).is(‘.catboost-graph__tab:not(.catboost-graph__tab_active)’)) {
return;
}

var id = $(e.target).attr(‘tabid’);

self.activeTab = id;

$(‘.catboost-graph__tab_active’, self.layout).removeClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart_active’, self.layout).removeClass(‘catboost-graph__chart_active’);

$(‘.catboost-graph__tab[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__chart_active’);

self.cleanSeries();

self.redrawActiveChart();
self.resizeCharts();
});
};

CatboostIpython.prototype.addControlEvents = function() {
var self = this;

$(‘#catboost-control-learn’ + this.index, this.layout).click(function() {
self.layoutDisabled.learn = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_learn_disabled’, self.layoutDisabled.learn);

self.redrawActiveChart();
});

$(‘#catboost-control-test’ + this.index, this.layout).click(function() {
self.layoutDisabled.test = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_test_disabled’, self.layoutDisabled.test);

self.redrawActiveChart();
});

$(‘#catboost-control2-clickmode’ + this.index, this.layout).click(function() {
self.clickMode = $(this)[0].checked;
});

$(‘#catboost-control2-log’ + this.index, this.layout).click(function() {
self.logarithmMode = $(this)[0].checked ? ‘log’ : ‘linear’;

self.forEveryLayout(function(layout) {
layout.yaxis = {type: self.logarithmMode};
});

self.redrawActiveChart();
});

var slider = $(‘#catboost-control2-slider’ + this.index),
sliderValue = $(‘#catboost-control2-slidervalue’ + this.index);

$(‘#catboost-control2-smooth’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setSmoothness(enabled ? self.lastSmooth : -1);

slider.prop(‘disabled’, !enabled);
sliderValue.prop(‘disabled’, !enabled);

self.redrawActiveChart();
});

$(‘#catboost-control2-cvstddev’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setStddev(enabled);

self.redrawActiveChart();
});

slider.on(‘input change’, function() {
var smooth = Number($(this).val());

sliderValue.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});

sliderValue.on(‘input change’, function() {
var smooth = Number($(this).val());

slider.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.setTraceVisibility = function(trace, visibility) {
if (trace) {
trace.visible = visibility;
}
};

CatboostIpython.prototype.updateTracesVisibility = function() {
var tracesHash = this.groupTraces(),
traces,
smoothDisabled = this.getSmoothness() === -1,
self = this;

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
traces = tracesHash[train].traces;

if (this.layoutDisabled.traces[train]) {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
} else {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

if (this.hasCVMode) {
if (this.stddevEnabled) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
} else {
self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}

if (smoothDisabled) {
self.filterTracesOne(traces, {smoothed: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘learn’]) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘test’]) {
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}
}
}
};

CatboostIpython.prototype.getSmoothness = function() {
return this.smoothness && this.smoothness > -1 ? this.smoothness : -1;
};

CatboostIpython.prototype.setSmoothness = function(weight) {
if (weight 1) {
return;
}

this.smoothness = weight;
};

CatboostIpython.prototype.setStddev = function(enabled) {
this.stddevEnabled = enabled;
};

CatboostIpython.prototype.redrawActiveChart = function() {
this.chartsToRedraw[this.activeTab] = true;

this.redrawAll();
};

CatboostIpython.prototype.redraw = function() {
if (this.chartsToRedraw[this.activeTab]) {
this.chartsToRedraw[this.activeTab] = false;

this.updateTracesVisibility();
this.updateTracesCV();
this.updateTracesBest();
this.updateTracesValues();
this.updateTracesSmoothness();

this.plotly.redraw(this.traces[this.activeTab].parent);
}

this.drawTraces();
};

CatboostIpython.prototype.addRedrawFunc = function() {
this.redrawFunc = throttle(this.redraw, 400, false, this);
};

CatboostIpython.prototype.redrawAll = function() {
if (!this.redrawFunc) {
this.addRedrawFunc();
}

this.redrawFunc();
};

CatboostIpython.prototype.addPoints = function(parent, data) {
var self = this;

data.chunks.forEach(function(item) {
if (typeof item.remaining_time !== ‘undefined’ && typeof item.passed_time !== ‘undefined’) {
if (!self.timeLeft[data.path]) {
self.timeLeft[data.path] = [];
}

self.timeLeft[data.path][item.iteration] = [item.remaining_time, item.passed_time];
}

[‘test’, ‘learn’].forEach(function(type) {
var sets = self.meta[data.path][type + ‘_sets’],
metrics = self.meta[data.path][type + ‘_metrics’];

for (var i = 0; i ‘ + parameter + ‘ : ‘ + valueOfParameter;
}
}
}
if (!hovertextParametersAdded && type === ‘test’) {
hovertextParametersAdded = true;
trace.hovertext[pointIndex] += self.hovertextParameters[pointIndex];
}
smoothedTrace.x[pointIndex] = pointIndex;
}

if (bestValueTrace) {
bestValueTrace.x[pointIndex] = pointIndex;
bestValueTrace.y[pointIndex] = self.lossFuncs[nameOfMetric];
}

if (launchMode === ‘CV’ && !cvAdded) {
cvAdded = true;

self.getTrace(parent, $.extend({cv_stddev_first: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true}, params));

self.getTrace(parent, $.extend({cv_stddev_first: true, smoothed: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true, smoothed: true}, params));

self.getTrace(parent, $.extend({cv_avg: true}, params));
self.getTrace(parent, $.extend({cv_avg: true, smoothed: true}, params));

if (type === ‘test’) {
self.getTrace(parent, $.extend({cv_avg: true, best_point: true}, params));
}
}
}

self.chartsToRedraw[key.chartId] = true;

self.redrawAll();
}
});
});
};

CatboostIpython.prototype.getLaunchMode = function(path) {
return this.meta[path].launch_mode;
};

CatboostIpython.prototype.getChartNode = function(params, active) {
var node = $(‘

‘);

if (active) {
node.addClass(‘catboost-graph__chart_active’);
}

return node;
};

CatboostIpython.prototype.getChartTab = function(params, active) {
var node = $(‘

‘ + params.name + ‘

‘);

if (active) {
node.addClass(‘catboost-graph__tab_active’);
}

return node;
};

CatboostIpython.prototype.forEveryChart = function(callback) {
for (var name in this.traces) {
if (this.traces.hasOwnProperty(name)) {
callback(this.traces[name]);
}
}
};

CatboostIpython.prototype.forEveryLayout = function(callback) {
this.forEveryChart(function(chart) {
callback(chart.layout);
});
};

CatboostIpython.prototype.getChart = function(parent, params) {
var id = params.id,
self = this;

if (this.charts[id]) {
return this.charts[id];
}

this.addLayout(parent);

var active = this.activeTab === params.id,
chartNode = this.getChartNode(params, active),
chartTab = this.getChartTab(params, active);

$(‘.catboost-graph__charts’, this.layout).append(chartNode);
$(‘.catboost-graph__tabs’, this.layout).append(chartTab);

this.traces[id] = {
id: params.id,
name: params.name,
parent: chartNode[0],
traces: [],
layout: {
xaxis: {
range: [0, Number(this.meta[params.path].iteration_count)],
type: ‘linear’,
tickmode: ‘auto’,
showspikes: true,
spikethickness: 1,
spikedash: ‘longdashdot’,
spikemode: ‘across’,
zeroline: false,
showgrid: false
},
yaxis: {
zeroline: false
//showgrid: false
//hoverformat : ‘.7f’
},
separators: ‘. ‘,
//hovermode: ‘x’,
margin: {l: 38, r: 0, t: 35, b: 30},
autosize: true,
showlegend: false
},
options: {
scrollZoom: false,
modeBarButtonsToRemove: [‘toggleSpikelines’],
displaylogo: false
}
};

this.charts[id] = this.plotly.plot(chartNode[0], this.traces[id].traces, this.traces[id].layout, this.traces[id].options);

chartNode[0].on(‘plotly_hover’, function(e) {
self.updateTracesValues(e.points[0].x);
});

chartNode[0].on(‘plotly_click’, function(e) {
self.updateTracesValues(e.points[0].x, true);
});

return this.charts[id];
};

CatboostIpython.prototype.getTrace = function(parent, params) {
var key = this.getKey(params),
chartSeries = [];

if (this.traces[key.chartId]) {
chartSeries = this.traces[key.chartId].traces.filter(function(trace) {
return trace.name === key.traceName;
});
}

if (chartSeries.length) {
return chartSeries[0];
} else {
this.getChart(parent, {id: key.chartId, name: params.chartName, path: params.path});

var plotParams = {
color: this.getNextColor(params.path, params.smoothed ? 0.2 : 1),
fillsmoothcolor: this.getNextColor(params.path, 0.1),
fillcolor: this.getNextColor(params.path, 0.4),
hoverinfo: params.cv_avg ? ‘skip’ : ‘text+x’,
width: params.cv_avg ? 2 : 1,
dash: params.type === ‘test’ ? ‘solid’ : ‘dot’
},
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
hovertext: [],
hoverinfo: plotParams.hoverinfo,
line: {
width: plotParams.width,
dash: plotParams.dash,
color: plotParams.color
},
mode: ‘lines’,
hoveron: ‘points’,
connectgaps: true
};

if (params.best_point) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
marker: {
width: 2,
color: plotParams.color
},
hovertext: [],
hoverinfo: ‘text’,
mode: ‘markers’,
type: ‘scatter’
};
}

if (params.best_value) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
line: {
width: 1,
dash: ‘dash’,
color: ‘#CCCCCC’
},
mode: ‘lines’,
connectgaps: true,
hoverinfo: ‘skip’
};
}

if (params.cv_stddev_last) {
trace.fill = ‘tonexty’;
}

trace._params.plotParams = plotParams;

this.traces[key.chartId].traces.push(trace);

return trace;
}
};

CatboostIpython.prototype.getKey = function(params) {
var traceName = [
params.train,
params.type,
params.indexOfSet,
(params.smoothed ? ‘smoothed’ : ”),
(params.best_point ? ‘best_pount’ : ”),
(params.best_value ? ‘best_value’ : ”),
(params.cv_avg ? ‘cv_avg’ : ”),
(params.cv_stddev_first ? ‘cv_stddev_first’ : ”),
(params.cv_stddev_last ? ‘cv_stddev_last’ : ”)
].join(‘;’);

return {
chartId: params.chartName,
traceName: traceName,
colorId: params.train
};
};

CatboostIpython.prototype.filterTracesEvery = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] !== trace._params[prop]) {
return false;
}
}
}

return true;
});
};

CatboostIpython.prototype.filterTracesOne = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] === trace._params[prop]) {
return true;
}
}
}

return false;
});
};

CatboostIpython.prototype.cleanSeries = function() {
$(‘.catboost-panel__series’, this.layout).html(”);
};

CatboostIpython.prototype.groupTraces = function() {
var traces = this.traces[this.activeTab].traces,
index = 0,
tracesHash = {};

traces.map(function(trace) {
var train = trace._params.train;

if (!tracesHash[train]) {
tracesHash[train] = {
index: index,
traces: [],
info: {
path: trace._params.path,
color: trace._params.plotParams.color
}
};

index++;
}

tracesHash[train].traces.push(trace);
});

return tracesHash;
};

CatboostIpython.prototype.drawTraces = function() {
if ($(‘.catboost-panel__series .catboost-panel__serie’, this.layout).length) {
return;
}

var html = ”,
tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
html += this.drawTrace(train, tracesHash[train]);
}
}

$(‘.catboost-panel__series’, this.layout).html(html);

this.updateTracesValues();

this.addTracesEvents();
};

CatboostIpython.prototype.getTraceDefParams = function(params) {
var defParams = {
smoothed: undefined,
best_point: undefined,
best_value: undefined,
cv_avg: undefined,
cv_stddev_first: undefined,
cv_stddev_last: undefined
};

if (params) {
return $.extend(defParams, params);
} else {
return defParams;
}
};

CatboostIpython.prototype.drawTrace = function(train, hash) {
var info = hash.info,
id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
items = {
learn: {
middle: ”,
bottom: ”
},
test: {
middle: ”,
bottom: ”
}
},
tracesNames = ”;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
items[type].middle += ‘

‘ +

‘;

items[type].bottom += ‘

‘ +

‘;

tracesNames += ‘

‘ +

‘ + trace._params.nameOfSet + ‘

‘;
});
});

var timeSpendHtml = ‘

‘ +

‘ +

‘;

var html = ‘

‘ +

‘ +
‘ +

‘ +
(this.getLaunchMode(info.path) !== ‘Eval’ ? timeSpendHtml : ”) +

‘ +

curr

‘ +

best

‘ +

‘ +

‘ +

‘ +

‘ +
tracesNames +

‘ +

‘ +
items.learn.middle +
items.test.middle +

‘ +

‘ +
items.learn.bottom +
items.test.bottom +

‘ +

‘ +

‘;

return html;
};

CatboostIpython.prototype.updateTracesValues = function(iteration, click) {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceValues(train, tracesHash[train], iteration, click);
}
}
};

CatboostIpython.prototype.updateTracesBest = function() {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceBest(train, tracesHash[train]);
}
}
};

CatboostIpython.prototype.getBestValue = function(data) {
if (!data.length) {
return {
best: undefined,
index: -1
};
}

var best = data[0],
index = 0,
func = this.lossFuncs[this.traces[this.activeTab].name],
bestDiff = typeof func === ‘number’ ? Math.abs(data[0] – func) : 0;

for (var i = 1, l = data.length; i best) {
best = data[i];
index = i;
}

if (typeof func === ‘number’ && Math.abs(data[i] – func) maxLength) {
maxLength = origTrace.y.length;
}
});

for (var i = 0; i 0) {
avgTrace.x[i] = i;
avgTrace.y[i] = sum / count;
}
}
};

CatboostIpython.prototype.updateTracesCVStdDev = function() {
var tracesHash = this.groupTraces(),
firstTraces = this.filterTracesOne(tracesHash.traces, {cv_stddev_first: true}),
self = this;

firstTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed
})),
lastTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed,
cv_stddev_last: true
}));

if (origTraces.length && lastTraces.length === 1) {
self.cvStdDevFunc(origTraces, trace, lastTraces[0]);
}
});
};

CatboostIpython.prototype.cvStdDevFunc = function(origTraces, firstTrace, lastTrace) {
var maxCount = origTraces.length,
maxLength = -1,
count,
sum,
i, j;

origTraces.forEach(function(origTrace) {
if (origTrace.y.length > maxLength) {
maxLength = origTrace.y.length;
}
});

for (i = 0; i i) {
firstTrace.hovertext[i] += this.hovertextParameters[i];
lastTrace.hovertext[i] += this.hovertextParameters[i];
}
}
};

CatboostIpython.prototype.updateTracesSmoothness = function() {
var tracesHash = this.groupTraces(),
smoothedTraces = this.filterTracesOne(tracesHash.traces, {smoothed: true}),
enabled = this.getSmoothness() > -1,
self = this;

smoothedTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
indexOfSet: trace._params.indexOfSet,
cv_avg: trace._params.cv_avg,
cv_stddev_first: trace._params.cv_stddev_first,
cv_stddev_last: trace._params.cv_stddev_last
})),
colorFlag = false;

if (origTraces.length === 1) {
origTraces = origTraces[0];

if (origTraces.visible) {
if (enabled) {
self.smoothFunc(origTraces, trace);
colorFlag = true;
}

self.highlightSmoothedTrace(origTraces, trace, colorFlag);
}
}
});
};

CatboostIpython.prototype.highlightSmoothedTrace = function(trace, smoothedTrace, flag) {
if (flag) {
smoothedTrace.line.color = trace._params.plotParams.color;
trace.line.color = smoothedTrace._params.plotParams.color;
trace.hoverinfo = ‘skip’;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillsmoothcolor;
}
} else {
trace.line.color = trace._params.plotParams.color;
trace.hoverinfo = trace._params.plotParams.hoverinfo;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillcolor;
}
}
};

CatboostIpython.prototype.smoothFunc = function(origTrace, smoothedTrace) {
var data = origTrace.y,
smoothedPoints = this.smooth(data, this.getSmoothness()),
smoothedIndex = 0,
self = this;

if (smoothedPoints.length) {
data.forEach(function (d, index) {
if (!smoothedTrace.x[index]) {
smoothedTrace.x[index] = index;
}

var nameOfSet = smoothedTrace._params.nameOfSet;

if (smoothedTrace._params.cv_stddev_first || smoothedTrace._params.cv_stddev_last) {
nameOfSet = smoothedTrace._params.type + ‘ std’;
}

smoothedTrace.y[index] = smoothedPoints[smoothedIndex];
smoothedTrace.hovertext[index] = nameOfSet + ‘`: ‘ + smoothedPoints[smoothedIndex].toPrecision(7);
if (self.hovertextParameters.length > index) {
smoothedTrace.hovertext[index] += self.hovertextParameters[index];
}
smoothedIndex++;
});
}
};

CatboostIpython.prototype.formatItemValue = function(value, index, suffix) {
if (typeof value === ‘undefined’) {
return ”;
}

suffix = suffix || ”;

return ‘‘ + value + ‘‘;
};

CatboostIpython.prototype.updateTraceBest = function(train, hash) {
var traces = this.filterTracesOne(hash.traces, {best_point: true}),
self = this;

traces.forEach(function(trace) {
var testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
indexOfSet: trace._params.indexOfSet
}));

if (self.hasCVMode) {
testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
cv_avg: true
}));
}

var bestValue = self.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

if (bestValue.index !== -1) {
trace.x[0] = bestValue.index;
trace.y[0] = bestValue.best;
trace.hovertext[0] = bestValue.func + ‘ (‘ + (self.hasCVMode ? ‘avg’ : trace._params.nameOfSet) + ‘): ‘ + bestValue.index + ‘ ‘ + bestValue.best;
}
});
};

CatboostIpython.prototype.updateTraceValues = function(name, hash, iteration, click) {
var id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
path = hash.info.path,
self = this;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
var data = trace.y || [],
index = typeof iteration !== ‘undefined’ && iteration -1 ? bestValue.index : ”);

$(‘#’ + id + ‘ .catboost-panel__serie_best_test_value[data-index=’ + trace._params.indexOfSet + ‘]’, self.layout)
.html(self.formatItemValue(bestValue.best, bestValue.index, ‘best ‘ + trace._params.nameOfSet + ‘ ‘));
}
});
});

if (this.hasCVMode) {
var testTrace = this.filterTracesEvery(hash.traces, this.getTraceDefParams({
type: ‘test’,
cv_avg: true
})),
bestValue = this.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

$(‘#’ + id + ‘ .catboost-panel__serie_best_iteration’, this.layout).html(bestValue.index > -1 ? bestValue.index : ”);
}

if (click) {
this.clickMode = true;

$(‘#catboost-control2-clickmode’ + this.index, this.layout)[0].checked = true;
}
};

CatboostIpython.prototype.addTracesEvents = function() {
var self = this;

$(‘.catboost-panel__serie_checkbox’, this.layout).click(function() {
var name = $(this).data(‘seriename’);

self.layoutDisabled.traces[name] = !$(this)[0].checked;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.getNextColor = function(path, opacity) {
var color;

if (this.colorsByPath[path]) {
color = this.colorsByPath[path];
} else {
color = this.colors[this.colorIndex];
this.colorsByPath[path] = color;

this.colorIndex++;

if (this.colorIndex > this.colors.length – 1) {
this.colorIndex = 0;
}
}

return this.hexToRgba(color, opacity);
};

CatboostIpython.prototype.hexToRgba = function(value, opacity) {
if (value.length 0) {
out += hours + ‘h ‘;
seconds = 0;
millis = 0;
}
if (minutes && minutes > 0) {
out += minutes + ‘m ‘;
millis = 0;
}
if (seconds && seconds > 0) {
out += seconds + ‘s ‘;
}
if (millis && millis > 0) {
out += millis + ‘ms’;
}

return out.trim();
};

CatboostIpython.prototype.mean = function(values, valueof) {
var n = values.length,
m = n,
i = -1,
value,
sum = 0,
number = function(x) {
return x === null ? NaN : +x;
};

if (valueof === null) {
while (++i


Jak widać, można zobaczyć, jak nasz model uczy się na podstawie pełnych wyników lub ładnych wykresów (osobiście zdecydowanie wybrałbym drugą opcję – po prostu sprawdź te wykresy: możesz na przykład powiększyć obszary zainteresowania!)

Dzięki temu możemy zobaczyć, że najlepsza wartość dokładności 0,8340 (na zestawie walidacyjnym) została osiągnięta na 157 etapie wzmocnienia.

Żeby to zobaczyć trzeba kliknąć na Accuracy i stanąć myszą na linii ciągłej (oznaczającej zmienne testowe) nie linii przerywanej(dane treningowe)Wartość accurace wysokości 0.834 osiąga u mnie przy 451 petli. To miejsce gdzie jest kropka!

Co to jest loglost?

Jeśli tylko przewidujesz prawdopodobieństwo dla klasy dodatniej, to funkcję utraty logarytmicznej można obliczyć dla jednej prognozy klasyfikacji binarnej ( yhat ) w porównaniu do oczekiwanego prawdopodobieństwa ( y ) w następujący sposób:

LogLoss = – ((1 – y) log (1 – yhat) + y log (yhat))

Obliczamy predykcję modelu

In [16]:
yhatA = model.predict(X_validation)
print(yhatA[:12])
[0 0 0 1 1 1 1 0 1 1 0 0]

y_validation[4]

In [17]:
y_train[12]
Out[17]:
0

Sprawdzenie tego modelu klasyfikacji

In [18]:
# Classification Assessment
def Classification_Assessment(model ,Xtrain, ytrain, Xtest, ytest, y_pred):
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import metrics
    from sklearn.metrics import classification_report, confusion_matrix
    from sklearn.metrics import confusion_matrix, log_loss, auc, roc_curve, roc_auc_score, recall_score, precision_recall_curve
    from sklearn.metrics import make_scorer, precision_score, fbeta_score, f1_score, classification_report
    def green(text):
        print('33[32m', text, '33[0m', sep='')  
    def blue(text):
        print('33[34m', text, '33[0m', sep='')         
    
    print("Recall Training data:     ", np.round(recall_score(ytrain, model.predict(Xtrain)), decimals=4))
    print("Precision Training data:  ", np.round(precision_score(ytrain, model.predict(Xtrain)), decimals=4))
    print("----------------------------------------------------------------------")
    print("Recall Test data:         ", np.round(recall_score(ytest, model.predict(Xtest)), decimals=4)) 
    print("Precision Test data:      ", np.round(precision_score(ytest, model.predict(Xtest)), decimals=4))
    print("----------------------------------------------------------------------")
    print("Confusion Matrix Test data")
    print(confusion_matrix(ytest, model.predict(Xtest)))
    print("----------------------------------------------------------------------")
    green('Valuation for test data only:')
    print(classification_report(ytest, model.predict(Xtest)))
      
    green('Valuation for test data only:')
    y_pred_proba = model.predict_proba(Xtest)[::,1]
    fpr, tpr, _ = metrics.roc_curve(ytest,  y_pred)
    auc = metrics.roc_auc_score(ytest, y_pred)
    plt.plot(fpr, tpr, label='ROC (roc_auc = %0.2f)' % auc)
    plt.xlabel('False Positive Rate',color='grey', fontsize = 13)
    plt.ylabel('True Positive Rate',color='grey', fontsize = 13)
    plt.title('Receiver operating characteristic')
    plt.legend(loc="lower right")
    plt.legend(loc=4)
    plt.plot([0, 1], [0, 1],'r--')
    plt.show()
    print('roc_auc %.3f' % auc)
    
   
    blue('---------------------') 
    AUC_train_1 = metrics.roc_auc_score(ytrain,model.predict_proba(Xtrain)[:,1])
    blue('AUC_train: %.3f' % AUC_train_1)
    AUC_test_1 = metrics.roc_auc_score(ytest,model.predict_proba(Xtest)[:,1])
    blue('AUC_test:  %.3f' % AUC_test_1)
    blue('---------------------')    

      
    print("Accuracy Training data:     ", np.round(accuracy_score(ytrain, model.predict(Xtrain)), decimals=4))
    green("----------------------------------------------------------------------")
    print("Accuracy Test data:         ", np.round(accuracy_score(ytest, model.predict(Xtest)), decimals=4)) 
    green("----------------------------------------------------------------------")
In [19]:
##  colorful prints
def black(text):
     print('33[30m', text, '33[0m', sep='')  
def red(text):
     print('33[31m', text, '33[0m', sep='')  
def green(text):
     print('33[32m', text, '33[0m', sep='')  
def yellow(text):     
    print('33[33m', text, '33[0m', sep='')  
def blue(text):
     print('33[34m', text, '33[0m', sep='') 
def magenta(text):
     print('33[35m', text, '33[0m', sep='')  
def cyan(text):
     print('33[36m', text, '33[0m', sep='')  
def gray(text):
     print('33[90m', text, '33[0m', sep='')
In [20]:
blue(X_train.shape)
green(y_train.shape)
blue(X_validation.shape)
green(y_validation.shape)
(668, 11)
(668,)
(223, 11)
(223,)
In [21]:
Classification_Assessment(model,X_train, y_train, X_validation, y_validation, yhatA)
Recall Training data:      0.7391
Precision Training data:   0.9791
----------------------------------------------------------------------
Recall Test data:          0.6629
Precision Test data:       0.8429
----------------------------------------------------------------------
Confusion Matrix Test data
[[123  11]
 [ 30  59]]
----------------------------------------------------------------------
Valuation for test data only:
              precision    recall  f1-score   support

           0       0.80      0.92      0.86       134
           1       0.84      0.66      0.74        89

    accuracy                           0.82       223
   macro avg       0.82      0.79      0.80       223
weighted avg       0.82      0.82      0.81       223

Valuation for test data only:
roc_auc 0.790
---------------------
AUC_train: 0.968
AUC_test:  0.905
---------------------
Accuracy Training data:      0.8952
----------------------------------------------------------------------
Accuracy Test data:          0.8161
----------------------------------------------------------------------

2.2 Walidacja krzyżowa modelu

Dobrze jest zweryfikować swój model, ale zweryfikować go – nawet lepiej. A także z działkami! Bez słów:

Pokazuje parametry modelu

In [22]:
cv_params = model.get_params()
cv_params
Out[22]:
{'random_seed': 42, 'logging_level': 'Silent', 'custom_loss': ['Accuracy']}

Dodaje jeszcze jeden parametr do moedlu

In [23]:
cv_params.update({'loss_function': 'Logloss'})

Nie wiem co to jest

In [24]:
cv_data = cv(
    Pool(X, y, cat_features=categorical_features_indices),
    cv_params,
    plot=True)

‘;
}

this.layout = $(‘

‘ +

‘ +

‘ +
‘ +

Learn’ +
‘ +

Eval’ +

‘ +

‘ +

‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
cvAreaControls +

‘ +

‘ +

‘ +

‘ +

‘ +

‘ +

‘);
$(parent).append(this.layout);

this.addTabEvents();
this.addControlEvents();
};

CatboostIpython.prototype.addTabEvents = function() {
var self = this;

$(‘.catboost-graph__tabs’, this.layout).click(function(e) {
if (!$(e.target).is(‘.catboost-graph__tab:not(.catboost-graph__tab_active)’)) {
return;
}

var id = $(e.target).attr(‘tabid’);

self.activeTab = id;

$(‘.catboost-graph__tab_active’, self.layout).removeClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart_active’, self.layout).removeClass(‘catboost-graph__chart_active’);

$(‘.catboost-graph__tab[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__chart_active’);

self.cleanSeries();

self.redrawActiveChart();
self.resizeCharts();
});
};

CatboostIpython.prototype.addControlEvents = function() {
var self = this;

$(‘#catboost-control-learn’ + this.index, this.layout).click(function() {
self.layoutDisabled.learn = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_learn_disabled’, self.layoutDisabled.learn);

self.redrawActiveChart();
});

$(‘#catboost-control-test’ + this.index, this.layout).click(function() {
self.layoutDisabled.test = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_test_disabled’, self.layoutDisabled.test);

self.redrawActiveChart();
});

$(‘#catboost-control2-clickmode’ + this.index, this.layout).click(function() {
self.clickMode = $(this)[0].checked;
});

$(‘#catboost-control2-log’ + this.index, this.layout).click(function() {
self.logarithmMode = $(this)[0].checked ? ‘log’ : ‘linear’;

self.forEveryLayout(function(layout) {
layout.yaxis = {type: self.logarithmMode};
});

self.redrawActiveChart();
});

var slider = $(‘#catboost-control2-slider’ + this.index),
sliderValue = $(‘#catboost-control2-slidervalue’ + this.index);

$(‘#catboost-control2-smooth’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setSmoothness(enabled ? self.lastSmooth : -1);

slider.prop(‘disabled’, !enabled);
sliderValue.prop(‘disabled’, !enabled);

self.redrawActiveChart();
});

$(‘#catboost-control2-cvstddev’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setStddev(enabled);

self.redrawActiveChart();
});

slider.on(‘input change’, function() {
var smooth = Number($(this).val());

sliderValue.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});

sliderValue.on(‘input change’, function() {
var smooth = Number($(this).val());

slider.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.setTraceVisibility = function(trace, visibility) {
if (trace) {
trace.visible = visibility;
}
};

CatboostIpython.prototype.updateTracesVisibility = function() {
var tracesHash = this.groupTraces(),
traces,
smoothDisabled = this.getSmoothness() === -1,
self = this;

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
traces = tracesHash[train].traces;

if (this.layoutDisabled.traces[train]) {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
} else {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

if (this.hasCVMode) {
if (this.stddevEnabled) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
} else {
self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}

if (smoothDisabled) {
self.filterTracesOne(traces, {smoothed: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘learn’]) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘test’]) {
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}
}
}
};

CatboostIpython.prototype.getSmoothness = function() {
return this.smoothness && this.smoothness > -1 ? this.smoothness : -1;
};

CatboostIpython.prototype.setSmoothness = function(weight) {
if (weight 1) {
return;
}

this.smoothness = weight;
};

CatboostIpython.prototype.setStddev = function(enabled) {
this.stddevEnabled = enabled;
};

CatboostIpython.prototype.redrawActiveChart = function() {
this.chartsToRedraw[this.activeTab] = true;

this.redrawAll();
};

CatboostIpython.prototype.redraw = function() {
if (this.chartsToRedraw[this.activeTab]) {
this.chartsToRedraw[this.activeTab] = false;

this.updateTracesVisibility();
this.updateTracesCV();
this.updateTracesBest();
this.updateTracesValues();
this.updateTracesSmoothness();

this.plotly.redraw(this.traces[this.activeTab].parent);
}

this.drawTraces();
};

CatboostIpython.prototype.addRedrawFunc = function() {
this.redrawFunc = throttle(this.redraw, 400, false, this);
};

CatboostIpython.prototype.redrawAll = function() {
if (!this.redrawFunc) {
this.addRedrawFunc();
}

this.redrawFunc();
};

CatboostIpython.prototype.addPoints = function(parent, data) {
var self = this;

data.chunks.forEach(function(item) {
if (typeof item.remaining_time !== ‘undefined’ && typeof item.passed_time !== ‘undefined’) {
if (!self.timeLeft[data.path]) {
self.timeLeft[data.path] = [];
}

self.timeLeft[data.path][item.iteration] = [item.remaining_time, item.passed_time];
}

[‘test’, ‘learn’].forEach(function(type) {
var sets = self.meta[data.path][type + ‘_sets’],
metrics = self.meta[data.path][type + ‘_metrics’];

for (var i = 0; i ‘ + parameter + ‘ : ‘ + valueOfParameter;
}
}
}
if (!hovertextParametersAdded && type === ‘test’) {
hovertextParametersAdded = true;
trace.hovertext[pointIndex] += self.hovertextParameters[pointIndex];
}
smoothedTrace.x[pointIndex] = pointIndex;
}

if (bestValueTrace) {
bestValueTrace.x[pointIndex] = pointIndex;
bestValueTrace.y[pointIndex] = self.lossFuncs[nameOfMetric];
}

if (launchMode === ‘CV’ && !cvAdded) {
cvAdded = true;

self.getTrace(parent, $.extend({cv_stddev_first: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true}, params));

self.getTrace(parent, $.extend({cv_stddev_first: true, smoothed: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true, smoothed: true}, params));

self.getTrace(parent, $.extend({cv_avg: true}, params));
self.getTrace(parent, $.extend({cv_avg: true, smoothed: true}, params));

if (type === ‘test’) {
self.getTrace(parent, $.extend({cv_avg: true, best_point: true}, params));
}
}
}

self.chartsToRedraw[key.chartId] = true;

self.redrawAll();
}
});
});
};

CatboostIpython.prototype.getLaunchMode = function(path) {
return this.meta[path].launch_mode;
};

CatboostIpython.prototype.getChartNode = function(params, active) {
var node = $(‘

‘);

if (active) {
node.addClass(‘catboost-graph__chart_active’);
}

return node;
};

CatboostIpython.prototype.getChartTab = function(params, active) {
var node = $(‘

‘ + params.name + ‘

‘);

if (active) {
node.addClass(‘catboost-graph__tab_active’);
}

return node;
};

CatboostIpython.prototype.forEveryChart = function(callback) {
for (var name in this.traces) {
if (this.traces.hasOwnProperty(name)) {
callback(this.traces[name]);
}
}
};

CatboostIpython.prototype.forEveryLayout = function(callback) {
this.forEveryChart(function(chart) {
callback(chart.layout);
});
};

CatboostIpython.prototype.getChart = function(parent, params) {
var id = params.id,
self = this;

if (this.charts[id]) {
return this.charts[id];
}

this.addLayout(parent);

var active = this.activeTab === params.id,
chartNode = this.getChartNode(params, active),
chartTab = this.getChartTab(params, active);

$(‘.catboost-graph__charts’, this.layout).append(chartNode);
$(‘.catboost-graph__tabs’, this.layout).append(chartTab);

this.traces[id] = {
id: params.id,
name: params.name,
parent: chartNode[0],
traces: [],
layout: {
xaxis: {
range: [0, Number(this.meta[params.path].iteration_count)],
type: ‘linear’,
tickmode: ‘auto’,
showspikes: true,
spikethickness: 1,
spikedash: ‘longdashdot’,
spikemode: ‘across’,
zeroline: false,
showgrid: false
},
yaxis: {
zeroline: false
//showgrid: false
//hoverformat : ‘.7f’
},
separators: ‘. ‘,
//hovermode: ‘x’,
margin: {l: 38, r: 0, t: 35, b: 30},
autosize: true,
showlegend: false
},
options: {
scrollZoom: false,
modeBarButtonsToRemove: [‘toggleSpikelines’],
displaylogo: false
}
};

this.charts[id] = this.plotly.plot(chartNode[0], this.traces[id].traces, this.traces[id].layout, this.traces[id].options);

chartNode[0].on(‘plotly_hover’, function(e) {
self.updateTracesValues(e.points[0].x);
});

chartNode[0].on(‘plotly_click’, function(e) {
self.updateTracesValues(e.points[0].x, true);
});

return this.charts[id];
};

CatboostIpython.prototype.getTrace = function(parent, params) {
var key = this.getKey(params),
chartSeries = [];

if (this.traces[key.chartId]) {
chartSeries = this.traces[key.chartId].traces.filter(function(trace) {
return trace.name === key.traceName;
});
}

if (chartSeries.length) {
return chartSeries[0];
} else {
this.getChart(parent, {id: key.chartId, name: params.chartName, path: params.path});

var plotParams = {
color: this.getNextColor(params.path, params.smoothed ? 0.2 : 1),
fillsmoothcolor: this.getNextColor(params.path, 0.1),
fillcolor: this.getNextColor(params.path, 0.4),
hoverinfo: params.cv_avg ? ‘skip’ : ‘text+x’,
width: params.cv_avg ? 2 : 1,
dash: params.type === ‘test’ ? ‘solid’ : ‘dot’
},
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
hovertext: [],
hoverinfo: plotParams.hoverinfo,
line: {
width: plotParams.width,
dash: plotParams.dash,
color: plotParams.color
},
mode: ‘lines’,
hoveron: ‘points’,
connectgaps: true
};

if (params.best_point) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
marker: {
width: 2,
color: plotParams.color
},
hovertext: [],
hoverinfo: ‘text’,
mode: ‘markers’,
type: ‘scatter’
};
}

if (params.best_value) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
line: {
width: 1,
dash: ‘dash’,
color: ‘#CCCCCC’
},
mode: ‘lines’,
connectgaps: true,
hoverinfo: ‘skip’
};
}

if (params.cv_stddev_last) {
trace.fill = ‘tonexty’;
}

trace._params.plotParams = plotParams;

this.traces[key.chartId].traces.push(trace);

return trace;
}
};

CatboostIpython.prototype.getKey = function(params) {
var traceName = [
params.train,
params.type,
params.indexOfSet,
(params.smoothed ? ‘smoothed’ : ”),
(params.best_point ? ‘best_pount’ : ”),
(params.best_value ? ‘best_value’ : ”),
(params.cv_avg ? ‘cv_avg’ : ”),
(params.cv_stddev_first ? ‘cv_stddev_first’ : ”),
(params.cv_stddev_last ? ‘cv_stddev_last’ : ”)
].join(‘;’);

return {
chartId: params.chartName,
traceName: traceName,
colorId: params.train
};
};

CatboostIpython.prototype.filterTracesEvery = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] !== trace._params[prop]) {
return false;
}
}
}

return true;
});
};

CatboostIpython.prototype.filterTracesOne = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] === trace._params[prop]) {
return true;
}
}
}

return false;
});
};

CatboostIpython.prototype.cleanSeries = function() {
$(‘.catboost-panel__series’, this.layout).html(”);
};

CatboostIpython.prototype.groupTraces = function() {
var traces = this.traces[this.activeTab].traces,
index = 0,
tracesHash = {};

traces.map(function(trace) {
var train = trace._params.train;

if (!tracesHash[train]) {
tracesHash[train] = {
index: index,
traces: [],
info: {
path: trace._params.path,
color: trace._params.plotParams.color
}
};

index++;
}

tracesHash[train].traces.push(trace);
});

return tracesHash;
};

CatboostIpython.prototype.drawTraces = function() {
if ($(‘.catboost-panel__series .catboost-panel__serie’, this.layout).length) {
return;
}

var html = ”,
tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
html += this.drawTrace(train, tracesHash[train]);
}
}

$(‘.catboost-panel__series’, this.layout).html(html);

this.updateTracesValues();

this.addTracesEvents();
};

CatboostIpython.prototype.getTraceDefParams = function(params) {
var defParams = {
smoothed: undefined,
best_point: undefined,
best_value: undefined,
cv_avg: undefined,
cv_stddev_first: undefined,
cv_stddev_last: undefined
};

if (params) {
return $.extend(defParams, params);
} else {
return defParams;
}
};

CatboostIpython.prototype.drawTrace = function(train, hash) {
var info = hash.info,
id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
items = {
learn: {
middle: ”,
bottom: ”
},
test: {
middle: ”,
bottom: ”
}
},
tracesNames = ”;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
items[type].middle += ‘

‘ +

‘;

items[type].bottom += ‘

‘ +

‘;

tracesNames += ‘

‘ +

‘ + trace._params.nameOfSet + ‘

‘;
});
});

var timeSpendHtml = ‘

‘ +

‘ +

‘;

var html = ‘

‘ +

‘ +
‘ +

‘ +
(this.getLaunchMode(info.path) !== ‘Eval’ ? timeSpendHtml : ”) +

‘ +

curr

‘ +

best

‘ +

‘ +

‘ +

‘ +

‘ +
tracesNames +

‘ +

‘ +
items.learn.middle +
items.test.middle +

‘ +

‘ +
items.learn.bottom +
items.test.bottom +

‘ +

‘ +

‘;

return html;
};

CatboostIpython.prototype.updateTracesValues = function(iteration, click) {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceValues(train, tracesHash[train], iteration, click);
}
}
};

CatboostIpython.prototype.updateTracesBest = function() {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceBest(train, tracesHash[train]);
}
}
};

CatboostIpython.prototype.getBestValue = function(data) {
if (!data.length) {
return {
best: undefined,
index: -1
};
}

var best = data[0],
index = 0,
func = this.lossFuncs[this.traces[this.activeTab].name],
bestDiff = typeof func === ‘number’ ? Math.abs(data[0] – func) : 0;

for (var i = 1, l = data.length; i best) {
best = data[i];
index = i;
}

if (typeof func === ‘number’ && Math.abs(data[i] – func) maxLength) {
maxLength = origTrace.y.length;
}
});

for (var i = 0; i 0) {
avgTrace.x[i] = i;
avgTrace.y[i] = sum / count;
}
}
};

CatboostIpython.prototype.updateTracesCVStdDev = function() {
var tracesHash = this.groupTraces(),
firstTraces = this.filterTracesOne(tracesHash.traces, {cv_stddev_first: true}),
self = this;

firstTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed
})),
lastTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed,
cv_stddev_last: true
}));

if (origTraces.length && lastTraces.length === 1) {
self.cvStdDevFunc(origTraces, trace, lastTraces[0]);
}
});
};

CatboostIpython.prototype.cvStdDevFunc = function(origTraces, firstTrace, lastTrace) {
var maxCount = origTraces.length,
maxLength = -1,
count,
sum,
i, j;

origTraces.forEach(function(origTrace) {
if (origTrace.y.length > maxLength) {
maxLength = origTrace.y.length;
}
});

for (i = 0; i i) {
firstTrace.hovertext[i] += this.hovertextParameters[i];
lastTrace.hovertext[i] += this.hovertextParameters[i];
}
}
};

CatboostIpython.prototype.updateTracesSmoothness = function() {
var tracesHash = this.groupTraces(),
smoothedTraces = this.filterTracesOne(tracesHash.traces, {smoothed: true}),
enabled = this.getSmoothness() > -1,
self = this;

smoothedTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
indexOfSet: trace._params.indexOfSet,
cv_avg: trace._params.cv_avg,
cv_stddev_first: trace._params.cv_stddev_first,
cv_stddev_last: trace._params.cv_stddev_last
})),
colorFlag = false;

if (origTraces.length === 1) {
origTraces = origTraces[0];

if (origTraces.visible) {
if (enabled) {
self.smoothFunc(origTraces, trace);
colorFlag = true;
}

self.highlightSmoothedTrace(origTraces, trace, colorFlag);
}
}
});
};

CatboostIpython.prototype.highlightSmoothedTrace = function(trace, smoothedTrace, flag) {
if (flag) {
smoothedTrace.line.color = trace._params.plotParams.color;
trace.line.color = smoothedTrace._params.plotParams.color;
trace.hoverinfo = ‘skip’;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillsmoothcolor;
}
} else {
trace.line.color = trace._params.plotParams.color;
trace.hoverinfo = trace._params.plotParams.hoverinfo;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillcolor;
}
}
};

CatboostIpython.prototype.smoothFunc = function(origTrace, smoothedTrace) {
var data = origTrace.y,
smoothedPoints = this.smooth(data, this.getSmoothness()),
smoothedIndex = 0,
self = this;

if (smoothedPoints.length) {
data.forEach(function (d, index) {
if (!smoothedTrace.x[index]) {
smoothedTrace.x[index] = index;
}

var nameOfSet = smoothedTrace._params.nameOfSet;

if (smoothedTrace._params.cv_stddev_first || smoothedTrace._params.cv_stddev_last) {
nameOfSet = smoothedTrace._params.type + ‘ std’;
}

smoothedTrace.y[index] = smoothedPoints[smoothedIndex];
smoothedTrace.hovertext[index] = nameOfSet + ‘`: ‘ + smoothedPoints[smoothedIndex].toPrecision(7);
if (self.hovertextParameters.length > index) {
smoothedTrace.hovertext[index] += self.hovertextParameters[index];
}
smoothedIndex++;
});
}
};

CatboostIpython.prototype.formatItemValue = function(value, index, suffix) {
if (typeof value === ‘undefined’) {
return ”;
}

suffix = suffix || ”;

return ‘‘ + value + ‘‘;
};

CatboostIpython.prototype.updateTraceBest = function(train, hash) {
var traces = this.filterTracesOne(hash.traces, {best_point: true}),
self = this;

traces.forEach(function(trace) {
var testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
indexOfSet: trace._params.indexOfSet
}));

if (self.hasCVMode) {
testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
cv_avg: true
}));
}

var bestValue = self.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

if (bestValue.index !== -1) {
trace.x[0] = bestValue.index;
trace.y[0] = bestValue.best;
trace.hovertext[0] = bestValue.func + ‘ (‘ + (self.hasCVMode ? ‘avg’ : trace._params.nameOfSet) + ‘): ‘ + bestValue.index + ‘ ‘ + bestValue.best;
}
});
};

CatboostIpython.prototype.updateTraceValues = function(name, hash, iteration, click) {
var id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
path = hash.info.path,
self = this;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
var data = trace.y || [],
index = typeof iteration !== ‘undefined’ && iteration -1 ? bestValue.index : ”);

$(‘#’ + id + ‘ .catboost-panel__serie_best_test_value[data-index=’ + trace._params.indexOfSet + ‘]’, self.layout)
.html(self.formatItemValue(bestValue.best, bestValue.index, ‘best ‘ + trace._params.nameOfSet + ‘ ‘));
}
});
});

if (this.hasCVMode) {
var testTrace = this.filterTracesEvery(hash.traces, this.getTraceDefParams({
type: ‘test’,
cv_avg: true
})),
bestValue = this.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

$(‘#’ + id + ‘ .catboost-panel__serie_best_iteration’, this.layout).html(bestValue.index > -1 ? bestValue.index : ”);
}

if (click) {
this.clickMode = true;

$(‘#catboost-control2-clickmode’ + this.index, this.layout)[0].checked = true;
}
};

CatboostIpython.prototype.addTracesEvents = function() {
var self = this;

$(‘.catboost-panel__serie_checkbox’, this.layout).click(function() {
var name = $(this).data(‘seriename’);

self.layoutDisabled.traces[name] = !$(this)[0].checked;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.getNextColor = function(path, opacity) {
var color;

if (this.colorsByPath[path]) {
color = this.colorsByPath[path];
} else {
color = this.colors[this.colorIndex];
this.colorsByPath[path] = color;

this.colorIndex++;

if (this.colorIndex > this.colors.length – 1) {
this.colorIndex = 0;
}
}

return this.hexToRgba(color, opacity);
};

CatboostIpython.prototype.hexToRgba = function(value, opacity) {
if (value.length 0) {
out += hours + ‘h ‘;
seconds = 0;
millis = 0;
}
if (minutes && minutes > 0) {
out += minutes + ‘m ‘;
millis = 0;
}
if (seconds && seconds > 0) {
out += seconds + ‘s ‘;
}
if (millis && millis > 0) {
out += millis + ‘ms’;
}

return out.trim();
};

CatboostIpython.prototype.mean = function(values, valueof) {
var n = values.length,
m = n,
i = -1,
value,
sum = 0,
number = function(x) {
return x === null ? NaN : +x;
};

if (valueof === null) {
while (++i


to nie działa

yhatCV = cv_data.predict(X_validation)

Teraz mamy wartości naszych funkcji strat na każdym etapie wzmocnienia uśrednione 3-krotnie, co powinno zapewnić nam dokładniejsze oszacowanie wydajności naszego modelu:

In [25]:
cv_data.head(2)
Out[25]:
iterations test-Logloss-mean test-Logloss-std train-Logloss-mean train-Logloss-std test-Accuracy-mean test-Accuracy-std train-Accuracy-mean train-Accuracy-std
0 0 0.675761 0.001280 0.675172 0.002037 0.789001 0.018544 0.808642 0.007956
1 1 0.658254 0.002262 0.656563 0.003342 0.796857 0.023888 0.817621 0.014119
In [26]:
print('Best validation accuracy score: {:.2f}±{:.2f} on step {}'.format(
    np.max(cv_data['test-Accuracy-mean']),
    cv_data['test-Accuracy-std'][np.argmax(cv_data['test-Accuracy-mean'])],
    np.argmax(cv_data['test-Accuracy-mean'])
))
Best validation accuracy score: 0.83±0.03 on step 527
/home/wojciech/anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py:61: FutureWarning: 
The current behaviour of 'Series.argmax' is deprecated, use 'idxmax'
instead.
The behavior of 'argmax' will be corrected to return the positional
maximum in the future. For now, use 'series.values.argmax' or
'np.argmax(np.array(values))' to get the position of the maximum
row.
  return bound(*args, **kwds)
In [27]:
 np.max(cv_data['test-Accuracy-mean'])
Out[27]:
0.8294051627384961
In [28]:
 np.argmax(cv_data['test-Accuracy-mean'])
Out[28]:
527
In [29]:
print('Precise validation accuracy score: {}'.format(np.max(cv_data['test-Accuracy-mean'])))
Precise validation accuracy score: 0.8294051627384961

Jak widzimy, nasze wstępne oszacowanie wydajności przy pojedynczym foldowaniu sprawdzania poprawności było zbyt optymistyczne – dlatego tak ważna jest krzyżowa weryfikacja!

odpuszczam – nie rozumiem tej sekcji

2.3 Stosowanie modelu

Wszystko, co musisz zrobić, aby uzyskać prognozy, to

In [30]:
predictions = model.predict(X_validation)
predictions[:15]
Out[30]:
array([0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0])
In [31]:
predictions_probs = model.predict_proba(X_validation)
predictions_probs[:5]
Out[31]:
array([[0.70226591, 0.29773409],
       [0.87482065, 0.12517935],
       [0.87590587, 0.12409413],
       [0.03421529, 0.96578471],
       [0.34333994, 0.65666006]])

Ale spróbujmy uzyskać lepsze prognozy, a funkcje Catboost nam w tym pomogą.

Być może zauważyłeś, że na etapie tworzenia modelu podałem nie tylko parametr custom_loss, ale także parametr random_seed. Zostało to zrobione, aby ten notatnik był odtwarzalny – domyślnie catboost wybiera losową wartość dla seed:

In [32]:
model_without_seed = CatBoostClassifier(iterations=10, logging_level='Silent')
model_without_seed.fit(X, y, cat_features=categorical_features_indices)

print('Random seed assigned for this model: {}'.format(model_without_seed.random_seed_))
Random seed assigned for this model: 0

Zdefiniujmy niektóre parametry i utwórz Pool dla większej wygody. Pool Przechowuje wszystkie informacje o zbiorze danych (cechy, etykiety, wskaźniki cech jakościowych, wagi i wiele innych).

To taki zbiornik z parametrami modelu

In [33]:
params = {
    'iterations': 500,
    'learning_rate': 0.1,
    'eval_metric': 'Accuracy',
    'random_seed': 42,
    'logging_level': 'Silent',
    'use_best_model': False
}

Pool dla zmiennych treningowych – to nie model, to taki zbiornik

In [34]:
train_pool = Pool(X_train, y_train, cat_features=categorical_features_indices)
train_pool
Out[34]:
<catboost.core.Pool at 0x7f8460ed4ec0>

Pool dla zmiennych testowych – to nie model, to taki zbiornik

In [35]:
validate_pool = Pool(X_validation, y_validation, cat_features=categorical_features_indices)
validate_pool
Out[35]:
<catboost.core.Pool at 0x7f8460ee5280>

3.1 Korzystanie z najlepszego modelu

Jeśli zasadniczo masz zestaw sprawdzania poprawności, zawsze lepiej jest używać parametru use_best_model podczas treningu. Domyślnie ten parametr jest włączony. Jeśli jest włączony, wynikowy zestaw drzew zmniejsza się do najlepszej iteracji.

In [36]:
## -------linijka jak wywołać najlepsze parametry modelu ---------------------

model = CatBoostClassifier(**params)
model.fit(train_pool, eval_set=validate_pool)

best_model_params = params.copy()
best_model_params.update({                  ## <- tutaj model wkłada 'use_best_model'
    'use_best_model': True                  ## to nie są lepsze parametry tylko ten jeden nowy parametr   
})

### ----------------------------------------------------------------------------

best_model = CatBoostClassifier(**best_model_params)
best_model.fit(train_pool, eval_set=validate_pool);

print('Simple model validation accuracy: {:.4}'.format(
    accuracy_score(y_validation, model.predict(X_validation))
))
print('')

print('Best model validation accuracy: {:.4}'.format(
    accuracy_score(y_validation, best_model.predict(X_validation))
))
Simple model validation accuracy: 0.8072

Best model validation accuracy: 0.8296

Rozbieram powyższe na czynniki pierwsze

In [37]:
params
Out[37]:
{'iterations': 500,
 'learning_rate': 0.1,
 'eval_metric': 'Accuracy',
 'random_seed': 42,
 'logging_level': 'Silent',
 'use_best_model': False}

wyświetlam stare parametry modelu

In [38]:
params
Out[38]:
{'iterations': 500,
 'learning_rate': 0.1,
 'eval_metric': 'Accuracy',
 'random_seed': 42,
 'logging_level': 'Silent',
 'use_best_model': False}

wyświetlam nowe parametry modelu

In [39]:
best_model_params
Out[39]:
{'iterations': 500,
 'learning_rate': 0.1,
 'eval_metric': 'Accuracy',
 'random_seed': 42,
 'logging_level': 'Silent',
 'use_best_model': True}
In [40]:
best_model = CatBoostClassifier(**best_model_params)
In [41]:
best_model.fit(train_pool, eval_set=validate_pool,plot=True )

‘;
}

this.layout = $(‘

‘ +

‘ +

‘ +
‘ +

Learn’ +
‘ +

Eval’ +

‘ +

‘ +

‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
cvAreaControls +

‘ +

‘ +

‘ +

‘ +

‘ +

‘ +

‘);
$(parent).append(this.layout);

this.addTabEvents();
this.addControlEvents();
};

CatboostIpython.prototype.addTabEvents = function() {
var self = this;

$(‘.catboost-graph__tabs’, this.layout).click(function(e) {
if (!$(e.target).is(‘.catboost-graph__tab:not(.catboost-graph__tab_active)’)) {
return;
}

var id = $(e.target).attr(‘tabid’);

self.activeTab = id;

$(‘.catboost-graph__tab_active’, self.layout).removeClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart_active’, self.layout).removeClass(‘catboost-graph__chart_active’);

$(‘.catboost-graph__tab[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__chart_active’);

self.cleanSeries();

self.redrawActiveChart();
self.resizeCharts();
});
};

CatboostIpython.prototype.addControlEvents = function() {
var self = this;

$(‘#catboost-control-learn’ + this.index, this.layout).click(function() {
self.layoutDisabled.learn = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_learn_disabled’, self.layoutDisabled.learn);

self.redrawActiveChart();
});

$(‘#catboost-control-test’ + this.index, this.layout).click(function() {
self.layoutDisabled.test = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_test_disabled’, self.layoutDisabled.test);

self.redrawActiveChart();
});

$(‘#catboost-control2-clickmode’ + this.index, this.layout).click(function() {
self.clickMode = $(this)[0].checked;
});

$(‘#catboost-control2-log’ + this.index, this.layout).click(function() {
self.logarithmMode = $(this)[0].checked ? ‘log’ : ‘linear’;

self.forEveryLayout(function(layout) {
layout.yaxis = {type: self.logarithmMode};
});

self.redrawActiveChart();
});

var slider = $(‘#catboost-control2-slider’ + this.index),
sliderValue = $(‘#catboost-control2-slidervalue’ + this.index);

$(‘#catboost-control2-smooth’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setSmoothness(enabled ? self.lastSmooth : -1);

slider.prop(‘disabled’, !enabled);
sliderValue.prop(‘disabled’, !enabled);

self.redrawActiveChart();
});

$(‘#catboost-control2-cvstddev’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setStddev(enabled);

self.redrawActiveChart();
});

slider.on(‘input change’, function() {
var smooth = Number($(this).val());

sliderValue.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});

sliderValue.on(‘input change’, function() {
var smooth = Number($(this).val());

slider.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.setTraceVisibility = function(trace, visibility) {
if (trace) {
trace.visible = visibility;
}
};

CatboostIpython.prototype.updateTracesVisibility = function() {
var tracesHash = this.groupTraces(),
traces,
smoothDisabled = this.getSmoothness() === -1,
self = this;

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
traces = tracesHash[train].traces;

if (this.layoutDisabled.traces[train]) {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
} else {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

if (this.hasCVMode) {
if (this.stddevEnabled) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
} else {
self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}

if (smoothDisabled) {
self.filterTracesOne(traces, {smoothed: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘learn’]) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘test’]) {
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}
}
}
};

CatboostIpython.prototype.getSmoothness = function() {
return this.smoothness && this.smoothness > -1 ? this.smoothness : -1;
};

CatboostIpython.prototype.setSmoothness = function(weight) {
if (weight 1) {
return;
}

this.smoothness = weight;
};

CatboostIpython.prototype.setStddev = function(enabled) {
this.stddevEnabled = enabled;
};

CatboostIpython.prototype.redrawActiveChart = function() {
this.chartsToRedraw[this.activeTab] = true;

this.redrawAll();
};

CatboostIpython.prototype.redraw = function() {
if (this.chartsToRedraw[this.activeTab]) {
this.chartsToRedraw[this.activeTab] = false;

this.updateTracesVisibility();
this.updateTracesCV();
this.updateTracesBest();
this.updateTracesValues();
this.updateTracesSmoothness();

this.plotly.redraw(this.traces[this.activeTab].parent);
}

this.drawTraces();
};

CatboostIpython.prototype.addRedrawFunc = function() {
this.redrawFunc = throttle(this.redraw, 400, false, this);
};

CatboostIpython.prototype.redrawAll = function() {
if (!this.redrawFunc) {
this.addRedrawFunc();
}

this.redrawFunc();
};

CatboostIpython.prototype.addPoints = function(parent, data) {
var self = this;

data.chunks.forEach(function(item) {
if (typeof item.remaining_time !== ‘undefined’ && typeof item.passed_time !== ‘undefined’) {
if (!self.timeLeft[data.path]) {
self.timeLeft[data.path] = [];
}

self.timeLeft[data.path][item.iteration] = [item.remaining_time, item.passed_time];
}

[‘test’, ‘learn’].forEach(function(type) {
var sets = self.meta[data.path][type + ‘_sets’],
metrics = self.meta[data.path][type + ‘_metrics’];

for (var i = 0; i ‘ + parameter + ‘ : ‘ + valueOfParameter;
}
}
}
if (!hovertextParametersAdded && type === ‘test’) {
hovertextParametersAdded = true;
trace.hovertext[pointIndex] += self.hovertextParameters[pointIndex];
}
smoothedTrace.x[pointIndex] = pointIndex;
}

if (bestValueTrace) {
bestValueTrace.x[pointIndex] = pointIndex;
bestValueTrace.y[pointIndex] = self.lossFuncs[nameOfMetric];
}

if (launchMode === ‘CV’ && !cvAdded) {
cvAdded = true;

self.getTrace(parent, $.extend({cv_stddev_first: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true}, params));

self.getTrace(parent, $.extend({cv_stddev_first: true, smoothed: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true, smoothed: true}, params));

self.getTrace(parent, $.extend({cv_avg: true}, params));
self.getTrace(parent, $.extend({cv_avg: true, smoothed: true}, params));

if (type === ‘test’) {
self.getTrace(parent, $.extend({cv_avg: true, best_point: true}, params));
}
}
}

self.chartsToRedraw[key.chartId] = true;

self.redrawAll();
}
});
});
};

CatboostIpython.prototype.getLaunchMode = function(path) {
return this.meta[path].launch_mode;
};

CatboostIpython.prototype.getChartNode = function(params, active) {
var node = $(‘

‘);

if (active) {
node.addClass(‘catboost-graph__chart_active’);
}

return node;
};

CatboostIpython.prototype.getChartTab = function(params, active) {
var node = $(‘

‘ + params.name + ‘

‘);

if (active) {
node.addClass(‘catboost-graph__tab_active’);
}

return node;
};

CatboostIpython.prototype.forEveryChart = function(callback) {
for (var name in this.traces) {
if (this.traces.hasOwnProperty(name)) {
callback(this.traces[name]);
}
}
};

CatboostIpython.prototype.forEveryLayout = function(callback) {
this.forEveryChart(function(chart) {
callback(chart.layout);
});
};

CatboostIpython.prototype.getChart = function(parent, params) {
var id = params.id,
self = this;

if (this.charts[id]) {
return this.charts[id];
}

this.addLayout(parent);

var active = this.activeTab === params.id,
chartNode = this.getChartNode(params, active),
chartTab = this.getChartTab(params, active);

$(‘.catboost-graph__charts’, this.layout).append(chartNode);
$(‘.catboost-graph__tabs’, this.layout).append(chartTab);

this.traces[id] = {
id: params.id,
name: params.name,
parent: chartNode[0],
traces: [],
layout: {
xaxis: {
range: [0, Number(this.meta[params.path].iteration_count)],
type: ‘linear’,
tickmode: ‘auto’,
showspikes: true,
spikethickness: 1,
spikedash: ‘longdashdot’,
spikemode: ‘across’,
zeroline: false,
showgrid: false
},
yaxis: {
zeroline: false
//showgrid: false
//hoverformat : ‘.7f’
},
separators: ‘. ‘,
//hovermode: ‘x’,
margin: {l: 38, r: 0, t: 35, b: 30},
autosize: true,
showlegend: false
},
options: {
scrollZoom: false,
modeBarButtonsToRemove: [‘toggleSpikelines’],
displaylogo: false
}
};

this.charts[id] = this.plotly.plot(chartNode[0], this.traces[id].traces, this.traces[id].layout, this.traces[id].options);

chartNode[0].on(‘plotly_hover’, function(e) {
self.updateTracesValues(e.points[0].x);
});

chartNode[0].on(‘plotly_click’, function(e) {
self.updateTracesValues(e.points[0].x, true);
});

return this.charts[id];
};

CatboostIpython.prototype.getTrace = function(parent, params) {
var key = this.getKey(params),
chartSeries = [];

if (this.traces[key.chartId]) {
chartSeries = this.traces[key.chartId].traces.filter(function(trace) {
return trace.name === key.traceName;
});
}

if (chartSeries.length) {
return chartSeries[0];
} else {
this.getChart(parent, {id: key.chartId, name: params.chartName, path: params.path});

var plotParams = {
color: this.getNextColor(params.path, params.smoothed ? 0.2 : 1),
fillsmoothcolor: this.getNextColor(params.path, 0.1),
fillcolor: this.getNextColor(params.path, 0.4),
hoverinfo: params.cv_avg ? ‘skip’ : ‘text+x’,
width: params.cv_avg ? 2 : 1,
dash: params.type === ‘test’ ? ‘solid’ : ‘dot’
},
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
hovertext: [],
hoverinfo: plotParams.hoverinfo,
line: {
width: plotParams.width,
dash: plotParams.dash,
color: plotParams.color
},
mode: ‘lines’,
hoveron: ‘points’,
connectgaps: true
};

if (params.best_point) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
marker: {
width: 2,
color: plotParams.color
},
hovertext: [],
hoverinfo: ‘text’,
mode: ‘markers’,
type: ‘scatter’
};
}

if (params.best_value) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
line: {
width: 1,
dash: ‘dash’,
color: ‘#CCCCCC’
},
mode: ‘lines’,
connectgaps: true,
hoverinfo: ‘skip’
};
}

if (params.cv_stddev_last) {
trace.fill = ‘tonexty’;
}

trace._params.plotParams = plotParams;

this.traces[key.chartId].traces.push(trace);

return trace;
}
};

CatboostIpython.prototype.getKey = function(params) {
var traceName = [
params.train,
params.type,
params.indexOfSet,
(params.smoothed ? ‘smoothed’ : ”),
(params.best_point ? ‘best_pount’ : ”),
(params.best_value ? ‘best_value’ : ”),
(params.cv_avg ? ‘cv_avg’ : ”),
(params.cv_stddev_first ? ‘cv_stddev_first’ : ”),
(params.cv_stddev_last ? ‘cv_stddev_last’ : ”)
].join(‘;’);

return {
chartId: params.chartName,
traceName: traceName,
colorId: params.train
};
};

CatboostIpython.prototype.filterTracesEvery = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] !== trace._params[prop]) {
return false;
}
}
}

return true;
});
};

CatboostIpython.prototype.filterTracesOne = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] === trace._params[prop]) {
return true;
}
}
}

return false;
});
};

CatboostIpython.prototype.cleanSeries = function() {
$(‘.catboost-panel__series’, this.layout).html(”);
};

CatboostIpython.prototype.groupTraces = function() {
var traces = this.traces[this.activeTab].traces,
index = 0,
tracesHash = {};

traces.map(function(trace) {
var train = trace._params.train;

if (!tracesHash[train]) {
tracesHash[train] = {
index: index,
traces: [],
info: {
path: trace._params.path,
color: trace._params.plotParams.color
}
};

index++;
}

tracesHash[train].traces.push(trace);
});

return tracesHash;
};

CatboostIpython.prototype.drawTraces = function() {
if ($(‘.catboost-panel__series .catboost-panel__serie’, this.layout).length) {
return;
}

var html = ”,
tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
html += this.drawTrace(train, tracesHash[train]);
}
}

$(‘.catboost-panel__series’, this.layout).html(html);

this.updateTracesValues();

this.addTracesEvents();
};

CatboostIpython.prototype.getTraceDefParams = function(params) {
var defParams = {
smoothed: undefined,
best_point: undefined,
best_value: undefined,
cv_avg: undefined,
cv_stddev_first: undefined,
cv_stddev_last: undefined
};

if (params) {
return $.extend(defParams, params);
} else {
return defParams;
}
};

CatboostIpython.prototype.drawTrace = function(train, hash) {
var info = hash.info,
id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
items = {
learn: {
middle: ”,
bottom: ”
},
test: {
middle: ”,
bottom: ”
}
},
tracesNames = ”;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
items[type].middle += ‘

‘ +

‘;

items[type].bottom += ‘

‘ +

‘;

tracesNames += ‘

‘ +

‘ + trace._params.nameOfSet + ‘

‘;
});
});

var timeSpendHtml = ‘

‘ +

‘ +

‘;

var html = ‘

‘ +

‘ +
‘ +

‘ +
(this.getLaunchMode(info.path) !== ‘Eval’ ? timeSpendHtml : ”) +

‘ +

curr

‘ +

best

‘ +

‘ +

‘ +

‘ +

‘ +
tracesNames +

‘ +

‘ +
items.learn.middle +
items.test.middle +

‘ +

‘ +
items.learn.bottom +
items.test.bottom +

‘ +

‘ +

‘;

return html;
};

CatboostIpython.prototype.updateTracesValues = function(iteration, click) {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceValues(train, tracesHash[train], iteration, click);
}
}
};

CatboostIpython.prototype.updateTracesBest = function() {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceBest(train, tracesHash[train]);
}
}
};

CatboostIpython.prototype.getBestValue = function(data) {
if (!data.length) {
return {
best: undefined,
index: -1
};
}

var best = data[0],
index = 0,
func = this.lossFuncs[this.traces[this.activeTab].name],
bestDiff = typeof func === ‘number’ ? Math.abs(data[0] – func) : 0;

for (var i = 1, l = data.length; i best) {
best = data[i];
index = i;
}

if (typeof func === ‘number’ && Math.abs(data[i] – func) maxLength) {
maxLength = origTrace.y.length;
}
});

for (var i = 0; i 0) {
avgTrace.x[i] = i;
avgTrace.y[i] = sum / count;
}
}
};

CatboostIpython.prototype.updateTracesCVStdDev = function() {
var tracesHash = this.groupTraces(),
firstTraces = this.filterTracesOne(tracesHash.traces, {cv_stddev_first: true}),
self = this;

firstTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed
})),
lastTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed,
cv_stddev_last: true
}));

if (origTraces.length && lastTraces.length === 1) {
self.cvStdDevFunc(origTraces, trace, lastTraces[0]);
}
});
};

CatboostIpython.prototype.cvStdDevFunc = function(origTraces, firstTrace, lastTrace) {
var maxCount = origTraces.length,
maxLength = -1,
count,
sum,
i, j;

origTraces.forEach(function(origTrace) {
if (origTrace.y.length > maxLength) {
maxLength = origTrace.y.length;
}
});

for (i = 0; i i) {
firstTrace.hovertext[i] += this.hovertextParameters[i];
lastTrace.hovertext[i] += this.hovertextParameters[i];
}
}
};

CatboostIpython.prototype.updateTracesSmoothness = function() {
var tracesHash = this.groupTraces(),
smoothedTraces = this.filterTracesOne(tracesHash.traces, {smoothed: true}),
enabled = this.getSmoothness() > -1,
self = this;

smoothedTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
indexOfSet: trace._params.indexOfSet,
cv_avg: trace._params.cv_avg,
cv_stddev_first: trace._params.cv_stddev_first,
cv_stddev_last: trace._params.cv_stddev_last
})),
colorFlag = false;

if (origTraces.length === 1) {
origTraces = origTraces[0];

if (origTraces.visible) {
if (enabled) {
self.smoothFunc(origTraces, trace);
colorFlag = true;
}

self.highlightSmoothedTrace(origTraces, trace, colorFlag);
}
}
});
};

CatboostIpython.prototype.highlightSmoothedTrace = function(trace, smoothedTrace, flag) {
if (flag) {
smoothedTrace.line.color = trace._params.plotParams.color;
trace.line.color = smoothedTrace._params.plotParams.color;
trace.hoverinfo = ‘skip’;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillsmoothcolor;
}
} else {
trace.line.color = trace._params.plotParams.color;
trace.hoverinfo = trace._params.plotParams.hoverinfo;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillcolor;
}
}
};

CatboostIpython.prototype.smoothFunc = function(origTrace, smoothedTrace) {
var data = origTrace.y,
smoothedPoints = this.smooth(data, this.getSmoothness()),
smoothedIndex = 0,
self = this;

if (smoothedPoints.length) {
data.forEach(function (d, index) {
if (!smoothedTrace.x[index]) {
smoothedTrace.x[index] = index;
}

var nameOfSet = smoothedTrace._params.nameOfSet;

if (smoothedTrace._params.cv_stddev_first || smoothedTrace._params.cv_stddev_last) {
nameOfSet = smoothedTrace._params.type + ‘ std’;
}

smoothedTrace.y[index] = smoothedPoints[smoothedIndex];
smoothedTrace.hovertext[index] = nameOfSet + ‘`: ‘ + smoothedPoints[smoothedIndex].toPrecision(7);
if (self.hovertextParameters.length > index) {
smoothedTrace.hovertext[index] += self.hovertextParameters[index];
}
smoothedIndex++;
});
}
};

CatboostIpython.prototype.formatItemValue = function(value, index, suffix) {
if (typeof value === ‘undefined’) {
return ”;
}

suffix = suffix || ”;

return ‘‘ + value + ‘‘;
};

CatboostIpython.prototype.updateTraceBest = function(train, hash) {
var traces = this.filterTracesOne(hash.traces, {best_point: true}),
self = this;

traces.forEach(function(trace) {
var testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
indexOfSet: trace._params.indexOfSet
}));

if (self.hasCVMode) {
testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
cv_avg: true
}));
}

var bestValue = self.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

if (bestValue.index !== -1) {
trace.x[0] = bestValue.index;
trace.y[0] = bestValue.best;
trace.hovertext[0] = bestValue.func + ‘ (‘ + (self.hasCVMode ? ‘avg’ : trace._params.nameOfSet) + ‘): ‘ + bestValue.index + ‘ ‘ + bestValue.best;
}
});
};

CatboostIpython.prototype.updateTraceValues = function(name, hash, iteration, click) {
var id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
path = hash.info.path,
self = this;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
var data = trace.y || [],
index = typeof iteration !== ‘undefined’ && iteration -1 ? bestValue.index : ”);

$(‘#’ + id + ‘ .catboost-panel__serie_best_test_value[data-index=’ + trace._params.indexOfSet + ‘]’, self.layout)
.html(self.formatItemValue(bestValue.best, bestValue.index, ‘best ‘ + trace._params.nameOfSet + ‘ ‘));
}
});
});

if (this.hasCVMode) {
var testTrace = this.filterTracesEvery(hash.traces, this.getTraceDefParams({
type: ‘test’,
cv_avg: true
})),
bestValue = this.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

$(‘#’ + id + ‘ .catboost-panel__serie_best_iteration’, this.layout).html(bestValue.index > -1 ? bestValue.index : ”);
}

if (click) {
this.clickMode = true;

$(‘#catboost-control2-clickmode’ + this.index, this.layout)[0].checked = true;
}
};

CatboostIpython.prototype.addTracesEvents = function() {
var self = this;

$(‘.catboost-panel__serie_checkbox’, this.layout).click(function() {
var name = $(this).data(‘seriename’);

self.layoutDisabled.traces[name] = !$(this)[0].checked;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.getNextColor = function(path, opacity) {
var color;

if (this.colorsByPath[path]) {
color = this.colorsByPath[path];
} else {
color = this.colors[this.colorIndex];
this.colorsByPath[path] = color;

this.colorIndex++;

if (this.colorIndex > this.colors.length – 1) {
this.colorIndex = 0;
}
}

return this.hexToRgba(color, opacity);
};

CatboostIpython.prototype.hexToRgba = function(value, opacity) {
if (value.length 0) {
out += hours + ‘h ‘;
seconds = 0;
millis = 0;
}
if (minutes && minutes > 0) {
out += minutes + ‘m ‘;
millis = 0;
}
if (seconds && seconds > 0) {
out += seconds + ‘s ‘;
}
if (millis && millis > 0) {
out += millis + ‘ms’;
}

return out.trim();
};

CatboostIpython.prototype.mean = function(values, valueof) {
var n = values.length,
m = n,
i = -1,
value,
sum = 0,
number = function(x) {
return x === null ? NaN : +x;
};

if (valueof === null) {
while (++i


Out[41]:
<catboost.core.CatBoostClassifier at 0x7f8460ee3590>
In [42]:
print('ZWYKŁY MODEL    - Na zbiorze testowym accurace: ', accuracy_score(y_validation, model.predict(X_validation)))
print('NAJLEPSZY MODEL - Na zbiorze testowym accurace: ', accuracy_score(y_validation, best_model.predict(X_validation)))
ZWYKŁY MODEL    - Na zbiorze testowym accurace:  0.8071748878923767
NAJLEPSZY MODEL - Na zbiorze testowym accurace:  0.8295964125560538

SPRAWDZAM jaki jest ten najlepszy model najlepszy

In [43]:
y_bestPred = best_model.predict(X_validation)
In [44]:
Classification_Assessment(best_model,X_train, y_train, X_validation, y_validation, y_bestPred)
Recall Training data:      0.7391
Precision Training data:   0.974
----------------------------------------------------------------------
Recall Test data:          0.6854
Precision Test data:       0.8592
----------------------------------------------------------------------
Confusion Matrix Test data
[[124  10]
 [ 28  61]]
----------------------------------------------------------------------
Valuation for test data only:
              precision    recall  f1-score   support

           0       0.82      0.93      0.87       134
           1       0.86      0.69      0.76        89

    accuracy                           0.83       223
   macro avg       0.84      0.81      0.81       223
weighted avg       0.83      0.83      0.83       223

Valuation for test data only:
roc_auc 0.805
---------------------
AUC_train: 0.966
AUC_test:  0.906
---------------------
Accuracy Training data:      0.8937
----------------------------------------------------------------------
Accuracy Test data:          0.8296
----------------------------------------------------------------------

3.2 Wczesne zatrzymanie

Jeśli zasadniczo masz zestaw sprawdzania poprawności, zawsze łatwiej i lepiej jest skorzystać z wczesnego zatrzymania. Ta funkcja jest podobna do poprzedniej, ale oprócz poprawy jakości wciąż oszczędza czas.

Czas robienia modelu bez ‘earlystop’

In [45]:
%%time
model = CatBoostClassifier(**params)
model.fit(train_pool, eval_set=validate_pool)
CPU times: user 12.3 s, sys: 2.15 s, total: 14.4 s
Wall time: 3.37 s
Out[45]:
<catboost.core.CatBoostClassifier at 0x7f846038d0d0>

Czas robienia modelu z ‘earlystop’

In [46]:
%%time
earlystop_params = params.copy()   #<-- tradycyjne dodawanie parametrów
earlystop_params.update({
    'od_type': 'Iter',
    'od_wait': 40
})
earlystop_model = CatBoostClassifier(**earlystop_params)
earlystop_model.fit(train_pool, eval_set=validate_pool)
CPU times: user 1.25 s, sys: 212 ms, total: 1.46 s
Wall time: 364 ms
Out[46]:
<catboost.core.CatBoostClassifier at 0x7f846038a410>

Nowe parametry ‘earlystop’:

In [47]:
earlystop_params
Out[47]:
{'iterations': 500,
 'learning_rate': 0.1,
 'eval_metric': 'Accuracy',
 'random_seed': 42,
 'logging_level': 'Silent',
 'use_best_model': False,
 'od_type': 'Iter',
 'od_wait': 40}
In [48]:
print('Simple model tree count: {}'.format(model.tree_count_))
print('Simple model validation accuracy: {:.4}'.format(
    accuracy_score(y_validation, model.predict(X_validation))
))
print('')

print('Early-stopped model tree count: {}'.format(earlystop_model.tree_count_))
print('Early-stopped model validation accuracy: {:.4}'.format(
    accuracy_score(y_validation, earlystop_model.predict(X_validation))
))
Simple model tree count: 500
Simple model validation accuracy: 0.8072

Early-stopped model tree count: 57
Early-stopped model validation accuracy: 0.8161

Dzięki temu uzyskujemy lepszą jakość w krótszym czasie.

Chociaż, jak pokazano wcześniej, prosty schemat sprawdzania poprawności nie opisuje dokładnie wyniku poza zmiennymi treningowymi (może być tendencyjny z powodu podziału zestawu danych), nadal dobrze jest śledzić dynamikę ulepszeń modelu – a zatem, jak widać z tego przykładu, jest to naprawdę dobrze jest wcześniej zatrzymać proces wzmacniania (zanim rozpocznie się nadmierne dopasowanie)

Rozkładam to na czynniki pierwsze

Czyli model kończy się na najlepszym uzyskanym wyniku ‘accuracy’

In [49]:
earlystop_model.fit(train_pool, eval_set=validate_pool, plot=True)

‘;
}

this.layout = $(‘

‘ +

‘ +

‘ +
‘ +

Learn’ +
‘ +

Eval’ +

‘ +

‘ +

‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
cvAreaControls +

‘ +

‘ +

‘ +

‘ +

‘ +

‘ +

‘);
$(parent).append(this.layout);

this.addTabEvents();
this.addControlEvents();
};

CatboostIpython.prototype.addTabEvents = function() {
var self = this;

$(‘.catboost-graph__tabs’, this.layout).click(function(e) {
if (!$(e.target).is(‘.catboost-graph__tab:not(.catboost-graph__tab_active)’)) {
return;
}

var id = $(e.target).attr(‘tabid’);

self.activeTab = id;

$(‘.catboost-graph__tab_active’, self.layout).removeClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart_active’, self.layout).removeClass(‘catboost-graph__chart_active’);

$(‘.catboost-graph__tab[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__chart_active’);

self.cleanSeries();

self.redrawActiveChart();
self.resizeCharts();
});
};

CatboostIpython.prototype.addControlEvents = function() {
var self = this;

$(‘#catboost-control-learn’ + this.index, this.layout).click(function() {
self.layoutDisabled.learn = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_learn_disabled’, self.layoutDisabled.learn);

self.redrawActiveChart();
});

$(‘#catboost-control-test’ + this.index, this.layout).click(function() {
self.layoutDisabled.test = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_test_disabled’, self.layoutDisabled.test);

self.redrawActiveChart();
});

$(‘#catboost-control2-clickmode’ + this.index, this.layout).click(function() {
self.clickMode = $(this)[0].checked;
});

$(‘#catboost-control2-log’ + this.index, this.layout).click(function() {
self.logarithmMode = $(this)[0].checked ? ‘log’ : ‘linear’;

self.forEveryLayout(function(layout) {
layout.yaxis = {type: self.logarithmMode};
});

self.redrawActiveChart();
});

var slider = $(‘#catboost-control2-slider’ + this.index),
sliderValue = $(‘#catboost-control2-slidervalue’ + this.index);

$(‘#catboost-control2-smooth’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setSmoothness(enabled ? self.lastSmooth : -1);

slider.prop(‘disabled’, !enabled);
sliderValue.prop(‘disabled’, !enabled);

self.redrawActiveChart();
});

$(‘#catboost-control2-cvstddev’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setStddev(enabled);

self.redrawActiveChart();
});

slider.on(‘input change’, function() {
var smooth = Number($(this).val());

sliderValue.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});

sliderValue.on(‘input change’, function() {
var smooth = Number($(this).val());

slider.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.setTraceVisibility = function(trace, visibility) {
if (trace) {
trace.visible = visibility;
}
};

CatboostIpython.prototype.updateTracesVisibility = function() {
var tracesHash = this.groupTraces(),
traces,
smoothDisabled = this.getSmoothness() === -1,
self = this;

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
traces = tracesHash[train].traces;

if (this.layoutDisabled.traces[train]) {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
} else {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

if (this.hasCVMode) {
if (this.stddevEnabled) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
} else {
self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}

if (smoothDisabled) {
self.filterTracesOne(traces, {smoothed: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘learn’]) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘test’]) {
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}
}
}
};

CatboostIpython.prototype.getSmoothness = function() {
return this.smoothness && this.smoothness > -1 ? this.smoothness : -1;
};

CatboostIpython.prototype.setSmoothness = function(weight) {
if (weight 1) {
return;
}

this.smoothness = weight;
};

CatboostIpython.prototype.setStddev = function(enabled) {
this.stddevEnabled = enabled;
};

CatboostIpython.prototype.redrawActiveChart = function() {
this.chartsToRedraw[this.activeTab] = true;

this.redrawAll();
};

CatboostIpython.prototype.redraw = function() {
if (this.chartsToRedraw[this.activeTab]) {
this.chartsToRedraw[this.activeTab] = false;

this.updateTracesVisibility();
this.updateTracesCV();
this.updateTracesBest();
this.updateTracesValues();
this.updateTracesSmoothness();

this.plotly.redraw(this.traces[this.activeTab].parent);
}

this.drawTraces();
};

CatboostIpython.prototype.addRedrawFunc = function() {
this.redrawFunc = throttle(this.redraw, 400, false, this);
};

CatboostIpython.prototype.redrawAll = function() {
if (!this.redrawFunc) {
this.addRedrawFunc();
}

this.redrawFunc();
};

CatboostIpython.prototype.addPoints = function(parent, data) {
var self = this;

data.chunks.forEach(function(item) {
if (typeof item.remaining_time !== ‘undefined’ && typeof item.passed_time !== ‘undefined’) {
if (!self.timeLeft[data.path]) {
self.timeLeft[data.path] = [];
}

self.timeLeft[data.path][item.iteration] = [item.remaining_time, item.passed_time];
}

[‘test’, ‘learn’].forEach(function(type) {
var sets = self.meta[data.path][type + ‘_sets’],
metrics = self.meta[data.path][type + ‘_metrics’];

for (var i = 0; i ‘ + parameter + ‘ : ‘ + valueOfParameter;
}
}
}
if (!hovertextParametersAdded && type === ‘test’) {
hovertextParametersAdded = true;
trace.hovertext[pointIndex] += self.hovertextParameters[pointIndex];
}
smoothedTrace.x[pointIndex] = pointIndex;
}

if (bestValueTrace) {
bestValueTrace.x[pointIndex] = pointIndex;
bestValueTrace.y[pointIndex] = self.lossFuncs[nameOfMetric];
}

if (launchMode === ‘CV’ && !cvAdded) {
cvAdded = true;

self.getTrace(parent, $.extend({cv_stddev_first: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true}, params));

self.getTrace(parent, $.extend({cv_stddev_first: true, smoothed: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true, smoothed: true}, params));

self.getTrace(parent, $.extend({cv_avg: true}, params));
self.getTrace(parent, $.extend({cv_avg: true, smoothed: true}, params));

if (type === ‘test’) {
self.getTrace(parent, $.extend({cv_avg: true, best_point: true}, params));
}
}
}

self.chartsToRedraw[key.chartId] = true;

self.redrawAll();
}
});
});
};

CatboostIpython.prototype.getLaunchMode = function(path) {
return this.meta[path].launch_mode;
};

CatboostIpython.prototype.getChartNode = function(params, active) {
var node = $(‘

‘);

if (active) {
node.addClass(‘catboost-graph__chart_active’);
}

return node;
};

CatboostIpython.prototype.getChartTab = function(params, active) {
var node = $(‘

‘ + params.name + ‘

‘);

if (active) {
node.addClass(‘catboost-graph__tab_active’);
}

return node;
};

CatboostIpython.prototype.forEveryChart = function(callback) {
for (var name in this.traces) {
if (this.traces.hasOwnProperty(name)) {
callback(this.traces[name]);
}
}
};

CatboostIpython.prototype.forEveryLayout = function(callback) {
this.forEveryChart(function(chart) {
callback(chart.layout);
});
};

CatboostIpython.prototype.getChart = function(parent, params) {
var id = params.id,
self = this;

if (this.charts[id]) {
return this.charts[id];
}

this.addLayout(parent);

var active = this.activeTab === params.id,
chartNode = this.getChartNode(params, active),
chartTab = this.getChartTab(params, active);

$(‘.catboost-graph__charts’, this.layout).append(chartNode);
$(‘.catboost-graph__tabs’, this.layout).append(chartTab);

this.traces[id] = {
id: params.id,
name: params.name,
parent: chartNode[0],
traces: [],
layout: {
xaxis: {
range: [0, Number(this.meta[params.path].iteration_count)],
type: ‘linear’,
tickmode: ‘auto’,
showspikes: true,
spikethickness: 1,
spikedash: ‘longdashdot’,
spikemode: ‘across’,
zeroline: false,
showgrid: false
},
yaxis: {
zeroline: false
//showgrid: false
//hoverformat : ‘.7f’
},
separators: ‘. ‘,
//hovermode: ‘x’,
margin: {l: 38, r: 0, t: 35, b: 30},
autosize: true,
showlegend: false
},
options: {
scrollZoom: false,
modeBarButtonsToRemove: [‘toggleSpikelines’],
displaylogo: false
}
};

this.charts[id] = this.plotly.plot(chartNode[0], this.traces[id].traces, this.traces[id].layout, this.traces[id].options);

chartNode[0].on(‘plotly_hover’, function(e) {
self.updateTracesValues(e.points[0].x);
});

chartNode[0].on(‘plotly_click’, function(e) {
self.updateTracesValues(e.points[0].x, true);
});

return this.charts[id];
};

CatboostIpython.prototype.getTrace = function(parent, params) {
var key = this.getKey(params),
chartSeries = [];

if (this.traces[key.chartId]) {
chartSeries = this.traces[key.chartId].traces.filter(function(trace) {
return trace.name === key.traceName;
});
}

if (chartSeries.length) {
return chartSeries[0];
} else {
this.getChart(parent, {id: key.chartId, name: params.chartName, path: params.path});

var plotParams = {
color: this.getNextColor(params.path, params.smoothed ? 0.2 : 1),
fillsmoothcolor: this.getNextColor(params.path, 0.1),
fillcolor: this.getNextColor(params.path, 0.4),
hoverinfo: params.cv_avg ? ‘skip’ : ‘text+x’,
width: params.cv_avg ? 2 : 1,
dash: params.type === ‘test’ ? ‘solid’ : ‘dot’
},
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
hovertext: [],
hoverinfo: plotParams.hoverinfo,
line: {
width: plotParams.width,
dash: plotParams.dash,
color: plotParams.color
},
mode: ‘lines’,
hoveron: ‘points’,
connectgaps: true
};

if (params.best_point) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
marker: {
width: 2,
color: plotParams.color
},
hovertext: [],
hoverinfo: ‘text’,
mode: ‘markers’,
type: ‘scatter’
};
}

if (params.best_value) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
line: {
width: 1,
dash: ‘dash’,
color: ‘#CCCCCC’
},
mode: ‘lines’,
connectgaps: true,
hoverinfo: ‘skip’
};
}

if (params.cv_stddev_last) {
trace.fill = ‘tonexty’;
}

trace._params.plotParams = plotParams;

this.traces[key.chartId].traces.push(trace);

return trace;
}
};

CatboostIpython.prototype.getKey = function(params) {
var traceName = [
params.train,
params.type,
params.indexOfSet,
(params.smoothed ? ‘smoothed’ : ”),
(params.best_point ? ‘best_pount’ : ”),
(params.best_value ? ‘best_value’ : ”),
(params.cv_avg ? ‘cv_avg’ : ”),
(params.cv_stddev_first ? ‘cv_stddev_first’ : ”),
(params.cv_stddev_last ? ‘cv_stddev_last’ : ”)
].join(‘;’);

return {
chartId: params.chartName,
traceName: traceName,
colorId: params.train
};
};

CatboostIpython.prototype.filterTracesEvery = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] !== trace._params[prop]) {
return false;
}
}
}

return true;
});
};

CatboostIpython.prototype.filterTracesOne = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] === trace._params[prop]) {
return true;
}
}
}

return false;
});
};

CatboostIpython.prototype.cleanSeries = function() {
$(‘.catboost-panel__series’, this.layout).html(”);
};

CatboostIpython.prototype.groupTraces = function() {
var traces = this.traces[this.activeTab].traces,
index = 0,
tracesHash = {};

traces.map(function(trace) {
var train = trace._params.train;

if (!tracesHash[train]) {
tracesHash[train] = {
index: index,
traces: [],
info: {
path: trace._params.path,
color: trace._params.plotParams.color
}
};

index++;
}

tracesHash[train].traces.push(trace);
});

return tracesHash;
};

CatboostIpython.prototype.drawTraces = function() {
if ($(‘.catboost-panel__series .catboost-panel__serie’, this.layout).length) {
return;
}

var html = ”,
tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
html += this.drawTrace(train, tracesHash[train]);
}
}

$(‘.catboost-panel__series’, this.layout).html(html);

this.updateTracesValues();

this.addTracesEvents();
};

CatboostIpython.prototype.getTraceDefParams = function(params) {
var defParams = {
smoothed: undefined,
best_point: undefined,
best_value: undefined,
cv_avg: undefined,
cv_stddev_first: undefined,
cv_stddev_last: undefined
};

if (params) {
return $.extend(defParams, params);
} else {
return defParams;
}
};

CatboostIpython.prototype.drawTrace = function(train, hash) {
var info = hash.info,
id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
items = {
learn: {
middle: ”,
bottom: ”
},
test: {
middle: ”,
bottom: ”
}
},
tracesNames = ”;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
items[type].middle += ‘

‘ +

‘;

items[type].bottom += ‘

‘ +

‘;

tracesNames += ‘

‘ +

‘ + trace._params.nameOfSet + ‘

‘;
});
});

var timeSpendHtml = ‘

‘ +

‘ +

‘;

var html = ‘

‘ +

‘ +
‘ +

‘ +
(this.getLaunchMode(info.path) !== ‘Eval’ ? timeSpendHtml : ”) +

‘ +

curr

‘ +

best

‘ +

‘ +

‘ +

‘ +

‘ +
tracesNames +

‘ +

‘ +
items.learn.middle +
items.test.middle +

‘ +

‘ +
items.learn.bottom +
items.test.bottom +

‘ +

‘ +

‘;

return html;
};

CatboostIpython.prototype.updateTracesValues = function(iteration, click) {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceValues(train, tracesHash[train], iteration, click);
}
}
};

CatboostIpython.prototype.updateTracesBest = function() {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceBest(train, tracesHash[train]);
}
}
};

CatboostIpython.prototype.getBestValue = function(data) {
if (!data.length) {
return {
best: undefined,
index: -1
};
}

var best = data[0],
index = 0,
func = this.lossFuncs[this.traces[this.activeTab].name],
bestDiff = typeof func === ‘number’ ? Math.abs(data[0] – func) : 0;

for (var i = 1, l = data.length; i best) {
best = data[i];
index = i;
}

if (typeof func === ‘number’ && Math.abs(data[i] – func) maxLength) {
maxLength = origTrace.y.length;
}
});

for (var i = 0; i 0) {
avgTrace.x[i] = i;
avgTrace.y[i] = sum / count;
}
}
};

CatboostIpython.prototype.updateTracesCVStdDev = function() {
var tracesHash = this.groupTraces(),
firstTraces = this.filterTracesOne(tracesHash.traces, {cv_stddev_first: true}),
self = this;

firstTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed
})),
lastTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed,
cv_stddev_last: true
}));

if (origTraces.length && lastTraces.length === 1) {
self.cvStdDevFunc(origTraces, trace, lastTraces[0]);
}
});
};

CatboostIpython.prototype.cvStdDevFunc = function(origTraces, firstTrace, lastTrace) {
var maxCount = origTraces.length,
maxLength = -1,
count,
sum,
i, j;

origTraces.forEach(function(origTrace) {
if (origTrace.y.length > maxLength) {
maxLength = origTrace.y.length;
}
});

for (i = 0; i i) {
firstTrace.hovertext[i] += this.hovertextParameters[i];
lastTrace.hovertext[i] += this.hovertextParameters[i];
}
}
};

CatboostIpython.prototype.updateTracesSmoothness = function() {
var tracesHash = this.groupTraces(),
smoothedTraces = this.filterTracesOne(tracesHash.traces, {smoothed: true}),
enabled = this.getSmoothness() > -1,
self = this;

smoothedTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
indexOfSet: trace._params.indexOfSet,
cv_avg: trace._params.cv_avg,
cv_stddev_first: trace._params.cv_stddev_first,
cv_stddev_last: trace._params.cv_stddev_last
})),
colorFlag = false;

if (origTraces.length === 1) {
origTraces = origTraces[0];

if (origTraces.visible) {
if (enabled) {
self.smoothFunc(origTraces, trace);
colorFlag = true;
}

self.highlightSmoothedTrace(origTraces, trace, colorFlag);
}
}
});
};

CatboostIpython.prototype.highlightSmoothedTrace = function(trace, smoothedTrace, flag) {
if (flag) {
smoothedTrace.line.color = trace._params.plotParams.color;
trace.line.color = smoothedTrace._params.plotParams.color;
trace.hoverinfo = ‘skip’;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillsmoothcolor;
}
} else {
trace.line.color = trace._params.plotParams.color;
trace.hoverinfo = trace._params.plotParams.hoverinfo;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillcolor;
}
}
};

CatboostIpython.prototype.smoothFunc = function(origTrace, smoothedTrace) {
var data = origTrace.y,
smoothedPoints = this.smooth(data, this.getSmoothness()),
smoothedIndex = 0,
self = this;

if (smoothedPoints.length) {
data.forEach(function (d, index) {
if (!smoothedTrace.x[index]) {
smoothedTrace.x[index] = index;
}

var nameOfSet = smoothedTrace._params.nameOfSet;

if (smoothedTrace._params.cv_stddev_first || smoothedTrace._params.cv_stddev_last) {
nameOfSet = smoothedTrace._params.type + ‘ std’;
}

smoothedTrace.y[index] = smoothedPoints[smoothedIndex];
smoothedTrace.hovertext[index] = nameOfSet + ‘`: ‘ + smoothedPoints[smoothedIndex].toPrecision(7);
if (self.hovertextParameters.length > index) {
smoothedTrace.hovertext[index] += self.hovertextParameters[index];
}
smoothedIndex++;
});
}
};

CatboostIpython.prototype.formatItemValue = function(value, index, suffix) {
if (typeof value === ‘undefined’) {
return ”;
}

suffix = suffix || ”;

return ‘‘ + value + ‘‘;
};

CatboostIpython.prototype.updateTraceBest = function(train, hash) {
var traces = this.filterTracesOne(hash.traces, {best_point: true}),
self = this;

traces.forEach(function(trace) {
var testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
indexOfSet: trace._params.indexOfSet
}));

if (self.hasCVMode) {
testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
cv_avg: true
}));
}

var bestValue = self.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

if (bestValue.index !== -1) {
trace.x[0] = bestValue.index;
trace.y[0] = bestValue.best;
trace.hovertext[0] = bestValue.func + ‘ (‘ + (self.hasCVMode ? ‘avg’ : trace._params.nameOfSet) + ‘): ‘ + bestValue.index + ‘ ‘ + bestValue.best;
}
});
};

CatboostIpython.prototype.updateTraceValues = function(name, hash, iteration, click) {
var id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
path = hash.info.path,
self = this;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
var data = trace.y || [],
index = typeof iteration !== ‘undefined’ && iteration -1 ? bestValue.index : ”);

$(‘#’ + id + ‘ .catboost-panel__serie_best_test_value[data-index=’ + trace._params.indexOfSet + ‘]’, self.layout)
.html(self.formatItemValue(bestValue.best, bestValue.index, ‘best ‘ + trace._params.nameOfSet + ‘ ‘));
}
});
});

if (this.hasCVMode) {
var testTrace = this.filterTracesEvery(hash.traces, this.getTraceDefParams({
type: ‘test’,
cv_avg: true
})),
bestValue = this.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

$(‘#’ + id + ‘ .catboost-panel__serie_best_iteration’, this.layout).html(bestValue.index > -1 ? bestValue.index : ”);
}

if (click) {
this.clickMode = true;

$(‘#catboost-control2-clickmode’ + this.index, this.layout)[0].checked = true;
}
};

CatboostIpython.prototype.addTracesEvents = function() {
var self = this;

$(‘.catboost-panel__serie_checkbox’, this.layout).click(function() {
var name = $(this).data(‘seriename’);

self.layoutDisabled.traces[name] = !$(this)[0].checked;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.getNextColor = function(path, opacity) {
var color;

if (this.colorsByPath[path]) {
color = this.colorsByPath[path];
} else {
color = this.colors[this.colorIndex];
this.colorsByPath[path] = color;

this.colorIndex++;

if (this.colorIndex > this.colors.length – 1) {
this.colorIndex = 0;
}
}

return this.hexToRgba(color, opacity);
};

CatboostIpython.prototype.hexToRgba = function(value, opacity) {
if (value.length 0) {
out += hours + ‘h ‘;
seconds = 0;
millis = 0;
}
if (minutes && minutes > 0) {
out += minutes + ‘m ‘;
millis = 0;
}
if (seconds && seconds > 0) {
out += seconds + ‘s ‘;
}
if (millis && millis > 0) {
out += millis + ‘ms’;
}

return out.trim();
};

CatboostIpython.prototype.mean = function(values, valueof) {
var n = values.length,
m = n,
i = -1,
value,
sum = 0,
number = function(x) {
return x === null ? NaN : +x;
};

if (valueof === null) {
while (++i


Out[49]:
<catboost.core.CatBoostClassifier at 0x7f846038a410>
In [50]:
y_earlystop = earlystop_model.predict(X_validation)
In [51]:
Classification_Assessment(earlystop_model,X_train, y_train, X_validation, y_validation, y_earlystop)
Recall Training data:      0.7628
Precision Training data:   0.9747
----------------------------------------------------------------------
Recall Test data:          0.6629
Precision Test data:       0.8429
----------------------------------------------------------------------
Confusion Matrix Test data
[[123  11]
 [ 30  59]]
----------------------------------------------------------------------
Valuation for test data only:
              precision    recall  f1-score   support

           0       0.80      0.92      0.86       134
           1       0.84      0.66      0.74        89

    accuracy                           0.82       223
   macro avg       0.82      0.79      0.80       223
weighted avg       0.82      0.82      0.81       223

Valuation for test data only:
roc_auc 0.790
---------------------
AUC_train: 0.961
AUC_test:  0.901
---------------------
Accuracy Training data:      0.9027
----------------------------------------------------------------------
Accuracy Test data:          0.8161
----------------------------------------------------------------------

3.3 Korzystanie z linii bazowej

Możliwe jest wykorzystanie wyników przedtreningowych (wyjściowych) do treningu.

Nie wiem po co to jest – daje słabe wyniki itd

In [54]:
current_params = params.copy()
current_params.update({
    'iterations': 10
})
In [55]:
model = CatBoostClassifier(**current_params).fit(X_train, y_train, categorical_features_indices)
# Get baseline (only with prediction_type='RawFormulaVal')
baseline = model.predict(X_train, prediction_type='RawFormulaVal')
# Fit new model
model.fit(X_train, y_train, categorical_features_indices, baseline=baseline)
Out[55]:
<catboost.core.CatBoostClassifier at 0x7f8465fec1d0>

Znowu zmieniam parapetry dodaje jakiś parametr:’iterations’: 10

In [56]:
model_cp = CatBoostClassifier(**current_params)
model_cp = model_cp.fit(X_train, y_train, categorical_features_indices)

Uzyskaj linię bazową (tylko z prediction_type = ‘RawFormulaVal’)

In [57]:
baseline = model_cp.predict(X_train, prediction_type='RawFormulaVal')

Fit new model

In [58]:
model_cp.fit(X_train, y_train, categorical_features_indices, baseline=baseline)
Out[58]:
<catboost.core.CatBoostClassifier at 0x7f8460468cd0>
In [59]:
y_pred_cp = model_cp.predict(X_validation)
In [60]:
Classification_Assessment(model_cp,X_train, y_train, X_validation, y_validation, y_pred_cp)
Recall Training data:      0.6601
Precision Training data:   0.9227
----------------------------------------------------------------------
Recall Test data:          0.6966
Precision Test data:       0.7848
----------------------------------------------------------------------
Confusion Matrix Test data
[[117  17]
 [ 27  62]]
----------------------------------------------------------------------
Valuation for test data only:
              precision    recall  f1-score   support

           0       0.81      0.87      0.84       134
           1       0.78      0.70      0.74        89

    accuracy                           0.80       223
   macro avg       0.80      0.78      0.79       223
weighted avg       0.80      0.80      0.80       223

Valuation for test data only:
roc_auc 0.785
---------------------
AUC_train: 0.930
AUC_test:  0.879
---------------------
Accuracy Training data:      0.8503
----------------------------------------------------------------------
Accuracy Test data:          0.8027
----------------------------------------------------------------------

3.4 Obsługa migawek

Catboost obsługuje migawki. Możesz go użyć do odzyskania treningu po przerwie lub do rozpoczęcia treningu z wcześniejszymi wynikami.

In [61]:
params_with_snapshot = params.copy()    #<-- tradycyjnie dodajemy nowe parametry
params_with_snapshot.update({
    'iterations': 5,
    'learning_rate': 0.5,
    'logging_level': 'Verbose'
})

model = CatBoostClassifier(**params_with_snapshot).fit(train_pool, eval_set=validate_pool, save_snapshot=True)

In [62]:
params_with_snapshot.update({      #<-- zmieniamy ustawienia migawki
    'iterations': 10,
    'learning_rate': 0.1,
})

model = CatBoostClassifier(**params_with_snapshot).fit(train_pool, eval_set=validate_pool, save_snapshot=True)

3.5 Funkcja celu zdefiniowana przez użytkownika

Możliwe jest stworzenie własnej funkcji celu. Utwórzmy funkcję celu logloss.

przybliżenia, cele, wagi są indeksowanymi pojemnikami pływaków
(pojemniki, które mają zdefiniowane tylko len i getitem).
parametrem wag może być Brak.

Aby zrozumieć, co oznaczają te parametry, załóż, że istnieje podzbiór zestawu danych, który jest obecnie przetwarzany. Program przybliża zawiera bieżące prognozy dla tego podzbioru, cele zawierają wartości docelowe podane w zestawie danych.

Ta funkcja powinna zwrócić listę par (der1, der2), gdzie
der1 jest pierwszą pochodną funkcji straty w odniesieniu do
do przewidywanej wartości, a der2 jest drugą pochodną.

W naszym przypadku logloss jest definiowany za pomocą następującej formuły:
cel log (sigmoid (w przybliżeniu)) + (1 – cel) (1 – sigmoid (w przybliżeniu))
gdzie sigmoid (x) = 1 / (1 + e ^ (- x)).

In [63]:
class LoglossObjective(object):
    def calc_ders_range(self, approxes, targets, weights):
        # approxes, targets, weights are indexed containers of floats
        # (containers which have only __len__ and __getitem__ defined).
        # weights parameter can be None.
        #
        # To understand what these parameters mean, assume that there is
        # a subset of your dataset that is currently being processed.
        # approxes contains current predictions for this subset,
        # targets contains target values you provided with the dataset.
        #
        # This function should return a list of pairs (der1, der2), where
        # der1 is the first derivative of the loss function with respect
        # to the predicted value, and der2 is the second derivative.
        #
        # In our case, logloss is defined by the following formula:
        # target * log(sigmoid(approx)) + (1 - target) * (1 - sigmoid(approx))
        # where sigmoid(x) = 1 / (1 + e^(-x)).
        
        assert len(approxes) == len(targets)
        if weights is not None:
            assert len(weights) == len(approxes)
        
        result = []
        for index in range(len(targets)):
            e = np.exp(approxes[index])
            p = e / (1 + e)
            der1 = (1 - p) if targets[index] > 0.0 else -p
            der2 = -p * (1 - p)

            if weights is not None:
                der1 *= weights[index]
                der2 *= weights[index]

            result.append((der1, der2))
        return result
In [64]:
model = CatBoostClassifier(
    iterations=10,
    random_seed=42, 
    loss_function=LoglossObjective(),   ##<-- dodajemy własnoręcznie wymyśloną funkcję
    eval_metric="Logloss"
)
In [65]:
# Fit model
model.fit(train_pool)
# Only prediction_type='RawFormulaVal' is allowed with custom `loss_function`
preds_raw = model.predict(X_validation, prediction_type='RawFormulaVal')
0:	learn: 0.6827074	total: 21.7ms	remaining: 195ms
1:	learn: 0.6722947	total: 41.5ms	remaining: 166ms
2:	learn: 0.6624914	total: 58.2ms	remaining: 136ms
3:	learn: 0.6528402	total: 77.3ms	remaining: 116ms
4:	learn: 0.6436863	total: 96.3ms	remaining: 96.3ms
5:	learn: 0.6346627	total: 114ms	remaining: 76.2ms
6:	learn: 0.6279562	total: 133ms	remaining: 56.8ms
7:	learn: 0.6201005	total: 154ms	remaining: 38.5ms
8:	learn: 0.6127656	total: 171ms	remaining: 19ms
9:	learn: 0.6053589	total: 189ms	remaining: 0us

3.6 Funkcja metryczna zdefiniowana przez użytkownika

Możliwe jest również utworzenie własnej funkcji metrycznej. Utwórzmy funkcję metryczną logloss.

In [66]:
class LoglossMetric(object):
    def get_final_error(self, error, weight):
        return error / (weight + 1e-38)

    def is_max_optimal(self):
        return False

    def evaluate(self, approxes, target, weight):
        # approxes is a list of indexed containers
        # (containers with only __len__ and __getitem__ defined),
        # one container per approx dimension.
        # Each container contains floats.
        # weight is a one dimensional indexed container.
        # target is float.
        
        # weight parameter can be None.
        # Returns pair (error, weights sum)
        
        assert len(approxes) == 1
        assert len(target) == len(approxes[0])

        approx = approxes[0]

        error_sum = 0.0
        weight_sum = 0.0

        for i in range(len(approx)):
            w = 1.0 if weight is None else weight[i]
            weight_sum += w
            error_sum += -w * (target[i] * approx[i] - np.log(1 + np.exp(approx[i])))

        return error_sum, weight_sum
In [67]:
model = CatBoostClassifier(
    iterations=10,
    random_seed=42, 
    loss_function="Logloss",
    eval_metric=LoglossMetric()
)
# Fit model
model.fit(train_pool)
# Only prediction_type='RawFormulaVal' is allowed with custom `loss_function`
preds_raw = model.predict(X_validation, prediction_type='RawFormulaVal')
Learning rate set to 0.5
0:	learn: 0.5521578	total: 7.79ms	remaining: 70.1ms
1:	learn: 0.4885686	total: 16ms	remaining: 64.2ms
2:	learn: 0.4646498	total: 22.5ms	remaining: 52.5ms
3:	learn: 0.4433198	total: 29.7ms	remaining: 44.6ms
4:	learn: 0.4348036	total: 36.5ms	remaining: 36.5ms
5:	learn: 0.4304872	total: 43.6ms	remaining: 29.1ms
6:	learn: 0.4169664	total: 49.9ms	remaining: 21.4ms
7:	learn: 0.4067507	total: 56.6ms	remaining: 14.1ms
8:	learn: 0.4019576	total: 62.8ms	remaining: 6.98ms
9:	learn: 0.3970545	total: 69.8ms	remaining: 0us

3.7 Przewidywane etapy

Model CatBoost ma metodę staged_predict. Pozwala iteracyjnie uzyskać prognozy dla danego zakresu drzew.

In [68]:
model_kot = CatBoostClassifier(iterations=10, random_seed=42, logging_level='Silent').fit(train_pool)
model_kot
Out[68]:
<catboost.core.CatBoostClassifier at 0x7f84603e9490>

określami ilość drzew

In [69]:
ntree_start, ntree_end, eval_period = 3, 9, 2

predictions_iterator

In [70]:
predictions_iterator = model.staged_predict(validate_pool, 'Probability', ntree_start, ntree_end, eval_period)

nie wiem co on teraz robi

In [71]:
for preds, tree_count in zip(predictions_iterator, range(ntree_start, ntree_end, eval_period)):
    print('First class probabilities using the first {} trees: {}'.format(tree_count, preds[:5, 1]))
First class probabilities using the first 3 trees: [0.42990422 0.42665956 0.4192657  0.56176543 0.4763258 ]
First class probabilities using the first 5 trees: [0.40394604 0.35310234 0.38666939 0.57518619 0.49553116]
First class probabilities using the first 7 trees: [0.39987636 0.34035878 0.3468137  0.53325091 0.54678221]

3.8 Najważniejsze cechy

Czasami bardzo ważne jest, aby zrozumieć, która funkcja miała największy wpływ na końcowy wynik. Aby to zrobić, model CatBoost ma metodę get_feature_importance.

Tworzy się taki model szkieletowy jak w poprzedniej metodzie

In [72]:
model = CatBoostClassifier(iterations=50, random_seed=42, logging_level='Silent').fit(train_pool)

Mówi się modelowi aby: ‘get_feature_importance’

In [73]:
feature_importances = model.get_feature_importance(train_pool)
In [74]:
feature_names = X_train.columns
feature_names
Out[74]:
Index(['PassengerId', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch',
       'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
In [75]:
for score, name in sorted(zip(feature_importances, feature_names), reverse=True):
    print('{}: {}'.format(name, score))
Sex: 56.4409024798132
Pclass: 16.831468536670158
Ticket: 6.3123775952096715
Parch: 4.157791677223602
Cabin: 3.6700917688447063
Embarked: 3.595172924440488
Age: 3.532435299190085
Fare: 3.002584491481529
SibSp: 2.457175227126605
PassengerId: 0.0
Name: 0.0

To pokazuje, że funkcje Sex i Pclass miały największy wpływ na wynik.

co ciekawe wcale zadeklarowałem w ‘train_pool’, w które zmienne są dyskretne a model sam je sobie zcyfryzował

In [76]:
X_train['Ticket']
Out[76]:
298              19988
884    SOTON/OQ 392076
247             250649
478             350060
305             113781
            ...       
106             343120
270             113798
860             350026
435             113760
102              35281
Name: Ticket, Length: 668, dtype: object

3.9 Wskaźniki oceny¶

CatBoost ma metodę eval_metrics, która pozwala obliczyć dane metryki dla danego zestawu danych. I oczywiście je narysować 🙂

In [77]:
model = CatBoostClassifier(iterations=50, random_seed=42, logging_level='Silent').fit(train_pool)
eval_metrics = model.eval_metrics(validate_pool, ['AUC'], plot=True)

‘;
}

this.layout = $(‘

‘ +

‘ +

‘ +
‘ +

Learn’ +
‘ +

Eval’ +

‘ +

‘ +

‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
cvAreaControls +

‘ +

‘ +

‘ +

‘ +

‘ +

‘ +

‘);
$(parent).append(this.layout);

this.addTabEvents();
this.addControlEvents();
};

CatboostIpython.prototype.addTabEvents = function() {
var self = this;

$(‘.catboost-graph__tabs’, this.layout).click(function(e) {
if (!$(e.target).is(‘.catboost-graph__tab:not(.catboost-graph__tab_active)’)) {
return;
}

var id = $(e.target).attr(‘tabid’);

self.activeTab = id;

$(‘.catboost-graph__tab_active’, self.layout).removeClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart_active’, self.layout).removeClass(‘catboost-graph__chart_active’);

$(‘.catboost-graph__tab[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__chart_active’);

self.cleanSeries();

self.redrawActiveChart();
self.resizeCharts();
});
};

CatboostIpython.prototype.addControlEvents = function() {
var self = this;

$(‘#catboost-control-learn’ + this.index, this.layout).click(function() {
self.layoutDisabled.learn = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_learn_disabled’, self.layoutDisabled.learn);

self.redrawActiveChart();
});

$(‘#catboost-control-test’ + this.index, this.layout).click(function() {
self.layoutDisabled.test = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_test_disabled’, self.layoutDisabled.test);

self.redrawActiveChart();
});

$(‘#catboost-control2-clickmode’ + this.index, this.layout).click(function() {
self.clickMode = $(this)[0].checked;
});

$(‘#catboost-control2-log’ + this.index, this.layout).click(function() {
self.logarithmMode = $(this)[0].checked ? ‘log’ : ‘linear’;

self.forEveryLayout(function(layout) {
layout.yaxis = {type: self.logarithmMode};
});

self.redrawActiveChart();
});

var slider = $(‘#catboost-control2-slider’ + this.index),
sliderValue = $(‘#catboost-control2-slidervalue’ + this.index);

$(‘#catboost-control2-smooth’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setSmoothness(enabled ? self.lastSmooth : -1);

slider.prop(‘disabled’, !enabled);
sliderValue.prop(‘disabled’, !enabled);

self.redrawActiveChart();
});

$(‘#catboost-control2-cvstddev’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setStddev(enabled);

self.redrawActiveChart();
});

slider.on(‘input change’, function() {
var smooth = Number($(this).val());

sliderValue.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});

sliderValue.on(‘input change’, function() {
var smooth = Number($(this).val());

slider.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.setTraceVisibility = function(trace, visibility) {
if (trace) {
trace.visible = visibility;
}
};

CatboostIpython.prototype.updateTracesVisibility = function() {
var tracesHash = this.groupTraces(),
traces,
smoothDisabled = this.getSmoothness() === -1,
self = this;

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
traces = tracesHash[train].traces;

if (this.layoutDisabled.traces[train]) {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
} else {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

if (this.hasCVMode) {
if (this.stddevEnabled) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
} else {
self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}

if (smoothDisabled) {
self.filterTracesOne(traces, {smoothed: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘learn’]) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘test’]) {
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}
}
}
};

CatboostIpython.prototype.getSmoothness = function() {
return this.smoothness && this.smoothness > -1 ? this.smoothness : -1;
};

CatboostIpython.prototype.setSmoothness = function(weight) {
if (weight 1) {
return;
}

this.smoothness = weight;
};

CatboostIpython.prototype.setStddev = function(enabled) {
this.stddevEnabled = enabled;
};

CatboostIpython.prototype.redrawActiveChart = function() {
this.chartsToRedraw[this.activeTab] = true;

this.redrawAll();
};

CatboostIpython.prototype.redraw = function() {
if (this.chartsToRedraw[this.activeTab]) {
this.chartsToRedraw[this.activeTab] = false;

this.updateTracesVisibility();
this.updateTracesCV();
this.updateTracesBest();
this.updateTracesValues();
this.updateTracesSmoothness();

this.plotly.redraw(this.traces[this.activeTab].parent);
}

this.drawTraces();
};

CatboostIpython.prototype.addRedrawFunc = function() {
this.redrawFunc = throttle(this.redraw, 400, false, this);
};

CatboostIpython.prototype.redrawAll = function() {
if (!this.redrawFunc) {
this.addRedrawFunc();
}

this.redrawFunc();
};

CatboostIpython.prototype.addPoints = function(parent, data) {
var self = this;

data.chunks.forEach(function(item) {
if (typeof item.remaining_time !== ‘undefined’ && typeof item.passed_time !== ‘undefined’) {
if (!self.timeLeft[data.path]) {
self.timeLeft[data.path] = [];
}

self.timeLeft[data.path][item.iteration] = [item.remaining_time, item.passed_time];
}

[‘test’, ‘learn’].forEach(function(type) {
var sets = self.meta[data.path][type + ‘_sets’],
metrics = self.meta[data.path][type + ‘_metrics’];

for (var i = 0; i ‘ + parameter + ‘ : ‘ + valueOfParameter;
}
}
}
if (!hovertextParametersAdded && type === ‘test’) {
hovertextParametersAdded = true;
trace.hovertext[pointIndex] += self.hovertextParameters[pointIndex];
}
smoothedTrace.x[pointIndex] = pointIndex;
}

if (bestValueTrace) {
bestValueTrace.x[pointIndex] = pointIndex;
bestValueTrace.y[pointIndex] = self.lossFuncs[nameOfMetric];
}

if (launchMode === ‘CV’ && !cvAdded) {
cvAdded = true;

self.getTrace(parent, $.extend({cv_stddev_first: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true}, params));

self.getTrace(parent, $.extend({cv_stddev_first: true, smoothed: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true, smoothed: true}, params));

self.getTrace(parent, $.extend({cv_avg: true}, params));
self.getTrace(parent, $.extend({cv_avg: true, smoothed: true}, params));

if (type === ‘test’) {
self.getTrace(parent, $.extend({cv_avg: true, best_point: true}, params));
}
}
}

self.chartsToRedraw[key.chartId] = true;

self.redrawAll();
}
});
});
};

CatboostIpython.prototype.getLaunchMode = function(path) {
return this.meta[path].launch_mode;
};

CatboostIpython.prototype.getChartNode = function(params, active) {
var node = $(‘

‘);

if (active) {
node.addClass(‘catboost-graph__chart_active’);
}

return node;
};

CatboostIpython.prototype.getChartTab = function(params, active) {
var node = $(‘

‘ + params.name + ‘

‘);

if (active) {
node.addClass(‘catboost-graph__tab_active’);
}

return node;
};

CatboostIpython.prototype.forEveryChart = function(callback) {
for (var name in this.traces) {
if (this.traces.hasOwnProperty(name)) {
callback(this.traces[name]);
}
}
};

CatboostIpython.prototype.forEveryLayout = function(callback) {
this.forEveryChart(function(chart) {
callback(chart.layout);
});
};

CatboostIpython.prototype.getChart = function(parent, params) {
var id = params.id,
self = this;

if (this.charts[id]) {
return this.charts[id];
}

this.addLayout(parent);

var active = this.activeTab === params.id,
chartNode = this.getChartNode(params, active),
chartTab = this.getChartTab(params, active);

$(‘.catboost-graph__charts’, this.layout).append(chartNode);
$(‘.catboost-graph__tabs’, this.layout).append(chartTab);

this.traces[id] = {
id: params.id,
name: params.name,
parent: chartNode[0],
traces: [],
layout: {
xaxis: {
range: [0, Number(this.meta[params.path].iteration_count)],
type: ‘linear’,
tickmode: ‘auto’,
showspikes: true,
spikethickness: 1,
spikedash: ‘longdashdot’,
spikemode: ‘across’,
zeroline: false,
showgrid: false
},
yaxis: {
zeroline: false
//showgrid: false
//hoverformat : ‘.7f’
},
separators: ‘. ‘,
//hovermode: ‘x’,
margin: {l: 38, r: 0, t: 35, b: 30},
autosize: true,
showlegend: false
},
options: {
scrollZoom: false,
modeBarButtonsToRemove: [‘toggleSpikelines’],
displaylogo: false
}
};

this.charts[id] = this.plotly.plot(chartNode[0], this.traces[id].traces, this.traces[id].layout, this.traces[id].options);

chartNode[0].on(‘plotly_hover’, function(e) {
self.updateTracesValues(e.points[0].x);
});

chartNode[0].on(‘plotly_click’, function(e) {
self.updateTracesValues(e.points[0].x, true);
});

return this.charts[id];
};

CatboostIpython.prototype.getTrace = function(parent, params) {
var key = this.getKey(params),
chartSeries = [];

if (this.traces[key.chartId]) {
chartSeries = this.traces[key.chartId].traces.filter(function(trace) {
return trace.name === key.traceName;
});
}

if (chartSeries.length) {
return chartSeries[0];
} else {
this.getChart(parent, {id: key.chartId, name: params.chartName, path: params.path});

var plotParams = {
color: this.getNextColor(params.path, params.smoothed ? 0.2 : 1),
fillsmoothcolor: this.getNextColor(params.path, 0.1),
fillcolor: this.getNextColor(params.path, 0.4),
hoverinfo: params.cv_avg ? ‘skip’ : ‘text+x’,
width: params.cv_avg ? 2 : 1,
dash: params.type === ‘test’ ? ‘solid’ : ‘dot’
},
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
hovertext: [],
hoverinfo: plotParams.hoverinfo,
line: {
width: plotParams.width,
dash: plotParams.dash,
color: plotParams.color
},
mode: ‘lines’,
hoveron: ‘points’,
connectgaps: true
};

if (params.best_point) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
marker: {
width: 2,
color: plotParams.color
},
hovertext: [],
hoverinfo: ‘text’,
mode: ‘markers’,
type: ‘scatter’
};
}

if (params.best_value) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
line: {
width: 1,
dash: ‘dash’,
color: ‘#CCCCCC’
},
mode: ‘lines’,
connectgaps: true,
hoverinfo: ‘skip’
};
}

if (params.cv_stddev_last) {
trace.fill = ‘tonexty’;
}

trace._params.plotParams = plotParams;

this.traces[key.chartId].traces.push(trace);

return trace;
}
};

CatboostIpython.prototype.getKey = function(params) {
var traceName = [
params.train,
params.type,
params.indexOfSet,
(params.smoothed ? ‘smoothed’ : ”),
(params.best_point ? ‘best_pount’ : ”),
(params.best_value ? ‘best_value’ : ”),
(params.cv_avg ? ‘cv_avg’ : ”),
(params.cv_stddev_first ? ‘cv_stddev_first’ : ”),
(params.cv_stddev_last ? ‘cv_stddev_last’ : ”)
].join(‘;’);

return {
chartId: params.chartName,
traceName: traceName,
colorId: params.train
};
};

CatboostIpython.prototype.filterTracesEvery = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] !== trace._params[prop]) {
return false;
}
}
}

return true;
});
};

CatboostIpython.prototype.filterTracesOne = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] === trace._params[prop]) {
return true;
}
}
}

return false;
});
};

CatboostIpython.prototype.cleanSeries = function() {
$(‘.catboost-panel__series’, this.layout).html(”);
};

CatboostIpython.prototype.groupTraces = function() {
var traces = this.traces[this.activeTab].traces,
index = 0,
tracesHash = {};

traces.map(function(trace) {
var train = trace._params.train;

if (!tracesHash[train]) {
tracesHash[train] = {
index: index,
traces: [],
info: {
path: trace._params.path,
color: trace._params.plotParams.color
}
};

index++;
}

tracesHash[train].traces.push(trace);
});

return tracesHash;
};

CatboostIpython.prototype.drawTraces = function() {
if ($(‘.catboost-panel__series .catboost-panel__serie’, this.layout).length) {
return;
}

var html = ”,
tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
html += this.drawTrace(train, tracesHash[train]);
}
}

$(‘.catboost-panel__series’, this.layout).html(html);

this.updateTracesValues();

this.addTracesEvents();
};

CatboostIpython.prototype.getTraceDefParams = function(params) {
var defParams = {
smoothed: undefined,
best_point: undefined,
best_value: undefined,
cv_avg: undefined,
cv_stddev_first: undefined,
cv_stddev_last: undefined
};

if (params) {
return $.extend(defParams, params);
} else {
return defParams;
}
};

CatboostIpython.prototype.drawTrace = function(train, hash) {
var info = hash.info,
id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
items = {
learn: {
middle: ”,
bottom: ”
},
test: {
middle: ”,
bottom: ”
}
},
tracesNames = ”;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
items[type].middle += ‘

‘ +

‘;

items[type].bottom += ‘

‘ +

‘;

tracesNames += ‘

‘ +

‘ + trace._params.nameOfSet + ‘

‘;
});
});

var timeSpendHtml = ‘

‘ +

‘ +

‘;

var html = ‘

‘ +

‘ +
‘ +

‘ +
(this.getLaunchMode(info.path) !== ‘Eval’ ? timeSpendHtml : ”) +

‘ +

curr

‘ +

best

‘ +

‘ +

‘ +

‘ +

‘ +
tracesNames +

‘ +

‘ +
items.learn.middle +
items.test.middle +

‘ +

‘ +
items.learn.bottom +
items.test.bottom +

‘ +

‘ +

‘;

return html;
};

CatboostIpython.prototype.updateTracesValues = function(iteration, click) {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceValues(train, tracesHash[train], iteration, click);
}
}
};

CatboostIpython.prototype.updateTracesBest = function() {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceBest(train, tracesHash[train]);
}
}
};

CatboostIpython.prototype.getBestValue = function(data) {
if (!data.length) {
return {
best: undefined,
index: -1
};
}

var best = data[0],
index = 0,
func = this.lossFuncs[this.traces[this.activeTab].name],
bestDiff = typeof func === ‘number’ ? Math.abs(data[0] – func) : 0;

for (var i = 1, l = data.length; i best) {
best = data[i];
index = i;
}

if (typeof func === ‘number’ && Math.abs(data[i] – func) maxLength) {
maxLength = origTrace.y.length;
}
});

for (var i = 0; i 0) {
avgTrace.x[i] = i;
avgTrace.y[i] = sum / count;
}
}
};

CatboostIpython.prototype.updateTracesCVStdDev = function() {
var tracesHash = this.groupTraces(),
firstTraces = this.filterTracesOne(tracesHash.traces, {cv_stddev_first: true}),
self = this;

firstTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed
})),
lastTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed,
cv_stddev_last: true
}));

if (origTraces.length && lastTraces.length === 1) {
self.cvStdDevFunc(origTraces, trace, lastTraces[0]);
}
});
};

CatboostIpython.prototype.cvStdDevFunc = function(origTraces, firstTrace, lastTrace) {
var maxCount = origTraces.length,
maxLength = -1,
count,
sum,
i, j;

origTraces.forEach(function(origTrace) {
if (origTrace.y.length > maxLength) {
maxLength = origTrace.y.length;
}
});

for (i = 0; i i) {
firstTrace.hovertext[i] += this.hovertextParameters[i];
lastTrace.hovertext[i] += this.hovertextParameters[i];
}
}
};

CatboostIpython.prototype.updateTracesSmoothness = function() {
var tracesHash = this.groupTraces(),
smoothedTraces = this.filterTracesOne(tracesHash.traces, {smoothed: true}),
enabled = this.getSmoothness() > -1,
self = this;

smoothedTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
indexOfSet: trace._params.indexOfSet,
cv_avg: trace._params.cv_avg,
cv_stddev_first: trace._params.cv_stddev_first,
cv_stddev_last: trace._params.cv_stddev_last
})),
colorFlag = false;

if (origTraces.length === 1) {
origTraces = origTraces[0];

if (origTraces.visible) {
if (enabled) {
self.smoothFunc(origTraces, trace);
colorFlag = true;
}

self.highlightSmoothedTrace(origTraces, trace, colorFlag);
}
}
});
};

CatboostIpython.prototype.highlightSmoothedTrace = function(trace, smoothedTrace, flag) {
if (flag) {
smoothedTrace.line.color = trace._params.plotParams.color;
trace.line.color = smoothedTrace._params.plotParams.color;
trace.hoverinfo = ‘skip’;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillsmoothcolor;
}
} else {
trace.line.color = trace._params.plotParams.color;
trace.hoverinfo = trace._params.plotParams.hoverinfo;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillcolor;
}
}
};

CatboostIpython.prototype.smoothFunc = function(origTrace, smoothedTrace) {
var data = origTrace.y,
smoothedPoints = this.smooth(data, this.getSmoothness()),
smoothedIndex = 0,
self = this;

if (smoothedPoints.length) {
data.forEach(function (d, index) {
if (!smoothedTrace.x[index]) {
smoothedTrace.x[index] = index;
}

var nameOfSet = smoothedTrace._params.nameOfSet;

if (smoothedTrace._params.cv_stddev_first || smoothedTrace._params.cv_stddev_last) {
nameOfSet = smoothedTrace._params.type + ‘ std’;
}

smoothedTrace.y[index] = smoothedPoints[smoothedIndex];
smoothedTrace.hovertext[index] = nameOfSet + ‘`: ‘ + smoothedPoints[smoothedIndex].toPrecision(7);
if (self.hovertextParameters.length > index) {
smoothedTrace.hovertext[index] += self.hovertextParameters[index];
}
smoothedIndex++;
});
}
};

CatboostIpython.prototype.formatItemValue = function(value, index, suffix) {
if (typeof value === ‘undefined’) {
return ”;
}

suffix = suffix || ”;

return ‘‘ + value + ‘‘;
};

CatboostIpython.prototype.updateTraceBest = function(train, hash) {
var traces = this.filterTracesOne(hash.traces, {best_point: true}),
self = this;

traces.forEach(function(trace) {
var testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
indexOfSet: trace._params.indexOfSet
}));

if (self.hasCVMode) {
testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
cv_avg: true
}));
}

var bestValue = self.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

if (bestValue.index !== -1) {
trace.x[0] = bestValue.index;
trace.y[0] = bestValue.best;
trace.hovertext[0] = bestValue.func + ‘ (‘ + (self.hasCVMode ? ‘avg’ : trace._params.nameOfSet) + ‘): ‘ + bestValue.index + ‘ ‘ + bestValue.best;
}
});
};

CatboostIpython.prototype.updateTraceValues = function(name, hash, iteration, click) {
var id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
path = hash.info.path,
self = this;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
var data = trace.y || [],
index = typeof iteration !== ‘undefined’ && iteration -1 ? bestValue.index : ”);

$(‘#’ + id + ‘ .catboost-panel__serie_best_test_value[data-index=’ + trace._params.indexOfSet + ‘]’, self.layout)
.html(self.formatItemValue(bestValue.best, bestValue.index, ‘best ‘ + trace._params.nameOfSet + ‘ ‘));
}
});
});

if (this.hasCVMode) {
var testTrace = this.filterTracesEvery(hash.traces, this.getTraceDefParams({
type: ‘test’,
cv_avg: true
})),
bestValue = this.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

$(‘#’ + id + ‘ .catboost-panel__serie_best_iteration’, this.layout).html(bestValue.index > -1 ? bestValue.index : ”);
}

if (click) {
this.clickMode = true;

$(‘#catboost-control2-clickmode’ + this.index, this.layout)[0].checked = true;
}
};

CatboostIpython.prototype.addTracesEvents = function() {
var self = this;

$(‘.catboost-panel__serie_checkbox’, this.layout).click(function() {
var name = $(this).data(‘seriename’);

self.layoutDisabled.traces[name] = !$(this)[0].checked;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.getNextColor = function(path, opacity) {
var color;

if (this.colorsByPath[path]) {
color = this.colorsByPath[path];
} else {
color = this.colors[this.colorIndex];
this.colorsByPath[path] = color;

this.colorIndex++;

if (this.colorIndex > this.colors.length – 1) {
this.colorIndex = 0;
}
}

return this.hexToRgba(color, opacity);
};

CatboostIpython.prototype.hexToRgba = function(value, opacity) {
if (value.length 0) {
out += hours + ‘h ‘;
seconds = 0;
millis = 0;
}
if (minutes && minutes > 0) {
out += minutes + ‘m ‘;
millis = 0;
}
if (seconds && seconds > 0) {
out += seconds + ‘s ‘;
}
if (millis && millis > 0) {
out += millis + ‘ms’;
}

return out.trim();
};

CatboostIpython.prototype.mean = function(values, valueof) {
var n = values.length,
m = n,
i = -1,
value,
sum = 0,
number = function(x) {
return x === null ? NaN : +x;
};

if (valueof === null) {
while (++i


Można testować więcej wskaźników: https://catboost.ai/docs/search/?query=%27Accuracy%27

In [78]:
model = CatBoostClassifier(iterations=50, random_seed=42, logging_level='Silent').fit(train_pool)
eval_metrics = model.eval_metrics(validate_pool, ['Recall'], plot=True)

‘;
}

this.layout = $(‘

‘ +

‘ +

‘ +
‘ +

Learn’ +
‘ +

Eval’ +

‘ +

‘ +

‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
cvAreaControls +

‘ +

‘ +

‘ +

‘ +

‘ +

‘ +

‘);
$(parent).append(this.layout);

this.addTabEvents();
this.addControlEvents();
};

CatboostIpython.prototype.addTabEvents = function() {
var self = this;

$(‘.catboost-graph__tabs’, this.layout).click(function(e) {
if (!$(e.target).is(‘.catboost-graph__tab:not(.catboost-graph__tab_active)’)) {
return;
}

var id = $(e.target).attr(‘tabid’);

self.activeTab = id;

$(‘.catboost-graph__tab_active’, self.layout).removeClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart_active’, self.layout).removeClass(‘catboost-graph__chart_active’);

$(‘.catboost-graph__tab[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__chart_active’);

self.cleanSeries();

self.redrawActiveChart();
self.resizeCharts();
});
};

CatboostIpython.prototype.addControlEvents = function() {
var self = this;

$(‘#catboost-control-learn’ + this.index, this.layout).click(function() {
self.layoutDisabled.learn = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_learn_disabled’, self.layoutDisabled.learn);

self.redrawActiveChart();
});

$(‘#catboost-control-test’ + this.index, this.layout).click(function() {
self.layoutDisabled.test = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_test_disabled’, self.layoutDisabled.test);

self.redrawActiveChart();
});

$(‘#catboost-control2-clickmode’ + this.index, this.layout).click(function() {
self.clickMode = $(this)[0].checked;
});

$(‘#catboost-control2-log’ + this.index, this.layout).click(function() {
self.logarithmMode = $(this)[0].checked ? ‘log’ : ‘linear’;

self.forEveryLayout(function(layout) {
layout.yaxis = {type: self.logarithmMode};
});

self.redrawActiveChart();
});

var slider = $(‘#catboost-control2-slider’ + this.index),
sliderValue = $(‘#catboost-control2-slidervalue’ + this.index);

$(‘#catboost-control2-smooth’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setSmoothness(enabled ? self.lastSmooth : -1);

slider.prop(‘disabled’, !enabled);
sliderValue.prop(‘disabled’, !enabled);

self.redrawActiveChart();
});

$(‘#catboost-control2-cvstddev’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setStddev(enabled);

self.redrawActiveChart();
});

slider.on(‘input change’, function() {
var smooth = Number($(this).val());

sliderValue.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});

sliderValue.on(‘input change’, function() {
var smooth = Number($(this).val());

slider.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.setTraceVisibility = function(trace, visibility) {
if (trace) {
trace.visible = visibility;
}
};

CatboostIpython.prototype.updateTracesVisibility = function() {
var tracesHash = this.groupTraces(),
traces,
smoothDisabled = this.getSmoothness() === -1,
self = this;

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
traces = tracesHash[train].traces;

if (this.layoutDisabled.traces[train]) {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
} else {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

if (this.hasCVMode) {
if (this.stddevEnabled) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
} else {
self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}

if (smoothDisabled) {
self.filterTracesOne(traces, {smoothed: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘learn’]) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘test’]) {
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}
}
}
};

CatboostIpython.prototype.getSmoothness = function() {
return this.smoothness && this.smoothness > -1 ? this.smoothness : -1;
};

CatboostIpython.prototype.setSmoothness = function(weight) {
if (weight 1) {
return;
}

this.smoothness = weight;
};

CatboostIpython.prototype.setStddev = function(enabled) {
this.stddevEnabled = enabled;
};

CatboostIpython.prototype.redrawActiveChart = function() {
this.chartsToRedraw[this.activeTab] = true;

this.redrawAll();
};

CatboostIpython.prototype.redraw = function() {
if (this.chartsToRedraw[this.activeTab]) {
this.chartsToRedraw[this.activeTab] = false;

this.updateTracesVisibility();
this.updateTracesCV();
this.updateTracesBest();
this.updateTracesValues();
this.updateTracesSmoothness();

this.plotly.redraw(this.traces[this.activeTab].parent);
}

this.drawTraces();
};

CatboostIpython.prototype.addRedrawFunc = function() {
this.redrawFunc = throttle(this.redraw, 400, false, this);
};

CatboostIpython.prototype.redrawAll = function() {
if (!this.redrawFunc) {
this.addRedrawFunc();
}

this.redrawFunc();
};

CatboostIpython.prototype.addPoints = function(parent, data) {
var self = this;

data.chunks.forEach(function(item) {
if (typeof item.remaining_time !== ‘undefined’ && typeof item.passed_time !== ‘undefined’) {
if (!self.timeLeft[data.path]) {
self.timeLeft[data.path] = [];
}

self.timeLeft[data.path][item.iteration] = [item.remaining_time, item.passed_time];
}

[‘test’, ‘learn’].forEach(function(type) {
var sets = self.meta[data.path][type + ‘_sets’],
metrics = self.meta[data.path][type + ‘_metrics’];

for (var i = 0; i ‘ + parameter + ‘ : ‘ + valueOfParameter;
}
}
}
if (!hovertextParametersAdded && type === ‘test’) {
hovertextParametersAdded = true;
trace.hovertext[pointIndex] += self.hovertextParameters[pointIndex];
}
smoothedTrace.x[pointIndex] = pointIndex;
}

if (bestValueTrace) {
bestValueTrace.x[pointIndex] = pointIndex;
bestValueTrace.y[pointIndex] = self.lossFuncs[nameOfMetric];
}

if (launchMode === ‘CV’ && !cvAdded) {
cvAdded = true;

self.getTrace(parent, $.extend({cv_stddev_first: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true}, params));

self.getTrace(parent, $.extend({cv_stddev_first: true, smoothed: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true, smoothed: true}, params));

self.getTrace(parent, $.extend({cv_avg: true}, params));
self.getTrace(parent, $.extend({cv_avg: true, smoothed: true}, params));

if (type === ‘test’) {
self.getTrace(parent, $.extend({cv_avg: true, best_point: true}, params));
}
}
}

self.chartsToRedraw[key.chartId] = true;

self.redrawAll();
}
});
});
};

CatboostIpython.prototype.getLaunchMode = function(path) {
return this.meta[path].launch_mode;
};

CatboostIpython.prototype.getChartNode = function(params, active) {
var node = $(‘

‘);

if (active) {
node.addClass(‘catboost-graph__chart_active’);
}

return node;
};

CatboostIpython.prototype.getChartTab = function(params, active) {
var node = $(‘

‘ + params.name + ‘

‘);

if (active) {
node.addClass(‘catboost-graph__tab_active’);
}

return node;
};

CatboostIpython.prototype.forEveryChart = function(callback) {
for (var name in this.traces) {
if (this.traces.hasOwnProperty(name)) {
callback(this.traces[name]);
}
}
};

CatboostIpython.prototype.forEveryLayout = function(callback) {
this.forEveryChart(function(chart) {
callback(chart.layout);
});
};

CatboostIpython.prototype.getChart = function(parent, params) {
var id = params.id,
self = this;

if (this.charts[id]) {
return this.charts[id];
}

this.addLayout(parent);

var active = this.activeTab === params.id,
chartNode = this.getChartNode(params, active),
chartTab = this.getChartTab(params, active);

$(‘.catboost-graph__charts’, this.layout).append(chartNode);
$(‘.catboost-graph__tabs’, this.layout).append(chartTab);

this.traces[id] = {
id: params.id,
name: params.name,
parent: chartNode[0],
traces: [],
layout: {
xaxis: {
range: [0, Number(this.meta[params.path].iteration_count)],
type: ‘linear’,
tickmode: ‘auto’,
showspikes: true,
spikethickness: 1,
spikedash: ‘longdashdot’,
spikemode: ‘across’,
zeroline: false,
showgrid: false
},
yaxis: {
zeroline: false
//showgrid: false
//hoverformat : ‘.7f’
},
separators: ‘. ‘,
//hovermode: ‘x’,
margin: {l: 38, r: 0, t: 35, b: 30},
autosize: true,
showlegend: false
},
options: {
scrollZoom: false,
modeBarButtonsToRemove: [‘toggleSpikelines’],
displaylogo: false
}
};

this.charts[id] = this.plotly.plot(chartNode[0], this.traces[id].traces, this.traces[id].layout, this.traces[id].options);

chartNode[0].on(‘plotly_hover’, function(e) {
self.updateTracesValues(e.points[0].x);
});

chartNode[0].on(‘plotly_click’, function(e) {
self.updateTracesValues(e.points[0].x, true);
});

return this.charts[id];
};

CatboostIpython.prototype.getTrace = function(parent, params) {
var key = this.getKey(params),
chartSeries = [];

if (this.traces[key.chartId]) {
chartSeries = this.traces[key.chartId].traces.filter(function(trace) {
return trace.name === key.traceName;
});
}

if (chartSeries.length) {
return chartSeries[0];
} else {
this.getChart(parent, {id: key.chartId, name: params.chartName, path: params.path});

var plotParams = {
color: this.getNextColor(params.path, params.smoothed ? 0.2 : 1),
fillsmoothcolor: this.getNextColor(params.path, 0.1),
fillcolor: this.getNextColor(params.path, 0.4),
hoverinfo: params.cv_avg ? ‘skip’ : ‘text+x’,
width: params.cv_avg ? 2 : 1,
dash: params.type === ‘test’ ? ‘solid’ : ‘dot’
},
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
hovertext: [],
hoverinfo: plotParams.hoverinfo,
line: {
width: plotParams.width,
dash: plotParams.dash,
color: plotParams.color
},
mode: ‘lines’,
hoveron: ‘points’,
connectgaps: true
};

if (params.best_point) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
marker: {
width: 2,
color: plotParams.color
},
hovertext: [],
hoverinfo: ‘text’,
mode: ‘markers’,
type: ‘scatter’
};
}

if (params.best_value) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
line: {
width: 1,
dash: ‘dash’,
color: ‘#CCCCCC’
},
mode: ‘lines’,
connectgaps: true,
hoverinfo: ‘skip’
};
}

if (params.cv_stddev_last) {
trace.fill = ‘tonexty’;
}

trace._params.plotParams = plotParams;

this.traces[key.chartId].traces.push(trace);

return trace;
}
};

CatboostIpython.prototype.getKey = function(params) {
var traceName = [
params.train,
params.type,
params.indexOfSet,
(params.smoothed ? ‘smoothed’ : ”),
(params.best_point ? ‘best_pount’ : ”),
(params.best_value ? ‘best_value’ : ”),
(params.cv_avg ? ‘cv_avg’ : ”),
(params.cv_stddev_first ? ‘cv_stddev_first’ : ”),
(params.cv_stddev_last ? ‘cv_stddev_last’ : ”)
].join(‘;’);

return {
chartId: params.chartName,
traceName: traceName,
colorId: params.train
};
};

CatboostIpython.prototype.filterTracesEvery = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] !== trace._params[prop]) {
return false;
}
}
}

return true;
});
};

CatboostIpython.prototype.filterTracesOne = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] === trace._params[prop]) {
return true;
}
}
}

return false;
});
};

CatboostIpython.prototype.cleanSeries = function() {
$(‘.catboost-panel__series’, this.layout).html(”);
};

CatboostIpython.prototype.groupTraces = function() {
var traces = this.traces[this.activeTab].traces,
index = 0,
tracesHash = {};

traces.map(function(trace) {
var train = trace._params.train;

if (!tracesHash[train]) {
tracesHash[train] = {
index: index,
traces: [],
info: {
path: trace._params.path,
color: trace._params.plotParams.color
}
};

index++;
}

tracesHash[train].traces.push(trace);
});

return tracesHash;
};

CatboostIpython.prototype.drawTraces = function() {
if ($(‘.catboost-panel__series .catboost-panel__serie’, this.layout).length) {
return;
}

var html = ”,
tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
html += this.drawTrace(train, tracesHash[train]);
}
}

$(‘.catboost-panel__series’, this.layout).html(html);

this.updateTracesValues();

this.addTracesEvents();
};

CatboostIpython.prototype.getTraceDefParams = function(params) {
var defParams = {
smoothed: undefined,
best_point: undefined,
best_value: undefined,
cv_avg: undefined,
cv_stddev_first: undefined,
cv_stddev_last: undefined
};

if (params) {
return $.extend(defParams, params);
} else {
return defParams;
}
};

CatboostIpython.prototype.drawTrace = function(train, hash) {
var info = hash.info,
id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
items = {
learn: {
middle: ”,
bottom: ”
},
test: {
middle: ”,
bottom: ”
}
},
tracesNames = ”;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
items[type].middle += ‘

‘ +

‘;

items[type].bottom += ‘

‘ +

‘;

tracesNames += ‘

‘ +

‘ + trace._params.nameOfSet + ‘

‘;
});
});

var timeSpendHtml = ‘

‘ +

‘ +

‘;

var html = ‘

‘ +

‘ +
‘ +

‘ +
(this.getLaunchMode(info.path) !== ‘Eval’ ? timeSpendHtml : ”) +

‘ +

curr

‘ +

best

‘ +

‘ +

‘ +

‘ +

‘ +
tracesNames +

‘ +

‘ +
items.learn.middle +
items.test.middle +

‘ +

‘ +
items.learn.bottom +
items.test.bottom +

‘ +

‘ +

‘;

return html;
};

CatboostIpython.prototype.updateTracesValues = function(iteration, click) {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceValues(train, tracesHash[train], iteration, click);
}
}
};

CatboostIpython.prototype.updateTracesBest = function() {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceBest(train, tracesHash[train]);
}
}
};

CatboostIpython.prototype.getBestValue = function(data) {
if (!data.length) {
return {
best: undefined,
index: -1
};
}

var best = data[0],
index = 0,
func = this.lossFuncs[this.traces[this.activeTab].name],
bestDiff = typeof func === ‘number’ ? Math.abs(data[0] – func) : 0;

for (var i = 1, l = data.length; i best) {
best = data[i];
index = i;
}

if (typeof func === ‘number’ && Math.abs(data[i] – func) maxLength) {
maxLength = origTrace.y.length;
}
});

for (var i = 0; i 0) {
avgTrace.x[i] = i;
avgTrace.y[i] = sum / count;
}
}
};

CatboostIpython.prototype.updateTracesCVStdDev = function() {
var tracesHash = this.groupTraces(),
firstTraces = this.filterTracesOne(tracesHash.traces, {cv_stddev_first: true}),
self = this;

firstTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed
})),
lastTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed,
cv_stddev_last: true
}));

if (origTraces.length && lastTraces.length === 1) {
self.cvStdDevFunc(origTraces, trace, lastTraces[0]);
}
});
};

CatboostIpython.prototype.cvStdDevFunc = function(origTraces, firstTrace, lastTrace) {
var maxCount = origTraces.length,
maxLength = -1,
count,
sum,
i, j;

origTraces.forEach(function(origTrace) {
if (origTrace.y.length > maxLength) {
maxLength = origTrace.y.length;
}
});

for (i = 0; i i) {
firstTrace.hovertext[i] += this.hovertextParameters[i];
lastTrace.hovertext[i] += this.hovertextParameters[i];
}
}
};

CatboostIpython.prototype.updateTracesSmoothness = function() {
var tracesHash = this.groupTraces(),
smoothedTraces = this.filterTracesOne(tracesHash.traces, {smoothed: true}),
enabled = this.getSmoothness() > -1,
self = this;

smoothedTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
indexOfSet: trace._params.indexOfSet,
cv_avg: trace._params.cv_avg,
cv_stddev_first: trace._params.cv_stddev_first,
cv_stddev_last: trace._params.cv_stddev_last
})),
colorFlag = false;

if (origTraces.length === 1) {
origTraces = origTraces[0];

if (origTraces.visible) {
if (enabled) {
self.smoothFunc(origTraces, trace);
colorFlag = true;
}

self.highlightSmoothedTrace(origTraces, trace, colorFlag);
}
}
});
};

CatboostIpython.prototype.highlightSmoothedTrace = function(trace, smoothedTrace, flag) {
if (flag) {
smoothedTrace.line.color = trace._params.plotParams.color;
trace.line.color = smoothedTrace._params.plotParams.color;
trace.hoverinfo = ‘skip’;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillsmoothcolor;
}
} else {
trace.line.color = trace._params.plotParams.color;
trace.hoverinfo = trace._params.plotParams.hoverinfo;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillcolor;
}
}
};

CatboostIpython.prototype.smoothFunc = function(origTrace, smoothedTrace) {
var data = origTrace.y,
smoothedPoints = this.smooth(data, this.getSmoothness()),
smoothedIndex = 0,
self = this;

if (smoothedPoints.length) {
data.forEach(function (d, index) {
if (!smoothedTrace.x[index]) {
smoothedTrace.x[index] = index;
}

var nameOfSet = smoothedTrace._params.nameOfSet;

if (smoothedTrace._params.cv_stddev_first || smoothedTrace._params.cv_stddev_last) {
nameOfSet = smoothedTrace._params.type + ‘ std’;
}

smoothedTrace.y[index] = smoothedPoints[smoothedIndex];
smoothedTrace.hovertext[index] = nameOfSet + ‘`: ‘ + smoothedPoints[smoothedIndex].toPrecision(7);
if (self.hovertextParameters.length > index) {
smoothedTrace.hovertext[index] += self.hovertextParameters[index];
}
smoothedIndex++;
});
}
};

CatboostIpython.prototype.formatItemValue = function(value, index, suffix) {
if (typeof value === ‘undefined’) {
return ”;
}

suffix = suffix || ”;

return ‘‘ + value + ‘‘;
};

CatboostIpython.prototype.updateTraceBest = function(train, hash) {
var traces = this.filterTracesOne(hash.traces, {best_point: true}),
self = this;

traces.forEach(function(trace) {
var testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
indexOfSet: trace._params.indexOfSet
}));

if (self.hasCVMode) {
testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
cv_avg: true
}));
}

var bestValue = self.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

if (bestValue.index !== -1) {
trace.x[0] = bestValue.index;
trace.y[0] = bestValue.best;
trace.hovertext[0] = bestValue.func + ‘ (‘ + (self.hasCVMode ? ‘avg’ : trace._params.nameOfSet) + ‘): ‘ + bestValue.index + ‘ ‘ + bestValue.best;
}
});
};

CatboostIpython.prototype.updateTraceValues = function(name, hash, iteration, click) {
var id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
path = hash.info.path,
self = this;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
var data = trace.y || [],
index = typeof iteration !== ‘undefined’ && iteration -1 ? bestValue.index : ”);

$(‘#’ + id + ‘ .catboost-panel__serie_best_test_value[data-index=’ + trace._params.indexOfSet + ‘]’, self.layout)
.html(self.formatItemValue(bestValue.best, bestValue.index, ‘best ‘ + trace._params.nameOfSet + ‘ ‘));
}
});
});

if (this.hasCVMode) {
var testTrace = this.filterTracesEvery(hash.traces, this.getTraceDefParams({
type: ‘test’,
cv_avg: true
})),
bestValue = this.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

$(‘#’ + id + ‘ .catboost-panel__serie_best_iteration’, this.layout).html(bestValue.index > -1 ? bestValue.index : ”);
}

if (click) {
this.clickMode = true;

$(‘#catboost-control2-clickmode’ + this.index, this.layout)[0].checked = true;
}
};

CatboostIpython.prototype.addTracesEvents = function() {
var self = this;

$(‘.catboost-panel__serie_checkbox’, this.layout).click(function() {
var name = $(this).data(‘seriename’);

self.layoutDisabled.traces[name] = !$(this)[0].checked;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.getNextColor = function(path, opacity) {
var color;

if (this.colorsByPath[path]) {
color = this.colorsByPath[path];
} else {
color = this.colors[this.colorIndex];
this.colorsByPath[path] = color;

this.colorIndex++;

if (this.colorIndex > this.colors.length – 1) {
this.colorIndex = 0;
}
}

return this.hexToRgba(color, opacity);
};

CatboostIpython.prototype.hexToRgba = function(value, opacity) {
if (value.length 0) {
out += hours + ‘h ‘;
seconds = 0;
millis = 0;
}
if (minutes && minutes > 0) {
out += minutes + ‘m ‘;
millis = 0;
}
if (seconds && seconds > 0) {
out += seconds + ‘s ‘;
}
if (millis && millis > 0) {
out += millis + ‘ms’;
}

return out.trim();
};

CatboostIpython.prototype.mean = function(values, valueof) {
var n = values.length,
m = n,
i = -1,
value,
sum = 0,
number = function(x) {
return x === null ? NaN : +x;
};

if (valueof === null) {
while (++i


In [79]:
model = CatBoostClassifier(iterations=50, random_seed=42, logging_level='Silent').fit(train_pool)
eval_metrics = model.eval_metrics(validate_pool, ['Accuracy'], plot=True)

‘;
}

this.layout = $(‘

‘ +

‘ +

‘ +
‘ +

Learn’ +
‘ +

Eval’ +

‘ +

‘ +

‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
cvAreaControls +

‘ +

‘ +

‘ +

‘ +

‘ +

‘ +

‘);
$(parent).append(this.layout);

this.addTabEvents();
this.addControlEvents();
};

CatboostIpython.prototype.addTabEvents = function() {
var self = this;

$(‘.catboost-graph__tabs’, this.layout).click(function(e) {
if (!$(e.target).is(‘.catboost-graph__tab:not(.catboost-graph__tab_active)’)) {
return;
}

var id = $(e.target).attr(‘tabid’);

self.activeTab = id;

$(‘.catboost-graph__tab_active’, self.layout).removeClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart_active’, self.layout).removeClass(‘catboost-graph__chart_active’);

$(‘.catboost-graph__tab[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__chart_active’);

self.cleanSeries();

self.redrawActiveChart();
self.resizeCharts();
});
};

CatboostIpython.prototype.addControlEvents = function() {
var self = this;

$(‘#catboost-control-learn’ + this.index, this.layout).click(function() {
self.layoutDisabled.learn = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_learn_disabled’, self.layoutDisabled.learn);

self.redrawActiveChart();
});

$(‘#catboost-control-test’ + this.index, this.layout).click(function() {
self.layoutDisabled.test = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_test_disabled’, self.layoutDisabled.test);

self.redrawActiveChart();
});

$(‘#catboost-control2-clickmode’ + this.index, this.layout).click(function() {
self.clickMode = $(this)[0].checked;
});

$(‘#catboost-control2-log’ + this.index, this.layout).click(function() {
self.logarithmMode = $(this)[0].checked ? ‘log’ : ‘linear’;

self.forEveryLayout(function(layout) {
layout.yaxis = {type: self.logarithmMode};
});

self.redrawActiveChart();
});

var slider = $(‘#catboost-control2-slider’ + this.index),
sliderValue = $(‘#catboost-control2-slidervalue’ + this.index);

$(‘#catboost-control2-smooth’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setSmoothness(enabled ? self.lastSmooth : -1);

slider.prop(‘disabled’, !enabled);
sliderValue.prop(‘disabled’, !enabled);

self.redrawActiveChart();
});

$(‘#catboost-control2-cvstddev’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setStddev(enabled);

self.redrawActiveChart();
});

slider.on(‘input change’, function() {
var smooth = Number($(this).val());

sliderValue.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});

sliderValue.on(‘input change’, function() {
var smooth = Number($(this).val());

slider.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.setTraceVisibility = function(trace, visibility) {
if (trace) {
trace.visible = visibility;
}
};

CatboostIpython.prototype.updateTracesVisibility = function() {
var tracesHash = this.groupTraces(),
traces,
smoothDisabled = this.getSmoothness() === -1,
self = this;

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
traces = tracesHash[train].traces;

if (this.layoutDisabled.traces[train]) {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
} else {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

if (this.hasCVMode) {
if (this.stddevEnabled) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
} else {
self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}

if (smoothDisabled) {
self.filterTracesOne(traces, {smoothed: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘learn’]) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘test’]) {
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}
}
}
};

CatboostIpython.prototype.getSmoothness = function() {
return this.smoothness && this.smoothness > -1 ? this.smoothness : -1;
};

CatboostIpython.prototype.setSmoothness = function(weight) {
if (weight 1) {
return;
}

this.smoothness = weight;
};

CatboostIpython.prototype.setStddev = function(enabled) {
this.stddevEnabled = enabled;
};

CatboostIpython.prototype.redrawActiveChart = function() {
this.chartsToRedraw[this.activeTab] = true;

this.redrawAll();
};

CatboostIpython.prototype.redraw = function() {
if (this.chartsToRedraw[this.activeTab]) {
this.chartsToRedraw[this.activeTab] = false;

this.updateTracesVisibility();
this.updateTracesCV();
this.updateTracesBest();
this.updateTracesValues();
this.updateTracesSmoothness();

this.plotly.redraw(this.traces[this.activeTab].parent);
}

this.drawTraces();
};

CatboostIpython.prototype.addRedrawFunc = function() {
this.redrawFunc = throttle(this.redraw, 400, false, this);
};

CatboostIpython.prototype.redrawAll = function() {
if (!this.redrawFunc) {
this.addRedrawFunc();
}

this.redrawFunc();
};

CatboostIpython.prototype.addPoints = function(parent, data) {
var self = this;

data.chunks.forEach(function(item) {
if (typeof item.remaining_time !== ‘undefined’ && typeof item.passed_time !== ‘undefined’) {
if (!self.timeLeft[data.path]) {
self.timeLeft[data.path] = [];
}

self.timeLeft[data.path][item.iteration] = [item.remaining_time, item.passed_time];
}

[‘test’, ‘learn’].forEach(function(type) {
var sets = self.meta[data.path][type + ‘_sets’],
metrics = self.meta[data.path][type + ‘_metrics’];

for (var i = 0; i ‘ + parameter + ‘ : ‘ + valueOfParameter;
}
}
}
if (!hovertextParametersAdded && type === ‘test’) {
hovertextParametersAdded = true;
trace.hovertext[pointIndex] += self.hovertextParameters[pointIndex];
}
smoothedTrace.x[pointIndex] = pointIndex;
}

if (bestValueTrace) {
bestValueTrace.x[pointIndex] = pointIndex;
bestValueTrace.y[pointIndex] = self.lossFuncs[nameOfMetric];
}

if (launchMode === ‘CV’ && !cvAdded) {
cvAdded = true;

self.getTrace(parent, $.extend({cv_stddev_first: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true}, params));

self.getTrace(parent, $.extend({cv_stddev_first: true, smoothed: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true, smoothed: true}, params));

self.getTrace(parent, $.extend({cv_avg: true}, params));
self.getTrace(parent, $.extend({cv_avg: true, smoothed: true}, params));

if (type === ‘test’) {
self.getTrace(parent, $.extend({cv_avg: true, best_point: true}, params));
}
}
}

self.chartsToRedraw[key.chartId] = true;

self.redrawAll();
}
});
});
};

CatboostIpython.prototype.getLaunchMode = function(path) {
return this.meta[path].launch_mode;
};

CatboostIpython.prototype.getChartNode = function(params, active) {
var node = $(‘

‘);

if (active) {
node.addClass(‘catboost-graph__chart_active’);
}

return node;
};

CatboostIpython.prototype.getChartTab = function(params, active) {
var node = $(‘

‘ + params.name + ‘

‘);

if (active) {
node.addClass(‘catboost-graph__tab_active’);
}

return node;
};

CatboostIpython.prototype.forEveryChart = function(callback) {
for (var name in this.traces) {
if (this.traces.hasOwnProperty(name)) {
callback(this.traces[name]);
}
}
};

CatboostIpython.prototype.forEveryLayout = function(callback) {
this.forEveryChart(function(chart) {
callback(chart.layout);
});
};

CatboostIpython.prototype.getChart = function(parent, params) {
var id = params.id,
self = this;

if (this.charts[id]) {
return this.charts[id];
}

this.addLayout(parent);

var active = this.activeTab === params.id,
chartNode = this.getChartNode(params, active),
chartTab = this.getChartTab(params, active);

$(‘.catboost-graph__charts’, this.layout).append(chartNode);
$(‘.catboost-graph__tabs’, this.layout).append(chartTab);

this.traces[id] = {
id: params.id,
name: params.name,
parent: chartNode[0],
traces: [],
layout: {
xaxis: {
range: [0, Number(this.meta[params.path].iteration_count)],
type: ‘linear’,
tickmode: ‘auto’,
showspikes: true,
spikethickness: 1,
spikedash: ‘longdashdot’,
spikemode: ‘across’,
zeroline: false,
showgrid: false
},
yaxis: {
zeroline: false
//showgrid: false
//hoverformat : ‘.7f’
},
separators: ‘. ‘,
//hovermode: ‘x’,
margin: {l: 38, r: 0, t: 35, b: 30},
autosize: true,
showlegend: false
},
options: {
scrollZoom: false,
modeBarButtonsToRemove: [‘toggleSpikelines’],
displaylogo: false
}
};

this.charts[id] = this.plotly.plot(chartNode[0], this.traces[id].traces, this.traces[id].layout, this.traces[id].options);

chartNode[0].on(‘plotly_hover’, function(e) {
self.updateTracesValues(e.points[0].x);
});

chartNode[0].on(‘plotly_click’, function(e) {
self.updateTracesValues(e.points[0].x, true);
});

return this.charts[id];
};

CatboostIpython.prototype.getTrace = function(parent, params) {
var key = this.getKey(params),
chartSeries = [];

if (this.traces[key.chartId]) {
chartSeries = this.traces[key.chartId].traces.filter(function(trace) {
return trace.name === key.traceName;
});
}

if (chartSeries.length) {
return chartSeries[0];
} else {
this.getChart(parent, {id: key.chartId, name: params.chartName, path: params.path});

var plotParams = {
color: this.getNextColor(params.path, params.smoothed ? 0.2 : 1),
fillsmoothcolor: this.getNextColor(params.path, 0.1),
fillcolor: this.getNextColor(params.path, 0.4),
hoverinfo: params.cv_avg ? ‘skip’ : ‘text+x’,
width: params.cv_avg ? 2 : 1,
dash: params.type === ‘test’ ? ‘solid’ : ‘dot’
},
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
hovertext: [],
hoverinfo: plotParams.hoverinfo,
line: {
width: plotParams.width,
dash: plotParams.dash,
color: plotParams.color
},
mode: ‘lines’,
hoveron: ‘points’,
connectgaps: true
};

if (params.best_point) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
marker: {
width: 2,
color: plotParams.color
},
hovertext: [],
hoverinfo: ‘text’,
mode: ‘markers’,
type: ‘scatter’
};
}

if (params.best_value) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
line: {
width: 1,
dash: ‘dash’,
color: ‘#CCCCCC’
},
mode: ‘lines’,
connectgaps: true,
hoverinfo: ‘skip’
};
}

if (params.cv_stddev_last) {
trace.fill = ‘tonexty’;
}

trace._params.plotParams = plotParams;

this.traces[key.chartId].traces.push(trace);

return trace;
}
};

CatboostIpython.prototype.getKey = function(params) {
var traceName = [
params.train,
params.type,
params.indexOfSet,
(params.smoothed ? ‘smoothed’ : ”),
(params.best_point ? ‘best_pount’ : ”),
(params.best_value ? ‘best_value’ : ”),
(params.cv_avg ? ‘cv_avg’ : ”),
(params.cv_stddev_first ? ‘cv_stddev_first’ : ”),
(params.cv_stddev_last ? ‘cv_stddev_last’ : ”)
].join(‘;’);

return {
chartId: params.chartName,
traceName: traceName,
colorId: params.train
};
};

CatboostIpython.prototype.filterTracesEvery = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] !== trace._params[prop]) {
return false;
}
}
}

return true;
});
};

CatboostIpython.prototype.filterTracesOne = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] === trace._params[prop]) {
return true;
}
}
}

return false;
});
};

CatboostIpython.prototype.cleanSeries = function() {
$(‘.catboost-panel__series’, this.layout).html(”);
};

CatboostIpython.prototype.groupTraces = function() {
var traces = this.traces[this.activeTab].traces,
index = 0,
tracesHash = {};

traces.map(function(trace) {
var train = trace._params.train;

if (!tracesHash[train]) {
tracesHash[train] = {
index: index,
traces: [],
info: {
path: trace._params.path,
color: trace._params.plotParams.color
}
};

index++;
}

tracesHash[train].traces.push(trace);
});

return tracesHash;
};

CatboostIpython.prototype.drawTraces = function() {
if ($(‘.catboost-panel__series .catboost-panel__serie’, this.layout).length) {
return;
}

var html = ”,
tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
html += this.drawTrace(train, tracesHash[train]);
}
}

$(‘.catboost-panel__series’, this.layout).html(html);

this.updateTracesValues();

this.addTracesEvents();
};

CatboostIpython.prototype.getTraceDefParams = function(params) {
var defParams = {
smoothed: undefined,
best_point: undefined,
best_value: undefined,
cv_avg: undefined,
cv_stddev_first: undefined,
cv_stddev_last: undefined
};

if (params) {
return $.extend(defParams, params);
} else {
return defParams;
}
};

CatboostIpython.prototype.drawTrace = function(train, hash) {
var info = hash.info,
id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
items = {
learn: {
middle: ”,
bottom: ”
},
test: {
middle: ”,
bottom: ”
}
},
tracesNames = ”;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
items[type].middle += ‘

‘ +

‘;

items[type].bottom += ‘

‘ +

‘;

tracesNames += ‘

‘ +

‘ + trace._params.nameOfSet + ‘

‘;
});
});

var timeSpendHtml = ‘

‘ +

‘ +

‘;

var html = ‘

‘ +

‘ +
‘ +

‘ +
(this.getLaunchMode(info.path) !== ‘Eval’ ? timeSpendHtml : ”) +

‘ +

curr

‘ +

best

‘ +

‘ +

‘ +

‘ +

‘ +
tracesNames +

‘ +

‘ +
items.learn.middle +
items.test.middle +

‘ +

‘ +
items.learn.bottom +
items.test.bottom +

‘ +

‘ +

‘;

return html;
};

CatboostIpython.prototype.updateTracesValues = function(iteration, click) {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceValues(train, tracesHash[train], iteration, click);
}
}
};

CatboostIpython.prototype.updateTracesBest = function() {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceBest(train, tracesHash[train]);
}
}
};

CatboostIpython.prototype.getBestValue = function(data) {
if (!data.length) {
return {
best: undefined,
index: -1
};
}

var best = data[0],
index = 0,
func = this.lossFuncs[this.traces[this.activeTab].name],
bestDiff = typeof func === ‘number’ ? Math.abs(data[0] – func) : 0;

for (var i = 1, l = data.length; i best) {
best = data[i];
index = i;
}

if (typeof func === ‘number’ && Math.abs(data[i] – func) maxLength) {
maxLength = origTrace.y.length;
}
});

for (var i = 0; i 0) {
avgTrace.x[i] = i;
avgTrace.y[i] = sum / count;
}
}
};

CatboostIpython.prototype.updateTracesCVStdDev = function() {
var tracesHash = this.groupTraces(),
firstTraces = this.filterTracesOne(tracesHash.traces, {cv_stddev_first: true}),
self = this;

firstTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed
})),
lastTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed,
cv_stddev_last: true
}));

if (origTraces.length && lastTraces.length === 1) {
self.cvStdDevFunc(origTraces, trace, lastTraces[0]);
}
});
};

CatboostIpython.prototype.cvStdDevFunc = function(origTraces, firstTrace, lastTrace) {
var maxCount = origTraces.length,
maxLength = -1,
count,
sum,
i, j;

origTraces.forEach(function(origTrace) {
if (origTrace.y.length > maxLength) {
maxLength = origTrace.y.length;
}
});

for (i = 0; i i) {
firstTrace.hovertext[i] += this.hovertextParameters[i];
lastTrace.hovertext[i] += this.hovertextParameters[i];
}
}
};

CatboostIpython.prototype.updateTracesSmoothness = function() {
var tracesHash = this.groupTraces(),
smoothedTraces = this.filterTracesOne(tracesHash.traces, {smoothed: true}),
enabled = this.getSmoothness() > -1,
self = this;

smoothedTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
indexOfSet: trace._params.indexOfSet,
cv_avg: trace._params.cv_avg,
cv_stddev_first: trace._params.cv_stddev_first,
cv_stddev_last: trace._params.cv_stddev_last
})),
colorFlag = false;

if (origTraces.length === 1) {
origTraces = origTraces[0];

if (origTraces.visible) {
if (enabled) {
self.smoothFunc(origTraces, trace);
colorFlag = true;
}

self.highlightSmoothedTrace(origTraces, trace, colorFlag);
}
}
});
};

CatboostIpython.prototype.highlightSmoothedTrace = function(trace, smoothedTrace, flag) {
if (flag) {
smoothedTrace.line.color = trace._params.plotParams.color;
trace.line.color = smoothedTrace._params.plotParams.color;
trace.hoverinfo = ‘skip’;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillsmoothcolor;
}
} else {
trace.line.color = trace._params.plotParams.color;
trace.hoverinfo = trace._params.plotParams.hoverinfo;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillcolor;
}
}
};

CatboostIpython.prototype.smoothFunc = function(origTrace, smoothedTrace) {
var data = origTrace.y,
smoothedPoints = this.smooth(data, this.getSmoothness()),
smoothedIndex = 0,
self = this;

if (smoothedPoints.length) {
data.forEach(function (d, index) {
if (!smoothedTrace.x[index]) {
smoothedTrace.x[index] = index;
}

var nameOfSet = smoothedTrace._params.nameOfSet;

if (smoothedTrace._params.cv_stddev_first || smoothedTrace._params.cv_stddev_last) {
nameOfSet = smoothedTrace._params.type + ‘ std’;
}

smoothedTrace.y[index] = smoothedPoints[smoothedIndex];
smoothedTrace.hovertext[index] = nameOfSet + ‘`: ‘ + smoothedPoints[smoothedIndex].toPrecision(7);
if (self.hovertextParameters.length > index) {
smoothedTrace.hovertext[index] += self.hovertextParameters[index];
}
smoothedIndex++;
});
}
};

CatboostIpython.prototype.formatItemValue = function(value, index, suffix) {
if (typeof value === ‘undefined’) {
return ”;
}

suffix = suffix || ”;

return ‘‘ + value + ‘‘;
};

CatboostIpython.prototype.updateTraceBest = function(train, hash) {
var traces = this.filterTracesOne(hash.traces, {best_point: true}),
self = this;

traces.forEach(function(trace) {
var testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
indexOfSet: trace._params.indexOfSet
}));

if (self.hasCVMode) {
testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
cv_avg: true
}));
}

var bestValue = self.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

if (bestValue.index !== -1) {
trace.x[0] = bestValue.index;
trace.y[0] = bestValue.best;
trace.hovertext[0] = bestValue.func + ‘ (‘ + (self.hasCVMode ? ‘avg’ : trace._params.nameOfSet) + ‘): ‘ + bestValue.index + ‘ ‘ + bestValue.best;
}
});
};

CatboostIpython.prototype.updateTraceValues = function(name, hash, iteration, click) {
var id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
path = hash.info.path,
self = this;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
var data = trace.y || [],
index = typeof iteration !== ‘undefined’ && iteration -1 ? bestValue.index : ”);

$(‘#’ + id + ‘ .catboost-panel__serie_best_test_value[data-index=’ + trace._params.indexOfSet + ‘]’, self.layout)
.html(self.formatItemValue(bestValue.best, bestValue.index, ‘best ‘ + trace._params.nameOfSet + ‘ ‘));
}
});
});

if (this.hasCVMode) {
var testTrace = this.filterTracesEvery(hash.traces, this.getTraceDefParams({
type: ‘test’,
cv_avg: true
})),
bestValue = this.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

$(‘#’ + id + ‘ .catboost-panel__serie_best_iteration’, this.layout).html(bestValue.index > -1 ? bestValue.index : ”);
}

if (click) {
this.clickMode = true;

$(‘#catboost-control2-clickmode’ + this.index, this.layout)[0].checked = true;
}
};

CatboostIpython.prototype.addTracesEvents = function() {
var self = this;

$(‘.catboost-panel__serie_checkbox’, this.layout).click(function() {
var name = $(this).data(‘seriename’);

self.layoutDisabled.traces[name] = !$(this)[0].checked;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.getNextColor = function(path, opacity) {
var color;

if (this.colorsByPath[path]) {
color = this.colorsByPath[path];
} else {
color = this.colors[this.colorIndex];
this.colorsByPath[path] = color;

this.colorIndex++;

if (this.colorIndex > this.colors.length – 1) {
this.colorIndex = 0;
}
}

return this.hexToRgba(color, opacity);
};

CatboostIpython.prototype.hexToRgba = function(value, opacity) {
if (value.length 0) {
out += hours + ‘h ‘;
seconds = 0;
millis = 0;
}
if (minutes && minutes > 0) {
out += minutes + ‘m ‘;
millis = 0;
}
if (seconds && seconds > 0) {
out += seconds + ‘s ‘;
}
if (millis && millis > 0) {
out += millis + ‘ms’;
}

return out.trim();
};

CatboostIpython.prototype.mean = function(values, valueof) {
var n = values.length,
m = n,
i = -1,
value,
sum = 0,
number = function(x) {
return x === null ? NaN : +x;
};

if (valueof === null) {
while (++i


In [80]:
print(eval_metrics['Accuracy'][:16])
[0.7937219730941704, 0.8026905829596412, 0.8071748878923767, 0.8071748878923767, 0.8071748878923767, 0.8026905829596412, 0.8071748878923767, 0.8071748878923767, 0.8116591928251121, 0.8116591928251121, 0.8116591928251121, 0.8116591928251121, 0.8116591928251121, 0.8071748878923767, 0.8116591928251121, 0.8161434977578476]

3.10 Porównanie procesów uczenia się

Możesz także porównać proces uczenia się różnych modeli na jednym wykresie.

In [81]:
model1 = CatBoostClassifier(iterations=1000, depth=1, train_dir='model_depth_1/', logging_level='Silent')
model1.fit(train_pool, eval_set=validate_pool)
model2 = CatBoostClassifier(iterations=1000, depth=5, train_dir='model_depth_5/', logging_level='Silent')
model2.fit(train_pool, eval_set=validate_pool);
In [82]:
from catboost import MetricVisualizer
widget = MetricVisualizer(['model_depth_1', 'model_depth_5'])
widget.start()

‘;
}

this.layout = $(‘

‘ +

‘ +

‘ +
‘ +

Learn’ +
‘ +

Eval’ +

‘ +

‘ +

‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
‘ +
‘ +
‘ +
‘ +

‘ +
cvAreaControls +

‘ +

‘ +

‘ +

‘ +

‘ +

‘ +

‘);
$(parent).append(this.layout);

this.addTabEvents();
this.addControlEvents();
};

CatboostIpython.prototype.addTabEvents = function() {
var self = this;

$(‘.catboost-graph__tabs’, this.layout).click(function(e) {
if (!$(e.target).is(‘.catboost-graph__tab:not(.catboost-graph__tab_active)’)) {
return;
}

var id = $(e.target).attr(‘tabid’);

self.activeTab = id;

$(‘.catboost-graph__tab_active’, self.layout).removeClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart_active’, self.layout).removeClass(‘catboost-graph__chart_active’);

$(‘.catboost-graph__tab[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__tab_active’);
$(‘.catboost-graph__chart[tabid=”‘ + id + ‘”]’, self.layout).addClass(‘catboost-graph__chart_active’);

self.cleanSeries();

self.redrawActiveChart();
self.resizeCharts();
});
};

CatboostIpython.prototype.addControlEvents = function() {
var self = this;

$(‘#catboost-control-learn’ + this.index, this.layout).click(function() {
self.layoutDisabled.learn = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_learn_disabled’, self.layoutDisabled.learn);

self.redrawActiveChart();
});

$(‘#catboost-control-test’ + this.index, this.layout).click(function() {
self.layoutDisabled.test = !$(this)[0].checked;

$(‘.catboost-panel__series’, self.layout).toggleClass(‘catboost-panel__series_test_disabled’, self.layoutDisabled.test);

self.redrawActiveChart();
});

$(‘#catboost-control2-clickmode’ + this.index, this.layout).click(function() {
self.clickMode = $(this)[0].checked;
});

$(‘#catboost-control2-log’ + this.index, this.layout).click(function() {
self.logarithmMode = $(this)[0].checked ? ‘log’ : ‘linear’;

self.forEveryLayout(function(layout) {
layout.yaxis = {type: self.logarithmMode};
});

self.redrawActiveChart();
});

var slider = $(‘#catboost-control2-slider’ + this.index),
sliderValue = $(‘#catboost-control2-slidervalue’ + this.index);

$(‘#catboost-control2-smooth’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setSmoothness(enabled ? self.lastSmooth : -1);

slider.prop(‘disabled’, !enabled);
sliderValue.prop(‘disabled’, !enabled);

self.redrawActiveChart();
});

$(‘#catboost-control2-cvstddev’ + this.index, this.layout).click(function() {
var enabled = $(this)[0].checked;

self.setStddev(enabled);

self.redrawActiveChart();
});

slider.on(‘input change’, function() {
var smooth = Number($(this).val());

sliderValue.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});

sliderValue.on(‘input change’, function() {
var smooth = Number($(this).val());

slider.val(isNaN(smooth) ? 0 : smooth);

self.setSmoothness(smooth);
self.lastSmooth = smooth;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.setTraceVisibility = function(trace, visibility) {
if (trace) {
trace.visible = visibility;
}
};

CatboostIpython.prototype.updateTracesVisibility = function() {
var tracesHash = this.groupTraces(),
traces,
smoothDisabled = this.getSmoothness() === -1,
self = this;

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
traces = tracesHash[train].traces;

if (this.layoutDisabled.traces[train]) {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
} else {
traces.forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

if (this.hasCVMode) {
if (this.stddevEnabled) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});

self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, true);
});
} else {
self.filterTracesOne(traces, {cv_stddev_first: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesOne(traces, {cv_stddev_last: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘learn’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, smoothed: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});

self.filterTracesEvery(traces, this.getTraceDefParams({type: ‘test’, cv_avg: true, best_point: true})).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}

if (smoothDisabled) {
self.filterTracesOne(traces, {smoothed: true}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘learn’]) {
self.filterTracesOne(traces, {type: ‘learn’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}

if (this.layoutDisabled[‘test’]) {
self.filterTracesOne(traces, {type: ‘test’}).forEach(function(trace) {
self.setTraceVisibility(trace, false);
});
}
}
}
}
};

CatboostIpython.prototype.getSmoothness = function() {
return this.smoothness && this.smoothness > -1 ? this.smoothness : -1;
};

CatboostIpython.prototype.setSmoothness = function(weight) {
if (weight 1) {
return;
}

this.smoothness = weight;
};

CatboostIpython.prototype.setStddev = function(enabled) {
this.stddevEnabled = enabled;
};

CatboostIpython.prototype.redrawActiveChart = function() {
this.chartsToRedraw[this.activeTab] = true;

this.redrawAll();
};

CatboostIpython.prototype.redraw = function() {
if (this.chartsToRedraw[this.activeTab]) {
this.chartsToRedraw[this.activeTab] = false;

this.updateTracesVisibility();
this.updateTracesCV();
this.updateTracesBest();
this.updateTracesValues();
this.updateTracesSmoothness();

this.plotly.redraw(this.traces[this.activeTab].parent);
}

this.drawTraces();
};

CatboostIpython.prototype.addRedrawFunc = function() {
this.redrawFunc = throttle(this.redraw, 400, false, this);
};

CatboostIpython.prototype.redrawAll = function() {
if (!this.redrawFunc) {
this.addRedrawFunc();
}

this.redrawFunc();
};

CatboostIpython.prototype.addPoints = function(parent, data) {
var self = this;

data.chunks.forEach(function(item) {
if (typeof item.remaining_time !== ‘undefined’ && typeof item.passed_time !== ‘undefined’) {
if (!self.timeLeft[data.path]) {
self.timeLeft[data.path] = [];
}

self.timeLeft[data.path][item.iteration] = [item.remaining_time, item.passed_time];
}

[‘test’, ‘learn’].forEach(function(type) {
var sets = self.meta[data.path][type + ‘_sets’],
metrics = self.meta[data.path][type + ‘_metrics’];

for (var i = 0; i ‘ + parameter + ‘ : ‘ + valueOfParameter;
}
}
}
if (!hovertextParametersAdded && type === ‘test’) {
hovertextParametersAdded = true;
trace.hovertext[pointIndex] += self.hovertextParameters[pointIndex];
}
smoothedTrace.x[pointIndex] = pointIndex;
}

if (bestValueTrace) {
bestValueTrace.x[pointIndex] = pointIndex;
bestValueTrace.y[pointIndex] = self.lossFuncs[nameOfMetric];
}

if (launchMode === ‘CV’ && !cvAdded) {
cvAdded = true;

self.getTrace(parent, $.extend({cv_stddev_first: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true}, params));

self.getTrace(parent, $.extend({cv_stddev_first: true, smoothed: true}, params));
self.getTrace(parent, $.extend({cv_stddev_last: true, smoothed: true}, params));

self.getTrace(parent, $.extend({cv_avg: true}, params));
self.getTrace(parent, $.extend({cv_avg: true, smoothed: true}, params));

if (type === ‘test’) {
self.getTrace(parent, $.extend({cv_avg: true, best_point: true}, params));
}
}
}

self.chartsToRedraw[key.chartId] = true;

self.redrawAll();
}
});
});
};

CatboostIpython.prototype.getLaunchMode = function(path) {
return this.meta[path].launch_mode;
};

CatboostIpython.prototype.getChartNode = function(params, active) {
var node = $(‘

‘);

if (active) {
node.addClass(‘catboost-graph__chart_active’);
}

return node;
};

CatboostIpython.prototype.getChartTab = function(params, active) {
var node = $(‘

‘ + params.name + ‘

‘);

if (active) {
node.addClass(‘catboost-graph__tab_active’);
}

return node;
};

CatboostIpython.prototype.forEveryChart = function(callback) {
for (var name in this.traces) {
if (this.traces.hasOwnProperty(name)) {
callback(this.traces[name]);
}
}
};

CatboostIpython.prototype.forEveryLayout = function(callback) {
this.forEveryChart(function(chart) {
callback(chart.layout);
});
};

CatboostIpython.prototype.getChart = function(parent, params) {
var id = params.id,
self = this;

if (this.charts[id]) {
return this.charts[id];
}

this.addLayout(parent);

var active = this.activeTab === params.id,
chartNode = this.getChartNode(params, active),
chartTab = this.getChartTab(params, active);

$(‘.catboost-graph__charts’, this.layout).append(chartNode);
$(‘.catboost-graph__tabs’, this.layout).append(chartTab);

this.traces[id] = {
id: params.id,
name: params.name,
parent: chartNode[0],
traces: [],
layout: {
xaxis: {
range: [0, Number(this.meta[params.path].iteration_count)],
type: ‘linear’,
tickmode: ‘auto’,
showspikes: true,
spikethickness: 1,
spikedash: ‘longdashdot’,
spikemode: ‘across’,
zeroline: false,
showgrid: false
},
yaxis: {
zeroline: false
//showgrid: false
//hoverformat : ‘.7f’
},
separators: ‘. ‘,
//hovermode: ‘x’,
margin: {l: 38, r: 0, t: 35, b: 30},
autosize: true,
showlegend: false
},
options: {
scrollZoom: false,
modeBarButtonsToRemove: [‘toggleSpikelines’],
displaylogo: false
}
};

this.charts[id] = this.plotly.plot(chartNode[0], this.traces[id].traces, this.traces[id].layout, this.traces[id].options);

chartNode[0].on(‘plotly_hover’, function(e) {
self.updateTracesValues(e.points[0].x);
});

chartNode[0].on(‘plotly_click’, function(e) {
self.updateTracesValues(e.points[0].x, true);
});

return this.charts[id];
};

CatboostIpython.prototype.getTrace = function(parent, params) {
var key = this.getKey(params),
chartSeries = [];

if (this.traces[key.chartId]) {
chartSeries = this.traces[key.chartId].traces.filter(function(trace) {
return trace.name === key.traceName;
});
}

if (chartSeries.length) {
return chartSeries[0];
} else {
this.getChart(parent, {id: key.chartId, name: params.chartName, path: params.path});

var plotParams = {
color: this.getNextColor(params.path, params.smoothed ? 0.2 : 1),
fillsmoothcolor: this.getNextColor(params.path, 0.1),
fillcolor: this.getNextColor(params.path, 0.4),
hoverinfo: params.cv_avg ? ‘skip’ : ‘text+x’,
width: params.cv_avg ? 2 : 1,
dash: params.type === ‘test’ ? ‘solid’ : ‘dot’
},
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
hovertext: [],
hoverinfo: plotParams.hoverinfo,
line: {
width: plotParams.width,
dash: plotParams.dash,
color: plotParams.color
},
mode: ‘lines’,
hoveron: ‘points’,
connectgaps: true
};

if (params.best_point) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
marker: {
width: 2,
color: plotParams.color
},
hovertext: [],
hoverinfo: ‘text’,
mode: ‘markers’,
type: ‘scatter’
};
}

if (params.best_value) {
trace = {
name: key.traceName,
_params: params,
x: [],
y: [],
line: {
width: 1,
dash: ‘dash’,
color: ‘#CCCCCC’
},
mode: ‘lines’,
connectgaps: true,
hoverinfo: ‘skip’
};
}

if (params.cv_stddev_last) {
trace.fill = ‘tonexty’;
}

trace._params.plotParams = plotParams;

this.traces[key.chartId].traces.push(trace);

return trace;
}
};

CatboostIpython.prototype.getKey = function(params) {
var traceName = [
params.train,
params.type,
params.indexOfSet,
(params.smoothed ? ‘smoothed’ : ”),
(params.best_point ? ‘best_pount’ : ”),
(params.best_value ? ‘best_value’ : ”),
(params.cv_avg ? ‘cv_avg’ : ”),
(params.cv_stddev_first ? ‘cv_stddev_first’ : ”),
(params.cv_stddev_last ? ‘cv_stddev_last’ : ”)
].join(‘;’);

return {
chartId: params.chartName,
traceName: traceName,
colorId: params.train
};
};

CatboostIpython.prototype.filterTracesEvery = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] !== trace._params[prop]) {
return false;
}
}
}

return true;
});
};

CatboostIpython.prototype.filterTracesOne = function(traces, filter) {
traces = traces || this.traces[this.activeTab].traces;

return traces.filter(function(trace) {
for (var prop in filter) {
if (filter.hasOwnProperty(prop)) {
if (filter[prop] === trace._params[prop]) {
return true;
}
}
}

return false;
});
};

CatboostIpython.prototype.cleanSeries = function() {
$(‘.catboost-panel__series’, this.layout).html(”);
};

CatboostIpython.prototype.groupTraces = function() {
var traces = this.traces[this.activeTab].traces,
index = 0,
tracesHash = {};

traces.map(function(trace) {
var train = trace._params.train;

if (!tracesHash[train]) {
tracesHash[train] = {
index: index,
traces: [],
info: {
path: trace._params.path,
color: trace._params.plotParams.color
}
};

index++;
}

tracesHash[train].traces.push(trace);
});

return tracesHash;
};

CatboostIpython.prototype.drawTraces = function() {
if ($(‘.catboost-panel__series .catboost-panel__serie’, this.layout).length) {
return;
}

var html = ”,
tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train)) {
html += this.drawTrace(train, tracesHash[train]);
}
}

$(‘.catboost-panel__series’, this.layout).html(html);

this.updateTracesValues();

this.addTracesEvents();
};

CatboostIpython.prototype.getTraceDefParams = function(params) {
var defParams = {
smoothed: undefined,
best_point: undefined,
best_value: undefined,
cv_avg: undefined,
cv_stddev_first: undefined,
cv_stddev_last: undefined
};

if (params) {
return $.extend(defParams, params);
} else {
return defParams;
}
};

CatboostIpython.prototype.drawTrace = function(train, hash) {
var info = hash.info,
id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
items = {
learn: {
middle: ”,
bottom: ”
},
test: {
middle: ”,
bottom: ”
}
},
tracesNames = ”;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
items[type].middle += ‘

‘ +

‘;

items[type].bottom += ‘

‘ +

‘;

tracesNames += ‘

‘ +

‘ + trace._params.nameOfSet + ‘

‘;
});
});

var timeSpendHtml = ‘

‘ +

‘ +

‘;

var html = ‘

‘ +

‘ +
‘ +

‘ +
(this.getLaunchMode(info.path) !== ‘Eval’ ? timeSpendHtml : ”) +

‘ +

curr

‘ +

best

‘ +

‘ +

‘ +

‘ +

‘ +
tracesNames +

‘ +

‘ +
items.learn.middle +
items.test.middle +

‘ +

‘ +
items.learn.bottom +
items.test.bottom +

‘ +

‘ +

‘;

return html;
};

CatboostIpython.prototype.updateTracesValues = function(iteration, click) {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceValues(train, tracesHash[train], iteration, click);
}
}
};

CatboostIpython.prototype.updateTracesBest = function() {
var tracesHash = this.groupTraces();

for (var train in tracesHash) {
if (tracesHash.hasOwnProperty(train) && !this.layoutDisabled.traces[train]) {
this.updateTraceBest(train, tracesHash[train]);
}
}
};

CatboostIpython.prototype.getBestValue = function(data) {
if (!data.length) {
return {
best: undefined,
index: -1
};
}

var best = data[0],
index = 0,
func = this.lossFuncs[this.traces[this.activeTab].name],
bestDiff = typeof func === ‘number’ ? Math.abs(data[0] – func) : 0;

for (var i = 1, l = data.length; i best) {
best = data[i];
index = i;
}

if (typeof func === ‘number’ && Math.abs(data[i] – func) maxLength) {
maxLength = origTrace.y.length;
}
});

for (var i = 0; i 0) {
avgTrace.x[i] = i;
avgTrace.y[i] = sum / count;
}
}
};

CatboostIpython.prototype.updateTracesCVStdDev = function() {
var tracesHash = this.groupTraces(),
firstTraces = this.filterTracesOne(tracesHash.traces, {cv_stddev_first: true}),
self = this;

firstTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed
})),
lastTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
smoothed: trace._params.smoothed,
cv_stddev_last: true
}));

if (origTraces.length && lastTraces.length === 1) {
self.cvStdDevFunc(origTraces, trace, lastTraces[0]);
}
});
};

CatboostIpython.prototype.cvStdDevFunc = function(origTraces, firstTrace, lastTrace) {
var maxCount = origTraces.length,
maxLength = -1,
count,
sum,
i, j;

origTraces.forEach(function(origTrace) {
if (origTrace.y.length > maxLength) {
maxLength = origTrace.y.length;
}
});

for (i = 0; i i) {
firstTrace.hovertext[i] += this.hovertextParameters[i];
lastTrace.hovertext[i] += this.hovertextParameters[i];
}
}
};

CatboostIpython.prototype.updateTracesSmoothness = function() {
var tracesHash = this.groupTraces(),
smoothedTraces = this.filterTracesOne(tracesHash.traces, {smoothed: true}),
enabled = this.getSmoothness() > -1,
self = this;

smoothedTraces.forEach(function(trace) {
var origTraces = self.filterTracesEvery(tracesHash.traces, self.getTraceDefParams({
train: trace._params.train,
type: trace._params.type,
indexOfSet: trace._params.indexOfSet,
cv_avg: trace._params.cv_avg,
cv_stddev_first: trace._params.cv_stddev_first,
cv_stddev_last: trace._params.cv_stddev_last
})),
colorFlag = false;

if (origTraces.length === 1) {
origTraces = origTraces[0];

if (origTraces.visible) {
if (enabled) {
self.smoothFunc(origTraces, trace);
colorFlag = true;
}

self.highlightSmoothedTrace(origTraces, trace, colorFlag);
}
}
});
};

CatboostIpython.prototype.highlightSmoothedTrace = function(trace, smoothedTrace, flag) {
if (flag) {
smoothedTrace.line.color = trace._params.plotParams.color;
trace.line.color = smoothedTrace._params.plotParams.color;
trace.hoverinfo = ‘skip’;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillsmoothcolor;
}
} else {
trace.line.color = trace._params.plotParams.color;
trace.hoverinfo = trace._params.plotParams.hoverinfo;

if (trace._params.cv_stddev_last) {
trace.fillcolor = trace._params.plotParams.fillcolor;
}
}
};

CatboostIpython.prototype.smoothFunc = function(origTrace, smoothedTrace) {
var data = origTrace.y,
smoothedPoints = this.smooth(data, this.getSmoothness()),
smoothedIndex = 0,
self = this;

if (smoothedPoints.length) {
data.forEach(function (d, index) {
if (!smoothedTrace.x[index]) {
smoothedTrace.x[index] = index;
}

var nameOfSet = smoothedTrace._params.nameOfSet;

if (smoothedTrace._params.cv_stddev_first || smoothedTrace._params.cv_stddev_last) {
nameOfSet = smoothedTrace._params.type + ‘ std’;
}

smoothedTrace.y[index] = smoothedPoints[smoothedIndex];
smoothedTrace.hovertext[index] = nameOfSet + ‘`: ‘ + smoothedPoints[smoothedIndex].toPrecision(7);
if (self.hovertextParameters.length > index) {
smoothedTrace.hovertext[index] += self.hovertextParameters[index];
}
smoothedIndex++;
});
}
};

CatboostIpython.prototype.formatItemValue = function(value, index, suffix) {
if (typeof value === ‘undefined’) {
return ”;
}

suffix = suffix || ”;

return ‘‘ + value + ‘‘;
};

CatboostIpython.prototype.updateTraceBest = function(train, hash) {
var traces = this.filterTracesOne(hash.traces, {best_point: true}),
self = this;

traces.forEach(function(trace) {
var testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
indexOfSet: trace._params.indexOfSet
}));

if (self.hasCVMode) {
testTrace = self.filterTracesEvery(hash.traces, self.getTraceDefParams({
train: trace._params.train,
type: ‘test’,
cv_avg: true
}));
}

var bestValue = self.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

if (bestValue.index !== -1) {
trace.x[0] = bestValue.index;
trace.y[0] = bestValue.best;
trace.hovertext[0] = bestValue.func + ‘ (‘ + (self.hasCVMode ? ‘avg’ : trace._params.nameOfSet) + ‘): ‘ + bestValue.index + ‘ ‘ + bestValue.best;
}
});
};

CatboostIpython.prototype.updateTraceValues = function(name, hash, iteration, click) {
var id = ‘catboost-serie-‘ + this.index + ‘-‘ + hash.index,
traces = {
learn: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘learn’})),
test: this.filterTracesEvery(hash.traces, this.getTraceDefParams({type: ‘test’}))
},
path = hash.info.path,
self = this;

[‘learn’, ‘test’].forEach(function(type) {
traces[type].forEach(function(trace) {
var data = trace.y || [],
index = typeof iteration !== ‘undefined’ && iteration -1 ? bestValue.index : ”);

$(‘#’ + id + ‘ .catboost-panel__serie_best_test_value[data-index=’ + trace._params.indexOfSet + ‘]’, self.layout)
.html(self.formatItemValue(bestValue.best, bestValue.index, ‘best ‘ + trace._params.nameOfSet + ‘ ‘));
}
});
});

if (this.hasCVMode) {
var testTrace = this.filterTracesEvery(hash.traces, this.getTraceDefParams({
type: ‘test’,
cv_avg: true
})),
bestValue = this.getBestValue(testTrace.length === 1 ? testTrace[0].y : []);

$(‘#’ + id + ‘ .catboost-panel__serie_best_iteration’, this.layout).html(bestValue.index > -1 ? bestValue.index : ”);
}

if (click) {
this.clickMode = true;

$(‘#catboost-control2-clickmode’ + this.index, this.layout)[0].checked = true;
}
};

CatboostIpython.prototype.addTracesEvents = function() {
var self = this;

$(‘.catboost-panel__serie_checkbox’, this.layout).click(function() {
var name = $(this).data(‘seriename’);

self.layoutDisabled.traces[name] = !$(this)[0].checked;

self.redrawActiveChart();
});
};

CatboostIpython.prototype.getNextColor = function(path, opacity) {
var color;

if (this.colorsByPath[path]) {
color = this.colorsByPath[path];
} else {
color = this.colors[this.colorIndex];
this.colorsByPath[path] = color;

this.colorIndex++;

if (this.colorIndex > this.colors.length – 1) {
this.colorIndex = 0;
}
}

return this.hexToRgba(color, opacity);
};

CatboostIpython.prototype.hexToRgba = function(value, opacity) {
if (value.length 0) {
out += hours + ‘h ‘;
seconds = 0;
millis = 0;
}
if (minutes && minutes > 0) {
out += minutes + ‘m ‘;
millis = 0;
}
if (seconds && seconds > 0) {
out += seconds + ‘s ‘;
}
if (millis && millis > 0) {
out += millis + ‘ms’;
}

return out.trim();
};

CatboostIpython.prototype.mean = function(values, valueof) {
var n = values.length,
m = n,
i = -1,
value,
sum = 0,
number = function(x) {
return x === null ? NaN : +x;
};

if (valueof === null) {
while (++i


3.11 Zapisywanie modelu

Zawsze bardzo przydatne jest zrzucenie modelu na dysk (szczególnie jeśli szkolenie zajęło trochę czasu).

In [83]:
model = CatBoostClassifier(iterations=10, random_seed=42, logging_level='Silent').fit(train_pool)
model.save_model('catboost_model.dump')
model = CatBoostClassifier()
model.load_model('catboost_model.dump');

hyperopt

Chociaż zawsze można wybrać optymalną liczbę iteracji (etapy przyspieszające) poprzez walidację krzyżową i wykresy krzywej uczenia się, ważne jest również, aby bawić się niektórymi parametrami modelu, i chcielibyśmy zwrócić szczególną uwagę na l2_leaf_reg i learning_rate.

W tej sekcji wybieramy te parametry za pomocą pakietu hyperopt.

Instalujemy to!

!pip install hyperopt

In [ ]:
import hyperopt

def hyperopt_objective(params):
    model = CatBoostClassifier(
        l2_leaf_reg=int(params['l2_leaf_reg']),
        learning_rate=params['learning_rate'],
        iterations=500,
        eval_metric='Accuracy',
        random_seed=42,
        verbose=False,
        loss_function='Logloss',
    )
    
    cv_data = cv(
        Pool(X, y, cat_features=categorical_features_indices),
        model.get_params()
    )
    best_accuracy = np.max(cv_data['test-Accuracy-mean'])
    
    return 1 - best_accuracy # as hyperopt minimises
In [84]:
from numpy.random import RandomState

params_space = {
    'l2_leaf_reg': hyperopt.hp.qloguniform('l2_leaf_reg', 0, 2, 1),
    'learning_rate': hyperopt.hp.uniform('learning_rate', 1e-3, 5e-1),
}

trials = hyperopt.Trials()

best = hyperopt.fmin(
    hyperopt_objective,
    space=params_space,
    algo=hyperopt.tpe.suggest,
    max_evals=50,
    trials=trials,
    rstate=RandomState(123)
)

print(best)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-84-89204c19c526> in <module>
      2 
      3 params_space = {
----> 4     'l2_leaf_reg': hyperopt.hp.qloguniform('l2_leaf_reg', 0, 2, 1),
      5     'learning_rate': hyperopt.hp.uniform('learning_rate', 1e-3, 5e-1),
      6 }

NameError: name 'hyperopt' is not defined

‘l2_leaf_reg’ Współczynnik na poziomie regularyzacji L2 funkcji kosztu. Każda wartość dodatnia jest dozwolona.

Iteracje i szybkość uczenia się (Iterations and learning rate)

Domyślnie CatBoost buduje 1000 drzew. Liczbę iteracji można zmniejszyć, aby przyspieszyć trening.

Gdy liczba iteracji maleje, należy zwiększyć szybkość uczenia się. Domyślnie wartość szybkości uczenia się jest definiowana automatycznie w zależności od liczby iteracji i wejściowego zestawu danych. Zmiana liczby iteracji na mniejszą wartość jest dobrym punktem wyjścia do optymalizacji.

Teraz (po znalezieniu optymalnych parametrów: ‘l2_leaf_reg’ i ‘learning_rate’, zdobądźmy wszystkie dane CV z najlepszymi parametrami:

In [85]:
model = CatBoostClassifier(
    l2_leaf_reg=int(best['l2_leaf_reg']),
    learning_rate=best['learning_rate'],
    iterations=500,
    eval_metric='Accuracy',
    random_seed=42,
    verbose=False,
    loss_function='Logloss',
)
cv_data = cv(Pool(X, y, cat_features=categorical_features_indices), model.get_params())
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-85-8a762d5f7dc8> in <module>
      1 model = CatBoostClassifier(
----> 2     l2_leaf_reg=int(best['l2_leaf_reg']),
      3     learning_rate=best['learning_rate'],
      4     iterations=500,
      5     eval_metric='Accuracy',

NameError: name 'best' is not defined
In [86]:
print('Precise validation accuracy score: {}'.format(np.max(cv_data['test-Accuracy-mean'])))
Precise validation accuracy score: 0.8294051627384961

Przypomnijmy, że przy domyślnych parametrach wynik cv wyniósł 0,8283, a zatem mamy (prawdopodobnie nieistotną statystycznie) pewną poprawę.

Prześlij na konkurs

Teraz zmienilibyśmy nasz dostrojony model na wszystkich danych treningowych, które mamy

In [87]:
model_VV.fit(X, y, cat_features=categorical_features_indices)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-87-73c5b8ae99aa> in <module>
----> 1 model_VV.fit(X, y, cat_features=categorical_features_indices)

NameError: name 'model_VV' is not defined

Na koniec przygotujmy plik zgłoszenia:

In [88]:
import pandas as pd
submisstion = pd.DataFrame()
#submisstion['PassengerId'] = X_train['PassengerId']
#submisstion['Survived'] = model.predict(X_train)
In [89]:
submisstion.to_csv('submission.csv', index=False)

Wreszcie możesz złożyć zgłoszenie w konkursie Titanic Kaggle.

Otóż to! Teraz możesz grać z CatBoost i wygrywać niektóre konkursy! 🙂