A dendrogram can be used to choose the number of clusters.
Source of data: https://www.kaggle.com/saurabh00007/diabetescsv
The dataset describes a population of 768 women in 9 columns. For the first plot we took two variables: 'BMI' and 'Age'.
Body mass index (BMI) is a value derived from the mass (weight) and height of a person. It is defined as the body mass divided by the square of the body height and is expressed in units of kg/m², with mass in kilograms and height in metres. https://en.wikipedia.org/wiki/Body_mass_index
To read the number of clusters from the plot, we count the lowest legs of the dark blue tree, that is, the points where the dark blue links end and the coloured subtrees begin. We counted 6 branches, so we use 6 clusters. The resulting class labels range from 0 to 5.
A dendrogram is built from the distances between the points (rows) of the DataFrame.
import scipy.cluster.hierarchy as shc
import pandas as pd
import matplotlib.pyplot as plt
Clinical tests
df = pd.read_csv('c:/1/diabetes.csv')
df.head()
df.shape
Dendrogram of the population by BMI and Age
plt.figure(figsize=(17, 4), dpi= 280)
plt.title("Dendrogram of female population in BMI and Age", fontsize=22, alpha=0.5)
dend = shc.dendrogram(shc.linkage(df[['BMI', 'Age']], method='ward'), labels=df.Outcome.values, color_threshold=100)
plt.xticks(fontsize=12)
plt.show()
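Counting branches by eye can also be cross-checked numerically. The sketch below is only an illustration under stated assumptions: it uses a small synthetic array as a hypothetical stand-in for df[['BMI', 'Age']] (so it runs without the CSV file), and the cut height of 30 is an assumed value chosen for this toy data, not the threshold used in the plot above. It cuts the Ward linkage at that height with scipy's fcluster and counts the resulting clusters.

```python
import numpy as np
import scipy.cluster.hierarchy as shc

# Hypothetical stand-in for df[['BMI', 'Age']]: three well-separated groups
rng = np.random.default_rng(0)
centers = np.array([[22.0, 25.0], [32.0, 45.0], [42.0, 65.0]])
points = np.vstack([c + rng.normal(scale=1.0, size=(50, 2)) for c in centers])

# Same linkage method as in the dendrogram above
Z = shc.linkage(points, method='ward')

# Cut the tree at an assumed height; every link above it is "dark blue"
threshold = 30.0
labels = shc.fcluster(Z, t=threshold, criterion='distance')

# Number of branches crossing the cut = number of clusters
n_clusters = len(np.unique(labels))
print(n_clusters)
```

On the real data, the same call with t equal to the color_threshold passed to shc.dendrogram should give the number of coloured subtrees, which can be compared with the count read from the plot.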
PKP = df[['BMI', 'Age']]
PKP.head()
Clustering population by BMI and Age
from sklearn.cluster import AgglomerativeClustering
# Ward linkage always uses Euclidean distance; the `affinity` argument was removed in recent scikit-learn releases
cluster = AgglomerativeClustering(n_clusters=6, linkage='ward')
cluster.fit_predict(PKP)
plt.figure(figsize=(10, 7))
plt.scatter(PKP['BMI'], PKP['Age'], c=cluster.labels_, cmap='rainbow')
plt.title('Clustering population by BMI and Age', fontsize=22, alpha=0.5)
plt.xlabel('BMI', fontsize=22, alpha=0.5)
plt.ylabel('Age', fontsize=22, alpha=0.5)
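To interpret the coloured groups in the scatter plot, it can help to look at the size and the mean BMI and Age of each cluster. The following is a minimal sketch, assuming a randomly generated DataFrame named demo as a hypothetical replacement for PKP; the numbers it produces say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

# Hypothetical stand-in for PKP = df[['BMI', 'Age']]
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    'BMI': rng.normal(32, 7, 200),
    'Age': rng.normal(33, 11, 200),
})

# Same settings as above: 6 clusters, Ward linkage
model = AgglomerativeClustering(n_clusters=6, linkage='ward')
labels = model.fit_predict(demo)

# Number of points per cluster (labels run from 0 to 5)
sizes = np.bincount(labels)

# Mean BMI and Age inside each cluster
summary = demo.assign(cluster=labels).groupby('cluster').mean()
print(sizes)
print(summary)
```

With the real data, replacing demo by PKP gives one row per class from 0 to 5.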
plt.figure(figsize=(17, 4), dpi= 280)
plt.title("Dendrogram of female population in DiabetesPedigreeFunction and Insulin", fontsize=22, alpha=0.5)
dend = shc.dendrogram(shc.linkage(df[['DiabetesPedigreeFunction', 'Insulin']], method='ward'), labels=df.Outcome.values, color_threshold=100)
plt.xticks(fontsize=12)
plt.show()
PGK = df[['DiabetesPedigreeFunction', 'Insulin']]
# Ward linkage always uses Euclidean distance, so no `affinity` argument is needed
cluster = AgglomerativeClustering(n_clusters=7, linkage='ward')
cluster.fit_predict(PGK)
plt.figure(figsize=(10, 7))
plt.scatter(PGK['DiabetesPedigreeFunction'], PGK['Insulin'], c=cluster.labels_, cmap='rainbow')
plt.title('Clustering population by DiabetesPedigreeFunction and Insulin', fontsize=22, alpha=0.5)
plt.xlabel('DiabetesPedigreeFunction', fontsize=22, alpha=0.5)
plt.ylabel('Insulin', fontsize=22, alpha=0.5)
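Because the dendrogram is built from distances between rows, the variable with the larger numeric range has the larger influence on the result. The sketch below illustrates this with three made-up rows (hypothetical values, not taken from the dataset): a large change in the first column contributes far less to the Euclidean distance than a moderate change in the second.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical rows: [DiabetesPedigreeFunction, Insulin]
rows = np.array([
    [0.2, 100.0],   # row 0
    [2.0, 100.0],   # row 1: very different pedigree value, same insulin
    [0.2, 130.0],   # row 2: same pedigree value, somewhat higher insulin
])

# Pairwise Euclidean distances in the order (0,1), (0,2), (1,2)
d = pdist(rows, metric='euclidean')
print(d)
```

Whether to rescale the columns before clustering is a separate modelling decision; the notebook above clusters the raw values.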
Perfect plots: Dendrogram