
import scipy.cluster.hierarchy as shc
import pandas as pd
import matplotlib.pyplot as plt
USArrests
Source of data: https://www.kaggle.com/deepakg/usarrests
# Import data
df = pd.read_csv('c:/1/USArrests.csv')
df.rename(columns = {'Unnamed: 0': 'State'}, inplace=True)
df.head(4)
# Plot
plt.figure(figsize=(17, 4), dpi= 280)
plt.title("USArrests Dendrograms", fontsize=22)
linkage_matrix = shc.linkage(df[['Murder', 'Assault', 'UrbanPop', 'Rape']], method='ward')
dend = shc.dendrogram(linkage_matrix, labels=df.State.values, color_threshold=100)
plt.xticks(fontsize=12)
plt.show()
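Note that Assault is on a much larger numeric scale than the other columns, so Ward distances on the raw data are dominated by it; standardizing the features before building the linkage is common. A minimal sketch, using toy rows standing in for the CSV:

```python
import numpy as np
import scipy.cluster.hierarchy as shc
from sklearn.preprocessing import StandardScaler

# Toy stand-in for df[['Murder', 'Assault', 'UrbanPop', 'Rape']]
X = np.array([[13.2, 236, 58, 21.2],
              [10.0, 263, 48, 44.5],
              [ 8.1, 294, 80, 31.0],
              [ 8.8, 190, 50, 19.5]])

# Standardize so each feature has mean 0 and unit variance
X_scaled = StandardScaler().fit_transform(X)

# Ward linkage on the scaled features
Z = shc.linkage(X_scaled, method='ward')
print(Z.shape)  # linkage returns (n_samples - 1, 4) -> (3, 4)
```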
Earnings and expenses
Source of example: https://stackabuse.com/hierarchical-clustering-with-python-and-scikit-learn/
df3 = pd.read_csv('c:/1/hierarchical-clustering-with-python-and-scikit-learn-shopping-data.csv')
df3.head()
We have a table with gender, age, annual income and spending score. From the DataFrame we take a two-column matrix: annual income (in k$) and spending score on a scale of 1 to 100.
data = df3.iloc[:, 3:5].values
data
plt.figure(figsize=(10, 3))
plt.title("Customer Dendrograms")
dend = shc.dendrogram(shc.linkage(data, method='ward'))
The dendrogram shows 5 branches, i.e. 5 clusters of customers. We now create the cluster assignment; since there are five clusters, the output labels run from 0 to 4.
from sklearn.cluster import AgglomerativeClustering
cluster = AgglomerativeClustering(n_clusters=5, metric='euclidean', linkage='ward')  # 'metric' was named 'affinity' before scikit-learn 1.2
cluster.fit_predict(data)
plt.figure(figsize=(10, 7))
plt.scatter(data[:,0], data[:,1], c=cluster.labels_, cmap='rainbow')
plt.title('Customer clusters')
plt.xlabel('Annual earnings')
plt.ylabel('Spending')
The purple cluster (lower right) contains customers with high earnings but low spending. The blue points in the middle are customers with average income and average spending; this is the largest group.
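This reading of the plot can be checked numerically by averaging each feature within a cluster. A minimal sketch on made-up (income, spending) points, not the shopping CSV:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy stand-in for the (annual income, spending score) matrix built above
data = np.array([[15, 39], [16, 81], [17, 6], [18, 77],
                 [80, 15], [85, 12], [78, 90], [82, 88],
                 [48, 50], [50, 46], [47, 55], [52, 48]])

cluster = AgglomerativeClustering(n_clusters=5, linkage='ward')
labels = cluster.fit_predict(data)

# Per-cluster feature means: e.g. a high-income / low-spending cluster
for k in range(5):
    print(k, data[labels == k].mean(axis=0))
```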
Clinical tests
Source of data: https://www.kaggle.com/saurabh00007/diabetescsv
df3 = pd.read_csv('c:/1/diabetes.csv')
df3.head()
PKP = df3[['Age','SkinThickness','BMI']]
PKP.head()
The dendrogram suggests how many clusters to choose.
plt.figure(figsize=(17, 4), dpi= 280)
plt.title("Diabetes Dendrograms")
dend = shc.dendrogram(shc.linkage(PKP, method='ward'))
It looks like 5 clusters.
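Instead of eyeballing the tree, SciPy's `fcluster` can cut the same Ward linkage into a fixed number of flat clusters. A sketch on synthetic 3-D blobs standing in for the PKP matrix:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Toy stand-in for PKP: five well-separated blobs in 3-D
centers = np.array([[20, 10, 22], [35, 25, 30], [50, 40, 35],
                    [65, 15, 28], [30, 35, 40]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 3)) for c in centers])

Z = linkage(X, method='ward')
# Cut the tree into exactly 5 flat clusters
labels = fcluster(Z, t=5, criterion='maxclust')
print(sorted(set(labels)))  # [1, 2, 3, 4, 5]
```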
from mpl_toolkits.mplot3d import Axes3D  # registers the 3-D projection
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # Axes3D(fig) no longer attaches axes automatically in recent matplotlib
ax.scatter(PKP['Age'], PKP['SkinThickness'], PKP['BMI'], color='black',marker='o')
ax.set_title('Clusters', fontsize= 30, alpha=0.6)
ax.set_xlabel('Age', fontsize= 20, alpha=0.6)
ax.set_ylabel('SkinThickness', fontsize= 20, alpha=0.6)
ax.set_zlabel('BMI', fontsize= 20, alpha=0.6)
from sklearn.cluster import AgglomerativeClustering
cluster = AgglomerativeClustering(n_clusters=5, metric='euclidean', linkage='ward')  # 'metric' was named 'affinity' before scikit-learn 1.2
KF = cluster.fit_predict(PKP)
KF
from sklearn.cluster import KMeans

# Initializing KMeans
kmeans = KMeans(n_clusters=5)
# Fitting with inputs
kmeans = kmeans.fit(PKP)
# Predicting the clusters
labels = kmeans.predict(PKP)
# Getting the cluster centers
C = kmeans.cluster_centers_
C
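We now have two labelings of the same points, one agglomerative (KF) and one from KMeans, so it is worth checking how much they agree. A sketch using `adjusted_rand_score` on synthetic blobs, not the diabetes data:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
# Five well-separated 3-D blobs, standing in for PKP
centers = np.array([[20, 10, 22], [60, 30, 30], [40, 45, 50],
                    [80, 15, 28], [30, 35, 40]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 3)) for c in centers])

agg = AgglomerativeClustering(n_clusters=5).fit_predict(X)
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# 1.0 means identical partitions up to a relabelling of the clusters
print(adjusted_rand_score(agg, km))
```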
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# Points coloured by the agglomerative labels (KF); the red centres come from KMeans
ax.scatter(PKP['Age'], PKP['SkinThickness'], PKP['BMI'], c=KF)
ax.scatter(C[:, 0], C[:, 1], C[:, 2], marker='.', c='red', s=1000)
ax.set_title('Clusters', fontsize= 30, alpha=0.6)
ax.set_xlabel('Age', fontsize= 20, alpha=0.6)
ax.set_ylabel('SkinThickness', fontsize= 20, alpha=0.6)
ax.set_zlabel('BMI', fontsize= 20, alpha=0.6)
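To double-check the choice of five clusters, the silhouette score can be swept over candidate values of k; the true number should score highest. A sketch on synthetic blobs, not the diabetes data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
# Five well-separated 3-D blobs with a known cluster count
centers = np.array([[20, 10, 22], [60, 30, 30], [40, 45, 50],
                    [80, 15, 28], [30, 35, 40]])
X = np.vstack([c + rng.normal(scale=0.5, size=(25, 3)) for c in centers])

# Silhouette score for each candidate k
scores = {k: silhouette_score(X, KMeans(n_clusters=k, n_init=10,
                                        random_state=0).fit_predict(X))
          for k in range(2, 8)}
best_k = max(scores, key=scores.get)
print(best_k)
```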