# DATA SCIENCE with MATLAB Part I

01 Machine Learning — 2.7 Project — Clustering: Wine color, corporate bonds and wheat seed kernels.

This is a logbook of the wine color project to track my learning.

Previously assimilated stuff:

2. Finding natural patterns in data

2.1 Course example — basketball players

2.2 Low dimensional visualization

2.3 k-means clustering

2.4 Gaussian mixture models

2.5 Interpreting the clusters

2.6 Hierarchical clustering

2.7 Project — Clustering: Wine color, corporate bonds and wheat seed kernels.

# 1 WINE COLOR

The wineData table contains information that describes the chemical composition of different wines. The numeric data is stored in a single matrix, numData.

We import the data (wineData.txt) with the classic readtable function, convert the Color variable to categorical, and extract the numeric data into numData.
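For the record, the import step might look roughly like the sketch below. The course file is not bundled with this post, so the sketch writes a tiny made-up table to a temporary file as a stand-in for wineData.txt (the toy values and the toyFile name are hypothetical). It also assumes Color is the last column, which matches the variable list in this post.

```matlab
% Toy stand-in for wineData.txt: hypothetical values, only 3 of the 13 columns
toy = table([7.4; 6.3; 8.1], [9.4; 10.1; 11.2], {'red'; 'white'; 'white'}, ...
    'VariableNames', {'fixedacidity', 'alcohol', 'Color'});
toyFile = [tempname '.txt'];
writetable(toy, toyFile);                      % comma-delimited text file

% With the real data this would be readtable('wineData.txt')
wineData = readtable(toyFile);
wineData.Color = categorical(wineData.Color);  % text labels -> categorical
numData = wineData{:, 1:end-1};                % all columns except the last (Color)

delete(toyFile);                               % clean up the temporary file
```

Curly-brace indexing returns a plain numeric matrix, which is the form pca and kmeans expect.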

The wine data table (6448x13) contains:

1 fixedacidity

2 volatileacidity

3 citricacid

4 residualsugar

5 chlorides

6 freesulfurdioxide

7 totalsulfurdioxide

8 density

9 pH

10 sulphates

11 alcohol

12 quality

13 Color

Now let’s do the PCA stuff.

%TODO — task1: Perform PCA

[~,scrs,~,~,pexp] = pca(numData);

% Create the Pareto chart

figure(1)

pareto(pexp)
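To remind myself what pexp actually holds: it is the percentage of total variance explained by each principal component, so it sums to 100. Here is a minimal sketch on synthetic data (the matrix X and the 95% threshold are my own assumptions, not part of the course task) that counts how many components are needed to reach a given share of the variance. It needs the Statistics and Machine Learning Toolbox, like the rest of this project.

```matlab
rng(0);                                   % reproducible synthetic data
X = randn(200, 4) * diag([5 2 1 0.1]);    % 4 columns with very different spreads

% pca centers the columns by default; pexp is a column of percentages
[~, scrs, ~, ~, pexp] = pca(X);

cumExp = cumsum(pexp);                    % cumulative percentage explained
nComp = find(cumExp >= 95, 1);            % first component count reaching 95%
```

The Pareto chart shows the same information graphically: bars for the individual percentages and a line for the cumulative total.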

Cool. Now let’s go to task 2. Here we are going to cluster the data into two groups and experiment with k-means and GMM clustering. As always, we want to visualize the result with a scatter plot of the first two principal components.

%% TODO TASK 2: Clustering into two groups

% k-means

g = kmeans(numData,2,'Replicates',5);

figure(2)

scatter(scrs(:,1),scrs(:,2),4,g)

% GMM

gmm = fitgmdist(numData,2,'Replicates',5);

gGMM = cluster(gmm,numData); % cluster labels from the fitted GMM

figure(3)

gscatter(scrs(:,1),scrs(:,2),gGMM,[],'.')
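One thing worth keeping in mind when comparing the two methods: the labels 1 and 2 are arbitrary, so k-means may call a cluster "1" while the GMM calls the same cluster "2". A small sketch on synthetic data (two made-up blobs, not the wine data) that measures agreement while allowing for swapped labels:

```matlab
rng(1);                                     % reproducible toy data
X = [randn(100, 2); randn(100, 2) + 6];     % two well-separated synthetic blobs

gK = kmeans(X, 2, 'Replicates', 5);         % hard labels from k-means

gmm = fitgmdist(X, 2, 'Replicates', 5);     % fit a 2-component Gaussian mixture
gG = cluster(gmm, X);                       % hard labels from the fitted mixture

% Labels are arbitrary, so test both matchings (1<->1 and 1<->2)
agree = max(mean(gK == gG), mean(gK == 3 - gG));
```

On blobs this far apart the two methods should agree on almost every point. On the wine data they may not, which is part of what the experiment is about.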

Awesome. Now for Task 3: here we have to compare the resulting clusters with the groups in the variable Color. Let’s create a stacked bar chart with groups 1 and 2 along the x-axis and the red and white wines plotted in different colors. Let’s also include a legend showing which color is which.

%% TODO — TASK 3: Cross tabulate grouping and wine color

figure(4)

bar(crosstab(g,wineData.Color),'stacked')

legend(categories(wineData.Color),'Location','NorthWest')
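To see what crosstab hands to bar, here is a tiny hand-made example (the six labels are invented for illustration). Rows are cluster numbers, columns are the color categories, and each cell counts the wines with that combination. The "purity" number at the end is my own addition, not part of the course task: it is the fraction of wines that fall in the majority color of their cluster.

```matlab
% Hypothetical cluster labels and colors for six wines
g = [1; 1; 2; 2; 2; 1];
color = categorical({'red'; 'red'; 'white'; 'white'; 'red'; 'red'});

counts = crosstab(g, color);      % rows: clusters 1,2; columns: red, white
% counts is [3 0; 1 2]

% Fraction of wines in the majority color of their cluster
purity = sum(max(counts, [], 2)) / sum(counts(:));
```

A stacked bar where each bar is almost a single color means the clustering recovered the wine color well.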

That’s it. That was fun. Now let’s move on to part 2 of 3: Corporate Bonds.

# 2 CORPORATE BONDS

# 3 KERNEL SEEDS