DATA SCIENCE with MATLAB Part I

Jorge A Arriagada Triana
Oct 9, 2019

01 Machine Learning — 2.7 Project — Clustering: Wine color, corporate bonds and wheat seed kernels.


This is a logbook of the wine color project, kept to track my learning.
Topics covered so far:

2. Finding natural patterns in data
2.1 Course example — basketball players
2.2 Low dimensional visualization
2.3 k-means clustering
2.4 Gaussian mixture models
2.5 Interpreting the clusters
2.6 Hierarchical clustering
2.7 Project — Clustering: Wine color, corporate bonds and wheat seed kernels.

1 WINE COLOR

The table wineData contains information describing the chemical composition of different wines. The numeric data is stored in the matrix numData.

task 1
this is what MATLAB shows first

We import the data (wineData.txt) with the classic readtable function, convert the Color variable to categorical, and extract the numeric data into numData.
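The import step can be sketched roughly as follows. This is a minimal sketch, not the course's own script: it assumes Color is the last column and that every other column is numeric, and it writes a tiny made-up table to a temporary file so the snippet runs without the real wineData.txt.

```matlab
% Toy stand-in for wineData.txt (values are made up for illustration only)
toy = table([3.5; 3.2; 3.1], [9.4; 10.1; 11.0], {'red'; 'white'; 'white'}, ...
    'VariableNames', {'pH', 'alcohol', 'Color'});
tmpFile = [tempname '.txt'];
writetable(toy, tmpFile);

% Import the file as a table
wineData = readtable(tmpFile);

% Treat the color label as a categorical variable
wineData.Color = categorical(wineData.Color);

% Assumption: all columns except the last one (Color) are numeric
numData = wineData{:, 1:end-1};

delete(tmpFile);  % clean up the temporary file
```

With the real file, the readtable call would point at wineData.txt instead of the temporary file.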

The wine data table (6448x13) contains:

1 fixedacidity
2 volatileacidity
3 citricacid
4 residualsugar
5 chlorides
6 freesulfurdioxide
7 totalsulfurdioxide
8 density
9 pH
10 sulphates
11 alcohol
12 quality
13 Color

wineData.txt

Now let’s run PCA on the numeric data and plot how much variance each principal component explains.

%% TODO - TASK 1: Perform PCA
% scrs holds the principal component scores, pexp the percent variance explained
[~,scrs,~,~,pexp] = pca(numData);
% Create the Pareto chart
figure(1)
pareto(pexp)

pareto chart
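To read the Pareto chart numerically, the explained-variance output of pca can be accumulated. The sketch below uses made-up random data rather than the wine set, and the 95% threshold is only an illustrative choice, not something the project prescribes.

```matlab
rng(0);  % repeatable toy data
% Made-up data: three columns with very different spreads
X = randn(200,3) .* [5 2 0.5];

% pexp holds the percentage of total variance explained by each component
[~, scrs, ~, ~, pexp] = pca(X);
cumExp = cumsum(pexp);   % running total, ends at 100

% Smallest number of components reaching 95% (arbitrary threshold)
nKeep = find(cumExp >= 95, 1);
```

The first two columns of scrs are what get plotted in the scatter plots later on.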

Cool. Now let’s go to task 2. Here we are going to cluster the data and experiment with k-means and GMM clustering. As always, we want to visualize the result with a scatter plot of the first two principal components.

task 2

%% TODO TASK 2: Cluster into two groups
% k-means
g = kmeans(numData,2,'Replicates',5);
figure(2)
scatter(scrs(:,1),scrs(:,2),4,g)

% GMM
gmm = fitgmdist(numData,2,'Replicates',5);
% Assign each wine to a mixture component so the plot shows the GMM grouping
gGMM = cluster(gmm,numData);
figure(3)
gscatter(scrs(:,1),scrs(:,2),gGMM,[],'.')

k-means
GMM
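The difference between the two methods is easier to see on a small example. The sketch below uses two made-up, well-separated blobs instead of the wine data: kmeans returns the group labels directly, while fitgmdist returns a model that still has to be turned into labels with cluster.

```matlab
rng(1);  % fix the seed so the toy result is repeatable
% Two well-separated synthetic blobs (made-up data, not the wine set)
X = [randn(100,2); randn(100,2) + 6];

gk  = kmeans(X, 2, 'Replicates', 5);      % hard k-means labels
gmm = fitgmdist(X, 2, 'Replicates', 5);   % fit a 2-component mixture model
gg  = cluster(gmm, X);                    % hard labels from the GMM

% Cluster numbers are arbitrary, so check agreement up to a label swap
agreement = max(mean(gk == gg), mean(gk ~= gg));
```

On data this clean the two groupings should agree almost everywhere; on the wine data they can differ, which is the point of trying both.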

Awesome. Now for task 3: here we have to compare the resulting clusters with the groups in the variable Color. Let’s create a stacked bar chart with groups 1 and 2 along the x-axis and the red and white counts plotted in different colors. Let’s include a legend with the corresponding colors.

%% TODO - TASK 3: Cross-tabulate grouping and wine color
figure(4)
% Rows of the crosstab are clusters, columns are wine colors
bar(crosstab(g,wineData.Color),'stacked')
legend(categories(wineData.Color),'Location','NorthWest')

task3
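What crosstab hands to bar is just a table of counts. Here is a minimal sketch with six made-up labels (not taken from the wine data) showing how the rows and columns line up.

```matlab
% Made-up cluster labels and colors for six hypothetical wines
g     = [1; 1; 2; 2; 2; 1];
color = categorical({'red'; 'red'; 'white'; 'white'; 'red'; 'white'});

% Rows follow the cluster numbers, columns follow categories(color)
counts = crosstab(g, color);

% Share of each color inside each cluster (each row sums to 1)
shares = counts ./ sum(counts, 2);
```

Here counts is [2 1; 1 2]: cluster 1 holds two reds and one white, cluster 2 one red and two whites. A cluster that matched a color perfectly would have a single non-zero entry in its row.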

That’s it. That was fun. Now let’s do part 2 of 3: corporate bonds.

2 CORPORATE BONDS

3 KERNEL SEEDS
