My experience doing the Machine Learning with MATLAB online course

Jorge A Arriagada Triana
3 min read · Aug 5, 2020

Episode 1

The first thing we gotta do is a basketball player statistics example. Let's do it!

1.3 Getting started with the data

1 — Importing data 1/8

TASK : Read the file bballPlayers.txt into a table named playerInfo.

So I'm just typing the code into MATLAB and then hitting Run:

playerInfo = readtable('bballPlayers.txt');   % read player information into a table
playerStats = readtable('bballStats.txt');    % read player statistics into a table

Summary: Getting Started with the Data

The readtable function creates a table in MATLAB from a data file.

>> allStats = readtable('bballStats.txt');

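You can use a logical vector to index into the rows of a table. Table indexing with parentheses needs both a row and a variable subscript, so use : to keep all variables.

>> stats = allStats(allStats.year>=1990,:);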
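A minimal sketch of logical row indexing, using a small hypothetical table in place of bballStats.txt (the values are made up). The logical vector picks the rows, and the : subscript keeps every variable.

```matlab
% Toy table with hypothetical values (not the course's bballStats.txt data)
playerID = ["a"; "b"; "c"; "d"];
year     = [1985; 1990; 1993; 1989];
pts      = [100; 250; 300; 180];
allStats = table(playerID, year, pts);

% Logical vector: true for every row whose year is 1990 or later
keep = allStats.year >= 1990;

% Table indexing with () needs a row subscript AND a variable subscript;
% ':' keeps all variables
stats = allStats(keep, :);
disp(stats)
```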

The categorical function creates a categorical array from data. Use the categories function to get the list of categories in a categorical array.

>> playerInfo.hsCountry = categorical(playerInfo.hsCountry);
>> cats = categories(playerInfo.hsCountry)

cats =
  2×1 cell array
    {'HOL'}
    {'USA'}
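A minimal sketch with hypothetical high-school country values (only HOL and USA, matching the output shown above). categorical turns each distinct value into a category, and categories returns those names sorted alphabetically as a cell array.

```matlab
% Hypothetical high-school countries for four players
hsCountry = {'USA'; 'HOL'; 'USA'; 'USA'};

c = categorical(hsCountry);   % each distinct value becomes a category
cats = categories(c)          % cell array of category names, sorted alphabetically
```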

The grpstats function calculates statistics grouped according to a grouping variable.

>> totalStats = grpstats(stats,'playerID',@sum)
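A minimal sketch of grpstats on a small hypothetical table (grpstats is part of the Statistics and Machine Learning Toolbox). Each player's rows are collapsed into one row of sums, and grpstats also adds a GroupCount variable with the number of rows per group. The exact names given to the summed variables are not relied on here.

```matlab
% Hypothetical per-season rows: player 'a' has 2 seasons, 'b' has 3
playerID = {'a'; 'a'; 'b'; 'b'; 'b'};
GP       = [10; 20; 5; 5; 5];
pts      = [100; 150; 40; 60; 50];
stats = table(playerID, GP, pts);

% One row per playerID; @sum is applied to every non-grouping variable.
% The output also gets a GroupCount variable (rows per group).
totalStats = grpstats(stats, 'playerID', @sum)
```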

The innerjoin function merges two tables, keeping only the rows whose key values appear in both tables.

>> data = innerjoin(playerInfo,totalStats);
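A minimal sketch with two hypothetical tables that share the key variable playerID. By default innerjoin uses the variables with the same name in both tables as the key, so players that appear in only one table are dropped.

```matlab
% Hypothetical tables sharing the key variable playerID
playerInfo = table({'a'; 'b'; 'c'}, [200; 190; 205], ...
    'VariableNames', {'playerID', 'height'});
totalStats = table({'a'; 'b'; 'd'}, [30; 15; 12], ...
    'VariableNames', {'playerID', 'GP'});

% Key defaults to the common variable (playerID).
% 'c' and 'd' appear in only one table, so they are dropped.
data = innerjoin(playerInfo, totalStats)
```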

You can use element-wise division to scale variables by another variable in the table.

>> data{:,7:end} = data{:,7:end}./data.GP;
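A minimal sketch of per-game scaling on a hypothetical table, assuming GP holds games played. The brace indexing extracts a numeric matrix, and ./ with the GP column vector divides each row by that player's GP through implicit expansion (available since R2016b). The column range should not include GP itself, otherwise that column would become all ones.

```matlab
% Hypothetical merged table: GP, then two season totals per player
data = table({'a'; 'b'}, [10; 4], [200; 60], [50; 20], ...
    'VariableNames', {'playerID', 'GP', 'pts', 'reb'});

% data{:,3:end} is a 2-by-2 numeric matrix and data.GP is 2-by-1,
% so ./ divides each row by that player's GP (implicit expansion)
data{:, 3:end} = data{:, 3:end} ./ data.GP;
disp(data)
```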

The normalize function can perform several methods of normalization.

>> data{:,7:end} = normalize(data{:,7:end});
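A minimal sketch of two normalize methods on a small made-up matrix. normalize works column by column on a matrix; the default 'zscore' gives each column mean 0 and standard deviation 1, while 'range' rescales each column to [0, 1].

```matlab
X = [1 10; 2 20; 3 30; 4 40];   % hypothetical data, one variable per column

Z = normalize(X);             % default 'zscore': each column gets mean 0, std 1
R = normalize(X, 'range');    % rescales each column to [0, 1]
```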

2.2 Low-dimensional visualization

Step 1 — Calculate pairwise distances

We can use the pdist function to calculate the pairwise distances between observations. The input should be a numeric matrix…

d = pdist(measurements,distance)

d: A distance or dissimilarity vector containing the distance between each pair of observations.

measurements: A numeric matrix containing the data. Each row is considered as an observation.

distance: An optional input that indicates the method of calculating the dissimilarity or distance. Commonly used methods are 'euclidean' (default), 'cityblock', and 'correlation'.
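A minimal sketch with three hypothetical observations chosen so the distances come out as whole numbers. For m observations, pdist returns a row vector of m(m-1)/2 distances in the pair order (1,2), (1,3), (2,3), and squareform expands it into the full symmetric matrix.

```matlab
% Three hypothetical observations (rows) with two measurements each
measurements = [0 0; 3 4; 6 8];

d  = pdist(measurements)               % Euclidean (default): [5 10 5]
dc = pdist(measurements, 'cityblock')  % sum of absolute differences: [7 14 7]

% d holds m*(m-1)/2 pairs in the order (1,2), (1,3), (2,3);
% squareform turns it into the full symmetric m-by-m matrix
D = squareform(d)
```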

Step 2 — Perform multidimensional scaling

[x,e] = cmdscale(d)

x: The m-by-q matrix of the reconstructed coordinates in q-dimensional space. q is the minimum number of dimensions needed to achieve the given pairwise distances.

e: Eigenvalues of the matrix x*x'.

d: A dissimilarity or distance vector.
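A minimal sketch of a sanity check on cmdscale, using four hypothetical 2-D points. We keep only their pairwise distances, reconstruct coordinates from them, and confirm that the reconstructed points reproduce the same distances. x is not expected to equal the original points, because rotating, reflecting or shifting a configuration leaves its distances unchanged.

```matlab
% Hypothetical 2-D points; we pretend we only know their pairwise distances
P = [0 0; 1 0; 0 2; 3 1];
d = pdist(P);

[x, e] = cmdscale(d);   % x: reconstructed coordinates, e: eigenvalues

% x matches P only up to rotation, reflection and translation,
% but the pairwise distances between its rows reproduce d
err = max(abs(pdist(x) - d))
```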

We can use the eigenvalues e to judge whether a low-dimensional approximation of the points in x is a reasonable representation of our data. If the first p eigenvalues are considerably larger than the rest, then the points are well approximated by the first p dimensions (the first p columns of x).
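A minimal sketch of turning that rule into a number, assuming Euclidean distances (so the eigenvalues are essentially non-negative). The data are hypothetical random points lying close to a plane in 3-D. The cumulative share of the eigenvalue total tells us how many dimensions to keep; the 99% threshold is an arbitrary choice for illustration, not something from the course.

```matlab
rng(0)   % reproducible random numbers
% Hypothetical data: 50 points in 3-D that lie close to a plane
% (the third coordinate is tiny noise compared with the other two)
n = 50;
P = [5*randn(n, 1), 3*randn(n, 1), 0.1*randn(n, 1)];

[x, e] = cmdscale(pdist(P));

% Cumulative share of the total captured by the first p eigenvalues
frac = cumsum(e) / sum(e);

% Smallest p whose first p eigenvalues cover at least 99% (arbitrary threshold)
p = find(frac >= 0.99, 1)
xLow = x(:, 1:p);    % low-dimensional approximation of the points
```

Here the two large eigenvalues dominate, so two dimensions are enough, which is exactly the situation a Pareto chart of e makes visible.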

I just learned about pareto and scatter3 plots:

pareto(e)

scatter3(x(:,1),x(:,2),x(:,3))

scatter(x(:,1),x(:,2))

Principal Component Analysis
