My experience doing the Machine Learning with MATLAB online course
episode 1
The first thing we gotta do is a basketball player statistics example. Let’s do it!
1.3 Getting started with the data
1 — Importing data 1/8
TASK: Read the file bballPlayers.txt into a table named playerInfo.
So I’m just typing the code into MATLAB and then hitting Run:
playerInfo = readtable('bballPlayers.txt');
playerStats = readtable('bballStats.txt');
Summary: Getting Started with the Data
The readtable function creates a table in MATLAB from a data file.
>> allStats = readtable('bballStats.txt');
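To try readtable without the course files, here is a minimal sketch that first writes a tiny file and then reads it back. The file contents and the variable names playerID and pos are assumptions for illustration, not the real bballPlayers.txt.

```matlab
% Build a tiny table with made-up values (not the real course data).
T = table([1;2;3], {'center';'guard';'forward'}, ...
    'VariableNames', {'playerID','pos'});

% Write it to a temporary comma-delimited text file.
fname = [tempname '.txt'];
writetable(T, fname);

% Read it back; the header line of the file supplies the variable names.
playerInfo = readtable(fname);

% Clean up the temporary file.
delete(fname);
```

Numeric columns come back as numeric vectors and text columns as cell arrays of character vectors.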
You can use a logical vector to index into the rows of a table. The colon in the second position keeps all the variables.
>> stats = allStats(allStats.year>=1990,:);
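Parenthesis indexing into a table needs both a row and a column subscript, so the logical vector goes in the row position. A small sketch with a made-up stand-in for allStats (the values and the points variable are assumptions):

```matlab
% Toy table standing in for allStats (values are made up for illustration).
allStats = table([1;1;2;3], [1988;1991;1990;1985], [10;12;7;20], ...
    'VariableNames', {'playerID','year','points'});

% Logical vector: one true/false value per row.
keep = allStats.year >= 1990;

% Rows are selected by the logical vector; the colon keeps every variable.
stats = allStats(keep, :);
```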
The categorical function creates a categorical array from data. Use the categories function to list the categories of a categorical array.
>> playerInfo.hsCountry = categorical(playerInfo.hsCountry);
>> cats = categories(playerInfo.hsCountry)
cats =
  2x1 cell array
    {'HOL'}
    {'USA'}
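Here is a quick sketch of the same idea on a standalone cell array, so it runs without the course data. The country values are made up, and countcats is an extra function not mentioned in the course summary.

```matlab
% Made-up text data standing in for playerInfo.hsCountry.
hsCountry = {'USA';'HOL';'USA';'USA'};

% Convert the text to a categorical array.
c = categorical(hsCountry);

% List the distinct categories (sorted alphabetically for text input).
cats = categories(c);

% Count how many elements fall in each category.
counts = countcats(c);
```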
The grpstats function calculates statistics grouped according to a grouping variable.
>> totalStats = grpstats(stats,'playerID',@sum)
The innerjoin function merges two tables, retaining only the observations whose key variable values appear in both tables.
>> data = innerjoin(playerInfo,totalStats);
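To see which rows survive an inner join, here is a minimal sketch with two toy tables that share the key variable playerID. All values, and the pos and points variables, are assumptions for illustration.

```matlab
% Two toy tables sharing the key variable playerID (made-up values).
playerInfo = table([1;2;3], {'C';'G';'F'}, ...
    'VariableNames', {'playerID','pos'});
totalStats = table([2;3;4], [100;250;80], ...
    'VariableNames', {'playerID','points'});

% With no keys specified, innerjoin uses the variables common to both tables.
data = innerjoin(playerInfo, totalStats);
```

Only players 2 and 3 appear in both tables, so player 1 (no stats) and player 4 (no info) are dropped.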
You can use element-wise division to scale variables by another variable in the table.
>> data{:,7:end} = data{:,7:end}./data.GP;
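The division above works because curly braces pull the numeric values out of the table as a matrix, and ./ then divides every column by the GP column. A sketch on a small made-up table (the column layout here is an assumption, and it relies on implicit expansion, which needs R2016b or later):

```matlab
% Toy table: totals per player plus games played (GP). Values are made up.
data = table([1;2], [10;20], [50;100], [30;40], ...
    'VariableNames', {'playerID','GP','points','rebounds'});

% Divide every stats column (3 to end) by GP, row by row.
% The column vector data.GP is expanded across the columns.
data{:,3:end} = data{:,3:end} ./ data.GP;
```

After this, points and rebounds hold per-game averages instead of totals.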
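The normalize function can perform several methods of normalization. By default it computes z-scores, so each column ends up with zero mean and unit standard deviation.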
>> data{:,7:end} = normalize(data{:,7:end});
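As a small illustration of two of the methods, here is a sketch on a made-up matrix. The matrix A is an assumption, picked so the results are easy to check by hand.

```matlab
% Made-up matrix: two columns on very different scales.
A = [1 10; 2 20; 3 30];

% Default method: z-score each column (subtract mean, divide by std).
Z = normalize(A);

% 'range' method: rescale each column to the interval [0, 1].
R = normalize(A, 'range');
```

Both columns end up identical after normalization, which is the point: the difference in scale is gone.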
2.2 Low-dimensional visualization
step 1 — calculate pairwise distances
We can use the pdist function to calculate the pairwise distances between observations. The input should be a numeric matrix…
d = pdist(measurements,distance)
d: A distance or dissimilarity vector containing the distance between each pair of observations.
measurements: A numeric matrix containing the data. Each row is considered as an observation.
distance: An optional input that indicates the method of calculating the dissimilarity or distance. Commonly used methods are 'euclidean' (default), 'cityblock', and 'correlation'.
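To get a feel for what pdist returns, here is a minimal sketch on three made-up 2-D points. The matrix X is an assumption, and squareform is an extra helper not covered in the notes above.

```matlab
% Three made-up observations (rows) in two dimensions.
X = [0 0; 3 4; 6 8];

% Euclidean distances for the pairs (1,2), (1,3), (2,3), as a row vector.
dEuc = pdist(X);

% Same pairs, using the city block (sum of absolute differences) distance.
dCity = pdist(X, 'cityblock');

% Expand the vector into a symmetric 3-by-3 distance matrix.
D = squareform(dEuc);
```

The output vector has one entry per pair of observations, so m observations give m*(m-1)/2 distances.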
Step 2 — Perform multidimensional scaling
[x,e] = cmdscale(d)
x: The m-by-q matrix of the reconstructed coordinates in q-dimensional space, where m is the number of observations and q is the minimum number of dimensions needed to achieve the given pairwise distances.
e: Eigenvalues of the matrix x*x'.
d: A dissimilarity or distance vector.
We can use the eigenvalues e to judge whether a low-dimensional approximation to the points in x gives a reasonable representation of our data. If the first p eigenvalues are considerably larger than the rest, then the points are well approximated by the first p dimensions (the first p columns of x).
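The eigenvalue check can be sketched on made-up data where the answer is known in advance: four 3-D points that all lie in one plane, so two dimensions should be enough. The matrix X and the explained variable are assumptions for illustration.

```matlab
% Four made-up points in 3-D that all lie in the plane z = 0.
X = [0 0 0; 1 0 0; 0 1 0; 1 1 0];

% Step 1: pairwise Euclidean distances.
d = pdist(X);

% Step 2: classical multidimensional scaling.
[x, e] = cmdscale(d);

% Cumulative share of the eigenvalue sum covered by the first p dimensions.
explained = cumsum(e) / sum(e);

% Distances between the reconstructed points, for comparison with d.
dRecon = pdist(x);
```

Here the first two eigenvalues carry essentially everything, and the distances between the reconstructed points match the original ones.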
I just learned about pareto and scatter3 plots:
pareto(e)
scatter3(Y(:,1),Y(:,2),Y(:,3))
>> scatter(Y(:,1),Y(:,2))