
X and Y components – Alignment

Standard scaling

■ Split the dataframes into features 𝑋 (columns 1–41) and labels 𝑌 (column 42)

– One-hot encode the categorical features of 𝑋; keep 𝑌 as the labels (output)

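■ Alignment: after encoding, training 𝑋 is 125973 × 121 but test 𝑋 is 22544 × 115 (the test set lacks some categorical values seen in training, so it gets fewer one-hot columns)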

– The difference in row count doesn't matter, but the feature dimension (number and order of columns) must be the same

– Add the missing columns to the test 𝑋, filled with 0, at the same positions as in the training 𝑋
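A minimal pandas sketch of the split, encoding, and alignment steps above, on hypothetical toy frames. The column names `duration`, `protocol_type`, `label` and the helper `split_and_encode` are illustrative, not from the original. For a 42-column frame, `iloc[:, :-1]` selects col. 1–41 and `iloc[:, -1]` selects col. 42.

```python
import pandas as pd

# Hypothetical toy frames standing in for the real training/test files:
# every column except the last is a feature, the last one is the label.
train_df = pd.DataFrame({
    "duration": [0, 2, 5],
    "protocol_type": ["tcp", "udp", "icmp"],
    "label": ["normal", "attack", "normal"],
})
test_df = pd.DataFrame({
    "duration": [1, 3],
    "protocol_type": ["tcp", "icmp"],  # no "udp" in the test set
    "label": ["attack", "normal"],
})

def split_and_encode(df, categorical):
    """Split into features X (all but the last column) and labels Y (last
    column), then one-hot encode the categorical columns of X only."""
    X = df.iloc[:, :-1]
    Y = df.iloc[:, -1]
    X = pd.get_dummies(X, columns=categorical, dtype=int)
    return X, Y

train_X, train_Y = split_and_encode(train_df, ["protocol_type"])
test_X, test_Y = split_and_encode(test_df, ["protocol_type"])
print(train_X.shape, test_X.shape)  # (3, 4) (2, 3): same kind of mismatch as 121 vs 115

# Alignment: give the test set exactly the training columns, in the same order.
# Columns missing from the test set are created and filled with 0; any
# test-only columns (categories never seen in training) would be dropped.
test_X = test_X.reindex(columns=train_X.columns, fill_value=0)
print(test_X.shape)  # (2, 4)
```

`reindex(columns=..., fill_value=0)` does the zero-filling and the reordering in one call; `train_X.align(test_X, join="left", axis=1, fill_value=0)` is an equivalent alternative. Another design is to concatenate both sets before calling `get_dummies`, which produces identical columns by construction.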

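■ Standard scaler: $x' = \frac{x - \mu}{s}$ (each training feature ends up with mean 0 and standard deviation 1; the shape of the distribution is not changed)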

𝑥′: new scaled value, 𝑥: original data value, 𝜇: mean of training samples, 𝑠: standard deviation of training samples

■ Two steps:

– Fitting: compute the mean and standard deviation of each feature → training set only (so no test information leaks into the scaler)
– Transformation: scale the data with those training statistics → both sets
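A minimal sketch of the fit/transform split in pandas, assuming the per-column population standard deviation (`ddof=0`), which is what scikit-learn's `StandardScaler` uses. The helpers `fit` and `transform` and the toy data are hypothetical. On the toy `duration` column, 𝜇 = 5 and 𝑠 = 2, so a test value of 9 maps to (9 − 5)/2 = 2.

```python
import pandas as pd

def fit(train):
    """Fitting: per-column mean and standard deviation, from the training set only."""
    mu = train.mean()
    s = train.std(ddof=0)  # population std (divide by n)
    # A constant column has s == 0; use 1 instead to avoid dividing by zero
    # (scikit-learn's StandardScaler treats zero-variance features the same way).
    s = s.where(s != 0, 1.0)
    return mu, s

def transform(df, mu, s):
    """Transformation: x' = (x - mu) / s, using the training statistics."""
    return (df - mu) / s

# Hypothetical toy feature matrices (already one-hot encoded and aligned).
train_X = pd.DataFrame({"duration": [2, 4, 4, 4, 5, 5, 7, 9],
                        "flag_SF": [1, 1, 1, 1, 1, 1, 1, 1]})
test_X = pd.DataFrame({"duration": [9, 3], "flag_SF": [1, 0]})

mu, s = fit(train_X)                      # statistics come from training data only
train_scaled = transform(train_X, mu, s)  # the same statistics are applied to both sets
test_scaled = transform(test_X, mu, s)
print(test_scaled)
```

With scikit-learn the same split is `scaler.fit(train_X)` followed by `scaler.transform(...)` on each set (or `fit_transform` on the training set only). Refitting on the test set would leak its statistics into the features and put the two sets on different scales.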