X and Y components, Alignment, Standard scaling
• Split the dataframes into features X (columns 1 to 41) and labels Y (column 42)
• One-hot encode the X component; leave Y as labels (output)
• Alignment: the training X component is 125973 × 121, the test X is 22544 × 115
• The difference in the number of rows doesn't matter, but the feature dimensions (number and order of columns) need to be the same
• Add the columns that one set lacks, filled with 0, at the same column position as in the other set
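The split, encoding, and alignment steps above can be sketched with pandas as follows. This is a minimal illustration on toy data, not the original pipeline: the column names (`duration`, `protocol`, `label`), the values, and the helper `split_xy` are hypothetical, and `reindex` is just one way to do the alignment.

```python
import pandas as pd

# Toy stand-ins for the real dataframes (hypothetical column names and values).
train_df = pd.DataFrame({
    "duration": [0, 5, 2, 7],
    "protocol": ["tcp", "udp", "icmp", "tcp"],
    "label": ["normal", "attack", "normal", "attack"],
})
test_df = pd.DataFrame({
    "duration": [1, 3],
    "protocol": ["tcp", "udp"],  # "icmp" never appears in the test set
    "label": ["normal", "attack"],
})


def split_xy(df):
    """Hypothetical helper: features are all columns but the last, labels the last."""
    return df.iloc[:, :-1], df.iloc[:, -1]


X_train, y_train = split_xy(train_df)
X_test, y_test = split_xy(test_df)

# One-hot encode the categorical feature columns only; the labels stay as they are.
X_train = pd.get_dummies(X_train, dtype=int)
X_test = pd.get_dummies(X_test, dtype=int)

# Align: give the test set exactly the training columns, in the same order,
# and fill the columns it lacks with 0.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)

print(X_train.shape, X_test.shape)
```

Reindexing against the training columns keeps the column order of the training set. It also drops any column that exists only in the test set, which matches the case described here, where the test set has fewer columns than the training set.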
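• Standard scaler: $x' = \frac{x - \mu}{\sigma}$ (zero mean and unit variance, the scale of a standard normal distribution)
$x'$: new scaled value, $x$: original data value, $\mu$: mean of the training samples, $\sigma$: standard deviation of the training samples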
• Two steps:
  • Fitting: computes the mean and standard deviation of the data (training set only)
  • Transformation: performs the scaling on the data (both sets)
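The two steps can be sketched in NumPy as below. The feature values are made up for illustration, and the guard against a zero standard deviation is an added assumption, not something stated on the slide. scikit-learn's `StandardScaler` exposes the same two steps as `fit` and `transform`.

```python
import numpy as np

# Toy feature matrices (hypothetical values): rows are samples, columns are features.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

# Fitting: mean and standard deviation per feature, from the training set only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population standard deviation (ddof=0)
sigma[sigma == 0] = 1.0      # assumed guard: a constant column would divide by zero

# Transformation: apply x' = (x - mu) / sigma to both sets,
# reusing the training statistics for the test set.
X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

Because the statistics come from the training set alone, the scaled test set will generally not have exactly zero mean or unit variance. That is expected, and it keeps information about the test data out of the preprocessing.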