Pre-processing Steps

DATAFRAME CREATION

Import .csv files

Create separate instances (multiclass, binary, grouped)
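
A minimal pandas sketch of this step. The CSV content, the column name `label`, the label values, and the `GROUPS` mapping are all hypothetical; the slide does not say how the three instances are derived, so this assumes they differ only in label granularity.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk;
# with an actual file, pd.read_csv("some_file.csv") would be used instead.
CSV_TEXT = """duration,protocol,label
0,tcp,none
2,udp,a1
5,tcp,a2
1,icmp,b1
"""

# Hypothetical mapping from fine-grained labels to coarser groups.
GROUPS = {"none": "none", "a1": "group_a", "a2": "group_a", "b1": "group_b"}


def make_instances(df, label_col="label", negative="none"):
    """Return three copies of df that differ only in the label column."""
    multiclass = df.copy()  # original fine-grained labels

    binary = df.copy()      # 0 for the negative class, 1 for everything else
    binary[label_col] = (binary[label_col] != negative).astype(int)

    grouped = df.copy()     # labels collapsed into coarser groups
    grouped[label_col] = grouped[label_col].map(GROUPS)

    return multiclass, binary, grouped


raw = pd.read_csv(io.StringIO(CSV_TEXT))
multiclass_df, binary_df, grouped_df = make_instances(raw)
```

Working on copies keeps the three label schemes independent, so later steps can be run on each instance without one overwriting another.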

DATA CLEANING

Clean out empty/wrong cells

Separate last column
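
One way the cleaning step could look in pandas. The placeholder strings (`"?"`, `""`), the column names, and the choice to drop affected rows rather than impute them are assumptions; the slide does not specify what counts as a wrong cell or why the last column is split off.

```python
import pandas as pd

# Hypothetical raw data with a missing cell and a placeholder value.
df = pd.DataFrame({
    "duration": ["0", "2", "?", "5"],
    "protocol": ["tcp", None, "udp", "tcp"],
    "label": ["a", "b", "a", "b"],
})

# Turn assumed placeholder strings into missing values.
cleaned = df.mask(df.isin(["?", ""]))

# Coerce a column that should be numeric; unparseable cells become NaN.
cleaned["duration"] = pd.to_numeric(cleaned["duration"], errors="coerce")

# Drop every row that still contains a missing value.
cleaned = cleaned.dropna().reset_index(drop=True)

# Split the last column off from the rest of the frame.
last_col = cleaned.iloc[:, -1]
rest = cleaned.iloc[:, :-1]
```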

ONE-HOT ENCODING

Categorical values need to be numerical
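
A short sketch of one-hot encoding with `pandas.get_dummies`. The `protocol` column and its values are hypothetical, and the slide does not name the encoder that was actually used.

```python
import pandas as pd

# Hypothetical frame with one numeric and one categorical column.
df = pd.DataFrame({
    "duration": [0, 2, 5],
    "protocol": ["tcp", "udp", "icmp"],
})

# Replace the categorical column with one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=["protocol"], dtype=int)
```

Each row ends up with exactly one indicator set to 1, so no artificial ordering is imposed on the categories.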

X AND Y COMPONENTS

Separate features from labels
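
The split into features and labels can be sketched as follows, assuming the labels live in a column named `label` (a hypothetical name).

```python
import pandas as pd

# Hypothetical frame: two feature columns plus a label column.
df = pd.DataFrame({
    "duration": [0, 2, 5],
    "bytes": [10, 20, 30],
    "label": [0, 1, 1],
})

X = df.drop(columns=["label"])  # feature matrix
y = df["label"]                 # label vector
```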

ALIGNMENT

Training and test sets need to be uniform
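
After one-hot encoding, the training and test frames can end up with different columns when a category appears in only one of them. A possible fix, assuming "uniform" means identical columns in identical order, is to reindex the test set to the training columns. The data below is hypothetical.

```python
import pandas as pd

train = pd.DataFrame({"duration": [0, 2, 5], "protocol": ["tcp", "udp", "tcp"]})
test = pd.DataFrame({"duration": [1, 3], "protocol": ["tcp", "icmp"]})

train_enc = pd.get_dummies(train, columns=["protocol"], dtype=int)
test_enc = pd.get_dummies(test, columns=["protocol"], dtype=int)

# Give the test set exactly the training columns, in the same order.
# Columns missing from the test set are filled with 0; columns that exist
# only in the test set (here protocol_icmp) are dropped.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)
```

Using the training columns as the reference keeps the model's input layout fixed regardless of which categories happen to show up at test time.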

SCALING

Minimize bias

All variables in 0-1 range
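
A min-max scaling sketch in plain pandas. Computing the statistics on the training set only, and clipping test values that fall outside the training range, are assumptions made here to keep every value in 0-1; the slide does not say which scaler was used.

```python
import pandas as pd

# Hypothetical numeric features; column "b" is constant on purpose.
X_train = pd.DataFrame({"a": [0.0, 5.0, 10.0], "b": [1.0, 1.0, 1.0]})
X_test = pd.DataFrame({"a": [2.5, 20.0], "b": [1.0, 1.0]})

# Per-column minimum and range, taken from the training data only.
mins = X_train.min()
span = (X_train.max() - mins).replace(0, 1)  # avoid dividing by zero

# x_scaled = (x - min) / (max - min)
X_train_scaled = (X_train - mins) / span

# Apply the training statistics to the test set, then clip to [0, 1].
X_test_scaled = ((X_test - mins) / span).clip(lower=0, upper=1)
```

Without scaling, features with large numeric ranges can dominate distance-based or gradient-based models, which is presumably the bias the slide refers to.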

INPUT TO MODELS