
Divergent performance between training and test

■ Training and test sets differ substantially in:

– Distribution and types of labels
– Difficulty levels
– Services and flags (highly correlated with many other features and with the output label)
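One way to make "very different" concrete is to compare the category shares of a column in both sets. The sketch below is a minimal, stdlib-only illustration. The helper names are hypothetical, and the labels in the usage example are toy values rather than real KDDTrain+/KDDTest+ counts. It reports labels that occur only in the test set and the total variation distance between the two distributions (0 means identical shares, 1 means no overlap).

```python
from collections import Counter


def category_shares(values):
    """Return each category's share of the column (hypothetical helper)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def compare_columns(train_values, test_values):
    """Compare one categorical column (e.g. label, service, flag) across two sets."""
    train_shares = category_shares(train_values)
    test_shares = category_shares(test_values)
    # Categories the model never saw during training
    unseen = sorted(set(test_shares) - set(train_shares))
    # Total variation distance: half the L1 distance between the two distributions
    categories = set(train_shares) | set(test_shares)
    tvd = 0.5 * sum(
        abs(train_shares.get(c, 0.0) - test_shares.get(c, 0.0)) for c in categories
    )
    return {"unseen_in_train": unseen, "total_variation": tvd}


if __name__ == "__main__":
    # Toy labels for illustration only, not taken from the real dataset files
    train = ["normal"] * 6 + ["attack_a"] * 3 + ["attack_b"]
    test = ["normal"] * 4 + ["attack_a"] * 2 + ["attack_b"] * 2 + ["attack_c"] * 2
    print(compare_columns(train, test))
```

The same helper can be applied to the difficulty-level, service, or flag columns, since it only assumes a list of categorical values per set.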

■ Check for overfitting by comparing two evaluation setups:

– Case A: training set KDDTrain+ and test (validation) set KDDTest+
– Case B: training and test sets both taken from KDDTrain+ (split with scikit-learn's train_test_split)
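The two cases can be sketched with scikit-learn as follows. This is a hedged illustration, not the original experiment. Loading KDDTrain+/KDDTest+ is replaced by synthetic data from `make_classification`, and a feature shift is added to the stand-in "test file" to mimic the train/test mismatch. The decision tree classifier and all helper names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_stand_in_sets(seed=0):
    """Hypothetical synthetic stand-ins for KDDTrain+ and KDDTest+ (not the real files)."""
    X, y = make_classification(
        n_samples=4000, n_features=10, n_informative=5, random_state=seed
    )
    X_train, y_train = X[:3000], y[:3000]
    X_test, y_test = X[3000:], y[3000:]
    # Simulate a test file whose feature distribution differs from the training file
    rng = np.random.default_rng(seed)
    X_test = X_test + rng.normal(loc=1.5, scale=1.0, size=X_test.shape)
    return X_train, y_train, X_test, y_test


def compare_protocols(seed=0):
    X_train, y_train, X_test, y_test = make_stand_in_sets(seed)

    # Case A: fit on the whole training file, evaluate on the separate test file
    model_a = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)
    case_a_train = accuracy_score(y_train, model_a.predict(X_train))
    case_a_test = accuracy_score(y_test, model_a.predict(X_test))

    # Case B: split the training file itself, so both parts share one distribution
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_train, y_train, test_size=0.2, random_state=seed, stratify=y_train
    )
    model_b = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
    case_b_train = accuracy_score(y_tr, model_b.predict(X_tr))
    case_b_val = accuracy_score(y_val, model_b.predict(X_val))

    return {
        "case_a_train": case_a_train,
        "case_a_test": case_a_test,
        "case_b_train": case_b_train,
        "case_b_val": case_b_val,
    }


if __name__ == "__main__":
    for name, acc in compare_protocols().items():
        print(f"{name}: {acc:.3f}")
```

If training accuracy is near-perfect in both cases, but Case B validation accuracy stays high while Case A test accuracy drops, the gap points to a difference between the two files rather than to overfitting alone. Case B only measures generalization within the training distribution.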