Distribution of labels by class and subset

Type of traffic   # in training set   % in training set   # in test set   % in test set
normal                        67343              53.46%            9711          43.08%
DoS                           45927              36.46%            7460          33.09%
Probe                         11656               9.25%            2885          12.79%
R2L                             995               0.79%            2421          10.74%
U2R                                               0.04%                           0.30%

■ Skewed (but realistic) distribution towards normal and DoS traffic

■ Differences:

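– In the test set, normal traffic is less than half of the total (43.08%, versus 53.46% in the training set)
– Much larger share of R2L attacks in the test set (10.74%, versus 0.79% in the training set)
– The training set contains 23 distinct labels; the test set contains 38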

■ Important to test the model with attacks not encountered during training
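
The comparison above can be sketched in a few lines of Python: compute the per-label counts and percentages for each subset, then list the labels that occur only in the test set. This is a minimal sketch using toy data; the function names (`label_distribution`, `unseen_labels`) and the label strings are hypothetical placeholders, not the real dataset or its loader.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: (count, percent)} for a list of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


def unseen_labels(train_labels, test_labels):
    """Return the labels that occur in the test set but never in training."""
    return set(test_labels) - set(train_labels)


# Toy label lists (assumption: placeholders, not the real data).
train = ["normal"] * 6 + ["dos_a"] * 3 + ["probe_a"]
test = ["normal"] * 4 + ["dos_a"] * 2 + ["probe_a"] + ["r2l_new"] * 2 + ["probe_new"]

for name, labels in (("train", train), ("test", test)):
    for label, (n, pct) in sorted(label_distribution(labels).items()):
        print(f"{name:5s} {label:10s} {n:3d} {pct:6.2f}%")

# Labels the model never saw during training.
print("unseen in training:", sorted(unseen_labels(train, test)))
```

Reporting detection results separately for the labels returned by `unseen_labels` is one way to see how the model behaves on attacks it was never trained on.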