Distribution of labels by class and subset

Type of traffic   # in training set   % in training set   # in test set   % in test set
normal                        67343              53.46%            9711          43.08%
DoS                           45927              36.46%            7460          33.09%
Probe                         11656               9.25%            2885          12.79%
R2L                             995               0.79%            2421          10.74%
U2R                              52               0.04%              67           0.30%
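The percentage columns can be recomputed from the counts. The following is a minimal sketch, assuming the counts are typed in by hand from the table; the names `counts` and `shares` are illustrative and not part of any dataset tooling. Recomputed values may differ from the table in the last digit because of rounding.

```python
# Counts copied from the table above: (training, test) per traffic class.
counts = {
    "normal": (67343, 9711),
    "DoS": (45927, 7460),
    "Probe": (11656, 2885),
    "R2L": (995, 2421),
    "U2R": (52, 67),
}


def shares(column):
    """Return each class's percentage of one column (0 = training, 1 = test)."""
    total = sum(pair[column] for pair in counts.values())
    return {name: 100 * pair[column] / total for name, pair in counts.items()}


train_share = shares(0)
test_share = shares(1)

# Print one row per class: name, training share, test share.
for name in counts:
    print(f"{name:<7} {train_share[name]:6.2f}% {test_share[name]:6.2f}%")
```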
■ Skewed (but realistic) distribution towards normal and DoS traffic
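■ Differences between the training and test sets:
– In the test set, normal traffic is less than half of the total (43.08% vs. 53.46% in training)
– R2L attacks rise sharply, from 0.79% of the training set to 10.74% of the test set
– The training set contains 23 different labels; the test set contains 38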
■ Important to test the model with attacks not encountered during training
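One way to find the attack labels that appear only in the test set is a set difference. This is a minimal sketch: the function name `unseen_labels` and the toy label lists are hypothetical placeholders, not the real dataset contents.

```python
def unseen_labels(train_labels, test_labels):
    """Return the labels that occur in the test set but never in the training set."""
    return sorted(set(test_labels) - set(train_labels))


# Toy label lists for illustration only; they are NOT the real dataset labels.
train_labels = ["normal", "attack_a", "attack_b", "normal"]
test_labels = ["normal", "attack_a", "attack_c", "attack_d"]

novel = unseen_labels(train_labels, test_labels)
print(novel)
```

Evaluating the model separately on the records whose label is in `novel` would show how it handles attacks it never saw during training.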