Page 46

5. Evaluation and results

The training and test datasets, which were pre-processed earlier (section 4.3. Pre-processing
of the NSL-KDD dataset) showed that they were very different from each other, especially
when it came to the distribution of the traffic type labels (

Table 1

) and of the difficulty levels

(

Figure 21

Figure 22

22). This divergence led to interesting results in the experimental phase of

this project, where the models described above ( Classification models analysis) were put to
the test. It was observed that the test dataset, on which predictions were made, showed far
lower accuracy scores than the accuracy acquired during training. Naturally, overfitting was a
problem that was first addressed, but still the models seemed to stabilise at the performance
shown below, in

Table 7

. For a deeper view into the performance of the models and the effect

of the differences between the KDDTrain+ and KDDTest+ datasets, two cases were created: in
case A, the five models were applied to all the classification scenarios (multiclass, binary, 4-
class), using the KDDTrain+ as training dataset, and the KDDTest+ as test (validation) dataset,
as they were prepared during the pre-processing phase (4.3. Pre-processing of the NSL-KDD
dataset). In case B, the same models were used (mostly with the same parameters that
optimized their performance) on all the classifications, but here only the KDDTrain+ was used
as training and test set, by splitting it with the commonly used train_test_split [30] utility from
the sklearn library, after being pre-processed like before.

When the same algorithms were applied to case B, it showed that the models were working
exceptionally, as is shown in

Table 8

. Due to the very homogenous distribution of the training

and validation parts of the dataset, the performance both in training and testing phases is the
same and gets very high results.

Table 7: summary/comparison of classification algorithms performance in case A

CLASSIFICATION ALGORITHM CLASS SCENARIO TRAINING SET TEST SET

ASE

sin

g t

Test

train

g and

est

LOGISTIC REGRESSION

multi

0,99

0,70

binary

0,97

0,75

4-class

0,99

0,76

DECISION TREE

multi

1,00

0,71

binary

1,00

0,79

4-class

1,00

0,76

K NEAREST NEIBOURS

multi

0,99

0,72

binary

0,99

0,77

4-class

0,99

0,74

GAUSSIAN NAÏVE BAYES

multi

0,77

0,53

binary

0,84

0,55

4-class

0,65

0,42

MULTI LAYER PERCEPTRON

multi

1,00

0,72

binary

1,00

0,79

4-class

1,00

0,77