Detection and Classification of Anomalies in Network Traffic Using Generalized Entropies and OC-SVM with Mahalanobis Kernel
Published:
03 November 2014
by MDPI
in 1st International Electronic Conference on Entropy and Its Applications
session Information Theory
Abstract: Network anomaly detection and classification is an important open issue in network security. Several approaches and systems based on different mathematical tools have been studied and developed; among them is the Anomaly-based Network Intrusion Detection System (A-NIDS), which monitors network traffic and compares it against an established baseline of the "normal" traffic profile. It is therefore necessary to characterize "normal" Internet traffic. This paper presents an approach to anomaly detection and classification based on: the entropy of selected features (including the Shannon, Renyi, and Tsallis entropies); the construction of regions from entropy data using the Mahalanobis distance (MD); and the One-Class Support Vector Machine (OC-SVM) with different kernels (RBF and Mahalanobis) for normal and abnormal traffic. Regular and non-regular regions built from "normal" traffic profiles allow anomaly detection, while classification is performed under the assumption that the regions corresponding to the attack classes have been characterized previously. Although this approach allows the use of as many features as required, only four well-known significant features were selected in our case. To evaluate our approach, two different data sets were used: one of real traffic obtained from an academic LAN, and the other a subset of the 1998 MIT-DARPA set. The feature sets computed in our experiments provide detection rates of up to 99.98% for "normal" traffic and up to 99.05% for anomalous traffic, with a false alarm rate of 0.019%. Experimental results show that certain values of the q parameter of the generalized entropies and the use of the OC-SVM improve the detection rate of some attack classes, owing to a better fit of the region to the data. Moreover, our results show that the MD yields high detection rates with an efficient computation time, while the OC-SVM achieves slightly more precise detection rates at a higher computational cost.
Keywords: Generalized entropies; network traffic; anomaly detection; OC-SVM; Mahalanobis Kernel; Mahalanobis distance
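As a minimal sketch of the region-based detection idea summarized in the abstract (our own illustration with hypothetical values, not the authors' implementation), a point in entropy space can be flagged as anomalous when its Mahalanobis distance from the "normal" traffic profile exceeds a threshold:

```python
import math

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of 2-D point x from a distribution with the
    given mean and inverse covariance matrix (2x2 for simplicity)."""
    d = [x[0] - mean[0], x[1] - mean[1]]
    # Quadratic form d^T * cov_inv * d
    m2 = (d[0] * (cov_inv[0][0] * d[0] + cov_inv[0][1] * d[1])
          + d[1] * (cov_inv[1][0] * d[0] + cov_inv[1][1] * d[1]))
    return math.sqrt(m2)

# Hypothetical "normal" entropy profile: identity covariance for illustration.
mean = [0.5, 0.5]
cov_inv = [[1.0, 0.0], [0.0, 1.0]]
threshold = 2.0  # points farther than this are flagged as anomalous

point = [3.0, 3.0]
is_anomaly = mahalanobis(point, mean, cov_inv) > threshold
```

The mean, inverse covariance, and threshold are placeholders; in the paper they would be estimated from the entropies of the selected features over "normal" traffic.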
Comments on this paper
Nabin Malakar
17 November 2014
On Tsallis Entropy, and Classification
Hi Jayro et al,
Interesting work!
Just a few curious questions:
1. You mentioned that "parameter q (in generalized entropy) is used to make the entropy less or more sensitive to certain events within the distribution". Could you please clarify what kinds of events correspond to various values of q?
2. How did you decide on the Tsallis entropy with q = 0.01? Can we objectively decide on this number, or is it an empirical value?
3. Could you please tell us a bit more about the classification scheme used here? What are the distinguishing features, and how different are they?
Also, what would happen if you were to discretely vary q from 0 to 1 (Shannon)? Does the nature of the detection change?
Does it affect the "classification"?
Detection rates for Knn > 2 seem to be pretty much flat. Could you please comment on that?
Thanks!
Jayro Santiago
20 November 2014
Dear Nabin Malakar,
Thanks for reading our work.
We are referring to the fact that the factor q in Equations 2 and 3 modifies the entropy values, and consequently the behavior of the entropy. Also, for a specific event with probability p and with appropriate q-values, we can increase (or decrease) the entropy value with respect to the Shannon entropy. For example (closest to our application), suppose we have a certain distribution in which we want to identify events with small probability (< 0.1), or events likely to occur with probability greater than 0.9, while discriminating all other events. Using the Tsallis entropy with q = 0.01 we can achieve this, because the entropy values within the range 0.1 to 0.9 tend to be equal and large, while the entropy values outside this range decrease; see Figure 2 in
https://www.dropbox.com/s/cis9r854vi8dv34/e-conference%20Entropy.pdf?dl=0
Regarding the value of the parameter q, the choice is made by means of experiments, selecting the values that provide the highest level of detection.
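The effect of q can be seen in a small numerical sketch (our own illustration, not code from the paper; the helper functions implement the standard Renyi and Tsallis definitions, to which Equations 2 and 3 correspond):

```python
import math

def shannon(p):
    """Shannon entropy of a probability distribution (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def renyi(p, q):
    """Renyi entropy of order q (q > 0, q != 1); tends to Shannon as q -> 1."""
    return math.log(sum(pi ** q for pi in p)) / (1 - q)

def tsallis(p, q):
    """Tsallis entropy of order q (q != 1); tends to Shannon as q -> 1."""
    return (1 - sum(pi ** q for pi in p)) / (q - 1)

# A distribution with one rare event: a small q amplifies its contribution,
# because p**q approaches 1 as q -> 0 regardless of how small p is.
p = [0.89, 0.10, 0.01]
for q in (0.01, 0.5, 0.99):
    print(q, round(tsallis(p, q), 4), round(renyi(p, q), 4))
print("Shannon:", round(shannon(p), 4))
```

With q close to 1 both generalized entropies approach the Shannon value, while with q = 0.01 the rare event's contribution is greatly amplified, which is the sensitivity being tuned in the experiments.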
For the classification, the distinguishing features selected in this traffic study were the IP addresses (source and destination) and the ports (source and destination), because they generally change during a set of anomalies. Next, the entropy of these features is calculated. The behavior of these features, via their entropies, was studied in both normal and abnormal network traffic. Our approach is based on mathematical tools such as the Mahalanobis distance, the covariance matrix, the OC-SVM, and the KNN algorithm. It allows the construction of different regions (regular and non-regular) that encompass the behaviors of the four selected features. These regions allow, first, the classification of traffic as normal or abnormal; after that, the classification of known attacks is performed. The results of the experiments show that using non-regular regions we obtain a better classification, because these regions are more separated from each other, whereas the regular regions overlap with each other, which makes the classification difficult.
Indeed, if we discretely vary q from 0 to 1 (for Renyi and Tsallis), the results of the detection and classification change, because of Equations 2 and 3. The Shannon entropy, however, is independent of q.
Regarding KNN: we assume that every entropy point outside the normal region is an anomaly, but not every anomaly belongs to a specific attack class. If a point is anomalous but its successive neighbors are normal, then it is considered normal too. If a point is anomalous and its successive neighbors belong to an anomaly class A, then it belongs to that class. When an anomaly occurs in network traffic, the entropy values begin to deviate from the normal region and concentrate in a new region. If few neighbors (K < 2) are selected, this transition state affects the classification; choosing a larger K mitigates the effect of this transition and, therefore, the classification rate stabilizes.
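The neighbor rule just described can be sketched as follows (a minimal illustration with hypothetical function and label names; the paper's actual KNN implementation may differ):

```python
def classify_point(is_anomalous, neighbor_labels, k=3):
    """Label an entropy point using its K successive neighbors.

    is_anomalous: True if the point falls outside the normal region.
    neighbor_labels: labels of the successive neighbors, each either
        'normal' or an attack-class name such as 'A'.
    Returns the final label assigned to the point.
    """
    if not is_anomalous:
        return "normal"
    votes = neighbor_labels[:k]
    # An anomalous point whose neighbors are all normal is treated as
    # normal too (a transient deviation, not an attack).
    if all(lbl == "normal" for lbl in votes):
        return "normal"
    # Otherwise assign the majority class among the neighbors.
    return max(set(votes), key=votes.count)
```

With a larger k, isolated transition points between the normal region and a forming attack region carry less weight, which is why the classification rate flattens out for larger neighbor counts.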