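Chen Wanzhi, Zhang Guoman, Wang Tianyuan. Traffic anomaly detection method based on feature coupling generalization [J]. Journal of Electronic Measurement and Instrumentation, 2024, 38(2): 120-130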
Traffic anomaly detection method based on feature coupling generalization
Keywords: anomaly detection; outlier detection; feature coupling generalization; feature selection
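Funding: Supported by the National Key Research and Development Program of China (2018YFB1403303) and the Scientific Research Fund for Universities of the Education Department of Liaoning Province (2021LJKZ0327)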
Abstract:
Sparse features in existing traffic anomaly detection models are easily ignored by feature selection algorithms. To address this problem, a traffic anomaly detection method based on feature coupling generalization (FCG) is proposed. First, the DBSCAN density clustering algorithm is used to remove outliers from the data, reducing their impact on the subsequent FCG algorithm. Second, the minimal-redundancy-maximal-relevance (mRMR) algorithm is used to rank the data features, and the features most influential for classification are selected to generate the class-distinguishing features (CDF) of the FCG algorithm, which enhances classification ability. The K-nearest neighbors (KNN) algorithm is used to fill in missing values in the CDF to maintain data integrity. Then, the data are grouped by attack category and the features of each group are ranked separately with the mRMR algorithm; the sparse features with instance-distinguishing ability in each attack category are selected as the example-distinguishing features (EDF) of the FCG algorithm. The degree of coupling between the two kinds of features in the anomaly detection data, together with the upper-level concept of the EDF, is used to transform the EDF into more generalized features. Finally, the processed data are fed into a random forest (RF) model whose parameters are tuned by Bayesian optimization (BO) for classification. In simulation experiments on the NSL-KDD dataset, the method reached an accuracy of 91.79%, which verifies that it has good detection performance.
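The mRMR ranking step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete feature values, uses the difference form of the criterion (relevance minus mean redundancy), and runs on toy data whose feature names (`feat_strong`, `feat_complement`, and so on) are hypothetical and not taken from NSL-KDD.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def mrmr_rank(features, labels):
    """Greedy mRMR ranking: relevance to the label minus mean redundancy.

    `features` maps a feature name to its list of discrete values.
    Returns every feature name, highest-ranked first.
    """
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    remaining = list(features)
    ranked = []
    while remaining:
        def score(name):
            if not ranked:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[chosen])
                for chosen in ranked
            ) / len(ranked)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy data with hypothetical feature names (assumption, not from the paper).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "feat_strong": [0, 0, 0, 1, 1, 1, 1, 1],       # most relevant to the label
    "feat_strong_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # exact duplicate, redundant
    "feat_complement": [1, 0, 0, 0, 0, 1, 1, 1],   # weaker, low redundancy
    "feat_noise": [0, 1, 0, 1, 0, 1, 0, 1],        # independent of the label
}
ranking = mrmr_rank(features, labels)
print(ranking)
# → ['feat_strong', 'feat_complement', 'feat_strong_copy', 'feat_noise']
```

In this toy run the duplicate feature is as relevant as the top-ranked one, yet it falls below the weaker `feat_complement` because of the redundancy penalty. A pure relevance ranking would not show this behavior.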