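Chen Wanzhi, Zhao Lin, Wang Tianyuan. Improved LightGBM traffic anomaly detection method with feature enhancement [J]. Journal of Electronic Measurement and Instrumentation, 2024, 38(3): 195-207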
Improved LightGBM traffic anomaly detection method with feature enhancement
  
Keywords: traffic anomaly detection  isolation forest  convolutional denoising auto-encoder  adaptive synthetic sampling  LightGBM
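Funding: Supported by the National Key Research and Development Program of China (2018YFB1403303) and the Scientific Research Fund for Universities of the Education Department of Liaoning Province (2021LJKZ0327)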
Author  Institution
Chen Wanzhi  College of Software, Liaoning Technical University, Huludao 125105, China
Zhao Lin  College of Software, Liaoning Technical University, Huludao 125105, China
Wang Tianyuan  State Grid Yingkou Electric Power Company of Liaoning Electric Power Supply Co., Yingkou 115005, China
Abstract:
      Machine learning approaches to traffic anomaly detection face several problems: feature selection relies too heavily on expert experience, raw features have limited expressive power, noise and outliers in the data weaken model robustness, and minority anomaly classes are detected at low rates when the data are imbalanced, massive, and high-dimensional. To address these problems, an improved LightGBM (light gradient boosting machine) traffic anomaly detection method with feature enhancement is proposed. First, isolation forest (iForest) is used to handle outliers, and the outlier-treated data are used to train a one-dimensional convolutional denoising auto-encoder (CDAE) with global average pooling (GAP), which indirectly removes noise from the data and yields a low-dimensional enhanced representation of the original features. Then, adaptive synthetic sampling (ADASYN) is applied to the outlier-treated data for data augmentation, and the trained CDAE extracts features from the augmented data. The resulting low-dimensional features serve as the input to LightGBM, which is trained and tuned with Bayesian hyperparameter optimization. Finally, the resulting combined CDAE+LightGBM model accurately classifies anomalous traffic. On the NSL-KDD dataset, the proposed method reaches a five-class accuracy of 87.80% and an F1 score of 87.75%, effectively improving detection accuracy and strengthening the ability to detect unknown attacks. Tests on the CICIDS2017 scenario dataset further verify the feasibility of the method and show that it outperforms deep learning algorithms of the same type.
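To make the ADASYN step in the abstract more concrete, the following is a minimal pure-Python sketch of the general ADASYN idea: minority samples surrounded by more majority-class neighbours receive a larger share of the synthetic samples. It is an illustration under stated assumptions, not the authors' implementation. The function names (`neighbours`, `adasyn`), the parameters (`k`, `beta`, `seed`), and the toy 2-D data are all hypothetical, and drawing the interpolation partner from the minority-only neighbours is one common design choice assumed here. The abstract does not state the settings actually used.

```python
import math
import random


def neighbours(i, points, k):
    """Indices of the k points nearest to points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def adasyn(minority, majority, k=3, beta=1.0, seed=0):
    """Return (synthetic_points, per_sample_counts) for the minority class.

    beta=1.0 asks for enough synthetic points to balance the two classes.
    Requires at least two minority points so that interpolation is possible.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    points = minority + majority      # minority occupies indices [0, n_min)
    n_min = len(minority)
    budget = round((len(majority) - n_min) * beta)

    # Difficulty ratio r_i: fraction of the k nearest neighbours (searched
    # over the whole dataset) that belong to the majority class.
    ratios = [
        sum(j >= n_min for j in neighbours(i, points, k)) / k
        for i in range(n_min)
    ]
    total = sum(ratios)
    if total == 0 or budget <= 0:
        return [], [0] * n_min

    # Normalise the ratios into a distribution and allocate the budget.
    counts = [round(r / total * budget) for r in ratios]

    # Interpolate between each minority point and a random minority neighbour.
    synthetic = []
    for i, g in enumerate(counts):
        partners = neighbours(i, minority, min(k, n_min - 1))
        for _ in range(g):
            j = rng.choice(partners)
            lam = rng.random()
            synthetic.append(
                [a + lam * (b - a) for a, b in zip(minority[i], minority[j])]
            )
    return synthetic, counts


# Toy data (invented for illustration): two minority points far from the
# majority cluster and one minority point sitting inside it.
minority = [[0.0, 0.0], [0.2, 0.1], [2.0, 2.0]]
majority = [[2.1, 2.0], [1.9, 2.2], [2.2, 1.8], [2.0, 2.3],
            [1.8, 1.9], [3.0, 3.0], [3.1, 2.9]]

synthetic, counts = adasyn(minority, majority)
print(counts)      # per-sample allocation of synthetic points
print(synthetic)   # generated minority samples
```

In this toy setup the minority point inside the majority cluster is the hardest to learn, so it receives the largest allocation. In practice a maintained library implementation would normally be used instead of hand-written code like this.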