Improved LightGBM for traffic anomaly detection method with feature enhancement
Affiliation:

1.College of Software, Liaoning Technical University, Huludao 125105, China; 2.State Grid Yingkou Electric Power Company of Liaoning Electric Power Supply Co., Yingkou 115005, China

CLC Number:

TP393; TN911.7

Fund Project:

Supported by the National Key Research and Development Program of China (2018YFB1403303) and the Scientific Research Fund for Universities of the Education Department of Liaoning Province (2021LJKZ0327)

Abstract:

Machine learning methods for traffic anomaly detection suffer from several problems: feature selection relies heavily on expert experience, raw features have limited expressive power, noise and outliers in the data degrade model robustness, and minority anomaly classes are detected poorly when the data are imbalanced, massive, and high-dimensional. To address these problems, an improved LightGBM (light gradient boosting machine) traffic anomaly detection method with feature enhancement is proposed. First, isolation forest (iForest) is used to handle outliers, and the outlier-processed data are used to train a one-dimensional convolutional denoising autoencoder (CDAE) with global average pooling (GAP), which indirectly removes noise from the data and yields a low-dimensional enhanced representation of the original features. Then, adaptive synthetic sampling (ADASYN) is applied to the outlier-processed data for data augmentation, and the trained CDAE is used for feature extraction. The resulting low-dimensional features serve as the input to LightGBM, which is trained and tuned with Bayesian parameter optimization. Finally, the resulting combined CDAE+LightGBM model is used to classify anomalous traffic accurately. On the NSL-KDD dataset, the proposed method achieves a five-class accuracy of 87.80% and an F1 score of 87.75%, which shows that it effectively improves detection accuracy and strengthens the ability to detect unknown attacks. Tests on the CICIDS2017 scenario dataset further verify the feasibility of the proposed method, which outperforms deep learning algorithms of the same type.
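The abstract names ADASYN as the data augmentation step but gives no details of how it is applied. The following is a minimal sketch of the general ADASYN idea for a two-class toy case, not the authors' implementation: minority points whose neighbourhoods contain more majority points receive more synthetic samples, and each synthetic sample is an interpolation between a minority point and one of its minority neighbours. The function names `difficulty_ratios` and `adasyn_sketch`, the parameters `k` and `beta`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def difficulty_ratios(minority, majority, k):
    """For each minority point, the fraction of its k nearest neighbours
    (searched over all points) that belong to the majority class."""
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    ratios = []
    for i, x in enumerate(minority):
        # Exclude the point itself by index; minority points come first.
        others = [pl for j, pl in enumerate(labelled) if j != i]
        nearest = sorted(others, key=lambda pl: math.dist(x, pl[0]))[:k]
        ratios.append(sum(1 for _, label in nearest if label == 0) / k)
    return ratios


def adasyn_sketch(minority, majority, k=2, beta=1.0, seed=0):
    """Generate synthetic minority samples, ADASYN style (hypothetical helper)."""
    rng = random.Random(seed)
    # Total number of synthetic samples; beta=1.0 aims at a balanced set.
    total = int((len(majority) - len(minority)) * beta)
    ratios = difficulty_ratios(minority, majority, k)
    norm = sum(ratios)
    if total <= 0 or norm == 0:
        return []
    synthetic = []
    for i, (x, r) in enumerate(zip(minority, ratios)):
        count = round(r / norm * total)  # harder points get more samples
        mates = [m for j, m in enumerate(minority) if j != i]
        mates = sorted(mates, key=lambda m: math.dist(x, m))[:k]
        if not mates:
            continue
        for _ in range(count):
            mate = rng.choice(mates)
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, mate)))
    return synthetic


# Toy data (assumed): one minority point sits inside a majority cluster.
minority = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0)]
majority = [(3.3, 3.1), (3.1, 3.3), (3.3, 3.3),
            (5.0, 5.0), (5.2, 5.1), (4.8, 5.0), (6.0, 6.0)]

ratios = difficulty_ratios(minority, majority, k=2)
synthetic = adasyn_sketch(minority, majority, k=2)
print(ratios)
print(len(synthetic))
```

In this toy case only the minority point surrounded by majority points has a non-zero ratio, so all synthetic samples are generated around it. The five-class setting reported in the abstract would apply the same idea per minority class; in practice a maintained library implementation is the more likely choice.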

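Cite this article

Chen Wanzhi, Zhao Lin, Wang Tianyuan. Improved LightGBM for traffic anomaly detection method with feature enhancement[J]. Journal of Electronic Measurement and Instrumentation, 2024, 38(3): 195-207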

History
  • Published online: 2024-05-23