Text classification for ship industry news

Fund Project: National Natural Science Foundation of China (61872231, 61701297)

    Abstract:

    News articles in the shipbuilding industry tend to be long and highly specialized, and they contain a large amount of domain-specific vocabulary. Little research has addressed text classification for news in this field, and no corresponding ship industry news corpus is available. This paper builds a ship industry news corpus and proposes a new text classification algorithm for ship industry news. The algorithm first performs feature selection and dimensionality reduction based on document frequency, the chi-square statistic, and the LSA topic model, mapping the document-word matrix to a document-topic matrix; a support vector machine then classifies the processed features. News text classification experiments show that the proposed algorithm effectively addresses the high dimensionality and high sparsity of text vectors and, given small sample sets and a limited number of categories, achieves better classification performance than traditional methods.
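
The feature selection stage named in the abstract can be illustrated with a small, hedged sketch. The abstract does not state thresholds or how document frequency and the chi-square statistic are combined, so everything here is an assumption made for illustration: documents are taken to be already tokenized, `min_df` and `top_k` are hypothetical parameters, and a term's score is taken as its maximum chi-square value over all categories. The LSA projection and the support vector machine are not covered by this sketch.

```python
from collections import Counter
from typing import Collection, List, Sequence


def document_frequency(docs: Sequence[Collection[str]]) -> Counter:
    """Count, for each term, the number of documents that contain it."""
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def chi_square(docs: Sequence[Collection[str]], labels: Sequence[str],
               term: str, category: str) -> float:
    """Chi-square statistic of the 2x2 term/category contingency table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_category = label == category
        if has_term and in_category:
            a += 1      # term present, document in category
        elif has_term:
            b += 1      # term present, document in another category
        elif in_category:
            c += 1      # term absent, document in category
        else:
            d += 1      # term absent, document in another category
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def select_features(docs: Sequence[Collection[str]], labels: Sequence[str],
                    min_df: int = 2, top_k: int = 4) -> List[str]:
    """Drop rare terms by document frequency, then keep the top_k terms
    ranked by their maximum chi-square score over all categories."""
    doc_sets = [set(tokens) for tokens in docs]
    df = document_frequency(doc_sets)
    candidates = [term for term, count in df.items() if count >= min_df]
    categories = sorted(set(labels))
    scores = {
        term: max(chi_square(doc_sets, labels, term, cat) for cat in categories)
        for term in candidates
    }
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(candidates, key=lambda term: (-scores[term], term))
    return ranked[:top_k]


# Toy, made-up corpus (not taken from the paper's ship industry corpus).
docs = [
    ["shipyard", "delivers", "tanker", "order"],
    ["shipyard", "wins", "tanker", "order"],
    ["engine", "emission", "rules", "order"],
    ["engine", "emission", "test", "delivers"],
]
labels = ["orders", "orders", "technology", "technology"]

selected = select_features(docs, labels, min_df=2, top_k=4)
print(selected)  # prints ['emission', 'engine', 'shipyard', 'tanker']
```

In this toy corpus, terms that occur in only one document ("wins", "rules", "test") are removed by the document frequency filter, and "delivers" scores zero because it is spread evenly over both categories. The selected vocabulary would then define the columns of the document-word matrix passed on to the later stages.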

History
  • Received: 2019-06-24
  • Last revised: 2019-09-08
  • Accepted: 2019-09-16