Zhu Fangpeng, Wang Xiaofeng. Text classification for ship industry news[J]. Journal of Electronic Measurement and Instrumentation, 2020, 34(1): 149-155
Text classification for ship industry news
Keywords: text classification; topic model; feature selection; support vector machine (SVM)
Funding: Supported by the National Natural Science Foundation of China (61872231, 61701297)
Author          Institution
Zhu Fangpeng    1. School of Information Engineering, Shanghai Maritime University
Wang Xiaofeng   1. School of Information Engineering, Shanghai Maritime University
Abstract:
      Because news articles in the ship industry are long, highly specialized, and rich in domain-specific vocabulary, little research has addressed the classification of news texts in this field, and no corresponding ship industry news corpus is available. This paper builds a ship industry news corpus and proposes a new text classification algorithm for ship industry news. First, feature selection and dimensionality reduction are performed using document frequency, the chi-square statistic, and the topic model latent semantic analysis (LSA), which maps the document-word matrix to a document-topic matrix. The processed features are then classified with a support vector machine (SVM). Experiments on news text classification show that the proposed algorithm effectively addresses the high dimensionality and high sparsity of text vectors, and that it classifies better than traditional methods on small sample sets with a limited number of categories.
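The feature-selection stage named in the abstract (document frequency followed by the chi-square statistic) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_features`, the parameters `min_df` and `top_k`, the toy documents, and the choice to score each term by its maximum chi-square value over all classes are assumptions made for illustration. The LSA and SVM stages are not covered here.

```python
from collections import Counter


def select_features(docs, labels, min_df=2, top_k=3):
    """Filter terms by document frequency, then rank them by chi-square.

    docs   : list of token lists (assumed to be already segmented)
    labels : list of class labels, one per document
    Returns (top_k terms, dict of chi-square scores for all kept terms).
    """
    n_docs = len(docs)
    doc_sets = [set(tokens) for tokens in docs]

    # Document frequency: number of documents that contain each term.
    df = Counter(term for s in doc_sets for term in s)
    class_sizes = Counter(labels)

    # Number of documents of each class that contain each term.
    term_class = Counter()
    for s, label in zip(doc_sets, labels):
        for term in s:
            term_class[(term, label)] += 1

    scores = {}
    for term, term_df in df.items():
        if term_df < min_df:
            continue  # document-frequency filter: drop rare terms
        best = 0.0
        for label, size in class_sizes.items():
            a = term_class[(term, label)]  # in class, contains term
            b = term_df - a                # other classes, contains term
            c = size - a                   # in class, lacks term
            d = n_docs - term_df - c       # other classes, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                chi2 = n_docs * (a * d - b * c) ** 2 / denom
                best = max(best, chi2)
        # Assumption: a term's score is its maximum chi-square over classes.
        scores[term] = best

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k], scores


# Hypothetical toy corpus with two categories.
docs = [
    ["ship", "order", "yard"],
    ["ship", "yard", "delivery"],
    ["ship", "price", "steel"],
    ["ship", "steel", "market"],
]
labels = ["shipbuilding", "shipbuilding", "market", "market"]

selected, scores = select_features(docs, labels, min_df=2, top_k=2)
print(selected)  # ['steel', 'yard']
```

In this toy corpus "ship" appears in every document and so scores 0, while "yard" and "steel" each occur in only one category and score highest. In a pipeline like the one the abstract describes, the selected vocabulary would then define the document-word matrix that LSA reduces before SVM classification.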