Abstract: News articles in the shipbuilding industry are long, specialized, and dense with professional vocabulary, yet little research has addressed the classification of news texts in this field, and a corresponding shipping industry news corpus is lacking. This paper builds a shipping industry news corpus and proposes a new text classification algorithm for ship industry news. First, features are selected and their dimensionality is reduced using document frequency, the chi-square statistic, and the LSA topic model, which maps the document-word matrix into a document-topic matrix. The processed features are then classified with a support vector machine. Experiments on news text classification show that the proposed algorithm effectively addresses the high dimensionality and high sparsity of text vectors and achieves better classification performance than traditional methods when the sample set is small and the number of categories is limited.
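The first two feature selection stages named in the abstract, document-frequency filtering followed by chi-square scoring, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy corpus, the category labels, the thresholds `min_df` and `top_k`, the choice of keeping each term's maximum score across categories, and all function names are hypothetical. The LSA projection and the support vector machine are omitted.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def document_frequency(docs: Sequence[Sequence[str]]) -> Counter:
    """Count, for each term, the number of documents that contain it."""
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))  # a set, so each document counts once per term
    return df


def chi_square(term: str, category: str,
               docs: Sequence[Sequence[str]], labels: Sequence[str]) -> float:
    """Chi-square statistic of a term/category pair from a 2x2 contingency table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_category = label == category
        if has_term and in_category:
            a += 1  # term present, document in category
        elif has_term:
            b += 1  # term present, document in another category
        elif in_category:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document in another category
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


def select_features(docs: Sequence[Sequence[str]], labels: Sequence[str],
                    min_df: int = 2, top_k: int = 3) -> List[Tuple[str, float]]:
    """Drop rare terms by document frequency, then rank the rest by chi-square."""
    df = document_frequency(docs)
    candidates = [term for term, count in df.items() if count >= min_df]
    categories = sorted(set(labels))
    # Assumption: score each term by its maximum chi-square over all categories.
    scores = {
        term: max(chi_square(term, cat, docs, labels) for cat in categories)
        for term in candidates
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical toy corpus: already tokenized documents with category labels.
docs = [
    ["hull", "steel", "order"],
    ["hull", "welding", "order"],
    ["freight", "port", "order"],
    ["freight", "port", "rates"],
]
labels = ["shipbuilding", "shipbuilding", "shipping", "shipping"]

selected = select_features(docs, labels, min_df=2, top_k=3)
print(selected)
```

In this toy corpus, "order" appears in both categories and scores lower than the category-specific terms, so it is the one dropped. The retained terms would then define the reduced document-word matrix passed to the LSA step.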