Shandong Science, 2019, Vol. 32, Issue (6): 106-111. doi: 10.3976/j.issn.1002-4026.2019.06.015

• Other Research Papers •

A deep learning-based method for Chinese text-feature extraction and classification

CAO Lu-hui1, DENG Yu-xiang2, CHEN Tong3*, LI Zhao4

  1. Shandong University, Jinan 250100, China; 2. Shandong Financial Security and Evaluation Center, Jinan 250001, China; 3. Big Data Engineering Technology Research Center of E-Government, Jinan 250014, China; 4. Shandong Provincial Key Laboratory of Computer Networks, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China
  • Received: 2019-08-28 Online: 2019-12-20 Published: 2019-12-11
  • Corresponding author: CHEN Tong, male, engineer; research interest: computer vision. Tel: 18615509610, E-mail: chentong@sdas.org
  • About the author: CAO Lu-hui (born 1975), female, engineer; research interest: smart campus. E-mail: caolh@sdu.edu.cn
  • Funding:
    Key Research and Development Program of Shandong Province (2018GGX101012)

Abstract: This paper proposes a text-feature extraction method based on a convolutional recurrent neural network and compares it with text-feature representations produced by the statistical TF-IDF method and by Word2vec. The extracted features are fed into SVM and random forest classifiers to classify a dataset of Chinese academic papers from CNKI. Experimental results show that features extracted by the convolutional neural network and convolutional recurrent neural network models yield better classification results than features obtained with TF-IDF and Word2vec. Furthermore, the SVM and random forest classifiers perform slightly better than the native neural network classifier.

Key words: convolutional neural network, convolutional recurrent neural network, feature extraction, text classification

CLC number:

  • TP391.1
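
The TF-IDF baseline named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy corpus and its labels are invented, the documents are assumed to be already segmented into tokens, and a simple nearest-centroid rule stands in for the SVM and random forest classifiers so that the example needs only the Python standard library.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency for each token in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf_vector(doc, idf, vocab):
    """L2-normalised TF-IDF vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tok for tok in doc if tok in idf)
    total = sum(counts.values()) or 1
    vec = [counts[tok] / total * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def train_centroids(vectors, labels):
    """Average the feature vectors of each class (stand-in for SVM / random forest)."""
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict(vec, centroids):
    """Return the label whose class centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical, pre-segmented toy documents with invented subject labels.
train_docs = [
    ["neural", "network", "training", "model"],
    ["convolution", "network", "model", "text"],
    ["patient", "clinical", "treatment", "trial"],
    ["clinical", "drug", "treatment", "dose"],
]
train_labels = ["cs", "cs", "med", "med"]

idf = fit_idf(train_docs)
vocab = sorted(idf)
train_vecs = [tfidf_vector(doc, idf, vocab) for doc in train_docs]
centroids = train_centroids(train_vecs, train_labels)

query = tfidf_vector(["convolution", "model", "training"], idf, vocab)
print(predict(query, centroids))
```

In the setup the abstract describes, the feature vectors would instead come from TF-IDF, Word2vec, or the neural feature extractors, and would be passed to an SVM or random forest; only the feature-then-classifier structure carries over from this sketch.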