Supervised by: Ministry of Culture of PRC

Sponsored by: National Library of China
  Library Society of China

ISSN 1001-8867    CN 11-2746/G2

Exploring Significant Characteristics and Models for Classification of Structure Function of Academic Documents

Abstract: As literature resources grow, acquiring knowledge elements efficiently and accurately becomes key to precise literature retrieval and to making full use of available literature. Identifying the structure function of academic documents is a fundamental task in meeting these requirements. In this study, the proceedings of the Association for Computational Linguistics (ACL) conferences serve as the raw corpus, and a training corpus of chapter categories is obtained by manual annotation. Using chapter titles and in-chapter texts as input, classifiers are trained with both traditional machine learning and deep learning models. The results show that chapter titles are more useful than in-chapter texts for identifying the structure function of academic documents. The highest F1 value in the experiments is 0.9249, obtained with the traditional logistic regression (LR) and support vector machine (SVM) models, slightly higher than that of the convolutional neural network (CNN). Further experiments that add other chapter characteristics to the traditional models show that incorporating the relative position of chapters effectively improves classification performance. Finally, the study compares the results of the experimental groups using different methods, analyzes misclassifications of structure function, and points out the main directions for improving classification performance in future work.

Keywords: structure function of academic documents, text classification, characteristic selection, machine learning, deep learning
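
The feature combination described in the abstract (chapter title plus relative position of the chapter) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the study's actual implementation: the function name `chapter_features`, the whitespace tokenization, the `title=` feature prefix, and the definition of relative position as (index + 1) divided by the number of chapters.

```python
from collections import Counter


def chapter_features(titles):
    """Build one feature dict per chapter title of a single paper.

    `titles` lists the chapter titles in document order.
    Hypothetical helper for illustration only.
    """
    n = len(titles)
    features = []
    for i, title in enumerate(titles):
        # Bag-of-words over the lowercased title (assumed tokenization).
        tokens = title.lower().split()
        feats = Counter(f"title={tok}" for tok in tokens)
        # Relative position, assumed here to be (index + 1) / chapter count.
        feats["rel_pos"] = (i + 1) / n
        features.append(dict(feats))
    return features


paper = ["Introduction", "Related Work", "Method", "Experiments", "Conclusion"]
feats = chapter_features(paper)
```

Feature dicts of this shape could then be vectorized and passed to a linear classifier such as logistic regression or a linear SVM, the two traditional model families the abstract reports as performing best.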