Shandong Science

• Information Analysis and Data Management •

Construction of a knowledge graph for the educational statistical indicator system based on large language models

WANG Pengyu1, JIANG Shuming1*, WEI Zhiqiang1, YU Jun1, ZHANG Mengmeng2

  1. Information Research Institute of Shandong Academy of Sciences, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China
  2. Liaocheng Science and Technology Information Research Center, Liaocheng 252000, China
  • Received: 2025-07-25 Revised: 2025-11-02 Accepted: 2025-11-02 Online: 2026-01-07
  • Corresponding author: JIANG Shuming, male, associate researcher; research interests: multimedia data processing. E-mail: jsm@qlu.edu.cn Tel: 0531-68606133
  • About the author: WANG Pengyu (born 2002), male, master's student; research interests: digital library technologies and systems. E-mail: 19153969328@163.com

Abstract: Educational statistics work currently suffers from inefficient knowledge retrieval and a high barrier to data use. This study takes as its object the knowledge contained in the indicator system that underpins educational statistics and, because that knowledge exists mainly in document form, proposes a large language model (LLM)-based method for constructing a knowledge graph of the educational statistical indicator system, with the aim of providing knowledge support for addressing these problems. The research framework comprises four layers: data processing, ontology construction, graph construction, and application prospects. In the data processing layer, the original documents are converted to Markdown and cleaned to enhance the model's ability to parse their structure. In the ontology construction layer, the seven-step method is used to build the ontology of the indicator system. In the graph construction layer, an enhanced prompting strategy that combines chain-of-thought prompting with the ontology structure guides the LLM through knowledge extraction, and Neo4j is used for knowledge storage and visualization. Experimental results show that this strategy achieves an F1 score of 96.22% on entity–attribute extraction and 92.23% on entity–relation extraction, significantly outperforming the basic prompting scheme and verifying the effectiveness of chain-of-thought prompting for complex information extraction. The constructed knowledge graph effectively represents the complex knowledge of the educational statistical indicator system, provides a knowledge foundation for intelligent information retrieval and natural-language data querying in the domain, and offers a reference for knowledge organization in other statistical fields.

Key words: Educational Statistics, Indicator System, Knowledge Graph, Large Language Model, Ontology Construction, Chain-of-Thought Prompting
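
The F1 values reported in the abstract combine precision and recall over the extracted items. This page does not state the paper's matching criteria, so the following is only a minimal sketch that assumes exact-match scoring over sets of triples. The function name `precision_recall_f1` and the example triples are hypothetical and are not taken from the paper's data.

```python
from typing import Iterable, Tuple

# A triple is (head entity, relation or attribute name, tail entity or value).
Triple = Tuple[str, str, str]


def precision_recall_f1(
    predicted: Iterable[Triple], gold: Iterable[Triple]
) -> Tuple[float, float, float]:
    """Score extracted triples against gold triples by exact match (an assumption)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotations and model output, for illustration only.
gold = [
    ("Indicator A", "belongs_to", "Report Form 1"),
    ("Indicator A", "unit", "persons"),
    ("Indicator B", "belongs_to", "Report Form 1"),
    ("Indicator B", "unit", "schools"),
]
predicted = [
    ("Indicator A", "belongs_to", "Report Form 1"),
    ("Indicator A", "unit", "persons"),
    ("Indicator B", "belongs_to", "Report Form 2"),  # wrong tail entity
]

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

Here two of the three predicted triples are correct and two of the four gold triples are recovered, so precision is 2/3, recall is 1/2, and F1 is 4/7. A looser criterion, such as partial or normalized string matching, would give different numbers.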

CLC number:

  • TP391.1

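Open Access: This article is published under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits third parties to freely share (copy and redistribute the original in any medium or format) and adapt (modify, transform, or build upon the original) papers published in this journal, provided that appropriate credit is given, a link to the license is provided, any changes to the original are indicated, and the article is not used for commercial purposes. For details of the CC BY-NC 4.0 license, visit https://creativecommons.org/licenses/by-nc/4.0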