Shandong Science

• Information Analysis and Data Processing •

Evaluation of innovativeness in stem cell research articles using DeepSeek

MA Chunjian1, WANG Chao2*, XU Haiyun2, WANG Lekang2, ZHANG Xin3, CHEN Liang4

  1. Information Research Institute of Shandong Academy of Sciences, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China; 2. Shandong University of Technology, Zibo 255049, China; 3. Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610299, China; 4. Institute of Scientific and Technical Information of China, Beijing 100038, China
  • Received: 2025-12-04 Accepted: 2025-12-29 Online: 2026-06-01
  • Corresponding author: WANG Chao E-mail: kingtaoist@yeah.net
  • About the author: MA Chunjian (1997-), male, master's student; research interest: scientometrics. E-mail: majinlu0531@163.com
  • Funding:
    National Natural Science Foundation of China (72274113)

Abstract: Existing methods for evaluating the innovativeness of scientific and technological papers rely mainly either on manual expert review, which is inefficient and subjective, or on lagging quantitative indicators, which lack foresight and explanatory power. This study proposes a DeepSeek-based framework for evaluating the innovativeness of scientific and technological papers in the stem cell field. First, stem cell research papers are taken as the experimental corpus. Second, the bge-large-en-v1.5 model is used to vectorize the titles and abstracts of a set of the most representative papers, building a text semantic vector database; the deepseek-reasoner model is used to extract innovation features, which are vectorized into an innovation feature database; and the two databases are combined by weighted fusion. Finally, each target paper is queried against the fused databases, and FAISS vector retrieval with Top-k similarity matching yields a final innovativeness score level. These results are compared with the scores that the DeepSeek model assigns without intervention, to test how well the model evaluates the innovativeness of scientific and technological papers in the biomedical field. The empirical results show that the DeepSeek model tends to assign high scores overall when used without intervention. After training, the stability and validity of its innovativeness evaluation improve, indicating considerable potential for identifying the innovation dimensions and distinguishing features of papers.

Key words: generative large language models, innovation assessment, semantic embeddings, automated evaluation, peer review
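The retrieval step summarized in the abstract (weighted fusion of two vector databases, then Top-k similarity matching) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `fuse`, `top_k`, and `predict_level`, the default weight `w_text=0.5`, the 1-to-5 score levels, and the similarity-weighted averaging rule are all hypothetical. The sketch also substitutes a brute-force inner-product search in plain Python for FAISS, and small hand-written vectors for real bge-large-en-v1.5 embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, feature_vec, w_text=0.5):
    """Weighted fusion by scaled concatenation (an assumed fusion scheme).

    Both parts are unit-normalized and scaled by the square roots of their
    weights, so the inner product of two fused vectors equals
    w_text * cos(text) + (1 - w_text) * cos(feature).
    """
    if not 0.0 <= w_text <= 1.0:
        raise ValueError("w_text must lie in [0, 1]")
    t = l2_normalize(text_vec)
    f = l2_normalize(feature_vec)
    a, b = math.sqrt(w_text), math.sqrt(1.0 - w_text)
    return [a * x for x in t] + [b * x for x in f]


def top_k(query, index, k):
    """Exact inner-product search over (fused_vector, level) pairs."""
    scored = [
        (sum(q * v for q, v in zip(query, vec)), level) for vec, level in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def predict_level(query, index, k=3):
    """Similarity-weighted mean of the neighbours' levels (hypothetical rule)."""
    neighbours = top_k(query, index, k)
    if not neighbours:
        raise ValueError("index is empty")
    weights = [max(sim, 0.0) for sim, _ in neighbours]
    total = sum(weights)
    if total == 0.0:
        # No positively similar neighbour: fall back to a plain average.
        return sum(level for _, level in neighbours) / len(neighbours)
    return sum(w * level for w, (_, level) in zip(weights, neighbours)) / total


# Toy reference set: (text vector, innovation-feature vector, score level).
reference_papers = [
    ([1.0, 0.0], [1.0, 0.0], 5),
    ([0.0, 1.0], [0.0, 1.0], 1),
    ([1.0, 1.0], [1.0, 0.0], 4),
]
index = [(fuse(t, f), level) for t, f, level in reference_papers]
target = fuse([1.0, 0.0], [1.0, 0.0])
print(predict_level(target, index, k=2))
```

Concatenating the scaled, normalized vectors keeps a single index for both databases, which is one plausible way to combine them before retrieval; scoring each database separately and mixing the two similarity lists afterwards would be another.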


  • Cite this article

    MA Chunjian, WANG Chao, XU Haiyun, WANG Lekang, ZHANG Xin, CHEN Liang. Evaluation of innovativeness in stem cell research articles using DeepSeek[J]. Shandong Science, 0, (): 1-.

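Open Access: This article is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits third parties to freely share (copy and redistribute the original in any medium or format) and adapt (remix, transform, or build upon the original) articles published in this journal, provided that appropriate credit is given, a link to the license is provided, any changes to the original are indicated, and the article is not used for commercial purposes. For details of the CC BY-NC 4.0 license, visit https://creativecommons.org/licenses/by-nc/4.0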