Evaluation of innovativeness in stem cell research articles using DeepSeek

doi:10.3976/j.issn.1002-4026.2025172

Abstract

Abstract: Current approaches to assessing the innovativeness of scientific and technological papers rely mainly on manual expert review, which is inefficient and highly subjective, or on lagging quantitative indicators, which lack foresight and explanatory power. This study proposes a framework for evaluating the innovativeness of stem cell scientific and technological papers using the DeepSeek model. First, scientific and technological papers in the stem cell field were taken as the experimental corpus. Second, the titles and abstracts of the most representative set of papers were vectorized with the bge-large-en-v1.5 model to construct a semantic vector database; the deepseek-reasoner model was used to extract innovation features, which were organized into a vectorized innovation feature database; and the two databases were combined through weighted fusion. Finally, each target paper was matched against the fused databases through FAISS vector retrieval and Top-k similarity matching to produce a final innovativeness score grade. These grades were compared with the scores given by the DeepSeek model without intervention, to test how well DeepSeek evaluates the innovativeness of biomedical scientific and technological papers. Empirical results indicate that the DeepSeek model tends to assign high innovativeness scores overall. After training, the model yields more stable and valid innovativeness evaluations, indicating considerable potential for identifying the innovation dimensions and distinguishing features of papers.

Key words: generative large language models, innovation assessment, semantic embeddings, automated evaluation, peer review
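The retrieval step described in the abstract (weighted fusion of a text-embedding database and an innovation-feature database, followed by Top-k similarity matching) can be illustrated with a minimal sketch. It is not the authors' implementation. Toy 3-dimensional vectors stand in for bge-large-en-v1.5 embeddings, a brute-force cosine search stands in for FAISS, and the fusion weight `alpha`, the helper names, and the similarity-weighted grading rule are all assumptions introduced for illustration, since the abstract does not specify them.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def fused_top_k(query_text, query_feat, corpus, alpha=0.6, k=2):
    """Rank corpus papers by a weighted mix of two similarities.

    alpha is a hypothetical fusion weight; the abstract does not give one.
    """
    scored = []
    for paper in corpus:
        sim = (alpha * cosine(query_text, paper["text_vec"])
               + (1 - alpha) * cosine(query_feat, paper["feat_vec"]))
        scored.append((sim, paper))
    # Sort on the similarity only, highest first, and keep the k best.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def predict_grade(neighbours):
    """Similarity-weighted mean of the neighbours' grades (assumed rule)."""
    total = sum(sim for sim, _ in neighbours)
    if total <= 0:
        return None
    return sum(sim * paper["grade"] for sim, paper in neighbours) / total

# Toy reference set: each paper has a text vector, a feature vector,
# and a known innovativeness grade (all values are made up).
corpus = [
    {"id": "A", "text_vec": [1.0, 0.0, 0.0], "feat_vec": [1.0, 0.0, 0.0], "grade": 5},
    {"id": "B", "text_vec": [0.0, 1.0, 0.0], "feat_vec": [0.0, 1.0, 0.0], "grade": 2},
    {"id": "C", "text_vec": [0.0, 0.0, 1.0], "feat_vec": [0.0, 0.0, 1.0], "grade": 1},
]

neighbours = fused_top_k([0.9, 0.1, 0.0], [0.8, 0.2, 0.0], corpus, alpha=0.6, k=2)
neighbour_ids = [paper["id"] for _, paper in neighbours]
grade = predict_grade(neighbours)
print(neighbour_ids, round(grade, 2))
```

Fusing at the similarity level, as done here, is only one possible reading of "weighted fusion"; concatenating scaled vectors before indexing would be another, and the abstract does not say which was used.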

CLC number:

Cite this article

MA Chunjian, WANG Chao, XU Haiyun, WANG Lekang, ZHANG Xin, CHEN Liang. Evaluation of innovativeness in stem cell research articles using DeepSeek[J]. Shandong Science, 0, (): 1-.
