Construction of a Regulatory Document Question-Answering Large Language Model Based on Retrieval-Augmented Generation Technology

Abstract: The sheer number of campus rules and regulations confuses faculty and students and keeps the regulations from serving their intended purpose. To address this problem, a question-answering large language model for regulatory documents is developed. Using retrieval-augmented generation (RAG), campus rules and regulations are collected into a knowledge base, and a retriever and a generator are built on top of it, yielding a vertical-domain (domain-specific) large model for campus regulations. On an evaluation dataset built for this study, the model achieves an answer semantic similarity score of 0.9221, an answer relevance score of 0.8060, and an answer correctness score of 0.6006, which are 0.0271, 0.0868, and 0.1137 points higher, respectively, than those of the base model. The model effectively mitigates the base model's weaknesses in this vertical domain: poor domain-specific semantic understanding, ineffective answers, factually incorrect answers, and misjudgments (hallucinations). The work supports the study of university regulations and humanized, interactive question answering, and it offers a new approach to the digital transformation and intelligent management of universities.
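The abstract describes the pipeline only at a high level: a regulations knowledge base, a retriever, and a generator. The sketch below illustrates the general RAG pattern under stated assumptions, not the authors' implementation. The regulation chunks in `KNOWLEDGE_BASE` are hypothetical, the retriever is a plain bag-of-words cosine similarity (the source does not say which retriever was used; dense embedding retrievers are common), and the generator is represented only by the prompt it would receive, because no model or API is named in the source.

```python
import math
import re
from collections import Counter

# Hypothetical regulation chunks standing in for the campus knowledge base.
KNOWLEDGE_BASE = [
    "Students who miss a final exam must apply for a make-up exam within two weeks.",
    "Library books may be borrowed for 30 days and renewed once.",
    "Faculty must submit course grades within seven days after the final exam.",
]


def tokenize(text: str) -> Counter:
    """Lowercase word counts; a stand-in for a real tokenizer or embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 2) -> list[str]:
    """Retriever step: return the k chunks most similar to the question."""
    q = tokenize(question)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda chunk: cosine(q, tokenize(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble the grounded prompt that would be passed to the generator LLM."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the regulations below. "
        "If they do not cover the question, say so.\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    q = "How long can I borrow library books?"
    print(build_prompt(q, retrieve(q, k=1)))
```

Restricting the generator to the retrieved regulations, and telling it to say when they do not cover the question, is the mechanism by which RAG targets the ineffective and factually incorrect answers the abstract attributes to the base model.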

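The abstract reports the model's absolute scores and its absolute gains, but not the base model's scores. Those follow by subtraction. The sketch below makes that arithmetic explicit and expresses each gain relative to the implied baseline. The dictionary keys are hypothetical labels; the metric names resemble those in the RAGAS evaluation library, but the source does not state which toolkit or scoring formulas were used.

```python
# Scores reported in the abstract for the RAG model, and its reported gains
# over the base model (same metrics, same order).
RAG_SCORES = {"answer_similarity": 0.9221, "answer_relevance": 0.8060, "answer_correctness": 0.6006}
GAINS = {"answer_similarity": 0.0271, "answer_relevance": 0.0868, "answer_correctness": 0.1137}


def implied_baseline(scores: dict[str, float], gains: dict[str, float]) -> dict[str, float]:
    """Base-model score implied by 'RAG score minus reported gain', rounded to 4 places."""
    return {m: round(scores[m] - gains[m], 4) for m in scores}


def relative_gain(scores: dict[str, float], gains: dict[str, float]) -> dict[str, float]:
    """Each absolute gain as a fraction of the implied base-model score."""
    base = implied_baseline(scores, gains)
    return {m: gains[m] / base[m] for m in scores}


if __name__ == "__main__":
    base = implied_baseline(RAG_SCORES, GAINS)
    rel = relative_gain(RAG_SCORES, GAINS)
    for metric in RAG_SCORES:
        print(f"{metric}: base {base[metric]:.4f} -> RAG {RAG_SCORES[metric]:.4f} (+{rel[metric]:.1%})")
```

The implied base-model scores are 0.8950, 0.7192, and 0.4869, so the relative gains rise from about 3% (similarity) through about 12% (relevance) to about 23% (correctness). The largest improvement is on the metric most tied to factual accuracy.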