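Query result: Chang Bingguo, Liu Qingxing. Similarity analysis of CT report of chronic liver diseases based on deep learning [J]. Computer Applications and Software, 2018, 35(8): 289-294, 302.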
Chinese title
基于深度学习的慢性肝病CT报告相似度分析
Section
Algorithms
Abstract views
1143
English title
SIMILARITY ANALYSIS OF CT REPORT OF CHRONIC LIVER DISEASES BASED ON DEEP LEARNING
Authors
常炳国, 刘清星 (Chang Bingguo, Liu Qingxing)
Affiliation (Chinese)
湖南大学信息科学与工程学院 湖南 长沙 410082
Affiliation (English)
College of Information Science and Engineering, Hunan University, Changsha 410082, Hunan, China
Keywords (Chinese)
慢性肝病; CT报告; 深度学习; 分词算法; 相似度计算
Keywords
Chronic liver disease; CT report; Deep learning; Word segmentation algorithm; Similarity calculation
Funding
Key Research and Development Program of Hunan Province (2016GK2050)
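Author information
Chang Bingguo, associate professor; main research areas: medical big data processing, holographic systems, machine learning, data mining. Liu Qingxing, master's degree.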
Abstract (Chinese)
肝部CT检查是诊断慢性肝病的必要措施。通常,CT报告由影像所见描述和根据所见给出的诊断建议结果两部分组成。研究肝CT报告影像所见描述文本的相似度,辅助医生在给出新的CT诊断建议结果时参考历史上相似度最高的相应CT报告诊断结论。在研究慢性肝病医学词库基础上,运用网络爬虫技术获取相关网站医学词汇及自定义的否定词汇表,构建了包含约6 000个医学词汇的慢性肝病CT报告分词词库。运用基于词库与最大匹配规则相结合的分词算法,对肝CT报告文本进行分词处理。利用Doc2Vec深度学习算法获取CT报告文本分词表的句向量。通过计算句向量之间的余弦值得出CT报告文本相似度,选择历史CT报告文本中相似度大于阈值的报告用于医生进行参考。整理分析了6 900份真实的影像科检查报告,基于自定义词库及改进的分词算法,分词准确率达到87%。通过与基于TF-IDF的统计算法和基于隐含狄利克雷主题模型(LDA)算法进行对比分析,采用的算法获得的相似文本的平均准确率更高。
Abstract
Liver CT examination is an essential step in diagnosing chronic liver disease. A CT report generally consists of two parts: a description of the imaging findings and the diagnostic conclusions drawn from those findings. This paper studies the similarity of the findings-description text in liver CT reports, so that a physician writing the diagnostic conclusion for a new CT report can refer to the conclusions of the most similar historical reports. Building on existing medical lexicons for chronic liver disease, we used web crawling to collect medical vocabulary from relevant websites, added a custom list of negation words, and constructed a word segmentation lexicon for chronic liver disease CT reports containing about 6 000 medical terms. Liver CT report text was segmented with an algorithm that combines the lexicon with the maximum matching rule. The Doc2Vec deep learning algorithm was then used to obtain sentence vectors from the segmented report text. Similarity between CT reports was computed as the cosine of the angle between their sentence vectors, and historical reports whose similarity exceeded a set threshold were selected for the physician's reference. We analyzed 6 900 real radiology reports; with the custom lexicon and the improved segmentation algorithm, segmentation accuracy reached 87%. Compared with a TF-IDF-based statistical algorithm and a Latent Dirichlet Allocation (LDA) topic model algorithm, the proposed algorithm achieved a higher average accuracy for the similar texts retrieved.
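The segmentation and retrieval steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the details of their improved segmentation algorithm, so plain forward maximum matching is shown instead, and the function names, the toy lexicon, and the hand-written vectors (standing in for Doc2Vec output) are all assumptions made for illustration.

```python
import math


def forward_max_match(text, lexicon, max_len=None):
    """Greedy forward maximum matching over a lexicon (illustrative only)."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_reports(query_vec, history, threshold):
    """Return (report_id, score) pairs scoring above threshold, best first."""
    scored = [(rid, cosine(query_vec, vec)) for rid, vec in history.items()]
    hits = [(rid, s) for rid, s in scored if s > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy lexicon and placeholder vectors, not data from the paper.
lexicon = {"肝硬化", "肝", "未见", "脾大"}
tokens = forward_max_match("未见肝硬化", lexicon)
history = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hits = similar_reports([1.0, 0.0], history, threshold=0.5)
```

In a fuller pipeline the placeholder vectors would come from a trained document-embedding model applied to the segmented tokens, and the threshold would be tuned on labeled report pairs.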