Pathogenicity Prediction of Single Nucleotide Nonsense Mutations Based on Self-Attention

  • Abstract: Single nucleotide nonsense mutations in a gene sequence can severely affect the downstream sequence. To address this, PON-NS, a deep learning model based on self-attention, is proposed to predict the pathogenicity of DNA single nucleotide nonsense mutations. A new nonsense mutation dataset was constructed by filtering nonsense mutations from the ClinVar and VariSNP databases. The self-attention mechanism of the Transformer learns hidden features from the sequence context around the mutation site, before and after the mutation, and these are combined with sequence-derived features for prediction. Compared with existing methods, PON-NS performed better in blind testing, reaching an ACC, AUC, and MCC of 0.920, 0.950, and 0.842, respectively. In particular, on the ExAC validation set, PON-NS reduced the false positive rate by 39.7% compared with DDIG-in, a method that also predicts at the DNA level.
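For readers unfamiliar with the reported metrics, the following is a minimal sketch of how ACC, MCC, and the false positive rate are conventionally computed from binary predictions. It is not the authors' evaluation code: the function names, the label convention (1 = pathogenic, 0 = benign), and the toy labels are assumptions made for illustration. AUC is omitted because it requires continuous prediction scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = pathogenic, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC: fraction of all variants classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def false_positive_rate(tp, tn, fp, fn):
    """FPR: fraction of benign variants wrongly called pathogenic."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Toy labels, purely illustrative (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print("ACC:", accuracy(*counts))
print("MCC:", mcc(*counts))
print("FPR:", false_positive_rate(*counts))
```

On a set containing mostly benign variants, the false positive rate is the most informative of the three, because the other two depend heavily on the class balance.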

     
