Abstract:
Single nucleotide nonsense mutations in gene sequences can have severe effects on downstream sequences. To address this problem, PON-NS, a deep learning model based on self-attention, is proposed to predict the pathogenicity of single nucleotide nonsense mutations. A novel nonsense mutation dataset was constructed by filtering nonsense mutations from ClinVar and VariSNP. Hidden features in the sequence context around the mutated position, before and after the mutation, were learned with the Transformer self-attention mechanism and combined with sequence-derived features for prediction. Compared with existing methods, PON-NS achieved better performance in blind testing, with accuracy (ACC), area under the ROC curve (AUC) and Matthews correlation coefficient (MCC) reaching 0.920, 0.950 and 0.842, respectively. In particular, on the ExAC validation set, PON-NS reduced the false positive rate by 39.7% compared with DDIG-in, a method that also makes predictions at the DNA level.
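
For readers less familiar with the reported metrics, the following is a minimal sketch of how ACC, MCC and the false positive rate are conventionally computed from binary confusion counts. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the PON-NS evaluation, and AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, MCC and false positive rate from confusion counts.

    tp/fn count pathogenic variants predicted correctly/incorrectly;
    tn/fp count benign variants predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total

    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # False positive rate: share of benign variants called pathogenic.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"acc": acc, "mcc": mcc, "fpr": fpr}


# Hypothetical counts for illustration only (not the paper's data).
m = binary_metrics(tp=45, fp=5, tn=40, fn=10)
print(m)
```

MCC uses all four cells of the confusion matrix, which makes it more informative than ACC when pathogenic and benign variants are unevenly represented.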