Abstract:
Current vulnerability assessment methods based on natural language processing (NLP) suffer from concept drift: when they assess software vulnerabilities that were unseen at training time, they lack proper handling of the new terms that appear in vulnerability descriptions over time. To assess software vulnerabilities automatically from their descriptions despite concept drift, this paper proposes a method that combines character-level and word-level features. The method is used to predict seven vulnerability characteristics. For each characteristic, the best model is selected from a set of NLP representations and machine learning models using time-based cross-validation. Experimental results show that the method effectively addresses concept drift. Compared with the word-only method, it improves accuracy and macro F1-score by 1.7% and weighted F1-score by 1.3%, making it more competitive.
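The two ideas in the abstract, combining character and word features and selecting models with time-based cross-validation, can be illustrated with a minimal standard-library sketch. The abstract does not specify the tokenization, the character n-gram length, or the split scheme, so everything here is an assumption: the helper names (`word_features`, `char_features`, `combined_features`, `time_based_splits`) are hypothetical, character trigrams are chosen arbitrarily, and the splits use an expanding window that trains on all earlier years and validates on the next year. Character n-grams let a previously unseen term still share features with known terms, while time-ordered splits keep future descriptions out of training.

```python
from collections import Counter


def word_features(text):
    """Bag of lowercase word tokens (hypothetical tokenizer: whitespace split)."""
    return Counter("w:" + tok for tok in text.lower().split())


def char_features(text, n=3):
    """Bag of character n-grams per word, with '<' and '>' marking word boundaries."""
    feats = Counter()
    for tok in text.lower().split():
        padded = f"<{tok}>"
        for i in range(len(padded) - n + 1):
            feats["c:" + padded[i:i + n]] += 1
    return feats


def combined_features(text, n=3):
    """Union of word and character features; the 'w:'/'c:' prefixes keep them distinct."""
    return word_features(text) + char_features(text, n)


def time_based_splits(years):
    """Expanding-window splits: train on all earlier years, validate on the next year.

    `years` gives the publication year of each sample; yields (train_idx, val_idx).
    """
    ordered = sorted(set(years))
    for k in range(1, len(ordered)):
        train = [i for i, y in enumerate(years) if y < ordered[k]]
        val = [i for i, y in enumerate(years) if y == ordered[k]]
        yield train, val


if __name__ == "__main__":
    print(combined_features("Buffer overflow"))
    print(list(time_based_splits([2016, 2017, 2016, 2018])))
```

In a full pipeline, each candidate representation and model would be trained on every `train` split and scored on the matching `val` split, and the best-scoring combination kept per vulnerability characteristic. That selection loop is omitted here because the abstract does not name the candidate models.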