How Effective Are Neural Networks for Fixing Security Vulnerabilities



Security vulnerability repair is a difficult task that is in dire need of automation. Two groups of techniques have shown promise:

- large code language models (LLMs) that have been pre-trained on source code for tasks such as code completion, and

- automated program repair (APR) techniques that use deep learning (DL) models to automatically fix software bugs.



Findings:

- Existing LLMs and APR models fix very few Java vulnerabilities. Codex fixes the most: 10.2 (20.4%).

- Fine-tuning with general APR data improves LLMs' vulnerability-fixing capabilities.

- The new VJBench benchmark reveals that LLMs and APR models fail to fix many CWE types, such as CWE-325 (Missing Cryptographic Step) and CWE-444 (HTTP Request Smuggling).

- On transformed vulnerabilities, Codex still fixes 8.3, outperforming all the other LLMs and APR models.
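
The reported figures can be cross-checked with a little arithmetic. The sketch below back-computes the benchmark size implied by "10.2 (20.4%)" and then expresses the 8.3 transformed fixes as a rate. The function names are illustrative, and applying the same denominator to the transformed set is an assumption that the summary above does not state.

```python
def implied_total(fixed: float, rate_percent: float) -> float:
    """Back out the benchmark size from a fix count and its percentage."""
    return fixed / (rate_percent / 100.0)


def fix_rate(fixed: float, total: float) -> float:
    """Express a fix count as a percentage of the benchmark size."""
    return 100.0 * fixed / total


# Figures quoted in the findings above.
codex_fixed = 10.2
codex_rate_percent = 20.4
codex_fixed_transformed = 8.3

total = implied_total(codex_fixed, codex_rate_percent)

# Assumption: the transformed set has the same number of vulnerabilities
# as the original set; the summary does not say so explicitly.
transformed_rate = fix_rate(codex_fixed_transformed, total)

print(f"Implied benchmark size: {total:.1f} vulnerabilities")
print(f"Codex fix rate on transformed vulnerabilities: {transformed_rate:.1f}%")
```

Under that assumption, the implied benchmark holds about 50 vulnerabilities, and the transformed fix rate comes out a few percentage points below the original 20.4%.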