
How Effective Are Neural Networks for Fixing Security Vulnerabilities
Security vulnerability repair is a difficult task that is in dire need of automation. Two groups of techniques have shown promise:
- large code language models (LLMs) that have been pre-trained on source code for tasks such as code completion, and
- automated program repair (APR) techniques that use deep learning (DL) models to automatically fix software bugs.
Findings:
- Existing LLMs and APR models fix very few Java vulnerabilities. Codex fixes 10.2 (20.4%), the most number of vulnerabilities.
- Fine-tuning with general APR data improves LLMs' vulnerability-fixing capabilities.
- New VJBench reveals that LLMs and APR models fail to fix many CWE types, such as CWE-325 Missing cryptographic step and CWE-444 HTTP request smuggling.
- Codex still fixes 8.3 transformed vulnerabilities, outperforming all the other LLMs and APR models on transformed vulnerabilities.
Security vulnerability repair is a difficult task that is in dire need of automation. Two groups of techniques have shown promise:
- large code language models (LLMs) that have been pre-trained on source code for tasks such as code completion, and
- automated program repair (APR) techniques that use deep learning (DL) models to automatically fix software bugs.
Findings:
- Existing LLMs and APR models fix very few Java vulnerabilities. Codex fixes 10.2 (20.4%), the most number of vulnerabilities.
- Fine-tuning with general APR data improves LLMs' vulnerability-fixing capabilities.
- New VJBench reveals that LLMs and APR models fail to fix many CWE types, such as CWE-325 Missing cryptographic step and CWE-444 HTTP request smuggling.
- Codex still fixes 8.3 transformed vulnerabilities, outperforming all the other LLMs and APR models on transformed vulnerabilities.