Analysis of LLMs in Debugging and Software Modification

This website serves as the official presentation of my thesis, which conducts an analysis of the capabilities of Large Language Models (LLMs) in real-world software bugfixing.

Objective

The core question of this project is: How do the bug-fixing capabilities of modern LLMs compare to those of human developers when applied to established, open-source software projects?

To answer this, I developed a semi-automated Python framework that systematically benchmarks LLM performance against human-written solutions. The tool builds a curated corpus of real-world bug fixes, provides the LLM with the same context a human developer would have (the bug report and the relevant code), and then uses the project's own test suite to objectively validate the correctness of the generated fix.
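The validation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis framework itself: the names `BugCase` and `validate_fix` are hypothetical, and a trivially passing command stands in for a real project's test suite.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class BugCase:
    """One corpus entry: the bug report plus the context given to the LLM."""
    report: str            # bug report text, as a human developer would see it
    buggy_code: str        # the relevant source code containing the bug
    test_command: list     # the project's own test-suite invocation


def validate_fix(case: BugCase) -> bool:
    """Run the project's test suite after applying a candidate fix.

    The fix counts as correct only if the suite exits with status 0,
    mirroring the objective, test-based validation described above.
    """
    result = subprocess.run(case.test_command, capture_output=True, text=True)
    return result.returncode == 0


# Hypothetical usage: a one-liner stands in for a real test suite.
case = BugCase(
    report="mean() raises ZeroDivisionError on an empty list",
    buggy_code="def mean(xs): return sum(xs) / len(xs)",
    test_command=["python", "-c", "assert True"],
)
print(validate_fix(case))
```

In the real framework the test command would point at the checked-out project with the LLM's patch applied, so both human and model fixes are judged by exactly the same criterion.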

Project Resources

  • The Live Analysis Notebook: You can explore the full data analysis in the Analysis section of this site. For an interactive, executable version, please use the “Open in Colab” badge at the top of the analysis page.

  • The Source Code: The complete source code for the analysis tool itself is hosted in a shared academic repository. You can view it here: