Evaluating the Quality of GPT-4 Generated Unit Tests for JavaScript
This research presents a focused evaluation of the quality of unit tests generated by the OpenAI GPT-4 model for the JavaScript programming language. The study is conducted using Unit Cloud Gen, a platform developed to automate the generation and analysis of tests. The work concentrates on a corpus of problems from LeetCode, chosen specifically to assess the LLM's performance on code of varying complexity, with the objective of identifying where the model succeeds and where it fails. The evaluation goes beyond traditional code coverage, incorporating metrics such as fault detection capability, readability, and maintainability. The tests are executed in isolated Docker containers, and the results are stored for analysis and benchmarking. This study aims to identify the strengths and limitations of GPT-4 in generating tests for algorithmic problems, providing a foundation for improving the reliability and efficiency of AI-assisted test generation.
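For context, a single generation step in this kind of pipeline might look like the minimal sketch below. The prompt wording, the `twoSum` example problem, and the use of the official `openai` Node.js SDK and Jest are illustrative assumptions for this sketch, not details of Unit Cloud Gen:

```javascript
// Minimal sketch: ask GPT-4 for a Jest unit test suite for a LeetCode-style function.
// Assumes the official `openai` Node.js SDK and an OPENAI_API_KEY in the environment.
import OpenAI from "openai";

const client = new OpenAI();

const sourceCode = `
function twoSum(nums, target) {
  const seen = new Map();
  for (let i = 0; i < nums.length; i++) {
    const need = target - nums[i];
    if (seen.has(need)) return [seen.get(need), i];
    seen.set(nums[i], i);
  }
  return [];
}
`;

const response = await client.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "system", content: "You write Jest unit tests for JavaScript functions." },
    { role: "user", content: `Write a Jest test suite for this function:\n${sourceCode}` },
  ],
});

// In a setup like the one described, the generated suite would then be written
// to a file and executed (e.g., via `npx jest --coverage`) inside an isolated
// Docker container, with coverage and pass/fail results collected for analysis.
console.log(response.choices[0].message.content);
```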
2025/2 - MSI2
Advisor: Eduardo Figueiredo
Keywords: unit testing, software quality, test evaluation, large language models, automated test generation
PDF Available