Study Finds 75% of AI Models Fail at Real Programming Tasks

Neural networks still cannot replace human software developers, according to a study by Alibaba Group and Sun Yat-sen University in Guangzhou.

Researchers tested AI agents on 100 real-world codebases that had been maintained for 233 days.

Unlike typical benchmarks, which test models on one-off tasks, the study required the models to sustain the long-term evolution of a codebase: adding new features without breaking existing ones.

Seventy-five percent of the models failed at this task. They tended to accumulate technical problems, produce "fragile" code, and sacrifice quality in pursuit of quick results.
