posted 7th March 2024
In an era where software development is experiencing unprecedented acceleration, thanks in large part to the advent of artificial intelligence (AI) and large language models (LLMs), a surprising trend has emerged. Industry reports indicate a dramatic drop in the demand for IT software testing roles, with a 63% decrease this year compared to last. At first glance, this trend appears counterintuitive.
One might expect that as AI and LLMs enhance productivity, generating more code at faster rates, the need for thorough testing would proportionally increase. However, a deeper dive into this phenomenon reveals several plausible explanations for this paradox.
The landscape of software testing is rapidly evolving, marked by the significant rise of automated testing and the burgeoning influence of machine learning (ML) and large language models (LLMs). These advancements are not just reshaping our tools and techniques but also how we conceptualize the very nature of testing in a software development lifecycle.
The Rise of Automated Testing
Automated testing has long since transitioned from a luxury to a necessity in most development pipelines. It offers unmatched speed, reliability, and efficiency, enabling teams to run, at the push of a button, thousands of tests that would otherwise take humans days or even weeks to complete.
This shift towards automation is driven by the need to meet the market's demand for faster development cycles and higher quality standards.
However, with the integration of ML and LLMs into software products, automated testing faces new challenges. These technologies introduce non-deterministic outcomes, meaning the results of a given input can vary, making traditional testing approaches less effective. Therefore, our testing strategies must evolve to accommodate these nuances, ensuring that they can handle the unpredictability inherent in ML-based applications.
Testing in the Age of Machine Learning and LLMs
ML and LLMs have shown remarkable capabilities in understanding and generating human-like text, which naturally extends to writing code. Tools powered by these technologies can significantly accelerate development by suggesting code, completing chunks of programming tasks, and even identifying potential fixes for bugs. The allure is undeniable: faster development cycles, reduced workload for developers, and potentially even creativity in solving complex problems.
The convenience of using LLMs for both writing and testing code might seem efficient, but it introduces a critical concern: bias. An LLM trained on a particular dataset will carry the inherent biases and limitations of that data. Using the same model for testing the code it generated can lead to a lack of diversity in testing scenarios, potentially missing edge cases or bugs that a human or a differently trained system might catch.
Furthermore, the non-deterministic nature of some AI outputs can make it challenging to establish a single source of truth for what the correct behaviour should be. This ambiguity complicates testing, as the criteria for success might not be as clear-cut as with traditional software.
Testing software that incorporates ML and LLMs requires a different mindset.
Traditional tests assume deterministic outcomes: if you input X, you expect Y every time. With ML, however, the same input might yield slightly different outputs on different occasions, whether because of randomness in how the model generates its output or because the model has been retrained or updated over time. This necessitates a shift towards probabilistic testing, where the focus is on whether the output falls within an acceptable range of expected outcomes, rather than matching a single, specific result.
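To make the idea of probabilistic testing concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than a real API: `make_fake_model` is a hypothetical stand-in for a non-deterministic model, `mentions_paris` is one possible acceptance check, and the 85% threshold is an arbitrary figure you would tune for your own application.

```python
import random
from typing import Callable


def pass_rate(model: Callable[[str], str],
              prompt: str,
              is_acceptable: Callable[[str], bool],
              trials: int = 200) -> float:
    """Send the same prompt repeatedly; return the fraction of acceptable outputs."""
    passed = sum(1 for _ in range(trials) if is_acceptable(model(prompt)))
    return passed / trials


def make_fake_model(seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a model whose answers vary between calls."""
    rng = random.Random(seed)
    answers = ["Paris", "paris", "The capital is Paris.", "Lyon"]
    weights = [0.5, 0.2, 0.25, 0.05]  # assumed: roughly 5% of answers are wrong

    def model(prompt: str) -> str:
        return rng.choices(answers, weights=weights, k=1)[0]

    return model


def mentions_paris(output: str) -> bool:
    """Accept any phrasing that names the right city, not one exact string."""
    return "paris" in output.lower()


def test_capital_question_meets_threshold() -> None:
    model = make_fake_model(seed=42)
    rate = pass_rate(model, "What is the capital of France?", mentions_paris)
    # Pass or fail on the aggregate rate, not on any single response.
    assert rate >= 0.85, f"pass rate {rate:.2%} is below the agreed threshold"


if __name__ == "__main__":
    test_capital_question_meets_threshold()
    print("probabilistic test passed")
```

The design choice worth noting is that the assertion targets a rate across many runs, so an occasional wrong answer does not fail the build, while a sustained drop in quality does.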
Furthermore, we've seen the emergence of MLOps and LLMOps, disciplines that bring lifecycle management to models given their complexity, their size and the critical nature of the applications they power. Part of this best practice is continuous monitoring of model performance to detect and address issues such as drift in model accuracy.
This topic deserves a blog in its own right: there is a wide spectrum of testing techniques aimed at addressing it, from data drift detection using statistical and distribution metrics, through outlier and anomaly detection and A/B testing, all the way to ML models designed to monitor results and detect meaningful variations in output.
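As one small illustration of drift detection with a distribution metric, the sketch below computes a Population Stability Index (PSI) between a reference sample and a recent sample. It is a simplified version under stated assumptions: the bins are equal-width and taken from the reference data, the 0.5 smoothing constant is our own choice to avoid dividing by zero, and the 0.1 and 0.25 cut-offs are a commonly quoted rule of thumb rather than a standard.

```python
import math
import random
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index using equal-width bins taken from the reference sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(sample: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into the edge bins
            counts[idx] += 1
        # Additive smoothing (assumed constant 0.5) keeps every proportion above zero.
        return [(c + 0.5) / (len(sample) + 0.5 * bins) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


if __name__ == "__main__":
    rng = random.Random(0)
    training = [rng.gauss(0.0, 1.0) for _ in range(5000)]    # data the model was built on
    last_week = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # same distribution
    this_week = [rng.gauss(1.0, 1.0) for _ in range(5000)]   # mean has shifted

    for name, sample in [("last week", last_week), ("this week", this_week)]:
        score = psi(training, sample)
        # Rule-of-thumb bands (an assumption to tune): <0.1 stable, >0.25 significant drift.
        status = "stable" if score < 0.1 else "review" if score < 0.25 else "drift"
        print(f"{name}: PSI={score:.3f} -> {status}")
```

In practice a check like this would run on a schedule against live inputs or outputs, with an alert raised when the score crosses the agreed band.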
Best Practices for Balancing Automation and Assurance
To harness the benefits of ML and LLMs in software development while mitigating risks, we see several best practices being adopted by our clients.
- Diverse Testing Strategies: Employ a mix of AI-generated and human-written tests. This approach combines the efficiency and innovative testing scenarios that AI can provide with the critical thinking and experience of human testers to ensure comprehensive coverage, especially for edge cases.
- Independent Verification Tools: Tools like Diffblue Cover and Facebook's Infer offer automated ways to generate test cases or find bugs by analyzing code. These tools, built on principles different from those of the LLMs generating the code, can provide an independent check on the quality and security of the AI-written software.
- Human Oversight: While automation can accelerate development, human oversight remains indispensable. Code reviews, especially for critical or sensitive parts of the application, should involve experienced developers who can evaluate the code from perspectives that AI might not consider, such as long-term maintainability, architectural consistency, and security implications.
- Continuous Learning and Adaptation: Just as ML models learn and evolve, so too should our approach to using them in software development. Collecting data on the effectiveness of AI-generated code and tests can help refine the models, improve testing strategies, and identify areas where human intervention is most valuable.
- Ethical and Responsible AI Use: Ensure that the use of AI in development processes adheres to ethical guidelines and responsible AI practices. This includes transparency about the use of AI-generated code, acknowledgement of its limitations, and measures to prevent bias and ensure fairness.
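To illustrate the spirit of independent verification from the list above, here is a minimal Python sketch of a human-written property check applied to generated code. It does not show how Diffblue Cover or Infer work internally. `candidate_dedupe` is a hypothetical stand-in for an AI-generated function, and the properties, input ranges and trial count are our own assumptions.

```python
import random
from typing import Callable, List


def candidate_dedupe(items: List[int]) -> List[int]:
    """Hypothetical stand-in for an AI-generated function under test."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def check_dedupe_properties(func: Callable[[List[int]], List[int]],
                            trials: int = 500,
                            seed: int = 0) -> List[List[int]]:
    """Return every random input on which func breaks a human-stated property."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        data = [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]
        out = func(list(data))  # pass a copy so func cannot mutate our input
        no_duplicates = len(out) == len(set(out))
        same_members = set(out) == set(data)
        # Expected order: each distinct value at the position of its first occurrence.
        order_kept = out == sorted(set(data), key=data.index)
        if not (no_duplicates and same_members and order_kept):
            failures.append(data)
    return failures


if __name__ == "__main__":
    print("candidate failures:", len(check_dedupe_properties(candidate_dedupe)))
    # A plausible-looking but wrong variant: it removes duplicates but loses the order.
    print("buggy failures:", len(check_dedupe_properties(lambda xs: sorted(set(xs)))))
```

Because the properties are written by a person and the inputs are generated at random, the check does not share the blind spots of whichever model wrote the function.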
Conclusion
The integration of ML and LLMs into the software development and testing process is not just a technological advancement; it's a paradigm shift. While these technologies offer the promise of accelerated development and innovative approaches to coding, they also introduce complexities that require careful management. By adopting a balanced approach that leverages the strengths of AI while incorporating human expertise and independent verification tools, we can navigate these challenges. This strategy ensures that we not only keep pace with technological advancements but also maintain the high standards of reliability, security, and quality that software users deserve.
In the journey toward the future of software development, striking the right balance between automation and human oversight will be key to harnessing the full potential of ML and LLMs while safeguarding against their pitfalls. The revenue lost from a bad release will almost always far outweigh the money you might have saved by cutting back on human oversight.
So, is the need for software testing dropping off? We don't think so. We think the data is misleading and that software testing is required more than ever. What's more, just as with previous evolutions in our industry, the role is morphing. The shift left continues for traditional development, while for ML/LLM-powered apps there is an ever-increasing need to "test in production", something we used to joke about 20 years ago.
The Difference Engine is a recruitment and executive search firm specialising in technology, operating globally.