Developers: Pick Your LLM Carefully

March 25, 2024

Peter Schneider
Qt Group

Software developers probably don't need to worry as much as they think about GenAI taking their jobs. But they do need to think twice about which language model they use. In fact, the Large Language Model (LLM) space is seeing something of a code generation arms race.

How do you know which one's right for you?

The size of your LLM matters

The hope behind LLMs is that they might help transform coders into architects. I say "hope" because mainstream models like GPT-4 can barely solve 5% of real-world development issues.

My own experience with chatbots for AI-assisted coding has been frustrating. From invented variables to concepts that were deprecated a decade ago, there's a lot of nonsense that might go unnoticed by the untrained eye. Even meticulous "prompt engineering" can only do so much. There's a sweet spot to how much context actually helps before it just creates more confused and random results at the cost of more processing power.

The pool that mainstream LLMs draw data from has typically been too large, which should be a huge concern for developers and organisations, and not just for reasons of quality. It's about trust. If the LLM you're using functions like a digital vacuum cleaner, without telling you where it's sourcing data from, that's a problem. You don't want to ship a product, only to then find out that a chunk of the code you generated is actually another organization's copyrighted code. Even a small bit of code that an LLM has accidentally reproduced from its training data could land a company in extremely hot legal waters.

Want to use an LLM for coding? Use one that was built for coding

We're finally seeing LLMs from both Big Tech and small tech players that clearly demonstrate an effort to acknowledge the challenge developers face with AI-generated coding. Some are even trained on billions of tokens that pertain to specific languages like Python.

It's an exciting hint at where LLMs could yet go in terms of hyper-specialised relevancy to coders. Looking more broadly at LLMs beyond code generation, we're seeing models as small as two billion parameters, small enough to run locally on a laptop. Such granular fine-tuning is great, but based on how some developers are responding to some market offerings, we need even more of it. Ask developers about their pet peeves for LLMs and you'll still hear a familiar pattern: complicated prompt formats, strict guardrails, and hallucinations. It's a reminder that any model is only as good as the data it's trained on.

Still, this tailored approach has drawn important attention to the fact that large language models are not the only way to succeed in AI-assisted code generation. There's more momentum than ever for smaller LLMs that focus exclusively on coding. Some are better at certain tasks than others, but if you want safety, go small. If you're just programming in C++, do you need extraneous "guff" knowledge on German folklore like, "who was the Pied Piper of Hamelin?" When you have a small data pool, it's easier for data to stay relevant, cheaper to train the model, and you're also far less likely to accidentally use another company's copyrighted data.

Research all your LLM options thoroughly, because there will no doubt be even more choice next year, and even more than that in five years. Don't pick what's popular because it's popular.

Development Means More Than Just Coding

Unless models reach 98-100% accuracy in their coding answers, I don't suspect GenAI will wholly replace humans for coding. But even if it did, some are questioning whether software engineers would become "code reviewers" who simply verify AI-generated code instead of writing it.

Would they, though? They might if an organization has poor internal risk control processes. Good risk control involves using the four-eyes principle, which says that any activity of material risk (like shipping software) should be reviewed and double-checked by a second, independent, and competent individual. For the time being at least, I think we're a long way off from AI being reclassified as an independent and competent lifeform.
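To make the four-eyes principle concrete, here is a minimal Python sketch of how a release gate might encode it. Everything in it is an assumption for illustration: the `Reviewer` record, its field names, and the `passes_four_eyes` function are hypothetical and don't belong to any real tool. It also only checks independence (a human other than the author). Competence is something no boolean flag can capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reviewer:
    # Hypothetical record; the field names are illustrative only.
    name: str
    is_human: bool


def passes_four_eyes(author: str, approvals: list[Reviewer]) -> bool:
    """Return True if at least one human other than the author approved."""
    return any(r.is_human and r.name != author for r in approvals)


if __name__ == "__main__":
    ai_only = [Reviewer("code-assistant", is_human=False)]
    with_peer = ai_only + [Reviewer("bob", is_human=True)]
    print(passes_four_eyes("alice", ai_only))    # False
    print(passes_four_eyes("alice", with_peer))  # True
```

Under these assumptions, an approval from an AI assistant never satisfies the gate on its own, and neither does the author approving their own change.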

There's also the fact that end-to-end development, and things like building Human-Machine Interfaces, involve so much more than just coding. LLMs can respectably interact with text and elements in an image, with more tools popping up that can convert web designs into frontend code. But AI single-handedly assuming competent control of design that relates to graphical and UI/UX workflows? That's much harder than coding, though perhaps not impossible. And coding is one part of development. The rest is investing in something novel, figuring out who the audience is, translating ideas into something buildable, and polishing. That's where the human element comes in.

Regardless of how good LLMs ever get, every programmer should always treat every piece of code like it's their own. Always do the peer review and ask your colleague, "is my code good?" Blind trust gets you nowhere.

Peter Schneider is Senior Product Manager at Qt Group

Industry News

GitHub Copilot Free Released

January 06, 2025

GitHub announced GitHub Copilot Free.

Veracode Acquires Phylum

January 06, 2025

Veracode acquired certain assets of Phylum, including its malicious package analysis, detection, and mitigation technology.

Haveli Investments Completes Acquisition of AppViewX

January 06, 2025

AppViewX announced the completion of its acquisition by Haveli Investments.

Check Point Software Recognized as a Leader in Email Security in Inaugural Gartner Magic Quadrant for Email Security Platforms

December 19, 2024

Check Point® Software Technologies Ltd. has been recognized as a Leader in the 2024 Gartner® Magic Quadrant™ for Email Security Platforms (ESP).

Progress ShareFile Selected as Latest Addition to American Institute of CPAs Member Discount Program, Offering Benefits for AICPA Members

December 19, 2024

Progress announced its partnership with the American Institute of CPAs (AICPA), the world’s largest member association representing the CPA profession.

Kurrent Enterprise Edition Released

December 18, 2024

Kurrent announced $12 million in funding, its rebrand from Event Store and the official launch of Kurrent Enterprise Edition, now commercially available.

Blitzy Platform Released

December 18, 2024

Blitzy announced the launch of the Blitzy Platform, a category-defining agentic platform that accelerates software development for enterprises by autonomously batch building up to 80% of software applications.

Sonata Software Launches IntellQA

December 17, 2024

Sonata Software launched IntellQA, a Harmoni.AI powered testing automation and acceleration platform designed to transform software delivery for global enterprises.

Sonar to Acquire Tidelift

December 17, 2024

Sonar signed a definitive agreement to acquire Tidelift, a provider of software supply chain security solutions that help organizations manage the risk of open source software.

Kindo Launches Channel Partner Program

December 17, 2024

Kindo formally launched its channel partner program.

Red Hat Enterprise Linux AI 1.3 Released

December 16, 2024

Red Hat announced the latest release of Red Hat Enterprise Linux AI (RHEL AI), Red Hat’s foundation model platform for more seamlessly developing, testing and running generative artificial intelligence (gen AI) models for enterprise applications.

Fastly AI Accelerator Released

December 16, 2024

Fastly announced the general availability of Fastly AI Accelerator.

Amazon Q Developer Plugins for AWS Management Console Released

December 12, 2024

Amazon Web Services (AWS) announced the launch and general availability of Amazon Q Developer plugins for Datadog and Wiz in the AWS Management Console.

vFunction Releases New Functionality

December 12, 2024

vFunction released new capabilities that solve a major microservices headache for development teams – keeping documentation current as systems evolve – and make it simpler to manage and remediate tech debt.

Check Point Infinity XDR/XPR Achieves 100% Detection Rate in 2024 MITRE ATT&CK Evaluations

December 11, 2024

Check Point® Software Technologies Ltd. announced that Infinity XDR/XPR achieved a 100% detection rate in the rigorous 2024 MITRE ATT&CK® Evaluations.

DEVOPSdigest
