I’ve been involved in technology for a long time, so there’s little that excites me, much less surprises me. But shortly after OpenAI’s ChatGPT was released, I asked it to write a WordPress plugin for my wife’s e-commerce site. When it worked, I was blown away.
That was the beginning of my deep dive into chatbots and AI-assisted programming. Since then, I have put 10 large language models (LLMs) through four real-world tests.
Unfortunately, not all chatbots are created equal. It’s been 18 months since my first test, and even now, five of the 10 LLMs I tested still can’t create a working plugin.
In this article, I will show you how each LLM performed in my tests. There are two chatbots that I recommend, but they cost $20 a month. The free versions of the same chatbots work well enough that you can probably get by without paying. The rest, free or paid, are not that good. I would not risk a programming project on them until their performance improves, and I do not recommend them.
Also: How to Test Your AI Chatbot’s Coding Skills – You Can Do It Too
I’ve written a lot about using AI to help with programming. AI can’t write entire apps or programs unless it’s a small, simple project like my wife’s plugin. But it’s good at writing a few lines and not bad at editing code.
Rather than repeating everything I’ve written, read this: How to write code using ChatGPT: What ChatGPT can and cannot do.
If you want to know why I chose a coding test and how it relates to this review of 10 LLMs, read this: How I Tested My AI Chatbot’s Coding Skills – So Can You, Too.
First, let’s compare the performance of chatbots.
Next, let’s look at each chatbot individually. The chart above shows 10 LLMs, but I will discuss 9 chatbots. The results for both GPT-4 and GPT-4o are included in ChatGPT Plus. Ready? Let’s get started.
ChatGPT Plus
- All tests passed
- Solid coding results
- Mac app
- Hallucinations
- No Windows app yet
- Can be uncooperative at times
- Price: $20/month
- LLM: GPT-4o, GPT-4, GPT-3.5
- Desktop browser interface: Yes
- Dedicated Mac app: Yes
- Dedicated Windows app: No
- Multi-factor authentication: Yes
- Tests passed: 4/4
ChatGPT Plus with GPT-4 and GPT-4o passed all my tests. One of my favorite features is that it has a dedicated app. When testing web programming, I keep my browser and my IDE open and run the ChatGPT Mac app on a separate screen.
Also: I used GPT-4o on coding tests and it passed all tests except for one weird result.
Additionally, Logitech’s Prompt Builder, which pops up when you press a mouse button, can be set up to use the upgraded GPT-4o and linked to your OpenAI account, so you can run a prompt with a simple tap of your thumb.
The only thing I didn’t like was that one of the GPT-4o tests returned two alternative answers, and one of them was wrong. I would have preferred that it simply give me the correct answer. A quick test showed which answer worked, but having to check was a bit annoying. GPT-4 didn’t have that problem, so that is the LLM setting I now use when coding in ChatGPT.
Perplexity Pro
- Multiple LLMs
- Search criteria are displayed
- Good sourcing
- Email-only login
- No desktop app
- Price: $20/month
- LLM: GPT-4o, Claude 3.5 Sonnet, Sonar Large, Claude 3 Opus, Llama 3.1 405B
- Desktop browser interface: Yes
- Dedicated Mac app: No
- Dedicated Windows app: No
- Multi-factor authentication: No
- Tests passed: 4/4
I seriously considered Perplexity Pro as the best overall AI chatbot for coding, but it missed out on the top spot because of one thing: the login method. Perplexity doesn’t use usernames and passwords or passkeys, and there’s no multi-factor authentication. It just emails you a login PIN. It also doesn’t have a separate desktop app like ChatGPT does on the Mac.
What sets Perplexity apart from other tools is that it can run multiple LLMs. You can’t set up an LLM for a given session, but you can easily go into settings and select the active model.
Also: Can Perplexity Pro help you with coding? It got a perfect score on my programming tests, thanks to GPT-4.
For programming, I recommend sticking with GPT-4o, which passed all my tests. However, it can also be interesting to cross-check your code with different LLMs. For example, if GPT-4o has written a regular expression for you, you might switch to another LLM and see what it thinks of the generated code.
As you can see below, most LLMs are not reliable, so don’t take the results as gospel. However, you can use the results to get more information to verify the original code. It’s like a kind of AI-based code review.
Don’t forget to switch back to GPT-4o.
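Whichever LLM writes or reviews a regular expression, it is worth running it against a few known inputs yourself before trusting it. The snippet below is a minimal sketch of that kind of check in Python. The pattern, the sample inputs, and the `check_pattern` helper are all hypothetical stand-ins for illustration; they are not taken from my tests.

```python
import re

# Hypothetical pattern, standing in for whatever the chatbot generated:
# a dollar amount with optional cents, such as "12" or "12.50".
CANDIDATE = r"^\d+(\.\d{1,2})?$"

# Illustrative inputs the pattern should accept and reject.
SHOULD_MATCH = ["0", "12", "12.5", "12.50"]
SHOULD_REJECT = ["", "12.", ".50", "12.505", "abc", "1,200"]


def check_pattern(pattern, accept, reject):
    """Return a list of (input, problem) pairs; an empty list means every case passed."""
    compiled = re.compile(pattern)
    failures = []
    for text in accept:
        if compiled.fullmatch(text) is None:
            failures.append((text, "should match but did not"))
    for text in reject:
        if compiled.fullmatch(text) is not None:
            failures.append((text, "should be rejected but matched"))
    return failures


if __name__ == "__main__":
    problems = check_pattern(CANDIDATE, SHOULD_MATCH, SHOULD_REJECT)
    if problems:
        for text, problem in problems:
            print(f"{text!r}: {problem}")
    else:
        print("All sample cases behaved as expected.")
```

If a second LLM proposes a different pattern, you can pass it to the same function with the same inputs and compare the failure lists, which gives you something firmer than either chatbot’s opinion.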
ChatGPT Free
- Prompt throttling
- Can cut you off in the middle of whatever you are doing
- Price: Free
- LLM: GPT-4o, GPT-3.5
- Desktop browser interface: Yes
- Dedicated Mac app: Yes
- Dedicated Windows app: No
- Multi-factor authentication: Yes
- Tests passed: 3 out of 4 in GPT-3.5 mode
ChatGPT is free for everyone to use. Both the Plus and free versions support GPT-4o, which passed all my programming tests, but there are limitations when using the free app.
OpenAI treats free ChatGPT users as if they are sitting in the cheap seats. When traffic is high or the servers are busy, free users only get GPT-3.5. The tool also allows only a certain number of queries before it downgrades you or cuts you off.
Also: How to Use ChatGPT: What You Need to Know Now
While using the free version of ChatGPT, I have been told several times that I had asked too many questions.
ChatGPT is a great tool, if you don’t mind being cut off sometimes. Even GPT-3.5 did better on my tests than all the other chatbots, and the only test it failed involved a fairly obscure programming tool made by an Australian programmer.
So if your budget is tight and you can wait when you get cut off, use ChatGPT for free.
Perplexity AI Free
- Free
- Passed most tests
- Various research tools
- Limited to GPT-3.5
- Throttles prompt results
- Price: Free
- LLM: GPT-3.5
- Desktop browser interface: Yes
- Dedicated Mac app: No
- Dedicated Windows app: No
- Multi-factor authentication: No
- Tests passed: 3 out of 4
I’m threading a very fine needle here, but because Perplexity AI’s free version is based on GPT-3.5, its test results were noticeably better than those of the other AI chatbots.
Also: 5 Reasons Why I Prefer Perplexity Over All Other AI Chatbots
From a programming standpoint, that’s pretty much it. But from a research and organizational standpoint, my ZDNET colleague Steven Vaughan-Nichols favors Perplexity over other AIs.
He likes how Perplexity provides more complete sources for research questions, cites sources, structures answers, and provides questions for further research.
So if you program but also do other research, consider the free version of Perplexity.
Chatbots to avoid for programming support
I tested 9 chatbots, and 4 passed most of my tests. The other chatbots, including a few that claimed to be great at programming, passed only one of my tests, and Microsoft’s Copilot did not pass any of them.
I’m mentioning them here because people will ask, and I’ve tested them thoroughly. Some of them work well for other tasks, so if you’re curious about how they perform, check out my more general reviews.
Meta AI
Meta AI is Facebook’s general-purpose AI. As you can see above, it failed three out of our four tests.
Also: How to get started with Meta AI on Facebook, Instagram, and more
The AI generated a nice user interface, but it had no functionality at all. It did find my annoying bug, which is a fairly serious challenge. Given the specific knowledge required to find that bug, I was surprised it choked on a simple regular expression challenge. But it did.
Meta Code Llama
Meta Code Llama is Facebook’s AI specifically designed for coding assistance. It can be downloaded and installed on your server. I tested it by running it on my Hugging Face AI instance.
Also: Can Meta AI write code? I tested it against Llama, Gemini, and ChatGPT. It wasn’t even close.
Oddly enough, Meta AI and Meta Code Llama each choked on three of my four tests, but they choked on different problems. There’s no guarantee that an AI will give the same answer twice, but this result was surprising. It remains to be seen whether this changes over time.
Claude 3.5 Sonnet
Anthropic claims that the 3.5 Sonnet version of its Claude AI chatbot is ideal for programming. After it failed all but one of my tests, I’m not so sure.
If you’re not using it for programming, Claude may be a better choice than the free version of ChatGPT.
Also: 4 Things Claude AI Can Do That ChatGPT Can’t
My ZDNET colleague Maria Diaz reports that Claude can process uploaded files, handle more words than the free version of ChatGPT, is about a year more up-to-date than GPT-3.5, and can access websites.
Gemini Advanced
Gemini Advanced is the $20 Pro version of Google’s Gemini (formerly Bard) chatbot. I expected this tool to do better than 1 out of 4. Interestingly, it passed one test that all the AIs except GPT-4/4o failed: knowledge of a fairly obscure programming language created by an Australian programmer.
Also: 3 Ways Gemini Advanced Outperforms Other AI Assistants, According to Google
So if it knows that language, why can’t it handle basic regular expressions or other problems that first-year programming students face?
Microsoft Copilot
You’d think a company whose DNA says “Developers! Developers! Developers!” would have an AI that performs better on programming tests. Microsoft produces the best coding tools on the planet. And yet Copilot sucks.
Also: What are the different Copilots from Microsoft? Here’s how they differ and how to use them.
One positive is that Microsoft always learns from its mistakes, so I’ll come back later and see if this result improves.
It’s just a matter of time.
My test results were quite surprising, especially considering the huge investment from Microsoft and Google. However, this area of innovation is improving at a tremendous rate, so I will be back with updated tests and results over time. Stay tuned.
Have you used any of these AI chatbots for programming? What was your experience? Let me know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.