More than 40% of marketing, sales and customer service organizations have adopted generative AI, second only to IT and cybersecurity. Within those functions, conversational AI is expected to be the fastest-growing of all AI technologies because of its ability to bridge the current communication gap between businesses and their customers.
But many of the marketing business leaders I have met are stuck at a crossroads over how to implement the technology. They don't know which of the available large language models (LLMs) to choose, or whether to go open source or closed source. They are worried about spending too much money on a new and unproven technology.
Of course, businesses can buy off-the-shelf conversational AI tools, but if the technology is to become a core part of the business, it is worth developing in-house.
To help ease the fears of those who choose to build, I wanted to share some of the internal research my team and I conducted into finding the best LLM for building conversational AI. We spent time looking at different LLM providers and what you can expect to pay for each, based on their inherent costs and the kind of usage you expect from your target audience.
We decided to compare GPT-4o (OpenAI) and Llama 3 (Meta). These are the two main LLMs most companies will weigh against each other, and we consider them to be the highest-quality models available. Comparing them also lets us contrast a closed source LLM (GPT) with an open source one (Llama).
How do you calculate the cost of an LLM for conversational AI?
The two main financial factors to consider when choosing an LLM are set-up costs and final processing costs.
The setup cost includes everything needed to get the LLM up and running toward its end goal, including development and operational costs. The processing cost is the actual cost of each conversation once the tool is live.
How the cost-to-value ratio of setup shakes out depends on what you will be using the LLM for and how heavily you will use it. If you need to deploy your product as quickly as possible, it may be worth paying a premium for a model that requires little or no setup, like GPT-4o. Llama 3 can take weeks to set up, by which time you could already have fine-tuned and launched a GPT-based product.
However, if you plan to manage a large number of clients or want more control over your LLM, it may be worth taking on the higher setup costs up front and reaping greater benefits later.
Let’s look at token usage relative to the cost of processing conversations, as this allows for the most direct comparison. LLMs such as GPT-4o and Llama 3 use a basic metric called “tokens”: the text units these models process as inputs and outputs. There is no universal standard for how tokens are defined across different LLMs; some count tokens per word, per subword, per character or in other variations.
All of these factors make it difficult to directly compare LLMs apples to apples, but we approximated this by simplifying the implicit costs of each model as much as possible.
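To make the token math concrete, here is a small sketch that uses OpenAI's tiktoken library to count tokens under the o200k_base encoding GPT-4o uses (assuming a recent tiktoken release that includes it). Other models tokenize the same text differently, so treat this as illustrative rather than a universal measure.

```python
# pip install tiktoken
import tiktoken

# GPT-4o uses the o200k_base encoding; other LLMs tokenize differently,
# so counts like these are only comparable within one model family.
encoding = tiktoken.get_encoding("o200k_base")

text = ("Conversational AI can bridge the communication gap "
        "between businesses and their customers.")
tokens = encoding.encode(text)

print(f"{len(text)} characters -> {len(tokens)} tokens")
```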
We found that while GPT-4o is cheaper in terms of initial cost, Llama 3 becomes far more cost-effective at scale over time. Let's start with the setup considerations and see why.
What is the basic cost of each LLM?
Before looking at each LLM's cost per conversation, you need to understand what it costs to get to that point.
GPT-4o is a closed source model hosted by OpenAI. Because of this, you only need to set up your tool to ping GPT’s infrastructure and data library via a simple API call. Minimal setup is required.
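For a sense of how lightweight this is, here is a minimal sketch using the official openai Python SDK. The prompt is illustrative, and the usage fields printed at the end are what you are actually billed on.

```python
# pip install openai  (expects OPENAI_API_KEY in the environment)
from openai import OpenAI

client = OpenAI()

# A single chat turn against the hosted GPT-4o endpoint -- no servers to manage.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful customer-service assistant."},
        {"role": "user", "content": "What are your support hours?"},
    ],
)

print(response.choices[0].message.content)
# Billing is based on these token counts (input and output).
print(response.usage.prompt_tokens, response.usage.completion_tokens)
```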
Llama 3, on the other hand, is an open source model that must be hosted on your own private server or a cloud infrastructure provider. You can download the model components for free, then it’s up to you to find a host.
Hosting costs are a consideration here. Unless you buy your own servers, which is relatively rare when starting out, you will have to pay a cloud provider to use its infrastructure, and each provider may have a different pricing structure.
Most hosting providers “rent” instances and charge for compute capacity by the hour or second. For example, AWS’s ml.g5.12xlarge instance charges per server hour. Other providers may bundle usage into different packages and charge a flat annual or monthly fee based on various factors such as storage requirements.
However, the provider Amazon Bedrock charges based on the number of tokens processed, so it can be a cost-effective solution for businesses even with low usage. Bedrock simplifies LLM deployment by handling the underlying infrastructure with AWS’s managed serverless platform.
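To see how per-hour and per-token pricing compare, here is a rough back-of-the-envelope sketch. The $7.00 hourly instance rate and the conversation volume are placeholder assumptions, not quoted prices; the per-token rates and benchmark conversation size are the ones used later in this article.

```python
# Rough break-even sketch: hourly instance rental vs. per-token (Bedrock-style) pricing.
# Every number below except the Bedrock token prices is a placeholder assumption.

HOURS_PER_MONTH = 730

# Assumption: an always-on GPU instance at an illustrative $7.00/hour.
instance_hourly_rate = 7.00
instance_monthly_cost = instance_hourly_rate * HOURS_PER_MONTH

# Llama 3-70B on Amazon Bedrock: price per 1,000 input/output tokens.
price_in, price_out = 0.00265, 0.00350
tokens_in, tokens_out = 29_920, 470  # the benchmark conversation defined below
cost_per_conversation = tokens_in / 1000 * price_in + tokens_out / 1000 * price_out

# Below this monthly volume, paying per token is cheaper than renting the instance.
break_even = instance_monthly_cost / cost_per_conversation
print(f"Instance: ${instance_monthly_cost:,.0f}/month; "
      f"Bedrock: ${cost_per_conversation:.4f}/conversation; "
      f"break-even ~ {break_even:,.0f} conversations/month")
```

Below the break-even volume, a pay-per-token option such as Bedrock is the cheaper route; above it, a dedicated instance starts to pay for itself.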
Beyond the direct costs, running conversational AI on Llama 3 requires far more time and money for operations, including the initial selection and setup of servers or serverless options and ongoing maintenance. It also means higher development costs, such as building error-logging tools and system alerts for any issues that may arise with the LLM servers.
Key factors to consider when calculating your baseline cost-to-value ratio are time to deployment, level of product usage (if you're handling millions of conversations per month, setup costs are quickly offset by the savings on each one) and the level of control you need over your product and data (where an open source model is the better fit).
How much does each major LLM cost per conversation?
Now we can look at the base cost of each conversation unit.
For our modeling, we used an empirical method: 1,000 words = 7,515 characters = 1,870 tokens.
We assumed that the average consumer conversation between AI and human was 16 messages total. This equates to an input of 29,920 tokens and an output of 470 tokens, or a total of 30,390 tokens (the input is much higher due to prompt rules and logic).
GPT-4o pricing is $0.005 per 1,000 input tokens and $0.015 per 1,000 output tokens, which puts the "benchmark" conversation at roughly $0.16.
| GPT-4o input/output | Number of tokens | Price per 1,000 tokens | Cost |
|---|---|---|---|
| Input tokens | 29,920 | $0.00500 | $0.14960 |
| Output tokens | 470 | $0.01500 | $0.00705 |
| Total cost per conversation | | | $0.15665 |
For Llama 3-70B on AWS Bedrock, pricing is $0.00265 per 1,000 input tokens and $0.00350 per 1,000 output tokens, which puts the "benchmark" conversation at roughly $0.08.
| Llama 3-70B input/output | Number of tokens | Price per 1,000 tokens | Cost |
|---|---|---|---|
| Input tokens | 29,920 | $0.00265 | $0.07929 |
| Output tokens | 470 | $0.00350 | $0.00165 |
| Total cost per conversation | | | $0.08093 |
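The per-conversation figures in both tables can be reproduced with a few lines of arithmetic; this sketch simply applies each provider's published per-1,000-token rates to the benchmark conversation defined above.

```python
# Reproduce the per-conversation costs from the two tables above.
TOKENS_IN, TOKENS_OUT = 29_920, 470  # benchmark conversation (16 messages)

# Price per 1,000 tokens: (input, output)
PRICING = {
    "GPT-4o (OpenAI)": (0.00500, 0.01500),
    "Llama 3-70B (AWS Bedrock)": (0.00265, 0.00350),
}

for model, (price_in, price_out) in PRICING.items():
    cost = TOKENS_IN / 1000 * price_in + TOKENS_OUT / 1000 * price_out
    print(f"{model}: ${cost:.5f} per conversation")

# GPT-4o (OpenAI): $0.15665 per conversation
# Llama 3-70B (AWS Bedrock): $0.08093 per conversation
```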
In summary, once both models are fully set up, the cost of running a conversation on Llama 3 is almost 50% cheaper than running the same conversation on GPT-4o. However, all server costs must be added to the Llama 3 calculation.
Please keep in mind that this is only a snapshot of the overall cost of each LLM. Many other variables come into play when building a product to fit your unique needs, such as whether you use a multi-prompt approach or a single-prompt approach.
For businesses looking to use conversational AI as a supplementary service rather than a fundamental element of their brand, building AI in-house may not be worth the time and effort compared with the quality you can get from off-the-shelf products.
Whichever path you choose, integrating conversational AI can be incredibly useful—just make sure it’s always in line with your company’s context and your customers’ needs.
Sam Oliver is a Scottish technology entrepreneur and serial startup founder.