This chapter is a synthesis of a webinar presented by Ajit Ghuman and Sundeep Teki (sundeepteki.org). All credit due to Sundeep Teki for the work done in the ‘The Impact of Technology Choice for GenAI Costs’ section of this chapter.
Pricing in GenAI development represents a step change from traditional SaaS, largely due to two significant factors: value and cost. Unlike traditional software, where value is predictable and deterministic, AI models deliver probabilistic outputs. This makes estimating and demonstrating AI value a challenge, as the results are not guaranteed to follow a consistent pattern.
At the same time, the cost of developing these models is substantial. Whether it's the computing power required to train a model like GPT-4o or the infrastructure to support ongoing operations, costs remain high and do not diminish over time as they might in traditional SaaS models. These two factors, uncertain value and high costs, make pricing GenAI products more complex and demanding. Let’s break it down in a way that makes sense.
Let’s talk about what it really takes to build GenAI models. First off, it's not cheap. We’re not just talking about a few million dollars here—training a model like GPT-4o or Llama 3.1 can cost anywhere from $100 million to $1 billion. These numbers might seem out of this world, but they’re real. They reflect the sheer amount of computing power, data, and infrastructure that goes into these models.
Now, imagine trying to figure out the value of what you’re building. In Figure 1 you will see that one of the biggest hurdles in implementing GenAI is the challenge of estimating and demonstrating AI value. With traditional SaaS software, the value is often simple: what you see is what you get (WYSIWYG), and the output is deterministic. The software either automates a business process or it doesn’t. But with GenAI, the situation is different. The output of AI products (generated text, classification, next action, etc.) is probabilistic, meaning there’s an element of uncertainty involved.
This means that some AI products might deliver outsized value, but others will be total duds, both requiring similar capital outlays – almost like blockbuster movie production. This uneven distribution of value creates a risky environment for companies investing in AI.
This brings us to a key difference between regular SaaS and AI-first SaaS. Figure 2 lays it out. In traditional SaaS, making a profit is simple: it’s what you earn minus what you spend (minimal hosting fees, etc.). And as your business grows, your costs usually go down per customer, which means more profit for you.
But AI-first SaaS, that’s different. Here, the costs can be quite high at setup and on an ongoing basis (not the same as application layer software). Regarding setup costs, training AI models can be incredibly expensive. For example, training GPT-4o cost around $100 million, and even BloombergGPT cost $10 million. If you’re thinking about creating a custom model, you’re looking at starting costs of $2 to $3 million with OpenAI, just to get things going. And the catch here is that the revenue you make isn’t necessarily steady because, as we just discussed, the value can be hit or miss.
An example of AI delivering outsized value.
Earlier in 2024, Klarna conducted an analysis of its AI-driven customer service operations, and the results were impressive. By integrating AI-powered agents, Klarna was able to handle a significant portion of their customer interactions with minimal human intervention. Specifically, the AI system managed 2.3 million conversations, representing two-thirds of all incoming customer service chats.
This AI system did the work of approximately 700 full-time human agents, significantly reducing operational costs and improving efficiency. Moreover, the implementation of AI resulted in an estimated $40 million USD in profit improvements for Klarna in 2024. This kind of outcome highlights the potential value AI can bring to a business when it is implemented effectively.
Earlier in the book we introduced a 5-step framework for pricing software products. Within this framework, when pricing AI products, we now need to pay keen attention to the packaging and pricing metric decisions, as they become somewhat more complex.
The core principles of packaging AI products do not change (they are covered earlier in the book and need not be repeated). However, we do want to discuss the treatment of AI products that are often introduced as features within existing packaging lineups. For this, it is useful to consider the rubric below.
The next critical step is determining the pricing metric. This decision will shape how the AI product generates revenue and aligns with the broader business model.
There are different pricing options to consider. On the fixed side of the spectrum, as you see in the below Figure 5, traditional models like on-premises licenses or named user licenses offer predictable and stable pricing. These options are common in many SaaS models because they provide consistency and are easy to manage.
As you move toward more flexible pricing, consumption-based models start to appear. Companies like Amazon, for example, charge based on actual usage, whether it's for services like EC2 or S3. Similarly, OpenAI uses a token-based model where pricing varies depending on how much the system is used. These pricing structures offer flexibility but come with less predictability.
Choosing the right pricing metric is an art, and it is further complicated by the fact that AI products have high running costs. These costs can make traditional user-based pricing models a much harder selection, because a single heavy user can drive up your costs enough for the economics to stop making sense.
However, choosing the wrong usage- or consumption-based metric can also be problematic. Suppose you are using an AI product: whether you use it 60 times or 600 times over a few days may have no tangible impact on your success. Yet a 10x usage-based invoice, when you have not necessarily succeeded, is not going to make you a happy customer.
So what truly matters here is selecting a pricing metric that benefits both parties.
A look at Zendesk’s recent moves
Zendesk has recently started offering AI-powered agents that handle automated resolutions. In each plan, they provide a certain number of automated resolutions per agent each month. Beyond that limit, Zendesk charges up to $2 per additional automated resolution per month, or it can go as low as $1 if you buy a bundle.
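The mechanics of this kind of plan are easy to sketch. The allowance and usage numbers below are hypothetical (Zendesk's actual included quantities vary by plan); only the up-to-$2 overage rate comes from the text above:

```python
# Sketch of a Zendesk-style automated-resolution bill. The per-agent
# allowance and the usage figures are hypothetical; the $2.00 overage
# rate is the list price mentioned above.

def monthly_ai_bill(agents, resolutions, included_per_agent, overage_rate):
    """Return the overage charge for automated resolutions in one month."""
    included = agents * included_per_agent
    extra = max(0, resolutions - included)
    return extra * overage_rate

# 50 agents with a hypothetical allowance of 10 resolutions each (500 total),
# 800 actual automated resolutions, billed at $2.00 per extra resolution.
print(monthly_ai_bill(50, 800, 10, 2.00))  # 300 extras -> 600.0
```

Note how the bill is entirely driven by the definition of a "resolution": if the vendor and the customer count resolutions differently, the invoice becomes a point of contention, which is exactly the risk discussed below.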
Now, Zendesk has chosen “Automated Resolution per agent per month” as its pricing metric. This is a foray into what is called “outcome-based pricing”. The metric can be tricky because it may not have a consistent definition across the customer base: Zendesk might define a resolution one way, while customers have their own understanding of what counts. It will likely take time for both Zendesk and its customers to fully agree on what this metric means. Resolutions are also unpredictable; some are very complex while others are easy, and this affects the cost dynamics. In more complex service environments, Zendesk could actually lose money on those customers.
In addition, Zendesk charges separately for an “Advanced AI” feature on a per-agent basis, available as an add-on. If you go back to our packaging rubric, note which features are offered as add-ons and which are included in all plans; this tells you a lot about their likely willingness to pay and popularity across Zendesk’s customer base.
How would you have approached the Zendesk case if it were up to you? It would necessarily involve creating pricing metric candidate options and weighing them against each other.
It seems Zendesk has chosen the most aligned metric, even though it comes with a lot of customer evangelism, instrumentation challenges and cost unpredictability.
Like in all things pricing, there is no wrong or right. Sometimes you just have to take a bet on something and be willing to listen to the market.
Now that we have considered some aspects of the packaging and pricing metric decisions, let us truly get a handle on GenAI costs and technology choices that affect the economics of these products. This is the biggest wildcard with these types of products and profoundly influences the pricing metric and packaging decision.
A few years ago, closed-source AI models were clearly ahead of open-source options. Back in 2022, if you needed top performance, you would go with a closed-source model because they were about 25% better. The comparison is typically based on benchmarks like SuperGLUE and MMLU (Massive Multitask Language Understanding), which measure a model’s accuracy and capability across a wide range of tasks, including language understanding and reasoning; higher accuracy indicates better performance. But by 2024, that gap had almost disappeared. Open-source models like Llama 3.1 now perform almost as well as GPT-4o on many tasks, reducing the performance gap to nearly zero.
If you take a look at Figure 8, you’ll see how Llama 3.1 has caught up with GPT-4o. This isn’t just a small improvement; it shows how open-source models are getting better. The performance comparisons are typically measured through evaluations like token generation speed and accuracy across diverse datasets, with results showing that the differences are now minimal.
The quality of open-source models has also improved. There used to be a clear difference, but that gap is nearly gone now. Open-source models like Llama 3.1 and Mixtral 8x22B are offering quality that is on par with the best closed-source models. This improvement is measured through tests that calculate performance across multiple tasks, focusing on accuracy, speed, and efficiency. Now, if you look at Figure 9, you’ll notice that the difference in quality between closed-source and open-source models has almost disappeared. This shows just how far open-source models have come in a few years.
The reduction in the performance gap, once around 25%, has played a significant role in this transformation. Open-source models like Mixtral from the French startup Mistral are contributing to this trend, pushing the boundaries and more closely matching their closed-source counterparts.
One of the reasons for this change is the ability to fine-tune models like Llama 3.1. Businesses no longer have to rely on general-purpose models like GPT-4o; they can customize models like Llama 3.1 to fit their specific needs, which can lead to better performance in certain areas.
This brings up an important point about cost. Open-source models aren’t just catching up in quality; they are doing it at a lower cost.
As shown in Figure 10, some open-source models, even though they cost less, deliver performance that’s as good as or better than the more expensive closed-source models. This is important for businesses that need strong AI but also want to keep costs in check.
So, what does all of this mean for you? It means you have more choices now. You don’t have to rely on expensive closed-source models to get the results you need. Open-source models have caught up, and with the right fine-tuning, they can be just as effective, depending on what you need. This gives you the flexibility to choose a solution that meets your needs and helps you manage costs at the same time.
The cost of using GenAI products is falling.
If you check Figure 11, you will see how these savings have made AI more accessible. You’ll notice the breakdown of input and output costs across different models. The chart shows how the cost of inference has dropped dramatically over the last couple of years. Two years ago, you might have been paying around $50 for every million tokens processed by a model like GPT-4o. Now, that cost has fallen to about $0.50 per million tokens. That’s a 100-fold reduction in cost. This reduction in inference costs is a key factor for businesses looking to scale AI applications.
This price drop isn’t just limited to closed-source models. Open-source models like Llama 3.1, Mixtral 8x22B, and others have also seen a similar decrease.
Real-World Cost Modeling for Summarizing Customer Service Calls
Let us look at a very specific use case of customer service call summarization.
Imagine a B2C company that has 100 customer service agents. These agents handle a lot of calls—about 100 calls each every day, with each call lasting around 5 minutes. Over a year, this adds up to about 3.65 million calls. That’s a huge amount of information to manage!
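The volume arithmetic above is straightforward to check:

```python
# Annual call volume for the example B2C company described above.
agents = 100
calls_per_agent_per_day = 100
days_per_year = 365

annual_calls = agents * calls_per_agent_per_day * days_per_year
print(annual_calls)  # 3650000, i.e., about 3.65 million calls per year
```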
To help with this, the company uses ACME company’s advanced AI tools. These tools cost $50 per agent every month and are designed to make the agents’ jobs easier. The AI can do things like summarize customer calls, change the tone of responses to be more friendly or formal, and generally help agents respond faster and more effectively.
But providing this AI offering isn’t free, and ACME needs to think carefully about the costs. One option is to use a popular AI model like GPT-4o, which was the best available when it launched in May 2024. However, GPT-4o is expensive. If this company uses GPT-4o to summarize all 3.65 million calls each year, it would cost them around $54,713 – just for this one B2C company!
But there’s another option—using an open-source model like Llama 3.1. This model can be much cheaper. For example, using the smaller Llama 3.1 8B model might only cost about $1,155 a year, and the larger Llama 3.1 70B model would cost around $5,412. That’s a big difference in price!
Breaking Down the Costs
Let’s talk a bit about how these costs are calculated. When the AI summarizes a call, it processes words and turns them into “tokens.” Each 5-minute call has about 750 words, which equals around 1,000 tokens. Then, the AI summarizes this into a shorter text, which is about 500 words, or around 666 tokens.
Computation of Input Tokens
Computation of Output Tokens
These tokens are important because they determine how much it costs to use the AI. The more tokens the AI processes, the higher the cost.
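The word-to-token conversion above follows the common rule of thumb that one token is roughly 0.75 English words, which is what makes 750 words come out to about 1,000 tokens and 500 words to about 666:

```python
# Token math for one 5-minute call, using the rule of thumb that
# one token is roughly 0.75 English words.
WORDS_PER_TOKEN = 0.75

def words_to_tokens(words):
    """Approximate token count for a given word count (truncated)."""
    return int(words / WORDS_PER_TOKEN)

input_tokens_per_call = words_to_tokens(750)   # transcript: ~750 words
output_tokens_per_call = words_to_tokens(500)  # summary: ~500 words

print(input_tokens_per_call, output_tokens_per_call)  # 1000 666
```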
Let’s look at different AI models and how much they would cost for 3.65 million calls each year:
As you can see, the cost varies significantly depending on which model you choose. For instance, using GPT-4o to summarize all 3.65 million calls would cost around $54,713 per year, while the smaller Llama 3.1 8B model would only cost around $1,155. ACME had better think carefully about which model it goes to market with.
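These totals can be reproduced from the token volumes. The per-million-token rates below are assumptions back-solved to be consistent with the chapter's figures, not official price lists; check your provider's current pricing before relying on them:

```python
# Annual summarization cost per model. The per-million-token rates are
# ASSUMED values chosen to match the chapter's totals, not vendor quotes.
ANNUAL_CALLS = 3_650_000
INPUT_TOKENS_PER_CALL = 1_000   # ~750-word transcript
OUTPUT_TOKENS_PER_CALL = 666    # ~500-word summary

# (input $/M tokens, output $/M tokens) -- assumed rates
RATES = {
    "GPT-4o":        (5.00, 15.00),
    "Llama 3.1 8B":  (0.19, 0.19),
    "Llama 3.1 70B": (0.89, 0.89),
}

def annual_cost(input_rate, output_rate):
    input_millions = ANNUAL_CALLS * INPUT_TOKENS_PER_CALL / 1e6
    output_millions = ANNUAL_CALLS * OUTPUT_TOKENS_PER_CALL / 1e6
    return input_millions * input_rate + output_millions * output_rate

for model, (inp, out) in RATES.items():
    print(f"{model}: ${annual_cost(inp, out):,.2f}")
```

Under these assumed rates, GPT-4o comes to roughly $54,713, Llama 3.1 8B to roughly $1,155, and Llama 3.1 70B to roughly $5,412 per year, matching the figures above.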
How Costs Grow with More Clients
As the ACME company grows, its number of clients increases.
The key question here is how these costs scale when you use different AI models, such as closed-source models like GPT-4o versus open-source models like Llama 3.1.
Let’s break it down.
With a closed-source API like GPT-4o, costs increase in direct proportion to the number of clients: more clients mean more calls handled and more tokens processed, so overall costs rise linearly.
On the other hand, if you use an open-source model like Llama 3.1, the upfront costs may be higher because you have to invest in setting up the model. However, as your client base grows, the additional costs do not rise as quickly as they do with closed-source models. After the initial setup, adding more clients doesn’t increase costs at the same rate because you’ve already made the necessary investments in training and infrastructure.
Here’s how the costs compare across different AI models as the company scales:
Let’s say you start with 10 clients. If you’re using GPT-4o, the annual cost will be around $550,000. In contrast, using an open-source model like Llama 3.1 8B costs significantly less—about $377,000. If you choose to train your own model, the cost is higher at $647,000 due to the additional investment in infrastructure and training.
As your business grows to 10,000 clients, GPT-4o becomes extremely expensive, costing around $550 million per year. But if you stick with the open-source model, your costs stay much lower at $12.36 million. Training your own model also stays at $12.36 million once the setup is done. So, as your business scales, the cost differences between closed-source and open-source models become much larger, making open-source models the more affordable option.
So why is this happening?
With open-source models or custom models, you spend more upfront to get things ready. But once everything is set up, the cost of adding more clients stays low. On the other hand, with a closed-source API like GPT-4o, every new client adds more to your costs.
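This trade-off is a classic fixed-versus-variable cost model: the closed API has no upfront cost but a high per-client cost, while the self-hosted model has an upfront setup cost and a low per-client cost. The sketch below finds the client count at which the self-hosted option becomes cheaper; the per-client and setup figures are illustrative values loosely based on the example above, not vendor quotes:

```python
# Fixed-plus-variable cost model for choosing between a closed-source
# API and a self-hosted open-source model. All numbers are illustrative.

def total_cost(clients, fixed, per_client):
    """Annual cost: one-time/amortized fixed cost plus per-client cost."""
    return fixed + clients * per_client

def break_even_clients(fixed_a, var_a, fixed_b, var_b):
    """Smallest client count at which option B becomes cheaper than A."""
    n = 0
    while total_cost(n, fixed_b, var_b) >= total_cost(n, fixed_a, var_a):
        n += 1
    return n

# Option A -- closed API: $0 fixed, ~$55,000 per client per year.
# Option B -- self-hosted: ~$365,000 setup, ~$1,200 per client per year.
print(break_even_clients(0, 55_000, 365_000, 1_200))  # -> 7
```

With these illustrative numbers, the self-hosted option pays for itself from the seventh client onward, which is why the gap widens so dramatically by the time you reach 10,000 clients.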
This difference can be a big deal if you have a lot of clients or plan to grow quickly. Open-source and custom models might need more work at the start, but they help keep costs down as you get more clients. This makes them a good choice if you’re thinking long-term and want to grow without letting costs get out of control.
Breaking Down the Cost Components
Now that the costs of scaling have been discussed, it's important to understand the factors that make up the cost calculations. Below you will see a comparison of the costs involved in using a closed-source model (like GPT-4o), customizing an open-source model (like Llama 3.1), and training your own model from scratch.
Let’s explain these costs further:
As you can now see, technology choices impact the commercialization of a GenAI product and consequently how you package and price it.
A poor tech decision could seriously impact the cost of service, resulting in pricing models that fail in the market. And a good or bad decision is entirely context dependent. Closed source models might be great to prototype but for financial viability may need to be cutover to open source models as you scale.
Pricing metric decisions themselves are bringing both tech companies and customers to a brave new world where their prior mental anchors are going to be re-examined. It is indeed a fun time to be in tech.
To succeed with GenAI, it's essential to grasp the economics and pricing involved. The choices you make about which model to use, how you manage costs, and how you set your prices will make or break your product. No pressure!