How to reduce AI chat costs in UltimumAI

2 minute read

presentation.pdf

Summarize this document.

Digitalization increases efficiency, and globalization brings new opportunities.

response price: $0.0035 (on average)

The price of one AI response is:

size of your newest message 💬

size of previous chat content

amount of thinking 🧠

size of generated response 🤖

AI does not remember. Every time you send a message, it reads the whole chat from the beginning. That costs!

1. The longer the chat, the more expensive responses!

2. The more content your messages have, the more expensive responses!

3. The bigger the response, the more expensive it is!

Here are some money saving tips

Start new chats often ✂️

Start a new chat instead of extending the old one. If you must, use the tactics described below.

Send messages quickly in a row ⚡

If you send a new message shortly after the previous one, the cost of re-reading previous chat content will be 50% – 90% lower.
For Claude AI models, you need to manually enable this.

Reduce memory 🧠

Often it's enough for the model to look at only the last few messages instead of the entire conversation. This allows you to have infinite conversations without increasing the cost.

Only for Claude models: enable auto caching 💾

If you send a new message within 5 minutes of the previous one, re-reading the chat costs 90% less.
If you wait longer, it costs 25% more.
Your first message in chat is always 25% more expensive (since it creates the cache).

Edit message instead of sending a new one ✏️

When you're not satisfied with the received response, edit your message and regenerate it. This will prevent the conversation from growing.

Use cheaper AI model 👶

Consider whether you really need the strongest model for a particular task. In UltimumAI, you can change the model mid-conversation.

Additional Tips

Maximize use of official apps 💻

Some models can be used for free in official applications. Make the most of them, and jump into UltimumAI when you encounter limitations.

Use english and latin alphabet 💬

English uses the least tokens and money. For example, Arabic is 3x more expensive (try it here). Tell the model to always respond in English, and try your best to do the same.

Set system instructions to have shorter responses ⚙️

The longer the response, the higher the cost. For example, you can write 'respond briefly and directly' or 'respond with code only'.

Ask multiple questions in the same message 🔢

Don't send multiple messages in a row with one question.

Create a conversation summary 📝

Tell a cheap/free model to create a summary of the conversation. Then copy that into a new conversation and continue there.

Turn off unnecessary tools 🛠️

Even if the model does not use a certain tool, the message price will be higher just because it is enabled.

Monitor message costs 💸

Below each message you receive, UltimumAI shows you how much it cost. Watch it!

Go to chat