UltimumAI

How to reduce AI chat costs in UltimumAI

2 minute read

presentation.pdf

Summarize this document.

AI

Digitalization increases efficiency, and globalization brings new opportunities.

response price: $0.0035 (on average)

The price of one AI response is:

size of your newest message πŸ’¬

+

size of previous chat content

+

amount of thinking 🧠

+

size of generated response πŸ€–

AI does not remember. Every time you send a message, it reads the whole chat from the beginning. That costs!

1. The longer the chat, the more expensive responses!

2. The more content your messages have, the more expensive responses!

3. The bigger the response, the more expensive it is!

Here are some money saving tips

Start new chats often βœ‚οΈ

Start a new chat instead of extending the old one. If you must, use the tactics described below.

Send messages quickly in a row ⚑

If you send a new message shortly after the previous one, the cost of re-reading previous chat content will be 50% – 90% lower.
For Claude AI models, you need to manually enable this.

Reduce memory 🧠

Often it's enough for the model to look at only the last few messages instead of the entire conversation. This allows you to have infinite conversations without increasing the cost.

Only for Claude models: enable auto caching πŸ’Ύ

  • If you send a new message within 5 minutes of the previous one, re-reading the chat costs 90% less.
  • If you wait longer, it costs 25% more.Β Β 
  • Your first message in chat is always 25% more expensive (since it creates the cache).

Edit message instead of sending a new one ✏️

When you're not satisfied with the received response, edit your message and regenerate it. This will prevent the conversation from growing.

Use cheaper AI model πŸ‘Ά

Consider whether you really need the strongest model for a particular task. In UltimumAI, you can change the model mid-conversation.

Additional TipsΒ Β 

Maximize use of official apps πŸ’»

Some models can be used for free in official applications. Make the most of them, and jump into UltimumAI when you encounter limitations.

Use english and latin alphabet πŸ’¬

English uses the least tokens and money. For example, Arabic is 3x more expensive (try it here). Tell the model to always respond in English, and try your best to do the same.

Set system instructions to have shorter responses βš™οΈ

The longer the response, the higher the cost. For example, you can write 'respond briefly and directly' or 'respond with code only'.

Ask multiple questions in the same message πŸ”’

Don't send multiple messages in a row with one question.

Create a conversation summary πŸ“

Tell a cheap/free model to create a summary of the conversation. Then copy that into a new conversation and continue there.

Turn off unnecessary tools πŸ› οΈ

Even if the model does not use a certain tool, the message price will be higher just because it is enabled.

Monitor message costs πŸ’Έ

Below each message you receive, UltimumAI shows you how much it cost. Watch it!
Images provided by Freepik