Best answer

How exactly does ChatGPT context window work with memory key?

  • 7 July 2023
  • 6 replies
  • 1640 views

I have a couple of questions about the ChatGPT functions. I’m trying to put together a chatbot that uses chatgpt.
 

  • What’s the difference between the user message and the assistant instructions?
  • Does the memory key store both the user message and the assistant instructions?
  • What if I create 1 zap with 1 prompt that is 1000 words, but the user message is dynamic based on what the person chatting with the chat bot says…
    • Will the assistant instructions be counted towards the context window (via the memory key) for every time the zap is triggered?

For example:

  • Our prospect sends 1st message.
  • We query chatgpt api via our zap with 1000 word assistant instructions.
  • Prospect responds to that message with their 2nd message.
  • We trigger the zap again for our 2nd response.

Would that add 2000 words to the context window? Or is the assistant instruction a one-time add to the context window?

Also:

Would both the assistant instruction and user message be entered basically into the chat log and chatgpt is simply told that one is different than the other?
If I had 2 zaps, 1 for the first message with the user message, and 2nd completely different assistant instruction, for the 2nd user message, would Chatgpt disregard the first assistant instruction or string it together with the 2nd assistant instruction (so they would build on top of each other).
So would Chatgpt basically have 2 dialogues going in its memory key for the conversation: 1 for the convo with the user and 2nd for the ongoing assistant instructions? Both of which it keeps the memory of?
(Assuming it is within the context window.)

icon

Best answer by hoon 14 July 2023, 03:01

View original

This post has been closed for comments. Please create a new post if you need help or have a question about this topic.

6 replies

@RefundPilot - Thanks for the details of your testing. That helped us look into how the integration is handling these types of messages/instructions with “too many” tokens. Below is a description of how things work currently, but they don’t necessarily represent how we think things should work.

Currently, for a model with a 4k-token limit, if a Zap runs when the Assistant Instructions has 2,400 tokens and a User Message with 3,000 token, the integration will only use the Assistant Instructions to generate a response from ChatGPT. The integration is prioritizing the Assistant Instructions over the User Message, and because this particular User Message would push the request over the 4k-token limit, the integration drops the User Message.

(If the Zap is set up to use a Memory Key, then some earlier User Messages and/or responses might be used if they will fit under the 4k-token limit.)

To our surprise, it did not crash when it had to answer the long one even though it exceeded ChatGPT3.5’s threshold of 4k tokens. Instead, it gave a decent answer, which relied heavily on the content of the assistant instruction. 

I think this is why a “decent answer” was still provided—the entire Assistant instructions were used and maybe some previous User Messages saved with a Memory Key gave the ChatGPT model enough context to provide that “decent answer.”


That said, this is probably not how things should work when a single request is larger then a model’s token limit. Instead of dropping an entire User Message, maybe we should be truncating it, or perhaps this should cause some sort of error with a message that the request was too large to avoid a response that doesn’t have the full context as expected. We will investigate further this particular scenario and work on making the integration handle these types of situation better.

Hi @minnesotatwins, thanks for your questions! I’ll do my best to answer them below:

What’s the difference between the user message and the assistant instructions?

The user message is the most recent message the assistant (ChatGPT) responds to. When using ChatGPT on the web, you can think of this as the message you type in to ask or tell ChatGPT what to do.

The assistant instructions are additional instructions you can choose to provide to help guide the assistant on what to do. Typically, a conversation will start with these instructions that tell the assistant how to behave. If you don’t provide any instructions, the Zap will automatically use the default You are a helpful assistant.

Does the memory key store both the user message and the assistant instructions?

The memory key is used to store the user message and the assistant’s response. The assistant instructions are not stored. They are sent every time the Zap runs.

What if I create 1 zap with 1 prompt that is 1000 words, but the user message is dynamic based on what the person chatting with the chat bot says… Will the assistant instructions be counted towards the context window (via the memory key) for every time the zap is triggered?

While the assistant instructions are not stored with the memory key, they do count towards the token limit for each time the action is performed.

In your example, the 1,000-word assistant instructions would only add 1,000 words to the context window each time the action is performed. The second response would include the first user message, the first assistant response, the second user message, and the 1,000-word assistant instructions (once).

Would both the assistant instruction and user message be entered basically into the chat log and chatgpt is simply told that one is different than the other?

The API doesn’t work exactly like using ChatGPT on the web. Both the assistant instructions and user message are sent, but they are marked differently. In OpenAI’s API docs, the assistant instructions is a system role and the user message is a user role. Each time a request is made, only one system message is sent (the assistant instructions provided in the Zap), but multiple user and assistant messages could be included depending on how many previous requests have been made with the same memory key value.

If I had 2 zaps, 1 for the first message with the user message, and 2nd completely different assistant instruction, for the 2nd user message, would Chatgpt disregard the first assistant instruction or string it together with the 2nd assistant instruction (so they would build on top of each other).

The former is essentially what would happen. But, ChatGPT isn’t “disregarding” the first Zap’s assistant instruction. Zapier just isn’t storing that information for use by the second Zap, so the first Zap’s assistant instruction doesn’t get sent to ChatGPT when the second Zap runs.

I hope that helps clear things up!

Thank you @hoon for providing these insightful answers to the questions already raised in this thread. We actually have a few follow-up questions related to the “assistant instructions” feature, too. It would be amazing if you could answer them as well: Our team played around with this feature a bit today in a Zap using ChatGPT3.5 turbo. We added assistant instructions about 2,400 tokens long. Then we tried user messages with different lengths (one lengthy message of about 3,000 tokens and a rather short one with about 100 tokens). To our surprise, it did not crash when it had to answer the long one even though it exceeded ChatGPT3.5’s threshold of 4k tokens. Instead, it gave a decent answer, which relied heavily on the content of the assistant instruction. I cannot fully rule out that this is due to sheer conincidence an guessworking, but we consider this highly unlikely as the assistant instruction contains information about our company that could not be found on the web.

To find an explanation for why it did not crash, we looked at the user statistics in our Open AI account. According to the data there, it managed to work with more than 4k tokens by using 2 instead of 1 request (see the screenshot depicting the aforementioned two test runs at 16:50 and 17:20 attached). 

Against this background, we are wondering about the following:

  1. Do we have any idea how Zapier breaks the user messsage and the assistant instructions apart when it submits more than one request to ChatGPT (like it did in our case at 16:50 with the assistant instructions and the user message adding up to 5,747 tokens)? 
  2. Are there parts of the assistant instructions or the user message in those cases that are not taken into account when ChatGPT formulates its answers? For instance, the content of the first of the two requests?
  3. If not, to what extent could this practice be extended further without risking that ChatGPT no longer takes into consideration the content of additional requests? 

We would highly appreciate an answer to this.

Amazing, thank you so much!

Userlevel 1

@hoon  When I’m using ChatGPT just by myself, I can continue a conversation up until I hit the maximum context length (around 8000 tokens).

If I’m using a Zap with a memory key, is it possible for me to hit a limit like this? And if I do, what happens?

@hoon  When I’m using ChatGPT just by myself, I can continue a conversation up until I hit the maximum context length (around 8000 tokens).

If I’m using a Zap with a memory key, is it possible for me to hit a limit like this? And if I do, what happens?

@brilliantops - In order to avoid errors, the integration will drop the oldest message(s) that are stored under that memory key value to stay under the token limit.