Has anyone figured out how to do Embeddings with the OpenAI integration?
The usual workflow for embeddings is to bulk embed a bunch of documents up front and then compare those embeddings against the embedded question.
I don’t understand what the “Documents” field stands for here. Does it expect the original strings to search against? Or is it my embeddings file that I should insert there? In that case, how do I link it?
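In code, the workflow I’m picturing is roughly this (just a sketch; the model name is only an example, and it may not be what the Zapier step uses under the hood):

```python
# Sketch only: the "Documents" are plain strings, and the embedding model turns
# each one into a vector. Assumes the openai Python package (>= 1.0) and an
# OPENAI_API_KEY environment variable; the model name is an example.
from openai import OpenAI

client = OpenAI()

documents = ["refund policy", "shipping times", "warranty claims"]  # original strings

resp = client.embeddings.create(model="text-embedding-ada-002", input=documents)
vectors = [item.embedding for item in resp.data]  # one vector per document

print(len(vectors), "vectors of", len(vectors[0]), "floats each")
```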
Hi @Arnoldas
Good question.
Perhaps this will help provide context.
So I guess it really is just a list of strings, and the embedding is rerun for every query.
This doesn’t seem too useful unless you just want to do a demo.
Well… I guess if you wanted to standardise the classification of something that is usually entered manually by humans, this could technically do it.
Hey @Arnoldas! Great question. Embeddings are a different way to think about searching or matching records: instead of relying on an exact match, they rely on similarity. For instance, if you had a list of animals in the “Documents” section (Bison, Koala, and Giraffe) and then used “Africa” as the query, Giraffe would appear as the highest-scored “document.”
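For anyone curious what that scoring looks like under the hood, here is a rough sketch in Python; it shows the general technique, not necessarily the exact implementation behind the Zapier step:

```python
# Rough sketch of similarity scoring with the Bison/Koala/Giraffe example.
# Assumes the openai Python package (>= 1.0); the model name is an example.
from openai import OpenAI

client = OpenAI()

documents = ["Bison", "Koala", "Giraffe"]
query = "Africa"

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

doc_vectors = embed(documents)
query_vector = embed([query])[0]

ranked = sorted(zip(documents, (cosine(query_vector, v) for v in doc_vectors)),
                key=lambda pair: pair[1], reverse=True)
print(ranked)  # Giraffe should come out with the highest score
```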
There are lots of interesting use cases we’re seeing. A recent one we were looking at internally is using an Embedding search across existing Jira Issues to check whether a new request is really similar to one that already exists, to avoid duplicating work.
If it helps, you can also use the output of Zapier Actions such as the Google Sheets Lookup Spreadsheet Rows action and then put that output into the Documents box. The Action will automatically unpack each of the line items from the search. This way you could have a document with hundreds of things you want to search against, automatically added as separate values.
Let me know if this helps!
Included some screenshots below:
If we are, for instance, searching a large body of information, say a 150-page legal contract from a PDF that has been reformatted into Column A (Title) and Column B (Content), how can embeddings through Zapier find the closest matching title and interpret the content? With Zapier, would we even need the data formatted this way?
I guess the larger question is: how can we use embeddings to maintain a larger knowledge base for general Q&A or prompting?
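In code terms, what I’m imagining is roughly this (just a sketch of the idea with placeholder rows, not a working Zapier setup):

```python
# Sketch of the Title/Content idea: embed only the titles (Column A), find the
# closest title to the question, and return the matching content (Column B).
# The rows below are placeholders; assumes the openai Python package (>= 1.0).
from openai import OpenAI

client = OpenAI()

rows = [
    {"title": "Termination clause", "content": "Either party may terminate..."},
    {"title": "Payment terms",      "content": "Invoices are due within 30 days..."},
    {"title": "Liability cap",      "content": "Total liability shall not exceed..."},
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

title_vectors = embed([row["title"] for row in rows])
question = "How long do we have to pay an invoice?"
question_vector = embed([question])[0]

best = max(range(len(rows)), key=lambda i: cosine(question_vector, title_vectors[i]))
print(rows[best]["content"])  # only this chunk would go into the QA prompt
```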
@FuzzyNubbins I made a Loom for you showing an example. Currently, it is a bit weird to get the full text of a massive document into individual strings that the Embedding action can use. But here’s an example I came up with where I took the entire Apple Terms and made it searchable with embeddings:
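One rough way to do that splitting outside of Zapier looks something like this (the chunk size and overlap are arbitrary, and the file name is hypothetical):

```python
# Sketch of splitting a long document into individual strings for embedding.
def chunk_text(text, max_chars=2000, overlap=200):
    """Break text into overlapping fixed-size character chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        if end == len(text):
            break
        start = end - overlap  # overlap so ideas cut at a boundary still match
    return chunks

with open("apple_terms.txt") as f:   # hypothetical local copy of the full text
    pieces = chunk_text(f.read())
print(len(pieces), "strings ready for the Documents field")
```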
Would you mind expounding on this entire flow? I’d love to use it for a similar use case. I’m wondering how you got Sheets to populate in Documents. Did you have to add Sheets as a prior step for it to know to grab data from there? Thanks!!!
Here you go! You first need to add a new column to your Google Sheet where every record has the same value. I often call this column “Lookup” and make every value “yes.” You then want to use the Google Sheets Lookup Spreadsheet Rows (output as Line Items) action and then use the column with your data as the input for Documents.
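(And if you ever want to do the same lookup outside Zapier, the rough equivalent with the gspread package looks like this; the spreadsheet name and column index are placeholders.)

```python
# Rough out-of-Zapier equivalent of "Lookup Spreadsheet Rows -> Documents",
# using the gspread package. Spreadsheet name and column index are placeholders.
import gspread

gc = gspread.service_account()               # reads a local service-account key file
sheet = gc.open("My Knowledge Base").sheet1  # hypothetical spreadsheet name

# Column C holds the text to search against; skip the header row, drop blanks.
documents = [cell for cell in sheet.col_values(3)[1:] if cell]
print(len(documents), "documents ready for the embeddings step")
```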
Amazing, that is very helpful! I imagine that each of those sections needs to be less than about 4k tokens. Programmatically, when doing embeddings, from what I understand, you can find the closest matching title and then return the associated content, so you don’t have to process the entire content for every query when building a general Q&A response prompt. However, it might be that the formatter takes care of that piece. I’m not sure whether what I just said is correct; I’ll try the above strategy and let you know if it works with a lot of data! Thanks again for the super prompt and game-changing help above!
On the left you can see the token count for all the contents in the C column.
Happy to help! Keep me posted on how it works for you. If you have feedback let me know and I'll see what we can do.
Also, do you have a Tokenizer Script running in your Google Sheet? Mind sharing that?
Thanks! For the tokenizing I used the “ChatGPT for work” extension, and in that field I just prompted ChatGPT to tell me!
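(If you’d rather count tokens without a spreadsheet extension, the tiktoken package gives the exact counts the OpenAI models use; here’s a minimal sketch, with the model name as an assumption.)

```python
# Minimal token-count sketch using the tiktoken package.
import tiktoken

enc = tiktoken.encoding_for_model("text-embedding-ada-002")  # assumed model

def token_count(text: str) -> int:
    return len(enc.encode(text))

print(token_count("Either party may terminate this agreement with 30 days notice."))
```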
Sharing this here as I found a bit of a fun way to take any text and get it into the array format needed for the Zapier OpenAI Embeddings step. Hope this helps.
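As a generic sketch of that idea (the exact array format a given step expects may differ, and the input file here is hypothetical):

```python
# Turn raw text into an array of strings, one entry per paragraph, as JSON.
import json

with open("any_text.txt") as f:   # hypothetical input file
    paragraphs = [p.strip() for p in f.read().split("\n\n") if p.strip()]

print(json.dumps(paragraphs, indent=2))  # e.g. ["First paragraph...", "Second paragraph..."]
```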
Can’t believe I missed out on this! @Reid - I expected it to behave something like that.
I will use it for sure. Your Loom recordings are very helpful.
Hopefully the API costs remain manageable when embedding the entire document every time (depending on the use case, of course).
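A quick back-of-the-envelope check (the per-token price here is an assumption; check OpenAI’s current pricing for your model):

```python
# Rough cost estimate for re-embedding a whole document on every query.
PRICE_PER_1K_TOKENS = 0.0001  # assumed rate for text-embedding-ada-002, in USD

document_tokens = 100_000     # e.g. roughly a 150-page contract
queries_per_month = 1_000

cost_per_query = document_tokens / 1000 * PRICE_PER_1K_TOKENS
print(f"~${cost_per_query:.3f} per query, ~${cost_per_query * queries_per_month:.2f} per month")
```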
This whole thread is absolute gold. Thanks, everyone, for sharing your ideas in the community!