Has anyone figured out how to do Embeddings with OpenAI integration?

The usual workflow for embeddings is to bulk-embed a bunch of documents and then compare those documents against the question. I don’t understand what “Documents” stands for here. Should I just provide the original strings to search against? Or can I insert my embeddings file here? If so, how do I link it?

Troy Tessalone
Zapier Orchestrator & Solution Partner
Answer
February 6, 2023

Hi @Arnoldas

Good question.

Perhaps this will help provide context.

⚡ Troy Tessalone - AutomationAce.com | Top Zapier Solution Partner | #1 Zapier Community Contributor

Arnoldas
Author
Beginner
February 6, 2023

So I guess it really is just a list of strings that gets re-embedded for every query.

This doesn’t seem too useful unless you just want to do a demo.

Well... I guess if you wanted to standardise the classification of something that humans usually enter manually, this could technically do it.

Reid
Zapier Staff
February 10, 2023

Hey @Arnoldas! Great question. Embeddings are a different way to think about searching or matching records: instead of relying on an exact match, they rely on similarity. For instance, if you had a list of animals in the “Documents” section (Bison, Koala, and Giraffe) and used “Africa” as the query, Giraffe would appear as the highest-scored “document.”
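To make the idea concrete, here is a minimal Python sketch of how similarity ranking works once the query and every document have been turned into vectors. The three-dimensional vectors are made-up toy values, not real OpenAI embeddings (real ones have far more dimensions), and `rank_documents` is a hypothetical helper. Cosine similarity is a common way to compare embeddings; this is not necessarily what the Zapier action does internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (document, score) pairs sorted from most to least similar."""
    scored = [(doc, cosine_similarity(query_vec, vec)) for doc, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors: pretend the axes loosely mean
# "lives in Africa", "lives in Australia", "lives in North America".
doc_vecs = {
    "Bison": [0.1, 0.0, 0.9],
    "Koala": [0.0, 0.9, 0.1],
    "Giraffe": [0.9, 0.1, 0.0],
}
query_vec = [0.8, 0.1, 0.1]  # stand-in for the embedding of "Africa"

for doc, score in rank_documents(query_vec, doc_vecs):
    print(f"{doc}: {score:.3f}")
```

With these toy values Giraffe scores highest, matching the example in the post; with real embeddings the scores come from the model, not from hand-picked axes.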

There are lots of interesting use cases we’re seeing. A recent one we looked at internally uses an Embedding search over existing Jira issues to check whether a new request is very similar to one already filed, so we avoid duplicating work.
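For the Jira use case, one way to flag a likely duplicate is to compare the new request’s embedding with every existing issue and report the best match only if it clears a similarity threshold. This is a sketch under assumptions: the issue keys, toy vectors, and the 0.9 cutoff are all hypothetical, and a real threshold would need tuning on your own data.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find_likely_duplicate(new_vec, issue_vecs, threshold=0.9):
    """Return (issue_key, score) for the closest issue, or None if nothing clears the threshold."""
    if not issue_vecs:
        return None
    best_key, best_vec = max(
        issue_vecs.items(), key=lambda item: cosine_similarity(new_vec, item[1])
    )
    best_score = cosine_similarity(new_vec, best_vec)
    return (best_key, best_score) if best_score >= threshold else None


# Hypothetical issue keys with toy embedding vectors.
issue_vecs = {
    "PROJ-101": [0.9, 0.1, 0.0],   # e.g. "Login page times out"
    "PROJ-102": [0.0, 0.2, 0.95],  # e.g. "Add dark mode"
}
print(find_likely_duplicate([0.85, 0.15, 0.05], issue_vecs))  # close to PROJ-101
print(find_likely_duplicate([0.5, 0.5, 0.5], issue_vecs))     # no strong match
```

Returning `None` below the threshold (rather than always returning the top hit) is the design choice that keeps unrelated requests from being flagged as duplicates.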

If it helps, you can also use the output of Zapier actions such as Google Sheets Lookup Spreadsheet Rows and put that output into the Documents box. The action will automatically unpack each line item from the search, so a spreadsheet with hundreds of things you want to search against can be added automatically as separate values.

Let me know if this helps!

Included some screenshots below:

FuzzyNubbins
New
February 11, 2023

Say we are searching a large body of information, for instance a 150-page legal contract from a PDF that has been reformatted into Column A (Title) and Column B (Content). How can embeddings through Zapier find the closest-matching title and interpret the content? With Zapier, would we even need the data formatted this way?

I guess the larger question is: how can we use embeddings to retain a larger knowledge base for general Q&A or prompting?

A solution or explanation would be very much appreciated!

https://platform.openai.com/docs/guides/embeddings

Reid
Zapier Staff
February 13, 2023

@FuzzyNubbins I made a Loom for you showing an example. Currently, it is a bit weird to get the full text of a massive document into individual strings that the Embedding action can use. But here’s an example I came up with where I took the entire Apple Terms and made it searchable with embeddings:

https://www.loom.com/share/25bcf7a74b7a43ec87aeab3940851ef7
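The video isn’t transcribed here, but the general problem Reid mentions (turning one long document into individual strings) can be sketched in Python. The hypothetical `split_into_chunks` helper below splits on blank lines (paragraphs) and packs paragraphs into chunks up to a character budget. The 1,500-character default is an arbitrary assumption, and this is not necessarily how the Loom example does it.

```python
def split_into_chunks(text: str, max_chars: int = 1500) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    A paragraph longer than max_chars is hard-split so no chunk exceeds the budget.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Hard-split oversized paragraphs into max_chars-sized pieces.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # Current chunk is full: store it and start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Section 1. Terms apply.\n\n"
    "Section 2. Payment is due monthly.\n\n"
    "Section 3. Either party may terminate."
)
for chunk in split_into_chunks(sample, max_chars=60):
    print(repr(chunk))
```

Splitting on paragraph boundaries first keeps related sentences together; the hard split is only a fallback for unusually long paragraphs.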

FuzzyNubbins
New
February 13, 2023

Super helpful, thank you!

FuzzyNubbins
New
February 13, 2023

Would you mind expounding on this entire flow? I’d love to use it for a similar use case. I’m wondering how you got Sheets to populate Documents. Did you have to add a Sheets step before it so it knew to grab the data from there? Thanks!!!

Reid
Zapier Staff
February 13, 2023

Here you go! First, add a new column to your Google Sheet where every record has the same value. I often call this column “Lookup” and make every value “yes.” Then use the Google Sheets Lookup Spreadsheet Rows (output as Line Items) action to search that column for “yes”, which returns every row, and map the column that holds your data to the Documents field.
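The logic of the “Lookup” column trick can be sketched in Python: because every row has the same lookup value, a lookup on that value matches every row, and the chosen data column becomes the list of documents. Zapier does this inside the action; the rows below are hypothetical stand-ins for what the Sheets step returns, and the column names `Lookup` and `Content` are assumptions for illustration. Skipping blank cells is an extra design choice, not something the post describes.

```python
def rows_to_documents(rows, lookup_column="Lookup", lookup_value="yes", data_column="Content"):
    """Mimic a lookup on a constant column: every matching row's data cell becomes a document."""
    return [
        row[data_column]
        for row in rows
        if row.get(lookup_column) == lookup_value and row.get(data_column)
    ]


# Hypothetical rows as a Sheets lookup might return them.
rows = [
    {"Lookup": "yes", "Title": "Payment", "Content": "Invoices are due within 30 days."},
    {"Lookup": "yes", "Title": "Termination", "Content": "Either party may terminate with notice."},
    {"Lookup": "yes", "Title": "Empty", "Content": ""},  # blank cells are skipped
]
print(rows_to_documents(rows))
```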

FuzzyNubbins
New
February 15, 2023

Amazing, that is very helpful! I imagine each of those sections needs to be under about 4k tokens. From what I understand, when doing embeddings programmatically you can find the closest-matching title and then return its associated content, so you don’t have to process the entire content for every query when generating a general Q&A response. However, it might be that the Formatter takes care of that piece. I’m not sure whether what I just said is correct; I’ll try the above strategy and let you know if it works with a lot of data! Thanks again for the super prompt and game-changing help above!

On the left you can see the token count for all the contents in the C column.
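The title-then-content approach FuzzyNubbins describes matches a common retrieval pattern: embed the short titles, find the one closest to the question, and pass only that section’s content into the Q&A prompt. Below is a minimal sketch with made-up vectors; `top_sections`, `build_prompt`, the sample sections, and the prompt wording are hypothetical, and this is not a description of how the Zapier action or the Formatter works.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_sections(query_vec, sections, k=1):
    """sections: list of (title, content, title_vec). Return the k best (title, content) pairs."""
    ranked = sorted(sections, key=lambda s: cosine_similarity(query_vec, s[2]), reverse=True)
    return [(title, content) for title, content, _ in ranked[:k]]


def build_prompt(question, query_vec, sections, k=1):
    """Put only the best-matching sections' content into the prompt, not the whole document."""
    context = "\n\n".join(
        f"{title}:\n{content}" for title, content in top_sections(query_vec, sections, k)
    )
    return f"Answer using only this context.\n\n{context}\n\nQuestion: {question}"


# Hypothetical contract sections with toy title embeddings.
sections = [
    ("Payment Terms", "Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("Termination", "Either party may terminate with 60 days' notice.", [0.1, 0.9, 0.1]),
    ("Governing Law", "This agreement is governed by the laws of Delaware.", [0.0, 0.1, 0.9]),
]
question_vec = [0.2, 0.95, 0.05]  # stand-in for the embedding of the question below
print(build_prompt("How do I end the contract?", question_vec, sections))
```

Only the matched section travels to the completion model, which is what keeps each prompt small regardless of how long the full contract is.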

Reid
Zapier Staff
February 15, 2023

Happy to help! Keep me posted on how it works for you. If you have feedback let me know and I'll see what we can do.

Also, do you have a Tokenizer Script running in your Google Sheet? Mind sharing that?

FuzzyNubbins
New
February 17, 2023

Thanks! For the token counts I used this extension, ChatGPT for Work, and in that field I just prompted ChatGPT to tell me!
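Asking ChatGPT for a token count gives an estimate at best. A more reliable option is OpenAI’s open-source `tiktoken` library; the sketch below assumes it is installed (`pip install tiktoken`) and uses the `cl100k_base` encoding, which is the one used by `text-embedding-ada-002`. If `tiktoken` is missing, or its encoding file can’t be downloaded on first use, it falls back to the rough rule of thumb of about four characters per token for English text, which is only an approximation.

```python
try:
    import tiktoken  # assumption: installed via `pip install tiktoken`
except ImportError:
    tiktoken = None


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Exact count via tiktoken when possible, otherwise a ~4-chars-per-token estimate."""
    if tiktoken is not None:
        try:
            return len(tiktoken.get_encoding(encoding_name).encode(text))
        except Exception:
            # tiktoken downloads the encoding file on first use; fall back if that fails.
            pass
    return (len(text) + 3) // 4  # ceil(len / 4): rough estimate only


cells = [
    "Invoices are due within 30 days.",
    "Either party may terminate with 60 days' notice.",
]
for cell in cells:
    print(count_tokens(cell), cell)
```

In a spreadsheet workflow, running something like this over each cell before the Zap gives a quick check that no section is too long for the model you plan to use.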

Reid
Zapier Staff
March 22, 2023

Sharing this here as I found a bit of a fun way to take any text and get it into the array format needed for the Zapier OpenAI Embeddings step. Hope this helps.

Arnoldas
Author
Beginner
March 27, 2023

Can’t believe I missed out on this! @Reid - I expected it to behave something like that.

I will use it for sure. Your Loom recordings are very helpful.

Hopefully the API costs remain manageable when embedding the entire document every time (depending on the use case, of course).
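On cost: if the documents don’t change between runs, one option is to cache each document’s embedding and call the API only for new or edited text. The sketch below is hypothetical: `fake_embed` stands in for a real embeddings API call, and keying the cache on a SHA-256 hash of the text is one design choice among several (an in-memory dict here; a file or database in practice).

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Remember embeddings by text hash so unchanged documents are embedded only once."""

    def __init__(self, embed_fn: Callable[[str], list[float]]):
        self.embed_fn = embed_fn
        self.store: dict[str, list[float]] = {}
        self.api_calls = 0

    def get(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.api_calls += 1  # only new or edited text reaches the (paid) API
            self.store[key] = self.embed_fn(text)
        return self.store[key]


def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embeddings API call."""
    return [float(len(text)), float(text.count(" "))]


cache = EmbeddingCache(fake_embed)
documents = ["Bison", "Koala", "Giraffe"]
for _ in range(3):  # three queries against the same documents
    vectors = [cache.get(doc) for doc in documents]
print(cache.api_calls)  # 3, not 9
```

Each query still needs its own embedding, but the document side is paid for once rather than on every run.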

christina.d
Zapier Staff
March 27, 2023

This whole thread is absolute 🔥. Thanks, everyone, for sharing your ideas in the community!

We love to see it. 🧡
