The usual workflow for embeddings is to bulk embed a bunch of documents and then use those documents to compare against the question. I don’t understand what the “Documents” stand for here. It seems I should just have the original strings to search against? Or is it my embeddings file that I can insert here? In that case, how do I link it here?

Has anyone figured out how to do Embeddings with OpenAI integration?

Troy Tessalone
Zapier Expert
31297 replies
Answer
2 years ago
February 6, 2023

Hi @Arnoldas

Good question.

Perhaps this will help provide context.

⚡ Troy Tessalone - AutomationAce.com | Premier Certified Zapier Expert | #1 Zapier Community Contributor

Arnoldas
Author
Beginner
6 replies
2 years ago
February 6, 2023

So I guess it is really just a list of strings, and they rerun the embedding for every query.

This doesn’t seem too useful unless you just want to do a demo.

Well.. I guess if you wanted to standardise classification of something that is usually inserted by humans manually this could technically do it.

Reid
Zapier Staff
17 replies
2 years ago
February 10, 2023

Hey @Arnoldas! Great question. Embeddings are a different way to think about searching or matching records: instead of relying on an exact match, the search relies on similarity. For instance, if you had a list of animals in the “Documents” section (Bison, Koala, and Giraffe) and then used “Africa” as the query, Giraffe would appear as the highest scored “document.”
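
To make the idea of "highest scored document" concrete, here is a minimal Python sketch of similarity ranking. It is not how the Zapier action is implemented. The three-dimensional vectors are hand-made stand-ins for real embeddings (which have many more dimensions and would come from an embedding model), and the names `cosine_similarity`, `rank_documents`, `TOY_VECTORS`, and `AFRICA_QUERY` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (document, score) pairs sorted from most to least similar."""
    scored = [(doc, cosine_similarity(query_vec, vec)) for doc, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# ASSUMPTION: invented toy vectors, not real model output.
# The three dimensions loosely mean (Africa, Australia, North America).
TOY_VECTORS = {
    "Bison": [0.1, 0.0, 0.9],
    "Koala": [0.0, 0.9, 0.1],
    "Giraffe": [0.9, 0.1, 0.0],
}
AFRICA_QUERY = [1.0, 0.0, 0.0]

ranking = rank_documents(AFRICA_QUERY, TOY_VECTORS)
print(ranking[0][0])  # Giraffe
```

With real embeddings the vectors change, but the ranking step stays the same: score every document against the query and sort.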

There are lots of interesting use cases we’re seeing. A recent one we were looking at internally is using an Embedding search with various Jira Issues to match against new requests to see if there was one that was really similar to the new request to avoid duplicating work.
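
A duplicate check like this usually comes down to "take the best match and only report it if the score is high enough." The sketch below shows that step in Python under loud assumptions: the vectors are invented placeholders for real embeddings, the issue keys `PROJ-1` and `PROJ-2` are hypothetical, and the `0.9` threshold is an arbitrary starting point that would need tuning on real data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def find_likely_duplicate(new_vec, existing, threshold=0.9):
    """Return (issue_key, score) for the best match at or above threshold, else None."""
    best_key, best_score = None, -1.0
    for key, vec in existing.items():
        score = cosine(new_vec, vec)
        if score > best_score:
            best_key, best_score = key, score
    if best_key is not None and best_score >= threshold:
        return best_key, best_score
    return None


# ASSUMPTION: toy vectors and hypothetical issue keys, for illustration only.
existing_issues = {
    "PROJ-1": [0.9, 0.1, 0.0],
    "PROJ-2": [0.0, 0.2, 0.9],
}

near_dup = find_likely_duplicate([0.8, 0.2, 0.0], existing_issues)
unrelated = find_likely_duplicate([0.0, 1.0, 0.0], existing_issues)
print(near_dup[0])  # PROJ-1
print(unrelated)    # None
```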

If it helps, you can also use the output of Zapier actions such as the Google Sheets Lookup Spreadsheet Rows action and put that output into the Documents box. The action will automatically unpack each of the line items from the search. This way you could have a document with hundreds of things you want to search against automatically added as separate values.

Let me know if this helps!

Included some screenshots below:

FuzzyNubbins
New
5 replies
2 years ago
February 11, 2023

Say we are searching a large body of information, for instance a 150-page legal contract from a PDF that has been reformatted into Column A (Title) and Column B (Content). How can embeddings through Zapier find the closest matching title and interpret the content? With Zapier, would we even need the data formatted this way?

I guess the larger question is: how can we use embeddings to retain a larger knowledge base for general Q&A or prompting?

A solution or explanation would be very much appreciated!

https://platform.openai.com/docs/guides/embeddings

Reid
Zapier Staff
17 replies
2 years ago
February 13, 2023

@FuzzyNubbins I made a Loom for you showing an example. Currently, it is a bit weird to get the full text of a massive document into individual strings that the Embedding action can use. But here’s an example I came up with where I took the entire Apple Terms and made it searchable with embeddings:

https://www.loom.com/share/25bcf7a74b7a43ec87aeab3940851ef7

FuzzyNubbins
New
5 replies
2 years ago
February 13, 2023

Super helpful thank you!

FuzzyNubbins
New
5 replies
2 years ago
February 13, 2023

Would you mind expounding on this entire flow? Would love to use it for a similar use case. Wondering how you got sheets to populate in documents. Did you have to add sheets as a step prior for it to know to grab data from there? Thanks!!!

Reid
Zapier Staff
17 replies
2 years ago
February 13, 2023

Here you go! You first need to add a new column to your Google Sheet where every record has the same value. I often call this column “Lookup” and make every value “yes.” You then want to use the Google Sheets Lookup Spreadsheet Rows (output as Line Items) action and then use the column with your data as the input for Documents.

https://www.loom.com/share/4dff64e2dd6649839e2d3e4ada04b6e2
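
For anyone who thinks better in code, the lookup trick above can be pictured roughly like this in Python. This is only an illustration of the data shape, not Zapier's implementation: the rows are invented sample data, and `lookup_rows` and `column_as_line_items` are hypothetical helper names. The “Lookup” column with the value “yes” comes from the description above; the “Title” and “Content” columns follow the layout mentioned earlier in the thread.

```python
# ASSUMPTION: invented sample rows standing in for a Google Sheet.
rows = [
    {"Title": "Section 1", "Content": "Text of section one.", "Lookup": "yes"},
    {"Title": "Section 2", "Content": "Text of section two.", "Lookup": "yes"},
    {"Title": "Section 3", "Content": "Text of section three.", "Lookup": "yes"},
]


def lookup_rows(rows, column, value):
    """Return every row whose `column` equals `value` (here: all of them)."""
    return [row for row in rows if row.get(column) == value]


def column_as_line_items(rows, column):
    """Collect one column from the matched rows as a list of separate values."""
    return [row[column] for row in rows if row.get(column)]


# Because every row has Lookup == "yes", the lookup returns the whole sheet,
# and the Content column becomes the list of documents to search against.
documents = column_as_line_items(lookup_rows(rows, "Lookup", "yes"), "Content")
print(len(documents))  # 3
```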

FuzzyNubbins
New
5 replies
2 years ago
February 15, 2023

Amazing, that is very helpful! I imagine that each of those sections needs to be less than about 4k tokens. From what I understand, when doing embeddings programmatically you can find the closest matching title and then return the associated content, so that you don’t have to process the entire content for every query when building a general Q&A response prompt. However, it might be that the formatter takes care of that piece. I wonder if what I just said is correct. I’ll try the above strategy and let you know if it works with a lot of data! Thanks again for the super prompt and game-changing help above!

On the left you can see the token count for all the contents in the C column.
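
The two ideas in this post (keep each section under a token budget, then match on titles and return the associated content) could be sketched in Python as follows. Everything here is an assumption for illustration: the token count uses the rough "about 4 characters per token" rule of thumb for English text rather than a real tokenizer, the 4000-token budget is just the figure guessed above, and the title scores are placeholders for whatever an embedding search over the titles would return.

```python
def estimate_tokens(text):
    """Very rough token estimate (ASSUMPTION: ~4 characters per token)."""
    return len(text) // 4


def split_into_chunks(text, max_tokens=4000):
    """Greedily pack whole words into chunks that stay within max_tokens.

    A single word longer than the budget still becomes its own chunk.
    """
    chunks, current = [], []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and estimate_tokens(candidate) > max_tokens:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


def content_for_best_title(scores_by_title, content_by_title):
    """Pick the highest-scoring title and return it with its content."""
    best_title = max(scores_by_title, key=scores_by_title.get)
    return best_title, content_by_title[best_title]


sample_text = "word " * 100
chunks = split_into_chunks(sample_text, max_tokens=10)

# ASSUMPTION: made-up scores, as if an embedding search over titles produced them.
scores = {"Payment terms": 0.91, "Termination": 0.42}
contents = {"Payment terms": "Invoices are due in 30 days.", "Termination": "Either party may..."}
best_title, best_content = content_for_best_title(scores, contents)
print(best_title)  # Payment terms
```

Only the content of the winning title then needs to go into the Q&A prompt, rather than the whole document.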

Reid
Zapier Staff
17 replies
2 years ago
February 15, 2023

Happy to help! Keep me posted on how it works for you. If you have feedback let me know and I'll see what we can do.

Also, do you have a Tokenizer Script running in your Google Sheet? Mind sharing that?

FuzzyNubbins
New
5 replies
2 years ago
February 17, 2023

Thanks! For the tokenizing I used this extension, chatgpt for work, and in that field I just prompted ChatGPT to tell me!

Reid
Zapier Staff
17 replies
2 years ago
March 22, 2023

Sharing this here as I found a bit of a fun way to take any text and get it into the array format needed for the Zapier OpenAI Embeddings step. Hope this helps.

https://www.loom.com/share/56b1a5ffeaa64e94b7e7af33ec269d40
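
The Loom shows the Zapier way of doing this. As a rough code equivalent, turning a block of text into a list of strings can be as simple as splitting on blank lines. This is a sketch under the assumption that paragraphs are separated by blank lines; `text_to_documents` is a made-up helper name, not part of Zapier or OpenAI.

```python
import re


def text_to_documents(text):
    """Split text on blank lines and return the non-empty paragraphs.

    Whitespace inside each paragraph (including single line breaks)
    is collapsed to single spaces.
    """
    paragraphs = re.split(r"\n\s*\n", text)
    return [" ".join(p.split()) for p in paragraphs if p.strip()]


sample = "First paragraph,\nwrapped over two lines.\n\n\nSecond paragraph.\n\n   \nThird."
documents = text_to_documents(sample)
print(len(documents))  # 3
```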

Arnoldas
Author
Beginner
6 replies
1 year ago
March 27, 2023

Can’t believe I missed out on this! @Reid - I expected it to behave something like that.

I will use it for sure. Your Loom recordings are very helpful.

Hopefully the API costs remain manageable when embedding the entire document every time (depending on the use case, of course).
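
Outside of Zapier, one common way to keep costs down is to cache each document's embedding so that only new or changed text gets embedded again. The sketch below assumes you control the code that calls the embedding API. `EmbeddingCache` is a hypothetical helper, and `fake_embed` is a stand-in for a real API call so the example runs offline.

```python
import hashlib


class EmbeddingCache:
    """Remember embeddings by a hash of the text so repeats cost nothing."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # callable: str -> list of floats
        self._store = {}

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._embed_fn(text)
        return self._store[key]


calls = []


def fake_embed(text):
    """ASSUMPTION: placeholder for a real embedding API call."""
    calls.append(text)
    return [float(len(text))]


cache = EmbeddingCache(fake_embed)
docs = ["Bison", "Koala", "Giraffe"]

# Simulate three separate queries against the same documents.
for _ in range(3):
    vectors = [cache.get(doc) for doc in docs]

print(len(calls))  # 3 (one call per document, not nine)
```

In this setup only the query itself needs a fresh embedding on each search.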

christina.d
Zapier Staff
2653 replies
1 year ago
March 27, 2023

This whole thread is absolute 🔥. Thanks everyone for sharing your ideas in community!

We love to see it. 🧡
