Using regex to extract different types of content

  • 17 March 2023
  • Community Manager

Hi friends! 👋 I know that regex can often seem a bit daunting, scary even, but it doesn’t need to be. It’s super useful, so I thought I’d collate some examples of how you can use regex in a not-so-scary Formatter (Text > Extract Pattern) action to extract certain types of content.

Don’t have time to read everything? That’s totally fine, you can just scroll down to the relevant sections and copy and paste the regex patterns into your Zap. I totally won’t be offended. 😉

Setting up the Formatter Action

This first part is relatively straightforward. Add an action to your Zap, select the Formatter by Zapier app and, for the Event, pick Text. Next, select the Extract Pattern option from the Transform field.

In the Input field, select the field from a trigger (or previous action) that contains the text you want to extract something from. The Pattern field is where you’d enter the regex patterns that we’re going to talk more about in a bit.


Matching One or All?

Ok, before we get to the code, I wanted to mention that by default the action only returns the first match it finds. If you're after all matches for a particular pattern, make sure you set the Match All option to Yes.

You can find out more about the other settings for this action here: Find text with regular expressions in Zaps.
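
If you'd like to try a pattern outside of Zapier first, here's a minimal Python sketch of the first-match versus all-matches difference. It assumes Python's built-in re module behaves closely enough to the Formatter for simple patterns like these; the sample text and the \d+ pattern are made up for illustration.

```python
import re

text = "Order 1042 has shipped; order 1043 is still pending."
pattern = r"\d+"  # one or more digits (illustrative pattern only)

# Comparable to Match All = No: stop at the first match.
first = re.search(pattern, text)
first_match = first.group(0) if first else None

# Comparable to Match All = Yes: collect every match.
all_matches = re.findall(pattern, text)

print(first_match)  # 1042
print(all_matches)  # ['1042', '1043']
```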

Extracting Usernames/@mentions

This one's super simple but useful if you're looking to get all the usernames or @mentions from a social media post for example:

@(\w+)

So here, @ finds the @ symbol. \w then finds any word character (letters, digits and underscores). And + indicates that the word character (\w) should be found one or more times, which should get the entire username/@mention. The parentheses capture the name on its own, without the @. So it would be able to extract usernames or @mentions like "@123user", "@username" or "@user_name03".
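
To see what that pattern picks up, here's a rough Python sketch using the three example usernames from above. The sample post is made up, and Python's re module is only standing in for the Formatter here, so treat it as a quick way to experiment and not as a description of Zapier's internals.

```python
import re

MENTION = re.compile(r"@(\w+)")
post = "Shout-out to @123user, @username and @user_name03!"

# group(0) is the whole match, including the @ symbol.
with_symbol = [m.group(0) for m in MENTION.finditer(post)]

# With a single capture group, findall returns just the captured part.
names_only = MENTION.findall(post)

print(with_symbol)  # ['@123user', '@username', '@user_name03']
print(names_only)   # ['123user', 'username', 'user_name03']
```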

Extracting Hashtags

You can extract hashtags in the same way as the usernames/@mentions above. The only difference is that you'd swap out the @ symbol for the # symbol:

#(\w+)
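
Since the only thing that changes is the leading symbol, the swap can be sketched as a small Python helper. The function name extract_prefixed is hypothetical (it isn't part of Zapier or Python), and re.escape is used so the symbol is always treated as a literal character.

```python
import re

def extract_prefixed(symbol, text):
    """Return every word that directly follows `symbol` (hypothetical helper)."""
    pattern = re.compile(re.escape(symbol) + r"(\w+)")
    return pattern.findall(text)

hashtags = extract_prefixed("#", "Loving #automation and #no_code with #zapier")
mentions = extract_prefixed("@", "cc @sam and @alex_01")

print(hashtags)  # ['automation', 'no_code', 'zapier']
print(mentions)  # ['sam', 'alex_01']
```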

Extracting Currency Values

If you're looking to pull a value like $12.99 out of a field that contains other text or values, then try using the following:

(\$[\d,]+(\.\d{2})?)

So \$ would find the $ currency character. Then [\d,]+ would find one or more digits (and commas). And the (\.\d{2})? part handles instances where there's a decimal point followed by two digits.

So it could be used to extract figures such as "$10.99" or "$4". To find different currency types, you'd swap out the $ for a different currency symbol (£, etc.).
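
Here's a quick Python sketch of the currency pattern in action, assuming Python's re module is a fair stand-in for the Formatter. The sample sentences are invented, and the £ version simply shows the symbol swap described above.

```python
import re

CURRENCY = re.compile(r"(\$[\d,]+(\.\d{2})?)")
text = "Total: $1,299.50 (was $1,400) plus $4 shipping."

# The pattern contains two groups, so group(1) is used to get the whole amount.
amounts = [m.group(1) for m in CURRENCY.finditer(text)]
print(amounts)  # ['$1,299.50', '$1,400', '$4']

# Same idea with the $ swapped for £ (no backslash needed for £).
POUNDS = re.compile(r"(£[\d,]+(\.\d{2})?)")
pounds = [m.group(1) for m in POUNDS.finditer("Lunch was £7.25 today")]
print(pounds)  # ['£7.25']
```

One thing to watch: because commas are allowed anywhere in the number, an amount that's directly followed by a comma (as in "$5, please") gets captured as "$5," with the comma attached.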

Extracting a Website From an Email Address

To get the website address fakedomain.com from the email fakeemail@fakedomain.com, for example, you'd use the following pattern:

[a-zA-Z][\w\.-]*[a-zA-Z0-9]@([a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z])

Yep, this one looks a bit complicated, right? The [a-zA-Z] part matches any single letter (uppercase or lowercase). The [\w\.-]* bit looks for zero or more characters that are a letter, digit, underscore, period or hyphen. The [a-zA-Z0-9] just before the @ makes sure the first part of the address ends in a letter or digit. The @, as you’d expect, finds the @ symbol in the email address. Then the long ([a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]) part is what will extract the domain name in an email address by looking for any characters like letters, digits, periods, hyphens etc. in the last section of the email address.

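And if you’re only looking to extract the domain name without the .com part, you could use the same pattern that’s used for extracting usernames/@mentions: @(\w+). Keep in mind that \w doesn’t match periods or hyphens, so the match stops at the first one it reaches.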
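
Both email patterns can be tried out with a short Python sketch like the one below. It uses the example address from above and assumes Python's re module as a stand-in; which part of the match the Formatter hands back isn't covered here, so the sketch prints the full match and the captured group separately.

```python
import re

EMAIL_DOMAIN = re.compile(
    r"[a-zA-Z][\w\.-]*[a-zA-Z0-9]@"
    r"([a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z])"
)
text = "Contact: fakeemail@fakedomain.com"

match = EMAIL_DOMAIN.search(text)
full_match = match.group(0) if match else None  # the whole email address
domain = match.group(1) if match else None      # only the part in parentheses

print(full_match)  # fakeemail@fakedomain.com
print(domain)      # fakedomain.com

# The shorter @(\w+) pattern stops at the first period.
short = re.search(r"@(\w+)", text)
name_only = short.group(1) if short else None
print(name_only)   # fakedomain
```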

Extracting an Image URL From an HTML Image Tag

For cases where you need to extract the URL from an HTML image tag, try the following pattern:

<img[^>]*src="([^"]+)"[^>]*>

Ok so, <img will find the start of the img tag. The [^>]* part then skips over zero or more characters that aren’t the end of the tag. Next, src=" will locate the start of the src attribute of the img tag. Then ([^"]+) captures the URL of the image (which can be any number of characters but not a double quote). Then "[^>]*> locates the closing double quote of the src attribute, and any remaining characters up to and including the closing > character.
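
As a rough check of the walkthrough above, here's a small Python sketch that runs the pattern against a made-up snippet of HTML (the example.com URL is a placeholder). It also shows one limitation that follows from the pattern itself: a src attribute wrapped in single quotes won't match.

```python
import re

IMG_SRC = re.compile(r'<img[^>]*src="([^"]+)"[^>]*>')
html = '<p>Logo: <img class="logo" src="https://example.com/logo.png" alt="Logo"></p>'

match = IMG_SRC.search(html)
url = match.group(1) if match else None
print(url)  # https://example.com/logo.png

# The pattern expects double quotes around the src value.
single_quoted = IMG_SRC.search("<img src='logo.png'>")
print(single_quoted)  # None
```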

Extracting Text From in Between Specific Values

So in the below regex pattern you’d swap out x for the value that’s just before what you want to extract. And swap y for the value that’s just after what you want to extract: 

(?<=x)(.*?)(?=y)

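So if you wanted to pull out the word “Zapier” from the text “Client Name: Zapier Contact: Someone Nice”, you could use the following: (?<=Client Name: )(.*?)(?= Contact:). Note the space at the start of (?= Contact:): without it, the match would be “Zapier ” with a trailing space.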
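
The x/y swap can be sketched in Python as a small helper. The name between is hypothetical, and two assumptions are baked in: re.escape is used so markers containing characters like [ or ( are treated literally, and the result is stripped because any whitespace that isn't part of a marker ends up inside the match.

```python
import re

def between(text, before, after):
    """Return the text found between two literal markers, or None (hypothetical helper)."""
    pattern = "(?<=" + re.escape(before) + ")(.*?)(?=" + re.escape(after) + ")"
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None

text = "Client Name: Zapier Contact: Someone Nice"
client = between(text, "Client Name: ", "Contact:")
print(client)  # Zapier

# Markers with special regex characters still work thanks to re.escape.
price = between("Price [USD]: 42 (net)", "[USD]: ", " (net)")
print(price)  # 42
```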

Extracting Email Addresses, Numbers and Phone Numbers

Hoping to extract an email, numbers or phone number? No regex needed here my friend! Our Extract Email Address, Extract Number and Extract Phone Number Formatter transform functions already have it covered.

Further Reading

I hope you find these example regex patterns useful and enjoy using them in your Zaps! And if you’ve got any other examples of handy regex that you’re using, please share them in the comments below. We’d love to see them! 😁⚡

