Hi,
I’m trying to use Formatter to extract a couple of data elements from emails, and I’m having trouble writing the Python regular expression to match the specific strings I’m after.
Other guides in this community suggest stripping out the HTML of the email first. However, when I do so, I end up with the following:
From: John Smith Message Body:Firstname: JohnLastname: SmithPhone: 0780000000Email: johnsmith@hotmailtest.co.ukJobTitle: CEOYearsOfService: Over two yearsSalary: £100,000 +Settled: NoMessage: Please call me on 0780000000
The <br> tags are stripped and, as a result, elements such as JobTitle and YearsOfService are merged together.
I am able to extract the email address directly from the HTML version of the email with existing Transform options.
Can someone help with what code to use in the Pattern field so that I can pull out the YearsOfService, Salary and Settled components, please?
<html><head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>From: John Smith Message Body:<br>Firstname: John<br>Lastname: Smith<br>Phone: 0780000000<br>Email: johnsmith@hotmailtest.co.uk<br>JobTitle: CEO<br>YearsOfService: Over two years<br>Salary: £100,000 +<br>Settled: No<br>Message: Please call me on 0780000000 <br>
This is the pattern I’ve tried:
(?<=Salary:)(.*?)(?=\n)
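To illustrate the problem, here is a minimal Python sketch that runs the pattern above against the stripped text. It is only a reproduction under assumptions: it assumes the Pattern field behaves like Python's standard re module, and the extract helper is a hypothetical name made up for this example. It also shows one possible alternative, which is to stop at the next field label rather than at a newline, since the stripped text contains no newlines.

```python
import re

# Stripped text, copied from the output shown earlier in this post.
stripped = (
    "From: John Smith Message Body:Firstname: JohnLastname: SmithPhone: 0780000000"
    "Email: johnsmith@hotmailtest.co.ukJobTitle: CEOYearsOfService: Over two years"
    "Salary: £100,000 +Settled: NoMessage: Please call me on 0780000000"
)

# The attempted pattern requires a newline after the value.
# The stripped text has no newline, so nothing matches.
attempt = re.search(r"(?<=Salary:)(.*?)(?=\n)", stripped)


def extract(label, next_label, text):
    """Hypothetical helper: return the value between 'label:' and 'next_label:'."""
    pattern = rf"(?<={re.escape(label)}:)\s*(.*?)\s*(?={re.escape(next_label)}:)"
    match = re.search(pattern, text)
    return match.group(1) if match else None


years = extract("YearsOfService", "Salary", stripped)
salary = extract("Salary", "Settled", stripped)
settled = extract("Settled", "Message", stripped)
```

This approach relies on the fields always appearing in the same order, which is an assumption based on the single sample email above.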
Thanks!