
Hello, I am trying to convert text I retrieve from an invoicing tool. The text contains the German umlauts (ä, ö, ü), and the data is formatted as HTML (i.e. ä = &auml; / ü = &uuml; etc.).

I used the Zapier Format/Convert HTML to Markdown, but it does not convert the umlauts correctly: &auml; = a / &uuml; = u.

Is there a better way to convert HTML to text with umlauts? 

Thanks :-) 

@MarijnVerdult  Great, works perfect! Thank you so much for your help!!!


Great question! For that, you can use the Regular Expression module (don’t forget to import it at the beginning of your Code Step)

import re
input_data['firstname'] = re.sub(r"<.*?>", "", input_data['firstname'])

 

How it works: everything between < and > is substituted with “” (i.e. nothing). The key here is .*?, where the question mark makes the match lazy, so it stops at the first > it finds. Without the question mark, the match would run from the first < to the last > and substitute everything in between.
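
To make the lazy vs. greedy difference concrete, here is a small sketch that runs both patterns on a made-up sample string (the sample text is only an illustration, not data from the invoicing tool):

```python
import re

# Hypothetical sample value containing HTML tags.
sample = "<strong>Müller</strong> and <em>Söhne</em>"

# Lazy: each tag is matched on its own, so the text between tags survives.
lazy = re.sub(r"<.*?>", "", sample)

# Greedy: a single match runs from the first "<" to the last ">".
greedy = re.sub(r"<.*>", "", sample)

print(repr(lazy))    # 'Müller and Söhne'
print(repr(greedy))  # ''
```

Because this sample starts with a tag and ends with a tag, the greedy pattern swallows the whole string.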


@MarijnVerdult Sorry for a second question… how do I remove all HTML code (i.e. <strong></strong> etc.)?


@MarijnVerdult Thanks a lot! It did work and all special characters have been replaced. Thanks :grinning:


@michael291 - I’m sorry, it looks like I both made a typo and mixed Python with JS code. Please find the proper code here:

input_data['firstname'] = input_data['firstname'].replace("&Uuml;","Ü")
input_data['firstname'] = input_data['firstname'].replace("&uuml;","ü")

output = [{'formatted_firstname': input_data['firstname']}]

 

 


@MarijnVerdult Thanks for your support. But the result is an error message saying: 

Traceback (most recent call last): SyntaxError: invalid syntax (<string>, line 10)

I used: 

input_data.firstname.replace("&Uuml;","Ü")
input_data.firstname.replace("&uuml;","ü")
output = [{'formatted_firstname': input_data.firstname]

 

Is there a bracket missing at the last line?
Thanks for help. 


I’m not sure if this is the easiest or the best solution but you could add a Python “Code Step” as an Action. 

You could then replace the characters via the following formula:

input_data.firstname.replace("&Uuml","Ü")
input_data.firstname.replace("&uuml","ü")
...

output = [{'formatted_firstname': input_data.firstname]

where you would need to declare all characters you want to replace. You should then declare “firstname” as input, and your output will be the correct string.
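
One way to declare all the characters in one place is a lookup table that a loop walks through. This is only a sketch: the `input_data` value below is a made-up stand-in for what Zapier would pass to the Code Step, and the `UMLAUTS` name is hypothetical. Note that Python's `str.replace` returns a new string, so the result has to be assigned back.

```python
# Stand-in for the dict Zapier passes to a Python Code Step (assumed sample value).
input_data = {'firstname': 'J&uuml;rgen &Ouml;zt&uuml;rk'}

# HTML entity -> character, one entry per character to replace.
UMLAUTS = {
    '&auml;': 'ä', '&ouml;': 'ö', '&uuml;': 'ü',
    '&Auml;': 'Ä', '&Ouml;': 'Ö', '&Uuml;': 'Ü',
    '&szlig;': 'ß',
}

text = input_data['firstname']
for entity, char in UMLAUTS.items():
    # replace() does not modify the string in place; keep the returned copy.
    text = text.replace(entity, char)

output = [{'formatted_firstname': text}]
print(output)  # [{'formatted_firstname': 'Jürgen Öztürk'}]
```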

Pretty sure there might be a smarter way of doing this but this will get the job done!
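
A possibly smarter route, assuming the Code Step runs Python 3: the standard library's `html.unescape` decodes named entities such as `&uuml;` without listing them one by one. The sample `input_data` value here is invented for illustration.

```python
import html

# Stand-in for the dict Zapier passes to a Python Code Step (assumed sample value).
input_data = {'firstname': 'M&uuml;ller &amp; S&ouml;hne'}

# Decode every HTML entity in the string in one call.
text = html.unescape(input_data['firstname'])

output = [{'formatted_firstname': text}]
print(output)  # [{'formatted_firstname': 'Müller & Söhne'}]
```

This only decodes entities; any HTML tags in the text would still need to be removed separately.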