Problems with Umlauts and special characters

January 14, 2021
7 replies
1780 views

michael291
Builder

Hello, I am trying to convert text I retrieve from an invoicing tool. The text contains German umlauts (ä, ö, ü), and the data is formatted as HTML (i.e. ä = &auml; / ü = &uuml; etc.).

I used Zapier's Format/Convert HTML to Markdown, but it does not convert the umlauts correctly: &auml; = a / &uuml; = u.

Is there a better way to convert HTML to text with umlauts?

Thanks :-)

Best answer by MarijnVerdult

Great question! For that, you can use Python's regular expression module, re (don’t forget to import it at the beginning of your Code Step):

import re

input_data['firstname'] = re.sub(r"<.*?>", "", input_data['firstname'])

This substitutes everything between a < and the next > with “” (i.e. nothing). The key here is .*?: the question mark makes the match lazy, so it stops at the first > it finds. Without the question mark the match is greedy: it would run from the first < to the last > and remove everything in between, including the text between the tags.
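To make the lazy versus greedy difference described above concrete, here is a minimal sketch. The sample string is a hypothetical value; in a Code Step it would come from input_data.

```python
import re

# Hypothetical sample value, standing in for input_data['firstname'].
sample = "<strong>Müller</strong> & <em>Söhne</em>"

# Lazy: each match ends at the first ">" after a "<", so only the tags are removed.
lazy = re.sub(r"<.*?>", "", sample)

# Greedy: a single match runs from the first "<" to the last ">",
# so the text between the tags is removed as well.
greedy = re.sub(r"<.*>", "", sample)

print(repr(lazy))    # 'Müller & Söhne'
print(repr(greedy))  # ''
```

Note that `.` does not match line breaks by default, so a tag that spans several lines would need the `re.DOTALL` flag.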


MarijnVerdult
Builder
January 14, 2021

I’m not sure if this is the easiest or the best solution but you could add a Python “Code Step” as an Action.

You could then replace the characters via the following formula:

input_data.firstname.replace("&Uuml","Ü")
input_data.firstname.replace("&uuml","ü")
...

output = [{'formatted_firstname': input_data.firstname]

where you would need to declare all characters you want to replace. You should then declare “firstname” as input, and your output will be the correct string.

Pretty sure there might be a smarter way of doing this but this will get the job done!

michael291
Author
Builder
January 14, 2021

@MarijnVerdult Thanks for your support. But the result is an error message saying:

Traceback (most recent call last): SyntaxError: invalid syntax (<string>, line 10)

I used:

input_data.firstname.replace("&Uuml;","Ü")
input_data.firstname.replace("&uuml;","ü")
output = [{'formatted_firstname': input_data.firstname]

Is there a bracket missing at the last line?
Thanks for help.

MarijnVerdult
Builder
January 15, 2021

@michael291 - I’m sorry, it looks like I made a typo and also mixed Python with JS code. Here is the proper code:

input_data['firstname'] = input_data['firstname'].replace("&Uuml;","Ü")
input_data['firstname'] = input_data['firstname'].replace("&uuml;","ü")

output = [{'formatted_firstname': input_data['firstname']}]
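Listing every entity by hand works, but Python's standard library can decode them all in one call. The following is a minimal sketch, assuming the Code Step runs Python 3 (where `html.unescape` is available). The literal `input_data` dict and its sample value are hypothetical stand-ins for what Zapier would supply.

```python
import html

# Hypothetical stand-in for the dict Zapier passes to a Python Code Step;
# in a real Code Step, input_data already exists and this line is not needed.
input_data = {"firstname": "J&uuml;rgen &Uuml;berm&auml;&szlig;ig"}

# html.unescape decodes named and numeric character references in one pass,
# so no per-character replace() calls are required.
formatted = html.unescape(input_data["firstname"])

output = [{"formatted_firstname": formatted}]
print(output)  # [{'formatted_firstname': 'Jürgen Übermäßig'}]
```

This covers ä, ö, ü, ß and any other entity in the text without having to declare each one.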

michael291
Author
Builder
January 15, 2021

@MarijnVerdult Thanks a lot! It did work and all special characters have been replaced. Thanks :grinning:

michael291
Author
Builder
January 15, 2021

@MarijnVerdult Sorry for a second question… how do I remove all HTML code (i.e. <strong></strong> etc.)?

MarijnVerdult
Builder
Answer
January 15, 2021

Great question! For that, you can use Python's regular expression module, re (don’t forget to import it at the beginning of your Code Step):

import re

input_data['firstname'] = re.sub(r"<.*?>", "", input_data['firstname'])
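The regular expression above treats anything between < and > as a tag, which is usually fine for simple markup. As a hedged alternative, the standard library's `html.parser.HTMLParser` can strip the tags and decode the entities in the same pass. This is only a sketch: `TextExtractor` and `html_to_text` are hypothetical helper names, and the sample fragment is made up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper that keeps only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes entities such as &uuml;
        # before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are skipped.
        self.parts.append(data)


def html_to_text(fragment):
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flushes any text still buffered at the end
    return "".join(parser.parts)


text = html_to_text("<strong>J&uuml;rgen</strong> M&uuml;ller")
print(text)  # Jürgen Müller
```

In a Code Step, the fragment would be `input_data['firstname']` and `text` would go into the `output` list.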

michael291
Author
Builder
January 15, 2021

@MarijnVerdult Great, works perfect! Thank you so much for your help!!!
