Problems with Umlauts and special characters

3 years ago
14 January 2021
7 replies
1522 views

Userlevel 4

michael291
Builder
72 replies

Hello, I am trying to convert text I retrieve from an invoicing tool. The text contains the German umlauts (ä, ö, ü), and the data is formatted as HTML (i.e. ä = &auml; / ü = &uuml; etc.).

I used the Zapier Format/Convert HTML to Markdown step, but it does not convert the umlauts correctly: &auml; becomes a and &uuml; becomes u.

Is there a better way to convert HTML to text with umlauts?

Thanks :-)

Best answer by MarijnVerdult 15 January 2021, 11:38

7 replies

I’m not sure if this is the easiest or the best solution but you could add a Python “Code Step” as an Action.

You could then replace the characters via the following formula:

input_data.firstname.replace("&Uuml","Ü")
input_data.firstname.replace("&uuml","ü")
...

output = [{'formatted_firstname': input_data.firstname]

You would need to add one such line for every character you want to replace. Then declare “firstname” as an input of the Code Step, and the output will be the corrected string.

Pretty sure there might be a smarter way of doing this but this will get the job done!

michael291
Author
Builder
72 replies
3 years ago
14 January 2021

@MarijnVerdult Thanks for your support. But the result is an error message saying:

Traceback (most recent call last): SyntaxError: invalid syntax (<string>, line 10)

I used:

input_data.firstname.replace("&Uuml;","Ü")
input_data.firstname.replace("&uuml;","ü")
output = [{'formatted_firstname': input_data.firstname]

Is there a bracket missing at the last line?
Thanks for help.

@michael291 - I’m sorry, it looks like I both made a typo and mixed Python with JS code. Here is the proper code:

input_data['firstname'] = input_data['firstname'].replace("&Uuml;","Ü")
input_data['firstname'] = input_data['firstname'].replace("&uuml;","ü")

output = [{'formatted_firstname': input_data['firstname']}]
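
If listing every character by hand gets tedious, Python's standard library can decode all HTML character references in one call. The following is only a sketch, assuming the Code Step runs Python 3 with the standard `html` module available; the `decode_entities` helper and the sample value in `input_data` are made up for illustration (in a real Code Step, Zapier supplies `input_data`).

```python
import html


def decode_entities(text):
    """Decode every HTML character reference in text, e.g. &uuml; -> ü."""
    return html.unescape(text)


# Stand-in for the dictionary Zapier passes to a Code Step (hypothetical value),
# included only so the sketch runs on its own.
input_data = {"firstname": "J&uuml;rgen M&Uuml;LLER &amp; S&ouml;hne"}

output = [{"formatted_firstname": decode_entities(input_data["firstname"])}]
```

This also covers entities other than umlauts (such as &amp; or &szlig;), so no per-character replace lines are needed.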

michael291
Author
Builder
72 replies
3 years ago
15 January 2021

@MarijnVerdult Thanks a lot! It did work and all special characters have been replaced. Thanks :grinning:

michael291
Author
Builder
72 replies
3 years ago
15 January 2021

@MarijnVerdult Sorry for a second question… how do I remove all HTML tags (e.g. <strong></strong>)?

MarijnVerdult
Builder
63 replies
3 years ago
15 January 2021
Answer

Great question! For that, you can use the Regular Expression module (don’t forget to import it at the beginning of your Code Step)

import re

input_data['firstname'] = re.sub(r"<.*?>", "", input_data['firstname'])

This substitutes everything between a < and the next > with “” (i.e. nothing). The key is .*?: the question mark makes the match lazy, so it stops at the first > it finds. Without the question mark the match is greedy: it would run from the first < to the last > and remove everything in between, including the text you want to keep.
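
To see the difference between the lazy and the greedy pattern, here is a small sketch you can run outside Zapier. The `sample` string is invented for illustration.

```python
import re

# Hypothetical sample text containing one pair of HTML tags.
sample = "Dear <strong>Anna</strong>, welcome"

# Lazy: each match ends at the first ">" after a "<", so only the tags go.
lazy = re.sub(r"<.*?>", "", sample)

# Greedy: one match runs from the first "<" to the last ">",
# so the text between the tags is removed as well.
greedy = re.sub(r"<.*>", "", sample)

print(repr(lazy))    # 'Dear Anna, welcome'
print(repr(greedy))  # 'Dear , welcome'
```

One caveat: by default `.` does not match a line break, so a tag that spans two lines would not be removed by this pattern.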

michael291
Author
Builder
72 replies
3 years ago
15 January 2021

@MarijnVerdult Great, works perfect! Thank you so much for your help!!!
