
Problems with Umlauts and special characters

  • 14 January 2021


Hello, I am trying to convert text I retrieve from an invoicing tool. The text contains the German umlauts (ä, ö, ü), and the data is formatted as HTML (i.e. ä = &auml; / ü = &uuml; etc.).

I used the Zapier Format/Convert HTML to Markdown, but it does not convert the umlauts correctly: &auml; = a / &uuml; = u.

Is there a better way to convert HTML to text with umlauts? 

Thanks :-) 


Best answer by MarijnVerdult 15 January 2021, 11:38


7 replies


I’m not sure if this is the easiest or the best solution but you could add a Python “Code Step” as an Action. 

You could then replace the characters with the following code:

input_data.firstname.replace("&Uuml","Ü")
input_data.firstname.replace("&uuml","ü")
...

output = [{'formatted_firstname': input_data.firstname]

where you would need to list every character you want to replace. You should then declare “firstname” as input, and your output will be the correct string.

Pretty sure there might be a smarter way of doing this but this will get the job done!


@MarijnVerdult Thanks for your support. But the result is an error message saying: 

Traceback (most recent call last): SyntaxError: invalid syntax (<string>, line 10)

I used: 

input_data.firstname.replace("&Uuml;","Ü")
input_data.firstname.replace("&uuml;","ü")
output = [{'formatted_firstname': input_data.firstname]

 

Is there a bracket missing in the last line?
Thanks for your help.


@michael291 - I’m sorry, it looks like I both made a typo and mixed Python with JS code. Here is the corrected code:

input_data['firstname'] = input_data['firstname'].replace("&Uuml;","Ü")
input_data['firstname'] = input_data['firstname'].replace("&uuml;","ü")

output = [{'formatted_firstname': input_data['firstname']}]
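If listing every entity by hand gets tedious, one possible alternative is Python's standard-library `html.unescape`, which decodes named and numeric HTML entities in a single call. The following is only a sketch: it assumes the Code Step's Python runtime provides the standard `html` module, and both the `decode_entities` helper and the sample `input_data` value are made up for illustration (inside Zapier, `input_data` is supplied by the step itself).

```python
import html


def decode_entities(text):
    """Turn HTML entities such as &uuml; back into the characters they stand for."""
    return html.unescape(text)


# Stand-in for the dict Zapier passes to the Code Step (hypothetical sample value).
input_data = {"firstname": "J&uuml;rgen M&Uuml;LLER &amp; S&ouml;hne"}

# Same output shape as in the snippet above.
output = [{"formatted_firstname": decode_entities(input_data["firstname"])}]
```

With this approach there is no need to maintain a list of replacements, since characters such as ß (`&szlig;`) or & (`&amp;`) are handled as well.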

 

 


@MarijnVerdult Thanks a lot! It did work and all special characters have been replaced. Thanks :grinning:


@MarijnVerdult Sorry for a second question… how do I remove all HTML code (e.g. <strong></strong> etc.)?


Great question! For that, you can use Python's regular expression module, re (don’t forget to import it at the beginning of your Code Step):

import re  # goes at the top of the Code Step
input_data['firstname'] = re.sub(r"<.*?>", "", input_data['firstname'])

 

How it works: every stretch of text from a < to the next > (brackets included) is replaced with “” (i.e. nothing). The key here is .*?, the lazy (non-greedy) form of .*, which matches as few characters as possible, so each match stops at the nearest >. Without the question mark the pattern is greedy: it matches from the first < to the last > and removes everything in between, including the text between the tags.
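To make the difference between lazy and greedy matching concrete, here is a small sketch that runs both patterns on the same string. The `sample` value is invented for illustration.

```python
import re

# Hypothetical sample value containing a pair of HTML tags.
sample = "<strong>Müller</strong> GmbH"

# Lazy: each tag is matched on its own, so only the tags are removed.
lazy = re.sub(r"<.*?>", "", sample)

# Greedy: a single match runs from the first < to the last >,
# so the text between the tags is removed as well.
greedy = re.sub(r"<.*>", "", sample)

print(repr(lazy))    # 'Müller GmbH'
print(repr(greedy))  # ' GmbH'
```

Note that a regular expression like this is a pragmatic shortcut for simple markup; it does not parse HTML, so unusual input (for example a literal > inside an attribute value) may not be stripped cleanly.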


@MarijnVerdult Great, works perfectly! Thank you so much for your help!!!