Match Accented Letters with Regular Expressions
Published: 6.8.2020
The post Match Accented Letters with Regular Expressions appeared first on David Walsh Blog.
Regular expressions are used for a variety of tasks, but the one I see most often is input validation. Names, dates, numbers…we tend to use regular expressions for everything, even when we probably shouldn’t.
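The most common syntax for checking alphabetic characters is A-z (strictly [A-Za-z], because the A-z range also covers the few punctuation characters that sit between Z and a), but what if the string contains accented characters? Characters like ğ and Ö fall outside that range, so the regex will not match them. That’s where we need to use Unicode property escapes to check for a broader letter format!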
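To make the problem concrete, here is a small sketch of how an ASCII-only character class behaves. The variable names (asciiOnly, asciiResults, looseRange) are mine and only for illustration:

```javascript
// An ASCII-only class accepts plain letters but rejects accented ones.
const asciiOnly = /^[A-Za-z]+$/;
const asciiResults = ["Walsh", "Özil", "Oğuzhan"].map((name) => asciiOnly.test(name));
console.log(asciiResults); // [true, false, false]

// The A-z range spans character codes 65 to 122, which also includes
// the punctuation between "Z" and "a": [ \ ] ^ _ `
const looseRange = /^[A-z]+$/;
console.log(looseRange.test("_^")); // true
```

So the ASCII range is too strict for real names, and the A-z shorthand is too loose in a different way.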
Let’s look at how we can use \p{Letter} and the Unicode flag (u) to match both standard and accented characters:
// Single word
"Özil".match(/[\p{Letter}]+/gu);

// Word with spaces
"Oğuzhan Özyakup".match(/[\p{Letter}\s]+/gu);
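For reference, this is roughly what those calls return, assuming the strings use precomposed characters (the usual case for typed input). The variable names are illustrative, and \p{L} is the short alias for \p{Letter}:

```javascript
// With the g flag, match() returns an array of every match.
const singleWord = "Özil".match(/[\p{Letter}]+/gu);
console.log(singleWord); // ["Özil"]

// Allowing whitespace in the class keeps the full name as one match.
const spacedName = "Oğuzhan Özyakup".match(/[\p{Letter}\s]+/gu);
console.log(spacedName); // ["Oğuzhan Özyakup"]

// Without \s in the class, each word becomes its own match.
const separateWords = "Oğuzhan Özyakup".match(/\p{L}+/gu);
console.log(separateWords); // ["Oğuzhan", "Özyakup"]
```

Note that the u flag is required: without it, \p{Letter} is not treated as a property escape.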
Using regular expressions to validate strings, especially names, is much more difficult than A-z+. Names and other strings can be very diverse, so let’s not insult users by making them provide non-accented letters just to pass validation!
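If you want to validate rather than extract, one possible approach is to anchor the pattern and use test(). The helper below is a minimal sketch, not a complete name validator: isLetterName is a hypothetical function name, and the rule it enforces (letters only, words separated by a single whitespace character) is my assumption for the example.

```javascript
// Hypothetical helper: true when the value contains only letters,
// with words optionally separated by a single whitespace character.
function isLetterName(value) {
  // NFC composes sequences such as "O" + combining diaeresis into "Ö".
  // A bare combining mark is not a \p{Letter}, so normalize first.
  const normalized = value.normalize("NFC").trim();
  return /^\p{Letter}+(?:\s\p{Letter}+)*$/u.test(normalized);
}

console.log(isLetterName("Oğuzhan Özyakup")); // true
console.log(isLetterName("Özil"));            // true
console.log(isLetterName("Özil99"));          // false
console.log(isLetterName(""));                // false
```

Even this is stricter than real names allow: hyphens and apostrophes would be rejected as written, so extend the pattern to suit your own data.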