avoid special characters in word tokens

Transliterated output should consist only of lowercase ASCII letters,
digits and whitespace. Remove all other characters.
Sarah Hoffmann
2021-11-10 17:14:13 +01:00
parent 7326b246b7
commit 1886952666


@@ -21,8 +21,8 @@ transliteration:
     - !include icu-rules/extended-unicode-to-asccii.yaml
     - ":: Ascii ()"
     - ":: NFD ()"
-    - "[^[:Ascii:]] >"
     - ":: lower ()"
+    - "[^a-z0-9[:Space:]] >"
     - ":: NFC ()"
 sanitizers:
     - step: split-name-list
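The change replaces a filter that only dropped non-ASCII characters with one that deletes everything except a-z, 0-9 and whitespace, and moves it after ":: lower ()" so that uppercase letters have already been folded before the a-z check. The following is a minimal Python sketch of the effect, assuming that only the NFD / lower / filter / NFC tail of the chain matters for the example. It does not use ICU: the included extended-unicode rules and the ":: Ascii ()" step are skipped, Python's `\s` stands in for ICU's `[:Space:]`, and the function names `old_tail` and `new_tail` are hypothetical.

```python
import re
import unicodedata

# Approximation of ICU "[^[:Ascii:]] >": delete anything outside U+0000..U+007F.
_NON_ASCII = re.compile(r"[^\x00-\x7f]")
# Approximation of ICU "[^a-z0-9[:Space:]] >"; \s approximates [:Space:].
_NOT_ALNUM_SPACE = re.compile(r"[^a-z0-9\s]")


def old_tail(text: str) -> str:
    """Hypothetical sketch of the tail before the commit: NFD, drop non-ASCII, lower, NFC."""
    text = unicodedata.normalize("NFD", text)
    text = _NON_ASCII.sub("", text)  # strips combining marks left by NFD
    text = text.lower()
    return unicodedata.normalize("NFC", text)


def new_tail(text: str) -> str:
    """Hypothetical sketch of the tail after the commit: NFD, lower, keep only a-z0-9/space, NFC."""
    text = unicodedata.normalize("NFD", text)
    text = text.lower()
    text = _NOT_ALNUM_SPACE.sub("", text)  # also strips punctuation and combining marks
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    for name in ["St. John's-Road", "Müller"]:
        print(repr(old_tail(name)), "->", repr(new_tail(name)))
```

With the old rules, ASCII punctuation such as ".", "'" and "-" survives into the word tokens ("st. john's-road"); with the new rules it is removed ("st johnsroad"), while accented letters are reduced to their base letter in both versions.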