overhaul the token analysis interface

The functional split between the two functions is now that the first one creates the ID that is used in the word table and the second one creates the variants. There is no longer a requirement that the ID is the normalized version. We might later reintroduce the requirement that a normalized version be available, but it doesn't necessarily need to be through the ID. The function that creates the ID now gets the full PlaceName. That way it can take into account attributes that were set by the sanitizers. Finally, rename both functions to something more sane.
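
A rough sketch of the split described above, for illustration only. The names `SketchAnalyzer`, `make_word_id` and `make_variants` are hypothetical and are not the names introduced by this commit. The `PlaceName` class below is a simplified stand-in with an assumed `attr` dictionary, and passing the ID into the variant function is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlaceName:
    """Simplified stand-in for the real PlaceName (assumed shape)."""
    name: str
    attr: Dict[str, str] = field(default_factory=dict)


class SketchAnalyzer:
    """Hypothetical token analyzer showing the two-function split."""

    def make_word_id(self, place_name: PlaceName) -> str:
        # First function: create the ID stored in the word table.
        # It receives the full PlaceName, so it can use attributes
        # set by sanitizers (here an assumed 'lang' attribute).
        base = ' '.join(place_name.name.lower().split())
        lang = place_name.attr.get('lang')
        return f"{base}@{lang}" if lang else base

    def make_variants(self, word_id: str) -> List[str]:
        # Second function: create the spelling variants.
        # The ID need not be the normalized name, so strip the
        # illustrative language suffix before deriving variants.
        base = word_id.split('@', 1)[0]
        variants = [base]
        if base.endswith(' street'):
            variants.append(base[:-len(' street')] + ' st')
        return variants


analyzer = SketchAnalyzer()
word_id = analyzer.make_word_id(PlaceName('  Main   Street', {'lang': 'en'}))
variants = analyzer.make_variants(word_id)
```

In this sketch the ID carries a language suffix taken from a sanitizer-set attribute, which shows why the ID is no longer required to equal the normalized name.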
2026-02-15 10:57:58 +00:00 · 2022-07-29 15:14:11 +02:00
parent 34d27ed45c
commit 51b6d16dc6
9 changed files with 76 additions and 43 deletions
--- a/docs/develop/ICU-Tokenizer-Modules.md
+++ b/docs/develop/ICU-Tokenizer-Modules.md
@@ -7,6 +7,12 @@ selection of sanitizers and token analyzers which you can use to adapt your
 installation to your needs. If the provided modules are not enough, you can
 also provide your own implementations. This section describes how to do that.

+!!! warning
+    This API is currently in early alpha status. While this API is meant to
+    be a public API on which other sanitizers and token analyzers may be
+    implemented, it is not guaranteed to be stable at the moment.
+
+
 ## Using non-standard sanitizers and token analyzers

 Sanitizer names (in the `step` property) and token analysis names (in the