Commit Graph

6 Commits

Author SHA1 Message Date
Sarah Hoffmann
bfc706a596 cache ICU transliterators and reuse them 2023-08-15 23:08:44 +02:00
Sarah Hoffmann
3d0bc85b4d improve penalty for token-split words
The rematch penalty for partial words created by the transliteration
need to take into account that they are rematched against the full word.
That means that missing beginning and end should not get a significant
penalty.
2023-08-12 11:26:02 +02:00
Sarah Hoffmann
3a21999a17 move text normalization into extra function 2023-06-22 10:48:05 +02:00
Sarah Hoffmann
9bc5be837b remove useless check
Found by new mypy version.
2023-06-21 11:56:39 +02:00
Sarah Hoffmann
2448cf2a14 add factory for query analyzer 2023-05-22 09:23:19 +02:00
Sarah Hoffmann
004883bdb1 query analyzer for ICU tokenizer 2023-05-22 08:46:19 +02:00