Commit Graph

7 Commits

Author SHA1 Message Date
Sarah Hoffmann
d984100e23 add inner word break penalty 2025-01-07 21:42:25 +01:00
Sarah Hoffmann
499110f549 add SOFT_PHRASE break and enable parsing
Also enables parsing of PART breaks.
2025-01-06 17:10:24 +01:00
Sarah Hoffmann
2b87c016db generalize normalization step for search query
It is now possible to configure functions for changing the query
input before it is analysed by the tokenizer.

Code is a cleaned-up version of the implementation by @miku.
2024-12-13 14:31:08 +01:00
Sarah Hoffmann
1f07967787 fix style issue found by flake8 2024-11-10 22:47:14 +01:00
Sarah Hoffmann
a690605a96 remove support for unindexed tokens
This was a special feature of the legacy tokenizer who would not
index very frequent tokens.
2024-09-22 10:39:10 +02:00
Sarah Hoffmann
4da4cbfe27 reduce from 3 to 2 packages 2024-06-28 09:13:22 +02:00
Sarah Hoffmann
6e89310a92 split code into submodules 2024-06-26 11:52:47 +02:00