Sarah Hoffmann
d984100e23
add inner word break penalty
2025-01-07 21:42:25 +01:00
Sarah Hoffmann
499110f549
add SOFT_PHRASE break and enable parsing
Also enables parsing of PART breaks.
2025-01-06 17:10:24 +01:00
Sarah Hoffmann
2b87c016db
generalize normalization step for search query
It is now possible to configure functions for changing the query
input before it is analysed by the tokenizer.
Code is a cleaned-up version of the implementation by @miku.
2024-12-13 14:31:08 +01:00
Sarah Hoffmann
1f07967787
fix style issue found by flake8
2024-11-10 22:47:14 +01:00
Sarah Hoffmann
a690605a96
remove support for unindexed tokens
This was a special feature of the legacy tokenizer, which would not
index very frequent tokens.
2024-09-22 10:39:10 +02:00
Sarah Hoffmann
4da4cbfe27
reduce from 3 to 2 packages
2024-06-28 09:13:22 +02:00
Sarah Hoffmann
6e89310a92
split code into submodules
2024-06-26 11:52:47 +02:00