Sarah Hoffmann
07c2907064
split normalized word when transliteration is split up
2025-09-08 22:58:01 +02:00
Sarah Hoffmann
621d8e785b
Merge pull request #3779 from lonvia/fix-zero-devision-direction
...
Fix direction factor computation on empty strings
2025-07-11 14:51:00 +02:00
Sarah Hoffmann
21ef3be433
fix direction factor computation on empty strings
2025-07-11 11:25:14 +02:00
Sarah Hoffmann
fe30663b21
remove penalty from TokenRanges
...
The parameter is no longer needed.
2025-07-11 11:01:22 +02:00
Sarah Hoffmann
4634ad0720
rebalance word transition penalties
2025-07-11 11:01:21 +02:00
Sarah Hoffmann
4a9253a0a9
simplify QueryNode penalty and initial assignment
2025-07-11 11:01:09 +02:00
Sarah Hoffmann
7f710d2394
add a comment about the precomputed denominator
2025-04-15 09:38:05 +02:00
Sarah Hoffmann
06e39e42d8
add direction penalties
...
Direction penalties are estimated by getting the name to address
ratio usage for each partial term in the query and computing the
linear regression of that ratio over the entire phrase. Or to put
it in ither words: we try to determine if the terms at the beginning
or the end of the query are more likely to constitute a name.
Direction penalties are currently used only in classic name queries.
2025-04-11 20:41:06 +02:00
Sarah Hoffmann
2ef0e20a3f
reorganise token reranking
...
As the reranking is about changing penalties in presence of other
tokens, change the datastructure to have the other tokens readily
avilable.
2025-04-11 13:38:34 +02:00
Sarah Hoffmann
e0e067b1d6
replace use of range when computing word list
2025-04-11 09:59:04 +02:00
Sarah Hoffmann
3980791cfd
use iterator instead of list to go over partials
2025-04-11 09:38:24 +02:00
Sarah Hoffmann
497e27bb9a
move partial token into a separate field in the query struct
...
There is exactly one token to be expected and the token is usually
present.
2025-04-11 08:57:34 +02:00
Sarah Hoffmann
6759edfb5d
make word generation from query a class method
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
e362a965e1
search: merge QueryPart array with QueryNodes
...
The basic information on terms is pretty much always used together
with the node inforamtion. Merging them together saves some
allocation while making lookup easier at the same time.
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
49bd18b048
replace PhraseType enum with simple int constants
2025-02-21 16:44:12 +01:00
Sarah Hoffmann
31412e0674
replace TokenType enum with simple char constants
2025-02-21 10:23:41 +01:00
Sarah Hoffmann
4577669213
replace BreakType enum with simple char constants
2025-02-21 09:57:48 +01:00
Sarah Hoffmann
d984100e23
add inner word break penalty
2025-01-07 21:42:25 +01:00
Sarah Hoffmann
499110f549
add SOFT_PHRASE break and enable parsing
...
Also enables parsing of PART breaks.
2025-01-06 17:10:24 +01:00
Sarah Hoffmann
1f07967787
fix style issue found by flake8
2024-11-10 22:47:14 +01:00
Sarah Hoffmann
a690605a96
remove support for unindexed tokens
...
This was a special feature of the legacy tokenizer who would not
index very frequent tokens.
2024-09-22 10:39:10 +02:00
Sarah Hoffmann
6e89310a92
split code into submodules
2024-06-26 11:52:47 +02:00