Commit Graph

30 Commits

Author SHA1 Message Date
Sarah Hoffmann
ffd5c32f17 fix comparison between country tokens and country restriction 2025-12-04 18:29:25 +01:00
Sarah Hoffmann
81c6cb72e6 add normalised country name to word table
Country tokens now follow the usual convention of having the
normalized version in the word column and the extra info about the
country code in the info column.
2025-12-01 13:10:18 +01:00
Sarah Hoffmann
193d6c4173 in-word penalty for final address token 2025-09-12 12:05:29 +02:00
Sarah Hoffmann
54620f9566 base penalty for housenumber searches on similar address searches 2025-09-12 10:52:42 +02:00
Sarah Hoffmann
341c09ee95 remove unused functions 2025-09-06 11:09:40 +02:00
Sarah Hoffmann
93ac1023f7 restrict name-only search more 2025-07-14 14:21:09 +02:00
Sarah Hoffmann
6d2b79870c only use most infrequent tokens for search index lookup 2025-07-14 14:18:22 +02:00
Sarah Hoffmann
71025f3f43 fix order of address rankings preferring longest words 2025-07-11 11:01:21 +02:00
Sarah Hoffmann
e4b671f8b1 reinstate penalty for partial only matches 2025-07-11 11:01:21 +02:00
Sarah Hoffmann
4634ad0720 rebalance word transition penalties 2025-07-11 11:01:21 +02:00
Sarah Hoffmann
c634e9fc5f differentiate between place searches with and without address 2025-07-07 12:03:56 +02:00
Sarah Hoffmann
13eaea8aae split place search into address search and named search
The presence/absence of housenumbers makes quite a difference for search.
2025-07-07 09:13:48 +02:00
Sarah Hoffmann
800c56642b tweak full count cut-off (as per deployment on osm.org) 2025-05-11 11:48:07 +02:00
Sarah Hoffmann
b680d81f0a ensure that bailout-check is done after each iteration 2025-04-11 11:02:11 +02:00
Sarah Hoffmann
3980791cfd use iterator instead of list to go over partials 2025-04-11 09:38:24 +02:00
Sarah Hoffmann
497e27bb9a move partial token into a separate field in the query struct
There is exactly one token to be expected and the token is usually
present.
2025-04-11 08:57:34 +02:00
Sarah Hoffmann
f2aa15778f always use lookup when requested
Doesn't seem to cause any issues in production.
2025-03-31 11:38:21 +02:00
Sarah Hoffmann
efe65c3e49 increase allowable address counts 2025-03-31 11:38:21 +02:00
Sarah Hoffmann
51847ebfeb more aggressively reduce expected count for multi-word terms
Improves searching of non-latin scripts with forced token spaces.
2025-03-31 11:18:22 +02:00
Miroslav Šedivý
6ff51712fe Simplify int/float manipulation 2025-03-06 19:26:56 +01:00
Sarah Hoffmann
31412e0674 replace TokenType enum with simple char constants 2025-02-21 10:23:41 +01:00
Sarah Hoffmann
4577669213 replace BreakType enum with simple char constants 2025-02-21 09:57:48 +01:00
Sarah Hoffmann
9bf1428d81 consistently use query module as qmod 2025-02-21 09:31:21 +01:00
Sarah Hoffmann
46c4446dc2 remove address penalty for postcode search
Searches of the form <postcode> <city> are in fact quite common.
2025-02-20 11:11:45 +01:00
Sarah Hoffmann
499110f549 add SOFT_PHRASE break and enable parsing
Also enables parsing of PART breaks.
2025-01-06 17:10:24 +01:00
Sarah Hoffmann
1f07967787 fix style issue found by flake8 2024-11-10 22:47:14 +01:00
Sarah Hoffmann
a690605a96 remove support for unindexed tokens
This was a special feature of the legacy tokenizer, which would not
index very frequent tokens.
2024-09-22 10:39:10 +02:00
Sarah Hoffmann
cfe5284f64 make housenumber search work with non-indexed partials 2024-07-31 14:09:35 +02:00
Mateusz Konieczny
e51973f8b1 fix some typos 2024-07-01 15:03:57 +02:00
Sarah Hoffmann
6e89310a92 split code into submodules 2024-06-26 11:52:47 +02:00