Sarah Hoffmann
ffd5c32f17
fix comparison between country tokens and country restriction
2025-12-04 18:29:25 +01:00
Sarah Hoffmann
81c6cb72e6
add normalised country name to word table
...
Country tokens now follow the usual convention of having the
normalized version in the word column and the extra info about the
country code in the info column.
2025-12-01 13:10:18 +01:00
Sarah Hoffmann
193d6c4173
in-word penalty for final address token
2025-09-12 12:05:29 +02:00
Sarah Hoffmann
54620f9566
base penalty for housenumber searches on similar address searches
2025-09-12 10:52:42 +02:00
Sarah Hoffmann
341c09ee95
remove unused functions
2025-09-06 11:09:40 +02:00
Sarah Hoffmann
93ac1023f7
restrict name-only search more
2025-07-14 14:21:09 +02:00
Sarah Hoffmann
6d2b79870c
only use most infrequent tokens for search index lookup
2025-07-14 14:18:22 +02:00
Sarah Hoffmann
71025f3f43
fix order of address rankings preferring longest words
2025-07-11 11:01:21 +02:00
Sarah Hoffmann
e4b671f8b1
reinstate penalty for partial only matches
2025-07-11 11:01:21 +02:00
Sarah Hoffmann
4634ad0720
rebalance word transition penalties
2025-07-11 11:01:21 +02:00
Sarah Hoffmann
c634e9fc5f
differentiate between place searches with and without address
2025-07-07 12:03:56 +02:00
Sarah Hoffmann
13eaea8aae
split place search into address search and named search
...
The presence/absence of housenumbers makes quite a difference for search.
2025-07-07 09:13:48 +02:00
Sarah Hoffmann
800c56642b
tweak full count cut-off (as per deployment on osm.org)
2025-05-11 11:48:07 +02:00
Sarah Hoffmann
b680d81f0a
ensure that bailout-check is done after each iteration
2025-04-11 11:02:11 +02:00
Sarah Hoffmann
3980791cfd
use iterator instead of list to go over partials
2025-04-11 09:38:24 +02:00
Sarah Hoffmann
497e27bb9a
move partial token into a separate field in the query struct
...
There is exactly one token to be expected and the token is usually
present.
2025-04-11 08:57:34 +02:00
Sarah Hoffmann
f2aa15778f
always use lookup when requested
...
Doesn't seem to cause any issues in production.
2025-03-31 11:38:21 +02:00
Sarah Hoffmann
efe65c3e49
increase allowable address counts
2025-03-31 11:38:21 +02:00
Sarah Hoffmann
51847ebfeb
more aggressively reduce expected count for multi-word terms
...
Improves searching of non-latin scripts with forced token spaces.
2025-03-31 11:18:22 +02:00
Miroslav Šedivý
6ff51712fe
Simplify int/float manipulation
2025-03-06 19:26:56 +01:00
Sarah Hoffmann
31412e0674
replace TokenType enum with simple char constants
2025-02-21 10:23:41 +01:00
Sarah Hoffmann
4577669213
replace BreakType enum with simple char constants
2025-02-21 09:57:48 +01:00
Sarah Hoffmann
9bf1428d81
consistently use query module as qmod
2025-02-21 09:31:21 +01:00
Sarah Hoffmann
46c4446dc2
remove address penalty for postcode search
...
Searches of the form <postcode> <city> are in fact quite common.
2025-02-20 11:11:45 +01:00
Sarah Hoffmann
499110f549
add SOFT_PHRASE break and enable parsing
...
Also enables parsing of PART breaks.
2025-01-06 17:10:24 +01:00
Sarah Hoffmann
1f07967787
fix style issue found by flake8
2024-11-10 22:47:14 +01:00
Sarah Hoffmann
a690605a96
remove support for unindexed tokens
...
This was a special feature of the legacy tokenizer, which would not
index very frequent tokens.
2024-09-22 10:39:10 +02:00
Sarah Hoffmann
cfe5284f64
make housenumber search work with non-indexed partials
2024-07-31 14:09:35 +02:00
Mateusz Konieczny
e51973f8b1
fix some typos
2024-07-01 15:03:57 +02:00
Sarah Hoffmann
6e89310a92
split code into submodules
2024-06-26 11:52:47 +02:00