Sarah Hoffmann
11d624e92a
split db_searches moving each class in its own file
2025-07-01 22:57:04 +02:00
Sarah Hoffmann
87a8c246a0
improve result cutting when a POI comes out with top importance
2025-06-01 12:00:36 +02:00
Sarah Hoffmann
90050de717
only rerank results if there is more than one
...
With only one result, the order is obvious.
2025-06-01 11:55:27 +02:00
Sarah Hoffmann
10a7d1106d
reduce influence of query rematching a little bit
2025-06-01 11:54:21 +02:00
Sarah Hoffmann
f2236f68f1
when rematching only distinguish between perfect, somewhat and bad match
2025-06-01 11:53:23 +02:00
Sarah Hoffmann
800c56642b
tweak full count cut-off (as per deployment on osm.org)
2025-05-11 11:48:07 +02:00
Sarah Hoffmann
34b72591cc
exclude address searches with country from direction penalty
...
Countries are not adequately represented by partial term counts.
2025-04-29 17:37:31 +02:00
Sarah Hoffmann
7f710d2394
add a comment about the precomputed denominator
2025-04-15 09:38:05 +02:00
Sarah Hoffmann
06e39e42d8
add direction penalties
...
Direction penalties are estimated by taking the ratio of name usage
to address usage for each partial term in the query and computing the
linear regression of that ratio over the entire phrase. Or to put
it in other words: we try to determine if the terms at the beginning
or the end of the query are more likely to constitute a name.
Direction penalties are currently used only in classic name queries.
2025-04-11 20:41:06 +02:00
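The estimate described in the entry above can be illustrated with a minimal sketch. It is not the code from the commit: the function name `name_ratio_slope`, the input format (one `(name_count, address_count)` pair per partial term) and the definition of the ratio as `name / (name + address)` are all assumptions made for illustration. The sketch fits an ordinary least-squares line through the per-term ratios and returns its slope. Because the x values are just the term positions 0..n-1, the denominator of the slope depends only on the number of terms and can be written in closed form.

```python
from typing import Sequence, Tuple


def name_ratio_slope(counts: Sequence[Tuple[int, int]]) -> float:
    """Least-squares slope of the name-usage ratio over term position.

    `counts` holds one hypothetical (name_count, address_count) pair per
    partial term, in query order. A negative slope means terms at the
    start of the query look more like a name, a positive slope means
    terms at the end do.
    """
    # Assumed ratio definition: share of name usage in total usage.
    # Terms without any usage are treated as neutral (0.5).
    ratios = [n / (n + a) if (n + a) > 0 else 0.5 for n, a in counts]

    size = len(ratios)
    if size < 2:
        # A single term has no direction.
        return 0.0

    x_mean = (size - 1) / 2
    y_mean = sum(ratios) / size
    numerator = sum((x - x_mean) * (y - y_mean)
                    for x, y in enumerate(ratios))
    # Sum of (x - x_mean)^2 for x = 0..size-1 has the closed form
    # size * (size^2 - 1) / 12, so it needs no loop over the terms.
    denominator = size * (size * size - 1) / 12

    return numerator / denominator
```

A query whose first term is used mostly in names and whose last term is used mostly in addresses, for example `name_ratio_slope([(90, 10), (50, 50), (10, 90)])`, yields a negative slope. How such a slope is turned into an actual penalty is not covered by this sketch.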
Sarah Hoffmann
2ef0e20a3f
reorganise token reranking
...
As the reranking is about changing penalties in the presence of other
tokens, change the data structure to have the other tokens readily
available.
2025-04-11 13:38:34 +02:00
Sarah Hoffmann
b680d81f0a
ensure that bailout-check is done after each iteration
2025-04-11 11:02:11 +02:00
Sarah Hoffmann
e0e067b1d6
replace use of range when computing word list
2025-04-11 09:59:04 +02:00
Sarah Hoffmann
3980791cfd
use iterator instead of list to go over partials
2025-04-11 09:38:24 +02:00
Sarah Hoffmann
497e27bb9a
move partial token into a separate field in the query struct
...
There is exactly one token to be expected and the token is usually
present.
2025-04-11 08:57:34 +02:00
Sarah Hoffmann
f2aa15778f
always use lookup when requested
...
Doesn't seem to cause any issues in production.
2025-03-31 11:38:21 +02:00
Sarah Hoffmann
efe65c3e49
increase allowable address counts
2025-03-31 11:38:21 +02:00
Sarah Hoffmann
51847ebfeb
more aggressively reduce expected count for multi-word terms
...
Improves searching of non-latin scripts with forced token spaces.
2025-03-31 11:18:22 +02:00
Sarah Hoffmann
35baf77b18
make query upper-case when parsing postcodes
...
The postcode patterns expect upper-case letters.
2025-03-21 09:44:15 +01:00
Sarah Hoffmann
d400fd5f76
fix debug output for lookup type
2025-03-19 17:31:18 +01:00
Sarah Hoffmann
9419c5adb2
penalize postcode searches with multiple name qualifiers
2025-03-19 10:05:36 +01:00
Sarah Hoffmann
2c61fe08a0
use word_token length when penalizing against postcodes
2025-03-19 09:52:40 +01:00
Sarah Hoffmann
7b3c725f2a
postcode token should have transliterated term in word_token
2025-03-19 09:52:40 +01:00
Sarah Hoffmann
edc5ada625
improve handling of leading postcodes
...
Setting the direction of the query while yielding assignments is
a bad idea because it may override a direction already set.
2025-03-19 09:52:40 +01:00
Miroslav Šedivý
6ff51712fe
Simplify int/float manipulation
2025-03-06 19:26:56 +01:00
Sarah Hoffmann
6b0d58d9fd
restrict postcode parsing in typed phrases
...
Postcodes can only appear in postcode-type phrases and must then
cover the full phrase.
2025-03-05 10:09:33 +01:00
Sarah Hoffmann
434fbbfd18
add support for country prefixes in postcodes
2025-03-04 15:18:27 +01:00
Sarah Hoffmann
921db8bb2f
cache all info of ICUQueryAnalyser in a single object
2025-03-04 08:58:57 +01:00
Sarah Hoffmann
e67ae701ac
show token begin and end in debug output
2025-03-04 08:57:59 +01:00
Sarah Hoffmann
fc1c6261ed
add postcode parser
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
6759edfb5d
make word generation from query a class method
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
e362a965e1
search: merge QueryPart array with QueryNodes
...
The basic information on terms is pretty much always used together
with the node information. Merging them together saves some
allocation while making lookup easier at the same time.
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
49bd18b048
replace PhraseType enum with simple int constants
2025-02-21 16:44:12 +01:00
Sarah Hoffmann
31412e0674
replace TokenType enum with simple char constants
2025-02-21 10:23:41 +01:00
Sarah Hoffmann
4577669213
replace BreakType enum with simple char constants
2025-02-21 09:57:48 +01:00
Sarah Hoffmann
9bf1428d81
consistently use query module as qmod
2025-02-21 09:31:21 +01:00
Sarah Hoffmann
b56edf3d0a
avoid yielding when extracting words from query
2025-02-20 23:32:39 +01:00
Sarah Hoffmann
abc911079e
remove word_number counting for phrases
...
We can just examine the break types to know if we are dealing
with a partial token.
2025-02-20 17:36:50 +01:00
Sarah Hoffmann
adabfee3be
Merge pull request #3655 from lonvia/remove-name-ranking-in-postcode-search
...
Tweak penalties for postcode searches
2025-02-20 14:32:43 +01:00
Sarah Hoffmann
46c4446dc2
remove address penalty for postcode search
...
Searches of the form <postcode> <city> are in fact quite common.
2025-02-20 11:11:45 +01:00
Sarah Hoffmann
add9244a2f
do not rerank address by full match in postcode search
...
The reranking result will not be completely correct because
the address of a postcode refers to the address _and_ name
of the parent and reranking was only done against the
address. We assume here that the postcode is precise enough
as to not require a penalty due to partial matches.
2025-02-20 10:29:03 +01:00
Sarah Hoffmann
55c3176957
strip normalisation results of normal and special spaces
2025-02-19 14:40:35 +01:00
Sarah Hoffmann
86ad9efa8a
keep break indicators [:-] during normalisation
...
All punctuation will be converted to '-'. Soft breaks ':' may be
added by preprocessors. The break signs are only used during
query analysis and are ignored during import token analysis.
2025-01-09 09:21:55 +01:00
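The break-indicator handling described in the entry above can be pictured with a small sketch. It is an illustration only and does not reproduce the normalisation rules from the commit: the function name `mark_breaks` is hypothetical, and Unicode general categories from the standard library are used here as a stand-in definition of "punctuation". Every punctuation character becomes '-', while an existing ':' is kept as a soft break.

```python
import unicodedata


def mark_breaks(text: str) -> str:
    """Replace punctuation with '-' while keeping ':' soft breaks.

    Hypothetical helper: "punctuation" is assumed to mean any character
    whose Unicode general category starts with 'P'.
    """
    out = []
    for ch in text:
        if ch == ':':
            # Soft break, possibly added by a preprocessor: keep as is.
            out.append(ch)
        elif unicodedata.category(ch).startswith('P'):
            # Any other punctuation becomes a hard break indicator.
            out.append('-')
        else:
            out.append(ch)
    return ''.join(out)
```

With this sketch, `mark_breaks("a,b:c")` keeps the colon and turns the comma into a dash. Collapsing repeated indicators and stripping surrounding spaces are left out.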
Sarah Hoffmann
d984100e23
add inner word break penalty
2025-01-07 21:42:25 +01:00
Sarah Hoffmann
499110f549
add SOFT_PHRASE break and enable parsing
...
Also enables parsing of PART breaks.
2025-01-06 17:10:24 +01:00
Sarah Hoffmann
2b87c016db
generalize normalization step for search query
...
It is now possible to configure functions for changing the query
input before it is analysed by the tokenizer.
Code is a cleaned-up version of the implementation by @miku.
2024-12-13 14:31:08 +01:00
Sarah Hoffmann
0770eaa5d0
use bbox size for secondary order of results
...
Helps to return the largest object when deduplicating results.
2024-11-19 10:38:50 +01:00
Sarah Hoffmann
122ecd4626
remove remaining pylint hints
2024-11-10 22:49:29 +01:00
Sarah Hoffmann
1f07967787
fix style issue found by flake8
2024-11-10 22:47:14 +01:00
Sarah Hoffmann
2c0f2e1ede
remove now unnecessary type-ignores
2024-10-25 17:56:47 +02:00
Sarah Hoffmann
5160a1d577
get bbox of postcode areas into results
2024-09-30 08:58:40 +02:00