Sarah Hoffmann
8216899a9a
trim all coordinate output to 7 digits
2023-10-23 17:19:12 +02:00
Sarah Hoffmann
b62dbd1f92
reduce influence of viewbox
...
Perfectly matching city names should still get priority.
2023-10-07 22:00:52 +02:00
Sarah Hoffmann
b00b16aa3a
more unit tests for search
2023-09-27 15:00:05 +02:00
Sarah Hoffmann
7fcbe13669
move get_addressdata() implementation to Python
...
The pgsql function get_addressdata() does a lookup of a lot of data
that is already available in Python.
2023-09-26 11:21:36 +02:00
Sarah Hoffmann
21df87dedc
filter duplicate results after DB query
2023-09-20 14:58:54 +02:00
Sarah Hoffmann
fd26310d6a
rerank results by query
...
The algorithm is similar to the PHP reranking and uses the terms from
the display name to check against the query terms. However, instead of
exact matching it uses a per-word edit distance, so that it is less
strict when it comes to mismatching accents or other one-letter
differences.
Country names get a higher penalty because they don't receive a
penalty during token matching right now.
This will work badly with the legacy tokenizer. Given that it is
marked for removal, it is simply not worth optimising for it.
2023-09-20 14:52:05 +02:00
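As an illustration of the reranking idea described in the commit above, here is a minimal sketch of a per-word edit-distance penalty. It is not the code from this commit: the function names `edit_distance` and `rerank_penalty`, the normalization by word length, and the averaging over display-name words are all assumptions made for the example, and the extra country-name penalty is left out.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two words, computed row by row."""
    prev: List[int] = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def rerank_penalty(query: str, display_name: str) -> float:
    """Hypothetical penalty in [0, 1]: 0 means every display-name word
    has an exact counterpart in the query, 1 means none is even close.
    """
    query_words = query.lower().split()
    name_words = display_name.lower().replace(',', ' ').split()
    if not query_words or not name_words:
        return 0.0

    total = 0.0
    for word in name_words:
        # Closest query word decides the cost for this name word.
        best = min(edit_distance(word, q) for q in query_words)
        # Normalize by word length (an assumption) and cap at 1.
        total += min(best / len(word), 1.0)
    return total / len(name_words)
```

With this sketch, a query of `zurich` gets a small, non-zero penalty against the display name `Zürich` because the two words differ by a single letter, whereas exact string matching would treat them as a complete mismatch.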
Sarah Hoffmann
44da684d1d
reduce expected count for multi-part words
...
Fixes #3196 .
2023-09-11 17:45:34 +02:00
Sarah Hoffmann
c284df2dc9
restrict range for interpolated housenumbers
...
Interpolations are only supported up to 2^32 by the database.
Limit to 8 digits, which is still more than should be needed.
2023-09-05 11:41:41 +02:00
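The digit limit in the commit above can be sketched as a simple validity check. The names `MAX_INTERPOLATION_DIGITS` and `is_usable_interpolation_number` are hypothetical, and the actual check in the code base may look different. The point is only that any number with at most 8 digits stays well below the 2^32 limit mentioned in the message.

```python
# Assumed constant for illustration: the commit limits numbers to 8 digits.
MAX_INTERPOLATION_DIGITS = 8


def is_usable_interpolation_number(value: str) -> bool:
    """Return True if `value` is a plain ASCII number of at most 8 digits."""
    return (value.isascii()
            and value.isdigit()
            and len(value) <= MAX_INTERPOLATION_DIGITS)


# The largest 8-digit number is far below 2^32 = 4294967296.
assert 10 ** MAX_INTERPOLATION_DIGITS - 1 < 2 ** 32
```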
Sarah Hoffmann
15e09f2b24
remove alias where it does not work with lambdas
...
Fixes #3177 .
2023-08-30 21:55:34 +02:00
Sarah Hoffmann
1115705cbc
add additional timeout for entire request
2023-08-25 09:16:53 +02:00
Sarah Hoffmann
2762c45569
apply adjusted counts only to final result
2023-08-24 21:37:02 +02:00
Sarah Hoffmann
0a2d0c3b5c
allow terms with frequent searches together with viewbox
2023-08-24 09:21:09 +02:00
Sarah Hoffmann
dcdda314e2
further tweak search containing very frequent tokens
...
Excluding non-rare full names is not really possible because it makes
addresses with street names like 'main st' unsearchable. This tries to
leave all names in but refrain from ordering results by accuracy
when too many results are expected. This means that the DB will simply
get the first n results without any particular order.
2023-08-23 23:04:12 +02:00
Sarah Hoffmann
23eed4ff2f
fix tag name for housename addresses in layer selection
...
Fixes #3156 .
2023-08-19 15:57:33 +02:00
Sarah Hoffmann
bfc706a596
cache ICU transliterators and reuse them
2023-08-15 23:08:44 +02:00
Sarah Hoffmann
746dd057b9
prefer name-only searches more
2023-08-13 15:24:16 +02:00
Sarah Hoffmann
b710297d05
return bbox of full country for country searches
...
Fixes #3149 .
2023-08-13 14:37:28 +02:00
Sarah Hoffmann
0a8e8cec0f
fix application of label to wrong expression
2023-08-13 11:59:01 +02:00
Sarah Hoffmann
96e5a23727
avoid lambda SQL in connection with alias tables
2023-08-13 11:40:49 +02:00
Sarah Hoffmann
cab2a74740
do not use index when searching in large areas
...
This concerns viewboxes as well as radius search.
2023-08-12 16:12:44 +02:00
Sarah Hoffmann
95d1048789
take token_assignment penalty into account
...
Also computes the expected count differently when addresses are
involved. Address token counts do not bear a direct relation to
real counts.
2023-08-12 15:33:50 +02:00
Sarah Hoffmann
38b2b8a143
fix debug output for NearSearch
...
The search info is in a subsearch and was therefore not taken into
account.
2023-08-12 11:27:55 +02:00
Sarah Hoffmann
3d0bc85b4d
improve penalty for token-split words
...
The rematch penalty for partial words created by the transliteration
needs to take into account that they are rematched against the full word.
That means that missing beginning and end should not get a significant
penalty.
2023-08-12 11:26:02 +02:00
Sarah Hoffmann
78648f1faf
remove lookup by address only
...
There are too many lookups where the address is very frequent,
even when many address parts are present.
2023-08-06 21:00:10 +02:00
Sarah Hoffmann
2c7e1db5f6
remove SQL lambdas with IN expressions
...
The values of IN expressions are incorrectly cached.
2023-08-02 12:34:07 +02:00
Sarah Hoffmann
2171b38551
only print non-empty search tables
2023-08-02 09:25:47 +02:00
Sarah Hoffmann
afdbdb02a1
do not look up by address vector when only few tokens are available
...
Names of countries and states are exceedingly rare in the word count
but are very frequent in the address. A short name has the danger
of producing too many results.
2023-08-02 09:25:47 +02:00
Sarah Hoffmann
8fc3dd9457
fix query over classtype tables
...
The case statement prevented the index on the classtype tables
from being used. Move the case statement inside the geometry
function instead.
2023-07-30 23:51:36 +02:00
Sarah Hoffmann
587698a6f3
disallow special housenumber search with a single frequent partial
2023-07-20 18:05:54 +02:00
Sarah Hoffmann
927d2cc824
do not split names from typed phrases
...
When phrases are typed, they should only contain exactly one term.
2023-07-17 20:09:08 +02:00
Sarah Hoffmann
7f9cb4e68d
split up get_assignment function in more readable parts
2023-07-17 16:27:25 +02:00
Sarah Hoffmann
d48ea4f22c
disallow address searches that start with a postcode
...
These are postcode searches and nothing else.
2023-07-17 16:27:25 +02:00
Sarah Hoffmann
412bd2ec20
block search queries with too many tokens
2023-07-17 16:27:25 +02:00
Sarah Hoffmann
1c189060c2
simplify yield_lookups() function
...
Move creation of field lookups into separate functions to make the code
more readable.
2023-07-17 16:27:25 +02:00
Sarah Hoffmann
4a00a3c0f5
penalize name token splitting when phrases are used
2023-07-17 16:27:25 +02:00
Sarah Hoffmann
8366e4ca83
penalize search with frequent partials
...
Avoid search against frequent partials if we have already looked for
the full name equivalents.
2023-07-17 16:27:25 +02:00
Sarah Hoffmann
283db76e45
avoid splitting of first token when a housenumber is present
...
This only covers the case of <poi name> <street name> <housenumber>
which is exceedingly rare.
2023-07-17 16:27:25 +02:00
Sarah Hoffmann
8a36ed4f6f
increase threshold for full name searches
...
They still should be preferred over expensive partial name searches.
2023-07-17 16:27:25 +02:00
Sarah Hoffmann
d0f45155c8
fix search for housenumber names
...
The search still included a lookup of housenumbers in children which is
wrong.
2023-07-17 16:27:25 +02:00
Sarah Hoffmann
7932b1849b
selected lambdas for search
2023-07-14 15:43:29 +02:00
Sarah Hoffmann
cc45930ef9
avoid lookup via partials on frequent words
...
Drops expensive searches via partials on terms like 'rue de'.
See #2979 .
2023-07-06 12:16:57 +02:00
Sarah Hoffmann
3266daa8fd
add a small penalty to lookups in address vectors
2023-07-04 16:54:42 +02:00
Sarah Hoffmann
49e0d83d5d
fix linting issues
2023-07-01 20:18:59 +02:00
Sarah Hoffmann
673c3c7a55
replace regexp_match with generic op() functions
...
Works around a bug in SQLAlchemy where regexp_match creates an
unstable cache key.
2023-07-01 18:15:22 +02:00
Sarah Hoffmann
5135041405
replace CASE construct with plpgsql function
2023-07-01 18:15:22 +02:00
Sarah Hoffmann
9f6f12cfeb
move search to bind parameters
2023-07-01 18:03:07 +02:00
Sarah Hoffmann
3a21999a17
move text normalization into extra function
2023-06-22 10:48:05 +02:00
Sarah Hoffmann
9bc5be837b
remove useless check
...
Found by new mypy version.
2023-06-21 11:56:39 +02:00
Sarah Hoffmann
4ad8818809
avoid fallback country lookup when places are excluded
2023-06-20 12:22:08 +02:00
Sarah Hoffmann
d0a1e8e311
tweak postcode search
...
Give a preference to left-right reading, i.e. <postcode>,<address>
prefers a postcode search while <address>,<postcode> rather does
an address search.
Also exclude non-addressables, countries and states from results when a
postcode is contained in the query.
2023-06-20 11:56:43 +02:00