Commit Graph

89 Commits

Author SHA1 Message Date
Sarah Hoffmann
93afe5a7c3 update typing for latest changes in SQLAlchemy 2023-12-29 20:55:33 +01:00
Sarah Hoffmann
6d39563b87 enable all API tests for sqlite and port missing features 2023-12-07 09:32:02 +01:00
Sarah Hoffmann
df6eddebcd void unnecessary aliases 2023-12-07 09:31:00 +01:00
Sarah Hoffmann
b6c8c0e72b factor out SQL for filtering by location
Also improves on the decision if an indexed is used or not.
2023-12-07 09:31:00 +01:00
Sarah Hoffmann
b06f5fddcb simplify handling of SQL lookup code for search_name
Use function classes which can be instantiated directly.
2023-12-07 09:31:00 +01:00
Sarah Hoffmann
615b166c68 clean up ST_DWithin and intersects() functions
A non-index version of ST_DWithin is not necessary. ST_Distance
can be used for that purpose. Index use for intersects can be
covered with a simple parameter.
2023-12-07 09:31:00 +01:00
Sarah Hoffmann
c41f2fed21 simplify weigh_search() function
Use JSON arrays which can have mixed types and therefore have
a more logical structure than separate arrays. Avoid JSON dicts
because of their verboseness.
2023-12-07 09:31:00 +01:00
Sarah Hoffmann
c4fd3ab97f hide type differences between Postgres and Sqlite in custom types
Also define a custom set of operators in preparation of differences
in implementation.
2023-12-07 09:31:00 +01:00
Sarah Hoffmann
8a2c6067a2 skip lookup with full names when there are none 2023-12-01 12:11:58 +01:00
Sarah Hoffmann
3c7a28dab0 further restrict stop search criterion 2023-11-29 11:28:54 +01:00
Sarah Hoffmann
0c72a434e0 use restrict for housenumber lookups with few numbers 2023-11-29 11:28:54 +01:00
Sarah Hoffmann
32e7b59b1f NearSearch needs to inherit penalty from inner search 2023-11-29 11:28:52 +01:00
Sarah Hoffmann
b2319e52ff correctly exclude streets with housenumber searches
Street result are not subject to the full filtering in the SQL
query, so recheck.
2023-11-28 17:53:37 +01:00
Sarah Hoffmann
25279d009a add tests for interaction of category parameter with category terms 2023-11-28 16:56:08 +01:00
Sarah Hoffmann
3f72ca4bca rename use of category as POI search to near_item
Use the term category only as a short-cut for "tuple of key and value".
2023-11-28 16:27:05 +01:00
Sarah Hoffmann
70dc4957dc the category parameter in search should result in a qualifier 2023-11-28 12:01:49 +01:00
Sarah Hoffmann
a7f5c6c8f5 drop category tokens when they make up a full phrase 2023-11-26 20:58:50 +01:00
Sarah Hoffmann
a8b023e57e restrict base results in near search by rank
This avoids in particular that roads or POIs are used as base
for the near search when a place result is present.
2023-11-26 17:41:29 +01:00
Sarah Hoffmann
47ca56f21b deduplicate categories/qualifiers 2023-11-26 17:11:15 +01:00
Sarah Hoffmann
580a7b032f order near searches by distance instead of importance 2023-11-26 16:48:04 +01:00
Sarah Hoffmann
8fcc2bb7f5 avoid duplicate lines during category search 2023-11-26 14:53:20 +01:00
Sarah Hoffmann
d6fe58f84e fix polygon selection for classtable lookups
Polygons should be used preferably with higher address ranks
where the areas are smaller.
2023-11-25 21:01:27 +01:00
Sarah Hoffmann
4e4d29f653 increase penalty for one-letter words 2023-11-23 10:51:58 +01:00
Sarah Hoffmann
195c13ee8a more preference for name-only queries in search 2023-11-22 23:57:23 +01:00
Sarah Hoffmann
ac5ef64701 avoid index use when filtering by layer 2023-11-22 20:54:04 +01:00
Sarah Hoffmann
155f26060d avoid index on rank_address in near search 2023-11-22 17:33:17 +01:00
Sarah Hoffmann
8216899a9a trim all coordinate output to 7 digits 2023-10-23 17:19:12 +02:00
Sarah Hoffmann
b62dbd1f92 reduce influence of viewbox
Perfectly matching city names should still get priority.
2023-10-07 22:00:52 +02:00
Sarah Hoffmann
b00b16aa3a more unit tests for search 2023-09-27 15:00:05 +02:00
Sarah Hoffmann
7fcbe13669 move get_addressdata() implementation to Python
The pgsql function get_addressdata() does a lookup of a lot of data
that is already available in Python.
2023-09-26 11:21:36 +02:00
Sarah Hoffmann
21df87dedc filter duplicate results after DB query 2023-09-20 14:58:54 +02:00
Sarah Hoffmann
fd26310d6a rerank results by query
The algorithm is similar to the PHP reranking and uses the terms from
the display name to check against the query terms. However instead of
exact matching it uses a per-word-edit-distance, so that it is less
strict when it comes to mismatching accents or other one letter
differences.

Country names get a higher penalty because they don't receive a
penalty during token matching right now.

This will work badly with the legacy tokenizer. Given that it is
marked for removal, it is simply not worth optimising for it.
2023-09-20 14:52:05 +02:00
Sarah Hoffmann
44da684d1d reduce expected count for multi-part words
Fixes #3196.
2023-09-11 17:45:34 +02:00
Sarah Hoffmann
c284df2dc9 restrict range for interpolated housenumbers
Interpolations are only supported up to 2^32 by the database.
Limit to 8 digits, which is still more than should be needed.
2023-09-05 11:41:41 +02:00
Sarah Hoffmann
15e09f2b24 remove alias where it does not work with lambdas
Fixes #3177.
2023-08-30 21:55:34 +02:00
Sarah Hoffmann
1115705cbc add additional timeout for entire request 2023-08-25 09:16:53 +02:00
Sarah Hoffmann
2762c45569 apply adjusted counts only to final result 2023-08-24 21:37:02 +02:00
Sarah Hoffmann
0a2d0c3b5c allow terms with frequent searches together with viewbox 2023-08-24 09:21:09 +02:00
Sarah Hoffmann
dcdda314e2 further tweak search containing very frequent tokens
Excluding non-rare full names is not really possible because it makes
addresses with street names like 'main st' unsearchable. This tries to
leav all names in but refrain from ordering results by accuracy
when too many results are expected. This means that the DB will simply
get the first n results without any particular order.
2023-08-23 23:04:12 +02:00
Sarah Hoffmann
23eed4ff2f fix tag name for housename addresses in layer selection
Fixes #3156.
2023-08-19 15:57:33 +02:00
Sarah Hoffmann
bfc706a596 cache ICU transliterators and reuse them 2023-08-15 23:08:44 +02:00
Sarah Hoffmann
746dd057b9 prefer name-only searches more 2023-08-13 15:24:16 +02:00
Sarah Hoffmann
b710297d05 return bbox of full country for country searches
Fixes #3149.
2023-08-13 14:37:28 +02:00
Sarah Hoffmann
0a8e8cec0f fix application of label to wrong expression 2023-08-13 11:59:01 +02:00
Sarah Hoffmann
96e5a23727 avoid lambda SQL in connection with alias tables 2023-08-13 11:40:49 +02:00
Sarah Hoffmann
cab2a74740 do not use index when searching in large areas
This concerns viewboxes as well as radius search.
2023-08-12 16:12:44 +02:00
Sarah Hoffmann
95d1048789 take token_assignment penalty into account
Also computes the expected count differently when addresses are
involved. Address token counts do not bare a direct relation to
real counts.
2023-08-12 15:33:50 +02:00
Sarah Hoffmann
38b2b8a143 fix debug output for NearSearch
The search info is in a subsearch and was therefore not taken into
account.
2023-08-12 11:27:55 +02:00
Sarah Hoffmann
3d0bc85b4d improve penalty for token-split words
The rematch penalty for partial words created by the transliteration
need to take into account that they are rematched against the full word.
That means that missing beginning and end should not get a significant
penalty.
2023-08-12 11:26:02 +02:00
Sarah Hoffmann
78648f1faf remove lookup by address only
There are too many lookups where the address is very frequent,
even when many address parts are present.
2023-08-06 21:00:10 +02:00