Commit Graph

143 Commits

Author SHA1 Message Date
Sarah Hoffmann
bc450d110c Merge pull request #3722 from emmanuel-ferdman/master
resolve datetime deprecation warnings
2025-04-22 14:21:05 +02:00
Sarah Hoffmann
3999977941 revert accidental change in json output format 2025-04-18 12:05:25 +02:00
Emmanuel Ferdman
df58870e3f resolve datetime deprecation warnings
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-04-17 11:15:16 -07:00
Sarah Hoffmann
7f710d2394 add a comment about the precomputed denominator 2025-04-15 09:38:05 +02:00
Sarah Hoffmann
06e39e42d8 add direction penalties
Direction penalties are estimated by getting the name to address
ratio usage for each partial term in the query and computing the
linear regression of that ratio over the entire phrase. Or to put
it in ither words: we try to determine if the terms at the beginning
or the end of the query are more likely to constitute a name.

Direction penalties are currently used only in classic name queries.
2025-04-11 20:41:06 +02:00
Sarah Hoffmann
2ef0e20a3f reorganise token reranking
As the reranking is about changing penalties in presence of other
tokens, change the datastructure to have the other tokens readily
avilable.
2025-04-11 13:38:34 +02:00
Sarah Hoffmann
b680d81f0a ensure that bailout-check is done after each iteration 2025-04-11 11:02:11 +02:00
Sarah Hoffmann
e0e067b1d6 replace use of range when computing word list 2025-04-11 09:59:04 +02:00
Sarah Hoffmann
3980791cfd use iterator instead of list to go over partials 2025-04-11 09:38:24 +02:00
Sarah Hoffmann
497e27bb9a move partial token into a separate field in the query struct
There is exactly one token to be expected and the token is usually
present.
2025-04-11 08:57:34 +02:00
Sarah Hoffmann
97d9e3c548 allow updating postcodes without a project directory
Postcodes will then be updated without looking for external postcodes.
2025-04-09 20:04:01 +02:00
Sarah Hoffmann
b34991d85f add BDD tests for DB 2025-04-09 14:52:34 +02:00
Sarah Hoffmann
39f56ba4b8 restrict coordinate output to 7 digits 2025-04-04 11:02:51 +02:00
Sarah Hoffmann
2ce2d031fa Merge pull request #3702 from lonvia/remove-tokenizer-dir
Remove automatic setup of tokenizer directory

So far the tokenizer factory would create a directory for private data for the tokenizer and then hand in the directory location to the tokenizer.

ICU tokenizer doesn't need any extra data anymore, so it doesn't make sense to create a directory which then remains empty. If a tokenizer needs such a directory in the future, it needs to create it on its own and make sure to handle the situation correctly where no project directory is used at all.
2025-04-03 09:04:48 +02:00
Sarah Hoffmann
186f562dd7 remove automatic setup of tokenizer directory
ICU tokenizer doesn't need any extra data anymore, so it doesn't
make sense to create a directory which then remains empty. If a
tokenizer needs such a directory in the future, it needs to create
it on its own and make sure to handle the situation correctly where
no project directory is used at all.
2025-04-02 20:20:04 +02:00
Sarah Hoffmann
c5bbeb626f Merge pull request #3700 from lonvia/ignore-inherited-addresses
Ignore POIs with inherited addresses for the address layer
2025-04-02 12:00:45 +02:00
Sarah Hoffmann
3bc77629c8 ignore POIs with inherited addresses for the address layer
We know that there is a building which describes the address as a
polygon and is therefore more suitable.
2025-04-02 10:30:45 +02:00
Sarah Hoffmann
6cf1287c4e Merge pull request #3686 from astridx/output_names
Output names as setting
2025-04-01 20:16:15 +02:00
Sarah Hoffmann
a49e8b9cf7 Merge pull request #3675 from TuringVerified/generic-preprocessors
Add generic preprocessors
2025-04-01 20:14:43 +02:00
TuringVerified
2eeec46040 Remove unnecessary assert statement, Fix regex_replace docstring and simplify regex_replace 2025-04-01 18:54:30 +05:30
TuringVerified
6d5a4a20c5 Update documentation, optimise regex_replace, add tests 2025-04-01 18:54:30 +05:30
TuringVerified
4665ea3e77 Add generic preprocessor 2025-04-01 18:54:30 +05:30
Sarah Hoffmann
fce279226f prepare release 5.1.0 2025-04-01 10:16:35 +02:00
astridx
12ad95067d output names as setting 2025-03-31 16:55:05 +02:00
Sarah Hoffmann
bfd1c83cb0 Merge pull request #3692 from lonvia/word-lookup-variants
Avoid matching penalty for abbreviated search terms
2025-03-31 16:38:31 +02:00
Sarah Hoffmann
3cb183ffb0 add lookup word to variants in word table 2025-03-31 14:52:50 +02:00
Sarah Hoffmann
1705bb5f57 do not save word counts of 1
This is the default setting, which will be assumed when the count is
missing.
2025-03-31 14:52:50 +02:00
Sarah Hoffmann
f2aa15778f always use lookup when requested
Doesn't seem to cause any issues in production.
2025-03-31 11:38:21 +02:00
Sarah Hoffmann
efe65c3e49 increase allowable address counts 2025-03-31 11:38:21 +02:00
Sarah Hoffmann
51847ebfeb more agressively reduce expected count for multi-word terms
Improves searching of non-latin scripts with forced token spaces.
2025-03-31 11:18:22 +02:00
Sarah Hoffmann
d4994a152b fix function signature for newer SQLAlchemy 2025-03-31 09:42:29 +02:00
Sarah Hoffmann
35baf77b18 make query upper-case when parsing postcodes
The postcode patterns expect upper-case letters.
2025-03-21 09:44:15 +01:00
Sarah Hoffmann
b1fc721f4b fix layer setting for structured search 2025-03-19 17:31:43 +01:00
Sarah Hoffmann
d400fd5f76 fix debug output for lookup type 2025-03-19 17:31:18 +01:00
Sarah Hoffmann
9419c5adb2 penalize postcode searches with multiple name qualifiers 2025-03-19 10:05:36 +01:00
Sarah Hoffmann
2c61fe08a0 use word_token length when penalizing against postcodes 2025-03-19 09:52:40 +01:00
Sarah Hoffmann
7b3c725f2a postcode token should have transliterated term in word_token 2025-03-19 09:52:40 +01:00
Sarah Hoffmann
edc5ada625 improve handling of leading postcodes
Setting the direction of the query while yielding assignments is
a bad idea because it may override a direction already set.
2025-03-19 09:52:40 +01:00
Sarah Hoffmann
3026c333ca adapt typing for latest SQLAlchemy version 2025-03-13 10:49:08 +01:00
Sarah Hoffmann
f5755a7a82 remove code for setting osm2pgsql via config.lib_dir
With the internal osm2pgsql gone, configuration of the binary location
via settings is the only option left that makes sense.
2025-03-11 09:04:05 +01:00
Miroslav Šedivý
6ff51712fe Simplify int/float manipulation 2025-03-06 19:26:56 +01:00
Sarah Hoffmann
6b0d58d9fd restrict postcode parsing in typed phrases
Postcodes can only appear in postcode-type phrases and must then
cover the full phrase
2025-03-05 10:09:33 +01:00
Sarah Hoffmann
434fbbfd18 add support for country prefixes in postcodes 2025-03-04 15:18:27 +01:00
Sarah Hoffmann
921db8bb2f cache all info of ICUQueryAnalyser in a single object 2025-03-04 08:58:57 +01:00
Sarah Hoffmann
a574b98e4a remove postcode computation for word table during import 2025-03-04 08:57:59 +01:00
Sarah Hoffmann
e67ae701ac show token begin and end in debug output 2025-03-04 08:57:59 +01:00
Sarah Hoffmann
fc1c6261ed add postcode parser 2025-03-04 08:57:37 +01:00
Sarah Hoffmann
6759edfb5d make word generation from query a class method 2025-03-04 08:57:37 +01:00
Sarah Hoffmann
e362a965e1 search: merge QueryPart array with QueryNodes
The basic information on terms is pretty much always used together
with the node inforamtion. Merging them together saves some
allocation while making lookup easier at the same time.
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
13db4c9731 replace datrie library with a more simple pure-Python class 2025-02-24 10:24:21 +01:00