Sarah Hoffmann
c634e9fc5f
differentiate between place searches with and without address
2025-07-07 12:03:56 +02:00
Sarah Hoffmann
13eaea8aae
split place search into address search and named search
The presence/absence of housenumbers makes quite a difference for search.
2025-07-07 09:13:48 +02:00
Sarah Hoffmann
11d624e92a
split db_searches moving each class in its own file
2025-07-01 22:57:04 +02:00
Sarah Hoffmann
f43fec0d57
Merge pull request #3764 from lonvia/update-importance
'refresh --importance' also needs to refresh importances in search_name table
2025-06-27 10:02:18 +02:00
Sarah Hoffmann
678702ceb7
rewrite importances in search_name after updating in placex
2025-06-26 20:27:37 +02:00
Sarah Hoffmann
f9eb93c4ab
remove support for deprecated gazetteer osm2pgsql output
2025-06-25 23:09:08 +02:00
anqixxx
cf9b946eba
Added skip for when min = 0
2025-06-05 09:25:14 +08:00
anqixxx
7dc3924a3c
Added default min = 0 argument for private functions
empty
2025-06-04 01:12:36 -07:00
anqixxx
20cf4b56b9
Refactored min and associated tests to follow greater-than-or-equal-to logic, so that min=0 means no filtering
r
2025-06-04 00:53:52 -07:00
anqixxx
40d5b78eb8
Added command line (default 0) min argument for minimum filtering, updated args.py to reflect this
2025-06-04 00:53:52 -07:00
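The four commits above describe a `min` threshold for filtering special phrases: a command-line argument defaulting to 0, greater-than-or-equal comparison semantics, and an explicit skip when `min` is 0. The sketch below illustrates that logic under stated assumptions: the helper name `filter_by_min_count` and its input (a mapping from hypothetical class/type pairs to occurrence counts) are illustrative only and are not taken from the Nominatim code.

```python
from typing import Dict, Set, Tuple

# Hypothetical (class, type) pair, e.g. ("amenity", "bar").
ClassType = Tuple[str, str]


def filter_by_min_count(counts: Dict[ClassType, int], min_count: int = 0) -> Set[ClassType]:
    """Keep pairs whose count is >= min_count (hypothetical helper).

    With ">=" semantics a threshold of 0 keeps every pair, so the
    comparison can be skipped entirely in that case.
    """
    if min_count == 0:
        # Skip: every count is >= 0, so nothing would be filtered out.
        return set(counts)
    return {pair for pair, count in counts.items() if count >= min_count}
```

Using `>=` rather than `>` is what makes `min=0` mean "no filtering"; the explicit skip then only avoids a pointless pass over the data.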
Sarah Hoffmann
87a8c246a0
improve result cutting when a POI comes out with top importance
2025-06-01 12:00:36 +02:00
Sarah Hoffmann
90050de717
only rerank results if there is more than one
With one result order is obvious.
2025-06-01 11:55:27 +02:00
Sarah Hoffmann
10a7d1106d
reduce influence of query rematching a little bit
2025-06-01 11:54:21 +02:00
Sarah Hoffmann
f2236f68f1
when rematching only distinguish between perfect, somewhat and bad match
2025-06-01 11:53:23 +02:00
Sarah Hoffmann
d2e691b63f
work around bogus type error in latest starlette
2025-05-31 09:43:48 +02:00
Sarah Hoffmann
2a508b6c99
fix missing optional return
2025-05-30 12:03:00 +02:00
anqixxx
6220bde2d6
Added mypy ignore fix for logging.py (library change), as well as quick mac fix on mem.cached
2025-05-21 11:11:56 -07:00
anqixxx
618fbc63d7
Added testing to test get classtype pairs in import special phrases
2025-05-21 10:39:51 -07:00
anqixxx
3f51cb3fd1
Made the limit configurable with an optional argument, updating the tests as well to reflect this. Default is now 0, meaning that it will return everything that occurs more than once. Removed mock database test, and got rid of fetch all. Rebased all tests to monkeypatch
2025-05-21 10:38:34 -07:00
anqixxx
59a947c5f5
Removed class type pair getter that used style sheets from both spi_importer and the associated testing function
2025-05-21 10:38:08 -07:00
anqixxx
1952290359
Removed magic mocking, using monkeypatch instead, and using a placex table to simulate a 'real database'
2025-05-21 10:37:42 -07:00
anqixxx
1a323165f9
Filter special phrases by style and frequency to fix #235
2025-05-21 10:36:46 -07:00
Sarah Hoffmann
800c56642b
tweak full count cut-off (as per deployment on osm.org)
2025-05-11 11:48:07 +02:00
Sarah Hoffmann
34b72591cc
exclude address searches with country from direction penalty
Countries are not adequately represented by partial term counts.
2025-04-29 17:37:31 +02:00
Sarah Hoffmann
bc450d110c
Merge pull request #3722 from emmanuel-ferdman/master
resolve datetime deprecation warnings
2025-04-22 14:21:05 +02:00
Sarah Hoffmann
3999977941
revert accidental change in json output format
2025-04-18 12:05:25 +02:00
Emmanuel Ferdman
df58870e3f
resolve datetime deprecation warnings
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-04-17 11:15:16 -07:00
Sarah Hoffmann
7f710d2394
add a comment about the precomputed denominator
2025-04-15 09:38:05 +02:00
Sarah Hoffmann
06e39e42d8
add direction penalties
Direction penalties are estimated by getting the ratio of name to address
usage for each partial term in the query and computing the
linear regression of that ratio over the entire phrase. Or, to put
it in other words: we try to determine if the terms at the beginning
or the end of the query are more likely to constitute a name.
Direction penalties are currently used only in classic name queries.
2025-04-11 20:41:06 +02:00
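The direction-penalty commit above describes fitting a linear regression to each partial term's name-to-address ratio across the phrase. A minimal sketch of that slope computation follows, assuming the ratio is a term's share of name usages out of all its usages. The names `name_ratio` and `direction_slope`, and the neutral 0.5 for terms with no usage data, are hypothetical choices rather than Nominatim's implementation; turning the slope into an actual penalty is left out.

```python
from typing import Sequence


def name_ratio(name_count: int, address_count: int) -> float:
    """Hypothetical share of a partial term's uses that are in names (0.0 to 1.0)."""
    total = name_count + address_count
    # Assumption: an unseen term is treated as neutral.
    return name_count / total if total else 0.5


def direction_slope(ratios: Sequence[float]) -> float:
    """Least-squares slope of per-term ratios over term positions 0..n-1.

    Positive: terms towards the end of the query look more name-like.
    Negative: terms at the beginning look more name-like.
    """
    n = len(ratios)
    if n < 2:
        return 0.0  # a single term carries no direction information
    mean_x = (n - 1) / 2
    mean_y = sum(ratios) / n
    # The sum of squared deviations of 0..n-1 depends only on n: n(n^2 - 1) / 12.
    denominator = n * (n * n - 1) / 12
    numerator = sum((i - mean_x) * (r - mean_y) for i, r in enumerate(ratios))
    return numerator / denominator
```

For a three-term query whose ratios fall from 0.9 to 0.1, `direction_slope([0.9, 0.5, 0.1])` returns approximately -0.4, pointing to the name being at the beginning. Because positions are always 0..n-1, the denominator depends only on the number of terms and never needs a pass over the data.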
Sarah Hoffmann
2ef0e20a3f
reorganise token reranking
As the reranking is about changing penalties in the presence of other
tokens, change the data structure to have the other tokens readily
available.
2025-04-11 13:38:34 +02:00
Sarah Hoffmann
b680d81f0a
ensure that bailout-check is done after each iteration
2025-04-11 11:02:11 +02:00
Sarah Hoffmann
e0e067b1d6
replace use of range when computing word list
2025-04-11 09:59:04 +02:00
Sarah Hoffmann
3980791cfd
use iterator instead of list to go over partials
2025-04-11 09:38:24 +02:00
Sarah Hoffmann
497e27bb9a
move partial token into a separate field in the query struct
There is exactly one token to be expected and the token is usually
present.
2025-04-11 08:57:34 +02:00
Sarah Hoffmann
97d9e3c548
allow updating postcodes without a project directory
Postcodes will then be updated without looking for external postcodes.
2025-04-09 20:04:01 +02:00
Sarah Hoffmann
b34991d85f
add BDD tests for DB
2025-04-09 14:52:34 +02:00
Sarah Hoffmann
39f56ba4b8
restrict coordinate output to 7 digits
2025-04-04 11:02:51 +02:00
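Seven decimal places of a degree correspond to roughly 1.1 cm of latitude (1e-7 times about 111 km), far finer than typical OSM mapping accuracy, so further digits add only noise. A minimal sketch of such a restriction follows, assuming "7 digits" means 7 digits after the decimal point; the helper names `format_coordinate` and `round_coordinate` are hypothetical and do not describe how Nominatim's output code is structured.

```python
def format_coordinate(value: float) -> str:
    """Render a coordinate in degrees with exactly 7 decimal places (hypothetical helper)."""
    return f"{value:.7f}"


def round_coordinate(value: float) -> float:
    """Round a coordinate to 7 decimal places, e.g. before JSON output (hypothetical helper)."""
    return round(value, 7)
```

The string variant always prints seven decimals, while the rounded float lets a JSON encoder drop trailing zeros.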
Sarah Hoffmann
2ce2d031fa
Merge pull request #3702 from lonvia/remove-tokenizer-dir
Remove automatic setup of tokenizer directory
So far the tokenizer factory would create a directory for private data for the tokenizer and then hand in the directory location to the tokenizer.
ICU tokenizer doesn't need any extra data anymore, so it doesn't make sense to create a directory which then remains empty. If a tokenizer needs such a directory in the future, it needs to create it on its own and make sure to handle the situation correctly where no project directory is used at all.
2025-04-03 09:04:48 +02:00
Sarah Hoffmann
186f562dd7
remove automatic setup of tokenizer directory
ICU tokenizer doesn't need any extra data anymore, so it doesn't
make sense to create a directory which then remains empty. If a
tokenizer needs such a directory in the future, it needs to create
it on its own and make sure to handle the situation correctly where
no project directory is used at all.
2025-04-02 20:20:04 +02:00
Sarah Hoffmann
c5bbeb626f
Merge pull request #3700 from lonvia/ignore-inherited-addresses
Ignore POIs with inherited addresses for the address layer
2025-04-02 12:00:45 +02:00
Sarah Hoffmann
3bc77629c8
ignore POIs with inherited addresses for the address layer
We know that there is a building which describes the address as a
polygon and is therefore more suitable.
2025-04-02 10:30:45 +02:00
Sarah Hoffmann
6cf1287c4e
Merge pull request #3686 from astridx/output_names
Output names as setting
2025-04-01 20:16:15 +02:00
Sarah Hoffmann
a49e8b9cf7
Merge pull request #3675 from TuringVerified/generic-preprocessors
Add generic preprocessors
2025-04-01 20:14:43 +02:00
TuringVerified
2eeec46040
Remove unnecessary assert statement, fix regex_replace docstring and simplify regex_replace
2025-04-01 18:54:30 +05:30
TuringVerified
6d5a4a20c5
Update documentation, optimise regex_replace, add tests
2025-04-01 18:54:30 +05:30
TuringVerified
4665ea3e77
Add generic preprocessor
2025-04-01 18:54:30 +05:30
Sarah Hoffmann
fce279226f
prepare release 5.1.0
2025-04-01 10:16:35 +02:00
astridx
12ad95067d
output names as setting
2025-03-31 16:55:05 +02:00
Sarah Hoffmann
bfd1c83cb0
Merge pull request #3692 from lonvia/word-lookup-variants
Avoid matching penalty for abbreviated search terms
2025-03-31 16:38:31 +02:00
Sarah Hoffmann
3cb183ffb0
add lookup word to variants in word table
2025-03-31 14:52:50 +02:00