Sarah Hoffmann
c5bbeb626f
Merge pull request #3700 from lonvia/ignore-inherited-addresses
...
Ignore POIs with inherited addresses for the address layer
2025-04-02 12:00:45 +02:00
Sarah Hoffmann
3bc77629c8
ignore POIs with inherited addresses for the address layer
...
We know that there is a building which describes the address as a
polygon and is therefore more suitable.
2025-04-02 10:30:45 +02:00
Sarah Hoffmann
6cf1287c4e
Merge pull request #3686 from astridx/output_names
...
Output names as setting
2025-04-01 20:16:15 +02:00
Sarah Hoffmann
a49e8b9cf7
Merge pull request #3675 from TuringVerified/generic-preprocessors
...
Add generic preprocessors
2025-04-01 20:14:43 +02:00
TuringVerified
2eeec46040
Remove unnecessary assert statement, Fix regex_replace docstring and simplify regex_replace
2025-04-01 18:54:30 +05:30
TuringVerified
6d5a4a20c5
Update documentation, optimise regex_replace, add tests
2025-04-01 18:54:30 +05:30
TuringVerified
4665ea3e77
Add generic preprocessor
2025-04-01 18:54:30 +05:30
Sarah Hoffmann
fce279226f
prepare release 5.1.0
2025-04-01 10:16:35 +02:00
astridx
12ad95067d
output names as setting
2025-03-31 16:55:05 +02:00
Sarah Hoffmann
bfd1c83cb0
Merge pull request #3692 from lonvia/word-lookup-variants
...
Avoid matching penalty for abbreviated search terms
2025-03-31 16:38:31 +02:00
Sarah Hoffmann
3cb183ffb0
add lookup word to variants in word table
2025-03-31 14:52:50 +02:00
Sarah Hoffmann
1705bb5f57
do not save word counts of 1
...
This is the default setting, which will be assumed when the count is
missing.
2025-03-31 14:52:50 +02:00
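A minimal sketch of the idea behind "do not save word counts of 1": a count equal to the default is left out when storing and assumed again when reading. The dict layout and the names `pack_info` and `read_count` are hypothetical and only illustrate the principle, not the actual word table code.

```python
from typing import Dict, Optional

DEFAULT_COUNT = 1  # value assumed whenever no count is stored


def pack_info(count: int) -> Dict[str, int]:
    """Build the info to store, leaving out counts equal to the default."""
    return {} if count == DEFAULT_COUNT else {'count': count}


def read_count(info: Optional[Dict[str, int]]) -> int:
    """Return the stored count, falling back to the default when missing."""
    return (info or {}).get('count', DEFAULT_COUNT)
```

Reading back a packed value yields the original count either way, so dropping the default only saves space.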
Sarah Hoffmann
f2aa15778f
always use lookup when requested
...
Doesn't seem to cause any issues in production.
2025-03-31 11:38:21 +02:00
Sarah Hoffmann
efe65c3e49
increase allowable address counts
2025-03-31 11:38:21 +02:00
Sarah Hoffmann
51847ebfeb
more aggressively reduce expected count for multi-word terms
...
Improves searching of non-latin scripts with forced token spaces.
2025-03-31 11:18:22 +02:00
Sarah Hoffmann
d4994a152b
fix function signature for newer SQLAlchemy
2025-03-31 09:42:29 +02:00
Sarah Hoffmann
35baf77b18
make query upper-case when parsing postcodes
...
The postcode patterns expect upper-case letters.
2025-03-21 09:44:15 +01:00
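A small illustration of why the query is upper-cased before postcode matching: a pattern written with upper-case letter classes does not match lower-case input. The pattern and the function `find_postcode` below are made up for this sketch and are not taken from the actual postcode configuration.

```python
import re
from typing import Optional

# Hypothetical postcode pattern that expects upper-case letters.
POSTCODE_PATTERN = re.compile(r'[A-Z]{1,2}\d{1,2} ?\d[A-Z]{2}')


def find_postcode(query: str) -> Optional[str]:
    """Upper-case the query first, then look for a postcode in it."""
    match = POSTCODE_PATTERN.search(query.upper())
    return match.group(0) if match else None
```

Without the `upper()` call, a query such as `m1 1ae` would not match this pattern at all.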
Sarah Hoffmann
b1fc721f4b
fix layer setting for structured search
2025-03-19 17:31:43 +01:00
Sarah Hoffmann
d400fd5f76
fix debug output for lookup type
2025-03-19 17:31:18 +01:00
Sarah Hoffmann
9419c5adb2
penalize postcode searches with multiple name qualifiers
2025-03-19 10:05:36 +01:00
Sarah Hoffmann
2c61fe08a0
use word_token length when penalizing against postcodes
2025-03-19 09:52:40 +01:00
Sarah Hoffmann
7b3c725f2a
postcode token should have transliterated term in word_token
2025-03-19 09:52:40 +01:00
Sarah Hoffmann
edc5ada625
improve handling of leading postcodes
...
Setting the direction of the query while yielding assignments is
a bad idea because it may override a direction already set.
2025-03-19 09:52:40 +01:00
Sarah Hoffmann
3026c333ca
adapt typing for latest SQLAlchemy version
2025-03-13 10:49:08 +01:00
Sarah Hoffmann
f5755a7a82
remove code for setting osm2pgsql via config.lib_dir
...
With the internal osm2pgsql gone, configuration of the binary location
via settings is the only option left that makes sense.
2025-03-11 09:04:05 +01:00
Miroslav Šedivý
6ff51712fe
Simplify int/float manipulation
2025-03-06 19:26:56 +01:00
Sarah Hoffmann
6b0d58d9fd
restrict postcode parsing in typed phrases
...
Postcodes can only appear in postcode-type phrases and must then
cover the full phrase.
2025-03-05 10:09:33 +01:00
Sarah Hoffmann
434fbbfd18
add support for country prefixes in postcodes
2025-03-04 15:18:27 +01:00
Sarah Hoffmann
921db8bb2f
cache all info of ICUQueryAnalyser in a single object
2025-03-04 08:58:57 +01:00
Sarah Hoffmann
a574b98e4a
remove postcode computation for word table during import
2025-03-04 08:57:59 +01:00
Sarah Hoffmann
e67ae701ac
show token begin and end in debug output
2025-03-04 08:57:59 +01:00
Sarah Hoffmann
fc1c6261ed
add postcode parser
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
6759edfb5d
make word generation from query a class method
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
e362a965e1
search: merge QueryPart array with QueryNodes
...
The basic information on terms is pretty much always used together
with the node information. Merging them together saves some
allocation while making lookup easier at the same time.
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
13db4c9731
replace datrie library with a simpler pure-Python class
2025-02-24 10:24:21 +01:00
Sarah Hoffmann
49bd18b048
replace PhraseType enum with simple int constants
2025-02-21 16:44:12 +01:00
Sarah Hoffmann
31412e0674
replace TokenType enum with simple char constants
2025-02-21 10:23:41 +01:00
Sarah Hoffmann
4577669213
replace BreakType enum with simple char constants
2025-02-21 09:57:48 +01:00
Sarah Hoffmann
9bf1428d81
consistently use query module as qmod
2025-02-21 09:31:21 +01:00
Sarah Hoffmann
b56edf3d0a
avoid yielding when extracting words from query
2025-02-20 23:32:39 +01:00
Sarah Hoffmann
abc911079e
remove word_number counting for phrases
...
We can just examine the break types to know if we are dealing
with a partial token.
2025-02-20 17:36:50 +01:00
Sarah Hoffmann
adabfee3be
Merge pull request #3655 from lonvia/remove-name-ranking-in-postcode-search
...
Tweak penalties for postcode searches
2025-02-20 14:32:43 +01:00
Sarah Hoffmann
46c4446dc2
remove address penalty for postcode search
...
Searches of the form <postcode> <city> are in fact quite common.
2025-02-20 11:11:45 +01:00
Sarah Hoffmann
add9244a2f
do not rerank address by full match in postcode search
...
The reranking result will not be completely correct because
the address of a postcode refers to the address _and_ name
of the parent and reranking was only done against the
address. We assume here that the postcode is precise enough
as to not require a penalty due to partial matches.
2025-02-20 10:29:03 +01:00
Sarah Hoffmann
55c3176957
strip normalisation results of normal and special spaces
2025-02-19 14:40:35 +01:00
Sarah Hoffmann
6730c8bac8
add optional output of extratags to geocodejson
2025-02-16 10:16:40 +01:00
Sarah Hoffmann
ee8915f2b6
prepare 5.0.0 release
2025-02-05 10:54:38 +01:00
Sarah Hoffmann
c2cb6722fe
use autocommit when creating tables and indexes
...
Might avoid some deadlock situations with autovacuum.
2025-01-09 17:14:37 +01:00
Sarah Hoffmann
efc09a5cfc
add japanese phrase preprocessing
...
Code adapted from GSOC code by @miku.
2025-01-09 09:24:10 +01:00
Sarah Hoffmann
86ad9efa8a
keep break indicators [:-] during normalisation
...
All punctuation will be converted to '-'. Soft breaks ':' may be
added by preprocessors. The break signs are only used during
query analysis and are ignored during import token analysis.
2025-01-09 09:21:55 +01:00
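A rough sketch of the break indicator handling described in the commit above, assuming a plain Python pass over the string: ':' is kept as a soft break and every other punctuation character becomes '-'. The function `normalize_breaks` is hypothetical, and the use of Unicode categories is an assumption of this sketch; the commit itself does not say how the conversion is implemented.

```python
import unicodedata


def normalize_breaks(text: str) -> str:
    """Keep ':' as is and turn all other punctuation into '-'."""
    out = []
    for ch in text:
        if ch == ':':
            out.append(ch)    # soft break, possibly added by a preprocessor
        elif unicodedata.category(ch).startswith('P'):
            out.append('-')   # any other punctuation becomes '-'
        else:
            out.append(ch)    # letters, digits and spaces stay unchanged
    return ''.join(out)
```

For example, `normalize_breaks('Main St., Berlin')` gives `'Main St-- Berlin'`, while a string containing only a ':' between terms is returned unchanged.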