Commit Graph

167 Commits

Author SHA1 Message Date
Sarah Hoffmann
1705bb5f57 do not save word counts of 1
This is the default setting, which will be assumed when the count is
missing.
2025-03-31 14:52:50 +02:00
Sarah Hoffmann
f2aa15778f always use lookup when requested
Doesn't seem to cause any issues in production.
2025-03-31 11:38:21 +02:00
Sarah Hoffmann
efe65c3e49 increase allowable address counts 2025-03-31 11:38:21 +02:00
Sarah Hoffmann
51847ebfeb more aggressively reduce expected count for multi-word terms
Improves searching of non-Latin scripts with forced token spaces.
2025-03-31 11:18:22 +02:00
Sarah Hoffmann
d4994a152b fix function signature for newer SQLAlchemy 2025-03-31 09:42:29 +02:00
Sarah Hoffmann
35baf77b18 make query upper-case when parsing postcodes
The postcode patterns expect upper-case letters.
2025-03-21 09:44:15 +01:00
Sarah Hoffmann
b1fc721f4b fix layer setting for structured search 2025-03-19 17:31:43 +01:00
Sarah Hoffmann
d400fd5f76 fix debug output for lookup type 2025-03-19 17:31:18 +01:00
Sarah Hoffmann
9419c5adb2 penalize postcode searches with multiple name qualifiers 2025-03-19 10:05:36 +01:00
Sarah Hoffmann
2c61fe08a0 use word_token length when penalizing against postcodes 2025-03-19 09:52:40 +01:00
Sarah Hoffmann
7b3c725f2a postcode token should have transliterated term in word_token 2025-03-19 09:52:40 +01:00
Sarah Hoffmann
edc5ada625 improve handling of leading postcodes
Setting the direction of the query while yielding assignments is
a bad idea because it may override a direction already set.
2025-03-19 09:52:40 +01:00
Sarah Hoffmann
3026c333ca adapt typing for latest SQLAlchemy version 2025-03-13 10:49:08 +01:00
Sarah Hoffmann
f5755a7a82 remove code for setting osm2pgsql via config.lib_dir
With the internal osm2pgsql gone, configuration of the binary location
via settings is the only option left that makes sense.
2025-03-11 09:04:05 +01:00
Miroslav Šedivý
6ff51712fe Simplify int/float manipulation 2025-03-06 19:26:56 +01:00
Sarah Hoffmann
6b0d58d9fd restrict postcode parsing in typed phrases
Postcodes can only appear in postcode-type phrases and must then
cover the full phrase
2025-03-05 10:09:33 +01:00
Sarah Hoffmann
434fbbfd18 add support for country prefixes in postcodes 2025-03-04 15:18:27 +01:00
Sarah Hoffmann
921db8bb2f cache all info of ICUQueryAnalyser in a single object 2025-03-04 08:58:57 +01:00
Sarah Hoffmann
a574b98e4a remove postcode computation for word table during import 2025-03-04 08:57:59 +01:00
Sarah Hoffmann
e67ae701ac show token begin and end in debug output 2025-03-04 08:57:59 +01:00
Sarah Hoffmann
fc1c6261ed add postcode parser 2025-03-04 08:57:37 +01:00
Sarah Hoffmann
6759edfb5d make word generation from query a class method 2025-03-04 08:57:37 +01:00
Sarah Hoffmann
e362a965e1 search: merge QueryPart array with QueryNodes
The basic information on terms is pretty much always used together
with the node information. Merging them together saves some
allocation while making lookup easier at the same time.
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
13db4c9731 replace datrie library with a simpler pure-Python class 2025-02-24 10:24:21 +01:00
Sarah Hoffmann
49bd18b048 replace PhraseType enum with simple int constants 2025-02-21 16:44:12 +01:00
Sarah Hoffmann
31412e0674 replace TokenType enum with simple char constants 2025-02-21 10:23:41 +01:00
Sarah Hoffmann
4577669213 replace BreakType enum with simple char constants 2025-02-21 09:57:48 +01:00
Sarah Hoffmann
9bf1428d81 consistently use query module as qmod 2025-02-21 09:31:21 +01:00
Sarah Hoffmann
b56edf3d0a avoid yielding when extracting words from query 2025-02-20 23:32:39 +01:00
Sarah Hoffmann
abc911079e remove word_number counting for phrases
We can just examine the break types to know if we are dealing
with a partial token.
2025-02-20 17:36:50 +01:00
Sarah Hoffmann
adabfee3be Merge pull request #3655 from lonvia/remove-name-ranking-in-postcode-search
Tweak penalties for postcode searches
2025-02-20 14:32:43 +01:00
Sarah Hoffmann
46c4446dc2 remove address penalty for postcode search
Searches of the form <postcode> <city> are in fact quite common.
2025-02-20 11:11:45 +01:00
Sarah Hoffmann
add9244a2f do not rerank address by full match in postcode search
The reranking result will not be completely correct because
the address of a postcode refers to the address _and_ name
of the parent and reranking was only done against the
address. We assume here that the postcode is precise enough
as to not require a penalty for partial matches.
2025-02-20 10:29:03 +01:00
Sarah Hoffmann
55c3176957 strip normalisation results of normal and special spaces 2025-02-19 14:40:35 +01:00
Sarah Hoffmann
6730c8bac8 add optional output of extratags to geocodejson 2025-02-16 10:16:40 +01:00
Sarah Hoffmann
ee8915f2b6 prepare 5.0.0 release 2025-02-05 10:54:38 +01:00
Sarah Hoffmann
c2cb6722fe use autocommit when creating tables and indexes
Might avoid some deadlock situations with autovacuum.
2025-01-09 17:14:37 +01:00
Sarah Hoffmann
efc09a5cfc add Japanese phrase preprocessing
Code adapted from GSOC code by @miku.
2025-01-09 09:24:10 +01:00
Sarah Hoffmann
86ad9efa8a keep break indicators [:-] during normalisation
All punctuation will be converted to '-'. Soft breaks ':' may be
added by preprocessors. The break signs are only used during
query analysis and are ignored during import token analysis.
2025-01-09 09:21:55 +01:00
Sarah Hoffmann
d984100e23 add inner word break penalty 2025-01-07 21:42:25 +01:00
Sarah Hoffmann
499110f549 add SOFT_PHRASE break and enable parsing
Also enables parsing of PART breaks.
2025-01-06 17:10:24 +01:00
Sarah Hoffmann
eeb3d5dd0a make nominatim callable with themepark style 2024-12-16 10:26:55 +01:00
Sarah Hoffmann
4760e8341b move lua scripts into a separate directory 2024-12-16 10:26:55 +01:00
Sarah Hoffmann
fbb6edfdaf add documentation for new query preprocessing 2024-12-13 16:53:08 +01:00
Sarah Hoffmann
2b87c016db generalize normalization step for search query
It is now possible to configure functions for changing the query
input before it is analysed by the tokenizer.

Code is a cleaned-up version of the implementation by @miku.
2024-12-13 14:31:08 +01:00
Sarah Hoffmann
d9b4d1591d ignore postcode areas on reverse
Postcode lookups are best done by doing reverse at a higher
level and then extracting the postcode.
2024-12-12 19:02:00 +01:00
Sarah Hoffmann
416e70b97e have reverse fall back to country table when no country is found 2024-12-12 17:14:02 +01:00
Sarah Hoffmann
0770eaa5d0 use bbox size for secondary order of results
Helps to return the largest object when deduplicating results.
2024-11-19 10:38:50 +01:00
Sarah Hoffmann
98c1b923fc remove code only needed for older PostgreSQL/PostGIS versions 2024-11-18 10:11:09 +01:00
Sarah Hoffmann
fd1f2bc719 increase minimum versions for PostgreSQL and PostGIS 2024-11-18 09:28:06 +01:00