Sarah Hoffmann
b2af358f66
reenable ZIP+ test
2025-03-04 08:57:59 +01:00
Sarah Hoffmann
e67ae701ac
show token begin and end in debug output
2025-03-04 08:57:59 +01:00
Sarah Hoffmann
fc1c6261ed
add postcode parser
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
6759edfb5d
make word generation from query a class method
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
e362a965e1
search: merge QueryPart array with QueryNodes
...
The basic information on terms is pretty much always used together
with the node information. Merging them together saves some
allocation while making lookup easier at the same time.
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
eff60ba6be
enable parsing of US ZIP+ codes
...
The four-digit part of these postcodes will simply be ignored.
2025-02-25 20:29:06 +01:00
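A rough sketch of the ZIP+ handling described in eff60ba6be, where the four-digit extension is accepted but dropped. The pattern and the function name `base_zip` are illustrative assumptions, not the project's actual parser code.

```python
import re

# Assumed pattern: five digits, optionally followed by a dash and a
# four-digit extension. The extension is matched but not captured.
ZIP_PLUS = re.compile(r'(\d{5})(?:-\d{4})?')


def base_zip(postcode: str):
    """Return the five-digit part of a US ZIP or ZIP+4 code.

    Returns None when the input does not look like a US ZIP code.
    (Hypothetical helper for illustration only.)
    """
    match = ZIP_PLUS.fullmatch(postcode.strip())
    return match.group(1) if match else None
```

With this sketch, `base_zip("12345-6789")` and `base_zip("12345")` both yield the five-digit code, so a ZIP+ query falls back to the plain postcode.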
Sarah Hoffmann
157414a053
Merge pull request #3659 from lonvia/custom-datrie-structure
...
Replace datrie library with a simple custom Python implementation
2025-02-24 16:49:42 +01:00
Sarah Hoffmann
18d4996bec
remove datrie dependency
2025-02-24 10:24:21 +01:00
Sarah Hoffmann
13db4c9731
replace datrie library with a simpler pure-Python class
2025-02-24 10:24:21 +01:00
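To illustrate what a pure-Python replacement for a trie library (13db4c9731) might look like, here is a minimal sketch built on nested dictionaries. The class name `SimpleTrie` and its methods are assumptions made for this example. They do not reproduce the class actually added in the commit.

```python
class SimpleTrie:
    """Minimal trie built from nested dicts (illustrative only)."""

    def __init__(self):
        self.root = {}

    def insert(self, key, value):
        """Store value under key, creating child nodes as needed."""
        node = self.root
        for char in key:
            node = node.setdefault(char, {})
        # The empty string cannot collide with a one-character edge,
        # so it is used here to mark the end of a stored key.
        node[''] = value

    def longest_prefix(self, text):
        """Return (value, length) for the longest stored key that is a
        prefix of text, or (None, 0) if no stored key matches."""
        node = self.root
        best = (None, 0)
        for pos, char in enumerate(text):
            node = node.get(char)
            if node is None:
                break
            if '' in node:
                best = (node[''], pos + 1)
        return best
```

A plain dict-of-dicts trades memory compactness for having no compiled dependency, which is presumably the motivation behind dropping an external library.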
Sarah Hoffmann
f567ea89cc
Merge pull request #3658 from lonvia/minor-query-parsing-optimisations
...
Minor query parsing optimisations
2025-02-24 10:16:47 +01:00
Sarah Hoffmann
3e718e40d9
adapt documentation for PhraseType type
2025-02-21 17:16:42 +01:00
Sarah Hoffmann
49bd18b048
replace PhraseType enum with simple int constants
2025-02-21 16:44:12 +01:00
Sarah Hoffmann
31412e0674
replace TokenType enum with simple char constants
2025-02-21 10:23:41 +01:00
Sarah Hoffmann
4577669213
replace BreakType enum with simple char constants
2025-02-21 09:57:48 +01:00
Sarah Hoffmann
9bf1428d81
consistently use query module as qmod
2025-02-21 09:31:21 +01:00
Sarah Hoffmann
b56edf3d0a
avoid yielding when extracting words from query
2025-02-20 23:32:39 +01:00
Sarah Hoffmann
abc911079e
remove word_number counting for phrases
...
We can just examine the break types to know if we are dealing
with a partial token.
2025-02-20 17:36:50 +01:00
Sarah Hoffmann
adabfee3be
Merge pull request #3655 from lonvia/remove-name-ranking-in-postcode-search
...
Tweak penalties for postcode searches
2025-02-20 14:32:43 +01:00
Sarah Hoffmann
46c4446dc2
remove address penalty for postcode search
...
Searches of the form <postcode> <city> are in fact quite common.
2025-02-20 11:11:45 +01:00
Sarah Hoffmann
add9244a2f
do not rerank address by full match in postcode search
...
The reranking result will not be completely correct because
the address of a postcode refers to the address _and_ name
of the parent and reranking was only done against the
address. We assume here that the postcode is precise enough
as to not require a penalty for partial matches.
2025-02-20 10:29:03 +01:00
Sarah Hoffmann
96d7a8e8f6
Merge pull request #3653 from lonvia/trailing-spaces-in-normalization
...
Strip leading and trailing space markers during normalization
2025-02-19 17:25:59 +01:00
Sarah Hoffmann
55c3176957
strip normalisation results of normal and special spaces
2025-02-19 14:40:35 +01:00
Sarah Hoffmann
e29823e28f
add test for structured query with leading spaces
2025-02-19 10:31:36 +01:00
Sarah Hoffmann
97ed168996
Merge pull request #3652 from lonvia/update-variants
...
Cleanup and updates of tokenizer variant configuration
2025-02-18 19:47:45 +01:00
Sarah Hoffmann
9b8ef97d4b
Merge pull request #3649 from lonvia/actions-move-to-ubuntu22
...
Move GitHub actions to Ubuntu-22 image
2025-02-18 13:21:09 +01:00
Sarah Hoffmann
4f3c88f0c1
remove e-ë mutation, this is taken care of by transliteration
2025-02-18 10:31:44 +01:00
mhsr21
7781186f3c
Add USPS Standard Suffix Abbreviation
2025-02-18 09:28:13 +01:00
Sarah Hoffmann
f78686edb8
fix Norwegian variants
...
More cases of 'no' being interpreted as false by YAML.
2025-02-18 09:28:13 +01:00
Sarah Hoffmann
e330cd3162
remove ineffective and duplicate variants
2025-02-18 09:28:13 +01:00
Sarah Hoffmann
671af4cff2
Merge pull request #3555 from IvanShift/patch-1
...
Fixed Russian abbreviation list
2025-02-17 18:44:11 +01:00
Sarah Hoffmann
e612b7d550
actions: use Debian's script for adding the Postgres apt repo
2025-02-17 17:56:23 +01:00
Sarah Hoffmann
0b49d01703
actions: move tests to Ubuntu-20
2025-02-17 17:54:49 +01:00
Sarah Hoffmann
f6bc8e153f
Merge pull request #3648 from lonvia/extratags-for-geocodejson
...
Enable output of extratags for geocodejson format
2025-02-17 11:14:52 +01:00
Sarah Hoffmann
f143ecaf1c
add documentation for new extra field
2025-02-17 10:04:23 +01:00
Sarah Hoffmann
6730c8bac8
add optional output of extratags to geocodejson
2025-02-16 10:16:40 +01:00
Sarah Hoffmann
ee8915f2b6
prepare 5.0.0 release
v5.0.0
2025-02-05 10:54:38 +01:00
Sarah Hoffmann
5475bf7b9c
Merge pull request #3635 from lonvia/replace-wikimedia-importance-test-data
...
Update wikimedia importance file for test database
2025-01-14 16:49:52 +01:00
Sarah Hoffmann
95e2d8c846
adapt tests to changed wikimedia importance test table
2025-01-14 14:19:17 +01:00
Sarah Hoffmann
7552818866
replace wikimedia importance file for test data with CSV version
2025-01-14 09:16:25 +01:00
Sarah Hoffmann
db3991af74
Merge pull request #3626 from lonvia/import-performance
...
Import performance
2025-01-10 16:44:33 +01:00
Sarah Hoffmann
4523b9aaed
Merge pull request #3631 from lonvia/avoid-transactions
...
Creating tables and indexes in autocommit mode
2025-01-10 16:44:18 +01:00
Sarah Hoffmann
8b1cabebd6
Merge pull request #3633 from lonvia/restrict-long-ways
...
Ignore overly long ways during import
2025-01-10 16:06:37 +01:00
Sarah Hoffmann
0cf636a80c
ignore overly long ways during import
2025-01-10 13:55:43 +01:00
Sarah Hoffmann
c2cb6722fe
use autocommit when creating tables and indexes
...
Might avoid some deadlock situations with autovacuum.
2025-01-09 17:14:37 +01:00
Sarah Hoffmann
f8337bedb2
Merge pull request #3629 from lonvia/additional-breaks
...
Introduce new break types and phrase splitting for Japanese addresses
2025-01-09 13:55:29 +01:00
Sarah Hoffmann
efc09a5cfc
add Japanese phrase preprocessing
...
Code adapted from GSOC code by @miku.
2025-01-09 09:24:10 +01:00
Sarah Hoffmann
86ad9efa8a
keep break indicators [:-] during normalisation
...
All punctuation will be converted to '-'. Soft breaks (':') may be
added by preprocessors. The break signs are only used during
query analysis and are ignored during import token analysis.
2025-01-09 09:21:55 +01:00
Sarah Hoffmann
d984100e23
add inner word break penalty
2025-01-07 21:42:25 +01:00
Sarah Hoffmann
499110f549
add SOFT_PHRASE break and enable parsing
...
Also enables parsing of PART breaks.
2025-01-06 17:10:24 +01:00
Sarah Hoffmann
267e5dac0d
split up MultiPolygons before adding them to large_areas table
2024-12-22 09:15:16 +01:00