Miroslav Šedivý
cd64788a58
Replace custom Almost with stdlib math.isclose
2025-03-05 20:35:01 +01:00
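As an illustration of the replacement named in cd64788a58, here is a minimal sketch of approximate comparison with the standard library's `math.isclose`. The helper name `coords_match` and the tolerance value are assumptions chosen for this example and are not taken from the project's test code.

```python
import math


def coords_match(actual, expected, tol=1e-5):
    """Return True if every pair of values agrees within an absolute tolerance.

    Hypothetical helper: the name and default tolerance are illustrative only.
    """
    return len(actual) == len(expected) and all(
        math.isclose(a, e, abs_tol=tol) for a, e in zip(actual, expected)
    )


# Floating-point rounding makes exact equality unreliable.
print(0.1 + 0.2 == 0.3)
print(coords_match((0.1 + 0.2, 1.0), (0.3, 1.0)))
```

Passing `abs_tol` explicitly matters when values near zero are compared, because the default relative tolerance alone never accepts a non-zero value as close to zero.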
Sarah Hoffmann
1b44fe2555
Merge pull request #3665 from lonvia/pattern-matching-postcodes
...
Add full parsing of postcodes in query
2025-03-05 16:02:03 +01:00
Sarah Hoffmann
6b0d58d9fd
restrict postcode parsing in typed phrases
...
Postcodes can only appear in postcode-type phrases and must then
cover the full phrase
2025-03-05 10:09:33 +01:00
Sarah Hoffmann
afb89f9c7a
add unit tests for postcode parser
2025-03-04 16:25:00 +01:00
Sarah Hoffmann
6712627d5e
adapt BDD tests to new postcode handling
2025-03-04 15:18:46 +01:00
Sarah Hoffmann
434fbbfd18
add support for country prefixes in postcodes
2025-03-04 15:18:27 +01:00
Sarah Hoffmann
921db8bb2f
cache all info of ICUQueryAnalyser in a single object
2025-03-04 08:58:57 +01:00
Sarah Hoffmann
a574b98e4a
remove postcode computation for word table during import
2025-03-04 08:57:59 +01:00
Sarah Hoffmann
b2af358f66
reenable ZIP+ test
2025-03-04 08:57:59 +01:00
Sarah Hoffmann
e67ae701ac
show token begin and end in debug output
2025-03-04 08:57:59 +01:00
Sarah Hoffmann
fc1c6261ed
add postcode parser
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
6759edfb5d
make word generation from query a class method
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
e362a965e1
search: merge QueryPart array with QueryNodes
...
The basic information on terms is pretty much always used together
with the node information. Merging them together saves some
allocation while making lookup easier at the same time.
2025-03-04 08:57:37 +01:00
Sarah Hoffmann
eff60ba6be
enable parsing of US ZIP+ codes
...
The four-digit part of these postcodes will simply be ignored.
2025-02-25 20:29:06 +01:00
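The behaviour described in eff60ba6be (accept a ZIP+4 code, ignore the four-digit extension) can be sketched as follows. This is an assumption-laden illustration, not the project's parser: the pattern `ZIP_PLUS` and the function `normalize_us_zip` are hypothetical names, and the sketch only accepts a hyphen as the separator.

```python
import re

# Hypothetical pattern: five digits, optionally followed by "-" and four digits.
ZIP_PLUS = re.compile(r"(\d{5})(?:-\d{4})?")


def normalize_us_zip(text):
    """Return the five-digit part of a US ZIP or ZIP+4 code, or None."""
    match = ZIP_PLUS.fullmatch(text.strip())
    return match.group(1) if match else None


print(normalize_us_zip("12345-6789"))
print(normalize_us_zip("12345"))
print(normalize_us_zip("1234"))
```

Using `fullmatch` keeps the sketch strict: a string with trailing characters after the code is rejected instead of being silently truncated.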
Sarah Hoffmann
157414a053
Merge pull request #3659 from lonvia/custom-datrie-structure
...
Replace datrie library with a simple custom Python implementation
2025-02-24 16:49:42 +01:00
Sarah Hoffmann
18d4996bec
remove datrie dependency
2025-02-24 10:24:21 +01:00
Sarah Hoffmann
13db4c9731
replace datrie library with a simpler pure-Python class
2025-02-24 10:24:21 +01:00
Sarah Hoffmann
f567ea89cc
Merge pull request #3658 from lonvia/minor-query-parsing-optimisations
...
Minor query parsing optimisations
2025-02-24 10:16:47 +01:00
Sarah Hoffmann
3e718e40d9
adapt documentation for PhraseType type
2025-02-21 17:16:42 +01:00
Sarah Hoffmann
49bd18b048
replace PhraseType enum with simple int constants
2025-02-21 16:44:12 +01:00
Sarah Hoffmann
31412e0674
replace TokenType enum with simple char constants
2025-02-21 10:23:41 +01:00
Sarah Hoffmann
4577669213
replace BreakType enum with simple char constants
2025-02-21 09:57:48 +01:00
Sarah Hoffmann
9bf1428d81
consistently use query module as qmod
2025-02-21 09:31:21 +01:00
Sarah Hoffmann
b56edf3d0a
avoid yielding when extracting words from query
2025-02-20 23:32:39 +01:00
Sarah Hoffmann
abc911079e
remove word_number counting for phrases
...
We can just examine the break types to know if we are dealing
with a partial token.
2025-02-20 17:36:50 +01:00
Sarah Hoffmann
adabfee3be
Merge pull request #3655 from lonvia/remove-name-ranking-in-postcode-search
...
Tweak penalties for postcode searches
2025-02-20 14:32:43 +01:00
Sarah Hoffmann
46c4446dc2
remove address penalty for postcode search
...
Searches of the form <postcode> <city> are in fact quite common.
2025-02-20 11:11:45 +01:00
Sarah Hoffmann
add9244a2f
do not rerank address by full match in postcode search
...
The reranking result will not be completely correct because
the address of a postcode refers to the address _and_ name
of the parent and reranking was only done against the
address. We assume here that the postcode is precise enough
as to not require a penalty due to partial matches.
2025-02-20 10:29:03 +01:00
Sarah Hoffmann
96d7a8e8f6
Merge pull request #3653 from lonvia/trailing-spaces-in-normalization
...
Strip leading and trailing space markers during normalization
2025-02-19 17:25:59 +01:00
Sarah Hoffmann
55c3176957
strip normalisation results of normal and special spaces
2025-02-19 14:40:35 +01:00
Sarah Hoffmann
e29823e28f
add test for structured query with leading spaces
2025-02-19 10:31:36 +01:00
Sarah Hoffmann
97ed168996
Merge pull request #3652 from lonvia/update-variants
...
Cleanup and updates of tokenizer variant configuration
2025-02-18 19:47:45 +01:00
Sarah Hoffmann
9b8ef97d4b
Merge pull request #3649 from lonvia/actions-move-to-ubuntu22
...
Move GitHub actions to Ubuntu-22 image
2025-02-18 13:21:09 +01:00
Sarah Hoffmann
4f3c88f0c1
remove e-ë mutation, this is taken care of by transliteration
2025-02-18 10:31:44 +01:00
mhsr21
7781186f3c
Add USPS Standard Suffix Abbreviation
2025-02-18 09:28:13 +01:00
Sarah Hoffmann
f78686edb8
fix Norwegian variants
...
More cases of 'no' being interpreted as false by YAML.
2025-02-18 09:28:13 +01:00
Sarah Hoffmann
e330cd3162
remove ineffective and duplicate variants
2025-02-18 09:28:13 +01:00
Sarah Hoffmann
671af4cff2
Merge pull request #3555 from IvanShift/patch-1
...
Fixed Russian abbreviation list
2025-02-17 18:44:11 +01:00
Sarah Hoffmann
e612b7d550
actions: use Debian's script for adding the Postgres apt repo
2025-02-17 17:56:23 +01:00
Sarah Hoffmann
0b49d01703
actions: move tests to Ubuntu-20
2025-02-17 17:54:49 +01:00
Sarah Hoffmann
f6bc8e153f
Merge pull request #3648 from lonvia/extratags-for-geocodejson
...
Enable output of extratags for geocodejson format
2025-02-17 11:14:52 +01:00
Sarah Hoffmann
f143ecaf1c
add documentation for new extra field
2025-02-17 10:04:23 +01:00
Sarah Hoffmann
6730c8bac8
add optional output of extratags to geocodejson
2025-02-16 10:16:40 +01:00
Sarah Hoffmann
ee8915f2b6
prepare 5.0.0 release
v5.0.0
2025-02-05 10:54:38 +01:00
Sarah Hoffmann
5475bf7b9c
Merge pull request #3635 from lonvia/replace-wikimedia-importance-test-data
...
Update wikimedia importance file for test database
2025-01-14 16:49:52 +01:00
Sarah Hoffmann
95e2d8c846
adapt tests to changed wikimedia importance test table
2025-01-14 14:19:17 +01:00
Sarah Hoffmann
7552818866
replace wikimedia importance file for test data with CSV version
2025-01-14 09:16:25 +01:00
Sarah Hoffmann
db3991af74
Merge pull request #3626 from lonvia/import-performance
...
Import performance
2025-01-10 16:44:33 +01:00
Sarah Hoffmann
4523b9aaed
Merge pull request #3631 from lonvia/avoid-transactions
...
Creating tables and indexes in autocommit mode
2025-01-10 16:44:18 +01:00
Sarah Hoffmann
8b1cabebd6
Merge pull request #3633 from lonvia/restrict-long-ways
...
Ignore overly long ways during import
2025-01-10 16:06:37 +01:00