Commit Graph

4984 Commits

Author SHA1 Message Date
Sarah Hoffmann
49bd18b048 replace PhraseType enum with simple int constants 2025-02-21 16:44:12 +01:00
Sarah Hoffmann
31412e0674 replace TokenType enum with simple char constants 2025-02-21 10:23:41 +01:00
Sarah Hoffmann
4577669213 replace BreakType enum with simple char constants 2025-02-21 09:57:48 +01:00
Sarah Hoffmann
9bf1428d81 consistently use query module as qmod 2025-02-21 09:31:21 +01:00
Sarah Hoffmann
b56edf3d0a avoid yielding when extracting words from query 2025-02-20 23:32:39 +01:00
Sarah Hoffmann
abc911079e remove word_number counting for phrases
We can just examine the break types to know if we are dealing
with a partial token.
2025-02-20 17:36:50 +01:00
Sarah Hoffmann
adabfee3be Merge pull request #3655 from lonvia/remove-name-ranking-in-postcode-search
Tweak penalties for postcode searches
2025-02-20 14:32:43 +01:00
Sarah Hoffmann
46c4446dc2 remove address penalty for postcode search
Searches of the form <postcode> <city> are in fact quite common.
2025-02-20 11:11:45 +01:00
Sarah Hoffmann
add9244a2f do not rerank address by full match in postcode search
The reranking result will not be completely correct because
the address of a postcode refer to the address _and_ name
of the parent and reranking was only done against the
address. We assume here that the postcode is precise enough
as to not require a penalty to to partial matches.
2025-02-20 10:29:03 +01:00
Sarah Hoffmann
96d7a8e8f6 Merge pull request #3653 from lonvia/trailing-spaces-in-normalization
Strip leading and trailing space markers during normalization
2025-02-19 17:25:59 +01:00
Sarah Hoffmann
55c3176957 strip normalisation results of normal and special spaces 2025-02-19 14:40:35 +01:00
Sarah Hoffmann
e29823e28f add test for structured query with leading spaces 2025-02-19 10:31:36 +01:00
Sarah Hoffmann
97ed168996 Merge pull request #3652 from lonvia/update-variants
Cleanup and updates of tokenizer variant configuration
2025-02-18 19:47:45 +01:00
Sarah Hoffmann
9b8ef97d4b Merge pull request #3649 from lonvia/actions-move-to-ubuntu22
Move Github actions to Unbuntu-22 image
2025-02-18 13:21:09 +01:00
Sarah Hoffmann
4f3c88f0c1 remove e-ë mutation, this is taken care of by transliteration 2025-02-18 10:31:44 +01:00
mhsr21
7781186f3c Add USPS Standard Suffix Abbreviation 2025-02-18 09:28:13 +01:00
Sarah Hoffmann
f78686edb8 fix Norwegian variants
More cases of 'no' being interpreted as fasle by yaml.
2025-02-18 09:28:13 +01:00
Sarah Hoffmann
e330cd3162 remove ineffective and dupicate variants 2025-02-18 09:28:13 +01:00
Sarah Hoffmann
671af4cff2 Merge pull request #3555 from IvanShift/patch-1
Fixed Russian abbreviation list
2025-02-17 18:44:11 +01:00
Sarah Hoffmann
e612b7d550 actions: use Debians's script for adding the Postgres apt repo 2025-02-17 17:56:23 +01:00
Sarah Hoffmann
0b49d01703 actions: move tests to Ubuntu-20 2025-02-17 17:54:49 +01:00
Sarah Hoffmann
f6bc8e153f Merge pull request #3648 from lonvia/extratags-for-geocodejson
Enable output of extratags for geocodejson format
2025-02-17 11:14:52 +01:00
Sarah Hoffmann
f143ecaf1c add documentation for new extra field 2025-02-17 10:04:23 +01:00
Sarah Hoffmann
6730c8bac8 add optional output of extratags to geocodejson 2025-02-16 10:16:40 +01:00
Sarah Hoffmann
ee8915f2b6 prepare 5.0.0 release v5.0.0 2025-02-05 10:54:38 +01:00
Sarah Hoffmann
5475bf7b9c Merge pull request #3635 from lonvia/replace-wikimedia-importance-test-data
Update wikimedia importance file for test database
2025-01-14 16:49:52 +01:00
Sarah Hoffmann
95e2d8c846 adapt tests to changed wikimedia importance test table 2025-01-14 14:19:17 +01:00
Sarah Hoffmann
7552818866 replace wikimedia importance file for test data with CSV version 2025-01-14 09:16:25 +01:00
Sarah Hoffmann
db3991af74 Merge pull request #3626 from lonvia/import-performance
Import performance
2025-01-10 16:44:33 +01:00
Sarah Hoffmann
4523b9aaed Merge pull request #3631 from lonvia/avoid-transactions
Creating tables and indexes in autocommit mode
2025-01-10 16:44:18 +01:00
Sarah Hoffmann
8b1cabebd6 Merge pull request #3633 from lonvia/restrict-long-ways
Ignore overly long ways during import
2025-01-10 16:06:37 +01:00
Sarah Hoffmann
0cf636a80c ignore overly long ways during import 2025-01-10 13:55:43 +01:00
Sarah Hoffmann
c2cb6722fe use autocommit when creating tables and indexes
Might avoid some deadlock situations with autovacuum.
2025-01-09 17:14:37 +01:00
Sarah Hoffmann
f8337bedb2 Merge pull request #3629 from lonvia/additional-breaks
Introduce new break types and phrase splitting for Japanese addresses
2025-01-09 13:55:29 +01:00
Sarah Hoffmann
efc09a5cfc add japanese phrase preprocessing
Code adapted from GSOC code by @miku.
2025-01-09 09:24:10 +01:00
Sarah Hoffmann
86ad9efa8a keep break indicators [:-] during normalisation
All punctuation will be converted to '-'. Soft breaks : may be
added by preprocessors. The break signs are only used during
query analysis and are ignored during import token analysis.
2025-01-09 09:21:55 +01:00
Sarah Hoffmann
d984100e23 add inner word break penalty 2025-01-07 21:42:25 +01:00
Sarah Hoffmann
499110f549 add SOFT_PHRASE break and enable parsing
Also enables parsing of PART breaks.
2025-01-06 17:10:24 +01:00
Sarah Hoffmann
267e5dac0d split up MultiPolygons before adding them to large_areas table 2024-12-22 09:15:16 +01:00
Sarah Hoffmann
32d3eb46d5 move geometry split into insertLocationAreaLarge()
thus insert only needs to be called once.
2024-12-22 09:15:16 +01:00
Sarah Hoffmann
c8a0dc8af1 more efficient belongs-to-address determination 2024-12-22 09:15:16 +01:00
Sarah Hoffmann
14ecfc7834 Merge pull request #3619 from lonvia/demote-farms
Remove farms and isolated dwellings from computed addresses
2024-12-22 09:13:42 +01:00
Sarah Hoffmann
cad44eb00c remove farms and isolated dwellings from computed addresses
Farms and isolated dwellings are usually confined to a very small
area. It does not make sense if they are automatically used in
addressing surrounding features. Still works to use them for
parenting when used with addr:place.
2024-12-20 22:59:02 +01:00
Sarah Hoffmann
f76dbb0a16 docs: update Update docs for virtualenv use 2024-12-20 11:27:45 +01:00
Sarah Hoffmann
8dd218a1d0 Merge pull request #3618 from osm-search/settings-md-table-space-osm-index
Settings.md - one setting was repeated
2024-12-19 08:40:31 +01:00
mtmail
501e13483e Settings.md - one setting was repeated 2024-12-18 21:58:51 +01:00
Sarah Hoffmann
b1d25e404f Merge pull request #3617 from mtmail/pr-3615-wording
Slight wording changes for Import-Styles.md
2024-12-18 11:04:21 +01:00
marc tobias
71fceb6854 Slight wording changes for Import-Styles.md 2024-12-18 01:02:46 +01:00
Sarah Hoffmann
a06e123d70 Merge pull request #3616 from osm-search/tokenizers-md-typo
fix typo in Tokenizers.md
2024-12-17 08:43:16 +01:00
mtmail
df6f70d223 fix typo in Tokenizers.md 2024-12-16 23:38:18 +01:00