Sarah Hoffmann
a08ef43e40
simplify if statements
2021-07-12 11:28:47 +02:00
Sarah Hoffmann
bc5e15996a
convert single case switch to if statement
2021-07-12 11:28:47 +02:00
Sarah Hoffmann
128ca800cd
avoid local variable assignment
2021-07-11 23:22:53 +02:00
Sarah Hoffmann
000d133af6
fix more missing braces on one-liners
2021-07-11 23:22:53 +02:00
Sarah Hoffmann
1e40d65aa9
remove dead code
2021-07-11 23:22:53 +02:00
Sarah Hoffmann
bffbe68ec3
do not intermix params with and without default
2021-07-11 23:22:53 +02:00
Sarah Hoffmann
58b10074ad
directly return data in function
...
The temporary variable is not necessary.
2021-07-11 19:24:04 +02:00
Sarah Hoffmann
d933ead2b5
remove unnecessarily nested ifs
...
Found by Sonarqube.
2021-07-11 19:11:37 +02:00
Sarah Hoffmann
1cdc30c5e8
remove unused functions
...
The functions were needed while the code was being transitioned
to Python and are no longer used.
2021-07-11 19:10:04 +02:00
Sarah Hoffmann
3661f7a321
avoid multiple returns of same value
...
Found by Sonarqube.
2021-07-11 18:23:42 +02:00
Sarah Hoffmann
27af9b102c
always use brackets on if statements
...
This adds brackets around all one-line if statements that did
not have them yet.
2021-07-10 17:04:46 +02:00
Sarah Hoffmann
500c61685b
remove unused variables
...
As reported by sonarqube.
2021-07-09 16:36:42 +02:00
Sarah Hoffmann
106d960f84
fix bad use of echo in PHP output
2021-07-09 12:50:35 +02:00
Sarah Hoffmann
a5970d7548
Merge pull request #2384 from lonvia/actions-add-icu-tokenizer
...
CI: run tests on Ubuntu 18
2021-07-07 14:39:53 +02:00
Sarah Hoffmann
c216144dd1
add missing pyyaml requirement
2021-07-07 11:29:33 +02:00
Sarah Hoffmann
42e08da7ca
enable PHP 7.2 for Ubuntu 18 CI
2021-07-07 11:29:33 +02:00
Sarah Hoffmann
a2edbbf78a
cannot use capture_output in subprocess.run
...
Only available since Python 3.7.
2021-07-06 22:57:42 +02:00
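A minimal sketch of what the commit above refers to, assuming the goal is to capture a child process's output on Python 3.6. The `capture_output` argument of `subprocess.run` arrived in Python 3.7; passing `subprocess.PIPE` for both streams is the older equivalent. The helper name `run_captured` is hypothetical and not taken from the project.

```python
import subprocess
import sys


def run_captured(cmd):
    """Run cmd and capture stdout/stderr without using capture_output.

    Hypothetical helper: capture_output=True is only accepted since
    Python 3.7, so both pipes are requested explicitly instead.
    """
    return subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, check=True)


# Usage: run the current interpreter as a harmless child process.
result = run_captured([sys.executable, "-c", "print('ok')"])
```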
Sarah Hoffmann
1e86dc1d93
remove default parameter for namedtuple
...
This is only available since Python 3.7.
2021-07-06 22:57:42 +02:00
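As an illustration of the commit above: the `defaults` argument of `collections.namedtuple` was added in Python 3.7. One way to keep an optional field on Python 3.6 is a small factory function, sketched below. The type name `Variant`, its fields, and `make_variant` are assumptions for the example and may not match the project's code.

```python
from collections import namedtuple

# Hypothetical tuple type; no defaults= argument, so this works on Python 3.6.
Variant = namedtuple('Variant', ['source', 'replacement', 'properties'])


def make_variant(source, replacement, properties=None):
    """Create a Variant, supplying the default in a plain function."""
    return Variant(source, replacement, properties)
```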
Sarah Hoffmann
54f295be52
CI: run tests on older Ubuntu version as well
2021-07-06 22:57:42 +02:00
Sarah Hoffmann
8bc3c0a07c
Merge pull request #2382 from lonvia/remove-json-config
...
Remove outdated ICU tokenizer JSON config
2021-07-05 12:34:34 +02:00
Sarah Hoffmann
d75bc20174
Merge pull request #2383 from lonvia/remove-more-names
...
Exclude name:etymology and name:signed
2021-07-05 12:34:16 +02:00
Sarah Hoffmann
fd8751658f
exclude name:etymology and name:signed
...
name:etymology contains a description of the name origin and is
thus more informative than search-worthy.
name:signed basically indicates that the feature does not have
a name.
2021-07-05 11:04:16 +02:00
Sarah Hoffmann
4db5a1a0b8
remove outdated ICU tokenizer JSON config
2021-07-05 11:01:35 +02:00
Sarah Hoffmann
4c52777ef0
Merge pull request #2371 from lonvia/increase-python-version
...
Increase minimum required Python version to 3.6
2021-07-05 10:32:38 +02:00
Sarah Hoffmann
d4c7bf20a2
Merge pull request #2381 from lonvia/reorganise-abbreviations
...
Reorganise abbreviation handling
2021-07-05 10:32:16 +02:00
Sarah Hoffmann
affe1300d9
add warning about experimental nature of ICU tokenizer
2021-07-04 10:44:58 +02:00
Sarah Hoffmann
62d5984b1b
limit the number of variants that can be produced
2021-07-04 10:28:28 +02:00
Sarah Hoffmann
c32551b4e0
restrict partial word counting to names of reasonable length
...
The partial word count does not split names to save a bit of time.
The result is that it might encounter unreasonably long names
which in truth consist of multiple words. No accurate statistics
are needed so simply restrict the count to words shorter than
75 characters.
2021-07-04 10:28:28 +02:00
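The length restriction described above can be sketched as follows. This is an assumption-laden illustration, not the project's implementation: the function name `count_words` and the constant name are hypothetical, and only the limit of 75 characters comes from the commit message.

```python
from collections import Counter

# Limit taken from the commit message: only words shorter than 75 characters.
MAX_WORD_LENGTH = 75


def count_words(terms):
    """Count term frequencies, skipping unreasonably long terms.

    Terms are not split here, so an overly long term is most likely
    several words run together and is simply left out of the statistics.
    """
    counts = Counter()
    for term in terms:
        if len(term) < MAX_WORD_LENGTH:
            counts[term] += 1
    return counts
```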
Sarah Hoffmann
e85f7e7aa9
fix subsequent replacements
...
Two replacement words directly following each other did not
work as expected because each expects a space at the
beginning/end while there was only one space available.
Also forbid composing a word after a space was added in the
end by a previous replacement.
2021-07-04 10:28:28 +02:00
Sarah Hoffmann
7b0f6b7905
leave ICU variant properties empty for now
...
Saving unused properties causes unnecessary duplicates.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
0894ce9dc3
import abbreviations from OSM Wiki
...
Replaces the variant rules with a slightly cleaned-up
version of the abbreviation lists at
https://wiki.openstreetmap.org/wiki/Name_finder:Abbreviations
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
4fd2e961b6
improve normalization
...
Make sure all special symbols are removed during normalization already.
Those won't be interpreted in any way because they are unlikely to be
searched for.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
b9fbfeff67
only consider partials in multi-words for initial count
...
This ensures that it is less likely that we exclude meaningful
words like 'hauptstrasse' just because they are frequent.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
5dd24b3ef0
add documentation for ICU tokenizer configuration
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
62828fc5c1
switch to a more flexible variant description format
...
The new format combines compound splitting and abbreviation.
It also allows rules to be restricted by additional conditions
(like language or region). This latter ability is not used
yet.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
a6aa6360e0
use yaml tag syntax to mark include files
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
c4f6c06f44
add dependency on datrie
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
0d80a9b897
tests for composing decomposed suffixes
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
f70930b1a0
make compound decomposition a pure import feature
...
Compound decomposition now creates a full name variant on
import just like abbreviations. This simplifies query time
normalization and opens a path for changing abbreviation
and compound decomposition lists for an existing database.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
9ff4f66f55
complete tests for icu tokenizer
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
32ca631b74
fix full term token in special phrases
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
2e81084f35
complete tests for rule loader
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
a0a7b05c9f
correctly quote strings when copying in data
...
Encapsulate the copy string in a class that ensures that
copy lines are written with correct quoting.
2021-07-04 10:28:20 +02:00
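A minimal sketch of the idea in the commit above, assuming the data is written in PostgreSQL's tab-separated COPY text format. The class name `CopyBuffer` and its methods are hypothetical. The escaping shown (backslash, tab, newline, carriage return, and `\N` for NULL) follows the COPY text format rules, not necessarily the project's code.

```python
import io


class CopyBuffer:
    """Collect rows for COPY, quoting every column on the way in."""

    def __init__(self):
        self.buffer = io.StringIO()

    @staticmethod
    def _quote(value):
        # NULL is written as \N in the COPY text format.
        if value is None:
            return '\\N'
        # Escape the backslash first so later escapes are not doubled.
        return (str(value).replace('\\', '\\\\')
                          .replace('\t', '\\t')
                          .replace('\n', '\\n')
                          .replace('\r', '\\r'))

    def add(self, *columns):
        """Append one row; columns are tab-separated, rows newline-terminated."""
        self.buffer.write('\t'.join(self._quote(c) for c in columns))
        self.buffer.write('\n')

    def getvalue(self):
        return self.buffer.getvalue()
```

Callers only ever pass raw values to `add`, so quoting cannot be forgotten at an individual call site.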
Sarah Hoffmann
2f6e4edcdb
update unit tests for adapted abbreviation code
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
1bd9f455fc
add abbreviations from legacy tokenizer
...
These abbreviations are not a perfect fit anymore because
abbreviation replacement is now applied before transliteration.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
2e3c5d4c5b
adapt tests for ICU tokenizer
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
8413075249
move abbreviation computation into import phase
...
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.
New dependency for python library datrie.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
6ba00e6aee
icu tokenizer: move transliteration rules into separate file
...
The tokenizer configuration has become difficult to handle
due to the additional manual transliteration rules. Allow
a separate rule file that is passed to the ICU library
as is.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
de4fac33dc
docs: nominatim-ui should be installed from the release
...
The development version does not provide the pre-packaged
dist directory anymore.
2021-07-03 21:16:52 +02:00
Sarah Hoffmann
c9984669a7
Merge pull request #2373 from lonvia/tweak-search-cost
...
Further tweaking of search cost
2021-06-26 16:21:08 +02:00