Commit Graph

7 Commits

Author SHA1 Message Date
Sarah Hoffmann
e85f7e7aa9 fix subsequent replacements
Two replacement words directly following each other did not
work as expected because each expects a space at the
beginning/end while there was only one space available.

Also forbit composing a word after a space was added in the
end by a previous replacement.
2021-07-04 10:28:28 +02:00
Sarah Hoffmann
0894ce9dc3 import abbreviations from OSM Wiki
Replaces the variant rules with a slightly cleaned-up
version of the abbreviation lists at
https://wiki.openstreetmap.org/wiki/Name_finder:Abbreviations
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
4fd2e961b6 improve normalization
Make sure all special symbols are removed during normalization already.
Those won't be interpreted in any way because they are unlikely to be
searched for.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
62828fc5c1 switch to a more flexible variant description format
The new format combines compound splitting and abbreviation.
It also allows to restrict rules to additional conditions
(like language or region). This latter ability is not used
yet.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
a6aa6360e0 use yaml tag syntax to mark include files 2021-07-04 10:28:20 +02:00
Sarah Hoffmann
1bd9f455fc add abbreviations from legacy tokenizer
These abbreviations are not a perfect fit anymore because
abbreviation replacement is now applied before transliteration.
2021-07-04 10:28:20 +02:00
Sarah Hoffmann
8413075249 move abbreviation computation into import phase
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.

New dependency for python library datrie.
2021-07-04 10:28:20 +02:00