improve penalty for token-split words

The rematch penalty for partial words created by the transliteration
needs to take into account that they are rematched against the full word.
That means that a missing beginning or end should not incur a significant
penalty.
This commit is contained in:
Sarah Hoffmann
2023-08-12 11:26:02 +02:00
parent 926c4a7d04
commit 3d0bc85b4d

View File

@@ -83,7 +83,7 @@ class ICUToken(qmod.Token):
seq = difflib.SequenceMatcher(a=self.lookup_word, b=norm)
distance = 0
for tag, afrom, ato, bfrom, bto in seq.get_opcodes():
if tag == 'delete' and (afrom == 0 or ato == len(self.lookup_word)):
if tag in ('delete', 'insert') and (afrom == 0 or ato == len(self.lookup_word)):
distance += 1
elif tag == 'replace':
distance += max((ato-afrom), (bto-bfrom))