when rematching only distinguish between perfect, somewhat and bad match

Sarah Hoffmann
2025-06-01 11:53:23 +02:00
parent 831fccdaee
commit f2236f68f1

@@ -153,11 +153,10 @@ class ForwardGeocoder:
             if not words:
                 continue
             for qword in qwords:
-                wdist = max(difflib.SequenceMatcher(a=qword, b=w).quick_ratio() for w in words)
-                if wdist < 0.5:
-                    distance += len(qword)
-                else:
-                    distance += (1.0 - wdist) * len(qword)
+                # only add distance penalty if there is no perfect match
+                if qword not in words:
+                    wdist = max(difflib.SequenceMatcher(a=qword, b=w).quick_ratio() for w in words)
+                    distance += len(qword) if wdist < 0.4 else 1
             # Compensate for the fact that country names do not get a
             # match penalty yet by the tokenizer.
             # Temporary hack that needs to be removed!