rerank results by query

The algorithm is similar to the PHP reranking and uses the terms from the display name to check against the query terms. However instead of exact matching it uses a per-word-edit-distance, so that it is less strict when it comes to mismatching accents or other one letter differences. Country names get a higher penalty because they don't receive a penalty during token matching right now. This will work badly with the legacy tokenizer. Given that it is marked for removal, it is simply not worth optimising for it.
2026-03-11 05:14:07 +00:00 · 2023-09-19 16:18:09 +02:00
parent 5762a5bc80
commit fd26310d6a
3 changed files with 64 additions and 4 deletions
--- a/nominatim/api/search/query_analyzer_factory.py
+++ b/nominatim/api/search/query_analyzer_factory.py
@@ -30,6 +30,15 @@ class AbstractQueryAnalyzer(ABC):
        """


+    @abstractmethod
+    def normalize_text(self, text: str) -> str:
+        """ Bring the given text into a normalized form. That is the
+            standardized form search will work with. All information removed
+            at this stage is inevitably lost.
+        """
+
+
+
 async def make_query_analyzer(conn: SearchConnection) -> AbstractQueryAnalyzer:
    """ Create a query analyzer for the tokenizer used by the database.
    """