add developers documentation for query-side of tokenizer

This commit is contained in:
Sarah Hoffmann
2024-12-13 17:09:42 +01:00
parent fbb6edfdaf
commit 5b40aa579b


@@ -91,14 +91,19 @@ for a custom tokenizer implementation.
### Directory Structure

Nominatim expects two files containing the Python part of the implementation:

* `src/nominatim_db/tokenizer/<NAME>_tokenizer.py` contains the tokenizer
  code used during import and
* `src/nominatim_api/search/<NAME>_tokenizer.py` has the code used during
  query time.

`<NAME>` is a unique name for the tokenizer consisting of only lower-case
letters, digits and underscore. A tokenizer also needs to install some SQL
functions. By convention, these should be placed in `lib-sql/tokenizer`.

If the tokenizer has a default configuration file, this should be saved in
`settings/<NAME>_tokenizer.<SUFFIX>`.
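Put together, the layout for a hypothetical tokenizer named `mytok` could look like this (the SQL file name is illustrative; the configuration suffix depends on the chosen file format):

```
src/nominatim_db/tokenizer/mytok_tokenizer.py   # import-time code
src/nominatim_api/search/mytok_tokenizer.py     # query-time code
lib-sql/tokenizer/mytok_tokenizer.sql           # SQL functions
settings/mytok_tokenizer.<SUFFIX>               # optional default configuration
```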
### Configuration and Persistence
@@ -110,9 +115,11 @@ are tied to a database installation and must only be read during installation
time. If they are needed for the runtime then they must be saved into the
`nominatim_properties` table and later loaded from there.

### The Python modules

#### `src/nominatim_db/tokenizer/`

The import Python module is expected to export a single factory function:

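Saving such a runtime setting can be sketched as follows. This is a minimal illustration, assuming the `nominatim_properties` table has the columns `property` and `value` and that `cur` is a DB-API cursor; `save_property` is a hypothetical helper, not part of Nominatim's public API:

```python
def save_property(cur, name: str, value: str) -> None:
    """Upsert one property: try UPDATE first, fall back to INSERT
    when the property has not been stored yet."""
    cur.execute("UPDATE nominatim_properties SET value = %s WHERE property = %s",
                (value, name))
    if cur.rowcount == 0:
        cur.execute("INSERT INTO nominatim_properties (property, value) "
                    "VALUES (%s, %s)", (name, value))
```

The update-then-insert pattern avoids relying on a unique index being present on the `property` column.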
```python
def create(dsn: str, data_dir: Path) -> AbstractTokenizer
```
@@ -123,6 +130,20 @@ is a directory in the project directory that the tokenizer may use to save
database-specific data. The function must return the instance of the tokenizer
class as defined below.
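A minimal skeleton of the import-side module could look like this. `MyTokenizer` is a hypothetical placeholder; a real implementation inherits from `nominatim_db.tokenizer.base.AbstractTokenizer` and implements its abstract methods:

```python
# Sketch of src/nominatim_db/tokenizer/<NAME>_tokenizer.py.
from pathlib import Path

class MyTokenizer:  # a real tokenizer inherits AbstractTokenizer
    def __init__(self, dsn: str, data_dir: Path) -> None:
        self.dsn = dsn            # connection string for the Nominatim database
        self.data_dir = data_dir  # project subdirectory for tokenizer data

def create(dsn: str, data_dir: Path) -> MyTokenizer:
    """Factory function expected by Nominatim's tokenizer loader."""
    return MyTokenizer(dsn, data_dir)
```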
#### `src/nominatim_api/search/`
The query-time Python module must also export a factory function:
```python
def create_query_analyzer(conn: SearchConnection) -> AbstractQueryAnalyzer
```
The `conn` parameter contains the current search connection. See the
[library documentation](../library/Low-Level-DB-Access.md#searchconnection-class)
for details on the class. The function must return the instance of the tokenizer
class as defined below.
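A matching skeleton for the query-side module might look like this. `MyQueryAnalyzer` is a hypothetical stand-in for a class implementing `AbstractQueryAnalyzer`; the `SearchConnection` type hint is left out to keep the sketch self-contained:

```python
# Sketch of src/nominatim_api/search/<NAME>_tokenizer.py.
class MyQueryAnalyzer:  # a real analyzer inherits AbstractQueryAnalyzer
    def __init__(self, conn) -> None:
        self.conn = conn  # live search connection used for token lookups

def create_query_analyzer(conn) -> "MyQueryAnalyzer":
    """Factory function called by the search frontend at query time."""
    return MyQueryAnalyzer(conn)
```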
### Python Tokenizer Class

All tokenizers must inherit from `nominatim_db.tokenizer.base.AbstractTokenizer`
@@ -138,6 +159,13 @@ and implement the abstract functions defined there.
    options:
        heading_level: 6
### Python Query Analyzer Class

::: nominatim_api.search.query_analyzer_factory.AbstractQueryAnalyzer
    options:
        heading_level: 6
### PL/pgSQL Functions

The tokenizer must provide access functions for the `token_info` column