add developers documentation for query-side of tokenizer

This commit is contained in:
Sarah Hoffmann
2024-12-13 17:09:42 +01:00
parent fbb6edfdaf
commit 5b40aa579b


@@ -91,14 +91,19 @@ for a custom tokenizer implementation.
### Directory Structure

Nominatim expects two files containing the Python part of the implementation:

* `src/nominatim_db/tokenizer/<NAME>_tokenizer.py` contains the tokenizer
  code used during import and
* `src/nominatim_api/search/<NAME>_tokenizer.py` has the code used during
  query time.

`<NAME>` is a unique name for the tokenizer consisting of only lower-case
letters, digits and underscore. A tokenizer also needs to install some SQL
functions. By convention, these should be placed in `lib-sql/tokenizer`.

If the tokenizer has a default configuration file, this should be saved in
`settings/<NAME>_tokenizer.<SUFFIX>`.
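Put together, the layout for a hypothetical tokenizer named `mytok` could look like this (the SQL file name is illustrative; the configuration suffix depends on the chosen file format):

```
src/nominatim_db/tokenizer/mytok_tokenizer.py   # import-time code
src/nominatim_api/search/mytok_tokenizer.py     # query-time code
lib-sql/tokenizer/mytok_tokenizer.sql           # SQL functions
settings/mytok_tokenizer.<SUFFIX>               # optional default configuration
```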
### Configuration and Persistence
@@ -110,9 +115,11 @@ are tied to a database installation and must only be read during installation
time. If they are needed for the runtime then they must be saved into the
`nominatim_properties` table and later loaded from there.

### The Python modules

#### `src/nominatim_db/tokenizer/`

The import Python module is expected to export a single factory function:

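Saving such a runtime setting can be sketched as follows. This is a minimal illustration, assuming the `nominatim_properties` table has the columns `property` and `value` and that `cur` is a DB-API cursor; `save_property` is a hypothetical helper, not part of Nominatim's public API:

```python
def save_property(cur, name: str, value: str) -> None:
    """Upsert one property: try UPDATE first, fall back to INSERT
    when the property has not been stored yet."""
    cur.execute("UPDATE nominatim_properties SET value = %s WHERE property = %s",
                (value, name))
    if cur.rowcount == 0:
        cur.execute("INSERT INTO nominatim_properties (property, value) "
                    "VALUES (%s, %s)", (name, value))
```

The update-then-insert pattern avoids relying on a unique index being present on the `property` column.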
```python
def create(dsn: str, data_dir: Path) -> AbstractTokenizer
```
@@ -123,6 +130,20 @@ is a directory in the project directory that the tokenizer may use to save
database-specific data. The function must return the instance of the tokenizer
class as defined below.
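A minimal skeleton of the import-side module could look like this. `MyTokenizer` is a hypothetical placeholder; a real implementation inherits from `nominatim_db.tokenizer.base.AbstractTokenizer` and implements its abstract methods:

```python
# Sketch of src/nominatim_db/tokenizer/<NAME>_tokenizer.py.
from pathlib import Path

class MyTokenizer:  # a real tokenizer inherits AbstractTokenizer
    def __init__(self, dsn: str, data_dir: Path) -> None:
        self.dsn = dsn            # connection string for the Nominatim database
        self.data_dir = data_dir  # project subdirectory for tokenizer data

def create(dsn: str, data_dir: Path) -> MyTokenizer:
    """Factory function expected by Nominatim's tokenizer loader."""
    return MyTokenizer(dsn, data_dir)
```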
#### `src/nominatim_api/search/`
The query-time Python module must also export a factory function:
```python
def create_query_analyzer(conn: SearchConnection) -> AbstractQueryAnalyzer
```
The `conn` parameter contains the current search connection. See the
[library documentation](../library/Low-Level-DB-Access.md#searchconnection-class)
for details on the class. The function must return the instance of the tokenizer
class as defined below.
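A matching skeleton for the query-side module might look like this. `MyQueryAnalyzer` is a hypothetical stand-in for a class implementing `AbstractQueryAnalyzer`; the `SearchConnection` type hint is left out to keep the sketch self-contained:

```python
# Sketch of src/nominatim_api/search/<NAME>_tokenizer.py.
class MyQueryAnalyzer:  # a real analyzer inherits AbstractQueryAnalyzer
    def __init__(self, conn) -> None:
        self.conn = conn  # live search connection used for token lookups

def create_query_analyzer(conn) -> "MyQueryAnalyzer":
    """Factory function called by the search frontend at query time."""
    return MyQueryAnalyzer(conn)
```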
### Python Tokenizer Class

All tokenizers must inherit from `nominatim_db.tokenizer.base.AbstractTokenizer`
@@ -138,6 +159,13 @@ and implement the abstract functions defined there.
    options:
        heading_level: 6
### Python Query Analyzer Class

::: nominatim_api.search.query_analyzer_factory.AbstractQueryAnalyzer
    options:
        heading_level: 6
### PL/pgSQL Functions

The tokenizer must provide access functions for the `token_info` column