mirror of
https://github.com/osm-search/Nominatim.git
synced 2026-03-11 05:14:07 +00:00
add developers documentation for query-side of tokenizer
This commit is contained in:
@@ -91,14 +91,19 @@ for a custom tokenizer implementation.
 
 ### Directory Structure
 
-Nominatim expects a single file `src/nominatim_db/tokenizer/<NAME>_tokenizer.py`
-containing the Python part of the implementation.
+Nominatim expects two files containing the Python part of the implementation:
+
+* `src/nominatim_db/tokenizer/<NAME>_tokenizer.py` contains the tokenizer
+  code used during import and
+* `src/nominatim_api/search/<NAME>_tokenizer.py` has the code used during
+  query time.
 
 `<NAME>` is a unique name for the tokenizer consisting of only lower-case
 letters, digits and underscore. A tokenizer also needs to install some SQL
 functions. By convention, these should be placed in `lib-sql/tokenizer`.
 
 If the tokenizer has a default configuration file, this should be saved in
-the `settings/<NAME>_tokenizer.<SUFFIX>`.
+`settings/<NAME>_tokenizer.<SUFFIX>`.
 
 ### Configuration and Persistence
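Taken together, the directory conventions above give a layout like the following for a hypothetical tokenizer named `demo` (the SQL file name and the `yaml` suffix are illustrative assumptions, not prescribed by the text):

```
src/nominatim_db/tokenizer/demo_tokenizer.py   # import-side Python code
src/nominatim_api/search/demo_tokenizer.py     # query-side Python code
lib-sql/tokenizer/demo_tokenizer.sql           # SQL functions (name is an assumption)
settings/demo_tokenizer.yaml                   # optional default configuration
```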
||||||
@@ -110,9 +115,11 @@ are tied to a database installation and must only be read during installation
 time. If they are needed for the runtime then they must be saved into the
 `nominatim_properties` table and later loaded from there.
 
-### The Python module
+### The Python modules
 
-The Python module is expect to export a single factory function:
+#### `src/nominatim_db/tokenizer/`
+
+The import Python module is expected to export a single factory function:
 
 ```python
 def create(dsn: str, data_dir: Path) -> AbstractTokenizer
 ```
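The import-side factory could be sketched as follows. This is a minimal illustration, not Nominatim's actual implementation: `DemoTokenizer` is a hypothetical name, and the `AbstractTokenizer` class below is only a stand-in for `nominatim_db.tokenizer.base.AbstractTokenizer`, which defines further abstract methods a real tokenizer must implement.

```python
# Sketch of a hypothetical import-side module
# src/nominatim_db/tokenizer/demo_tokenizer.py.
from pathlib import Path


class AbstractTokenizer:
    """Stand-in for nominatim_db.tokenizer.base.AbstractTokenizer."""


class DemoTokenizer(AbstractTokenizer):
    """Hypothetical tokenizer used during import."""

    def __init__(self, dsn: str, data_dir: Path) -> None:
        self.dsn = dsn            # connection string for the Nominatim database
        self.data_dir = data_dir  # project-directory path for tokenizer data


def create(dsn: str, data_dir: Path) -> AbstractTokenizer:
    """The factory function the module must export."""
    return DemoTokenizer(dsn, data_dir)
```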
@@ -123,6 +130,20 @@ is a directory in the project directory that the tokenizer may use to save
 database-specific data. The function must return the instance of the tokenizer
 class as defined below.
 
+#### `src/nominatim_api/search/`
+
+The query-time Python module must also export a factory function:
+
+``` python
+def create_query_analyzer(conn: SearchConnection) -> AbstractQueryAnalyzer
+```
+
+The `conn` parameter contains the current search connection. See the
+[library documentation](../library/Low-Level-DB-Access.md#searchconnection-class)
+for details on the class. The function must return the instance of the tokenizer
+class as defined below.
+
 ### Python Tokenizer Class
 
 All tokenizers must inherit from `nominatim_db.tokenizer.base.AbstractTokenizer`
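The query-side factory could be sketched in the same way. Again a minimal illustration under stated assumptions: `DemoQueryAnalyzer` is a hypothetical name, and `SearchConnection` and `AbstractQueryAnalyzer` below are stand-ins for the real classes in `nominatim_api`.

```python
# Sketch of a hypothetical query-side module
# src/nominatim_api/search/demo_tokenizer.py.

class SearchConnection:
    """Stand-in for the nominatim_api search connection class."""


class AbstractQueryAnalyzer:
    """Stand-in for
    nominatim_api.search.query_analyzer_factory.AbstractQueryAnalyzer."""


class DemoQueryAnalyzer(AbstractQueryAnalyzer):
    """Hypothetical analyzer used at query time."""

    def __init__(self, conn: SearchConnection) -> None:
        self.conn = conn  # the current search connection


def create_query_analyzer(conn: SearchConnection) -> AbstractQueryAnalyzer:
    """The factory function the query-side module must export."""
    return DemoQueryAnalyzer(conn)
```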
@@ -138,6 +159,13 @@ and implement the abstract functions defined there.
     options:
         heading_level: 6
 
+
+### Python Query Analyzer Class
+
+::: nominatim_api.search.query_analyzer_factory.AbstractQueryAnalyzer
+    options:
+        heading_level: 6
+
 ### PL/pgSQL Functions
 
 The tokenizer must provide access functions for the `token_info` column