Sarah Hoffmann
cafd8e2b1e
fix typos and grammar issues
2023-08-29 12:14:44 +02:00
Sarah Hoffmann
d3372e69ec
update to modern mkdocstrings python handler
2023-08-25 21:40:20 +02:00
Sarah Hoffmann
252fe42612
Merge pull request #3122 from miku0/sanitizer-final
...
Adds sanitizer for Japanese addresses to correspond to block address
2023-08-01 10:38:58 +02:00
miku0
67e1c7dc72
Moved KANJI_MAP to icu-rules
2023-07-31 11:57:49 +00:00
miku0
2350018106
Fixed cosmetic issues
2023-07-31 02:39:04 +00:00
miku0
fac8c32cda
Moved KANJI_MAP to global variable
2023-07-26 21:43:22 +00:00
Sarah Hoffmann
8cba65809c
older version of Postgres cannot convert jsonb to int
2023-07-26 17:45:21 +02:00
miku0
848e5ac5de
Correction to PR's comment
2023-07-26 09:50:25 +00:00
miku0
0722495434
add japanese sanitizer
2023-07-26 07:54:58 +00:00
Sarah Hoffmann
faeee7528f
move warm script to python code
2023-07-25 21:39:23 +02:00
Sarah Hoffmann
d7a3039c2a
also switch legacy tokenizer to new street/place choice behaviour
2023-06-30 17:03:17 +02:00
Sarah Hoffmann
645ea5a057
use information from tokenizer to determine street vs. place address
...
So far the SQL logic used the information from the address field
to determine if an address is attached to a street or place.
This changes the logic to use the information provided in the
token_info. This allows sanitizers to enforce a certain parenting
without changing the visible address information.
2023-06-30 11:08:25 +02:00
biswajit-k
8f03c80ce8
generalize filter for sanitizers
2023-04-01 19:24:09 +05:30
Sarah Hoffmann
9a5f75dba7
Merge pull request #2993 from biswajit-k/delete-tags
...
Adds sanitizer for preventing certain tags to enter search index based on parameters
2023-03-09 14:31:45 +01:00
biswajit-k
ca149fb796
Adds sanitizer for preventing certain tags to enter search index based on parameters
...
fix: pylint error
added docs for delete tags sanitizer
fixed typos in docs and code comments
fix: python typechecking error
fixed rank address type
Revert "fixed typos in docs and code comments"
This reverts commit 6839eea755a87f557895f30524fb5c03dd983d60.
added default parameters and refactored code
added test for all parameters
2023-03-09 14:18:39 +05:30
biswajit-k
36388cafe9
fixed typos in docs and code comments
2023-03-06 17:09:38 +05:30
Sarah Hoffmann
2abe9e6fd9
use data paths from new nominatim.paths
2022-11-27 12:15:41 +01:00
Sarah Hoffmann
fd3dec8efe
add sanitizer for TIGER tags
...
Currently only takes over cleaning the tiger:county data. This was
done by the import until now.
2022-11-23 10:37:27 +01:00
Sarah Hoffmann
8d082c13e0
adapt to new type annotations from typeshed
...
Some more functions frrom psycopg are now properly annotated.
No ignoring necessary anymore.
2022-08-09 11:06:54 +02:00
Sarah Hoffmann
9864b191b1
fix various typos
2022-07-31 17:10:35 +02:00
Sarah Hoffmann
51b6d16dc6
overhaul the token analysis interface
...
The functional split betweenthe two functions is now that the
first one creates the ID that is used in the word table and
the second one creates the variants. There no longer is a
requirement that the ID is the normalized version. We might
later reintroduce the requirement that a normalized version be available
but it doesn't necessarily need to be through the ID.
The function that creates the ID now gets the full PlaceName. That way
it might take into account attributes that were set by the sanitizers.
Finally rename both functions to something more sane.
2022-07-29 15:14:11 +02:00
Sarah Hoffmann
34d27ed45c
move PlaceName into the generic data module
2022-07-29 11:42:20 +02:00
Sarah Hoffmann
094100bbf6
harmonize spelling
...
Stick with the American spelling of Analyze.
2022-07-29 10:52:01 +02:00
Sarah Hoffmann
c8873d34af
harmonize interface of token analysis module
...
The configure() function now receives a Transliterator object instead
of the ICU rules. This harmonizes the parameters with the create
function.
2022-07-29 10:43:07 +02:00
Sarah Hoffmann
f0d640961a
add documentation for custom token analysis
2022-07-29 09:41:28 +02:00
Sarah Hoffmann
3746befd88
add documentation for sanitizer interface
...
Also switches mkdocstrings to 0.18 with the rather unfortunate
consequence that now mkdocstrings-python-legacy is needed as well.
2022-07-28 22:00:29 +02:00
Sarah Hoffmann
d819036daa
add support for external token analysis modules
2022-07-25 16:27:22 +02:00
Sarah Hoffmann
6d41046b15
add support for external sanitizer modules
2022-07-25 16:10:19 +02:00
Kian-Meng Ang
f5e52e748f
docs: fix typos
2022-07-20 22:05:31 +08:00
Sarah Hoffmann
83054af46f
remove typing_extensions requirement
...
The typing_extensions package is only necessary now when running mypy.
It won't be used at runtime anymore.
2022-07-18 09:55:58 +02:00
Sarah Hoffmann
9963261d8d
add type annotations to special phrase importer
2022-07-18 09:54:29 +02:00
Sarah Hoffmann
6c6bbe5747
add type annotations for ICU tokenizer
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
18b16e06ca
add type annotations for legacy tokenizer
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
e37cfc64d2
add type annotations to ICU tokenizer helper modules
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
d35e3c25b6
add type annotations for token analysis
...
No annotations for ICU types yet.
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
62eedbb8f6
add type hints for sanitizers
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
d0c44431d0
add typing information for place_info and country_info
2022-07-18 09:47:57 +02:00
Sarah Hoffmann
cbbcbb1fd7
move country_info into data submodule
2022-07-06 11:08:36 +02:00
Sarah Hoffmann
bce93d60bd
move PlaceInfo into data submodule
...
This data structure is shared between indexer and tokenizer.
2022-07-06 10:54:47 +02:00
Sarah Hoffmann
612d34930b
handle postcodes properly on word table updates
...
update_postcodes_from_db() needs to do the full postcode treatment
in order to derive the correct word table entries.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
5be320368c
add documentation for postcode customization
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
7f2ad4ac7e
fix linting issue
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
0f00f4968c
fix up BDD tests for postcode changes
...
Includes smaller code fixes found by the tests.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
37b2c6a830
port legacy tokenizer to new postcode handling
...
Also documents the changes to the SQL functions of the tokenizer.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
e86db3001f
fix postcode pattern for Mozambique
...
Optional groups are not implemented yet.
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
80ea13437d
move postcode matcher in a separate file
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
b7704833e4
icu: switch postcodes to using the pre-formatted one
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
ca7b46511d
introduce and use analyzer for postcodes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
18864afa8a
postcodes: introduce a default pattern for countries without postcodes
2022-06-23 23:42:31 +02:00
Sarah Hoffmann
5ba75df507
postcode: generate a generic form
2022-06-23 23:42:31 +02:00