Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2026-02-16 15:47:58 +00:00

Author	SHA1	Message	Date
Sarah Hoffmann	54d35ddfe9	split cli tests by subcommand and extend coverage	2021-12-02 23:45:48 +01:00
Sarah Hoffmann	7beccb7997	remove unnecessary pass statements	2021-12-02 15:54:24 +01:00
Sarah Hoffmann	14a78f55cd	more unit tests for tokenizers	2021-12-02 15:46:36 +01:00
Sarah Hoffmann	a52ed366e4	add tests for migration	2021-12-01 20:27:40 +01:00
Sarah Hoffmann	810056349f	add migration for inclusive housenumber Tiger index	2021-11-24 12:03:20 +01:00
Sarah Hoffmann	10e979e841	only instantiate indexer once for replication Also makes sure that indexer object exists everywhere were needed. See #2518.	2021-11-19 14:48:58 +01:00
Sarah Hoffmann	345c812e43	better error reporting when API script does not exist Check if the API script exists on the expected location before running php-cli. This way we can add a useful hint about the project directory. Fixes #2513.	2021-11-10 11:58:20 +01:00
Sarah Hoffmann	d479a0585d	prepare release 4.0.0	2021-11-02 20:27:55 +01:00
Sarah Hoffmann	37eeccbf4c	ICU: use normalization from config in PHP The TERM_NORMALIZATION config option is no longer applicable. That was already documented but not yet implemented.	2021-10-27 11:32:44 +02:00
Sarah Hoffmann	53dbe58ada	do not count words when in reverse-only mode	2021-10-26 12:00:13 +02:00
Sarah Hoffmann	2c4b798f9b	further refactor setup to keep function small	2021-10-26 12:00:13 +02:00
Sarah Hoffmann	9934421442	make word count computation part of the import Accurate word counts are now essential when using the ICU tokenizer and don't hurt for the legacy one. Adds about an hour import time.	2021-10-26 12:00:13 +02:00
Sarah Hoffmann	5c778c6d32	Merge pull request #2486 from lonvia/fix-special-phrases Fix parsing of operator in special phrases	2021-10-25 21:45:08 +02:00
Sarah Hoffmann	85797acf1e	ICU: add an index over word_ids Needed for keyword lookup in the details response.	2021-10-25 21:33:27 +02:00
Sarah Hoffmann	c4f5c11a4e	be case-insensitve about special phrase operator	2021-10-25 19:51:20 +02:00
Sarah Hoffmann	5a1c3dbea3	fix parsing of operator in special phrases Because of unstripped input, the operators wouldn't match.	2021-10-25 19:46:30 +02:00
Sarah Hoffmann	13e7398566	allow relative paths for log files	2021-10-25 10:26:05 +02:00
Sarah Hoffmann	1098ab732f	allow relative paths for flatnode file	2021-10-22 17:32:51 +02:00
Sarah Hoffmann	507fdd4f40	switch IMPORT_STYLE to use generic file search Allows relative paths wrt project directory.	2021-10-22 16:49:57 +02:00
Sarah Hoffmann	0ae8d7ac08	have ADDRESS_LEVEL_CONFIG use load_sub_configuration This means that relative paths now are looked up in the project directory.	2021-10-22 16:36:52 +02:00
Sarah Hoffmann	c77df2d1eb	replace NOMINATIM_PHRASE_CONFIG with command line option	2021-10-22 14:41:14 +02:00
Sarah Hoffmann	c1fa70639b	add new replication mode catch-up This mode gets updates until the server reports no new diffs anymore. Also adds additional indexing, when the main indexing step left a couple of objects to process. This happens only when the next update is expected to be more than 40min away.	2021-10-20 22:05:15 +02:00
Sarah Hoffmann	12643c5986	run Tiger import with parallel threads per default	2021-10-19 15:00:26 +02:00
Sarah Hoffmann	ec7184c533	icu: no longer precompute terms The ICU analyzer no longer drops frequent partials, so it is no longer necessary to know the frequencies in advance.	2021-10-19 11:52:28 +02:00
Sarah Hoffmann	e8e2502e2f	make word recount a tokenizer-specific function	2021-10-19 11:21:16 +02:00
Sarah Hoffmann	47417d1871	update and extend man page Provide extended descriptions for most subcommands.	2021-10-18 09:03:07 +02:00
Sarah Hoffmann	552fb16cb2	fix template expressions for tablespaces	2021-10-15 15:11:09 +02:00
Sarah Hoffmann	3649487f5e	use SP-GIST index for building index where available Point-in-polygon queries are much faster with a SP-GIST geometry index, so use that for the index used to check if a housenumber is inside a building. Only available with Postgis 3. There is an automatic fallback to GIST for Postgis 2.	2021-10-10 21:55:38 +02:00
Sarah Hoffmann	6c79a60e19	add documentation for new configuration of ICU tokenizer	2021-10-07 11:55:53 +02:00
Sarah Hoffmann	2a94bfc703	fix argument description for check_database	2021-10-07 09:49:13 +02:00
Sarah Hoffmann	299934fd2a	reorganize and complete tests around generic token analysis	2021-10-06 17:03:37 +02:00
Sarah Hoffmann	b18d042832	add tests for sanitizer tagging language	2021-10-06 12:29:25 +02:00
Sarah Hoffmann	97a10ec218	apply variants by languages Adds a tagger for names by language so that the analyzer of that language is used. Thus variants are now only applied to names in the specific language and only tag name tags, no longer to reference-like tags.	2021-10-06 11:09:54 +02:00
Sarah Hoffmann	d35400a7d7	use analyser provided in the 'analyzer' property Implements per-name choice of analyzer. If a non-default analyzer is choosen, then the 'word' identifier is extended with the name of the ana;yzer, so that we still have unique items.	2021-10-05 14:10:32 +02:00
Sarah Hoffmann	92f6ec2328	remove support for properties on variants Those are not going to be used in the near future, so no need to carry that code around just now.	2021-10-05 10:29:36 +02:00
Sarah Hoffmann	9ba2019470	precompute replacements while loading configuration	2021-10-05 10:20:08 +02:00
Sarah Hoffmann	c171d88194	move parsing of token analysis config to analyzer Adds a second callback for the analyzer which is responsible for parsing the configuration rules and converting it to whatever format necessary. This way, each analyzer implementation can define its own configuration rules.	2021-10-04 18:31:58 +02:00
Sarah Hoffmann	7cfcbacfc7	make token analyzers configurable modules Adds a mandatory section 'analyzer' to the token-analysis entries which define, which analyser to use. Currently there is exactly one, generic, which implements the former ICUNameProcessor.	2021-10-04 17:37:34 +02:00
Sarah Hoffmann	52847b61a3	extend ICU config to accomodate multiple analysers Adds parsing of multiple variant lists from the configuration. Every entry except one must have a unique 'id' paramter to distinguish the entries. The entry without id is considered the default. Currently only the list without an id is used for analysis.	2021-10-04 16:40:28 +02:00
Sarah Hoffmann	5a36559834	move flatten_config_list into config module For general usage by other modules.	2021-10-04 11:56:54 +02:00
Sarah Hoffmann	732cd27d2e	add unit tests for new sanatizer functions	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	8171fe4571	introduce sanitizer step before token analysis Sanatizer functions allow to transform name and address tags before they are handed to the tokenizer. Theses transformations are visible only for the tokenizer and thus only have an influence on the search terms and address match terms for a place. Currently two sanitizers are implemented which are responsible for splitting names with multiple values and removing bracket additions. Both was previously hard-coded in the tokenizer.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	16daa57e47	unify ICUNameProcessorRules and ICURuleLoader There is no need for the additional layer of indirection that the ICUNameProcessorRules class adds. The ICURuleLoader can fill the database properties directly.	2021-10-01 12:27:24 +02:00
Sarah Hoffmann	5e5addcdbf	fix typo	2021-09-29 14:16:09 +02:00
Sarah Hoffmann	be65c8303f	export more data for the tokenizer name preparation Adds class, type, country and rank to the exported information and removes the rather odd hack for countries. Whether a place represents a country boundary can now be computed by the tokenizer.	2021-09-29 11:54:14 +02:00
Sarah Hoffmann	231250f2eb	add wrapper class for place data passed to tokenizer This is mostly for convenience and documentation purposes.	2021-09-29 11:54:07 +02:00
Sarah Hoffmann	bb18479d5b	remove unused parameter	2021-09-27 14:58:43 +02:00
Sarah Hoffmann	bd7c7ddad0	icu tokenizer: switch to matching against partial names When matching address parts from addr:* tags against place names, the address names where so far converted to full names and compared those to the place names. This can become problematic with the new ICU tokenizer once we introduce creation of different variants depending on the place name context. It wouldn't be clear which variant to produce to get a match, so we would have to create all of them. To work around this issue, switch to using the partial terms for matching. This introduces a larger fuzziness between matches but that shouldn't be a problem because matching is always geographically restricted. The search terms created for address parts have a different problem: they are already created before we even know if they are going to be used. This can lead to spurious entries in the word table, which slows down searching. This problem can also be circumvented by using only partial terms for the search terms. In terms of searching that means that the address terms would not get the full-word boost, but given that the case where an address part does not exist as an OSM object should be the exception, this is likely acceptable.	2021-09-27 11:36:19 +02:00
Sarah Hoffmann	b894d2c04a	fix indent	2021-09-04 10:30:35 +02:00
Sarah Hoffmann	8e1d4818ac	use yaml config loader for country info	2021-09-04 00:22:55 +02:00

1 2 3 4 5 ...

404 Commits