Nominatim

Author	SHA1	Message	Date
Sarah Hoffmann	d4f3eda314	remove special casing for legacy tokenizer from BDD tests	2024-09-21 17:07:32 +02:00
Sarah Hoffmann	929a13d4cd	remove comma as name separator. Commas are most of the time used as part of a name, not to separate multiple names. See also #2950.	2023-01-22 22:29:36 +01:00
Sarah Hoffmann	02068aec7f	bdd: move import tests from scenes to grid descriptions	2022-06-17 11:54:18 +02:00
Sarah Hoffmann	bd7c7ddad0	icu tokenizer: switch to matching against partial names. When matching address parts from addr:* tags against place names, the address names were so far converted to full names and then compared to the place names. This can become problematic with the new ICU tokenizer once we introduce creation of different variants depending on the place name context. It wouldn't be clear which variant to produce to get a match, so we would have to create all of them. To work around this issue, switch to using the partial terms for matching. This introduces a larger fuzziness between matches but that shouldn't be a problem because matching is always geographically restricted. The search terms created for address parts have a different problem: they are already created before we even know if they are going to be used. This can lead to spurious entries in the word table, which slows down searching. This problem can also be circumvented by using only partial terms for the search terms. In terms of searching that means that the address terms would not get the full-word boost, but given that the case where an address part does not exist as an OSM object should be the exception, this is likely acceptable.	2021-09-27 11:36:19 +02:00
Sarah Hoffmann	3aac51c81f	switch BDD tests to always use search API	2021-06-06 15:27:52 +02:00
Sarah Hoffmann	4f4d15c28a	reorganize keyword creation for legacy tokenizer: only save partial words without internal spaces; consider comma and semicolon a separator of full words; consider parts before an opening bracket a full word (but not the part after the bracket). Fixes #244.	2021-05-24 10:41:42 +02:00
Sarah Hoffmann	e1c5673ac3	require tokenizer for indexer	2021-04-30 11:30:51 +02:00
Sarah Hoffmann	22800d7d59	Search housenumbers with unknown address parts by housenumber term. House numbers need special handling because they may appear after the street term. That means we cannot just use them as the main name for searches where the address has its own search term entries. Doing this right now, we are able to find '40, Main St, Town' but not 'Main St 40, Town'. This switches to using the housenumber token as the name term instead. House number tokens can get special handling when building the search query that covers the case where they come after the street. The main disadvantage is that this once more increases the number of possible search interpretations, of which we already have too many. No penalty for housenumber searches.	2020-11-25 11:36:10 +01:00
Sarah Hoffmann	49083c2597	Merge pull request #2058 from lonvia/split-address-words: Split addr:* tags into words before adding to the search index	2020-11-18 08:58:17 +01:00
Sarah Hoffmann	ffb2c93ba3	POIs with unknown addr:place must add parent name to address. The previous behaviour was a left-over from a former version where such POIs parented to the street. Now that they parent to places, it should be included.	2020-11-17 19:44:43 +01:00
Sarah Hoffmann	30a6b6bdac	split addr: tags into words before adding to the search index. Address parts are only matched by single partial words. If the addr: names are not split, then multi-word names cannot be found.	2020-11-17 18:03:33 +01:00
Sarah Hoffmann	c7472662a6	lookup places for address tags for rank < 30. While previously the content of addr:* tags was only added to the list of address search keywords, we now really look up the matching place. This has the advantage that we pull in all potential translations from the place, just like all the other address terms that are looked up by neighbourhood search. If no place can be found for a given name, the content of the addr:* tag is still added to the search keywords as before.	2020-11-16 15:28:01 +01:00
Sarah Hoffmann	c84e7e72f1	add unknown addr:place to address output. When a POI has no addr:street but an addr:place that is not contained in the name list of the parent place, then remember this situation and merge the content of addr:place into the address output. We don't need to care about translations in this case because it is obvious that no object with translations exists if the parent isn't the object named in addr:place.	2020-09-23 11:55:18 +02:00
Sarah Hoffmann	248d6b413a	remove test for is_in	2020-09-22 21:36:49 +02:00
Sarah Hoffmann	a8dfbcef44	always bind addr:place to place instead of street. If an addr:place is given but no addr:street tag, then bind the rank 30 object always to a <=25 object, even when there is none found with the same name.	2020-09-21 10:15:14 +02:00
Sarah Hoffmann	caea14d035	merge addr tags into search_name table. When a place of rank 30 has addr tags that are not covered by the search terms of the parent, add a separate entry for the POI in the search_name table that includes the addr tags. We can only do that with named places. For POIs without a name the housenumber is used as name. If that is not available either, searching still won't work.	2020-09-21 10:15:14 +02:00
Sarah Hoffmann	d6ff7475f1	make sure that addr:* tags can always be searched for. Always add contents of addr:* tags into address part of the search table, even when there is no corresponding other name. This keeps search tolerant to the kind of tagging where parts show up in the address that have no corresponding object in the database or where it is only an unaddressable object.	2020-08-19 11:44:10 +02:00
Sarah Hoffmann	78526a33b4	Remove linkees from search_name. Fixes #722	2020-03-04 11:36:39 +01:00
Sarah Hoffmann	6073d948e6	fix duplicate keys in tests. The tests suddenly failed because the unique key constraint is stricter and no longer includes the type.	2020-02-12 11:29:33 +01:00
Sarah Hoffmann	5182da9f45	add tests for address tag parsing for search name	2018-04-15 22:52:42 +02:00
Sarah Hoffmann	a44377c7b0	fix postcode-related tests	2017-08-19 19:37:06 +02:00
Sarah Hoffmann	ddb7296663	add parenting tests	2016-12-30 22:58:58 +01:00
Sarah Hoffmann	604706a827	add search_name import tests	2016-12-30 22:58:58 +01:00

23 Commits