Nominatim

Author	SHA1	Message	Date
Sarah Hoffmann	11d624e92a	split db_searches moving each class in its own file	2025-07-01 22:57:04 +02:00
Sarah Hoffmann	87a8c246a0	improve result cutting when a POI comes out with top importance	2025-06-01 12:00:36 +02:00
Sarah Hoffmann	90050de717	only rerank results if there is more than one. With one result order is obvious.	2025-06-01 11:55:27 +02:00
Sarah Hoffmann	10a7d1106d	reduce influence of query rematching a little bit	2025-06-01 11:54:21 +02:00
Sarah Hoffmann	f2236f68f1	when rematching only distinguish between perfect, somewhat and bad match	2025-06-01 11:53:23 +02:00
Sarah Hoffmann	800c56642b	tweak full count cut-off (as per deployment on osm.org)	2025-05-11 11:48:07 +02:00
Sarah Hoffmann	34b72591cc	exclude address searches with country from direction penalty. Countries are not adequately represented by partial term counts.	2025-04-29 17:37:31 +02:00
Sarah Hoffmann	7f710d2394	add a comment about the precomputed denominator	2025-04-15 09:38:05 +02:00
Sarah Hoffmann	06e39e42d8	add direction penalties. Direction penalties are estimated by getting the name to address ratio usage for each partial term in the query and computing the linear regression of that ratio over the entire phrase. Or to put it in other words: we try to determine if the terms at the beginning or the end of the query are more likely to constitute a name. Direction penalties are currently used only in classic name queries.	2025-04-11 20:41:06 +02:00
Sarah Hoffmann	2ef0e20a3f	reorganise token reranking. As the reranking is about changing penalties in presence of other tokens, change the datastructure to have the other tokens readily available.	2025-04-11 13:38:34 +02:00
Sarah Hoffmann	b680d81f0a	ensure that bailout-check is done after each iteration	2025-04-11 11:02:11 +02:00
Sarah Hoffmann	e0e067b1d6	replace use of range when computing word list	2025-04-11 09:59:04 +02:00
Sarah Hoffmann	3980791cfd	use iterator instead of list to go over partials	2025-04-11 09:38:24 +02:00
Sarah Hoffmann	497e27bb9a	move partial token into a separate field in the query struct. There is exactly one token to be expected and the token is usually present.	2025-04-11 08:57:34 +02:00
Sarah Hoffmann	f2aa15778f	always use lookup when requested. Doesn't seem to cause any issues in production.	2025-03-31 11:38:21 +02:00
Sarah Hoffmann	efe65c3e49	increase allowable address counts	2025-03-31 11:38:21 +02:00
Sarah Hoffmann	51847ebfeb	more aggressively reduce expected count for multi-word terms. Improves searching of non-latin scripts with forced token spaces.	2025-03-31 11:18:22 +02:00
Sarah Hoffmann	35baf77b18	make query upper-case when parsing postcodes. The postcode patterns expect upper-case letters.	2025-03-21 09:44:15 +01:00
Sarah Hoffmann	d400fd5f76	fix debug output for lookup type	2025-03-19 17:31:18 +01:00
Sarah Hoffmann	9419c5adb2	penalize postcode searches with multiple name qualifiers	2025-03-19 10:05:36 +01:00
Sarah Hoffmann	2c61fe08a0	use word_token length when penalizing against postcodes	2025-03-19 09:52:40 +01:00
Sarah Hoffmann	7b3c725f2a	postcode token should have transliterated term in word_token	2025-03-19 09:52:40 +01:00
Sarah Hoffmann	edc5ada625	improve handling of leading postcodes. Setting the direction of the query while yielding assignments is a bad idea because it may override a direction already set.	2025-03-19 09:52:40 +01:00
Miroslav Šedivý	6ff51712fe	Simplify int/float manipulation	2025-03-06 19:26:56 +01:00
Sarah Hoffmann	6b0d58d9fd	restrict postcode parsing in typed phrases. Postcodes can only appear in postcode-type phrases and must then cover the full phrase.	2025-03-05 10:09:33 +01:00
Sarah Hoffmann	434fbbfd18	add support for country prefixes in postcodes	2025-03-04 15:18:27 +01:00
Sarah Hoffmann	921db8bb2f	cache all info of ICUQueryAnalyser in a single object	2025-03-04 08:58:57 +01:00
Sarah Hoffmann	e67ae701ac	show token begin and end in debug output	2025-03-04 08:57:59 +01:00
Sarah Hoffmann	fc1c6261ed	add postcode parser	2025-03-04 08:57:37 +01:00
Sarah Hoffmann	6759edfb5d	make word generation from query a class method	2025-03-04 08:57:37 +01:00
Sarah Hoffmann	e362a965e1	search: merge QueryPart array with QueryNodes. The basic information on terms is pretty much always used together with the node information. Merging them together saves some allocation while making lookup easier at the same time.	2025-03-04 08:57:37 +01:00
Sarah Hoffmann	49bd18b048	replace PhraseType enum with simple int constants	2025-02-21 16:44:12 +01:00
Sarah Hoffmann	31412e0674	replace TokenType enum with simple char constants	2025-02-21 10:23:41 +01:00
Sarah Hoffmann	4577669213	replace BreakType enum with simple char constants	2025-02-21 09:57:48 +01:00
Sarah Hoffmann	9bf1428d81	consistently use query module as qmod	2025-02-21 09:31:21 +01:00
Sarah Hoffmann	b56edf3d0a	avoid yielding when extracting words from query	2025-02-20 23:32:39 +01:00
Sarah Hoffmann	abc911079e	remove word_number counting for phrases. We can just examine the break types to know if we are dealing with a partial token.	2025-02-20 17:36:50 +01:00
Sarah Hoffmann	adabfee3be	Merge pull request #3655 from lonvia/remove-name-ranking-in-postcode-search: Tweak penalties for postcode searches	2025-02-20 14:32:43 +01:00
Sarah Hoffmann	46c4446dc2	remove address penalty for postcode search. Searches of the form <postcode> <city> are in fact quite common.	2025-02-20 11:11:45 +01:00
Sarah Hoffmann	add9244a2f	do not rerank address by full match in postcode search. The reranking result will not be completely correct because the address of a postcode refers to the address _and_ name of the parent and reranking was only done against the address. We assume here that the postcode is precise enough as to not require a penalty for partial matches.	2025-02-20 10:29:03 +01:00
Sarah Hoffmann	55c3176957	strip normalisation results of normal and special spaces	2025-02-19 14:40:35 +01:00
Sarah Hoffmann	86ad9efa8a	keep break indicators [:-] during normalisation. All punctuation will be converted to '-'. Soft breaks : may be added by preprocessors. The break signs are only used during query analysis and are ignored during import token analysis.	2025-01-09 09:21:55 +01:00
Sarah Hoffmann	d984100e23	add inner word break penalty	2025-01-07 21:42:25 +01:00
Sarah Hoffmann	499110f549	add SOFT_PHRASE break and enable parsing. Also enables parsing of PART breaks.	2025-01-06 17:10:24 +01:00
Sarah Hoffmann	2b87c016db	generalize normalization step for search query. It is now possible to configure functions for changing the query input before it is analysed by the tokenizer. Code is a cleaned-up version of the implementation by @miku.	2024-12-13 14:31:08 +01:00
Sarah Hoffmann	0770eaa5d0	use bbox size for secondary order of results. Helps to return the largest object when deduplicating results.	2024-11-19 10:38:50 +01:00
Sarah Hoffmann	122ecd4626	remove remaining pylint hints	2024-11-10 22:49:29 +01:00
Sarah Hoffmann	1f07967787	fix style issue found by flake8	2024-11-10 22:47:14 +01:00
Sarah Hoffmann	2c0f2e1ede	remove now unnecessary type-ignores	2024-10-25 17:56:47 +02:00
Sarah Hoffmann	5160a1d577	get bbox of postcode areas into results	2024-09-30 08:58:40 +02:00

60 Commits