Nominatim

mirror of https://github.com/osm-search/Nominatim.git synced 2026-02-26 11:08:13 +00:00

Author	SHA1	Message	Date
Sarah Hoffmann	7ebd121abc	give word break slight advantage towards continuation; prefers longer words	2025-07-11 11:01:21 +02:00
Sarah Hoffmann	4634ad0720	rebalance word transition penalties	2025-07-11 11:01:21 +02:00
Sarah Hoffmann	4a9253a0a9	simplify QueryNode penalty and initial assignment	2025-07-11 11:01:09 +02:00
Sarah Hoffmann	2ef0e20a3f	reorganise token reranking. As the reranking is about changing penalties in presence of other tokens, change the data structure to have the other tokens readily available.	2025-04-11 13:38:34 +02:00
Sarah Hoffmann	497e27bb9a	move partial token into a separate field in the query struct. There is exactly one token to be expected and the token is usually present.	2025-04-11 08:57:34 +02:00
Sarah Hoffmann	2c61fe08a0	use word_token length when penalizing against postcodes	2025-03-19 09:52:40 +01:00
Sarah Hoffmann	7b3c725f2a	postcode token should have transliterated term in word_token	2025-03-19 09:52:40 +01:00
Sarah Hoffmann	921db8bb2f	cache all info of ICUQueryAnalyser in a single object	2025-03-04 08:58:57 +01:00
Sarah Hoffmann	e67ae701ac	show token begin and end in debug output	2025-03-04 08:57:59 +01:00
Sarah Hoffmann	fc1c6261ed	add postcode parser	2025-03-04 08:57:37 +01:00
Sarah Hoffmann	6759edfb5d	make word generation from query a class method	2025-03-04 08:57:37 +01:00
Sarah Hoffmann	e362a965e1	search: merge QueryPart array with QueryNodes. The basic information on terms is pretty much always used together with the node information. Merging them together saves some allocation while making lookup easier at the same time.	2025-03-04 08:57:37 +01:00
Sarah Hoffmann	31412e0674	replace TokenType enum with simple char constants	2025-02-21 10:23:41 +01:00
Sarah Hoffmann	4577669213	replace BreakType enum with simple char constants	2025-02-21 09:57:48 +01:00
Sarah Hoffmann	b56edf3d0a	avoid yielding when extracting words from query	2025-02-20 23:32:39 +01:00
Sarah Hoffmann	abc911079e	remove word_number counting for phrases. We can just examine the break types to know if we are dealing with a partial token.	2025-02-20 17:36:50 +01:00
Sarah Hoffmann	55c3176957	strip normalisation results of normal and special spaces	2025-02-19 14:40:35 +01:00
Sarah Hoffmann	d984100e23	add inner word break penalty	2025-01-07 21:42:25 +01:00
Sarah Hoffmann	499110f549	add SOFT_PHRASE break and enable parsing. Also enables parsing of PART breaks.	2025-01-06 17:10:24 +01:00
Sarah Hoffmann	2b87c016db	generalize normalization step for search query. It is now possible to configure functions for changing the query input before it is analysed by the tokenizer. Code is a cleaned-up version of the implementation by @miku.	2024-12-13 14:31:08 +01:00
Sarah Hoffmann	1f07967787	fix style issue found by flake8	2024-11-10 22:47:14 +01:00
Sarah Hoffmann	a690605a96	remove support for unindexed tokens. This was a special feature of the legacy tokenizer, which would not index very frequent tokens.	2024-09-22 10:39:10 +02:00
Sarah Hoffmann	4da4cbfe27	reduce from 3 to 2 packages	2024-06-28 09:13:22 +02:00
Sarah Hoffmann	6e89310a92	split code into submodules	2024-06-26 11:52:47 +02:00

24 Commits