Excluding non-rare full names is not really possible because it makes
addresses with street names like 'main st' unsearchable. This tries to
leave all names in but refrains from ordering results by accuracy
when too many results are expected. This means that the DB will simply
get the first n results without any particular order.
Before PostGIS 3.4, ST_Project required a geography as input and
seems to have implicitly converted geometry arguments to geography.
Since 3.4, geometry input is supported but leads to a completely
different result.
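A minimal sketch (assuming a SQLAlchemy engine; this is not the actual
Nominatim code) of keeping the old behaviour by casting explicitly:

import sqlalchemy as sa

engine = sa.create_engine('postgresql:///nominatim')  # assumed DSN

with engine.connect() as conn:
    # Project a WGS84 point 1000m towards the north-east; the
    # ::geography cast forces the geodesic computation on all versions.
    point = conn.scalar(
        sa.text('SELECT ST_Project('
                'ST_SetSRID(ST_MakePoint(:lon, :lat), 4326)::geography, '
                '1000, radians(45))'),
        {'lon': 9.0, 'lat': 47.5})
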
The rematch penalty for partial words created by the transliteration
needs to take into account that they are rematched against the full word.
That means that a missing beginning or end should not get a significant
penalty.
Names of countries and states are exceedingly rare in the word count
but are very frequent in addresses. A short name carries the danger
of producing too many results.
When expressions are generated with SQLAlchemy, any constants are
replaced with bind parameters. The bind parameters become parameters of
prepared statements. The result is that the query planner tends to
overlook that the partial indexes can be used.
VALUES() is not a cacheable construct in SQLAlchemy, so use arrays
instead. Also add a special case for single results, the usual result
for reverse queries.
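A rough sketch of the pattern with illustrative names; the id list
becomes a single array bind parameter, so the compiled statement
stays cacheable:

import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import ARRAY

def lookup_by_ids(conn, table, place_ids):
    if len(place_ids) == 1:
        # Special case: a single id, the usual result for reverse queries.
        where = table.c.place_id == place_ids[0]
    else:
        # One array parameter instead of an uncacheable VALUES() construct.
        where = table.c.place_id == sa.any_(
            sa.cast(place_ids, ARRAY(sa.BigInteger)))
    return conn.execute(table.select().where(where)).fetchall()
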
So far the SQL logic used the information from the address field
to determine if an address is attached to a street or place.
This changes the logic to use the information provided in the
token_info. This allows sanitizers to enforce a certain parenting
without changing the visible address information.
Give preference to left-to-right reading, i.e. <postcode>,<address>
prefers a postcode search while <address>,<postcode> tends towards
an address search.
Also exclude non-addressables, countries and states from results when a
postcode is contained in the query.
Search is now split into three functions: for free-text search,
for structured search and for search by category. Note that the
free-text search does not have as many hidden features, such as
coordinate search. Use the search parameters for that.
This switches the input parameters for API calls to a generic
keyword argument catch-all which is then loaded into a dataclass
where the parameters are checked and forwarded to the internal
function.
The dataclass gives more flexibility with the parameters and makes
it easier to reuse common parameters for the different API calls.
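Schematically, with made-up parameter names (a sketch, not the real
signature):

import dataclasses
from typing import Optional

@dataclasses.dataclass
class SearchParams:
    query: str = ''
    max_results: int = 10
    countries: Optional[str] = None

    def __post_init__(self) -> None:
        if self.max_results < 1:
            raise ValueError('max_results must be a positive number')

async def _search_impl(params: SearchParams):
    ...  # stand-in for the internal function

async def search(**kwargs):
    # Generic catch-all: unknown keywords raise a TypeError here,
    # defaults and checks are handled by the dataclass.
    return await _search_impl(SearchParams(**kwargs))
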
Snapping a line to a point before splitting was meant to ensure
that the split point is really on the line. However, ST_Snap() does
not always behave well for this case. It may shorten the interpolation
line in some cases, with the result that two housenumber points
suddenly fall on the same point. It might also shorten the line down
to a single point which then makes ST_Split() crash.
Switch to a combination of ST_LineLocatePoint and ST_LineSubstring
instead, which guarantees to keep the original geometry. Explicitly
handle the corner cases, where the split point falls on the beginning
or end of the line.
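In SQL terms the new approach looks roughly like this (a simplified
sketch, not the exact implementation; :line and :point stand for the
input geometries):

SPLIT_SQL = """
WITH f AS (
    SELECT :line AS geom,
           ST_LineLocatePoint(:line, :point) AS frac
)
SELECT CASE
         -- corner cases: split point at the start or end of the line,
         -- keep the original geometry in one piece
         WHEN frac <= 0.0 OR frac >= 1.0 THEN ARRAY[geom]
         ELSE ARRAY[ST_LineSubstring(geom, 0.0, frac),
                    ST_LineSubstring(geom, frac, 1.0)]
       END
FROM f
"""
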
The initial plan to serve /details and /lookup endpoints from
the same API call turned out to be impractical, so the API now
also has separate functions for both.
Prepared statements do not work well with the partial indexes that
Nominatim uses because all Python constants are replaced with
parameters. A query like:
placex.select().where(placex.c.rank_address.between(4, 25))
gets translated into a prepared query with two parameters:
SELECT * FROM placex WHERE rank_address BETWEEN %s AND %s
And this does not work with a partial index of:
CREATE INDEX on placex(geometry) WHERE rank_address between 4 and 25
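One way around this (a sketch for SQLAlchemy 1.4+, not necessarily how
Nominatim solves it) is to render such constants inline with
literal_execute, so the predicate appears verbatim to the planner:

import sqlalchemy as sa

def select_address_ranks(placex, lo, hi):
    # The values are rendered as literals at execution time while the
    # statement structure itself remains cacheable.
    return placex.select().where(placex.c.rank_address.between(
        sa.bindparam('lo', lo, literal_execute=True),
        sa.bindparam('hi', hi, literal_execute=True)))
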
The only allowable difference is precision of coordinates. Python uses
a precision of 7 digits where possible, which corresponds to the
precision of OSM data.
Also fixes some smaller bugs found by the BDD tests.
The implementation follows for the most part the PHP code but introduces an
additional layer parameter with which the kind of places to be returned
can be restricted. This replaces the hard-coded exclusion lists.
Use adapted types for the different result types. This makes it
easier to have adapted output formatting and means that only the
result fields that are actually filled are present.
Add a 'village' zoom level at 13 between town and neighbourhood
and a level for all locality-like objects at zoom 15. These zoom levels had
the same behaviour as the lower level so far. However, the distinction
for village and locality may be useful at times.
fix: pylint error
added docs for delete tags sanitizer
fixed typos in docs and code comments
fix: python typechecking error
fixed rank address type
Revert "fixed typos in docs and code comments"
This reverts commit 6839eea755a87f557895f30524fb5c03dd983d60.
added default parameters and refactored code
added test for all parameters
Add a special index that contains the place nodes buffered by their
respective area according to their search rank. This replaces the
maximum area search for place nodes and reduces drastically the number
of place nodes that need to be retrieved.
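Such an index might look roughly like this; the radius function and the
predicate are illustrative, not the exact definition:

BUFFERED_PLACENODE_IDX = """
CREATE INDEX idx_placex_geometry_buffered_placenode
    ON placex USING gist (ST_Buffer(geometry, reverse_place_diameter(rank_search)))
    WHERE osm_type = 'N'
      AND rank_address BETWEEN 4 AND 25
      AND name IS NOT NULL
"""
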
It is not possible to produce type annotations that work with both
versions 1.4 and 2.0. So keep to the principle of only supporting
the newest version when it comes to mypy. This means that some types
may have to be string quoted to not cause issues when running with
SQLAlchemy 1.4.
Also defines an extended connection object that includes access to
the table definitions. Makes it easier to access the tables from
code that has been split off into separate modules.
Most of the server implementation of V1 API now resides in
api.v1.server_glue. The web frameworks only supply some glue code
which is independent of changes in the API code.
Code is now organized by api version. So formatting has moved to
the api.v1 module. Instead of holding a separate ResultFormatter
object per result format, simply move the functions to the
formatter collector and hand in the requested format as a parameter.
Thus reorganized, the api.v1 module can export three simple functions
for result formatting which in turn makes the code that uses
the formatters much simpler.
Use a directory for the submodule where the __init__ file contains
the public API. This makes it easier to separate public interface
from the internal implementation.
As the updates modify the placex table, there may be deadlocks
when different objects want to forward modifications to the same
place (for example because they are both linked to it).
There are some pathological cases where an isolated letter may
be deleted because it is in itself meaningless. If this happens in
the middle of a sentence, then the transliteration contains two
consecutive spaces. Add a final rule to fix this.
See #2909.
When an associatedStreet relation has multiple street members
always take the closest one. Avoid geometry operations for
the frequent case that there is only one street.
Special terms with operator name usually appear in combination with the
name. The current penalties only took name + special term into account,
not special term + name.
Fixes #2876.
Removes territories of US, France, Australia and Netherlands from the
country list. These territories have their own country code (which is
why they are in the list in the first place) but are mapped as part of
the admin_level 2 relations for the respective parent countries.
Therefore they never had any places attached. In practical terms, the
change only affects the number of tables created.
This makes Nominatim compatible with osm2pgsql's default update
modus operandi of deleting and reinserting data. Deletes are diverted
into a TODO table instead of executing them. When data is reinserted,
the corresponding entry in the TODO table is deleted. After updates are
finished, the remaining entries in the TODO table are executed, doing
the same work as the delete trigger did before.
The new behaviour also works against the gazetteer output with its
insert-only mechanism.
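Schematically the diversion works like this (simplified; the real table
and trigger definitions differ):

DELETE_DIVERSION_SQL = """
CREATE TABLE place_to_be_deleted (osm_type CHAR(1), osm_id BIGINT);

CREATE OR REPLACE FUNCTION place_delete() RETURNS TRIGGER AS $$
BEGIN
    -- divert the delete into the TODO table instead of executing it
    INSERT INTO place_to_be_deleted VALUES (OLD.osm_type, OLD.osm_id);
    RETURN NULL;  -- suppresses the actual DELETE
END; $$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION place_insert() RETURNS TRIGGER AS $$
BEGIN
    -- reinserted data cancels the pending delete
    DELETE FROM place_to_be_deleted
     WHERE osm_type = NEW.osm_type AND osm_id = NEW.osm_id;
    RETURN NEW;
END; $$ LANGUAGE plpgsql;
"""
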
If no parent can be found for an interpolation, there is most
likely a data error involved. So don't show these interpolations
in reverse search results.
The search 'landstr' produces many duplicates so that with
some bad luck 4 or fewer results may appear. Disable deduplication
to make it more predictable.
The values in the raster are already normalized between 0 and 2**16,
so a simple conversion to [0, 1] will do.
Check for existence of the secondary_importance table statically when
creating the SQL function. For that to work importance tables need
to be created before the functions.
Adds partial indexes for all geometry queries used during import.
A full index is not necessary anymore at that point. Still create
the index afterwards for use in queries.
Also adds documentation for all indexes on where they are used.
The indexes over geometry sectors are mainly used for ordering
the places which need indexing. That means they effectively function
as a TODO list. Consolidate them so that they always only contain
the places which are still to do. Also add the appropriate index
for the boundary indexing phase.
When deciding if an address interpolation has address information, only
look for addr:street and addr:place. If they are not there go looking
for the address on the address nodes. Ignores irrelevant tags like
addr:inclusion.
Fixes #2797.
Changed the script to a functional programming style to remove the
large number of local attributes and decrease memory usage when
running it. Additional OS info is now included.
When a boundary or place changes its address rank, all places where
it participates as an address potentially need to be reindexed.
Also use the computed rank when testing place nodes against
boundaries. Boundaries are computed earlier.
Fixes #2794.
The functional split between the two functions is now that the
first one creates the ID that is used in the word table and
the second one creates the variants. There no longer is a
requirement that the ID is the normalized version. We might
later reintroduce the requirement that a normalized version be available
but it doesn't necessarily need to be through the ID.
The function that creates the ID now gets the full PlaceName. That way
it might take into account attributes that were set by the sanitizers.
Finally rename both functions to something more sane.
Loads modules for configurable code like tokenizers, sanitizers, etc.
Supports internal modules, external libraries and code from the
project directory.
Requires TypedDict, which is only available from Python 3.8. Therefore
require typing_extensions to make the functions available for
earlier Python versions.
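The compatibility import follows the usual pattern (the TypedDict
shown is purely illustrative):

import sys

if sys.version_info >= (3, 8):
    from typing import TypedDict
else:
    from typing_extensions import TypedDict

class PlaceInfo(TypedDict):
    name: str
    kind: str
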
A proper run of indexing requires the place information from the
analyzer. Add the pre-processing of place data, so the right
information is handed into the update function.
To avoid importance becoming zero and cancelling out other weights,
df008d99f5 introduced a minimum value
for importance. That broke importances for interpolated addresses,
which are less than zero.
Instead of setting a minimum, set zero importances to a very small
value.
Fixes #2753.
Moves postcodes that are either in countries without a postcode
system or don't correspond to the local pattern for postcodes into
a field for a normal address part. Makes them searchable but not as
a special address. This has two consequences: they are no longer a
skippable part of the address and the postcodes cannot be searched
on their own.
When shifting address ranks, the evaluation is always done against
unshifted address ranks on import because the objects we compare against
have not been indexed yet. This changes for updates when the objects have
been touched in the meantime. To ensure consistent behaviour across
imports and updates, always use the unshifted address ranks.
Resolves a couple of situations where a mixed use of place areas and
administrative boundaries would result in a hierarchy that did not
properly respect the contains relation.
When taking over the address rank from a linked place, it needs
to be the originally computed rank, not the one that might have
been adjusted in the meantime. The adjustment was made under the
assumption that the node is not linked.
Interpolations are now indexed after rank 30 objects. The housenumber
nodes no longer need information from the interpolations while the
interpolations can make use of precomputed postcodes.
The bad quotes around the type for special phrases
specifically occur in the Wiki pages, so they should be
removed by the loader and not in the generic SpecialPhrase
object.
The search query builder currently rejects searches for partial
names only, when the partial terms are all very frequent to avoid
queries that return too many results.
This change slightly relaxes the condition to allow the search when
there are 3 or more partial terms. With so many terms the number
of matches should be manageable.
When moving the finding of linked places to the precomputation stage,
it was also moved before the statement where the linked_place_id was
removed from the linkee. The result was that the current linkee was
excluded when looking for a linked place on updates because it was
still linked to the boundary to be updated.
Fixed by allowing either to keep the linkage or to change to an
unlinked place.
Canada has complete coverage for administrative boundaries on
county level. Removing the county nodes from the addresses avoids errors
due to a widespread doubling of place nodes for city counties.
In offline mode no attempts are made to download data from the internet.
At the moment that only concerns the computation of the database date.
It contacts the main API to get the date.
The fallback country boundaries already contain a sufficiently large
part of the water area, so there is no need to extend the country
assignment even more. Features outside countries should not show a
country in their address.
This checker encourages bad behaviour (namely changing the static
status of a function during inheritance) and will be made optional
in upcoming versions of pylint.
This is needed for pedestrian areas mapped as multipolygons
and consequently as relations. The lookup in placex guarantees
that the referenced OSM object is indeed a street.
Fixes #2669.
Adds a new check level WARNING because missing wikipedia importances
are not necessarily an error. If the database is run for reverse
requests only, then it is fine to go without them.
The inherited housenumber is needed for display output. We can't
take the one from the housenumber field because it is already
normalized. Remove the inherited address only when reindexing.
Fixes #2683.
The Letter class does not include non-spacing marks that can also
have a consonant or vowel meaning, especially in Indian languages.
Use the alnum property instead, which includes them all. Also
include the vowel-canceling Virama, which is not a letter by itself
but changes the transliteration.
Return OSM main tag information in geocodejson. This is not part
of the official spec but can be useful to get more detailed information
of the object type. Brings the Nominatim output closer to what
Photon produces.
'type' so far contained the value of the OSM tag. That is rarely
helpful because it is not a restricted class of values. Change
this to contain the types as defined in the geocodejson spec,
which correspond to the address layer names.
For point features, keep using the distance to centroid.
For area features, add a tie breaker for the case where the
center point falls on the boundary.
Instead of computing the distance to the centroid of the area
compute the distance of the area to the centroid of the feature.
This means we give preference to the area that covers the centroid.
It is still a heuristic, but one that is a bit less random.
Automatically repopulate the tokenizer/ directory with the PHP stub
and the postgresql module, when the directory is missing. This allows
switching working directories and in particular running the service
from a different machine than where it was installed.
Users still need to make sure that .env files are set up correctly
or they will shoot themselves in the foot.
See #2515.
The OSM data has been sufficiently cleaned up by now that
the operator no longer needs to be considered a name tag.
Use 'brand' as the searchable alternative.
Convert the '_place_*' entries back to normal entries before
returning them in the 'namedetails' section. If the name field is
duplicated, keep the '_place_*' notation. This preserves the previous
behaviour before _place_ names were introduced but adds the additional
names from the linked place for reference.
This keeps the names traceable and ensures that all names are searchable
when they differ. Do not keep names when they are exactly the same
to save some space. Linked names are cleaned out before relinking.
An expression of the form 'SELECT (func()).*' will be expanded
by Postgresql _before_ execution with the result that the function
will be called as many times as there are fields in the record.
This is not what we want. The function call needs to go into
the FROM clause instead.
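Schematically, with a made-up function name:

# evaluated once per output column of the composite result:
BAD_SQL = 'SELECT (get_address(place_id)).* FROM placex'

# evaluated exactly once per row when moved into the FROM clause:
GOOD_SQL = 'SELECT a.* FROM placex, LATERAL get_address(place_id) a'
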
This lays the groundwork for adding variants for housenumbers.
When analysis is enabled, then the 'word' field in the word table
is used as usual, so that variants can be created. There will be
only one analyzer allowed which must have the fixed name
'@housenumber'.
When changing something in the default configuration of the sanitizers
that refers to an analyzer that is not yet loaded, there shouldn't be
any errors.
This gives the analyzer more flexibility in choosing the normalized
form. In particular, an analyzer creating different variants can choose
the variant that will be used as the canonical form.
Given that the tag is most of the time duplicated by an amenity
tag which is already imported, only import it as a fallback when
there is no name.
Fixes #2609.
The name query already looks for the existence of housenumbers and
may as well retrieve them. Saves up to three additional lookups.
It also means that we can lift the restriction on looking
for the existence of housenumbers for simple queries only.
Nodes on an interpolation now only get the address tags of
interpolations and then compute their own parent from that. They no
longer inherit the parent directly.
Use the same update mechanism as for updates on the interpolations
themselves. Updates must solely happen in place_insert as this is
the place where actual changes of the data happen.
Mutations are regular-expression-based replacements that are applied
after variants have been computed. They are meant to be used for
variations on character level.
Add spelling variations for German umlauts.
While technically being a letter, the modifier letter apostrophe is
often replaced in writing with the normal apostrophe, which is a
punctuation mark. This makes sure that the modifier letter apostrophe
yields the same normalization results and thus is really interchangeable.
Only has an effect after the next reimport.
Fixes #2569.
A forward interpretation of the form 'street, city, country' is
much more frequent than the reverse form 'country, city, street'.
Thus swap the order of interpretations so that the forward order
comes first.
Only one addr: tag can be processed currently, so make
sure it is the one without suffixes to not get odd data.
addr:street is the exception because it uses a different
matching mechanism.
Using partial names turned out to not work well because there are
often similarly named streets next to each other. It also
prevents us from being able to take into account all addr:street:*
tags.
This change gets all the full term tokens for the addr:street tags
from the DB. As they are used for matching only, we can assume that
the term must already be there or there will be no match. This
avoids creating unused full-name tokens.
The highway key is being used more and more for non-ways these
days. This clashes with Nominatim's assumption that essentially
everything that has a highway tag can be used as the street part
of the address.
Change the default rank of highway objects to 30 to avoid this.
Only the known values for streets keep the rank 26 and are now
listed explicitly.
Queries with a housenumber need to give a higher rank to streets
that have the requested housenumber attached. We already do that for
ordinary housenumber objects and for interpolations. This
adds support for Tiger housenumbers as well.
Fixes #2501.
House numbers of the form '9 bis' are common in France. So
be a bit more lenient before adding penalties to house numbers
with letters in them.
Fixes #2527.
Check if the API script exists at the expected location before
running php-cli. This way we can add a useful hint about the
project directory.
Fixes #2513.
When ordering results by the fact that they have a housenumber,
also take cases into account where the housenumber is on the
place itself. This may happen when the search includes the name
of the place and the housenumber or for addr:place addresses
where the place is unlisted.
The fairly complex where condition of idx_placex_geometry_placenode
won't always be matched by the query planner if the condition
part doesn't appear verbatim in the query.
Fixes #2480.
Running the warm-up search requests requires querying
the most frequent words. This must be done via the tokenizer
to honor the different formats of the word table.
This mode gets updates until the server reports no new diffs
anymore.
Also adds additional indexing, when the main indexing step left
a couple of objects to process. This happens only when the
next update is expected to be more than 40min away.
Point-in-polygon queries are much faster with a SP-GIST geometry
index, so use that for the index used to check if a housenumber
is inside a building.
Only available with Postgis 3. There is an automatic fallback to
GIST for Postgis 2.
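A sketch of the version switch (index name is illustrative):

def building_index_sql(postgis_version: int) -> str:
    # SP-GiST needs PostGIS 3; older installations fall back to GiST.
    method = 'spgist' if postgis_version >= 3 else 'gist'
    return ('CREATE INDEX idx_placex_geometry_buildings ON placex'
            f' USING {method} (geometry)')
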
Adds a tagger for names by language so that the analyzer of that
language is used. Thus variants are now only applied to names
in the specific language and only to name tags, no longer to
reference-like tags.
Implements per-name choice of analyzer. If a non-default
analyzer is chosen, then the 'word' identifier is extended
with the name of the analyzer, so that we still have unique
items.
Adds a second callback for the analyzer which is responsible
for parsing the configuration rules and converting them to
whatever format is necessary. This way, each analyzer implementation
can define its own configuration rules.
Adds a mandatory section 'analyzer' to the token-analysis entries
which defines which analyzer to use. Currently there is exactly
one, generic, which implements the former ICUNameProcessor.
Adds parsing of multiple variant lists from the configuration.
Every entry except one must have a unique 'id' parameter to
distinguish the entries. The entry without id is considered
the default. Currently only the list without an id is used
for analysis.
Sanitizer functions allow transforming name and address tags before
they are handed to the tokenizer. These transformations are visible
only to the tokenizer and thus only have an influence on the
search terms and address match terms for a place.
Currently two sanitizers are implemented which are responsible for
splitting names with multiple values and removing bracket additions.
Both were previously hard-coded in the tokenizer.
There is no need for the additional layer of indirection that
the ICUNameProcessorRules class adds. The ICURuleLoader can
fill the database properties directly.
Adds class, type, country and rank to the exported information
and removes the rather odd hack for countries. Whether a place
represents a country boundary can now be computed by the tokenizer.
Levels chosen according to the OSM wiki. Mainly moves admin_level 6
to county level and admin_level 8 to city/town level. Higher
levels are adjusted accordingly.
Fixes #2453.
When matching address parts from addr:* tags against place names,
the address names were so far converted to full names and compared
to the place names. This can become problematic with the new
ICU tokenizer once we introduce creation of different variants
depending on the place name context. It wouldn't be clear which
variant to produce to get a match, so we would have to create all of
them. To work around this issue, switch to using the partial terms
for matching. This introduces a larger fuzziness between matches but
that shouldn't be a problem because matching is always geographically
restricted.
The search terms created for address parts have a different problem:
they are already created before we even know if they are going to be
used. This can lead to spurious entries in the word table, which slows
down searching. This problem can also be circumvented by using only
partial terms for the search terms. In terms of searching that means
that the address terms would not get the full-word boost, but given
that the case where an address part does not exist as an OSM object
should be the exception, this is likely acceptable.
Instead of requesting the match tokens from the tokenizer
when looking for parent streets/places and address parts,
hand in the saved tokens and ask if they match. This gives
the tokenizer more freedom to decide how name matching
should be done.
Adjusts levels for boundaries according to the list on
https://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative
* no admin_level 5, so drop that from addresses
* admin_level 6 has the province
* admin_level 7 has the county when it exists
Also reranks place=province so that it matches up with
admin_level 6 and introduces place=civil_parish which
is used as a place node for some admin_level=9 boundaries
in Galicia.
A boolean check for dynamic changes of address parts is not
sufficient. The order of choice should be:
1. an addr:* part matches the name
2. the address part surrounds the object
3. the address part was declared as isaddress
The implementation uses a slightly different ordering
to avoid geometry checks unless strictly necessary (isaddress
is false and no matching address).
See #2446.
Adds a function to the Configuration class to load a YAML
file. This means that searching for the file is generalised
and works the same now for all configuration files. Changes
the search logic, so that it is always possible to have a
custom version of the configuration file in the project
directory.
Move ICU tokenizer to use new load function.
Additional penalty for special terms with operator None
should only go to near searches. To reduce the number
of produced searches, restrict the none operator to
appear only in conjunction with the name.
Use vanilla docker images of Ubuntu and leave the setup
to the vagrant scripts. Then do the usual import tests.
Also fixes a couple of issues found with the scripts.
Linked places may bring in extra names. These names need to be
processed by the tokenizer. That means that the linking needs
to be done before the data is handed to the tokenizer. Move finding
the linked place into the preparation stage and update the name
fields. Everything else is still done in the indexing stage.
The new icu tokenizer is now no longer compatible with the old
legacy tokenizer in terms of data structures. Therefore there
is also no longer a need to refer to the legacy tokenizer in the
name.
The hack for IL, AL and LA is only needed because these abbreviations
are removed by the legacy tokenizer as stop words. There is no need
to keep the hack for future tokenizers. Move it therefore to the
token extraction function.
mkdocstrings also needs access to the Python sources, so set
a PYTHONPATH accordingly. This makes running mkdocs directly
a bit awkward, therefore add a `make serve-doc` target.
This separates the logic of creating word sets from the Phrase
class. A tokenizer may now derive the word sets any way it
likes. The SimpleWordList class provides a standard implementation
for splitting phrases on spaces.
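A minimal sketch of what such a word list can look like (the real
class may differ in detail):

class SimpleWordList:
    """Standard word set implementation: break a phrase at spaces."""

    def __init__(self, phrase: str) -> None:
        self.tokens = phrase.split()

    def __iter__(self):
        return iter(self.tokens)
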
Restricting tokens due to the search context is better done in
the generic search part instead of repeating the same test in
every tokenizer implementation.
Also updates the documentation. For the simple case of just
importing multiple regions, provide simplified instructions
that use the new multi-file import feature.
Fixes #2365.
Postgresql is very bad at creating statistics for jsonb
columns. The result is that the query planner tends to
use JIT for queries with a WHERE clause over 'info' even when
there is an index.
The BDD tests cannot make assumptions about the structure of the
word table anymore because it depends on the tokenizer. Use more
abstract descriptions instead that ask for specific kinds of
tokens.
Requires a second wrapper class for the word table with the new
layout. This class is interface-compatible, so that later when
the ICU tokenizer becomes the default, all tests that depend on
behaviour of the default tokenizer can be switched to the other
wrapper.
The table now directly reflects the different token types.
Extra information is saved in a json structure that may be
dynamically extended in the future without affecting the
table layout.
Moving the logic for extending the SearchDescription into the
token classes splits up the code and makes it more readable.
More importantly: it allows tokenizers to define custom token
classes in the future.
The token string is only required by the PartialToken type, so
it can simply save the token string internally. No need to pass
it to every type.
Also moves the check for multi-word partials to the token loader
code in the tokenizer. Multi-word partials can only happen with
the legacy tokenizer and when the database was loaded with an
older version of Nominatim. No need to keep the check for
everybody.
Moves token and phrase position and phrase type into a separate
class that is handed in when assembling the search description.
This drastically reduces the number of parameters for the function
to extend the search descriptions and gives us more flexibility
in the future for more complex positional analysis.
Full-word tokens are no longer marked by a space at the
beginning of the token. Use the new Partial token category
instead. This removes a couple of special cases we don't
really need.
The word table still has the space for compatibility reasons,
so the tokenizer code needs to get rid of it when loading the
tokens.
name:etymology contains a description of the name origin and is
thus more informative than search-worthy.
name:signed basically indicates that the feature does not have
a name.
The partial word count does not split names to save a bit of time.
The result is that it might encounter unreasonably long names
which in truth consist of multiple words. No accurate statistics
are needed so simply restrict the count to words shorter than
75 characters.
Two replacement words directly following each other did not
work as expected because each expects a space at the
beginning/end while there was only one space available.
Also forbid composing a word after a space was added at the
end by a previous replacement.
Make sure all special symbols are removed during normalization already.
Those won't be interpreted in any way because they are unlikely to be
searched for.
The new format combines compound splitting and abbreviation.
It also allows restricting rules to additional conditions
(like language or region). This latter ability is not used
yet.
Compound decomposition now creates a full name variant on
import just like abbreviations. This simplifies query time
normalization and opens a path for changing abbreviation
and compound decomposition lists for an existing database.
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.
New dependency for python library datrie.
The tokenizer configuration has become difficult to handle
due to the additional manual transliteration rules. Allow
a separate rule file that is given to the ICU library
as is.
Now that multi-word partials no longer exist, multi-word full
words need to be used to search in addresses and therefore no
longer should have a penalty.
Also changes the condition when a full word is included into
the address. It is no longer relevant if an equivalent partial
exists but only if the term consists of more than one word.
When searching for house numbers in the name (for place-only
terms) then the same penalties need to apply as for the
regular house number search.
Change the code to first compute the penalties and then create
the new search variants.
Python 3.6 introduces formatted string literals and
flag enums as well as a much faster dict implementation.
These changes make the code so much simpler as to warrant
dropping Python 3.5 support.
Affected distributions are Ubuntu 16.04 and Debian Stretch.
We've previously added searching through rank 30 in a house
number search to enable searches for house number+name.
This had the unintended side effect that rank 30 objects
are also returned in a search that dropped the house number
from the query. This is wrong because POIs cannot function
as a parent to a house number.
This fix drops all rank 30 objects from the results for a
house number search if they do not match the requested house
number.
Special terms need to be prefixed by a space because they are
full terms.
For countries avoid duplicate entries of word tokens.
Adds tests for adding country terms.
SQL functions must always be reloaded when updating the software.
All other updates included the instruction as part of some other
migration. From 3.7 on it will happen as part of the migration
command.
Fixes #2335.
When guessing postcodes from the area, only postcodes within
that area are accepted. For POIs that is usually not what we
want as the postcode would have to be within a house for
example.
Fixes #2301.
- only save partial words without internal spaces
- consider comma and semicolon a separator of full words
- consider parts before an opening bracket a full word
(but not the part after the bracket)
Fixes #244.
Explicitly check for the tokenizer source file to check that
the name is correct. We can't use the import error for that
because it hides other import errors like a missing
library.
Fixes #2327.
When name and address is empty, the keywords field in the response
of the details API would be an array because that is what PHP's
json_encode defaults to with empty array(). This default can only
be changed globally per json_encode call and that might cause
unintended collateral damage. Work around the issue by making
name and address an empty array instead of keywords.
Fixes #2329.
The Python code now takes care of reading postcodes from placex,
enhancing them with potentially existing external postcodes and
updating location_postcodes accordingly. The initial setup and
updates use exactly the same function.
External postcode handling has been generalized. External postcodes
for any country are now accepted. The format of the external postcode
file has changed. We now expect CSV, potentially gzipped. The
postcodes are no longer saved in the database.
The ICU library only offers transliterations for a limited set of
scripts. Add transliterations for missing scripts from the PostgreSQL
module. This means that the same selection of scripts is supported
as with the old module.
The tokenizer to be used can be chosen with -DTOKENIZER.
Adapt all tests so that they work with the legacy_icu tokenizer.
Move lookup in word table to a function in the tokenizer.
Special phrases are temporarily imported from the wiki until
we have an implementation that can import from file. TIGER
tests do not work yet.
The indexer now fetches any extra data besides the place_id
asynchronously while processing the places from the last batch.
This also means that more places are now fetched at once.
This adds an installation step for PHP code for the tokenizer. The
PHP code is split in two parts. The updateable code is found in
lib-php. The tokenizer installs an additional script in the
project directory which then includes the code from lib-php and
defines all settings that are static to the database. The website
code then always includes the PHP from the project directory.
The BDD tests still use the old-style amenity creation scripts
because we don't have simple means to import a hand-crafted
test file of special phrases right now.
Normalization and token computation are now done in the tokenizer.
The tokenizer keeps a cache of the hundred most used house numbers
to keep the number of calls to the database low.
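The cache can be sketched as follows; names and the simple size bound
are illustrative, and the database lookup is only a stand-in:

class HousenumberCache:

    def __init__(self, max_size: int = 100) -> None:
        self._cache = {}
        self._max_size = max_size

    def get_token(self, conn, housenumber: str) -> int:
        token = self._cache.get(housenumber)
        if token is None:
            token = self._lookup_in_db(conn, housenumber)
            if len(self._cache) < self._max_size:
                self._cache[housenumber] = token
        return token

    def _lookup_in_db(self, conn, housenumber: str) -> int:
        raise NotImplementedError  # stand-in for the word table lookup
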
The name analyzer is the actual work horse of the tokenizer. It
is instantiated per thread and provides all functions for
analysing names and queries.
Add a jsonb column to the placex and location_property_osmline tables
which can be used by the installed tokenizer as required. No other
part of the software will use or otherwise rely on this column.
Indexing is now split into three parts: first a preparation step
that collects the necessary information from the database and
returns it to Python. In a second step the data is transformed
within Python as necessary and then returned to the database
through the usual UPDATE which now not only sets the indexed_status
but also other fields. The third step comprises the address
computation which is still done inside the update trigger in
the database.
The second processing step doesn't do anything useful yet.
Creating and populating the word table is now the responsibility
of the tokenizer.
The get_maxwordfreq() function has been replaced with a
simple template parameter to the SQL during function installation.
The number is taken from the parameter list in the database to
ensure that it is not changed after installation.
Adds a migration that initialises a legacy tokenizer for
an existing database. The migration is not active yet as
it will need completion when more functionality is added
to the legacy tokenizer.
This adds the boilerplate for selecting configurable tokenizers.
A tokenizer can be chosen at import time and will then install
itself such that it is fixed for the given database import even
when the software itself is updated.
The legacy tokenizer implements Nominatim's traditional algorithms.
These tables have never been actively maintained and the code is
completely untested. With the upcoming changes, it is unlikely
that the code remains usable.
This removes the aux tables and all code that references them.
Rename function to reflect that it is only used for precomputation.
The token IDs are not really needed, so don't bother to compute
the array of tokens.
Looks like 2af82975cd
accidentally renamed an index. Because of the added "if not
exists" clause, the index doesn't get created. This
significantly slows down reverse queries because they now
require full scans on location_property_tiger.
Without this fix, reverse queries can take 8s on a full
planet install on an r5.8xlarge instance in EC2.
Indexing scans the placex table sequentially during the initial
import. That is okay because we know that all rows need to be
processed anyway. When continuing the import,
however, a large part might already be indexed, so that the
process spends a lot of time going through rows that are no
longer of interest. Create a supporting index for all unindexed
rows to speed up the scan. This is the same index as used later
for updates.
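The supporting index might look like this (simplified):

PENDING_INDEX_SQL = """
CREATE INDEX idx_placex_pendingsector
    ON placex (rank_address, geometry_sector)
    WHERE indexed_status > 0
"""
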
Instead of normalising the names simply compare them in lower
case. This removes the dependency on the tokenizer for
linking boundaries and nodes. When looking up the linked places
by place type also allow that one name is simply contained in the
other. This catches the frequent case where one of the names has
an addendum (e.g. Newport vs. City of Newport).
Drops the special index for the name lookup and instead relies
on a slightly extended version of the geometry index used for
reverse lookup. Saves around 100MB on a planet.
Disabling query reversal is no longer possible in the configuration,
so there is no need to keep this as an option. Reversal is
automatically disabled for structured search only.
On Postgresql versions 11+ add an index to speed up the lookup
of housenumbers for terms found in search_name. This is really
just a band-aid around the query planner's interpretation of the
query.
Usually we don't narrow down search results by house number when
only a street name is given because there may be a lot of rows
to cross check when the street name is very frequent. However,
when it is known to be rare, the housenumber check may be done
anyway.
Fixes #2238.
On Windows systems the timer may not be accurate enough to measure
the time between init() and done(). Avoid computing statistics with
a diff time of 0 in such cases.
Fixes #2230.
- the command is now --import-special-phrases
- the output is not an sql file anymore, the data is directly imported into the database.
- the relevant part of the documentation (data import section) has been updated.
Drops all calls to PHP utility functions. nominatim cli functions
are used where possible, to stay as close to the final code as
possible with the tests.
By removing the PHP calls, the test code now only uses osm2pgsql and
the database module from the build directory.
Always look up the closest housenumber before looking up
interpolations. This ensures that closer housenumbers are
preferred over interpolations.
Fixes #2214.
Adds a generally higher penalty for special search terms and an
additional one if the term is anywhere but the beginning or the
end. Also housenumbers and special searches together are less
likely.
When we are in the final iteration of the search groups, it is not
possible to further delay the results. Unconditionally use the
results with the best rank instead.
Full word terms are already preferred for the name part. Adding
only one-word partials to the address, makes it impossible to
give a similar preference for the address part. Each term adds
a rank penalty. The problem here is that we interpret the query
forwards and backwards. Having different penalty systems for
name and address means that the same term ends up with different
penalties and that often leads to interpretations of the wrong
direction being in the way.
Adds an 'admin --migrate' command that checks for the current
database version and runs any necessary migrations. Also
has migrations going back to 3.6.
Also switches to jinja-based preprocessing, which allows
simplifying the SQL files. Use 'if not exists' where possible
so that the step can be rerun to fix missing indexes.
Replaces various hand-crafted replacements of varying format with
a single Jinja2 templating mechanism. Allows full access to
configuration if necessary.
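As a sketch, the preprocessing boils down to rendering the SQL files
through Jinja2 with the configuration as context (names illustrative):

import jinja2

def render_sql(sql_template: str, config: dict) -> str:
    # SQL files can now branch on settings,
    # e.g. {% if config.use_aux %} ... {% endif %}.
    return jinja2.Environment().from_string(sql_template).render(config=config)
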
If an exception is raised by other means, we still have to close
the stdin pipe to psql to make sure that it exits and releases its
connection to the database.
Instead of parsing the DSN for each external libpq program we
are going to execute, provide a function that feeds them all
necessary parameters through the environment.
osm2pgsql is the first user.
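A rough sketch of such a helper; libpq-based programs understand the
PG* environment variables, so no DSN parsing is needed on their side
(the DSN parsing here is simplified):

import os
import subprocess

def get_pg_env(dsn: str) -> dict:
    mapping = {'host': 'PGHOST', 'port': 'PGPORT',
               'dbname': 'PGDATABASE', 'user': 'PGUSER',
               'password': 'PGPASSWORD'}
    env = dict(os.environ)
    for param in dsn.split():
        name, _, value = param.partition('=')
        if name in mapping:
            env[mapping[name]] = value
    return env

# e.g. subprocess.run(['osm2pgsql', '--version'], env=get_pg_env('dbname=nominatim'))
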
Psycopg2 has changed the kind of exception that is emitted on
deadlocks between versions 2.7 and 2.8. The code was already
trying to catch both kinds of errors but because the
psycopg2.errors package is unknown in 2.7 and below, the
code would throw an exception on anything but a deadlock error.
This commit wraps the deadlock handling into a context manager
to avoid code duplication and uses module imports to detect if
the new error codes are available.
Also sets the required psycopg2 version to 2.7 or newer as
versions below are difficult to test.
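The shape of the solution, as a sketch rather than the exact
Nominatim class:

from contextlib import contextmanager
import psycopg2

try:
    import psycopg2.errors
    DEADLOCK_ERROR = psycopg2.errors.DeadlockDetected  # psycopg2 >= 2.8
except ImportError:
    DEADLOCK_ERROR = None  # psycopg2 2.7: fall back to the SQLSTATE code

@contextmanager
def deadlock_handler(on_deadlock):
    try:
        yield
    except psycopg2.OperationalError as exc:
        if (DEADLOCK_ERROR is not None and isinstance(exc, DEADLOCK_ERROR)) \
           or getattr(exc, 'pgcode', None) == '40P01':
            on_deadlock()
        else:
            raise
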
The gazetteer output doesn't disable these functions when
writing to the place table but the triggers may contain
operations that cause misplanning for the query planner.
Results where the housenumber was dropped are an unlikely match
when they refer to something other than a street. Therefore
increase their result rank so that other matches are tried first
before choosing them as a result.
Improves #2167.
Installation includes PHP and Python libraries, settings, the basic
country data, the postgresql module and our custom version of
osm2pgsql. The latter is installed in our private library directory
so that it does not get in the way of a potentially installed
osm2pgsql from the distribution.
So far the data directory constant has pointed to the source
directory to be usable with different subdirectories. Now only
the data subdirectory itself is being used with the constant,
so point to the directory directly.
This replaces {data_dir}/settings throughout the code, so that
the configuration may be placed somewhere else in the directory
structure (e.g. in /etc).
Also moves the call to the setup from the setup-db
step to the calculate-postcodes step. The tables also no longer
need to be accessible by the webservice.
This ports the --socket-timeout parameter from
pyosmium-get-changes which ensures that the update
process eventually times out on hanging network connections.
With the website directory now tied to the project directory instead
of the build directory, it is no longer possible to use make for
running the web server.
This reverts commit 559fe513fa.
Increasing the splitting results in geometries with rounding
issues at the split points, so that contains operations do not
work as expected anymore.
Fixes #2137.
Given that the module is now copied to the project directory
when no module path is set, we need to know whether the
module path is empty. Therefore hand in the default module path
in a separate variable.
When no explicit database module is configured, then the
module is now copied into the project directory and used from
there. This means that Nominatim can be updated to a new
version of the module while existing installations keep their
version of normalisation.
This is an exception to be thrown when the error occurs because
of bad user data. We don't want to print a full stack trace in
these cases but just tell the user what went wrong.
The new function always creates normal and partitioned functions.
Also adds specialised connection and cursor classes for adding
frequently used helper functions.
Given that only one command will be executed in the end, it is
not necessary to import what amounts to the whole library. This
becomes in particular important for update functions that have
a dependency on pyosmium. The dependency can remain optional for
people not using updates.
The PHP scripts need to know the position of the nominatim
tool in order to call it. This is handed in as environment
variable, so it can be set by the Python script.
This means that the php-symfony-dotenv library is now only needed
when using the legacy scripts. This includes the BDD tests which
currently still rely on the PHP utils.
When a setting can be read from the environment variable, avoid
accessing the internal defaults. This way the scripts can be
accessed directly in lib/ as long as the environment is set up
correctly with full defaults.
Adds infrastructure for calling the legacy PHP scripts. As the
CONST_* values cannot be set from the python script, hand the values
in via secret environment variables instead. These are all
temporary hacks for the transition phase to python code.
In the future, the BDD tests will simply set up the required
test database themselves. Like with the template database, it
is not reimported when it already exists unless that is explicitly
forced.
Makes most of the API tests currently fail because they still
point to old test data.
Introduces a new class DBRow that encapsulates the comparison
functions. This also is responsible for formatting more informative
assert messages. place and placex steps are unified.
Put the connection to the test database into auto-commit mode
and get rid of the explicit commits. Also use cursors always in
context managers and unify the two implementations that copy
data from the place table.
The project directory contains the website script as
configured through the test configuration. This means
that tests are now completely independent of any
configuration that may be contained in the build
directory.
Also removes the hack to inject additional settings via
a environment variable.
The default location of osm2pgsql and the postgresql module
is decided at compile/installation time and is not necessarily
in the project directory.
With this change it is now possible to have a project directory
that is completely separate from the build directory.
Instead of creating the website wrapper scripts with cmake,
they are now created when --setup-website is called. The
setup of the configuration constants is directly embedded
into the scripts. This means we can get rid of the separate
settings-frontend.php. More importantly however, it means
that it is now possible to set up multiple website directories
from the same build directory.
DB tests now can simply set the environment to change configuration
variables. API tests still rely on a configuration file.
Also, query.php needs to set up the CONST_* variables to work with
the query scripts. That is a tiny bit messy and duplicates code
but this part will need to be reworked later.
As we can't refer to the project root dir in the module path, the
module path may now also be a relative directory which is then
taken as being relative to the project root path.
Moves the checkModulePresence() function into the Setup class, so
that it can work on the computed absolute module path.
This adds the notion of a project directory. This is the directory
that holds all necessary files for one specific installation of
Nominatim. Dotenv looks for an .env file in this directory and
adds it to the global environment together with the defaults from
Nominatim's data directory.
Adds symfony's dotenv library as a new dependency.
The setup relies on the project configuration which we want to
explicitly set up in later steps. Therefore proxy setup needs to
be done explicitly as well. There is the added bonus that the
setup is done only for the utils which try to call outside.
CONST_BasePath is split into separate configuration variables
for binaries, libraries and data. These variables as well as
the installation path are now set in the executable directly and
no longer configurable via project settings.
This is the first step towards an installable software. The
executables should know per installation where to find their
necessary data to execute. Project configuration needs to be
restricted to settings that really concern the specific Nominatim
installation.
Update names in the country_names table on the fly from incoming
OSM country data. Adding a small sanity check that the country
must be an OSM relation and within the area where we expect the
country to be.
* remove traces of HTML output
* add details on artificial objects (see also #1671)
* add geometry output documentation for lookup
* deprecate query by ID via reverse endpoint
* remove /search/<query> query format, no longer supported
* explain better what reverse geocoding does
* lots of smaller fixes to wording
If a place node is already linked against a boundary, it should not
be used for linking again. It is usually a sign of a mapping error,
when there are multiple boundary candidates. This change just avoids
inconsistent data in the database, it does not guarantee that the
linking is against the more correct boundary.
The address rank is much more interesting than the search rank
these days because it tells something about the kind of object.
Reverse had neither rank, so add both for consistency.
Rank 30 objects usually use the address parts of their parent.
When the parent has address parts that are areas but not marked
as isaddress, then the parent might go through multiple administrative
areas. In that case recheck if the right area has been chosen
for the object in question instead of relying on isaddress.
Note that we really only have to do the recomputation in the
case of 'isarea = True and isaddress = False' which hopefully
keeps the number of additional geometric operations we have to do
to a minimum.
There is one more special case to be taken into account here: a
street may go through two administrative areas and a house along
that street is placed in one of the areas while the addr:* tags
say it belongs to the other. In that case we must not switch
isaddress to the area where it is situated. To avoid that, recheck
the address names against the name of the area. That is not perfect
but should cover most cases.
Fixes #328.
Wait for 2 seconds before logging the first progress, so that we
have numbers that are a bit more reliable statistically speaking.
Also provides an actual implementation for the log_interval
parameter and fixes some small style issues.
It would be nice to always compute addresses for rank 0 objects
over the complete geometry, so that they can be found via all
the admin boundaries that they intersect. However, there are a
couple of extremely large boundaries in OSM (like timezones)
where this results in thousands of possible address candidates
that need to be checked. Fall back to getting the address of the
centroid for them.
The post codes are the last part that does not fit the new
address ranking scheme. In particular, the search rank is still
relevant for choosing if a postcode should be included into
the address terms. Filter out irrelevant postcodes in
getNearFeatures() already, to avoid having to check for
geometry relation.
Multi-word partial terms had an undue advantage over separate partial
terms because they only need to pay the penalty once. This changes
the behaviour by setting the penalty according to the number of
words in the token. This should get rid of search interpretations
with low chance of matching.
This also fixes handling of exact term matching. We now match against
all exact terms of the query, not just a couple of them collected
while building the interpretations.
Also adds a penalty to very short postcodes.
House numbers need special handling because they may appear after
the street term. That means we cannot just use them as the main name
for searches where the address has its own search term entries.
Doing this right now, we are able to find '40, Main St, Town' but not
'Main St 40, Town'.
This switches to using the housenumber token as the name term instead.
House number tokens can get special handling when building the search
query that covers the case where they come after the street.
The main disadvantage is that this once more increases the numbers
of possible search interpretation of which we have already too many.
no penalty for housenumber searches
The previous behaviour was a left-over from a former version
where such POIs parented to the street. Now that they parent to
places, it should be included.
get_addressdata() now also checks if the place itself has entries
in the place_addressline table and merges them into the results.
Also restrict checking for address tag places to cases where the
name cannot be found in the parent's address search terms. Looking
up all address tags is just too slow.
While previously the content of addr:* tags was only added
to the list of address search keywords, we now really look up
the matching place. This has the advantage that we pull in all
potential translations from the place, just like all the other
address terms that are looked up by neighbourhood search.
If no place can be found for a given name, the content of the
addr:* tag is still added to the search keywords as before.
There are two places where the website URL is still used:
for icons, replace the URL with a link to the icon repository
of the UI repo. The 'more' URL now builds the link from the
server info.
Go back to using centroid when determining if one admin level
is within another. There are cases where boundaries are slightly
misaligned due to mapping errors (not using the same ways in the
relations).
Only declare boundaries the same when they have the same wikidata
tag _and_ have exactly the same geometry. This works around tagging
errors with the wikidata tag, which happen because of automated
edits to the wikidata tag.
The Polish community maps admin boundaries that span multiple
levels by duplicating the boundary relations. Detect this situation
by looking out for matching wikidata tags. The higher ranked
duplicates are then thrown out from the address pool by setting
their address rank to 0.
In cases of countries and remote places without an address
it is possible that 'addressimportance' comes back empty.
Adjust the 'foundorder' to the place's importance instead
in such cases.
Fixes #2023.
Rank 25 is now available for places that should appear in addresses
but not when a street is present. Use this for some block-like
place types. Also document the particularity of rank 25.
Subdivisions and allotments are now at the same level as landuse,
which they are frequently used together with.
Important changes:
* fix disabling of JIT in PostgreSQL
* support for finding the latest PostgreSQL from their repos
* no longer create a nodes table with flatnodes
Now that the containment check uses ST_Relate, we need to add
a separate bbox contains check to ensure that Postgis does the
efficient check first. Note that we still cannot get rid of the
overlap(&&) check because then Postgis will use the wrong indexes.
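A minimal sketch of that check order, using illustrative table names, a sample point, and a generic DE-9IM mask rather than Nominatim's actual ones: the cheap operator tests come first, the expensive ST_Relate() last.

```
SELECT a.place_id
  FROM location_area_large_0 a,
       (SELECT ST_SetSRID(ST_MakePoint(8.68, 50.11), 4326) AS geom) q
 WHERE a.geometry && q.geom   -- overlap check, keeps the planner on the geometry index
   AND a.geometry ~ q.geom    -- bbox contains: the cheap pre-filter
   AND ST_Relate(a.geometry, q.geom, 'T********');  -- exact test: interiors intersect
```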
Removes admin level 7, which should not exist and promotes
admin level 8 to municipality level.
place=municipality is only used for boroughs of St. Petersburg,
so demote to level 18.
Fixes #926.
The new address computation assumes that the centroid is inside
the area. Therefore we cannot use the centroid function. Use the
pre-computed centroid instead which has already been corrected to
be inside the area.
Also demote the address rank of an admin boundary when there
is a place area of higher rank that completely contains the
area. This catches the case where city boundaries do not exactly
align with administrative units (see for example Moscow).
This is a complete rewrite of the selection of address parts to
be inserted into the place_addressline table.
The new algorithm selects for each rank:
* the boundary overlapping with the addressee and contained
in the already selected boundaries of lower rank, or failing that
* the place node closest to the addressee that is contained in
the already selected boundaries and in the influence radius
of already selected place nodes of lower rank
Place nodes that are not contained in already selected boundaries
of lower rank are completely thrown away. All other candidates are
added as non-address parts.
If a place node of city rank and above finds itself in an
administrative boundary of the same address rank, then
increase the address rank by 2. This catches the rather
frequent case where city suburbs are tagged for historical
reasons as towns or villages.
In structured queries we should only assume that it is
a postcode search when only the postcode and optionally
the country is given. If any other term is present, it
is better to avoid the search for postcode as it yields
too many bad searches. Given that the terms in a structured
query are ordered, this means that the postcode must be
the first token just like in the unstructured query.
Fixes #1988.
The latest version of osm2pgsql no longer creates indexes on
the members of planet_osm_rels. So we do that ourselves.
Given that we only need to access associated street relations,
the index can become quite a bit smaller.
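A hedged sketch of such a partial index; the exact index name and predicate used by Nominatim may differ:

```
-- only associatedStreet relations need member lookups, keeping the index small
CREATE INDEX planet_osm_rels_parts_associated_idx
    ON planet_osm_rels USING gin (parts)
 WHERE tags @> ARRAY['associatedStreet'];
```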
Using both is slightly problematic because they have different
ways to use the index. Newer versions of Postgis exhibit a
query planner issue when both functions appear together.
As ST_Intersects includes ST_Covers, simply remove the latter.
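Sketched on a hypothetical query that accepted either predicate, the simplification looks like this (ids and table names purely illustrative):

```
-- before: two predicates, two possible index strategies
SELECT b.place_id
  FROM placex a, placex b
 WHERE a.place_id = 42
   AND (ST_Intersects(a.geometry, b.geometry)
        OR ST_Covers(a.geometry, b.geometry));

-- after: everything that satisfies ST_Covers() also satisfies ST_Intersects()
SELECT b.place_id
  FROM placex a, placex b
 WHERE a.place_id = 42
   AND ST_Intersects(a.geometry, b.geometry);
```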
We can't use getNearFeatures() to determine the parent of a
place with an unlisted addr:place because this function
returns place nodes that are potentially outside the area
of interest. Doing the complete address computation is too
expensive, so simply use the area with the largest rank that
contains the feature instead.
When a POI has no addr:street but an addr:place that is not
contained in the name list of the parent place, then remember
this situation and merge the content of addr:place into the
address output.
We don't need to care about translations in this case because
it is obvious that no object with translations exists if the
parent isn't the object named in addr:place.
These are used to mark large paved areas. Sometimes they exist
together with named regular streets. In such cases the unnamed
area may overshadow the actual street when computing the address
parent. As unnamed highways are not very useful anyway, we
simply remove them from the database.
If an addr:place is given but no addr:street tag, then always bind
the rank 30 object to a <=25 object, even when there
is none found with the same name.
When a place of rank 30 has addr tags that are not covered by the
search terms of the parent, add a separate entry for the POI in
the search_name table that includes the addr tags. We can only
do that with named places. For POIs without a name the housenumber
is used as name. If that is not available either, searching still
won't work.
Colons are used as a delimiter in tiger:left and tiger:right tags
when multiple postcodes are given. Ignore those. This was already
done in the postcode update script. This change just makes the
two places where postcodes are added consistent.
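A sketch of the resulting guard, with an illustrative table name: values containing a colon are postcode lists and get skipped.

```
SELECT postcode
  FROM location_property_tiger
 WHERE postcode NOT LIKE '%:%';   -- e.g. '19087:19088' is ignored
```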
place=postcode places are artificial places that collect addr:postcode
points for aggregation. They should neither show up in the address nor
be searchable. That means that there is no need to index them at all.
Only let boundary=postal_code through, which defines correct areas for
postcodes.
Postal boundaries usually just have the postcode tag set and are
therefore officially 'nameless'. We want to have them as
boundary=postal_code anyway in order to distinguish them from postcode
points inherited from addr: tags.
Boundaries should derive the address part type from the
linked place if possible. This was already implemented
for the address objects but not for the address information
of the boundary itself.
Fixes #1949.
It may happen that two different postcodes normalize to exactly
the same token. In that case we still need two different entries
in the word table. Token lookup will then make sure that the correct
one is chosen.
Fixes #1953.
Add a section on setting up the development environment which now
also includes the former chapter on recreating the documentation.
Move the README from test/ into the manual as the new Testing
chapter.
Unify the two vagrant scripts for Ubuntu 18. The script can now
be run in three modes: no webserver, with apache, with nginx.
The default mode is to not install any webserver at all. This is
normally sufficient when just developing.
The commit also switches from bento to generic boxes and adds config
for running with a libvirt provider. You need an NFS daemon for
synchronized folders.
This runs the PHP development server in a mode where it listens
globally. This is needed when running inside vagrant and port-forwarding
to the host machine.
Further reduce the size above which POIs are no longer bound
to streets but only to larger objects. The point of reference
for the largest POI that is still bound to a street is JFK airport.
Boundaries and places now always get a rank < 26 to make sure that
they do not parent to a street. Skip boundary=place completely
because they will be covered through the secondary place tag.
In rare cases search_name might have entries for places for
which we do not return details, in particular for linkees.
Need to remove those entries in the result list before returning
the details.
Fixes #1932.
The initial search results retrieved from the database already come
preordered, either by importance or by distance. We want to keep
that order if all other things are equal.
Squares are now addressable (on address level 25) and thus can
be attached to a house number via addr:place. Needed to increase
the rank range for matching up addr:place to 25.
When computing the address parts for a geometry, we need to do
a ST_Relate lookup in the location_area_large_* tables. This is
potentially very expensive for geometries with many vertices.
There is already a function for splitting large areas to reduce the
impact. This commit reduces the minimum area of a split, effectively
increasing the number of splits.
The effect on database size is minimal (around 3% increase), while
the indexing speed for streets increases by a good 60%.
Always add contents of addr:* tags into address part of the search
table, even when there is no corresponding other name. This keeps
search tolerant to the kind of tagging where parts show up in the
address that have no corresponding object in the database or where
it is only an unaddressable object.
A place needs all lower address rank object indexed to make up
the address. The search rank no longer ensures that as it can have
a different ordering than the address rank.
This switches indexing rank order to address ranks. Non-address
objects (with address rank 0) are indexed together with POIs.
Traffic signs rarely have a name and are therefore mostly not
searchable. Remove them completely. Allow street lamps only when
they have a name. Removes about 2M objects from a planet instance.
Before updating an admin boundary we need to make sure that any
artificially generated 'linked_place' entry is removed from the
extratags column. This ensures that the place designation does
not linger when a linked place disappears and that it is updated
when the linking changes.
An admin boundary might have a place tag but no matching place node.
We still should use the place value as indicator for the address
rank in this case.
Removes the defunct --osmosis-init and --no-api switches and the
unsupported (and unnecessary) deduplicate. Also removes
'experimental' from --setup-website as this is a required
function now.
After commit 6cc6cf950c, names and house numbers
of POIs got mingled into a single item when creating the display name.
Add the house number as extra information without place_id to avoid
later mangling.
When the address rank of an admin boundary is changed because
of an attached place type, it may happen that the admin_level
hierarchy gets inversed. Avoid that by adjusting the address
rank if an inversion is detected.
Make access to the DB object a function, so that the connection
can be opened implicitly when the object is accessed for the first
time. This way we no longer need to check beforehand if a specific
function of the setup needs DB access or not.
Also move the check for the module to the relevant sub step.
The parameter got lost when switching to website settings.
Given that the use of a fixed parameter is limited,
debugging output can now only be set via the URL parameter.
We already exclude all polygon places without an address
rank. Place nodes should also be ignored. This removes
places like locality from the reverse results.
Fixes #1839.
So far we've used a buffer around a place node to define its
potential address reach. This had two problems: the buffer was
so large that addresses often contain false positives and the
buffer is really distorted when getting closer to the poles.
Change the buffer here to draw a bounding box at a certain
distance in meters. This means that we always use the same
box everywhere on the planet and can make the extent much
smaller. Using a box has the advantage that it is much faster
to figure out if a point is within the box.
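One way to build such a box, sketched with standard PostGIS primitives (the point and the 5 km distance are illustrative): buffer on the geography type, where distances are metric, then take the envelope of the result.

```
SELECT ST_Envelope(
         ST_Buffer(ST_SetSRID(ST_MakePoint(8.68, 50.11), 4326)::geography,
                   5000)::geometry) AS search_box;
```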
There are a couple of places with a search rank < 25 which are
not addressable like waterways and islands. We don't want them
to function as parents for POI-level objects. So use the
address rank for finding parents, not the search rank.
See #1815.
Locally disable JIT and parallel workers in the connections that
do indexing. The query planner tends to be overenthusiastic about
using JIT. But with the rather less complex queries we have, the
overhead tends to be larger than the performance gain.
Fixes #1677.
Add a separate function for each property which saves necessary
information independently. Simplify computation of labels and
simple labels to not explicitly save the labels.
Only send query parameters relevant for the current query
type (simple/structured), hide the other input fields.
This is quite a bit of CSS state changing, so move the initial
setup of the input field states into Javascript.
PHPUnit 7.3 introduced the function assertEqualsWithDelta for comparing
floats with a delta. The old four-argument version of assertEquals is
deprecated in PHPUnit 8 and removed in PHPUnit 9.
This commit means that the tests will fail with PHPUnit < 7.3 because
assertEqualsWithDelta is not defined there.
The instructions in
[`VAGRANT.md`](42c80893cb/VAGRANT.md)
still work as before. The names of the Vagrant machines are updated so
that Ubuntu 18.04 (previously called `ubuntu`) is now called `ubuntu18`
and Ubuntu 20.04 is now called `ubuntu20`.
The version changes from Ubuntu 18.04 to Ubuntu 20.04 are:
- Python: 3.6 to 3.8
- Postgres: 10 to 12
- PHP: 7.2 to 7.4
In the `apt-get` invocations, I changed `--force` to `--allow-downgrades --allow-remove-essential --allow-change-held-packages`, because the former is deprecated. Cf. the [manpage of apt-get](http://manpages.ubuntu.com/manpages/focal/man8/apt-get.8.html).
The php module `codesniffer` was previously installed via Composer, but it is available via the Ubuntu repository, so I installed it via `apt-get` now.
A lot of streams in OSM are of minor importance; they certainly
should show up lower in the list of results than villages. Those
rivers/streams that are well known have a wikipedia page and get
a higher importance from that.
The disadvantage with downgrading is that the address gets even
more useless but that's something that needs to be solved outside
the rank search.
When inheriting an address rank from a linked place we
must be careful not to destroy the hierarchy established
through boundary admin_level. Therefore, before assigning
an address rank from a linked place, find the next higher
boundary in the admin_level hierarchy, look up its address
rank and then only use the address rank from the linked
place if it is higher.
With ranks being dynamically changed through linking of places,
it is important to reset the ranks on update, so that changes
of the rank due to changes in linking are correctly taken into
account.
Using the address rank to set the address parts catches
a much wider variety of types like 'town' and 'suburb'.
With recent address ranking changes the rank ranges
are relatively reliable.
The order of preference is now:
1. a postcode on the place itself
2. a postcode area in the address
3. the computed postcode from the place
Fixes #1723.
If for a boundary the place_type is defined, handle the address
part like a place node. This is the same behaviour as before
when class/type were patched earlier.
Removes the special casing for boundaries with a place
type in get_addressdata(). Instead the place type is now
returned as an extra field, so that the caller has to
handle the situation.
This fixes the details link next to the address in the details
view, which previously would go to a place class instead of the
original boundary class.
Having the same wikidata tag is a strong indicator that the same place
is meant. There are some assignment errors where the wikidata tag does
not link to the object itself but to something that mentions the
place. To reduce errors there, additionally prefer the same name.
The admin_centre role is for the seat of government which is not
the same as the administrative entity. This is mostly used
correctly these days, so avoid matching by that role.
Administrative boundaries that do not figure in the address
should still be able to take part in the name matching.
Use the rank_search for comparison in this case.
This fixes a regression where the area of the lower ranking
area was not computed correctly.
Also excludes postcodes areas now as they have their own
hierarchy.
When using a linked place as centroid for a boundary, check
first that it is really within the area. If it is outside,
just keep the computed centroid because a centroid outside the
area just causes havoc.
Fixes #1352.
If a boundary relation has no label member, preferably
link against a place node with the same place type.
Also inherit the rank_address from the place node (only
has an effect when linking via label member or place type).
The function only ever returns one result of which only the
place_id is used. So simplify it to return a single place_id
only (or NULL if none is found).
Also fix typo in function name.
The function only ever returns one result of which only the
place_id is used. So simplify it to return a single place_id
only (or NULL if none is found). Rename the function to avoid
conflicts when updating an existing database.
Instead of unconditionally parenting them to a street, the
larger areas get a parent area that contains them. To keep
things computationally light-weight, only use the centroid and
bbox to determine if an area is contained.
Requires renaming of parenting functions because renaming
a parameter of the function causes issues when updating the
function (it requires a manual delete, which I'd like to
avoid).
Normally ST_Covers() should include a bbox index use,
so adding a bbox where clause is not really necessary.
However, the query planner messes up and uses a parallel
index search with a second index instead of exclusively
running on the geometry index, when the bbox part is
missing.
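A sketch of the resulting pattern with illustrative names: the && test adds nothing logically, but it nudges the planner towards a single scan on the geometry index.

```
SELECT o.place_id
  FROM placex o,
       (SELECT ST_SetSRID(ST_MakePoint(8.68, 50.11), 4326) AS geom) q
 WHERE o.geometry && q.geom           -- logically redundant bbox clause
   AND ST_Covers(o.geometry, q.geom);
```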
When --drop is given, we can remove all node geometry information
already after the import with osm2pgsql. Also drop all unnecessary
tables before creating the final indices.
The _async parameter name is only supported since psycopg 2.7.
However, async is a keyword in Python >= 3.7, so using this
gives us a syntax error. Work around this by defining the
parameters in a dict and handing that into the connect function.
The Apache config allows API calls without an extension, e.g. /search?q=query string.
This does not work on nginx, so we need to enable this via this patch.
about: You have searched something with Nominatim and did not get the expected result.
title: ''
labels: ''
assignees: ''
---
<!-- Note: this template is for reporting problems with searching. If you have found an issue with the data, you need to report/fix the issue directly in OpenStreetMap. See https://www.openstreetmap.org/fixthemap for details. -->
## What did you search for?
<!-- Please try to provide a link to your search. You can go to https://nominatim.openstreetmap.org and repeat your search there. If you originally found the issue somewhere else, please tell us what software/website you were using. -->
## What result did you get?
## What result did you expect?
**When the result is in the right place and just named wrongly:**
<!-- Please tell us the display name you expected. -->
**When the result is missing completely:**
<!-- Make sure that the data you are looking for is in OpenStreetMap. Provide a link to the OpenStreetMap object or if you cannot get it, a link to the map on https://openstreetmap.org where you expect the result to be.
To get the link to the OSM object, you can try the following:
* Go to [https://openstreetmap.org](https://openstreetmap.org).
* Move to the area of the map where you expect the result and then zoom in as much as possible.
* Click on the question mark on the right side of the map. You get a question cursor. Use it to click on the map where your object is located.
* Find the object of interest in the list that appears on the left side.
* Click on the object and report back the URL that the browser shows.
-->
## Further details
<!-- Anything else we should know about the search. Particularities with addresses in the area etc. -->
about: You have your own installation of Nominatim and found a bug.
title: ''
labels: ''
assignees: ''
---
<!-- Note: if you are installing Nominatim through a docker image, you should report issues with the installation process with the docker repository first.
Do not send screen shots! Copy any console output directly into the issue.
-->
**Describe the bug**
<!-- A clear and concise description of what the bug is.-->
**To Reproduce**
<!-- Please describe what you did to get to the issue. -->
**Software Environment (please complete the following information):**
- Nominatim version:
- Postgresql version:
- Postgis version:
- OS:
**Hardware Configuration (please complete the following information):**
- RAM:
- number of CPUs:
- type and size of disks:
**Postgresql Configuration:**
<!-- List any configuration items you changed in your postgresql configuration. -->
**Nominatim Configuration:**
<!-- List the contents of your customized `.env` file. -->
**Additional context**
<!-- Add any other context about the problem here. -->
- if [[ $TEST_SUITE == "tests" ]]; then phpcs --report-width=120 . ; fi
- cd $TRAVIS_BUILD_DIR/test/php
- if [[ $TEST_SUITE == "tests" ]]; then /usr/bin/phpunit ./ ; fi
- cd $TRAVIS_BUILD_DIR/test/bdd
- # behave --format=progress3 api
- if [[ $TEST_SUITE == "tests" ]]; then behave -DREMOVE_TEMPLATE=1 --format=progress3 db ; fi
- if [[ $TEST_SUITE == "tests" ]]; then behave --format=progress3 osm2pgsql ; fi
- cd $TRAVIS_BUILD_DIR/build
- if [[ $TEST_SUITE == "monaco" ]]; then wget --no-verbose --output-document=../data/monaco.osm.pbf http://download.geofabrik.de/europe/monaco-latest.osm.pbf; fi
- if [[ $TEST_SUITE == "monaco" ]]; then /usr/bin/env php ./utils/setup.php --osm-file ../data/monaco.osm.pbf --osm2pgsql-cache 1000 --all 2>&1 | grep -v 'ETA (seconds)'; fi
- if [[ $TEST_SUITE == "monaco" ]]; then /usr/bin/env php ./utils/specialphrases.php --wiki-import | psql -d test_api_nominatim >/dev/null; fi
The latest stable release can be downloaded from https://nominatim.org.
There you can also find [installation instructions for the release](https://nominatim.org/release-docs/latest/admin/Installation), as well as an extensive [Troubleshooting/FAQ section](https://nominatim.org/release-docs/latest/admin/Faq/).
[Detailed installation instructions for current master](https://nominatim.org/release-docs/develop/admin/Installation)
can be found at nominatim.org as well.
A quick summary of the necessary steps:
cd build
cmake ..
make
sudo make install
2. Create a project directory, get OSM data and import:
Let's say you have a Postgres database named `nominatim_it` on server `your-server.com`
and port `5432`. The Postgres username is `postgres`. You can edit the `.env` in your
project directory and point Nominatim to it.
before creating places in the database. This helps with fast lookups and missing data (e.g. if the data the user wants to import doesn't contain any country places).
The number of countries in the world can change (South Sudan was created in 2011; Germany reunified), and so can their boundaries. This document explains how the pre-generated files can be updated.
## Country code
Each place is assigned a two-letter country_code based on its location, e.g. `gb` for Great Britain, or `NULL` if no suitable country is found (usually it's in open water then).
In `sql/functions.sql: get_country_code(geometry)` the place's center is checked against
1. country places already imported from the user's data file. Places are imported by rank low-to-high. The lowest rank 2 is countries, so most places should be matched. Still, the data file might be incomplete.
2. if unmatched: OSM grid boundaries
3. if still unmatched: OSM grid boundaries, but allow a small distance
## Partitions
Each place is assigned a partition, which is a number 0..250. 0 is fallback/other.
During place indexing (`sql/functions.sql: placex_insert()`) a place is assigned the partition based on its country code (`sql/functions.sql: get_partition(country_code)`). It checks in the `country_name` table.
Most countries have their own partition, some share a partition. Thus partition counts vary greatly.
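An illustrative lookup, assuming the `country_name` table carries the partition assignment as described:

```
SELECT partition FROM country_name WHERE country_code = 'de';
```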
Several database tables are split by partition to allow queries to run against fewer indices and improve caching.
* `location_area_large_<partition>`
* `search_name_<partition>`
* `location_road_<partition>`
## Data files
### data/country_name.sql
Export from existing database table plus manual changes. `country_default_language_code` mostly taken from [https://wiki.openstreetmap.org/wiki/Nominatim/Country_Codes](https://wiki.openstreetmap.org/wiki/Nominatim/Country_Codes), see `utils/country_languages.php`.
### data/country_osm_grid.sql
`country_grid.sql` merges territories by country. Then uses `function.sql: quad_split_geometry` to split each country into multiple [Quadtree](https://en.wikipedia.org/wiki/Quadtree) polygons for faster point-in-polygon lookups.
To visualize one country as geojson feature collection, e.g. for loading into [geojson.io](http://geojson.io/):
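The original query is not reproduced here; a hedged reconstruction using standard PostGIS/JSON functions, assuming the `country_osm_grid` table has `country_code` and `geometry` columns, could look like this:

```
SELECT jsonb_build_object(
         'type', 'FeatureCollection',
         'features', jsonb_agg(jsonb_build_object(
             'type', 'Feature',
             'geometry', ST_AsGeoJSON(geometry)::jsonb,
             'properties', jsonb_build_object('country_code', country_code))))
  FROM country_osm_grid
 WHERE country_code = 'ad';
```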
The server [importing instructions](https://www.nominatim.org/release-docs/latest/admin/Import-and-Update/) optionally allow downloading [`gb_postcode_data.sql.gz`](https://www.nominatim.org/data/gb_postcode_data.sql.gz). This document explains how the file got created.
## GB vs UK
GB (Great Britain) is more correct as the Ordnance Survey dataset doesn't contain postcodes from Northern Ireland.
## Importing separately after the initial import
If you forgot to download the file, or have a new version, you can import it separately:
1. Import the downloaded `gb_postcode_data.sql.gz` file.
2. Run the SQL query `SELECT count(getorcreate_postcode_id(postcode)) FROM gb_postcode;`. This will update the search index.
3. Run `utils/setup.php --calculate-postcodes` from the build directory. This will copy data from the `gb_postcode` table to the `location_postcodes` table.
## Converting Code-Point Open data
1. Download from [Code-Point® Open](https://www.ordnancesurvey.co.uk/business-and-government/products/code-point-open.html). It requires an email address where a download link will be sent to.
Convert [TIGER](https://www.census.gov/geo/maps-data/data/tiger.html)/Line dataset of the US Census Bureau to SQL files which can be imported by Nominatim. The created tables in the Nominatim database are separate from OpenStreetMap tables and get queried at search time separately.
The dataset gets updated once per year. Downloading is prone to be slow (can take a full day) and converting them can take hours as well.
Replace '2019' with the current year throughout.
1. Install the GDAL library and python bindings and the unzip tool
# Ubuntu:
sudo apt-get install python3-gdal unzip
2. Get the TIGER 2019 data. You will need the EDGES files
OSM contributors frequently tag items with links to Wikipedia and Wikidata. Nominatim can use the page ranking of Wikipedia pages to help indicate the relative importance of OSM features. This is done by calculating an importance score between 0 and 1 based on the number of inlinks to an article for a location. If two places have the same name and one is more important than the other, the Wikipedia score often points to the correct place.
These scripts extract and prepare both Wikipedia page rank and Wikidata links for use in Nominatim.
#### Create a new postgres DB for Processing
Due to the size of initial and intermediate tables, processing can be done in an external database:
```
CREATE DATABASE wikiprocessingdb;
```
---
Wikipedia
---
Processing these data requires a large amount of disk space (~1TB) and considerable time (>24 hours).
#### Import & Process Wikipedia tables
This step downloads and converts [Wikipedia](https://dumps.wikimedia.org/) page data SQL dumps to PostgreSQL files which can be imported and processed with pagelink information from Wikipedia language sites to calculate importance scores.
- The script will process data from whatever set of Wikipedia languages is specified in the initial languages array
- Note that processing the top 40 Wikipedia languages can take over a day, and will add nearly 1TB to the processing database. The final output tables will be approximately 11GB and 2GB in size
To download, convert, and import the data, then process summary statistics and compute importance scores, run:
```
./wikipedia_import.sh
```
---
Wikidata
---
This script downloads and processes Wikidata to enrich the previously created Wikipedia tables for use in Nominatim.
#### Import & Process Wikidata
This step downloads and converts [Wikidata](https://dumps.wikimedia.org/wikidatawiki/) page data SQL dumps to PostgreSQL files which can be processed and imported into the Nominatim database. It also utilizes the Wikidata Query Service API to discover and include place types.
- Script presumes that the user has already processed Wikipedia tables as specified above
- Script requires wikidata_place_types.txt and wikidata_place_type_levels.csv
- Script requires the [jq JSON parser](https://stedolan.github.io/jq/)
- Script processes data from whatever set of Wikipedia languages is specified in the initial languages array
- Script queries the Wikidata Query Service API and imports all instances of place types listed in wikidata_place_types.txt
- Script updates wikipedia_articles table with extracted wikidata
By including Wikidata in the wikipedia_articles table, new connections can be made on the fly from the Nominatim placex table to wikipedia_article importance scores.
To download, convert, and import the data, then process required items, run:
echo"CREATE TABLE geo_earth_primary AS SELECT gt_page_id, gt_lat, gt_lon FROM geo_tags WHERE gt_globe = 'earth' AND gt_primary = 1 AND NOT( gt_lat < -90 OR gt_lat > 90 OR gt_lon < -180 OR gt_lon > 180 OR gt_lat=0 OR gt_lon=0) ;"| psqlcmd
echo"CREATE TABLE geo_earth_wikidata AS SELECT DISTINCT geo_earth_primary.gt_page_id, geo_earth_primary.gt_lat, geo_earth_primary.gt_lon, page.page_title, page.page_namespace FROM geo_earth_primary LEFT OUTER JOIN page ON (geo_earth_primary.gt_page_id = page.page_id) ORDER BY geo_earth_primary.gt_page_id;"| psqlcmd
echo"UPDATE wikidata_place_dump SET ont_level = wikidata_place_type_levels.level FROM wikidata_place_type_levels WHERE wikidata_place_dump.instance_of = wikidata_place_type_levels.place_type;"| psqlcmd
echo"CREATE TABLE wikidata_places AS SELECT DISTINCT ON (item) item, instance_of, MAX(ont_level) AS ont_level, lat, lon FROM wikidata_place_dump GROUP BY item, instance_of, ont_level, lat, lon ORDER BY item;"| psqlcmd
echo"UPDATE wikidata_places SET lat = geo_earth_wikidata.gt_lat, lon = geo_earth_wikidata.gt_lon FROM geo_earth_wikidata WHERE wikidata_places.item = geo_earth_wikidata.page_title"| psqlcmd
# process language pages
echo"CREATE TABLE wikidata_pages (item text, instance_of text, lat numeric(11,8), lon numeric(11,8), ips_site_page text, language text );"| psqlcmd
for i in "${language[@]}"
do
echo"CREATE TABLE wikidata_${i}_pages as select wikidata_places.item, wikidata_places.instance_of, wikidata_places.lat, wikidata_places.lon, wb_items_per_site.ips_site_page FROM wikidata_places LEFT JOIN wb_items_per_site ON (CAST (( LTRIM(wikidata_places.item, 'Q')) AS INTEGER) = wb_items_per_site.ips_item_id) WHERE ips_site_id = '${i}wiki' AND LEFT(wikidata_places.item,1) = 'Q' order by wikidata_places.item;"| psqlcmd
echo"ALTER TABLE wikidata_${i}_pages ADD COLUMN language text;"| psqlcmd
echo"UPDATE wikidata_${i}_pages SET language = '${i}';"| psqlcmd
echo"INSERT INTO wikidata_pages SELECT item, instance_of, lat, lon, ips_site_page, language FROM wikidata_${i}_pages;"| psqlcmd
gzip -dc ${i}wiki-latest-pagelinks.sql.gz | sed "s/\`pagelinks\`/\`${i}pagelinks\`/g"| mysql2pgsqlcmd | psqlcmd
gzip -dc ${i}wiki-latest-page.sql.gz | sed "s/\`page\`/\`${i}page\`/g"| mysql2pgsqlcmd | psqlcmd
gzip -dc ${i}wiki-latest-langlinks.sql.gz | sed "s/\`langlinks\`/\`${i}langlinks\`/g"| mysql2pgsqlcmd | psqlcmd
gzip -dc ${i}wiki-latest-redirect.sql.gz | sed "s/\`redirect\`/\`${i}redirect\`/g"| mysql2pgsqlcmd | psqlcmd
done
# process language tables and associated pagelink counts
for i in "${language[@]}"
do
echo"create table ${i}pagelinkcount as select pl_title as title,count(*) as count from ${i}pagelinks where pl_namespace = 0 group by pl_title;"| psqlcmd
echo"insert into linkcounts select '${i}',pl_title,count(*) from ${i}pagelinks where pl_namespace = 0 group by pl_title;"| psqlcmd
echo"insert into wikipedia_redirect select '${i}',page_title,rd_title from ${i}redirect join ${i}page on (rd_from = page_id) where page_namespace = 0 and rd_namespace = 0;"| psqlcmd
echo"update ${i}pagelinkcount set othercount = 0;"| psqlcmd
for j in "${language[@]}"
do
echo"update ${i}pagelinkcount set othercount = ${i}pagelinkcount.othercount + x.count from (select page_title as title,count from ${i}langlinks join ${i}page on (ll_from = page_id) join ${j}pagelinkcount on (ll_lang = '${j}' and ll_title = title)) as x where x.title = ${i}pagelinkcount.title;"| psqlcmd
done
echo"insert into wikipedia_article select '${i}', title, count, othercount, count+othercount from ${i}pagelinkcount;"| psqlcmd
done
# calculate importance score for each wikipedia page
echo"update wikipedia_article set importance = log(totalcount)/log((select max(totalcount) from wikipedia_article))"| psqlcmd
Wikidata does not have any official ontologies; however, the [DBpedia project](https://wiki.dbpedia.org/) has created an [ontology](https://wiki.dbpedia.org/services-resources/ontology) that covers [place types](http://mappings.dbpedia.org/server/ontology/classes/#Place). The table below uses the DBpedia place ontology as a starting point and is provided as a cross-reference to the relevant OSM tags.
The Wikidata place types listed in the table below can be used in conjunction with the [Wikidata Query Service](https://query.wikidata.org/) to retrieve instances of those place types from the Wikidata knowledgebase.
```
SELECT ?item ?lat ?lon
WHERE {
?item wdt:P31*/wdt:P279* wd:Q9430; wdt:P625 ?pt.
?item p:P625 ?loc.
?loc psv:P625 ?cnode.
?cnode wikibase:geoLatitude ?lat.
?cnode wikibase:geoLongitude ?lon.
}
```
An example json return for all instances of the Wikidata item "Q9430" (Ocean) can be seen at [json](https://query.wikidata.org/bigdata/namespace/wdq/sparql?format=json&query=SELECT?item?lat?lon%20WHERE{?item%20wdt:P31*/wdt:P279*wd:Q9430;wdt:P625?pt.?item%20p:P625?loc.?loc%20psv:P625?cnode.?cnode%20wikibase:geoLatitude?lat.?cnode%20wikibase:geoLongitude?lon.})
**NOTE:** the OSM tags listed are those found in the Wikidata entries, not all the possible matches for tags within OSM.
The user the webserver, e.g. Apache, runs under needs to have access to the Nominatim database. You can find the user like [this](https://serverfault.com/questions/125865/finding-out-what-user-apache-is-running-as); on a default Ubuntu system, for example, it is `www-data`.
[context is set up correctly](../appendix/Install-on-Centos-7/#adding-selinux-security-settings).
When you recently updated your operating system, updated PostgreSQL to
a new version or moved files (e.g. the build directory) you should
recreate `nominatim.so`. Try
```
cd build
rm -r module/
cmake $main_Nominatim_path && make
```
### Setup.php fails with "DB Error: extension not found"
Make sure you have the PostgreSQL extensions "hstore" and "postgis" installed.
See the installation instructions for a full list of required packages.
### Setup.php reports "Cannot redeclare getDB()"
`Cannot redeclare getDB() (previously declared in /your/path/Nominatim/lib/db.php:4)`
The message is a bit misleading as PHP needs to load the file `DB.php` and
instead re-loads Nominatim's `db.php`. To solve this make sure you
have the [Pear module 'DB'](https://pear.php.net/package/DB/) installed.
sudo pear install DB
### I forgot to delete the flatnodes file before starting an import.
That's fine. For each import the flatnodes file gets overwritten.
See [https://help.openstreetmap.org/questions/52419/nominatim-flatnode-storage](https://help.openstreetmap.org/questions/52419/nominatim-flatnode-storage)
for more information.
## Running your own instance
### Can I import multiple countries and keep them up to date?
You should use the extracts and updates from https://download.geofabrik.de.
For the initial import, download the countries you need and merge them.
See [OSM Help](https://help.openstreetmap.org/questions/48843/merging-two-or-more-geographical-areas-to-import-two-or-more-osm-files-in-nominatim)
for examples of how to do that. Use the resulting single OSM file when
running `setup.php`.
For updates you need to download the change files for each country
To set up the update process now run the following command:
./utils/update.php --init-updates
It outputs the date from which updates will start. Recheck that this date is
what you expect.
The `--init-updates` command needs to be rerun whenever the replication service
is changed.
#### Updating Nominatim
The following command will keep your database constantly up to date:
./utils/update.php --import-osmosis-all
(Note that even though the old name "import-osmosis-all" has been kept for compatibility reasons, Osmosis is not required to run this - it uses pyosmium behind the scenes.)
If you have imported multiple country extracts and want to keep them
for table in `psql -d nominatim -c "SELECT tablename FROM pg_tables WHERE tablename LIKE 'search_name_%'" -tA | grep -v search_name_blank`;
do
psql -d nominatim -c "DROP INDEX idx_${table}_centroid_place; CREATE INDEX idx_${table}_centroid_place ON ${table} USING gist (centroid) WHERE ((address_rank >= 2) AND (address_rank <= 25)); DROP INDEX idx_${table}_centroid_street; CREATE INDEX idx_${table}_centroid_street ON ${table} USING gist (centroid) WHERE ((address_rank >= 26) AND (address_rank <= 27))";
done
Show all details about a single place saved in the database.
**The details page (including JSON output) exists for debugging only and must not be downloaded automatically**, see [Nominatim Usage Policy](https://operations.osmfoundation.org/policies/nominatim/).
This API endpoint is meant for visual inspection of the data in the database,
mainly together with [Nominatim-UI](https://github.com/osm-search/nominatim-ui/).
The parameters of the endpoint and the output may change occasionally between
versions of Nominatim. Do not rely on the output in scripts or applications.
!!! warning
The details endpoint at https://nominatim.openstreetmap.org
may not be used in scripts or bots at all.
See [Nominatim Usage Policy](https://operations.osmfoundation.org/policies/nominatim/).
## Parameters
The details API supports the following two request formats:
Place IDs are assigned sequentially during Nominatim data import. The ID
for a place is different between Nominatim installation (servers) and
changes when data gets reimported. Therefore it cannot be used as
a permanent id and shouldn't be used in bug reports.
!!! danger "Deprecation warning"
The API can also be used with the URL
`https://nominatim.openstreetmap.org/details.php`. This is now deprecated
and will be removed in future versions.
## Parameters
This section lists additional optional parameters.
### Output format
* `format=[html|json]`

See [Place Output Formats](Output.md) for details on each format. (Default: html)

| Parameter | Value | Default |
|-----------| ----- | ------- |
| json_callback | function name | _unset_ |

When set, then JSON output will be wrapped in a callback function with
the given name. See [JSONP](https://en.wikipedia.org/wiki/JSONP) for more
information.

| Parameter | Value | Default |
|-----------| ----- | ------- |
| pretty | 0 or 1 | 0 |

`[PHP-only]` Add indentation to the output to make it more human-readable.
### Output details
| Parameter | Value | Default |
|-----------| ----- | ------- |
| addressdetails | 0 or 1 | 0 |

When set to 1, include a breakdown of the address into elements.
(The HTML format includes the breakdown by default.)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| keywords | 0 or 1 | 0 |

When set to 1, include a list of name keywords and address keywords
(word ids) in the result.
| Parameter | Value | Default |
|-----------| ----- | ------- |
| linkedplaces | 0 or 1 | 1 |

Include details of places that are linked with this one. Places get linked
together when they are different forms of the same physical object. Nominatim
links two kinds of objects together: place nodes get linked with the
corresponding administrative boundaries. Waterway relations get linked together
with their members.
| Parameter | Value | Default |
|-----------| ----- | ------- |
| hierarchy | 0 or 1 | 0 |

Include details of places lower in the address hierarchy, e.g. for a city
this is usually a list of streets, suburbs, rivers.
(The HTML format includes the hierarchy by default.)

`[Python-only]` will only return properly parented places. These are address
or POI-like places that reuse the address of their parent street or place.

| Parameter | Value | Default |
|-----------| ----- | ------- |
| group_hierarchy | 0 or 1 | 0 |

When set to 1, the output of the address hierarchy will be
grouped by type.

| Parameter | Value | Default |
|-----------| ----- | ------- |
| polygon_geojson | 0 or 1 | 0 |

Include geometry of result.
(The HTML format includes the geometry by default.)
### Language of results
| Parameter | Value | Default |
|-----------| ----- | ------- |
| accept-language | browser language string | content of "Accept-Language" HTTP header |

Preferred language order for showing search results. This may either be
a simple comma-separated list of language codes or have the same format
as the ["Accept-Language" HTTP header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language).
#### 3. I get different counties/states/countries when I change the zoom parameter in the reverse query. How is that possible?
This is basically the same problem as in the previous answer.
The zoom level influences at which [search rank](../customize/Ranking.md#search-rank) Nominatim starts looking
for the closest object. So the closest house number may be on one side of the
border while the closest street is on the other. As the address details contain
the address of the closest object found, you might sometimes get one result,
sometimes the other.
The [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API) is more
suited for these kinds of queries.
That said if you installed your own Nominatim instance you can use the
`/utils/export.php` PHP script as basis to return such lists.
`nominatim export` PHP script as basis to return such lists.
#### 7. My result has a wrong postcode. Where does it come from?
Most places in OSM don't have a postcode, so Nominatim tries to interpolate
one. It first looks at all the places that make up the address of the place.
If one of them has a postcode defined, this is the one to be used. When
none of the address parts has a postcode either, Nominatim interpolates one
from the surrounding objects. If the postcode for your result is wrong, then
most of the time there is an OSM object with the wrong postcode nearby.
<placeplace_id="127761056"osm_type="relation"osm_id="146656"place_rank="16"lat="53.4791466"lon="-2.2447445"display_name="Manchester, Greater Manchester, North West England, England, United Kingdom"class="boundary"type="administrative"importance="0.704893333438333">
<placeplace_id="282236157"osm_type="relation"osm_id="146656"place_rank="16"address_rank="16"boundingbox="53.3401044,53.5445923,-2.3199185,-2.1468288"lat="53.44246175"lon="-2.2324547359718547"display_name="Manchester, Greater Manchester, North West England, England, United Kingdom"class="boundary"type="administrative"importance="0.35">
<city>Manchester</city>
<county>Greater Manchester</county>
<state_district>North West England</state_district>
Nominatim indexes named (or numbered) features within the OpenStreetMap (OSM) dataset and a subset of other unnamed features (pubs, hotels, churches, etc).
!!! Attention
The current version of Nominatim implements two different search frontends:
the old PHP frontend and the new Python frontend. They have a very similar
API but differ in some implementation details. These are marked in the
documentation as `[Python-only]` or `[PHP-only]`.
`https://nominatim.openstreetmap.org` implements the **Python frontend**.
So users should refer to the **`[Python-only]`** comments.
This section describes the API V1 of the Nominatim web service. The
service offers the following endpoints:
* __[/search](Search.md)__ - search OSM objects by name or type
* __[/reverse](Reverse.md)__ - search OSM object by their location
* __[/lookup](Lookup.md)__ - look up address details for OSM objects by their ID
* __[/status](Status.md)__ - query the status of the server
* __/deletable__ - list objects that have been deleted in OSM but are held
back in Nominatim in case the deletion was accidental
* __/polygons__ - list of broken polygons detected by Nominatim
* __[/details](Details.md)__ - show internal details for an object (for debugging only)
This example creates a search index over traffic lights but will
only include those that have a common name and not those which just
have some reference ID from the city.
#### `set_address_tags()` - defining address parts
Address tags will be used to build up the address of an object.
`set_address_tags()` takes a table with arbitrary fields pointing to
_key match lists_. Two fields have a special meaning:
* __main__: defines
the tags that make a full address object out of the OSM object. This
is usually the housenumber or variants thereof. If a main address tag
appears, then the object will always be included, if necessary with a
fallback of `place=house`. If the key has a prefix of `addr:` or `is_in:`
this will be stripped.
* __extra__: defines all supplementary tags for addresses, tags like `addr:street`, `addr:city` etc. If the key has a prefix of `addr:` or `is_in:` this will be stripped.
All other fields will be handled as summary fields. If a key matches the
key match list, then its value will be added to the address tags with the
name of the field as key. If multiple tags match, then an arbitrary one
wins.
Country tags are handled slightly differently. Only tags with a two-letter code
| **After Changes:** | run `nominatim refresh --website` |
Sets the connection parameters for the Nominatim database. At a minimum
the name of the database (`dbname`) is required. You can set any additional
parameter that is understood by libpq. See the [Postgres documentation](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS) for a full list.
!!! note
It is usually recommended not to set the password directly in this
The [Nominatim documentation](https://nominatim.org/release-docs/develop/) is built using the [MkDocs](https://www.mkdocs.org/) static site generation framework. The master branch is automatically deployed every night under [https://nominatim.org/release-docs/develop/](https://nominatim.org/release-docs/develop/).
To preview local changes:
1. Install MkDocs
```
pip3 install --user mkdocs
```
2. In build directory run
```
make doc
INFO - Cleaning site directory
INFO - Building documentation to directory: /home/vagrant/build/site-html
```
This runs `mkdocs build` plus extra transformation of some files and adds symlinks (see `CMakeLists.txt` for the exact steps).
3. Start webserver for local testing
```
mkdocs serve
[server:296] Serving on http://127.0.0.1:8000
[handlers:62] Start watching changes
```
If you develop inside a Vagrant virtual machine:
* add port forwarding to your Vagrantfile, e.g. `config.vm.network "forwarded_port", guest: 8000, host: 8000`
* use `mkdocs serve --dev-addr 0.0.0.0:8000` because the default localhost
  address is not reachable from outside the virtual machine.
Nominatim (from the Latin, 'by name') is a tool to search OSM data by name and
address and to generate synthetic addresses of OSM points (reverse geocoding).
It also has limited capability to search features by their type
(pubs, hotels, churches, etc).
This guide comes in five parts:
* __[API reference](api/Overview.md)__ for users of Nominatim
* __[Administration Guide](admin/Installation.md)__ for those who want
to install their own Nominatim server
* __[Customization Guide](customize/Overview.md)__ for those who want to
adapt their own installation to their special requirements
* __[Library Guide](library/Getting-Started.md)__ for Python developers who
want to use Nominatim as a library in their project
* __[Developer's Guide](develop/overview.md)__ for developers of the software
require references to places in the database. Below, the possible
types for place identification are listed. All types are dataclasses.
### PlaceID
::: nominatim.api.PlaceID
options:
heading_level: 6
### OsmID
::: nominatim.api.OsmID
options:
heading_level: 6
## Geometry types
::: nominatim.api.GeometryFormat
options:
heading_level: 6
members_order: source
## Geometry input
### Point
::: nominatim.api.Point
options:
heading_level: 6
show_signature_annotations: True
### Bbox
::: nominatim.api.Bbox
options:
heading_level: 6
show_signature_annotations: True
members_order: source
group_by_category: False
## Layers
Layers allow restricting search results to thematic groups. This is
orthogonal to restriction by address ranks, which groups places by their
geographic extent.
::: nominatim.api.DataLayer
options:
heading_level: 6
members_order: source