Compare commits

6 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Sarah Hoffmann | 282bd4a67e | prepare for 3.4.2 release | 2020-05-02 22:04:32 +02:00 |
| Sarah Hoffmann | 51f6db2e9c | properly escape class parameter. The class parameter was used as is, allowing for potential SQL injection via the API. Thanks to @bladeswords for finding this. | 2020-05-02 21:58:16 +02:00 |
| Sarah Hoffmann | e4ecbef61e | prepare for 3.4.1 release | 2019-12-28 22:53:38 +01:00 |
| Sarah Hoffmann | 23dd49a5a2 | update osm2pgsql (exclude country and postcode from address tags) | 2019-12-28 22:41:33 +01:00 |
| Francesc Hervada-Sala | 0c85f88be8 | typo - fixes openstreetmap#1606 | 2019-12-28 22:41:19 +01:00 |
| Sarah Hoffmann | 7829a05002 | update osm2pgsql (deletion and address updates) | 2019-12-28 22:40:46 +01:00 |
4132 changed files with 39637 additions and 106801 deletions
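
Commit 51f6db2e9c in the list above fixes a potential SQL injection caused by using the `class` request parameter as is. The actual change is not part of this excerpt, so the following is only a minimal sketch of the general problem. It assumes Python's standard `sqlite3` module and a made-up `place` table, and it uses bound parameters rather than the escaping the commit message mentions.

```python
import sqlite3


def find_by_class_unsafe(conn: sqlite3.Connection, cls: str) -> list:
    # Vulnerable: the user-supplied value is pasted into the SQL text as is.
    query = f"SELECT name FROM place WHERE class = '{cls}'"
    return conn.execute(query).fetchall()


def find_by_class_safe(conn: sqlite3.Connection, cls: str) -> list:
    # Safer: the value is passed as a bound parameter and never parsed as SQL.
    return conn.execute("SELECT name FROM place WHERE class = ?", (cls,)).fetchall()


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical two-row table, only for demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE place (name TEXT, class TEXT)")
    conn.executemany("INSERT INTO place VALUES (?, ?)",
                     [("Cafe Uno", "amenity"), ("Main Street", "highway")])
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    malicious = "amenity' OR '1'='1"
    print(find_by_class_unsafe(db, malicious))  # the injected OR clause matches every row
    print(find_by_class_safe(db, malicious))    # no row has this literal class value
```

With the unsafe variant the crafted value changes the meaning of the query; with the bound parameter it is compared as a plain string.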


@@ -1,7 +0,0 @@
# https://github.com/codespell-project/codespell
[codespell]
skip = ./man/nominatim.1,data,./docs/styles.css,lib-php,module,munin,osm2pgsql,./test,./settings/*.lua,./settings/*.yaml,./settings/**/*.yaml,./settings/icu-rules,./nominatim/tokenizer/token_analysis/config_variants.py
# Need to be lowercase in the list
# Unter = Unter den Linden (an example address)
ignore-words-list = inout,unter


@@ -1,8 +0,0 @@
[flake8]
max-line-length = 100
max-doc-length = 100
extend-ignore =
# something == None constructs are needed for SQLAlchemy
E711
per-file-ignores =
__init__.py: F401
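
The comment in this flake8 configuration says that `== None` comparisons are needed for SQLAlchemy, which is why E711 is ignored. The reason is that `==` can be overloaded to build an SQL expression while `is` cannot. The sketch below illustrates this with a stand-in `Column` class; it is an assumption for demonstration purposes and not SQLAlchemy's real implementation.

```python
class Column:
    """Stand-in for an ORM column that builds SQL text instead of comparing values."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> str:  # type: ignore[override]
        # Operator overloading lets `col == None` produce an SQL expression.
        if other is None:
            return f"{self.name} IS NULL"
        return f"{self.name} = {other!r}"


housenumber = Column("housenumber")
print(housenumber == None)   # noqa: E711 - builds the expression "housenumber IS NULL"
print(housenumber is None)   # plain identity test, cannot be overloaded: False
```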

.github/FUNDING.yml

@@ -1,2 +0,0 @@
github: lonvia
custom: "https://nominatim.org/funding/"


@@ -1,7 +0,0 @@
contact_links:
- name: Nominatim Discussions
url: https://github.com/osm-search/Nominatim/discussions
about: Ask questions, get support, share ideas and discuss with community members.
- name: Discussions about OpenStreetMap data
url: https://community.openstreetmap.org/
about: Ask questions about the data used by Nominatim and discuss with the OSM community.


@@ -1,22 +0,0 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
<!-- Before opening a new feature request, please search through the open issues to check that your request hasn't been reported already. -->
**Is your feature request related to a problem? Please describe.**
<!-- A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] -->
**Describe the solution you'd like**
<!-- A clear and concise description of what you want to happen. -->
**Describe alternatives you've considered**
<!-- A clear and concise description of any alternative solutions or features you've considered. -->
**Additional context**
<!-- Add any other context or screenshots about the feature request here. -->


@@ -1,39 +0,0 @@
---
name: Report issues with search results
about: You have searched something with Nominatim and did not get the expected result.
title: ''
labels: ''
assignees: ''
---
<!-- Note: this template is for reporting problems with searching. If you have found an issue with the data, you need to report/fix the issue directly in OpenStreetMap. See https://www.openstreetmap.org/fixthemap for details. -->
## What did you search for?
<!-- Please try to provide a link to your search. You can go to https://nominatim.openstreetmap.org and repeat your search there. If you originally found the issue somewhere else, please tell us what software/website you were using. -->
## What result did you get?
## What result did you expect?
**When the result is in the right place and just named wrongly:**
<!-- Please tell us the display name you expected. -->
**When the result is missing completely:**
<!-- Make sure that the data you are looking for is in OpenStreetMap. Provide a link to the OpenStreetMap object or if you cannot get it, a link to the map on https://openstreetmap.org where you expect the result to be.
To get the link to the OSM object, you can try the following:
* Go to [https://openstreetmap.org](https://openstreetmap.org).
* Move to the area of the map where you expect the result and then zoom in as much as possible.
* Click on the question mark on the right side of the map. You get a question cursor. Use it to click on the map where your object is located.
* Find the object of interest in the list that appears on the left side.
* Click on the object and report back the URL that the browser shows.
-->
## Further details
<!-- Anything else we should know about the search. Particularities with addresses in the area etc. -->


@@ -1,42 +0,0 @@
---
name: Report problems with the software
about: You have your own installation of Nominatim and found a bug.
title: ''
labels: ''
assignees: ''
---
<!-- Note: if you are installing Nominatim through a docker image, you should report issues with the installation process with the docker repository first.
Do not send screen shots! Copy any console output directly into the issue.
-->
**Describe the bug**
<!-- A clear and concise description of what the bug is.-->
**To Reproduce**
<!-- Please describe what you did to get to the issue. -->
**Software Environment (please complete the following information):**
- Nominatim version:
- Postgresql version:
- Postgis version:
- OS:
**Hardware Configuration (please complete the following information):**
- RAM:
- number of CPUs:
- type and size of disks:
**Postgresql Configuration:**
<!-- List any configuration items you changed in your postgresql configuration. -->
**Nominatim Configuration:**
<!-- List the contents of your customized `.env` file. -->
**Additional context**
<!-- Add any other context about the problem here. -->


@@ -1,44 +0,0 @@
name: 'Build Nominatim'
inputs:
dependencies:
description: 'Where to install dependencies from (pip/apt)'
required: false
default: 'pip'
runs:
using: "composite"
steps:
- name: Clean out the disk
run: |
sudo rm -rf /opt/hostedtoolcache/go /opt/hostedtoolcache/CodeQL /usr/lib/jvm /usr/local/share/chromium /usr/local/lib/android
df -h
shell: bash
- name: Install general prerequisites
run: |
sudo apt-get install -y -qq libspatialite-dev libsqlite3-mod-spatialite libicu-dev virtualenv python3-dev osm2pgsql
shell: bash
- name: Install prerequisites from apt
run: |
sudo apt-get install -y -qq python3-icu python3-datrie python3-jinja2 python3-psutil python3-dotenv python3-yaml python3-sqlalchemy python3-psycopg python3-asyncpg
shell: bash
if: inputs.dependencies == 'apt'
- name: Setup virtual environment (for pip)
run: |
virtualenv venv
./venv/bin/pip install -U pip
shell: bash
if: inputs.dependencies == 'pip'
- name: Setup virtual environment (for apt)
run: |
virtualenv venv --system-site-packages
shell: bash
if: inputs.dependencies == 'apt'
- name: Build nominatim
run: ./venv/bin/pip install Nominatim/packaging/nominatim-{api,db}
shell: bash


@@ -1,45 +0,0 @@
name: 'Setup Postgresql and Postgis'
inputs:
postgresql-version:
description: 'Version of PostgreSQL to install'
required: true
runs:
using: "composite"
steps:
- name: Remove existing PostgreSQL
run: |
sudo apt-get purge -yq postgresql*
sudo apt install curl ca-certificates gnupg
curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/apt.postgresql.org.gpg >/dev/null
sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt-get update -qq
shell: bash
- name: Install PostgreSQL
run: |
sudo apt-get install -y -qq --no-install-suggests --no-install-recommends postgresql-client-${PGVER} postgresql-${PGVER}-postgis-3 postgresql-${PGVER}-postgis-3-scripts postgresql-contrib-${PGVER} postgresql-${PGVER}
shell: bash
env:
PGVER: ${{ inputs.postgresql-version }}
- name: Adapt postgresql configuration
run: |
echo 'fsync = off' | sudo tee /etc/postgresql/${PGVER}/main/conf.d/local.conf
echo 'synchronous_commit = off' | sudo tee -a /etc/postgresql/${PGVER}/main/conf.d/local.conf
echo 'full_page_writes = off' | sudo tee -a /etc/postgresql/${PGVER}/main/conf.d/local.conf
echo 'shared_buffers = 1GB' | sudo tee -a /etc/postgresql/${PGVER}/main/conf.d/local.conf
echo 'port = 5432' | sudo tee -a /etc/postgresql/${PGVER}/main/conf.d/local.conf
shell: bash
env:
PGVER: ${{ inputs.postgresql-version }}
- name: Setup database
run: |
sudo systemctl restart postgresql
sudo -u postgres createuser -S www-data
sudo -u postgres createuser -s runner
shell: bash


@@ -1,342 +0,0 @@
name: CI Tests
on: [ push, pull_request ]
jobs:
create-archive:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
submodules: true
- uses: actions/cache@v4
with:
path: |
data/country_osm_grid.sql.gz
key: nominatim-country-data-1
- name: Package tarball
run: |
if [ ! -f data/country_osm_grid.sql.gz ]; then
wget --no-verbose -O data/country_osm_grid.sql.gz https://www.nominatim.org/data/country_grid.sql.gz
fi
cd ..
tar czf nominatim-src.tar.bz2 Nominatim
mv nominatim-src.tar.bz2 Nominatim
- name: 'Upload Artifact'
uses: actions/upload-artifact@v4
with:
name: full-source
path: nominatim-src.tar.bz2
retention-days: 1
tests:
needs: create-archive
strategy:
matrix:
flavour: ["ubuntu-20", "ubuntu-24"]
include:
- flavour: ubuntu-20
ubuntu: 20
postgresql: 12
lua: '5.1'
dependencies: pip
- flavour: ubuntu-24
ubuntu: 24
postgresql: 17
lua: '5.3'
dependencies: apt
runs-on: ubuntu-${{ matrix.ubuntu }}.04
steps:
- uses: actions/download-artifact@v4
with:
name: full-source
- name: Unpack Nominatim
run: tar xf nominatim-src.tar.bz2
- uses: ./Nominatim/.github/actions/setup-postgresql
with:
postgresql-version: ${{ matrix.postgresql }}
- uses: ./Nominatim/.github/actions/build-nominatim
with:
dependencies: ${{ matrix.dependencies }}
- name: Compile osm2pgsql
run: |
sudo apt-get install -y -qq libboost-system-dev libboost-filesystem-dev libexpat1-dev zlib1g-dev libbz2-dev libpq-dev libproj-dev libicu-dev liblua${LUA_VERSION}-dev lua-dkjson nlohmann-json3-dev
mkdir osm2pgsql-build
cd osm2pgsql-build
git clone https://github.com/osm2pgsql-dev/osm2pgsql
mkdir build
cd build
cmake ../osm2pgsql
make
sudo make install
cd ../..
rm -rf osm2pgsql-build
if: matrix.ubuntu == '20'
env:
LUA_VERSION: ${{ matrix.lua }}
- name: Install test prerequisites
run: ./venv/bin/pip install behave==1.2.6
- name: Install test prerequisites (apt)
run: sudo apt-get install -y -qq python3-pytest python3-pytest-asyncio uvicorn python3-falcon python3-aiosqlite python3-pyosmium
if: matrix.dependencies == 'apt'
- name: Install test prerequisites (pip)
run: ./venv/bin/pip install pytest-asyncio falcon starlette asgi_lifespan aiosqlite osmium uvicorn
if: matrix.dependencies == 'pip'
- name: Install latest flake8
run: ./venv/bin/pip install -U flake8
- name: Python linting
run: ../venv/bin/python -m flake8 src
working-directory: Nominatim
- name: Install mypy and typechecking info
run: ./venv/bin/pip install -U mypy types-PyYAML types-jinja2 types-psutil types-requests types-ujson types-Pygments typing-extensions
if: matrix.dependencies == 'pip'
- name: Python static typechecking
run: ../venv/bin/python -m mypy --strict --python-version 3.8 src
working-directory: Nominatim
if: matrix.dependencies == 'pip'
- name: Python unit tests
run: ../venv/bin/python -m pytest test/python
working-directory: Nominatim
- name: BDD tests
run: |
../../../venv/bin/python -m behave -DREMOVE_TEMPLATE=1 --format=progress3
working-directory: Nominatim/test/bdd
install:
runs-on: ubuntu-latest
needs: create-archive
strategy:
matrix:
name: [Ubuntu-22, Ubuntu-24]
include:
- name: Ubuntu-22
image: "ubuntu:22.04"
ubuntu: 22
install_mode: install-apache
- name: Ubuntu-24
image: "ubuntu:24.04"
ubuntu: 24
install_mode: install-apache
container:
image: ${{ matrix.image }}
env:
LANG: en_US.UTF-8
defaults:
run:
shell: sudo -Hu nominatim bash --noprofile --norc -eo pipefail {0}
steps:
- name: Prepare container (Ubuntu)
run: |
export APT_LISTCHANGES_FRONTEND=none
export DEBIAN_FRONTEND=noninteractive
apt-get update -qq
apt-get install -y git sudo wget
ln -snf /usr/share/zoneinfo/$CONTAINER_TIMEZONE /etc/localtime && echo $CONTAINER_TIMEZONE > /etc/timezone
shell: bash
- name: Setup import user
run: |
useradd -m nominatim
echo 'nominatim ALL=(ALL:ALL) NOPASSWD: ALL' > /etc/sudoers.d/nominiatim
echo "/home/nominatim/Nominatim/vagrant/Install-on-${OS}.sh no $INSTALL_MODE" > /home/nominatim/vagrant.sh
shell: bash
env:
OS: ${{ matrix.name }}
INSTALL_MODE: ${{ matrix.install_mode }}
- uses: actions/download-artifact@v4
with:
name: full-source
path: /home/nominatim
- name: Install Nominatim
run: |
export USERNAME=nominatim
export USERHOME=/home/nominatim
export NOSYSTEMD=yes
export HAVE_SELINUX=no
tar xf nominatim-src.tar.bz2
. vagrant.sh
working-directory: /home/nominatim
- name: Prepare import environment
run: |
mv Nominatim/test/testdb/apidb-test-data.pbf test.pbf
rm -rf Nominatim
mkdir data-env-reverse
working-directory: /home/nominatim
- name: Add nominatim to path
run: |
sudo ln -s /home/nominatim/nominatim-venv/bin/nominatim /usr/local/bin/nominatim
- name: Need lua binary
run: |
sudo apt-get install -y lua5.4 lua-dkjson
- name: Print version
run: nominatim --version
working-directory: /home/nominatim/nominatim-project
- name: Print taginfo
run: lua ./nominatim-venv/lib/*/site-packages/nominatim_db/resources/lib-lua/taginfo.lua
working-directory: /home/nominatim
- name: Collect host OS information
run: nominatim admin --collect-os-info
working-directory: /home/nominatim/nominatim-project
- name: Import
run: nominatim import --osm-file ../test.pbf
working-directory: /home/nominatim/nominatim-project
- name: Import special phrases
run: nominatim special-phrases --import-from-wiki
working-directory: /home/nominatim/nominatim-project
- name: Check full import
run: nominatim admin --check-database
working-directory: /home/nominatim/nominatim-project
- name: Warm up database
run: nominatim admin --warm
working-directory: /home/nominatim/nominatim-project
- name: Install osmium
run: |
/home/nominatim/nominatim-venv/bin/pip install osmium
- name: Run update
run: |
nominatim replication --init
NOMINATIM_REPLICATION_MAX_DIFF=1 nominatim replication --once
working-directory: /home/nominatim/nominatim-project
- name: Clean up database
run: nominatim refresh --postcodes --word-tokens
working-directory: /home/nominatim/nominatim-project
- name: Run reverse-only import
run : |
echo 'NOMINATIM_DATABASE_DSN="pgsql:dbname=reverse"' >> .env
nominatim import --osm-file ../test.pbf --reverse-only --no-updates
working-directory: /home/nominatim/data-env-reverse
- name: Check reverse-only import
run: nominatim admin --check-database
working-directory: /home/nominatim/data-env-reverse
- name: Clean up database (reverse-only import)
run: nominatim refresh --postcodes --word-tokens
working-directory: /home/nominatim/nominatim-project
install-no-superuser:
runs-on: ubuntu-24.04
needs: create-archive
steps:
- uses: actions/download-artifact@v4
with:
name: full-source
- name: Unpack Nominatim
run: tar xf nominatim-src.tar.bz2
- uses: ./Nominatim/.github/actions/setup-postgresql
with:
postgresql-version: 16
- uses: ./Nominatim/.github/actions/build-nominatim
- name: Prepare import environment
run: |
mv Nominatim/test/testdb/apidb-test-data.pbf test.pbf
rm -rf Nominatim
- name: Prepare Database
run: |
./venv/bin/nominatim import --prepare-database
- name: Create import user
run: |
sudo -u postgres createuser osm-import
psql -d nominatim -c "ALTER USER \"osm-import\" WITH PASSWORD 'osm-import'"
psql -d nominatim -c 'GRANT CREATE ON SCHEMA public TO "osm-import"'
- name: Run import
run: |
NOMINATIM_DATABASE_DSN="pgsql:host=127.0.0.1;dbname=nominatim;user=osm-import;password=osm-import" ./venv/bin/nominatim import --continue import-from-file --osm-file test.pbf
- name: Check full import
run: ./venv/bin/nominatim admin --check-database
migrate:
runs-on: ubuntu-24.04
needs: create-archive
steps:
- uses: actions/download-artifact@v4
with:
name: full-source
- name: Unpack Nominatim
run: tar xf nominatim-src.tar.bz2
- uses: ./Nominatim/.github/actions/setup-postgresql
with:
postgresql-version: 17
- name: Install Python dependencies
run: |
sudo apt-get install --no-install-recommends virtualenv osm2pgsql
- name: Install Nominatim master version
run: |
virtualenv master
cd Nominatim
../master/bin/pip install packaging/nominatim-db
- name: Install Nominatim from pypi
run: |
virtualenv release
./release/bin/pip install nominatim-db
- name: Import Nominatim database using release
run: |
./release/bin/nominatim import --osm-file Nominatim/test/testdb/apidb-test-data.pbf
./release/bin/nominatim add-data --file Nominatim/test/testdb/additional_api_test.data.osm
- name: Migrate to master version
run: |
./master/bin/nominatim admin --migrate
./release/bin/nominatim add-data --file Nominatim/test/testdb/additional_api_test.data.osm
codespell:
runs-on: ubuntu-latest
steps:
- uses: codespell-project/actions-codespell@v2
with:
only_warn: 1

.gitignore

@@ -1,13 +1,11 @@
*.log
*.pyc
*.swp
docs/develop/*.png
site-html
build
dist
.coverage
settings/local.php
data/wiki_import.sql
data/wiki_specialphrases.sql
data/osmosischange.osc
.vagrant
data/country_osm_grid.sql.gz

.gitmodules

@@ -0,0 +1,4 @@
[submodule "osm2pgsql"]
path = osm2pgsql
url = https://github.com/openstreetmap/osm2pgsql.git
ignore = dirty


@@ -1,23 +0,0 @@
[mypy]
plugins = sqlalchemy.ext.mypy.plugin
[mypy-sanic_cors.*]
ignore_missing_imports = True
[mypy-icu.*]
ignore_missing_imports = True
[mypy-asyncpg.*]
ignore_missing_imports = True
[mypy-datrie.*]
ignore_missing_imports = True
[mypy-dotenv.*]
ignore_missing_imports = True
[mypy-falcon.*]
ignore_missing_imports = True
[mypy-geoalchemy2.*]
ignore_missing_imports = True

.travis.yml

@@ -0,0 +1,34 @@
---
sudo: required
dist: xenial
language: python
python:
- "3.6"
addons:
postgresql: "9.6"
git:
depth: 3
env:
- TEST_SUITE=tests
- TEST_SUITE=monaco
before_install:
- phpenv global 7.1
install:
- vagrant/install-on-travis-ci.sh
before_script:
- psql -U postgres -c "create extension postgis"
script:
- cd $TRAVIS_BUILD_DIR/
- if [[ $TEST_SUITE == "tests" ]]; then phpcs --report-width=120 . ; fi
- cd $TRAVIS_BUILD_DIR/test/php
- if [[ $TEST_SUITE == "tests" ]]; then /usr/bin/phpunit ./ ; fi
- cd $TRAVIS_BUILD_DIR/test/bdd
- # behave --format=progress3 api
- if [[ $TEST_SUITE == "tests" ]]; then behave -DREMOVE_TEMPLATE=1 --format=progress3 db ; fi
- if [[ $TEST_SUITE == "tests" ]]; then behave --format=progress3 osm2pgsql ; fi
- cd $TRAVIS_BUILD_DIR/build
- if [[ $TEST_SUITE == "monaco" ]]; then wget --no-verbose --output-document=../data/monaco.osm.pbf http://download.geofabrik.de/europe/monaco-latest.osm.pbf; fi
- if [[ $TEST_SUITE == "monaco" ]]; then /usr/bin/env php ./utils/setup.php --osm-file ../data/monaco.osm.pbf --osm2pgsql-cache 1000 --all 2>&1 | grep -v 'ETA (seconds)'; fi
- if [[ $TEST_SUITE == "monaco" ]]; then /usr/bin/env php ./utils/specialphrases.php --wiki-import | psql -d test_api_nominatim >/dev/null; fi
notifications:
email: false

AUTHORS

@@ -1,15 +1,15 @@
Nominatim was written by:
* Brian Quinion
* Sarah Hoffmann
* Marc Tobias Metten
Brian Quinion
Sarah Hoffmann
Marc Tobias Metten
* markigail
* AntoJvlt
* gemo1011
* darkshredder
markigail
gemo1011
IrlJidel
Frederik Ramm
and many more.
For a full list of contributors see the Git logs or visit
For a full list of contributors see
https://github.com/openstreetmap/Nominatim/graphs/contributors

CMakeLists.txt

@@ -0,0 +1,175 @@
#-----------------------------------------------------------------------------
#
# CMake Config
#
# Nominatim
#
#-----------------------------------------------------------------------------
cmake_minimum_required(VERSION 2.8 FATAL_ERROR)
list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake")
#-----------------------------------------------------------------------------
#
# Project version
#
#-----------------------------------------------------------------------------
project(nominatim)
set(NOMINATIM_VERSION_MAJOR 3)
set(NOMINATIM_VERSION_MINOR 4)
set(NOMINATIM_VERSION_PATCH 2)
set(NOMINATIM_VERSION "${NOMINATIM_VERSION_MAJOR}.${NOMINATIM_VERSION_MINOR}.${NOMINATIM_VERSION_PATCH}")
add_definitions(-DNOMINATIM_VERSION="${NOMINATIM_VERSION}")
#-----------------------------------------------------------------------------
#
# Find external dependencies
#
#-----------------------------------------------------------------------------
set(BUILD_TESTS off CACHE BOOL "Build test suite" FORCE)
set(WITH_LUA off CACHE BOOL "Build with lua support" FORCE)
set(ONLY_DOCS off CACHE BOOL "Build documentation only")
if (NOT ONLY_DOCS)
if (NOT EXISTS "${CMAKE_SOURCE_DIR}/osm2pgsql/CMakeLists.txt")
message(FATAL_ERROR "The osm2pgsql directory is empty.\
Did you forget to check out Nominatim recursively?\
\nTry updating submodules with: git submodule update --init")
endif()
add_subdirectory(osm2pgsql)
find_package(Threads REQUIRED)
unset(PostgreSQL_TYPE_INCLUDE_DIR CACHE)
set(PostgreSQL_TYPE_INCLUDE_DIR "/usr/include/")
find_package(PostgreSQL REQUIRED)
include_directories(${PostgreSQL_INCLUDE_DIRS})
link_directories(${PostgreSQL_LIBRARY_DIRS})
find_program(PYOSMIUM pyosmium-get-changes)
if (NOT EXISTS "${PYOSMIUM}")
set(PYOSMIUM_PATH "")
message(WARNING "pyosmium-get-changes not found (required for updates)")
else()
set(PYOSMIUM_PATH "${PYOSMIUM}")
message(STATUS "Using pyosmium-get-changes at ${PYOSMIUM_PATH}")
endif()
find_program(PG_CONFIG pg_config)
execute_process(COMMAND ${PG_CONFIG} --pgxs
OUTPUT_VARIABLE PGXS
OUTPUT_STRIP_TRAILING_WHITESPACE)
if (NOT EXISTS "${PGXS}")
message(FATAL_ERROR "Postgresql server package not found.")
endif()
find_package(ZLIB REQUIRED)
find_package(BZip2 REQUIRED)
find_package(LibXml2 REQUIRED)
include_directories(${LIBXML2_INCLUDE_DIR})
# Setting PHP binary variable as to command line (prevailing) or auto detect
if (NOT PHP_BIN)
find_program (PHP_BIN php)
endif()
# sanity check if PHP binary exists
if (NOT EXISTS ${PHP_BIN})
message(FATAL_ERROR "PHP binary not found. Install php or provide location with -DPHP_BIN=/path/php ")
endif()
message (STATUS "Using PHP binary " ${PHP_BIN})
endif()
#-----------------------------------------------------------------------------
#
# Setup settings and paths
#
#-----------------------------------------------------------------------------
set(WEBSITESCRIPTS
website/deletable.php
website/details.php
website/hierarchy.php
website/lookup.php
website/polygons.php
website/reverse.php
website/search.php
website/status.php
)
set(CUSTOMSCRIPTS
utils/country_languages.php
utils/importWikipedia.php
utils/export.php
utils/query.php
utils/setup.php
utils/specialphrases.php
utils/update.php
utils/warm.php
)
foreach (script_source ${CUSTOMSCRIPTS})
configure_file(${PROJECT_SOURCE_DIR}/cmake/script.tmpl
${PROJECT_BINARY_DIR}/${script_source})
endforeach()
foreach (script_source ${WEBSITESCRIPTS})
configure_file(${PROJECT_SOURCE_DIR}/cmake/website.tmpl
${PROJECT_BINARY_DIR}/${script_source})
endforeach()
configure_file(${PROJECT_SOURCE_DIR}/settings/defaults.php
${PROJECT_BINARY_DIR}/settings/settings.php)
set(WEBPATHS css images js)
foreach (wp ${WEBPATHS})
execute_process(
COMMAND ln -sf ${PROJECT_SOURCE_DIR}/website/${wp} ${PROJECT_BINARY_DIR}/website/
)
endforeach()
#-----------------------------------------------------------------------------
#
# Tests
#
#-----------------------------------------------------------------------------
if (NOT ONLY_DOCS)
include(CTest)
set(TEST_BDD db osm2pgsql api)
foreach (test ${TEST_BDD})
add_test(NAME bdd_${test}
COMMAND lettuce features/${test}
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}/tests)
set_tests_properties(bdd_${test}
PROPERTIES ENVIRONMENT "NOMINATIM_DIR=${PROJECT_BINARY_DIR}")
endforeach()
add_test(NAME php
COMMAND phpunit ./
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}/tests-php)
endif()
#-----------------------------------------------------------------------------
if (NOT ONLY_DOCS)
add_subdirectory(module)
add_subdirectory(nominatim)
endif()
add_subdirectory(docs)
#-----------------------------------------------------------------------------


@@ -7,6 +7,38 @@ Please always open a separate issue for each problem. In particular, do
not add your bugs to closed issues. They may look similar to you but
often are completely different from the maintainer's point of view.
### When Reporting Bad Search Results...
Please make sure to add the following information:
* the URL of the query that produces the bad result
* the result you are getting
* the expected result, preferably a link to the OSM object you want to find,
otherwise an address that is as precise as possible
To get the link to the OSM object, you can try the following:
* go to https://openstreetmap.org
* zoom to the area of the map where you expect the result and
zoom in as much as possible
* click on the question mark on the right side of the map,
then with the question cursor on the map where your object is located
* find the object of interest in the list that appears on the left side
* click on the object and report the URL back that the browser shows
### When Reporting Problems with your Installation...
Please add the following information to your issue:
* hardware configuration: RAM size, CPUs, kind and size of disks
* Operating system (also mention if you are running on a cloud service)
* Postgres and Postgis version
* list of settings you changed in your Postgres configuration
* Nominatim version (release version or,
if you run from the git repo, the output of `git rev-parse HEAD`)
* (if applicable) exact command line of the command that was causing the issue
## Workflow for Pull Requests
We love to get pull requests from you. We operate the "Fork & Pull" model
@@ -30,26 +62,13 @@ feature pull requests. If you plan to make larger changes, please open
an issue first or comment on the appropriate issue already existing so
that duplicate work can be avoided.
### Using AI-assisted code generators
PRs that include AI-generated content, whether in code, in the PR
description or in documentation, need to
1. clearly mark the AI-generated sections as such, for example, by
mentioning all use of AI in the PR description, and
2. include proof that you have run the generated code on an actual
installation of Nominatim. Adding and executing tests will not be
sufficient. You need to show that the code actually solves the problem
the PR claims to solve.
## Coding style
Nominatim historically hasn't followed a particular coding style but we
are in process of consolidating the style. The following rules apply:
* Python code uses the official Python style
* indentation
* indention
* SQL uses 2 spaces
* all other file types use 4 spaces
* [BSD style](https://en.wikipedia.org/wiki/Indent_style#Allman_style) for braces
@@ -59,57 +78,25 @@ are in process of consolidating the style. The following rules apply:
* no spaces after opening and before closing bracket
* leave out space between a function name and bracket
but add one between control statement(if, while, etc.) and bracket
* for PHP variables use CamelCase with a prefixing letter indicating the type
(i - integer, f - float, a - array, s - string, o - object)
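
The type-prefix rule for PHP variables can be checked mechanically. The following is a minimal sketch with a hypothetical helper name. The regular expression is an assumption about how to read the rule (one of the listed prefix letters followed by a capitalised CamelCase name) and is not taken from the project's PHPCS ruleset.

```python
import re

# Prefix letters from the rule above: i, f, a, s, o.
TYPE_PREFIXES = {"i": "integer", "f": "float", "a": "array", "s": "string", "o": "object"}
NAME_PATTERN = re.compile(r"^\$([ifaso])[A-Z][A-Za-z0-9]*$")


def declared_type(variable: str):
    """Return the type implied by the prefix, or None if the name breaks the convention."""
    match = NAME_PATTERN.match(variable)
    return TYPE_PREFIXES[match.group(1)] if match else None
```

For example, `declared_type("$sQuery")` yields `"string"`, while `declared_type("$query")` yields `None` because the name carries no type prefix.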
The coding style is enforced with flake8. It can be tested with:
The coding style is enforced with PHPCS and can be tested with:
```
make lint
phpcs --report-width=120 --colors .
```
## Testing
Before submitting a pull request make sure that the tests pass:
Before submitting a pull request make sure that the following tests pass:
```
make tests
cd test/bdd
behave -DBUILDDIR=<builddir> db osm2pgsql
```
## Releases
Nominatim follows semantic versioning. Major releases are done for large changes
that require (or at least strongly recommend) a reimport of the databases.
Minor releases can usually be applied to existing databases. Patch releases
contain bug fixes only and are released from a separate branch where the
relevant changes are cherry-picked from the master branch.
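
The release policy above can be sketched as a small helper that classifies the step between two version strings. The function names and the returned hints are illustrative assumptions and not part of Nominatim.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def upgrade_hint(old: str, new: str) -> str:
    """Describe what an upgrade from `old` to `new` implies under the policy above."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        return "not an upgrade"
    if new_v[0] != old_v[0]:
        return "major: reimport of the database required or strongly recommended"
    if new_v[1] != old_v[1]:
        return "minor: can usually be applied to an existing database"
    return "patch: bug fixes only"
```

Under this reading, going from 3.4.1 to 3.4.2 (the two releases prepared in the commits compared here) is a patch step.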
Checklist for releases:
* [ ] increase versions in
* `src/nominatim_api/version.py`
* `src/nominatim_db/version.py`
* [ ] update `ChangeLog` (copy information from patch releases from release branch)
* [ ] complete `docs/admin/Migration.md`
* [ ] update EOL dates in `SECURITY.md`
* [ ] commit and make sure CI tests pass
* [ ] update OSMF production repo and release new version -post1 there
* [ ] test migration
* download, build and import previous version
* migrate using master version
* run updates using master version
* [ ] prepare tarball:
* `git clone https://github.com/osm-search/Nominatim` (switch to right branch!)
* `rm -r .git*`
* copy country data into `data/`
* add version to base directory and package
* [ ] upload tarball to https://nominatim.org
* [ ] prepare documentation
* check out new docs branch
* change git checkout instructions to tarball download instructions or adapt version on existing ones
* build documentation and copy to https://github.com/osm-search/nominatim-org-site
* add new version to history
* [ ] check release tarball
* download tarball as per new documentation instructions
* compile and import Nominatim
* run `nominatim --version` to confirm correct version
* [ ] tag new release and add a release on github.com
* [ ] build pip packages and upload to pypi
```
cd test/php
phpunit ./
```

COPYING

@@ -1,232 +1,339 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright © 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
Preamble
The GNU General Public License is a free, copyleft license for software and other kinds of works.
The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.
Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it.
For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions.
Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users.
Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and modification follow.
TERMS AND CONDITIONS
0. Definitions.
“This License” refers to version 3 of the GNU General Public License.
“Copyright” also means copyright-like laws that apply to other kinds of works, such as semiconductor masks.
“The Program” refers to any copyrightable work licensed under this License. Each licensee is addressed as “you”. “Licensees” and “recipients” may be individuals or organizations.
To “modify” a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a “modified version” of the earlier work or a work “based on” the earlier work.
A “covered work” means either the unmodified Program or a work based on the Program.
To “propagate” a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well.
To “convey” a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays “Appropriate Legal Notices” to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion.
1. Source Code.
The “source code” for a work means the preferred form of the work for making modifications to it. “Object code” means any non-source form of a work.
A “Standard Interface” means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language.
The “System Libraries” of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A “Major Component”, in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it.
The “Corresponding Source” for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work.
The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source.
The Corresponding Source for a work in source code form is that same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures.
When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified it, and giving a relevant date.
b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to “keep intact all notices”.
c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so.
A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an “aggregate” if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways:
a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b.
d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d.
A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work.
A “User Product” is either (1) a “consumer product”, which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, “normally used” refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product.
“Installation Information” for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made.
If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM).
The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying.
7. Additional Terms.
“Additional permissions” are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or authors of the material; or
e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors.
All other non-permissive additional terms are considered “further restrictions” within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11).
However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.
Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or run a copy of the Program. Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License.
An “entity transaction” is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it.
11. Patents.
A “contributor” is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. The work thus licensed is called the contributor's “contributor version”.
A contributor's “essential patent claims” are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, “control” includes the right to grant patent sublicenses in a manner consistent with the requirements of this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version.
In the following three paragraphs, a “patent license” is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To “grant” such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party.
If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. “Knowingly relying” means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it.
A patent license is “discriminatory” if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program.
13. Use with the GNU Affero General Public License.
Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License “or any later version” applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation.
If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program.
Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the “copyright” line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Also add information on how to contact you by electronic and paper mail.
If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode:
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
<program> Copyright (C) <year> <name of author>
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details.
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an “about box”.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or school, if any, to sign a “copyright disclaimer” for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see <https://www.gnu.org/licenses/>.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read <https://www.gnu.org/philosophy/why-not-lgpl.html>.
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.

401
ChangeLog
View File

@@ -1,394 +1,11 @@
5.0.0
* increase required versions for PostgreSQL (12+), PostGIS (3.0+)
* remove installation via cmake and debundle osm2pgsql
* remove deprecated PHP frontend
* remove deprecated legacy tokenizer
* add configurable pre-processing of queries
* add query pre-processor to split up Japanese addresses
* rewrite of osm2pgsql style implementation
(also adds support for osm2pgsql-themepark)
* reduce the number of SQL queries needed to complete a 'lookup' call
* improve computation of centroid for lines with only two points
* improve bbox output for postcode areas
* improve result order by returning the largest object when other things are
equal
* add fallback for reverse geocoding to default country tables
* exclude postcode areas from reverse geocoding
* disable search endpoint when database is reverse-only (regression)
* minor performance improvements to area split algorithm
* switch table and index creation to use autocommit mode to avoid deadlocks
* drop overly long ways during import
* restrict automatic migrations to versions 4.3+
* switch linting from pylint to flake8
* switch tests to use a wikimedia test file in the new CSV style
* various fixes and improvements to documentation
4.5.0
* allow building Nominatim as a pip package
* make osm2pgsql building optional
* switch importer to psycopg3
* allow output format of web search to be customized in self-installations
* look up potential postcode areas for postcode results
* add word usage statistics for address terms
* implement more light-weight CSV format for wiki importance tables
* rewrite SQL for place search to use window functions
* increase search radius when filtering by postcode
* prefer POI points over POI areas
* reintroduce full terms for address terms in search_name table
* reindex postcodes when their parent is deleted
* indexing: precompute counts of affected rows
* ensure consistent country assignments for overlapping countries
* make Nominatim[Async]API context manager to ensure proper calling of
close()
* make usage of project dir optional for library
* drop interpolations when no parent can be found
* style tweaks to reflect OSM usage (man_made, highway and others)
* deprecation of: bundled osm2pgsql, legacy tokenizer, PHP frontend
* make documentation buildable without CMake
* various fixes and improvements to documentation
4.4.1
* fix geocodejson output: admin level output should only print boundaries
* updating: restrict invalidation of child objects on large street features
* restrict valid interpolation house numbers to 0-999999
* fix import error when SQLAlchemy 1.4 and psycopg3 are installed
* various typo fixes in the documentation
4.4.0
* add export to SQLite database and SQLite support for the frontend
* switch to Python frontend as the default frontend
* update to osm2pgsql 1.11.0
* add support for new osm2pgsql middle table format
* simplify geometry for large polygon objects not used in addresses
* various performance tweaks for search in Python frontend
* fix regression in search with categories where it was confused with near
search
* partially roll back use of SQLAlchemy lambda statements due to bugs
in SQLAlchemy
* fix handling of timezones for timestamps from the database
* fix handling of full address searches in connection with a viewbox
* fix postcode computation of highway areas
* fix handling of timeout errors for Python <= 3.10
* fix address computation for postcode areas
* fix variable shadowing in osm2pgsql flex script, causing bugs with LuaJIT
* make sure extratags are always null when empty
* reduce importance of places without wikipedia reference
* improve performance of word count computations
* drop support for wikipedia tags with full URLs
* replace get_addressdata() SQL implementation with a Python function
* improve display name for non-address features
* fix postcode validation for postcodes with country code
(thanks @pawel-wroniszewski)
* add possibility to run imports without superuser database rights
(thanks @robbe-haesendonck)
* new CLI command for cleaning deleted relations (thanks @lujoh)
* add check for database version in the CLI check command
* updates to import styles ignoring more unused objects
* various typo fixes (thanks @kumarUjjawal)
4.3.2
* fix potential SQL injection issue for 'nominatim admin --collect-os-info'
* PHP frontend: fix on-the-fly lookup of postcode areas near boundaries
* Python frontend: improve handling of viewbox
* Python frontend: correct deployment instructions
4.3.1
* reintroduce result rematching
* improve search of multi-part names
* fix accidentally switched meaning of --reverse-only and --search-only in
warm command
4.3.0
* fix failing importance recalculation command
* fix merging of linked names into unnamed boundaries
* fix a number of corner cases with interpolation splitting resulting in
invalid geometries
* fix failure in website generation when password contains curly brackets
* fix broken use of ST_Project in PostGIS 3.4
* new NOMINATIM_SEARCH_WITHIN_COUNTRIES setting to restrict reverse lookups
to known countries (thanks @alfmarcua)
* allow negative OSM IDs (thanks @alfmarcua)
* disallow import of Tiger data in a frozen DB
* avoid UPDATE to change settings to be compatible with r/o DBs (thanks @t-tomek)
* update bundled osm2pgsql to 1.9.2
* reorganise osm2pgsql flex style and make it the default
* exclude names ending in :wikipedia from indexing
* no longer accept comma as a list separator in name tags
* process forward dependencies on update to catch updates in geometries
of ways and relations
* fix handling of isolated silent letters during transliteration
* no longer assign postcodes to large linear features like rivers
* introduce nominatim.paths module for finding data and libraries
* documentation layout changed to material theme
* new documentation section for library
* various smaller fixes to existing documentation
(thanks @woodpeck, @bloom256, @biswajit-k)
* updates to vagrant install scripts, drop support for Ubuntu 18
(thanks @n-timofeev)
* removed obsolete configuration variables from env.defaults
* add script for generating a taginfo description (thanks @biswajit-k)
* modernize Python code around BDD test and add testing of Python frontend
* lots of new BDD tests for API output
4.2.3
* fix deletion handling for 'nominatim add-data'
* adapt place_force_delete() to new deletion handling
* flex style: avoid dropping of postcode areas
* fix update errors on address interpolation handling
4.2.2
* extend flex-style library to fully support all default styles
* fix handling of Hebrew aleph
* do not assign postcodes to rivers
* fix string matching in PHP code
* update osm2pgsql (various updates to flex)
* fix slow query when deleting places on update
* fix CLI details query
* fix recalculation of importance values
* fix polygon simplification in reverse results
* add class/type information to reverse geocodejson result
* minor improvements to default tokenizer configuration
* various smaller fixes to documentation
4.2.1
* fix XSS vulnerability in debug view
4.2.0
* add experimental support for osm2pgsql flex style
* introduce secondary importance value to be retrieved from a raster data file
(currently still unused, to replace address importance, thanks to @tareqpi)
* add new report tool `nominatim admin --collect-os-info`
(thanks @micahcochran, @tareqpi)
* reorganise index to improve lookup performance and size
* run index creation after import in parallel
* run ANALYZE more selectively to speed up continuation of indexing
* fix crash on update when addr:interpolation receives an illegal value
* fix minimum number of retrieved results to be at least 10
* fix search for combinations of special term + name (e.g. Hotel Bellevue)
* do not return interpolations without a parent street on reverse search
* improve invalidation of linked places on updates
* fix address parsing for interpolation lines
* make sure socket timeouts are respected during replication
(working around a bug in some versions of pyosmium)
* update bundled osm2pgsql to 1.7.1
* add support for PostgreSQL 15
* typing fixes to work with latest type annotations from typeshed
* smaller improvements to documentation (thanks to @mausch)
4.1.1
* fix XSS vulnerability in debug view
4.1.0
* switch to ICU tokenizer as default
* add housenumber normalization and support optional spaces during search
* add postcode format checking and support optional spaces during search
* add function for cleaning housenumbers in word table
* add updates/deletion of country names imported from OSM
* linked places no longer overwrite names from a place permanently
* move default country name configuration into yaml file (thanks @tareqpi)
* more compact layout for interpolation and TIGER tables
* introduce mutations to ICU tokenizer (used for German umlauts)
* support reinitializing a full project directory with refresh --website
* fix various issues with linked places on updates
* add support for external sanitizers and token analyzers
* add CLI commands for forced indexing
* add CLI command for version report
* add offline import mode
* change geocodejson to return a feature class in the 'type' field
* add ISO3166-2 to address output (thanks @I70l0teN4ik)
* improve parsing and matching of addr: tags
* support relations as street members of associatedStreet
* better ranking for address results from TIGER data
* adapt rank classification to changed tag usage in OSM
* update bundled osm2pgsql to 1.6.0
* add typing information to Python code
* improve unit test coverage
* reorganise and speed up code for BDD tests, drop support for scenes
* move PHP unit tests to PHPUnit 9.5
* extensive typo fixes in documentation (thanks @woodpeck, @StephanGeorg,
@amandasaurus, @nslxndr, @stefkiourk, @Luflosi, @kianmeng)
* drop official support for installation on CentOS
* add installation instructions for Ubuntu 22.04
* add support for PHP8
* add setup instructions for updates and systemd
* drop support for PostgreSQL 9.5
4.0.2
* fix XSS vulnerability in debug view
4.0.1
* fix initialisation error in replication script
* ICU tokenizer: avoid any special characters in word tokens
* better error message when API php script does not exist
* fix quoting of house numbers in SQL queries
* small fixes and improvements in search query parsing
* add documentation for moving the database to a different machine
4.0.0
* refactor name token computation and introduce ICU tokenizer
* name processing now happens in the indexer outside the DB
* reorganizes abbreviation handling and moves it to the indexing phases
* adds preprocessing of names
* add country-specific ranking for Spain, Slovakia
* partially switch to using SP-GIST indexes
* better updating of dependent addresses for name changes in streets
* remove unused/broken tables for external housenumbers
* move external postcodes to CSV format and no longer save them in tables
(adds support for postcodes for arbitrary countries)
* remove postcode helper entries from placex (thanks @AntoJvlt)
* change required format for TIGER data to CSV
* move configuration of default languages from wiki into config file
* expect customized configuration files in project directory by default
* disable search API for reverse-only import (thanks @darkshredder)
* port most of maintenance/import code to Python and remove PHP utils
* add catch-up mode for replication
* add updating of special phrases (thanks @AntoJvlt)
* add support for special phrases in CSV files (thanks @AntoJvlt)
* switch to case-independent matching between place and boundary names
* remove disabling of reverse query parsing
* minor tweaks to search algorithm to avoid more false positives
* major overhaul of the administrator and developer documentation
* add security disclosure policy
* add testing of installation scripts via CI
* drop support for Python < 3.6 and PostgreSQL < 9.5
3.7.3
* fix XSS vulnerability in debug view
3.7.2
* fix database check for reverse-only imports
* do not error out in status API result when import date is missing
* add array_key_last function for PHP < 7.3 (thanks to @woodpeck)
* fix more url when server name is unknown (thanks to @mogita)
* commit changes to replication log table
3.7.1
* fix smaller issues with special phrases import (thanks @AntoJvlt)
* add index to speed up continued indexing during import
* fix index on location_property_tiger(parent_place_id) (thanks @changpingc)
* make sure Python code is backward-compatible with Python 3.5
* various documentation fixes
3.7.0
* switch to dotenv for configuration file
* introduce 'make install' (reorganising most of the code)
* introduce nominatim tool as replacement for various php scripts
* introduce project directories and allow multiple installations from same build
* clean up BDD tests: drop nose, reorganise step code
* simplify test database for API BDD tests and autoinstall database
* port most of the code for command-line tools to Python
(thanks to @darkshredder and @AntoJvlt)
* add tests for all tooling
* replace pyosmium-get-changes with custom internal implementation using
pyosmium
* improve search for queries with housenumber and partial terms
* add database versioning
* use jinja2 for preprocessing SQL files
* introduce automatic migrations
* reverse: fix preference of interpolations over housenumbers
* parallelize indexing of postcodes
* add non-key indexes to speed up housenumber + street searches
* switch housenumber field in placex to save transliterated names
3.6.0
* add full support for searching by and displaying of addr:* tags
* improve address output for large-area objects
* better use of country names from OSM data for search and display
* better debug output for reverse call
* add support for addr:place links without a place equivalent in OSM
* improve finding postcodes with normalisation artefacts
* batch object to index for rank 30, avoiding a wrap-around of transaction
IDs in PostgreSQL
* introduce dynamic address rank computation for administrative boundaries
depending on linked objects and their place in the admin level hierarchy
* add country-specific address ranking for Indonesia, Russia, Belgium and
the Netherlands (thanks @hendrikmoree)
* make sure wikidata/wikipedia tags are imported for all styles
* make POIs searchable by name and housenumber (thanks @joy-yyd)
* reverse geocoding now ignores places without an address rank (rivers etc.)
* installation of a webserver is no longer mandatory, for development
use the PHP internal webserver via 'make serve'
* reduce the influence of place nodes in addresses
* drop support for the unspecific is_in tag
* various minor tweaks to supplied styles
* move HTML web frontend into its own project
* move scripts for processing external data sources into separate directories
* introduce separate configuration for website (thanks @krahulreddy)
* update documentation, in particular, clean up development docs
* update osm2pgsql to 1.4.0
3.5.2
* ensure that wikipedia tags are imported for all styles
* reinstate verbosity for indexing during updates
* make house number reappear in display name on named POIs
* introduce batch processing in indexer to avoid transaction ID overrun
* increase splitting for large geometries to improve indexing speed
* remove deprecated get_magic_quotes_gpc() function
* make sure that all postcodes have an entry in word and are thus searchable
* remove use of ST_Covers in conjunction with ST_Intersects,
causes bad query planning and slow updates in Postgis3
* update osm2pgsql
3.5.1
* disable jit and parallel processing in PostgreSQL for osm2pgsql
* update libosmium to 2.15.6 (fixes an issue with processing hanging
on large multipolygons)
3.5.0
* structured select on HTML search page
* new PHP Nominatim\Shell class to wrap shell escaping
* remove polygon parameter from all API calls
* improve handling of postcode areas
* reorganise place linking algorithm, now using wikidata tag as well
* remove linkees from search_name and larger_area tables
* introduce country-specific address ranks
* reorganise rank address computation
* cleanup of partition function
* improve parenting for large POIs
* add support for PostgreSQL 12 and PostGIS 3
* add earlier cleanup when --drop is given, to reduce memory usage
* remove use of place_id in URLs
* replace C nominatim indexer with a simpler Python implementation
* split up the huge sql/functions.sql file
* move osm2pgsql tests to osm2pgsql
* add new extratags style which imports all tags from OSM
* add new script for checking the import after completion
* update osm2pgsql, reducing memory usage
* use new wikipedia importance and add processing of wikidata tags
* add search form for details page
* use ExtraDataPath for country_grid table
* remove short_name from list of names to be displayed
* split up CMakeFile, so that all parts can be built separately
* update installation instructions for CentOS and Ubuntu
* add script for importing/updating multiple country extracts
* various documentation improvements
3.4.2
* fix security bug in /details endpoint where user input was not
properly sanitized
* security fix: fix possible SQL injection via details API
3.4.1
* update osm2pgsql to fix hangs during updates and lost address numbers
during updates
* update osm2pgsql
* move deletion to copy thread (fixes deadlock in updates)
* fix filtering where valid address objects got dropped
* fix typo in import styles
3.4.0
@@ -397,7 +14,7 @@
* exclude postcode ranges separated by colon from centre point calculation
* update osm2pgsql, better handling of imports without flatnode file
* switch to more efficient algorithm for word set computation
* use only boundaries for country and state parts of addresses
* use only boundries for country and state parts of addresses
* improve updates of addresses with housenumbers and interpolations
* remove country from place_addressline table and use country_code instead
* optimise indexes on search_name partition tables
@@ -436,7 +53,7 @@
* complete rewrite of reverse search algorithm
* add new geojson and geocodejson output formats
* add simple export script to export addresses to CSV
* add simple export script to exprot addresses to CSV
* remove is_in terms from address computation
* remove unused search_name_country tables
* various smaller fixes to query parsing
@@ -501,7 +118,7 @@
* move installation documentation into this repo
* add self-documenting vagrant scripts
* remove --create-website, recommend to use website directory in build
* add accessor functions for URL parameters and improve error checking
* add accessor functions for URL parameters and improve erro checking
* remove IP blocking and rate-limiting code
* enable CI via travis
* reformatting for more consistent coding style
@@ -512,7 +129,7 @@
* update to refactored osm2pgsql which use libosmium based types
* switch from osmosis to pyosmium for updates
* be more strict when matching against special search terms
* handle postcode entries with multiple values correctly
* handle postcode entries with mutliple values correctly
2.5
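
The 3.4.2 entry in the changelog above describes a fix for a possible SQL injection through the `class` parameter of the details API. The following is a minimal sketch of the general defence, not the actual Nominatim fix (the commit message says the parameter is now properly escaped; this sketch uses bound parameters instead). It uses Python's standard `sqlite3` module so that it runs anywhere; the `places` table, its columns, and the function names `make_demo_db` and `find_by_class` are hypothetical and chosen only for illustration.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical 'places' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (name TEXT, class TEXT)")
    conn.executemany(
        "INSERT INTO places VALUES (?, ?)",
        [("Town Hall", "amenity"), ("Main Street", "highway")],
    )
    return conn


def find_by_class(conn, place_class):
    """Look up place names by class.

    The user-supplied value is handed to the driver as a bound
    parameter, so it is never interpolated into the SQL string.
    """
    cur = conn.execute(
        "SELECT name FROM places WHERE class = ? ORDER BY name",
        (place_class,),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    db = make_demo_db()
    # A regular lookup returns the matching row.
    print(find_by_class(db, "amenity"))
    # An injection attempt is treated as a literal string and matches nothing.
    print(find_by_class(db, "amenity' OR '1'='1"))
```

With a bound parameter, the injection attempt is compared literally against the `class` column, so the quote characters cannot change the structure of the query.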


@@ -1,202 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@@ -1,339 +0,0 @@
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.


@@ -1,44 +0,0 @@
all:

# Building of wheels

build: clean-build build-db build-api

clean-build:
	rm -f dist/*

build-db:
	python3 -m build packaging/nominatim-db --outdir dist/

build-api:
	python3 -m build packaging/nominatim-api --outdir dist/

# Tests

tests: mypy lint pytest bdd

mypy:
	mypy --strict --python-version 3.8 src

pytest:
	pytest test/python

lint:
	flake8 src

bdd:
	cd test/bdd; behave -DREMOVE_TEMPLATE=1

# Documentation

doc:
	mkdocs build

serve-doc:
	mkdocs serve

manpage:
	argparse-manpage --pyfile man/create-manpage.py --function get_parser --project-name Nominatim --url https://nominatim.org > man/nominatim.1 --author 'the Nominatim developer community' --author-email info@nominatim.org

.PHONY: tests mypy pytest lint bdd build clean-build build-db build-api doc serve-doc manpage


@@ -1,4 +1,4 @@
[![Build Status](https://github.com/osm-search/Nominatim/workflows/CI%20Tests/badge.svg)](https://github.com/osm-search/Nominatim/actions?query=workflow%3A%22CI+Tests%22)
[![Build Status](https://travis-ci.org/openstreetmap/Nominatim.svg?branch=master)](https://travis-ci.org/openstreetmap/Nominatim)
Nominatim
=========
@@ -20,58 +20,44 @@ Installation
============
The latest stable release can be downloaded from https://nominatim.org.
There you can also find [installation instructions for the release](https://nominatim.org/release-docs/latest/admin/Installation), as well as an extensive [Troubleshooting/FAQ section](https://nominatim.org/release-docs/latest/admin/Faq/).
There you can also find [installation instructions for the release](https://nominatim.org/release-docs/latest/admin/Installation).
[Detailed installation instructions for current master](https://nominatim.org/release-docs/develop/admin/Installation)
can be found at nominatim.org as well.
Detailed installation instructions for the development version can be
found at [nominatim.org](https://nominatim.org/release-docs/develop/admin/Installation)
as well.
A quick summary of the necessary steps:
1. Create a Python virtualenv and install the packages:
1. Compile Nominatim:
python3 -m venv nominatim-venv
./nominatim-venv/bin/pip install packaging/nominatim-{api,db}
mkdir build
cd build
cmake ..
make
2. Create a project directory, get OSM data and import:
2. Get OSM data and import:
mkdir nominatim-project
cd nominatim-project
../nominatim-venv/bin/nominatim import --osm-file <your planet file>
./build/utils/setup.php --osm-file <your planet file> --all
3. Start the webserver:
./nominatim-venv/bin/pip install uvicorn falcon
../nominatim-venv/bin/nominatim serve
3. Point your webserver to the ./build/website directory.
License
=======
The Python source code is available under a GPL license version 3 or later.
The Lua configuration files for osm2pgsql are released under the
Apache License, Version 2.0. All other files are under a GPLv2 license.
The source code is available under a GPLv2 license.
Contributing
============
Contributions, bug reports and pull requests are welcome. When reporting a
bug, please use one of the
[issue templates](https://github.com/osm-search/Nominatim/issues/new/choose)
and make sure to provide all the information requested. If you are not
sure if you have really found a bug, please ask for help in the forums
first (see 'Questions' below).
Contributions are welcome. For details see [contribution guide](CONTRIBUTING.md).
For details on contributing, have a look at the
[contribution guide](CONTRIBUTING.md).
Both bug reports and pull requests are welcome.
Questions and help
==================
Mailing list
============
If you have questions about search results and the OpenStreetMap data
used in the search, use the [OSM Forum](https://community.openstreetmap.org/).
For questions, community help and discussions around the software and
your own installation of Nominatim, use the
[Github discussions forum](https://github.com/osm-search/Nominatim/discussions).
For questions you can join the geocoding mailing list, see
https://lists.openstreetmap.org/listinfo/geocoding


@@ -1,41 +0,0 @@
# Security Policy
## Supported Versions
All Nominatim releases receive security updates for two years.
The following table lists the end of support for all currently supported
versions.
| Version | End of support for security updates |
| ------- | ----------------------------------- |
| 5.0.x | 2027-02-06 |
| 4.5.x | 2026-09-12 |
| 4.4.x | 2026-03-07 |
| 4.3.x | 2025-09-07 |
## Reporting a Vulnerability
If you believe you have found an issue in Nominatim that has security
implications, please send a description of the issue to **security@nominatim.org**.
You will receive an acknowledgement of your mail within 3 work days, in which we
also notify you of the next steps.
## How we Disclose Security Issues
**The following section only applies to security issues found in released
versions. Issues that concern the master development branch only will be
fixed immediately on the branch with the corresponding PR containing the
description of the nature and severity of the issue.**
Patches for identified security issues are applied to all affected versions and
new minor versions are released. At the same time we release a statement at
the [Nominatim blog](https://nominatim.org/blog/) describing the nature of the
incident. Announcements will also be published at the
[geocoding mailing list](https://lists.openstreetmap.org/listinfo/geocoding).
## List of Previous Incidents
* 2023-11-20 - [SQL injection vulnerability](https://nominatim.org/2023/11/20/release-432.html)
* 2023-02-21 - [cross-site scripting vulnerability](https://nominatim.org/2023/02/21/release-421.html)
* 2020-05-04 - [SQL injection issue on /details endpoint](https://lists.openstreetmap.org/pipermail/geocoding/2020-May/002012.html)


@@ -1,6 +1,6 @@
# Install Nominatim in a virtual machine for development and testing
This document describes how you can install Nominatim inside a Ubuntu 24
This document describes how you can install Nominatim inside a Ubuntu 16
virtual machine on your desktop/laptop (host machine). The goal is to give
you a development environment to easily edit code and run the test suite
without affecting the rest of your system.
@@ -15,29 +15,36 @@ is.
2. [Vagrant](https://www.vagrantup.com/downloads.html)
3. Nominatim
3. Nominatim
git clone --recursive https://github.com/openstreetmap/Nominatim.git
If you forgot `--recursive`, you can later load the submodules using
git submodule init
git submodule update
git clone https://github.com/openstreetmap/Nominatim.git
## Installation
1. Start the virtual machine
vagrant up ubuntu24-nginx
vagrant up ubuntu
2. Log into the virtual machine
vagrant ssh ubuntu24-nginx
vagrant ssh ubuntu
3. Import a small country (Monaco)
See the FAQ on how to skip this step and point Nominatim to an existing database.
```
# inside the virtual machine:
cd nominatim-project
wget --no-verbose --output-document=monaco.osm.pbf http://download.geofabrik.de/europe/monaco-latest.osm.pbf
nominatim import --osm-file monaco.osm.pbf 2>&1 | tee monaco.$$.log
cd build
wget --no-verbose --output-document=/tmp/monaco.osm.pbf http://download.geofabrik.de/europe/monaco-latest.osm.pbf
./utils/setup.php --osm-file /tmp/monaco.osm.pbf --osm2pgsql-cache 1000 --all 2>&1 | tee monaco.$$.log
```
To repeat an import you'd need to delete the database first
@@ -49,49 +56,96 @@ is.
## Development
Vagrant maps the virtual machine's port 8089 to your host machine. Thus you can
see Nominatim in action on [localhost:8089](http://localhost:8089/nominatim/).
see Nominatim in action on [locahost:8089](http://localhost:8089/nominatim/).
You edit code on your host machine in any editor you like. There is no need to
restart any software: just refresh your browser window.
Use the functions of the `log()` object to create temporary debug output.
Add `&debug=1` to the URL to see the output.
Note that the webserver uses files from the /build directory. If you change
files in Nominatim/website or Nominatim/utils for example you first need to
copy them into the /build directory by running the `cmake` step from the
installation.
PHP errors are written to `/var/log/apache2/error.log`.
With `echo` and `var_dump()` you write into the output (HTML/XML/JSON) when
you either add `&debug=1` to the URL (preferred) or set
`@define('CONST_Debug', true);` in `settings/local.php`.
In the Python BDD test you can use `logger.info()` for temporary debug
statements.
For more information on running tests, see
https://nominatim.org/release-docs/develop/develop/Testing/
## Running unit tests
cd ~/Nominatim/tests/php
phpunit ./
## Running PHP code style tests
cd ~/Nominatim
phpcs --colors .
## Running functional tests
Tests in `test/bdd/db` and `test/bdd/osm2pgsql` have to pass 100%. Other
tests might require full planet-wide data. Sadly even if you have your own
planet-wide data there will be enough differences to the openstreetmap.org
installation to cause false positives in the other tests (see FAQ).
To run the full test suite
cd ~/Nominatim/test/bdd
behave -DBUILDDIR=/home/vagrant/build/ db osm2pgsql
To run a single file
behave -DBUILDDIR=/home/vagrant/build/ api/lookup/simple.feature
Or a single test by line number
behave -DBUILDDIR=/home/vagrant/build/ api/lookup/simple.feature:34
To run specific groups of tests you can add tags just before the `Scenario` line, e.g.
@bug-34
Scenario: address lookup for non-existing or invalid node, way, relation
and then
behave -DBUILDDIR=/home/vagrant/build/ --tags @bug-34
## FAQ
##### Will it run on Windows?
Yes, Vagrant and Virtualbox can be installed on MS Windows just fine. You need
a 64bit version of Windows.
Yes, Vagrant and Virtualbox can be installed on MS Windows just fine. You need a 64bit
version of Windows.
##### Will it run on Apple Silicon?
You might need to replace Virtualbox with [Parallels](https://www.parallels.com/products/desktop/).
There is no free/open source version of Parallels.
##### Why Monaco, can I use another country?
Of course! The Monaco import takes less than 10 minutes and works with 2GB RAM.
Of course! The Monaco import takes less than 30 minutes and works with 2GB RAM.
##### Will the results be the same as those from nominatim.openstreetmap.org?
No. Long-running Nominatim installations will differ once new import features (or
No. Long running Nominatim installations will differ once new import features (or
bug fixes) get added since those usually only get applied to new/changed data.
Also this document skips the optional Wikipedia data import which affects ranking
of search results. See [Nominatim installation](https://nominatim.org/release-docs/latest/admin/Installation)
for details.
of search results. See [Nominatim installation](http://nominatim.org/release-docs/latest/Installation) for details.
##### Why Ubuntu? Can I test CentOS/Fedora/CoreOS/FreeBSD?
There used to be a Vagrant script for CentOS available, but the Nominatim directory
There is a Vagrant script for CentOS available, but the Nominatim directory
isn't symlinked/mounted to the host which makes development trickier. We used
it mainly for debugging installation with SELinux.
@@ -100,33 +154,33 @@ are slightly different, e.g. the name of the package manager, Apache2 package
name, location of files. We chose Ubuntu because that is closest to the
nominatim.openstreetmap.org production environment.
You can configure/download other Vagrant boxes from
[https://app.vagrantup.com/boxes/search](https://app.vagrantup.com/boxes/search).
You can configure/download other Vagrant boxes from [https://app.vagrantup.com/boxes/search](https://app.vagrantup.com/boxes/search).
##### How can I connect to an existing database?
Let's say you have a Postgres database named `nominatim_it` on server `your-server.com`
and port `5432`. The Postgres username is `postgres`. You can edit the `.env` in your
project directory and point Nominatim to it.
Let's say you have a Postgres database named `nominatim_it` on server `your-server.com` and port `5432`. The Postgres username is `postgres`. You can edit `settings/local.php` and point Nominatim to it.
NOMINATIM_DATABASE_DSN="pgsql:host=your-server.com;port=5432;user=postgres;dbname=nominatim_it"
No data import or restarting necessary.
pgsql://postgres@your-server.com:5432/nominatim_it
No data import necessary or restarting necessary.
If the Postgres installation is behind a firewall, you can try
ssh -L 9999:localhost:5432 your-username@your-server.com
inside the virtual machine. It will map the port to `localhost:9999` and then
you edit `.env` file with
you edit `settings/local.php` with
NOMINATIM_DATABASE_DSN="pgsql:host=localhost;port=9999;user=postgres;dbname=nominatim_it"
@define('CONST_Database_DSN', 'pgsql:host=localhost;port=9999;user=postgres;dbname=nominatim_it');
To access postgres directly remember to specify the hostname,
e.g. `psql --host localhost --port 9999 nominatim_it`
To access postgres directly remember to specify the hostname, e.g. `psql --host localhost --port 9999 nominatim_it`
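The DSN strings above share one shape: a driver name, a colon, then `key=value` pairs separated by semicolons. As a rough illustration only (the helper names `parse_dsn` and `psql_command` are made up for this sketch and are not part of Nominatim), the pieces can be pulled apart and reused to build the matching `psql` call:

```python
def parse_dsn(dsn):
    """Split 'driver:key=value;key=value' into (driver, settings dict)."""
    driver, _, params = dsn.partition(":")
    settings = dict(item.split("=", 1) for item in params.split(";") if item)
    return driver, settings


def psql_command(dsn):
    """Build the psql argument list for the database a DSN points to."""
    _, settings = parse_dsn(dsn)
    return ["psql", "--host", settings["host"],
            "--port", settings["port"], settings["dbname"]]


dsn = "pgsql:host=localhost;port=9999;user=postgres;dbname=nominatim_it"
print(parse_dsn(dsn))
print(" ".join(psql_command(dsn)))
```

This is handy for checking that the host and port in the setting really match the SSH tunnel before blaming Nominatim for a failed connection.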
##### My computer is slow and the import takes too long. Can I start the virtual machine "in the cloud"?
Yes. It's possible to start the virtual machine on [Amazon AWS (plugin)](https://github.com/mitchellh/vagrant-aws)
or [DigitalOcean (plugin)](https://github.com/smdahlen/vagrant-digitalocean).

136
Vagrantfile vendored

@@ -4,107 +4,67 @@
Vagrant.configure("2") do |config|
# Apache webserver
config.vm.network "forwarded_port", guest: 80, host: 8089
config.vm.network "forwarded_port", guest: 8088, host: 8088
# If true, then any SSH connections made will enable agent forwarding.
config.ssh.forward_agent = true
# Never sync the current directory to /vagrant.
config.vm.synced_folder ".", "/vagrant", disabled: true
checkout = "yes"
if ENV['CHECKOUT'] != 'y' then
checkout = "no"
config.vm.synced_folder ".", "/home/vagrant/Nominatim"
checkout = "no"
end
config.vm.provider "hyperv" do |hv, override|
hv.memory = 2048
hv.linked_clone = true
if ENV['CHECKOUT'] != 'y' then
override.vm.synced_folder ".", "/home/vagrant/Nominatim", type: "smb", smb_host: ENV['SMB_HOST'] || ENV['COMPUTERNAME']
end
config.vm.define "ubuntu", primary: true do |sub|
sub.vm.box = "bento/ubuntu-18.04"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-18.sh"
s.privileged = false
s.args = [checkout]
end
end
config.vm.provider "virtualbox" do |vb, override|
config.vm.define "ubuntu18nginx" do |sub|
sub.vm.box = "bento/ubuntu-18.04"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-18-nginx.sh"
s.privileged = false
s.args = [checkout]
end
end
config.vm.define "ubuntu16" do |sub|
sub.vm.box = "bento/ubuntu-16.04"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-16.sh"
s.privileged = false
s.args = [checkout]
end
end
config.vm.define "travis" do |sub|
sub.vm.box = "bento/ubuntu-14.04"
sub.vm.provision :shell do |s|
s.path = "vagrant/install-on-travis-ci.sh"
s.privileged = false
s.args = [checkout]
end
end
config.vm.define "centos" do |sub|
sub.vm.box = "centos/7"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Centos-7.sh"
s.privileged = false
s.args = "yes"
end
sub.vm.synced_folder ".", "/home/vagrant/Nominatim", disabled: true
sub.vm.synced_folder ".", "/vagrant", disabled: true
end
config.vm.provider "virtualbox" do |vb|
vb.gui = false
vb.memory = 2048
vb.customize ["setextradata", :id, "VBoxInternal2/SharedFoldersEnableSymlinksCreate//vagrant","0"]
if ENV['CHECKOUT'] != 'y' then
override.vm.synced_folder ".", "/home/vagrant/Nominatim"
end
end
config.vm.provider "parallels" do |prl, override|
prl.update_guest_tools = false
prl.memory = 2048
if ENV['CHECKOUT'] != 'y' then
override.vm.synced_folder ".", "/home/vagrant/Nominatim"
end
end
config.vm.provider "libvirt" do |lv, override|
lv.memory = 2048
lv.nested = true
if ENV['CHECKOUT'] != 'y' then
override.vm.synced_folder ".", "/home/vagrant/Nominatim", type: 'nfs', nfs_udp: false
end
end
config.vm.define "ubuntu22", primary: true do |sub|
sub.vm.box = "generic/ubuntu2204"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-22.sh"
s.privileged = false
s.args = [checkout]
end
end
config.vm.define "ubuntu22-apache" do |sub|
sub.vm.box = "generic/ubuntu2204"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-22.sh"
s.privileged = false
s.args = [checkout, "install-apache"]
end
end
config.vm.define "ubuntu22-nginx" do |sub|
sub.vm.box = "generic/ubuntu2204"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-22.sh"
s.privileged = false
s.args = [checkout, "install-nginx"]
end
end
config.vm.define "ubuntu24" do |sub|
sub.vm.box = "bento/ubuntu-24.04"
if RUBY_PLATFORM.include?('darwin') && RUBY_PLATFORM.include?('arm64')
# Apple M processor
sub.vm.box = 'gutehall/ubuntu24-04'
end
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-24.sh"
s.privileged = false
s.args = [checkout]
end
end
config.vm.define "ubuntu24-apache" do |sub|
sub.vm.box = "bento/ubuntu-24.04"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-24.sh"
s.privileged = false
s.args = [checkout, "install-apache"]
end
end
config.vm.define "ubuntu24-nginx" do |sub|
sub.vm.box = "bento/ubuntu-24.04"
sub.vm.provision :shell do |s|
s.path = "vagrant/Install-on-Ubuntu-24.sh"
s.privileged = false
s.args = [checkout, "install-nginx"]
end
end
end

4
cmake/script.tmpl Executable file

@@ -0,0 +1,4 @@
#!@PHP_BIN@ -Cq
<?php
require_once(dirname(dirname(__FILE__)).'/settings/settings.php');
require_once(CONST_BasePath.'/@script_source@');

3
cmake/website.tmpl Executable file

@@ -0,0 +1,3 @@
<?php
require_once(dirname(dirname(__FILE__)).'/settings/settings.php');
require_once(CONST_BasePath.'/@script_source@');


@@ -0,0 +1,77 @@
# Fallback Country Boundaries
Each place is assigned a `country_code` and partition. Partitions derive from `country_code`.
Nominatim imports two pre-generated files
* `data/country_name.sql` (country code, name, default language, partition)
* `data/country_osm_grid.sql` (country code, geometry)
before creating places in the database. This helps with fast lookups and missing data (e.g. if the data the user wants to import doesn't contain any country places).
The number of countries in the world can change (South Sudan was created in 2011, Germany was reunified), and so can their boundaries. This document explains how the pre-generated files can be updated.
## Country code
Each place is assigned a two-letter `country_code` based on its location, e.g. `gb` for Great Britain, or `NULL` if no suitable country is found (usually because the place is in open water).
In `sql/functions.sql: get_country_code(geometry)` the place's center is checked against
1. country places already imported from the user's data file. Places are imported by rank low-to-high. Lowest rank 2 is countries so most places should be matched. Still the data file might be incomplete.
2. if unmatched: OSM grid boundaries
3. if still unmatched: OSM grid boundaries, but allow a small distance
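The three checks above form a simple fallback chain. The following is only a sketch of that order, not the SQL in `get_country_code(geometry)`: it uses rough bounding boxes instead of real geometries, and the `GRID` data, the tolerance value and all helper names are assumptions made up for illustration.

```python
# Hypothetical stand-in for country_osm_grid: rough boxes, not real boundaries.
# Each box is (min_lon, min_lat, max_lon, max_lat).
GRID = {"mc": (7.40, 43.72, 7.44, 43.76)}


def lookup_boxes(point, boxes, tolerance=0.0):
    """Return the first country code whose box contains the point."""
    lon, lat = point
    for code, (min_lon, min_lat, max_lon, max_lat) in boxes.items():
        if (min_lon - tolerance <= lon <= max_lon + tolerance
                and min_lat - tolerance <= lat <= max_lat + tolerance):
            return code
    return None


def get_country_code(point, imported):
    """Try imported country places, then the grid, then the grid with slack."""
    steps = (
        lambda p: lookup_boxes(p, imported),      # 1. countries from the data file
        lambda p: lookup_boxes(p, GRID),          # 2. fallback grid
        lambda p: lookup_boxes(p, GRID, 0.05),    # 3. fallback grid, small distance
    )
    for step in steps:
        code = step(point)
        if code is not None:
            return code
    return None  # usually open water


print(get_country_code((7.42, 43.74), imported={}))
print(get_country_code((0.0, 0.0), imported={}))
```

With an empty `imported` mapping (a data file without country places) the first call falls through to the grid, while the second point matches nothing and yields `None`.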
## Partitions
Each place is assigned a partition, which is a number 0..250. 0 is fallback/other.
During place indexing (`sql/functions.sql: placex_insert()`) a place is assigned the partition based on its country code (`sql/functions.sql: get_partition(country_code)`). It checks in the `country_name` table.
Most countries have their own partition, some share a partition. Thus partition counts vary greatly.
Several database tables are split by partition to allow queries to run against fewer indices and improve caching.
* `location_area_large_<partition>`
* `search_name_<partition>`
* `location_road_<partition>`
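To make the naming scheme concrete, here is a small sketch that maps a country code to its partition and to the three table names listed above. The partition numbers in `COUNTRY_PARTITION` are invented for the example; the real values live in the `country_name` table.

```python
PARTITIONED_TABLES = ("location_area_large", "search_name", "location_road")

# Hypothetical excerpt of country_name; real partition numbers will differ.
COUNTRY_PARTITION = {"gb": 1, "de": 3, "li": 3}


def get_partition(country_code):
    """Return the partition for a country code, 0 being fallback/other."""
    if country_code is None:
        return 0
    return COUNTRY_PARTITION.get(country_code.lower(), 0)


def partition_tables(country_code):
    """Return the per-partition table names a place would be stored in."""
    partition = get_partition(country_code)
    return [f"{name}_{partition}" for name in PARTITIONED_TABLES]


print(partition_tables("gb"))
print(partition_tables(None))
```

Note how `de` and `li` share a partition in the made-up data, mirroring the statement that some countries share one.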
## Data files
### data/country_name.sql
Export from existing database table plus manual changes. `country_default_language_code` mostly taken from [https://wiki.openstreetmap.org/wiki/Nominatim/Country_Codes](https://wiki.openstreetmap.org/wiki/Nominatim/Country_Codes), see `utils/country_languages.php`.
### data/country_osm_grid.sql
`country_grid.sql` merges territories by country. Then uses `function.sql: quad_split_geometry` to split each country into multiple [Quadtree](https://en.wikipedia.org/wiki/Quadtree) polygons for faster point-in-polygon lookups.
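The idea behind a quadtree split can be shown on plain rectangles. The sketch below is not `quad_split_geometry`: it ignores the actual country shape, and its `max_area` and `max_depth` parameters are assumptions chosen for the example rather than the meaning of the arguments used in `country_grid.sql`.

```python
def quad_split(box, max_area, max_depth):
    """Recursively cut a (min_x, min_y, max_x, max_y) box into quadrants
    until each tile is small enough or the depth limit is reached."""
    min_x, min_y, max_x, max_y = box
    area = (max_x - min_x) * (max_y - min_y)
    if area <= max_area or max_depth == 0:
        return [box]

    mid_x = (min_x + max_x) / 2
    mid_y = (min_y + max_y) / 2
    quadrants = [
        (min_x, min_y, mid_x, mid_y),  # lower left
        (mid_x, min_y, max_x, mid_y),  # lower right
        (min_x, mid_y, mid_x, max_y),  # upper left
        (mid_x, mid_y, max_x, max_y),  # upper right
    ]
    tiles = []
    for quadrant in quadrants:
        tiles.extend(quad_split(quadrant, max_area, max_depth - 1))
    return tiles


tiles = quad_split((0.0, 0.0, 4.0, 4.0), max_area=1.0, max_depth=20)
print(len(tiles))
```

Many small tiles mean a point-in-polygon test only has to look at the few tiles whose bounding box contains the point, which is where the speed-up comes from.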
To visualize one country as a GeoJSON feature collection, e.g. for loading into [geojson.io](http://geojson.io/):
```
-- http://www.postgresonline.com/journal/archives/267-Creating-GeoJSON-Feature-Collections-with-JSON-and-PostGIS-functions.html
SELECT row_to_json(fc)
FROM (
SELECT 'FeatureCollection' As type, array_to_json(array_agg(f)) As features
FROM (
SELECT 'Feature' As type,
ST_AsGeoJSON(lg.geometry)::json As geometry,
row_to_json((country_code, area)) As properties
FROM country_osm_grid As lg where country_code='mx'
) As f
) As fc;
```
`cat /tmp/query.sql | psql -At nominatim > /tmp/mexico.quad.geojson`
![mexico](mexico.quad.png)


@@ -0,0 +1,33 @@
-- Script to build a calculated country grid from existing tables
DROP TABLE IF EXISTS tmp_country_osm_grid;
CREATE TABLE tmp_country_osm_grid as select country_name.country_code,st_union(placex.geometry) as geometry from country_name,
placex
where (lower(placex.country_code) = country_name.country_code)
and placex.rank_search < 16 and st_area(placex.geometry) > 0
group by country_name.country_code;
ALTER TABLE tmp_country_osm_grid add column area double precision;
UPDATE tmp_country_osm_grid set area = st_area(geometry::geography);
-- compare old and new
select country_code, round, round(log(area)) from (select distinct country_code,round(log(area)) from country_osm_grid order by country_code) as x
left outer join tmp_country_osm_grid using (country_code) where area is null or round(log(area)) != round;
DROP TABLE IF EXISTS new_country_osm_grid;
CREATE TABLE new_country_osm_grid as select country_code,area,quad_split_geometry(geometry,0.5,20) as geometry from tmp_country_osm_grid;
CREATE INDEX new_idx_country_osm_grid_geometry ON new_country_osm_grid USING GIST (geometry);
-- Sometimes there are problems calculating area due to invalid data - optionally recalc
UPDATE new_country_osm_grid set area = sum from (select country_code,sum(case when st_area(geometry::geography) = 'NaN' THEN 0 ELSE st_area(geometry::geography) END)
from new_country_osm_grid group by country_code) as x where x.country_code = new_country_osm_grid.country_code;
-- compare old and new
select country_code, x.round, y.round from (select distinct country_code,round(log(area)) from country_osm_grid order by country_code) as x
left outer join (select distinct country_code,round(log(area)) from new_country_osm_grid order by country_code) as y
using (country_code) where x.round != y.round;
-- Flip the new table in
BEGIN;
DROP TABLE IF EXISTS country_osm_grid;
ALTER TABLE new_country_osm_grid rename to country_osm_grid;
ALTER INDEX new_idx_country_osm_grid_geometry RENAME TO idx_country_osm_grid_geometry;
COMMIT;

Binary file not shown (320 KiB).


@@ -0,0 +1,56 @@
# GB Postcodes
The server [importing instructions](https://www.nominatim.org/release-docs/latest/admin/Import-and-Update/) allow you to optionally download [`gb_postcode_data.sql.gz`](https://www.nominatim.org/data/gb_postcode_data.sql.gz). This document explains how the file was created.
## GB vs UK
GB (Great Britain) is more correct as the Ordnance Survey dataset doesn't contain postcodes from Northern Ireland.
## Importing separately after the initial import
If you forgot to download the file, or have a new version, you can import it separately:
1. Import the downloaded `gb_postcode_data.sql.gz` file.
2. Run the SQL query `SELECT count(getorcreate_postcode_id(postcode)) FROM gb_postcode;`. This will update the search index.
3. Run `utils/setup.php --calculate-postcodes` from the build directory. This will copy data from the `gb_postcode` table to the `location_postcodes` table.
## Converting Code-Point Open data
1. Download from [Code-Point® Open](https://www.ordnancesurvey.co.uk/business-and-government/products/code-point-open.html). It requires an email address to which a download link will be sent.
2. `unzip codepo_gb.zip`
Unpacked you'll see a directory of CSV files.
$ more codepo_gb/Data/CSV/n.csv
"N1 0AA",10,530626,183961,"E92000001","E19000003","E18000007","","E09000019","E05000368"
"N1 0AB",10,530559,183978,"E92000001","E19000003","E18000007","","E09000019","E05000368"
The coordinates are "Eastings" and "Northings" (in that column order) in the [OSGB 1936](http://epsg.io/1314) projection. They can be projected to WGS84 like this:
SELECT ST_AsText(ST_Transform(ST_SetSRID('POINT(530626 183961)'::geometry,27700), 4326));
POINT(-0.117872733220225 51.5394424719303)
[-0.117872733220225 51.5394424719303 on OSM map](https://www.openstreetmap.org/?mlon=-0.117872733220225&mlat=51.5394424719303&zoom=16)
3. Create database, import CSV files, add geometry column, dump into file
DBNAME=create_gb_postcode_file
createdb $DBNAME
echo 'CREATE EXTENSION postgis' | psql $DBNAME
cat data/gb_postcode_table.sql | psql $DBNAME
cat codepo_gb/Data/CSV/*.csv | ./data-sources/gb-postcodes/convert_codepoint.php | psql $DBNAME
cat codepo_gb/Doc/licence.txt | iconv -f iso-8859-1 -t utf-8 | dos2unix | sed 's/^/-- /g' > gb_postcode_data.sql
pg_dump -a -t gb_postcode $DBNAME | grep -v '^--' >> gb_postcode_data.sql
gzip -9 -f gb_postcode_data.sql
ls -lah gb_postcode_data.*
# dropdb $DBNAME
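For readers who want to check the CSV handling without running the PHP converter, the following sketch mirrors what `convert_codepoint.php` does to each row: take the postcode from the first column, make sure exactly one space precedes its last three characters, and read the easting and northing from the third and fourth columns. The function names are made up for this example.

```python
import csv
import re


def normalise_postcode(raw):
    """Ensure exactly one space before the last three characters."""
    return re.sub(r"\s*(...)$", r" \1", raw)


def parse_row(line):
    """Return (postcode, easting, northing) for one Code-Point Open CSV line."""
    columns = next(csv.reader([line]))
    return normalise_postcode(columns[0]), int(columns[2]), int(columns[3])


row = '"N1 0AA",10,530626,183961,"E92000001","E19000003","E18000007","","E09000019","E05000368"'
print(parse_row(row))
print(normalise_postcode("AB101AA"))
```

The second call shows why the normalisation is needed: seven-character postcodes carry no space at all in the source data.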


@@ -0,0 +1,37 @@
#!/usr/bin/env php
<?php
echo <<< EOT
ALTER TABLE gb_postcode ADD COLUMN easting bigint;
ALTER TABLE gb_postcode ADD COLUMN northing bigint;
TRUNCATE gb_postcode;
COPY gb_postcode (id, postcode, easting, northing) FROM stdin;
EOT;
$iCounter = 0;
while ($sLine = fgets(STDIN)) {
$aColumns = str_getcsv($sLine);
// insert space before the third last position
// https://stackoverflow.com/a/9144834
$postcode = $aColumns[0];
$postcode = preg_replace('/\s*(...)$/', ' $1', $postcode);
echo join("\t", array($iCounter, $postcode, $aColumns[2], $aColumns[3]))."\n";
$iCounter = $iCounter + 1;
}
echo <<< EOT
\.
UPDATE gb_postcode SET geometry=ST_Transform(ST_SetSRID(CONCAT('POINT(', easting, ' ', northing, ')')::geometry, 27700), 4326);
ALTER TABLE gb_postcode DROP COLUMN easting;
ALTER TABLE gb_postcode DROP COLUMN northing;
EOT;


@@ -0,0 +1,26 @@
# US TIGER address data
Convert [TIGER](https://www.census.gov/geo/maps-data/data/tiger.html)/Line dataset of the US Census Bureau to SQL files which can be imported by Nominatim. The created tables in the Nominatim database are separate from OpenStreetMap tables and get queried at search time separately.
The dataset gets updated once per year. Downloading tends to be slow (it can take a full day) and converting the files can take hours as well.
Replace '2019' with the current year throughout.
1. Install the GDAL library and python bindings and the unzip tool
# Ubuntu:
sudo apt-get install python3-gdal unzip
2. Get the TIGER 2019 data. You will need the EDGES files
(3,233 zip files, 11GB total).
wget -r ftp://ftp2.census.gov/geo/tiger/TIGER2019/EDGES/
3. Convert the data into SQL statements. Adjust the file paths in the scripts as needed
cd data-sources/us-tiger
./convert.sh <input-path> <output-path>
4. Maybe: package the created files
tar -czf tiger2019-nominatim-preprocessed.tar.gz tiger
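Step 3 produces one SQL file per county, named after the five-digit county code embedded in each EDGES zip file name. The sketch below reproduces only that name mapping, using the same pattern as `convert.sh`; the helper name `plan_conversion` and the sample file names are assumptions for illustration.

```python
import re
from pathlib import PurePosixPath

# Same pattern convert.sh uses to find the county code in a file name.
EDGES_PATTERN = re.compile(r"_([0-9]{5})_edges\.zip$")


def plan_conversion(zip_names, out_dir):
    """Map each EDGES zip file to the SQL file it would be converted to.

    Files that do not match the pattern are skipped, as in convert.sh.
    """
    plan = {}
    for name in zip_names:
        match = EDGES_PATTERN.search(name)
        if match:
            plan[name] = str(PurePosixPath(out_dir) / f"{match.group(1)}.sql")
    return plan


print(plan_conversion(["tl_2019_01001_edges.zip", "readme.txt"], "tiger"))
```

Comparing the size of this mapping with the number of downloaded zip files is a quick way to spot misnamed or incomplete downloads before starting the long conversion.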


@@ -0,0 +1,48 @@
#!/bin/bash
INPATH=$1
OUTPATH=$2
if [[ ! -d "$INPATH" ]]; then
echo "input path does not exist"
exit 1
fi
if [[ ! -d "$OUTPATH" ]]; then
echo "output path does not exist"
exit 1
fi
INREGEX='_([0-9]{5})_edges.zip'
WORKPATH="$OUTPATH/tmp-workdir/"
mkdir -p "$WORKPATH"
INFILES=($INPATH/*.zip)
echo "Found ${#INFILES[*]} files."
for F in ${INFILES[*]}; do
# echo $F
if [[ "$F" =~ $INREGEX ]]; then
COUNTYID=${BASH_REMATCH[1]}
SHAPEFILE="$WORKPATH/$(basename $F '.zip').shp"
SQLFILE="$OUTPATH/$COUNTYID.sql"
unzip -o -q -d "$WORKPATH" "$F"
if [[ ! -e "$SHAPEFILE" ]]; then
echo "Unzip failed. $SHAPEFILE not found."
exit 1
fi
./tiger_address_convert.py "$SHAPEFILE" "$SQLFILE"
rm $WORKPATH/*
fi
done
OUTFILES=($OUTPATH/*.sql)
echo "Wrote ${#OUTFILES[*]} files."
rmdir $WORKPATH


@@ -0,0 +1,620 @@
#!/usr/bin/python3
# Tiger road data to OSM conversion script
# Creates Karlsruhe-style address ways beside the main way
# based on the Massachusetts GIS script by christopher schmidt
#BUGS:
# On very tight curves, a loop may be generated in the address way.
# It would be nice if the ends of the address ways were not pulled back from dead ends
# Ways that include these mtfccs should not be uploaded
# H1100 Connector
# H3010 Stream/River
# H3013 Braided Stream
# H3020 Canal, Ditch or Aqueduct
# L4130 Point-to-Point Line
# L4140 Property/Parcel Line (Including PLSS)
# P0001 Nonvisible Linear Legal/Statistical Boundary
# P0002 Perennial Shoreline
# P0003 Intermittent Shoreline
# P0004 Other non-visible bounding Edge (e.g., Census water boundary, boundary of an areal feature)
ignoremtfcc = [ "H1100", "H3010", "H3013", "H3020", "L4130", "L4140", "P0001", "P0002", "P0003", "P0004" ]
# Sets the distance that the address ways should be from the main way, in feet.
address_distance = 30
# Sets the distance that the ends of the address ways should be pulled back from the ends of the main way, in feet
address_pullback = 45
import sys, os.path, json
try:
from osgeo import ogr
from osgeo import osr
except ImportError:  # older GDAL bindings expose ogr/osr as top-level modules
import ogr
import osr
# https://www.census.gov/geo/reference/codes/cou.html
# tiger_county_fips.json was generated from the following:
# wget https://www2.census.gov/geo/docs/reference/codes/files/national_county.txt
# cat national_county.txt | perl -F, -naE'($F[0] ne 'AS') && $F[3] =~ s/ ((city|City|County|District|Borough|City and Borough|Municipio|Municipality|Parish|Island|Census Area)(?:, |\Z))+//; say qq( "$F[1]$F[2]": "$F[3], $F[0]",)'
json_fh = open(os.path.dirname(sys.argv[0]) + "/tiger_county_fips.json")
county_fips_data = json.load(json_fh)
def parse_shp_for_geom_and_tags( filename ):
#ogr.RegisterAll()
dr = ogr.GetDriverByName("ESRI Shapefile")
poDS = dr.Open( filename )
if poDS is None:
raise RuntimeError("Open failed.")
poLayer = poDS.GetLayer( 0 )
fieldNameList = []
layerDefinition = poLayer.GetLayerDefn()
for i in range(layerDefinition.GetFieldCount()):
fieldNameList.append(layerDefinition.GetFieldDefn(i).GetName())
# sys.stderr.write(",".join(fieldNameList))
poLayer.ResetReading()
ret = []
poFeature = poLayer.GetNextFeature()
while poFeature:
tags = {}
# WAY ID
tags["tiger:way_id"] = int( poFeature.GetField("TLID") )
# FEATURE IDENTIFICATION
mtfcc = poFeature.GetField("MTFCC");
if mtfcc != None:
if mtfcc == "L4010": #Pipeline
tags["man_made"] = "pipeline"
if mtfcc == "L4020": #Powerline
tags["power"] = "line"
if mtfcc == "L4031": #Aerial Tramway/Ski Lift
tags["aerialway"] = "cable_car"
if mtfcc == "L4110": #Fence Line
tags["barrier"] = "fence"
if mtfcc == "L4125": #Cliff/Escarpment
tags["natural"] = "cliff"
if mtfcc == "L4165": #Ferry Crossing
tags["route"] = "ferry"
if mtfcc == "R1011": #Railroad Feature (Main, Spur, or Yard)
tags["railway"] = "rail"
ttyp = poFeature.GetField("TTYP")
if ttyp != None:
if ttyp == "S":
tags["service"] = "spur"
if ttyp == "Y":
tags["service"] = "yard"
tags["tiger:ttyp"] = ttyp
if mtfcc == "R1051": #Carline, Streetcar Track, Monorail, Other Mass Transit Rail)
tags["railway"] = "light_rail"
if mtfcc == "R1052": #Cog Rail Line, Incline Rail Line, Tram
tags["railway"] = "incline"
if mtfcc == "S1100":
tags["highway"] = "primary"
if mtfcc == "S1200":
tags["highway"] = "secondary"
if mtfcc == "S1400":
tags["highway"] = "residential"
if mtfcc == "S1500":
tags["highway"] = "track"
if mtfcc == "S1630": #Ramp
tags["highway"] = "motorway_link"
if mtfcc == "S1640": #Service Drive usually along a limited access highway
tags["highway"] = "service"
if mtfcc == "S1710": #Walkway/Pedestrian Trail
tags["highway"] = "path"
if mtfcc == "S1720":
tags["highway"] = "steps"
if mtfcc == "S1730": #Alley
tags["highway"] = "service"
tags["service"] = "alley"
if mtfcc == "S1740": #Private Road for service vehicles (logging, oil, fields, ranches, etc.)
tags["highway"] = "service"
tags["access"] = "private"
if mtfcc == "S1750": #Private Driveway
tags["highway"] = "service"
tags["access"] = "private"
tags["service"] = "driveway"
if mtfcc == "S1780": #Parking Lot Road
tags["highway"] = "service"
tags["service"] = "parking_aisle"
if mtfcc == "S1820": #Bike Path or Trail
tags["highway"] = "cycleway"
if mtfcc == "S1830": #Bridle Path
tags["highway"] = "bridleway"
tags["tiger:mtfcc"] = mtfcc
# FEATURE NAME
if poFeature.GetField("FULLNAME"):
#use the TIGER name as-is (no case conversion is applied here)
name = poFeature.GetField( "FULLNAME" )
tags["name"] = name
#Attempt to guess highway grade
if name[0:2] == "I-":
tags["highway"] = "motorway"
if name[0:3] == "US ":
tags["highway"] = "primary"
if name[0:3] == "US-":
tags["highway"] = "primary"
if name[0:3] == "Hwy":
if tags.get("highway") != "primary": #"highway" may be unset for non-road features
tags["highway"] = "secondary"
# TIGER 2017 no longer contains this field
if 'DIVROAD' in fieldNameList:
divroad = poFeature.GetField("DIVROAD")
if divroad != None:
if divroad == "Y" and "highway" in tags and tags["highway"] == "residential":
tags["highway"] = "tertiary"
tags["tiger:separated"] = divroad
statefp = poFeature.GetField("STATEFP")
countyfp = poFeature.GetField("COUNTYFP")
if (statefp != None) and (countyfp != None):
county_name = county_fips_data.get(statefp + '' + countyfp)
if county_name:
tags["tiger:county"] = county_name
# tlid = poFeature.GetField("TLID")
# if tlid != None:
# tags["tiger:tlid"] = tlid
lfromadd = poFeature.GetField("LFROMADD")
if lfromadd != None:
tags["tiger:lfromadd"] = lfromadd
rfromadd = poFeature.GetField("RFROMADD")
if rfromadd != None:
tags["tiger:rfromadd"] = rfromadd
ltoadd = poFeature.GetField("LTOADD")
if ltoadd != None:
tags["tiger:ltoadd"] = ltoadd
rtoadd = poFeature.GetField("RTOADD")
if rtoadd != None:
tags["tiger:rtoadd"] = rtoadd
zipl = poFeature.GetField("ZIPL")
if zipl != None:
tags["tiger:zip_left"] = zipl
zipr = poFeature.GetField("ZIPR")
if zipr != None:
tags["tiger:zip_right"] = zipr
if mtfcc not in ignoremtfcc:
# COPY DOWN THE GEOMETRY
geom = []
rawgeom = poFeature.GetGeometryRef()
for i in range( rawgeom.GetPointCount() ):
geom.append( (rawgeom.GetX(i), rawgeom.GetY(i)) )
ret.append( (geom, tags) )
poFeature = poLayer.GetNextFeature()
return ret
# ====================================
# TODO: read the .prj file for this data
# Change projcs_wkt to match your data's .prj file.
# ====================================
projcs_wkt = \
"""GEOGCS["GCS_North_American_1983",
DATUM["D_North_American_1983",
SPHEROID["GRS_1980",6378137,298.257222101]],
PRIMEM["Greenwich",0],
UNIT["Degree",0.017453292519943295]]"""
from_proj = osr.SpatialReference()
from_proj.ImportFromWkt( projcs_wkt )
# output to WGS84
to_proj = osr.SpatialReference()
to_proj.SetWellKnownGeogCS( "EPSG:4326" )
tr = osr.CoordinateTransformation( from_proj, to_proj )
import math
def length(segment, nodelist):
'''Returns the length (in feet) of a segment'''
first = True
distance = 0
lat_feet = 364613 #The approximate number of feet in one degree of latitude
for point in segment:
pointid, (lat, lon) = nodelist[ round_point( point ) ]
if first:
first = False
else:
#The approximate number of feet in one degree of longitude
lrad = math.radians(lat)
lon_feet = 365527.822 * math.cos(lrad) - 306.75853 * math.cos(3 * lrad) + 0.3937 * math.cos(5 * lrad)
distance += math.sqrt(((lat - previous[0])*lat_feet)**2 + ((lon - previous[1])*lon_feet)**2)
previous = (lat, lon)
return distance
def addressways(waylist, nodelist, first_id):
id = first_id
lat_feet = 364613 #The approximate number of feet in one degree of latitude
distance = float(address_distance)
ret = []
for waykey, segments in waylist.items():
waykey = dict(waykey)
rsegments = []
lsegments = []
for segment in segments:
lsegment = []
rsegment = []
lastpoint = None
# Don't pull back the ends of very short ways too much
seglength = length(segment, nodelist)
if seglength < float(address_pullback) * 3.0:
pullback = seglength / 3.0
else:
pullback = float(address_pullback)
if "tiger:lfromadd" in waykey:
lfromadd = waykey["tiger:lfromadd"]
else:
lfromadd = None
if "tiger:ltoadd" in waykey:
ltoadd = waykey["tiger:ltoadd"]
else:
ltoadd = None
if "tiger:rfromadd" in waykey:
rfromadd = waykey["tiger:rfromadd"]
else:
rfromadd = None
if "tiger:rtoadd" in waykey:
rtoadd = waykey["tiger:rtoadd"]
else:
rtoadd = None
if rfromadd != None and rtoadd != None:
right = True
else:
right = False
if lfromadd != None and ltoadd != None:
left = True
else:
left = False
if left or right:
first = True
firstpointid, firstpoint = nodelist[ round_point( segment[0] ) ]
finalpointid, finalpoint = nodelist[ round_point( segment[len(segment) - 1] ) ]
for point in segment:
pointid, (lat, lon) = nodelist[ round_point( point ) ]
#The approximate number of feet in one degree of longitude
lrad = math.radians(lat)
lon_feet = 365527.822 * math.cos(lrad) - 306.75853 * math.cos(3 * lrad) + 0.3937 * math.cos(5 * lrad)
#Calculate the points of the offset ways
if lastpoint != None:
#Skip points too close to start
if math.sqrt((lat * lat_feet - firstpoint[0] * lat_feet)**2 + (lon * lon_feet - firstpoint[1] * lon_feet)**2) < pullback:
#Preserve very short ways (but will be rendered backwards)
if pointid != finalpointid:
continue
#Skip points too close to end
if math.sqrt((lat * lat_feet - finalpoint[0] * lat_feet)**2 + (lon * lon_feet - finalpoint[1] * lon_feet)**2) < pullback:
#Preserve very short ways (but will be rendered backwards)
if (pointid != firstpointid) and (pointid != finalpointid):
continue
X = (lon - lastpoint[1]) * lon_feet
Y = (lat - lastpoint[0]) * lat_feet
if Y != 0:
theta = math.pi/2 - math.atan( X / Y)
Xp = math.sin(theta) * distance
Yp = math.cos(theta) * distance
else:
Xp = 0
if X > 0:
Yp = -distance
else:
Yp = distance
if Y > 0:
Xp = -Xp
else:
Yp = -Yp
if first:
first = False
dX = - (Yp * (pullback / distance)) / lon_feet #Pull back the first point
dY = (Xp * (pullback / distance)) / lat_feet
if left:
lpoint = (lastpoint[0] + (Yp / lat_feet) - dY, lastpoint[1] + (Xp / lon_feet) - dX)
lsegment.append( (id, lpoint) )
id += 1
if right:
rpoint = (lastpoint[0] - (Yp / lat_feet) - dY, lastpoint[1] - (Xp / lon_feet) - dX)
rsegment.append( (id, rpoint) )
id += 1
else:
#round the curves
if delta[1] != 0:
theta = abs(math.atan(delta[0] / delta[1]))
else:
theta = math.pi / 2
if Xp != 0:
theta = theta - abs(math.atan(Yp / Xp))
else: theta = theta - math.pi / 2
r = 1 + abs(math.tan(theta/2))
if left:
lpoint = (lastpoint[0] + (Yp + delta[0]) * r / (lat_feet * 2), lastpoint[1] + (Xp + delta[1]) * r / (lon_feet * 2))
lsegment.append( (id, lpoint) )
id += 1
if right:
rpoint = (lastpoint[0] - (Yp + delta[0]) * r / (lat_feet * 2), lastpoint[1] - (Xp + delta[1]) * r / (lon_feet * 2))
rsegment.append( (id, rpoint) )
id += 1
delta = (Yp, Xp)
lastpoint = (lat, lon)
#Add in the last node
dX = - (Yp * (pullback / distance)) / lon_feet
dY = (Xp * (pullback / distance)) / lat_feet
if left:
lpoint = (lastpoint[0] + (Yp + delta[0]) / (lat_feet * 2) + dY, lastpoint[1] + (Xp + delta[1]) / (lon_feet * 2) + dX )
lsegment.append( (id, lpoint) )
id += 1
if right:
rpoint = (lastpoint[0] - Yp / lat_feet + dY, lastpoint[1] - Xp / lon_feet + dX)
rsegment.append( (id, rpoint) )
id += 1
#Generate the tags for ways and nodes
zipr = ''
zipl = ''
name = ''
county = ''
if "tiger:zip_right" in waykey:
zipr = waykey["tiger:zip_right"]
if "tiger:zip_left" in waykey:
zipl = waykey["tiger:zip_left"]
if "name" in waykey:
name = waykey["name"]
if "tiger:county" in waykey:
county = waykey["tiger:county"]
if "tiger:separated" in waykey: # No longer set in Tiger-2017
separated = waykey["tiger:separated"]
else:
separated = "N"
#Write the nodes of the offset ways
if right:
rlinestring = [];
for i, point in rsegment:
rlinestring.append( "%f %f" % (point[1], point[0]) )
if left:
llinestring = [];
for i, point in lsegment:
llinestring.append( "%f %f" % (point[1], point[0]) )
if right:
rsegments.append( rsegment )
if left:
lsegments.append( lsegment )
rtofromint = right #Do the addresses convert to integers?
ltofromint = left #Do the addresses convert to integers?
if right:
try: rfromint = int(rfromadd)
except:
print("Non integer address: %s" % rfromadd)
rtofromint = False
try: rtoint = int(rtoadd)
except:
print("Non integer address: %s" % rtoadd)
rtofromint = False
if left:
try: lfromint = int(lfromadd)
except:
print("Non integer address: %s" % lfromadd)
ltofromint = False
try: ltoint = int(ltoadd)
except:
print("Non integer address: %s" % ltoadd)
ltofromint = False
if right:
id += 1
interpolationtype = "all";
if rtofromint:
if (rfromint % 2) == 0 and (rtoint % 2) == 0:
if separated == "Y": #Doesn't matter if there is another side
interpolationtype = "even";
elif ltofromint and (lfromint % 2) == 1 and (ltoint % 2) == 1:
interpolationtype = "even";
elif (rfromint % 2) == 1 and (rtoint % 2) == 1:
if separated == "Y": #Doesn't matter if there is another side
interpolationtype = "odd";
elif ltofromint and (lfromint % 2) == 0 and (ltoint % 2) == 0:
interpolationtype = "odd";
ret.append( "SELECT tiger_line_import(ST_GeomFromText('LINESTRING(%s)',4326), %s, %s, %s, %s, %s, %s);" %
( ",".join(rlinestring), sql_quote(rfromadd), sql_quote(rtoadd), sql_quote(interpolationtype), sql_quote(name), sql_quote(county), sql_quote(zipr) ) )
if left:
id += 1
interpolationtype = "all";
if ltofromint:
if (lfromint % 2) == 0 and (ltoint % 2) == 0:
if separated == "Y":
interpolationtype = "even";
elif rtofromint and (rfromint % 2) == 1 and (rtoint % 2) == 1:
interpolationtype = "even";
elif (lfromint % 2) == 1 and (ltoint % 2) == 1:
if separated == "Y":
interpolationtype = "odd";
elif rtofromint and (rfromint %2 ) == 0 and (rtoint % 2) == 0:
interpolationtype = "odd";
ret.append( "SELECT tiger_line_import(ST_GeomFromText('LINESTRING(%s)',4326), %s, %s, %s, %s, %s, %s);" %
( ",".join(llinestring), sql_quote(lfromadd), sql_quote(ltoadd), sql_quote(interpolationtype), sql_quote(name), sql_quote(county), sql_quote(zipl) ) )
return ret
def sql_quote( string ):
return "'" + string.replace("'", "''") + "'"
def unproject( point ):
pt = tr.TransformPoint( point[0], point[1] )
return (pt[1], pt[0])
def round_point( point, accuracy=8 ):
return tuple( [ round(x,accuracy) for x in point ] )
def compile_nodelist( parsed_gisdata, first_id=1 ):
nodelist = {}
i = first_id
for geom, tags in parsed_gisdata:
if len( geom )==0:
continue
for point in geom:
r_point = round_point( point )
if r_point not in nodelist:
nodelist[ r_point ] = (i, unproject( point ))
i += 1
return (i, nodelist)
def adjacent( left, right ):
left_left = round_point(left[0])
left_right = round_point(left[-1])
right_left = round_point(right[0])
right_right = round_point(right[-1])
return ( left_left == right_left or
left_left == right_right or
left_right == right_left or
left_right == right_right )
def glom( left, right ):
left = list( left )
right = list( right )
left_left = round_point(left[0])
left_right = round_point(left[-1])
right_left = round_point(right[0])
right_right = round_point(right[-1])
if left_left == right_left:
left.reverse()
return left[0:-1] + right
if left_left == right_right:
return right[0:-1] + left
if left_right == right_left:
return left[0:-1] + right
if left_right == right_right:
right.reverse()
return left[0:-1] + right
raise ValueError('segments are not adjacent')
def glom_once( segments ):
if len(segments)==0:
return segments
unsorted = list( segments )
x = unsorted.pop(0)
while len( unsorted ) > 0:
n = len( unsorted )
for i in range(0, n):
y = unsorted[i]
if adjacent( x, y ):
y = unsorted.pop(i)
x = glom( x, y )
break
# Sorted and unsorted lists have no adjacent segments
if len( unsorted ) == n:
break
return x, unsorted
def glom_all( segments ):
unsorted = segments
chunks = []
while unsorted != []:
chunk, unsorted = glom_once( unsorted )
chunks.append( chunk )
return chunks
def compile_waylist( parsed_gisdata ):
waylist = {}
#Group by tiger:way_id
for geom, tags in parsed_gisdata:
way_key = tags.copy()
way_key = ( way_key['tiger:way_id'], tuple( [(k,v) for k,v in way_key.items()] ) )
if way_key not in waylist:
waylist[way_key] = []
waylist[way_key].append( geom )
ret = {}
for (way_id, way_key), segments in waylist.items():
ret[way_key] = glom_all( segments )
return ret
def shape_to_sql( shp_filename, sql_filename ):
print("parsing shpfile %s" % shp_filename)
parsed_features = parse_shp_for_geom_and_tags( shp_filename )
print("compiling nodelist")
i, nodelist = compile_nodelist( parsed_features )
print("compiling waylist")
waylist = compile_waylist( parsed_features )
print("preparing address ways")
sql_lines = addressways(waylist, nodelist, i)
print("writing %s" % sql_filename)
fp = open( sql_filename, "w" )
fp.write( "\n".join( sql_lines ) )
fp.close()
if __name__ == '__main__':
import sys, os.path
if len(sys.argv) < 3:
print("%s input.shp output.sql" % sys.argv[0])
sys.exit()
shp_filename = sys.argv[1]
sql_filename = sys.argv[2]
shape_to_sql(shp_filename, sql_filename)

File diff suppressed because it is too large

View File

@@ -0,0 +1,58 @@
## Add Wikipedia and Wikidata to Nominatim
OSM contributors frequently tag items with links to Wikipedia and Wikidata. Nominatim can use the page ranking of Wikipedia pages to help indicate the relative importance of OSM features. This is done by calculating an importance score between 0 and 1 based on the number of inlinks to an article for a location. If two places have the same name and one is more important than the other, the Wikipedia score often points to the correct place.
These scripts extract and prepare both Wikipedia page rank and Wikidata links for use in Nominatim.
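The score can be sketched as a logarithmic scaling of each article's link count against the largest count in the table, mirroring the `log(totalcount)/log(max(totalcount))` update in `wikipedia_import.sh`. This is a minimal illustration, not the production code: the `importance` helper, the guard for counts below 1, and the sample counts are assumptions made up for the example.

```python
import math


def importance(totalcount, max_totalcount):
    """Scale a link count to the range 0..1 on a logarithmic scale."""
    # Assumption: counts below 1 are mapped to 0.0 here; the SQL in the
    # import script has no such guard.
    if totalcount < 1 or max_totalcount <= 1:
        return 0.0
    return math.log(totalcount) / math.log(max_totalcount)


# Hypothetical link counts for two places sharing a name, plus a stub page.
counts = {"Berlin": 100000, "Berlin,_New_Hampshire": 100, "Stub": 1}
top = max(counts.values())
scores = {title: importance(n, top) for title, n in counts.items()}

for title, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{title}: {score:.2f}")
```

Because the scale is logarithmic, an article with a thousandth of the top link count still gets a score well above zero, which keeps smaller places distinguishable from one another.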
#### Create a new postgres DB for Processing
Due to the size of initial and intermediate tables, processing can be done in an external database:
```
CREATE DATABASE wikiprocessingdb;
```
---
Wikipedia
---
Processing these data requires a large amount of disk space (~1TB) and considerable time (>24 hours).
#### Import & Process Wikipedia tables
This step downloads [Wikipedia](https://dumps.wikimedia.org/) page data SQL dumps and converts them to PostgreSQL files, which can then be imported and processed together with pagelink information from Wikipedia language sites to calculate importance scores.
- The script processes data from whatever set of Wikipedia languages is specified in the initial languages array
- Note that processing the top 40 Wikipedia languages can take over a day and will add nearly 1TB to the processing database. The final output tables will be approximately 11GB and 2GB in size
To download, convert, and import the data, then process summary statistics and compute importance scores, run:
```
./wikipedia_import.sh
```
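To get a feel for how much will be downloaded before starting, the list of dump files the script fetches can be reconstructed from its URL pattern (four tables per language). The following is only a sketch: `DUMP_TABLES` and `dump_urls` are hypothetical names introduced for illustration, and the two-language list is an example, not the script's default.

```python
# The four per-language tables that wikipedia_import.sh downloads.
DUMP_TABLES = ("page", "pagelinks", "langlinks", "redirect")


def dump_urls(languages):
    """Return the dump URLs that would be fetched for each language code."""
    return [
        f"https://dumps.wikimedia.org/{lang}wiki/latest/{lang}wiki-latest-{table}.sql.gz"
        for lang in languages
        for table in DUMP_TABLES
    ]


urls = dump_urls(["de", "en"])
for url in urls:
    print(url)
```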
---
Wikidata
---
This script downloads and processes Wikidata to enrich the previously created Wikipedia tables for use in Nominatim.
#### Import & Process Wikidata
This step downloads [Wikidata](https://dumps.wikimedia.org/wikidatawiki/) page data SQL dumps and converts them to PostgreSQL files, which can be processed and imported into the Nominatim database. It also uses the Wikidata Query Service API to discover and include place types.
- Script presumes that the user has already processed Wikipedia tables as specified above
- Script requires wikidata_place_types.txt and wikidata_place_type_levels.csv
- Script requires the [jq json parser](https://stedolan.github.io/jq/)
- Script processes data from whatever set of Wikipedia languages is specified in the initial languages array
- Script queries Wikidata Query Service API and imports all instances of place types listed in wikidata_place_types.txt
- Script updates the wikipedia_article table with the extracted Wikidata
By including Wikidata in the wikipedia_article table, new connections can be made on the fly from the Nominatim placex table to wikipedia_article importance scores.
To download, convert, and import the data, then process required items, run:
```
./wikidata_import.sh
```
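The place-type step boils down to turning each Wikidata Query Service response into `item,place_type` rows, which the script does with `jq`, `awk`, and `sed`. The same transformation can be sketched in Python, assuming the standard SPARQL JSON results layout (`results.bindings[].item.value`). The `place_rows` helper and the sample entity IDs are made up for illustration, and unlike the shell pipeline this sketch skips values that are not entity URIs.

```python
import json

ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def place_rows(sparql_json, place_type):
    """Turn a SPARQL JSON result into (item QID, place type QID) rows."""
    bindings = json.loads(sparql_json)["results"]["bindings"]
    rows = []
    for binding in bindings:
        uri = binding["item"]["value"]
        # Keep only entity URIs and strip the prefix, leaving the bare QID.
        if uri.startswith(ENTITY_PREFIX):
            rows.append((uri[len(ENTITY_PREFIX):], place_type))
    return rows


# Hand-written sample response; the IDs are placeholders for the example.
sample = json.dumps({"results": {"bindings": [
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q64"}},
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1055"}},
]}})

rows = place_rows(sample, "Q515")
for item, place_type in rows:
    print(f"{item},{place_type}")
```

Each printed line has the same two-column shape as the rows appended to `wikidata_place_dump.csv`.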

View File

@@ -0,0 +1,95 @@
#!/bin/bash
psqlcmd() {
psql wikiprocessingdb
}
mysql2pgsqlcmd() {
./mysql2pgsql.perl /dev/stdin /dev/stdout
}
# list the languages to process (refer to List of Wikipedias here: https://en.wikipedia.org/wiki/List_of_Wikipedias)
language=( "ar" "bg" "ca" "cs" "da" "de" "en" "es" "eo" "eu" "fa" "fr" "ko" "hi" "hr" "id" "it" "he" "lt" "hu" "ms" "nl" "ja" "no" "pl" "pt" "kk" "ro" "ru" "sk" "sl" "sr" "fi" "sv" "tr" "uk" "vi" "vo" "war" "zh" )
# get a few wikidata dump tables
wget https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-geo_tags.sql.gz
wget https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-page.sql.gz
wget https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-wb_items_per_site.sql.gz
# import wikidata tables
gzip -dc wikidatawiki-latest-geo_tags.sql.gz | mysql2pgsqlcmd | psqlcmd
gzip -dc wikidatawiki-latest-page.sql.gz | mysql2pgsqlcmd | psqlcmd
gzip -dc wikidatawiki-latest-wb_items_per_site.sql.gz | mysql2pgsqlcmd | psqlcmd
# get wikidata places from wikidata query API
while read F ; do
wget "https://query.wikidata.org/bigdata/namespace/wdq/sparql?format=json&query=SELECT ?item WHERE{?item wdt:P31*/wdt:P279*wd:$F;}" -O $F.json
jq -r '.results | .[] | .[] | [.item.value] | @csv' $F.json >> $F.txt
awk -v qid=$F '{print $0 ","qid}' $F.txt | sed -e 's!"http://www.wikidata.org/entity/!!' | sed 's/"//g' >> $F.csv
cat $F.csv >> wikidata_place_dump.csv
rm $F.json $F.txt $F.csv
done < wikidata_place_types.txt
# import wikidata places
echo "CREATE TABLE wikidata_place_dump (item text, instance_of text);" | psqlcmd
echo "COPY wikidata_place_dump (item, instance_of) FROM '/srv/nominatim/Nominatim/data-sources/wikipedia-wikidata/wikidata_place_dump.csv' DELIMITER ',' CSV;" | psqlcmd
echo "CREATE TABLE wikidata_place_type_levels (place_type text, level integer);" | psqlcmd
echo "COPY wikidata_place_type_levels (place_type, level) FROM '/srv/nominatim/Nominatim/data-sources/wikipedia-wikidata/wikidata_place_type_levels.csv' DELIMITER ',' CSV HEADER;" | psqlcmd
# create derived tables
echo "CREATE TABLE geo_earth_primary AS SELECT gt_page_id, gt_lat, gt_lon FROM geo_tags WHERE gt_globe = 'earth' AND gt_primary = 1 AND NOT( gt_lat < -90 OR gt_lat > 90 OR gt_lon < -180 OR gt_lon > 180 OR gt_lat=0 OR gt_lon=0) ;" | psqlcmd
echo "CREATE TABLE geo_earth_wikidata AS SELECT DISTINCT geo_earth_primary.gt_page_id, geo_earth_primary.gt_lat, geo_earth_primary.gt_lon, page.page_title, page.page_namespace FROM geo_earth_primary LEFT OUTER JOIN page ON (geo_earth_primary.gt_page_id = page.page_id) ORDER BY geo_earth_primary.gt_page_id;" | psqlcmd
echo "ALTER TABLE wikidata_place_dump ADD COLUMN ont_level integer, ADD COLUMN lat numeric(11,8), ADD COLUMN lon numeric(11,8);" | psqlcmd
echo "UPDATE wikidata_place_dump SET ont_level = wikidata_place_type_levels.level FROM wikidata_place_type_levels WHERE wikidata_place_dump.instance_of = wikidata_place_type_levels.place_type;" | psqlcmd
echo "CREATE TABLE wikidata_places AS SELECT DISTINCT ON (item) item, instance_of, MAX(ont_level) AS ont_level, lat, lon FROM wikidata_place_dump GROUP BY item, instance_of, ont_level, lat, lon ORDER BY item;" | psqlcmd
echo "UPDATE wikidata_places SET lat = geo_earth_wikidata.gt_lat, lon = geo_earth_wikidata.gt_lon FROM geo_earth_wikidata WHERE wikidata_places.item = geo_earth_wikidata.page_title" | psqlcmd
# process language pages
echo "CREATE TABLE wikidata_pages (item text, instance_of text, lat numeric(11,8), lon numeric(11,8), ips_site_page text, language text );" | psqlcmd
for i in "${language[@]}"
do
echo "CREATE TABLE wikidata_${i}_pages as select wikidata_places.item, wikidata_places.instance_of, wikidata_places.lat, wikidata_places.lon, wb_items_per_site.ips_site_page FROM wikidata_places LEFT JOIN wb_items_per_site ON (CAST (( LTRIM(wikidata_places.item, 'Q')) AS INTEGER) = wb_items_per_site.ips_item_id) WHERE ips_site_id = '${i}wiki' AND LEFT(wikidata_places.item,1) = 'Q' order by wikidata_places.item;" | psqlcmd
echo "ALTER TABLE wikidata_${i}_pages ADD COLUMN language text;" | psqlcmd
echo "UPDATE wikidata_${i}_pages SET language = '${i}';" | psqlcmd
echo "INSERT INTO wikidata_pages SELECT item, instance_of, lat, lon, ips_site_page, language FROM wikidata_${i}_pages;" | psqlcmd
done
echo "ALTER TABLE wikidata_pages ADD COLUMN wp_page_title text;" | psqlcmd
echo "UPDATE wikidata_pages SET wp_page_title = REPLACE(ips_site_page, ' ', '_');" | psqlcmd
echo "ALTER TABLE wikidata_pages DROP COLUMN ips_site_page;" | psqlcmd
# add wikidata to wikipedia_article table
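# wikipedia_import.sh creates wikipedia_article without these columns, so add them first
echo "ALTER TABLE wikipedia_article ADD COLUMN IF NOT EXISTS wd_page_title text, ADD COLUMN IF NOT EXISTS instance_of text;" | psqlcmd
echo "UPDATE wikipedia_article SET lat = wikidata_pages.lat, lon = wikidata_pages.lon, wd_page_title = wikidata_pages.item, instance_of = wikidata_pages.instance_of FROM wikidata_pages WHERE wikipedia_article.language = wikidata_pages.language AND wikipedia_article.title = wikidata_pages.wp_page_title;" | psqlcmd
echo "CREATE TABLE wikipedia_article_slim AS SELECT * FROM wikipedia_article WHERE wd_page_title IS NOT NULL;" | psqlcmd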
echo "ALTER TABLE wikipedia_article RENAME TO wikipedia_article_full;" | psqlcmd
echo "ALTER TABLE wikipedia_article_slim RENAME TO wikipedia_article;" | psqlcmd
# clean up intermediate tables
echo "DROP TABLE wikidata_place_dump;" | psqlcmd
echo "DROP TABLE geo_earth_primary;" | psqlcmd
for i in "${language[@]}"
do
echo "DROP TABLE wikidata_${i}_pages;" | psqlcmd
done

View File

@@ -0,0 +1,77 @@
#!/bin/bash
psqlcmd() {
psql wikiprocessingdb
}
mysql2pgsqlcmd() {
./mysql2pgsql.perl /dev/stdin /dev/stdout
}
# list the languages to process (refer to List of Wikipedias here: https://en.wikipedia.org/wiki/List_of_Wikipedias)
language=( "ar" "bg" "ca" "cs" "da" "de" "en" "es" "eo" "eu" "fa" "fr" "ko" "hi" "hr" "id" "it" "he" "lt" "hu" "ms" "nl" "ja" "no" "pl" "pt" "kk" "ro" "ru" "sk" "sl" "sr" "fi" "sv" "tr" "uk" "vi" "vo" "war" "zh" )
# create wikipedia calculation tables
echo "CREATE TABLE linkcounts (language text, title text, count integer, sumcount integer, lat double precision, lon double precision);" | psqlcmd
echo "CREATE TABLE wikipedia_article (language text NOT NULL, title text NOT NULL, langcount integer, othercount integer, totalcount integer, lat double precision, lon double precision, importance double precision, title_en text, osm_type character(1), osm_id bigint );" | psqlcmd
echo "CREATE TABLE wikipedia_redirect (language text, from_title text, to_title text );" | psqlcmd
# download individual wikipedia language tables
for i in "${language[@]}"
do
wget https://dumps.wikimedia.org/${i}wiki/latest/${i}wiki-latest-page.sql.gz
wget https://dumps.wikimedia.org/${i}wiki/latest/${i}wiki-latest-pagelinks.sql.gz
wget https://dumps.wikimedia.org/${i}wiki/latest/${i}wiki-latest-langlinks.sql.gz
wget https://dumps.wikimedia.org/${i}wiki/latest/${i}wiki-latest-redirect.sql.gz
done
# import individual wikipedia language tables
for i in "${language[@]}"
do
gzip -dc ${i}wiki-latest-pagelinks.sql.gz | sed "s/\`pagelinks\`/\`${i}pagelinks\`/g" | mysql2pgsqlcmd | psqlcmd
gzip -dc ${i}wiki-latest-page.sql.gz | sed "s/\`page\`/\`${i}page\`/g" | mysql2pgsqlcmd | psqlcmd
gzip -dc ${i}wiki-latest-langlinks.sql.gz | sed "s/\`langlinks\`/\`${i}langlinks\`/g" | mysql2pgsqlcmd | psqlcmd
gzip -dc ${i}wiki-latest-redirect.sql.gz | sed "s/\`redirect\`/\`${i}redirect\`/g" | mysql2pgsqlcmd | psqlcmd
done
# process language tables and associated pagelink counts
for i in "${language[@]}"
do
echo "create table ${i}pagelinkcount as select pl_title as title,count(*) as count from ${i}pagelinks where pl_namespace = 0 group by pl_title;" | psqlcmd
echo "insert into linkcounts select '${i}',pl_title,count(*) from ${i}pagelinks where pl_namespace = 0 group by pl_title;" | psqlcmd
echo "insert into wikipedia_redirect select '${i}',page_title,rd_title from ${i}redirect join ${i}page on (rd_from = page_id) where page_namespace = 0 and rd_namespace = 0;" | psqlcmd
echo "alter table ${i}pagelinkcount add column othercount integer;" | psqlcmd
echo "update ${i}pagelinkcount set othercount = 0;" | psqlcmd
for j in "${language[@]}"
do
echo "update ${i}pagelinkcount set othercount = ${i}pagelinkcount.othercount + x.count from (select page_title as title,count from ${i}langlinks join ${i}page on (ll_from = page_id) join ${j}pagelinkcount on (ll_lang = '${j}' and ll_title = title)) as x where x.title = ${i}pagelinkcount.title;" | psqlcmd
done
echo "insert into wikipedia_article select '${i}', title, count, othercount, count+othercount from ${i}pagelinkcount;" | psqlcmd
done
# calculate importance score for each wikipedia page
echo "update wikipedia_article set importance = log(totalcount)/log((select max(totalcount) from wikipedia_article))" | psqlcmd
# clean up intermediate tables to conserve space
for i in "${language[@]}"
do
echo "DROP TABLE ${i}pagelinks;" | psqlcmd
echo "DROP TABLE ${i}page;" | psqlcmd
echo "DROP TABLE ${i}langlinks;" | psqlcmd
echo "DROP TABLE ${i}redirect;" | psqlcmd
echo "DROP TABLE ${i}pagelinkcount;" | psqlcmd
done

View File

@@ -0,0 +1,951 @@
#!/usr/bin/perl -w
# mysql2pgsql
# MySQL to PostgreSQL dump file converter
#
# For usage: perl mysql2pgsql.perl --help
#
# DDL statements are changed, but little or no formatting
# of the actual data is done.
# Data consistency is up to the DBA.
#
# (c) 2004-2007 Jose M Duarte and Joseph Speigle ... gborg
#
# (c) 2000-2004 Maxim Rudensky <fonin@omnistaronline.com>
# (c) 2000 Valentine Danilchuk <valdan@ziet.zhitomir.ua>
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. All advertising materials mentioning features or use of this software
# must display the following acknowledgement:
# This product includes software developed by the Max Rudensky
# and its contributors.
# 4. Neither the name of the author nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
use Getopt::Long;
use POSIX;
use strict;
use warnings;
# main sections
# -------------
# 1 variable declarations
# 2 subroutines
# 3 get commandline options and specify help statement
# 4 loop through file and process
# 5. print_plpgsql function prototype
#################################################################
# 1. variable declarations
#################################################################
# command line options
my( $ENC_IN, $ENC_OUT, $PRESERVE_CASE, $HELP, $DEBUG, $SCHEMA, $LOWERCASE, $CHAR2VARCHAR, $NODROP, $SEP_FILE, $opt_debug, $opt_help, $opt_schema, $opt_preserve_case, $opt_char2varchar, $opt_nodrop, $opt_sepfile, $opt_enc_in, $opt_enc_out );
# variables for constructing pre-create-table entities
my $pre_create_sql=''; # comments, 'enum' constraints preceding create table statement
my $auto_increment_seq= ''; # so we can easily substitute it if we need a default value
my $create_sql=''; # all the datatypes in the create table section
my $post_create_sql=''; # create indexes, foreign keys, table comments
my $function_create_sql = ''; # for the set (function,trigger) and CURRENT_TIMESTAMP ( function,trigger )
# constraints
my ($type, $column_valuesStr, @column_values, $value );
my %constraints=(); # holds values constraints used to emulate mysql datatypes (e.g. year, set)
# datatype conversion variables
my ( $index,$seq);
my ( $column_name, $col, $quoted_column);
my ( @year_holder, $year, $constraint_table_name);
my $table=""; # table_name for create sql statements
my $table_no_quotes=""; # table_name for create sql statements
my $sl = '^\s+\w+\s+'; # matches the column name
my $tables_first_timestamp_column= 1; # decision to print warnings about default_timestamp not being in postgres
my $mysql_numeric_datatypes = "TINYINT|SMALLINT|MEDIUMINT|INT|INTEGER|BIGINT|REAL|DOUBLE|FLOAT|DECIMAL|NUMERIC";
my $mysql_datetime_datatypes = "|DATE|TIME|TIMESTAMP|DATETIME|YEAR";
my $mysql_text_datatypes = "CHAR|VARCHAR|BINARY|VARBINARY|TINYBLOB|BLOB|MEDIUMBLOB|LONGBLOB|TINYTEXT|TEXT|MEDIUMTEXT|LONGTEXT|ENUM|SET";
my $mysql_datatypesStr = $mysql_numeric_datatypes . "|". $mysql_datetime_datatypes . "|". $mysql_text_datatypes ;
# handling INSERT INTO statements
my $rowRe = qr{
\( # opening parens
( # (start capture)
(?: # (start group)
' # string start
[^'\\]* # up to string-end or backslash (escape)
(?: # (start group)
\\. # gobble escaped character
[^'\\]* # up to string-end or backslash
)* # (end group, repeat zero or more)
' # string end
| # (OR)
.*? # everything else (not strings)
)* # (end group, repeat zero or more)
) # (end capture)
\) # closing paren
}x;
my ($insert_table, $valueString);
#
########################################################
# 2. subroutines
#
# get_identifier
# print_post_create_sql()
# quote_and_lc()
# make_plpgsql($table,$column_name) -- at end of file
########################################################
# returns an identifier with the given suffix doing controlled
# truncation if necessary
sub get_identifier($$$) {
my ($table, $col, $suffix) = @_;
my $name = '';
$table=~s/\"//g; # make sure that $table doesn't have quotes so we don't end up with redundant quoting
# in the case of multiple columns
my @cols = split(/,/,$col);
$col =~ s/,//g;
# in case all columns together too long we have to truncate them
if (length($col) > 55) {
my $totaltocut = length($col)-55;
my $tocut = ceil($totaltocut / @cols);
@cols = map {substr($_,0,abs(length($_)-$tocut))} @cols;
$col="";
foreach (@cols){
$col.=$_;
}
}
my $max_table_length = 63 - length("_${col}_$suffix");
if (length($table) > $max_table_length) {
$table = substr($table, length($table) - $max_table_length, $max_table_length);
}
return quote_and_lc("${table}_${col}_${suffix}");
}
#
#
# called when we encounter next CREATE TABLE statement
# also called at EOF to print out for last table
# prints comments, indexes, foreign key constraints (the latter 2 possibly to a separate file)
sub print_post_create_sql() {
my ( @create_idx_comments_constraints_commandsArr, $stmts, $table_field_combination);
my %stmts;
# loop to check for duplicates in $post_create_sql
# Needed because of duplicate key declarations ( PRIMARY KEY and KEY), auto_increment columns
@create_idx_comments_constraints_commandsArr = split(';\n?', $post_create_sql);
if ($SEP_FILE) {
open(SEP_FILE, ">>:encoding($ENC_OUT)", $SEP_FILE) or die "Unable to open $SEP_FILE for output: $!\n";
}
foreach (@create_idx_comments_constraints_commandsArr) {
if (m/CREATE INDEX "*(\S+)"*\s/i) { # CREATE INDEX korean_english_wordsize_idx ON korean_english USING btree (wordsize);
$table_field_combination = $1;
# if this particular table_field_combination was already used do not print the statement:
if ($SEP_FILE) {
print SEP_FILE "$_;\n" if !defined($stmts{$table_field_combination});
} else {
print OUT "$_;\n" if !defined($stmts{$table_field_combination});
}
$stmts{$table_field_combination} = 1;
}
elsif (m/COMMENT/i) { # COMMENT ON object IS 'text'; but comment may be part of table name so use 'elsif'
print OUT "$_;\n"
} else { # foreign key constraint or comments (those preceded by -- )
if ($SEP_FILE) {
print SEP_FILE "$_;\n";
} else {
print OUT "$_;\n"
}
}
}
if ($SEP_FILE) {
close SEP_FILE;
}
$post_create_sql='';
# empty %constraints for next " create table" statement
}
# quotes a string or a multicolumn string (comma separated)
# and optionally lowercase (if LOWERCASE is set)
# lowercase .... if user wants default postgres behavior
# quotes .... to preserve keywords and to preserve case when case-sensitive tables are to be used
sub quote_and_lc($)
{
my $col = shift;
if ($LOWERCASE) {
$col = lc($col);
}
if ($col =~ m/,/) {
my @cols = split(/,\s?/, $col);
@cols = map {"\"$_\""} @cols;
return join(', ', @cols);
} else {
return "\"$col\"";
}
}
########################################################
# 3. get commandline options and maybe print help
########################################################
GetOptions("help", "debug"=> \$opt_debug, "schema=s" => \$SCHEMA, "preserve_case" => \$opt_preserve_case, "char2varchar" => \$opt_char2varchar, "nodrop" => \$opt_nodrop, "sepfile=s" => \$opt_sepfile, "enc_in=s" => \$opt_enc_in, "enc_out=s" => \$opt_enc_out );
$HELP = $opt_help || 0;
$DEBUG = $opt_debug || 0;
$PRESERVE_CASE = $opt_preserve_case || 0;
if ($PRESERVE_CASE == 1) { $LOWERCASE = 0; }
else { $LOWERCASE = 1; }
$CHAR2VARCHAR = $opt_char2varchar || 0;
$NODROP = $opt_nodrop || 0;
$SEP_FILE = $opt_sepfile || 0;
$ENC_IN = $opt_enc_in || 'utf8';
$ENC_OUT = $opt_enc_out || 'utf8';
if (($HELP) || ! defined($ARGV[0]) || ! defined($ARGV[1])) {
print "\n\nUsage: perl $0 {--help --debug --preserve_case --char2varchar --nodrop --schema --sepfile --enc_in --enc_out } mysql.sql pg.sql\n";
print "\t* OPTIONS WITHOUT ARGS\n";
print "\t--help: prints this message \n";
print "\t--debug: output the commented-out mysql line above the postgres line in pg.sql \n";
print "\t--preserve_case: prevents automatic case-lowering of column and table names\n";
print "\t\tIf you want to preserve case, you must set this flag. For example,\n";
print "\t\tIf your client application quotes table and column-names and they have cases in them, set this flag\n";
print "\t--char2varchar: converts all char fields to varchar\n";
print "\t--nodrop: strips out DROP TABLE statements\n";
print "\t\totherwise harmless warnings are printed by psql when the dropped table does not exist\n";
print "\n\t* OPTIONS WITH ARGS\n";
print "\t--schema: outputs a line into the postgres sql file setting search_path \n";
print "\t--sepfile: output foreign key constraints and indexes to a separate file so that it can be\n";
print "\t\timported after large data set is inserted from another dump file\n";
print "\t--enc_in: encoding of mysql in file (default utf8) \n";
print "\t--enc_out: encoding of postgres out file (default utf8) \n";
print "\n\t* REQUIRED ARGUMENTS\n";
if (defined ($ARGV[0])) {
print "\tmysql.sql ($ARGV[0])\n";
} else {
print "\tmysql.sql (undefined)\n";
}
if (defined ($ARGV[1])) {
print "\tpg.sql ($ARGV[1])\n";
} else {
print "\tpg.sql (undefined)\n";
}
print "\n";
exit 1;
}
########################################################
# 4. process through mysql_dump.sql file
# in a big loop
########################################################
# open in and out files
open(IN,"<:encoding($ENC_IN)", $ARGV[0]) || die "can't open mysql dump file $ARGV[0]";
open(OUT,">:encoding($ENC_OUT)", $ARGV[1]) || die "can't open pg dump file $ARGV[1]";
# output header
print OUT "--\n";
print OUT "-- Generated from mysql2pgsql.perl\n";
print OUT "-- http://gborg.postgresql.org/project/mysql2psql/\n";
print OUT "-- (c) 2001 - 2007 Jose M. Duarte, Joseph Speigle\n";
print OUT "--\n";
print OUT "\n";
print OUT "-- warnings are printed for drop tables if they do not exist\n";
print OUT "-- please see http://archives.postgresql.org/pgsql-novice/2004-10/msg00158.php\n\n";
print OUT "-- ##############################################################\n";
if ($SCHEMA ) {
print OUT "set search_path='" . $SCHEMA . "'\\g\n" ;
}
# loop through mysql file on a per-line basis
while(<IN>) {
############## flow #########################
# (the lines are directed to different string variables at different times)
#
# handle drop table , unlock, connect statements
# if ( start of create table) {
# print out post_create table (indexes, foreign key constraints, comments from previous table)
# add drop table statement if !$NODROP to pre_create_sql
# next;
# }
# else if ( inside create table) {
# add comments in this portion to create_sql
# if ( end of create table) {
# delete mysql-unique CREATE TABLE commands
# print pre_create_sql
# print the constraint tables for set and year datatypes
# print create_sql
# print function_create_sql (this is for the enum columns only)
# next;
# }
# do substitutions
# -- NUMERIC DATATYPES
# -- CHARACTER DATATYPES
# -- DATE AND TIME DATATYPES
# -- KEY AND UNIQUE CREATIONS
# and append them to create_sql
# } else {
# print inserts on-the-spot (this script only changes default timestamp of 0000-00-00)
# }
# LOOP until EOF
#
########################################################
if (!/^\s*insert into/i) { # not inside create table so don't worry about data corruption
s/`//g; # pgsql uses no backticks to denote table name (CREATE TABLE `sd`) or around field
# and table names like mysql
# doh! we hope all dashes and special chars are caught by the regular expressions :)
}
if (/^\s*USE\s*([^;]*);/) {
print OUT "\\c ". $1;
next;
}
if (/^(UN)?LOCK TABLES/i || /drop\s+table/i ) {
# skip
# DROP TABLE is added when we see the CREATE TABLE
next;
}
if (/(create\s+table\s+)([-_\w]+)\s/i) { # example: CREATE TABLE `english_english`
print_post_create_sql(); # for last table
$tables_first_timestamp_column= 1; # decision to print warnings about default_timestamp not being in postgres
$create_sql = '';
$table_no_quotes = $2 ;
$table=quote_and_lc($2);
if ( !$NODROP ) { # always print drop table if user doesn't explicitly say not to
# to drop a table that is referenced by a view or a foreign-key constraint of another table,
# CASCADE must be specified. (CASCADE will remove a dependent view entirely, but in the
# foreign-key case it will only remove the foreign-key constraint, not the other table entirely.)
# (source: 8.1.3 docs, section "drop table")
warn "table $table will be dropped CASCADE\n";
$pre_create_sql .= "DROP TABLE $table CASCADE;\n"; # custom dumps may be missing the 'dump' commands
}
s/(create\s+table\s+)([-_\w]+)\s/$1 $table /i;
if ($DEBUG) {
$create_sql .= '-- ' . $_;
}
$create_sql .= $_;
next;
}
if ($create_sql ne "") { # we are inside create table statement so lets process datatypes
# print out comments or empty lines in context
if ($DEBUG) {
$create_sql .= '-- ' . $_;
}
if (/^#/ || /^$/ || /^\s*--/) {
s/^#/--/; # Two hyphens (--) is the SQL-92 standard indicator for comments
$create_sql.=$_;
next;
}
if (/\).*;/i) { # end of create table sequence
s/INSERT METHOD[=\s+][^;\s]+//i;
s/PASSWORD=[^;\s]+//i;
s/ROW_FORMAT=(?:DEFAULT|DYNAMIC|FIXED|COMPRESSED|REDUNDANT|COMPACT)+//i;
s/KEY_BLOCK_SIZE=8//i;
s/DELAY KEY WRITE=[^;\s]+//i;
s/INDEX DIRECTORY[=\s+][^;\s]+//i;
s/DATA DIRECTORY=[^;\s]+//i;
s/CONNECTION=[^;\s]+//i;
s/CHECKSUM=[^;\s]+//i;
s/Type=[^;\s]+//i; # ISAM , # older versions
s/COLLATE=[^;\s]+//i; # table's collate
s/COLLATE\s+[^;\s]+//i; # table's collate
# possible AUTO_INCREMENT starting index, it is used in mysql 5.0.26, not sure since which version
if (/AUTO_INCREMENT=(\d+)/i) {
# should take < ---- ) ENGINE=MyISAM AUTO_INCREMENT=16 DEFAULT CHARSET=latin1;
# and should output ---> CREATE SEQUENCE "rhm_host_info_id_seq" START WITH 16;
my $start_value = $1;
print $auto_increment_seq . "--\n";
# print $pre_create_sql . "--\n";
$pre_create_sql =~ s/(CREATE SEQUENCE $auto_increment_seq )/$1 START WITH $start_value /;
}
s/AUTO_INCREMENT=\d+//i;
s/PACK_KEYS=\d//i; # mysql 5.0.22
s/DEFAULT CHARSET=[^;\s]+//i; # my mysql version is 4.1.11
s/ENGINE\s*=\s*[^;\s]+//i; # my mysql version is 4.1.11
s/ROW_FORMAT=[^;\s]+//i; # my mysql version is 5.0.22
s/KEY_BLOCK_SIZE=8//i;
s/MIN_ROWS=[^;\s]+//i;
s/MAX_ROWS=[^;\s]+//i;
s/AVG_ROW_LENGTH=[^;\s]+//i;
if (/COMMENT='([^']*)'/) { # ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='must be country zones';
$post_create_sql.="COMMENT ON TABLE $table IS '$1'\;"; # COMMENT ON table_name IS 'text';
s/COMMENT='[^']*'//i;
}
$create_sql =~ s/,$//g; # strip last , inside create table
# make sure we end in a comma, as KEY statements are turned
# into post_create_sql indices
# they often are the last line so leaving a 'hanging comma'
my @array = split("\n", $create_sql);
for (my $a = $#array; $a >= 0; $a--) { #loop backwards
if ($a == $#array && $array[$a] =~ m/,\s*$/) { # for last line
$array[$a] =~ s/,\s*$//;
next;
}
if ($array[$a] !~ m/create table/i) { # i.e. if there was more than one column in table
if ($a != $#array && $array[$a] !~ m/,\s*$/ ) { # for second to last
$array[$a] =~ s/$/,/;
last;
}
elsif ($a != $#array && $array[$a] =~ m/,\s*$/ ) { # for second to last
last;
}
}
}
$create_sql = join("\n", @array) . "\n";
$create_sql .= $_;
# put comments out first
print OUT $pre_create_sql;
# create separate table to reference and to hold mysql's possible set data-type
# values. do that table's creation before create table
# definition
foreach $column_name (keys %constraints) {
$type=$constraints{$column_name}{'type'};
$column_valuesStr = $constraints{$column_name}{'values'};
$constraint_table_name = get_identifier(${table},${column_name} ,"constraint_table");
if ($type eq 'set') {
print OUT qq~DROP TABLE $constraint_table_name CASCADE\\g\n~ ;
print OUT qq~create table $constraint_table_name ( set_values varchar UNIQUE)\\g\n~ ;
$function_create_sql .= make_plpgsql($table,$column_name);
} elsif ($type eq 'year') {
print OUT qq~DROP TABLE $constraint_table_name CASCADE\\g\n~ ;
print OUT qq~create table $constraint_table_name ( year_values varchar UNIQUE)\\g\n~ ;
}
@column_values = split /,/, $column_valuesStr;
foreach $value (@column_values) {
print OUT qq~insert into $constraint_table_name values ( $value )\\g\n~; # ad ' for ints and varchars
}
}
$create_sql =~ s/double double/double precision/g;
# print create table and reset create table vars
# when moving from each "create table" to "insert" part of dump
print OUT $create_sql;
print OUT $function_create_sql;
$pre_create_sql="";
$auto_increment_seq="";
$create_sql="";
$function_create_sql='';
%constraints=();
# the post_create_sql for this table is output at the beginning of the next table def
# in case we want to make indexes after doing inserting
next;
}
if (/^\s*(\w+)\s+.*COMMENT\s*'([^']*)'/) { #`zone_country_id` int(11) COMMENT 'column comment here',
$quoted_column=quote_and_lc($1);
$post_create_sql.="COMMENT ON COLUMN $table"."."." $quoted_column IS '$2'\;"; # COMMENT ON table_name.column_name IS 'text';
s/COMMENT\s*'[^']*'//i;
}
# NUMERIC DATATYPES
#
# auto_increment -> sequences
# UNSIGNED conversions
# TINYINT
# SMALLINT
# MEDIUMINT
# INT, INTEGER
# BIGINT
#
# DOUBLE [PRECISION], REAL
# DECIMAL(M,D), NUMERIC(M,D)
# FLOAT(p)
# FLOAT
s/(\w*int)\(\d+\)/$1/g; # hack of the (n) stuff for e.g. mediumint(2) int(3)
if (/^(\s*)(\w+)\s*.*numeric.*auto_increment/i) { # int,auto_increment -> serial
$seq = get_identifier($table, $2, 'seq');
$quoted_column=quote_and_lc($2);
# Smash datatype to int8 and autogenerate the sequence.
s/^(\s*)(\w+)\s*.*NUMERIC(.*)auto_increment([^,]*)/$1 $quoted_column serial8 $4/ig;
$create_sql.=$_;
next;
}
if (/^\s*(\w+)\s+.*int.*auto_increment/i) { # example: data_id mediumint(8) unsigned NOT NULL auto_increment,
$seq = get_identifier($table, $1, 'seq');
$quoted_column=quote_and_lc($1);
s/(\s*)(\w+)\s+.*int.*auto_increment([^,]*)/$1 $quoted_column serial8 $3/ig;
$create_sql.=$_;
next;
}
# convert UNSIGNED to CHECK constraints
if (m/^(\s*)(\w+)\s+((float|double precision|double|real|decimal|numeric))(.*)unsigned/i) {
$quoted_column = quote_and_lc($2);
s/^(\s*)(\w+)\s+((float|double precision|double|real|decimal|numeric))(.*)unsigned/$1 $quoted_column $3 $4 CHECK ($quoted_column >= 0)/i;
}
# example: `wordsize` tinyint(3) unsigned default NULL,
if (m/^(\s+)(\w+)\s+(\w+)\s+unsigned/i) {
$quoted_column=quote_and_lc($2);
s/^(\s+)(\w+)\s+(\w+)\s+unsigned/$1 $quoted_column $3 CHECK ($quoted_column >= 0)/i;
}
if (m/^(\s*)(\w+)\s+(bigint.*)unsigned/) {
$quoted_column=quote_and_lc($2);
# see http://archives.postgresql.org/pgsql-general/2005-07/msg01178.php
# and see http://www.postgresql.org/docs/8.2/interactive/datatype-numeric.html
# see http://dev.mysql.com/doc/refman/5.1/en/numeric-types.html max size == 20 digits
s/^(\s*)(\w+)\s+bigint(.*)unsigned/$1 $quoted_column NUMERIC (20,0) CHECK ($quoted_column >= 0)/i;
}
# int type conversion
# TINYINT (signed) -128 to 127 (unsigned) 0 255
# SMALLINT A small integer. The signed range is -32768 to 32767. The unsigned range is 0 to 65535.
# MEDIUMINT A medium-sized integer. The signed range is -8388608 to 8388607. The unsigned range is 0 to 16777215.
# INT A normal-size integer. The signed range is -2147483648 to 2147483647. The unsigned range is 0 to 4294967295.
# BIGINT The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615
# for postgres see http://www.postgresql.org/docs/8.2/static/datatype-numeric.html#DATATYPE-INT
s/^(\s+"*\w+"*\s+)tinyint/$1 smallint/i;
s/^(\s+"*\w+"*\s+)mediumint/$1 integer/i;
# the floating point types
# double -> double precision
# double(n,m) -> double precision
# float - no need for conversion
# float(n) - no need for conversion
# float(n,m) -> double precision
s/(^\s*\w+\s+)double(\(\d+,\d+\))?/$1float/i;
s/float(\(\d+,\d+\))/float/i;
#
# CHARACTER TYPES
#
# set
# enum
# binary(M), VARBINARy(M), tinyblob, tinytext,
# bit
# char(M), varchar(M)
# blob -> text
# mediumblob
# longblob, longtext
# text -> text
# mediumtext
# longtext
# mysql docs: A BLOB is a binary large object that can hold a variable amount of data.
# set
# For example, a column specified as SET('one', 'two') NOT NULL can have any of these values:
# ''
# 'one'
# 'two'
# 'one,two'
if (/(\w*)\s+set\(((?:['"]\w+['"]\s*,*)+(?:['"]\w+['"])*)\)(.*)$/i) { # example: `au_auth` set('r','w','d') NOT NULL default '',
$column_name = $1;
$constraints{$column_name}{'values'} = $2; # 'abc','def', ...
$constraints{$column_name}{'type'} = "set"; # 'abc','def', ...
$_ = qq~ $column_name varchar , ~;
$column_name = quote_and_lc($1);
$create_sql.=$_;
next;
}
if (/(\S*)\s+enum\(((?:['"][^'"]+['"]\s*,)+['"][^'"]+['"])\)(.*)$/i) { # enum handling
# example: `test` enum('?','+','-') NOT NULL default '?'
# $2 is the values of the enum 'abc','def', ...
$quoted_column=quote_and_lc($1);
# "test" NOT NULL default '?' CONSTRAINT test_test_constraint CHECK ("test" IN ('?','+','-'))
$_ = qq~ $quoted_column varchar CHECK ($quoted_column IN ( $2 ))$3\n~; # just assume varchar?
$create_sql.=$_;
next;
}
# Take care of "binary" option for char and varchar
# (pre-4.1.2, it indicated a byte array; from 4.1.2, indicates
# a binary collation)
s/(?:var)?char(?:\(\d+\))? (?:byte|binary)/text/i;
if (m/(?:var)?binary\s*\(\d+\)/i) { # c varBINARY(3) in Mysql
warn "WARNING in table '$table' '$_': binary type is converted to bytea (unsized) for Postgres\n";
}
s/(?:var)?binary(?:\(\d+\))?/text/i; # c varBINARY(3) in Mysql
s/bit(?:\(\d+\))?/bytea/i; # bit datatype -> bytea
# large datatypes
s/\w*blob/bytea/gi;
s/tinytext/text/gi;
s/mediumtext/text/gi;
s/longtext/text/gi;
# char -> varchar -- if specified as a command line option
# PostgreSQL would otherwise pad with spaces as opposed
# to MySQL! Your user interface may depend on this!
if ($CHAR2VARCHAR) {
s/(^\s+\S+\s+)char/${1}varchar/gi;
}
# nuke column's collate and character set
s/(\S+)\s+character\s+set\s+\w+/$1/gi;
s/(\S+)\s+collate\s+\w+/$1/gi;
#
# DATE AND TIME TYPES
#
# date time
# year
# datetime
# timestamp
# date time
# these are the same types in postgres, just do the replacement of 0000-00-00 date
if (m/default '(\d+)-(\d+)-(\d+)([^']*)'/i) { # we grab the year, month and day
# NOTE: times of 00:00:00 are possible and are okay
my $time = '';
my $year=$1;
my $month= $2;
my $day = $3;
if ($4) {
$time = $4;
}
if ($year eq "0000") { $year = '1970'; }
if ($month eq "00") { $month = '01'; }
if ($day eq "00") { $day = '01'; }
s/default '[^']+'/default '$year-$month-$day$time'/i; # finally we replace with $datetime
}
# convert mysql's year datatype to a constraint
if (/(\w*)\s+year\(4\)(.*)$/i) { # can be integer OR string 1901-2155
$constraint_table_name = get_identifier($table,$1 ,"constraint_table");
$column_name=quote_and_lc($1);
@year_holder = ();
$year='';
for (1901 .. 2155) {
$year = "'$_'";
unless ($year =~ /2155/) { $year .= ','; }
push( @year_holder, $year);
}
$constraints{$column_name}{'values'} = join('','',@year_holder); # '1901','1902', ...
$constraints{$column_name}{'type'} = "year";
$_ = qq~ $column_name varchar CONSTRAINT ${table}_${column_name}_constraint REFERENCES $constraint_table_name ("year_values") $2\n~;
$create_sql.=$_;
next;
} elsif (/(\w*)\s+year\(2\)(.*)$/i) { # same for a 2-integer string
$constraint_table_name = get_identifier($table,$1 ,"constraint_table");
$column_name=quote_and_lc($1);
@year_holder = ();
$year='';
for (1970 .. 2069) {
$year = "'$_'";
if ($year =~ /2069/) { next; }
push( @year_holder, $year);
}
push( @year_holder, '0000');
$constraints{$column_name}{'values'} = join(',',@year_holder); # '1971','1972', ...
$constraints{$column_name}{'type'} = "year"; # 'abc','def', ...
$_ = qq~ $1 varchar CONSTRAINT ${table}_${column_name}_constraint REFERENCES $constraint_table_name ("year_values") $2\n~;
$create_sql.=$_;
next;
}
# datetime
# Default on a dump from MySQL 5.0.22 is in the same form as datetime so let it flow down
# to the timestamp section and deal with it there
s/(${sl})datetime /$1timestamp without time zone /i;
# change not null datetime field to null valid ones
# (to support remapping of "zero time" to null
# s/($sl)datetime not null/$1timestamp without time zone/i;
# timestamps
#
# nuke datetime representation (not supported in PostgreSQL)
# change default time of 0000-00-00 to 1970-01-01
# we may possibly need to create a trigger to provide
# equal functionality with ON UPDATE CURRENT TIMESTAMP
if (m/${sl}timestamp/i) {
if ( m/ON UPDATE CURRENT_TIMESTAMP/i ) { # the ... default CURRENT_TIMESTAMP only applies for blank inserts, not updates
s/ON UPDATE CURRENT_TIMESTAMP//i ;
m/^\s*(\w+)\s+timestamp/i ;
# automatic trigger creation
$table_no_quotes =~ s/"//g;
$function_create_sql .= " CREATE OR REPLACE FUNCTION update_". $table_no_quotes . "() RETURNS trigger AS '
BEGIN
NEW.$1 := CURRENT_TIMESTAMP;
RETURN NEW;
END;
' LANGUAGE 'plpgsql';
-- before INSERT is handled by 'default CURRENT_TIMESTAMP'
CREATE TRIGGER add_current_date_to_".$table_no_quotes." BEFORE UPDATE ON ". $table . " FOR EACH ROW EXECUTE PROCEDURE
update_".$table_no_quotes."();\n";
}
if ($tables_first_timestamp_column && m/DEFAULT NULL/i) {
# DEFAULT NULL is the same as DEFAULT CURRENT_TIMESTAMP for the first TIMESTAMP column. (MYSQL manual)
s/($sl)(timestamp\s+)default null/$1 $2 DEFAULT CURRENT_TIMESTAMP/i;
}
$tables_first_timestamp_column= 0;
if (m/${sl}timestamp\s*\(\d+\)/i) { # fix for timestamps with width spec not handled (ID: 1628)
warn "WARNING in table '$table' '$_': your default timestamp width is being ignored for table $table \n";
s/($sl)timestamp(?:\(\d+\))/$1datetime/i;
}
} # end timestamp section
# KEY AND UNIQUE CREATIONS
#
# unique
if ( /^\s+unique\s+\(([^(]+)\)/i ) { # example UNIQUE `name` (`name`), same as UNIQUE KEY
# POSTGRESQL: treat same as mysql unique
$quoted_column = quote_and_lc($1);
s/\s+unique\s+\(([^(]+)\)/ unique ($quoted_column) /i;
$create_sql.=$_;
next;
} elsif ( /^\s+unique\s+key\s*(\w+)\s*\(([^(]+)\)/i ) { # example UNIQUE KEY `name` (`name`)
# MYSQL: unique key: allows null=YES, allows duplicates=NO (*)
# ... new ... UNIQUE KEY `unique_fullname` (`fullname`) in my mysql v. Ver 14.12 Distrib 5.1.7-beta
# POSTGRESQL: treat same as mysql unique
# just quote columns
$quoted_column = quote_and_lc($2);
s/\s+unique\s+key\s*(\w+)\s*\(([^(]+)\)/ unique ($quoted_column) /i;
$create_sql.=$_;
# the index corresponding to the 'key' is automatically created
next;
}
# keys
if ( /^\s+fulltext key\s+/i) { # example: FULLTEXT KEY `commenttext` (`commenttext`)
# that is key as a word in the first check for a match
# the tsvector datatype is made for these types of things
# example mysql file:
# what is tsvector datatype?
# http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
warn "dba must do fulltext key transformation for $table\n";
next;
}
if ( /^(\s+)constraint (\S+) foreign key \((\S+)\) references (\S+) \((\S+)\)(.*)/i ) {
$quoted_column =quote_and_lc($3);
$col=quote_and_lc($5);
$post_create_sql .= "ALTER TABLE $table ADD FOREIGN KEY ($quoted_column) REFERENCES " . quote_and_lc($4) . " ($col);\n";
next;
}
if ( /^\s*primary key\s*\(([^)]+)\)([,\s]+)/i ) { # example PRIMARY KEY (`name`)
# MYSQL: primary key: allows null=NO , allows duplicates=NO
# POSTGRESQL: When an index is declared unique, multiple table rows with equal indexed values will not be
# allowed. Null values are not considered equal.
# POSTGRESQL quote's source: 8.1.3 docs section 11.5 "unique indexes"
# so, in postgres, we need to add a NOT NULL to the UNIQUE constraint
# and, primary key (mysql) == primary key (postgres) so that we *really* don't need to change anything
$quoted_column = quote_and_lc($1);
s/(\s*)primary key\s+\(([^)]+)\)([,\s]+)/$1 primary key ($quoted_column)$3/i;
# indexes are automatically created for unique columns
$create_sql.=$_;
next;
} elsif (m/^\s+key\s[-_\s\w]+\((.+)\)/i ) { # example: KEY `idx_mod_english_def_word` (`word`),
# regular key: allows null=YES, allows duplicates=YES
# MYSQL: KEY is normally a synonym for INDEX. http://dev.mysql.com/doc/refman/5.1/en/create-table.html
#
# * MySQL: ALTER TABLE {$table} ADD KEY $column ($column)
# * PostgreSQL: CREATE INDEX {$table}_$column_idx ON {$table}($column) // Please note the _idx "extension"
# PRIMARY KEY (`postid`),
# KEY `ownerid` (`ownerid`)
# create an index for everything which has a key listed for it.
my $col = $1;
# TODO we don't have a translation for the substring syntax in text columns in MySQL (e.g. "KEY my_idx (mytextcol(20))")
# for now just getting rid of the brackets and numbers (the substring specifier):
$col=~s/\(\d+\)//g;
$quoted_column = quote_and_lc($col);
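if ($col =~ m/,/) {
$col =~ s/,\s*/_/g; # bind the substitution to $col (not $_) so multi-column keys yield col1_col2 in the index name
}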
$index = get_identifier($table, $col, 'idx');
$post_create_sql.="CREATE INDEX $index ON $table USING btree ($quoted_column)\;";
# just create index do not add to create table statement
next;
}
# handle 'key' declared at end of column
if (/\w+.*primary key/i) { # mysql: key is normally just a synonym for index
# just leave as is ( postgres has primary key type)
} elsif (/(\w+\s+(?:$mysql_datatypesStr)\s+.*)key/i) { # mysql: key is normally just a synonym for index
# I can't find a reference for 'key' in a postgres command without using the word 'primary key'
s/$1key/$1/i ;
$index = get_identifier($table, $1, 'idx');
$quoted_column =quote_and_lc($1);
$post_create_sql.="CREATE INDEX $index ON $table USING btree ($quoted_column) \;";
$create_sql.=$_;
}
# do we really need this anymore?
# remap columns with names of existing system attribute
if (/"oid"/i) {
s/"oid"/"_oid"/g;
print STDERR "WARNING: table $table uses column \"oid\" which is renamed to \"_oid\"\nYou should fix application manually! Press return to continue.";
my $wait=<STDIN>;
}
s/oid/_oid/i if (/key/i && /oid/i); # fix oid in key
# FINAL QUOTING OF ALL COLUMNS
# quote column names which were not already quoted
# perhaps they were not quoted because they were not explicitly handled
if (!/^\s*"(\w+)"(\s+)/i) {
/^(\s*)(\w+)(\s+)(.*)$/i ;
$quoted_column= quote_and_lc($2);
s/^(\s*)(\w+)(\s+)(.*)$/$1 $quoted_column $3 $4 /;
}
$create_sql.=$_;
# END of if ($create_sql ne "") i.e. were inside create table statement so processed datatypes
}
# add "not in create table" comments or empty lines to pre_create_sql
elsif (/^#/ || /^$/ || /^\s*--/) {
s/^#/--/; # Two hyphens (--) is the SQL-92 standard indicator for comments
$pre_create_sql .= $_ ; # printed above create table statement
next;
}
elsif (/^\s*insert into/i) { # not inside create table and doing insert
# fix mysql's zero/null value for timestamps
s/'0000-00-00/'1970-01-01/gi;
# commented out to fix bug "Field contents interpreted as a timestamp", what was the point of this line anyway?
#s/([12]\d\d\d)([01]\d)([0-3]\d)([0-2]\d)([0-6]\d)([0-6]\d)/'$1-$2-$3 $4:$5:$6'/;
#---- fix data in inserted data: (from MS world)
s!\x96!-!g; # --
s!\x93!"!g; # ``
s!\x94!"!g; # ''
s!\x85!... !g; # \ldots
s!\x92!`!g;
print OUT $pre_create_sql; # print comments preceding the insert section
$pre_create_sql="";
$auto_increment_seq = "";
s/'((?:[^'\\]++|\\.)*+)'(?=[),])/E'$1'/g;
# for the E'' see http://www.postgresql.org/docs/8.2/interactive/release-8-1.html
s!\\\\!\\\\\\\\!g; # replace \\ with \\\\
# split 'extended' INSERT INTO statements to something PostgreSQL can understand
( $insert_table, $valueString) = $_ =~ m/^INSERT\s+INTO\s+['`"]*(.*?)['`"]*\s+VALUES\s*(.*)/i;
$insert_table = quote_and_lc($insert_table);
s/^INSERT INTO.*?\);//i; # hose the statement which is to be replaced whether a run-on or not
# guarantee table names are quoted
print OUT qq(INSERT INTO $insert_table VALUES $valueString \n);
} else {
print OUT $_ ; # example: /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
}
# keep looping and get next line of IN file
} # END while(<IN>)
print_post_create_sql(); # in case there is extra from the last table
#################################################################
# 5. make_plpgsql function
# emulate the set datatype with the following plpgsql function
# looks ugly so putting at end of file
#################################################################
#
sub make_plpgsql {
my ($table,$column_name) = ($_[0],$_[1]);
$table=~s/\"//g; # make sure that $table doesn't have quotes so we don't end up with redundant quoting
my $constraint_table = get_identifier($table,$column_name ,"constraint_table");
return "
-- this function is called by the insert/update trigger
-- it checks if the INSERT/UPDATE for the 'set' column
-- contains members which comprise a valid mysql set
-- this TRIGGER function therefore acts like a constraint
-- provided limited functionality for mysql's set datatype
-- just verifies and matches for string representations of the set at this point
-- though the set datatype uses bit comparisons, the only supported arguments to our
-- set datatype are VARCHAR arguments
-- to add a member to the set add it to the ".$table."_".$column_name." table
CREATE OR REPLACE FUNCTION check_".$table."_".$column_name."_set( ) RETURNS TRIGGER AS \$\$\n
DECLARE
----
arg_str VARCHAR ;
argx VARCHAR := '';
nobreak INT := 1;
rec_count INT := 0;
psn INT := 0;
str_in VARCHAR := NEW.$column_name;
----
BEGIN
----
IF str_in IS NULL THEN RETURN NEW ; END IF;
arg_str := REGEXP_REPLACE(str_in, '\\',\\'', ','); -- str_in is CONSTANT
arg_str := REGEXP_REPLACE(arg_str, '^\\'', '');
arg_str := REGEXP_REPLACE(arg_str, '\\'\$', '');
-- RAISE NOTICE 'arg_str %',arg_str;
psn := POSITION(',' in arg_str);
IF psn > 0 THEN
psn := psn - 1; -- minus-1 from comma position
-- RAISE NOTICE 'psn %',psn;
argx := SUBSTRING(arg_str FROM 1 FOR psn); -- get one set member
psn := psn + 2; -- go to first starting letter
arg_str := SUBSTRING(arg_str FROM psn); -- hack it off
ELSE
psn := 0; -- no comma found: the value is a single set member
argx := arg_str;
END IF;
-- RAISE NOTICE 'argx %',argx;
-- RAISE NOTICE 'new arg_str: %',arg_str;
WHILE nobreak LOOP
EXECUTE 'SELECT count(*) FROM $constraint_table WHERE set_values = ' || quote_literal(argx) INTO rec_count;
IF rec_count = 0 THEN RAISE EXCEPTION 'one of the set values was not found';
END IF;
IF psn > 0 THEN
psn := psn - 1; -- minus-1 from comma position
-- RAISE NOTICE 'psn %',psn;
argx := SUBSTRING(arg_str FROM 1 FOR psn); -- get one set member
psn := psn + 2; -- go to first starting letter
arg_str := SUBSTRING(arg_str FROM psn); -- hack it off
psn := POSITION(',' in arg_str);
ELSE nobreak = 0;
END IF;
-- RAISE NOTICE 'next argx % and next arg_str %', argx, arg_str;
END LOOP;
RETURN NEW;
----
END;
\$\$ LANGUAGE 'plpgsql' VOLATILE;
drop trigger set_test ON $table;
-- make a trigger for each set field
-- make trigger and hard-code in column names
-- see http://archives.postgresql.org/pgsql-interfaces/2005-02/msg00020.php
CREATE TRIGGER set_test
BEFORE INSERT OR UPDATE ON $table FOR EACH ROW
EXECUTE PROCEDURE check_".$table."_".$column_name."_set();\n";
} # end sub make_plpgsql();
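
The trigger function generated by make_plpgsql above strips the quotes from the incoming column value, splits it on commas, and looks each member up in the constraint table. A minimal Perl sketch of the same membership check follows. It assumes the allowed values are held in a hash instead of a table; the names `is_valid_set_value` and `%allowed` are hypothetical and are not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mysql2pgsql.perl): returns 1 if every
# comma-separated member of $value is an allowed set member, 0 otherwise.
# An undefined value is accepted, like the "IS NULL" early return in the trigger.
sub is_valid_set_value {
    my ($value, $allowed) = @_;
    return 1 unless defined $value;

    # Normalise 'a','b' to a,b the same way the trigger's REGEXP_REPLACE calls do.
    (my $str = $value) =~ s/','/,/g;
    $str =~ s/^'//;
    $str =~ s/'$//;

    # split returns an empty list for an empty string, so check '' explicitly.
    my @members = split /,/, $str, -1;
    @members = ('') unless @members;

    foreach my $member (@members) {
        return 0 unless exists $allowed->{$member};
    }
    return 1;
}

# Allowed members taken from the set('r','w','d') example in the script's comments.
my %allowed = map { $_ => 1 } ('r', 'w', 'd');

print is_valid_set_value('r,w', \%allowed) ? "ok\n" : "rejected\n";   # prints "ok"
print is_valid_set_value('r,x', \%allowed) ? "ok\n" : "rejected\n";   # prints "rejected"
```

Unlike the generated trigger, this sketch does no database lookup, so it is only useful for checking values before they are loaded.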

View File

@@ -0,0 +1,199 @@
place_type,level
Q9842,4
Q9430,3
Q928830,4
Q9259,1
Q91028,5
Q8514,2
Q8502,2
Q83405,3
Q82794,2
Q820477,1
Q811979,1
Q8072,2
Q79007,2
Q786014,3
Q75848,2
Q75520,2
Q728937,4
Q7275,2
Q719456,3
Q7075,3
Q697295,4
Q6852233,2
Q682943,3
Q665487,5
Q655686,3
Q643589,5
Q641226,2
Q631305,2
Q6256,2
Q6023295,2
Q5773747,5
Q56061,1
Q55659167,4
Q55488,4
Q55465477,3
Q54050,2
Q532,3
Q53060,2
Q52177058,4
Q515716,5
Q5153984,4
Q515,3
Q5144960,5
Q5119,4
Q5119,4
Q5107,2
Q5084,4
Q5031071,4
Q5003624,2
Q4989906,1
Q4976993,3
Q486972,1
Q486972,2
Q483110,3
Q4830453,4
Q47521,3
Q473972,1
Q46831,2
Q46614560,5
Q44782,3
Q44613,4
Q44539,4
Q44494,2
Q44377,2
Q4421,2
Q43501,2
Q4286337,3
Q42523,3
Q41176,2
Q40357,3
Q4022,4
Q40080,2
Q39816,2
Q39715,3
Q39614,1
Q3957,3
Q3947,4
Q3914,3
Q38723,2
Q38720,3
Q3623867,5
Q35666,2
Q355304,3
Q35509,2
Q35112127,3
Q34985575,4
Q34876,5
Q34763,2
Q34627,4
Q3455524,3
Q34442,4
Q33837,2
Q33506,3
Q32815,4
Q3257686,2
Q3240715,2
Q3191695,5
Q3153117,2
Q30198,2
Q30139652,3
Q294422,3
Q2870166,3
Q27686,3
Q274153,3
Q271669,1
Q2659904,2
Q24529780,2
Q24354,3
Q2354973,4
Q23442,2
Q23413,3
Q23397,3
Q2327515,4
Q2311958,5
Q22927291,6
Q22698,1
Q2175765,4
Q205495,4
Q204832,3
Q2042028,2
Q202216,6
Q1970725,3
Q194203,5
Q194195,2
Q190429,2
Q185187,3
Q185113,2
Q183366,2
Q1799794,1
Q1788454,4
Q1785071,3
Q1777138,3
Q177634,2
Q177380,2
Q174814,4
Q174782,2
Q17350442,2
Q17343829,3
Q17334923,0
Q17018380,3
Q16970,4
Q16917,3
Q16831714,4
Q165,3
Q160742,4
Q159719,3
Q159334,4
Q15640612,5
Q15324,2
Q15284,5
Q15243209,6
Q152081,1
Q15195406,4
Q1500350,5
Q149621,5
Q14757767,4
Q14350,3
Q1410668,3
Q1394476,3
Q1377575,2
Q1353183,3
Q134447,4
Q133215,3
Q133056,2
Q13221722,3
Q13220204,2
Q1311958,4
Q1303167,3
Q130003,3
Q12518,2
Q12516,3
Q1248784,3
Q123705,3
Q12323,3
Q12284,4
Q12280,4
Q121359,2
Q1210950,2
Q11755880,3
Q11707,3
Q11315,3
Q11303,3
Q1115575,4
Q1107656,1
Q10864048,1
Q1076486,2
Q105731,3
Q105190,3
Q1048525,3
Q102496,5
Q28872924,1
Q15617994,1
Q159313,2
Q24398318,3
Q327333,2
Q43229,1
Q860861,1
Q4989906,1
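
The place_type,level listing above contains repeated keys: Q5119 appears twice with level 4, and Q486972 appears with levels 1 and 2. A minimal Perl sketch for spotting such repeats follows. It assumes the rows are available as lines of text; the helper name `find_repeated_keys` is hypothetical, and the sample rows are copied from the listing.

```perl
use strict;
use warnings;

# Hypothetical helper: takes CSV lines of the form "place_type,level"
# (header line allowed) and returns a hash ref mapping every key that
# occurs more than once to the list of levels seen for it.
sub find_repeated_keys {
    my @lines = @_;
    my %levels;
    foreach my $line (@lines) {
        chomp $line;
        next if $line eq '' || $line =~ /^place_type,/;   # skip blank lines and the header
        my ($key, $level) = split /,/, $line, 2;
        push @{ $levels{$key} }, $level;
    }
    my %repeated = map  { $_ => $levels{$_} }
                   grep { @{ $levels{$_} } > 1 } keys %levels;
    return \%repeated;
}

my @sample = (
    "place_type,level",
    "Q5119,4",
    "Q5119,4",
    "Q5107,2",
    "Q486972,1",
    "Q486972,2",
);

my $repeated = find_repeated_keys(@sample);
foreach my $key (sort keys %$repeated) {
    print "$key: ", join(',', @{ $repeated->{$key} }), "\n";
}
```

Whether a repeated key with two different levels is intentional is not stated in the listing, so the sketch only reports the repeats and does not resolve them.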

View File

@@ -0,0 +1,195 @@
Q9842
Q9430
Q928830
Q9259
Q91028
Q8514
Q8502
Q83405
Q82794
Q820477
Q811979
Q8072
Q79007
Q786014
Q75848
Q75520
Q728937
Q7275
Q719456
Q7075
Q697295
Q6852233
Q682943
Q665487
Q655686
Q643589
Q641226
Q631305
Q6256
Q6023295
Q5773747
Q56061
Q55659167
Q55488
Q55465477
Q54050
Q532
Q53060
Q52177058
Q515716
Q5153984
Q515
Q5144960
Q5119
Q5107
Q5084
Q5031071
Q5003624
Q4989906
Q4976993
Q486972
Q483110
Q4830453
Q47521
Q473972
Q46831
Q46614560
Q44782
Q44613
Q44539
Q44494
Q44377
Q4421
Q43501
Q4286337
Q42523
Q41176
Q40357
Q4022
Q40080
Q39816
Q39715
Q39614
Q3957
Q3947
Q3914
Q38723
Q38720
Q3623867
Q35666
Q355304
Q35509
Q35112127
Q34985575
Q34876
Q34763
Q34627
Q3455524
Q34442
Q33837
Q33506
Q32815
Q3257686
Q3240715
Q3191695
Q3153117
Q30198
Q30139652
Q294422
Q2870166
Q27686
Q274153
Q271669
Q2659904
Q24529780
Q24354
Q2354973
Q23442
Q23413
Q23397
Q2327515
Q2311958
Q22927291
Q22698
Q2175765
Q205495
Q204832
Q2042028
Q202216
Q1970725
Q194203
Q194195
Q190429
Q185187
Q185113
Q183366
Q1799794
Q1788454
Q1785071
Q1777138
Q177634
Q177380
Q174814
Q174782
Q17350442
Q17343829
Q17334923
Q17018380
Q16970
Q16917
Q16831714
Q165
Q160742
Q159719
Q159334
Q15640612
Q15324
Q15284
Q15243209
Q152081
Q15195406
Q1500350
Q149621
Q14757767
Q14350
Q1410668
Q1394476
Q1377575
Q1353183
Q134447
Q133215
Q133056
Q13221722
Q13220204
Q1311958
Q1303167
Q130003
Q12518
Q12516
Q1248784
Q123705
Q12323
Q12284
Q12280
Q121359
Q1210950
Q11755880
Q11707
Q11315
Q11303
Q1115575
Q1107656
Q10864048
Q1076486
Q105731
Q105190
Q1048525
Q102496
Q28872924
Q15617994
Q159313
Q24398318
Q327333
Q43229
Q860861

View File

@@ -0,0 +1,200 @@
## Wikidata place types and related OSM Tags
Wikidata does not have any official ontologies; however, the [DBpedia project](https://wiki.dbpedia.org/) has created an [ontology](https://wiki.dbpedia.org/services-resources/ontology) that covers [place types](http://mappings.dbpedia.org/server/ontology/classes/#Place). The table below uses the DBpedia place ontology as a starting point and is provided as a cross-reference to the relevant OSM tags.
The Wikidata place types listed in the table below can be used in conjunction with the [Wikidata Query Service](https://query.wikidata.org/) to retrieve instances of those place types from the Wikidata knowledge base.
```
SELECT ?item ?lat ?lon
WHERE {
?item wdt:P31*/wdt:P279* wd:Q9430; wdt:P625 ?pt.
?item p:P625 ?loc.
?loc psv:P625 ?cnode.
?cnode wikibase:geoLatitude ?lat.
?cnode wikibase:geoLongitude ?lon.
}
```
An example JSON result for all instances of the Wikidata item "Q9430" (Ocean) can be seen at [json](https://query.wikidata.org/bigdata/namespace/wdq/sparql?format=json&query=SELECT?item?lat?lon%20WHERE{?item%20wdt:P31*/wdt:P279*wd:Q9430;wdt:P625?pt.?item%20p:P625?loc.?loc%20psv:P625?cnode.?cnode%20wikibase:geoLatitude?lat.?cnode%20wikibase:geoLongitude?lon.})
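The query above can be parameterised for any place type in the table. The following is a minimal Python sketch, not part of Nominatim: the function names `build_place_type_query` and `build_query_url` are hypothetical, and the endpoint is simply the one used in the example link above.

```python
import re
from urllib.parse import urlencode

# Endpoint copied from the example link in this document.
WDQS_ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"


def build_place_type_query(qid: str) -> str:
    """Return the SPARQL query shown above for the given place type (hypothetical helper)."""
    # Only accept plain Wikidata item IDs such as "Q9430".
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return "\n".join([
        "SELECT ?item ?lat ?lon",
        "WHERE {",
        f"  ?item wdt:P31*/wdt:P279* wd:{qid}; wdt:P625 ?pt.",
        "  ?item p:P625 ?loc.",
        "  ?loc psv:P625 ?cnode.",
        "  ?cnode wikibase:geoLatitude ?lat.",
        "  ?cnode wikibase:geoLongitude ?lon.",
        "}",
    ])


def build_query_url(qid: str) -> str:
    """Return a URL that requests JSON results for the place type (hypothetical helper)."""
    params = {"format": "json", "query": build_place_type_query(qid)}
    return WDQS_ENDPOINT + "?" + urlencode(params)
```

Calling `build_query_url("Q9430")` yields a properly percent-encoded variant of the link above. Fetching the URL is left out on purpose, since the choice of HTTP client and rate limiting depends on the installation.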
**NOTE** the OSM tags listed are those given in the Wikidata entries, not all possible matching tags within OSM.
title | concept | OSM Tag |
-----------|---------------------------------------|------------------|
[Q17334923](https://www.wikidata.org/entity/Q17334923) | Location | |
[Q811979](https://www.wikidata.org/entity/Q811979) | Architectural Structure | |
[Q194195](https://www.wikidata.org/entity/Q194195) | Amusement park | |
[Q204832](https://www.wikidata.org/entity/Q204832) | Roller coaster | [attraction=roller_coaster](https://wiki.openstreetmap.org/wiki/Tag:attraction=roller_coaster) |
[Q2870166](https://www.wikidata.org/entity/Q2870166) | Water ride | |
[Q641226](https://www.wikidata.org/entity/Q641226) | Arena | [amenity=events_centre](https://wiki.openstreetmap.org/wiki/Tag:amenity=events_centre) |
[Q41176](https://www.wikidata.org/entity/Q41176) | Building | [building=yes](https://wiki.openstreetmap.org/wiki/Key:building) |
[Q1303167](https://www.wikidata.org/entity/Q1303167) | Barn | [building=barn](https://wiki.openstreetmap.org/wiki/Tag:building=barn) |
[Q655686](https://www.wikidata.org/entity/Q655686) | Commercial building | [building=commercial](https://wiki.openstreetmap.org/wiki/Tag:building=commercial) |
[Q4830453](https://www.wikidata.org/entity/Q4830453) | Business | |
[Q7075](https://www.wikidata.org/entity/Q7075) | Library | [amenity=library](https://wiki.openstreetmap.org/wiki/Tag:amenity=library) |
[Q133215](https://www.wikidata.org/entity/Q133215) | Casino | [amenity=casino](https://wiki.openstreetmap.org/wiki/Tag:amenity=casino) |
[Q23413](https://www.wikidata.org/entity/Q23413) | Castle | [historic=castle](https://wiki.openstreetmap.org/wiki/Tag:historic=castle) |
[Q83405](https://www.wikidata.org/entity/Q83405) | Factory | |
[Q53060](https://www.wikidata.org/entity/Q53060) | Gate | [barrier=gate](https://wiki.openstreetmap.org/wiki/Tag:barrier=gate) |
[Q11755880](https://www.wikidata.org/entity/Q11755880) | Residential Building | [building=residential](https://wiki.openstreetmap.org/wiki/Tag:building=residential) |
[Q3947](https://www.wikidata.org/entity/Q3947) | House | [building=house](https://wiki.openstreetmap.org/wiki/Tag:building=house) |
[Q35112127](https://www.wikidata.org/entity/Q35112127) | Historic Building | |
[Q5773747](https://www.wikidata.org/entity/Q5773747) | Historic house | |
[Q38723](https://www.wikidata.org/entity/Q38723) | Higher Education Institution | |
[Q3914](https://www.wikidata.org/entity/Q3914) | School | [amenity=school](https://wiki.openstreetmap.org/wiki/Tag:amenity=school) |
[Q9842](https://www.wikidata.org/entity/Q9842) | Primary school | |
[Q159334](https://www.wikidata.org/entity/Q159334) | Secondary school | |
[Q16917](https://www.wikidata.org/entity/Q16917) | Hospital | [amenity=hospital](https://wiki.openstreetmap.org/wiki/Tag:amenity=hospital), [healthcare=hospital](https://wiki.openstreetmap.org/wiki/Tag:healthcare=hospital), [building=hospital](https://wiki.openstreetmap.org/wiki/Tag:building=hospital) |
[Q27686](https://www.wikidata.org/entity/Q27686) | Hotel | [tourism=hotel](https://wiki.openstreetmap.org/wiki/Tag:tourism=hotel), [building=hotel](https://wiki.openstreetmap.org/wiki/Tag:building=hotel) |
[Q33506](https://www.wikidata.org/entity/Q33506) | Museum | [tourism=museum](https://wiki.openstreetmap.org/wiki/Tag:tourism=museum) |
[Q40357](https://www.wikidata.org/entity/Q40357) | Prison | [amenity=prison](https://wiki.openstreetmap.org/wiki/Tag:amenity=prison) |
[Q24398318](https://www.wikidata.org/entity/Q24398318) | Religious Building | |
[Q160742](https://www.wikidata.org/entity/Q160742) | Abbey | |
[Q16970](https://www.wikidata.org/entity/Q16970) | Church (building) | [building=church](https://wiki.openstreetmap.org/wiki/Tag:building=church) |
[Q44613](https://www.wikidata.org/entity/Q44613) | Monastery | [amenity=monastery](https://wiki.openstreetmap.org/wiki/Tag:amenity=monastery) |
[Q32815](https://www.wikidata.org/entity/Q32815) | Mosque | [building=mosque](https://wiki.openstreetmap.org/wiki/Tag:building=mosque) |
[Q697295](https://www.wikidata.org/entity/Q697295) | Shrine | [building=shrine](https://wiki.openstreetmap.org/wiki/Tag:building=shrine) |
[Q34627](https://www.wikidata.org/entity/Q34627) | Synagogue | [building=synagogue](https://wiki.openstreetmap.org/wiki/Tag:building=synagogue) |
[Q44539](https://www.wikidata.org/entity/Q44539) | Temple | [building=temple](https://wiki.openstreetmap.org/wiki/Tag:building=temple) |
[Q11707](https://www.wikidata.org/entity/Q11707) | Restaurant | [amenity=restaurant](https://wiki.openstreetmap.org/wiki/Tag:amenity=restaurant) |
[Q11315](https://www.wikidata.org/entity/Q11315) | Shopping mall | [shop=mall](https://wiki.openstreetmap.org/wiki/Tag:shop=mall), [shop=shopping_centre](https://wiki.openstreetmap.org/wiki/Tag:shop=shopping_centre) |
[Q11303](https://www.wikidata.org/entity/Q11303) | Skyscraper | |
[Q17350442](https://www.wikidata.org/entity/Q17350442) | Venue | |
[Q41253](https://www.wikidata.org/entity/Q41253) | Movie Theater | [amenity=cinema](https://wiki.openstreetmap.org/wiki/Tag:amenity=cinema) |
[Q483110](https://www.wikidata.org/entity/Q483110) | Stadium | [leisure=stadium](https://wiki.openstreetmap.org/wiki/Tag:leisure=stadium), [building=stadium](https://wiki.openstreetmap.org/wiki/Tag:building=stadium) |
[Q24354](https://www.wikidata.org/entity/Q24354) | Theater (structure) | [amenity=theatre](https://wiki.openstreetmap.org/wiki/Tag:amenity=theatre) |
[Q121359](https://www.wikidata.org/entity/Q121359) | Infrastructure | |
[Q1248784](https://www.wikidata.org/entity/Q1248784) | Airport | |
[Q12323](https://www.wikidata.org/entity/Q12323) | Dam | [waterway=dam](https://wiki.openstreetmap.org/wiki/Tag:waterway=dam) |
[Q1353183](https://www.wikidata.org/entity/Q1353183) | Launch pad | |
[Q105190](https://www.wikidata.org/entity/Q105190) | Levee | [man_made=dyke](https://wiki.openstreetmap.org/wiki/Tag:man_made=dyke) |
[Q105731](https://www.wikidata.org/entity/Q105731) | Lock (water navigation) | [lock=yes](https://wiki.openstreetmap.org/wiki/Key:lock) |
[Q44782](https://www.wikidata.org/entity/Q44782) | Port | |
[Q159719](https://www.wikidata.org/entity/Q159719) | Power station | [power=plant](https://wiki.openstreetmap.org/wiki/Tag:power=plant) |
[Q174814](https://www.wikidata.org/entity/Q174814) | Electrical substation | |
[Q134447](https://www.wikidata.org/entity/Q134447) | Nuclear power plant | [plant:source=nuclear](https://wiki.openstreetmap.org/wiki/Tag:plant:source=nuclear) |
[Q786014](https://www.wikidata.org/entity/Q786014) | Rest area | [highway=rest_area](https://wiki.openstreetmap.org/wiki/Tag:highway=rest_area), [highway=services](https://wiki.openstreetmap.org/wiki/Tag:highway=services) |
[Q12280](https://www.wikidata.org/entity/Q12280) | Bridge | [bridge=* ](https://wiki.openstreetmap.org/wiki/Key:bridge), [man_made=bridge](https://wiki.openstreetmap.org/wiki/Tag:man_made=bridge) |
[Q728937](https://www.wikidata.org/entity/Q728937) | Railroad Line | [railway=rail](https://wiki.openstreetmap.org/wiki/Tag:railway=rail) |
[Q1311958](https://www.wikidata.org/entity/Q1311958) | Railway Tunnel | |
[Q34442](https://www.wikidata.org/entity/Q34442) | Road | [highway=* ](https://wiki.openstreetmap.org/wiki/Key:highway), [route=road](https://wiki.openstreetmap.org/wiki/Tag:route=road) |
[Q1788454](https://www.wikidata.org/entity/Q1788454) | Road junction | |
[Q44377](https://www.wikidata.org/entity/Q44377) | Tunnel | [tunnel=* ](https://wiki.openstreetmap.org/wiki/Key:tunnel) |
[Q5031071](https://www.wikidata.org/entity/Q5031071) | Canal tunnel | |
[Q719456](https://www.wikidata.org/entity/Q719456) | Station | [public_transport=station](https://wiki.openstreetmap.org/wiki/Tag:public_transport=station) |
[Q205495](https://www.wikidata.org/entity/Q205495) | Filling station | [amenity=fuel](https://wiki.openstreetmap.org/wiki/Tag:amenity=fuel) |
[Q928830](https://www.wikidata.org/entity/Q928830) | Metro station | [station=subway](https://wiki.openstreetmap.org/wiki/Tag:station=subway) |
[Q55488](https://www.wikidata.org/entity/Q55488) | Train station | [railway=station](https://wiki.openstreetmap.org/wiki/Tag:railway=station) |
[Q2175765](https://www.wikidata.org/entity/Q2175765) | Tram stop | [railway=tram_stop](https://wiki.openstreetmap.org/wiki/Tag:railway=tram_stop), [public_transport=stop_position](https://wiki.openstreetmap.org/wiki/Tag:public_transport=stop_position) |
[Q6852233](https://www.wikidata.org/entity/Q6852233) | Military building | |
[Q44494](https://www.wikidata.org/entity/Q44494) | Mill (grinding) | |
[Q185187](https://www.wikidata.org/entity/Q185187) | Watermill | [man_made=watermill](https://wiki.openstreetmap.org/wiki/Tag:man_made=watermill) |
[Q38720](https://www.wikidata.org/entity/Q38720) | Windmill | [man_made=windmill](https://wiki.openstreetmap.org/wiki/Tag:man_made=windmill) |
[Q4989906](https://www.wikidata.org/entity/Q4989906) | Monument | [historic=monument](https://wiki.openstreetmap.org/wiki/Tag:historic=monument) |
[Q5003624](https://www.wikidata.org/entity/Q5003624) | Memorial | [historic=memorial](https://wiki.openstreetmap.org/wiki/Tag:historic=memorial) |
[Q271669](https://www.wikidata.org/entity/Q271669) | Landform | |
[Q190429](https://www.wikidata.org/entity/Q190429) | Depression (geology) | |
[Q17018380](https://www.wikidata.org/entity/Q17018380) | Bight (geography) | |
[Q54050](https://www.wikidata.org/entity/Q54050) | Hill | |
[Q1210950](https://www.wikidata.org/entity/Q1210950) | Channel (geography) | |
[Q23442](https://www.wikidata.org/entity/Q23442) | Island | [place=island](https://wiki.openstreetmap.org/wiki/Tag:place=island) |
[Q42523](https://www.wikidata.org/entity/Q42523) | Atoll | |
[Q34763](https://www.wikidata.org/entity/Q34763) | Peninsula | |
[Q355304](https://www.wikidata.org/entity/Q355304) | Watercourse | |
[Q30198](https://www.wikidata.org/entity/Q30198) | Marsh | [wetland=marsh](https://wiki.openstreetmap.org/wiki/Tag:wetland=marsh) |
[Q75520](https://www.wikidata.org/entity/Q75520) | Plateau | |
[Q2042028](https://www.wikidata.org/entity/Q2042028) | Ravine | |
[Q631305](https://www.wikidata.org/entity/Q631305) | Rock formation | |
[Q12516](https://www.wikidata.org/entity/Q12516) | Pyramid | |
[Q1076486](https://www.wikidata.org/entity/Q1076486) | Sports venue | |
[Q682943](https://www.wikidata.org/entity/Q682943) | Cricket field | [sport=cricket](https://wiki.openstreetmap.org/wiki/Tag:sport=cricket) |
[Q1048525](https://www.wikidata.org/entity/Q1048525) | Golf course | [leisure=golf_course](https://wiki.openstreetmap.org/wiki/Tag:leisure=golf_course) |
[Q1777138](https://www.wikidata.org/entity/Q1777138) | Race track | [highway=raceway](https://wiki.openstreetmap.org/wiki/Tag:highway=raceway) |
[Q130003](https://www.wikidata.org/entity/Q130003) | Ski resort | |
[Q174782](https://www.wikidata.org/entity/Q174782) | Town square | [place=square](https://wiki.openstreetmap.org/wiki/Tag:place=square) |
[Q12518](https://www.wikidata.org/entity/Q12518) | Tower | [building=tower](https://wiki.openstreetmap.org/wiki/Tag:building=tower), [man_made=tower](https://wiki.openstreetmap.org/wiki/Tag:man_made=tower) |
[Q39715](https://www.wikidata.org/entity/Q39715) | Lighthouse | [man_made=lighthouse](https://wiki.openstreetmap.org/wiki/Tag:man_made=lighthouse) |
[Q274153](https://www.wikidata.org/entity/Q274153) | Water tower | [building=water_tower](https://wiki.openstreetmap.org/wiki/Tag:building=water_tower), [man_made=water_tower](https://wiki.openstreetmap.org/wiki/Tag:man_made=water_tower) |
[Q43501](https://www.wikidata.org/entity/Q43501) | Zoo | [tourism=zoo](https://wiki.openstreetmap.org/wiki/Tag:tourism=zoo) |
[Q39614](https://www.wikidata.org/entity/Q39614) | Cemetery | [amenity=grave_yard](https://wiki.openstreetmap.org/wiki/Tag:amenity=grave_yard), [landuse=cemetery](https://wiki.openstreetmap.org/wiki/Tag:landuse=cemetery) |
[Q152081](https://www.wikidata.org/entity/Q152081) | Concentration camp | |
[Q1107656](https://www.wikidata.org/entity/Q1107656) | Garden | [leisure=garden](https://wiki.openstreetmap.org/wiki/Tag:leisure=garden) |
[Q820477](https://www.wikidata.org/entity/Q820477) | Mine | |
[Q33837](https://www.wikidata.org/entity/Q33837) | Archipelago | [place=archipelago](https://wiki.openstreetmap.org/wiki/Tag:place=archipelago) |
[Q40080](https://www.wikidata.org/entity/Q40080) | Beach | [natural=beach](https://wiki.openstreetmap.org/wiki/Tag:natural=beach) |
[Q15324](https://www.wikidata.org/entity/Q15324) | Body of water | [natural=water](https://wiki.openstreetmap.org/wiki/Tag:natural=water) |
[Q23397](https://www.wikidata.org/entity/Q23397) | Lake | [water=lake](https://wiki.openstreetmap.org/wiki/Tag:water=lake) |
[Q9430](https://www.wikidata.org/entity/Q9430) | Ocean | |
[Q165](https://www.wikidata.org/entity/Q165) | Sea | |
[Q47521](https://www.wikidata.org/entity/Q47521) | Stream | |
[Q12284](https://www.wikidata.org/entity/Q12284) | Canal | [waterway=canal](https://wiki.openstreetmap.org/wiki/Tag:waterway=canal) |
[Q4022](https://www.wikidata.org/entity/Q4022) | River | [waterway=river](https://wiki.openstreetmap.org/wiki/Tag:waterway=river), [type=waterway](https://wiki.openstreetmap.org/wiki/Relation:waterway) |
[Q185113](https://www.wikidata.org/entity/Q185113) | Cape | [natural=cape](https://wiki.openstreetmap.org/wiki/Tag:natural=cape) |
[Q35509](https://www.wikidata.org/entity/Q35509) | Cave | [natural=cave_entrance](https://wiki.openstreetmap.org/wiki/Tag:natural=cave_entrance) |
[Q8514](https://www.wikidata.org/entity/Q8514) | Desert | |
[Q4421](https://www.wikidata.org/entity/Q4421) | Forest | [natural=wood](https://wiki.openstreetmap.org/wiki/Tag:natural=wood) |
[Q35666](https://www.wikidata.org/entity/Q35666) | Glacier | [natural=glacier](https://wiki.openstreetmap.org/wiki/Tag:natural=glacier) |
[Q177380](https://www.wikidata.org/entity/Q177380) | Hot spring | |
[Q8502](https://www.wikidata.org/entity/Q8502) | Mountain | [natural=peak](https://wiki.openstreetmap.org/wiki/Tag:natural=peak) |
[Q133056](https://www.wikidata.org/entity/Q133056) | Mountain pass | |
[Q46831](https://www.wikidata.org/entity/Q46831) | Mountain range | |
[Q39816](https://www.wikidata.org/entity/Q39816) | Valley | [natural=valley](https://wiki.openstreetmap.org/wiki/Tag:natural=valley) |
[Q8072](https://www.wikidata.org/entity/Q8072) | Volcano | [natural=volcano](https://wiki.openstreetmap.org/wiki/Tag:natural=volcano) |
[Q43229](https://www.wikidata.org/entity/Q43229) | Organization | |
[Q327333](https://www.wikidata.org/entity/Q327333) | Government agency | [office=government](https://wiki.openstreetmap.org/wiki/Tag:office=government)|
[Q22698](https://www.wikidata.org/entity/Q22698) | Park | [leisure=park](https://wiki.openstreetmap.org/wiki/Tag:leisure=park) |
[Q159313](https://www.wikidata.org/entity/Q159313) | Urban agglomeration | |
[Q177634](https://www.wikidata.org/entity/Q177634) | Community | |
[Q5107](https://www.wikidata.org/entity/Q5107) | Continent | [place=continent](https://wiki.openstreetmap.org/wiki/Tag:place=continent) |
[Q6256](https://www.wikidata.org/entity/Q6256) | Country | [place=country](https://wiki.openstreetmap.org/wiki/Tag:place=country) |
[Q75848](https://www.wikidata.org/entity/Q75848) | Gated community | |
[Q3153117](https://www.wikidata.org/entity/Q3153117) | Intercommunality | |
[Q82794](https://www.wikidata.org/entity/Q82794) | Region | |
[Q56061](https://www.wikidata.org/entity/Q56061) | Administrative division | [boundary=administrative](https://wiki.openstreetmap.org/wiki/Tag:boundary=administrative) |
[Q665487](https://www.wikidata.org/entity/Q665487) | Diocese | |
[Q4976993](https://www.wikidata.org/entity/Q4976993) | Parish | [boundary=civil_parish](https://wiki.openstreetmap.org/wiki/Tag:boundary=civil_parish) |
[Q194203](https://www.wikidata.org/entity/Q194203) | Arrondissements of France | |
[Q91028](https://www.wikidata.org/entity/Q91028) | Arrondissements of Belgium | |
[Q3623867](https://www.wikidata.org/entity/Q3623867) | Arrondissements of Benin | |
[Q2311958](https://www.wikidata.org/entity/Q2311958) | Canton (country subdivision) | [political_division=canton](https://wiki.openstreetmap.org/wiki/FR:Cantons_in_France) |
[Q643589](https://www.wikidata.org/entity/Q643589) | Department | |
[Q202216](https://www.wikidata.org/entity/Q202216) | Overseas department and region | |
[Q149621](https://www.wikidata.org/entity/Q149621) | District | [place=district](https://wiki.openstreetmap.org/wiki/Tag:place=district) |
[Q15243209](https://www.wikidata.org/wiki/Q15243209) | Historic district | |
[Q5144960](https://www.wikidata.org/entity/Q5144960) | Microregion | |
[Q15284](https://www.wikidata.org/entity/Q15284) | Municipality | |
[Q515716](https://www.wikidata.org/entity/Q515716) | Prefecture | |
[Q34876](https://www.wikidata.org/entity/Q34876) | Province | |
[Q3191695](https://www.wikidata.org/entity/Q3191695) | Regency (Indonesia) | |
[Q1970725](https://www.wikidata.org/entity/Q1970725) | Natural region | |
[Q486972](https://www.wikidata.org/entity/Q486972) | Human settlement | |
[Q515](https://www.wikidata.org/entity/Q515) | City | [place=city](https://wiki.openstreetmap.org/wiki/Tag:place=city) |
[Q5119](https://www.wikidata.org/entity/Q5119) | Capital city | [capital=yes](https://wiki.openstreetmap.org/wiki/Key:capital) |
[Q4286337](https://www.wikidata.org/entity/Q4286337) | City district | |
[Q1394476](https://www.wikidata.org/entity/Q1394476) | Civil township | |
[Q1115575](https://www.wikidata.org/entity/Q1115575) | Civil parish | [designation=civil_parish](https://wiki.openstreetmap.org/wiki/Tag:designation=civil_parish) |
[Q5153984](https://www.wikidata.org/entity/Q5153984) | Commune-level subdivisions | |
[Q123705](https://www.wikidata.org/entity/Q123705) | Neighbourhood | [place=neighbourhood](https://wiki.openstreetmap.org/wiki/Tag:place=neighbourhood) |
[Q1500350](https://www.wikidata.org/entity/Q1500350) | Townships of China | |
[Q17343829](https://www.wikidata.org/entity/Q17343829) | Unincorporated Community | |
[Q3957](https://www.wikidata.org/entity/Q3957) | Town | [place=town](https://wiki.openstreetmap.org/wiki/Tag:place=town) |
[Q532](https://www.wikidata.org/entity/Q532) | Village | [place=village](https://wiki.openstreetmap.org/wiki/Tag:place=village) |
[Q5084](https://www.wikidata.org/entity/Q5084) | Hamlet | [place=hamlet](https://wiki.openstreetmap.org/wiki/Tag:place=hamlet) |
[Q7275](https://www.wikidata.org/entity/Q7275) | State | |
[Q79007](https://www.wikidata.org/entity/Q79007) | Street | |
[Q473972](https://www.wikidata.org/entity/Q473972) | Protected area | [boundary=protected_area](https://wiki.openstreetmap.org/wiki/Tag:boundary=protected_area) |
[Q1377575](https://www.wikidata.org/entity/Q1377575) | Wildlife refuge | |
[Q1410668](https://www.wikidata.org/entity/Q1410668) | National Wildlife Refuge | protection_title=National Wildlife Refuge, [ownership=national](https://wiki.openstreetmap.org/wiki/Tag:ownership=national) |
[Q9259](https://www.wikidata.org/entity/Q9259) | World Heritage Site | |
---
### Future Work
The Wikidata improvements to Nominatim can be further enhanced by:
- continuing to add new Wikidata links to OSM objects
- increasing the number of place types accounted for in the wikipedia_articles table
- working to use place types in the wikipedia_article matching process

279
data/country_name.sql Normal file

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,26 @@
-- This data contains Ordnance Survey data © Crown copyright and database right 2010.
-- Code-Point Open contains Royal Mail data © Royal Mail copyright and database right 2010.
-- OS data may be used under the terms of the OS OpenData licence:
-- http://www.ordnancesurvey.co.uk/oswebsite/opendata/licence/docs/licence.pdf
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
CREATE TABLE gb_postcode (
id integer,
postcode character varying(9),
geometry geometry,
CONSTRAINT enforce_dims_geometry CHECK ((st_ndims(geometry) = 2)),
CONSTRAINT enforce_srid_geometry CHECK ((st_srid(geometry) = 4326))
);

View File

@@ -0,0 +1,16 @@
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET check_function_bodies = false;
SET client_min_messages = warning;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
CREATE TABLE us_postcode (
postcode text,
x double precision,
y double precision
);

View File

@@ -29787,7 +29787,7 @@ st 5557484
-- prefill word table
select count(precompute_words(v)) from (select distinct svals(name) as v from place) as w where v is not null;
select count(make_keywords(v)) from (select distinct svals(name) as v from place) as w where v is not null;
select count(getorcreate_housenumber_id(make_standard_name(v))) from (select distinct address->'housenumber' as v from place where address ? 'housenumber') as w;
-- copy the word frequencies

28
docs/CMakeLists.txt Normal file
View File

@@ -0,0 +1,28 @@
# Auto-generated vagrant install documentation
# build the actual documentation
configure_file(mkdocs.yml ../mkdocs.yml)
file(MAKE_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/appendix)
file(MAKE_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/data-sources)
ADD_CUSTOM_TARGET(doc
COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_SOURCE_DIR}/admin ${CMAKE_CURRENT_BINARY_DIR}/admin
COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_SOURCE_DIR}/develop ${CMAKE_CURRENT_BINARY_DIR}/develop
COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_SOURCE_DIR}/api ${CMAKE_CURRENT_BINARY_DIR}/api
COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_SOURCE_DIR}/index.md ${CMAKE_CURRENT_BINARY_DIR}/index.md
COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_SOURCE_DIR}/extra.css ${CMAKE_CURRENT_BINARY_DIR}/extra.css
COMMAND ${CMAKE_COMMAND} -E create_symlink ${CMAKE_CURRENT_SOURCE_DIR}/data-sources/overview.md ${CMAKE_CURRENT_BINARY_DIR}/data-sources/overview.md
COMMAND ${CMAKE_COMMAND} -E create_symlink ${PROJECT_SOURCE_DIR}/data-sources/us-tiger/README.md ${CMAKE_CURRENT_BINARY_DIR}/data-sources/US-Tiger.md
COMMAND ${CMAKE_COMMAND} -E create_symlink ${PROJECT_SOURCE_DIR}/data-sources/gb-postcodes/README.md ${CMAKE_CURRENT_BINARY_DIR}/data-sources/GB-Postcodes.md
COMMAND ${CMAKE_COMMAND} -E create_symlink ${PROJECT_SOURCE_DIR}/data-sources/country-grid/README.md ${CMAKE_CURRENT_BINARY_DIR}/data-sources/Country-Grid.md
COMMAND ${CMAKE_COMMAND} -E create_symlink ${PROJECT_SOURCE_DIR}/data-sources/country-grid/mexico.quad.png ${CMAKE_CURRENT_BINARY_DIR}/data-sources/mexico.quad.png
COMMAND ${CMAKE_COMMAND} -E create_symlink ${PROJECT_SOURCE_DIR}/data-sources/wikipedia-wikidata/README.md ${CMAKE_CURRENT_BINARY_DIR}/data-sources/Wikipedia-Wikidata.md
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/bash2md.sh ${PROJECT_SOURCE_DIR}/vagrant/Install-on-Centos-7.sh ${CMAKE_CURRENT_BINARY_DIR}/appendix/Install-on-Centos-7.md
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/bash2md.sh ${PROJECT_SOURCE_DIR}/vagrant/Install-on-Ubuntu-16.sh ${CMAKE_CURRENT_BINARY_DIR}/appendix/Install-on-Ubuntu-16.md
COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/bash2md.sh ${PROJECT_SOURCE_DIR}/vagrant/Install-on-Ubuntu-18.sh ${CMAKE_CURRENT_BINARY_DIR}/appendix/Install-on-Ubuntu-18.md
COMMAND mkdocs build -d ${CMAKE_CURRENT_BINARY_DIR}/../site-html -f ${CMAKE_CURRENT_BINARY_DIR}/../mkdocs.yml
)

View File

@@ -1,170 +0,0 @@
# Advanced installations
This page contains instructions for setting up multiple countries in
your Nominatim database. It is assumed that you have already successfully
installed the Nominatim software itself. If not, return to the
[installation page](Installation.md).
## Importing with a database user without superuser rights
Nominatim usually creates its own PostgreSQL database at the beginning of the
import process. This makes usage easier for the user but means that the
database user doing the import needs the appropriate rights.
If you prefer to run the import with a database user with limited rights,
you can do so by changing the import process as follows:
1. Run the command for database preparation with a database user with
superuser rights. For example, to use a db user 'dbadmin' for a
database 'nominatim', execute:
```
NOMINATIM_DATABASE_DSN="pgsql:dbname=nominatim;user=dbadmin" nominatim import --prepare-database
```
2. Grant the import user the right to create tables. For example, for user 'import-user':
```
psql -d nominatim -c 'GRANT CREATE ON SCHEMA public TO "import-user"'
```
3. Now run the remainder of the import with the import user:
```
NOMINATIM_DATABASE_DSN="pgsql:dbname=nominatim;user=import-user" nominatim import --continue import-from-file --osm-file file.pbf
```
## Importing multiple regions (without updates)
To import multiple regions in your database you can simply give multiple
OSM files to the import command:
```
nominatim import --osm-file file1.pbf --osm-file file2.pbf
```
If you already have imported a file and want to add another one, you can
use the add-data function to import the additional data as follows:
```
nominatim add-data --file <FILE>
nominatim refresh --postcodes
nominatim index -j <NUMBER OF THREADS>
```
Please note that adding additional data is always significantly slower than
the original import.
## Importing multiple regions (with updates)
If you want to import multiple regions _and_ be able to keep them up-to-date
with updates, then you can use the scripts provided in the `utils` directory.
These scripts will set up an `update` directory in your project directory,
which has the following structure:
```bash
update
├── europe
│ ├── andorra
│ │ └── sequence.state
│ └── monaco
│ └── sequence.state
└── tmp
└── europe
├── andorra-latest.osm.pbf
└── monaco-latest.osm.pbf
```
The `sequence.state` files contain the sequence ID for each region. They will
be used by pyosmium to get updates. The `tmp` folder holds the import dumps and
can be deleted once the import is complete.
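To check which regions such a directory currently tracks, the layout above can be walked programmatically. The following is a small sketch that relies only on the directory structure shown; `find_regions` is a hypothetical helper, not part of the Nominatim scripts, and it does not interpret the contents of `sequence.state`.

```python
from pathlib import Path


def find_regions(update_dir):
    """Return region names such as "europe/monaco" that have a sequence.state file."""
    root = Path(update_dir)
    return sorted(
        # The region name is the directory of the state file, relative to `update`.
        state.parent.relative_to(root).as_posix()
        for state in root.rglob("sequence.state")
    )
```

For the example tree above, `find_regions("update")` would list `europe/andorra` and `europe/monaco`. The `tmp` folder is skipped automatically because it contains no `sequence.state` files.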
### Setting up multiple regions
Create a project directory as described for the
[simple import](Import.md#creating-the-project-directory). If necessary,
you can also add an `.env` configuration with customized options. In particular,
you need to make sure that `NOMINATIM_REPLICATION_UPDATE_INTERVAL` and
`NOMINATIM_REPLICATION_RECHECK_INTERVAL` are set according to the update
interval of the extract server you use.
Copy the scripts `utils/import_multiple_regions.sh` and `utils/update_database.sh`
into the project directory.
Now customize both files to match your requirements:
1. List of countries, e.g.
COUNTRIES="europe/monaco europe/andorra"
2. URL to the service providing the extracts and updates, e.g.
BASEURL="https://download.geofabrik.de"
DOWNCOUNTRYPOSTFIX="-latest.osm.pbf"
3. The follow-up command in the update script can be set according to your installation.
E.g. for Photon,
FOLLOWUP="curl http://localhost:2322/nominatim-update"
will handle the indexing.
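The settings above plausibly combine into one download URL per region. The sketch below illustrates that under an assumption: that the script joins `BASEURL`, the region name and `DOWNCOUNTRYPOSTFIX` in this order, which matches the file names shown in the `tmp` folder but is not confirmed here. `extract_urls` is a hypothetical helper.

```python
def extract_urls(countries: str, base_url: str, postfix: str) -> dict:
    """Map each region in a space-separated list to its assumed extract URL."""
    base = base_url.rstrip("/")
    # Assumption: URL = BASEURL + "/" + region + DOWNCOUNTRYPOSTFIX.
    return {region: f"{base}/{region}{postfix}" for region in countries.split()}


urls = extract_urls(
    "europe/monaco europe/andorra",
    "https://download.geofabrik.de",
    "-latest.osm.pbf",
)
```

Checking the resulting URLs in a browser before starting the import is a cheap way to catch a misspelled region name.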
To start the initial import, change into the project directory and run
```
bash import_multiple_regions.sh
```
### Updating the database
Change into the project directory and run the following command:
bash update_database.sh
This will get diffs from the replication server, import diffs and index
the database. The default replication server in the
script ([Geofabrik](https://download.geofabrik.de)) provides daily updates.
## Using an external PostgreSQL database
You can install Nominatim using a database that runs on a different server.
Simply point the configuration variable `NOMINATIM_DATABASE_DSN` to the
server and follow the standard import documentation.
The import will be faster if it is run directly on the database
machine. You can easily switch to a different machine for the query frontend
after the import.
## Moving the database to another machine
For some configurations it may be useful to run the import on one machine, then
move the database to another machine and run the Nominatim service from there.
For example, you might want to use a large machine to be able to run the import
quickly but only want a smaller machine for production because there is not so
much load. Or you might want to do the import once and then replicate the
database to many machines.
The important thing to keep in mind when transferring the Nominatim installation
is that you need to transfer the database _and the project directory_. Both
parts are essential for your installation.
The Nominatim database can be transferred using the `pg_dump`/`pg_restore` tool.
Make sure to use the same version of PostgreSQL and PostGIS on source and
target machine.
!!! note
Before creating a dump of your Nominatim database, consider running
`nominatim freeze` first. Your database loses the ability to receive further
data updates but the resulting database is only about a third of the size
of a full database.
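The dump and restore steps described above can be sketched as two small shell functions. The database name `nominatim` and the file name `nominatim.dump` are assumptions; adjust them to your setup. The function names are purely illustrative.

```sh
# Database name and dump file are assumptions; adjust to your setup.
DBNAME="nominatim"
DUMPFILE="nominatim.dump"

# Run on the source machine: write a compressed, custom-format dump.
dump_database() {
    pg_dump --format=custom --file="$DUMPFILE" "$DBNAME"
}

# Run on the target machine: create an empty database and restore into it.
restore_database() {
    createdb "$DBNAME" && pg_restore --dbname="$DBNAME" "$DUMPFILE"
}
```

Call `dump_database` on the source machine, copy the dump file across, then call `restore_database` on the target. The restore assumes that the database roles used by the source installation (for example the web server user) already exist on the target.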
Next install nominatim-api on the target machine by following the standard
installation instructions. Again, make sure to use the same version as the
source machine.
Create a project directory on your destination machine and set up the `.env`
file to match the configuration on the source machine. That's all.

View File

@@ -1,148 +0,0 @@
# Deploying the Nominatim Python frontend
Nominatim can be run as a Python-based
[ASGI web application](https://asgi.readthedocs.io/en/latest/). You have the
choice between [Falcon](https://falcon.readthedocs.io/en/stable/)
and [Starlette](https://www.starlette.io/) as the ASGI framework.
This section gives a quick overview on how to configure Nginx to serve
Nominatim. Please refer to the documentation of
[Nginx](https://nginx.org/en/docs/) for background information on how
to configure it.
!!! Note
Throughout this page, we assume your Nominatim project directory is
located in `/srv/nominatim-project`. If you have put it somewhere else,
you need to adjust the commands and configuration accordingly.
### Installing the required packages
The Nominatim frontend is best run from its own virtual environment. If
you have already created one for the database backend during the
[installation](Installation.md#building-nominatim), you can use that. Otherwise
create one now with:
```sh
sudo apt-get install virtualenv
virtualenv /srv/nominatim-venv
```
The Nominatim frontend is contained in the 'nominatim-api' package. To
install directly from the source tree run:
```sh
cd Nominatim
/srv/nominatim-venv/bin/pip install packaging/nominatim-api
```
The recommended way to deploy a Python ASGI application is to run
the ASGI runner [uvicorn](https://www.uvicorn.org/)
together with [gunicorn](https://gunicorn.org/) HTTP server. We use
Falcon here as the web framework.
Add the necessary packages to your virtual environment:
``` sh
/srv/nominatim-venv/bin/pip install falcon uvicorn gunicorn
```
### Setting up Nominatim as a systemd job
Next you need to set up the service that runs the Nominatim frontend. This is
easiest done with a systemd job.
First you need to tell systemd to create a socket file to be used by
gunicorn. Create the following file `/etc/systemd/system/nominatim.socket`:
``` systemd
[Unit]
Description=Gunicorn socket for Nominatim
[Socket]
ListenStream=/run/nominatim.sock
SocketUser=www-data
[Install]
WantedBy=multi-user.target
```
Now you can add the systemd service for Nominatim itself.
Create the following file `/etc/systemd/system/nominatim.service`:
``` systemd
[Unit]
Description=Nominatim running as a gunicorn application
After=network.target
Requires=nominatim.socket
[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/srv/nominatim-project
ExecStart=/srv/nominatim-venv/bin/gunicorn -b unix:/run/nominatim.sock -w 4 -k uvicorn.workers.UvicornWorker "nominatim_api.server.falcon.server:run_wsgi()"
ExecReload=/bin/kill -s HUP $MAINPID
StandardOutput=append:/var/log/gunicorn-nominatim.log
StandardError=inherit
PrivateTmp=true
TimeoutStopSec=5
KillMode=mixed
[Install]
WantedBy=multi-user.target
```
This sets up gunicorn with 4 workers (`-w 4` in ExecStart). Each worker runs
its own Python process using
[`NOMINATIM_API_POOL_SIZE`](../customize/Settings.md#nominatim_api_pool_size)
connections to the database to serve requests in parallel.
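To get a feeling for the resulting database load, the upper bound on frontend connections is simply the number of workers multiplied by the pool size. The following sketch does this arithmetic; the pool size of 5 is a hypothetical value chosen for illustration, so substitute whatever `NOMINATIM_API_POOL_SIZE` is set to in your configuration.

```sh
# Number of gunicorn workers (-w 4 in ExecStart above).
WORKERS=4
# Hypothetical pool size; use the value of NOMINATIM_API_POOL_SIZE
# from your own configuration.
POOL_SIZE=5

# Upper bound on simultaneous database connections opened by the frontend.
total_connections() {
    echo $(( $1 * $2 ))
}

total_connections "$WORKERS" "$POOL_SIZE"
```

The result should stay comfortably below the `max_connections` setting of your PostgreSQL server, leaving room for updates and maintenance sessions.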
Make the new services known to systemd and start them:
``` sh
sudo systemctl daemon-reload
sudo systemctl enable nominatim.socket
sudo systemctl start nominatim.socket
sudo systemctl enable nominatim.service
sudo systemctl start nominatim.service
```
This sets the service up, so that Nominatim is automatically started
on reboot.
### Configuring nginx
To make the service available to the world, you need to proxy it through
nginx. Add the following definition to the default configuration:
``` nginx
upstream nominatim_service {
server unix:/run/nominatim.sock fail_timeout=0;
}
server {
listen 80;
listen [::]:80;
root /var/www/html;
index /search;
location / {
proxy_set_header Host $http_host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_redirect off;
proxy_pass http://nominatim_service;
}
}
```
Reload nginx with
```
sudo systemctl reload nginx
```
and you should be able to see the status of your server under
`http://localhost/status`.

View File

@@ -16,26 +16,68 @@ was killed. If it looks like this:
then you can resume with the following command:
```sh
nominatim import --continue indexing
./utils/setup.php --index --create-search-indices --create-country-names
```
If the reported rank is 26 or higher, you can also safely add `--index-noanalyse`.
### PostgreSQL crashed "invalid page in block"
### PHP "open_basedir restriction in effect" warnings
This is usually a serious problem. It can be a hardware issue, for example
not all data having been written to disk. Check the PostgreSQL log file and
search the PostgreSQL issues/mailing list for hints.
PHP Warning: file_get_contents(): open_basedir restriction in effect.
If it happened during index creation you can try rerunning the step with
You need to adjust the [open_basedir](https://www.php.net/manual/en/ini.core.php#ini.open-basedir) setting
in your PHP configuration (`php.ini` file). By default this setting may look like this:
```sh
nominatim import --continue indexing
open_basedir = /srv/http/:/home/:/tmp/:/usr/share/pear/
Either add reported directories to the list or disable this setting temporarily by
adding ";" at the beginning of the line. Don't forget to enable this setting again
once you are done with the PHP command line operations.
### PHP timezone warnings
The Apache log may contain lots of PHP warnings like this:
`PHP Warning: date_default_timezone_set() function.`
You should set the default time zone as instructed in the warning in
your `php.ini` file. Find the entry about timezone and set it to
something like this:
; Defines the default timezone used by the date functions
; https://php.net/date.timezone
date.timezone = 'America/Denver'
Or
```
echo "date.timezone = 'America/Denver'" > /etc/php.d/timezone.ini
```
Otherwise it's best to start the full setup from the beginning.
### nominatim.so version mismatch
When running the import you may get a version mismatch:
`COPY_END for place failed: ERROR: incompatible library "/srv/Nominatim/nominatim/build/module/nominatim.so": version mismatch`
pg_config seems to use bad includes sometimes when multiple versions
of PostgreSQL are available in the system. Make sure you remove the
server development libraries (`postgresql-server-dev-9.5` on Ubuntu)
and recompile (`cmake .. && make`).
### I see the error: "ERROR: permission denied for language c"
`nominatim.so`, written in C, is required to be installed on the database
server. Some managed database (cloud) services like Amazon RDS do not allow
this. There is currently no work-around other than installing a database
on a non-managed machine.
### I see the error: "function transliteration(text) does not exist"
Reinstall the nominatim functions with `setup.php --create-functions`
and check for any errors, e.g. a missing `nominatim.so` file.
### I see the error: "ERROR: mmap (remap) failed"
@@ -49,8 +91,7 @@ vboxfs.
### nominatim UPDATE failed: ERROR: buffer 179261 is not owned by resource owner Portal
Several users [reported this](https://github.com/openstreetmap/Nominatim/issues/1168)
during the initial import of the database. It's
Several users [reported this](https://github.com/openstreetmap/Nominatim/issues/1168) during the initial import of the database. It's
something internal to PostgreSQL that Nominatim doesn't control. PostgreSQL forums
suggest it's threading related, and in any case it is some kind of process crash.
Users reported either rebooting the server, different hardware or just trying
@@ -62,12 +103,29 @@ The server cannot access your database. Add `&debug=1` to your URL
to get the full error message.
### On CentOS the website shows "Could not connect to server"
`could not connect to server: No such file or directory`
On CentOS v7 the PostgreSQL server is started with `systemd`.
Check if `/usr/lib/systemd/system/httpd.service` contains a line `PrivateTmp=true`.
If so then Apache cannot see the `/tmp/.s.PGSQL.5432` file. It's a good security feature,
so use the [preferred solution](../appendix/Install-on-Centos-7/#adding-selinux-security-settings).
However, you can solve this the quick and dirty way by commenting out that line and then run
sudo systemctl daemon-reload
sudo systemctl restart httpd
### "must be an array or an object that implements Countable" warning in /usr/share/pear/DB.php
The warning started with PHP 7.2. Make sure you have at least [version 1.9.3 of PEAR DB](https://github.com/pear/DB/releases)
installed.
### Website reports "DB Error: insufficient permissions"
The user the webserver, e.g. Apache, runs under needs to have access to the
Nominatim database. You can find the user like
[this](https://serverfault.com/questions/125865/finding-out-what-user-apache-is-running-as),
for default Ubuntu operating system for example it's `www-data`.
The user the webserver, e.g. Apache, runs under needs to have access to the Nominatim database. You can find the user like [this](https://serverfault.com/questions/125865/finding-out-what-user-apache-is-running-as), for default Ubuntu operating system for example it's `www-data`.
1. Repeat the `createuser` step of the installation instructions.
@@ -78,42 +136,80 @@ for default Ubuntu operating system for example it's `www-data`.
GRANT SELECT ON ALL TABLES IN SCHEMA public TO "www-data";
```
### Setup fails with "DB Error: extension not found"
### Website reports "Could not load library "nominatim.so"
Example error message
```
SELECT make_standard_name('3039 E MEADOWLARK LN') [nativecode=ERROR: could not
load library "/srv/nominatim/Nominatim-3.1.0/build/module/nominatim.so":
/srv/nominatim/Nominatim-3.1.0/build/module/nominatim.so: cannot open shared
object file: Permission denied
CONTEXT: PL/pgSQL function make_standard_name(text) line 5 at assignment]
```
The PostgreSQL database, i.e. user `postgres`, needs to have access to that file.
The permissions need to be readable and executable by everybody, e.g.
```
-rwxr-xr-x 1 nominatim nominatim 297984 build/module/nominatim.so
```
Try `chmod a+r nominatim.so; chmod a+x nominatim.so`.
When running SELinux, make sure that the
[context is set up correctly](../appendix/Install-on-Centos-7/#adding-selinux-security-settings).
### Setup.php fails with "DB Error: extension not found"
Make sure you have the PostgreSQL extensions "hstore" and "postgis" installed.
See the installation instructions for a full list of required packages.
See the installation instruction for a full list of required packages.
### UnicodeEncodeError: 'ascii' codec can't encode character
### Setup.php reports "Cannot redeclare getDB()"
Make sure that the operating system's locale is UTF-8. With some prebuilt
images (e.g. LXC containers from Proxmox, see
[discussion](https://github.com/osm-search/Nominatim/discussions/2343)) or
images that optimize for size it might be missing.
`Cannot redeclare getDB() (previously declared in /your/path/Nominatim/lib/db.php:4)`
On Ubuntu you can check the locale is installed:
The message is a bit misleading as PHP needs to load the file `DB.php` and
instead re-loads Nominatim's `db.php`. To solve this make sure you
have the [Pear module 'DB'](https://pear.php.net/package/DB/) installed.
```
grep UTF-8 /etc/default/locale
```
And install it using
```
dpkg-reconfigure locales
```
sudo pear install DB
### I forgot to delete the flatnodes file before starting an import.
That's fine. For each import the flatnodes file gets overwritten.
See [https://help.openstreetmap.org/questions/52419/nominatim-flatnode-storage](https://help.openstreetmap.org/questions/52419/nominatim-flatnode-storage)
See [https://help.openstreetmap.org/questions/52419/nominatim-flatnode-storage]()
for more information.
## Running your own instance
### Can I import multiple countries and keep them up to date?
You should use the extracts and updates from https://download.geofabrik.de.
For the initial import, download the countries you need and merge them.
See [OSM Help](https://help.openstreetmap.org/questions/48843/merging-two-or-more-geographical-areas-to-import-two-or-more-osm-files-in-nominatim)
for examples of how to do that. Use the resulting single osm file when
running `setup.php`.
For updates you need to download the change files for each country
once per day and apply them **separately** using
./utils/update.php --import-diff <filename> --index
See [this issue](https://github.com/openstreetmap/Nominatim/issues/60#issuecomment-18679446)
for a script that runs the updates using osmosis.
### Can I import negative OSM ids into Nominatim?
No, negative IDs are no longer supported by osm2pgsql. You can use
large 64-bit IDs that are guaranteed not to clash with OSM IDs. However,
you will not be able to use a flatnode file with them.
See [this question on OSM Help](https://help.openstreetmap.org/questions/64662/nominatim-flatnode-with-negative-id).
### Missing XML or text declaration
The website might show: `XML Parsing Error: XML or text declaration not at start of entity Location.`
Make sure there are no spaces at the beginning of your `settings/local.php` file.

View File

@@ -0,0 +1,276 @@
# Importing and Updating the Database
The following instructions explain how to create a Nominatim database
from an OSM planet file and how to keep the database up to date. It
is assumed that you have already successfully installed the Nominatim
software itself; if not, return to the [installation page](Installation.md).
## Configuration setup in settings/local.php
The Nominatim server can be customized via the file `settings/local.php`
in the build directory. Note that this is a PHP file, so it must always
start like this:
<?php
without any leading spaces.
There are lots of configuration settings you can tweak. Have a look
at `settings/default.php` for a full list. Most should have a sensible default.
#### Flatnode files
If you plan to import a large dataset (e.g. Europe, North America, planet),
you should also enable flatnode storage of node locations. With this
setting enabled, node coordinates are stored in a simple file instead
of the database. This will save you import time and disk storage.
Add to your `settings/local.php`:
@define('CONST_Osm2pgsql_Flatnode_File', '/path/to/flatnode.file');
Replace the second part with a suitable path on your system and make sure
the directory exists. There should be at least 40GB of free space.
## Downloading additional data
### Wikipedia rankings
Wikipedia can be used as an optional auxiliary data source to help indicate
the importance of OSM features. Nominatim will work without this information
but it will improve the quality of the results if this is installed.
This data is available as a binary download:
cd $NOMINATIM_SOURCE_DIR/data
wget https://www.nominatim.org/data/wikipedia_article.sql.bin
wget https://www.nominatim.org/data/wikipedia_redirect.sql.bin
Combined the 2 files are around 1.5GB and add around 30GB to the install
size of Nominatim. They also increase the install time by an hour or so.
*NOTE:* you'll need to download the Wikipedia rankings before performing
the initial import of the data if you want the rankings applied to the
loaded data.
### Great Britain, USA postcodes
Nominatim can use postcodes from an external source to improve searches that
involve a GB or US postcode. This data can be optionally downloaded:
cd $NOMINATIM_SOURCE_DIR/data
wget https://www.nominatim.org/data/gb_postcode_data.sql.gz
wget https://www.nominatim.org/data/us_postcode_data.sql.gz
## Choosing the Data to Import
In its default setup Nominatim is configured to import the full OSM data
set for the entire planet. Such a setup requires a powerful machine with
at least 32GB of RAM and around 800GB of SSD hard disks. Depending on your
use case there are various ways to reduce the amount of data imported. This
section discusses these methods. They can also be combined.
### Using an extract
If you only need geocoding for a smaller region, then precomputed extracts
are a good way to reduce the database size and import time.
[Geofabrik](https://download.geofabrik.de) offers extracts for most countries.
They even have daily updates which can be used with the update process described
below. There are also
[other providers for extracts](https://wiki.openstreetmap.org/wiki/Planet.osm#Downloading).
Please be aware that some extracts are not cut exactly along the country
boundaries. As a result some parts of the boundary may be missing which means
that Nominatim cannot compute the areas for some administrative areas.
### Dropping Data Required for Dynamic Updates
About half of the data in Nominatim's database is not really used for serving
the API. It is only there to allow the data to be updated from the latest
changes from OSM. For many uses these dynamic updates are not really required.
If you don't plan to apply updates, the dynamic part of the database can be
safely dropped using the following command:
```
./utils/setup.php --drop
```
Note that you still need to provide for sufficient disk space for the initial
import. So this option is particularly interesting if you plan to transfer the
database or reuse the space later.
### Reverse-only Imports
If you only want to use the Nominatim database for reverse lookups or
if you plan to use the installation only for exports to a
[photon](https://photon.komoot.de/) database, then you can set up a database
without search indexes. Add `--reverse-only` to your setup command above.
This saves about 5% of disk space.
### Filtering Imported Data
Nominatim normally sets up a full search database containing administrative
boundaries, places, streets, addresses and POI data. There are also other
import styles available which only read selected data:
* **settings/import-admin.style**
Only import administrative boundaries and places.
* **settings/import-street.style**
Like the admin style but also adds streets.
* **settings/import-address.style**
Import all data necessary to compute addresses down to house number level.
* **settings/import-full.style**
Default style that also includes points of interest.
The style can be changed with the configuration `CONST_Import_Style`.
To give you an idea of the impact of using the different styles, the table
below gives rough estimates of the final database size after import of a
2018 planet and after using the `--drop` option. It also shows the time
needed for the import on a machine with 32GB RAM, 4 CPUs and SSDs. Note that
the given sizes are just an estimate meant for comparison of style requirements.
Your planet import is likely to be larger as the OSM data grows with time.
style | Import time | DB size | after drop
----------|--------------|------------|------------
admin | 5h | 190 GB | 20 GB
street | 42h | 400 GB | 180 GB
address | 59h | 500 GB | 260 GB
full | 80h | 575 GB | 300 GB
You can also customize the styles further. For a description of the
style format see [the development section](../develop/Import.md).
## Initial import of the data
**Important:** first try the import with a small extract, for example from
[Geofabrik](https://download.geofabrik.de).
Download the data to import and load the data with the following command
from the build directory:
```sh
./utils/setup.php --osm-file <data file> --all [--osm2pgsql-cache 28000] 2>&1 | tee setup.log
```
The `--osm2pgsql-cache` parameter is optional but strongly recommended for
planet imports. It sets the node cache size for the osm2pgsql import part
(see `-C` parameter in osm2pgsql help). As a rule of thumb, this should be
about the same size as the file you are importing but never more than
2/3 of RAM available. If your machine starts swapping reduce the size.
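The rule of thumb above can be written down as a small calculation. This is only a sketch: it assumes that both sizes are given in megabytes (the unit of the osm2pgsql `-C` parameter), and the function name `suggest_cache_mb` as well as the example sizes are made up for illustration.

```sh
# Suggest a value for --osm2pgsql-cache (in MB) following the rule of thumb:
# roughly the size of the input file, but at most 2/3 of the available RAM.
# Arguments: input file size in MB, RAM size in MB.
suggest_cache_mb() {
    file_mb=$1
    ram_mb=$2
    limit_mb=$(( ram_mb * 2 / 3 ))
    if [ "$file_mb" -lt "$limit_mb" ]; then
        echo "$file_mb"
    else
        echo "$limit_mb"
    fi
}

suggest_cache_mb 1500 32768    # small extract on a 32GB machine
suggest_cache_mb 60000 32768   # planet-sized file on the same machine
```

For the small extract the file size wins. For the planet-sized file the value is capped at two thirds of the RAM. Treat the result as a starting point and lower it if the machine begins to swap.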
Computing word frequency for search terms can improve the performance of
forward geocoding in particular under high load as it helps PostgreSQL's query
planner to make the right decisions. To recompute word counts run:
```sh
./utils/update.php --recompute-word-counts
```
This will take a couple of hours for a full planet installation. You can
also defer that step to a later point in time when you realise that
performance becomes an issue. Just make sure that updates are stopped before
running this function.
If you want to be able to search for places by their type through
[special key phrases](https://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases)
you also need to enable these key phrases like this:
./utils/specialphrases.php --wiki-import > specialphrases.sql
psql -d nominatim -f specialphrases.sql
Note that this command downloads the phrases from the wiki link above.
## Installing Tiger housenumber data for the US
Nominatim is able to use the official [TIGER](https://www.census.gov/geo/maps-data/data/tiger.html)
address set to complement the OSM house number data in the US. You can add
TIGER data to your own Nominatim instance by following these steps. The
entire US adds about 10GB to your database.
1. Get preprocessed TIGER 2019 data and unpack it into the
data directory in your Nominatim sources:
cd Nominatim/data
wget https://nominatim.org/data/tiger2019-nominatim-preprocessed.tar.gz
tar xf tiger2019-nominatim-preprocessed.tar.gz
`data-source/us-tiger/README.md` explains how the data got preprocessed.
2. Import the data into your Nominatim database:
./utils/setup.php --import-tiger-data
3. Enable use of the Tiger data in your `settings/local.php` by adding:
@define('CONST_Use_US_Tiger_Data', true);
4. Apply the new settings:
```sh
./utils/setup.php --create-functions --enable-diff-updates --create-partition-functions
```
## Updates
There are many different ways to update your Nominatim database.
The following section describes how to keep it up-to-date with Pyosmium.
For a list of other methods see the output of `./utils/update.php --help`.
#### Installing the newest version of Pyosmium
It is recommended to install Pyosmium via pip. Make sure to use python3.
Run (as the same user who will later run the updates):
```sh
pip3 install --user osmium
```
Nominatim needs a tool called `pyosmium-get-changes` which comes with
Pyosmium. You need to tell Nominatim where to find it. Add the
following line to your `settings/local.php`:
@define('CONST_Pyosmium_Binary', '/home/user/.local/bin/pyosmium-get-changes');
The path above is fine if you used the `--user` parameter with pip.
Replace `user` with your user name.
#### Setting up the update process
Next the update needs to be initialised. By default Nominatim is configured
to update using the global minutely diffs.
If you want a different update source you will need to add some settings
to `settings/local.php`. For example, to use the daily country extracts
diffs for Ireland from Geofabrik add the following:
// base URL of the replication service
@define('CONST_Replication_Url', 'https://download.geofabrik.de/europe/ireland-and-northern-ireland-updates');
// How often upstream publishes diffs
@define('CONST_Replication_Update_Interval', '86400');
// How long to sleep if no update found yet
@define('CONST_Replication_Recheck_Interval', '900');
To set up the update process now run the following command:
./utils/update.php --init-updates
It outputs the date where updates will start. Recheck that this date is
what you expect.
The `--init-updates` command needs to be rerun whenever the replication service
is changed.
#### Updating Nominatim
The following command will keep your database constantly up to date:
./utils/update.php --import-osmosis-all
(Note that even though the old name "import-osmosis-all" has been kept for compatibility reasons, Osmosis is not required to run this - it uses pyosmium behind the scenes.)
If you have imported multiple country extracts and want to keep them
up-to-date, have a look at the script in
[issue #60](https://github.com/openstreetmap/Nominatim/issues/60).

View File

@@ -1,316 +0,0 @@
# Importing the Database
The following instructions explain how to create a Nominatim database
from an OSM planet file. It is assumed that you have already successfully
installed the Nominatim software itself and the `nominatim` tool can be found
in your `PATH`. If this is not the case, return to the
[installation page](Installation.md).
## Creating the project directory
Before you start the import, you should create a project directory for your
new database installation. This directory receives all data that is related
to a single Nominatim setup: configuration, extra data, etc. Create a project
directory apart from the Nominatim software and change into the directory:
```
mkdir ~/nominatim-project
cd ~/nominatim-project
```
In the following, we refer to the project directory as `$PROJECT_DIR`. To be
able to copy&paste instructions, you can export the appropriate variable:
```
export PROJECT_DIR=~/nominatim-project
```
The Nominatim tool assumes per default that the current working directory is
the project directory but you may explicitly state a different directory using
the `--project-dir` parameter. The following instructions assume that you run
all commands from the project directory.
!!! tip "Migration Tip"
Nominatim used to be run directly from the build directory until version 3.6.
Essentially, the build directory functioned as the project directory
for the database installation. This setup still works and can be useful for
development purposes. It is not recommended anymore for production setups.
Create a project directory that is separate from the Nominatim software.
### Configuration setup in `.env`
The Nominatim server can be customized via an `.env` configuration file in the
project directory. This is a file in [dotenv](https://github.com/theskumar/python-dotenv)
format which looks the same as variable settings in a standard shell environment.
You can also set the same configuration via environment variables. All
settings have a `NOMINATIM_` prefix to avoid conflicts with other environment
variables.
There are lots of configuration settings you can tweak. A full reference
can be found in the chapter [Configuration Settings](../customize/Settings.md).
Most should have a sensible default.
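As an illustration, a small `.env` file might look like the following. The values are examples only; the two settings used here are the import style and flatnode file options described elsewhere on this page.

```sh
# $PROJECT_DIR/.env (illustrative values only)
NOMINATIM_IMPORT_STYLE=street
NOMINATIM_FLATNODE_FILE="/path/to/flatnode.file"
```

Because the same names are read from the environment, a one-off override such as `NOMINATIM_IMPORT_STYLE=admin nominatim import --osm-file <data file>` should have the same effect as editing the file, assuming the shell passes the variable on to the `nominatim` process.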
#### Flatnode files
If you plan to import a large dataset (e.g. Europe, North America, planet),
you should also enable flatnode storage of node locations. With this
setting enabled, node coordinates are stored in a simple file instead
of the database. This will save you import time and disk storage.
Add to your `.env`:
NOMINATIM_FLATNODE_FILE="/path/to/flatnode.file"
Replace the second part with a suitable path on your system and make sure
the directory exists. There should be at least 75GB of free space.
## Downloading additional data
### Wikipedia/Wikidata rankings
Wikipedia can be used as an optional auxiliary data source to help indicate
the importance of OSM features. Nominatim will work without this information
but it will improve the quality of the results if this is installed.
This data is available as a binary download. Put it into your project directory:
cd $PROJECT_DIR
wget https://nominatim.org/data/wikimedia-importance.csv.gz
wget -O secondary_importance.sql.gz https://nominatim.org/data/wikimedia-secondary-importance.sql.gz
The files are about 400MB and add around 4GB to the Nominatim database. For
more information about importance,
see [Importance Customization](../customize/Importance.md).
!!! tip
If you forgot to download the wikipedia rankings, then you can
also add importances after the import. Download the SQL files, then
run `nominatim refresh --wiki-data --secondary-importance --importance`.
Updating importances for a planet will take a couple of hours.
### External postcodes
Nominatim can use postcodes from an external source to improve searching with
postcodes. We provide precomputed postcodes sets for the US (using TIGER data)
and the UK (using the [CodePoint OpenData set](https://osdatahub.os.uk/downloads/open/CodePointOpen)).
This data can be optionally downloaded into the project directory:
cd $PROJECT_DIR
wget https://nominatim.org/data/gb_postcodes.csv.gz
wget https://nominatim.org/data/us_postcodes.csv.gz
You can also add your own custom postcode sources, see
[Customization of postcodes](../customize/Postcodes.md).
## Choosing the data to import
In its default setup Nominatim is configured to import the full OSM data
set for the entire planet. Such a setup requires a powerful machine with
at least 64GB of RAM and around 900GB of SSD hard disks. Depending on your
use case there are various ways to reduce the amount of data imported. This
section discusses these methods. They can also be combined.
### Using an extract
If you only need geocoding for a smaller region, then precomputed OSM extracts
are a good way to reduce the database size and import time.
[Geofabrik](https://download.geofabrik.de) offers extracts for most countries.
They even have daily updates which can be used with the update process described
[in the next section](Update.md). There are also
[other providers for extracts](https://wiki.openstreetmap.org/wiki/Planet.osm#Downloading).
Please be aware that some extracts are not cut exactly along the country
boundaries. As a result some parts of the boundary may be missing which means
that Nominatim cannot compute the areas for some administrative areas.
### Dropping Data Required for Dynamic Updates
About half of the data in Nominatim's database is not really used for serving
the API. It is only there to allow the data to be updated from the latest
changes from OSM. For many uses these dynamic updates are not really required.
If you don't plan to apply updates, you can run the import with the
`--no-updates` parameter. This will drop the dynamic part of the database as
soon as it is not required anymore.
You can also drop the dynamic part later using the following command:
```
nominatim freeze
```
Note that you still need to provide for sufficient disk space for the initial
import. So this option is particularly interesting if you plan to transfer the
database or reuse the space later.
!!! warning
The data structures for updates are also required when adding additional data
after the import, for example [TIGER housenumber data](../customize/Tiger.md).
If you plan to use those, you must not use the `--no-updates` parameter.
Do a normal import, add the external data and once you are done with
everything run `nominatim freeze`.
### Reverse-only Imports
If you only want to use the Nominatim database for reverse lookups or
if you plan to use the installation only for exports to a
[photon](https://photon.komoot.io/) database, then you can set up a database
without search indexes. Add `--reverse-only` to your setup command above.
This saves about 5% of disk space, but the import time won't be significantly faster.
### Filtering Imported Data
Nominatim normally sets up a full search database containing administrative
boundaries, places, streets, addresses and POI data. There are also other
import styles available which only read selected data:
* **admin**
Only import administrative boundaries and places.
* **street**
Like the admin style but also adds streets.
* **address**
Import all data necessary to compute addresses down to house number level.
* **full**
Default style that also includes points of interest.
* **extratags**
Like the full style but also adds most of the OSM tags into the extratags
column.
The style can be changed with the configuration `NOMINATIM_IMPORT_STYLE`.
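As a minimal sketch, assuming the project directory keeps its configuration in a `.env` file, the style could be selected before the import with a small helper like the one below. The function name `set_import_style` is made up for illustration and is not part of Nominatim. It only appends a line, so an existing `NOMINATIM_IMPORT_STYLE` entry would need to be removed by hand.

```sh
# Append an import style setting to the .env file of a project directory.
# Hypothetical helper, not part of Nominatim. The accepted names are the
# styles listed above.
set_import_style() {
    case "$2" in
        admin|street|address|full|extratags) ;;
        *) echo "unknown style: $2" >&2; return 1 ;;
    esac
    echo "NOMINATIM_IMPORT_STYLE=$2" >> "$1/.env"
}

# Usage (the directory is a placeholder):
#   set_import_style /srv/nominatim-project address
```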
To give you an idea of the impact of using the different styles, the table
below gives rough estimates of the final database size after import of a
2020 planet and after using the `--drop` option. It also shows the time
needed for the import on a machine with 64GB RAM, 4 CPUs and NVME disks.
Note that the given sizes are just an estimate meant for comparison of
style requirements. Your planet import is likely to be larger as the
OSM data grows with time.
style | Import time | DB size | after drop
----------|--------------|------------|------------
admin | 4h | 215 GB | 20 GB
street | 22h | 440 GB | 185 GB
address | 36h | 545 GB | 260 GB
full | 54h | 640 GB | 330 GB
extratags | 54h | 650 GB | 340 GB
You can also customize the styles further.
A [description of the style format](../customize/Import-Styles.md)
can be found in the customization guide.
## Initial import of the data
!!! danger "Important"
First try the import with a small extract, for example from
[Geofabrik](https://download.geofabrik.de).
Download the data to import. Then issue the following command
from the **project directory** to start the import:
```sh
nominatim import --osm-file <data file> 2>&1 | tee setup.log
```
The **project directory** is the one that you have set up at the beginning.
See [creating the project directory](#creating-the-project-directory).
### Notes on full planet imports
Even on a perfectly configured machine
the import of a full planet takes around 2 days. Once you see messages
with `Rank .. ETA` appear, the indexing process has started. This part takes
the most time. There are 30 ranks to process. Ranks 26 and 30 are the most complex.
Each of them takes about a third of the total import time. If you have not reached
rank 26 after two days of import, it is worth revisiting your system
configuration as it may not be optimal for the import.
### Notes on memory usage
In the first step of the import Nominatim uses [osm2pgsql](https://osm2pgsql.org)
to load the OSM data into the PostgreSQL database. This step is very demanding
in terms of RAM usage. osm2pgsql and PostgreSQL are running in parallel at
this point. PostgreSQL blocks at least the part of RAM that has been configured
with the `shared_buffers` parameter during
[PostgreSQL tuning](Installation.md#tuning-the-postgresql-database)
and needs some memory on top of that. osm2pgsql needs at least 2GB of RAM for
its internal data structures, potentially more when it has to process very large
relations. In addition it needs to maintain a cache for node locations. The size
of this cache can be configured with the parameter `--osm2pgsql-cache`.
When importing with a flatnode file, it is best to disable the node cache
completely and leave the memory for the flatnode file. Nominatim will do this
by default, so you do not need to configure anything in this case.
For imports without a flatnode file, set `--osm2pgsql-cache` approximately to
the size of the OSM pbf file you are importing. The size needs to be given in
MB. Make sure you leave enough RAM for PostgreSQL and osm2pgsql as mentioned
above. If the system starts swapping or you are getting out-of-memory errors,
reduce the cache size or even consider using a flatnode file.
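The sizing rule for imports without a flatnode file can be sketched in shell. The helper below prints the size of a file in MB, rounded up, which could then be passed to `--osm2pgsql-cache`. The function name `pbf_size_mb` and the file name in the usage comment are placeholders, and the result is only a starting point that still has to fit into the available RAM.

```sh
# Print the size of a file in MB, rounded up.
# Hypothetical helper, not part of Nominatim.
pbf_size_mb() {
    bytes=$(wc -c < "$1") || return 1
    echo $(( (bytes + 1048575) / 1048576 ))
}

# Usage (file name is a placeholder):
#   nominatim import --osm-file extract.osm.pbf \
#       --osm2pgsql-cache "$(pbf_size_mb extract.osm.pbf)"
```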
### Testing the installation
Run this script to verify that all required tables and indices got created
successfully.
```sh
nominatim admin --check-database
```
If you have installed the `nominatim-api` package, then you can try out
your installation by executing a simple query on the command line:
``` sh
nominatim search --query Berlin
```
or, when you have a reverse-only installation:
``` sh
nominatim reverse --lat 51 --lon 45
```
If you want to run Nominatim as a service, make sure you have installed
the right packages as per [Installation](Installation.md#software).
#### Testing the Python frontend
To run the test server against the Python frontend, you must choose a
web framework to use, either starlette or falcon. Make sure the appropriate
packages are installed. Then run
``` sh
nominatim serve
```
or, if you prefer to use Starlette instead of Falcon as webserver,
``` sh
nominatim serve --engine starlette
```
Go to `http://localhost:8088/status` and you should see the message `OK`.
You can also run a search query, e.g. `http://localhost:8088/search?q=Berlin`
or, for reverse-only installations a reverse query,
e.g. `http://localhost:8088/reverse?lat=27.1750090510034&lon=78.04209025`.
Do not use this test server in production.
To run Nominatim via webservers like Apache or nginx, please continue reading
[Deploy the Python frontend](Deployment-Python.md).
## Enabling search by category phrases
To be able to search for places by their type using
[special phrases](https://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases)
you also need to import these key phrases like this:
```sh
nominatim special-phrases --import-from-wiki
```
Note that this command downloads the phrases from the wiki link above. You
need internet access for this step.
You can also import special phrases from a csv file, for more
information please see the [Customization part](../customize/Special-Phrases.md).

View File

@@ -4,8 +4,9 @@ This page contains generic installation instructions for Nominatim and its
prerequisites. There are also step-by-step instructions available for
the following operating systems:
* [Ubuntu 24.04](Install-on-Ubuntu-24.md)
* [Ubuntu 22.04](Install-on-Ubuntu-22.md)
* [Ubuntu 18.04](../appendix/Install-on-Ubuntu-18.md)
* [Ubuntu 16.04](../appendix/Install-on-Ubuntu-16.md)
* [CentOS 7.2](../appendix/Install-on-Centos-7.md)
These OS-specific instructions can also be found in executable form
in the `vagrant/` directory.
@@ -15,136 +16,143 @@ and can't offer support.
* [Docker](https://github.com/mediagis/nominatim-docker)
* [Docker on Kubernetes](https://github.com/peter-evans/nominatim-k8s)
* [Kubernetes with Helm](https://github.com/robjuz/helm-charts/blob/master/charts/nominatim/README.md)
* [Ansible](https://github.com/synthesio/infra-ansible-nominatim)
## Prerequisites
### Software
For compiling:
* [cmake](https://cmake.org/)
* [libxml2](http://xmlsoft.org/)
* a recent C++ compiler
Nominatim comes with its own version of osm2pgsql. See the
osm2pgsql README for additional dependencies required for compiling osm2pgsql.
For running tests:
* [behave](http://pythonhosted.org/behave/)
* [Psycopg2](https://initd.org/psycopg)
* [nose](https://nose.readthedocs.io)
* [phpunit](https://phpunit.de)
For running Nominatim:
* [PostgreSQL](https://www.postgresql.org) (12+ will work, 13+ strongly recommended)
* [PostGIS](https://postgis.net) (3.0+ will work, 3.2+ strongly recommended)
* [osm2pgsql](https://osm2pgsql.org) (1.8+)
* [Python 3](https://www.python.org/) (3.7+)
Furthermore the following Python libraries are required:
* [Psycopg3](https://www.psycopg.org)
* [Python Dotenv](https://github.com/theskumar/python-dotenv)
* [psutil](https://github.com/giampaolo/psutil)
* [Jinja2](https://palletsprojects.com/p/jinja/)
* [PyICU](https://pypi.org/project/PyICU/)
* [PyYaml](https://pyyaml.org/) (5.1+)
* [datrie](https://github.com/pytries/datrie)
These will be installed automatically when using pip installation.
* [PostgreSQL](https://www.postgresql.org) (9.3 or later)
* [PostGIS](https://postgis.org) (2.2 or later)
* [PHP](https://php.net) (7.0 or later)
* PHP-pgsql
* PHP-intl (bundled with PHP)
* [PEAR::DB](https://pear.php.net/package/DB)
* a webserver (apache or nginx are recommended)
For running continuous updates:
* [pyosmium](https://osmcode.org/pyosmium/)
For running the Python frontend:
* [SQLAlchemy](https://www.sqlalchemy.org/) (1.4.31+ with greenlet support)
* [asyncpg](https://magicstack.github.io/asyncpg) (0.8+, only when using SQLAlchemy < 2.0)
* one of the following web frameworks:
* [falcon](https://falconframework.org/) (3.0+)
* [starlette](https://www.starlette.io/)
* [uvicorn](https://www.uvicorn.org/)
For dependencies for running tests and building documentation, see
the [Development section](../develop/Development-Environment.md).
* [pyosmium](https://osmcode.org/pyosmium/) (with Python 3)
### Hardware
A minimum of 2GB of RAM is required or installation will fail. For a full
planet import 128GB of RAM or more are strongly recommended. Do not report
out of memory problems if you have less than 64GB RAM.
planet import 32GB of RAM or more are strongly recommended.
For a full planet install you will need at least 1TB of hard disk space.
Take into account that the OSM database is growing fast.
Fast disks are essential. Using NVME disks is recommended.
For a full planet install you will need at least 700GB of hard disk space
(take into account that the OSM database is growing fast). SSD disks
will help considerably to speed up import and queries.
Even on a well configured machine the import of a full planet takes
around 2.5 days. When using traditional SSDs, 4-5 days are more realistic.
On a 6-core machine with 32GB RAM and SSDs the import of a full planet takes
a bit more than 2 days. Without SSDs 7-8 days are more realistic.
## Tuning the PostgreSQL database
## Setup of the server
### PostgreSQL tuning
You might want to tune your PostgreSQL installation so that the later steps
make best use of your hardware. You should tune the following parameters in
your `postgresql.conf` file.
shared_buffers = 2GB
maintenance_work_mem = (10GB)
autovacuum_work_mem = 2GB
work_mem = (50MB)
shared_buffers (2GB)
maintenance_work_mem (10GB)
work_mem (50MB)
effective_cache_size (24GB)
synchronous_commit = off
max_wal_size = 1GB
checkpoint_timeout = 60min
checkpoint_segments = 100 # only for postgresql <= 9.4
checkpoint_timeout = 10min
checkpoint_completion_target = 0.9
random_page_cost = 1.0
wal_level = minimal
max_wal_senders = 0
The numbers in brackets behind some parameters seem to work fine for a
128GB RAM machine. Adjust to your setup. A higher number for `max_wal_size`
means that PostgreSQL needs to run checkpoints less often but it does require
the additional space on your disk.
32GB RAM machine. Adjust to your setup.
Autovacuum must not be switched off because it ensures that the
tables are frequently analysed. If your machine has very little memory,
you might consider setting:
For the initial import, you should also set:
autovacuum_max_workers = 1
fsync = off
full_page_writes = off
and even reduce `autovacuum_work_mem` further. This will reduce the amount
of memory that autovacuum takes away from the import process.
Don't forget to reenable them after the initial import or you risk database
corruption. Autovacuum must not be switched off because it ensures that the
tables are frequently analysed.
## Installing the latest release
### Webserver setup
Nominatim is most easily installed directly from PyPI. Make sure you have installed
osm2pgsql, PostgreSQL/PostGIS and libICU together with its header files.
The `website/` directory in the build directory contains the configured
website. Include the directory into your web server to serve php files
from there.
Then you can install Nominatim with:
#### Configure for use with Apache
pip install nominatim-db nominatim-api
Make sure your Apache configuration contains the required permissions for the
directory and create an alias:
## Downloading and building Nominatim from source
<Directory "/srv/nominatim/build/website">
Options FollowSymLinks MultiViews
AddType text/html .php
DirectoryIndex search.php
Require all granted
</Directory>
Alias /nominatim /srv/nominatim/build/website
The following instructions are only relevant, if you want to build and
install Nominatim **from source**.
`/srv/nominatim/build` should be replaced with the location of your
build directory.
### Downloading the source for the latest release
After making changes in the apache config you need to restart apache.
The website should now be available on http://localhost/nominatim.
You can download the [latest release from nominatim.org](https://nominatim.org/downloads/).
The release contains all necessary files. Just unpack it.
#### Configure for use with Nginx
### Downloading the source for the latest development version
Use php-fpm as a daemon for serving PHP cgi. Install php-fpm together with nginx.
If you want to install the latest development version from GitHub:
By default php listens on a network socket. If you want it to listen to a
Unix socket instead, change the pool configuration (`pool.d/www.conf`) as
follows:
```
git clone https://github.com/osm-search/Nominatim.git
```
; Comment out the tcp listener and add the unix socket
;listen = 127.0.0.1:9000
listen = /var/run/php5-fpm.sock
The development version does not include the country grid. Download it separately:
; Ensure that the daemon runs as the correct user
listen.owner = www-data
listen.group = www-data
listen.mode = 0666
```
wget -O Nominatim/data/country_osm_grid.sql.gz https://nominatim.org/data/country_grid.sql.gz
```
Tell nginx that php files are special and to fastcgi_pass to the php-fpm
unix socket by adding the location definition to the default configuration.
### Building Nominatim from source
root /srv/nominatim/build/website;
index search.php index.html;
location ~ [^/]\.php(/|$) {
fastcgi_split_path_info ^(.+?\.php)(/.*)$;
if (!-f $document_root$fastcgi_script_name) {
return 404;
}
fastcgi_pass unix:/var/run/php5-fpm.sock;
fastcgi_index search.php;
include fastcgi.conf;
}
Nominatim is easiest to run from its own virtual environment. To create one, run:
sudo apt-get install virtualenv
virtualenv /srv/nominatim-venv
To install Nominatim directly from the source tree into the virtual environment, run:
/srv/nominatim-venv/bin/pip install packaging/nominatim-{db,api}
Restart the nginx and php5-fpm services and the website should now be available
at `http://localhost/`.
Now continue with [importing the database](Import.md).
Now continue with [importing the database](Import-and-Update.md).

View File

@@ -1,72 +0,0 @@
This chapter describes the various operations the Nominatim database administrator
may use to clean and maintain the database. None of these operations is mandatory
but they may help improve the performance and accuracy of results.
## Updating postcodes
Command: `nominatim refresh --postcodes`
Postcode centroids (aka 'calculated postcodes') are generated by looking at all
postcodes of a country, grouping them and calculating the geometric centroid.
There is currently no logic to deal with extreme outliers (typos or other
mistakes in OSM data). There is also no check whether a postcode adheres to a
country's format, e.g. if Swiss postcodes are 4 digits.
When running regular updates, postcode results can be improved by running
this command on a regular basis. Note that only the postcode table and the
postcode search terms are updated. The postcode that is assigned to each place
is only updated when the place is updated.
The command takes around 70min to run on the planet and needs ca. 40GB of
temporary disk space.
## Updating word counts
Command: `nominatim refresh --word-counts`
Nominatim keeps frequency statistics about all search terms it indexes. These
statistics are currently used to optimise queries to the database. Thus better
statistics mean better performance. Word counts are created once after import
and are usually sufficient even when running regular updates. You might want
to rerun the statistics computation when adding larger amounts of new data,
for example, when adding an additional country via `nominatim add-data`.
## Forcing recomputation of places and areas
Command: `nominatim refresh --data-object [NWR]<id> --data-area [NWR]<id>`
When running replication updates, Nominatim tries to recompute the search
and address information for all places that are affected by a change. But it
needs to restrict the total number of changes to make sure it can keep up
with the minutely updates. Therefore it will refrain from propagating changes
that affect a lot of objects.
The administrator may force an update of places in the database.
`nominatim refresh --data-object` invalidates a single OSM object.
`nominatim refresh --data-area` invalidates an OSM object and all dependent
objects. These are usually the places inside its area or around the
center of the object. Both commands expect the OSM object as an argument
of the form OSM type + OSM id. The type must be `N` (node), `W` (way) or
`R` (relation).
After invalidating the object, indexing must be run again. If continuous
updates are running in the background, the objects will be recomputed together
with the next round of updates. Otherwise you need to run `nominatim index`
to finish the recomputation.
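When the refresh commands above are called from a script, it may help to check the object reference first. The following is a small sketch of such a check for the `N`/`W`/`R` plus numeric id form. The function name `is_osm_ref` is hypothetical, and Nominatim performs its own validation of the argument.

```sh
# Return success if the argument looks like <N|W|R><numeric id>, e.g. W4711.
# Hypothetical helper for wrapper scripts, not part of Nominatim.
is_osm_ref() {
    case "$1" in
        [NWR]) return 1 ;;                # type letter without an id
        [NWR]*)
            case "${1#?}" in
                *[!0-9]*) return 1 ;;     # non-digit in the id part
                *) return 0 ;;
            esac ;;
        *) return 1 ;;
    esac
}

# Usage (only needed when no continuous updates are running):
#   is_osm_ref "$ref" && nominatim refresh --data-object "$ref" && nominatim index
```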
## Removing large deleted objects
Command: `nominatim admin --clean-deleted <PostgreSQL Time Interval>`
Nominatim refuses to delete very large areas because often these deletions are
accidental and are reverted within hours. Instead the deletions are logged in
the `import_polygon_delete` table and left to the administrator to clean up.
To run this command you will need to pass a PostgreSQL time interval. For example to
delete any objects that have been deleted more than a month ago you would run:
`nominatim admin --clean-deleted '1 month'`

View File

@@ -1,296 +1,10 @@
# Database Migrations
Nominatim offers automatic migrations for versions 4.3+. Please follow
the following steps:
This page describes database migrations necessary to update existing databases
to newer versions of Nominatim.
* Stop any updates that are potentially running
* Update the backend: `pip install -U nominatim-db`
* Go to your project directory and run `nominatim admin --migrate`
* Update the frontend: `pip install -U nominatim-api`
* (optionally) Restart updates
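The package and migration steps from the list above could be scripted roughly as follows. This is a sketch under assumptions: the helper names `run` and `migrate_nominatim` and the `DRY_RUN` variable are invented for illustration, and stopping and restarting updates is left out because it depends on how updates are run on your machine.

```sh
# Run a command, or only print it when DRY_RUN=1 is set.
# Hypothetical helper for trying out the sequence without side effects.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Update the backend, migrate the database from the project directory
# given as first argument, then update the frontend. Stops at the first
# step that fails.
migrate_nominatim() {
    run pip install -U nominatim-db &&
    ( cd "$1" && run nominatim admin --migrate ) &&
    run pip install -U nominatim-api
}

# Usage (the directory is a placeholder):
#   DRY_RUN=1 migrate_nominatim /srv/nominatim-project
```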
Below you find additional migrations and hints about other structural and
breaking changes. **Please read them before running the migration.**
!!! note
If you are migrating from a version <4.3, you need to install 4.3
and migrate to 4.3 first. Then you can migrate to the current
version. It is strongly recommended to do a reimport instead.
## 4.5.0 -> 5.0.0
### PHP frontend removed
The PHP frontend has been completely removed. Please switch to the Python
frontend.
Without the PHP code, the `nominatim refresh --website` command is no longer
needed. It currently emits a warning and otherwise does nothing. It will be
removed in later versions of Nominatim. So make sure you remove it from your
scripts.
### CMake building removed
Nominatim can now only be installed via pip. Please follow the installation
instructions for the current version to change to pip.
### osm2pgsql no longer vendored in
Nominatim no longer ships its own version of osm2pgsql. Please install a
stock version of osm2pgsql from your distribution. See the
[installation instruction for osm2pgsql](https://osm2pgsql.org/doc/install.html)
for details. A minimum version of 1.8 is required. The current stable versions
of Ubuntu and Debian already ship with an appropriate version. For older
installations, you may have to compile a newer osm2pgsql yourself.
### Legacy tokenizer removed
The `legacy` tokenizer is no longer enabled. This tokenizer has been superseded
by the `ICU` tokenizer a long time ago. In the unlikely case that your database
still uses the `legacy` tokenizer, you must reimport your database.
### osm2pgsql style overhauled
There are some fundamental changes to how customized osm2pgsql styles should
be written. The changes are mostly backwards compatible, i.e. custom styles
should still work with the new implementation. The only exception is a
customization of the `process_tags()` function. This function is no longer
considered public and neither are the helper functions used in it.
They currently still work but will be removed at some point. If you have
been making changes to `process_tags`, please review your style and try
to switch to the new convenience functions.
For more information on the changes, see the
[pull request](https://github.com/osm-search/Nominatim/pull/3615)
and read the new
[customization documentation](https://nominatim.org/release-docs/latest/customize/Import-Styles/).
## 4.4.0 -> 4.5.0
### New structure for Python packages
The nominatim Python package has been split into `nominatim-db` and `nominatim-api`.
Any imports need to be adapted accordingly.
If you are running the Python frontend, change the server module from
`nominatim.server.falcon.server` to `nominatim_api.server.falcon.server`.
If you are using the Nominatim library, all imports need to be changed
from `nominatim.api.<module>` to `nominatim_api.<module>`.
If you have written custom tokenizers or sanitizers, the appropriate modules
are now found in `nominatim_db`.
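A minimal sketch of how the two renames named above could be applied to existing code with `sed`. The function name is made up for illustration, the patterns are plain text substitutions that cover only `nominatim.api` and `nominatim.server.`, and custom tokenizers or sanitizers that now live in `nominatim_db` are not handled. Review the result by hand.

```sh
# Rewrite old module paths read from stdin to the new package names.
# Hypothetical helper covering only the two renames described above.
rename_nominatim_imports() {
    sed -e 's/nominatim\.api/nominatim_api/g' \
        -e 's/nominatim\.server\./nominatim_api.server./g'
}

# Usage (file names are placeholders):
#   rename_nominatim_imports < old_script.py > new_script.py
```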
## 4.2.0 -> 4.3.0
### New indexes for reverse lookup
The reverse lookup algorithm has changed slightly to improve performance.
This change needs a different index in the database. The required index
will be automatically built during migration. Until the new index is available
performance of the /reverse endpoint is significantly reduced. You should
therefore either remove traffic from the machine before attempting a
version update or create the index manually **before** starting the update
using the following SQL:
```sql
CREATE INDEX IF NOT EXISTS idx_placex_geometry_reverse_lookupPlaceNode
ON placex USING gist (ST_Buffer(geometry, reverse_place_diameter(rank_search)))
WHERE rank_address between 4 and 25 AND type != 'postcode'
AND name is not null AND linked_place_id is null AND osm_type = 'N';
```
## 4.0.0 -> 4.1.0
### ICU tokenizer is the new default
Nominatim now installs the [ICU tokenizer](../customize/Tokenizers.md#icu-tokenizer)
by default. This only has an effect on newly installed databases. When
updating older databases, it keeps its installed tokenizer. If you still
run with the legacy tokenizer, make sure to compile Nominatim with the
PostgreSQL module, see [Installation](Installation.md#building-nominatim).
### geocodejson output changed
The `type` field of the geocodejson output has changed. It now contains
the address class of the object instead of the value of the OSM tag. If
your client has used the `type` field, switch them to read `osm_value`
instead.
## 3.7.0 -> 4.0.0
### NOMINATIM_PHRASE_CONFIG removed
Custom blacklist configurations for special phrases now need to be handed
with the `--config` parameter to `nominatim special-phrases`. Alternatively
you can put your custom configuration in the project directory in a file
named `phrase-settings.json`.
Version 3.8 also removes the automatic converter for the php format of
the configuration in older versions. If you are updating from Nominatim < 3.7
and still work with a custom `phrase-settings.php`, you need to manually
convert it into a json format.
### PHP utils removed
The old PHP utils have now been removed completely. You need to switch to
the appropriate functions of the nominatim command line tool. See
[Introducing `nominatim` command line tool](#introducing-nominatim-command-line-tool)
below.
## 3.6.0 -> 3.7.0
### New format and name of configuration file
The configuration for an import is now saved in a `.env` file in the project
directory. This file follows the dotenv format. For more information, see
the [installation chapter](Import.md#configuration-setup-in-env).
To migrate to the new system, create a new project directory, add the `.env`
file and port your custom configuration from `settings/local.php`. Most
settings are named similarly and only have received a `NOMINATIM_` prefix.
Use the default settings in `settings/env.defaults` as a reference.
### New location for data files
External data files for Wikipedia importance, postcodes etc. are no longer
expected to reside in the source tree by default. Instead they will be searched
in the project directory. If you have an automated setup script you must
either adapt the download location or explicitly set the location of the
files to the old place in your `.env`.
### Introducing `nominatim` command line tool
The various php utilities have been replaced with a single `nominatim`
command line tool. Make sure to adapt any scripts. There is no direct 1:1
matching between the old utilities and the commands of the nominatim CLI. The
following list shows the nominatim sub-commands that contain the
functionality of each script:
* ./utils/setup.php: `import`, `freeze`, `refresh`
* ./utils/update.php: `replication`, `add-data`, `index`, `refresh`
* ./utils/specialphrases.php: `special-phrases`
* ./utils/check_import_finished.php: `admin`
* ./utils/warm.php: `admin`
* ./utils/export.php: `export`
Try `nominatim <command> --help` for more information about each subcommand.
`./utils/query.php` no longer exists in its old form. `nominatim search`
provides a replacement but returns different output.
### Switch to normalized house numbers
The housenumber column in the placex table now uses a normalized version.
The automatic migration step will convert the column but this may take a
very long time. It is advisable to take the machine offline while doing that.
## 3.5.0 -> 3.6.0
### Change of layout of search_name_* tables
The tables need a different index for nearest place lookup. Recreate the
indexes using the following shell script:
```bash
for table in `psql -d nominatim -c "SELECT tablename FROM pg_tables WHERE tablename LIKE 'search_name_%'" -tA | grep -v search_name_blank`;
do
psql -d nominatim -c "DROP INDEX idx_${table}_centroid_place; CREATE INDEX idx_${table}_centroid_place ON ${table} USING gist (centroid) WHERE ((address_rank >= 2) AND (address_rank <= 25)); DROP INDEX idx_${table}_centroid_street; CREATE INDEX idx_${table}_centroid_street ON ${table} USING gist (centroid) WHERE ((address_rank >= 26) AND (address_rank <= 27))";
done
```
### Removal of html output
The debugging UI is no longer directly provided with Nominatim. Instead we
now provide a simple Javascript application. Please refer to
[Setting up the Nominatim UI](Setup-Nominatim-UI.md) for details on how to
set up the UI.
The icons served together with the API responses have been moved to the
nominatim-ui project as well. If you want to keep the `icon` field in the
response, you need to set `CONST_MapIcon_URL` to the URL of the `/mapicon`
directory of nominatim-ui.
### Change order during indexing
When reindexing places during updates, there is now a different order used
which needs a different database index. Create it with the following SQL command:
```sql
CREATE INDEX idx_placex_pendingsector_rank_address
ON placex
USING BTREE (rank_address, geometry_sector)
WHERE indexed_status > 0;
```
You can then drop the old index with:
```sql
DROP INDEX idx_placex_pendingsector;
```
### Unused index
This index has been unused ever since the query using it was changed two years ago. Dropping it saves about 12GB on a planet installation.
```sql
DROP INDEX idx_placex_geometry_reverse_lookupPoint;
```
### Switching to dotenv
As part of the work changing the configuration format, the configuration for
the website is now using a separate configuration file. To create the
configuration file, run the following command after updating:
```sh
./utils/setup.php --setup-website
```
### Update SQL code
To update the SQL code to the latest version run:
```
./utils/setup.php --create-functions --enable-diff-updates --create-partition-functions
```
## 3.4.0 -> 3.5.0
### New Wikipedia/Wikidata importance tables
The `wikipedia_*` tables have a new format that also includes references to
Wikidata. You need to update the computation functions and the tables as
follows:
* download the new Wikipedia tables as described in the import section
* reimport the tables: `./utils/setup.php --import-wikipedia-articles`
* update the functions: `./utils/setup.php --create-functions --enable-diff-updates`
* create a new lookup index:
```sql
CREATE INDEX idx_placex_wikidata
ON placex
USING BTREE ((extratags -> 'wikidata'))
WHERE extratags ? 'wikidata'
AND class = 'place'
AND osm_type = 'N'
AND rank_search < 26;
```
* compute importance: `./utils/update.php --recompute-importance`
The last step takes about 10 hours on the full planet.
Remove one function (it will be recreated in the next step):
```sql
DROP FUNCTION create_country(hstore,character varying);
```
Finally, update all SQL functions:
```sh
./utils/setup.php --create-functions --enable-diff-updates --create-partition-functions
```
SQL statements should be executed from the PostgreSQL commandline. Execute
`psql nominatim` to enter command line mode.
## 3.3.0 -> 3.4.0
@@ -309,12 +23,6 @@ CREATE INDEX idx_location_area_country_geometry ON location_area_country USING G
CREATE INDEX idx_location_area_country_place_id ON location_area_country USING BTREE (place_id);
```
Finally, update all SQL functions:
```sh
./utils/setup.php --create-functions --enable-diff-updates --create-partition-functions
```
## 3.2.0 -> 3.3.0
### New database connection string (DSN) format
@@ -331,7 +39,7 @@ The new format is
### Natural Earth country boundaries no longer needed as fallback
```sql
```
DROP TABLE country_naturalearthdata;
```
@@ -357,37 +65,27 @@ following command:
The reverse algorithm has changed and requires new indexes. Run the following
SQL statements to create the indexes:
```sql
```
CREATE INDEX idx_placex_geometry_reverse_lookupPoint
ON placex
USING gist (geometry)
WHERE (name IS NOT null or housenumber IS NOT null or rank_address BETWEEN 26 AND 27)
AND class NOT IN ('railway','tunnel','bridge','man_made')
AND rank_address >= 26
AND indexed_status = 0
AND linked_place_id IS null;
ON placex USING gist (geometry)
WHERE (name is not null or housenumber is not null or rank_address between 26 and 27)
AND class not in ('railway','tunnel','bridge','man_made')
AND rank_address >= 26 AND indexed_status = 0 AND linked_place_id is null;
CREATE INDEX idx_placex_geometry_reverse_lookupPolygon
ON placex USING gist (geometry)
WHERE St_GeometryType(geometry) in ('ST_Polygon', 'ST_MultiPolygon')
AND rank_address between 4 and 25
AND type != 'postcode'
AND name is not null
AND indexed_status = 0
AND linked_place_id is null;
AND rank_address between 4 and 25 AND type != 'postcode'
AND name is not null AND indexed_status = 0 AND linked_place_id is null;
CREATE INDEX idx_placex_geometry_reverse_placeNode
ON placex USING gist (geometry)
WHERE osm_type = 'N'
AND rank_search between 5 and 25
AND class = 'place'
AND type != 'postcode'
AND name is not null
AND indexed_status = 0
AND linked_place_id is null;
WHERE osm_type = 'N' AND rank_search between 5 and 25
AND class = 'place' AND type != 'postcode'
AND name is not null AND indexed_status = 0 AND linked_place_id is null;
```
You also need to grant the website user access to the `country_osm_grid` table:
```sql
```
GRANT SELECT ON table country_osm_grid to "www-user";
```
@@ -395,7 +93,7 @@ Replace the `www-user` with the user name of your website server if necessary.
You can now drop the unused indexes:
```sql
```
DROP INDEX idx_placex_reverse_geometry;
```
@@ -424,8 +122,8 @@ CREATE INDEX idx_postcode_geometry ON location_postcode USING GIST (geometry);
CREATE UNIQUE INDEX idx_postcode_id ON location_postcode USING BTREE (place_id);
CREATE INDEX idx_postcode_postcode ON location_postcode USING BTREE (postcode);
GRANT SELECT ON location_postcode TO "www-data";
DROP TYPE IF EXISTS nearfeaturecentr CASCADE;
CREATE TYPE nearfeaturecentr AS (
drop type if exists nearfeaturecentr cascade;
create type nearfeaturecentr as (
place_id BIGINT,
keywords int[],
rank_address smallint,

View File

@@ -1,177 +0,0 @@
# Setting up the Nominatim UI
Nominatim is a search API; it does not provide a website interface on its
own. [nominatim-ui](https://github.com/osm-search/nominatim-ui) offers a
small website for testing your setup and inspecting the database content.
This section provides a quick start on how to use nominatim-ui with your
installation. For more details, please also have a look at the
[README of nominatim-ui](https://github.com/osm-search/nominatim-ui/blob/master/README.md).
## Installing nominatim-ui
We provide regular releases of nominatim-ui that contain the packaged website.
They do not need any special installation. Just download, configure
and run it. Grab the latest release from
[nominatim-ui's Github release page](https://github.com/osm-search/nominatim-ui/releases)
and unpack it. You can use `nominatim-ui-x.x.x.tar.gz` or `nominatim-ui-x.x.x.zip`.
Next you need to adapt the UI to your installation. Custom settings need to be
put into `dist/theme/config.theme.js`. At a minimum you need to
set `Nominatim_API_Endpoint` to point to your Nominatim installation:
cd nominatim-ui
echo "Nominatim_Config.Nominatim_API_Endpoint='https://myserver.org/nominatim/';" > dist/theme/config.theme.js
For the full set of available settings, have a look at `dist/config.defaults.js`.
Then you can just test it locally by spinning up a webserver in the `dist`
directory. For example, with Python:
cd nominatim-ui/dist
python3 -m http.server 8765
The website is now available at `http://localhost:8765`.
## Forwarding searches to nominatim-ui
Nominatim used to provide the search interface directly by itself when
`format=html` was requested. For all endpoints except for `/reverse` and
`/lookup` this even used to be the default.
The following section describes how to set up Apache or nginx so that your
users are forwarded to nominatim-ui when they go to a URL that formerly
presented the UI.
### Setting up forwarding in Nginx
First of all make nominatim-ui available under `/ui` on your webserver:
``` nginx
server {
# Here is the Nominatim setup as described in the Installation section
location /ui/ {
alias <full path to the nominatim-ui directory>/dist/;
index index.html;
}
}
```
Now we need to find out if a URL should be forwarded to the UI. Add the
following `map` commands *outside* the server section:
``` nginx
# Inspect the format parameter in the query arguments. We are interested
# if it is set to html or something else or if it is missing completely.
map $args $format {
default default;
~(^|&)format=html(&|$) html;
~(^|&)format= other;
}
# Determine from the URI and the format parameter above if forwarding is needed.
map $uri/$format $forward_to_ui {
default 1; # The default is to forward.
~^/ui 0; # If the URI points to the UI already, we are done.
~/other$ 0; # An explicit non-html format parameter. No forwarding.
~/reverse.*/default 0; # Reverse and lookup assume xml format when
~/lookup.*/default 0; # no format parameter is given. No forwarding.
}
```
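To sanity-check the two `map` blocks before deploying them, the decision logic can be mirrored in a few lines of Python. This is only a sketch of how nginx evaluates the maps above (regular expressions are tried in order, the first match wins, otherwise `default` applies). The names `first_match` and `forward_to_ui` are made up for this illustration and are not part of nginx or Nominatim.

```python
import re

# Mirrors `map $args $format`: regexes are tried in the order they are listed.
FORMAT_RULES = [
    (re.compile(r"(^|&)format=html(&|$)"), "html"),
    (re.compile(r"(^|&)format="), "other"),
]

# Mirrors `map $uri/$format $forward_to_ui`.
FORWARD_RULES = [
    (re.compile(r"^/ui"), False),                # already pointing at the UI
    (re.compile(r"/other$"), False),             # explicit non-html format
    (re.compile(r"/reverse.*/default"), False),  # reverse defaults to xml
    (re.compile(r"/lookup.*/default"), False),   # lookup defaults to xml
]


def first_match(rules, value, default):
    """Return the result of the first rule whose pattern matches `value`."""
    for pattern, result in rules:
        if pattern.search(value):
            return result
    return default


def forward_to_ui(uri, args=""):
    """Decide whether a request would be forwarded to nominatim-ui."""
    fmt = first_match(FORMAT_RULES, args, "default")
    return first_match(FORWARD_RULES, f"{uri}/{fmt}", True)


if __name__ == "__main__":
    for uri, args in [("/search", "q=berlin"),
                      ("/search", "q=berlin&format=json"),
                      ("/reverse", "lat=52.5&lon=13.4"),
                      ("/ui/search.html", "")]:
        print(uri, args, "->", forward_to_ui(uri, args))
```

Running the script prints the decision for a few typical requests, which makes it easy to spot a rule that is ordered incorrectly.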
The `$forward_to_ui` parameter can now be used to conditionally forward the
calls:
```
# When no endpoint is given, default to search.
# Need to add a rewrite so that the rewrite rules below catch it correctly.
rewrite ^/$ /search;
location @php {
# fastcgi stuff..
if ($forward_to_ui) {
rewrite ^(/[^/]*) https://yourserver.com/ui$1.html redirect;
}
}
location ~ [^/]\.php(/|$) {
# fastcgi stuff..
if ($forward_to_ui) {
rewrite (.*).php https://yourserver.com/ui$1.html redirect;
}
}
```
!!! warning
Be aware that the rewrite commands are slightly different for URIs with and
without the .php suffix.
Reload nginx and the UI should be available.
### Setting up forwarding in Apache
First of all make nominatim-ui available in the `ui/` subdirectory where
Nominatim is installed. For example, given you have set up an alias under
`nominatim` like this:
``` apache
Alias /nominatim /home/vagrant/build/website
```
you need to insert the following rules for nominatim-ui before that alias:
```
<Directory "/home/vagrant/nominatim-ui/dist">
DirectoryIndex search.html
Require all granted
</Directory>
Alias /nominatim/ui /home/vagrant/nominatim-ui/dist
```
Replace `/home/vagrant/nominatim-ui` with the directory where you have cloned
nominatim-ui.
!!! important
The alias for nominatim-ui must come before the alias for the Nominatim
website directory.
To set up forwarding, the Apache rewrite module is needed. Enable it with:
``` sh
sudo a2enmod rewrite
```
Then add rewrite rules to the `Directory` directive of the Nominatim website
directory like this:
``` apache
<Directory "/home/vagrant/build/website">
Options FollowSymLinks MultiViews
AddType text/html .php
Require all granted
RewriteEngine On
# This must correspond to the URL where nominatim can be found.
RewriteBase "/nominatim/"
# If no endpoint is given, then use search.
RewriteRule ^(/|$) "search.php"
# If format-html is explicitly requested, forward to the UI.
RewriteCond %{QUERY_STRING} "format=html"
RewriteRule ^([^/]+)(.php)? ui/$1.html [R,END]
# If no format parameter is there then forward anything
# but /reverse and /lookup to the UI.
RewriteCond %{QUERY_STRING} "!format="
RewriteCond %{REQUEST_URI} "!/lookup"
RewriteCond %{REQUEST_URI} "!/reverse"
RewriteRule ^([^/]+)(.php)? ui/$1.html [R,END]
</Directory>
```
Restart Apache and the UI should be available.

View File

@@ -1,202 +0,0 @@
# Updating the Database
There are many different ways to update your Nominatim database.
The following section describes how to keep it up-to-date using
an [online replication service for OpenStreetMap data](https://wiki.openstreetmap.org/wiki/Planet.osm/diffs).
For a list of other methods to add or update data see the output of
`nominatim add-data --help`.
!!! important
If you have configured a flatnode file for the import, then you
need to keep this flatnode file around for updates.
### Installing the newest version of Pyosmium
The replication process uses
[Pyosmium](https://docs.osmcode.org/pyosmium/latest/updating_osm_data.html)
to download update data from the server.
It is recommended to install Pyosmium via pip.
Run (as the same user who will later run the updates):
```sh
pip3 install --user osmium
```
### Setting up the update process
Next the update process needs to be initialised. By default Nominatim is configured
to update using the global minutely diffs.
If you want a different update source you will need to add some settings
to `.env`. For example, to use the daily country extracts
diffs for Ireland from Geofabrik add the following:
# base URL of the replication service
NOMINATIM_REPLICATION_URL="https://download.geofabrik.de/europe/ireland-and-northern-ireland-updates"
# How often upstream publishes diffs (in seconds)
NOMINATIM_REPLICATION_UPDATE_INTERVAL=86400
# How long to sleep if no update found yet (in seconds)
NOMINATIM_REPLICATION_RECHECK_INTERVAL=900
To set up the update process now run the following command:
nominatim replication --init
It outputs the date where updates will start. Recheck that this date is
what you expect.
The `replication --init` command needs to be rerun whenever the replication
service is changed.
### Updating Nominatim
Nominatim supports different modes for retrieving the update data from the
server. Which one you want to use depends on your exact setup and how often you
want to retrieve updates.
These instructions are for using a single source of updates. If you have
imported multiple country extracts and want to keep them
up-to-date, the [Advanced installations section](Advanced-Installations.md)
contains instructions to set up and update multiple country extracts.
#### One-time mode
When the `--once` parameter is given, then Nominatim will download exactly one
batch of updates and then exit. This one-time mode still respects the
`NOMINATIM_REPLICATION_UPDATE_INTERVAL` that you have set. If according to
the update interval no new data has been published yet, it will go to sleep
until the next expected update and only then attempt to download the next batch.
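The waiting behaviour described above boils down to a small calculation. The following Python sketch illustrates it under the assumption that the time of the last applied update is known. It is not Nominatim's actual implementation, and the function name `seconds_until_next_update` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_next_update(last_update, update_interval, now=None):
    """Return how long to sleep before the next batch can be expected.

    last_update     -- timezone-aware datetime of the last applied update
    update_interval -- publication interval of the source in seconds
                       (the value of NOMINATIM_REPLICATION_UPDATE_INTERVAL)
    now             -- current time; defaults to the current UTC time
    """
    if now is None:
        now = datetime.now(timezone.utc)
    next_expected = last_update + timedelta(seconds=update_interval)
    # Never return a negative wait: if the update is overdue, try at once.
    return max(0.0, (next_expected - now).total_seconds())


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    six_hours_later = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
    # Daily source, six hours after the last update: 18 hours left to wait.
    print(seconds_until_next_update(last, 86400, now=six_hours_later))
```

With a daily source (`86400` seconds) the command would therefore sleep for most of a day when it is started shortly after an update was applied.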
The one-time mode is particularly useful if you want to run updates continuously
but need to schedule other work in between updates. For example, you might
want to regularly recompute postcodes -- a process that
must not be run while updates are in progress. An update script refreshing
postcodes regularly might look like this:
```sh
#!/bin/bash
# Switch to your project directory.
cd /srv/nominatim
while true; do
nominatim replication --once
if [ -f "/srv/nominatim/schedule-maintenance" ]; then
rm /srv/nominatim/schedule-maintenance
nominatim refresh --postcodes
fi
done
```
A cron job then creates the file `/srv/nominatim/schedule-maintenance` once per night.
##### One-time mode with systemd
You can run the one-time mode with a systemd timer & service.
Create a timer description like `/etc/systemd/system/nominatim-updates.timer`:
```
[Unit]
Description=Timer to start updates of Nominatim
[Timer]
OnActiveSec=2
OnUnitActiveSec=1min
Unit=nominatim-updates.service
[Install]
WantedBy=multi-user.target
```
`OnUnitActiveSec` defines how often the individual update command is run.
Then add a service definition for the timer in `/etc/systemd/system/nominatim-updates.service`:
```
[Unit]
Description=Single updates of Nominatim
[Service]
WorkingDirectory=/srv/nominatim-project
ExecStart=/srv/nominatim-venv/bin/nominatim replication --once
StandardOutput=journal
StandardError=inherit
User=nominatim
Group=nominatim
Type=simple
[Install]
WantedBy=multi-user.target
```
Replace the `WorkingDirectory` with your project directory. `ExecStart` points
to the nominatim binary that was installed in your virtualenv earlier.
Finally, you might need to adapt user and group names as required.
Now activate the service and start the updates:
```
sudo systemctl daemon-reload
sudo systemctl enable nominatim-updates.timer
sudo systemctl start nominatim-updates.timer
```
You can stop future data updates while allowing any current, in-progress
update steps to finish, by running `sudo systemctl stop
nominatim-updates.timer` and waiting until `nominatim-updates.service` isn't
running (`sudo systemctl is-active nominatim-updates.service`).
To check the output from the update process, use journalctl: `journalctl -u
nominatim-updates.service`
#### Catch-up mode
With the `--catch-up` parameter, Nominatim will immediately try to download
all changes from the server until the database is up-to-date. The catch-up mode
still respects the parameter `NOMINATIM_REPLICATION_MAX_DIFF`. It downloads and
applies the changes in appropriate batches until all is done.
The catch-up mode is foremost useful to bring the database up to date after the
initial import. Given that the service usually is not in production at this
point, you can temporarily be a bit more generous with the batch size and
number of threads you use for the updates by running catch-up like this:
```
cd /srv/nominatim-project
NOMINATIM_REPLICATION_MAX_DIFF=5000 nominatim replication --catch-up --threads 15
```
The catch-up mode is also useful when you want to apply updates at a lower
frequency than what the source publishes. You can set up a cron job to run
replication catch-up at whatever interval you desire.
!!! hint
When running scheduled updates with catch-up, it is a good idea to choose
a replication source with an update frequency that is an order of magnitude
lower. For example, if you want to update once a day, use an hourly updated
source. This ensures that you don't miss an entire day of updates when
the source is unexpectedly late to publish its update.
If you want to use a source with the same update frequency (e.g. a daily
updated source with daily updates), use the
one-time mode together with a frequently run systemd timer as described above.
It keeps re-requesting the newest update until it has been published.
#### Continuous updates
!!! danger
This mode is no longer recommended and will be removed in future
releases. systemd is much better
suited for running regular updates. Please refer to the setup
instructions for running one-time mode with systemd above.
This is the easiest mode. Simply run the replication command without any
parameters:
nominatim replication
The update application keeps running forever and retrieves and applies
new updates from the server as they are published.

View File

@@ -1,26 +1,19 @@
# Place details
Show all details about a single place saved in the database.
Lookup details about a single place by id. The default output is HTML for debugging search logic and results.
This API endpoint is meant for visual inspection of the data in the database,
mainly together with [Nominatim-UI](https://github.com/osm-search/nominatim-ui/).
The parameters of the endpoint and the output may change occasionally between
versions of Nominatim. Do not rely on the output in scripts or applications.
!!! warning
The details endpoint at https://nominatim.openstreetmap.org
may not be used in scripts or bots at all.
See [Nominatim Usage Policy](https://operations.osmfoundation.org/policies/nominatim/).
**The details page (including JSON output) exists for debugging only and must not be downloaded automatically**, see [Nominatim Usage Policy](https://operations.osmfoundation.org/policies/nominatim/).
## Parameters
The details API supports the following two request formats:
``` xml
https://nominatim.openstreetmap.org/details?osmtype=[N|W|R]&osmid=<value>&class=<value>
```
https://nominatim.openstreetmap.org/details?osmtype=[N|W|R]&osmid=<value>&class=<value>
```
`osmtype` and `osmid` are required parameters. The type is one of node (N), way (W)
`osmtype` and `osmid` are required parameter. The type is one of node (N), way (W)
or relation (R). The id must be a number. The `class` parameter is optional and
allows to distinguish between entries, when the corresponding OSM object has more
than one main tag. For example, when a place is tagged with `tourism=hotel` and
@@ -30,97 +23,76 @@ to get exactly the one you want. If there are multiple places in the database
but the `class` parameter is left out, then one of the places will be chosen
at random and displayed.
``` xml
https://nominatim.openstreetmap.org/details?place_id=<value>
```
https://nominatim.openstreetmap.org/details?place_id=<value>
```
Place IDs are assigned sequentially during Nominatim data import. The ID
for a place is different between Nominatim installations (servers) and
changes when data gets reimported. Therefore it cannot be used as
a permanent id and shouldn't be used in bug reports.
!!! danger "Deprecation warning"
The API can also be used with the URL
`https://nominatim.openstreetmap.org/details.php`. This is now deprecated
and will be removed in future versions.
Placeids are assigned sequentially during Nominatim data import. The id for a place is different between Nominatim installations (servers) and changes when data gets reimported. Therefore it can't be used as a permanent id and shouldn't be used in bug reports.
## Parameters
This section lists additional optional parameters.
Additional optional parameters are explained below.
### Output format
| Parameter | Value | Default |
|-----------| ----- | ------- |
| json_callback | function name | _unset_ |
* `format=[html|json]`
See [Place Output Formats](Output.md) for details on each format. (Default: html)
* `json_callback=<string>`
Wrap JSON output in a callback function (JSONP) i.e. `<string>(<json>)`.
Only has an effect for JSON output formats.
* `pretty=[0|1]`
For JSON output will add indentation to make it more human-readable. (Default: 0)
When set, then JSON output will be wrapped in a callback function with
the given name. See [JSONP](https://en.wikipedia.org/wiki/JSONP) for more
information.
### Output details
| Parameter | Value | Default |
|-----------| ----- | ------- |
| addressdetails | 0 or 1 | 0 |
* `addressdetails=[0|1]`
When set to 1, include a breakdown of the address into elements.
Include a breakdown of the address into elements. (Default for JSON: 0, for HTML: 1)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| keywords | 0 or 1 | 0 |
* `keywords=[0|1]`
When set to 1, include a list of name keywords and address keywords
in the result.
Include a list of name keywords and address keywords (word ids). (Default: 0)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| linkedplaces | 0 or 1 | 1 |
* `linkedplaces=[0|1]`
Include details of places that are linked with this one. Places get linked
together when they are different forms of the same physical object. Nominatim
links two kinds of objects together: place nodes get linked with the
corresponding administrative boundaries. Waterway relations get linked together with their
members.
Include details of places higher in the address hierarchy. E.g. for a street this is usually the city, state, postal code, country. (Default: 1)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| hierarchy | 0 or 1 | 0 |
* `hierarchy=[0|1]`
Include details of POIs and address that depend on the place. Only POIs
that use this place to determine their address will be returned.
Include details of places lower in the address hierarchy. E.g. for a city this is usually a list of streets, suburbs, rivers. (Default for JSON: 0, for HTML: 1)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| group_hierarchy | 0 or 1 | 0 |
* `group_hierarchy=[0|1]`
When set to 1, the output of the address hierarchy will be
grouped by type.
For JSON output will group the places by type. (Default: 0)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| polygon_geojson | 0 or 1 | 0 |
* `polygon_geojson=[0|1]`
Include geometry of result.
Include geometry of result. (Default for JSON: 0, for HTML: 1)
### Language of results
| Parameter | Value | Default |
|-----------| ----- | ------- |
| accept-language | browser language string | content of "Accept-Language" HTTP header |
* `accept-language=<browser language string>`
Preferred language order for showing search results. This may either be
a simple comma-separated list of language codes or have the same format
as the ["Accept-Language" HTTP header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language).
Preferred language order for showing result, overrides the value
specified in the "Accept-Language" HTTP header.
Either use a standard RFC2616 accept-language string or a simple
comma-separated list of language codes.
## Examples
##### HTML
[https://nominatim.openstreetmap.org/details.php?osmtype=W&osmid=38210407](https://nominatim.openstreetmap.org/details.php?osmtype=W&osmid=38210407)
##### JSON
[https://nominatim.openstreetmap.org/details?osmtype=W&osmid=38210407&format=json](https://nominatim.openstreetmap.org/details?osmtype=W&osmid=38210407&format=json)
[https://nominatim.openstreetmap.org/details.php?osmtype=W&osmid=38210407&format=json](https://nominatim.openstreetmap.org/details.php?osmtype=W&osmid=38210407&format=json)
```json

View File

@@ -35,7 +35,7 @@ it contains the county/state/country across the border.
#### 3. I get different counties/states/countries when I change the zoom parameter in the reverse query. How is that possible?
This is basically the same problem as in the previous answer.
The zoom level influences at which [search rank](../customize/Ranking.md#search-rank) Nominatim starts looking
The zoom level influences at which [search rank](https://wiki.openstreetmap.org/wiki/Nominatim/Development_overview#Country_to_street_level) Nominatim starts looking
for the closest object. So the closest house number may be on one side of the
border while the closest street is on the other. As the address details contain
the address of the closest object found, you might sometimes get one result,
@@ -58,28 +58,4 @@ The [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API) is more
suited for these kinds of queries.
That said if you installed your own Nominatim instance you can use the
`nominatim export` PHP script as basis to return such lists.
#### 7. My result has a wrong postcode. Where does it come from?
Most places in OSM don't have a postcode, so Nominatim tries to interpolate
one. It first looks at all the places that make up the address of the place.
If one of them has a postcode defined, this is the one to be used. When
none of the address parts has a postcode either, Nominatim interpolates one
from the surrounding objects. If the postcode for your result is wrong, then
most of the time there is an OSM object with the wrong postcode nearby.
To find the bad postcode, go to
[https://nominatim.openstreetmap.org](https://nominatim.openstreetmap.org)
and search for your place. When you have found it, click on the 'details' link
under the result to go to the details page. There is a field 'Computed Postcode'
which should display the bad postcode. Click on the 'how?' link. A small
explanation text appears. It contains a link to a query for Overpass Turbo.
Click on that and you get a map with all places in the area that have the bad
postcode. If none is displayed, zoom the map out a bit and then click on 'Run'.
Now go to [OpenStreetMap](https://openstreetmap.org) and fix the error you
have just found. It will take at least a day for Nominatim to catch up with
your data fix. Sometimes longer, depending on how much editing activity is in
the area.
`/utils/export.php` PHP script as basis to return such lists.

View File

@@ -3,7 +3,7 @@
The lookup API allows you to query the address and other details of one or
multiple OSM objects like node, way or relation.
## Endpoint
## Parameters
The lookup API has the following format:
@@ -15,140 +15,71 @@ The lookup API has the following format:
prefixed with its type, one of node(N), way(W) or relation(R). Up to 50 ids
can be queried at the same time.
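Because of the 50-id limit, longer lists have to be split across several requests. The following Python sketch shows one way to do that. The helper name `lookup_urls` and the server `nominatim.example.org` are placeholders chosen for this example; replace the base URL with your own installation.

```python
import re
from urllib.parse import urlencode

MAX_IDS_PER_REQUEST = 50  # limit of the lookup API as stated above
OSM_ID_PATTERN = re.compile(r"[NWR][0-9]+")  # type letter followed by the id


def lookup_urls(osm_ids, base_url="https://nominatim.example.org/lookup",
                **params):
    """Build one lookup URL per batch of at most 50 prefixed OSM ids.

    Extra keyword arguments (e.g. format="json") are appended to each URL.
    """
    for osm_id in osm_ids:
        if not OSM_ID_PATTERN.fullmatch(osm_id):
            raise ValueError(f"not a valid prefixed OSM id: {osm_id!r}")

    urls = []
    for start in range(0, len(osm_ids), MAX_IDS_PER_REQUEST):
        batch = osm_ids[start:start + MAX_IDS_PER_REQUEST]
        # Keep the commas between ids readable instead of encoding them.
        query = urlencode({"osm_ids": ",".join(batch), **params}, safe=",")
        urls.append(f"{base_url}?{query}")
    return urls


if __name__ == "__main__":
    for url in lookup_urls(["R146656", "W104393803", "N240109189"],
                           format="json"):
        print(url)
```

Validating the ids up front avoids sending a request that the server would reject anyway.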
!!! danger "Deprecation warning"
The API can also be used with the URL
`https://nominatim.openstreetmap.org/lookup.php`. This is now deprecated
and will be removed in future versions.
## Parameters
This section lists additional optional parameters.
Additional optional parameters are explained below.
### Output format
| Parameter | Value | Default |
|-----------| ----- | ------- |
| format | one of: `xml`, `json`, `jsonv2`, `geojson`, `geocodejson` | `jsonv2` |
* `format=[xml|json|jsonv2|geojson|geocodejson]`
See [Place Output Formats](Output.md) for details on each format.
See [Place Output Formats](Output.md) for details on each format. (Default: xml)
* `json_callback=<string>`
| Parameter | Value | Default |
|-----------| ----- | ------- |
| json_callback | function name | _unset_ |
When given, then JSON output will be wrapped in a callback function with
the given name. See [JSONP](https://en.wikipedia.org/wiki/JSONP) for more
information.
Wrap JSON output in a callback function (JSONP) i.e. `<string>(<json>)`.
Only has an effect for JSON output formats.
### Output details
| Parameter | Value | Default |
|-----------| ----- | ------- |
| addressdetails | 0 or 1 | 0 |
* `addressdetails=[0|1]`
When set to 1, include a breakdown of the address into elements.
The exact content of the address breakdown depends on the output format.
!!! tip
If you are interested in a stable classification of address categories
(suburb, city, state, etc), have a look at the `geocodejson` format.
All other formats return classifications according to OSM tagging.
There is a much larger set of categories and they are not always consistent,
which makes them very hard to work with.
Include a breakdown of the address into elements. (Default: 0)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| extratags | 0 or 1 | 0 |
* `extratags=[0|1]`
When set to 1, the response includes any additional information in the result
that is available in the database, e.g. wikipedia link, opening hours.
Include additional information in the result if available,
e.g. wikipedia link, opening hours. (Default: 0)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| namedetails | 0 or 1 | 0 |
* `namedetails=[0|1]`
When set to 1, include a full list of names for the result. These may include
language variants, older names, references and brand.
Include a list of alternative names in the results. These may include
language variants, references, operator and brand. (Default: 0)
### Language of results
| Parameter | Value | Default |
|-----------| ----- | ------- |
| accept-language | browser language string | content of "Accept-Language" HTTP header |
* `accept-language=<browser language string>`
Preferred language order for showing search results. This may either be
a simple comma-separated list of language codes or have the same format
as the ["Accept-Language" HTTP header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language).
!!! tip
First-time users of Nominatim tend to be confused that they get different
results when using Nominatim in the browser versus in a command-line tool
like wget or curl. The command-line tools
usually don't send any Accept-Language header, prompting Nominatim
to show results in the local language. Browsers on the contrary always
send the currently chosen browser language.
### Polygon output
| Parameter | Value | Default |
|-----------| ----- | ------- |
| polygon_geojson | 0 or 1 | 0 |
| polygon_kml | 0 or 1 | 0 |
| polygon_svg | 0 or 1 | 0 |
| polygon_text | 0 or 1 | 0 |
Add the full geometry of the place to the result output. Output formats
in GeoJSON, KML, SVG or WKT are supported. Only one of these
options can be used at a time.
| Parameter | Value | Default |
|-----------| ----- | ------- |
| polygon_threshold | floating-point number | 0.0 |
When one of the polygon_* outputs is chosen, return a simplified version
of the output geometry. The parameter describes the
tolerance in degrees with which the geometry may differ from the original
geometry. Topology is preserved in the geometry.
Preferred language order for showing search results, overrides the value
specified in the "Accept-Language" HTTP header.
Either use a standard RFC2616 accept-language string or a simple
comma-separated list of language codes.
### Other
| Parameter | Value | Default |
|-----------| ----- | ------- |
| email | valid email address | _unset_ |
* `email=<valid email address>`
If you are making large numbers of requests, please include an appropriate email
address to identify your requests. See Nominatim's
[Usage Policy](https://operations.osmfoundation.org/policies/nominatim/) for more details.
address to identify your requests. See Nominatim's [Usage Policy](https://operations.osmfoundation.org/policies/nominatim/) for more details.
| Parameter | Value | Default |
|-----------| ----- | ------- |
| debug | 0 or 1 | 0 |
* `debug=[0|1]`
Output assorted developer debug information. Data on internals of Nominatim's
"search loop" logic, and SQL queries. The output is HTML format.
This overrides the specified machine readable format.
"Search Loop" logic, and SQL queries. The output is (rough) HTML format.
This overrides the specified machine readable format. (Default: 0)
## Examples
##### XML
[https://nominatim.openstreetmap.org/lookup?osm_ids=R146656,W104393803,N240109189](https://nominatim.openstreetmap.org/lookup?osm_ids=R146656,W50637691,N240109189)
[https://nominatim.openstreetmap.org/lookup?osm_ids=R146656,W104393803,N240109189](https://nominatim.openstreetmap.org/lookup?osm_ids=R146656,W104393803,N240109189)
```xml
<lookupresults timestamp="Mon, 28 Mar 22 14:38:54 +0000" attribution="Data &#xA9; OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright" querystring="R146656,W50637691,N240109189" more_url="">
<place place_id="282236157" osm_type="relation" osm_id="146656" place_rank="16" address_rank="16" boundingbox="53.3401044,53.5445923,-2.3199185,-2.1468288" lat="53.44246175" lon="-2.2324547359718547" display_name="Manchester, Greater Manchester, North West England, England, United Kingdom" class="boundary" type="administrative" importance="0.35">
<lookupresults timestamp="Mon, 29 Jun 15 18:01:33 +0000" attribution="Data © OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright" querystring="R146656,W104393803,N240109189" polygon="false">
<place place_id="127761056" osm_type="relation" osm_id="146656" place_rank="16" lat="53.4791466" lon="-2.2447445" display_name="Manchester, Greater Manchester, North West England, England, United Kingdom" class="boundary" type="administrative" importance="0.704893333438333">
<city>Manchester</city>
<county>Greater Manchester</county>
<state_district>North West England</state_district>
@@ -156,20 +87,21 @@ This overrides the specified machine readable format.
<country>United Kingdom</country>
<country_code>gb</country_code>
</place>
<place place_id="115462561" osm_type="way" osm_id="50637691" place_rank="30" address_rank="30" boundingbox="52.3994612,52.3996426,13.0479574,13.0481754" lat="52.399550700000006" lon="13.048066846939687" display_name="Brandenburger Tor, Brandenburger Stra&#xDF;e, Historische Innenstadt, Innenstadt, Potsdam, Brandenburg, 14467, Germany" class="tourism" type="attraction" importance="0.29402874005524">
<tourism>Brandenburger Tor</tourism>
<road>Brandenburger Stra&#xDF;e</road>
<suburb>Historische Innenstadt</suburb>
<city>Potsdam</city>
<state>Brandenburg</state>
<postcode>14467</postcode>
<place place_id="77769745" osm_type="way" osm_id="104393803" place_rank="30" lat="52.5162024" lon="13.3777343363579" display_name="Brandenburg Gate, 1, Pariser Platz, Mitte, Berlin, 10117, Germany" class="tourism" type="attraction" importance="0.443472858361592">
<attraction>Brandenburg Gate</attraction>
<house_number>1</house_number>
<pedestrian>Pariser Platz</pedestrian>
<suburb>Mitte</suburb>
<city_district>Mitte</city_district>
<city>Berlin</city>
<state>Berlin</state>
<postcode>10117</postcode>
<country>Germany</country>
<country_code>de</country_code>
</place>
<place place_id="567505" osm_type="node" osm_id="240109189" place_rank="15" address_rank="16" boundingbox="52.3586925,52.6786925,13.2396024,13.5596024" lat="52.5186925" lon="13.3996024" display_name="Berlin, 10178, Germany" class="place" type="city" importance="0.78753902824914">
<place place_id="2570600569" osm_type="node" osm_id="240109189" place_rank="15" lat="52.5170365" lon="13.3888599" display_name="Berlin, Germany" class="place" type="city" importance="0.822149797630868">
<city>Berlin</city>
<state>Berlin</state>
<postcode>10178</postcode>
<country>Germany</country>
<country_code>de</country_code>
</place>
@@ -178,50 +110,38 @@ This overrides the specified machine readable format.
##### JSON with extratags
[https://nominatim.openstreetmap.org/lookup?osm_ids=W50637691&format=json&extratags=1](https://nominatim.openstreetmap.org/lookup?osm_ids=W50637691&format=json&extratags=1)
[https://nominatim.openstreetmap.org/lookup?osm_ids=W50637691&format=json](https://nominatim.openstreetmap.org/lookup?osm_ids=W50637691&format=json)
```json
[
{
"place_id": 115462561,
"licence": "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright",
"osm_type": "way",
"osm_id": 50637691,
"boundingbox": [
"52.3994612",
"52.3996426",
"13.0479574",
"13.0481754"
],
"lat": "52.399550700000006",
"lon": "13.048066846939687",
"display_name": "Brandenburger Tor, Brandenburger Straße, Historische Innenstadt, Innenstadt, Potsdam, Brandenburg, 14467, Germany",
"class": "tourism",
"type": "attraction",
"importance": 0.2940287400552381,
"address": {
"tourism": "Brandenburger Tor",
"road": "Brandenburger Straße",
"suburb": "Historische Innenstadt",
"city": "Potsdam",
"state": "Brandenburg",
"postcode": "14467",
"country": "Germany",
"country_code": "de"
},
"extratags": {
"image": "http://commons.wikimedia.org/wiki/File:Potsdam_brandenburger_tor.jpg",
"heritage": "4",
"wikidata": "Q695045",
"architect": "Carl von Gontard;Georg Christian Unger",
"wikipedia": "de:Brandenburger Tor (Potsdam)",
"wheelchair": "yes",
"description": "Kleines Brandenburger Tor in Potsdam",
"heritage:website": "http://www.bldam-brandenburg.de/images/stories/PDF/DML%202012/04-p-internet-13.pdf",
"heritage:operator": "bldam",
"architect:wikidata": "Q68768;Q95223",
"year_of_construction": "1771"
}
}
{
"place_id": "84271358",
"licence": "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright",
"osm_type": "way",
"osm_id": "50637691",
"lat": "52.39955055",
"lon": "13.04806574678",
"display_name": "Brandenburger Tor, Brandenburger Straße, Nördliche Innenstadt, Innenstadt, Potsdam, Brandenburg, 14467, Germany",
"class": "historic",
"type": "city_gate",
"importance": "0.221233780277011",
"address": {
"address29": "Brandenburger Tor",
"pedestrian": "Brandenburger Straße",
"suburb": "Nördliche Innenstadt",
"city": "Potsdam",
"state": "Brandenburg",
"postcode": "14467",
"country": "Germany",
"country_code": "de"
},
"extratags": {
"image": "http://commons.wikimedia.org/wiki/File:Potsdam_brandenburger_tor.jpg",
"wikidata": "Q695045",
"wikipedia": "de:Brandenburger Tor (Potsdam)",
"wheelchair": "yes",
"description": "Kleines Brandenburger Tor in Potsdam"
}
}
]
```


@@ -2,17 +2,19 @@
The [/reverse](Reverse.md), [/search](Search.md) and [/lookup](Lookup.md)
API calls produce very similar output which is explained in this section.
There is one section for each format. The format corresponds to what was
selected via the `format` parameter.
There is one section for each format which is selectable via the `format`
parameter.
## JSON
## Formats
### JSON
The JSON format returns an array of places (for search and lookup) or
a single place (for reverse) of the following format:
```
{
"place_id": 100149,
"place_id": "100149",
"licence": "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright",
"osm_type": "node",
"osm_id": "107775",
@@ -28,7 +30,6 @@ a single place (for reverse) of the following format:
"city": "London",
"state_district": "Greater London",
"state": "England",
"ISO3166-2-lvl4": "GB-ENG",
"postcode": "SW1A 2DU",
"country": "United Kingdom",
"country_code": "gb"
@@ -40,50 +41,48 @@ a single place (for reverse) of the following format:
"wikipedia": "en:London",
"population": "8416535"
}
}
},
```
The possible fields are:
* `place_id` - reference to the Nominatim internal database ID ([see notes](#place_id-is-not-a-persistent-id))
* `osm_type`, `osm_id` - reference to the OSM object ([see notes](#osm-reference))
* `boundingbox` - area of corner coordinates ([see notes](#boundingbox))
* `place_id` - reference to the Nominatim internal database ID (see notes below)
* `osm_type`, `osm_id` - reference to the OSM object
* `boundingbox` - area of corner coordinates
* `lat`, `lon` - latitude and longitude of the centroid of the object
* `display_name` - full comma-separated address
* `class`, `type` - key and value of the main OSM tag
* `importance` - computed importance rank
* `icon` - link to class icon (if available)
* `address` - dictionary of address details (only with `addressdetails=1`,
[see notes](#addressdetails))
* `address` - dictionary of address details (only with `addressdetails=1`)
* `extratags` - dictionary with additional useful tags like website or maxspeed
(only with `extratags=1`)
* `namedetails` - dictionary with full list of available names including ref etc.
* `geojson`, `svg`, `geotext`, `geokml` - full geometry
(only with the appropriate `polygon_*` parameter)
## JSONv2
### JSONv2
This is the same as the JSON format with two changes:
* `class` renamed to `category`
* additional field `place_rank` with the search rank of the object
## GeoJSON
### GeoJSON
This format follows the [RFC7946](https://geojson.org). Every feature includes
a bounding box (`bbox`).
The properties object has the following fields:
The feature list has the following fields:
* `place_id` - reference to the Nominatim internal database ID ([see notes](#place_id-is-not-a-persistent-id))
* `osm_type`, `osm_id` - reference to the OSM object ([see notes](#osm-reference))
* `place_id` - reference to the Nominatim internal database ID (see notes below)
* `osm_type`, `osm_id` - reference to the OSM object
* `category`, `type` - key and value of the main OSM tag
* `display_name` - full comma-separated address
* `place_rank` - class search rank
* `importance` - computed importance rank
* `icon` - link to class icon (if available)
* `address` - dictionary of address details (only with `addressdetails=1`,
[see notes](#addressdetails))
* `address` - dictionary of address details (only with `addressdetails=1`)
* `extratags` - dictionary with additional useful tags like `website` or `maxspeed`
(only with `extratags=1`)
* `namedetails` - dictionary with full list of available names including ref etc.
@@ -91,46 +90,45 @@ The properties object has the following fields:
Use `polygon_geojson` to output the full geometry of the object instead
of the centroid.
## GeocodeJSON
### GeocodeJSON
The GeocodeJSON format follows the
[GeocodeJSON spec 0.1.0](https://github.com/geocoders/geocodejson-spec).
The following feature attributes are implemented:
* `osm_type`, `osm_id` - reference to the OSM object (unofficial extension, [see notes](#osm-reference))
* `type` - the 'address level' of the object (`house`, `street`, `district`, `city`,
`county`, `state`, `country`, `locality`)
* `osm_key` - key of the main tag of the OSM object (e.g. boundary, highway, amenity)
* `osm_value` - value of the main tag of the OSM object (e.g. residential, restaurant)
* `osm_type`, `osm_id` - reference to the OSM object (unofficial extension)
* `type` - value of the main tag of the object (e.g. residential, restaurant, ...)
* `label` - full comma-separated address
* `name` - localised name of the place
* `housenumber`, `street`, `locality`, `district`, `postcode`, `city`,
`county`, `state`, `country` -
* `housenumber`, `street`, `locality`, `postcode`, `city`,
`district`, `county`, `state`, `country` -
provided when it can be determined from the address
(see [this issue](https://github.com/openstreetmap/Nominatim/issues/1080) for
current limitations on the correctness of the address) and `addressdetails=1`
was given
* `admin` - list of localised names of administrative boundaries (only with `addressdetails=1`)
Use `polygon_geojson` to output the full geometry of the object instead
of the centroid.
## XML
### XML
The XML response returns one or more place objects in slightly different
formats depending on the API call.
### Reverse
#### Reverse
```
<reversegeocode timestamp="Sat, 11 Aug 18 11:53:21 +0000"
attribution="Data © OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright"
querystring="lat=48.400381&lon=11.745876&zoom=5&format=xml">
<result place_id="179509537" osm_type="relation" osm_id="2145268" ref="BY" place_rank="15" address_rank="15"
<result place_id="179509537" osm_type="relation" osm_id="2145268" ref="BY"
lat="48.9467562" lon="11.4038717"
boundingbox="47.2701114,50.5647142,8.9763497,13.8396373">
Bavaria, Germany
</result>
<addressparts>
<state>Bavaria</state>
<ISO3166-2-lvl4>DE-BY</ISO3166-2-lvl4>
<country>Germany</country>
<country_code>de</country_code>
</addressparts>
@@ -150,11 +148,11 @@ attribution to OSM and the original querystring.
The place information can be found in the `result` element. The attributes of that element contain:
* `place_id` - reference to the Nominatim internal database ID ([see notes](#place_id-is-not-a-persistent-id))
* `osm_type`, `osm_id` - reference to the OSM object ([see notes](#osm-reference))
* `place_id` - reference to the Nominatim internal database ID (see notes below)
* `osm_type`, `osm_id` - reference to the OSM object
* `ref` - content of `ref` tag if it exists
* `lat`, `lon` - latitude and longitude of the centroid of the object
* `boundingbox` - comma-separated list of corner coordinates ([see notes](#boundingbox))
* `boundingbox` - comma-separated list of corner coordinates
The full address of the result can be found in the content of the
`result` element as a comma-separated list.
@@ -162,14 +160,14 @@ The full address of the result can be found in the content of the
Additional information requested with `addressdetails=1`, `extratags=1` and
`namedetails=1` can be found in extra elements.
### Search and Lookup
#### Search and Lookup
```
<searchresults timestamp="Sat, 11 Aug 18 11:55:35 +0000"
attribution="Data © OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright"
querystring="london" polygon="false" exclude_place_ids="100149"
more_url="https://nominatim.openstreetmap.org/search?q=london&addressdetails=1&extratags=1&exclude_place_ids=100149&format=xml&accept-language=en-US%2Cen%3Bq%3D0.7%2Cde%3Bq%3D0.3">
<place place_id="100149" osm_type="node" osm_id="107775" place_rank="15" address_rank="15"
more_url="https://nominatim.openstreetmap.org/search.php?q=london&addressdetails=1&extratags=1&exclude_place_ids=100149&format=xml&accept-language=en-US%2Cen%3Bq%3D0.7%2Cde%3Bq%3D0.3">
<place place_id="100149" osm_type="node" osm_id="107775" place_rank="15"
boundingbox="51.3473219,51.6673219,-0.2876474,0.0323526" lat="51.5073219" lon="-0.1276474"
display_name="London, Greater London, England, SW1A 2DU, United Kingdom"
class="place" type="city" importance="0.9654895765402"
@@ -184,7 +182,6 @@ Additional information requested with `addressdetails=1`, `extratags=1` and
<city>London</city>
<state_district>Greater London</state_district>
<state>England</state>
<ISO3166-2-lvl4>GB-ENG</ISO3166-2-lvl4>
<postcode>SW1A 2DU</postcode>
<country>United Kingdom</country>
<country_code>gb</country_code>
@@ -206,13 +203,12 @@ generic information about the query:
The place information can be found in the `place` elements, of which there may
be more than one. The attributes of that element contain:
* `place_id` - reference to the Nominatim internal database ID ([see notes](#place_id-is-not-a-persistent-id))
* `osm_type`, `osm_id` - reference to the OSM object ([see notes](#osm-reference))
* `place_id` - reference to the Nominatim internal database ID (see notes below)
* `osm_type`, `osm_id` - reference to the OSM object
* `ref` - content of `ref` tag if it exists
* `lat`, `lon` - latitude and longitude of the centroid of the object
* `boundingbox` - comma-separated list of corner coordinates ([see notes](#boundingbox))
* `place_rank` - class [search rank](../customize/Ranking.md#search-rank)
* `address_rank` - place [address rank](../customize/Ranking.md#address-rank)
* `boundingbox` - comma-separated list of corner coordinates
* `place_rank` - class search rank
* `display_name` - full comma-separated address
* `class`, `type` - key and value of the main OSM tag
* `importance` - computed importance rank
@@ -222,81 +218,29 @@ When `addressdetails=1` is requested, the localised address parts appear
as subelements with the type of the address part.
Additional information requested with `extratags=1` and `namedetails=1` can
be found in extra elements as sub-element of `extratags` and `namedetails`
respectively.
be found in extra elements as sub-element of each place.
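As a rough illustration of the attribute list above, the following sketch reads `place` elements from a search or lookup response with Python's standard `xml.etree.ElementTree`. The `SAMPLE` document is a shortened copy of the London example, and `parse_places` is a hypothetical helper name, not part of Nominatim. It assumes address parts arrive as direct child elements of `place` (as with `addressdetails=1`) and simply skips `extratags` and `namedetails`.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the documented example response (assumption: real
# responses carry more attributes and child elements than shown here).
SAMPLE = """<searchresults querystring="london">
  <place place_id="100149" osm_type="node" osm_id="107775" place_rank="15"
         boundingbox="51.3473219,51.6673219,-0.2876474,0.0323526"
         lat="51.5073219" lon="-0.1276474"
         display_name="London, Greater London, England, SW1A 2DU, United Kingdom"
         class="place" type="city" importance="0.9654895765402">
    <city>London</city>
    <country_code>gb</country_code>
  </place>
</searchresults>"""


def parse_places(xml_text):
    """Return one dict per <place> element: its attributes plus an 'address' dict."""
    root = ET.fromstring(xml_text)
    places = []
    for place in root.iter("place"):
        record = dict(place.attrib)
        # Address parts are sub-elements named after the address type.
        record["address"] = {
            child.tag: child.text
            for child in place
            if child.tag not in ("extratags", "namedetails")
        }
        places.append(record)
    return places


places = parse_places(SAMPLE)
```

All attribute values come back as strings, so `lat`, `lon` and `importance` still need converting with `float()` before numeric use.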
## Notes on field values
### place_id is not a persistent id
The `place_id` is an internal identifier that is assigned when data is imported
into a Nominatim database. The same OSM object will have a different value
on another server. It may even change its ID on the same server when it is
removed and reimported while updating the database with fresh OSM data.
It is thus not useful to treat it as permanent for later use.
The `place_id` is created when a Nominatim database gets installed. A
single place will have a different value on another server or even when
the same data gets re-imported. It's thus not useful to treat it as
permanent for later use.
The combination `osm_type`+`osm_id` is slightly better but remember in
The combination `osm_type`+`osm_id` is slighly better but remember in
OpenStreetMap mappers can delete, split, recreate places (and those
get a new `osm_id`), there is no link between those old and new ids.
Places can also change their meaning without changing their `osm_id`,
e.g. when a restaurant is retagged as supermarket. For a more in-depth
discussion see [Permanent ID](https://wiki.openstreetmap.org/wiki/Permanent_ID).
If you need an ID that is consistent over multiple installations of Nominatim,
then you should use the combination of `osm_type`+`osm_id`+`class`.
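A minimal sketch of that advice: the helper below derives a key from `osm_type`, `osm_id` and `class` of a JSON result. The function name `stable_key` and the `N107775:place` key layout are illustrative assumptions, not something Nominatim defines. The sketch also accepts `category`, the name used for `class` in the `jsonv2` format.

```python
def stable_key(place):
    """Build a key from osm_type + osm_id + class (hypothetical key layout).

    Returns None when the result carries no OSM reference.
    """
    osm_type = place.get("osm_type")
    osm_id = place.get("osm_id")
    if not osm_type or osm_id is None:
        return None
    # jsonv2 and GeoJSON call the field `category` instead of `class`.
    main_key = place.get("class", place.get("category"))
    return f"{osm_type[0].upper()}{osm_id}:{main_key}"


key = stable_key({"osm_type": "node", "osm_id": "107775", "class": "place"})
```

Such a key is only as stable as the underlying OSM object: it still changes when mappers delete and recreate the object.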
### OSM reference
Nominatim may sometimes return special objects that do not correspond directly
to an object in OpenStreetMap. These are:
* **Postcodes**. Nominatim returns a postcode point created from all mapped
postcodes of the same name. The class and type of these objects are `place=postcode`.
No `osm_type` and `osm_id` are included in the result.
* **Housenumber interpolations**. Nominatim returns a single interpolated
housenumber from the interpolation way. The class and type are `place=house`
and `osm_type` and `osm_id` correspond to the interpolation way in OSM.
* **TIGER housenumber.** Nominatim returns a single interpolated housenumber
from the TIGER data. The class and type are `place=house`
and `osm_type` and `osm_id` correspond to the street mentioned in the result.
Please note that the `osm_type` and `osm_id` returned may be changed in the
future. You should not expect to only find `node`, `way` and `relation` for
the type.
Nominatim merges some places (e.g. center node of a city with the boundary
relation) so `osm_type`+`osm_id`+`class_name` would be more unique.
### boundingbox
Comma separated list of min latitude, max latitude, min longitude, max longitude.
The whole planet would be `-90,90,-180,180`.
Can be used to pan and center the map on the result, for example with the Leaflet
mapping library:
`map.fitBounds([[bbox[0],bbox[2]],[bbox[1],bbox[3]]], {padding: [20, 20], maxZoom: 16});`
Bounds crossing the antimeridian have a min longitude -180 and max longitude 180,
essentially covering the entire planet
(see [issue 184](https://github.com/openstreetmap/Nominatim/issues/184)).
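The coordinate order is easy to get wrong, so here is a small sketch that turns a `boundingbox` value into south-west and north-east corners. `bbox_to_corners` is a hypothetical helper name. It accepts both the comma-separated string used in XML and the list of strings used in JSON.

```python
def bbox_to_corners(boundingbox):
    """Convert min_lat, max_lat, min_lon, max_lon into two (lat, lon) corners."""
    if isinstance(boundingbox, str):
        # XML delivers the box as one comma-separated attribute value.
        boundingbox = boundingbox.split(",")
    min_lat, max_lat, min_lon, max_lon = (float(value) for value in boundingbox)
    south_west = (min_lat, min_lon)
    north_east = (max_lat, max_lon)
    return south_west, north_east


corners = bbox_to_corners("51.3473219,51.6673219,-0.2876474,0.0323526")
```

The returned pair uses `(lat, lon)` order for each corner, which is the same order the `fitBounds` call above builds from `bbox`.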
### addressdetails
Address details in the xml and json formats return a list of names together
with a designation label. By default the following labels may appear:
* continent
* country, country_code
* region, state, state_district, county, ISO3166-2-lvl<admin_level>
* municipality, city, town, village
* city_district, district, borough, suburb, subdivision
* hamlet, croft, isolated_dwelling
* neighbourhood, allotments, quarter
* city_block, residential, farm, farmyard, industrial, commercial, retail
* road
* house_number, house_name
* emergency, historic, military, natural, landuse, place, railway,
man_made, aerialway, boundary, amenity, aeroway, club, craft, leisure,
office, mountain_pass, shop, tourism, bridge, tunnel, waterway
* postcode
They roughly correspond to the classification of the OpenStreetMap data
according to either the `place` tag or the main key of the object.


@@ -1,14 +1,14 @@
This section describes the API V1 of the Nominatim web service. The
service offers the following endpoints:
### Nominatim API
Nominatim indexes named (or numbered) features within the OpenStreetMap (OSM) dataset and a subset of other unnamed features (pubs, hotels, churches, etc).
Its API has the following endpoints for querying the data:
* __[/search](Search.md)__ - search OSM objects by name or type
* __[/reverse](Reverse.md)__ - search OSM object by their location
* __[/lookup](Lookup.md)__ - look up address details for OSM objects by their ID
* __[/status](Status.md)__ - query the status of the server
* __/status__ - query the status of the server
* __/deletable__ - list objects that have been deleted in OSM but are held
back in Nominatim in case the deletion was accidental
* __/polygons__ - list of broken polygons detected by Nominatim
* __[/details](Details.md)__ - show internal details for an object (for debugging only)


@@ -1,132 +1,76 @@
# Reverse Geocoding
Reverse geocoding generates an address from a coordinate given as
latitude and longitude.
Reverse geocoding generates an address from a latitude and longitude or from
an OSM object.
## How it works
The reverse geocoding API does not exactly compute the address for the
coordinate it receives. It works by finding the closest suitable OSM object
and returning its address information. This may occasionally lead to
unexpected results.
First of all, Nominatim only includes OSM objects in
its index that are suitable for searching. Small, unnamed paths for example
are missing from the database and can therefore not be used for reverse
geocoding either.
The other issue to be aware of is that the closest OSM object may not always
have a similar enough address to the coordinate you were requesting. For
example, in dense city areas it may belong to a completely different street.
## Endpoint
## Parameters
The main format of the reverse API is
```
https://nominatim.openstreetmap.org/reverse?lat=<value>&lon=<value>&<params>
https://nominatim.openstreetmap.org/reverse?<query>
```
where `lat` and `lon` are latitude and longitude of a coordinate in WGS84
projection. The API returns exactly one result or an error when the coordinate
is in an area with no OSM data coverage.
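As an illustration, a reverse request URL can be assembled with the standard library. This is a sketch only: `build_reverse_url` is a hypothetical helper, and defaulting `format` to `jsonv2` is a choice made for the example, not the server's default.

```python
from urllib.parse import urlencode

REVERSE_URL = "https://nominatim.openstreetmap.org/reverse"


def build_reverse_url(lat, lon, **params):
    """Return a /reverse URL for a WGS84 coordinate plus optional parameters."""
    query = {"lat": lat, "lon": lon, "format": "jsonv2"}
    query.update(params)  # e.g. zoom=5, addressdetails=1
    return REVERSE_URL + "?" + urlencode(query)


url = build_reverse_url(48.400381, 11.745876, zoom=5)
```

The helper only builds the URL. Sending the request and handling an error response for areas without OSM data coverage is left to the caller.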
There are two ways to specify the requested location:
* `lat=<value>` `lon=<value>`
!!! tip
The reverse API allows a lookup of object by coordinate. If you want
to look up an object by ID, use the [Address Lookup API](Lookup.md) instead.
A geographic location to generate an address for. The coordinates must be
in WGS84 format.
!!! danger "Deprecation warning"
The API can also be used with the URL
`https://nominatim.openstreetmap.org/reverse.php`. This is now deprecated
and will be removed in future versions.
* `osm_type=[N|W|R]` `osm_id=<value>`
A specific OSM node(N), way(W) or relation(R) to return an address for.
## Parameters
This section lists additional parameters to further influence the output.
In both cases exactly one object is returned. The two input parameters cannot
be used at the same time. Both accept the additional optional parameters listed
below.
### Output format
| Parameter | Value | Default |
|-----------| ----- | ------- |
| format | one of: `xml`, `json`, `jsonv2`, `geojson`, `geocodejson` | `xml` |
* `format=[xml|json|jsonv2|geojson|geocodejson]`
See [Place Output Formats](Output.md) for details on each format.
See [Place Output Formats](Output.md) for details on each format. (Default: html)
* `json_callback=<string>`
| Parameter | Value | Default |
|-----------| ----- | ------- |
| json_callback | function name | _unset_ |
When given, then JSON output will be wrapped in a callback function with
the given name. See [JSONP](https://en.wikipedia.org/wiki/JSONP) for more
information.
Wrap JSON output in a callback function ([JSONP](https://en.wikipedia.org/wiki/JSONP)) i.e. `<string>(<json>)`.
Only has an effect for JSON output formats.
### Output details
| Parameter | Value | Default |
|-----------| ----- | ------- |
| addressdetails | 0 or 1 | 1 |
* `addressdetails=[0|1]`
When set to 1, include a breakdown of the address into elements.
The exact content of the address breakdown depends on the output format.
!!! tip
If you are interested in a stable classification of address categories
(suburb, city, state, etc), have a look at the `geocodejson` format.
All other formats return classifications according to OSM tagging.
There is a much larger set of categories and they are not always consistent,
which makes them very hard to work with.
Include a breakdown of the address into elements. (Default: 1)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| extratags | 0 or 1 | 0 |
* `extratags=[0|1]`
When set to 1, the response includes any additional information in the result
that is available in the database, e.g. wikipedia link, opening hours.
Include additional information in the result if available,
e.g. wikipedia link, opening hours. (Default: 0)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| namedetails | 0 or 1 | 0 |
* `namedetails=[0|1]`
When set to 1, include a full list of names for the result. These may include
language variants, older names, references and brand.
Include a list of alternative names in the results. These may include
language variants, references, operator and brand. (Default: 0)
### Language of results
| Parameter | Value | Default |
|-----------| ----- | ------- |
| accept-language | browser language string | content of "Accept-Language" HTTP header |
* `accept-language=<browser language string>`
Preferred language order for showing search results. This may either be
a simple comma-separated list of language codes or have the same format
as the ["Accept-Language" HTTP header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language).
Preferred language order for showing search results, overrides the value
specified in the "Accept-Language" HTTP header.
Either use a standard RFC2616 accept-language string or a simple
comma-separated list of language codes.
!!! tip
First-time users of Nominatim tend to be confused that they get different
results when using Nominatim in the browser versus in a command-line tool
like wget or curl. The command-line tools
usually don't send any Accept-Language header, prompting Nominatim
to show results in the local language. Browsers on the contrary always
send the currently chosen browser language.
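To get the same language behaviour from a script as from a browser, the header can be set explicitly. The sketch below only constructs a `urllib.request.Request` and does not send it. `localized_request` is a hypothetical helper name, and the language list is an example.

```python
from urllib.request import Request


def localized_request(url, languages):
    """Prepare a request that states the preferred result languages."""
    # A simple comma-separated list of language codes, most preferred first.
    header_value = ",".join(languages)
    return Request(url, headers={"Accept-Language": header_value})


request = localized_request(
    "https://nominatim.openstreetmap.org/reverse?lat=48.400381&lon=11.745876&format=jsonv2",
    ["de", "en"],
)
```

Passing `accept-language=de,en` as a URL parameter is the alternative when headers cannot be controlled.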
### Result limitation
* `zoom=[0-18]`
### Result restriction
| Parameter | Value | Default |
|-----------| ----- | ------- |
| zoom | 0-18 | 18 |
Level of detail required for the address. This is a number that
corresponds roughly to the zoom level used in XYZ tile sources in frameworks
like Leaflet.js, Openlayers etc.
Level of detail required for the address. Default: 18. This is a number that corresponds
roughly to the zoom level used in map frameworks like Leaflet.js, Openlayers etc.
In terms of address details the zoom levels are as follows:
zoom | address detail
@@ -135,79 +79,41 @@ In terms of address details the zoom levels are as follows:
5 | state
8 | county
10 | city
12 | town / borough
13 | village / suburb
14 | neighbourhood
15 | any settlement
14 | suburb
16 | major streets
17 | major and minor streets
18 | building
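For callers that think in terms of address detail rather than zoom numbers, the table can be wrapped in a lookup. This sketch covers only the rows visible in this hunk and follows the more detailed variant, in which zoom 14 means neighbourhood. The names `ZOOM_BY_DETAIL` and `zoom_for` are illustrative assumptions.

```python
# Rows copied from the table above; lower zoom levels are not shown in
# this hunk and are therefore left out.
ZOOM_BY_DETAIL = {
    "state": 5,
    "county": 8,
    "city": 10,
    "town / borough": 12,
    "village / suburb": 13,
    "neighbourhood": 14,
    "any settlement": 15,
    "major streets": 16,
    "major and minor streets": 17,
    "building": 18,
}


def zoom_for(detail, default=18):
    """Return the zoom value for an address detail label, or the default."""
    return ZOOM_BY_DETAIL.get(detail, default)


city_zoom = zoom_for("city")
```

Unknown labels fall back to 18, the documented default for the `zoom` parameter.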
| Parameter | Value | Default |
|-----------| ----- | ------- |
| layer | comma-separated list of: `address`, `poi`, `railway`, `natural`, `manmade` | _unset_ (no restriction) |
The layer filter allows you to select places by theme.
The `address` layer contains all places that make up an address:
address points with house numbers, streets, inhabited places (suburbs, villages,
cities, states etc.) and administrative boundaries.
The `poi` layer selects all points of interest. This includes classic points
of interest like restaurants, shops, hotels but also less obvious features
like recycling bins, guideposts or benches.
The `railway` layer includes railway infrastructure like tracks.
Note that in Nominatim's standard configuration, only very few railway
features are imported into the database.
The `natural` layer collects features like rivers, lakes and mountains while
the `manmade` layer functions as a catch-all for features not covered by the
other layers.
### Polygon output
| Parameter | Value | Default |
|-----------| ----- | ------- |
| polygon_geojson | 0 or 1 | 0 |
| polygon_kml | 0 or 1 | 0 |
| polygon_svg | 0 or 1 | 0 |
| polygon_text | 0 or 1 | 0 |
* `polygon_geojson=1`
* `polygon_kml=1`
* `polygon_svg=1`
* `polygon_text=1`
Add the full geometry of the place to the result output. Output formats
in GeoJSON, KML, SVG or WKT are supported. Only one of these
options can be used at a time.
Output geometry of results as a GeoJSON, KML, SVG or WKT. Only one of these
options can be used at a time. (Default: 0)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| polygon_threshold | floating-point number | 0.0 |
* `polygon_threshold=0.0`
When one of the polygon_* outputs is chosen, return a simplified version
of the output geometry. The parameter describes the
Simplify the output geometry before returning. The parameter is the
tolerance in degrees with which the geometry may differ from the original
geometry. Topology is preserved in the geometry.
geometry. Topology is preserved in the result. (Default: 0.0)
### Other
| Parameter | Value | Default |
|-----------| ----- | ------- |
| email | valid email address | _unset_ |
* `email=<valid email address>`
If you are making a large number of requests, please include an appropriate email
address to identify your requests. See Nominatim's
[Usage Policy](https://operations.osmfoundation.org/policies/nominatim/) for more details.
address to identify your requests. See Nominatim's [Usage Policy](https://operations.osmfoundation.org/policies/nominatim/) for more details.
| Parameter | Value | Default |
|-----------| ----- | ------- |
| debug | 0 or 1 | 0 |
* `debug=[0|1]`
Output assorted developer debug information. Data on internals of Nominatim's
"search loop" logic, and SQL queries. The output is HTML format.
This overrides the specified machine readable format.
"Search Loop" logic, and SQL queries. The output is (rough) HTML format.
This overrides the specified machine readable format. (Default: 0)
## Examples
@@ -243,7 +149,7 @@ This overrides the specified machine readable format.
"licence":"Data © OpenStreetMap contributors, ODbL 1.0. https:\/\/www.openstreetmap.org\/copyright",
"osm_type":"way",
"osm_id":"280940520",
"lat":"-34.4391708",
"lat":"-34.4391708",
"lon":"-58.7064573",
"place_rank":"26",
"category":"highway",


@@ -1,444 +1,277 @@
# Search queries
The search API allows you to look up a location from a textual description
or address. Nominatim supports structured and free-form search queries.
The search API allows you to look up a location from a textual description.
Nominatim supports structured as well as free-form search queries.
The search query may also contain
[special phrases](https://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases)
which are translated into specific OpenStreetMap (OSM) tags (e.g. Pub => `amenity=pub`).
This can be used to narrow down the kind of objects to be returned.
Note that this only limits the items to be found, it's not suited to return complete
lists of OSM objects of a specific type. For those use [Overpass API](https://overpass-api.de/).
!!! note
Special phrases are not suitable to query all objects of a certain type in an
area. Nominatim will always just return a collection of the best matches. To
download OSM data by object type, use the [Overpass API](https://overpass-api.de/).
## Parameters
## Endpoint
The search API has the following two formats:
The search API has the following format:
```
https://nominatim.openstreetmap.org/search/<query>?<params>
```
This format only accepts a free-form query string where the
parts of the query are separated by slashes.
```
https://nominatim.openstreetmap.org/search?<params>
```
!!! danger "Deprecation warning"
The API can also be used with the URL
`https://nominatim.openstreetmap.org/search.php`. This is now deprecated
and will be removed in future versions.
In this form, the query may be given through two different sets of parameters:
The query term can be given in two different forms: free-form or structured.
* `q=<query>`
### Free-form query
Free-form query string to search for.
Free-form queries are processed first left-to-right and then right-to-left if that fails. So you may search for
[pilkington avenue, birmingham](//nominatim.openstreetmap.org/search?q=pilkington+avenue,birmingham) as well as for
[birmingham, pilkington avenue](//nominatim.openstreetmap.org/search?q=birmingham,+pilkington+avenue).
Commas are optional, but improve performance by reducing the complexity of the search.
| Parameter | Value |
|-----------| ----- |
| q | Free-form query string to search for |
In this form, the query can be unstructured.
Free-form queries are processed first left-to-right and then right-to-left if that fails. So you may search for
[pilkington avenue, birmingham](https://nominatim.openstreetmap.org/search?q=pilkington+avenue,birmingham) as well as for
[birmingham, pilkington avenue](https://nominatim.openstreetmap.org/search?q=birmingham,+pilkington+avenue).
Commas are optional, but improve performance by reducing the complexity of the search.
* `street=<housenumber> <streetname>`
* `city=<city>`
* `county=<county>`
* `state=<state>`
* `country=<country>`
* `postalcode=<postalcode>`
The free-form may also contain special phrases to describe the type of
place to be returned or a coordinate to search close to a position.
Alternative query string format split into several parameters for structured requests.
Structured requests are faster but are less robust against alternative
OSM tagging schemas. **Do not combine with** `q=<query>` **parameter**.
### Structured query
| Parameter | Value |
|----------- | ----- |
| amenity | name and/or type of POI |
| street | housenumber and streetname |
| city | city |
| county | county |
| state | state |
| country | country |
| postalcode | postal code |
The structured form of the search query allows you to look up an address
that is already split into its components. Each parameter represents a field
of the address. All parameters are optional. You should only use the ones
that are relevant for the address you want to geocode.
!!! Attention
Cannot be combined with the `q=<query>` parameter. Newer versions of
the API will return an error if you do so. Older versions simply return
unexpected results.
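The two query forms and the rule against mixing them can be sketched as follows. `build_search_url` is a hypothetical helper, not part of Nominatim. It rejects a mixed query on the client side instead of relying on the server's behaviour.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://nominatim.openstreetmap.org/search"
# Structured fields as listed in the parameter table above.
STRUCTURED_FIELDS = ("amenity", "street", "city", "county", "state", "country", "postalcode")


def build_search_url(q=None, **params):
    """Return a /search URL for either a free-form or a structured query."""
    structured = [name for name in params if name in STRUCTURED_FIELDS]
    if q is not None and structured:
        raise ValueError("q cannot be combined with: " + ", ".join(structured))
    if q is None and not structured:
        raise ValueError("either q or at least one structured field is required")
    query = {"q": q} if q is not None else {}
    query.update(params)  # structured fields and extra parameters such as format
    return SEARCH_URL + "?" + urlencode(query)


free_form = build_search_url(q="pilkington avenue, birmingham", format="jsonv2")
structured = build_search_url(street="135 pilkington avenue", city="birmingham")
```

The house number `135` in the structured call is made up for the example.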
## Parameters
The following parameters can be used to further restrict the search and
change the output. They are usable for both forms of the search query.
All three query forms accept the additional parameters listed below.
### Output format
| Parameter | Value | Default |
|-----------| ----- | ------- |
| format | one of: `xml`, `json`, `jsonv2`, `geojson`, `geocodejson` | `jsonv2` |
* `format=[html|xml|json|jsonv2|geojson|geocodejson]`
See [Place Output Formats](Output.md) for details on each format.
See [Place Output Formats](Output.md) for details on each format. (Default: html)
!!! note
The Nominatim service at
[https://nominatim.openstreetmap.org](https://nominatim.openstreetmap.org)
has a different default behaviour for historical reasons. When the
`format` parameter is omitted, the request will be forwarded to the Web UI.
| Parameter | Value | Default |
|-----------| ----- | ------- |
| json_callback | function name | _unset_ |
When given, then JSON output will be wrapped in a callback function with
the given name. See [JSONP](https://en.wikipedia.org/wiki/JSONP) for more
information.
* `json_callback=<string>`
Wrap JSON output in a callback function ([JSONP](https://en.wikipedia.org/wiki/JSONP)) i.e. `<string>(<json>)`.
Only has an effect for JSON output formats.
| Parameter | Value | Default |
|-----------| ----- | ------- |
| limit | number | 10 |
Limit the maximum number of returned results. Cannot be more than 40.
Nominatim may decide to return fewer results than requested, if additional
results do not sufficiently match the query.
### Output details
| Parameter | Value | Default |
|-----------| ----- | ------- |
| addressdetails | 0 or 1 | 0 |
* `addressdetails=[0|1]`
When set to 1, include a breakdown of the address into elements.
The exact content of the address breakdown depends on the output format.
!!! tip
If you are interested in a stable classification of address categories
(suburb, city, state, etc), have a look at the `geocodejson` format.
All other formats return classifications according to OSM tagging.
There is a much larger set of categories and they are not always consistent,
which makes them very hard to work with.
Include a breakdown of the address into elements. (Default: 0)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| extratags | 0 or 1 | 0 |
* `extratags=[0|1]`
When set to 1, the response includes any additional information in the result
that is available in the database, e.g. wikipedia link, opening hours.
Include additional information in the result if available,
e.g. wikipedia link, opening hours. (Default: 0)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| namedetails | 0 or 1 | 0 |
* `namedetails=[0|1]`
When set to 1, include a full list of names for the result. These may include
language variants, older names, references and brand.
Include a list of alternative names in the results. These may include
language variants, references, operator and brand. (Default: 0)
### Language of results
| Parameter | Value | Default |
|-----------| ----- | ------- |
| accept-language | browser language string | content of "Accept-Language" HTTP header |
* `accept-language=<browser language string>`
Preferred language order for showing search results. This may either be
a simple comma-separated list of language codes or have the same format
as the ["Accept-Language" HTTP header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language).
Preferred language order for showing search results, overrides the value
specified in the ["Accept-Language" HTTP header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language).
Either use a standard RFC2616 accept-language string or a simple
comma-separated list of language codes.
!!! tip
First-time users of Nominatim tend to be confused that they get different
results when using Nominatim in the browser versus in a command-line tool
like wget or curl. The command-line tools
usually don't send any Accept-Language header, prompting Nominatim
to show results in the local language. Browsers on the contrary always
send the currently chosen browser language.
### Result limitation
### Result restriction
* `countrycodes=<countrycode>[,<countrycode>][,<countrycode>]...`
There are two ways to influence the results. *Filters* exclude certain
kinds of results completely. *Boost parameters* only change the order of the
results and thus give a preference to some results over others.
Limit search results to one or more countries. `<countrycode>` must be the
ISO 3166-1alpha2 code, e.g. `gb` for the United Kingdom, `de` for Germany.
| Parameter | Value | Default |
|-----------| ----- | ------- |
| countrycodes | comma-separated list of country codes | _unset_ |
Filter that limits the search results to one or more countries.
The country code must be the
[ISO 3166-1alpha2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) code
of the country, e.g. `gb` for the United Kingdom, `de` for Germany.
Each place in Nominatim is assigned to one country code based
on OSM country boundaries. In rare cases a place may not be in any country
at all, for example, when it is in international waters. These places are
also excluded when the filter is set.
!!! Note
This parameter should not be confused with the 'country' parameter of
the structured query. The 'country' parameter contains a search term
and will be handled with some fuzziness. The `countrycodes` parameter
is a hard filter and as such should be preferred. Having both parameters
in the same query will work. If the parameters contradict each other,
the search will come up empty.
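As a rough illustration of the `countrycodes` filter, the following Python sketch assembles a search URL without sending a request. The helper name `build_search_url` and its defaults are hypothetical; only the parameter names and the public endpoint are taken from this page.

```python
from urllib.parse import urlencode

# Public endpoint used in the examples on this page.
BASE_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(query, countrycodes=None, limit=None, fmt="jsonv2"):
    """Assemble a search URL (hypothetical helper, no request is sent)."""
    params = {"q": query, "format": fmt}
    if countrycodes:
        # The examples on this page use lower-case ISO 3166-1 alpha-2 codes,
        # joined into one comma-separated value.
        params["countrycodes"] = ",".join(code.lower() for code in countrycodes)
    if limit is not None:
        params["limit"] = str(limit)
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("birmingham", countrycodes=["GB", "de"], limit=5))
```

Because `countrycodes` is a hard filter, a free-text query is enough here and no structured `country` term is needed.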
| Parameter | Value | Default |
|-----------| ----- | ------- |
| layer | comma-separated list of: `address`, `poi`, `railway`, `natural`, `manmade` | _unset_ (no restriction) |
The layer filter allows selecting places by theme.
The `address` layer contains all places that make up an address:
address points with house numbers, streets, inhabited places (suburbs, villages,
cities, states etc.) and administrative boundaries.
The `poi` layer selects all points of interest. This includes classic POIs like
restaurants, shops, hotels but also less obvious features like recycling bins,
guideposts or benches.
The `railway` layer includes railway infrastructure like tracks.
Note that in Nominatim's standard configuration, only very few railway
features are imported into the database.
The `natural` layer collects features like rivers, lakes and mountains while
the `manmade` layer functions as a catch-all for features not covered by the
other layers.
| Parameter | Value | Default |
|-----------| ----- | ------- |
| featureType | one of: `country`, `state`, `city`, `settlement` | _unset_ |
The featureType parameter allows a more fine-grained selection of places
from the address layer. Results can be restricted to places that make up
the 'state', 'country' or 'city' part of an address. A featureType of
`settlement` selects any human-inhabited feature from 'state' down to
'neighbourhood'.
When featureType is set, then results are automatically restricted
to the address layer (see above).
!!! tip
Instead of using the featureType filters `country`, `state` or `city`,
you can also use a structured query without the finer-grained parameters
amenity or street.
| Parameter | Value | Default |
|-----------| ----- | ------- |
| exclude_place_ids | comma-separated list of place ids | _unset_ |
* `exclude_place_ids=<place_id>[,<place_id>][,<place_id>]...`
If you do not want certain OSM objects to appear in the search
result, give a comma separated list of the `place_id`s you want to skip.
This can be used to retrieve additional search results. For example, if a
previous query only returned a few results, then including those here would
cause the search to return other, less accurate, matches (if possible).
This can be used to broaden search results. For example, if a previous
query only returned a few results, then including those here would cause
the search to return other, less accurate, matches (if possible).
| Parameter | Value | Default |
|-----------| ----- | ------- |
| viewbox | `<x1>,<y1>,<x2>,<y2>` | _unset_ |
Boost parameter which focuses the search on the given area.
Any two corner points of the box are accepted as long as they make a proper
box. `x` is longitude, `y` is latitude.
* `limit=<integer>`
| Parameter | Value | Default |
|-----------| ----- | ------- |
| bounded | 0 or 1 | 0 |
Limit the number of returned results. (Default: 10, Maximum: 50)
When set to 1, then it turns the 'viewbox' parameter (see above) into
a filter parameter, excluding any results outside the viewbox.
When `bounded=1` is given and the viewbox is small enough, then an amenity-only
search is allowed. Give the special keyword for the amenity in square
brackets, e.g. `[pub]` and a selection of objects of this type is returned.
There is no guarantee that the result returns all objects in the area.
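The combination of `viewbox`, `bounded=1` and an amenity keyword can be sketched in Python as follows. The function name `amenity_search_params` and the coordinates are illustrative assumptions; the parameter names and the `[pub]` keyword syntax come from the text above.

```python
from urllib.parse import urlencode


def amenity_search_params(amenity, lon1, lat1, lon2, lat2):
    """Parameters for an amenity-only search inside a viewbox (sketch)."""
    return {
        # Special keyword in square brackets, e.g. "[pub]".
        "q": f"[{amenity}]",
        # x is longitude, y is latitude: <x1>,<y1>,<x2>,<y2>.
        "viewbox": f"{lon1},{lat1},{lon2},{lat2}",
        # bounded=1 turns the viewbox from a boost into a filter.
        "bounded": "1",
        "format": "jsonv2",
    }


# Arbitrary example box; any two opposite corners are acceptable.
params = amenity_search_params("pub", -1.83, 52.54, -1.80, 52.56)
print("https://nominatim.openstreetmap.org/search?" + urlencode(params))
```

Since the result is not guaranteed to contain every object in the area, a client should not treat an empty or short list as proof of absence.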
* `viewbox=<x1>,<y1>,<x2>,<y2>`
The preferred area to find search results. Any two corner points of the box
are accepted in any order as long as they span a real box. `x` is longitude,
`y` is latitude.
* `bounded=[0|1]`
When a viewbox is given, restrict the result to items contained within that
viewbox (see above). When `viewbox` and `bounded=1` are given, an amenity
only search is allowed. In this case, give the special keyword for the
amenity in square brackets, e.g. `[pub]`. (Default: 0)
### Polygon output
| Parameter | Value | Default |
|-----------| ----- | ------- |
| polygon_geojson | 0 or 1 | 0 |
| polygon_kml | 0 or 1 | 0 |
| polygon_svg | 0 or 1 | 0 |
| polygon_text | 0 or 1 | 0 |
* `polygon_geojson=1`
* `polygon_kml=1`
* `polygon_svg=1`
* `polygon_text=1`
Add the full geometry of the place to the result output. Output formats
in GeoJSON, KML, SVG or WKT are supported. Only one of these
options can be used at a time.
Output geometry of results as GeoJSON, KML, SVG or WKT. Only one of these
options can be used at a time. (Default: 0)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| polygon_threshold | floating-point number | 0.0 |
* `polygon_threshold=0.0`
When one of the polygon_* outputs is chosen, return a simplified version
of the output geometry. The parameter describes the
Simplify the output geometry before returning. The parameter is the
tolerance in degrees with which the geometry may differ from the original
geometry. Topology is preserved in the geometry.
geometry. Topology is preserved in the result. (Default: 0.0)
### Other
| Parameter | Value | Default |
|-----------| ----- | ------- |
| email | valid email address | _unset_ |
* `email=<valid email address>`
If you are making large numbers of requests please include an appropriate email
address to identify your requests. See Nominatim's
[Usage Policy](https://operations.osmfoundation.org/policies/nominatim/) for more details.
address to identify your requests. See Nominatim's [Usage Policy](https://operations.osmfoundation.org/policies/nominatim/) for more details.
| Parameter | Value | Default |
|-----------| ----- | ------- |
| dedupe | 0 or 1 | 1 |
* `dedupe=[0|1]`
Sometimes you have several objects in OSM identifying the same place or
object in reality. The simplest case is a street being split into many
object in reality. The simplest case is a street being split in many
different OSM ways due to different characteristics. Nominatim will
attempt to detect such duplicates and only return one match. Setting
this parameter to 0 disables this deduplication mechanism and
ensures that all results are returned.
attempt to detect such duplicates and only return one match unless
this parameter is set to 0. (Default: 1)
| Parameter | Value | Default |
|-----------| ----- | ------- |
| debug | 0 or 1 | 0 |
* `debug=[0|1]`
Output assorted developer debug information. Data on internals of Nominatim's
"search loop" logic, and SQL queries. The output is HTML format.
This overrides the specified machine readable format.
"Search Loop" logic, and SQL queries. The output is (rough) HTML format.
This overrides the specified machine readable format. (Default: 0)
## Examples
##### XML with KML polygon
##### XML with polygon points
* [https://nominatim.openstreetmap.org/search?q=135+pilkington+avenue,+birmingham&format=xml&polygon_kml=1&addressdetails=1](https://nominatim.openstreetmap.org/search?q=135+pilkington+avenue,+birmingham&format=xml&polygon_kml=1&addressdetails=1)
* [https://nominatim.openstreetmap.org/search?q=135+pilkington+avenue,+birmingham&format=xml&polygon=1&addressdetails=1](https://nominatim.openstreetmap.org/search?q=135+pilkington+avenue,+birmingham&format=xml&polygon=1&addressdetails=1)
* [https://nominatim.openstreetmap.org/search/gb/birmingham/pilkington%20avenue/135?format=xml&polygon=1&addressdetails=1](https://nominatim.openstreetmap.org/search/gb/birmingham/pilkington%20avenue/135?format=xml&polygon=1&addressdetails=1)
* [https://nominatim.openstreetmap.org/search/135%20pilkington%20avenue,%20birmingham?format=xml&polygon=1&addressdetails=1](https://nominatim.openstreetmap.org/search/135%20pilkington%20avenue,%20birmingham?format=xml&polygon=1&addressdetails=1)
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<searchresults timestamp="Tue, 08 Aug 2023 15:45:41 +00:00"
attribution="Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright"
querystring="135 pilkington avenue, birmingham"
more_url="https://nominatim.openstreetmap.org/search?q=135+pilkington+avenue%2C+birmingham&amp;polygon_kml=1&amp;addressdetails=1&amp;limit=20&amp;exclude_place_ids=125279639&amp;format=xml"
exclude_place_ids="125279639">
<place place_id="125279639"
osm_type="way"
osm_id="90394480"
lat="52.5487921"
lon="-1.8164308"
boundingbox="52.5487473,52.5488481,-1.8165130,-1.8163464"
place_rank="30"
address_rank="30"
display_name="135, Pilkington Avenue, Maney, Sutton Coldfield, Wylde Green, Birmingham, West Midlands Combined Authority, England, B72 1LH, United Kingdom"
class="building"
type="residential"
importance="9.999999994736442e-08">
<geokml>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>-1.816513,52.5487566 -1.816434,52.5487473 -1.816429,52.5487629 -1.8163717,52.5487561 -1.8163464,52.5488346 -1.8164599,52.5488481 -1.8164685,52.5488213 -1.8164913,52.548824 -1.816513,52.5487566</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</geokml>
<house_number>135</house_number>
<road>Pilkington Avenue</road>
<hamlet>Maney</hamlet>
<town>Sutton Coldfield</town>
<village>Wylde Green</village>
<city>Birmingham</city>
<ISO3166-2-lvl8>GB-BIR</ISO3166-2-lvl8>
<state_district>West Midlands Combined Authority</state_district>
<state>England</state>
<ISO3166-2-lvl4>GB-ENG</ISO3166-2-lvl4>
<postcode>B72 1LH</postcode>
<country>United Kingdom</country>
<country_code>gb</country_code>
</place>
</searchresults>
<searchresults timestamp="Sat, 07 Nov 09 14:42:10 +0000" querystring="135 pilkington, avenue birmingham" polygon="true">
<place
place_id="1620612" osm_type="node" osm_id="452010817"
boundingbox="52.548641204834,52.5488433837891,-1.81612110137939,-1.81592094898224"
polygonpoints="[['-1.81592098644987','52.5487429714954'],['-1.81592290792183','52.5487234624632'],...]"
lat="52.5487429714954" lon="-1.81602098644987"
display_name="135, Pilkington Avenue, Wylde Green, City of Birmingham, West Midlands (county), B72, United Kingdom"
class="place" type="house">
<house_number>135</house_number>
<road>Pilkington Avenue</road>
<village>Wylde Green</village>
<town>Sutton Coldfield</town>
<city>City of Birmingham</city>
<county>West Midlands (county)</county>
<postcode>B72</postcode>
<country>United Kingdom</country>
<country_code>gb</country_code>
</place>
</searchresults>
```
##### JSON with SVG polygon
[https://nominatim.openstreetmap.org/search?q=Unter%20den%20Linden%201%20Berlin&format=json&addressdetails=1&limit=1&polygon_svg=1](https://nominatim.openstreetmap.org/search?q=Unter%20den%20Linden%201%20Berlin&format=json&addressdetails=1&limit=1&polygon_svg=1)
[https://nominatim.openstreetmap.org/search/Unter%20den%20Linden%201%20Berlin?format=json&addressdetails=1&limit=1&polygon_svg=1](https://nominatim.openstreetmap.org/search/Unter%20den%20Linden%201%20Berlin?format=json&addressdetails=1&limit=1&polygon_svg=1)
```json
[
{
"address": {
"ISO3166-2-lvl4": "DE-BE",
"borough": "Mitte",
"city": "Berlin",
"country": "Deutschland",
"country_code": "de",
"historic": "Kommandantenhaus",
"house_number": "1",
"neighbourhood": "Friedrichswerder",
"postcode": "10117",
"road": "Unter den Linden",
"suburb": "Mitte"
},
"boundingbox": [
"52.5170798",
"52.5173311",
"13.3975116",
"13.3981577"
],
"class": "historic",
"display_name": "Kommandantenhaus, 1, Unter den Linden, Friedrichswerder, Mitte, Berlin, 10117, Deutschland",
"importance": 0.8135042058306902,
"lat": "52.51720765",
"licence": "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright",
"lon": "13.397834399325466",
"osm_id": 15976890,
"osm_type": "way",
"place_id": 108681845,
"svg": "M 13.3975116 -52.5172905 L 13.397549 -52.5170798 13.397715 -52.5170906 13.3977122 -52.5171064 13.3977392 -52.5171086 13.3977417 -52.5170924 13.3979655 -52.5171069 13.3979623 -52.5171233 13.3979893 -52.5171248 13.3979922 -52.5171093 13.3981577 -52.5171203 13.398121 -52.5173311 13.3978115 -52.5173103 Z",
"type": "house"
}
]
{
"address": {
"city": "Berlin",
"city_district": "Mitte",
"construction": "Unter den Linden",
"continent": "European Union",
"country": "Deutschland",
"country_code": "de",
"house_number": "1",
"neighbourhood": "Scheunenviertel",
"postcode": "10117",
"public_building": "Kommandantenhaus",
"state": "Berlin",
"suburb": "Mitte"
},
"boundingbox": [
"52.5170783996582",
"52.5173187255859",
"13.3975105285645",
"13.3981599807739"
],
"class": "amenity",
"display_name": "Kommandantenhaus, 1, Unter den Linden, Scheunenviertel, Mitte, Berlin, 10117, Deutschland, European Union",
"importance": 0.73606775332943,
"lat": "52.51719785",
"licence": "Data \u00a9 OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright",
"lon": "13.3978352028938",
"osm_id": "15976890",
"osm_type": "way",
"place_id": "30848715",
"svg": "M 13.397511 -52.517283599999999 L 13.397829400000001 -52.517299800000004 13.398131599999999 -52.517315099999998 13.398159400000001 -52.517112099999999 13.3975388 -52.517080700000001 Z",
"type": "public_building"
}
```
##### JSON with address details
[https://nominatim.openstreetmap.org/search?addressdetails=1&q=bakery+in+berlin+wedding&format=jsonv2&limit=1](https://nominatim.openstreetmap.org/search?addressdetails=1&q=bakery+in+berlin+wedding&format=jsonv2&limit=1)
[https://nominatim.openstreetmap.org/?addressdetails=1&q=bakery+in+berlin+wedding&format=json&limit=1](https://nominatim.openstreetmap.org/?addressdetails=1&q=bakery+in+berlin+wedding&format=json&limit=1)
```json
[
{
"address": {
"ISO3166-2-lvl4": "DE-BE",
"borough": "Mitte",
"city": "Berlin",
"country": "Deutschland",
"country_code": "de",
"neighbourhood": "Sprengelkiez",
"postcode": "13347",
"road": "Lindower Straße",
"shop": "Ditsch",
"suburb": "Wedding"
},
"addresstype": "shop",
"boundingbox": [
"52.5427201",
"52.5427654",
"13.3668619",
"13.3669442"
],
"category": "shop",
"display_name": "Ditsch, Lindower Straße, Sprengelkiez, Wedding, Mitte, Berlin, 13347, Deutschland",
"importance": 9.99999999995449e-06,
"lat": "52.54274275",
"licence": "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright",
"lon": "13.36690305710228",
"name": "Ditsch",
"osm_id": 437595031,
"osm_type": "way",
"place_id": 204751033,
"place_rank": 30,
"type": "bakery"
}
]
{
"address": {
"bakery": "B\u00e4cker Kamps",
"city_district": "Mitte",
"continent": "European Union",
"country": "Deutschland",
"country_code": "de",
"footway": "Bahnsteig U6",
"neighbourhood": "Sprengelkiez",
"postcode": "13353",
"state": "Berlin",
"suburb": "Wedding"
},
"boundingbox": [
"52.5460929870605",
"52.5460968017578",
"13.3591794967651",
"13.3591804504395"
],
"class": "shop",
"display_name": "B\u00e4cker Kamps, Bahnsteig U6, Sprengelkiez, Wedding, Mitte, Berlin, 13353, Deutschland, European Union",
"icon": "https://nominatim.openstreetmap.org/images/mapicons/shopping_bakery.p.20.png",
"importance": 0.201,
"lat": "52.5460941",
"licence": "Data \u00a9 OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright",
"lon": "13.35918",
"osm_id": "317179427",
"osm_type": "node",
"place_id": "1453068",
"type": "bakery"
}
```
##### GeoJSON

View File

@@ -1,71 +0,0 @@
# Status
Report on the state of the service and database. Useful for checking if the
service is up and running. The JSON output also reports
when the database was last updated.
## Endpoint
The status API has the following format:
```
https://nominatim.openstreetmap.org/status
```
!!! danger "Deprecation warning"
The API can also be used with the URL
`https://nominatim.openstreetmap.org/status.php`. This is now deprecated
and will be removed in future versions.
## Parameters
The status endpoint takes a single optional parameter:
| Parameter | Value | Default |
|-----------| ----- | ------- |
| format | one of: `text`, `json` | 'text' |
Selects the output format. See below.
## Output
#### Text format
When everything is okay, a status code 200 is returned and a simple message: `OK`
On error it will return HTTP status code 500 and print a detailed error message, e.g.
`ERROR: Database connection failed`.
#### JSON format
Always returns HTTP status code 200 when the status call could be executed.
On success a JSON dictionary with the following structure is returned:
```json
{
"status": 0,
"message": "OK",
"data_updated": "2020-05-04T14:47:00+00:00",
"software_version": "3.6.0-0",
"database_version": "3.6.0-0"
}
```
The `software_version` field contains the version of Nominatim used to serve
the API. The `database_version` field contains the version of the data format
in the database.
On error, a shorter JSON dictionary with the error message
and status only is returned, e.g.
```json
{
"status": 700,
"message": "Database connection failed"
}
```
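A client can tell the two JSON responses apart by looking at the `status` field. The following Python sketch assumes, based only on the two samples above, that `0` means success and any other value is an error; the function name `interpret_status` is hypothetical.

```python
import json


def interpret_status(body):
    """Return (is_ok, message) for a JSON status response body."""
    data = json.loads(body)
    # Assumption from the samples above: 0 signals success,
    # other values (such as 700) signal an error.
    return data.get("status") == 0, data.get("message", "")


ok_body = '{"status": 0, "message": "OK", "data_updated": "2020-05-04T14:47:00+00:00"}'
error_body = '{"status": 700, "message": "Database connection failed"}'

print(interpret_status(ok_body))
print(interpret_status(error_body))
```

Note that the HTTP status code alone is not sufficient for the JSON format, because it is 200 in both cases.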

7
docs/bash2md.sh Executable file
View File

@@ -0,0 +1,7 @@
#!/bin/sh
#
# Extract markdown-formatted documentation from a source file
#
# Usage: bash2md.sh <infile> <outfile>
sed '/^#!/d;s:^#\( \|$\)::;s/.*#DOCS://' $1 > $2

View File

@@ -1,149 +0,0 @@
# Customizing Per-Country Data
Whenever an OSM object is imported into Nominatim, the object is first assigned
a country. Nominatim can use this information to adapt various aspects of
the address computation to the local customs of the country. This section
explains how country assignment works and the principal per-country
localizations.
## Country assignment
Countries are assigned on the basis of country data from the OpenStreetMap
input data itself. Countries are expected to be tagged according to the
[administrative boundary schema](https://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative):
an OSM relation with `boundary=administrative` and `admin_level=2`. Nominatim
uses the country code to distinguish the countries.
If there is no country data available for a point, then Nominatim uses the
fallback data imported from `data/country_osm_grid.sql.gz`. This was computed
from OSM data as well but is guaranteed to cover all countries.
Some OSM objects may also be located outside any country, for example a buoy
in the middle of the ocean. These objects do not get any country assigned and
get a default treatment when it comes to localized handling of data.
## Per-country settings
### Global country settings
The main place to configure settings per country is the file
`settings/country_settings.yaml`. This file has one section per country that
is recognised by Nominatim. Each section is tagged with the country code
(in lower case) and contains the different localization information. Only
countries which are listed in this file are taken into account for computations.
For example, the section for Andorra looks like this:
```
partition: 35
languages: ca
names: !include country-names/ad.yaml
postcode:
  pattern: "(ddd)"
  output: AD\1
```
The individual settings are described below.
#### `partition`
Nominatim internally splits the data into multiple tables to improve
performance. The partition number tells Nominatim into which table to put
the country. This is purely internal management and has no effect on the
output data.
The default is to have one partition per country.
#### `languages`
A comma-separated list of ISO-639 language codes of default languages in the
country. These are the languages used in name tags without a language suffix.
Note that this is not necessarily the same as the list of official languages
in the country. There may be officially recognised languages in a country
which are only ever used in name tags with the appropriate language suffixes.
Conversely, a non-official language may appear a lot in the name tags, for
example when used as an unofficial Lingua Franca.
List the languages in order of frequency of appearance with the most frequently
used language first. It is not recommended to add languages when there are only
very few occurrences.
If only one language is listed, then Nominatim will 'auto-complete' the
language of names without an explicit language-suffix.
#### `names`
List of names of the country and its translations. These names are used as
a baseline. It is always possible to search countries by the given names, no
matter what other names are in the OSM data. They are also used as a fallback
when a needed translation is not available.
!!! Note
The list of names per country is currently fairly large because Nominatim
supports translations in many languages per default. That is why the
name lists have been separated out into extra files. You can find the
name lists in the file `settings/country-names/<country code>.yaml`.
The names section in the main country settings file only refers to these
files via the special `!include` directive.
#### `postcode`
Describes the format of the postcode that is in use in the country.
When a country has no official postcodes, set this to no. Example:
```
ae:
  postcode: no
```
When a country has a postcode, you need to state the postcode pattern and
the default output format. Example:
```
bm:
  postcode:
    pattern: "(ll)[ -]?(dd)"
    output: \1 \2
```
The **pattern** is a regular expression that describes the possible formats
accepted as a postcode. The pattern follows the standard syntax for
[regular expressions in Python](https://docs.python.org/3/library/re.html#regular-expression-syntax)
with two extra shortcuts: `d` is a shortcut for a single digit ([0-9])
and `l` for a single ASCII letter ([A-Z]).
Use match groups to indicate groups in the postcode that may optionally be
separated with a space or a hyphen.
For example, the postcode for Bermuda above always consists of two letters
and two digits. They may optionally be separated by a space or hyphen. That
means that Nominatim will consider `AB56`, `AB 56` and `AB-56` spelling variants
for one and the same postcode.
Never add the country code in front of the postcode pattern. Nominatim will
automatically accept variants with a country code prefix for all postcodes.
The **output** field is an optional field that describes what the canonical
spelling of the postcode should be. The format is the
[regular expression expand syntax](https://docs.python.org/3/library/re.html#re.Match.expand) referring back to the bracket groups in the pattern.
Most simple postcodes only have one spelling variant. In that case, the
**output** can be omitted. The postcode will simply be used as is.
In the Bermuda example above, the canonical spelling would be to have a space
between letters and digits.
!!! Warning
When your postcode pattern covers multiple variants of the postcode, then
you must explicitly state the canonical output or Nominatim will not
handle the variations correctly.
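To make the interplay of **pattern** and **output** concrete, here is a minimal Python sketch of the behaviour described above. It is not Nominatim's implementation: the function names are hypothetical, and the naive string replacement assumes the pattern contains no literal `d` or `l` characters.

```python
import re


def expand_shortcuts(pattern):
    """Replace the shortcuts d and l with their character classes."""
    return pattern.replace("d", "[0-9]").replace("l", "[A-Z]")


def normalize_postcode(raw, pattern, output=r"\g<0>"):
    """Return the canonical spelling, or None if raw does not match.

    Without an explicit output template the matched text is used as is.
    """
    match = re.fullmatch(expand_shortcuts(pattern), raw.strip().upper())
    if match is None:
        return None
    # Match.expand() fills the bracket groups into the output template.
    return match.expand(output)


# Bermuda example from above: all variants map to one canonical spelling.
for variant in ("AB56", "AB 56", "ab-56"):
    print(variant, "->", normalize_postcode(variant, "(ll)[ -]?(dd)", r"\1 \2"))
```

With the Andorra settings shown earlier, the same helper would turn `500` into `AD500`, because the output template prepends the prefix to the first group.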
### Other country-specific configuration
There are some other configuration files where you can set localized settings
according to the assigned country. These are:
* [Place ranking configuration](Ranking.md)
Please see the linked documentation sections for more information.

View File

@@ -1,574 +0,0 @@
# Configuring the Import of OSM data
In the very first step of a Nominatim import, OSM data is loaded into the
database. Nominatim uses [osm2pgsql](https://osm2pgsql.org) for this task.
It comes with a [flex style](https://osm2pgsql.org/doc/manual.html#the-flex-output)
specifically tailored to filter and convert OSM data into Nominatim's
internal data representation. Nominatim ships with a few preset
configurations for this import, each resulting in a geocoding database with
a different level of detail. The
[Import section](../admin/Import.md#filtering-imported-data) explains
these default configurations in detail.
If you want to have more control over which OSM data is added to the database,
you can also create your own custom style. Create a new lua style file, put it
into your project directory and then set `NOMINATIM_IMPORT_STYLE` to the name
of the file. Custom style files can be used to modify the existing preset
configurations or to implement your own configuration from scratch.
The remainder of the page describes how the flex style works and how to
customize it.
## The `flex-base` lua module
The core of Nominatim's flex import configuration is the `flex-base` module.
It defines the table layout used by Nominatim and provides standard
implementations for the import callbacks that help with customizing
how OSM tags are used by Nominatim.
Every custom style must include this module to make sure that the correct
tables are created. Thus start your custom style as follows:
``` lua
local flex = require('flex-base')
```
### Using preset configurations
If you want to start with one of the existing presets, then you can import
its settings using the `import_topic()` function:
``` lua
local flex = require('flex-base')
flex.import_topic('streets')
```
The `import_topic` function takes an optional second configuration
parameter. The available options are explained in the
[themepark section](#using-osm2pgsql-themepark).
!!! note
You can also directly import the preset style files, e.g.
`local flex = require('import-street')`. It is not possible to
set extra configuration this way.
### How processing works
When Nominatim processes an OSM object, it looks for four kinds of tags:
The _main tags_ classify what kind of place the OSM object represents. One
OSM object can have more than one main tag. In that case, one database entry
is created for each main tag. _Name tags_ represent searchable names of the
place. _Address tags_ are used to compute the address hierarchy of the place.
Address tags are used for searching and for creating a display name of the place.
_Extra tags_ are any tags that are not directly related to search but
contain interesting additional information.
!!! danger
Some tags in the extratags category are used by Nominatim to better
classify the place. You want to make sure these are always present
in custom styles.
Configuring the style means deciding which key and/or key/value is used
in which category.
## Changing the recognized tags
The flex style offers a number of functions to set the classification of
each OSM tag. Most of these functions can also take a preset string instead
of a tag description. These presets describe common configurations that
are also used in the definition of the predefined styles. This section
lists the configuration functions and the accepted presets.
#### Key match lists
Some of the following functions take _key match lists_. These lists can
contain three kinds of strings to match against tag keys:
A string that ends in an asterisk `*` is a prefix match and accordingly matches
against any key that starts with the given string (minus the `*`).
A suffix match can be defined similarly with a string that starts with a `*`.
Any other string is matched exactly against tag keys.
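The three matching rules can be modelled in a few lines of Python. This is only a sketch of the rules as stated above, not the Lua code of the flex style; `key_matches` is a hypothetical name, and checking the prefix rule first for a string with an asterisk at both ends is an assumption.

```python
def key_matches(key, match_list):
    """Return True if the tag key matches any entry of a key match list."""
    for pattern in match_list:
        if pattern.endswith("*"):
            # Prefix match: compare against the pattern minus the trailing '*'.
            if key.startswith(pattern[:-1]):
                return True
        elif pattern.startswith("*"):
            # Suffix match: compare against the pattern minus the leading '*'.
            if key.endswith(pattern[1:]):
                return True
        elif key == pattern:
            # Exact match.
            return True
    return False


print(key_matches("name:de", ["name:*", "ref"]))
print(key_matches("old_name", ["*name"]))
print(key_matches("reference", ["ref"]))
```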
### Main tags
`set_main_tags()` and `modify_main_tags()` define which tags are used as main
tags. They take a lua table parameter which defines, for keys and key/value
combinations, how they are classified.
The following classifications are recognized:
| classification | meaning |
| :-------------- | :------ |
| always | Unconditionally use this tag as a main tag. |
| named | Consider as main tag, when the object has a primary name (see [names](#name-tags) below) |
| named_with_key | Consider as main tag, when the object has a primary name with a domain prefix. For example, if the main tag is `bridge=yes`, then it will only be added as an extra entry, if there is a tag `bridge:name[:XXX]` for the same object. If this property is set, all names that are not domain-specific are ignored. |
| fallback | Consider as main tag only when no other main tag was found. Fallback always implies `named`, i.e. fallbacks are only tried for objects with primary names. |
| delete | Completely ignore the tag in any further processing |
| extra | Move the tag to extratags and then ignore it for further processing |
| `<function>`| Advanced handling, see [below](#advanced-main-tag-handling) |
Each key in the table parameter defines an OSM tag key. The value may
be directly a classification as described above. Then the tag will
be considered a main tag for any possible value that is not further defined.
To further restrict which values are acceptable, give a table with the
permitted values and their kind of main tag. If the table contains a simple
value without key, then this is used as default for values that are not listed.
`set_main_tags()` will completely replace the current main tag configuration
with the new configuration. `modify_main_tags()` will merge the new
configuration with the existing one. Otherwise, the two functions behave
identically.
!!! example
``` lua
local flex = require('import-full')
flex.set_main_tags{
boundary = {administrative = 'named'},
highway = {'always', street_lamp = 'named', no = 'delete'},
landuse = 'fallback'
}
```
In this example an object with a `boundary` tag will only be included
when it has a value of `administrative`. Objects with `highway` tags are
always included with two exceptions: the troll tag `highway=no` is
deleted on the spot. And when the value is `street_lamp` then the object
must have a name, too. Finally, if a `landuse` tag is present then
it will be used independently of the concrete value when neither boundary
nor highway tags were found and the object is named.
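The lookup rules described above can be modelled in a few lines. The following Python sketch is only an illustration of the documented behaviour, not Nominatim's implementation: the function name `classify` is hypothetical, and the default value (the entry without a key in the Lua table) is represented here by the key `None`.

```python
# Illustrative model of the main tag configuration from the example above.
# A plain string applies to every value of the key; a dict lists explicit
# values, with the optional default stored under None (an assumption of
# this sketch, standing in for the keyless entry of the Lua table).
MAIN_TAGS = {
    'boundary': {'administrative': 'named'},
    'highway': {None: 'always', 'street_lamp': 'named', 'no': 'delete'},
    'landuse': 'fallback',
}


def classify(config, key, value):
    """Return the classification for key=value or None if it is no main tag."""
    entry = config.get(key)
    if entry is None:
        return None
    if isinstance(entry, str):
        return entry
    # Explicitly listed values win, otherwise use the default if present.
    return entry.get(value, entry.get(None))


print(classify(MAIN_TAGS, 'highway', 'residential'))
print(classify(MAIN_TAGS, 'boundary', 'postal_code'))
```

A `boundary` value other than `administrative` yields no classification at all because the table for `boundary` has no default entry.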
##### Presets
| Name | Description |
| :----- | :---------- |
| admin | Basic tag set collecting places and administrative boundaries. This set is needed also to ensure proper address computation and should therefore always be present. You can disable selected place types like `place=locality` after adding this set, if they are not relevant for your use case. |
| all_boundaries | Extends the set of recognized boundaries and places to all available ones. |
| natural | Tags for natural features like rivers and mountain peaks. |
| street/default | Tags for streets. Major streets are always included, minor ones only when they have a name. |
| street/car | Tags for all streets that can be used by a motor vehicle. |
| street/all | Includes all highway features named and unnamed. |
| poi/delete | Adds most POI features with and without name. Some frequent but very domain-specific values are excluded by deleting them. |
| poi/extra | Like 'poi/delete' but excluded values are moved to extratags. |
##### Advanced main tag handling
The groups described above are in fact only a preset for a filtering function
that is used to make the final decision how a pre-selected main tag is entered
into Nominatim's internal table. To further customize handling you may also
supply your own filtering function.
The function takes up to three parameters: a Place object of the object
being processed, the key of the main tag and the value of the main tag.
The function may return one of three values:
* `nil` or `false` causes the entry to be ignored
* the Place object causes the place to be added as is
* `Place.copy(names=..., address=..., extratags=...)` causes the
place to be entered into the database but with name/address/extratags
set to the given different values.
The Place object has some read-only values that can be used to determine
the handling:
* **object** is the original OSM object data handed in by osm2pgsql
* **admin_level** is the content of the admin_level tag, parsed into an integer and normalized to a value between 0 and 15
* **has_name** is a boolean indicating if the object has a primary name tag
* **names** is a table with the collected list of name tags
* **address** is a table with the collected list of address tags
* **extratags** is a table with the collected list of additional tags to save
!!! example
``` lua
local flex = require('flex-base')
flex.add_topic('street')
local function no_sidewalks(place, k, v)
    if place.object.tags.footway == 'sidewalk' then
        return false
    end
    -- default behaviour is to have all footways
    return place
end
flex.modify_main_tags{highway = {footway = no_sidewalks}}
```
This script adds a custom handler for `highway=footway`. Footways are only
included in the database when the object doesn't have a tag `footway=sidewalk`
indicating that it is just part of a larger street which should already
be indexed. Note that it is not necessary to check the key and value
of the main tag because the function is only used for the specific
main tag.
### Ignored tags
The function `ignore_keys()` sets the `delete` classification for keys.
This function takes a _key match list_ so that it is possible to exclude
groups of keys.
Note that full matches always take precedence over suffix matches, which
in turn take precedence over prefix matches.
!!! example
``` lua
local flex = require('flex-base')
flex.add_topic('admin')
flex.ignore_keys{'old_name', 'old_name:*'}
```
This example uses the `admin` preset with the exception that names
that are no longer in current use are ignored.
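The precedence rule for key match lists (full match, then suffix match, then prefix match) can be sketched as follows. This is a simplified Python model, not Nominatim's code. The function name `resolve` is hypothetical, and the sketch assumes that a trailing `*` marks a prefix match (as in `old_name:*` above) and a leading `*` marks a suffix match.

```python
def resolve(rules, key):
    """Return the classification for a tag key.

    rules maps a pattern to a classification. Full matches win over
    suffix matches, which in turn win over prefix matches.
    """
    if key in rules:
        return rules[key]
    # Suffix patterns: assumed to be written with a leading '*'.
    for pattern, classification in rules.items():
        if pattern.startswith('*') and key.endswith(pattern[1:]):
            return classification
    # Prefix patterns: written with a trailing '*'.
    for pattern, classification in rules.items():
        if pattern.endswith('*') and key.startswith(pattern[:-1]):
            return classification
    return None


# Hypothetical rule set mixing all three kinds of matches.
RULES = {
    'source:*': 'delete',
    '*:wikidata': 'extra',
    'source:name': 'extra',
}

print(resolve(RULES, 'source:wikidata'))
```

In this model `source:wikidata` matches both the prefix rule and the suffix rule, and the suffix rule decides the outcome.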
##### Presets
| Name | Description |
| :----- | :---------- |
| metatags | Tags with meta information about the OSM tag like source, notes and import sources. |
| name | Non-names that actually describe properties or name parts. These names can throw off search and should always be removed. |
| address | Extra `addr:*` tags that are not useful for Nominatim. |
### Tags for `extratags`
The function `add_for_extratags()` sets the `extra` classification for keys.
This function takes a
_key match list_ so that it is possible to move groups of keys to extratags.
Note that full matches always take precedence over suffix matches, which
in turn take precedence over prefix matches.
!!! example
``` lua
local flex = require('flex-base')
flex.add_topic('street')
flex.add_for_extratags{'surface', 'access', 'vehicle', 'maxspeed'}
```
This example uses the `street` preset but adds a couple of tags that
are of interest about the condition of the street.
##### Presets
| Name | Description |
| :----- | :---------- |
| required | Tags that Nominatim will use for various computations when present in extratags. Always include these. |
In addition, all [presets from ignored tags](#presets_1) are accepted.
### General pre-filtering
_(deprecated)_ `set_prefilters()` sets the `delete` and `extra`
classification for main tags.
This function removes all previously set main tags with `delete` and `extra`
classification and then adds the newly defined tags.
`set_prefilters()` takes a table with four optional fields:
* __delete_keys__ is a _key match list_ for tags that should be deleted
* __delete_tags__ contains a table of tag keys pointing to a list of tag
values. Tags with matching key/value pairs are deleted.
* __extra_keys__ is a _key match list_ for tags which should be saved into
extratags
* __extra_tags__ contains a table of tag keys pointing to a list of tag
values. Tags with matching key/value pairs are moved to extratags.
!!! danger "Deprecation warning"
Use of this function should be replaced with `modify_main_tags()` to
set the data from `delete_tags` and `extra_tags`, with `ignore_keys()`
for the `delete_keys` parameter and with `add_for_extratags()` for the
`extra_keys` parameter.
### Name tags
`set/modify_name_tags()` define the tags used for naming places. Name tags
can only be selected by their keys. The import script distinguishes
between primary and auxiliary names. A primary name is the given name of
a place. Having a primary name makes a place _named_. This is important
for main tags that are only included when a name is present. Auxiliary names
are identifiers like references. They may be searched for but should not
be included on their own.
The functions take a table with two optional fields `main` and `extra`.
They take _key match lists_ for primary and auxiliary names respectively.
A third field `house` can contain tags for names that appear in place of
house numbers in addresses. This field can only contain complete key names.
'house tags' are special in that they cause the OSM object to be added to
the database independently of the presence of other main tags.
`set_name_tags()` overwrites the current configuration, while
`modify_name_tags()` replaces the fields that are given. (Be aware that
the fields are replaced as a whole. `main = {'foo_name'}` will cause
`foo_name` to become the only recognized primary name. Any previously
defined primary names are forgotten.)
!!! example
``` lua
local flex = require('flex-base')
flex.set_main_tags{highway = {traffic_light = 'named'}}
flex.set_name_tags{main = {'name', 'name:*'},
extra = {'ref'}
}
```
This example creates a search index over traffic lights but will
only include those that have a common name and not those which just
have some reference ID from the city.
##### Presets
| Name | Description |
| :----- | :---------- |
| core | Basic set of recognized names for all places. |
| address | Additional names useful when indexing full addresses. |
| poi | Extended set of recognized names for pois. Use on top of the core set. |
### Address tags
`set/modify_address_tags()` defines the tags that will be used to build
up the address of an object. Address tags can only be chosen by their key.
The functions take a table with arbitrary fields, each defining
a key list or _key match list_. Some fields have a special meaning:
| Field | Type | Description |
| :---------| :-------- | :-----------|
| main | key list | Tags that make a full address object out of the OSM object. This is usually the house number or variants thereof. If a main address tag appears, then the object will always be included, if necessary with a fallback of `place=house`. If the key has a prefix of `addr:` or `is_in:` this will be stripped. |
| extra | key match list | Supplementary tags for addresses, tags like `addr:street`, `addr:city` etc. If the key has a prefix of `addr:` or `is_in:` this will be stripped. |
| interpolation | key list | Tags that identify address interpolation lines. |
| country | key match list | Tags that may contain the country the place is in. The first found value with a two-letter code will be accepted, all other values are discarded. |
| _other_ | key match list | Summary field. If a key matches the key match list, then its value will be added to the address tags with the name of the field as key. If multiple tags match, then an arbitrary one wins. |
`set_address_tags()` overwrites the current configuration, while
`modify_address_tags()` replaces the fields that are given. (Be aware that
the fields are replaced as a whole.)
!!! example
``` lua
local flex = require('import-full')
flex.set_address_tags{
main = {'addr:housenumber'},
extra = {'addr:*'},
postcode = {'postal_code', 'postcode', 'addr:postcode'},
country = {'country_code', 'ISO3166-1'}
}
```
In this example all tags which begin with `addr:` will be saved in
the address tag list. If one of the tags is `addr:housenumber`, the
object will fall back to being entered as `place=house` in the database
unless another main tag of interest is found.
Tags with keys `country_code` and `ISO3166-1` are saved with their
value under `country` in the address tag list. The same thing happens
to postcodes, they will always be saved under the key `postcode` thus
normalizing the multitude of keys that are used in the OSM database.
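The effect of the configuration above on a single object can be illustrated with a small Python model. It is a sketch of the documented behaviour only: the function name `collect_address` is hypothetical, and the order in which overlapping rules are checked is an assumption of this sketch.

```python
POSTCODE_KEYS = ('postal_code', 'postcode', 'addr:postcode')
COUNTRY_KEYS = ('country_code', 'ISO3166-1')


def collect_address(tags):
    """Sort OSM tags into an address tag list as described in the example."""
    address = {}
    for key, value in tags.items():
        if key in POSTCODE_KEYS:
            # All postcode variants end up under the same key.
            address['postcode'] = value
        elif key in COUNTRY_KEYS:
            # Only the first value with a two-letter code is accepted.
            if len(value) == 2 and 'country' not in address:
                address['country'] = value
        elif key.startswith('addr:'):
            # The 'addr:' prefix is stripped from the key.
            address[key[len('addr:'):]] = value
    return address


tags = {
    'addr:housenumber': '12',
    'addr:street': 'Main St',
    'postal_code': '12345',
    'country_code': 'Germany',
    'ISO3166-1': 'DE',
    'name': 'Example',
}
print(collect_address(tags))
```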
##### Presets
| Name | Description |
| :----- | :---------- |
| core | Basic set of tags needed to recognize address relationship for any place. Always include this. |
| houses | Additional set of tags needed to recognize proper addresses |
### Handling of unclassified tags
`set_unused_handling()` defines what to do with tags that remain after all tags
have been classified using the functions above. There are two ways in
which the function can be used:
`set_unused_handling(delete_keys = ..., delete_tags = ...)` deletes all
keys that match the descriptions in the parameters and moves all remaining
tags into the extratags list.
`set_unused_handling(extra_keys = ..., extra_tags = ...)` moves all tags
matching the parameters into the extratags list and then deletes the remaining
tags. For the format of the parameters see the description in `set_prefilters()`
above.
When no special handling is set, then unused tags will be discarded with one
exception: place tags are kept in extratags for administrative boundaries.
When using a custom setting, you should also make sure that the place tag
is added for extratags.
!!! example
``` lua
local flex = require('import-full')
flex.set_address_tags{
main = {'addr:housenumber'},
extra = {'addr:*', 'tiger:county'}
}
flex.set_unused_handling{delete_keys = {'tiger:*'}}
```
In this example all remaining tags except those beginning with `tiger:`
are moved to the extratags list. Note that it is not possible to
already delete the tiger tags with `set_prefilters()` because that
would remove tiger:county before the address tags are processed.
## Customizing osm2pgsql callbacks
osm2pgsql expects the flex style to implement three callbacks, one process
function per OSM type. If you want to implement special handling for
certain OSM types, you can override the default implementations provided
by the flex-base module.
### Enabling additional relation types
OSM relations can represent very diverse
[types of real-world objects](https://wiki.openstreetmap.org/wiki/Key:type). To
be able to process them correctly, Nominatim needs to understand how to
create a geometry for each type. By default, the script knows how to
process relations of type `multipolygon`, `boundary` and `waterway`. All
other relation types are ignored.
To add other relation types, set the entry for the type in `RELATION_TYPES`
to the kind of geometry that should be created. The following
kinds of geometries can be used:
* __relation_as_multipolygon__ creates a (Multi)Polygon from the ways in
the relation. If the ways do not form a valid area, then the object is
silently discarded.
* __relation_as_multiline__ creates a (Multi)LineString from the ways in
the relation. Ways are combined as much as possible without any regard
to their order in the relation.
!!! Example
``` lua
local flex = require('import-full')
flex.RELATION_TYPES['site'] = flex.relation_as_multipolygon
```
With this line relations of `type=site` will be included in the index
according to main tags found. This only works when the site relation
resolves to a valid area. Nodes in the site relation are not part of the
geometry.
### Adding additional logic to processing functions
The default processing functions are also exported by the flex-base module
as `process_node`, `process_way` and `process_relation`. These can be used
to implement your own processing functions with some additional processing
logic.
!!! Example
``` lua
local flex = require('import-full')
function osm2pgsql.process_relation(object)
    if object.tags.boundary ~= 'administrative' or object.tags.admin_level ~= '2' then
        flex.process_relation(object)
    end
end
```
This example discards all country-level boundaries and uses standard
handling for everything else. This can be useful if you want to use
your own custom country boundaries.
### Customizing the main processing function
!!! danger "Deprecation Warning"
The style used to allow overwriting the internal processing function
`process_tags()`. While this is currently still possible, it is no longer
encouraged and may stop working in future versions. The internal
`Place` class should now be considered read-only.
## Using osm2pgsql-themepark
The Nominatim osm2pgsql style is designed so that it can also be used as
a theme for [osm2pgsql-themepark](https://osm2pgsql.org/themepark/). This
makes it easy to combine Nominatim with other projects like
[openstreetmap-carto](https://github.com/gravitystorm/openstreetmap-carto)
in the same database.
To set up one of the preset styles, simply include a topic with the same name:
``` lua
local themepark = require('themepark')
themepark:add_topic('nominatim/address')
```
Themepark topics offer two configuration options:
* **street_theme** selects one of the sub topics for streets:
* _default_ - include all major streets and named minor paths
* _car_ - include all streets physically usable by cars
* _all_ - include all major streets and minor paths
* **with_extratags**, when set to a truthy value, adds tags that are
not specifically used for address or naming to the
extratags column
The customization functions described in the
[Changing recognized tags](#changing-the-recognized-tags) section
are available from the theme. To access the theme you need to explicitly initialize it.
!!! Example
``` lua
local themepark = require('themepark')
themepark:add_topic('nominatim/full', {with_extratags = true})
local flex = themepark:init_theme('nominatim')
flex.modify_main_tags{amenity = {
    waste_basket = 'delete'}
}
```
This example uses the full Nominatim configuration but disables
importing waste baskets.
You may also write a new configuration from scratch. Simply omit including
a Nominatim topic and only call the required customization functions.
Customizing the osm2pgsql processing functions as explained
[above](#adding-additional-logic-to-processing-functions) is not possible
when running under themepark. Instead include other topics that make the
necessary modifications or add an additional processor before including
the Nominatim topic.
!!! Example
``` lua
local themepark = require('themepark')
local function discard_country_boundaries(object)
    if object.tags.boundary == 'administrative' and object.tags.admin_level == '2' then
        return 'stop'
    end
end
themepark:add_proc('relation', discard_country_boundaries)
-- Order matters here. The topic needs to be added after the custom callback.
themepark:add_topic('nominatim/full', {with_extratags = true})
```
Discarding country-level boundaries when running under themepark.
## osm2pgsql gazetteer output
Nominatim still allows you to configure the gazetteer output to remain
backwards compatible with older imports. It will be automatically used
when the style file name ends in `.style`. For documentation of the
old import style, please refer to the documentation of older releases
of Nominatim. Do not use the gazetteer output for new imports. There is no
guarantee that new versions of Nominatim are fully compatible with the
gazetteer output.
## Changing the style of existing databases
There is usually no issue changing the style of a database that is already
imported and now kept up-to-date with change files. Just be aware that any
change in the style applies to updates only. If you want to change the data
that is already in the database, then a reimport is necessary.


@@ -1,55 +0,0 @@
## Importance
Search requests can yield multiple results which match equally well with
the original query. In such case Nominatim needs to order the results
according to a different criterion: importance. This is a measure for how
likely it is that a user will search for a given place. This section explains
the sources Nominatim uses for computing importance of a place and how to
customize them.
### How importance is computed
The main value for importance is derived from page ranking values for Wikipedia
pages for a place. For places that do not have their own
Wikipedia page, a formula is used that derives a static importance from the
place's [search rank](../customize/Ranking.md#search-rank).
In a second step, a secondary importance value is added which is meant to
represent how well-known the general area is where the place is located. It
functions as a tie-breaker between places with very similar primary
importance values.
nominatim.org has preprocessed importance tables for the
[primary Wikipedia rankings](https://nominatim.org/data/wikimedia-importance.sql.gz)
and for [secondary importance](https://nominatim.org/data/wikimedia-secondary-importance.sql.gz)
based on Wikipedia importance of the administrative areas.
The source code for creating these files is available in the GitHub projects
[osm-search/wikipedia-wikidata](https://github.com/osm-search/wikipedia-wikidata)
and
[osm-search/secondary-importance](https://github.com/osm-search/secondary-importance).
### Customizing secondary importance
The secondary importance is implemented as a simple
[Postgis raster](https://postgis.net/docs/raster.html) table, where Nominatim
looks up the value for the coordinates of the centroid of a place. You can
provide your own secondary importance raster in form of an SQL file named
`secondary_importance.sql.gz` in your project directory.
The SQL file needs to drop and (re)create a table `secondary_importance` which
must as a minimum contain a column `rast` of type `raster`. The raster must
be in EPSG:4326 and contain 16bit unsigned ints
(`raster_constraint_pixel_types(rast) = '{16BUI}'`). Any other columns in the
table will be ignored. You must furthermore create an index as follows:
```
CREATE INDEX ON secondary_importance USING gist(ST_ConvexHull(rast))
```
The following raster2pgsql command will create a table from a tiff file
that conforms to the requirements:
```
raster2pgsql -I -C -Y -d -t 128x128 input.tiff public.secondary_importance
```


@@ -1,22 +0,0 @@
Nominatim comes with a predefined set of configuration options that should
work for most standard installations. If you have special requirements, there
are many places where the configuration can be adapted. This chapter describes
the following configurable parts:
* [Global Settings](Settings.md) has a detailed description of all parameters that
can be set in your local `.env` configuration
* [Import styles](Import-Styles.md) explains how to write your own import style
in order to control what kind of OSM data will be imported
* [API Result Formatting](Result-Formatting.md) shows how to change the
output of the Nominatim API
* [Place ranking](Ranking.md) describes the configuration around classifying
places in terms of their importance and their role in an address
* [Tokenizers](Tokenizers.md) describes the configuration of the module
responsible for analysing and indexing names
* [Special Phrases](Special-Phrases.md) are common nouns or phrases that
can be used in search to identify a class of places
There are also guides for adding the following external data:
* [US house numbers from the TIGER dataset](Tiger.md)
* [External postcodes](Postcodes.md)


@@ -1,37 +0,0 @@
# External postcode data
Nominatim creates a table of known postcode centroids during import. This table
is used for searches of postcodes and for adding postcodes to places where the
OSM data does not provide one. These postcode centroids are mainly computed
from the OSM data itself. In addition, Nominatim supports reading postcode
information from an external CSV file, to supplement the postcodes that are
missing in OSM.
To enable external postcode support, simply put one CSV file per country into
your project directory and name it `<CC>_postcodes.csv`. `<CC>` must be the
two-letter country code for which to apply the file. The file may also be
gzipped. Then it must be called `<CC>_postcodes.csv.gz`.
The CSV file must use commas as a delimiter and have a header line. Nominatim
expects three columns to be present: `postcode`, `lat` and `lon`. All other
columns are ignored. `lon` and `lat` must describe the x and y coordinates of the
postcode centroids in WGS84.
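Before placing a file into the project directory, it can be useful to check that it meets these requirements. The following Python sketch is not part of Nominatim. The function name `check_postcode_csv` is hypothetical and the checks only cover what is stated above: comma delimiter, header line, the three required columns and plausible WGS84 coordinates.

```python
import csv
import io

REQUIRED_COLUMNS = ('postcode', 'lat', 'lon')


def check_postcode_csv(stream):
    """Return the number of data rows, raise ValueError on invalid input."""
    reader = csv.DictReader(stream)  # comma-delimited, first line is the header
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    count = 0
    for row in reader:
        lat, lon = float(row['lat']), float(row['lon'])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"line {reader.line_num}: coordinates out of range")
        count += 1
    return count


# Additional columns such as 'note' are allowed and simply ignored.
sample = io.StringIO("postcode,lat,lon,note\n10115,52.53,13.38,Berlin\n")
print(check_postcode_csv(sample))
```

For a real file, open it with `open('de_postcodes.csv', newline='')` (or `gzip.open(..., 'rt', newline='')` for the gzipped variant) and pass the file object to the function.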
The postcode files are loaded only when there is data for the given country
in your database. For example, if there is a `us_postcodes.csv` file in your
project directory but you import only an excerpt of Italy, then the US postcodes
will simply be ignored.
As a rule, the external postcode data should be put into the project directory
**before** starting the initial import. Still, you can add, remove and update the
external postcode data at any time. Simply
run:
```
nominatim refresh --postcodes
```
to make the changes visible in your database. Be aware, however, that the changes
only have an immediate effect on searches for postcodes. Postcodes that were
added to places are only updated when they are reindexed. That usually happens
only during replication updates.


@@ -1,139 +0,0 @@
# Place Ranking in Nominatim
Nominatim uses two metrics to rank a place: search rank and address rank.
This chapter explains what place ranking means and how it can be customized.
## Search rank
The search rank describes the extent and importance of a place. It is used
when ranking search results. Simply put, if there are two results for a
search query which are otherwise equal, then the result with the _lower_
search rank will appear higher in the result list.
Search ranks are not so important these days because many well-known
places use the Wikipedia importance ranking instead.
The following table gives an overview of the kind of features that Nominatim
expects for each rank:
rank | typical place types | extent
-------|---------------------------------|-------
1-3 | oceans, continents | -
4 | countries | -
5-9 | states, regions, provinces | -
10-12 | counties | -
13-16 | cities, municipalities, islands | 15 km
17-18 | towns, boroughs | 4 km
19 | villages, suburbs | 2 km
20 | hamlets, farms, neighbourhoods | 1 km
21-25 | isolated dwellings, city blocks | 500 m
The extent column describes how far a feature is assumed to reach when it
is mapped only as a point. Larger features like countries and states are usually
available with their exact area in the OpenStreetMap data. That is why no extent
is given.
## Address rank
The address rank describes where a place shows up in an address hierarchy.
Usually only administrative boundaries and place nodes and areas are
eligible to be part of an address. Places that should not appear in the
address must have an address rank of 0.
The following table gives an overview how ranks are mapped to address parts:
rank | address part
-------------|-------------
1-3 | _unused_
4 | country
5-9 | state
10-12 | county
13-16 | city
17-21 | suburb
22-24 | neighbourhood
25 | squares, farms, localities
26-27 | street
28-30 | POI/house number
The country rank 4 usually doesn't show up in the address parts of an object.
The country is determined indirectly from the country code.
Ranks 5-24 can be assigned more or less freely. They make up the major part
of the address.
Rank 25 is also an addressing rank but it is special because while it can be
the parent to a POI with an addr:place of the same name, it cannot be a parent
to streets. Use it for place features that are technically on the same level
as a street (e.g. squares, city blocks) or for places that should not normally
appear in an address unless explicitly tagged so (e.g. `place=locality`, which
should be uninhabited and as such not addressable).
The street ranks 26 and 27 are handled slightly differently. Only one object
from these ranks shows up in an address.
For POI level objects like shops, buildings or house numbers always use rank 30.
Rank 28 is reserved for house number interpolations. Rank 29 is for internal use
only.
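The mapping from address rank to address part can be written down as a simple lookup. The Python sketch below merely restates the table above. The names `ADDRESS_PARTS` and `address_part` are hypothetical and do not exist in Nominatim.

```python
# (lowest rank, highest rank, address part) as listed in the table above.
ADDRESS_PARTS = [
    (4, 4, 'country'),
    (5, 9, 'state'),
    (10, 12, 'county'),
    (13, 16, 'city'),
    (17, 21, 'suburb'),
    (22, 24, 'neighbourhood'),
    (25, 25, 'squares, farms, localities'),
    (26, 27, 'street'),
    (28, 30, 'POI/house number'),
]


def address_part(rank):
    """Return the address part for an address rank or None if there is none."""
    for low, high, part in ADDRESS_PARTS:
        if low <= rank <= high:
            return part
    # Rank 0 (not part of an address) and the unused ranks 1-3.
    return None


print(address_part(16))
```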
## Rank configuration
Search and address ranks are assigned to a place when it is first imported
into the database. There are a few hard-coded rules for the assignment:
* postcodes follow special rules according to their length
* boundaries that are not areas and railway=rail are dropped completely
* the following are always search rank 30 and address rank 0:
* highway nodes
* landuse that is not an area
Other than that, the ranks can be freely assigned via the JSON file according
to their type and the country they are in. The name of the config file to be
used can be changed with the setting `NOMINATIM_ADDRESS_LEVEL_CONFIG`.
The address level configuration must consist of an array of configuration
entries, each containing a tag definition and an optional country array:
```
[ {
    "tags" : {
      "place" : {
        "county" : 12,
        "city" : 16
      },
      "landuse" : {
        "residential" : 22,
        "" : 30
      }
    }
  },
  {
    "countries" : [ "ca", "us" ],
    "tags" : {
      "boundary" : {
        "administrative8" : 18,
        "administrative9" : 20
      },
      "landuse" : {
        "residential" : [22, 0]
      }
    }
  }
]
```
The `countries` field contains a list of countries (as ISO 3166-1 alpha 2 code)
for which the definition applies. When the field is omitted, then the
definition is used as a fallback, when nothing more specific for a given
country exists.
`tags` contains the ranks for key/value pairs. The ranks can be either a
single number, which is then used as both search and address rank, or an array
of search and address rank (in that order). The value may be left empty.
Then the rank is used when no more specific value is found for the given
key.
Countries and key/value combinations may appear in multiple definitions. Just
make sure that each combination of country/key/value appears only once per
file. Otherwise the import will fail with a UNIQUE INDEX constraint
violation.
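To catch duplicate combinations before an import, the configuration can be checked with a short script. The following Python sketch is not part of Nominatim. The function names `load_levels` and `lookup` are hypothetical, and the fallback order used in `lookup` (country-specific entries first, then entries without a country, each time trying the exact value before the empty value) is an assumption of this sketch.

```python
import json


def load_levels(text):
    """Parse an address level configuration.

    Returns a dict {(country, key, value): (search_rank, address_rank)} and
    raises ValueError when a country/key/value combination appears twice.
    Entries without a 'countries' field are stored under country None.
    """
    levels = {}
    for entry in json.loads(text):
        for country in entry.get('countries', [None]):
            for key, values in entry['tags'].items():
                for value, ranks in values.items():
                    if isinstance(ranks, list):
                        search, address = ranks
                    else:
                        search = address = ranks
                    ident = (country, key, value)
                    if ident in levels:
                        raise ValueError(f"duplicate definition: {ident}")
                    levels[ident] = (search, address)
    return levels


def lookup(levels, country, key, value):
    """Return (search_rank, address_rank) or None when nothing matches."""
    for c in (country, None):
        for v in (value, ''):
            if (c, key, v) in levels:
                return levels[(c, key, v)]
    return None


CONFIG = """
[ { "tags" : { "place" : { "county" : 12, "city" : 16 },
               "landuse" : { "residential" : 22, "" : 30 } } },
  { "countries" : [ "ca", "us" ],
    "tags" : { "landuse" : { "residential" : [22, 0] } } } ]
"""

levels = load_levels(CONFIG)
print(lookup(levels, 'us', 'landuse', 'residential'))
print(lookup(levels, 'de', 'landuse', 'residential'))
```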


@@ -1,259 +0,0 @@
# Changing the Appearance of Results in the Server API
The Nominatim Server API offers a number of formatting options that
present search results in [different output formats](../api/Output.md).
These results only contain a subset of all the information that Nominatim
has about the result. This page explains how to adapt the result output
or add additional result formatting.
## Defining custom result formatting
To change the result output, you need to place a file `api/v1/format.py`
into your project directory. This file needs to define a single variable
`dispatch` containing a [FormatDispatcher](#formatdispatcher). This class
serves to collect the functions for formatting the different result types
and offers helper functions to apply the formatters.
There are two ways to define the `dispatch` variable. If you want to reuse
the default output formatting and just make some changes or add an additional
format type, then import the dispatch object from the default API:
``` python
from nominatim_api.v1.format import dispatch as dispatch
```
If you prefer to define a completely new result output, then you can
create an empty dispatcher object:
``` python
from nominatim_api import FormatDispatcher
dispatch = FormatDispatcher()
```
## The formatting function
The dispatcher organises the formatting functions by format and result type.
The format corresponds to the `format` parameter of the API. It can contain
one of the predefined format names or you can invent your own new format.
API calls return data classes or an array of a data class which represent
the result. You need to make sure there are formatters defined for the
following result types:
* StatusResult (single object, returned by `/status`)
* DetailedResult (single object, returned by `/details`)
* SearchResults (list of objects, returned by `/search`)
* ReverseResults (list of objects, returned by `/reverse` and `/lookup`)
* RawDataList (simple object, returned by `/deletable` and `/polygons`)
A formatter function has the following signature:
``` python
def format_func(result: ResultType, options: Mapping[str, Any]) -> str
```
The options dictionary contains additional information about the original
query. See the [reference below](#options-for-different-result-types)
about the possible options.
To set the result formatter for a certain result type and format, you need
to write the format function and decorate it with the
[`format_func`](#nominatim_api.FormatDispatcher.format_func)
decorator.
For example, let us extend the result for the status call in text format
and add the server URL. Such a formatter would look like this:
``` python
from nominatim_api import StatusResult
@dispatch.format_func(StatusResult, 'text')
def _format_status_text(result, _):
    header = 'Status for server nominatim.openstreetmap.org'
    if result.status:
        return f"{header}\n\nERROR: {result.message}"
    return f"{header}\n\nOK"
```
If your dispatcher is derived from the default one, then this definition
will overwrite the original formatter function. This way it is possible
to customize the output of selected results.
## Adding new formats
You may also define a completely different output format. This is as simple
as adding formatting functions for all result types using the custom
format name:
``` python
from nominatim_api import StatusResult
@dispatch.format_func(StatusResult, 'chatty')
def _format_status_text(result, _):
    if result.status:
        return f"The server is currently not running. {result.message}"
    return "Good news! The server is running just fine."
```
That's all. Nominatim will automatically pick up the new format name and
will allow the user to use it. There is no need to implement formatter
functions for all the result types, when you invent a new one. The
available formats will be determined for each API endpoint separately.
To find out which formats are available, you can use the `--list-formats`
option of the CLI tool:
```
me@machine:planet-project$ nominatim status --list-formats
2024-08-16 19:54:00: Using project directory: /home/nominatim/planet-project
text
json
chatty
debug
me@machine:planet-project$
```
The `debug` format listed in the last line will always appear. It is a
special format that enables debug output via the command line (the same
as the `debug=1` parameter enables for the server API). To not clash
with this built-in function, you shouldn't name your own format 'debug'.
### Content type of new formats
All responses will be returned with the content type application/json by
default. If your format produces a different content type, you need
to configure the content type with the `set_content_type()` function.
For example, the 'chatty' format above returns just simple text. So the
content type should be set up as:
``` python
from nominatim_api.server.content_types import CONTENT_TEXT
dispatch.set_content_type('chatty', CONTENT_TEXT)
```
The `content_types` module used above provides constants for the most
frequent content types. You may also set the content type to an arbitrary
string if the content type you need is not among them.
## Formatting error messages
Any exception thrown during processing of a request is given to
a special error formatting function. It takes the requested content type,
the error message and the status code. It should return the error message
in a form appropriate for the given content type.
You can overwrite the default formatting function with the decorator
`error_format_func`:
``` python
import nominatim_api.server.content_types as ct

@dispatch.error_format_func
def _format_error(content_type: str, msg: str, status: int) -> str:
    # Wrap the message according to the requested content type.
    if content_type == ct.CONTENT_XML:
        return f"""<?xml version="1.0" encoding="UTF-8" ?>
<message>{msg}</message>
"""
    if content_type == ct.CONTENT_JSON:
        return f'"{msg}"'

    # Fall back to plain text for every other content type.
    return f"ERROR: {msg}"
```
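Note that the example above interpolates `msg` verbatim, so a message containing quotes or angle brackets may yield invalid JSON or XML. The following is a minimal sketch of an escaping variant, not part of Nominatim itself. It is written as a plain function so that it runs on its own; the name `format_error_escaped` is hypothetical and the two content type strings are assumed stand-ins for the constants in `nominatim_api.server.content_types`:

```python
import json
from xml.sax.saxutils import escape

# Assumed values, stand-ins for ct.CONTENT_XML and ct.CONTENT_JSON.
CONTENT_XML = 'text/xml; charset=utf-8'
CONTENT_JSON = 'application/json; charset=utf-8'


def format_error_escaped(content_type: str, msg: str, status: int) -> str:
    """Format an error message, escaping it for the target content type."""
    if content_type == CONTENT_XML:
        # escape() replaces &, < and > so the message cannot break the markup.
        return ('<?xml version="1.0" encoding="UTF-8" ?>\n'
                f'<message>{escape(msg)}</message>\n')
    if content_type == CONTENT_JSON:
        # json.dumps() adds the surrounding quotes and escapes inner ones.
        return json.dumps(msg)
    # Plain text needs no escaping.
    return f"ERROR: {msg}"
```

To use such a function with a dispatcher, register it with the `error_format_func` decorator in the same way as `_format_error` above.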
## Debugging custom formatters
The easiest way to try out your custom formatter is by using the Nominatim
CLI commands. Custom formats can be chosen with the `--format` parameter:
```
me@machine:planet-project$ nominatim status --format chatty
2024-08-16 19:54:00: Using project directory: /home/nominatim/planet-project
Good news! The server is running just fine.
me@machine:planet-project$
```
They will also emit full error messages when there is a problem with the
code you need to debug.
!!! danger
In some cases, when you make an error with your import statement, the
CLI will not give you an error but instead tell you that the API
commands are no longer available:
me@machine: nominatim status
usage: nominatim [-h] [--version] {import,freeze,replication,special-phrases,add-data,index,refresh,admin} ...
nominatim: error: argument subcommand: invalid choice: 'status'
This happens because the CLI tool is meant to still work when the
nominatim-api package is not installed. Import errors involving
`nominatim_api` are interpreted as "package not installed".
Use the help command to find out which is the offending import that
could not be found:
me@machine: nominatim -h
... [other help text] ...
Nominatim API package not found (was looking for module: nominatim_api.xxx).
## Reference
### FormatDispatcher
::: nominatim_api.FormatDispatcher
options:
heading_level: 6
group_by_category: False
### JsonWriter
::: nominatim_api.utils.json_writer.JsonWriter
options:
heading_level: 6
group_by_category: False
### Options for different result types
This section lists the options that may be handed in with the different result
types in the v1 version of the Nominatim API.
#### StatusResult
_None._
#### DetailedResult
| Option | Description |
|-----------------|-------------|
| locales | [Locale](../library/Result-Handling.md#locale) object for the requested language(s) |
| group_hierarchy | Setting of [group_hierarchy](../api/Details.md#output-details) parameter |
| icon_base_url | (optional) URL pointing to icons as set in [NOMINATIM_MAPICON_URL](Settings.md#nominatim_mapicon_url) |
#### SearchResults
| Option | Description |
|-----------------|-------------|
| query | Original query string |
| more_url | URL for requesting additional results for the same query |
| exclude_place_ids | List of place IDs already returned |
| viewbox | Setting of [viewbox](../api/Search.md#result-restriction) parameter |
| extratags | Setting of [extratags](../api/Search.md#output-details) parameter |
| namedetails | Setting of [namedetails](../api/Search.md#output-details) parameter |
| addressdetails | Setting of [addressdetails](../api/Search.md#output-details) parameter |
#### ReverseResults
| Option | Description |
|-----------------|-------------|
| query | Original query string |
| extratags | Setting of [extratags](../api/Search.md#output-details) parameter |
| namedetails | Setting of [namedetails](../api/Search.md#output-details) parameter |
| addressdetails | Setting of [addressdetails](../api/Search.md#output-details) parameter |
#### RawDataList
_None._


@@ -1,60 +0,0 @@
A Nominatim database can be converted into an SQLite database and used as
a read-only source for geocoding queries. This section describes how to
create and use an SQLite database.
!!! danger
This feature is in an experimental state at the moment. Use at your own
risk.
## Installing prerequisites
To use an SQLite database, you need to install:
* SQLite (>= 3.30)
* Spatialite (> 5.0.0)
* aiosqlite
On Ubuntu/Debian, you can run:
sudo apt install sqlite3 libsqlite3-mod-spatialite libspatialite7
Install the aiosqlite Python package in your virtual environment:
/srv/nominatim-venv/bin/pip install aiosqlite
## Creating a new SQLite database
Nominatim cannot import directly into an SQLite database. Instead, you have to
first create a geocoding database in PostgreSQL by running a
[regular Nominatim import](../admin/Import.md).
Once this is done, the database can be converted to SQLite with
nominatim convert -o mydb.sqlite
This will create a database where all geocoding functions are available.
Depending on what functions you need, the database can be made smaller:
* `--without-reverse` omits indexes only needed for reverse geocoding
* `--without-search` omits tables and indexes used for forward search
* `--without-details` leaves out extra information only available in the
details API
## Using an SQLite database
Once you have created the database, you can use it by simply pointing the
database DSN to the SQLite file:
NOMINATIM_DATABASE_DSN=sqlite:dbname=mydb.sqlite
Please note that SQLite support is only available for the Python frontend. To
use the test server with an SQLite database, you therefore need to switch
the frontend engine:
nominatim serve --engine falcon
You need to install falcon or starlette for this, depending on which engine
you choose.
The CLI query commands and the library interface already use the new Python
frontend and therefore work right out of the box.


@@ -1,665 +0,0 @@
This section provides a reference of all configuration parameters that can
be used with Nominatim.
# Configuring Nominatim
Nominatim uses [dotenv](https://github.com/theskumar/python-dotenv) to manage
its configuration settings. There are two means to set configuration
variables: through an `.env` configuration file or through an environment
variable.
The `.env` configuration file needs to be placed into the
[project directory](../admin/Import.md#creating-the-project-directory). It
must contain configuration parameters in `<parameter>=<value>` format.
Please refer to the dotenv documentation for details.
The configuration options may also be set in the form of shell environment
variables. This is particularly useful when you want to temporarily change
a configuration option. For example, to force the replication process to
download the next change, you can temporarily disable the update interval:
NOMINATIM_REPLICATION_UPDATE_INTERVAL=0 nominatim replication --once
If a configuration option is defined both in the `.env` file and as an
environment variable, then the latter takes precedence.
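The precedence rule can be illustrated with a small sketch. It does not use Nominatim's own configuration code; the function `effective_setting` and the sample values are hypothetical and only model the rule that the environment wins over the `.env` file:

```python
import os
from typing import Mapping, Optional


def effective_setting(name: str,
                      dotenv_values: Mapping[str, str],
                      environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the value of a setting, preferring the environment over .env."""
    if name in environ:
        return environ[name]
    return dotenv_values.get(name)


# Hypothetical content of a project's .env file, already parsed.
dotenv_values = {'NOMINATIM_REPLICATION_UPDATE_INTERVAL': '75'}

# Simulates running with NOMINATIM_REPLICATION_UPDATE_INTERVAL=0 in the shell.
shell_env = {'NOMINATIM_REPLICATION_UPDATE_INTERVAL': '0'}

print(effective_setting('NOMINATIM_REPLICATION_UPDATE_INTERVAL',
                        dotenv_values, shell_env))
```

With the sample values, the lookup yields `0`, the value from the simulated shell environment.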
## Configuration Parameter Reference
### Import and Database Settings
#### NOMINATIM_DATABASE_DSN
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Database connection string |
| **Format:** | string: `pgsql:<param1>=<value1>;<param2>=<value2>;...` |
| **Default:** | pgsql:dbname=nominatim |
| **After Changes:** | run `nominatim refresh --website` |
Sets the connection parameters for the Nominatim database. At a minimum
the name of the database (`dbname`) is required. You can set any additional
parameter that is understood by libpq. See the [Postgres documentation](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS) for a full list.
!!! note
It is usually recommended not to set the password directly in this
configuration parameter. Use a
[password file](https://www.postgresql.org/docs/current/libpq-pgpass.html)
instead.
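To make the DSN format more concrete, here is a minimal sketch that splits such a string into its parts. The helper `parse_dsn` is hypothetical and is not how Nominatim parses the setting; in particular, it ignores any quoting rules for values:

```python
from typing import Dict


def parse_dsn(dsn: str) -> Dict[str, str]:
    """Split a '<driver>:<key>=<value>;...' string into its parts."""
    driver, _, params = dsn.partition(':')
    result = {'driver': driver}
    for part in params.split(';'):
        if part:  # tolerate a trailing semicolon
            key, _, value = part.partition('=')
            result[key.strip()] = value.strip()
    return result


print(parse_dsn('pgsql:dbname=nominatim;host=localhost;port=5432'))
```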
#### NOMINATIM_DATABASE_WEBUSER
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Database query user |
| **Format:** | string |
| **Default:** | www-data |
| **After Changes:** | cannot be changed after import |
Defines the name of the database user that will run search queries. Usually
this is the user under which the webserver is executed. The Postgres user
needs to be set up before starting the import.
Nominatim grants minimal rights to this user to all tables that are needed
for running geocoding queries.
#### NOMINATIM_TOKENIZER
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Tokenizer used for normalizing and parsing queries and names |
| **Format:** | string |
| **Default:** | icu |
| **After Changes:** | cannot be changed after import |
Sets the tokenizer type to use for the import. For more information on
available tokenizers and how they are configured, see
[Tokenizers](../customize/Tokenizers.md).
#### NOMINATIM_TOKENIZER_CONFIG
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Configuration file for the tokenizer |
| **Format:** | path |
| **Default:** | _empty_ (default file depends on tokenizer) |
| **After Changes:** | see documentation for each tokenizer |
Points to the file with additional configuration for the tokenizer.
See the [Tokenizer](../customize/Tokenizers.md) descriptions for details
on the file format.
If a relative path is given, then the file is searched first relative to the
project directory and then in the global settings directory.
#### NOMINATIM_LIMIT_REINDEXING
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Avoid invalidating large areas |
| **Format:** | bool |
| **Default:** | yes |
Nominatim computes the address of each place at indexing time. This makes
searches faster but also means that more objects need to
be invalidated when the data changes. For example, changing the name of
the state of Florida would require recomputing every single address point
in the state to make the new name searchable in conjunction with addresses.
Setting this option to 'yes' means that Nominatim skips reindexing of contained
objects when the area becomes too large.
#### NOMINATIM_LANGUAGES
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Restrict search languages |
| **Format:** | string: comma-separated list of language codes |
| **Default:** | _empty_ |
Normally Nominatim will include all language variants of name:XX
in the search index. Set this to a comma separated list of language
codes, to restrict import to a subset of languages.
Currently only affects the initial import of country names and special phrases.
#### NOMINATIM_USE_US_TIGER_DATA
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Enable searching for Tiger house number data |
| **Format:** | boolean |
| **Default:** | no |
| **After Changes:** | run `nominatim refresh --functions` |
When this setting is enabled, search and reverse queries also take data
from [Tiger house number data](Tiger.md) into account.
#### NOMINATIM_USE_AUX_LOCATION_DATA
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Enable searching in external house number tables |
| **Format:** | boolean |
| **Default:** | no |
| **After Changes:** | run `nominatim refresh --functions` |
| **Comment:** | Do not use. |
When this setting is enabled, search queries also take data from external
house number tables into account.
*Warning:* This feature is currently unmaintained and should not be used.
#### NOMINATIM_HTTP_PROXY
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Use HTTP proxy when downloading data |
| **Format:** | boolean |
| **Default:** | no |
When this setting is enabled and at least
[NOMINATIM_HTTP_PROXY_HOST](#nominatim_http_proxy_host) and
[NOMINATIM_HTTP_PROXY_PORT](#nominatim_http_proxy_port) are set, the
configured proxy will be used when downloading external data like
replication diffs.
#### NOMINATIM_HTTP_PROXY_HOST
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Host name of the proxy to use |
| **Format:** | string |
| **Default:** | _empty_ |
When [NOMINATIM_HTTP_PROXY](#nominatim_http_proxy) is enabled, this setting
configures the proxy host name.
#### NOMINATIM_HTTP_PROXY_PORT
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Port number of the proxy to use |
| **Format:** | integer |
| **Default:** | 3128 |
When [NOMINATIM_HTTP_PROXY](#nominatim_http_proxy) is enabled, this setting
configures the port number to use with the proxy.
#### NOMINATIM_HTTP_PROXY_LOGIN
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Username for proxies that require login |
| **Format:** | string |
| **Default:** | _empty_ |
When [NOMINATIM_HTTP_PROXY](#nominatim_http_proxy) is enabled, use this
setting to define the username for proxies that require a login.
#### NOMINATIM_HTTP_PROXY_PASSWORD
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Password for proxies that require login |
| **Format:** | string |
| **Default:** | _empty_ |
When [NOMINATIM_HTTP_PROXY](#nominatim_http_proxy) is enabled, use this
setting to define the password for proxies that require a login.
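The way the five proxy settings relate to each other can be sketched as follows. This is an illustration only, not Nominatim's implementation: the function `build_proxy_url` is hypothetical, and the assumption that the settings combine into a single `http://` URL is mine.

```python
from typing import Optional
from urllib.parse import quote


def build_proxy_url(enabled: bool, host: str, port: int = 3128,
                    login: str = '', password: str = '') -> Optional[str]:
    """Combine the proxy settings into a single URL, or None if unusable."""
    if not enabled or not host:
        return None
    credentials = ''
    if login:
        # Percent-encode credentials so characters like '@' or ':' are safe.
        credentials = quote(login, safe='')
        if password:
            credentials += ':' + quote(password, safe='')
        credentials += '@'
    return f"http://{credentials}{host}:{port}"
```

The default of 3128 for `port` mirrors the default of NOMINATIM_HTTP_PROXY_PORT; login and password are only included when a login is configured.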
#### NOMINATIM_OSM2PGSQL_BINARY
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Location of the osm2pgsql binary |
| **Format:** | path |
| **Default:** | _empty_ (use binary shipped with Nominatim) |
| **Comment:** | EXPERT ONLY |
Nominatim uses [osm2pgsql](https://osm2pgsql.org) to load the OSM data
initially into the database. Nominatim comes bundled with a version of
osm2pgsql that is guaranteed to be compatible. Use this setting to use
a different binary instead. You should do this only when you know exactly
what you are doing. If the osm2pgsql version is not compatible, then the
result is undefined.
#### NOMINATIM_WIKIPEDIA_DATA_PATH
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Directory with the wikipedia importance data |
| **Format:** | path |
| **Default:** | _empty_ (project directory) |
Set a custom location for the
[wikipedia ranking file](../admin/Import.md#wikipediawikidata-rankings). When
unset, Nominatim expects the data to be saved in the project directory.
#### NOMINATIM_ADDRESS_LEVEL_CONFIG
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Configuration file for rank assignments |
| **Format:** | path |
| **Default:** | address-levels.json |
The _address level configuration_ defines the rank assignments for places. See
[Place Ranking](Ranking.md) for a detailed explanation of what rank assignments
are and what the configuration file must look like.
When a relative path is given, then the file is searched first relative to the
project directory and then in the global settings directory.
#### NOMINATIM_IMPORT_STYLE
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Configuration to use for the initial OSM data import |
| **Format:** | string or path |
| **Default:** | extratags |
The _style configuration_ describes which OSM objects and tags are taken
into consideration for the search database. Nominatim comes with a set
of pre-configured styles, that may be configured here.
You can also write your own custom style and point the setting to the file
with the style. When a relative path is given, then the style file is searched
first relative to the project directory and then in the global settings
directory.
See [Import Styles](Import-Styles.md)
for more information on the available internal styles and the format of the
configuration file.
#### NOMINATIM_FLATNODE_FILE
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Location of osm2pgsql flatnode file |
| **Format:** | path |
| **Default:** | _empty_ (do not use a flatnode file) |
| **After Changes:** | Only change when moving the file physically. |
The `osm2pgsql flatnode file` is a file that efficiently stores the geographic
location of OSM nodes. For larger imports it can significantly speed up
the import. When this option is unset, osm2pgsql uses a PostgreSQL table
to store the locations.
When a relative path is given, then the flatnode file is created/searched
relative to the project directory.
!!! warning
The flatnode file is not only used during the initial import but also
when adding new data with `nominatim add-data` or `nominatim replication`.
Make sure you keep the flatnode file around and this setting unmodified,
if you plan to add more data or run regular updates.
#### NOMINATIM_TABLESPACE_*
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Group of settings for distributing the database over tablespaces |
| **Format:** | string |
| **Default:** | _empty_ (do not use a table space) |
| **After Changes:** | no effect after initial import |
Nominatim allows distributing the search database over up to 10 different
[PostgreSQL tablespaces](https://www.postgresql.org/docs/current/manage-ag-tablespaces.html).
If you use this option, make sure that the tablespaces exist before starting
the import.
The available tablespace groups are:
NOMINATIM_TABLESPACE_SEARCH_DATA
: Data used by the geocoding frontend.
NOMINATIM_TABLESPACE_SEARCH_INDEX
: Indexes used by the geocoding frontend.
NOMINATIM_TABLESPACE_OSM_DATA
: Raw OSM data cache used for import and updates.
NOMINATIM_TABLESPACE_OSM_INDEX
: Indexes on the raw OSM data cache.
NOMINATIM_TABLESPACE_PLACE_DATA
: Data table with the pre-filtered but still unprocessed OSM data.
Used only during imports and updates.
NOMINATIM_TABLESPACE_PLACE_INDEX
: Indexes on raw data table. Used only during imports and updates.
NOMINATIM_TABLESPACE_ADDRESS_DATA
: Data tables used for computing search terms and addresses of places
during import and updates.
NOMINATIM_TABLESPACE_ADDRESS_INDEX
: Indexes on the data tables for search term and address computation.
Used only for import and updates.
NOMINATIM_TABLESPACE_AUX_DATA
: Auxiliary data tables for non-OSM data, e.g. for Tiger house number data.
NOMINATIM_TABLESPACE_AUX_INDEX
: Indexes on auxiliary data tables.
### Replication Update Settings
#### NOMINATIM_REPLICATION_URL
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Base URL of the replication service |
| **Format:** | url |
| **Default:** | https://planet.openstreetmap.org/replication/minute |
| **After Changes:** | run `nominatim replication --init` |
Replication services deliver updates to OSM data. Use this setting to choose
which replication service to use. See [Updates](../admin/Update.md) for more
information on how to set up regular updates.
#### NOMINATIM_REPLICATION_MAX_DIFF
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Maximum amount of data to download per update cycle (in MB) |
| **Format:** | integer |
| **Default:** | 50 |
| **After Changes:** | restart the replication process |
At each update cycle Nominatim downloads diffs until either no more diffs
are available on the server (i.e. the database is up-to-date) or the limit
given in this setting is exceeded. Nominatim is guaranteed to download at least
one diff, if one is available, no matter how small the setting.
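The stopping rule can be sketched in a few lines. This is a simplified model rather than the actual download code; the function `select_diffs` is hypothetical, and treating the diff that crosses the limit as part of the cycle is an assumption based on the wording "exceeded":

```python
from typing import Iterable, List


def select_diffs(available_sizes_mb: Iterable[float],
                 max_diff_mb: float) -> List[float]:
    """Pick the diffs for one update cycle.

    Diffs are taken in order until none are left or the accumulated size
    exceeds the limit. The first available diff is always taken.
    """
    selected: List[float] = []
    total = 0.0
    for size in available_sizes_mb:
        selected.append(size)
        total += size
        if total > max_diff_mb:
            break
    return selected
```

For example, with four pending diffs of 20 MB each and the default limit of 50, the sketch selects three diffs for the cycle and leaves the fourth for the next one.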
The default for this setting is fairly conservative because Nominatim keeps
all data downloaded in one cycle in RAM. Using large values in a production
server may interfere badly with the search frontend because it evicts data
from RAM that is needed for speedy answers to incoming requests. It is usually
a better idea to keep this setting lower and run multiple update cycles
to catch up with updates.
When catching up in non-production mode, for example after the initial import,
the setting can easily be changed temporarily on the command line:
NOMINATIM_REPLICATION_MAX_DIFF=3000 nominatim replication
#### NOMINATIM_REPLICATION_UPDATE_INTERVAL
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Publication interval of the replication service (in seconds) |
| **Format:** | integer |
| **Default:** | 75 |
| **After Changes:** | restart the replication process |
This setting determines when Nominatim will next attempt to download a new
update. The time is computed from the publication date of the last diff
downloaded. Setting this to a slightly higher value than the actual
publication interval avoids unnecessary rechecks.
#### NOMINATIM_REPLICATION_RECHECK_INTERVAL
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Wait time to recheck for a pending update (in seconds) |
| **Format:** | integer |
| **Default:** | 60 |
| **After Changes:** | restart the replication process |
When replication updates are run in continuous mode (using `nominatim replication`),
this setting determines how long Nominatim waits until it looks for updates
again when updates were not available on the server.
Note that this is different from
[NOMINATIM_REPLICATION_UPDATE_INTERVAL](#nominatim_replication_update_interval).
Nominatim will never attempt to query for new updates for UPDATE_INTERVAL
seconds after the current database date. Only after the update interval has
passed it asks for new data. If then no new data is found, it waits for
RECHECK_INTERVAL seconds before it attempts again.
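The interplay of the two intervals can be modelled as below. This is a hedged sketch of the rule described above, not Nominatim's scheduling code; the function name and its parameters are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def seconds_until_next_check(database_date: datetime, now: datetime,
                             update_interval: int = 75,
                             recheck_interval: int = 60,
                             last_check_found_nothing: bool = False) -> float:
    """Return how long to wait before asking the server for new data."""
    earliest = database_date + timedelta(seconds=update_interval)
    if now < earliest:
        # Still inside the update interval: never query before it has passed.
        return (earliest - now).total_seconds()
    if last_check_found_nothing:
        # The interval has passed but the server had nothing new: back off.
        return float(recheck_interval)
    return 0.0


db_date = datetime(2024, 8, 16, 19, 54, 0, tzinfo=timezone.utc)
print(seconds_until_next_check(db_date, db_date + timedelta(seconds=30)))
```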
### API Settings
#### NOMINATIM_CORS_NOACCESSCONTROL
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Send permissive CORS access headers |
| **Format:** | boolean |
| **Default:** | yes |
| **After Changes:** | run `nominatim refresh --website` |
When this setting is enabled, API HTTP responses include the HTTP
[CORS](https://en.wikipedia.org/wiki/CORS) headers
`access-control-allow-origin: *` and `access-control-allow-methods: OPTIONS,GET`.
#### NOMINATIM_MAPICON_URL
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | URL prefix for static icon images |
| **Format:** | url |
| **Default:** | _empty_ |
| **After Changes:** | run `nominatim refresh --website` |
When a mapicon URL is configured, then Nominatim includes an additional `icon`
field in the responses, pointing to an appropriate icon for the place type.
Map icons used to be included in Nominatim itself but now have moved to the
[nominatim-ui](https://github.com/osm-search/nominatim-ui/) project. If you
want the URL to be included in API responses, make the `/mapicon`
directory of the project available under a public URL and point this setting
to the directory.
#### NOMINATIM_DEFAULT_LANGUAGE
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Language of responses when no language is requested |
| **Format:** | language code |
| **Default:** | _empty_ (use the local language of the feature) |
| **After Changes:** | run `nominatim refresh --website` |
Nominatim localizes the place names in responses when the corresponding
translation is available. Users can request a custom language setting through
the HTTP accept-languages header or through the explicit parameter
[accept-languages](../api/Search.md#language-of-results). If neither is
given, it falls back to this setting. If the setting is also empty, then
the local language (in OSM: the name tag without any language suffix) is
used.
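The fallback chain can be sketched as follows. Both functions are hypothetical illustrations and not Nominatim's locale handling. That the explicit parameter wins over the HTTP header is an assumption, and the header parsing is deliberately naive (quality values are dropped, not sorted):

```python
from typing import Dict, List, Optional


def requested_languages(param: Optional[str], header: Optional[str],
                        default_language: str) -> List[str]:
    """Return the preferred languages, most preferred first."""
    for source in (param, header, default_language):
        if source:
            # Naive split: drops quality values such as ';q=0.8'.
            return [part.split(';')[0].strip() for part in source.split(',')]
    return []


def display_name(names: Dict[str, str], languages: List[str]) -> Optional[str]:
    """Pick the best name from OSM-style name tags."""
    for lang in languages:
        if f"name:{lang}" in names:
            return names[f"name:{lang}"]
    # No translation requested or available: use the local name.
    return names.get('name')


names = {'name': 'München', 'name:en': 'Munich', 'name:it': 'Monaco di Baviera'}
print(display_name(names, requested_languages(None, None, '')))
```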
#### NOMINATIM_LOOKUP_MAX_COUNT
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Maximum number of OSM ids accepted by /lookup |
| **Format:** | integer |
| **Default:** | 50 |
| **After Changes:** | run `nominatim refresh --website` |
The /lookup endpoint accepts a list of OSM IDs to look up address details for.
This setting restricts the number of places a user may look up with a single
request.
#### NOMINATIM_POLYGON_OUTPUT_MAX_TYPES
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Number of different geometry formats that may be returned |
| **Format:** | integer |
| **Default:** | 1 |
| **After Changes:** | run `nominatim refresh --website` |
Nominatim supports returning full geometries of places. The geometries may
be requested in different formats with one of the
[`polygon_*` parameters](../api/Search.md#polygon-output). Use this
setting to restrict the number of geometry types that may be requested
with a single query.
Setting this parameter to 0 disables polygon output completely.
#### NOMINATIM_SEARCH_WITHIN_COUNTRIES
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Disable search for elements that are not in the country grid |
| **Format:** | boolean |
| **Default:** | no |
| **After Changes:** | run `nominatim refresh --website` |
Enable this setting to restrict searches to elements within countries.
When enabled and a point cannot be found within the static grid of countries,
Nominatim does not return the geometry of a region it may still find.
It returns "Unable to geocode" instead.
#### NOMINATIM_SERVE_LEGACY_URLS
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Enable serving via URLs with a .php suffix |
| **Format:** | boolean |
| **Default:** | yes |
| **Comment:** | Python frontend only |
When enabled, then endpoints are reachable as `/<name>` as well as `/<name>.php`.
This can be useful when you want to be backwards-compatible with previous
versions of Nominatim.
#### NOMINATIM_API_POOL_SIZE
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Number of parallel database connections per worker |
| **Format:** | number |
| **Default:** | 10 |
| **Comment:** | Python frontend only |
Sets the maximum number of database connections available for a single instance
of Nominatim. When configuring the maximum number of connections that your
PostgreSQL database can handle, you need at least
`NOMINATIM_API_POOL_SIZE` * `<number of configured workers>` connections.
For configuring the number of workers, refer to the section about
[Deploying the Python frontend](../admin/Deployment-Python.md).
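As a quick worked example of this arithmetic, the sketch below computes the lower bound for PostgreSQL's `max_connections` setting. The function and its `reserved` parameter (head room for other clients such as update processes) are hypothetical:

```python
def required_db_connections(pool_size: int, workers: int,
                            reserved: int = 0) -> int:
    """Lower bound for PostgreSQL's max_connections."""
    return pool_size * workers + reserved


# Default pool size of 10 with four workers.
print(required_db_connections(10, 4))
```

With the default pool size and four workers, the database must accept at least 40 connections from the frontend alone.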
#### NOMINATIM_QUERY_TIMEOUT
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Timeout for SQL queries to the database |
| **Format:** | number (seconds) |
| **Default:** | 10 |
| **Comment:** | Python frontend only |
When this timeout is set, all SQL queries that run longer than the
specified number of seconds will be cancelled and the user receives a
timeout exception. Users of the API see a 503 HTTP error.
The timeout does not apply when using the
[low-level DB access](../library/Low-Level-DB-Access.md)
of the library. A timeout can be manually set, if required.
#### NOMINATIM_REQUEST_TIMEOUT
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Timeout for search queries |
| **Format:** | number (seconds) |
| **Default:** | 60 |
| **Comment:** | Python frontend only |
When this timeout is set, a search query will stop sending queries
to the database after the timeout has passed and immediately return the
results gathered so far.
Note that under high load you may observe that users receive different results
than usual without seeing an error. This may cause some confusion.
### Logging Settings
#### NOMINATIM_LOG_DB
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Log requests into the database |
| **Format:** | boolean |
| **Default:** | no |
| **After Changes:** | run `nominatim refresh --website` |
Enable logging requests into a database table with this setting. The logs
can be found in the table `new_query_log`.
When using this logging method, it is advisable to set up a job that
regularly clears out old logging information. Nominatim will not do that
on its own.
Can be used at the same time as NOMINATIM_LOG_FILE.
#### NOMINATIM_LOG_FILE
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Log requests into a file |
| **Format:** | path |
| **Default:** | _empty_ (logging disabled) |
| **After Changes:** | run `nominatim refresh --website` |
Enable logging of requests into a file by setting this option to the path of
the log file. A relative file name is assumed to be relative to
the project directory.
The entries in the log file have the following format:
<request time> <execution time in s> <number of results> <type> "<query string>"
Request time is the time when the request was started. The execution time is
given in seconds and includes the entire time the query was queued and executed
in the frontend.
type contains the name of the endpoint used.
Can be used at the same time as NOMINATIM_LOG_DB.
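For post-processing, a log line in the format given above could be split with a regular expression. This is a sketch under assumptions: the exact layout of the request time is not specified here, so the pattern accepts any text for it, and the sample line is made up for illustration:

```python
import re
from typing import Dict, Optional

LOG_LINE = re.compile(
    r'^(?P<request_time>.+?) '
    r'(?P<execution_time>\d+(?:\.\d+)?) '
    r'(?P<results>\d+) '
    r'(?P<type>\S+) '
    r'"(?P<query>.*)"$')


def parse_log_line(line: str) -> Optional[Dict[str, object]]:
    """Split one request log line into its fields, or return None."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return {
        'request_time': match['request_time'],
        'execution_time': float(match['execution_time']),
        'results': int(match['results']),
        'type': match['type'],
        'query': match['query'],
    }


# Hypothetical log line, not copied from a real log file.
sample = '[2024-08-16 19:54:00] 0.0123 3 search "q=berlin"'
print(parse_log_line(sample))
```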
#### NOMINATIM_DEBUG_SQL
| Summary | |
| -------------- | --------------------------------------------------- |
| **Description:** | Enable printing of raw SQL by SQLAlchemy |
| **Format:** | boolean |
| **Default:** | no |
| **Comment:** | **For developers only.** |
This setting enables
[SQL debugging](https://docs.sqlalchemy.org/en/20/core/engines.html#dbengine-logging)
by SQLAlchemy. This can be helpful when debugging some bugs with internal
query handling. It should only be used together with the CLI query functions.
Enabling it for server mode may have unintended consequences. Use the `debug`
parameter instead, which prints information on how the search is executed
including SQL statements.


@@ -1,49 +0,0 @@
# Special phrases
## Importing OSM user-maintained special phrases
As described in the [Import section](../admin/Import.md), it is possible to
import special phrases from the wiki with the following command:
```sh
nominatim special-phrases --import-from-wiki
```
## Importing custom special phrases
Special phrases may also be imported from any custom CSV file. The file needs
to have a header line, use comma as delimiter and define the following
columns:
* **phrase**: the keyword to look for
* **class**: key of the main tag of the place to find
(see [Import styles](Import-Styles.md#how-processing-works))
* **type**: value of the main tag
* **operator**: type of special phrase, may be one of:
* *in*: place is within the place defined by the search term (e.g. "_Hotels in_ Berlin")
* *near*: place is near the place defined by the search term (e.g. "_bus stops near_ Big Ben")
* *named*: special phrase is a classifier (e.g. "_hotel_ California")
* *-*: unspecified, can be any of the above
If the file contains any other columns, then they are silently ignored.
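A file in this layout can be produced with Python's `csv` module. The following is a minimal sketch; the phrases and the helper `phrases_to_csv` are made up for illustration:

```python
import csv
import io

# Hypothetical example phrases; replace with your own.
PHRASES = [
    ('Hotels in', 'tourism', 'hotel', 'in'),
    ('bus stops near', 'highway', 'bus_stop', 'near'),
    ('hotel', 'tourism', 'hotel', 'named'),
    ('bakery', 'shop', 'bakery', '-'),
]


def phrases_to_csv(phrases) -> str:
    """Render phrases as CSV text with the required header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['phrase', 'class', 'type', 'operator'])
    writer.writerows(phrases)
    return buffer.getvalue()


print(phrases_to_csv(PHRASES))
```

Write the returned text to a file to obtain a CSV file with the header line and columns described above.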
To import the CSV file, use the following command:
```sh
nominatim special-phrases --import-from-csv <csv file>
```
Note that the two previous import commands replace the phrases in your database.
This means that if you import some phrases from a CSV file, only the phrases
present in the CSV file will be kept in the database. All other phrases will
be removed.
If you want to only add new phrases and not update the other ones you can add
the argument `--no-replace` to the import command. For example:
```sh
nominatim special-phrases --import-from-csv <csv file> --no-replace
```
This will add the phrases present in the CSV file into the database without
removing the other ones.

View File

@@ -1,28 +0,0 @@
# Installing TIGER housenumber data for the US
Nominatim is able to use the official [TIGER](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html)
address set to complement the OSM house number data in the US. You can add
TIGER data to your own Nominatim instance by following these steps. The
entire US adds about 10GB to your database.
1. Get preprocessed TIGER data:
cd $PROJECT_DIR
wget https://nominatim.org/data/tiger-nominatim-preprocessed-latest.csv.tar.gz
2. Import the data into your Nominatim database:
nominatim add-data --tiger-data tiger-nominatim-preprocessed-latest.csv.tar.gz
3. Enable use of the Tiger data in your existing `.env` file by adding:
echo NOMINATIM_USE_US_TIGER_DATA=yes >> .env
4. Apply the new settings:
nominatim refresh --functions --website
See the [TIGER-data project](https://github.com/osm-search/TIGER-data) for more
information on how the data got preprocessed.

View File

@@ -1,389 +0,0 @@
# Tokenizers
The tokenizer module in Nominatim is responsible for analysing the names given
to OSM objects and the terms of an incoming query to make sure that they
can be matched appropriately.
Nominatim currently offers only one tokenizer module, the ICU tokenizer. This section
describes the tokenizer and how it can be configured.
!!! important
The selection of tokenizer is tied to a database installation. You need to choose
and configure the tokenizer before starting the initial import. Once the import
is done, you cannot switch to another tokenizer anymore. Reconfiguring the
chosen tokenizer is very limited as well. See the comments in each tokenizer
section.
## ICU tokenizer
The ICU tokenizer uses the [ICU library](http://site.icu-project.org/) to
normalize names and queries. It also offers configurable decomposition and
abbreviation handling.
This tokenizer is currently the default.
To enable the tokenizer add the following line to your project configuration:
```
NOMINATIM_TOKENIZER=icu
```
### How it works
On import the tokenizer processes names in the following three stages:
1. During the **Sanitizer step** incoming names are cleaned up and converted to
**full names**. This step can be used to regularize spelling, split multi-name
tags into their parts and tag names with additional attributes. See the
[Sanitizers section](#sanitizers) below for available cleaning routines.
2. The **Normalization** part removes all information from the full names
that are not relevant for search.
3. The **Token analysis** step takes the normalized full names and creates
all transliterated variants under which the name should be searchable.
See the [Token analysis](#token-analysis) section below for more
information.
During query time, the tokenizer is responsible for processing incoming
queries. This happens in two stages:
1. During **query preprocessing** the incoming text is split into name
chunks and normalised. This usually means applying the same normalisation
as during the import process but may involve other processing like,
for example, word break detection.
2. The **token analysis** step breaks down the query parts into tokens,
looks them up in the database and assigns them possible functions and
probabilities.
Query processing can be further customized while the rest of the analysis
is hard-coded.
### Configuration
The ICU tokenizer is configured using a YAML file whose location is set with
`NOMINATIM_TOKENIZER_CONFIG`. The configuration is read on import and then
saved as part of the internal database status. Later changes to the variable
have no effect.
Here is an example configuration file:
``` yaml
query-preprocessing:
- normalize
normalization:
- ":: lower ()"
- "ß > 'ss'" # German eszett is unambiguously equal to double ss
transliteration:
- !include /etc/nominatim/icu-rules/extended-unicode-to-asccii.yaml
- ":: Ascii ()"
sanitizers:
- step: split-name-list
token-analysis:
- analyzer: generic
variants:
- !include icu-rules/variants-ca.yaml
- words:
- road -> rd
- bridge -> bdge,br,brdg,bri,brg
mutations:
- pattern: 'ä'
replacements: ['ä', 'ae']
```
The configuration file contains five sections:
`query-preprocessing`, `normalization`, `transliteration`, `sanitizers` and `token-analysis`.
#### Query preprocessing
The section for `query-preprocessing` defines an ordered list of functions
that are applied to the query before the token analysis.
The following is a list of preprocessors that are shipped with Nominatim.
##### normalize
::: nominatim_api.query_preprocessing.normalize
options:
members: False
heading_level: 6
docstring_section_style: spacy
#### Normalization and Transliteration
The normalization and transliteration sections each define a set of
ICU rules that are applied to the names.
The **normalization** rules are applied after sanitation. They should remove
any information that is not relevant for search at all. Usual rules to be
applied here are: lower-casing, removing of special characters, cleanup of
spaces.
The **transliteration** rules are applied at the end of the tokenization
process to transfer the name into an ASCII representation. Transliteration can
be useful to allow for further fuzzy matching, especially between different
scripts.
Each section must contain a list of
[ICU transformation rules](https://unicode-org.github.io/icu/userguide/transforms/general/rules.html).
The rules are applied in the order in which they appear in the file.
You can also include additional rules from an external YAML file using the
`!include` tag. The included file must contain a valid YAML list of ICU rules
and may again include other files.
!!! warning
The ICU rule syntax contains special characters that conflict with the
YAML syntax. You should therefore always enclose the ICU rules in
double-quotes.
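To get a feeling for what the two rule sets do, here is a rough approximation in plain Python. It does not use ICU at all: `normalize` only mimics the two example rules (lower-casing and replacing "ß"), and `to_ascii` is a crude stand-in for transliteration that drops diacritics and any remaining non-ASCII characters. Both function names are made up for this sketch, and real ICU transforms handle far more cases.

```python
import unicodedata


def normalize(name: str) -> str:
    """Mimic the example normalization rules: lower-case, then ß -> ss."""
    return name.lower().replace("ß", "ss")


def to_ascii(name: str) -> str:
    """Crude transliteration stand-in: decompose characters, then drop
    everything that is not plain ASCII (including combining marks)."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def search_form(name: str) -> str:
    """Apply normalization first and transliteration last."""
    return to_ascii(normalize(name))
```

The order mirrors the description above: normalization runs early, transliteration at the end of the process.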
#### Sanitizers
The sanitizers section defines an ordered list of functions that are applied
to the name and address tags before they are further processed by the tokenizer.
They allow you to clean up the tagging and bring it into a standardized form more
suitable for building the search index.
!!! hint
Sanitizers only have an effect on how the search index is built. They
do not change the information about each place that is saved in the
database. In particular, they have no influence on how the results are
displayed. The returned results always show the original information as
stored in the OpenStreetMap database.
Each entry contains information of a sanitizer to be applied. It has a
mandatory parameter `step` which gives the name of the sanitizer. Depending
on the type, it may have additional parameters to configure its operation.
The order of the list matters. The sanitizers are applied exactly in the order
that is configured. Each sanitizer works on the results of the previous one.
The following is a list of sanitizers that are shipped with Nominatim.
##### split-name-list
::: nominatim_db.tokenizer.sanitizers.split_name_list
options:
members: False
heading_level: 6
docstring_section_style: spacy
##### strip-brace-terms
::: nominatim_db.tokenizer.sanitizers.strip_brace_terms
options:
members: False
heading_level: 6
docstring_section_style: spacy
##### tag-analyzer-by-language
::: nominatim_db.tokenizer.sanitizers.tag_analyzer_by_language
options:
members: False
heading_level: 6
docstring_section_style: spacy
##### clean-housenumbers
::: nominatim_db.tokenizer.sanitizers.clean_housenumbers
options:
members: False
heading_level: 6
docstring_section_style: spacy
##### clean-postcodes
::: nominatim_db.tokenizer.sanitizers.clean_postcodes
options:
members: False
heading_level: 6
docstring_section_style: spacy
##### clean-tiger-tags
::: nominatim_db.tokenizer.sanitizers.clean_tiger_tags
options:
members: False
heading_level: 6
docstring_section_style: spacy
##### delete-tags
::: nominatim_db.tokenizer.sanitizers.delete_tags
options:
members: False
heading_level: 6
docstring_section_style: spacy
##### tag-japanese
::: nominatim_db.tokenizer.sanitizers.tag_japanese
options:
members: False
heading_level: 6
docstring_section_style: spacy
#### Token Analysis
Token analyzers take a full name and transform it into one or more normalized
forms that are then saved in the search index. In its simplest form, the
analyzer only applies the transliteration rules. More complex analyzers
create additional spelling variants of a name. This is useful to handle
decomposition and abbreviation.
The ICU tokenizer may use different analyzers for different names. To select
the analyzer to be used, the name must be tagged with the `analyzer` attribute
by a sanitizer (see for example the
[tag-analyzer-by-language sanitizer](#tag-analyzer-by-language)).
The token-analysis section contains the list of configured analyzers. Each
analyzer must have an `id` parameter that uniquely identifies the analyzer.
The only exception is the default analyzer that is used when no special
analyzer was selected. There are analysers with special ids:
* '@housenumber'. If an analyzer with that name is present, it is used
for normalization of house numbers.
* '@postcode'. If an analyzer with that name is present, it is used
for normalization of postcodes.
Different analyzer implementations may exist. To select the implementation,
the `analyzer` parameter must be set. The different implementations are
described in the following.
##### Generic token analyzer
The generic analyzer `generic` is able to create variants from a list of given
abbreviation and decomposition replacements and introduce spelling variations.
###### Variants
The optional 'variants' section defines lists of replacements which create alternative
spellings of a name. To create the variants, a name is scanned from left to
right and the longest matching replacement is applied until the end of the
string is reached.
The variants section must contain a list of replacement groups. Each group
defines a set of properties that describes where the replacements are
applicable. In addition, the `words` section defines the list of replacements
to be made. The basic replacement description is of the form:
```
<source>[,<source>[...]] => <target>[,<target>[...]]
```
The left side contains one or more `source` terms to be replaced. The right side
lists one or more replacements. Each source is replaced with each replacement
term.
!!! tip
The source and target terms are internally normalized using the
normalization rules given in the configuration. This ensures that the
strings match as expected. In fact, it is better to use unnormalized
words in the configuration because then it is possible to change the
rules for normalization later without having to adapt the variant rules.
###### Decomposition
In its standard form, only full words match against the source. There
is a special notation to match the prefix and suffix of a word:
``` yaml
- ~strasse => str # matches "strasse" as full word and in suffix position
- hinter~ => hntr # matches "hinter" as full word and in prefix position
```
There is no facility to match a string in the middle of the word. The suffix
and prefix notation automatically trigger the decomposition mode: two variants
are created for each replacement, one with the replacement attached to the word
and one separate. So in the above example, the tokenization of "hauptstrasse" will
create the variants "hauptstr" and "haupt str". Similarly, the name "rote strasse"
triggers the variants "rote str" and "rotestr". By having decomposition work
both ways, it is sufficient to create the variants at index time. The variant
rules are not applied at query time.
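The decomposition behaviour for a suffix rule can be sketched in a few lines of Python. This is a toy model, not the analyzer's implementation: the function name `decompose_suffix` is hypothetical, and it only looks at the end of the whole name rather than at every word.

```python
def decompose_suffix(name: str, source: str, target: str) -> set[str]:
    """Toy model of a `~source => target` rule applied at the end of a name.

    Returns the attached and the separated variant when the suffix matches,
    the plain replacement when the name is the source word itself, and the
    unchanged name otherwise.
    """
    if not name.endswith(source):
        return {name}
    stem = name[: -len(source)].rstrip()
    if not stem:
        # Full-word match: nothing to attach to or separate from.
        return {target}
    return {f"{stem}{target}", f"{stem} {target}"}
```

With the rule `~strasse => str`, both "hauptstrasse" and "rote strasse" yield an attached and a separated variant, which is why applying the rules at index time is enough.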
To avoid automatic decomposition, use the '|' notation:
``` yaml
- ~strasse |=> str
```
simply changes "hauptstrasse" to "hauptstr" and "rote strasse" to "rote str".
###### Initial and final terms
It is also possible to restrict replacements to the beginning and end of a
name:
``` yaml
- ^south => s # matches only at the beginning of the name
- road$ => rd # matches only at the end of the name
```
So the first example would trigger a replacement for "south 45th street" but
not for "the south beach restaurant".
###### Replacements vs. variants
The replacement syntax `source => target` works as a pure replacement. It changes
the name instead of creating a variant. To create an additional version, you'd
have to write `source => source,target`. As this is a frequent case, there is
a shortcut notation for it:
```
<source>[,<source>[...]] -> <target>[,<target>[...]]
```
The simple arrow causes an additional variant to be added. Note that
decomposition has an effect here on the source as well. So a rule
``` yaml
- "~strasse -> str"
```
means that for a word like `hauptstrasse` four variants are created:
`hauptstrasse`, `haupt strasse`, `hauptstr` and `haupt str`.
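The difference between the two arrows can be made concrete with a small parser. This is a sketch under simplifying assumptions: `expand_rule` is a hypothetical helper, it handles plain words only, and it ignores the `~`, `^`, `$` and `|` notations described above.

```python
def expand_rule(rule: str) -> dict[str, list[str]]:
    """Map each source term of a rule to the list of terms it produces.

    `=>` is a pure replacement; `->` additionally keeps the source itself.
    """
    if "=>" in rule:
        arrow = "=>"
    elif "->" in rule:
        arrow = "->"
    else:
        raise ValueError(f"no arrow found in rule: {rule!r}")

    left, _, right = rule.partition(arrow)
    sources = [term.strip() for term in left.split(",")]
    targets = [term.strip() for term in right.split(",")]

    expanded = {}
    for source in sources:
        # With the simple arrow the original spelling stays searchable.
        keep_source = [source] if arrow == "->" else []
        expanded[source] = keep_source + targets
    return expanded
```

In this model `road -> rd` behaves exactly like `road => road,rd`.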
###### Mutations
The 'mutations' section in the configuration describes an additional set of
replacements to be applied after the variants have been computed.
Each mutation is described by two parameters: `pattern` and `replacements`.
The pattern must contain a single regular expression to search for in the
variant name. The regular expressions need to follow the syntax for
[Python regular expressions](https://docs.python.org/3/library/re.html#regular-expression-syntax).
Capturing groups are not permitted.
`replacements` must contain a list of strings that the pattern
should be replaced with. Each occurrence of the pattern is replaced with
all given replacements. Be mindful of combinatorial explosion of variants.
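The following sketch shows why the number of variants can grow quickly. It assumes that every occurrence of the pattern is replaced independently of the others; the function name `mutate` is made up and the code is not taken from the analyzer itself.

```python
import itertools
import re


def mutate(variant: str, pattern: str, replacements: list[str]) -> list[str]:
    """Replace each occurrence of `pattern` with every replacement.

    The pattern must not contain capturing groups, otherwise `re.split`
    would return the captured text as additional parts.
    """
    parts = re.split(pattern, variant)
    occurrences = len(parts) - 1
    if occurrences == 0:
        return [variant]

    results = []
    for combo in itertools.product(replacements, repeat=occurrences):
        pieces = [parts[0]]
        for replacement, rest in zip(combo, parts[1:]):
            pieces.append(replacement)
            pieces.append(rest)
        results.append("".join(pieces))
    return results
```

A name with `n` occurrences of the pattern and `k` replacements produces `k ** n` strings in this model, for every variant that was computed before.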
###### Modes
The generic analyser supports a special mode `variant-only`. When configured,
it consumes the input token and emits only variants (if any exist). Enable
the mode by adding:
```
mode: variant-only
```
to the analyser configuration.
##### Housenumber token analyzer
The analyzer `housenumbers` is purpose-made to analyze house numbers. It
creates variants with optional spaces between numbers and letters. Thus,
house numbers of the form '3 a', '3A', '3-A' etc. are all considered equivalent.
The analyzer cannot be customized.
##### Postcode token analyzer
The analyzer `postcodes` is purpose-made to analyze postcodes. It supports
a 'lookup' variant of the token, which produces variants with optional
spaces. Use together with the clean-postcodes sanitizer.
The analyzer cannot be customized.
### Reconfiguration
Changing the configuration after the import is currently not possible, although
this feature may be added at a later time.

View File

@@ -0,0 +1,4 @@
# Additional Data Sources
This guide explains how data sources other than OpenStreetMap mentioned in
the install instructions were obtained and converted.

View File

@@ -1,167 +0,0 @@
# Database Layout
### Import tables
OSM data is initially imported using [osm2pgsql](https://osm2pgsql.org).
Nominatim uses its own data output style 'gazetteer', which differs from the
output style created for map rendering.
The import process creates the following tables:
![osm2pgsql tables](osm2pgsql-tables.svg)
The `planet_osm_*` tables are the usual backing tables for OSM data. Note
that Nominatim uses them to look up special relations and to find nodes on
ways.
The gazetteer style produces a single table `place` as output with the following
columns:
* `osm_type` - kind of OSM object (**N** - node, **W** - way, **R** - relation)
* `osm_id` - original OSM ID
* `class` - key of principal tag defining the object type
* `type` - value of principal tag defining the object type
* `name` - collection of tags that contain a name or reference
* `admin_level` - numerical value of the tagged administrative level
* `address` - collection of tags defining the address of an object
* `extratags` - collection of additional interesting tags that are not
directly relevant for searching
* `geometry` - geometry of the object (in WGS84)
A single OSM object may appear multiple times in this table when it is tagged
with multiple tags that may constitute a principal tag. Take for example a
motorway bridge. In OSM, this would be a way which is tagged with
`highway=motorway` and `bridge=yes`. This way would appear in the `place` table
once with `class` of `highway` and once with a `class` of `bridge`. Thus the
*unique key* for `place` is (`osm_type`, `osm_id`, `class`).
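The motorway bridge example can be modelled as follows. This is only an illustration of the key structure: `PlaceRow`, `place_rows` and the two-entry `PRINCIPAL_KEYS` tuple are hypothetical, and the real set of principal tags comes from the import style.

```python
from typing import NamedTuple


class PlaceRow(NamedTuple):
    osm_type: str
    osm_id: int
    class_: str  # `class` is a reserved word in Python
    type: str


# Hypothetical subset of keys that count as principal tags.
PRINCIPAL_KEYS = ("highway", "bridge")


def place_rows(osm_type: str, osm_id: int, tags: dict[str, str]) -> list[PlaceRow]:
    """Create one row per principal tag found on the OSM object."""
    return [
        PlaceRow(osm_type, osm_id, key, tags[key])
        for key in PRINCIPAL_KEYS
        if key in tags
    ]


rows = place_rows("W", 4242, {"highway": "motorway", "bridge": "yes", "lanes": "2"})
unique_keys = {(row.osm_type, row.osm_id, row.class_) for row in rows}
```

The same way yields two rows, and the triple of `osm_type`, `osm_id` and `class` still tells them apart.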
How raw OSM tags are mapped to the columns in the place table is to a certain
degree configurable. See [Customizing Import Styles](../customize/Import-Styles.md)
for more information.
### Search tables
The following tables carry all information needed to do the search:
![search tables](search-tables.svg)
The **placex** table is the central table that saves all information about the
searchable places in Nominatim. The basic columns are the same as for the
place table and have the same meaning. The placex table adds the following
additional columns:
* `place_id` - the internal unique ID to identify the place
* `partition` - the id to use with partitioned tables (see below)
* `geometry_sector` - a location hash used for geographically close ordering
* `parent_place_id` - the next higher place in the address hierarchy, only
relevant for POI-type places (with rank 30)
* `linked_place_id` - place ID of the place this object has been merged with.
When this ID is set, then the place is invisible for search.
* `importance` - measure how well known the place is
* `rank_search`, `rank_address` - search and address rank (see [Customizing ranking](../customize/Ranking.md))
* `wikipedia` - the wikipedia page used for computing the importance of the place
* `country_code` - the country the place is located in
* `housenumber` - normalized housenumber, if the place has one
* `postcode` - computed postcode for the place
* `indexed_status` - processing status of the place (0 - ready, 1 - freshly inserted, 2 - needs updating, 100 - needs deletion)
* `indexed_date` - timestamp when the place was processed last
* `centroid` - a point feature for the place
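When working with `indexed_status` in scripts, it can help to give the numeric codes names. The values below are the ones listed above; the enum and its member names are invented for this sketch and do not exist in Nominatim.

```python
from enum import IntEnum


class IndexedStatus(IntEnum):
    """Named aliases for the `indexed_status` codes (names are hypothetical)."""
    READY = 0
    FRESHLY_INSERTED = 1
    NEEDS_UPDATING = 2
    NEEDS_DELETION = 100


def needs_processing(status: int) -> bool:
    """Return True for every known status other than 'ready'."""
    return IndexedStatus(status) is not IndexedStatus.READY
```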
The **location_property_osmline** table is a special table for
[address interpolations](https://wiki.openstreetmap.org/wiki/Addresses#Using_interpolation).
The columns have the same meaning and use as the columns with the same name in
the placex table. Only three columns are special:
* `startnumber` and `endnumber` - beginning and end of the number range
for the interpolation
* `interpolationtype` - a string `odd`, `even` or `all` to indicate
the interval between the numbers
Address interpolations are always ways in OSM, which is why there is no column
`osm_type`.
The **location_postcode** table holds computed centroids of all postcodes that
can be found in the OSM data. The meaning of the columns is again the same
as that of the placex table.
Every place needs an address, a set of surrounding places that describe the
location of the place. The set of address places is made up of OSM places
themselves. The **place_addressline** table cross-references for each place
all the places that make up its address. Two columns define the address
relation:
* `place_id` - reference to the place being addressed
* `address_place_id` - reference to the place serving as an address part
Most of the columns cache information from the placex entry of the address
part. The exceptions are:
* `fromarea` - is true if the address part has an area geometry and can
therefore be considered precise
* `isaddress` - is true if the address part should show up in the address
output. Sometimes there are multiple places competing for the same address
type (e.g. multiple cities) and this field resolves the tie.
The **search_name** table contains the search index proper. It saves for each
place the terms with which the place can be found. The terms are split into
the name itself and all terms that make up the address. The table mirrors some
of the columns from placex for faster lookup.
Search terms are not saved as strings. Each term is assigned an integer and those
integers are saved in the name and address vectors of the search_name table. The
**word** table serves as the lookup table from string to such a word ID. The
exact content of the word table depends on the [tokenizer](Tokenizers.md) used.
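The relationship between terms, word IDs and the two vectors can be pictured with a toy in-memory model. Everything here is a simplification: `WordTable` is a hypothetical class, IDs are simply handed out in order of first use, and the real table content depends on the tokenizer.

```python
class WordTable:
    """Toy lookup table from term string to integer word ID."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}

    def word_id(self, term: str) -> int:
        """Return the ID for a term, assigning a new one on first use."""
        if term not in self._ids:
            self._ids[term] = len(self._ids) + 1
        return self._ids[term]

    def vector(self, terms: list[str]) -> list[int]:
        """Translate a list of terms into a sorted list of unique IDs."""
        return sorted({self.word_id(term) for term in terms})


words = WordTable()
name_vector = words.vector(["brandenburger", "tor"])
address_vector = words.vector(["berlin", "tor"])
```

Both vectors share the ID for "tor", so a term is stored once as a string no matter how many places use it.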
## Address computation tables
Next to the main search tables, there is a set of secondary helper tables used
to compute the address relations between places. These tables are partitioned.
Each country is assigned a partition number in the country_name table (see
below) and the data is then split between a set of tables, one for each
partition. Note that Nominatim still manually manages partitioned tables.
Native support for partitions in PostgreSQL only became usable with version 13.
It will be a little while before Nominatim drops support for older versions.
![address tables](address-tables.svg)
The **search_name_X** tables are used to look up streets that appear in the
`addr:street` tag.
The **location_area_large_X** tables are used to look up larger areas
(administrative boundaries and place nodes) either through their geographic
closeness or through `addr:*` entries.
The **location_road_X** tables are used to find the closest street for a
dependent place.
All three tables cache specific information from the placex table for their
selected subset of places:
* `keywords` and `name_vector` contain lists of term ids (from the word table)
that the full name of the place should match against
* `isguess` is true for places that are not described by an area
All other columns reflect their counterpart in the placex table.
## Static data tables
Nominatim also creates a number of static tables at import:
* `nominatim_properties` saves settings that must not be changed after
import
* `address_levels` saves the rank information from the
[ranking configuration](../customize/Ranking.md)
* `country_name` contains a fallback of names for all countries, their
default languages and saves the assignment of countries to partitions.
* `country_osm_grid` provides a fallback for country geometries
## Auxiliary data tables
Finally there are some tables for auxiliary data:
* `location_property_tiger` - saves housenumber from the Tiger import. Its
layout is similar to that of `location_property_osmline`.
* `place_class_*` tables are helper tables to facilitate lookup of POIs
by their class and type. They exist because it is not possible to create
combined indexes with geometries.

View File

@@ -1,154 +0,0 @@
# Setting up Nominatim for Development
This chapter gives an overview of how to set up Nominatim for development
and how to run tests.
!!! Important
This guide assumes you develop under the latest version of Debian/Ubuntu.
You can of course also use your favourite distribution. You just might have
to adapt the commands below slightly, in particular the commands for
installing additional software.
## Installing Nominatim
The first step is to install Nominatim itself. Please follow the installation
instructions in the [Admin section](../admin/Installation.md). You don't need
to set up a webserver for development, the webserver that can be started
via `nominatim serve` is sufficient.
If you want to run Nominatim in a VM via Vagrant, use the default `ubuntu24` setup.
Vagrant's libvirt provider runs out-of-the-box under Ubuntu. You also need to
install an NFS daemon to enable directory sharing between host and guest. The
following packages should get you started:
sudo apt install vagrant vagrant-libvirt libvirt-daemon nfs-kernel-server
## Prerequisites for testing and documentation
The Nominatim test suite consists of behavioural tests (using behave) and
unit tests (using pytest). It has the following additional requirements:
* [behave test framework](https://behave.readthedocs.io) >= 1.2.6
* [flake8](https://flake8.pycqa.org/en/stable/) (CI always runs the latest version from pip)
* [mypy](http://mypy-lang.org/) (plus typing information for external libs)
* [Python Typing Extensions](https://github.com/python/typing_extensions) (for Python < 3.9)
* [pytest](https://pytest.org)
* [pytest-asyncio](https://pytest-asyncio.readthedocs.io)
For testing the Python search frontend, you need to install extra dependencies
depending on your choice of webserver framework:
* [httpx](https://www.python-httpx.org/) (Starlette only)
* [asgi-lifespan](https://github.com/florimondmanca/asgi-lifespan) (Starlette only)
The documentation is built with mkdocs:
* [mkdocs](https://www.mkdocs.org/) >= 1.1.2
* [mkdocstrings](https://mkdocstrings.github.io/) >= 0.25
* [mkdocs-material](https://squidfunk.github.io/mkdocs-material/)
* [mkdocs-gen-files](https://oprypin.github.io/mkdocs-gen-files/)
Please be aware that tests always run against the globally installed
osm2pgsql, so you need to have this set up. If you want to test against
the vendored version of osm2pgsql, you need to set the PATH accordingly.
### Installing prerequisites on Ubuntu/Debian
The Python tools should always be run with the most recent version.
The easiest way to handle these Python dependencies is to run your
development from within a virtual environment.
```sh
sudo apt install libsqlite3-mod-spatialite osm2pgsql \
postgresql-postgis postgresql-postgis-scripts \
pkg-config libicu-dev virtualenv
```
To set up the virtual environment with all necessary packages run:
```sh
virtualenv ~/nominatim-dev-venv
~/nominatim-dev-venv/bin/pip install \
psutil psycopg[binary] PyICU SQLAlchemy \
python-dotenv jinja2 pyYAML datrie behave \
mkdocs mkdocstrings mkdocs-gen-files pytest pytest-asyncio flake8 \
types-jinja2 types-markupsafe types-psutil types-psycopg2 \
types-pygments types-pyyaml types-requests types-ujson \
types-urllib3 typing-extensions unicorn falcon starlette \
uvicorn mypy osmium aiosqlite
```
Now enter the virtual environment whenever you want to develop:
```sh
. ~/nominatim-dev-venv/bin/activate
```
### Running Nominatim during development
The source code for Nominatim can be found in the `src` directory and can
be run in-place. The source directory features a special script
`nominatim-cli.py` which does the same as the installed 'nominatim' binary
but executes against the code in the source tree. For example:
```
me@machine:~$ cd Nominatim
me@machine:~/Nominatim$ ./nominatim-cli.py --version
Nominatim version 4.4.99-1
```
Make sure you have activated the virtual environment holding all
necessary dependencies.
## Executing Tests
All tests are located in the `/test` directory.
To run all tests, run make from the source root:
```sh
make tests
```
There are also make targets for executing only parts of the test suite.
For example to run linting only use:
```sh
make lint
```
The possible testing targets are: mypy, lint, pytest, bdd.
For more information about the structure of the tests and how to change and
extend the test suite, see the [Testing chapter](Testing.md).
## Documentation Pages
The [Nominatim documentation](https://nominatim.org/release-docs/develop/) is
built using the [MkDocs](https://www.mkdocs.org/) static site generation
framework. The master branch is automatically deployed every night to
[https://nominatim.org/release-docs/develop/](https://nominatim.org/release-docs/develop/).
To build the documentation run
```
make doc
```
For local testing, you can start a webserver:
```
build> make serve-doc
[server:296] Serving on http://127.0.0.1:8000
[handlers:62] Start watching changes
```
If you develop inside a Vagrant virtual machine, use a port that is forwarded
to your host:
```
build> mkdocs serve --dev-addr 0.0.0.0:8088
[server:296] Serving on http://0.0.0.0:8088
[handlers:62] Start watching changes
```

View File

@@ -0,0 +1,36 @@
# Documentation Pages
The [Nominatim documentation](https://nominatim.org/release-docs/develop/) is built using the [MkDocs](https://www.mkdocs.org/) static site generation framework. The master branch is automatically deployed every night to [https://nominatim.org/release-docs/develop/](https://nominatim.org/release-docs/develop/).
To preview local changes:
1. Install MkDocs
```
pip3 install --user mkdocs
```
2. In build directory run
```
make doc
INFO - Cleaning site directory
INFO - Building documentation to directory: /home/vagrant/build/site-html
```
This runs `mkdocs build` plus extra transformation of some files and adds symlinks (see `CMakeLists.txt` for the exact steps).
3. Start webserver for local testing
```
mkdocs serve
[server:296] Serving on http://127.0.0.1:8000
[handlers:62] Start watching changes
```
If you develop inside a Vagrant virtual machine:
* add port forwarding to your Vagrantfile, e.g. `config.vm.network "forwarded_port", guest: 8000, host: 8000`
* use `mkdocs serve --dev-addr 0.0.0.0:8000` because the default localhost
IP does not get forwarded.

View File

@@ -1,263 +0,0 @@
# Writing custom sanitizer and token analysis modules for the ICU tokenizer
The [ICU tokenizer](../customize/Tokenizers.md#icu-tokenizer) provides a
highly customizable method to pre-process and normalize the name information
of the input data before it is added to the search index. It comes with a
selection of sanitizers and token analyzers which you can use to adapt your
installation to your needs. If the provided modules are not enough, you can
also provide your own implementations. This section describes the API
of sanitizers and token analysis.
!!! warning
This API is currently in early alpha status. While this API is meant to
be a public API on which other sanitizers and token analyzers may be
implemented, it is not guaranteed to be stable at the moment.
## Using non-standard modules
Sanitizer names (in the `step` property), token analysis names (in the
`analyzer`) and query preprocessor names (in the `step` property)
may refer to externally supplied modules. There are two ways
to include external modules: through a library or from the project directory.
To include a module from a library, use the absolute import path as name and
make sure the library can be found in your PYTHONPATH.
To use a custom module without creating a library, you can put the module
somewhere in your project directory and then use the relative path to the
file. Include the whole name of the file including the `.py` ending.
## Custom query preprocessors
A query preprocessor must export a single factory function `create` with
the following signature:
``` python
def create(config: QueryConfig) -> Callable[[list[Phrase]], list[Phrase]]
```
The function receives the custom configuration for the preprocessor and
returns a callable (function or class) with the actual preprocessing
code. When a query comes in, then the callable gets a list of phrases
and needs to return the transformed list of phrases. The list and phrases
may be changed in place or a completely new list may be generated.
The `QueryConfig` is a simple dictionary which contains all configuration
options given in the yaml configuration of the ICU tokenizer. It is up to
the function to interpret the values.
A `nominatim_api.search.Phrase` describes a part of the query that contains one or more independent
search terms. Breaking a query into phrases helps reduce the number of
possible tokens Nominatim has to take into account. However a phrase break
is definitive: a multi-term search word cannot go over a phrase break.
A Phrase object has two fields:
* `ptype` further refines the type of phrase (see list below)
* `text` contains the query text for the phrase
The order of phrases matters to Nominatim when doing further processing.
Thus, while you may split or join phrases, you should not reorder them
unless you really know what you are doing.
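A minimal preprocessor might look like the sketch below. To keep it runnable without Nominatim installed, `Phrase` and `QueryConfig` are local stand-ins for the real types, the factory is written as a plain module-level function that takes only the configuration, and the `max-length` option is invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Phrase:
    """Stand-in for nominatim_api.search.Phrase with its two fields."""
    ptype: Any
    text: str


# Stand-in: the configuration is described as a simple dictionary.
QueryConfig = dict


def create(config: QueryConfig) -> Callable[[list[Phrase]], list[Phrase]]:
    """Build a preprocessor that collapses whitespace and truncates phrases."""
    # 'max-length' is a made-up option, not one Nominatim defines.
    max_length = int(config.get("max-length", 100))

    def _process(phrases: list[Phrase]) -> list[Phrase]:
        # A new list is returned; the order of phrases is left untouched.
        return [
            Phrase(phrase.ptype, " ".join(phrase.text.split())[:max_length])
            for phrase in phrases
        ]

    return _process
```

The configuration is read once in `create`, so the returned callable does no option parsing per query.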
Phrase types (`nominatim_api.search.PhraseType`) can further help narrow
down how the tokens in the phrase are interpreted. The following phrase types
are known:
::: nominatim_api.search.PhraseType
options:
heading_level: 6
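The contract described above can be sketched as follows. This is a minimal illustration, not code shipped with Nominatim: the `Phrase` class is a stand-in reduced to the two documented fields so that the snippet runs on its own, and the `max-length` option is a hypothetical configuration key invented for the example.

``` python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Phrase:
    # Stand-in for nominatim_api.search.Phrase with the two fields
    # described above (assumption: used only to keep the sketch runnable).
    ptype: str
    text: str


def create(config: Dict[str, str]) -> Callable[[List[Phrase]], List[Phrase]]:
    # 'max-length' is a hypothetical option, read from the yaml configuration.
    max_length = int(config.get('max-length', 100))

    def _process(phrases: List[Phrase]) -> List[Phrase]:
        result = []
        for phrase in phrases:
            # Collapse runs of whitespace and cut overly long phrases.
            text = ' '.join(phrase.text.split())[:max_length]
            # Drop phrases that end up empty; the order of the rest is kept.
            if text:
                result.append(Phrase(phrase.ptype, text))
        return result

    return _process
```

The factory reads its configuration once and returns a closure, so the per-query work stays cheap. The closure builds a new list but never reorders phrases.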
## Custom sanitizer modules
A sanitizer module must export a single factory function `create` with the
following signature:
``` python
def create(config: SanitizerConfig) -> Callable[[ProcessInfo], None]
```
The function receives the custom configuration for the sanitizer and must
return a callable (function or class) that transforms the name and address
terms of a place. When a place is processed, then a `ProcessInfo` object
is created from the information that was queried from the database. This
object is sequentially handed to each configured sanitizer, so that each
sanitizer receives the result of processing from the previous sanitizer.
After the last sanitizer is finished, the resulting name and address lists
are forwarded to the token analysis module.
Sanitizer functions are instantiated once and then called for each place
that is imported or updated. They don't need to be thread-safe.
If multi-threading is used, each thread creates its own instance of
the function.
### Sanitizer configuration
::: nominatim_db.tokenizer.sanitizers.config.SanitizerConfig
options:
heading_level: 6
### The main filter function of the sanitizer
The filter function receives a single object of type `ProcessInfo`
which has three members:
* `place: PlaceInfo`: read-only information about the place being processed.
See PlaceInfo below.
* `names: List[PlaceName]`: The current list of names for the place.
* `address: List[PlaceName]`: The current list of address names for the place.
While the `place` member is provided for information only, the `names` and
`address` lists are meant to be manipulated by the sanitizer. It may add and
remove entries, change information within a single entry (for example by
adding extra attributes) or completely replace the list with a different one.
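As a rough illustration of replacing the name list, the following sketch removes names that are empty or only differ in case and surrounding whitespace. The `SimpleNamespace` objects are stand-ins for `ProcessInfo` and `PlaceName`, used here only so that the snippet runs without a Nominatim installation.

``` python
from types import SimpleNamespace


def _dedupe_names(obj):
    seen = set()
    unique = []
    for name in obj.names:
        key = name.name.strip().lower()
        # Keep the first occurrence of each non-empty name.
        if key and key not in seen:
            seen.add(key)
            unique.append(name)
    # Replace the list as a whole; 'place' and 'address' stay untouched.
    obj.names = unique


def create(config):
    return _dedupe_names


# Stand-in for a ProcessInfo object (assumption, for demonstration only).
info = SimpleNamespace(
    place=None,
    names=[SimpleNamespace(name='Main Street'),
           SimpleNamespace(name='main street '),
           SimpleNamespace(name='')],
    address=[])
create({})(info)
```

After the call, `info.names` contains only the first `Main Street` entry.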
#### PlaceInfo - information about the place
::: nominatim_db.data.place_info.PlaceInfo
options:
heading_level: 6
#### PlaceName - extended naming information
::: nominatim_db.data.place_name.PlaceName
options:
heading_level: 6
### Example: Filter for US street prefixes
The following sanitizer removes the directional prefixes from street names
in the US:
!!! example
``` python
import re

def _filter_function(obj):
    if obj.place.country_code == 'us' \
       and obj.place.rank_address >= 26 and obj.place.rank_address <= 27:
        for name in obj.names:
            name.name = re.sub(r'^(north|south|west|east) ',
                               '',
                               name.name,
                               flags=re.IGNORECASE)

def create(config):
    return _filter_function
```
This is the simplest form of a sanitizer module. It defines a single
filter function and implements the required `create()` function by returning
the filter.
The filter function first checks if the object is interesting for the
sanitizer. Namely it checks if the place is in the US (through `country_code`)
and if the place is a street (a `rank_address` of 26 or 27). If the
conditions are met, then it goes through all available names and
removes any leading directional prefix using a simple regular expression.
Save the source code in a file in your project directory, for example as
`us_streets.py`. Then you can use the sanitizer in your `icu_tokenizer.yaml`:
``` yaml
...
sanitizers:
- step: us_streets.py
...
```
!!! warning
This example is just a simplified showcase of how to create a sanitizer.
It is not really meant for real-world use: while the sanitizer would
correctly transform `West 5th Street` into `5th Street`, it would also
shorten a simple `North Street` to `Street`.
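One possible way to soften the problem described in the warning is to strip the direction only when at least two more words follow it. The variant below is a sketch of that idea and still only a heuristic; the helper name `strip_direction` is made up for this example.

``` python
import re

# Match a leading direction only if at least two more words follow,
# so that 'North Street' is left alone.
_PREFIX = re.compile(r'^(?:north|south|west|east) (?=\S+\s+\S)',
                     re.IGNORECASE)


def strip_direction(name: str) -> str:
    return _PREFIX.sub('', name)


def _filter_function(obj):
    if obj.place.country_code == 'us' and 26 <= obj.place.rank_address <= 27:
        for name in obj.names:
            name.name = strip_direction(name.name)


def create(config):
    return _filter_function
```

With this pattern `West 5th Street` still becomes `5th Street`, while `North Street` stays unchanged. Names such as `North Park Avenue` would still be shortened, which may or may not be wanted.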
For more sanitizer examples, have a look at the sanitizers provided by Nominatim.
They can be found in the directory
[`src/nominatim_db/tokenizer/sanitizers`](https://github.com/osm-search/Nominatim/tree/master/src/nominatim_db/tokenizer/sanitizers).
## Custom token analysis module
::: nominatim_db.tokenizer.token_analysis.base.AnalysisModule
options:
heading_level: 6
::: nominatim_db.tokenizer.token_analysis.base.Analyzer
options:
heading_level: 6
### Example: Creating acronym variants for long names
The following example of a token analysis module creates acronyms from
very long names and adds them as a variant:
``` python
class AcronymMaker:
    """ This class is the actual analyzer.
    """
    def __init__(self, norm, trans):
        self.norm = norm
        self.trans = trans

    def get_canonical_id(self, name):
        # In simple cases, the normalized name can be used as a canonical id.
        return self.norm.transliterate(name.name).strip()

    def compute_variants(self, name):
        # The transliterated form of the name always makes up a variant.
        variants = [self.trans.transliterate(name)]

        # Only create acronyms from very long words.
        if len(name) > 20:
            # Take the first letter from each word to form the acronym.
            acronym = ''.join(w[0] for w in name.split())
            # If that leads to an acronym with at least three letters,
            # add the resulting acronym as a variant.
            if len(acronym) > 2:
                # Never forget to transliterate the variants before returning them.
                variants.append(self.trans.transliterate(acronym))

        return variants

# The following two functions are the module interface.

def configure(rules, normalizer, transliterator):
    # There is no configuration to parse and no data to set up.
    # Just return an empty configuration.
    return None

def create(normalizer, transliterator, config):
    # Return a new instance of our token analysis class above.
    return AcronymMaker(normalizer, transliterator)
```
Given the name `Trans-Siberian Railway`, the code above would return the full
name `Trans-Siberian Railway` and the acronym `TSR` as variant, so that
searching would work for both.
## Sanitizers vs. Token analysis - what to use for variants?
It is not always clear when to implement variations in the sanitizer and
when to write a token analysis module. Just take the acronym example
above: it would also have been possible to write a sanitizer which adds the
acronym as an additional name to the name list. The result would have been
similar. So which should be used when?
The most important thing to keep in mind is that variants created by the
token analysis are only saved in the word lookup table. They do not need
extra space in the search index. If there are many spelling variations, this
can mean quite a significant amount of space is saved.
When creating additional names with a sanitizer, these names are completely
independent. In particular, they can be fed into different token analysis
modules. This gives a much greater flexibility but at the price that the
additional names increase the size of the search index.
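For comparison, the sanitizer alternative mentioned above might look roughly like the sketch below. It appends the acronym as an additional, independent name. The `SimpleNamespace` objects stand in for `PlaceName` and `ProcessInfo` and carry only the attributes the sketch needs; a real sanitizer would create proper `PlaceName` objects instead.

``` python
from types import SimpleNamespace


def _add_acronyms(obj):
    extra = []
    for name in obj.names:
        # Same rule as in the token analysis example: only very long names.
        if len(name.name) > 20:
            acronym = ''.join(w[0] for w in name.name.split())
            if len(acronym) > 2:
                # Stand-in for creating a new PlaceName of the same kind.
                extra.append(SimpleNamespace(name=acronym, kind=name.kind))
    # The acronyms become full names and end up in the search index.
    obj.names.extend(extra)


def create(config):
    return _add_acronyms


# Stand-in for a ProcessInfo object (assumption, for demonstration only).
info = SimpleNamespace(
    place=None,
    names=[SimpleNamespace(name='Massachusetts Institute of Technology',
                           kind='name')],
    address=[])
create({})(info)
```

Here `MIoT` is added as a second name of the place, with the index size cost discussed above.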

170
docs/develop/Import.md Normal file

@@ -0,0 +1,170 @@
# OSM Data Import
OSM data is initially imported using osm2pgsql. Nominatim uses its own data
output style 'gazetteer', which differs from the output style created for
map rendering.
## Database Layout
The gazetteer style produces a single table `place` with the following columns:
* `osm_type` - kind of OSM object (**N** - node, **W** - way, **R** - relation)
* `osm_id` - original OSM ID
* `class` - key of principal tag defining the object type
* `type` - value of principal tag defining the object type
* `name` - collection of tags that contain a name or reference
* `admin_level` - numerical value of the tagged administrative level
* `address` - collection of tags defining the address of an object
* `extratags` - collection of additional interesting tags that are not
directly relevant for searching
* `geometry` - geometry of the object (in WGS84)
A single OSM object may appear multiple times in this table when it is tagged
with multiple tags that may constitute a principal tag. Take for example a
motorway bridge. In OSM, this would be a way which is tagged with
`highway=motorway` and `bridge=yes`. This way would appear in the `place` table
once with `class` of `highway` and once with a `class` of `bridge`. Thus the
*unique key* for `place` is (`osm_type`, `osm_id`, `class`).
## Configuring the Import
How tags are interpreted and assigned to the different `place` columns can be
configured via the import style configuration file (`CONST_Import_style`). This
is a JSON file which contains a list of rules which are matched against every
tag of every object and then assign the tag its specific role.
### Configuration Rules
A single rule looks like this:
```json
{
    "keys" : ["key1", "key2", ...],
    "values" : {
        "value1" : "prop",
        "value2" : "prop1,prop2"
    }
}
```
A rule first defines a list of keys to apply the rule to. This is always a list
of strings. The string may have four forms. An empty string matches against
any key. A string that ends in an asterisk `*` is a prefix match and accordingly
matches against any key that starts with the given string (minus the `*`). A
suffix match can be defined similarly with a string that starts with a `*`. Any
other string constitutes an exact match.
The second part of the rules defines a list of values and the properties that
apply to a successful match. Value strings may be either empty, which
means that they match any value, or describe an exact match. Prefix
or suffix matching of values is not possible.
For a rule to match, it has to find a valid combination of keys and values. The
resulting property is that of the matched values.
The rules in a configuration file are processed sequentially and the first
match for each tag wins.
A rule where key and value are the empty string is special. This defines the
fallback when none of the rules match. The fallback is always used as a last
resort when nothing else matches, no matter where the rule appears in the file.
Defining multiple fallback rules is not allowed. What happens in this case
is undefined.
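The matching rules above can be modelled in a few lines of Python. This is only a rough model of the description in this section, not the osm2pgsql implementation; the function names are made up, and the assumption that an exact value takes precedence over the empty value within one rule is mine.

``` python
def key_matches(pattern: str, key: str) -> bool:
    if pattern == '':
        return True                            # empty string: any key
    if pattern.endswith('*'):
        return key.startswith(pattern[:-1])    # prefix match
    if pattern.startswith('*'):
        return key.endswith(pattern[1:])       # suffix match
    return pattern == key                      # exact match


def find_properties(rules, key, value):
    fallback = None
    for rule in rules:
        values = rule['values']
        # A rule with empty key and empty value is the fallback,
        # no matter where it appears in the file.
        if rule['keys'] == [''] and list(values) == ['']:
            fallback = values['']
            continue
        if any(key_matches(p, key) for p in rule['keys']):
            if value in values:
                return values[value]           # exact value match
            if '' in values:
                return values['']              # empty value: any value
    return fallback


rules = [
    {'keys': ['name', 'name:*'], 'values': {'': 'name'}},
    {'keys': ['highway'], 'values': {'motorway': 'main', '': 'main,with_name'}},
    {'keys': [''], 'values': {'': 'extra'}},
    {'keys': ['*:wikipedia'], 'values': {'': 'skip'}},
]
```

With these rules, `find_properties(rules, 'de:wikipedia', 'x')` yields `skip` even though the fallback rule is listed earlier, and a tag that no rule matches yields `extra`.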
### Tag Properties
One or more of the following properties may be given for each tag:
* `main`
A principal tag. A new row will be added for the object with key and value
as `class` and `type`.
* `with_name`
When the tag is a principal tag (`main` property set): only really add a new
row, if there is any name tag found (a reference tag is not sufficient, see
below).
* `with_name_key`
When the tag is a principal tag (`main` property set): only really add a new
row, if there is also a name tag that matches the key of the principal tag.
For example, if the main tag is `bridge=yes`, then it will only be added as
an extra row, if there is a tag `bridge:name[:XXX]` for the same object.
If this property is set, all other names that are not domain-specific are
ignored.
* `fallback`
When the tag is a principal tag (`main` property set): only really add a new
row, when no other principal tags for this object have been found. Only one
fallback tag can win for an object.
* `operator`
When the tag is a principal tag (`main` property set): also include the
`operator` tag in the list of names. This is a special construct for an
outdated tagging practice in OSM. Fuel stations and chain restaurants
in particular used to have the name of the chain tagged as `operator`.
These days the chain can be more commonly found in the `brand` tag but
there is still enough old data around to warrant this special case.
* `name`
Add tag to the list of names.
* `ref`
Add tag to the list of names as a reference. At the moment this only means
that the object is not considered to be named for `with_name`.
* `address`
Add tag to the list of address tags. If the tag starts with `addr:` or
`is_in:`, then this prefix is cut off before adding it to the list.
* `postcode`
Add the value as a postcode to the address tags. If multiple tags are
candidate for postcodes, one wins out and the others are dropped.
* `country`
Add the value as a country code to the address tags. The value must be a
two letter country code, otherwise it is ignored. If there are multiple
tags that match, then one wins out and the others are dropped.
* `house`
If no principal tags can be found for the object, still add the object with
`class`=`place` and `type`=`house`. Use this for address nodes that have no
other function.
* `interpolation`
Add this object as an address interpolation (appears as `class`=`place` and
`type`=`houses` in the database).
* `extra`
Add tag to the list of extra tags.
* `skip`
Skip the tag completely. Useful when a custom default fallback is defined
or to define exceptions to rules.
A rule can define as many of these properties for one match as it likes. For
example, if the property is `"main,extra"` then the tag will open a new row
but also have the tag appear in the list of extra tags.
There are a number of pre-defined styles in the `settings/` directory. It is
advisable to start from one of these styles when defining your own.
### Changing the Style of Existing Databases
There is normally no issue changing the style of a database that is already
imported and now kept up-to-date with change files. Just be aware that any
change in the style applies to updates only. If you want to change the data
that is already in the database, then a reimport is necessary.


@@ -1,152 +0,0 @@
# Indexing Places
In Nominatim, the word __indexing__ refers to the process that takes the raw
OpenStreetMap data from the place table, enriches it with address information
and creates the search indexes. This section explains the basic data flow.
## Initial import
After osm2pgsql has loaded the raw OSM data into the place table,
the data is copied to the final search tables placex and location_property_osmline.
While they are copied, some basic properties are added:
* country_code, geometry_sector and partition
* initial search and address rank
In addition the column `indexed_status` is set to `1` marking the place as one
that needs to be indexed.
All this happens in the triggers `placex_insert` and `osmline_insert`.
## Indexing
The main work horse of the data import is the indexing step, where Nominatim
takes every place from the placex and location_property_osmline tables where
the indexed_status != 0 and computes the search terms and the address parts
of the place.
The indexing happens in three major steps:
1. **Data preparation** - The indexer gets the data for the place to be indexed
from the database.
2. **Search name processing** - The prepared data is given to the
tokenizer which computes the search terms from the names
and potentially other information.
3. **Address processing** - The indexer then hands the prepared data and the
tokenizer information back to the database via an `UPDATE` statement which
also sets the indexed_status to `0`. This fires the update triggers
`placex_update`/`osmline_update` which do the work of computing address
parts and filling all the search tables.
When computing the address terms of a place, Nominatim relies on the processed
search names of all the address parts. That is why places are processed in rank
order, from smallest rank to largest. To ensure correct handling of linked
place nodes, administrative boundaries are processed before all other places.
Apart from these restrictions, each place can be indexed independently
from the others. This allows a large degree of parallelization during the indexing.
It also means that the indexing process can be interrupted at any time and
will simply pick up where it left off when restarted.
### Data preparation
The data preparation step computes and retrieves all data for a place that
might be needed for the next step of processing the search name. That includes
* location information (country code)
* place classification (class, type, ranks)
* names (including names of linked places)
* address information (`addr:*` tags)
Data preparation is implemented in PL/pgSQL, mostly in the functions
`placex_indexing_prepare()` and `get_interpolation_address()`.
#### `addr:*` tag inheritance
Nominatim has limited support for inheriting address tags from a building
to POIs inside the building. This only works when the address tags are on the
building outline. Any rank 30 object inside such a building or on its outline
inherits all address tags when it does not have any address tags of its own.
The inheritance is computed in the data preparation step.
### Search name processing
The prepared place information is handed to the tokenizer next. This is a
Python module responsible for processing the names from both name and address
terms and building up the word index from them. The process is explained in
more detail in the [Tokenizer chapter](Tokenizers.md).
### Address processing
Finally, the preprocessed place information and the results of the search name
processing are written back to the database. At this point the update triggers
of the placex/location_property_osmline tables take over and fill all the
dependent tables. This makes up the most work-intensive part of the indexing.
Nominatim distinguishes between dependent and independent places.
**Dependent places** are all places on rank 30: house numbers, POIs etc. These
places don't have a full address of their own. Instead they are attached to
a parent street or place and use the information of the parent for searching
and displaying information. All other places are **independent places**: streets,
parks, water bodies, suburbs, cities, states etc. They receive a full address
of their own.
The address processing for both types of places is very different.
#### Independent places
To compute the address of an independent place, Nominatim searches for all
places that at least partially cover the place whose address is being computed.
For places with an area, that area is used to check for coverage. For place
nodes an artificial square area is computed according to the rank of
the place. The lower the rank, the larger the area. The `location_area_large_X`
tables are there to facilitate the lookup. All places that can function as
the address of another place are saved in those tables.
`addr:*` and `is_in:*` tags are taken into account to compute the address, too.
Nominatim will give preference to places with the same name as in these tags
when looking for places in the vicinity. If there are no matching place names
at all, then the tags are at least added to the search index. That means that
the names will not be shown in the result as the 'address' of the place, but
searching by them still works.
Independent places are always added to the global search index `search_name`.
#### Dependent places
Dependent places skip the full address computation for performance reasons.
Instead they just find a parent place to attach themselves to.
![parenting of dependent places](parenting-flow.svg)
By default a POI
or house number will be attached to the closest street. That can be any major
or minor street indexed by Nominatim. In the default configuration that means
that it can attach itself to a footway but only when it has a name.
When the dependent place has an `addr:street` tag, then Nominatim will first
try to find a street with the same name before falling back to the closest
street.
There are also addresses in OSM, where the housenumber does not belong
to a street at all. These have an `addr:place` tag. For these places, Nominatim
tries to find a place with the given name in the indexed places with an
address rank between 16 and 25. If none is found, then the dependent place
is attached to the closest place in that category and the addr:place name is
added as *unlisted* place, which indicates to Nominatim that it needs to add
it to the address output, no matter what. This special case is necessary to
cover addresses that don't really refer to an existing object.
When an address has both the `addr:street` and `addr:place` tag, then Nominatim
assumes that the `addr:place` tag in fact should be the city part of the address
and gives the POI the usual street number address.
Dependent places are only added to the global search index `search_name` when
they have either a name themselves or when they have address tags that are not
covered by the places that make up their address. The latter ensures that
addresses are always searchable by those address tags.
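The parenting rules for dependent places described above can be summarised in a simplified sketch. The real logic lives in the PL/pgSQL triggers and covers more cases; `Candidate`, `closest` and `choose_parent` are hypothetical names, and the candidate lists are assumed to be pre-filtered (streets indexed by Nominatim, places with an address rank between 16 and 25).

``` python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    name: str
    distance: float


def closest(cands: List[Candidate],
            name: Optional[str] = None) -> Optional[Candidate]:
    # Closest candidate, optionally restricted to those with the given name.
    pool = [c for c in cands if name is None or c.name == name]
    return min(pool, key=lambda c: c.distance, default=None)


def choose_parent(address: Dict[str, str], streets: List[Candidate],
                  places: List[Candidate]
                  ) -> Tuple[Optional[Candidate], Optional[str]]:
    """ Return the parent and, if needed, the name of an unlisted place.
    """
    street = address.get('street')
    place = address.get('place')

    if place and not street:
        parent = closest(places, place)
        if parent is not None:
            return parent, None
        # No place of that name: attach to the closest place and
        # remember the addr:place name as an unlisted place.
        return closest(places), place

    # With addr:street set, addr:place is treated as the city part.
    if street:
        parent = closest(streets, street)
        if parent is not None:
            return parent, None

    # Default: the closest street.
    return closest(streets), None
```

For example, with streets `Main Street` (50 m) and `Oak Lane` (10 m), an address with `addr:street=Main Street` is attached to `Main Street`, while an address without any hint ends up on `Oak Lane`.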

90
docs/develop/Ranking.md Normal file

@@ -0,0 +1,90 @@
# Place Ranking in Nominatim
Nominatim uses two metrics to rank a place: search rank and address rank.
Both can be assigned a value between 0 and 30. They serve slightly
different purposes, which are explained in this chapter.
## Search rank
The search rank describes the extent and importance of a place. It is used
when ranking search results. Simply put, if there are two results for a
search query which are otherwise equal, then the result with the _lower_
search rank will appear higher in the result list.
Search ranks are not so important these days because many well-known
places use the Wikipedia importance ranking instead.
## Address rank
The address rank describes where a place shows up in an address hierarchy.
Usually only administrative boundaries and place nodes and areas are
eligible to be part of an address. All other objects have an address rank
of 0.
Note that the search rank of a place plays a role in the address computation
as well. When collecting the places that should make up the address parts
then only places are taken into account that have a lower address rank than
the search rank of the base object.
## Rank configuration
Search and address ranks are assigned to a place when it is first imported
into the database. There are a few hard-coded rules for the assignment:
* postcodes follow special rules according to their length
* boundaries that are not areas and railway=rail are dropped completely
* the following are always search rank 30 and address rank 0:
* highway nodes
* landuse that is not an area
Other than that, the ranks can be freely assigned via the JSON file
defined with `CONST_Address_Level_Config` according to their type and
the country they are in.
The address level configuration must consist of an array of configuration
entries, each containing a tag definition and an optional country array:
```
[ {
    "tags" : {
      "place" : {
        "county" : 12,
        "city" : 16
      },
      "landuse" : {
        "residential" : 22,
        "" : 30
      }
    }
  },
  {
    "countries" : [ "ca", "us" ],
    "tags" : {
      "boundary" : {
        "administrative8" : 18,
        "administrative9" : 20
      },
      "landuse" : {
        "residential" : [22, 0]
      }
    }
  }
]
```
The `countries` field contains a list of countries (as ISO 3166-1 alpha 2 code)
for which the definition applies. When the field is omitted, then the
definition is used as a fallback, when nothing more specific for a given
country exists.
`tags` contains the ranks for key/value pairs. The ranks can be either a
single number, in which case they are the search and address rank, or an array
of search and address rank (in that order). The value may be left empty.
Then the rank is used when no more specific value is found for the given
key.
Countries and key/value combinations may appear in multiple definitions. Just
make sure that each combination of country/key/value appears only once per
file. Otherwise the import will fail with a UNIQUE INDEX constraint
violation.
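To make the lookup rules more concrete, here is a small Python model of how a rank pair could be resolved from such a configuration. It is a sketch under assumptions: the function name `lookup_ranks` is invented, and the precedence (country-specific entries before the fallback, exact value before the empty value) is my reading of the description, not taken from the Nominatim source.

``` python
def lookup_ranks(config, country, key, value):
    """ Return (search_rank, address_rank) or None if nothing matches.
    """
    specific = [e for e in config if country in e.get('countries', [])]
    generic = [e for e in config if 'countries' not in e]

    for entries in (specific, generic):
        # An exact value match wins over the empty 'any value' entry.
        for candidate in (value, ''):
            for entry in entries:
                ranks = entry['tags'].get(key, {}).get(candidate)
                if ranks is None:
                    continue
                if isinstance(ranks, list):
                    return tuple(ranks)        # [search, address]
                return (ranks, ranks)          # single number for both
    return None


config = [
    {'tags': {'place': {'county': 12, 'city': 16},
              'landuse': {'residential': 22, '': 30}}},
    {'countries': ['ca', 'us'],
     'tags': {'boundary': {'administrative8': 18, 'administrative9': 20},
              'landuse': {'residential': [22, 0]}}},
]
```

With the example configuration from above, residential landuse resolves to `(22, 0)` in the US and to `(22, 22)` elsewhere, and any other landuse falls back to `(30, 30)`.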


@@ -1,120 +0,0 @@
# Nominatim Test Suite
This chapter describes the tests in the `/test` directory, how they are
structured and how to extend them. For a quick introduction on how to run
the tests, see the [Development setup chapter](Development-Environment.md).
## Overall structure
There are two kinds of tests in this test suite. There are functional tests
which test the API interface using a BDD test framework and there are unit
tests for the Python code.
This test directory is structured as follows:
```
-+- bdd Functional API tests
| \
| +- steps Step implementations for test descriptions
| +- osm2pgsql Tests for data import via osm2pgsql
| +- db Tests for internal data processing on import and update
| +- api Tests for API endpoints (search, reverse, etc.)
|
+- python Python unit tests
+- testdb Base data for generating API test database
+- testdata Additional test data used by unit tests
```
## Python Unit Tests (`test/python`)
Unit tests for Python code can be found in the `python/` directory. The goal is
to have complete coverage of the Python library in `nominatim`.
To execute the tests run

    py.test-3 test/python

or

    pytest test/python

The name of the pytest binary depends on your installation.
## BDD Functional Tests (`test/bdd`)
Functional tests are written as BDD instructions. For more information on
the philosophy of BDD testing, see the
[Behave manual](http://pythonhosted.org/behave/philosophy.html).
The following explanation assumes that the reader is familiar with the BDD
notations of features, scenarios and steps.
All possible steps can be found in the `steps` directory and should ideally
be documented.
### General Usage
To run the functional tests, do
cd test/bdd
behave
The tests can be configured with a set of environment variables (`behave -D key=val`):
* `TEMPLATE_DB` - name of template database used as a skeleton for
the test databases (db tests)
* `TEST_DB` - name of test database (db tests)
* `API_TEST_DB` - name of the database containing the API test data (api tests)
* `API_TEST_FILE` - OSM file to be imported into the API test database (api tests)
* `API_ENGINE` - web framework to use for running search queries, same values as
`nominatim serve --engine` parameter
* `DB_HOST` - (optional) hostname of database host
* `DB_PORT` - (optional) port of database on host
* `DB_USER` - (optional) username of database login
* `DB_PASS` - (optional) password for database login
* `REMOVE_TEMPLATE` - if true, the template and API database will not be reused
during the next run. Reusing the base templates speeds
up tests considerably but might lead to outdated errors
for some changes in the database layout.
* `KEEP_TEST_DB` - if true, the test database will not be dropped after a test
is finished. Should only be used if one single scenario is
run, otherwise the result is undefined.
Logging can be defined through command line parameters of behave itself. Check
out `behave --help` for details. Also have a look at the 'work-in-progress'
feature of behave which comes in handy when writing new tests.
### API Tests (`test/bdd/api`)
These tests are meant to test the different API endpoints and their parameters.
They require several datasets to be imported into a test database. This is normally
done automatically during setup of the test. The API test database is then
kept around and reused in subsequent runs of behave. Use `behave -DREMOVE_TEMPLATE`
to force a reimport of the database.
The official test dataset is saved in the file `test/testdb/apidb-test-data.pbf`
and comprises the following data:
* Geofabrik extract of Liechtenstein
* extract of Autauga County, Alabama, US (for tests against Tiger data)
* additional data from `test/testdb/additional_api_test.data.osm`
API tests should only be testing the functionality of the website frontend code.
Most tests should be formulated as BDD DB creation tests (see below) instead.
### DB Creation Tests (`test/bdd/db`)
These tests check the import and update of the Nominatim database. They do not
test the correctness of osm2pgsql. Each test will write some data into the `place`
table (and optionally the `planet_osm_*` tables if required) and then run
Nominatim's processing functions on that.
These tests need to create their own test databases. By default they will be
called `test_template_nominatim` and `test_nominatim`. Names can be changed with
the environment variables `TEMPLATE_DB` and `TEST_DB`. The user running the tests
needs superuser rights for postgres.
### Import Tests (`test/bdd/osm2pgsql`)
These tests check that data is imported correctly into the place table. They
use the same template database as the DB Creation tests, so the same remarks apply.


@@ -1,307 +0,0 @@
# Tokenizers
The tokenizer is the component of Nominatim that is responsible for
analysing names of OSM objects and queries. Nominatim provides different
tokenizers that use different strategies for normalisation. This page describes
how tokenizers are expected to work and the public API that needs to be
implemented when creating a new tokenizer. For information on how to configure
a specific tokenizer for a database see the
[tokenizer chapter in the Customization Guide](../customize/Tokenizers.md).
## Generic Architecture
### About Search Tokens
Search in Nominatim is organised around search tokens. Such a token represents
a string that can be part of the search query. Tokens are used so that the search
index does not need to be organised around strings. Instead the database saves
for each place which tokens match this place's name, address, house number etc.
To be able to distinguish between these different types of information stored
with the place, a search token also always has a certain type: name, house number,
postcode etc.
During search an incoming query is transformed into an ordered list of such
search tokens (or rather many lists, see below) and this list is then converted
into a database query to find the right place.
It is the core task of the tokenizer to create, manage and assign the search
tokens. The tokenizer is involved in two distinct operations:
* __at import time__: scanning names of OSM objects, normalizing them and
building up the list of search tokens.
* __at query time__: scanning the query and returning the appropriate search
tokens.
### Importing
The indexer is responsible for enriching an OSM object (or place) with all data
required for geocoding. It is split into two parts: the controller collects
the places that require updating, enriches the place information as required
and hands the place to PostgreSQL. The controller is part of the Nominatim
library written in Python. Within PostgreSQL, the `placex_update`
trigger is responsible for filling all secondary tables with extra geocoding
information. This part is written in PL/pgSQL.
The tokenizer is involved in both parts. When the indexer prepares a place,
it hands it over to the tokenizer to inspect the names and create all the
search tokens applicable for the place. This usually involves updating the
tokenizer's internal token lists and creating a list of all token IDs for
the specific place. This list is later needed in the PL/pgSQL part where the
indexer needs to add the token IDs to the appropriate search tables. To be
able to communicate the list between the Python part and the PL/pgSQL trigger,
the `placex` table contains a special JSONB column `token_info` which is there
for the exclusive use of the tokenizer.
The Python part of the tokenizer returns structured information about the
tokens of a place to the indexer which converts it to JSON and inserts it into
the `token_info` column. The content of the column is then handed to the PL/pgSQL
callbacks of the tokenizer which extract the required information. Usually
the tokenizer then removes all information from the `token_info` structure,
so that no information is ever persistently saved in the table. All information
that went in should have been processed after all and put into secondary tables.
This is however not a hard requirement. If the tokenizer needs to store
additional information about a place permanently, it may do so in the
`token_info` column. It just may never execute searches over it and
consequently not create any special indexes on it.
### Querying
At query time, Nominatim builds up multiple _interpretations_ of the search
query. Each of these interpretations is tried against the database, ordered
by how likely it is to match the search query. The first
interpretation that yields results wins.
The interpretations are encapsulated in the `SearchDescription` class. An
instance of this class is created by applying a sequence of
_search tokens_ to an initially empty SearchDescription. It is the
responsibility of the tokenizer to parse the search query and derive all
possible sequences of search tokens. To that end the tokenizer needs to parse
the search query and look up matching words in its own data structures.
## Tokenizer API
The following section describes the functions that need to be implemented
for a custom tokenizer implementation.
!!! warning
This API is currently in early alpha status. While it is meant to be a
public API on which other tokenizers may be implemented, it is far from
stable at the moment.
### Directory Structure
Nominatim expects two files containing the Python part of the implementation:
* `src/nominatim_db/tokenizer/<NAME>_tokenizer.py` contains the tokenizer
code used during import and
* `src/nominatim_api/search/<NAME>_tokenizer.py` has the code used during
query time.
`<NAME>` is a unique name for the tokenizer consisting of only lower-case
letters, digits and underscore. A tokenizer also needs to install some SQL
functions. By convention, these should be placed in `lib-sql/tokenizer`.
If the tokenizer has a default configuration file, this should be saved in
`settings/<NAME>_tokenizer.<SUFFIX>`.
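The naming rule and the two expected module locations can be checked with a small sketch. The function name `tokenizer_module_paths` is hypothetical and not part of Nominatim; only the path layout is taken from the list above.

```python
import re
from pathlib import PurePosixPath
from typing import Tuple

# Lower-case letters, digits and underscore only.
NAME_PATTERN = re.compile(r"[a-z0-9_]+")

def tokenizer_module_paths(name: str) -> Tuple[PurePosixPath, PurePosixPath]:
    """Return the expected (import-time, query-time) module paths
    for a tokenizer called `name`."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid tokenizer name: {name!r}")
    filename = f"{name}_tokenizer.py"
    return (PurePosixPath("src/nominatim_db/tokenizer") / filename,
            PurePosixPath("src/nominatim_api/search") / filename)
```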
### Configuration and Persistence
Tokenizers may define custom settings for their configuration. All settings
must be prefixed with `NOMINATIM_TOKENIZER_`. Settings may be transient or
persistent. Transient settings are loaded from the configuration file when
Nominatim is started and may thus be changed at any time. Persistent settings
are tied to a database installation and must only be read at installation
time. If they are needed at runtime, they must be saved into the
`nominatim_properties` table and later loaded from there.
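The difference between the two kinds of settings can be sketched like this. A plain dictionary stands in for the `nominatim_properties` table, and the function names and the example setting are assumptions, not Nominatim's actual API.

```python
from typing import Iterable, Mapping, MutableMapping

PREFIX = "NOMINATIM_TOKENIZER_"

def save_persistent_settings(env: Mapping[str, str],
                             properties: MutableMapping[str, str],
                             keys: Iterable[str]) -> None:
    """At installation time: copy the chosen settings from the
    configuration into the property store (stand-in for the table)."""
    for key in keys:
        if not key.startswith(PREFIX):
            raise ValueError(f"setting {key!r} lacks the {PREFIX} prefix")
        if key in env:
            properties[key] = env[key]

def load_persistent_setting(properties: Mapping[str, str], key: str) -> str:
    """At runtime: read the value from the property store, never from
    the configuration, which may have changed since installation."""
    return properties[key]
```

Reading persistent values only from the store ensures that a later edit to the configuration file cannot silently diverge from what the database was built with.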
### The Python modules
#### `src/nominatim_db/tokenizer/`
The import Python module is expected to export a single factory function:
```python
def create(dsn: str, data_dir: Path) -> AbstractTokenizer
```
The `dsn` parameter contains the DSN of the Nominatim database. The `data_dir`
is a directory in the project directory that the tokenizer may use to save
database-specific data. The function must return an instance of the tokenizer
class as defined below.
#### `src/nominatim_api/search/`
The query-time Python module must also export a factory function:
```python
def create_query_analyzer(conn: SearchConnection) -> AbstractQueryAnalyzer
```
The `conn` parameter contains the current search connection. See the
[library documentation](../library/Low-Level-DB-Access.md#searchconnection-class)
for details on the class. The function must return an instance of the query
analyzer class as defined below.
### Python Tokenizer Class
All tokenizers must inherit from `nominatim_db.tokenizer.base.AbstractTokenizer`
and implement the abstract functions defined there.
::: nominatim_db.tokenizer.base.AbstractTokenizer
options:
heading_level: 6
### Python Analyzer Class
::: nominatim_db.tokenizer.base.AbstractAnalyzer
options:
heading_level: 6
### Python Query Analyzer Class
::: nominatim_api.search.query_analyzer_factory.AbstractQueryAnalyzer
options:
heading_level: 6
### PL/pgSQL Functions
The tokenizer must provide the indexer with access functions for the
`token_info` column that extract the necessary information for the global
search tables. If the tokenizer needs additional SQL functions for private
use, then these functions must be prefixed with `token_` in order to ensure
that there are no naming conflicts with the SQL indexer code.
The following functions are expected:
```sql
FUNCTION token_get_name_search_tokens(info JSONB) RETURNS INTEGER[]
```
Return an array of token IDs of search terms that should match
the name(s) for the given place. These tokens are used to look up the place
by name and, where the place functions as part of an address for another place,
by address. Must return NULL when the place has no name.
```sql
FUNCTION token_get_name_match_tokens(info JSONB) RETURNS INTEGER[]
```
Return an array of token IDs of full names of the place that should be used
to match addresses. The list of match tokens is usually more strict than
search tokens as it is used to find a match between two OSM tag values which
are expected to contain matching full names. Partial terms should not be
used for match tokens. Must return NULL when the place has no name.
```sql
FUNCTION token_get_housenumber_search_tokens(info JSONB) RETURNS INTEGER[]
```
Return an array of token IDs of house number tokens that apply to the place.
Note that a place may have multiple house numbers, for example when apartments
each have their own number. Must be NULL when the place has no house numbers.
```sql
FUNCTION token_normalized_housenumber(info JSONB) RETURNS TEXT
```
Return the house number(s) in the normalized form that can be matched against
a house number token text. If a place has multiple house numbers they must
be listed with a semicolon as delimiter. Must be NULL when the place has no
house numbers.
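The expected text format can be illustrated in Python. The function below is a hypothetical helper, not part of the tokenizer API, and it assumes the individual numbers have already been normalized by the tokenizer; dropping duplicates is an additional assumption.

```python
from typing import Iterable, List, Optional

def join_housenumbers(numbers: Iterable[str]) -> Optional[str]:
    """Join already-normalized house numbers with ';'.
    Returns None (NULL on the SQL side) when there are none."""
    seen: List[str] = []
    for number in numbers:
        number = number.strip()
        if number and number not in seen:
            seen.append(number)
    return ";".join(seen) if seen else None
```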
```sql
FUNCTION token_is_street_address(info JSONB) RETURNS BOOLEAN
```
Return true if this is an object that should be parented against a street.
Only relevant for objects with address rank 30.
```sql
FUNCTION token_has_addr_street(info JSONB) RETURNS BOOLEAN
```
Return true if there are street names to match against for finding the
parent of the object.
```sql
FUNCTION token_has_addr_place(info JSONB) RETURNS BOOLEAN
```
Return true if there are place names to match against for finding the
parent of the object.
```sql
FUNCTION token_matches_street(info JSONB, street_tokens INTEGER[]) RETURNS BOOLEAN
```
Check if the given tokens (previously saved from `token_get_name_match_tokens()`)
match against the `addr:street` tag name. Must return either NULL or FALSE
when the place has no `addr:street` tag.
```sql
FUNCTION token_matches_place(info JSONB, place_tokens INTEGER[]) RETURNS BOOLEAN
```
Check if the given tokens (previously saved from `token_get_name_match_tokens()`)
match against the `addr:place` tag name. Must return either NULL or FALSE
when the place has no `addr:place` tag.
```sql
FUNCTION token_addr_place_search_tokens(info JSONB) RETURNS INTEGER[]
```
Return the search token IDs extracted from the `addr:place` tag. These tokens
are used for searches by address when no matching place can be found in the
database. Must be NULL when the place has no `addr:place` tag.
```sql
FUNCTION token_get_address_keys(info JSONB) RETURNS SETOF TEXT
```
Return the set of keys for which address information is provided. This
should correspond to the list of (relevant) `addr:*` tags with the `addr:`
prefix removed or the keys used in the `address` dictionary of the place info.
```sql
FUNCTION token_get_address_search_tokens(info JSONB, key TEXT) RETURNS INTEGER[]
```
Return the array of search tokens for the given address part. `key` can be
expected to be one of those returned with `token_get_address_keys()`. The
search tokens are added to the address search vector of the place, when no
corresponding OSM object could be found for the given address part from which
to copy the name information.
```sql
FUNCTION token_matches_address(info JSONB, key TEXT, tokens INTEGER[]) RETURNS BOOLEAN
```
Check if the given tokens match against the address part `key`.
__Warning:__ the tokens that are handed in are the lists previously saved
from `token_get_name_search_tokens()`, _not_ from the match token list. This
is a historical oddity which will be fixed at some point in the future.
Currently, tokenizers are encouraged to make sure that matching works against
both the search token list and the match token list.
```sql
FUNCTION token_get_postcode(info JSONB) RETURNS TEXT
```
Return the postcode for the object, if any exists. The postcode must be in
the form that should also be presented to the end-user.
```sql
FUNCTION token_strip_info(info JSONB) RETURNS JSONB
```
Return the part of the `token_info` field that should be stored in the database
permanently. The indexer calls this function when all processing is done and
replaces the content of the `token_info` column with the returned value before
the trigger stores the information in the database. May return NULL if no
information should be stored permanently.

View File

@@ -1,35 +0,0 @@
@startuml
skinparam monochrome true
skinparam ObjectFontStyle bold
map search_name_X {
place_id => BIGINT
address_rank => SMALLINT
name_vector => INT[]
centroid => GEOMETRY
}
map location_area_large_X {
place_id => BIGINT
keywords => INT[]
partition => SMALLINT
rank_search => SMALLINT
rank_address => SMALLINT
country_code => VARCHAR(2)
isguess => BOOLEAN
postcode => TEXT
centroid => POINT
geometry => GEOMETRY
}
map location_road_X {
place_id => BIGINT
partition => SMALLINT
country_code => VARCHAR(2)
geometry => GEOMETRY
}
search_name_X -[hidden]> location_area_large_X
location_area_large_X -[hidden]> location_road_X
@enduml

File diff suppressed because one or more lines are too long

View File

@@ -1,34 +0,0 @@
# Additional Data Sources
This guide explains how the data sources other than OpenStreetMap that are
mentioned in the install instructions were obtained and converted.
## Country grid
Nominatim uses pre-generated country border data. It is needed in case one
imports only a subset of a country, and to assign each place a partition.
Nominatim database tables are split into partitions for performance.
More details in [osm-search/country-grid-data](https://github.com/osm-search/country-grid-data).
## US Census TIGER
For the United States you can choose to import additional street-level data.
The data isn't mixed into OSM data but queried as a fallback when no OSM
result can be found.
More details in [osm-search/TIGER-data](https://github.com/osm-search/TIGER-data).
## GB postcodes
For Great Britain you can choose to import Royal Mail postcode centroids.
More details in [osm-search/gb-postcode-data](https://github.com/osm-search/gb-postcode-data).
## Wikipedia & Wikidata rankings
Nominatim can import "importance" data of place names. This greatly
improves ranking of results.
More details in [osm-search/wikipedia-wikidata](https://github.com/osm-search/wikipedia-wikidata).

View File

@@ -1,44 +0,0 @@
@startuml
skinparam monochrome true
skinparam ObjectFontStyle bold
map planet_osm_nodes #eee {
id => BIGINT
lat => INT
lon => INT
}
map planet_osm_ways #eee {
id => BIGINT
nodes => BIGINT[]
tags => TEXT[]
}
map planet_osm_rels #eee {
id => BIGINT
parts => BIGINT[]
members => TEXT[]
tags => TEXT[]
way_off => SMALLINT
rel_off => SMALLINT
}
map place {
osm_type => CHAR(1)
osm_id => BIGINT
class => TEXT
type => TEXT
name => HSTORE
address => HSTORE
extratags => HSTORE
admin_level => SMALLINT
geometry => GEOMETRY
}
planet_osm_nodes -[hidden]> planet_osm_ways
planet_osm_ways -[hidden]> planet_osm_rels
planet_osm_ways -[hidden]-> place
planet_osm_nodes::id <- planet_osm_ways::nodes
@enduml

File diff suppressed because one or more lines are too long

View File

@@ -9,16 +9,16 @@ the address computation and the search frontend.
The __data import__ stage reads the raw OSM data and extracts all information
that is useful for geocoding. This part is done by osm2pgsql, the same tool
that can also be used to import a rendering database. It uses the special
gazetteer output plugin in `osm2pgsql/src/output-gazetteer.[ch]pp`. The result of
gazetteer output plugin in `osm2pgsql/output-gazetteer.[ch]pp`. The result of
the import can be found in the database table `place`.
The __address computation__ or __indexing__ stage takes the data from `place`
and adds additional information needed for geocoding. It ranks the places by
importance, links objects that belong together and computes addresses and
the search index. Most of this work is done in PL/pgSQL via database triggers
and can be found in the files in the `sql/functions/` directory.
and can be found in the file `sql/functions.sql`.
The __search frontend__ implements the actual API. It takes search
and reverse geocoding queries from the user, looks up the data and
returns the results in the requested format. This part is located in the
`nominatim-api` package. The source code can be found in `src/nominatim_api`.
returns the results in the requested format. This part is written in PHP
and can be found in the `lib/` and `website/` directories.

View File

@@ -1,31 +0,0 @@
@startuml
skinparam monochrome true
start
if (has 'addr:street'?) then (yes)
if (street with that name\n nearby?) then (yes)
:**Use closest street**
**with same name**;
kill
else (no)
:** Use closest**\n**street**;
kill
endif
elseif (has 'addr:place'?) then (yes)
if (place with that name\n nearby?) then (yes)
:**Use closest place**
**with same name**;
kill
else (no)
:add addr:place to address;
:**Use closest place**\n**rank 16 to 25**;
kill
endif
else (otherwise)
:**Use closest**\n**street**;
kill
endif
@enduml

File diff suppressed because one or more lines are too long

View File

@@ -1,99 +0,0 @@
@startuml
skinparam monochrome true
skinparam ObjectFontStyle bold
left to right direction
map placex {
place_id => BIGINT
osm_type => CHAR(1)
osm_id => BIGINT
class => TEXT
type => TEXT
name => HSTORE
address => HSTORE
extratags => HSTORE
admin_level => SMALLINT
partition => SMALLINT
geometry_sector => INT
parent_place_id => BIGINT
linked_place_id => BIGINT
importance => DOUBLE
rank_search => SMALLINT
rank_address => SMALLINT
wikipedia => TEXT
country_code => VARCHAR(2)
housenumber => TEXT
postcode => TEXT
indexed_status => SMALLINT
indexed_date => TIMESTAMP
centroid => GEOMETRY
geometry => GEOMETRY
}
map search_name {
place_id => BIGINT
importance => DOUBLE
search_rank => SMALLINT
address_rank => SMALLINT
name_vector => INT[]
nameaddress_vector => INT[]
country_code => VARCHAR(2)
centroid => GEOMETRY
}
map word {
word_id => INT
word_token => TEXT
... =>
}
map location_property_osmline {
place_id => BIGINT
osm_id => BIGINT
startnumber => INT
endnumber => INT
interpolationtype => TEXT
address => HSTORE
partition => SMALLINT
geometry_sector => INT
parent_place_id => BIGINT
country_code => VARCHAR(2)
postcode => TEXT
indexed_status => SMALLINT
indexed_date => TIMESTAMP
linegeo => GEOMETRY
}
map place_addressline {
place_id => BIGINT
address_place_id => BIGINT
distance => DOUBLE
cached_rank_address => SMALLINT
fromarea => BOOLEAN
isaddress => BOOLEAN
}
map location_postcode {
place_id => BIGINT
postcode => TEXT
parent_place_id => BIGINT
rank_search => SMALLINT
rank_address => SMALLINT
indexed_status => SMALLINT
indexed_date => TIMESTAMP
geometry => GEOMETRY
}
placex::place_id <-- search_name::place_id
placex::place_id <-- place_addressline::place_id
placex::place_id <-- place_addressline::address_place_id
search_name::name_vector --> word::word_id
search_name::nameaddress_vector --> word::word_id
place_addressline -[hidden]> location_property_osmline
search_name -[hidden]> place_addressline
location_property_osmline -[hidden]-> location_postcode
@enduml

File diff suppressed because one or more lines are too long

Some files were not shown because too many files have changed in this diff