mirror of
https://github.com/osm-search/Nominatim.git
synced 2026-02-16 15:47:58 +00:00
Gsoc2019 contributions for adding Wikidata to Nominatim (#1475)
Complete rewrite of wikipedia processing scripts, addition of processing wikidata, new data source, new documentation by @tchaddad during Google Summer of Code 2019 project.
This commit is contained in:
58
data-sources/wikipedia-wikidata/README.md
Normal file
58
data-sources/wikipedia-wikidata/README.md
Normal file
@@ -0,0 +1,58 @@
|
||||
## Add Wikipedia and Wikidata to Nominatim
|
||||
|
||||
OSM contributors frequently tag items with links to Wikipedia and Wikidata. Nominatim can use the page ranking of Wikipedia pages to help indicate the relative importance of osm features. This is done by calculating an importance score between 0 and 1 based on the number of inlinks to an article for a location. If two places have the same name and one is more important than the other, the wikipedia score often points to the correct place.
|
||||
|
||||
These scripts extract and prepare both Wikipedia page rank and Wikidata links for use in Nominatim.
|
||||
|
||||
#### Create a new postgres DB for Processing
|
||||
|
||||
Due to the size of initial and intermediate tables, processing can be done in an external database:
|
||||
```
|
||||
CREATE DATABASE wikiprocessingdb;
|
||||
```
|
||||
---
|
||||
Wikipedia
|
||||
---
|
||||
|
||||
Processing these data requires a large amount of disk space (~1TB) and considerable time (>24 hours).
|
||||
|
||||
#### Import & Process Wikipedia tables
|
||||
|
||||
This step downloads and converts [Wikipedia](https://dumps.wikimedia.org/) page data SQL dumps to postgreSQL files which can be imported and processed with pagelink information from Wikipedia language sites to calculate importance scores.
|
||||
|
||||
- The script will processes data from whatever set of Wikipedia languages are specified in the initial languages array
|
||||
|
||||
- Note that processing the top 40 Wikipedia languages can take over a day, and will add nearly 1TB to the processing database. The final output tables will be approximately 11GB and 2GB in size
|
||||
|
||||
To download, convert, and import the data, then process summary statistics and compute importance scores, run:
|
||||
```
|
||||
./wikipedia_import.sh
|
||||
```
|
||||
---
|
||||
Wikidata
|
||||
---
|
||||
|
||||
This script downloads and processes Wikidata to enrich the previously created Wekipedia tables for use in Nominatim.
|
||||
|
||||
#### Import & Process Wikidata
|
||||
|
||||
This step downloads and converts [Wikidata](https://dumps.wikimedia.org/wikidatawiki/) page data SQL dumps to postgreSQL files which can be processed and imported into Nominatim database. Also utilizes Wikidata Query Service API to discover and include place types.
|
||||
|
||||
- Script presumes that the user has already processed Wikipedia tables as specified above
|
||||
|
||||
- Script requires wikidata_place_types.txt and wikidata_place_type_levles.csv
|
||||
|
||||
- script requires the [jq json parser](https://stedolan.github.io/jq/)
|
||||
|
||||
- Script processes data from whatever set of Wikipedia languages are specified in the initial languages array
|
||||
|
||||
- Script queries Wikidata Query Service API and imports all instances of place types listed in wikidata_place_types.txt
|
||||
|
||||
- Script updates wikipedia_articles table with extracted wikidata
|
||||
|
||||
By including Wikidata in the wikipedia_articles table, new connections can be made on the fly from the Nominatim placex table to wikipedia_article importance scores.
|
||||
|
||||
To download, convert, and import the data, then process required items, run:
|
||||
```
|
||||
./wikidata_import.sh
|
||||
```
|
||||
95
data-sources/wikipedia-wikidata/import_wikidata.sh
Executable file
95
data-sources/wikipedia-wikidata/import_wikidata.sh
Executable file
@@ -0,0 +1,95 @@
|
||||
#!/bin/bash
|
||||
|
||||
psqlcmd() {
|
||||
psql wikiprocessingdb
|
||||
}
|
||||
|
||||
mysql2pgsqlcmd() {
|
||||
./mysql2pgsql.perl /dev/stdin /dev/stdout
|
||||
}
|
||||
|
||||
|
||||
# list the languages to process (refer to List of Wikipedias here: https://en.wikipedia.org/wiki/List_of_Wikipedias)
|
||||
|
||||
language=( "ar" "bg" "ca" "cs" "da" "de" "en" "es" "eo" "eu" "fa" "fr" "ko" "hi" "hr" "id" "it" "he" "lt" "hu" "ms" "nl" "ja" "no" "pl" "pt" "kk" "ro" "ru" "sk" "sl" "sr" "fi" "sv" "tr" "uk" "vi" "vo" "war" "zh" )
|
||||
|
||||
|
||||
# get a few wikidata dump tables
|
||||
|
||||
wget https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-geo_tags.sql.gz
|
||||
wget https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-page.sql.gz
|
||||
wget https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-wb_items_per_site.sql.gz
|
||||
|
||||
|
||||
# import wikidata tables
|
||||
|
||||
gzip -dc wikidatawiki-latest-geo_tags.sql.gz | mysql2pgsqlcmd | psqlcmd
|
||||
gzip -dc wikidatawiki-latest-page.sql.gz | mysql2pgsqlcmd | psqlcmd
|
||||
gzip -dc wikidatawiki-latest-wb_items_per_site.sql.gz | mysql2pgsqlcmd | psqlcmd
|
||||
|
||||
|
||||
# get wikidata places from wikidata query API
|
||||
|
||||
while read F ; do
|
||||
wget "https://query.wikidata.org/bigdata/namespace/wdq/sparql?format=json&query=SELECT ?item WHERE{?item wdt:P31*/wdt:P279*wd:$F;}" -O $F.json
|
||||
jq -r '.results | .[] | .[] | [.item.value] | @csv' $F.json >> $F.txt
|
||||
awk -v qid=$F '{print $0 ","qid}' $F.txt | sed -e 's!"http://www.wikidata.org/entity/!!' | sed 's/"//g' >> $F.csv
|
||||
cat $F.csv >> wikidata_place_dump.csv
|
||||
rm $F.json $F.txt $F.csv
|
||||
done < wikidata_place_types.txt
|
||||
|
||||
|
||||
# import wikidata places
|
||||
|
||||
echo "CREATE TABLE wikidata_place_dump (item text, instance_of text);" | psqlcmd
|
||||
echo "COPY wikidata_place_dump (item, instance_of) FROM '/srv/nominatim/Nominatim/data-sources/wikipedia-wikidata/wikidata_place_dump.csv' DELIMITER ',' CSV;" | psqlcmd
|
||||
|
||||
echo "CREATE TABLE wikidata_place_type_levels (place_type text, level integer);" | psqlcmd
|
||||
echo "COPY wikidata_place_type_levels (place_type, level) FROM '/srv/nominatim/Nominatim/data-sources/wikipedia-wikidata/wikidata_place_type_levels.csv' DELIMITER ',' CSV HEADER;" | psqlcmd
|
||||
|
||||
|
||||
# create derived tables
|
||||
|
||||
echo "CREATE TABLE geo_earth_primary AS SELECT gt_page_id, gt_lat, gt_lon FROM geo_tags WHERE gt_globe = 'earth' AND gt_primary = 1 AND NOT( gt_lat < -90 OR gt_lat > 90 OR gt_lon < -180 OR gt_lon > 180 OR gt_lat=0 OR gt_lon=0) ;" | psqlcmd
|
||||
echo "CREATE TABLE geo_earth_wikidata AS SELECT DISTINCT geo_earth_primary.gt_page_id, geo_earth_primary.gt_lat, geo_earth_primary.gt_lon, page.page_title, page.page_namespace FROM geo_earth_primary LEFT OUTER JOIN page ON (geo_earth_primary.gt_page_id = page.page_id) ORDER BY geo_earth_primary.gt_page_id;" | psqlcmd
|
||||
|
||||
echo "ALTER TABLE wikidata_place_dump ADD COLUMN ont_level integer, ADD COLUMN lat numeric(11,8), ADD COLUMN lon numeric(11,8);" | psqlcmd
|
||||
echo "UPDATE wikidata_place_dump SET ont_level = wikidata_place_type_levels.level FROM wikidata_place_type_levels WHERE wikidata_place_dump.instance_of = wikidata_place_type_levels.place_type;" | psqlcmd
|
||||
|
||||
echo "CREATE TABLE wikidata_places AS SELECT DISTINCT ON (item) item, instance_of, MAX(ont_level) AS ont_level, lat, lon FROM wikidata_place_dump GROUP BY item, instance_of, ont_level, lat, lon ORDER BY item;" | psqlcmd
|
||||
echo "UPDATE wikidata_places SET lat = geo_earth_wikidata.gt_lat, lon = geo_earth_wikidata.gt_lon FROM geo_earth_wikidata WHERE wikidata_places.item = geo_earth_wikidata.page_title" | psqlcmd
|
||||
|
||||
|
||||
# process language pages
|
||||
|
||||
echo "CREATE TABLE wikidata_pages (item text, instance_of text, lat numeric(11,8), lon numeric(11,8), ips_site_page text, language text );" | psqlcmd
|
||||
|
||||
for i in "${language[@]}"
|
||||
do
|
||||
echo "CREATE TABLE wikidata_${i}_pages as select wikidata_places.item, wikidata_places.instance_of, wikidata_places.lat, wikidata_places.lon, wb_items_per_site.ips_site_page FROM wikidata_places LEFT JOIN wb_items_per_site ON (CAST (( LTRIM(wikidata_places.item, 'Q')) AS INTEGER) = wb_items_per_site.ips_item_id) WHERE ips_site_id = '${i}wiki' AND LEFT(wikidata_places.item,1) = 'Q' order by wikidata_places.item;" | psqlcmd
|
||||
echo "ALTER TABLE wikidata_${i}_pages ADD COLUMN language text;" | psqlcmd
|
||||
echo "UPDATE wikidata_${i}_pages SET language = '${i}';" | psqlcmd
|
||||
echo "INSERT INTO wikidata_pages SELECT item, instance_of, lat, lon, ips_site_page, language FROM wikidata_${i}_pages;" | psqlcmd
|
||||
done
|
||||
|
||||
echo "ALTER TABLE wikidata_pages ADD COLUMN wp_page_title text;" | psqlcmd
|
||||
echo "UPDATE wikidata_pages SET wp_page_title = REPLACE(ips_site_page, ' ', '_');" | psqlcmd
|
||||
echo "ALTER TABLE wikidata_pages DROP COLUMN ips_site_page;" | psqlcmd
|
||||
|
||||
|
||||
# add wikidata to wikipedia_article table
|
||||
|
||||
echo "UPDATE wikipedia_article SET lat = wikidata_pages.lat, lon = wikidata_pages.lon, wd_page_title = wikidata_pages.item, instance_of = wikidata_pages.instance_of FROM wikidata_pages WHERE wikipedia_article.language = wikidata_pages.language AND wikipedia_article.title = wikidata_pages.wp_page_title;" | psqlcmd
|
||||
echo "CREATE TABLE wikipedia_article_slim AS SELECT * FROM wikipedia_article WHERE wikidata_id IS NOT NULL;" | psqlcmd
|
||||
echo "ALTER TABLE wikipedia_article RENAME TO wikipedia_article_full;" | psqlcmd
|
||||
echo "ALTER TABLE wikipedia_article_slim RENAME TO wikipedia_article;" | psqlcmd
|
||||
|
||||
|
||||
# clean up intermediate tables
|
||||
|
||||
echo "DROP TABLE wikidata_place_dump;" | psqlcmd
|
||||
echo "DROP TABLE geo_earth_primary;" | psqlcmd
|
||||
for i in "${language[@]}"
|
||||
do
|
||||
echo "DROP TABLE wikidata_${i}_pages;" | psqlcmd
|
||||
done
|
||||
77
data-sources/wikipedia-wikidata/import_wikipedia.sh
Executable file
77
data-sources/wikipedia-wikidata/import_wikipedia.sh
Executable file
@@ -0,0 +1,77 @@
|
||||
#!/bin/bash
|
||||
|
||||
psqlcmd() {
|
||||
psql wikiprocessingdb
|
||||
}
|
||||
|
||||
mysql2pgsqlcmd() {
|
||||
./mysql2pgsql.perl /dev/stdin /dev/stdout
|
||||
}
|
||||
|
||||
|
||||
# list the languages to process (refer to List of Wikipedias here: https://en.wikipedia.org/wiki/List_of_Wikipedias)
|
||||
|
||||
language=( "ar" "bg" "ca" "cs" "da" "de" "en" "es" "eo" "eu" "fa" "fr" "ko" "hi" "hr" "id" "it" "he" "lt" "hu" "ms" "nl" "ja" "no" "pl" "pt" "kk" "ro" "ru" "sk" "sl" "sr" "fi" "sv" "tr" "uk" "vi" "vo" "war" "zh" )
|
||||
|
||||
|
||||
# create wikipedia calculation tables
|
||||
|
||||
echo "CREATE TABLE linkcounts (language text, title text, count integer, sumcount integer, lat double precision, lon double precision);" | psqlcmd
|
||||
echo "CREATE TABLE wikipedia_article (language text NOT NULL, title text NOT NULL, langcount integer, othercount integer, totalcount integer, lat double precision, lon double precision, importance double precision, title_en text, osm_type character(1), osm_id bigint );" | psqlcmd
|
||||
echo "CREATE TABLE wikipedia_redirect (language text, from_title text, to_title text );" | psqlcmd
|
||||
|
||||
|
||||
# download individual wikipedia language tables
|
||||
|
||||
for i in "${language[@]}"
|
||||
do
|
||||
wget https://dumps.wikimedia.org/${i}wiki/latest/${i}wiki-latest-page.sql.gz
|
||||
wget https://dumps.wikimedia.org/${i}wiki/latest/${i}wiki-latest-pagelinks.sql.gz
|
||||
wget https://dumps.wikimedia.org/${i}wiki/latest/${i}wiki-latest-langlinks.sql.gz
|
||||
wget https://dumps.wikimedia.org/${i}wiki/latest/${i}wiki-latest-redirect.sql.gz
|
||||
done
|
||||
|
||||
|
||||
# import individual wikipedia language tables
|
||||
|
||||
for i in "${language[@]}"
|
||||
do
|
||||
gzip -dc ${i}wiki-latest-pagelinks.sql.gz | sed "s/\`pagelinks\`/\`${i}pagelinks\`/g" | mysql2pgsqlcmd | psqlcmd
|
||||
gzip -dc ${i}wiki-latest-page.sql.gz | sed "s/\`page\`/\`${i}page\`/g" | mysql2pgsqlcmd | psqlcmd
|
||||
gzip -dc ${i}wiki-latest-langlinks.sql.gz | sed "s/\`langlinks\`/\`${i}langlinks\`/g" | mysql2pgsqlcmd | psqlcmd
|
||||
gzip -dc ${i}wiki-latest-redirect.sql.gz | sed "s/\`redirect\`/\`${i}redirect\`/g" | mysql2pgsqlcmd | psqlcmd
|
||||
done
|
||||
|
||||
|
||||
# process language tables and associated pagelink counts
|
||||
|
||||
for i in "${language[@]}"
|
||||
do
|
||||
echo "create table ${i}pagelinkcount as select pl_title as title,count(*) as count from ${i}pagelinks where pl_namespace = 0 group by pl_title;" | psqlcmd
|
||||
echo "insert into linkcounts select '${i}',pl_title,count(*) from ${i}pagelinks where pl_namespace = 0 group by pl_title;" | psqlcmd
|
||||
echo "insert into wikipedia_redirect select '${i}',page_title,rd_title from ${i}redirect join ${i}page on (rd_from = page_id) where page_namespace = 0 and rd_namespace = 0;" | psqlcmd
|
||||
echo "alter table ${i}pagelinkcount add column othercount integer;" | psqlcmd
|
||||
echo "update ${i}pagelinkcount set othercount = 0;" | psqlcmd
|
||||
for j in "${language[@]}"
|
||||
do
|
||||
echo "update ${i}pagelinkcount set othercount = ${i}pagelinkcount.othercount + x.count from (select page_title as title,count from ${i}langlinks join ${i}page on (ll_from = page_id) join ${j}pagelinkcount on (ll_lang = '${j}' and ll_title = title)) as x where x.title = ${i}pagelinkcount.title;" | psqlcmd
|
||||
done
|
||||
echo "insert into wikipedia_article select '${i}', title, count, othercount, count+othercount from ${i}pagelinkcount;" | psqlcmd
|
||||
done
|
||||
|
||||
|
||||
# calculate importance score for each wikipedia page
|
||||
|
||||
echo "update wikipedia_article set importance = log(totalcount)/log((select max(totalcount) from wikipedia_article))" | psqlcmd
|
||||
|
||||
|
||||
# clean up intermediate tables to conserve space
|
||||
|
||||
for i in "${language[@]}"
|
||||
do
|
||||
echo "DROP TABLE ${i}pagelinks;" | psqlcmd
|
||||
echo "DROP TABLE ${i}page;" | psqlcmd
|
||||
echo "DROP TABLE ${i}langlinks;" | psqlcmd
|
||||
echo "DROP TABLE ${i}redirect;" | psqlcmd
|
||||
echo "DROP TABLE ${i}pagelinkcount;" | psqlcmd
|
||||
done
|
||||
951
data-sources/wikipedia-wikidata/mysql2pgsql.perl
Executable file
951
data-sources/wikipedia-wikidata/mysql2pgsql.perl
Executable file
@@ -0,0 +1,951 @@
|
||||
#!/usr/bin/perl -w
|
||||
# mysql2pgsql
|
||||
# MySQL to PostgreSQL dump file converter
|
||||
#
|
||||
# For usage: perl mysql2pgsql.perl --help
|
||||
#
|
||||
# ddl statments are changed but none or only minimal real data
|
||||
# formatting are done.
|
||||
# data consistency is up to the DBA.
|
||||
#
|
||||
# (c) 2004-2007 Jose M Duarte and Joseph Speigle ... gborg
|
||||
#
|
||||
# (c) 2000-2004 Maxim Rudensky <fonin@omnistaronline.com>
|
||||
# (c) 2000 Valentine Danilchuk <valdan@ziet.zhitomir.ua>
|
||||
# All rights reserved.
|
||||
#
|
||||
# Redistribution and use in source and binary forms, with or without
|
||||
# modification, are permitted provided that the following conditions
|
||||
# are met:
|
||||
# 1. Redistributions of source code must retain the above copyright
|
||||
# notice, this list of conditions and the following disclaimer.
|
||||
# 2. Redistributions in binary form must reproduce the above copyright
|
||||
# notice, this list of conditions and the following disclaimer in the
|
||||
# documentation and/or other materials provided with the distribution.
|
||||
# 3. All advertising materials mentioning features or use of this software
|
||||
# must display the following acknowledgement:
|
||||
# This product includes software developed by the Max Rudensky
|
||||
# and its contributors.
|
||||
# 4. Neither the name of the author nor the names of its contributors
|
||||
# may be used to endorse or promote products derived from this software
|
||||
# without specific prior written permission.
|
||||
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
|
||||
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
|
||||
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
||||
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
||||
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
||||
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
||||
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
# SUCH DAMAGE.
|
||||
|
||||
use Getopt::Long;
|
||||
|
||||
use POSIX;
|
||||
|
||||
use strict;
|
||||
use warnings;
|
||||
|
||||
|
||||
# main sections
|
||||
# -------------
|
||||
# 1 variable declarations
|
||||
# 2 subroutines
|
||||
# 3 get commandline options and specify help statement
|
||||
# 4 loop through file and process
|
||||
# 5. print_plpgsql function prototype
|
||||
|
||||
#################################################################
|
||||
# 1. variable declarations
|
||||
#################################################################
|
||||
# command line options
|
||||
my( $ENC_IN, $ENC_OUT, $PRESERVE_CASE, $HELP, $DEBUG, $SCHEMA, $LOWERCASE, $CHAR2VARCHAR, $NODROP, $SEP_FILE, $opt_debug, $opt_help, $opt_schema, $opt_preserve_case, $opt_char2varchar, $opt_nodrop, $opt_sepfile, $opt_enc_in, $opt_enc_out );
|
||||
# variables for constructing pre-create-table entities
|
||||
my $pre_create_sql=''; # comments, 'enum' constraints preceding create table statement
|
||||
my $auto_increment_seq= ''; # so we can easily substitute it if we need a default value
|
||||
my $create_sql=''; # all the datatypes in the create table section
|
||||
my $post_create_sql=''; # create indexes, foreign keys, table comments
|
||||
my $function_create_sql = ''; # for the set (function,trigger) and CURRENT_TIMESTAMP ( function,trigger )
|
||||
# constraints
|
||||
my ($type, $column_valuesStr, @column_values, $value );
|
||||
my %constraints=(); # holds values constraints used to emulate mysql datatypes (e.g. year, set)
|
||||
# datatype conversion variables
|
||||
my ( $index,$seq);
|
||||
my ( $column_name, $col, $quoted_column);
|
||||
my ( @year_holder, $year, $constraint_table_name);
|
||||
my $table=""; # table_name for create sql statements
|
||||
my $table_no_quotes=""; # table_name for create sql statements
|
||||
my $sl = '^\s+\w+\s+'; # matches the column name
|
||||
my $tables_first_timestamp_column= 1; # decision to print warnings about default_timestamp not being in postgres
|
||||
my $mysql_numeric_datatypes = "TINYINT|SMALLINT|MEDIUMINT|INT|INTEGER|BIGINT|REAL|DOUBLE|FLOAT|DECIMAL|NUMERIC";
|
||||
my $mysql_datetime_datatypes = "|DATE|TIME|TIMESTAMP|DATETIME|YEAR";
|
||||
my $mysql_text_datatypes = "CHAR|VARCHAR|BINARY|VARBINARY|TINYBLOB|BLOB|MEDIUMBLOB|LONGBLOB|TINYTEXT|TEXT|MEDIUMTEXT|LONGTEXT|ENUM|SET";
|
||||
my $mysql_datatypesStr = $mysql_numeric_datatypes . "|". $mysql_datetime_datatypes . "|". $mysql_text_datatypes ;
|
||||
# handling INSERT INTO statements
|
||||
my $rowRe = qr{
|
||||
\( # opening parens
|
||||
( # (start capture)
|
||||
(?: # (start group)
|
||||
' # string start
|
||||
[^'\\]* # up to string-end or backslash (escape)
|
||||
(?: # (start group)
|
||||
\\. # gobble escaped character
|
||||
[^'\\]* # up to string-end of backslash
|
||||
)* # (end group, repeat zero or more)
|
||||
' # string end
|
||||
| # (OR)
|
||||
.*? # everything else (not strings)
|
||||
)* # (end group, repeat zero or more)
|
||||
) # (end capture)
|
||||
\) # closing parent
|
||||
}x;
|
||||
|
||||
my ($insert_table, $valueString);
|
||||
#
|
||||
########################################################
|
||||
# 2. subroutines
|
||||
#
|
||||
# get_identifier
|
||||
# print_post_create_sql()
|
||||
# quote_and_lc()
|
||||
# make_plpgsql($table,$column_name) -- at end of file
|
||||
########################################################
|
||||
|
||||
# returns an identifier with the given suffix doing controlled
|
||||
# truncation if necessary
|
||||
sub get_identifier($$$) {
|
||||
my ($table, $col, $suffix) = @_;
|
||||
my $name = '';
|
||||
$table=~s/\"//g; # make sure that $table doesn't have quotes so we don't end up with redundant quoting
|
||||
# in the case of multiple columns
|
||||
my @cols = split(/,/,$col);
|
||||
$col =~ s/,//g;
|
||||
# in case all columns together too long we have to truncate them
|
||||
if (length($col) > 55) {
|
||||
my $totaltocut = length($col)-55;
|
||||
my $tocut = ceil($totaltocut / @cols);
|
||||
@cols = map {substr($_,0,abs(length($_)-$tocut))} @cols;
|
||||
$col="";
|
||||
foreach (@cols){
|
||||
$col.=$_;
|
||||
}
|
||||
}
|
||||
|
||||
my $max_table_length = 63 - length("_${col}_$suffix");
|
||||
|
||||
if (length($table) > $max_table_length) {
|
||||
$table = substr($table, length($table) - $max_table_length, $max_table_length);
|
||||
}
|
||||
return quote_and_lc("${table}_${col}_${suffix}");
|
||||
}
|
||||
|
||||
|
||||
#
|
||||
#
|
||||
# called when we encounter next CREATE TABLE statement
|
||||
# also called at EOF to print out for last table
|
||||
# prints comments, indexes, foreign key constraints (the latter 2 possibly to a separate file)
|
||||
sub print_post_create_sql() {
|
||||
my ( @create_idx_comments_constraints_commandsArr, $stmts, $table_field_combination);
|
||||
my %stmts;
|
||||
# loop to check for duplicates in $post_create_sql
|
||||
# Needed because of duplicate key declarations ( PRIMARY KEY and KEY), auto_increment columns
|
||||
|
||||
@create_idx_comments_constraints_commandsArr = split(';\n?', $post_create_sql);
|
||||
if ($SEP_FILE) {
|
||||
open(SEP_FILE, ">>:encoding($ENC_OUT)", $SEP_FILE) or die "Unable to open $SEP_FILE for output: $!\n";
|
||||
}
|
||||
|
||||
foreach (@create_idx_comments_constraints_commandsArr) {
|
||||
if (m/CREATE INDEX "*(\S+)"*\s/i) { # CREATE INDEX korean_english_wordsize_idx ON korean_english USING btree (wordsize);
|
||||
$table_field_combination = $1;
|
||||
# if this particular table_field_combination was already used do not print the statement:
|
||||
if ($SEP_FILE) {
|
||||
print SEP_FILE "$_;\n" if !defined($stmts{$table_field_combination});
|
||||
} else {
|
||||
print OUT "$_;\n" if !defined($stmts{$table_field_combination});
|
||||
}
|
||||
$stmts{$table_field_combination} = 1;
|
||||
}
|
||||
elsif (m/COMMENT/i) { # COMMENT ON object IS 'text'; but comment may be part of table name so use 'elsif'
|
||||
print OUT "$_;\n"
|
||||
} else { # foreign key constraint or comments (those preceded by -- )
|
||||
if ($SEP_FILE) {
|
||||
print SEP_FILE "$_;\n";
|
||||
} else {
|
||||
print OUT "$_;\n"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if ($SEP_FILE) {
|
||||
close SEP_FILE;
|
||||
}
|
||||
$post_create_sql='';
|
||||
# empty %constraints for next " create table" statement
|
||||
}
|
||||
|
||||
# quotes a string or a multicolumn string (comma separated)
|
||||
# and optionally lowercase (if LOWERCASE is set)
|
||||
# lowercase .... if user wants default postgres behavior
|
||||
# quotes .... to preserve keywords and to preserve case when case-sensitive tables are to be used
|
||||
sub quote_and_lc($)
|
||||
{
|
||||
my $col = shift;
|
||||
if ($LOWERCASE) {
|
||||
$col = lc($col);
|
||||
}
|
||||
if ($col =~ m/,/) {
|
||||
my @cols = split(/,\s?/, $col);
|
||||
@cols = map {"\"$_\""} @cols;
|
||||
return join(', ', @cols);
|
||||
} else {
|
||||
return "\"$col\"";
|
||||
}
|
||||
}
|
||||
|
||||
########################################################
|
||||
# 3. get commandline options and maybe print help
|
||||
########################################################
|
||||
|
||||
GetOptions("help", "debug"=> \$opt_debug, "schema=s" => \$SCHEMA, "preserve_case" => \$opt_preserve_case, "char2varchar" => \$opt_char2varchar, "nodrop" => \$opt_nodrop, "sepfile=s" => \$opt_sepfile, "enc_in=s" => \$opt_enc_in, "enc_out=s" => \$opt_enc_out );
|
||||
|
||||
$HELP = $opt_help || 0;
|
||||
$DEBUG = $opt_debug || 0;
|
||||
$PRESERVE_CASE = $opt_preserve_case || 0;
|
||||
if ($PRESERVE_CASE == 1) { $LOWERCASE = 0; }
|
||||
else { $LOWERCASE = 1; }
|
||||
$CHAR2VARCHAR = $opt_char2varchar || 0;
|
||||
$NODROP = $opt_nodrop || 0;
|
||||
$SEP_FILE = $opt_sepfile || 0;
|
||||
$ENC_IN = $opt_enc_in || 'utf8';
|
||||
$ENC_OUT = $opt_enc_out || 'utf8';
|
||||
|
||||
if (($HELP) || ! defined($ARGV[0]) || ! defined($ARGV[1])) {
|
||||
print "\n\nUsage: perl $0 {--help --debug --preserve_case --char2varchar --nodrop --schema --sepfile --enc_in --enc_out } mysql.sql pg.sql\n";
|
||||
print "\t* OPTIONS WITHOUT ARGS\n";
|
||||
print "\t--help: prints this message \n";
|
||||
print "\t--debug: output the commented-out mysql line above the postgres line in pg.sql \n";
|
||||
print "\t--preserve_case: prevents automatic case-lowering of column and table names\n";
|
||||
print "\t\tIf you want to preserve case, you must set this flag. For example,\n";
|
||||
print "\t\tIf your client application quotes table and column-names and they have cases in them, set this flag\n";
|
||||
print "\t--char2varchar: converts all char fields to varchar\n";
|
||||
print "\t--nodrop: strips out DROP TABLE statements\n";
|
||||
print "\t\totherise harmless warnings are printed by psql when the dropped table does not exist\n";
|
||||
print "\n\t* OPTIONS WITH ARGS\n";
|
||||
print "\t--schema: outputs a line into the postgres sql file setting search_path \n";
|
||||
print "\t--sepfile: output foreign key constraints and indexes to a separate file so that it can be\n";
|
||||
print "\t\timported after large data set is inserted from another dump file\n";
|
||||
print "\t--enc_in: encoding of mysql in file (default utf8) \n";
|
||||
print "\t--enc_out: encoding of postgres out file (default utf8) \n";
|
||||
print "\n\t* REQUIRED ARGUMENTS\n";
|
||||
if (defined ($ARGV[0])) {
|
||||
print "\tmysql.sql ($ARGV[0])\n";
|
||||
} else {
|
||||
print "\tmysql.sql (undefined)\n";
|
||||
}
|
||||
if (defined ($ARGV[1])) {
|
||||
print "\tpg.sql ($ARGV[1])\n";
|
||||
} else {
|
||||
print "\tpg.sql (undefined)\n";
|
||||
}
|
||||
print "\n";
|
||||
exit 1;
|
||||
}
|
||||
########################################################
|
||||
# 4. process through mysql_dump.sql file
|
||||
# in a big loop
|
||||
########################################################
|
||||
|
||||
# open in and out files
|
||||
open(IN,"<:encoding($ENC_IN)", $ARGV[0]) || die "can't open mysql dump file $ARGV[0]";
|
||||
open(OUT,">:encoding($ENC_OUT)", $ARGV[1]) || die "can't open pg dump file $ARGV[1]";
|
||||
|
||||
# output header
|
||||
print OUT "--\n";
|
||||
print OUT "-- Generated from mysql2pgsql.perl\n";
|
||||
print OUT "-- http://gborg.postgresql.org/project/mysql2psql/\n";
|
||||
print OUT "-- (c) 2001 - 2007 Jose M. Duarte, Joseph Speigle\n";
|
||||
print OUT "--\n";
|
||||
print OUT "\n";
|
||||
print OUT "-- warnings are printed for drop tables if they do not exist\n";
|
||||
print OUT "-- please see http://archives.postgresql.org/pgsql-novice/2004-10/msg00158.php\n\n";
|
||||
print OUT "-- ##############################################################\n";
|
||||
|
||||
if ($SCHEMA ) {
|
||||
print OUT "set search_path='" . $SCHEMA . "'\\g\n" ;
|
||||
}
|
||||
|
||||
# loop through mysql file on a per-line basis
|
||||
while(<IN>) {
|
||||
|
||||
############## flow #########################
|
||||
# (the lines are directed to different string variables at different times)
|
||||
#
|
||||
# handle drop table , unlock, connect statements
|
||||
# if ( start of create table) {
|
||||
# print out post_create table (indexes, foreign key constraints, comments from previous table)
|
||||
# add drop table statement if !$NODROP to pre_create_sql
|
||||
# next;
|
||||
# }
|
||||
# else if ( inside create table) {
|
||||
# add comments in this portion to create_sql
|
||||
# if ( end of create table) {
|
||||
# delete mysql-unique CREATE TABLE commands
|
||||
# print pre_create_sql
|
||||
# print the constraint tables for set and year datatypes
|
||||
# print create_sql
|
||||
# print function_create_sql (this is for the enum columns only)
|
||||
# next;
|
||||
# }
|
||||
# do substitutions
|
||||
# -- NUMERIC DATATYPES
|
||||
# -- CHARACTER DATATYPES
|
||||
# -- DATE AND TIME DATATYPES
|
||||
# -- KEY AND UNIQUE CREATIONS
|
||||
# and append them to create_sql
|
||||
# } else {
|
||||
# print inserts on-the-spot (this script only changes default timestamp of 0000-00-00)
|
||||
# }
|
||||
# LOOP until EOF
|
||||
#
|
||||
########################################################
|
||||
|
||||
|
||||
if (!/^\s*insert into/i) { # not inside create table so don't worry about data corruption
|
||||
s/`//g; # '`pgsql uses no backticks to denote table name (CREATE TABLE `sd`) or around field
|
||||
# and table names like mysql
|
||||
# doh! we hope all dashes and special chars are caught by the regular expressions :)
|
||||
}
|
||||
if (/^\s*USE\s*([^;]*);/) {
|
||||
print OUT "\\c ". $1;
|
||||
next;
|
||||
}
|
||||
if (/^(UN)?LOCK TABLES/i || /drop\s+table/i ) {
|
||||
|
||||
# skip
|
||||
# DROP TABLE is added when we see the CREATE TABLE
|
||||
next;
|
||||
}
|
||||
if (/(create\s+table\s+)([-_\w]+)\s/i) { # example: CREATE TABLE `english_english`
|
||||
print_post_create_sql(); # for last table
|
||||
$tables_first_timestamp_column= 1; # decision to print warnings about default_timestamp not being in postgres
|
||||
$create_sql = '';
|
||||
$table_no_quotes = $2 ;
|
||||
$table=quote_and_lc($2);
|
||||
if ( !$NODROP ) { # always print drop table if user doesn't explicitly say not to
|
||||
# to drop a table that is referenced by a view or a foreign-key constraint of another table,
|
||||
# CASCADE must be specified. (CASCADE will remove a dependent view entirely, but in the
|
||||
# in the foreign-key case it will only remove the foreign-key constraint, not the other table entirely.)
|
||||
# (source: 8.1.3 docs, section "drop table")
|
||||
warn "table $table will be dropped CASCADE\n";
|
||||
$pre_create_sql .= "DROP TABLE $table CASCADE;\n"; # custom dumps may be missing the 'dump' commands
|
||||
}
|
||||
|
||||
s/(create\s+table\s+)([-_\w]+)\s/$1 $table /i;
|
||||
if ($DEBUG) {
|
||||
$create_sql .= '-- ' . $_;
|
||||
}
|
||||
$create_sql .= $_;
|
||||
next;
|
||||
}
|
||||
if ($create_sql ne "") { # we are inside create table statement so lets process datatypes
|
||||
# print out comments or empty lines in context
|
||||
if ($DEBUG) {
|
||||
$create_sql .= '-- ' . $_;
|
||||
}
|
||||
if (/^#/ || /^$/ || /^\s*--/) {
|
||||
s/^#/--/; # Two hyphens (--) is the SQL-92 standard indicator for comments
|
||||
$create_sql.=$_;
|
||||
next;
|
||||
}
|
||||
|
||||
if (/\).*;/i) { # end of create table squence
|
||||
|
||||
s/INSERT METHOD[=\s+][^;\s]+//i;
|
||||
s/PASSWORD=[^;\s]+//i;
|
||||
s/ROW_FORMAT=(?:DEFAULT|DYNAMIC|FIXED|COMPRESSED|REDUNDANT|COMPACT)+//i;
|
||||
s/KEY_BLOCK_SIZE=8//i;
|
||||
s/DELAY KEY WRITE=[^;\s]+//i;
|
||||
s/INDEX DIRECTORY[=\s+][^;\s]+//i;
|
||||
s/DATA DIRECTORY=[^;\s]+//i;
|
||||
s/CONNECTION=[^;\s]+//i;
|
||||
s/CHECKSUM=[^;\s]+//i;
|
||||
s/Type=[^;\s]+//i; # ISAM , # older versions
|
||||
s/COLLATE=[^;\s]+//i; # table's collate
|
||||
s/COLLATE\s+[^;\s]+//i; # table's collate
|
||||
# possible AUTO_INCREMENT starting index, it is used in mysql 5.0.26, not sure since which version
|
||||
if (/AUTO_INCREMENT=(\d+)/i) {
|
||||
# should take < ---- ) ENGINE=MyISAM AUTO_INCREMENT=16 DEFAULT CHARSET=latin1;
|
||||
# and should ouput ---> CREATE SEQUENCE "rhm_host_info_id_seq" START WITH 16;
|
||||
my $start_value = $1;
|
||||
print $auto_increment_seq . "--\n";
|
||||
# print $pre_create_sql . "--\n";
|
||||
$pre_create_sql =~ s/(CREATE SEQUENCE $auto_increment_seq )/$1 START WITH $start_value /;
|
||||
}
|
||||
s/AUTO_INCREMENT=\d+//i;
|
||||
s/PACK_KEYS=\d//i; # mysql 5.0.22
|
||||
s/DEFAULT CHARSET=[^;\s]+//i; # my mysql version is 4.1.11
|
||||
s/ENGINE\s*=\s*[^;\s]+//i; # my mysql version is 4.1.11
|
||||
s/ROW_FORMAT=[^;\s]+//i; # my mysql version is 5.0.22
|
||||
s/KEY_BLOCK_SIZE=8//i;
|
||||
s/MIN_ROWS=[^;\s]+//i;
|
||||
s/MAX_ROWS=[^;\s]+//i;
|
||||
s/AVG_ROW_LENGTH=[^;\s]+//i;
|
||||
if (/COMMENT='([^']*)'/) { # ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='must be country zones';
|
||||
$post_create_sql.="COMMENT ON TABLE $table IS '$1'\;"; # COMMENT ON table_name IS 'text';
|
||||
s/COMMENT='[^']*'//i;
|
||||
}
|
||||
$create_sql =~ s/,$//g; # strip last , inside create table
|
||||
# make sure we end in a comma, as KEY statments are turned
|
||||
# into post_create_sql indices
|
||||
# they often are the last line so leaving a 'hanging comma'
|
||||
my @array = split("\n", $create_sql);
|
||||
for (my $a = $#array; $a >= 0; $a--) { #loop backwards
|
||||
if ($a == $#array && $array[$a] =~ m/,\s*$/) { # for last line
|
||||
$array[$a] =~ s/,\s*$//;
|
||||
next;
|
||||
}
|
||||
if ($array[$a] !~ m/create table/i) { # i.e. if there was more than one column in table
|
||||
if ($a != $#array && $array[$a] !~ m/,\s*$/ ) { # for second to last
|
||||
$array[$a] =~ s/$/,/;
|
||||
last;
|
||||
}
|
||||
elsif ($a != $#array && $array[$a] =~ m/,\s*$/ ) { # for second to last
|
||||
last;
|
||||
}
|
||||
}
|
||||
}
|
||||
$create_sql = join("\n", @array) . "\n";
|
||||
$create_sql .= $_;
|
||||
|
||||
# put comments out first
|
||||
print OUT $pre_create_sql;
|
||||
|
||||
# create separate table to reference and to hold mysql's possible set data-type
|
||||
# values. do that table's creation before create table
|
||||
# definition
|
||||
foreach $column_name (keys %constraints) {
|
||||
$type=$constraints{$column_name}{'type'};
|
||||
$column_valuesStr = $constraints{$column_name}{'values'};
|
||||
$constraint_table_name = get_identifier(${table},${column_name} ,"constraint_table");
|
||||
if ($type eq 'set') {
|
||||
print OUT qq~DROP TABLE $constraint_table_name CASCADE\\g\n~ ;
|
||||
print OUT qq~create table $constraint_table_name ( set_values varchar UNIQUE)\\g\n~ ;
|
||||
$function_create_sql .= make_plpgsql($table,$column_name);
|
||||
} elsif ($type eq 'year') {
|
||||
print OUT qq~DROP TABLE $constraint_table_name CASCADE\\g\n~ ;
|
||||
print OUT qq~create table $constraint_table_name ( year_values varchar UNIQUE)\\g\n~ ;
|
||||
}
|
||||
@column_values = split /,/, $column_valuesStr;
|
||||
foreach $value (@column_values) {
|
||||
print OUT qq~insert into $constraint_table_name values ( $value )\\g\n~; # ad ' for ints and varchars
|
||||
}
|
||||
}
|
||||
|
||||
$create_sql =~ s/double double/double precision/g;
|
||||
|
||||
# print create table and reset create table vars
|
||||
# when moving from each "create table" to "insert" part of dump
|
||||
print OUT $create_sql;
|
||||
print OUT $function_create_sql;
|
||||
$pre_create_sql="";
|
||||
$auto_increment_seq="";
|
||||
$create_sql="";
|
||||
$function_create_sql='';
|
||||
%constraints=();
|
||||
# the post_create_sql for this table is output at the beginning of the next table def
|
||||
# in case we want to make indexes after doing inserting
|
||||
next;
|
||||
}
|
||||
if (/^\s*(\w+)\s+.*COMMENT\s*'([^']*)'/) { #`zone_country_id` int(11) COMMENT 'column comment here',
|
||||
$quoted_column=quote_and_lc($1);
|
||||
$post_create_sql.="COMMENT ON COLUMN $table"."."." $quoted_column IS '$2'\;"; # COMMENT ON table_name.column_name IS 'text';
|
||||
s/COMMENT\s*'[^']*'//i;
|
||||
}
|
||||
|
||||
|
||||
# NUMERIC DATATYPES
|
||||
#
|
||||
# auto_increment -> sequences
|
||||
# UNSIGNED conversions
|
||||
# TINYINT
|
||||
# SMALLINT
|
||||
# MEDIUMINT
|
||||
# INT, INTEGER
|
||||
# BIGINT
|
||||
#
|
||||
# DOUBLE [PRECISION], REAL
|
||||
# DECIMAL(M,D), NUMERIC(M,D)
|
||||
# FLOAT(p)
|
||||
# FLOAT
|
||||
|
||||
s/(\w*int)\(\d+\)/$1/g; # hack of the (n) stuff for e.g. mediumint(2) int(3)
|
||||
|
||||
if (/^(\s*)(\w+)\s*.*numeric.*auto_increment/i) { # int,auto_increment -> serial
|
||||
$seq = get_identifier($table, $2, 'seq');
|
||||
$quoted_column=quote_and_lc($2);
|
||||
# Smash datatype to int8 and autogenerate the sequence.
|
||||
s/^(\s*)(\w+)\s*.*NUMERIC(.*)auto_increment([^,]*)/$1 $quoted_column serial8 $4/ig;
|
||||
$create_sql.=$_;
|
||||
next;
|
||||
}
|
||||
if (/^\s*(\w+)\s+.*int.*auto_increment/i) { # example: data_id mediumint(8) unsigned NOT NULL auto_increment,
|
||||
$seq = get_identifier($table, $1, 'seq');
|
||||
$quoted_column=quote_and_lc($1);
|
||||
s/(\s*)(\w+)\s+.*int.*auto_increment([^,]*)/$1 $quoted_column serial8 $3/ig;
|
||||
$create_sql.=$_;
|
||||
next;
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
# convert UNSIGNED to CHECK constraints
|
||||
if (m/^(\s*)(\w+)\s+((float|double precision|double|real|decimal|numeric))(.*)unsigned/i) {
|
||||
$quoted_column = quote_and_lc($2);
|
||||
s/^(\s*)(\w+)\s+((float|double precision|double|real|decimal|numeric))(.*)unsigned/$1 $quoted_column $3 $4 CHECK ($quoted_column >= 0)/i;
|
||||
}
|
||||
# example: `wordsize` tinyint(3) unsigned default NULL,
|
||||
if (m/^(\s+)(\w+)\s+(\w+)\s+unsigned/i) {
|
||||
$quoted_column=quote_and_lc($2);
|
||||
s/^(\s+)(\w+)\s+(\w+)\s+unsigned/$1 $quoted_column $3 CHECK ($quoted_column >= 0)/i;
|
||||
}
|
||||
if (m/^(\s*)(\w+)\s+(bigint.*)unsigned/) {
|
||||
$quoted_column=quote_and_lc($2);
|
||||
# see http://archives.postgresql.org/pgsql-general/2005-07/msg01178.php
|
||||
# and see http://www.postgresql.org/docs/8.2/interactive/datatype-numeric.html
|
||||
# see http://dev.mysql.com/doc/refman/5.1/en/numeric-types.html max size == 20 digits
|
||||
s/^(\s*)(\w+)\s+bigint(.*)unsigned/$1 $quoted_column NUMERIC (20,0) CHECK ($quoted_column >= 0)/i;
|
||||
|
||||
}
|
||||
|
||||
# int type conversion
|
||||
# TINYINT (signed) -128 to 127 (unsigned) 0 255
|
||||
# SMALLINT A small integer. The signed range is -32768 to 32767. The unsigned range is 0 to 65535.
|
||||
# MEDIUMINT A medium-sized integer. The signed range is -8388608 to 8388607. The unsigned range is 0 to 16777215.
|
||||
# INT A normal-size integer. The signed range is -2147483648 to 2147483647. The unsigned range is 0 to 4294967295.
|
||||
# BIGINT The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615
|
||||
# for postgres see http://www.postgresql.org/docs/8.2/static/datatype-numeric.html#DATATYPE-INT
|
||||
s/^(\s+"*\w+"*\s+)tinyint/$1 smallint/i;
|
||||
s/^(\s+"*\w+"*\s+)mediumint/$1 integer/i;
|
||||
|
||||
# the floating point types
|
||||
# double -> double precision
|
||||
# double(n,m) -> double precision
|
||||
# float - no need for conversion
|
||||
# float(n) - no need for conversion
|
||||
# float(n,m) -> double precision
|
||||
|
||||
s/(^\s*\w+\s+)double(\(\d+,\d+\))?/$1float/i;
|
||||
s/float(\(\d+,\d+\))/float/i;
|
||||
|
||||
#
|
||||
# CHARACTER TYPES
|
||||
#
|
||||
# set
|
||||
# enum
|
||||
# binary(M), VARBINARy(M), tinyblob, tinytext,
|
||||
# bit
|
||||
# char(M), varchar(M)
|
||||
# blob -> text
|
||||
# mediumblob
|
||||
# longblob, longtext
|
||||
# text -> text
|
||||
# mediumtext
|
||||
# longtext
|
||||
# mysql docs: A BLOB is a binary large object that can hold a variable amount of data.
|
||||
|
||||
# set
|
||||
# For example, a column specified as SET('one', 'two') NOT NULL can have any of these values:
|
||||
# ''
|
||||
# 'one'
|
||||
# 'two'
|
||||
# 'one,two'
|
||||
if (/(\w*)\s+set\(((?:['"]\w+['"]\s*,*)+(?:['"]\w+['"])*)\)(.*)$/i) { # example: `au_auth` set('r','w','d') NOT NULL default '',
|
||||
$column_name = $1;
|
||||
$constraints{$column_name}{'values'} = $2; # 'abc','def', ...
|
||||
$constraints{$column_name}{'type'} = "set"; # 'abc','def', ...
|
||||
$_ = qq~ $column_name varchar , ~;
|
||||
$column_name = quote_and_lc($1);
|
||||
$create_sql.=$_;
|
||||
next;
|
||||
|
||||
}
|
||||
if (/(\S*)\s+enum\(((?:['"][^'"]+['"]\s*,)+['"][^'"]+['"])\)(.*)$/i) { # enum handling
|
||||
# example: `test` enum('?','+','-') NOT NULL default '?'
|
||||
# $2 is the values of the enum 'abc','def', ...
|
||||
$quoted_column=quote_and_lc($1);
|
||||
# "test" NOT NULL default '?' CONSTRAINT test_test_constraint CHECK ("test" IN ('?','+','-'))
|
||||
$_ = qq~ $quoted_column varchar CHECK ($quoted_column IN ( $2 ))$3\n~; # just assume varchar?
|
||||
$create_sql.=$_;
|
||||
next;
|
||||
}
|
||||
# Take care of "binary" option for char and varchar
|
||||
# (pre-4.1.2, it indicated a byte array; from 4.1.2, indicates
|
||||
# a binary collation)
|
||||
s/(?:var)?char(?:\(\d+\))? (?:byte|binary)/text/i;
|
||||
if (m/(?:var)?binary\s*\(\d+\)/i) { # c varBINARY(3) in Mysql
|
||||
warn "WARNING in table '$table' '$_': binary type is converted to bytea (unsized) for Postgres\n";
|
||||
}
|
||||
s/(?:var)?binary(?:\(\d+\))?/text/i; # c varBINARY(3) in Mysql
|
||||
s/bit(?:\(\d+\))?/bytea/i; # bit datatype -> bytea
|
||||
|
||||
# large datatypes
|
||||
s/\w*blob/bytea/gi;
|
||||
s/tinytext/text/gi;
|
||||
s/mediumtext/text/gi;
|
||||
s/longtext/text/gi;
|
||||
|
||||
# char -> varchar -- if specified as a command line option
|
||||
# PostgreSQL would otherwise pad with spaces as opposed
|
||||
# to MySQL! Your user interface may depend on this!
|
||||
if ($CHAR2VARCHAR) {
|
||||
s/(^\s+\S+\s+)char/${1}varchar/gi;
|
||||
}
|
||||
|
||||
# nuke column's collate and character set
|
||||
s/(\S+)\s+character\s+set\s+\w+/$1/gi;
|
||||
s/(\S+)\s+collate\s+\w+/$1/gi;
|
||||
|
||||
#
|
||||
# DATE AND TIME TYPES
|
||||
#
|
||||
# date time
|
||||
# year
|
||||
# datetime
|
||||
# timestamp
|
||||
|
||||
# date time
|
||||
# these are the same types in postgres, just do the replacement of 0000-00-00 date
|
||||
|
||||
if (m/default '(\d+)-(\d+)-(\d+)([^']*)'/i) { # we grab the year, month and day
|
||||
# NOTE: times of 00:00:00 are possible and are okay
|
||||
my $time = '';
|
||||
my $year=$1;
|
||||
my $month= $2;
|
||||
my $day = $3;
|
||||
if ($4) {
|
||||
$time = $4;
|
||||
}
|
||||
if ($year eq "0000") { $year = '1970'; }
|
||||
if ($month eq "00") { $month = '01'; }
|
||||
if ($day eq "00") { $day = '01'; }
|
||||
s/default '[^']+'/default '$year-$month-$day$time'/i; # finally we replace with $datetime
|
||||
}
|
||||
|
||||
# convert mysql's year datatype to a constraint
|
||||
if (/(\w*)\s+year\(4\)(.*)$/i) { # can be integer OR string 1901-2155
|
||||
$constraint_table_name = get_identifier($table,$1 ,"constraint_table");
|
||||
$column_name=quote_and_lc($1);
|
||||
@year_holder = ();
|
||||
$year='';
|
||||
for (1901 .. 2155) {
|
||||
$year = "'$_'";
|
||||
unless ($year =~ /2155/) { $year .= ','; }
|
||||
push( @year_holder, $year);
|
||||
}
|
||||
$constraints{$column_name}{'values'} = join('','',@year_holder); # '1901','1902', ...
|
||||
$constraints{$column_name}{'type'} = "year";
|
||||
$_ = qq~ $column_name varchar CONSTRAINT ${table}_${column_name}_constraint REFERENCES $constraint_table_name ("year_values") $2\n~;
|
||||
$create_sql.=$_;
|
||||
next;
|
||||
} elsif (/(\w*)\s+year\(2\)(.*)$/i) { # same for a 2-integer string
|
||||
$constraint_table_name = get_identifier($table,$1 ,"constraint_table");
|
||||
$column_name=quote_and_lc($1);
|
||||
@year_holder = ();
|
||||
$year='';
|
||||
for (1970 .. 2069) {
|
||||
$year = "'$_'";
|
||||
if ($year =~ /2069/) { next; }
|
||||
push( @year_holder, $year);
|
||||
}
|
||||
push( @year_holder, '0000');
|
||||
$constraints{$column_name}{'values'} = join(',',@year_holder); # '1971','1972', ...
|
||||
$constraints{$column_name}{'type'} = "year"; # 'abc','def', ...
|
||||
$_ = qq~ $1 varchar CONSTRAINT ${table}_${column_name}_constraint REFERENCES $constraint_table_name ("year_values") $2\n~;
|
||||
$create_sql.=$_;
|
||||
next;
|
||||
}
|
||||
|
||||
# datetime
|
||||
# Default on a dump from MySQL 5.0.22 is in the same form as datetime so let it flow down
|
||||
# to the timestamp section and deal with it there
|
||||
s/(${sl})datetime /$1timestamp without time zone /i;
|
||||
|
||||
# change not null datetime field to null valid ones
|
||||
# (to support remapping of "zero time" to null
|
||||
# s/($sl)datetime not null/$1timestamp without time zone/i;
|
||||
|
||||
|
||||
# timestamps
|
||||
#
|
||||
# nuke datetime representation (not supported in PostgreSQL)
|
||||
# change default time of 0000-00-00 to 1970-01-01
|
||||
|
||||
# we may possibly need to create a trigger to provide
|
||||
# equal functionality with ON UPDATE CURRENT TIMESTAMP
|
||||
|
||||
|
||||
if (m/${sl}timestamp/i) {
|
||||
if ( m/ON UPDATE CURRENT_TIMESTAMP/i ) { # the ... default CURRENT_TIMESTAMP only applies for blank inserts, not updates
|
||||
s/ON UPDATE CURRENT_TIMESTAMP//i ;
|
||||
m/^\s*(\w+)\s+timestamp/i ;
|
||||
# automatic trigger creation
|
||||
$table_no_quotes =~ s/"//g;
|
||||
$function_create_sql .= " CREATE OR REPLACE FUNCTION update_". $table_no_quotes . "() RETURNS trigger AS '
|
||||
BEGIN
|
||||
NEW.$1 := CURRENT_TIMESTAMP;
|
||||
RETURN NEW;
|
||||
END;
|
||||
' LANGUAGE 'plpgsql';
|
||||
|
||||
-- before INSERT is handled by 'default CURRENT_TIMESTAMP'
|
||||
CREATE TRIGGER add_current_date_to_".$table_no_quotes." BEFORE UPDATE ON ". $table . " FOR EACH ROW EXECUTE PROCEDURE
|
||||
update_".$table_no_quotes."();\n";
|
||||
|
||||
}
|
||||
if ($tables_first_timestamp_column && m/DEFAULT NULL/i) {
|
||||
# DEFAULT NULL is the same as DEFAULT CURRENT_TIMESTAMP for the first TIMESTAMP column. (MYSQL manual)
|
||||
s/($sl)(timestamp\s+)default null/$1 $2 DEFAULT CURRENT_TIMESTAMP/i;
|
||||
}
|
||||
$tables_first_timestamp_column= 0;
|
||||
if (m/${sl}timestamp\s*\(\d+\)/i) { # fix for timestamps with width spec not handled (ID: 1628)
|
||||
warn "WARNING for in table '$table' '$_': your default timestamp width is being ignored for table $table \n";
|
||||
s/($sl)timestamp(?:\(\d+\))/$1datetime/i;
|
||||
}
|
||||
} # end timestamp section
|
||||
|
||||
# KEY AND UNIQUE CREATIONS
|
||||
#
|
||||
# unique
|
||||
if ( /^\s+unique\s+\(([^(]+)\)/i ) { # example UNIQUE `name` (`name`), same as UNIQUE KEY
|
||||
# POSTGRESQL: treat same as mysql unique
|
||||
$quoted_column = quote_and_lc($1);
|
||||
s/\s+unique\s+\(([^(]+)\)/ unique ($quoted_column) /i;
|
||||
$create_sql.=$_;
|
||||
next;
|
||||
} elsif ( /^\s+unique\s+key\s*(\w+)\s*\(([^(]+)\)/i ) { # example UNIQUE KEY `name` (`name`)
|
||||
# MYSQL: unique key: allows null=YES, allows duplicates=NO (*)
|
||||
# ... new ... UNIQUE KEY `unique_fullname` (`fullname`) in my mysql v. Ver 14.12 Distrib 5.1.7-beta
|
||||
# POSTGRESQL: treat same as mysql unique
|
||||
# just quote columns
|
||||
$quoted_column = quote_and_lc($2);
|
||||
s/\s+unique\s+key\s*(\w+)\s*\(([^(]+)\)/ unique ($quoted_column) /i;
|
||||
$create_sql.=$_;
|
||||
# the index corresponding to the 'key' is automatically created
|
||||
next;
|
||||
}
|
||||
# keys
|
||||
if ( /^\s+fulltext key\s+/i) { # example: FULLTEXT KEY `commenttext` (`commenttext`)
|
||||
# that is key as a word in the first check for a match
|
||||
# the tsvector datatype is made for these types of things
|
||||
# example mysql file:
|
||||
# what is tsvector datatype?
|
||||
# http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
|
||||
warn "dba must do fulltext key transformation for $table\n";
|
||||
next;
|
||||
}
|
||||
if ( /^(\s+)constraint (\S+) foreign key \((\S+)\) references (\S+) \((\S+)\)(.*)/i ) {
|
||||
$quoted_column =quote_and_lc($3);
|
||||
$col=quote_and_lc($5);
|
||||
$post_create_sql .= "ALTER TABLE $table ADD FOREIGN KEY ($quoted_column) REFERENCES " . quote_and_lc($4) . " ($col);\n";
|
||||
next;
|
||||
}
|
||||
if ( /^\s*primary key\s*\(([^)]+)\)([,\s]+)/i ) { # example PRIMARY KEY (`name`)
|
||||
# MYSQL: primary key: allows null=NO , allows duplicates=NO
|
||||
# POSTGRESQL: When an index is declared unique, multiple table rows with equal indexed values will not be
|
||||
# allowed. Null values are not considered equal.
|
||||
# POSTGRESQL quote's source: 8.1.3 docs section 11.5 "unique indexes"
|
||||
# so, in postgres, we need to add a NOT NULL to the UNIQUE constraint
|
||||
# and, primary key (mysql) == primary key (postgres) so that we *really* don't need change anything
|
||||
$quoted_column = quote_and_lc($1);
|
||||
s/(\s*)primary key\s+\(([^)]+)\)([,\s]+)/$1 primary key ($quoted_column)$3/i;
|
||||
# indexes are automatically created for unique columns
|
||||
$create_sql.=$_;
|
||||
next;
|
||||
} elsif (m/^\s+key\s[-_\s\w]+\((.+)\)/i ) { # example: KEY `idx_mod_english_def_word` (`word`),
|
||||
# regular key: allows null=YES, allows duplicates=YES
|
||||
# MYSQL: KEY is normally a synonym for INDEX. http://dev.mysql.com/doc/refman/5.1/en/create-table.html
|
||||
#
|
||||
# * MySQL: ALTER TABLE {$table} ADD KEY $column ($column)
|
||||
# * PostgreSQL: CREATE INDEX {$table}_$column_idx ON {$table}($column) // Please note the _idx "extension"
|
||||
# PRIMARY KEY (`postid`),
|
||||
# KEY `ownerid` (`ownerid`)
|
||||
# create an index for everything which has a key listed for it.
|
||||
my $col = $1;
|
||||
# TODO we don't have a translation for the substring syntax in text columns in MySQL (e.g. "KEY my_idx (mytextcol(20))")
|
||||
# for now just getting rid of the brackets and numbers (the substring specifier):
|
||||
$col=~s/\(\d+\)//g;
|
||||
$quoted_column = quote_and_lc($col);
|
||||
if ($col =~ m/,/) {
|
||||
$col = s/,/_/;
|
||||
}
|
||||
$index = get_identifier($table, $col, 'idx');
|
||||
$post_create_sql.="CREATE INDEX $index ON $table USING btree ($quoted_column)\;";
|
||||
# just create index do not add to create table statement
|
||||
next;
|
||||
}
|
||||
|
||||
# handle 'key' declared at end of column
|
||||
if (/\w+.*primary key/i) { # mysql: key is normally just a synonym for index
|
||||
# just leave as is ( postgres has primary key type)
|
||||
|
||||
|
||||
} elsif (/(\w+\s+(?:$mysql_datatypesStr)\s+.*)key/i) { # mysql: key is normally just a synonym for index
|
||||
# I can't find a reference for 'key' in a postgres command without using the word 'primary key'
|
||||
s/$1key/$1/i ;
|
||||
$index = get_identifier($table, $1, 'idx');
|
||||
$quoted_column =quote_and_lc($1);
|
||||
$post_create_sql.="CREATE INDEX $index ON $table USING btree ($quoted_column) \;";
|
||||
$create_sql.=$_;
|
||||
}
|
||||
|
||||
|
||||
|
||||
# do we really need this anymore?
|
||||
# remap colums with names of existing system attribute
|
||||
if (/"oid"/i) {
|
||||
s/"oid"/"_oid"/g;
|
||||
print STDERR "WARNING: table $table uses column \"oid\" which is renamed to \"_oid\"\nYou should fix application manually! Press return to continue.";
|
||||
my $wait=<STDIN>;
|
||||
}
|
||||
|
||||
s/oid/_oid/i if (/key/i && /oid/i); # fix oid in key
|
||||
|
||||
# FINAL QUOTING OF ALL COLUMNS
|
||||
# quote column names which were not already quoted
|
||||
# perhaps they were not quoted because they were not explicitly handled
|
||||
if (!/^\s*"(\w+)"(\s+)/i) {
|
||||
/^(\s*)(\w+)(\s+)(.*)$/i ;
|
||||
$quoted_column= quote_and_lc($2);
|
||||
s/^(\s*)(\w+)(\s+)(.*)$/$1 $quoted_column $3 $4 /;
|
||||
}
|
||||
$create_sql.=$_;
|
||||
# END of if ($create_sql ne "") i.e. were inside create table statement so processed datatypes
|
||||
}
|
||||
# add "not in create table" comments or empty lines to pre_create_sql
|
||||
elsif (/^#/ || /^$/ || /^\s*--/) {
|
||||
s/^#/--/; # Two hyphens (--) is the SQL-92 standard indicator for comments
|
||||
$pre_create_sql .= $_ ; # printed above create table statement
|
||||
next;
|
||||
}
|
||||
elsif (/^\s*insert into/i) { # not inside create table and doing insert
|
||||
# fix mysql's zero/null value for timestamps
|
||||
s/'0000-00-00/'1970-01-01/gi;
|
||||
# commented out to fix bug "Field contents interpreted as a timestamp", what was the point of this line anyway?
|
||||
#s/([12]\d\d\d)([01]\d)([0-3]\d)([0-2]\d)([0-6]\d)([0-6]\d)/'$1-$2-$3 $4:$5:$6'/;
|
||||
|
||||
#---- fix data in inserted data: (from MS world)
|
||||
s!\x96!-!g; # --
|
||||
s!\x93!"!g; # ``
|
||||
s!\x94!"!g; # ''
|
||||
s!\x85!... !g; # \ldots
|
||||
s!\x92!`!g;
|
||||
|
||||
print OUT $pre_create_sql; # print comments preceding the insert section
|
||||
$pre_create_sql="";
|
||||
$auto_increment_seq = "";
|
||||
|
||||
s/'((?:[^'\\]++|\\.)*+)'(?=[),])/E'$1'/g;
|
||||
# for the E'' see http://www.postgresql.org/docs/8.2/interactive/release-8-1.html
|
||||
s!\\\\!\\\\\\\\!g; # replace \\ with ]\\\\
|
||||
|
||||
# split 'extended' INSERT INTO statements to something PostgreSQL can understand
|
||||
( $insert_table, $valueString) = $_ =~ m/^INSERT\s+INTO\s+['`"]*(.*?)['`"]*\s+VALUES\s*(.*)/i;
|
||||
$insert_table = quote_and_lc($insert_table);
|
||||
|
||||
s/^INSERT INTO.*?\);//i; # hose the statement which is to be replaced whether a run-on or not
|
||||
# guarantee table names are quoted
|
||||
print OUT qq(INSERT INTO $insert_table VALUES $valueString \n);
|
||||
|
||||
} else {
|
||||
print OUT $_ ; # example: /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
|
||||
}
|
||||
# keep looping and get next line of IN file
|
||||
|
||||
} # END while(<IN>)
|
||||
|
||||
print_post_create_sql(); # in case there is extra from the last table
|
||||
|
||||
#################################################################
|
||||
# 5. print_plgsql function prototype
|
||||
# emulate the set datatype with the following plpgsql function
|
||||
# looks ugly so putting at end of file
|
||||
#################################################################
|
||||
#
|
||||
sub make_plpgsql {
|
||||
my ($table,$column_name) = ($_[0],$_[1]);
|
||||
$table=~s/\"//g; # make sure that $table doesn't have quotes so we don't end up with redundant quoting
|
||||
my $constraint_table = get_identifier($table,$column_name ,"constraint_table");
|
||||
return "
|
||||
-- this function is called by the insert/update trigger
|
||||
-- it checks if the INSERT/UPDATE for the 'set' column
|
||||
-- contains members which comprise a valid mysql set
|
||||
-- this TRIGGER function therefore acts like a constraint
|
||||
-- provided limited functionality for mysql's set datatype
|
||||
-- just verifies and matches for string representations of the set at this point
|
||||
-- though the set datatype uses bit comparisons, the only supported arguments to our
|
||||
-- set datatype are VARCHAR arguments
|
||||
-- to add a member to the set add it to the ".$table."_".$column_name." table
|
||||
CREATE OR REPLACE FUNCTION check_".$table."_".$column_name."_set( ) RETURNS TRIGGER AS \$\$\n
|
||||
DECLARE
|
||||
----
|
||||
arg_str VARCHAR ;
|
||||
argx VARCHAR := '';
|
||||
nobreak INT := 1;
|
||||
rec_count INT := 0;
|
||||
psn INT := 0;
|
||||
str_in VARCHAR := NEW.$column_name;
|
||||
----
|
||||
BEGIN
|
||||
----
|
||||
IF str_in IS NULL THEN RETURN NEW ; END IF;
|
||||
arg_str := REGEXP_REPLACE(str_in, '\\',\\'', ','); -- str_in is CONSTANT
|
||||
arg_str := REGEXP_REPLACE(arg_str, '^\\'', '');
|
||||
arg_str := REGEXP_REPLACE(arg_str, '\\'\$', '');
|
||||
-- RAISE NOTICE 'arg_str %',arg_str;
|
||||
psn := POSITION(',' in arg_str);
|
||||
IF psn > 0 THEN
|
||||
psn := psn - 1; -- minus-1 from comma position
|
||||
-- RAISE NOTICE 'psn %',psn;
|
||||
argx := SUBSTRING(arg_str FROM 1 FOR psn); -- get one set member
|
||||
psn := psn + 2; -- go to first starting letter
|
||||
arg_str := SUBSTRING(arg_str FROM psn); -- hack it off
|
||||
ELSE
|
||||
psn := 0; -- minus-1 from comma position
|
||||
argx := arg_str;
|
||||
END IF;
|
||||
-- RAISE NOTICE 'argx %',argx;
|
||||
-- RAISE NOTICE 'new arg_str: %',arg_str;
|
||||
WHILE nobreak LOOP
|
||||
EXECUTE 'SELECT count(*) FROM $constraint_table WHERE set_values = ' || quote_literal(argx) INTO rec_count;
|
||||
IF rec_count = 0 THEN RAISE EXCEPTION 'one of the set values was not found';
|
||||
END IF;
|
||||
IF psn > 0 THEN
|
||||
psn := psn - 1; -- minus-1 from comma position
|
||||
-- RAISE NOTICE 'psn %',psn;
|
||||
argx := SUBSTRING(arg_str FROM 1 FOR psn); -- get one set member
|
||||
psn := psn + 2; -- go to first starting letter
|
||||
arg_str := SUBSTRING(arg_str FROM psn); -- hack it off
|
||||
psn := POSITION(',' in arg_str);
|
||||
ELSE nobreak = 0;
|
||||
END IF;
|
||||
-- RAISE NOTICE 'next argx % and next arg_str %', argx, arg_str;
|
||||
END LOOP;
|
||||
RETURN NEW;
|
||||
----
|
||||
END;
|
||||
\$\$ LANGUAGE 'plpgsql' VOLATILE;
|
||||
|
||||
drop trigger set_test ON $table;
|
||||
-- make a trigger for each set field
|
||||
-- make trigger and hard-code in column names
|
||||
-- see http://archives.postgresql.org/pgsql-interfaces/2005-02/msg00020.php
|
||||
CREATE TRIGGER set_test
|
||||
BEFORE INSERT OR UPDATE ON $table FOR EACH ROW
|
||||
EXECUTE PROCEDURE check_".$table."_".$column_name."_set();\n";
|
||||
} # end sub make_plpgsql();
|
||||
|
||||
199
data-sources/wikipedia-wikidata/wikidata_place_type_levels.csv
Normal file
199
data-sources/wikipedia-wikidata/wikidata_place_type_levels.csv
Normal file
@@ -0,0 +1,199 @@
|
||||
place_type,level
|
||||
Q9842,4
|
||||
Q9430,3
|
||||
Q928830,4
|
||||
Q9259,1
|
||||
Q91028,5
|
||||
Q8514,2
|
||||
Q8502,2
|
||||
Q83405,3
|
||||
Q82794,2
|
||||
Q820477,1
|
||||
Q811979,1
|
||||
Q8072,2
|
||||
Q79007,2
|
||||
Q786014,3
|
||||
Q75848,2
|
||||
Q75520,2
|
||||
Q728937,4
|
||||
Q7275,2
|
||||
Q719456,3
|
||||
Q7075,3
|
||||
Q697295,4
|
||||
Q6852233,2
|
||||
Q682943,3
|
||||
Q665487,5
|
||||
Q655686,3
|
||||
Q643589,5
|
||||
Q641226,2
|
||||
Q631305,2
|
||||
Q6256,2
|
||||
Q6023295,2
|
||||
Q5773747,5
|
||||
Q56061,1
|
||||
Q55659167,4
|
||||
Q55488,4
|
||||
Q55465477,3
|
||||
Q54050,2
|
||||
Q532,3
|
||||
Q53060,2
|
||||
Q52177058,4
|
||||
Q515716,5
|
||||
Q5153984,4
|
||||
Q515,3
|
||||
Q5144960,5
|
||||
Q5119,4
|
||||
Q5119,4
|
||||
Q5107,2
|
||||
Q5084,4
|
||||
Q5031071,4
|
||||
Q5003624,2
|
||||
Q4989906,1
|
||||
Q4976993,3
|
||||
Q486972,1
|
||||
Q486972,2
|
||||
Q483110,3
|
||||
Q4830453,4
|
||||
Q47521,3
|
||||
Q473972,1
|
||||
Q46831,2
|
||||
Q46614560,5
|
||||
Q44782,3
|
||||
Q44613,4
|
||||
Q44539,4
|
||||
Q44494,2
|
||||
Q44377,2
|
||||
Q4421,2
|
||||
Q43501,2
|
||||
Q4286337,3
|
||||
Q42523,3
|
||||
Q41176,2
|
||||
Q40357,3
|
||||
Q4022,4
|
||||
Q40080,2
|
||||
Q39816,2
|
||||
Q39715,3
|
||||
Q39614,1
|
||||
Q3957,3
|
||||
Q3947,4
|
||||
Q3914,3
|
||||
Q38723,2
|
||||
Q38720,3
|
||||
Q3623867,5
|
||||
Q35666,2
|
||||
Q355304,3
|
||||
Q35509,2
|
||||
Q35112127,3
|
||||
Q34985575,4
|
||||
Q34876,5
|
||||
Q34763,2
|
||||
Q34627,4
|
||||
Q3455524,3
|
||||
Q34442,4
|
||||
Q33837,2
|
||||
Q33506,3
|
||||
Q32815,4
|
||||
Q3257686,2
|
||||
Q3240715,2
|
||||
Q3191695,5
|
||||
Q3153117,2
|
||||
Q30198,2
|
||||
Q30139652,3
|
||||
Q294422,3
|
||||
Q2870166,3
|
||||
Q27686,3
|
||||
Q274153,3
|
||||
Q271669,1
|
||||
Q2659904,2
|
||||
Q24529780,2
|
||||
Q24354,3
|
||||
Q2354973,4
|
||||
Q23442,2
|
||||
Q23413,3
|
||||
Q23397,3
|
||||
Q2327515,4
|
||||
Q2311958,5
|
||||
Q22927291,6
|
||||
Q22698,1
|
||||
Q2175765,4
|
||||
Q205495,4
|
||||
Q204832,3
|
||||
Q2042028,2
|
||||
Q202216,6
|
||||
Q1970725,3
|
||||
Q194203,5
|
||||
Q194195,2
|
||||
Q190429,2
|
||||
Q185187,3
|
||||
Q185113,2
|
||||
Q183366,2
|
||||
Q1799794,1
|
||||
Q1788454,4
|
||||
Q1785071,3
|
||||
Q1777138,3
|
||||
Q177634,2
|
||||
Q177380,2
|
||||
Q174814,4
|
||||
Q174782,2
|
||||
Q17350442,2
|
||||
Q17343829,3
|
||||
Q17334923,0
|
||||
Q17018380,3
|
||||
Q16970,4
|
||||
Q16917,3
|
||||
Q16831714,4
|
||||
Q165,3
|
||||
Q160742,4
|
||||
Q159719,3
|
||||
Q159334,4
|
||||
Q15640612,5
|
||||
Q15324,2
|
||||
Q15284,5
|
||||
Q15243209,6
|
||||
Q152081,1
|
||||
Q15195406,4
|
||||
Q1500350,5
|
||||
Q149621,5
|
||||
Q14757767,4
|
||||
Q14350,3
|
||||
Q1410668,3
|
||||
Q1394476,3
|
||||
Q1377575,2
|
||||
Q1353183,3
|
||||
Q134447,4
|
||||
Q133215,3
|
||||
Q133056,2
|
||||
Q13221722,3
|
||||
Q13220204,2
|
||||
Q1311958,4
|
||||
Q1303167,3
|
||||
Q130003,3
|
||||
Q12518,2
|
||||
Q12516,3
|
||||
Q1248784,3
|
||||
Q123705,3
|
||||
Q12323,3
|
||||
Q12284,4
|
||||
Q12280,4
|
||||
Q121359,2
|
||||
Q1210950,2
|
||||
Q11755880,3
|
||||
Q11707,3
|
||||
Q11315,3
|
||||
Q11303,3
|
||||
Q1115575,4
|
||||
Q1107656,1
|
||||
Q10864048,1
|
||||
Q1076486,2
|
||||
Q105731,3
|
||||
Q105190,3
|
||||
Q1048525,3
|
||||
Q102496,5
|
||||
Q28872924,1
|
||||
Q15617994,1
|
||||
Q159313,2
|
||||
Q24398318,3
|
||||
Q327333,2
|
||||
Q43229,1
|
||||
Q860861,1
|
||||
Q4989906,1
|
||||
|
195
data-sources/wikipedia-wikidata/wikidata_place_types.txt
Normal file
195
data-sources/wikipedia-wikidata/wikidata_place_types.txt
Normal file
@@ -0,0 +1,195 @@
|
||||
Q9842
|
||||
Q9430
|
||||
Q928830
|
||||
Q9259
|
||||
Q91028
|
||||
Q8514
|
||||
Q8502
|
||||
Q83405
|
||||
Q82794
|
||||
Q820477
|
||||
Q811979
|
||||
Q8072
|
||||
Q79007
|
||||
Q786014
|
||||
Q75848
|
||||
Q75520
|
||||
Q728937
|
||||
Q7275
|
||||
Q719456
|
||||
Q7075
|
||||
Q697295
|
||||
Q6852233
|
||||
Q682943
|
||||
Q665487
|
||||
Q655686
|
||||
Q643589
|
||||
Q641226
|
||||
Q631305
|
||||
Q6256
|
||||
Q6023295
|
||||
Q5773747
|
||||
Q56061
|
||||
Q55659167
|
||||
Q55488
|
||||
Q55465477
|
||||
Q54050
|
||||
Q532
|
||||
Q53060
|
||||
Q52177058
|
||||
Q515716
|
||||
Q5153984
|
||||
Q515
|
||||
Q5144960
|
||||
Q5119
|
||||
Q5107
|
||||
Q5084
|
||||
Q5031071
|
||||
Q5003624
|
||||
Q4989906
|
||||
Q4976993
|
||||
Q486972
|
||||
Q483110
|
||||
Q4830453
|
||||
Q47521
|
||||
Q473972
|
||||
Q46831
|
||||
Q46614560
|
||||
Q44782
|
||||
Q44613
|
||||
Q44539
|
||||
Q44494
|
||||
Q44377
|
||||
Q4421
|
||||
Q43501
|
||||
Q4286337
|
||||
Q42523
|
||||
Q41176
|
||||
Q40357
|
||||
Q4022
|
||||
Q40080
|
||||
Q39816
|
||||
Q39715
|
||||
Q39614
|
||||
Q3957
|
||||
Q3947
|
||||
Q3914
|
||||
Q38723
|
||||
Q38720
|
||||
Q3623867
|
||||
Q35666
|
||||
Q355304
|
||||
Q35509
|
||||
Q35112127
|
||||
Q34985575
|
||||
Q34876
|
||||
Q34763
|
||||
Q34627
|
||||
Q3455524
|
||||
Q34442
|
||||
Q33837
|
||||
Q33506
|
||||
Q32815
|
||||
Q3257686
|
||||
Q3240715
|
||||
Q3191695
|
||||
Q3153117
|
||||
Q30198
|
||||
Q30139652
|
||||
Q294422
|
||||
Q2870166
|
||||
Q27686
|
||||
Q274153
|
||||
Q271669
|
||||
Q2659904
|
||||
Q24529780
|
||||
Q24354
|
||||
Q2354973
|
||||
Q23442
|
||||
Q23413
|
||||
Q23397
|
||||
Q2327515
|
||||
Q2311958
|
||||
Q22927291
|
||||
Q22698
|
||||
Q2175765
|
||||
Q205495
|
||||
Q204832
|
||||
Q2042028
|
||||
Q202216
|
||||
Q1970725
|
||||
Q194203
|
||||
Q194195
|
||||
Q190429
|
||||
Q185187
|
||||
Q185113
|
||||
Q183366
|
||||
Q1799794
|
||||
Q1788454
|
||||
Q1785071
|
||||
Q1777138
|
||||
Q177634
|
||||
Q177380
|
||||
Q174814
|
||||
Q174782
|
||||
Q17350442
|
||||
Q17343829
|
||||
Q17334923
|
||||
Q17018380
|
||||
Q16970
|
||||
Q16917
|
||||
Q16831714
|
||||
Q165
|
||||
Q160742
|
||||
Q159719
|
||||
Q159334
|
||||
Q15640612
|
||||
Q15324
|
||||
Q15284
|
||||
Q15243209
|
||||
Q152081
|
||||
Q15195406
|
||||
Q1500350
|
||||
Q149621
|
||||
Q14757767
|
||||
Q14350
|
||||
Q1410668
|
||||
Q1394476
|
||||
Q1377575
|
||||
Q1353183
|
||||
Q134447
|
||||
Q133215
|
||||
Q133056
|
||||
Q13221722
|
||||
Q13220204
|
||||
Q1311958
|
||||
Q1303167
|
||||
Q130003
|
||||
Q12518
|
||||
Q12516
|
||||
Q1248784
|
||||
Q123705
|
||||
Q12323
|
||||
Q12284
|
||||
Q12280
|
||||
Q121359
|
||||
Q1210950
|
||||
Q11755880
|
||||
Q11707
|
||||
Q11315
|
||||
Q11303
|
||||
Q1115575
|
||||
Q1107656
|
||||
Q10864048
|
||||
Q1076486
|
||||
Q105731
|
||||
Q105190
|
||||
Q1048525
|
||||
Q102496
|
||||
Q28872924
|
||||
Q15617994
|
||||
Q159313
|
||||
Q24398318
|
||||
Q327333
|
||||
Q43229
|
||||
Q860861
|
||||
200
data-sources/wikipedia-wikidata/wikidata_places.md
Normal file
200
data-sources/wikipedia-wikidata/wikidata_places.md
Normal file
@@ -0,0 +1,200 @@
|
||||
|
||||
## Wikidata place types and related OSM Tags
|
||||
|
||||
Wikidata does not have any official ontologies, however the [DBpedia project](https://wiki.dbpedia.org/) has created an [ontology](https://wiki.dbpedia.org/services-resources/ontology) that covered [place types](http://mappings.dbpedia.org/server/ontology/classes/#Place). The table below used the DBpedia place ontology as a starting point, and is provided as a cross-reference to the relevant OSM tags.
|
||||
|
||||
The Wikidata place types listed in the table below can be used in conjunction with the [Wikidata Query Service](https://query.wikidata.org/) to retrieve instances of those place types from the Wikidata knowledgebase.
|
||||
|
||||
```
|
||||
SELECT ?item ?lat ?lon
|
||||
WHERE {
|
||||
?item wdt:P31*/wdt:P279*wd:Q9430; wdt:P625 ?pt.
|
||||
?item p:P625?loc.
|
||||
?loc psv:P625?cnode.
|
||||
?cnode wikibase:geoLatitude?lat.
|
||||
?cnode wikibase:geoLongitude?lon.
|
||||
}
|
||||
```
|
||||
|
||||
An example json return for all instances of the Wikidata item "Q9430" (Ocean) can be seen at [json](https://query.wikidata.org/bigdata/namespace/wdq/sparql?format=json&query=SELECT?item?lat?lon%20WHERE{?item%20wdt:P31*/wdt:P279*wd:Q9430;wdt:P625?pt.?item%20p:P625?loc.?loc%20psv:P625?cnode.?cnode%20wikibase:geoLatitude?lat.?cnode%20wikibase:geoLongitude?lon.})
|
||||
|
||||
**NOTE** the OSM tags listed are those listed in the wikidata entries, and not all the possible matches for tags within OSM.
|
||||
|
||||
|
||||
title | concept | OSM Tag |
|
||||
-----------|---------------------------------------|------------------|
|
||||
[Q17334923](https://www.wikidata.org/entity/Q17334923) | Location | |
|
||||
[Q811979](https://www.wikidata.org/entity/Q811979) | Architectural Structure | |
|
||||
[Q194195](https://www.wikidata.org/entity/Q194195) | Amusement park |
|
||||
[Q204832](https://www.wikidata.org/entity/Q204832) | Roller coaster | [attraction=roller_coaster](https://wiki.openstreetmap.org/wiki/Tag:attraction=roller_coaster) |
|
||||
[Q2870166](https://www.wikidata.org/entity/Q2870166) | Water ride | |
|
||||
[Q641226](https://www.wikidata.org/entity/Q641226) | Arena | [amenity=events_centre](https://wiki.openstreetmap.org/wiki/Tag:amenity=events_centre) |
|
||||
[Q41176](https://www.wikidata.org/entity/Q41176) | Building | [building=yes](https://wiki.openstreetmap.org/wiki/Key:building) |
|
||||
[Q1303167](https://www.wikidata.org/entity/Q1303167) | Barn | [building=barn](https://wiki.openstreetmap.org/wiki/Tag:building=barn) |
|
||||
[Q655686](https://www.wikidata.org/entity/Q655686) | Commercial building | [building=commercial](https://wiki.openstreetmap.org/wiki/Tag:building=commercial) |
|
||||
[Q4830453](https://www.wikidata.org/entity/Q4830453) | Business | |
|
||||
[Q7075](https://www.wikidata.org/entity/Q7075) | Library | [amenity=library](https://wiki.openstreetmap.org/wiki/Tag:amenity=library) |
|
||||
[Q133215](https://www.wikidata.org/entity/Q133215) | Casino | [amenity=casino](https://wiki.openstreetmap.org/wiki/Tag:amenity=casino) |
|
||||
[Q23413](https://www.wikidata.org/entity/Q23413) | Castle | [historic=castle](https://wiki.openstreetmap.org/wiki/Tag:historic=castle) |
|
||||
[Q83405](https://www.wikidata.org/entity/Q83405) | Factory | |
|
||||
[Q53060](https://www.wikidata.org/entity/Q53060) | Gate | [barrier=gate](https://wiki.openstreetmap.org/wiki/Tag:barrier=gate) |cnode%20wikibase:geoLatitude?lat.?cnode%20wikibase:geoLongitude?lon.})
|
||||
[Q11755880](https://www.wikidata.org/entity/Q11755880) | Residential Building | [building=residential](https://wiki.openstreetmap.org/wiki/Tag:building=residential) |
|
||||
[Q3947](https://www.wikidata.org/entity/Q3947) | House | [building=house](https://wiki.openstreetmap.org/wiki/Tag:building=house) |
|
||||
[Q35112127](https://www.wikidata.org/entity/Q35112127) | Historic Building | |
|
||||
[Q5773747](https://www.wikidata.org/entity/Q5773747) | Historic house | |
|
||||
[Q38723](https://www.wikidata.org/entity/Q38723) | Higher Education Institution |
|
||||
[Q3914](https://www.wikidata.org/entity/Q3914) | School | [amenity=school](https://wiki.openstreetmap.org/wiki/Tag:amenity=school) |
|
||||
[Q9842](https://www.wikidata.org/entity/Q9842) | Primary school | |
|
||||
[Q159334](https://www.wikidata.org/entity/Q159334) | Secondary school | |
|
||||
[Q16917](https://www.wikidata.org/entity/Q16917) | Hospital | [amenity=hospital](https://wiki.openstreetmap.org/wiki/Tag:amenity=hospital), [healthcare=hospital](https://wiki.openstreetmap.org/wiki/Tag:healthcare=hospital), [building=hospital](https://wiki.openstreetmap.org/wiki/Tag:building=hospital) |
|
||||
[Q27686](https://www.wikidata.org/entity/Q27686) | Hotel | [tourism=hotel](https://wiki.openstreetmap.org/wiki/Tag:tourism=hotel), [building=hotel](https://wiki.openstreetmap.org/wiki/Tag:building=hotel) |
|
||||
[Q33506](https://www.wikidata.org/entity/Q33506) | Museum | [tourism=museum](https://wiki.openstreetmap.org/wiki/Tag:tourism=museum) |
|
||||
[Q40357](https://www.wikidata.org/entity/Q40357) | Prison | [amenity=prison](https://wiki.openstreetmap.org/wiki/Tag:amenity=prison) |
|
||||
[Q24398318](https://www.wikidata.org/entity/Q24398318) | Religious Building | |
|
||||
[Q160742](https://www.wikidata.org/entity/Q160742) | Abbey | |
|
||||
[Q16970](https://www.wikidata.org/entity/Q16970) | Church (building) | [building=church](https://wiki.openstreetmap.org/wiki/Tag:building=church) |
|
||||
[Q44613](https://www.wikidata.org/entity/Q44613) | Monastery | [amenity=monastery](https://wiki.openstreetmap.org/wiki/Tag:amenity=monastery) |
|
||||
[Q32815](https://www.wikidata.org/entity/Q32815) | Mosque | [building=mosque](https://wiki.openstreetmap.org/wiki/Tag:building=mosque) |
|
||||
[Q697295](https://www.wikidata.org/entity/Q697295) | Shrine | [building=shrine](https://wiki.openstreetmap.org/wiki/Tag:building=shrine) |
|
||||
[Q34627](https://www.wikidata.org/entity/Q34627) | Synagogue | [building=synagogue](https://wiki.openstreetmap.org/wiki/Tag:building=synagogue) |
|
||||
[Q44539](https://www.wikidata.org/entity/Q44539) | Temple | [building=temple](https://wiki.openstreetmap.org/wiki/Tag:building=temple) |
|
||||
[Q11707](https://www.wikidata.org/entity/Q11707) | Restaurant | [amenity=restaurant](https://wiki.openstreetmap.org/wiki/Tag:amenity=restaurant) |
|
||||
[Q11315](https://www.wikidata.org/entity/Q11315) | Shopping mall | [shop=mall](https://wiki.openstreetmap.org/wiki/Tag:shop=mall), [shop=shopping_centre](https://wiki.openstreetmap.org/wiki/Tag:shop=shopping_centre) |
|
||||
[Q11303](https://www.wikidata.org/entity/Q11303) | Skyscraper | |
|
||||
[Q17350442](https://www.wikidata.org/entity/Q17350442) | Venue | |
|
||||
[Q41253](https://www.wikidata.org/entity/Q41253) | Movie Theater | [amenity=cinema](https://wiki.openstreetmap.org/wiki/Tag:amenity=cinema) |
|
||||
[Q483110](https://www.wikidata.org/entity/Q483110) | Stadium | [leisure=stadium](https://wiki.openstreetmap.org/wiki/Tag:leisure=stadium), [building=stadium](https://wiki.openstreetmap.org/wiki/Tag:building=stadium) |
|
||||
[Q24354](https://www.wikidata.org/entity/Q24354) | Theater (structure) | [amenity=theatre](https://wiki.openstreetmap.org/wiki/Tag:amenity=theatre) |
|
||||
[Q121359](https://www.wikidata.org/entity/Q121359) | Infrastructure | |
|
||||
[Q1248784](https://www.wikidata.org/entity/Q1248784) | Airport | |
|
||||
[Q12323](https://www.wikidata.org/entity/Q12323) | Dam | [waterway=dam](https://wiki.openstreetmap.org/wiki/Tag:waterway=dam) |
|
||||
[Q1353183](https://www.wikidata.org/entity/Q1353183) | Launch pad | |
|
||||
[Q105190](https://www.wikidata.org/entity/Q105190) | Levee | [man_made=dyke](https://wiki.openstreetmap.org/wiki/Tag:man_made=dyke) |
|
||||
[Q105731](https://www.wikidata.org/entity/Q105731) | Lock (water navigation) | [lock=yes](https://wiki.openstreetmap.org/wiki/Key:lock) |
|
||||
[Q44782](https://www.wikidata.org/entity/Q44782) | Port | |
|
||||
[Q159719](https://www.wikidata.org/entity/Q159719) | Power station | [power=plant](https://wiki.openstreetmap.org/wiki/Tag:power=plant) |
|
||||
[Q174814](https://www.wikidata.org/entity/Q174814) | Electrical substation | |
|
||||
[Q134447](https://www.wikidata.org/entity/Q134447) | Nuclear power plant | [plant:source=nuclear](https://wiki.openstreetmap.org/wiki/Tag:plant:source=nuclear) |
|
||||
[Q786014](https://www.wikidata.org/entity/Q786014) | Rest area | [highway=rest_area](https://wiki.openstreetmap.org/wiki/Tag:highway=rest_area), [highway=services](https://wiki.openstreetmap.org/wiki/Tag:highway=services) |
|
||||
[Q12280](https://www.wikidata.org/entity/Q12280) | Bridge | [bridge=* ](https://wiki.openstreetmap.org/wiki/Key:bridge), [man_made=bridge](https://wiki.openstreetmap.org/wiki/Tag:man_made=bridge) |
|
||||
[Q728937](https://www.wikidata.org/entity/Q728937) | Railroad Line | [railway=rail](https://wiki.openstreetmap.org/wiki/Tag:railway=rail) |
|
||||
[Q1311958](https://www.wikidata.org/entity/Q1311958) | Railway Tunnel | |
|
||||
[Q34442](https://www.wikidata.org/entity/Q34442) | Road | [highway=* ](https://wiki.openstreetmap.org/wiki/Key:highway), [route=road](https://wiki.openstreetmap.org/wiki/Tag:route=road) |
|
||||
[Q1788454](https://www.wikidata.org/entity/Q1788454) | Road junction | |
|
||||
[Q44377](https://www.wikidata.org/entity/Q44377) | Tunnel | [tunnel=* ](https://wiki.openstreetmap.org/wiki/Key:tunnel) |
|
||||
[Q5031071](https://www.wikidata.org/entity/Q5031071) | Canal tunnel | |
|
||||
[Q719456](https://www.wikidata.org/entity/Q719456) | Station | [public_transport=station](https://wiki.openstreetmap.org/wiki/Tag:public_transport=station) |
|
||||
[Q205495](https://www.wikidata.org/entity/Q205495) | Filling station | [amenity=fuel](https://wiki.openstreetmap.org/wiki/Tag:amenity=fuel) |
|
||||
[Q928830](https://www.wikidata.org/entity/Q928830) | Metro station | [station=subway](https://wiki.openstreetmap.org/wiki/Tag:station=subway) |
|
||||
[Q55488](https://www.wikidata.org/entity/Q55488) | Train station | [railway=station](https://wiki.openstreetmap.org/wiki/Tag:railway=station) |
|
||||
[Q2175765](https://www.wikidata.org/entity/Q2175765) | Tram stop | [railway=tram_stop](https://wiki.openstreetmap.org/wiki/Tag:railway=tram_stop), [public_transport=stop_position](https://wiki.openstreetmap.org/wiki/Tag:public_transport=stop_position) |
|
||||
[Q6852233](https://www.wikidata.org/entity/Q6852233) | Military building | |
|
||||
[Q44494](https://www.wikidata.org/entity/Q44494) | Mill (grinding) | |
|
||||
[Q185187](https://www.wikidata.org/entity/Q185187) | Watermill | [man_made=watermill](https://wiki.openstreetmap.org/wiki/Tag:man_made=watermill) |
|
||||
[Q38720](https://www.wikidata.org/entity/Q38720) | Windmill | [man_made=windmill](https://wiki.openstreetmap.org/wiki/Tag:man_made=windmill) |
|
||||
[Q4989906](https://www.wikidata.org/entity/Q4989906) | Monument | [historic=monument](https://wiki.openstreetmap.org/wiki/Tag:historic=monument) |
|
||||
[Q5003624](https://www.wikidata.org/entity/Q5003624) | Memorial | [historic=memorial](https://wiki.openstreetmap.org/wiki/Tag:historic=memorial) |
|
||||
[Q271669](https://www.wikidata.org/entity/Q271669) | Landform | |
|
||||
[Q190429](https://www.wikidata.org/entity/Q190429) | Depression (geology) | |
|
||||
[Q17018380](https://www.wikidata.org/entity/Q17018380) | Bight (geography) | |
|
||||
[Q54050](https://www.wikidata.org/entity/Q54050) | Hill | |
|
||||
[Q1210950](https://www.wikidata.org/entity/Q1210950) | Channel (geography) | |
|
||||
[Q23442](https://www.wikidata.org/entity/Q23442) | Island | [place=island](https://wiki.openstreetmap.org/wiki/Tag:place=island) |
|
||||
[Q42523](https://www.wikidata.org/entity/Q42523) | Atoll | |
|
||||
[Q34763](https://www.wikidata.org/entity/Q34763) | Peninsula | |
|
||||
[Q355304](https://www.wikidata.org/entity/Q355304) | Watercourse | |
|
||||
[Q30198](https://www.wikidata.org/entity/Q30198) | Marsh | [wetland=marsh](https://wiki.openstreetmap.org/wiki/Tag:wetland=marsh) |
|
||||
[Q75520](https://www.wikidata.org/entity/Q75520) | Plateau | |
|
||||
[Q2042028](https://www.wikidata.org/entity/Q2042028) | Ravine | |
|
||||
[Q631305](https://www.wikidata.org/entity/Q631305) | Rock formation | |
|
||||
[Q12516](https://www.wikidata.org/entity/Q12516) | Pyramid | |
|
||||
[Q1076486](https://www.wikidata.org/entity/Q1076486) | Sports venue | |
|
||||
[Q682943](https://www.wikidata.org/entity/Q682943) | Cricket field | [sport=cricket](https://wiki.openstreetmap.org/wiki/Tag:sport=cricket) |
|
||||
[Q1048525](https://www.wikidata.org/entity/Q1048525) | Golf course | [leisure=golf_course](https://wiki.openstreetmap.org/wiki/Tag:leisure=golf_course) |
|
||||
[Q1777138](https://www.wikidata.org/entity/Q1777138) | Race track | [highway=raceway](https://wiki.openstreetmap.org/wiki/Tag:highway=raceway) |
|
||||
[Q130003](https://www.wikidata.org/entity/Q130003) | Ski resort | |
|
||||
[Q174782](https://www.wikidata.org/entity/Q174782) | Town square | [place=square](https://wiki.openstreetmap.org/wiki/Tag:place=square) |
|
||||
[Q12518](https://www.wikidata.org/entity/Q12518) | Tower | [building=tower](https://wiki.openstreetmap.org/wiki/Tag:building=tower), [man_made=tower](https://wiki.openstreetmap.org/wiki/Tag:man_made=tower) |
|
||||
[Q39715](https://www.wikidata.org/entity/Q39715) | Lighthouse | [man_made=lighthouse](https://wiki.openstreetmap.org/wiki/Tag:man_made=lighthouse) |
|
||||
[Q274153](https://www.wikidata.org/entity/Q274153) | Water tower | [building=water_tower](https://wiki.openstreetmap.org/wiki/Tag:building=water_tower), [man_made=water_tower](https://wiki.openstreetmap.org/wiki/Tag:man_made=water_tower) |
|
||||
[Q43501](https://www.wikidata.org/entity/Q43501) | Zoo | [tourism=zoo](https://wiki.openstreetmap.org/wiki/Tag:tourism=zoo) |
|
||||
[Q39614](https://www.wikidata.org/entity/Q39614) | Cemetery | [amenity=grave_yard](https://wiki.openstreetmap.org/wiki/Tag:amenity=grave_yard), [landuse=cemetery](https://wiki.openstreetmap.org/wiki/Tag:landuse=cemetery) |
|
||||
[Q152081](https://www.wikidata.org/entity/Q152081) | Concentration camp | |
|
||||
[Q1107656](https://www.wikidata.org/entity/Q1107656) | Garden | [leisure=garden](https://wiki.openstreetmap.org/wiki/Tag:leisure=garden) |
|
||||
[Q820477](https://www.wikidata.org/entity/Q820477) | Mine | |
|
||||
[Q33837](https://www.wikidata.org/entity/Q33837) | Archipelago | [place=archipelago](https://wiki.openstreetmap.org/wiki/Tag:place=archipelago) |
|
||||
[Q40080](https://www.wikidata.org/entity/Q40080) | Beach | [natural=beach](https://wiki.openstreetmap.org/wiki/Tag:natural=beach) |
|
||||
[Q15324](https://www.wikidata.org/entity/Q15324) | Body of water | [natural=water](https://wiki.openstreetmap.org/wiki/Tag:natural=water) |
|
||||
[Q23397](https://www.wikidata.org/entity/Q23397) | Lake | [water=lake](https://wiki.openstreetmap.org/wiki/Tag:water=lake) |
|
||||
[Q9430](https://www.wikidata.org/entity/Q9430) | Ocean | |
|
||||
[Q165](https://www.wikidata.org/entity/Q165) | Sea | |
|
||||
[Q47521](https://www.wikidata.org/entity/Q47521) | Stream | |
|
||||
[Q12284](https://www.wikidata.org/entity/Q12284) | Canal | [waterway=canal](https://wiki.openstreetmap.org/wiki/Tag:waterway=canal) |
|
||||
[Q4022](https://www.wikidata.org/entity/Q4022) | River | [waterway=river](https://wiki.openstreetmap.org/wiki/Tag:waterway=river), [type=waterway](https://wiki.openstreetmap.org/wiki/Relation:waterway) |
|
||||
[Q185113](https://www.wikidata.org/entity/Q185113) | Cape | [natural=cape](https://wiki.openstreetmap.org/wiki/Tag:natural=cape) |
|
||||
[Q35509](https://www.wikidata.org/entity/Q35509) | Cave | [natural=cave_entrance](https://wiki.openstreetmap.org/wiki/Tag:natural=cave_entrance) |
|
||||
[Q8514](https://www.wikidata.org/entity/Q8514) | Desert | |
|
||||
[Q4421](https://www.wikidata.org/entity/Q4421) | Forest | [natural=wood](https://wiki.openstreetmap.org/wiki/Tag:natural=wood) |
|
||||
[Q35666](https://www.wikidata.org/entity/Q35666) | Glacier | [natural=glacier](https://wiki.openstreetmap.org/wiki/Tag:natural=glacier) |
|
||||
[Q177380](https://www.wikidata.org/entity/Q177380) | Hot spring | |
|
||||
[Q8502](https://www.wikidata.org/entity/Q8502) | Mountain | [natural=peak](https://wiki.openstreetmap.org/wiki/Tag:natural=peak) |
|
||||
[Q133056](https://www.wikidata.org/entity/Q133056) | Mountain pass | |
|
||||
[Q46831](https://www.wikidata.org/entity/Q46831) | Mountain range | |
|
||||
[Q39816](https://www.wikidata.org/entity/Q39816) | Valley | [natural=valley](https://wiki.openstreetmap.org/wiki/Tag:natural=valley) |
|
||||
[Q8072](https://www.wikidata.org/entity/Q8072) | Volcano | [natural=volcano](https://wiki.openstreetmap.org/wiki/Tag:natural=volcano) |
|
||||
[Q43229](https://www.wikidata.org/entity/Q43229) | Organization | |
|
||||
[Q327333](https://www.wikidata.org/entity/Q327333) | Government agency | [office=government](https://wiki.openstreetmap.org/wiki/Tag:office=government)|
|
||||
[Q22698](https://www.wikidata.org/entity/Q22698) | Park | [leisure=park](https://wiki.openstreetmap.org/wiki/Tag:leisure=park) |
|
||||
[Q159313](https://www.wikidata.org/entity/Q159313) | Urban agglomeration | |
|
||||
[Q177634](https://www.wikidata.org/entity/Q177634) | Community | |
|
||||
[Q5107](https://www.wikidata.org/entity/Q5107) | Continent | [place=continent](https://wiki.openstreetmap.org/wiki/Tag:place=continent) |
|
||||
[Q6256](https://www.wikidata.org/entity/Q6256) | Country | [place=country](https://wiki.openstreetmap.org/wiki/Tag:place=country) |
|
||||
[Q75848](https://www.wikidata.org/entity/Q75848) | Gated community | |
|
||||
[Q3153117](https://www.wikidata.org/entity/Q3153117) | Intercommunality | |
|
||||
[Q82794](https://www.wikidata.org/entity/Q82794) | Region | |
|
||||
[Q56061](https://www.wikidata.org/entity/Q56061) | Administrative division | [boundary=administrative](https://wiki.openstreetmap.org/wiki/Tag:boundary=administrative) |
|
||||
[Q665487](https://www.wikidata.org/entity/Q665487) | Diocese | |
|
||||
[Q4976993](https://www.wikidata.org/entity/Q4976993) | Parish | [boundary=civil_parish](https://wiki.openstreetmap.org/wiki/Tag:boundary=civil_parish) |
|
||||
[Q194203](https://www.wikidata.org/entity/Q194203) | Arrondissements of France | |
|
||||
[Q91028](https://www.wikidata.org/entity/Q91028) | Arrondissements of Belgium | |
|
||||
[Q3623867](https://www.wikidata.org/entity/Q3623867) | Arrondissements of Benin | |
|
||||
[Q2311958](https://www.wikidata.org/entity/Q2311958) | Canton (country subdivision) | [political_division=canton](https://wiki.openstreetmap.org/wiki/FR:Cantons_in_France) |
|
||||
[Q643589](https://www.wikidata.org/entity/Q643589) | Department | |
|
||||
[Q202216](https://www.wikidata.org/entity/Q202216) | Overseas department and region | |
|
||||
[Q149621](https://www.wikidata.org/entity/Q149621) | District | [place=district](https://wiki.openstreetmap.org/wiki/Tag:place=district) |
|
||||
[Q15243209](https://www.wikidata.org/wiki/Q15243209) | Historic district | |
|
||||
[Q5144960](https://www.wikidata.org/entity/Q5144960) | Microregion | |
|
||||
[Q15284](https://www.wikidata.org/entity/Q15284) | Municipality | |
|
||||
[Q515716](https://www.wikidata.org/entity/Q515716) | Prefecture | |
|
||||
[Q34876](https://www.wikidata.org/entity/Q34876) | Province | |
|
||||
[Q3191695](https://www.wikidata.org/entity/Q3191695) | Regency (Indonesia) | |
|
||||
[Q1970725](https://www.wikidata.org/entity/Q1970725) | Natural region | |
|
||||
[Q486972](https://www.wikidata.org/entity/Q486972) | Human settlement | |
|
||||
[Q515](https://www.wikidata.org/entity/Q515) | City | [place=city](https://wiki.openstreetmap.org/wiki/Tag:place=city) |
|
||||
[Q5119](https://www.wikidata.org/entity/Q5119) | Capital city | [capital=yes](https://wiki.openstreetmap.org/wiki/Key:capital) |
|
||||
[Q4286337](https://www.wikidata.org/entity/Q4286337) | City district | |
|
||||
[Q1394476](https://www.wikidata.org/entity/Q1394476) | Civil township | |
|
||||
[Q1115575](https://www.wikidata.org/entity/Q1115575) | Civil parish | [designation=civil_parish](https://wiki.openstreetmap.org/wiki/Tag:designation=civil_parish) |
|
||||
[Q5153984](https://www.wikidata.org/entity/Q5153984) | Commune-level subdivisions | |
|
||||
[Q123705](https://www.wikidata.org/entity/Q123705) | Neighbourhood | [place=neighbourhood](https://wiki.openstreetmap.org/wiki/Tag:place=neighbourhood) |
|
||||
[Q1500350](https://www.wikidata.org/entity/Q1500350) | Townships of China | |
|
||||
[Q17343829](https://www.wikidata.org/entity/Q17343829) | Unincorporated Community | |
|
||||
[Q3957](https://www.wikidata.org/entity/Q3957) | Town | [place=town](https://wiki.openstreetmap.org/wiki/Tag:place=town) |
|
||||
[Q532](https://www.wikidata.org/entity/Q532) | Village | [place=village](https://wiki.openstreetmap.org/wiki/Tag:place=village) |
|
||||
[Q5084](https://www.wikidata.org/entity/Q5084) | Hamlet | [place=hamlet](https://wiki.openstreetmap.org/wiki/Tag:place=hamlet) |
|
||||
[Q7275](https://www.wikidata.org/entity/Q7275) | State | |
|
||||
[Q79007](https://www.wikidata.org/entity/Q79007) | Street | |
|
||||
[Q473972](https://www.wikidata.org/entity/Q473972) | Protected area | [boundary=protected_area](https://wiki.openstreetmap.org/wiki/Tag:boundary=protected_area) |
|
||||
[Q1377575](https://www.wikidata.org/entity/Q1377575) | Wildlife refuge | |
|
||||
[Q1410668](https://www.wikidata.org/entity/Q1410668) | National Wildlife Refuge | [protection_title=National Wildlife Refuge](ownership=national), [ownership=national](https://wiki.openstreetmap.org/wiki/Tag:ownership=national)|
|
||||
[Q9259](https://www.wikidata.org/entity/Q9259) | World Heritage Site | |
|
||||
|
||||
---
|
||||
|
||||
### Future Work
|
||||
|
||||
The Wikidata improvements to Nominatim can be further enhanced by:
|
||||
|
||||
- continuing to add new Wikidata links to OSM objects
|
||||
- increasing the number of place types accounted for in the wikipedia_articles table
|
||||
- working to use place types in the wikipedia_article matching process
|
||||
Reference in New Issue
Block a user