World-POI: Global Point-of-Interest Data Enriched from Foursquare and OpenStreetMap as Tabular and Graph Data
2510.21342v1
cs.DB, cs.AI, cs.CG, cs.CY, cs.SI
2025-10-28
Authors:
Hossein Amiri, Mohammad Hashemi, Andreas Züfle
Abstract
Recently, Foursquare released a global dataset with more than 100 million
points of interest (POIs), each representing a real-world business on its
platform. However, many entries lack complete metadata such as addresses or
categories, and some correspond to non-existent or fictional locations. In
contrast, OpenStreetMap (OSM) offers a rich, user-contributed POI dataset with
detailed and frequently updated metadata, though it does not formally verify
whether a POI represents an actual business. In this data paper, we present a
methodology that integrates the strengths of both datasets: Foursquare as a
comprehensive baseline of commercial POIs and OSM as a source of enriched
metadata. The combined dataset totals approximately 1 TB. While this full
version is not publicly released, we provide filtered releases with adjustable
thresholds that reduce storage needs and make the data practical to download
and use across domains. We also provide step-by-step instructions to reproduce
the full 631 GB build. Record linkage is achieved by computing name similarity
scores and spatial distances between Foursquare and OSM POIs. These measures
identify and retain high-confidence matches that correspond to real businesses
in Foursquare, have representations in OSM, and show strong name similarity.
Finally, we use this filtered dataset to construct a graph-based representation
of POIs enriched with attributes from both sources, enabling advanced spatial
analyses and a range of downstream applications.