Irchel Geoparser
The Irchel Geoparser is a Python library for identifying place names in unstructured text and linking them to geographic locations. It provides a modular platform for geoparsing that supports custom processing strategies, persistent project-based workflows, and configuration-driven gazetteer integration.
Overview
Geoparsing extracts place names from text and links them to geographic locations. The Irchel Geoparser approaches this task through a two-stage pipeline that separates toponym recognition (identifying place names) from toponym resolution (linking them to specific locations). This separation is a deliberate design choice that enables flexible experimentation with different processing strategies and systematic comparison of their performance. Identified toponyms are linked to gazetteer databases that provide rich geographic metadata including coordinates, administrative hierarchies, feature types, and population information.
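The separation between the two stages can be illustrated with a minimal, self-contained sketch. It does not use the Irchel Geoparser API: `Toponym`, `Location`, `TOY_GAZETTEER`, `recognize`, `resolve`, and `geoparse` are hypothetical names invented for this example, the gazetteer is a three-entry dictionary with rounded, illustrative values, and both stages use deliberately naive strategies (exact name matching and a largest-population heuristic).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Toponym:
    """A place name mention and its character span in the text."""
    text: str
    start: int
    end: int


@dataclass
class Location:
    """A gazetteer entry (hypothetical schema, rounded illustrative values)."""
    name: str
    latitude: float
    longitude: float
    population: int


# Toy gazetteer: surface form -> candidate locations.
TOY_GAZETTEER = {
    "Zurich": [Location("Zurich, Switzerland", 47.37, 8.54, 400_000)],
    "Paris": [
        Location("Paris, Texas, USA", 33.66, -95.56, 25_000),
        Location("Paris, France", 48.86, 2.35, 2_100_000),
    ],
}


def recognize(text: str) -> list[Toponym]:
    """Stage 1 (recognition): find mentions by exact match on known names."""
    names = "|".join(re.escape(name) for name in TOY_GAZETTEER)
    pattern = re.compile(rf"\b({names})\b")
    return [Toponym(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]


def resolve(toponym: Toponym) -> Optional[Location]:
    """Stage 2 (resolution): pick the most populous candidate, if any."""
    candidates = TOY_GAZETTEER.get(toponym.text, [])
    return max(candidates, key=lambda loc: loc.population, default=None)


def geoparse(text: str) -> list[tuple[Toponym, Optional[Location]]]:
    """Run recognition first, then resolve each recognized toponym."""
    return [(toponym, resolve(toponym)) for toponym in recognize(text)]


if __name__ == "__main__":
    for toponym, location in geoparse("She flew from Zurich to Paris."):
        print(toponym.text, "->", location.name if location else None)
```

Because `resolve` only receives the output of `recognize`, either stage could be replaced by a different strategy without touching the other, which is the property the two-stage design is meant to provide.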
Key Features
Project-Based Workflows: Documents and processing results are stored in a persistent database, enabling long-term research and comparative analysis
Modular Architecture: Pluggable recognizer and resolver modules can be mixed, matched, and extended by implementing well-defined interfaces
Trainable Modules: Recognizers and resolvers can be fine-tuned on annotated data to improve performance for specific domains or languages
Custom Gazetteers: Arbitrary geographic databases can be integrated through YAML configuration files that describe data sources and transformations
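As a rough illustration of what pluggable modules behind well-defined interfaces can look like, the sketch below defines two interfaces with `typing.Protocol` and swaps resolvers while keeping the recognizer fixed. All names here (`Recognizer`, `Resolver`, `KnownNameRecognizer`, `FirstCandidateResolver`, `LargestPopulationResolver`, `run_pipeline`, `GAZETTEER`) are assumptions made for this example and are not the library's actual classes or method signatures; the population values are rounded and illustrative.

```python
from typing import Optional, Protocol

# A candidate is a plain dict in this sketch; keys used: "id", "population".
Candidate = dict


class Recognizer(Protocol):
    """Hypothetical interface: text in, place name strings out."""
    def recognize(self, text: str) -> list[str]: ...


class Resolver(Protocol):
    """Hypothetical interface: choose one candidate for a place name."""
    def resolve(self, name: str, candidates: list[Candidate]) -> Optional[Candidate]: ...


class KnownNameRecognizer:
    """Recognizes tokens that exactly match a known set of names."""
    def __init__(self, names):
        self.names = set(names)

    def recognize(self, text: str) -> list[str]:
        tokens = (token.strip(".,;:!?") for token in text.split())
        return [token for token in tokens if token in self.names]


class FirstCandidateResolver:
    """Baseline: take whichever candidate the gazetteer lists first."""
    def resolve(self, name, candidates):
        return candidates[0] if candidates else None


class LargestPopulationResolver:
    """Heuristic: prefer the candidate with the largest population."""
    def resolve(self, name, candidates):
        return max(candidates, key=lambda c: c["population"], default=None)


def run_pipeline(recognizer: Recognizer, resolver: Resolver, gazetteer: dict, text: str) -> dict:
    """Combine any recognizer with any resolver that satisfies the interfaces."""
    return {
        name: resolver.resolve(name, gazetteer.get(name, []))
        for name in recognizer.recognize(text)
    }


GAZETTEER = {
    "Paris": [
        {"id": "paris-texas", "population": 25_000},
        {"id": "paris-france", "population": 2_100_000},
    ]
}

if __name__ == "__main__":
    recognizer = KnownNameRecognizer(GAZETTEER)
    text = "A night train to Paris."
    for resolver in (FirstCandidateResolver(), LargestPopulationResolver()):
        result = run_pipeline(recognizer, resolver, GAZETTEER, text)
        print(type(resolver).__name__, "->", result["Paris"]["id"])
```

Running the same text through both resolvers makes their disagreement visible, which is the kind of side-by-side comparison a modular design is intended to support.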
Getting Started
To begin using the Irchel Geoparser, follow the Installation guide to set up the library and install a gazetteer. Then proceed to the Quickstart guide for a simple example of parsing text and accessing results. For more advanced usage, explore the user guides that cover Projects, Modules, Training, and Gazetteers.
Demo
Discover what is possible with the Irchel Geoparser. Our Demo page showcases an interactive visualization of place names mentioned in Jules Verne’s “Around the World in Eighty Days”. The demo includes a complete Jupyter notebook and Docker setup so you can reproduce the analysis yourself.
Contributing
The Irchel Geoparser is an open-source project, and contributions are welcome. If you encounter issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.
Acknowledgments
The Irchel Geoparser originated as part of my Master’s thesis and was further developed with support from the Department of Geography at the University of Zurich and the Public Data Lab of the Digitalization Initiative of the Zurich Higher Education Institutions. I thank Prof. Dr. Ross Purves for the opportunity to continue this work as part of a research project.
License
The Irchel Geoparser is released under the MIT License. It also depends on several third-party libraries, each with its own license. For the complete list, see the license details in the repository.