Quickstart
This guide provides a quick introduction to using the Irchel Geoparser for basic geoparsing tasks. After following the Installation guide, you can start parsing text with just a few lines of code.
Basic Usage
The simplest way to use the library is through the Geoparser class, which provides a stateless interface for quick geoparsing tasks. The default settings are optimized for English texts, prioritizing speed over accuracy. See the Customizing the Geoparser section below for other options.
Here’s a minimal working example:
```python
from geoparser import Geoparser

# Initialize the geoparser with default settings
geoparser = Geoparser()

# Parse a text
text = "The Eiffel Tower in Paris attracts millions of visitors each year."
documents = geoparser.parse(text)

# Access the results
for doc in documents:
    print(f"Document: {doc.text}\n")
    for toponym in doc.toponyms:
        print(f"  Toponym: {toponym.text}")
        if toponym.location:
            location = toponym.location
            print(f"    Name: {location.data.get('name')}")
            print(f"    Country: {location.data.get('country_name')}")
            print(f"    Coordinates: ({location.data.get('latitude')}, {location.data.get('longitude')})")
        else:
            print("    Location: Could not be resolved")
        print()
```
This code identifies place names in the text and links them to geographic locations in the GeoNames gazetteer. The output might look like:
```text
Document: The Eiffel Tower in Paris attracts millions of visitors each year.

  Toponym: Eiffel Tower
    Name: Eiffel Tower
    Country: France
    Coordinates: (48.85837, 2.29448)

  Toponym: Paris
    Name: Paris
    Country: France
    Coordinates: (48.85341, 2.3488)
```
Processing Multiple Documents
The parse() method accepts either a single text string or a list of texts. Passing a list processes all texts in one call, which allows them to be handled together in batches:
```python
from geoparser import Geoparser

geoparser = Geoparser()

texts = [
    "London is the capital of the United Kingdom.",
    "Tokyo is Japan's largest city.",
    "The Statue of Liberty stands in New York Harbor.",
]
documents = geoparser.parse(texts)

for i, doc in enumerate(documents, 1):
    print(f"Document {i}:")
    for toponym in doc.toponyms:
        if toponym.location:
            print(f"  - {toponym.text} → {toponym.location.data.get('name')}")
    print()
```
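Because a batch returns an ordinary list of documents, aggregating across it is plain Python. The following minimal sketch counts how often each resolved location name appears. It relies only on the attributes used in the examples in this guide (doc.toponyms, toponym.location, location.data). The helper count_resolved_names and the SimpleNamespace stand-ins, which let it run without loading any models, are hypothetical and not part of the library.

```python
from collections import Counter
from types import SimpleNamespace


def count_resolved_names(documents):
    """Count resolved location names across a batch of parsed documents."""
    counts = Counter()
    for doc in documents:
        for toponym in doc.toponyms:
            # Unresolved toponyms have no location and are skipped.
            if toponym.location is not None:
                counts[toponym.location.data.get("name")] += 1
    return counts


# Hypothetical stand-ins mimicking the documented result structure;
# with the real library you would pass the output of geoparser.parse(texts).
def make_toponym(text, name=None):
    location = SimpleNamespace(data={"name": name}) if name else None
    return SimpleNamespace(text=text, location=location)


documents = [
    SimpleNamespace(toponyms=[make_toponym("London", "London"),
                              make_toponym("United Kingdom", "United Kingdom")]),
    SimpleNamespace(toponyms=[make_toponym("Tokyo", "Tokyo"), make_toponym("Japan", "Japan")]),
    SimpleNamespace(toponyms=[make_toponym("London", "London"), make_toponym("Atlantis")]),
]

print(count_resolved_names(documents).most_common(1))  # → [('London', 2)]
```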
Understanding the Results
The parse() method returns a list of Document objects, each representing one of the input texts. Each document has a toponyms property that provides access to the identified place names (references) within that document.
Each toponym (Reference object) has several important properties:
text: The actual text of the place name as it appears in the document
start: The starting character position in the document text
end: The ending character position in the document text
location: The resolved geographic entity (a Feature object), or None if the toponym is unresolved
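The start and end offsets let you view each toponym in its surrounding text, which can be useful when reviewing results by hand. This guide does not state whether end is inclusive. The following sketch assumes Python slice conventions (end exclusive, so doc.text[start:end] equals toponym.text), and it raises ValueError if that assumption does not hold. The helper context_window and the SimpleNamespace stand-in are hypothetical.

```python
from types import SimpleNamespace


def context_window(doc_text, toponym, width=20):
    """Return the toponym marked with [...] plus up to `width` characters on each side.

    Assumes end-exclusive offsets, so doc_text[start:end] == toponym.text.
    """
    if doc_text[toponym.start:toponym.end] != toponym.text:
        raise ValueError("offsets do not match the toponym text; check the offset convention")
    left = doc_text[max(0, toponym.start - width):toponym.start]
    right = doc_text[toponym.end:toponym.end + width]
    return f"...{left}[{toponym.text}]{right}..."


# Hypothetical stand-in for a Reference object; with the library you would
# use doc.text and a toponym from doc.toponyms instead.
text = "The Eiffel Tower in Paris attracts millions of visitors each year."
start = text.index("Paris")
paris = SimpleNamespace(text="Paris", start=start, end=start + len("Paris"))

print(context_window(text, paris, width=10))  # → ... Tower in [Paris] attracts ...
```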
When a toponym is successfully resolved, its location property contains a Feature object with geographic information. The feature has two main properties:
data: A dictionary containing attributes from the gazetteer. For GeoNames, common attributes include name, country_name, latitude, longitude, population, feature_name (the type of place), and various administrative divisions.
geometry: A Shapely geometry object representing the feature's spatial extent. For most gazetteers this is a Point, but it can also be a polygon or another geometry type.
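A common next step is exporting resolved toponyms to a standard format for mapping tools. The following minimal sketch builds a GeoJSON Feature from the latitude and longitude entries in location.data. It assumes the GeoNames keys listed above, and other gazetteers may use different keys. The helper toponym_to_geojson and the SimpleNamespace stand-in are hypothetical.

```python
import json
from types import SimpleNamespace


def toponym_to_geojson(toponym):
    """Build a GeoJSON Feature dict for a resolved toponym, or None if unresolved.

    Assumes the gazetteer provides 'latitude' and 'longitude' keys in
    location.data, as GeoNames does; other gazetteers may differ.
    """
    location = toponym.location
    if location is None:
        return None
    data = location.data
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON (RFC 7946) orders positions as [longitude, latitude].
            "coordinates": [float(data["longitude"]), float(data["latitude"])],
        },
        "properties": {
            "toponym": toponym.text,
            "name": data.get("name"),
            "country_name": data.get("country_name"),
        },
    }


# Hypothetical stand-in using the Paris coordinates from the example output above.
paris = SimpleNamespace(
    text="Paris",
    location=SimpleNamespace(data={
        "name": "Paris", "country_name": "France",
        "latitude": 48.85341, "longitude": 2.3488,
    }),
)
collection = {"type": "FeatureCollection", "features": [toponym_to_geojson(paris)]}
print(json.dumps(collection, indent=2))
```

If Shapely is installed, shapely.geometry.mapping(location.geometry) returns a GeoJSON-like dict for any geometry type, including polygons. Whether its x/y order corresponds to longitude/latitude depends on how the gazetteer stores coordinates, which this sketch does not assume.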
Working with Unresolved Toponyms
Not all identified place names can be successfully linked to geographic locations. Always check if the location is None before accessing its attributes:
```python
from geoparser import Geoparser

geoparser = Geoparser()
documents = geoparser.parse("They traveled from Atlantis to Wonderland.")

for doc in documents:
    for toponym in doc.toponyms:
        print(f"Toponym: {toponym.text}")
        if toponym.location:
            print(f"  Resolved to: {toponym.location.data.get('name')}")
        else:
            print("  Could not be resolved (fictional location)")
```
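For larger batches, it can help to collect unresolved toponyms in one place, for example to review them manually or to decide whether another gazetteer suits the texts better. The following sketch applies the same None check as above across several documents. The helper split_by_resolution and the SimpleNamespace stand-ins are hypothetical.

```python
from types import SimpleNamespace


def split_by_resolution(documents):
    """Return (resolved, unresolved) lists of toponym strings across documents."""
    resolved, unresolved = [], []
    for doc in documents:
        for toponym in doc.toponyms:
            if toponym.location is None:
                unresolved.append(toponym.text)
            else:
                resolved.append(toponym.text)
    return resolved, unresolved


# Hypothetical stand-ins: one resolved and one unresolved toponym.
doc = SimpleNamespace(toponyms=[
    SimpleNamespace(text="Paris", location=SimpleNamespace(data={"name": "Paris"})),
    SimpleNamespace(text="Wonderland", location=None),
])

resolved, unresolved = split_by_resolution([doc])
total = len(resolved) + len(unresolved)
print(f"Resolved {len(resolved)} of {total}; unresolved: {unresolved}")
# → Resolved 1 of 2; unresolved: ['Wonderland']
```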
Customizing the Geoparser
The default Geoparser() uses a spaCy model for recognition and a SentenceTransformer model for resolution. You can customize these components by providing your own module instances:
```python
from geoparser import Geoparser
from geoparser.modules import SpacyRecognizer, SentenceTransformerResolver

# Use a more accurate (but slower) transformer-based spaCy model
recognizer = SpacyRecognizer(model_name="en_core_web_trf")

# Resolve against a different gazetteer (SwissNames3D instead of GeoNames)
resolver = SentenceTransformerResolver(gazetteer_name="swissnames3d")

geoparser = Geoparser(recognizer=recognizer, resolver=resolver)
documents = geoparser.parse("Zurich is the largest city in Switzerland.")
```
For more details on working with different modules, see the Modules guide.
Persisting Results
By default, the parse() method creates a temporary project internally and deletes it after returning the results. If you want to keep the results for later analysis, use the save=True parameter:
```python
from geoparser import Geoparser

geoparser = Geoparser()
documents = geoparser.parse("Berlin is the capital of Germany.", save=True)
# Prints, e.g.: Results saved under project name: a1b2c3d4
```
When save=True, the method prints the project name that was created. You can later access these results using the Project class, as described in the Projects guide.
Next Steps
This quickstart covered the basics of using the Irchel Geoparser for simple tasks. To learn more about advanced features, explore these guides:
Projects - Persistent workspaces for research and analysis
Modules - Using and creating custom recognizers and resolvers
Training - Fine-tuning models on your own data
Gazetteers - Working with different geographic databases
For complete API documentation, see the Geoparser reference.