Quickstart

This guide provides a quick introduction to using the Irchel Geoparser for basic geoparsing tasks. After following the Installation guide, you can start parsing text with just a few lines of code.

Basic Usage

The simplest way to use the library is through the Geoparser class, which provides a stateless interface for quick geoparsing tasks. The default settings are optimized for English texts, prioritizing speed over accuracy. See the Customizing the Geoparser section below for other options.

Here’s a minimal working example:

from geoparser import Geoparser

# Initialize the geoparser with default settings
geoparser = Geoparser()

# Parse a text
text = "The Eiffel Tower in Paris attracts millions of visitors each year."
documents = geoparser.parse(text)

# Access the results
for doc in documents:
    print(f"Document: {doc.text}\n")
    for toponym in doc.toponyms:
        print(f"  Toponym: {toponym.text}")
        if toponym.location:
            location = toponym.location
            print(f"    Name: {location.data.get('name')}")
            print(f"    Country: {location.data.get('country_name')}")
            print(f"    Coordinates: ({location.data.get('latitude')}, {location.data.get('longitude')})")
        else:
            print("    Location: Could not be resolved")
        print()

This code identifies place names in the text and links them to geographic locations in the GeoNames gazetteer. The output might look like:

Document: The Eiffel Tower in Paris attracts millions of visitors each year.

  Toponym: Eiffel Tower
    Name: Eiffel Tower
    Country: France
    Coordinates: (48.85837, 2.29448)

  Toponym: Paris
    Name: Paris
    Country: France
    Coordinates: (48.85341, 2.3488)

Processing Multiple Documents

The parse() method accepts both a single text string and a list of texts. Processing multiple documents together enables efficient batch processing:

from geoparser import Geoparser

geoparser = Geoparser()

texts = [
    "London is the capital of the United Kingdom.",
    "Tokyo is Japan's largest city.",
    "The Statue of Liberty stands in New York Harbor."
]

documents = geoparser.parse(texts)

for i, doc in enumerate(documents, 1):
    print(f"Document {i}:")
    for toponym in doc.toponyms:
        if toponym.location:
            print(f"  - {toponym.text} → {toponym.location.data.get('name')}")
    print()

Understanding the Results

The parse() method returns a list of Document objects, each representing one of the input texts. Each document has a toponyms property that provides access to the identified place names (references) within that document.

Each toponym (Reference object) has several important properties:

text: The actual text of the place name as it appears in the document
start: The starting character position in the document text
end: The ending character position in the document text
location: The resolved geographic entity (a Feature object), or None if the toponym is unresolved

When a toponym is successfully resolved, its location property contains a Feature object with geographic information. The feature has two main properties:

data: A dictionary containing attributes from the gazetteer. For GeoNames, common attributes include name, country_name, latitude, longitude, population, feature_name (the type of place), and various administrative divisions.
geometry: A Shapely geometry object representing the feature’s spatial extent (typically a Point for most gazetteers, but can be polygons or other geometry types).

Working with Unresolved Toponyms

Not all identified place names can be successfully linked to geographic locations. Always check if the location is None before accessing its attributes:

from geoparser import Geoparser

geoparser = Geoparser()
documents = geoparser.parse("They traveled from Atlantis to Wonderland.")

for doc in documents:
    for toponym in doc.toponyms:
        print(f"Toponym: {toponym.text}")
        if toponym.location:
            print(f"  Resolved to: {toponym.location.data.get('name')}")
        else:
            print("  Could not be resolved (fictional location)")

Customizing the Geoparser

The default Geoparser() uses a spaCy model for recognition and a SentenceTransformer model for resolution. You can customize these components by providing your own module instances:

from geoparser import Geoparser
from geoparser.modules import SpacyRecognizer, SentenceTransformerResolver

# Use a more accurate spaCy model
recognizer = SpacyRecognizer(model_name="en_core_web_trf")

# Use a different gazetteer
resolver = SentenceTransformerResolver(gazetteer_name="swissnames3d")

geoparser = Geoparser(recognizer=recognizer, resolver=resolver)

documents = geoparser.parse("Zurich is the largest city in Switzerland.")

For more details on working with different modules, see the Modules guide.

Persisting Results

By default, the parse() method creates a temporary project internally and deletes it after returning the results. If you want to keep the results for later analysis, use the save=True parameter:

from geoparser import Geoparser

geoparser = Geoparser()
documents = geoparser.parse("Berlin is the capital of Germany.", save=True)
# Results saved under project name: a1b2c3d4

When save=True, the method prints the project name that was created. You can later access these results using the Project class, as described in the Projects guide.

Next Steps

This quickstart covered the basics of using the Irchel Geoparser for simple tasks. To learn more about advanced features, explore these guides:

Projects - Persistent workspaces for research and analysis
Modules - Using and creating custom recognizers and resolvers
Training - Fine-tuning models on your own data
Gazetteers - Working with different geographic databases

For complete API documentation, see the Geoparser reference.