Projects
This guide explains how to use projects for organizing documents and processing results, comparing different processing strategies, and managing annotations for training.
Overview
The simple Geoparser.parse() interface, which is essentially a convenience wrapper around the Project class, works well for quick tasks, but projects provide a more powerful approach for research and production workflows. The project-based architecture uses persistent storage instead of an in-memory, session-oriented approach, which enables long-term research workflows and systematic comparison of different processing configurations.
Projects record what has been processed, by which modules, and with which configuration. This lets you query processing outcomes and compare different approaches applied to the same data. All results are stored with tags that identify which recognizer and resolver combination produced them, so you can experiment with different processing strategies while keeping all results organized.
Creating and Loading Projects
Projects are identified by name. When you create a Project instance, it either creates a new project in the database or loads an existing one with that name:
from geoparser import Project
# Create a new project or load an existing one
project = Project("research_corpus")
Once created, a project persists in the database until explicitly deleted. You can close your Python session, come back later, and reload the same project by using the same name.
Adding Documents
After creating a project, you can add documents to it using the create_documents() method. This method accepts either a single text string or a list of text strings:
from geoparser import Project
project = Project("news_analysis")
# Add a single document
project.create_documents("The summit took place in Geneva.")
# Add multiple documents
texts = [
    "London hosted the Olympic Games in 2012.",
    "The conference was held in Barcelona.",
    "Researchers gathered in Vienna to discuss the findings."
]
project.create_documents(texts)
Documents added to a project are stored in the database with unique identifiers. You can add more documents to the same project at any time, and they will accumulate in the project’s collection.
Running Processing Modules
Once you have documents in a project, you can run recognition and resolution modules to identify and resolve place names. The project manages the execution and stores the results:
from geoparser import Project
from geoparser.modules import SpacyRecognizer, SentenceTransformerResolver
project = Project("news_analysis")
# Create module instances
recognizer = SpacyRecognizer()
resolver = SentenceTransformerResolver()
# Run the recognizer to identify place names
project.run_recognizer(recognizer)
# Run the resolver to link place names to locations
project.run_resolver(resolver)
The run_recognizer() method processes all documents in the project that haven’t been processed by this specific recognizer yet. Similarly, run_resolver() processes all references that haven’t been resolved by this specific resolver. This means you can safely call these methods multiple times; only unprocessed items will be handled.
Retrieving Results
After running modules on your project, you can retrieve the processed documents using the get_documents() method:
from geoparser import Project
project = Project("news_analysis")
# Get all documents with their results
documents = project.get_documents()
# Access the results
for doc in documents:
    print(f"Document: {doc.text}")
    for toponym in doc.toponyms:
        print(f" - {toponym.text}", end="")
        if toponym.location:
            print(f" → {toponym.location.data.get('name')}")
        else:
            print(" (unresolved)")
    print()
The toponyms property on each document returns only the references that were identified by the recognizer associated with the current tag. Similarly, each reference’s location property returns the feature resolved by the resolver associated with that tag. When you don’t specify a tag, the system uses the "latest" tag, which points to the recognizer and resolver most recently run under "latest". Running modules with a different tag leaves "latest" unchanged. This tag-based filtering is illustrated in the next section.
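The tag behavior described above can be pictured with a small toy model. This is only an illustrative sketch: the TagRegistry class and its methods are hypothetical and are not part of geoparser, and the library's real bookkeeping may work differently. It shows only the rule stated above, namely that each tag remembers the modules last run under it and that other tags leave "latest" untouched.

```python
class TagRegistry:
    """Toy model (hypothetical, not geoparser code): each tag remembers
    the recognizer and resolver most recently run under that tag."""

    def __init__(self):
        self._tags = {}

    def record(self, kind, module_name, tag="latest"):
        # Overwrite only the entry for this tag; other tags are untouched.
        self._tags.setdefault(tag, {})[kind] = module_name

    def lookup(self, tag="latest"):
        # Return a copy so callers cannot mutate the registry by accident.
        return dict(self._tags.get(tag, {}))


registry = TagRegistry()
registry.record("recognizer", "spacy")                    # stored under "latest"
registry.record("resolver", "sentence-transformer")       # stored under "latest"
registry.record("resolver", "experimental", tag="trial")  # separate tag

print(registry.lookup())         # modules behind "latest"
print(registry.lookup("trial"))  # modules behind "trial"
```

In this model, the run under "trial" does not change what "latest" returns, which mirrors the behavior described for get_documents() when it is called without a tag.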
Comparative Workflows
Projects are particularly useful for comparing different processing configurations. You can systematically evaluate how different recognizers, resolvers, or parameters affect the geoparsing results:
from geoparser import Project
from geoparser.modules import SpacyRecognizer, SentenceTransformerResolver
project = Project("parameter_study")
project.create_documents([
    "The delegation traveled from Brussels to Amsterdam.",
    "Trade routes connected Venice, Constantinople, and Alexandria."
])
# Test different similarity thresholds
recognizer = SpacyRecognizer()
for threshold in [0.5, 0.6, 0.7, 0.8]:
    resolver = SentenceTransformerResolver(min_similarity=threshold)
    tag = f"threshold_{threshold}"
    project.run_recognizer(recognizer, tag=tag)
    project.run_resolver(resolver, tag=tag)
    docs = project.get_documents(tag=tag)
    resolved_count = sum(
        1 for doc in docs
        for toponym in doc.toponyms
        if toponym.location is not None
    )
    print(f"Threshold {threshold}: {resolved_count} resolved toponyms")
This approach enables reproducible experiments where you can precisely control and document which processing configuration produced which results.
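When comparing several tags, a raw count is easier to interpret next to the total number of toponyms. The helper below is a minimal sketch of such a summary. The name resolution_summary is hypothetical, and the function relies only on the toponyms and location attributes shown earlier in this guide. The stand-in objects built with SimpleNamespace exist solely so the sketch runs without a database.

```python
from types import SimpleNamespace


def resolution_summary(docs):
    """Count toponyms and how many of them have a resolved location."""
    total = sum(len(doc.toponyms) for doc in docs)
    resolved = sum(
        1 for doc in docs
        for toponym in doc.toponyms
        if toponym.location is not None
    )
    # Avoid dividing by zero when no toponyms were recognized.
    rate = resolved / total if total else 0.0
    return {"toponyms": total, "resolved": resolved, "rate": rate}


# Stand-in documents (not real geoparser objects): one resolved toponym,
# one unresolved toponym.
docs = [
    SimpleNamespace(toponyms=[
        SimpleNamespace(text="Brussels", location=object()),
        SimpleNamespace(text="Amsterdam", location=None),
    ])
]
print(resolution_summary(docs))
```

With a real project, the same function could be applied to the result of project.get_documents(tag=tag) for each tag in the experiment.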
Working with Annotations
Projects support managing manually annotated data, which is essential for training custom models and evaluating system performance. You can create annotations programmatically using the create_references() and create_referents() methods:
from geoparser import Project
project = Project("manual_annotations")
texts = ["Paris is the capital of France."]
project.create_documents(texts)
# Create references (identified place names)
references = [[(0, 5), (24, 30)]] # "Paris" and "France"
project.create_references(texts, references, tag="manual")
# Create referents (resolved locations)
referents = [[("geonames", "2988507"), ("geonames", "3017382")]]
project.create_referents(texts, references, referents, tag="manual")
These methods store the annotations in the database using internal recognizer and resolver modules. The annotations can then be used for training or evaluation purposes.
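Counting character offsets by hand is error-prone, so it can help to derive them from the text. The sketch below assumes the convention implied by the "Paris" span (0, 5): zero-based start and exclusive end, as in Python slicing. The helper find_spans is hypothetical and not part of geoparser. It searches left to right, so the names must be listed in their order of appearance.

```python
def find_spans(text, names):
    """Return (start, end) offsets for each name, searching left to right."""
    spans = []
    cursor = 0
    for name in names:
        start = text.find(name, cursor)
        if start == -1:
            raise ValueError(f"{name!r} not found in {text!r}")
        end = start + len(name)  # exclusive end, so text[start:end] == name
        spans.append((start, end))
        cursor = end  # continue after this match to handle repeated names
    return spans


texts = [
    "London hosted the Olympic Games in 2012.",
    "The conference was held in Barcelona.",
]
names = [["London"], ["Barcelona"]]

# One list of spans per document, the same nesting used for `references` above.
references = [find_spans(text, doc_names) for text, doc_names in zip(texts, names)]
print(references)
```

The resulting nested list has one inner list per document and could be passed as the references argument in the example above.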
Alternatively, you can load annotations from JSON files exported from annotation tools:
from geoparser import Project
project = Project("annotated_corpus")
# Load annotations from a JSON file
# Set create_documents=True if documents aren't already in the project
project.load_annotations(
    path="annotations.json",
    tag="annotator",
    create_documents=True
)
# Access the manually annotated toponyms
documents = project.get_documents(tag="annotator")
for doc in documents:
    print(f"Document: {doc.text}")
    print(f" Annotated toponyms: {len(doc.toponyms)}")
The JSON file should follow this structure:
{
    "gazetteer": "geonames",
    "documents": [
        {
            "text": "Paris is the capital of France.",
            "toponyms": [
                {
                    "start": 0,
                    "end": 5,
                    "text": "Paris",
                    "loc_id": "2988507"
                },
                {
                    "start": 24,
                    "end": 30,
                    "text": "France",
                    "loc_id": "3017382"
                }
            ]
        }
    ]
}
The loc_id field should contain the identifier from the specified gazetteer, or an empty string or null for toponyms that were not linked to a location.
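Before loading a file, it can be useful to check that every span actually covers the text it claims to cover. The following is a minimal sketch of such a check, assuming zero-based start offsets and exclusive end offsets as implied by the "Paris" entry. The function check_annotations is hypothetical and not part of geoparser, and it checks only the fields shown in the structure above.

```python
import copy


def check_annotations(data):
    """Return a list of problems found in an annotation dictionary."""
    problems = []
    if not isinstance(data.get("gazetteer"), str):
        problems.append("missing or non-string 'gazetteer'")
    for d_idx, doc in enumerate(data.get("documents", [])):
        text = doc.get("text", "")
        for t_idx, toponym in enumerate(doc.get("toponyms", [])):
            where = f"document {d_idx}, toponym {t_idx}"
            start, end = toponym.get("start"), toponym.get("end")
            if not (isinstance(start, int) and isinstance(end, int)):
                problems.append(f"{where}: start and end must be integers")
                continue
            if not 0 <= start < end <= len(text):
                problems.append(f"{where}: span ({start}, {end}) is out of range")
                continue
            # The slice must reproduce the annotated surface form exactly.
            if text[start:end] != toponym.get("text"):
                problems.append(
                    f"{where}: span covers {text[start:end]!r}, "
                    f"expected {toponym.get('text')!r}"
                )
    return problems


text = "Researchers gathered in Vienna to discuss the findings."
start = text.index("Vienna")
good = {
    "gazetteer": "geonames",
    "documents": [{
        "text": text,
        "toponyms": [{"start": start, "end": start + len("Vienna"),
                      "text": "Vienna", "loc_id": None}],  # unlinked toponym
    }],
}

# Shift the span by one character to simulate an off-by-one annotation error.
bad = copy.deepcopy(good)
bad["documents"][0]["toponyms"][0]["start"] -= 1
bad["documents"][0]["toponyms"][0]["end"] -= 1

print(check_annotations(good))
print(check_annotations(bad))
```

For a file on disk, the dictionary could be obtained with json.load() and passed to the same function before calling load_annotations().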
Deleting Projects
When you’re done with a project and want to free up database space, you can delete it using the delete() method:
from geoparser import Project
project = Project("temporary_analysis")
# ... work with the project ...
# Delete the project and all associated data
project.delete()
This removes the project and all its documents, references, and referents from the database. The deletion is permanent and cannot be undone, so use this method carefully.
Next Steps
Now that you understand project-based workflows, you can explore:
Modules - Learn about the different recognizers and resolvers available
Training - Use project annotations to train custom models
Gazetteers - Understand how geographic databases work
For complete API documentation of the Project class, see the Project reference.