Project

class Project(name: str)

Bases: object

Handles project management and pipeline execution.

This class manages project lifecycle, document management, and provides methods to run processing pipelines on project documents.

create_documents(texts: str | List[str]) → None

Create documents in the project.

Parameters:

texts – Either a single document text or a list of document texts
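
The `str | List[str]` type means callers may pass either form. As a minimal sketch of how such an argument can be normalized before processing, the hypothetical helper `normalize_texts` below wraps a single string in a list. It is an illustration only, not part of the documented API and not necessarily how the library handles the argument.

```python
from typing import List, Union


def normalize_texts(texts: Union[str, List[str]]) -> List[str]:
    """Return a list of document texts, wrapping a single string in a list."""
    # A bare string is one document, not a sequence of characters.
    if isinstance(texts, str):
        return [texts]
    return list(texts)


single = normalize_texts("Paris is the capital of France.")
several = normalize_texts(["Paris is in France.", "Berlin is in Germany."])
```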

create_references(texts: List[str], references: List[List[tuple]], tag: str) → None

Create references (toponym spans) using ManualRecognizer.

Parameters:

texts – List of document texts
references – For each document, a list of reference tuples (start, end)
tag – Tag to identify this recognition set
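
The nested `references` argument holds one list of (start, end) tuples per document. The sketch below shows one way to build it from known toponym strings. The helper `find_spans` is hypothetical, and the end offset is assumed to be exclusive (Python slice convention); the source does not state which convention the library expects.

```python
from typing import List, Tuple


def find_spans(text: str, toponyms: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of the first match of each toponym."""
    spans = []
    for name in toponyms:
        start = text.find(name)
        if start == -1:
            raise ValueError(f"{name!r} not found in text")
        # Assumption: end is exclusive, so text[start:end] == name.
        spans.append((start, start + len(name)))
    return spans


texts = ["Paris is the capital of France.", "Berlin lies on the Spree."]
# One inner list per document, in the same order as texts.
references = [
    find_spans(texts[0], ["Paris", "France"]),
    find_spans(texts[1], ["Berlin", "Spree"]),
]
```

Lists built this way have the shape of the `texts` and `references` arguments of `create_references`, with a `tag` value chosen by the caller.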

create_referents(texts: List[str], references: List[List[tuple]], referents: List[List[tuple]], tag: str) → None

Create referents (location assignments) using ManualResolver.

Parameters:

texts – List of document texts
references – For each document, a list of reference tuples (start, end)
referents – For each document, a list of referent tuples (gazetteer_name, identifier)
tag – Tag to identify this resolution set
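
The three list arguments are parallel: one entry per document. The sketch below checks that shape before the data is handed over. It assumes that each referent pairs with the reference at the same position, which the source implies but does not state. `check_alignment`, the gazetteer name, and the identifiers are placeholders for illustration.

```python
from typing import List, Tuple

texts = ["Paris is the capital of France."]
references: List[List[Tuple[int, int]]] = [[(0, 5), (24, 30)]]
# Placeholder gazetteer name and identifiers, for illustration only.
referents: List[List[Tuple[str, str]]] = [
    [("example_gazetteer", "id-paris"), ("example_gazetteer", "id-france")]
]


def check_alignment(texts, references, referents) -> None:
    """Raise ValueError unless the three lists are parallel, document by document."""
    if not (len(texts) == len(references) == len(referents)):
        raise ValueError("texts, references and referents need one entry per document")
    for i, (refs, locs) in enumerate(zip(references, referents)):
        # Assumption: the n-th referent belongs to the n-th reference.
        if len(refs) != len(locs):
            raise ValueError(
                f"document {i}: {len(refs)} references but {len(locs)} referents"
            )


check_alignment(texts, references, referents)
```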

delete() → None

Delete this project and all its associated data from the database.

This removes the project and, through cascade delete relationships, all its documents, references, referents, recognitions, and resolutions.

get_documents(tag: str = 'latest') → List[Document]

Retrieve all documents in the project with context set for the specified tag.

Parameters:

tag – Tag identifier to determine which recognizer/resolver context to use (default: “latest”)

Returns:

List of Document objects with context set for filtering.

load_annotations(path: str, tag: str, create_documents: bool = False) → None

Load annotations from an annotator JSON file and register them in the project.

This method imports annotations from the legacy annotator format and registers them using ManualRecognizer for toponym spans and ManualResolver for location assignments. The annotations are stored with the provided tag to distinguish different annotation sources.

Parameters:

path – Path to the JSON file exported from the annotator
tag – Tag to identify this annotation set, which allows multiple annotation sources to be tracked separately
create_documents – Whether to create new documents from the texts in the JSON (default: False). Set to True if the documents don’t exist yet, False to add annotations to existing documents.

run_recognizer(recognizer: Recognizer, tag: str = 'latest') → None

Run a recognizer module on all documents in this project.

This is a convenience method that simplifies the workflow for advanced users by handling service initialization and document retrieval internally.

Parameters:

recognizer – The recognizer module to run on all project documents
tag – Tag to associate with this recognizer run (default: “latest”)

run_resolver(resolver: Resolver, tag: str = 'latest') → None

Run a resolver module on all documents in this project.

This is a convenience method that simplifies the workflow for advanced users by handling service initialization and document retrieval internally.

Parameters:

resolver – The resolver module to run on all project documents
tag – Tag to associate with this resolver run (default: “latest”)

train_recognizer(recognizer: Recognizer, tag: str, **kwargs) → None

Train a recognizer module using documents with reference annotations from this project.

This method retrieves documents that have been processed by a specific recognizer, prepares the training data, and calls the recognizer’s fit method if available.

Parameters:

recognizer – The recognizer module to train
tag – Tag identifying which annotations to use for training
**kwargs – Additional training parameters (e.g., output_path, epochs, batch_size)

Raises:

ValueError – If the recognizer does not implement a fit method
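
The "call fit if available, otherwise raise ValueError" behaviour can be sketched with a plain attribute check. This is an illustration of the pattern under stated assumptions, not the library's implementation: `train_if_supported`, both stub classes, and the `fit(documents, **kwargs)` signature are hypothetical.

```python
from typing import Any, List


def train_if_supported(module: Any, documents: List[Any], **kwargs: Any) -> None:
    """Call module.fit(documents, **kwargs), or raise ValueError if fit is missing."""
    fit = getattr(module, "fit", None)
    if not callable(fit):
        raise ValueError(f"{type(module).__name__} does not implement a fit method")
    # Assumed signature: training documents first, then keyword options.
    fit(documents, **kwargs)


class TrainableStub:
    """Hypothetical stand-in for a recognizer that supports training."""

    def __init__(self) -> None:
        self.calls = []

    def fit(self, documents, **kwargs):
        # Record the call instead of training anything.
        self.calls.append((documents, kwargs))


class FixedStub:
    """Hypothetical stand-in for a recognizer without a fit method."""


stub = TrainableStub()
train_if_supported(stub, ["doc A", "doc B"], epochs=3)
```

Passing a `FixedStub()` instance to `train_if_supported` raises `ValueError`, mirroring the documented error case.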

train_resolver(resolver: Resolver, tag: str, **kwargs) → None

Train a resolver module using documents with referent annotations from this project.

This method retrieves documents that have been processed by a specific recognizer and resolver, prepares the training data, and calls the resolver’s fit method if available.

Parameters:

resolver – The resolver module to train
tag – Tag identifying which annotations to use for training
**kwargs – Additional training parameters (e.g., output_path, epochs, batch_size)

Raises:

ValueError – If the resolver does not implement a fit method