Project

class Project(name: str)

Bases: object

Handles project management and pipeline execution.

This class manages the project lifecycle and the project's documents, and provides methods to run processing pipelines on those documents.

create_documents(texts: str | List[str]) → None

Create documents in the project.

Parameters:

texts – Either a single document text or a list of document texts
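Because create_references and create_referents accept only lists of texts, it can help to keep inputs in list form from the start. The following is a minimal sketch of such a normalization on the caller's side; `as_text_list` is a hypothetical helper, not part of the documented API, and the commented-out call assumes an existing `project` object.

```python
from typing import List, Union


def as_text_list(texts: Union[str, List[str]]) -> List[str]:
    """Return texts as a list, wrapping a single string (hypothetical helper)."""
    if isinstance(texts, str):
        return [texts]
    return list(texts)


single = as_text_list("Paris is the capital of France.")
several = as_text_list(["Paris is the capital of France.", "Bern is on the Aare."])

# With a real project, either form could then be passed on, for example:
# project.create_documents(several)
```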

create_references(texts: List[str], references: List[List[tuple]], tag: str) → None

Create references (toponym spans) using ManualRecognizer.

Parameters:
  • texts – List of document texts

  • references – For each document, a list of (start, end) tuples marking its toponym spans

  • tag – Tag to identify this recognition set
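The nested shape of the references argument can be illustrated with a short sketch that derives spans from known toponym strings. It assumes character offsets with an exclusive end, which the parameter description does not specify; `find_spans` is a hypothetical helper, and the commented-out call assumes an existing `project` object.

```python
from typing import List, Tuple


def find_spans(text: str, toponyms: List[str]) -> List[Tuple[int, int]]:
    """Return sorted (start, end) offsets of every occurrence of each toponym."""
    spans = []
    for name in toponyms:
        start = text.find(name)
        while start != -1:
            spans.append((start, start + len(name)))  # end is exclusive (assumption)
            start = text.find(name, start + len(name))
    return sorted(spans)


texts = ["Paris is the capital of France.", "Bern is on the Aare."]
toponyms = [["Paris", "France"], ["Bern", "Aare"]]

# One inner list of spans per document, in the same order as texts.
references = [find_spans(text, names) for text, names in zip(texts, toponyms)]

# With a real project (the tag value here is only illustrative):
# project.create_references(texts, references, tag="manual")
```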

create_referents(texts: List[str], references: List[List[tuple]], referents: List[List[tuple]], tag: str) → None

Create referents (location assignments) using ManualResolver.

Parameters:
  • texts – List of document texts

  • references – For each document, a list of (start, end) tuples marking its toponym spans

  • referents – For each document, a list of (gazetteer_name, identifier) tuples assigning locations to its references

  • tag – Tag to identify this resolution set
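Since texts, references, and referents are parallel lists, a quick consistency check before the call can catch misaligned input. The sketch below assumes one referent per reference, in the same order, which the description above implies but does not state. `check_alignment` is a hypothetical helper, and the gazetteer name and identifiers are placeholders rather than real gazetteer entries.

```python
from typing import List, Tuple

Span = Tuple[int, int]
Referent = Tuple[str, str]


def check_alignment(
    texts: List[str],
    references: List[List[Span]],
    referents: List[List[Referent]],
) -> None:
    """Raise ValueError unless the three lists line up document by document."""
    if not (len(texts) == len(references) == len(referents)):
        raise ValueError("texts, references and referents must have equal length")
    for i, (spans, targets) in enumerate(zip(references, referents)):
        if len(spans) != len(targets):
            raise ValueError(
                f"document {i}: {len(spans)} references but {len(targets)} referents"
            )


texts = ["Paris is the capital of France."]
references = [[(0, 5), (24, 30)]]
# Placeholder gazetteer name and identifiers, for illustration only.
referents = [[("example_gazetteer", "id-paris"), ("example_gazetteer", "id-france")]]

check_alignment(texts, references, referents)

# With a real project (the tag value here is only illustrative):
# project.create_referents(texts, references, referents, tag="manual")
```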

delete() → None

Delete this project and all its associated data from the database.

Cascade delete relationships ensure that removing the project also removes all of its documents, references, referents, recognitions, and resolutions.

get_documents(tag: str = 'latest') → List[Document]

Retrieve all documents in the project with context set for the specified tag.

Parameters:

tag – Tag identifier to determine which recognizer/resolver context to use (default: “latest”)

Returns:

List of Document objects with context set for filtering.

load_annotations(path: str, tag: str, create_documents: bool = False) → None

Load annotations from an annotator JSON file and register them in the project.

This method imports annotations from the legacy annotator format and registers them using ManualRecognizer for toponym spans and ManualResolver for location assignments. The annotations are stored with the provided tag to distinguish different annotation sources.

Parameters:
  • path – Path to the JSON file exported from the annotator

  • tag – Tag to identify this annotation set. This allows tracking multiple annotation sources separately.

  • create_documents – Whether to create new documents from the texts in the JSON (default: False). Set to True if the documents don’t exist yet, False to add annotations to existing documents.

run_recognizer(recognizer: Recognizer, tag: str = 'latest') → None

Run a recognizer module on all documents in this project.

This is a convenience method that simplifies the workflow for advanced users by handling service initialization and document retrieval internally.

Parameters:
  • recognizer – The recognizer module to run on all project documents

  • tag – Tag to associate with this recognizer run (default: “latest”)

run_resolver(resolver: Resolver, tag: str = 'latest') → None

Run a resolver module on all documents in this project.

This is a convenience method that simplifies the workflow for advanced users by handling service initialization and document retrieval internally.

Parameters:
  • resolver – The resolver module to run on all project documents

  • tag – Tag to associate with this resolver run (default: “latest”)

train_recognizer(recognizer: Recognizer, tag: str, **kwargs) → None

Train a recognizer module using documents with reference annotations from this project.

This method retrieves documents that have been processed by a specific recognizer, prepares the training data, and calls the recognizer’s fit method if available.

Parameters:
  • recognizer – The recognizer module to train

  • tag – Tag identifying which annotations to use for training

  • **kwargs – Additional training parameters (e.g., output_path, epochs, batch_size)

Raises:

ValueError – If the recognizer does not implement a fit method
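The documented behavior (call fit if the module provides it, raise ValueError otherwise) can be sketched with plain Python. This is not the library's implementation: `train`, `TrainableRecognizer`, and `FixedRecognizer` are hypothetical stand-ins, and the fit signature shown is an assumption.

```python
class TrainableRecognizer:
    """Hypothetical stand-in for a recognizer that implements fit."""

    def __init__(self):
        self.fitted_with = None

    def fit(self, documents, **kwargs):
        # Record the training parameters so the call can be inspected.
        self.fitted_with = kwargs


class FixedRecognizer:
    """Hypothetical stand-in for a recognizer without a fit method."""


def train(module, documents, **kwargs):
    """Forward documents and keyword arguments to module.fit, if it exists."""
    fit = getattr(module, "fit", None)
    if not callable(fit):
        raise ValueError(f"{type(module).__name__} does not implement a fit method")
    fit(documents, **kwargs)


trainable = TrainableRecognizer()
train(trainable, ["Paris is the capital of France."], epochs=3, batch_size=8)

try:
    train(FixedRecognizer(), [])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Callers that accept arbitrary recognizer modules may want to catch this ValueError, or check for a fit method up front, before starting a training run.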

train_resolver(resolver: Resolver, tag: str, **kwargs) → None

Train a resolver module using documents with referent annotations from this project.

This method retrieves documents that have been processed by a specific recognizer and resolver, prepares the training data, and calls the resolver’s fit method if available.

Parameters:
  • resolver – The resolver module to train

  • tag – Tag identifying which annotations to use for training

  • **kwargs – Additional training parameters (e.g., output_path, epochs, batch_size)

Raises:

ValueError – If the resolver does not implement a fit method