Datasets

class indico.queries.datasets.CreateDataset(name, files, wait=True, dataset_type='TEXT', from_local_images=False, image_filename_col='filename', batch_size=20, ocr_engine=None, omnipage_ocr_options=None, read_api_ocr_options=None)

Create a dataset and upload the associated files.

Parameters
  • name (str) – Name of the dataset

  • files (List[str]) – List of pathnames to the dataset files

Options:
  • dataset_type (str): Type of dataset to create [TEXT, DOCUMENT, IMAGE]

  • wait (bool, default=True): Wait for the dataset to upload and finish

Returns

Dataset object

Raises

IndicoError
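
A minimal sketch of how the CreateDataset arguments above might be assembled and submitted. The helper names (build_create_dataset_kwargs, create_dataset) are hypothetical, and the client object with a call method that executes a query is an assumption about the client library, not something stated in this reference.

```python
ALLOWED_DATASET_TYPES = ("TEXT", "DOCUMENT", "IMAGE")


def build_create_dataset_kwargs(name, files, dataset_type="TEXT", wait=True):
    """Validate inputs and return keyword arguments for CreateDataset."""
    if dataset_type not in ALLOWED_DATASET_TYPES:
        raise ValueError(
            f"dataset_type must be one of {ALLOWED_DATASET_TYPES}, got {dataset_type!r}"
        )
    if not files:
        raise ValueError("files must contain at least one pathname")
    return {
        "name": name,
        "files": list(files),
        "dataset_type": dataset_type,
        "wait": wait,
    }


def create_dataset(client, name, files, **options):
    """Submit a CreateDataset query through an assumed client.call method."""
    # Imported lazily so the validation helper works without the library installed.
    from indico.queries.datasets import CreateDataset

    kwargs = build_create_dataset_kwargs(name, files, **options)
    # Assumption: the client executes query objects via client.call(query).
    return client.call(CreateDataset(**kwargs))
```

With a configured client, create_dataset(client, "reviews", ["./reviews.csv"]) would return the Dataset object described above; the dataset name and file path here are placeholders.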

class indico.queries.datasets.GetDataset(id)

Retrieve a dataset description object

Parameters

id (int) – id of the dataset to query

Returns

Dataset object

Raises:

class indico.queries.datasets.GetDatasetStatus(id)

Get the status of a dataset

Parameters

id (int) – id of the dataset to query

Returns

COMPLETE or FAILED

Return type

status (str)

Raises:
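
When a dataset is created with wait=False, the status query can be polled until it reports COMPLETE or FAILED. The sketch below assumes the query may return other, non-terminal values while work is still in progress; that behaviour is not documented here. The function name and the fetch_status callable are hypothetical.

```python
import time

TERMINAL_STATUSES = ("COMPLETE", "FAILED")


def wait_for_terminal_status(fetch_status, poll_interval=1.0, timeout=300.0,
                             sleep=time.sleep):
    """Call fetch_status() repeatedly until it returns COMPLETE or FAILED.

    fetch_status is any zero-argument callable returning the status string.
    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dataset still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

In practice fetch_status might be something like lambda: client.call(GetDatasetStatus(id=dataset_id)), where client.call is an assumed way of executing the query.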

class indico.queries.datasets.GetDatasetFileStatus(id)

Get the status of dataset file upload

Parameters

id (int) – id of the dataset to query

Returns

DOWNLOADED or FAILED

Return type

status (str)

Raises:

class indico.queries.datasets.ListDatasets(*, limit=100)

List all of your datasets

Options:

limit (int, default=100): Max number of datasets to retrieve

Returns

List[Dataset]

Raises:

class indico.queries.datasets.DeleteDataset(id)

Delete a dataset

Parameters

id (int) – ID of the dataset

Returns

The success of the operation

Return type

success (bool)

Raises:

class indico.queries.datasets.AddDatasetFiles(dataset_id, files, autoprocess=False, wait=True, batch_size=20)

Add files to a dataset.

Parameters
  • dataset_id (int) – ID of the dataset

  • files (List[str]) – List of pathnames to the dataset files

Options:
  • autoprocess (bool, default=False): Automatically process new dataset files

  • wait (bool, default=True): Block while polling for status of files

  • batch_size (int, default=20): Batch size for uploading files

Returns

Dataset

Raises:
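
To illustrate what the batch_size option of AddDatasetFiles implies, the sketch below splits a list of pathnames into groups of at most batch_size. It only shows the idea of batching; how the library batches uploads internally is not described in this reference, and iter_batches is a hypothetical helper.

```python
def iter_batches(paths, batch_size=20):
    """Yield consecutive slices of `paths`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# 45 placeholder pathnames split with the documented default of 20 per batch.
example_paths = [f"doc_{i}.pdf" for i in range(45)]
batch_sizes = [len(batch) for batch in iter_batches(example_paths)]
print(batch_sizes)  # [20, 20, 5]
```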

class indico.queries.datasets.RemoveDatasetFile(dataset_id, file_id)

Remove a file from a dataset by ID. To retrieve a list of files in a dataset, see GetDatasetFileStatus.

Parameters
  • dataset_id (int) – Dataset ID

  • file_id (int) – Datafile ID (returned by GetDatasetFileStatus)

Returns

Dataset object

Raises

IndicoError
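
One possible use of RemoveDatasetFile together with GetDatasetFileStatus is to drop files whose upload failed. The sketch below rests on two assumptions not confirmed by this reference: that the file status query yields an object with a files attribute whose items carry id and status, and that the client executes queries through a call method. The query classes are passed in as arguments so the helper can be exercised with stand-ins.

```python
def failed_file_ids(dataset):
    """Return the IDs of files whose status is FAILED.

    Assumes `dataset.files` is an iterable of objects with `id` and `status`.
    """
    return [f.id for f in dataset.files if f.status == "FAILED"]


def remove_failed_files(client, dataset_id, status_query_cls, remove_query_cls):
    """Remove every FAILED file from a dataset and return the removed IDs.

    status_query_cls and remove_query_cls are expected to be
    GetDatasetFileStatus and RemoveDatasetFile respectively.
    """
    # Assumption: client.call(query) executes the query and returns its result.
    dataset = client.call(status_query_cls(id=dataset_id))
    removed = []
    for file_id in failed_file_ids(dataset):
        client.call(remove_query_cls(dataset_id=dataset_id, file_id=file_id))
        removed.append(file_id)
    return removed
```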