paidiverpy.utils.data

Helper functions to download and load datasets.

Classes

PaidiverpyData

A class to download and load datasets.

Module Contents

class paidiverpy.utils.data.PaidiverpyData

A class to download and load datasets.

load(dataset_name: str) → dict[str, str]

Download, unzip, and load the specified dataset.

Parameters:

dataset_name (str) – The name of the dataset (for example, ‘sample_image’).

Returns:

A dictionary containing the input path, metadata path, metadata type, and image type.

Return type:

dict[str, str]

copy_files_docker(extract_dir: pathlib.Path, dataset_name: str) → None

Copy files from the extract directory to the appropriate location in the Docker container.

Parameters:
  • extract_dir (Path) – The directory where the dataset has been extracted.

  • dataset_name (str) – The name of the dataset.

load_persistent_paths() → dict[str, str]

Load the persistent paths from the cache directory.

Returns:

The persistent paths.

Return type:

dict[str, str]

save_persistent_paths(paths: dict[str, str]) → None

Save the persistent paths to the cache directory.

Parameters:

paths (dict[str, str]) – The paths to save.
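As a rough illustration of how a load/save pair like this can work, the sketch below round-trips a dict[str, str] through a JSON file in a cache directory. The JSON format, the file name persistent_paths.json, the helper names save_paths and load_paths, and the example mapping are all assumptions made for illustration; this page does not say how paidiverpy actually stores these paths.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical file name; the real storage format is not documented here.
PATHS_FILE = "persistent_paths.json"


def save_paths(paths: dict[str, str], cache_dir: Path) -> None:
    """Write the mapping to a JSON file inside cache_dir."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / PATHS_FILE).write_text(json.dumps(paths, indent=2), encoding="utf-8")


def load_paths(cache_dir: Path) -> dict[str, str]:
    """Read the mapping back, returning an empty dict if nothing was saved."""
    path_file = cache_dir / PATHS_FILE
    if not path_file.exists():
        return {}
    return json.loads(path_file.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    before = load_paths(cache)  # nothing has been saved yet
    # Made-up entry: a dataset name mapped to a local path.
    save_paths({"sample_image": "/data/sample_image"}, cache)
    after = load_paths(cache)
```

Returning an empty dict when the file is missing lets a first run proceed without special-casing; whether the library behaves the same way is not stated here.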

download_file(url: str, dataset_name: str, cache_dir: pathlib.Path = CACHE_DIR) → pathlib.Path

Download dataset file from the given URL.

Download the file from the given URL and cache it locally to avoid redundant downloads. A progress bar is displayed for the download process.

Parameters:
  • url (str) – The URL to download the file from.

  • dataset_name (str) – The name of the dataset.

  • cache_dir (Path) – The directory to store the downloaded file.

Returns:

The path to the downloaded file.

Return type:

Path
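The caching behaviour described above can be sketched with the standard library alone. This is a minimal sketch, not paidiverpy's implementation: the helper name fetch_cached and the <dataset_name>.zip file name are assumptions, and the progress bar is omitted. The demo uses a local file:// URL so it runs without network access.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_cached(url: str, dataset_name: str, cache_dir: Path) -> Path:
    """Download url to cache_dir/<dataset_name>.zip unless it is already there."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{dataset_name}.zip"  # assumed naming scheme
    if target.exists():
        return target  # cache hit: skip the download
    # Stream the response to disk in chunks rather than reading it all at once.
    with urllib.request.urlopen(url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "remote.bin"
    source.write_bytes(b"example payload")
    cache = Path(tmp) / "cache"
    first = fetch_cached(source.as_uri(), "demo", cache)
    source.write_bytes(b"changed")  # simulate the remote file changing
    second = fetch_cached(source.as_uri(), "demo", cache)
    cached_bytes = second.read_bytes()  # still the first payload
```

A cache keyed only on the file name never notices that the remote file changed, as the demo shows; a checksum or version check would be needed to detect stale data.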

unzip_file(zip_path: pathlib.Path, dataset_name: str, extract_dir: pathlib.Path = CACHE_DIR) → None

Unzip the file to the specified directory.

Parameters:
  • zip_path (Path) – The path to the zip file.

  • dataset_name (str) – The name of the dataset.

  • extract_dir (Path) – The directory to extract the contents to.
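For readers unfamiliar with the extraction step, here is a minimal sketch using the standard zipfile module. The helper name extract_dataset and the choice to extract into a subfolder named after the dataset are assumptions; unlike the documented unzip_file, which returns None, this sketch returns the target folder for convenience.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_dataset(zip_path: Path, dataset_name: str, extract_dir: Path) -> Path:
    """Extract zip_path into extract_dir/<dataset_name> and return that folder."""
    target = extract_dir / dataset_name  # assumed layout: one folder per dataset
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive so the example runs without any download.
    zip_path = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("images/frame_0001.txt", "placeholder")
    out_dir = extract_dataset(zip_path, "demo", Path(tmp) / "extracted")
    extracted = sorted(
        p.relative_to(out_dir).as_posix() for p in out_dir.rglob("*") if p.is_file()
    )
```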

calculate_information(dataset_name: str, extract_dir: pathlib.Path, dataset_information: dict[str, Any]) → dict[str, str]

Calculate the information for the dataset.

Parameters:
  • dataset_name (str) – The name of the dataset.

  • extract_dir (Path) – The path to the directory where the dataset was extracted.

  • dataset_information (dict[str, Any]) – Information about the dataset.

Returns:

Information about the dataset.

Return type:

dict[str, str]