Custom Algorithm
In paidiverpy, you have the flexibility to add your own algorithm to the suite of available algorithms. This guide walks you through the steps to create, implement, and configure a custom algorithm.
Creating a Custom Algorithm
To create a custom algorithm, start by creating a new file that contains a class inheriting from the CustomLayer class. This base class is located in the paidiverpy.custom_layer.custom_layer module, shown below:
"""Custom layer module.
This module contains the CustomLayer class for processing the images in the
custom layer.
"""
import logging
from typing import Any
from dask.distributed import Client
from paidiverpy import Paidiverpy
from paidiverpy.config.config_params import ConfigParams
from paidiverpy.config.configuration import Configuration
from paidiverpy.images_layer import ImagesLayer
from paidiverpy.metadata_parser import MetadataParser
from paidiverpy.models.custom_params import CustomParams
from paidiverpy.utils.install_packages import check_and_install_dependencies


class CustomLayer(Paidiverpy):
    """CustomLayer class.

    Process the images in the custom layer.

    Args:
        parameters (dict): The parameters for the step.
        config_params (dict | ConfigParams, optional): The configuration parameters.
            It can contain the following keys / attributes:
            - input_path (str): The path to the input files.
            - output_path (str): The path to the output files.
            - metadata_path (str): The path to the metadata file.
            - metadata_type (str): The type of the metadata file.
            - track_changes (bool): Whether to track changes.
            - n_jobs (int): The number of n_jobs.
        config_file_path (str): The path to the configuration file.
        config (Configuration): The configuration object.
        metadata (MetadataParser): The metadata object.
        images (ImagesLayer): The images object.
        paidiverpy (Paidiverpy): The paidiverpy object.
        step_name (str): The name of the step.
        client (Client): The Dask client.
        config_index (int): The index of the configuration.
        logger (logging.Logger): The logger object.
        raise_error (bool): Whether to raise an error.
        verbose (int): verbose level (0 = none, 1 = errors/warnings, 2 = info).
    """

    def __init__(
        self,
        parameters: dict[str, Any] | None = None,
        config_params: dict[str, Any] | ConfigParams | None = None,
        config_file_path: str | None = None,
        config: Configuration | None = None,
        metadata: MetadataParser | None = None,
        images: ImagesLayer | None = None,
        paidiverpy: Paidiverpy | None = None,
        step_name: str | None = None,
        client: Client | None = None,
        config_index: int | None = None,
        logger: logging.Logger | None = None,
        raise_error: bool = False,
        verbose: int = 2,
    ):
        super().__init__(
            config_params=config_params,
            config_file_path=config_file_path,
            metadata=metadata,
            config=config,
            images=images,
            paidiverpy=paidiverpy,
            client=client,
            logger=logger,
            raise_error=raise_error,
            verbose=verbose,
        )
        self.step_name = step_name
        self.config_index = self.config.add_step(config_index, parameters, step_class=CustomLayer)
        self.step_metadata = self._calculate_steps_metadata(self.config.steps[self.config_index])
        self.raise_error = self._calculate_raise_error()

    def run(self) -> None:
        """Custom Layer run method.

        Run the custom layer steps on the images based on the configuration
        file or parameters.

        Args:
            add_new_step (bool, optional): Whether to add a new step to the images object.
                Defaults to True.
        """
        check_and_install_dependencies(self.step_metadata.get("dependencies"), self.step_metadata.get("dependencies_path"))
        test = self.step_metadata.get("test", False)
        algorithm_name = self.step_metadata.get("name", "")
        try:
            method = getattr(self, algorithm_name)
        except AttributeError as e:
            msg = f"Method {algorithm_name} not found in CustomLayer."
            self.logger.error(msg)
            if self.raise_error:
                raise AttributeError(msg) from e
            return
        params = self.step_metadata.get("params") or {}
        params = CustomParams(**params) if isinstance(params, dict) else params
        processing_type = self.step_metadata.get("processing_type")
        if processing_type == "dataset":
            images = self.images.get_step(step=len(self.images.images) - 1)
            images = self.process_dataset(images, method, params)
        else:
            images = self.process_images(method, params)
        if not test:
            self.step_name = f"step_{self.config_index}" if not self.step_name else self.step_name
            self.images.add_step(
                step=self.step_name,
                images=images,
                step_metadata=self.step_metadata,
                track_changes=self.track_changes,
            )
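
The run method above resolves the configured name to a method with getattr, which is why the name in your configuration must match a method on your class. The following minimal sketch isolates that lookup so you can see how a missing method behaves. DispatchSketch and resolve are hypothetical names used only for illustration; they are not part of paidiverpy.

```python
import logging

logger = logging.getLogger("dispatch_sketch")


class DispatchSketch:
    """Hypothetical stand-in that mimics only the name lookup done in run()."""

    def __init__(self, raise_error=False):
        self.raise_error = raise_error

    @staticmethod
    def multiply_data(value, factor):
        # Placeholder algorithm; a real one would receive image data and params.
        return value * factor

    def resolve(self, algorithm_name):
        """Return the method called algorithm_name, or None if it is missing."""
        try:
            return getattr(self, algorithm_name)
        except AttributeError as e:
            msg = f"Method {algorithm_name} not found in {type(self).__name__}."
            logger.error(msg)
            if self.raise_error:
                raise AttributeError(msg) from e
            return None


layer = DispatchSketch()
found = layer.resolve("multiply_data")   # the method object
missing = layer.resolve("not_defined")   # logged, then None

# With raise_error=True the same lookup raises instead of returning None.
strict = DispatchSketch(raise_error=True)
try:
    strict.resolve("not_defined")
    raised = False
except AttributeError:
    raised = True
```

In practice this means a typo in the configured name is only logged, and the step is skipped, unless errors are set to be raised.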
Your custom algorithm class should extend CustomLayer and implement a new method with a name of your choice (e.g., multiply_data or process). This method contains the logic of your algorithm. The following example shows a custom class that multiplies each image by a parameter:
from paidiverpy.custom_layer.custom_layer import CustomLayer


class MyCustomClass(CustomLayer):
    @staticmethod
    def multiply_data(image_data, params, **kwargs):
        return image_data * params.some_param
Your method can be a static method or an instance method, depending on the processing type:
Image-level processing (default):
It needs to be a @staticmethod.
It processes one image at a time.
It receives the following parameters:
- image_data: a single NumPy or Dask array with shape (height, width, channels)
- params: an object with algorithm-specific parameters
- kwargs: a dictionary that contains the metadata of the whole dataset and the filename of the current image being processed (kwargs["metadata"] and kwargs["filename"])
Returns: a NumPy or Dask array with the processed image.
Dataset-level processing:
It can be a regular instance method (without the @staticmethod decorator).
It processes the entire dataset at once.
It receives the following parameters:
- images: an xarray.Dataset containing all images in the dataset. The images are represented as a variable named images within the dataset, with dimensions (filename, y, x, band).
- params: an object with algorithm-specific parameters
Returns: an updated xarray.Dataset with the processed images and dimensions (filename, y, x, band).
You can access the metadata by using self.get_metadata().
In this case, if you are working with images with different sizes, you will need to handle the padding and cropping of images manually. Please refer to the How we handle the images section for more details.
Important
Each image_data (or image_data[i]) is a 3D array, even for grayscale images (singleton channel dimension).
Multi-channel images follow RGB or RGBA conventions.
Note
If your algorithm requires external libraries, import them directly within the file containing your custom class.
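
As a rough illustration of the image-level contract described above, the sketch below checks the 3D shape, reads the filename from kwargs, and returns an array of the same shape. To keep it runnable without paidiverpy, ShapeSafeAlgorithms is a hypothetical stand-in for a CustomLayer subclass, and SimpleNamespace stands in for the params object; both are assumptions made for this example.

```python
from types import SimpleNamespace

import numpy as np


class ShapeSafeAlgorithms:
    """Stand-in for a CustomLayer subclass so the sketch runs without paidiverpy."""

    @staticmethod
    def multiply_data(image_data, params, **kwargs):
        # kwargs["filename"] identifies the image currently being processed.
        filename = kwargs.get("filename", "<unknown>")
        if image_data.ndim != 3:
            msg = f"{filename}: expected (height, width, channels), got {image_data.shape}"
            raise ValueError(msg)
        # Element-wise scaling keeps the (height, width, channels) shape intact.
        return image_data * params.some_param


params = SimpleNamespace(some_param=10)  # stand-in for the params object
grey = np.ones((4, 5, 1))                # grayscale image with a singleton channel axis
result = ShapeSafeAlgorithms.multiply_data(grey, params, filename="example.png", metadata=None)
print(result.shape)
```

In a real custom algorithm file, the same method body would sit on a class that inherits from CustomLayer.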
Configuration File
After creating your custom algorithm, specify it in the configuration file as a pipeline step. Below is an example configuration:
general:
  # General configurations here

steps:
  # Steps before the custom algorithm

  - custom:
      name: "my_custom_algorithm"  # Name of the algorithm and the method in the class
      file_path: "/path/to/file.py"  # Path to the module implementing the custom algorithm
      class_name: "MyCustomClass"  # Name of the custom algorithm class
      dependencies: "marimba,scikit-learn==0.24.2"
      dependencies_path: "/path/to/requirements.txt"  # Optional path to a requirements file
      params:  # Algorithm parameters
        some_param: 10
        another_param: 0.5

  # Steps following the custom algorithm
In this example:
The custom algorithm, named my_custom_algorithm, is defined in the file /path/to/file.py and implemented in the class MyCustomClass.
The algorithm accepts parameters such as some_param (set to 10) and another_param (set to 0.5).
External dependencies are declared in two ways:
As a comma-separated string in the dependencies field (e.g., marimba, scikit-learn==0.24.2).
Via a requirements.txt file specified with the dependencies_path field.
Neither method is preferred; you can use either or both. If both are provided, both sets are installed before the algorithm runs.
Since the processing_type parameter is not explicitly set, the algorithm defaults to image-level processing, meaning each image will be processed individually.
Important
You only need to specify external packages as dependencies; packages already available in your environment or included with paidiverpy do not need to be listed and will be ignored.
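
To make the two declaration styles concrete, here is a minimal sketch of how a comma-separated string (or a list) and a requirements file could be merged into one list of package specifiers. collect_dependencies is a hypothetical helper written for this guide; it does not reproduce how paidiverpy resolves or installs dependencies internally.

```python
import tempfile
from pathlib import Path


def collect_dependencies(dependencies=None, dependencies_path=None):
    """Merge both declaration styles into one ordered list without duplicates.

    Hypothetical helper for illustration; not part of paidiverpy.
    """
    specs = []
    if isinstance(dependencies, str):
        # Comma-separated string, e.g. "marimba,scikit-learn==0.24.2"
        specs.extend(dependencies.split(","))
    elif dependencies:
        # Already a list, as in a YAML sequence
        specs.extend(dependencies)
    if dependencies_path:
        specs.extend(Path(dependencies_path).read_text().splitlines())

    merged = []
    for spec in specs:
        spec = spec.strip()
        # Skip blanks, comment lines, and entries that were already listed.
        if spec and not spec.startswith("#") and spec not in merged:
            merged.append(spec)
    return merged


# Usage: a throwaway requirements file stands in for dependencies_path.
with tempfile.TemporaryDirectory() as tmp_dir:
    requirements = Path(tmp_dir) / "requirements.txt"
    requirements.write_text("# pinned for the example\nscikit-learn==0.24.2\nnumpy\n")
    merged = collect_dependencies("marimba, scikit-learn==0.24.2", requirements)

print(merged)
```

Listing the same package in both places is harmless in this sketch, because repeated specifiers are collapsed.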
Dataset-level Processing
If your custom algorithm processes the entire dataset at once, set the processing_type parameter to dataset in the configuration file. Your method then receives the whole dataset as an xarray.Dataset instead of a single image array. Here is how to configure it:
general:
  # General configurations here

steps:
  # Steps before the custom algorithm

  - custom:
      name: "my_custom_algorithm"
      file_path: "/path/to/file.py"
      class_name: "MyCustomClass"
      dependencies: "marimba,scikit-learn==0.24.2"
      dependencies_path: "/path/to/requirements.txt"
      processing_type: "dataset"  # Set to 'dataset' for dataset-level processing
      params:
        some_param: 10
        another_param: 0.5

  # Steps following the custom algorithm
Real Example
For a more concrete example, consider the following code snippet (available in the examples/custom_algorithms directory of the paidiverpy package):
"""This is an example of a custom algorithm that scales the image data using MinMaxScaler from sklearn.preprocessing."""

import ast
from typing import Any

import numpy as np
from sklearn import preprocessing

from paidiverpy.custom_layer import CustomLayer
from paidiverpy.models.custom_params import CustomParams
from paidiverpy.utils.data import NUM_DIMENSIONS
from paidiverpy.utils.data import NUM_DIMENSIONS_GREY


class MyCustomClass(CustomLayer):
    """This class scales the image data using MinMaxScaler from sklearn.preprocessing."""

    @staticmethod
    def min_max_data(image_data: np.ndarray[Any, Any], params: CustomParams | None = None, **_kwargs: dict[str, Any]) -> np.ndarray[Any, Any]:
        """Scale the image data to the configured feature range.

        Args:
            image_data (np.ndarray): The image data.
            params (CustomParams, optional): The custom parameters.
            **_kwargs: Additional keyword arguments.

        Returns:
            np.ndarray: The updated image.
        """
        # Drop the singleton channel axis so the scaler receives a 2D array.
        if len(image_data.shape) == NUM_DIMENSIONS and image_data.shape[-1] == 1:
            image_data = np.squeeze(image_data, axis=-1)
        # Parse the configured range, e.g. "(0, 1)", into a tuple.
        feature_range = ast.literal_eval(params.feature_range)
        min_max_scaler = preprocessing.MinMaxScaler(feature_range=feature_range)
        image_data = min_max_scaler.fit_transform(image_data)
        # Restore the channel axis so the returned image is 3D again.
        if len(image_data.shape) == NUM_DIMENSIONS_GREY:
            image_data = np.expand_dims(image_data, axis=-1)
        return image_data
In this example, the custom algorithm accepts an image and a feature_range parameter. Using sklearn’s MinMaxScaler, it normalizes the image data within the specified range, then returns the processed data.
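
One detail worth knowing is that MinMaxScaler treats each column of its 2D input as a separate feature, so every image column is scaled by its own minimum and maximum. The sketch below traces that arithmetic in plain Python on a tiny 3x2 "image". min_max_columns is a hypothetical helper for illustration only; mapping a constant column to the lower bound is a choice made in this sketch.

```python
def min_max_columns(rows, feature_range=(0.0, 1.0)):
    """Scale each column of a 2D list independently into feature_range."""
    low, high = feature_range
    scaled_columns = []
    for column in zip(*rows):  # iterate over columns, not rows
        col_min, col_max = min(column), max(column)
        span = col_max - col_min
        if span == 0:
            # A constant column carries no range; map it to the lower bound.
            scaled_columns.append([float(low)] * len(column))
        else:
            scaled_columns.append(
                [low + (value - col_min) / span * (high - low) for value in column]
            )
    # Transpose back to row-major order.
    return [list(row) for row in zip(*scaled_columns)]


image = [[0, 10], [5, 20], [10, 40]]
scaled = min_max_columns(image)
print(scaled)
```

The first column (0, 5, 10) and the second column (10, 20, 40) both end up spanning the full range, even though their original ranges differ.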
The corresponding configuration file might look like this:
general:
  # General configurations here

steps:
  # Steps before the custom algorithm

  - custom:
      name: "min_max_data"
      file_path: "/path/to/file.py"
      class_name: "MyCustomClass"
      dependencies:
        - "scikit-learn"
      params:
        feature_range: (0, 1)

  # Steps following the custom algorithm
In this setup:
The custom algorithm min_max_data resides in /path/to/file.py, with the class name MyCustomClass.
The algorithm has one parameter, feature_range, set to (0, 1).
The dependency scikit-learn is installed before the algorithm runs.
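
The example code calls ast.literal_eval on params.feature_range, which suggests that the value (0, 1) reaches the method as a string rather than as a tuple. A small, defensive way to parse such a value might look like the sketch below. parse_feature_range is a hypothetical helper, not a paidiverpy function.

```python
import ast


def parse_feature_range(raw):
    """Turn a '(low, high)' string, tuple, or list into a validated tuple."""
    # literal_eval only evaluates Python literals, so it is safer than eval.
    value = ast.literal_eval(raw) if isinstance(raw, str) else raw
    if not isinstance(value, (tuple, list)) or len(value) != 2:
        msg = f"feature_range must contain exactly two values, got {value!r}"
        raise ValueError(msg)
    low, high = value
    if not low < high:
        msg = f"feature_range minimum must be below its maximum, got {value!r}"
        raise ValueError(msg)
    return (low, high)


feature_range = parse_feature_range("(0, 1)")
print(feature_range)

# A reversed range is rejected before it ever reaches the scaler.
try:
    parse_feature_range("(1, 0)")
    rejected = False
except ValueError:
    rejected = True
```

Validating early gives a clearer error message than letting the scaler fail later in the pipeline.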
To execute, run your application with the configuration file above, and the custom algorithm will be applied accordingly.
Example configuration files for custom algorithms can be found in the examples/config_files directory of the repository. You can also run an example notebook with a custom algorithm by exploring the Gallery section.
Run in Docker
To make the custom algorithm available inside the Docker container, mount the directory that contains the custom algorithm file into the container. The following command shows how to run the container with a custom algorithm:
docker run --rm \
  -v <INPUT_PATH>:/app/input/ \
  -v <OUTPUT_PATH>:/app/output/ \
  -v <FULL_PATH_OF_CONFIGURATION_FILE_WITHOUT_FILENAME>:/app/config_files \
  -v <METADATA_PATH_WITHOUT_FILENAME>:/app/metadata/ \
  -v <FULL_PATH_OF_CUSTOM_ALGORITHM_FILE_AND_REQUIREMENTS_FILE>:/app/custom_algorithms \
  paidiverpy \
  paidiverpy -c /app/examples/config_files/<CONFIGURATION_FILE_FILENAME>
In this command:
<INPUT_PATH>: The input path defined in your configuration file, where the input images are located.
<OUTPUT_PATH>: The output path defined in your configuration file.
<FULL_PATH_OF_CONFIGURATION_FILE_WITHOUT_FILENAME>: The local directory of your configuration file.
<CONFIGURATION_FILE_FILENAME>: The name of the configuration file.
<METADATA_PATH_WITHOUT_FILENAME>: The local directory of your metadata file.
<FULL_PATH_OF_CUSTOM_ALGORITHM_FILE_AND_REQUIREMENTS_FILE>: The local directory of your custom algorithm file and requirements file (if any).
The output images will be saved to the specified output_path.
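
If you launch the container from a script, the command above can be assembled programmatically. The sketch below builds the same argument list from local directories, using only the flags and container paths shown in the command. build_docker_command and the /data/... paths are hypothetical placeholders chosen for this example.

```python
import shlex


def build_docker_command(input_dir, output_dir, config_dir, metadata_dir, custom_dir, config_filename):
    """Assemble the docker run arguments shown above as a list of strings."""
    mounts = [
        (input_dir, "/app/input/"),
        (output_dir, "/app/output/"),
        (config_dir, "/app/config_files"),
        (metadata_dir, "/app/metadata/"),
        (custom_dir, "/app/custom_algorithms"),
    ]
    command = ["docker", "run", "--rm"]
    for host_path, container_path in mounts:
        command += ["-v", f"{host_path}:{container_path}"]
    # Image name, then the paidiverpy entry point with the configuration file.
    command += ["paidiverpy", "paidiverpy", "-c", f"/app/examples/config_files/{config_filename}"]
    return command


command = build_docker_command(
    "/data/images", "/data/output", "/data/config", "/data/metadata", "/data/custom", "config.yml"
)
print(shlex.join(command))  # shell-quoted string for logging or copy-paste
```

Passing the list to subprocess.run(command, check=True) would start the container without going through a shell, which avoids quoting problems with paths that contain spaces.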