Special Operations

All Available Operations

The API provides a suite of specialized operations designed to optimize specific tasks.
The currently available operations include:

  • analyze-image: Extract information from an image, such as NSFW content detection and age estimation, using traditional and AI-based techniques.
  • compare-image: Compare two images using one of several algorithms: AE, MAE, NCC, PSNR, or RMSE.
  • compare-video: Compare two videos.
  • compress: Compress image files while keeping the quality visually similar to the original.
  • convert-pdfa: Convert a PDF to a PDF/A compliant format, with the option to select a specific PDF/A profile.
  • extract-archive: Extract the contents of a compressed archive.
  • extract-streams: Extract individual streams (such as video, audio, and subtitles) from a multimedia file.
  • merge-streams: Merge multiple streams (such as video, audio, and subtitles) into a single multimedia file, with the option to specify audio layouts such as 5.1.
  • thumbnail: Create thumbnails of images or document pages.
  • validate-pdfa: Verify PDF/A compliance with a specific PDF/A profile for a given file.

Operation analyze-image

The analyze-image operation enables you to extract valuable information from an image, utilizing both traditional and artificial intelligence (AI) techniques.

You can customize the extraction process for specific data by enabling the corresponding option.

Available Options

Option Name | Type | Possible Values | Description
nsfw | boolean | true, false | Enables detection of Not Safe For Work (NSFW) content, indicating whether explicit or inappropriate material is present in the image.
age | boolean | true, false | Enables estimation of the approximate age range of the people shown in the image.
max_outputs_per_input | integer | 1-2048 | The maximum number of outputs returned per input file. If the operation yields more outputs than this for an input, only the specified number is returned.
allow_multiple_outputs | boolean | true, false | If the operation produces more than one output file, all of them are packed into a single compressed file by default. Set this option to true to get a download link for each file.

{
  "conversion": [{
    "category": "operation",
    "target": "analyze-image",
    "options": {
      "nsfw": true,
      "age": true
    }
  }]
}
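
Every request example in this document shares the same envelope: a "conversion" array holding one entry with "category", "target", and "options". As a minimal sketch, the helper below builds that envelope in Python. The function name build_operation_job is hypothetical and not part of the API, and how the payload is sent (endpoint, authentication) is not covered here.

```python
import json


def build_operation_job(target, options=None):
    """Return a payload with a single operation entry.

    The key names mirror the JSON examples in this document; the
    function itself is an illustrative helper, not part of the API.
    """
    entry = {"category": "operation", "target": target}
    if options:  # leave "options" out entirely when none are set
        entry["options"] = dict(options)
    return {"conversion": [entry]}


payload = build_operation_job("analyze-image", {"nsfw": True, "age": True})
print(json.dumps(payload, indent=2))
```

Building the payload as a dictionary and serializing it with json.dumps avoids hand-written JSON mistakes such as trailing commas.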

Operation compare-image

The compare-image operation compares two provided images and reports the differences between them.

Available Options

Option Name | Type | Possible Values | Description
method | string | AE, MAE, NCC, PSNR, RMSE | The algorithm to use for image comparison.
diff_color | string | red, green, blue, yellow, orange, violet, black, white | The color used to highlight differing pixels in the diff image.
threshold | integer | 0-100 | The percentage of color difference above which two pixels are considered different. A value of 0 means that pixels must be identical to be considered equal.
allow_multiple_outputs | boolean | true, false | If the operation produces more than one output file, all of them are packed into a single compressed file by default. Set this option to true to get a download link for each file.

Option Details

Value | Description
AE | The Absolute Error (AE) algorithm measures the absolute differences between the pixel values of the two images being compared.
MAE | The Mean Absolute Error (MAE) algorithm calculates the average of the absolute differences between the pixel values of the two images.
NCC | The Normalized Cross-Correlation (NCC) algorithm measures the similarity of the two images by correlating their pixel values.
PSNR | The Peak Signal-to-Noise Ratio (PSNR) algorithm calculates the ratio between the maximum possible pixel value of the image and the error between the two images. Higher values indicate more similar images.
RMSE | The Root Mean Squared Error (RMSE) algorithm calculates the square root of the average of the squared differences between the pixel values of the two images.

{
  "conversion": [{
    "category": "operation",
    "target": "compare-image",
    "options": {
      "method": "mae",
      "diff_color": "orange",
      "threshold": 25,
      "allow_multiple_outputs": true
    }
  }]
}
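
To make the metric descriptions above concrete, here is a small Python sketch using the textbook definitions of MAE, RMSE, PSNR, and NCC on two flattened lists of pixel values. It is illustrative only and is not the API's implementation: the function names, the 8-bit max_value default, and the zero-mean form of NCC are assumptions. AE is left out because its exact definition is not spelled out here.

```python
import math


def mean_absolute_error(a, b):
    """Average of the absolute per-pixel differences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def root_mean_squared_error(a, b):
    """Square root of the average squared per-pixel difference."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def peak_signal_to_noise_ratio(a, b, max_value=255.0):
    """PSNR in decibels; infinite when the inputs are identical."""
    rmse = root_mean_squared_error(a, b)
    return math.inf if rmse == 0 else 20 * math.log10(max_value / rmse)


def normalized_cross_correlation(a, b):
    """Zero-mean NCC in [-1, 1]; NaN if either input is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else math.nan


# Two tiny single-channel "images", flattened to lists of pixel values.
first = [0, 10, 20, 30]
second = [0, 12, 18, 30]

print("MAE :", mean_absolute_error(first, second))
print("RMSE:", root_mean_squared_error(first, second))
print("PSNR:", peak_signal_to_noise_ratio(first, second))
print("NCC :", normalized_cross_correlation(first, second))
```

MAE and RMSE grow as the images diverge, while PSNR and NCC grow as they become more alike, so the direction of "better" depends on the chosen method.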

Output Details

The output of this image comparison operation may include two files:

  • A binary image file containing the difference between the two input images, highlighted in the color specified by the diff_color option.
  • A JSON file containing the comparison results, structured as in the example below.

The JSON output describes the result of the comparison and can be used to determine how similar or different the two input images are.
It provides a per-channel comparison together with a relative measure of the difference between the images.
It includes the comparison method used and the resulting values for each color channel; since the method used in the call was mae, these are Mean Absolute Error values.

The top-level key "method" is a string naming the method used for the comparison. In this example, it is set to "Mean Absolute Error".

The next key is an identifier, "fe194466-3df5-4baf-8138-511d874e583e", which maps to the comparison result for the color channels red, green, and blue, plus all.
Each color channel maps to a string with two parts: the first is the Mean Absolute Error value, and the second (in parentheses) is the relative error as a ratio between 0 and 1, which helps gauge how different the two images are.

For example, the value "1911.56 (0.0291685)" for the "red" color channel means that the Mean Absolute Error for the red channel is 1911.56 and the relative error is 0.0291685.

{
    "method": "Mean Absolute Error",
    "fe194466-3df5-4baf-8138-511d874e583e": {
        "red": "1911.56 (0.0291685)",
        "green": "1813.52 (0.0276726)",
        "blue": "2007.71 (0.0306357)",
        "all": "1910.93 (0.029159)"
    }
}
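
Because each channel value is a string combining two numbers, client code usually needs to split it before use. The sketch below parses the example output above into floats. The helper names are hypothetical, and the parser assumes every key other than "method" is an identifier mapping channel names to "value (ratio)" strings, as in the example.

```python
import re

# Matches strings such as "1911.56 (0.0291685)".
_NUMBER = r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?"
_CHANNEL_RE = re.compile(rf"^\s*({_NUMBER})\s*\(({_NUMBER})\)\s*$")


def parse_channel(text):
    """Split "value (ratio)" into a (value, relative_error) tuple."""
    match = _CHANNEL_RE.match(text)
    if match is None:
        raise ValueError(f"unexpected channel format: {text!r}")
    return float(match.group(1)), float(match.group(2))


def parse_comparison(result):
    """Return (method, {identifier: {channel: (value, ratio)}})."""
    parsed = {}
    for key, channels in result.items():
        if key == "method":
            continue  # every other key is treated as an identifier
        parsed[key] = {name: parse_channel(s) for name, s in channels.items()}
    return result["method"], parsed


sample = {
    "method": "Mean Absolute Error",
    "fe194466-3df5-4baf-8138-511d874e583e": {
        "red": "1911.56 (0.0291685)",
        "green": "1813.52 (0.0276726)",
        "blue": "2007.71 (0.0306357)",
        "all": "1910.93 (0.029159)",
    },
}

method, channels = parse_comparison(sample)
for identifier, values in channels.items():
    value, ratio = values["all"]
    print(f"{method}: {value} overall, relative error {ratio:.2%}")
```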

Operation compare-video

The compare-video operation allows for the comparison of differences between two provided videos.
It utilizes the Video Multimethod Assessment Fusion (VMAF) video quality metric to evaluate the similarity in quality between the two videos.

VMAF is a machine learning-based video quality assessment algorithm that combines the strengths of several existing video quality metrics into a single, unified score.
It was designed to closely predict the Mean Opinion Score (MOS) of human viewers and has been shown to be highly correlated with subjective quality assessments.
By calculating the VMAF score for two videos, you can get an idea of how similar in quality the videos are to each other.

{
    "conversion": [{
        "category": "operation",
        "target": "compare-video"
    }]
}

Output Details

Field Name | Description
integer_adm2 | The ADM metric (known as the Detail Loss Metric, DLM, in the VMAF literature), which measures the loss of detail between the two videos.
integer_adm_scale0, integer_adm_scale1, integer_adm_scale2, integer_adm_scale3 | The ADM metric calculated at each of four scales (0 to 3).
integer_motion2 | The Motion2 metric, which measures the degree of motion in the video.
integer_motion | The Motion metric, which measures the degree of motion in the video.
integer_vif_scale0, integer_vif_scale1, integer_vif_scale2, integer_vif_scale3 | The Visual Information Fidelity (VIF) metric calculated at each of four scales (0 to 3).
vmaf | The VMAF score, calculated per frame. The vmaf field inside each frame's metrics field holds the score for that frame.
pooled_metrics | Summary statistics for each metric over the entire video: the minimum, maximum, mean, and harmonic mean. The vmaf field inside pooled_metrics holds the summary statistics for the VMAF scores over the entire video.

The output of this operation includes information about the quality of each frame in the video, as well as summary statistics for the entire video.
The metrics field contains a set of quality metrics for each frame, as in the code example.

{
    "version": "2.3.1",
    "fps": 24.03,
    "frames": [
        {
            "frameNum": 0,
            "metrics": {
                "integer_adm2": 0.842100,
                "integer_adm_scale0": 0.779905,
                "integer_adm_scale1": 0.749054,
                "integer_adm_scale2": 0.846122,
                "integer_adm_scale3": 0.901889,
                "integer_motion2": 0.000000,
                "integer_motion": 0.000000,
                "integer_vif_scale0": 0.198811,
                "integer_vif_scale1": 0.522634,
                "integer_vif_scale2": 0.660773,
                "integer_vif_scale3": 0.768340,
                "vmaf": 42.687407
            }
        },
        <...more frames...>
    ],
    "pooled_metrics": {
        "integer_adm2": {
            "min": 0.794626,
            "max": 0.863825,
            "mean": 0.830444,
            "harmonic_mean": 0.830332
        },
        "integer_adm_scale0": {
            "min": 0.698013,
            "max": 0.818805,
            "mean": 0.761654,
            "harmonic_mean": 0.761299
        },
        <...more metrics...>
        "vmaf": {
            "min": 40.664485,
            "max": 61.936634,
            "mean": 51.017497,
            "harmonic_mean": 50.703806
        }
    }
}
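
The relationship between the per-frame metrics and pooled_metrics can be illustrated with a short Python sketch that recomputes summary statistics from the frames list and finds the lowest-scoring frames. The function names and the three-frame sample are made up for illustration, and the plain harmonic mean used here is an assumption: the pooled value reported by the operation may be computed slightly differently.

```python
from statistics import fmean, harmonic_mean


def pool_metric(report, name="vmaf"):
    """Recompute min, max, mean and harmonic mean of one per-frame metric."""
    values = [frame["metrics"][name] for frame in report["frames"]]
    return {
        "min": min(values),
        "max": max(values),
        "mean": fmean(values),
        "harmonic_mean": harmonic_mean(values),
    }


def worst_frames(report, name="vmaf", count=3):
    """Return the frame numbers with the lowest values of a metric."""
    ranked = sorted(report["frames"], key=lambda frame: frame["metrics"][name])
    return [frame["frameNum"] for frame in ranked[:count]]


# Made-up report with the same shape as the example output above.
report = {
    "frames": [
        {"frameNum": 0, "metrics": {"vmaf": 40.0}},
        {"frameNum": 1, "metrics": {"vmaf": 80.0}},
        {"frameNum": 2, "metrics": {"vmaf": 50.0}},
    ]
}

print(pool_metric(report))
print(worst_frames(report, count=2))
```

The harmonic mean is pulled down by low-scoring frames more than the arithmetic mean, which makes it a useful signal when a few bad frames matter.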

Operation compress

The compress operation reduces the size of an image file, potentially with no noticeable loss in quality.
It does so using image compression algorithms that remove redundant or unnecessary information from the file while preserving the visual integrity of the image.

Image compression algorithms differ in their trade-offs between file size reduction and image quality.
The algorithm is selected automatically based on the compression_level option, which accepts the values low, medium, high, and best.

Available Options

Option Name | Type | Possible Values | Description
compression_level | string | low, medium, high, best | The compression level to apply. Each level tries to compress the file with no perceivable loss of quality.
compression_target | string | gif, jpg, png | The target format, if it differs from the input file's format.
allow_multiple_outputs | boolean | true, false | If the operation produces more than one output file, all of them are packed into a single compressed file by default. Set this option to true to get a download link for each file.

{
    "conversion": [{
        "category": "operation",
        "target": "compress",
        "options": {
            "compression_level": "high",
            "compression_target": "jpg"
        }
    }]
}

Operation convert-pdfa

The convert-pdfa operation allows for the conversion of a PDF file to a PDF/A archive, which is a specialized format designed for long-term preservation of electronic documents.
PDF/A ensures that documents will be rendered correctly and consistently over time, even if the software and hardware used to view the documents change.

Available Options

Option Name | Type | Possible Values | Description
validation_profile | string | pdfa1a, pdfa1b, pdfa2a, pdfa2b, pdfa2u, pdfa3a, pdfa3b, pdfa3u | The PDF/A profile that the converted file must comply with.
strict_mode | boolean | true, false | If true, the operation does not modify the appearance of the input file to make it PDF/A compliant. Elements that do not meet the requirements of the selected profile are left unaltered, which may result in a non-compliant file and a failed conversion. If false, the operation makes any modifications needed to make the file PDF/A compliant.
allow_multiple_outputs | boolean | true, false | If the operation produces more than one output file, all of them are packed into a single compressed file by default. Set this option to true to get a download link for each file.

Option Details

Profile | Description
pdfa1a | PDF/A-1, conformance level A (accessible). Requires everything in pdfa1b, and additionally that the document be tagged with logical structure information and that text have Unicode mappings.
pdfa1b | PDF/A-1, conformance level B (basic). Requires that all fonts be embedded and that the document's visual appearance be fully self-contained.
pdfa2a | PDF/A-2, conformance level A. Similar to pdfa1a, but allows newer PDF features such as transparent objects and JPEG 2000 image compression.
pdfa2b | PDF/A-2, conformance level B. Similar to pdfa2a, but without the logical structure and Unicode requirements.
pdfa2u | PDF/A-2, conformance level U. Similar to pdfa2b, but additionally requires Unicode mappings for all text.
pdfa3a | PDF/A-3, conformance level A. Similar to pdfa2a, but allows files of any format to be embedded in the document.
pdfa3b | PDF/A-3, conformance level B. Similar to pdfa2b, but allows files of any format to be embedded in the document.
pdfa3u | PDF/A-3, conformance level U. Similar to pdfa2u, but allows files of any format to be embedded in the document.

{
  "conversion": [{
    "category": "operation",
    "target": "convert-pdfa",
    "options": {
      "validation_profile": "pdfa1b",
      "strict_mode": true
    }
  }]
}
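
Since a strict conversion can fail when the input cannot be made compliant without changing its appearance, one possible client-side pattern is to try strict_mode first and fall back to non-strict mode. The Python sketch below shows that control flow only. The submit callable is a hypothetical, caller-supplied function that sends a payload and returns True on success; nothing here describes how the API itself reports failures.

```python
def pdfa_attempts(profile):
    """Yield convert-pdfa payloads: strict mode first, then non-strict."""
    for strict in (True, False):
        yield {
            "conversion": [{
                "category": "operation",
                "target": "convert-pdfa",
                "options": {"validation_profile": profile, "strict_mode": strict},
            }]
        }


def convert_with_fallback(profile, submit):
    """Return the strict_mode value that succeeded, or None if both failed.

    submit is a hypothetical callable: payload -> bool (True on success).
    """
    for payload in pdfa_attempts(profile):
        if submit(payload):
            return payload["conversion"][0]["options"]["strict_mode"]
    return None


# Stand-in for a real request: pretend the strict attempt fails.
attempts = []


def fake_submit(payload):
    attempts.append(payload)
    return not payload["conversion"][0]["options"]["strict_mode"]


result = convert_with_fallback("pdfa1b", fake_submit)
print("succeeded with strict_mode =", result)
```

Trying strict mode first preserves the original appearance whenever possible and only accepts visual changes when compliance cannot be reached otherwise.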

Operation extract-archive

The extract-archive operation extracts the contents of an archive file for further processing or use.
It handles a range of archive formats, such as ZIP, 7ZIP, TAR.GZ, TAR.BZ2, and RAR.

Available Options

Option Name | Type | Possible Values | Description
summary | boolean | true, false | If true, the output includes a summary with the file paths and sizes of the extracted files, structured so that it is easy to parse into a tree-like structure for display, such as on a website.
allow_multiple_outputs | boolean | true, false | If the operation produces more than one output file, all of them are packed into a single compressed file by default. Set this option to true to get a download link for each file.

{
  "conversion": [{
    "category": "operation",
    "target": "extract-archive",
    "options": {
      "summary": true
    }
  }]
}
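
The summary is described as easy to turn into a tree-like structure, but its exact format is not shown in this document. The Python sketch below therefore assumes a hypothetical shape, a list of objects with "path" and "size" keys, and shows one way to nest such entries into a directory tree. Adjust the key names to match the actual summary.

```python
def build_tree(entries):
    """Nest entries into dicts: directories map to dicts, files to sizes.

    entries is assumed (hypothetically) to be a list of
    {"path": "dir/file.ext", "size": <bytes>} objects.
    """
    root = {}
    for entry in entries:
        parts = [part for part in entry["path"].split("/") if part]
        if not parts:
            continue  # skip empty paths
        node = root
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = entry["size"]
    return root


def print_tree(node, indent=0):
    """Print the nested structure with two spaces per level."""
    for name, value in sorted(node.items()):
        if isinstance(value, dict):
            print(" " * indent + name + "/")
            print_tree(value, indent + 2)
        else:
            print(" " * indent + f"{name} ({value} bytes)")


entries = [
    {"path": "docs/readme.txt", "size": 120},
    {"path": "docs/img/logo.png", "size": 2048},
    {"path": "notes.txt", "size": 10},
]
tree = build_tree(entries)
print_tree(tree)
```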

Operation extract-streams

The extract-streams operation extracts individual streams from a multimedia file, such as an MP4 file, for further processing or use.

It can handle a range of different file formats, making it a versatile tool for working with multimedia content.
By default, this operation extracts all streams contained within the input file, including audio, video, and subtitle streams.

Available Options

Option Name | Type | Possible Values | Description
stream | string | all, audio, video, subtitle | The type of stream to extract. This is useful for isolating and processing individual streams within a multimedia file.
allow_multiple_outputs | boolean | true, false | If the operation produces more than one output file, all of them are packed into a single compressed file by default. Set this option to true to get a download link for each file.
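
As a minimal sketch of how these options fit the request envelope used by the other operations, the Python snippet below builds an extract-streams payload that extracts only the audio streams. The function name is hypothetical; the option names and allowed values come from the table above.

```python
import json

ALLOWED_STREAMS = ("all", "audio", "video", "subtitle")


def extract_streams_job(stream="all", allow_multiple_outputs=False):
    """Build an extract-streams payload after checking the stream value."""
    if stream not in ALLOWED_STREAMS:
        raise ValueError(f"stream must be one of {ALLOWED_STREAMS}")
    return {
        "conversion": [{
            "category": "operation",
            "target": "extract-streams",
            "options": {
                "stream": stream,
                "allow_multiple_outputs": allow_multiple_outputs,
            },
        }]
    }


job = extract_streams_job("audio", allow_multiple_outputs=True)
print(json.dumps(job, indent=2))
```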

Operation merge-streams

The merge-streams operation merges multiple multimedia streams, such as audio, video, and subtitle files, into a single multimedia file.
It offers a range of options for customizing the resulting merged file, making it a versatile tool for working with multimedia content.

Available Options

Option Name | Type | Possible Values | Description
video_format | string | avi, flv, mkv, mov, mp4, mpg, ogv, webm, wmv | The format of the merged file.
keep_original_audio | boolean | true, false | If true, the audio streams from the original video are kept.
input_filters | array of objects | See Option Details below | Filters to apply to the merged result, such as changing the volume of individual streams.
metadata | array of objects | See Option Details below | Metadata for each stream, such as the language of an audio or subtitle stream.
audio_channels | object | Keys: layout, back_center, back_left, back_right, front_center, front_left, front_left_of_center, front_right, front_right_of_center, low_frequency, side_left, side_right, top_center, top_front_center, top_front_left, top_front_right, top_back_center, top_back_left, top_back_right, wide_left, wide_right | The audio layout of the final merged file. See Option Details below.
allow_multiple_outputs | boolean | true, false | If the operation produces more than one output file, all of them are packed into a single compressed file by default. Set this option to true to get a download link for each file.

Option Details

  • input_filters: an array of objects, each containing the following fields:

    Property Name | Type | Possible Values | Description
    input_id | string | Pattern: ^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$ | The input_id of the file to which the filter is applied.
    filter_name | string | volume | The filter to apply to the file identified by input_id.
    filter_value | string | - | The value(s) to use with the selected filter, as a comma-separated list. For the volume filter, the value controls the loudness of the stream: 0 mutes the stream, 0.50 reduces it to half, 1 leaves it as in the original file, and 1.50 increases it by 50%.
  • metadata: an array of objects, each containing the following fields:

    Property Name | Type | Possible Values | Description
    input_id | string | Pattern: ^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$ | The input_id of the file for which the properties are set.
    language | string | Pattern: ^[a-z]{3}$ | The language (as an ISO 639-2/T code) of the stream contained in the input file identified by input_id.
  • audio_channels: This option allows the specification of the audio layout of the final merged file.

    The option layout allows you to predefine the layout of sound sources. Possible values are described in the next table.

    Option Value | Description
    mono | A single channel, with all audio being played through a single speaker or set of speakers.
    stereo | 2 channels, typically played through a pair of left and right speakers.
    2.1 | 2 channels for left and right audio, plus a separate channel for low-frequency effects (LFE) played through a subwoofer.
    3.0 | 3 channels, with left, right, and center audio played through separate speakers.
    3.0-back | This audio layout is similar to 3.0, but includes an additional back channel for rear audio.
    3.1 | 3 channels for left, right, and center audio, plus a separate channel for LFE played through a subwoofer.
    4.0 | 4 channels, with left, right, left-surround, and right-surround audio played through separate speakers.
    4.1 | 4 channels for left, right, left-surround, and right-surround audio, plus a separate channel for LFE played through a subwoofer.
    quad | This audio layout is similar to 4.0, but with the addition of a separate channel for rear audio.
    quad-side | This audio layout is similar to quad, but with the addition of side channels for left-side and right-side audio.
    5.0 | 5 channels, with left, right, center, left-surround, and right-surround audio played through separate speakers.
    5.0-side | This audio layout is similar to 5.0, but with the addition of side channels for left-side and right-side audio.
    5.1 | 5 channels for left, right, center, left-surround, and right-surround audio, plus a separate channel for LFE played through a subwoofer.
    5.1-side | This audio layout is similar to 5.1, but with the addition of side channels for left-side and right-side audio.
    6.0 | 6 channels, with left, right, center, left-surround, right-surround, and back audio played through separate speakers.
    6.0-front | This audio layout is similar to 6.0, but with the addition of front channels for left-front and right-front audio.
    hexagonal | 6 channels, with left, right, left-surround, right-surround, left-back, and right-back audio played through separate speakers.
    6.1 | 6 channels for left, right, center, left-surround, right-surround, and back audio, plus a separate channel for LFE played through a subwoofer.
    6.1-back | This audio layout is similar to 6.1, but with the back channel replaced by left-back and right-back channels.
    6.1-front | This audio layout is similar to 6.1, but with the back channel replaced by left-front and right-front channels.
    7.0 | 7 channels, with left, right, center, left-surround, right-surround, left-back, and right-back audio played through separate speakers.
    7.0-front | This audio layout is similar to 7.0, but with the addition of front channels for left-front and right-front audio.
    7.1 | 7 channels for left, right, center, left-surround, right-surround, left-back, and right-back audio, plus a separate channel for LFE played through a subwoofer.
    7.1-wide | This audio layout is similar to 7.1, but with the addition of wide channels for left-wide and right-wide audio.
    7.1-wide-side | This audio layout is similar to 7.1-wide, but with the addition of side channels for left-side and right-side audio.
    octagonal | 8 channels, with left, right, left-surround, right-surround, left-back, right-back, left-side, and right-side audio played through separate speakers.
    hexadecagonal | 16 channels, with left, right, center, left-surround, right-surround, left-back, right-back, left-side, right-side, left-front, right-front, left-wide, right-wide, top-front-center, top-back-center, and bottom-center audio played through separate speakers.

In addition, the audio_channels option lets you assign the individual channels of the layouts above.

For each channel, the value is the input_id of the file to use as the source for that channel.

The available channel keys of the audio_channels option are: back_center, back_left, back_right, front_center, front_left, front_left_of_center, front_right, front_right_of_center, low_frequency, side_left, side_right, top_center, top_front_center, top_front_left, top_front_right, top_back_center, top_back_left, top_back_right, wide_left, wide_right

{
    "conversion": [{
        "category": "operation",
        "target": "merge-streams",
        "options": {
            "audio_channels": {
                "layout": "3.0",
                "front_left": "{input_file_id_1}",
                "front_right": "{input_file_id_2}",
                "front_center": "{input_file_id_3}"
            }
        }
    }]
}
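
The input_filters and metadata options are not covered by the example above. As a hedged sketch, the Python snippet below builds both arrays and checks input_id and language against the patterns given in Option Details before they are used. The helper names are hypothetical, the second input_id is a made-up placeholder, and "eng" is used as an ISO 639-2/T code for English.

```python
import json
import re

INPUT_ID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)
LANGUAGE_RE = re.compile(r"^[a-z]{3}$")


def _check_input_id(input_id):
    if not INPUT_ID_RE.match(input_id):
        raise ValueError(f"invalid input_id: {input_id!r}")


def volume_filter(input_id, volume):
    """Filter entry: 0 mutes, 1 keeps the original level, 1.5 adds 50%."""
    _check_input_id(input_id)
    if volume < 0:
        raise ValueError("volume must not be negative")
    # filter_value is documented as a string, so the number is converted.
    return {"input_id": input_id, "filter_name": "volume", "filter_value": str(volume)}


def language_tag(input_id, language):
    """Metadata entry assigning a three-letter language code to a stream."""
    _check_input_id(input_id)
    if not LANGUAGE_RE.match(language):
        raise ValueError(f"invalid language code: {language!r}")
    return {"input_id": input_id, "language": language}


audio_id = "fe194466-3df5-4baf-8138-511d874e583e"
subtitle_id = "00000000-0000-4000-8000-000000000001"  # placeholder value

options = {
    "video_format": "mkv",
    "input_filters": [volume_filter(audio_id, 0.5)],
    "metadata": [language_tag(audio_id, "eng"), language_tag(subtitle_id, "eng")],
}
payload = {
    "conversion": [
        {"category": "operation", "target": "merge-streams", "options": options}
    ]
}
print(json.dumps(payload, indent=2))
```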

Operation thumbnail

The thumbnail operation allows you to create small versions of images or document pages, called thumbnails.
This can be useful if you want to display a large number of images or document pages in a small space.

Available Options

Option Name | Type | Possible Values | Description
thumbnail_target | string | png, jpg | The format of the thumbnail: Portable Network Graphics (PNG) or Joint Photographic Experts Group (JPEG).
width | integer | Range: 10-500 | The width of the thumbnail in pixels. It can be set on its own or together with height to control the thumbnail's aspect ratio.
height | integer | Range: 10-500 | The height of the thumbnail in pixels. It can be set on its own or together with width to control the thumbnail's aspect ratio.
dpi | integer | Range: 10-300 | The horizontal and vertical density of the thumbnail image. It determines the size of the image when it is printed or displayed on a high-resolution screen.
pages | string | Pattern: ^(((\d+|first)-?(\d*|last))|(first)|(last))(, ?(((\d+|first)-?(\d*|last))|(first)|(last)))*$ | The pages of a document to convert to thumbnails, which is useful when only certain pages are needed. Specify a single page number or a range of pages separated by a hyphen: for example, 1 creates a thumbnail of the first page, and 1-3 creates thumbnails of the first three pages. The aliases first and last are also accepted. This option applies only to document inputs.
allow_multiple_outputs | boolean | true, false | If the operation produces more than one output file, all of them are packed into a single compressed file by default. Set this option to true to get a download link for each file.
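
No request example is given for the thumbnail operation, so the Python sketch below assembles one under the same envelope used by the other operations. It checks width and height against the documented 10-500 range and the pages string against the documented pattern before building the payload. The function name thumbnail_job is hypothetical.

```python
import json
import re

# The pages pattern from the options table, split across two lines.
PAGES_RE = re.compile(
    r"^(((\d+|first)-?(\d*|last))|(first)|(last))"
    r"(, ?(((\d+|first)-?(\d*|last))|(first)|(last)))*$"
)


def thumbnail_job(target="png", width=None, height=None, pages=None):
    """Build a thumbnail payload, validating options against the table."""
    if target not in ("png", "jpg"):
        raise ValueError("thumbnail_target must be png or jpg")
    options = {"thumbnail_target": target}
    for name, value in (("width", width), ("height", height)):
        if value is not None:
            if not 10 <= value <= 500:
                raise ValueError(f"{name} must be between 10 and 500")
            options[name] = value
    if pages is not None:
        if not PAGES_RE.match(pages):
            raise ValueError(f"invalid pages value: {pages!r}")
        options["pages"] = pages
    return {
        "conversion": [
            {"category": "operation", "target": "thumbnail", "options": options}
        ]
    }


job = thumbnail_job(width=200, pages="first, 3-5, last")
print(json.dumps(job, indent=2))
```

Validating on the client side gives faster feedback than waiting for the request to be rejected, although the server remains the authority on what is accepted.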

Operation validate-pdfa

The validate-pdfa operation validates a PDF file's compliance with the PDF/A standard.
It checks the input file against the specified PDF/A profile to ensure that the file meets the requirements for long-term preservation and will render consistently over time, even if the software and hardware used to view it change.

Available Options

Option Name | Type | Possible Values | Description
validation_profile | string | pdfa1a, pdfa1b, pdfa2a, pdfa2b, pdfa2u, pdfa3a, pdfa3b, pdfa3u | The PDF/A profile against which the input file is checked for compliance.
allow_multiple_outputs | boolean | true, false | If the operation produces more than one output file, all of them are packed into a single compressed file by default. Set this option to true to get a download link for each file.

Option Details

Profile | Description
pdfa1a | PDF/A-1, conformance level A (accessible). Requires everything in pdfa1b, and additionally that the document be tagged with logical structure information and that text have Unicode mappings.
pdfa1b | PDF/A-1, conformance level B (basic). Requires that all fonts be embedded and that the document's visual appearance be fully self-contained.
pdfa2a | PDF/A-2, conformance level A. Similar to pdfa1a, but allows newer PDF features such as transparent objects and JPEG 2000 image compression.
pdfa2b | PDF/A-2, conformance level B. Similar to pdfa2a, but without the logical structure and Unicode requirements.
pdfa2u | PDF/A-2, conformance level U. Similar to pdfa2b, but additionally requires Unicode mappings for all text.
pdfa3a | PDF/A-3, conformance level A. Similar to pdfa2a, but allows files of any format to be embedded in the document.
pdfa3b | PDF/A-3, conformance level B. Similar to pdfa2b, but allows files of any format to be embedded in the document.
pdfa3u | PDF/A-3, conformance level U. Similar to pdfa2u, but allows files of any format to be embedded in the document.

{
  "conversion": [{
    "category": "operation",
    "target": "validate-pdfa",
    "options": {
      "validation_profile": "pdfa1b"
    }
  }]
}

Output Details

The output of this operation lists the parts of the PDF that do not conform to the specified profile. The example below shows the result of a validation against the pdfa2b profile.

{
    "profile": {
        "name": "PDF/A-2B validation profile",
        "description": "Validation rules against ISO 19005-2:2011, Level B"
    },
    "errors": [
        {
            "error": null,
            "rule": "The Catalog dictionary of a conforming file shall contain the Metadata key whose value is a metadata stream as defined in ISO 32000-1:2008, 14.3.2.",
            "location": "root/document[0]"
        },
        {
            "error": null,
            "rule": "DeviceRGB shall only be used if a device independent DefaultRGB colour space has been set when the DeviceRGB colour space is used.",
            "location": "root/document[0]/pages[0](1 0 obj PDPage)/contentStream[0](2 0 obj PDContentStream)/operators[6]/colorSpace[0]"
        }
    ]
}
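
A validation report like the one above can be condensed into a short summary for logs or a user interface. The Python sketch below counts the errors and groups their locations by rule. The function name is hypothetical, and treating an empty errors list as "compliant" is an assumption, since this document does not show the output for a compliant file.

```python
from collections import defaultdict


def summarize_report(report):
    """Count errors and group their locations by the violated rule."""
    locations_by_rule = defaultdict(list)
    for error in report.get("errors", []):
        locations_by_rule[error["rule"]].append(error["location"])
    return {
        "profile": report["profile"]["name"],
        "compliant": not locations_by_rule,  # assumption: no errors means compliant
        "error_count": sum(len(locs) for locs in locations_by_rule.values()),
        "locations_by_rule": dict(locations_by_rule),
    }


# Shortened version of the example report above.
report = {
    "profile": {
        "name": "PDF/A-2B validation profile",
        "description": "Validation rules against ISO 19005-2:2011, Level B",
    },
    "errors": [
        {"error": None, "rule": "Metadata key rule", "location": "root/document[0]"},
        {"error": None, "rule": "DeviceRGB rule", "location": "root/document[0]/pages[0]"},
    ],
}

summary = summarize_report(report)
print(summary["profile"], "-", summary["error_count"], "error(s)")
for rule, locations in summary["locations_by_rule"].items():
    print(f"  {rule}: {', '.join(locations)}")
```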