CERTH Video Analysis API Specification
Table of Contents
API description for the analysis of videos
Example: Video analysis request
API description for the analysis of images
Example: image collection – zip file request
Example: image collection – individual URLs request
Checking the Status of an Analysis
Example: Checking the status of a video analysis
Example: Retrieving the results of a video analysis
Example: Retrieving the results of an image collection analysis
API description for the analysis of text
Overview
In case of issues or questions, please do not hesitate to contact us via:
| CERTH | Ioannis Kontostathis, Damianos Galanopoulos, Vasileios Mezaris | <ioankont@iti.gr>, <dgalanop@iti.gr>, <bmezaris@iti.gr> |
| :--- | :--- | :--- |
The CERTH Video Analysis API extracts metadata from videos, images and text. It generates structured descriptions (also known as semantic metadata) that help systems interpret and analyze multimedia content. The API communicates with the webLyzard API over secure HTTPS requests, which enables the exchange of data and supports advanced content understanding. To access the API, please contact us to obtain a user key, which is required for authentication.
Video Analysis Process
The analysis of videos is conducted at three temporal levels: scene, shot, and the whole video. Each level focuses on a different granularity of content segmentation and understanding.
Temporal Segmentation
- Shots: Continuous segments recorded by a single camera without interruption. They are the smallest temporal units in video segmentation.
- Scenes: Larger segments that consist of one or more shots. They are both semantically and temporally coherent, often aligning with meaningful parts of the video’s narrative.
Analysis by Level
Video-Level Analysis
- Visual event detection – Identifies major activities throughout the entire video.
- Sentiment analysis – Analyzes the emotional content expressed in the video.
- Video summarization – Generates a condensed version of the video by selecting the most important segments.
- Scene segmentation – Automatically divides the video into semantically meaningful scenes.
Scene-Level Analysis
- Shot segmentation – Detects and separates individual shots within a scene.
- Keyframe extraction – Selects representative frames that summarize the main content of each scene.
Shot-Level Analysis
- Keyframe extraction – Identifies important frames that best represent the content of a shot.
- Visual concept detection – Recognizes visual elements such as objects, scenes, or activities.
- Sentiment analysis – Evaluates emotional content at a finer (shot) level.
- Cross-modal signature extraction – Extracts features that link visual content with other modalities such as text, for multimodal analysis and retrieval.
360-Degree Video Analysis {#360-degree-video-analysis}
For panoramic 360-degree videos, the API:
- Detects the most important regions across the entire view.
- Converts the spherical video into a traditional 2D format, including the salient events.
- Applies standard video analysis techniques, as described above.
Image Analysis Process
The CERTH Video Analysis API also supports the analysis of individual images. The following methods (also described above) are available:
- Visual concept detection
- Sentiment analysis
- Cross-modal signature extraction
Text Analysis Process
The CERTH Video Analysis API supports the analysis of short text inputs, such as textual queries or captions. It generates feature representations of the text that can be compared or aligned with visual data, enabling multimodal content understanding and retrieval.
API description for the analysis of videos
To request the analysis of a video, submit a POST call to the following endpoint: https://transmixr-idt.iti.gr/video-annotation.
The URL of the video and the user key must be included as parameters in the body of the request in JSON format. You can use tools like Postman to easily create and send the request. The service supports videos from various online platforms and social media such as YouTube, Facebook, Twitter, Vimeo etc. as well as custom URLs. The reply to the request is a JSON document that includes a short message and, if the request was valid, an item id that uniquely identifies the video and is later used to retrieve the status and the results.
Path | https://transmixr-idt.iti.gr/video-annotation | |
---|---|---|
Method | POST | |
Parameters: | <video_url>: the URL of the video to be processed; <user_key>: a unique 32-character access key that allows access to the service | |
Returns: | JSON file | |
Status codes | 200, 403, 404, 409 | |
Attributes: | ||
Message | A short response message. It can be one of the following: | |
Status Code | Message | |
200 | The REST call has been received. Please check the status of the analysis via the appropriate REST call Explanation: The request was valid and video processing has started | |
403 | Not valid user key Explanation: The user_key used in the parameters is not valid | |
403 | Limit exceeded. Try again later Explanation: The maximum total duration of videos submitted for processing in the last 10 hours was exceeded | |
404 | The video URL is broken Explanation: The content of the video_url parameter is not valid | |
404 | XX : Error. No such mode Explanation: The service supports specific analysis modes, such as “video-annotation”. This error message indicates that the requested mode does not exist. | |
404 | Bad request Explanation: The request was not correctly formed | |
409 | Video is currently being processed Explanation: The request sent is currently being processed | |
409 | Video already in queue Explanation: The video has already been sent and is waiting in the queue | |
<item_id> | A unique identifier for the video sent. It is used later to retrieve the status and the results |
Example: Video analysis request
In this example, we submit a request to analyze the following YouTube video:
https://www.youtube.com/watch?v=RzGS74FsGG0
Upon successful submission, the API responds with a confirmation message indicating that the request has been received. The response also includes an <item_id>, which serves as a unique identifier for the analysis request.
You will use this <item_id> in subsequent requests to:
- Check the status of the analysis
- Retrieve the final results once processing is complete
Request | POST call to: https://transmixr-idt.iti.gr/video-annotation Body: {"video_url": "https://www.youtube.com/watch?v=RzGS74FsGG0", "user_key": "Z0g*****************************"} | |
---|---|---|
Reply | Status code: 200 { "message": "The REST call has been received. Please check the status of the analysis via the appropriate REST call", "video_id": "0f025ca87779280a9c9806018d2fd4fd", "item_id": "0f025ca87779280a9c9806018d2fd4fd" } |
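The request above can also be built programmatically. The following is a minimal Python sketch using only the standard library. The helper name `build_video_request` is hypothetical, the user key is a placeholder, and the `Content-Type` header is an assumption (the specification only states that the parameters are sent as JSON in the body). The sketch constructs the request without sending it.

```python
import json
import urllib.request

VIDEO_ENDPOINT = "https://transmixr-idt.iti.gr/video-annotation"


def build_video_request(video_url: str, user_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request described above."""
    body = json.dumps({"video_url": video_url, "user_key": user_key}).encode("utf-8")
    return urllib.request.Request(
        VIDEO_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_video_request(
    "https://www.youtube.com/watch?v=RzGS74FsGG0",
    "YOUR_USER_KEY",  # placeholder: use the key provided by CERTH
)

# To actually send it (requires network access and a valid key):
# with urllib.request.urlopen(request) as response:
#     reply = json.loads(response.read())
#     item_id = reply["item_id"]
```

Keeping request construction separate from sending makes it easy to inspect the body before any call is made against the usage limits.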
API description for the analysis of images
To submit an image or a collection of images for analysis, send a POST request to the following endpoint:
https://transmixr-idt.iti.gr/image-annotation.
The image / image collection can be either a zip file containing one or more images or individual image URLs that can be downloaded. In the case of the zip file, the URL of the zip file should be provided in the “zip_url” parameter. In the case of individual image URLs, a list of the URLs should be provided in the “image_urls” parameter. In both cases the image collection can consist of between 1 and 1000 images.
Path | https://transmixr-idt.iti.gr/image-annotation | |
---|---|---|
Method | POST | |
Parameters: | <zip_url>: the URL of the image collection (zip file) to be processed; <image_urls>: a list of URLs of the images to be processed (only one of the zip_url and image_urls parameters should be used); <user_key>: a unique 32-character access key that allows access to the service | |
Returns: | JSON file | |
Status codes | 200, 403, 404, 409 | |
Attributes: | ||
Message | A short response message. It can be one of the following: | |
Status Code | Message | |
200 | The REST call has been received. Please check the status of the analysis via the appropriate REST call Explanation: The request was valid and image processing has started | |
403 | Not valid user key Explanation: The user_key used in the parameters is not valid | |
404 | XX : Error. No such mode Explanation: The service supports specific analysis modes, such as “image-annotation”. This error message indicates that the requested mode does not exist. | |
404 | Bad request Explanation: The request was not correctly formed | |
409 | Image collection is currently being processed Explanation: The request sent is currently being processed | |
409 | Image collection already in queue Explanation: The image collection has already been sent and is waiting in the queue | |
<item_id> | A unique identifier for the image collection sent. It is used later to retrieve the status and the results |
Table X: Overview of the image annotation API endpoint, including request method, parameters, and response status codes with explanations.
Example: image collection – zip file request {#example:-image-collection-–-zip-file-request}
In this example we request the analysis of an image collection in a zip file.
Request | POST https://transmixr-idt.iti.gr/image-annotation Body: {"zip_url": "https://example.com/collection.zip", "user_key": "Z0g*****************************"} | |
---|---|---|
Reply | Status code: 200 { "message": "The REST call has been received. Please check the status of the analysis via the appropriate REST call", "item_id": "716797cdedd2d4f577f3503d09a8" } |
Table X: Example of a POST request to submit an image collection (ZIP file) for analysis, and the corresponding successful response.
Example: image collection – individual URLs request {#example:-image-collection-–-individual-urls-request}
In this example we request the analysis of an image collection in individual URLs. The collection consists of 3 images.
Request | POST https://transmixr-idt.iti.gr/image-annotation Body: {"image_urls": ["https://example.com/image100.jpg", "https://example.com/image101.jpg", "https://example.com/image102.jpg"], "user_key": "Z0g*****************************"} | |
---|---|---|
Reply | Status code: 200 { "message": "The REST call has been received. Please check the status of the analysis via the appropriate REST call", "item_id": "716797cdedd2d4f577f3503d09a8" } |
Table X: Example of a POST request to submit individual image URLs for analysis, and the corresponding successful response.
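The two submission variants differ only in the request body. As a rough sketch, the rule that exactly one of zip_url and image_urls is supplied, and that a URL list holds between 1 and 1000 entries, can be checked on the client before sending. The function name `build_image_payload` is hypothetical, and the client-side validation is an illustration rather than part of the service.

```python
from typing import Optional, Sequence

MAX_IMAGES = 1000  # upper bound on collection size stated in this specification


def build_image_payload(
    user_key: str,
    zip_url: Optional[str] = None,
    image_urls: Optional[Sequence[str]] = None,
) -> dict:
    """Return the JSON body for /image-annotation, using exactly one source."""
    if (zip_url is None) == (image_urls is None):
        raise ValueError("Provide exactly one of zip_url or image_urls")
    if zip_url is not None:
        # The number of images inside the zip is not checked here.
        return {"zip_url": zip_url, "user_key": user_key}
    urls = list(image_urls)
    if not 1 <= len(urls) <= MAX_IMAGES:
        raise ValueError(f"Expected 1 to {MAX_IMAGES} image URLs, got {len(urls)}")
    return {"image_urls": urls, "user_key": user_key}


payload = build_image_payload(
    "YOUR_USER_KEY",  # placeholder
    image_urls=["https://example.com/image100.jpg", "https://example.com/image101.jpg"],
)
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of the POST request.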
Checking the Status of an Analysis
The second step in the workflow is to check the status of the submitted analysis. To request the status, submit a GET call to https://transmixr-idt.iti.gr/status/<item_id>, where <item_id> is the item_id returned by the submission request. Because the analysis may take some time to complete, the status should be checked periodically. Some status messages are temporary and some are final; continue polling the status endpoint until a final status is returned. The table below lists all possible status messages and marks each one as temporary or final.
Path | https://transmixr-idt.iti.gr/status/<item_id> | |
---|---|---|
Method | GET | |
Returns: | JSON file | |
Status codes | 200, 404 | |
Attributes: | ||
Message | The status of the analysis. It can be one of the following: | |
Status Code | Message | |
200 | VIDEO_WAITING_IN_QUEUE Explanation: The video is waiting to be processed in the queue Type: Temporary | |
200 | ITEM_WAITING_IN_QUEUE Explanation: The image collection is waiting to be processed in the queue Type: Temporary | |
200 | VIDEO_DOWNLOAD_STARTED Explanation: Downloading of the video has been initiated Type: Temporary | |
200 | IMAGE_COLLECTION_DOWNLOAD_STARTED Explanation: Downloading of the image collection has been initiated Type: Temporary | |
200 | VIDEO_DOWNLOAD_FAILED Explanation: Video downloading has failed Type: Final | |
200 | VIDEO_DOWNLOAD_TIMEOUT Explanation: The video was taking too long to download so downloading was cancelled Type: Final | |
200 | MAX_VIDEO_DURATION_EXCEEDED Explanation: There is a 1-hour limit on the duration of a video that can be submitted Type: Final | |
200 | VIDEO_ANALYSIS_STARTED Explanation: The analysis of the video has started Type: Temporary | |
200 | IMAGE_COLLECTION_ANALYSIS_STARTED Explanation: The image collection analysis has started Type: Temporary | |
200 | VIDEO_ANALYSIS_COMPLETED Explanation: The analysis of the video has completed successfully Type: Final | |
200 | IMAGE_COLLECTION_ANALYSIS_COMPLETED Explanation: The image collection analysis has completed successfully Type: Final | |
200 | VIDEO_ANALYSIS_FAILED Explanation: The analysis of the video has failed Type: Final | |
200 | IMAGE_COLLECTION_ANALYSIS_FAILED Explanation: The image collection analysis has failed Type: Final | |
400 | Wrong file name or status file does no longer exist Explanation: No status of the requested item_id exists | |
<item_id> | A unique identifier for the video sent. It is used later to retrieve the status and the results |
Table X: Detailed summary of the analysis status check API, showing the request method, required parameters, and possible response status codes along with their explanations.
Example: Checking the status of a video analysis
In this example we request the status of the previously submitted video. The reply informs us that the analysis has started.
Request | GET https://transmixr-idt.iti.gr/status/716797cdedd2d4f577f3503d09a8 | |
---|---|---|
Reply | Status code: 200 { "status": "VIDEO_ANALYSIS_STARTED" } |
Table X: Example of a GET request to check the status of a video analysis, along with the corresponding successful response.
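Polling until a final status is returned can be sketched as follows. The set of final statuses is copied from the status table (names written without embedded spaces). The function names, the 30-second interval, and the cap on attempts are assumptions chosen for illustration; the specification does not prescribe a polling interval.

```python
import json
import time
import urllib.request
from typing import Callable

STATUS_ENDPOINT = "https://transmixr-idt.iti.gr/status/"

# Statuses marked "Final" in the status table; all others are temporary.
FINAL_STATUSES = {
    "VIDEO_DOWNLOAD_FAILED",
    "VIDEO_DOWNLOAD_TIMEOUT",
    "MAX_VIDEO_DURATION_EXCEEDED",
    "VIDEO_ANALYSIS_COMPLETED",
    "VIDEO_ANALYSIS_FAILED",
    "IMAGE_COLLECTION_ANALYSIS_COMPLETED",
    "IMAGE_COLLECTION_ANALYSIS_FAILED",
}


def fetch_status(item_id: str) -> str:
    """GET the status document and return its "status" field."""
    with urllib.request.urlopen(STATUS_ENDPOINT + item_id) as response:
        return json.loads(response.read())["status"]


def wait_for_final_status(
    item_id: str,
    fetch: Callable[[str], str] = fetch_status,
    interval_s: float = 30.0,  # assumed polling interval
    max_polls: int = 240,      # assumed cap on attempts
) -> str:
    """Poll until a final status is returned or the attempt cap is reached."""
    for attempt in range(max_polls):
        status = fetch(item_id)
        if status in FINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"No final status for {item_id} after {max_polls} polls")
```

Passing the fetch function as a parameter lets the loop be exercised without network access, for example with a stub that returns a fixed sequence of statuses.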
Retrieving the results
The final step of the workflow is the retrieval of the results. Once the results have been created, they can be retrieved for 48 hours. After this point they are no longer available on our server. To get the result you should issue a GET call to https://transmixr-idt.iti.gr/result/<item_id>, where <item_id> is the identifier of your video/image collection.
Please note that the returned JSON contains only the URLs of the generated video summary, the video thumbnails, and the keyframes. The files themselves must be downloaded separately if needed.
Path | https://transmixr-idt.iti.gr/result/<item_id> | |
---|---|---|
Method | GET | |
Returns | JSON file |
Status codes | 200, 404 |
Table X: Overview of the analysis results retrieval endpoint, including request path, method, return type, and possible status codes
Reply | Status code: 200 { "expires_at": <local_expiration_date>, "framerate": <video_frame_rate>, "generated_at": <local_creation_date>, "generated_by": "http://transmixr-idt.iti.gr", "scenes": [ { "scene_id": <scene_1_id>, "begintime": <begintime>, "endtime": <endtime>, "keyframes": [ { "time": <keyframe_1_time>, "url": <keyframe_1_url> }, { "time": <keyframe_2_time>, "url": <keyframe_2_url> }, … { "time": <keyframe_5_time>, "url": <keyframe_5_url> } ], "shots": [ { "shot_id": <shot_1_id>, "begintime": <shot_1_begintime>, "endtime": <shot_1_endtime>, "concepts": { <concept_1_id>: <concept_1_score>, <concept_2_id>: <concept_2_score>, … <concept_30_id>: <concept_30_score> }, "keyframes": [ { "time": <keyframe_1_time>, "url": <keyframe_1_url> }, { "time": <keyframe_2_time>, "url": <keyframe_2_url> }, { "time": <keyframe_3_time>, "url": <keyframe_3_url> } ], "sentiment": [ <sentiment_id>, <sentiment_score> ], "signature": [ <signature_1_score>, <signature_2_score>, … <signature_2048_score> ] } ] } ], "events": { <event_1_id>: <event_1_score>, <event_2_id>: <event_2_score>, … <event_10_id>: <event_10_score> }, "sentiment": [ <sentiment_id>, <sentiment_score> ], "summary": <video_summary_url>, "thumbnails": [ <thumb_url_1>, <thumb_url_2>, <thumb_url_3>, <thumb_url_4>, <thumb_url_5> ], "version": "v1.1" } |
---|---|
Table X: Structure of the JSON response returned by the analysis results endpoint.
Example: Retrieving the results of a video analysis
In this example, the results of a video analysis are retrieved. The returned JSON document contains multiple layers of information:
- Top-level fields include general metadata such as the video’s frame rate (framerate), the generation timestamp (generated_at), and the URL of the generated video summary (summary).
- The scenes array contains the list of detected scenes. For each scene, the following attributes are provided:
  - scene_id: A unique identifier for the scene.
  - begintime and endtime: The start and end time of the scene, in seconds.
  - keyframes: A list of keyframes with their timestamps (time) and URLs (url).
  - shots: A list of shots contained within the scene.
- For each shot, the JSON includes:
  - shot_id: A unique identifier for the shot.
  - begintime and endtime: The start and end time of the shot.
  - keyframes: A list of up to three keyframes, each with a time and url.
  - concepts: The top 30 visual concepts detected in the shot, each associated with a confidence score (the higher the score, the more relevant the concept).
  - sentiment: A pair consisting of a label (positive or negative) and a numerical score between 0 and 1 (higher values indicate more positive sentiment).
  - signature: A list of floating-point values representing a visual descriptor of the shot, based on the middle keyframe.
- At the video level:
  - sentiment: The overall sentiment for the video, in the same label-and-score form.
  - events: A set of predicted visual events, each associated with a confidence score.
  - thumbnails: A list of URLs pointing to representative thumbnails from the video.
Request | GET https://transmixr-idt.iti.gr/result/716797cdedd2d4f577f3503d09a8 |
---|---|
Reply | Status code: 200 { "expires_at": "2020-07-15 10:51:13.304437", "framerate": 25.000, "generated_at": "2020-07-01 10:51:13.304426", "generated_by": "https://transmixr-idt.iti.gr", "scenes": [ { "scene_id": "Sc1", "begintime": 0.040, "endtime": 53.800, "keyframes": [ { "time": 9.000, "url": "https://transmixr-idt.iti.gr/keyframe/716797cdedd2d4f577f3503/shot4_1" }, … ], "shots": [ { "shot_id": "Sh1", "begintime": 0.040, "endtime": 1.400, "concepts": { "Walking": 0.004, "Smartphone": 0.005, … }, "keyframes": [ { "time": 0.360, "url": "https://transmixr-idt.iti.gr/keyframe/716797cdedd2d4f577f3503/shot1_1" }, … ], "sentiment": [ "positive", "0.866345" ], "signature": [ 0.05675, 0.00458, … ] }, … ] }, … ], "sentiment": [ "negative", "0.1745" ], "events": { "Cleaning windows": -8.943937301635742, "Discus throw": -6.931890487670898, … }, "summary": "https://transmixr-idt.iti.gr:443/summary/716797cdedd2d4f577f3503", "thumbnails": [ "https://transmixr-idt.iti.gr:443/thumbnail/716797cdedd2d4f577f3503/1", "https://transmixr-idt.iti.gr:443/thumbnail/716797cdedd2d4f577f3503/2", … "https://transmixr-idt.iti.gr:443/thumbnail/716797cdedd2d4f577f3503/5" ], "version": "v1.1" } |
Table X: Example of a JSON response returned by a GET request for video analysis results.
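Once the result has been parsed into a dictionary, the nested scenes and shots can be traversed as in the sketch below. The helper names `iter_shots` and `top_concepts` are hypothetical, and the sample data is a trimmed-down version of the example reply rather than a complete result.

```python
from typing import Iterator


def iter_shots(result: dict) -> Iterator[tuple]:
    """Yield (scene_id, shot) pairs from a video analysis result."""
    for scene in result.get("scenes", []):
        for shot in scene.get("shots", []):
            yield scene["scene_id"], shot


def top_concepts(shot: dict, k: int = 3) -> list:
    """Return the k highest-scoring (concept, score) pairs of a shot."""
    ranked = sorted(shot.get("concepts", {}).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


# Trimmed-down result using values from the example reply.
sample = {
    "scenes": [
        {
            "scene_id": "Sc1",
            "shots": [
                {
                    "shot_id": "Sh1",
                    "begintime": 0.040,
                    "endtime": 1.400,
                    "concepts": {"Walking": 0.004, "Smartphone": 0.005},
                }
            ],
        }
    ]
}

for scene_id, shot in iter_shots(sample):
    print(scene_id, shot["shot_id"], top_concepts(shot, k=1))
```

The same traversal can be used to collect keyframe URLs or shot signatures for later download or comparison.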
Example: Retrieving the results of an image collection analysis
This example demonstrates how to retrieve the results of an image collection analysis. The returned JSON document includes a list of analyzed images, each identified by its file name (e.g., “img1003.jpg”). For each image, the following information is provided:
- concepts – A list of the top 30 visual concepts detected.
- sentiment – The sentiment associated with the image.
- signature – The cross-modal signature.
- dimensions – The width and height of the image.
- analysis – A boolean indicating whether the image was successfully analyzed.
If an image could not be analyzed (for example, due to a download failure), the analysis field is set to false and the concepts and sentiment fields are omitted.
The format and content of the results are the same as those returned for video analysis.
Request | GET https://transmixr-idt.iti.gr/result/716797cdedd2d4f577f3503d09a8 |
---|---|
Reply | Status code: 200 { "img1003.jpg": { "analysis": "True", "concepts": { "yt8m_top30": { "Animal": 0.010635197162628174, "Boat": 0.9999825954437256, … } }, "sentiment": [ "negative", "0.078937" ], "dimensions": "150x200", "signature": [ 0.10345, 0.02353, … ] }, … } |
Table X: Example of a JSON response returned by a GET request for image collection analysis results.
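A client will typically want to separate the images that were analyzed from those that were not. The sketch below does this using the analysis field. Because the field is described as a boolean but appears as the string "True" in the example reply, the hypothetical helper `is_analyzed` accepts both forms; this tolerance is an assumption, not documented behaviour. The second sample entry is invented for illustration.

```python
def is_analyzed(entry: dict) -> bool:
    """Interpret the "analysis" flag, accepting a boolean or the string "True"."""
    flag = entry.get("analysis", False)
    if isinstance(flag, str):
        return flag.strip().lower() == "true"
    return bool(flag)


def split_by_analysis(result: dict) -> tuple:
    """Return (analyzed, failed) lists of image file names."""
    analyzed = [name for name, entry in result.items() if is_analyzed(entry)]
    failed = [name for name, entry in result.items() if not is_analyzed(entry)]
    return analyzed, failed


sample = {
    "img1003.jpg": {"analysis": "True", "dimensions": "150x200"},
    "img1004.jpg": {"analysis": "False"},  # hypothetical failed image
}
analyzed, failed = split_by_analysis(sample)
```

Images in the failed list can then be resubmitted or reported, while the others are safe to read concepts and sentiment from.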
API description for the analysis of text
Text processing consists of signature extraction only. In contrast to the image and video analysis, which is asynchronous, text signature extraction is synchronous: the result (the signature vector) is returned directly in the response to the request.
Path | https://transmixr-idt.iti.gr/text-sign-extr | ||
---|---|---|---|
Method | POST | ||
Parameters: | <user_key>: a unique 32-character access key that allows access to the service; <text>: the text to be processed | ||
Returns | JSON file | ||
Status code | Data | ||
200 | {"signature": <signature vector>} Explanation: The returned signature vector, which is a list of 2048 floating-point numbers | ||
403 | {"message": "Not valid user_key."} Explanation: The user_key used in the parameters is not valid | ||
404 | {"message": "Bad request"} Explanation: The request was not correctly formed |
Table X: Overview of the signature extraction API endpoint, including request details, parameters, and possible response status codes with explanations.
Request | POST https://transmixr-idt.iti.gr/text-sign-extr Body: {"text": "A group of people crossing a forest", "user_key": "Z0g*****************************"} |
---|---|
Reply | Status code: 200 { "signature": [0.00345, 0.04586, 0.01345, …] } |
Table X: Example of a POST request for text-based signature extraction and the corresponding successful response.
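Since text signatures and the signatures of shots and images are intended to be compared with one another, a text query can be used to rank visual items. The sketch below uses cosine similarity, which is an assumption: this specification does not state which similarity measure should be used. The function names are hypothetical, and the vectors are toy 3-dimensional values rather than real 2048-dimensional signatures.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Signatures must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(text_signature: Sequence[float], visual_signatures: dict) -> list:
    """Sort item ids by decreasing similarity to the text signature."""
    scores = {
        item_id: cosine_similarity(text_signature, signature)
        for item_id, signature in visual_signatures.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy 3-dimensional vectors; real signatures have 2048 values.
query = [1.0, 0.0, 0.0]
shots = {"Sh1": [0.0, 1.0, 0.0], "Sh2": [0.9, 0.1, 0.0]}
ranking = rank_by_similarity(query, shots)
```

In this toy example "Sh2" ranks first because its vector points in nearly the same direction as the query.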