Odin
Odin is an open-source diagnosis framework for generic machine learning classification tasks and for computer vision object detection and instance segmentation tasks. It lets developers add meta-annotations to their datasets, compute performance metrics split by meta-annotation values, and visualize diagnosis reports.
Odin is agnostic to the training platform and input formats, and it can be extended with application- and domain-specific meta-annotations and metrics with almost no coding.
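
To make the idea of metrics split by meta-annotation values concrete, here is a minimal sketch that computes classification accuracy separately for each value of one meta-annotation. It is plain Python and does not use Odin's API; the record layout (`gt`, `pred`, `meta`), the `lighting` property, and the function name `accuracy_by_property` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: each record holds a ground-truth label, a predicted
# label, and the meta-annotations (property name -> value) of the sample.
samples = [
    {"gt": "cat", "pred": "cat", "meta": {"lighting": "day"}},
    {"gt": "dog", "pred": "cat", "meta": {"lighting": "night"}},
    {"gt": "dog", "pred": "dog", "meta": {"lighting": "day"}},
    {"gt": "cat", "pred": "cat", "meta": {"lighting": "night"}},
]


def accuracy_by_property(samples, prop):
    """Return {property value: accuracy} for one meta-annotation."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for s in samples:
        value = s["meta"][prop]
        total[value] += 1
        correct[value] += int(s["gt"] == s["pred"])
    return {value: correct[value] / total[value] for value in total}


print(accuracy_by_property(samples, "lighting"))  # {'day': 1.0, 'night': 0.5}
```

A noticeably lower score for one value (here, night-time images) is the kind of signal a per-property analysis is meant to surface.
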

Supported Tasks
Odin provides diagnosis methods for machine learning models that address one of the following tasks:
- Classification
  - requires an algorithm to assign the input data to categories
- Object Detection
  - requires an algorithm to determine which objects are present in an image and to localize them with bounding boxes
- Instance Segmentation
  - requires an algorithm to determine which objects are present in an image and to localize them with pixel-level segmentation masks

Supported Diagnosis Methods
The following tables summarize the evaluation metrics and diagnosis methods supported in Odin.
Evaluation Metrics

| Category | Metric | Binary Classification | Single-label Classification | Multi-label Classification | Object Detection | Instance Segmentation |
|---|---|---|---|---|---|---|
| Base Metrics | Accuracy | yes | yes | yes | n/a | n/a |
| | Error Rate | yes | yes | yes | n/a | n/a |
| | Precision | yes | yes | yes | yes | yes |
| | Recall | yes | yes | yes | yes | yes |
| | F1 Score | yes | yes | yes | yes | yes |
| | Average Precision | yes | yes | yes | yes | yes |
| | Precision-Recall AUC | yes | yes | yes | yes | yes |
| | ROC AUC | yes | yes | yes | n/a | n/a |
| | F1 AUC | yes | yes | yes | yes | yes |
| | Custom Metric | yes | yes | yes | yes | yes |
| Curves | Precision-Recall | yes | yes | yes | yes | yes |
| | F1 | yes | yes | yes | yes | yes |
| | ROC | yes | yes | yes | n/a | n/a |

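
Most of the base metrics above derive from counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The following is a minimal sketch, assuming the counts for a single class are already available; returning 0.0 when a denominator is zero is a convention chosen here, not necessarily Odin's. In object detection and instance segmentation there is no meaningful TN count, which is presumably why Accuracy, Error Rate, and the ROC-based entries are n/a in those columns.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of ground-truth positives that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all decisions that are correct (requires TN counts)."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def error_rate(tp: int, fp: int, fn: int, tn: int) -> float:
    """Complement of accuracy."""
    return 1.0 - accuracy(tp, fp, fn, tn)


# Hypothetical counts for one class.
tp, fp, fn, tn = 8, 2, 4, 6
print(precision(tp, fp), recall(tp, fn), f1_score(tp, fp, fn))
print(accuracy(tp, fp, fn, tn), error_rate(tp, fp, fn, tn))
```

With these counts, precision is 0.8, recall is 2/3, and F1 is 8/11, which also equals 2TP / (2TP + FP + FN).
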
Dataset Exploration

| Analysis | Scope | Binary Classification | Single-label Classification | Multi-label Classification | Object Detection | Instance Segmentation |
|---|---|---|---|---|---|---|
| Distribution of Classes | Total | yes | yes | yes | yes | yes |
| | Per-property | yes | yes | yes | yes | yes |
| Distribution of Properties | Total | yes | yes | yes | yes | yes |
| | Per-category | yes | yes | yes | yes | yes |
| Co-occurrence Matrix | Total | n/a | n/a | yes | yes | yes |

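
A co-occurrence matrix counts how often pairs of categories appear together in the same sample, which is only meaningful when a sample can carry several labels (hence n/a for binary and single-label classification). Below is a minimal sketch, assuming each sample is reduced to the set of categories it contains. Counting per image rather than per object instance, and storing per-category totals on the diagonal, are choices made here for illustration and are not Odin's documented behavior; `cooccurrence_matrix` is a hypothetical name.

```python
from itertools import combinations


def cooccurrence_matrix(label_sets, categories):
    """Symmetric matrix of how often two categories share a sample."""
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    matrix = [[0] * n for _ in range(n)]
    for labels in label_sets:
        present = sorted({index[c] for c in labels})
        for i in present:
            matrix[i][i] += 1  # diagonal: number of samples containing i
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


# Hypothetical multi-label annotations, one set of categories per image.
categories = ["car", "person", "bike"]
images = [{"car", "person"}, {"person", "bike"}, {"car"}, {"car", "person", "bike"}]
print(cooccurrence_matrix(images, categories))
# [[3, 2, 1], [2, 3, 2], [1, 2, 2]]
```
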
Model Analyses

| Analysis | Scope | Models Comparison | Binary Classification | Single-label Classification | Multi-label Classification | Object Detection | Instance Segmentation |
|---|---|---|---|---|---|---|---|
| Performance Analysis | Per-property | yes | yes | yes | yes | yes | yes |
| Sensitivity and Impact Analysis | Per-property | yes | yes | yes | yes | yes | yes |
| Distribution of TP | Total | yes | no | yes | yes | yes | yes |
| | Per-category | yes | yes | yes | yes | yes | yes |
| Distribution of FP | Total | yes | no | yes | yes | yes | yes |
| | Per-category | yes | yes | yes | yes | n/a | n/a |
| Distribution of FN | Total | yes | no | yes | yes | yes | yes |
| | Per-category | yes | yes | yes | yes | yes | yes |
| Distribution of TN | Total | yes | no | yes | yes | n/a | n/a |
| | Per-category | yes | yes | yes | yes | n/a | n/a |
| Confusion Matrix | Total | n/a | yes | yes | n/a | n/a | n/a |
| | Per-category | n/a | no | yes | yes | n/a | n/a |
| | Per-property | n/a | yes | yes | yes | n/a | n/a |
| FP Categorization and Impact* | Per-category | yes | no | yes | yes | yes | yes |
| FP Trend | Per-category | no | no | yes | yes | yes | yes |
| FN Categorization | Per-category | yes | no | yes | yes | yes | yes |
| Curve Analysis | Total | yes | yes | yes | yes | yes | yes |
| | Per-category | yes | no | yes | yes | yes | yes |
| Reliability Analysis | Total | n/a | yes | yes | yes | yes | yes |
| | Per-category | n/a | no | yes | yes | yes | yes |
| Top-1 Top-5 Analysis | Total | no | n/a | yes | n/a | n/a | n/a |
| | Per-property | no | n/a | yes | n/a | n/a | n/a |
| IoU Analysis | Per-category | no | n/a | n/a | n/a | yes | yes |
| Performance Summary | Total | yes | yes | yes | yes | yes | yes |
| | Per-category | yes | no | yes | yes | yes | yes |
| | Per-property | yes | yes | yes | yes | yes | yes |

\*Compared with the previous version, the way background errors are counted for localization problems has changed.
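
IoU Analysis, and the FP categorization used for localization tasks, rest on the Intersection over Union (IoU) between a predicted and a ground-truth region. Here is a minimal sketch for axis-aligned boxes; the `(x_min, y_min, x_max, y_max)` format, the 0.5 matching threshold, and the helper names `box_iou` and `is_match` are common conventions assumed for illustration, not values or functions taken from Odin. For instance segmentation the same ratio is computed over mask pixels instead of box areas.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, gt: Box, iou_threshold: float = 0.5) -> bool:
    """Treat a prediction as correctly localized if IoU reaches the threshold."""
    return box_iou(pred, gt) >= iou_threshold


print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
print(is_match((0, 0, 10, 10), (0, 0, 10, 5)))  # IoU = 50 / 100 = 0.5 -> True
```
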
Supported CAMs Methods
Evaluation Metrics

| Category | Metric | Binary Classification | Single-label Classification | Multi-label Classification | Object Detection | Instance Segmentation |
|---|---|---|---|---|---|---|
| CAMs Metrics | Global IoU | yes | yes | yes | n/a | n/a |
| | Component IoU | yes | yes | yes | n/a | n/a |
| | Irrelevant Attention | yes | yes | yes | n/a | n/a |
| | Bbox Coverage | yes | yes | yes | n/a | n/a |

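
Class Activation Maps (CAMs) highlight the image regions that drive a classifier's prediction. One plausible reading of Global IoU is the IoU between the thresholded CAM and the ground-truth region of the object, and the sketch below follows that reading. Min-max normalization, the 0.5 threshold, and the assumption that the CAM has already been resized to the mask's resolution are illustrative choices rather than Odin's documented definition; `global_cam_iou` is a hypothetical name.

```python
from typing import Sequence


def global_cam_iou(
    cam: Sequence[Sequence[float]],
    gt_mask: Sequence[Sequence[int]],
    threshold: float = 0.5,
) -> float:
    """IoU between a binarized CAM and a binary ground-truth mask.

    Assumes `cam` and `gt_mask` have the same shape.
    """
    values = [v for row in cam for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    inter = union = 0
    for cam_row, gt_row in zip(cam, gt_mask):
        for v, g in zip(cam_row, gt_row):
            # Min-max normalize the activation, then binarize it.
            active = span > 0 and (v - lo) / span >= threshold
            is_gt = bool(g)
            inter += int(active and is_gt)
            union += int(active or is_gt)
    return inter / union if union else 0.0


# Hypothetical 3x3 CAM and object mask.
cam = [
    [0.1, 0.9, 0.8],
    [0.0, 0.7, 0.2],
    [0.0, 0.1, 0.0],
]
mask = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
print(global_cam_iou(cam, mask))  # 0.75 (3 shared pixels / 4 in the union)
```
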
CAMs Analyses

| Analysis | Scope | Models Comparison | Binary Classification | Single-label Classification | Multi-label Classification | Object Detection | Instance Segmentation |
|---|---|---|---|---|---|---|---|
| CAMs Analysis | Total | yes | yes | yes | yes | n/a | n/a |
| | Per-category | no | yes | yes | yes | n/a | n/a |

Contributors
Piero Fraternali - piero.fraternali@polimi.it
Rocio Nahime Torres - rocionahime.torres@polimi.it
Federico Milani - federico.milani@polimi.it
Niccolò Zangrando - niccolo.zangrando@polimi.it