Data Models for Dataset Drift Controls in Machine Learning With Optical Images
Luis Oala, Marco Aversa, Gabriel Nobis, Kurt Willis, Yoan Neuenschwander, Michèle Buck, Christian Matek, Jerome Extermann, Enrico Pomarico, Wojciech Samek, and others
Transactions on Machine Learning Research (TMLR), 2023
DiffInfinite: Large Mask-Image Synthesis via Parallel Random Patch Diffusion in Histopathology
Marco Aversa, Gabriel Nobis, Miriam Hägele, Kai Standvoss, Mihaela Chirica, Roderick Murray-Smith, Ahmed Alaa, Lukas Ruff, Daniela Ivanova, Wojciech Samek, and others
arXiv preprint arXiv:2306.13384, 2023
Localized Data Work as a Precondition for Data-Centric ML: A Case Study of Full Lifecycle Crop Disease Identification in Ghana
Darlington Akogo, Issah Samori, Cyril Akafia, Harriet Fiagbor, Andrews Kangah, Donald Kwame Asiedu, Kwabena Fuachie, and Luis Oala
arXiv preprint arXiv:2307.01767, 2023
2022
Dataset Similarity to Assess Semi-supervised Learning Under Distribution Mismatch Between the Labelled and Unlabelled Datasets
Saul Calderon Ramirez, Luis Oala, Jordina Torrentes-Barrena, Shengxiang Yang, David Elizondo, Armaghan Moemeni, Simon Colreavy-Donnelly, Wojciech Samek, Miguel Molina-Cabello, and Ezequiel Lopez-Rubio
IEEE Transactions on Artificial Intelligence, 2022
Deutsche Normungsroadmap Künstliche Intelligenz (German Standardization Roadmap on Artificial Intelligence)
Rasmus Adler, Andreas Bunte, Simon Burton, Jürgen Großmann, Alexander Jaschke, Philip Kleen, Jeanette Miriam Lorenz, Jackie Ma, Karla Markert, Henri Meeß, and others
2022
Machine Learning for Health (ML4H) 2022
Antonio Parziale, Monica Agrawal, Shengpu Tang, Kristen Severson, Luis Oala, Adarsh Subbaswamy, Sayantan Kumar, Elora Schoerverth, Stefan Hegselmann, Helen Zhou, and others
In Machine Learning for Health, 2022
Machine Learning for Health symposium 2022 - Extended Abstract track
Antonio Parziale, Monica Agrawal, Shalmali Joshi, Irene Y Chen, Shengpu Tang, Luis Oala, and Adarsh Subbaswamy
arXiv preprint arXiv:2211.15564, 2022
Piloting A Survey-Based Assessment of Transparency and Trustworthiness with Three Medical AI Tools
Jana Fehr, Giovanna Jaramillo-Gutierrez, Luis Oala, Matthias I Gröschel, Manuel Bierwirth, Pradeep Balachandran, Alixandro Werneck-Leite, and Christoph Lippert
Healthcare, 2022
Proceedings of the 2nd Machine Learning for Health symposium
Antonio Parziale, Monica Agrawal, Shalmali Joshi, Irene Y Chen, Shengpu Tang, Luis Oala, Adarsh Subbaswamy, and others
Proceedings of Machine Learning Research, 2022
2021
ICLR
Post-Hoc Domain Adaptation via Guided Data Homogenization
Kurt Willis, and Luis Oala
In ICLR 2021 Workshop on Robust and Reliable Machine Learning in the Real World (RobustML), 2021
Addressing shifts in data distributions is an important prerequisite for the deployment of deep learning models to real-world settings. A general approach to this problem involves the adjustment of models to a new domain through transfer learning. However, in many cases, this is not applicable in a post-hoc manner to deployed models and further parameter adjustments jeopardize safety certifications that were established beforehand. In such a context, we propose to deal with changes in the data distribution via guided data homogenization which shifts the burden of adaptation from the model to the data. This approach makes use of information about the training data contained implicitly in the deep learning model to learn a domain transfer function. This allows for a targeted deployment of models to unknown scenarios without changing the model itself. We demonstrate the potential of data homogenization through experiments on the CIFAR-10 and MNIST data sets.
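For intuition, here is a minimal, hypothetical sketch of the data homogenization idea in PyTorch: a small transfer network is fitted so that the frozen model's activation statistics on transformed data match the BatchNorm running statistics stored in the model, one concrete way to exploit training-data information that is implicit in a trained network. The architecture, loss, and hyperparameters below are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    model = resnet18(weights=None).eval()  # stands in for a deployed, frozen model
    for p in model.parameters():
        p.requires_grad_(False)

    # g: the learned domain transfer function (architecture is an assumption)
    g = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 3, 3, padding=1),
    )
    opt = torch.optim.Adam(g.parameters(), lr=1e-3)

    def stat_mismatch(x):
        """Distance between activation statistics on x and the frozen
        model's stored BatchNorm running statistics."""
        terms, hooks = [], []
        def make_hook(bn):
            def hook(module, inp, out):
                f = inp[0]
                mu = f.mean(dim=(0, 2, 3))
                var = f.var(dim=(0, 2, 3), unbiased=False)
                terms.append(((mu - bn.running_mean) ** 2).mean()
                             + ((var - bn.running_var) ** 2).mean())
            return hook
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                hooks.append(m.register_forward_hook(make_hook(m)))
        model(x)
        for h in hooks:
            h.remove()
        return torch.stack(terms).sum()

    x_shifted = torch.rand(8, 3, 32, 32)  # stand-in for shifted-domain data
    for _ in range(100):
        opt.zero_grad()
        loss = stat_mismatch(g(x_shifted))  # adapt the data, not the model
        loss.backward()
        opt.step()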
ICLR
More Than Meets The Eye: Semi-supervised Learning Under Non-IID Data
Saul Calderon-Ramirez, and Luis Oala
In ICLR 2021 Workshop on Robust and Reliable Machine Learning in the Real World (RobustML), 2021
In the absence of explicit knowledge about data models, it is often assumed that a semantic match between labelled and unlabelled data is desirable in the context of semi-supervised deep learning (SSDL). In this work, we demonstrate the limits of semantic data set matching. We present and make available a comprehensive simulation sandbox, called non-IID-SSDL, for stress testing SSDL algorithms under various non-IID configurations. We show that semantic data set matching can degrade the performance of state-of-the-art SSDL. In addition, we demonstrate that simple dissimilarity measures in the feature space of a generic classifier offer a promising and more reliable matching criterion.
BVM
Interval Neural Networks as Instability Detectors for Image Reconstructions
Jan Macdonald, Maximilian März, Luis Oala, and Wojciech Samek
In Bildverarbeitung für die Medizin (BVM), 2021
This work investigates the detection of instabilities that may occur when utilizing deep learning models for image reconstruction tasks. Although neural networks often empirically outperform traditional reconstruction methods, their usage for sensitive medical applications remains controversial. Indeed, in a recent series of works, it has been demonstrated that deep learning approaches are susceptible to various types of instabilities, caused for instance by adversarial noise or out-of-distribution features. It is argued that this phenomenon can be observed regardless of the underlying architecture and that there is no easy remedy. Based on this insight, the present work demonstrates how uncertainty quantification methods can be employed as instability detectors. In particular, it is shown that the recently proposed Interval Neural Networks are highly effective in revealing instabilities of reconstructions. Such an ability is crucial to ensure the safe use of deep learning-based methods for medical image reconstruction.
IJCARS
Detecting failure modes in image reconstructions with interval neural network uncertainty
Luis Oala, Cosmas Heiß, Jan Macdonald, Maximilian März, Gitta Kutyniok, and Wojciech Samek
International Journal of Computer Assisted Radiology and Surgery, 2021
Purpose: The quantitative detection of failure modes is important for making deep neural networks reliable and usable at scale. We consider three examples of common failure modes in image reconstruction and demonstrate the potential of uncertainty quantification as a fine-grained alarm system.
Methods: We propose a deterministic, modular and lightweight approach called Interval Neural Network (INN) that produces fast and easy to interpret uncertainty scores for deep neural networks. Importantly, INNs can be constructed post hoc for already trained prediction networks. We compare it against state-of-the-art baseline methods (MCDROP, PROBOUT).
Results: We demonstrate on controlled, synthetic inverse problems the capacity of INNs to capture uncertainty due to noise as well as directional error information. On a real-world inverse problem with human CT scans, we show that INNs produce uncertainty scores which improve the detection of all considered failure modes compared to the baseline methods.
Conclusion: Interval Neural Networks offer a promising tool to expose weaknesses of deep image reconstruction models and ultimately make them more reliable. The fact that they can be applied post hoc to equip already trained deep neural network models with uncertainty scores makes them particularly interesting for deployment.
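As a rough illustration of the "fine-grained alarm system" use of such uncertainty scores, the sketch below thresholds per-sample uncertainty to flag suspicious reconstructions. The quantile-based calibration on trusted validation data is an assumption for illustration, not the paper's exact procedure.

    import numpy as np

    def calibrate_threshold(val_uncertainty, quantile=0.99):
        """Pick an alarm threshold from uncertainty scores on trusted data."""
        return float(np.quantile(val_uncertainty, quantile))

    def flag_failures(uncertainty_maps, threshold):
        """Flag reconstructions whose mean per-pixel uncertainty exceeds the threshold."""
        per_sample = uncertainty_maps.reshape(len(uncertainty_maps), -1).mean(axis=1)
        return per_sample > threshold

    rng = np.random.default_rng(0)
    tau = calibrate_threshold(rng.random(100))             # clean validation scores
    alarms = flag_failures(rng.random((10, 64, 64)), tau)  # per-pixel uncertainty maps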
Improving uncertainty estimation with semi-supervised deep learning for COVID-19 detection using chest X-ray images
Saul Calderon-Ramirez, Shengxiang Yang, Armaghan Moemeni, Simon Colreavy-Donnelly, David A Elizondo, Luis Oala, Jorge Rodríguez-Capitán, Manuel Jiménez-Navarro, Ezequiel López-Rubio, and Miguel A Molina-Cabello
IEEE Access, 2021
Post-Hoc Domain Adaptation via Guided Data Homogenization
Kurt Willis, and Luis Oala
arXiv preprint arXiv:2104.03624, 2021
Machine learning for health: algorithm auditing & quality control
Luis Oala, Andrew G Murchison, Pradeep Balachandran, Shruti Choudhary, Jana Fehr, Alixandro Werneck Leite, Peter G Goldschmidt, Christian Johner, Elora DM Schörverth, Rose Nakasi, and others
Journal of medical systems, 2021
Machine Learning for Health (ML4H) 2021
Subhrajit Roy, Stephen Pfohl, Girmaw Abebe Tadesse, Luis Oala, Fabian Falck, Yuyin Zhou, Liyue Shen, Ghada Zamzmi, Purity Mugambi, Ayah Zirikly, and others
In Machine Learning for Health, 2021
A collection of the accepted abstracts for the Machine Learning for Health (ML4H) symposium 2021
Fabian Falck, Yuyin Zhou, Emma Rocheteau, Liyue Shen, Luis Oala, Girmaw Abebe, Subhrajit Roy, Stephen Pfohl, Emily Alsentzer, and Matthew McDermott
arXiv e-prints, 2021
2020
NeurIPS
ML4H Auditing: From Paper to Practice
Luis Oala, Jana Fehr, Luca Gilli, Pradeep Balachandran, Alixandro Werneck Leite, Saul Calderon-Ramirez, Danny Xie Li, Gabriel Nobis, Erick Alejandro Munoz Alvarado, Giovanna Jaramillo-Gutierrez, Christian Matek, Arun Shroff, Ferath Kherif, Bruno Sanguinetti, and Thomas Wiegand
In Proceedings of the Machine Learning for Health NeurIPS Workshop, 11 Dec 2020
Healthcare systems are currently adapting to digital technologies, producing large quantities of novel data. Based on these data, machine-learning algorithms have been developed to support practitioners in labor-intensive workflows such as diagnosis, prognosis, triage or treatment of disease. However, their translation into medical practice is often hampered by a lack of careful evaluation in different settings. Efforts have started worldwide to establish guidelines for evaluating machine learning for health (ML4H) tools, highlighting the necessity to evaluate models for bias, interpretability, robustness, and possible failure modes. However, testing and adopting these guidelines in practice remains an open challenge. In this work, we target the paper-to-practice gap by applying an ML4H audit framework proposed by the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) to three use cases: diagnostic prediction of diabetic retinopathy, diagnostic prediction of Alzheimer’s disease, and cytomorphologic classification for leukemia diagnostics. The assessment comprises dimensions such as bias, interpretability, and robustness. Our results highlight the importance of fine-grained and case-adapted quality assessment, provide support for incorporating proposed quality assessment considerations of ML4H during the entire development life cycle, and suggest improvements for future ML4H reference evaluation frameworks.
ICML
Detecting Failure Modes in Image Reconstructions with Interval Neural Network Uncertainty
Luis Oala, Cosmas Heiß, Jan Macdonald, Maximilian März, Wojciech Samek, and Gitta Kutyniok
In ICML 2020 Workshop on Uncertainty & Robustness in Deep Learning, 2020
The quantitative detection of failure modes is important for making deep neural networks reliable and usable at scale. We consider three examples of failure modes in image reconstruction problems and demonstrate the potential of uncertainty quantification as a fine-grained alarm system. We propose a deterministic, modular and lightweight approach, called Interval Neural Networks, that produces fast and easy to interpret uncertainty scores which improve the detection of failure modes across four out of five image reconstruction experiments.
Preprints
2020
arXiv
Interval Neural Networks: Uncertainty Scores
Luis Oala, Cosmas Heiß, Jan Macdonald, Maximilian März, Wojciech Samek, and Gitta Kutyniok
We propose a fast, non-Bayesian method for producing uncertainty scores in the output of pre-trained deep neural networks (DNNs) using a data-driven interval propagating network. This interval neural network (INN) has interval-valued parameters and propagates its input using interval arithmetic. The INN produces sensible lower and upper bounds encompassing the ground truth. We provide theoretical justification for the validity of these bounds. Furthermore, its asymmetric uncertainty scores offer additional, directional information beyond what Gaussian-based, symmetric variance estimation can provide. We find that noise in the data is adequately captured by the intervals produced with our method. In numerical experiments on an image reconstruction task, we demonstrate the practical utility of INNs as a proxy for the prediction error in comparison to two state-of-the-art uncertainty quantification methods. In summary, INNs produce fast, theoretically justified uncertainty scores for DNNs that are easy to interpret, carry additional information, and serve as improved error proxies, features that may prove useful in advancing the usability of DNNs, especially in sensitive applications such as health care.
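To make the mechanics concrete, here is a minimal sketch of interval arithmetic through a single linear layer with interval-valued weights, in the spirit of the INN described above. The shapes, the 10% weight radius, and the toy data are illustrative assumptions, not the paper's code.

    import torch
    import torch.nn.functional as F

    def interval_linear(x_lo, x_hi, w_lo, w_hi, b):
        """Propagate the input interval [x_lo, x_hi] through y = x W^T + b,
        where W lies elementwise in [w_lo, w_hi]."""
        x_mid, x_rad = (x_hi + x_lo) / 2, (x_hi - x_lo) / 2
        w_mid, w_rad = (w_hi + w_lo) / 2, (w_hi - w_lo) / 2
        y_mid = x_mid @ w_mid.T + b
        # standard midpoint-radius bound for products of intervals
        y_rad = (x_rad @ w_mid.abs().T + x_mid.abs() @ w_rad.T
                 + x_rad @ w_rad.T)
        return y_mid - y_rad, y_mid + y_rad

    # A point input is a degenerate interval; ReLU is monotone, so applying
    # it to both bounds keeps the enclosure valid.
    x = torch.rand(4, 10)
    w = torch.randn(5, 10)
    lo, hi = interval_linear(x, x, w - 0.1 * w.abs(), w + 0.1 * w.abs(),
                             torch.zeros(5))
    lo, hi = F.relu(lo), F.relu(hi)
    uncertainty = hi - lo  # per-output uncertainty score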
arXiv
MixMOOD: A systematic approach to class distribution mismatch in semi-supervised learning using deep dataset dissimilarity measures
Saul Calderon-Ramirez, Luis Oala, Jordina Torrents-Barrena, Shengxiang Yang, Armaghan Moemeni, Wojciech Samek, and Miguel A. Molina-Cabello
In this work, we propose MixMOOD, a systematic approach to mitigate the effect of class distribution mismatch in semi-supervised deep learning (SSDL) with MixMatch. This work is divided into two components: (i) an extensive out-of-distribution (OOD) ablation test bed for SSDL and (ii) a quantitative unlabelled dataset selection heuristic referred to as MixMOOD. In the first part, we analyze the sensitivity of MixMatch accuracy under 90 different distribution mismatch scenarios across three multi-class classification tasks. These are designed to systematically understand how OOD unlabelled data affects MixMatch performance. In the second part, we propose an efficient and effective method, called deep dataset dissimilarity measures (DeDiMs), to compare labelled and unlabelled datasets. The proposed DeDiMs are quick to evaluate and model agnostic. They use the feature space of a generic Wide-ResNet and can be applied prior to learning. Our test results reveal that supposed semantic similarity between labelled and unlabelled data is not a good heuristic for unlabelled data selection. In contrast, the strong correlation between MixMatch accuracy and the proposed DeDiMs allows us to quantitatively rank different unlabelled datasets ante hoc according to expected MixMatch accuracy. This is what we call MixMOOD. Furthermore, we argue that the MixMOOD approach can help standardize the evaluation of different semi-supervised learning techniques under real-world scenarios involving out-of-distribution data.
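The sketch below illustrates one simple DeDiM-style measure under stated assumptions: the distance between mean feature embeddings of the labelled set and each candidate unlabelled pool under a generic pretrained backbone. The paper works in a generic Wide-ResNet feature space; the resnet18 backbone, the mean-embedding distance, and the random tensors here are illustrative stand-ins.

    import torch
    from torchvision.models import resnet18

    backbone = resnet18(weights="IMAGENET1K_V1")
    backbone.fc = torch.nn.Identity()  # expose penultimate-layer features
    backbone.eval()

    @torch.no_grad()
    def mean_embedding(images):
        # note: ImageNet input normalization is omitted for brevity
        return backbone(images).mean(dim=0)

    def dedim(labelled, unlabelled):
        """Smaller values suggest a better-matched unlabelled set."""
        return (mean_embedding(labelled) - mean_embedding(unlabelled)).norm().item()

    # Rank candidate unlabelled pools ante hoc, before any SSDL training.
    labelled = torch.rand(32, 3, 224, 224)
    pools = {"pool_a": torch.rand(64, 3, 224, 224),
             "pool_b": torch.rand(64, 3, 224, 224)}
    ranking = sorted(pools, key=lambda k: dedim(labelled, pools[k]))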
Standardization
2021
ITU/WHO
Good practices for health applications of machine learning: Considerations for manufacturers and regulators
Christian Johner, Pradeep Balachandran, Luis Oala, Aaron .Y. Lee, Alixandro Werneck Leite, Andrew Murchison, Anle Lin, Christoph Molnar, Juliet Rumball-Smith, Pat Baird, Peter. G. Goldschmidt, Pierre Quartarolo, Shan Xu, Sven Piechottka, and Zack Hornberger
In Proceedings of the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) - Meeting K, 2021
This document contains the latest draft of the FG-AI4H deliverable DEL02.2, "Good practices for health applications of machine learning: Considerations for manufacturers and regulators". This deliverable defines a set of guidelines intended to help AI solution developers and manufacturers conduct a comprehensive requirements analysis and streamline conformity assessment procedures to ensure regulatory compliance for AI-based medical devices (AI/ML-MD).
ITU/WHO
FG-AI4H Open Code Initiative - Evaluation and Reporting Package
Elora Schörverth, Steffen Vogler, Pradeep Balachandran, Alixandro Werneck Leite, Danny Xie Li, Kamran Ali, Garcia, Dominik Schneider, Joachim Krois, Marc Lecoultre, Shobha Iyer, Shruti Choudhary, and Luis Oala
In Proceedings of the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) - Meeting K, Jan 2021
This document, the Data and Artificial Intelligence Assessment Methods (DAISAM) reference, is the reference collection of WG-DAISAM assessment methods for data and artificial intelligence quality evaluation. It also constitutes subsection 7.3 of FG-AI4H deliverable 7.
ITU/WHO
Data and artificial intelligence assessment methods (DAISAM) Audit Reporting Template
Boris Verks, and Luis Oala
In Proceedings of the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) - Meeting J, 2020
Standardized templates to report results for the assessment processes developed by WG-DAISAM. In this version, the template comprises three elements: Data Specification Sheet, ML Model Specification Sheet and ML Model Summary Findings.