Data Models for Dataset Drift Controls in Machine Learning With Optical Images
Luis Oala, Marco Aversa, Gabriel Nobis, Kurt Willis, Yoan Neuenschwander, Michèle Buck, Christian Matek, Jerome Extermann, Enrico Pomarico, Wojciech Samek, and others
Transactions on Machine Learning Research (TMLR), 2023
DiffInfinite: Large Mask-Image Synthesis via Parallel Random Patch Diffusion in Histopathology
Marco Aversa, Gabriel Nobis, Miriam Hägele, Kai Standvoss, Mihaela Chirica, Roderick Murray-Smith, Ahmed Alaa, Lukas Ruff, Daniela Ivanova, Wojciech Samek, and others
arXiv preprint arXiv:2306.13384, 2023
Localized Data Work as a Precondition for Data-Centric ML: A Case Study of Full Lifecycle Crop Disease Identification in Ghana
Darlington Akogo, Issah Samori, Cyril Akafia, Harriet Fiagbor, Andrews Kangah, Donald Kwame Asiedu, Kwabena Fuachie, and Luis Oala
arXiv preprint arXiv:2307.01767, 2023
2022
Dataset Similarity to Assess Semi-supervised Learning Under Distribution Mismatch Between the Labelled and Unlabelled Datasets
Saul Calderon Ramirez, Luis Oala, Jordina Torrentes-Barrena, Shengxiang Yang, David Elizondo, Armaghan Moemeni, Simon Colreavy-Donnelly, Wojciech Samek, Miguel Molina-Cabello, and Ezequiel Lopez-Rubio
IEEE Transactions on Artificial Intelligence, 2022
Deutsche Normungsroadmap Künstliche Intelligenz (German Standardization Roadmap on Artificial Intelligence)
Rasmus Adler, Andreas Bunte, Simon Burton, Jürgen Großmann, Alexander Jaschke, Philip Kleen, Jeanette Miriam Lorenz, Jackie Ma, Karla Markert, Henri Meeß, and others
2022
Machine Learning for Health (ML4H) 2022
Antonio Parziale, Monica Agrawal, Shengpu Tang, Kristen Severson, Luis Oala, Adarsh Subbaswamy, Sayantan Kumar, Elora Schoerverth, Stefan Hegselmann, Helen Zhou, and others
In Machine Learning for Health, 2022
Machine Learning for Health symposium 2022 - Extended Abstract track
Antonio Parziale, Monica Agrawal, Shalmali Joshi, Irene Y Chen, Shengpu Tang, Luis Oala, and Adarsh Subbaswamy
arXiv preprint arXiv:2211.15564, 2022
Piloting A Survey-Based Assessment of Transparency and Trustworthiness with Three Medical AI Tools
Jana Fehr, Giovanna Jaramillo-Gutierrez, Luis Oala, Matthias I Gröschel, Manuel Bierwirth, Pradeep Balachandran, Alixandro Werneck-Leite, and Christoph Lippert
Healthcare, 2022
Proceedings of the 2nd Machine Learning for Health symposium
Antonio Parziale, Monica Agrawal, Shalmali Joshi, Irene Y Chen, Shengpu Tang, Luis Oala, Adarsh Subbaswamy, and others
Proceedings of Machine Learning Research, 2022
2021
ICLR
Post-Hoc Domain Adaptation via Guided Data Homogenization
Kurt Willis, and Luis Oala
In ICLR 2021 Workshop on Robust and Reliable Machine Learning in the Real World (RobustML), 2021
Addressing shifts in data distributions is an important prerequisite for the deployment of deep learning models to real-world settings. A general approach to this problem involves the adjustment of models to a new domain through transfer learning. However, in many cases, this is not applicable in a post-hoc manner to deployed models and further parameter adjustments jeopardize safety certifications that were established beforehand. In such a context, we propose to deal with changes in the data distribution via guided data homogenization which shifts the burden of adaptation from the model to the data. This approach makes use of information about the training data contained implicitly in the deep learning model to learn a domain transfer function. This allows for a targeted deployment of models to unknown scenarios without changing the model itself. We demonstrate the potential of data homogenization through experiments on the CIFAR-10 and MNIST data sets.
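For intuition, here is a minimal, hypothetical sketch of the data homogenization idea in PyTorch: a small transfer network is fitted so that the frozen model's activation statistics on transformed data match the BatchNorm running statistics stored in the model, one concrete way to exploit training-data information that is implicit in a trained network. The architecture, loss, and hyperparameters below are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    model = resnet18(weights=None).eval()  # stands in for a deployed, frozen model
    for p in model.parameters():
        p.requires_grad_(False)

    # g: the learned domain transfer function (architecture is an assumption)
    g = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 3, 3, padding=1),
    )
    opt = torch.optim.Adam(g.parameters(), lr=1e-3)

    def stat_mismatch(x):
        """Distance between activation statistics on x and the frozen
        model's stored BatchNorm running statistics."""
        terms, hooks = [], []
        def make_hook(bn):
            def hook(module, inp, out):
                f = inp[0]
                mu = f.mean(dim=(0, 2, 3))
                var = f.var(dim=(0, 2, 3), unbiased=False)
                terms.append(((mu - bn.running_mean) ** 2).mean()
                             + ((var - bn.running_var) ** 2).mean())
            return hook
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                hooks.append(m.register_forward_hook(make_hook(m)))
        model(x)
        for h in hooks:
            h.remove()
        return torch.stack(terms).sum()

    x_shifted = torch.rand(8, 3, 32, 32)  # stand-in for shifted-domain data
    for _ in range(100):
        opt.zero_grad()
        loss = stat_mismatch(g(x_shifted))  # adapt the data, not the model
        loss.backward()
        opt.step()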
ICLR
More Than Meets The Eye: Semi-supervised Learning Under Non-IID Data
Saul Calderon-Ramirez, and Luis Oala
In ICLR 2021 Workshop on Robust and Reliable Machine Learning in the Real World (RobustML), 2021
In the absence of explicit knowledge about data models, it is often assumed that a semantic match between labelled and unlabelled data is desirable in the context of semi-supervised deep learning (SSDL). In this work, we demonstrate the limits of semantic data set matching. We present and make available a comprehensive simulation sandbox, called non-IID-SSDL, for stress testing SSDL algorithms under various non-IID configurations. We show that semantic data set matching can degrade the performance of state-of-the-art SSDL. In addition, we demonstrate that simple dissimilarity measures in the feature space of a generic classifier offer a promising and more reliable matching criterion.
BVM
Interval Neural Networks as Instability Detectors for Image Reconstructions
Jan Macdonald, Maximilian März, Luis Oala, and Wojciech Samek
In Bildverarbeitung für die Medizin (BVM), 2021
This work investigates the detection of instabilities that may occur when utilizing deep learning models for image reconstruction tasks. Although neural networks often empirically outperform traditional reconstruction methods, their usage for sensitive medical applications remains controversial. Indeed, in a recent series of works, it has been demonstrated that deep learning approaches are susceptible to various types of instabilities, caused for instance by adversarial noise or out-of-distribution features. It is argued that this phenomenon can be observed regardless of the underlying architecture and that there is no easy remedy. Based on this insight, the present work demonstrates how uncertainty quantification methods can be employed as instability detectors. In particular, it is shown that the recently proposed Interval Neural Networks are highly effective in revealing instabilities of reconstructions. Such an ability is crucial to ensure the safe use of deep learning-based methods for medical image reconstruction.
IJCARS
Detecting failure modes in image reconstructions with interval neural network uncertainty
Luis Oala, Cosmas Heiß, Jan Macdonald, Maximilian März, Gitta Kutyniok, and Wojciech Samek
International Journal of Computer Assisted Radiology and Surgery, 2021
Purpose: The quantitative detection of failure modes is important for making deep neural networks reliable and usable at scale. We consider three examples of common failure modes in image reconstruction and demonstrate the potential of uncertainty quantification as a fine-grained alarm system.
Methods: We propose a deterministic, modular and lightweight approach called Interval Neural Network (INN) that produces fast and easy to interpret uncertainty scores for deep neural networks. Importantly, INNs can be constructed post hoc for already trained prediction networks. We compare it against state-of-the-art baseline methods (MCDROP, PROBOUT).
Results: We demonstrate on controlled, synthetic inverse problems the capacity of INNs to capture uncertainty due to noise as well as directional error information. On a real-world inverse problem with human CT scans, we show that INNs produce uncertainty scores which improve the detection of all considered failure modes compared to the baseline methods.
Conclusion: Interval Neural Networks offer a promising tool to expose weaknesses of deep image reconstruction models and ultimately make them more reliable. The fact that they can be applied post hoc to equip already trained deep neural network models with uncertainty scores makes them particularly interesting for deployment.
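As a rough illustration of the "fine-grained alarm system" use of such uncertainty scores, the sketch below thresholds per-sample uncertainty to flag suspicious reconstructions. The quantile-based calibration on trusted validation data is an assumption for illustration, not the paper's exact procedure.

    import numpy as np

    def calibrate_threshold(val_uncertainty, quantile=0.99):
        """Pick an alarm threshold from uncertainty scores on trusted data."""
        return float(np.quantile(val_uncertainty, quantile))

    def flag_failures(uncertainty_maps, threshold):
        """Flag reconstructions whose mean per-pixel uncertainty exceeds the threshold."""
        per_sample = uncertainty_maps.reshape(len(uncertainty_maps), -1).mean(axis=1)
        return per_sample > threshold

    rng = np.random.default_rng(0)
    tau = calibrate_threshold(rng.random(100))             # clean validation scores
    alarms = flag_failures(rng.random((10, 64, 64)), tau)  # per-pixel uncertainty maps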
Improving uncertainty estimation with semi-supervised deep learning for COVID-19 detection using chest X-ray images
Saul Calderon-Ramirez, Shengxiang Yang, Armaghan Moemeni, Simon Colreavy-Donnelly, David A Elizondo, Luis Oala, Jorge Rodríguez-Capitán, Manuel Jiménez-Navarro, Ezequiel López-Rubio, and Miguel A Molina-Cabello
IEEE Access, 2021
Post-Hoc Domain Adaptation via Guided Data Homogenization
Kurt Willis, and Luis Oala
arXiv preprint arXiv:2104.03624, 2021
Machine learning for health: algorithm auditing & quality control
Luis Oala, Andrew G Murchison, Pradeep Balachandran, Shruti Choudhary, Jana Fehr, Alixandro Werneck Leite, Peter G Goldschmidt, Christian Johner, Elora DM Schörverth, Rose Nakasi, and others
Journal of medical systems, 2021
Machine Learning for Health (ML4H) 2021
Subhrajit Roy, Stephen Pfohl, Girmaw Abebe Tadesse, Luis Oala, Fabian Falck, Yuyin Zhou, Liyue Shen, Ghada Zamzmi, Purity Mugambi, Ayah Zirikly, and others
In Machine Learning for Health, 2021
A collection of the accepted abstracts for the Machine Learning for Health (ML4H) symposium 2021
Fabian Falck, Yuyin Zhou, Emma Rocheteau, Liyue Shen, Luis Oala, Girmaw Abebe, Subhrajit Roy, Stephen Pfohl, Emily Alsentzer, and Matthew McDermott
arXiv e-prints, 2021
2020
NeurIPS
ML4H Auditing: From Paper to Practice
Luis Oala, Jana Fehr, Luca Gilli, Pradeep Balachandran, Alixandro Werneck Leite, Saul Calderon-Ramirez, Danny Xie Li, Gabriel Nobis, Erick Alejandro Munoz Alvarado, Giovanna Jaramillo-Gutierrez, Christian Matek, Arun Shroff, Ferath Kherif, Bruno Sanguinetti, and Thomas Wiegand
In Proceedings of the Machine Learning for Health NeurIPS Workshop, 11 Dec 2020
Healthcare systems are currently adapting to digital technologies, producing large quantities of novel data. Based on these data, machine-learning algorithms have been developed to support practitioners in labor-intensive workflows such as diagnosis, prognosis, triage or treatment of disease. However, their translation into medical practice is often hampered by a lack of careful evaluation in different settings. Efforts have started worldwide to establish guidelines for evaluating machine learning for health (ML4H) tools, highlighting the necessity to evaluate models for bias, interpretability, robustness, and possible failure modes. However, testing and adopting these guidelines in practice remains an open challenge. In this work, we target the paper-to-practice gap by applying an ML4H audit framework proposed by the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) to three use cases: diagnostic prediction of diabetic retinopathy, diagnostic prediction of Alzheimer’s disease, and cytomorphologic classification for leukemia diagnostics. The assessment comprises dimensions such as bias, interpretability, and robustness. Our results highlight the importance of fine-grained and case-adapted quality assessment, provide support for incorporating proposed quality assessment considerations of ML4H during the entire development life cycle, and suggest improvements for future ML4H reference evaluation frameworks.
ICML
Detecting Failure Modes in Image Reconstructions with Interval Neural Network Uncertainty
Luis Oala, Cosmas Heiß, Jan Macdonald, Maximilian März, Wojciech Samek, and Gitta Kutyniok
In ICML 2020 Workshop on Uncertainty & Robustness in Deep Learning, 2020
The quantitative detection of failure modes is important for making deep neural networks reliable and usable at scale. We consider three examples of failure modes in image reconstruction problems and demonstrate the potential of uncertainty quantification as a fine-grained alarm system. We propose a deterministic, modular and lightweight approach, called Interval Neural Networks, that produces fast and easy to interpret uncertainty scores which improve the detection of failure modes across four out of five image reconstruction experiments.
Preprints
2020
arXiv
Interval Neural Networks: Uncertainty Scores
Luis Oala, Cosmas Heiß, Jan Macdonald, Maximilian März, Wojciech Samek, and Gitta Kutyniok
We propose a fast, non-Bayesian method for producing uncertainty scores in the output of pre-trained deep neural networks (DNNs) using a data-driven interval propagating network. This interval neural network (INN) has interval-valued parameters and propagates its input using interval arithmetic. The INN produces sensible lower and upper bounds encompassing the ground truth. We provide theoretical justification for the validity of these bounds. Furthermore, its asymmetric uncertainty scores offer additional, directional information beyond what Gaussian-based, symmetric variance estimation can provide. We find that noise in the data is adequately captured by the intervals produced with our method. In numerical experiments on an image reconstruction task, we demonstrate the practical utility of INNs as a proxy for the prediction error in comparison to two state-of-the-art uncertainty quantification methods. In summary, INNs produce fast, theoretically justified uncertainty scores for DNNs that are easy to interpret, carry additional information, and serve as improved error proxies, features that may prove useful in advancing the usability of DNNs, especially in sensitive applications such as health care.
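To make the mechanics concrete, here is a minimal sketch of interval arithmetic through a single linear layer with interval-valued weights, in the spirit of the INN described above. The shapes, the 10% weight radius, and the toy data are illustrative assumptions, not the paper's code.

    import torch
    import torch.nn.functional as F

    def interval_linear(x_lo, x_hi, w_lo, w_hi, b):
        """Propagate the input interval [x_lo, x_hi] through y = x W^T + b,
        where W lies elementwise in [w_lo, w_hi]."""
        x_mid, x_rad = (x_hi + x_lo) / 2, (x_hi - x_lo) / 2
        w_mid, w_rad = (w_hi + w_lo) / 2, (w_hi - w_lo) / 2
        y_mid = x_mid @ w_mid.T + b
        # standard midpoint-radius bound for products of intervals
        y_rad = (x_rad @ w_mid.abs().T + x_mid.abs() @ w_rad.T
                 + x_rad @ w_rad.T)
        return y_mid - y_rad, y_mid + y_rad

    # A point input is a degenerate interval; ReLU is monotone, so applying
    # it to both bounds keeps the enclosure valid.
    x = torch.rand(4, 10)
    w = torch.randn(5, 10)
    lo, hi = interval_linear(x, x, w - 0.1 * w.abs(), w + 0.1 * w.abs(),
                             torch.zeros(5))
    lo, hi = F.relu(lo), F.relu(hi)
    uncertainty = hi - lo  # per-output uncertainty score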
arXiv
MixMOOD: A systematic approach to class distribution mismatch in semi-supervised learning using deep dataset dissimilarity measures
Saul Calderon-Ramirez, Luis Oala, Jordina Torrents-Barrena, Shengxiang Yang, Armaghan Moemeni, Wojciech Samek, and Miguel A. Molina-Cabello
In this work, we propose MixMOOD, a systematic approach to mitigate the effect of class distribution mismatch in semi-supervised deep learning (SSDL) with MixMatch. This work is divided into two components: (i) an extensive out-of-distribution (OOD) ablation test bed for SSDL and (ii) a quantitative unlabelled dataset selection heuristic referred to as MixMOOD. In the first part, we analyze the sensitivity of MixMatch accuracy under 90 different distribution mismatch scenarios across three multi-class classification tasks. These are designed to systematically understand how OOD unlabelled data affects MixMatch performance. In the second part, we propose an efficient and effective method, called deep dataset dissimilarity measures (DeDiMs), to compare labelled and unlabelled datasets. The proposed DeDiMs are quick to evaluate and model agnostic. They use the feature space of a generic Wide-ResNet and can be applied prior to learning. Our test results reveal that supposed semantic similarity between labelled and unlabelled data is not a good heuristic for unlabelled data selection. In contrast, the strong correlation between MixMatch accuracy and the proposed DeDiMs allows us to quantitatively rank different unlabelled datasets ante hoc according to expected MixMatch accuracy. This is what we call MixMOOD. Furthermore, we argue that the MixMOOD approach can help standardize the evaluation of different semi-supervised learning techniques under real-world scenarios involving out-of-distribution data.
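The sketch below illustrates one simple DeDiM-style measure under stated assumptions: the distance between mean feature embeddings of the labelled set and each candidate unlabelled pool under a generic pretrained backbone. The paper works in a generic Wide-ResNet feature space; the resnet18 backbone, the mean-embedding distance, and the random tensors here are illustrative stand-ins.

    import torch
    from torchvision.models import resnet18

    backbone = resnet18(weights="IMAGENET1K_V1")
    backbone.fc = torch.nn.Identity()  # expose penultimate-layer features
    backbone.eval()

    @torch.no_grad()
    def mean_embedding(images):
        # note: ImageNet input normalization is omitted for brevity
        return backbone(images).mean(dim=0)

    def dedim(labelled, unlabelled):
        """Smaller values suggest a better-matched unlabelled set."""
        return (mean_embedding(labelled) - mean_embedding(unlabelled)).norm().item()

    # Rank candidate unlabelled pools ante hoc, before any SSDL training.
    labelled = torch.rand(32, 3, 224, 224)
    pools = {"pool_a": torch.rand(64, 3, 224, 224),
             "pool_b": torch.rand(64, 3, 224, 224)}
    ranking = sorted(pools, key=lambda k: dedim(labelled, pools[k]))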
Standardization
2021
ITU/WHO
Good practices for health applications of machine learning: Considerations for manufacturers and regulators
Christian Johner, Pradeep Balachandran, Luis Oala, Aaron .Y. Lee, Alixandro Werneck Leite, Andrew Murchison, Anle Lin, Christoph Molnar, Juliet Rumball-Smith, Pat Baird, Peter. G. Goldschmidt, Pierre Quartarolo, Shan Xu, Sven Piechottka, and Zack Hornberger
In Proceedings of the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) - Meeting K, 2021
This document contains the latest draft of the FG-AI4H deliverable DEL02.2, "Good practices for health applications of machine learning: Considerations for manufacturers and regulators". This deliverable defines a set of guidelines intended to help AI solution developers and manufacturers conduct a comprehensive requirements analysis and streamline conformity assessment procedures to ensure regulatory compliance for AI-based medical devices (AI/ML-MD).
ITU/WHO
FG-AI4H Open Code Initiative - Evaluation and Reporting Package
Elora Schörverth, Steffen Vogler, Pradeep Balachandran, Alixandro Werneck Leite, Danny Xie Li, Kamran Ali, Garcia, Dominik Schneider, Joachim Krois, Marc Lecoultre, Shobha Iyer, Shruti Choudhary, and Luis Oala
In Proceedings of the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) - Meeting K, Jan 2021
This document, the Data and Artificial Intelligence Assessment Methods (DAISAM) reference, is the reference collection of WG-DAISAM assessment methods for data and artificial intelligence quality evaluation. It also constitutes subsection 7.3 of FG-AI4H deliverable 7.
ITU/WHO
Data and artificial intelligence assessment methods (DAISAM) Audit Reporting Template
Boris Verks, and Luis Oala
In Proceedings of the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) - Meeting J, 2020
Standardized templates to report results for the assessment processes developed by WG-DAISAM. In this version, the template comprises three elements: Data Specification Sheet, ML Model Specification Sheet and ML Model Summary Findings.