Data Science for Health

Fair and Explainable Machine Learning

ANNa:

Analogical proportions are statements of the form “A is to B as C is to D”. They are the basis of analogical inference, which has been used with competitive results in machine learning tasks such as classification, decision making, and automatic translation. Analogical extrapolation can solve hard reasoning tasks, such as IQ tests, and support data augmentation when learning models from few labeled samples. What makes analogical inference special is its ability to process similarities and dissimilarities simultaneously. This characteristic links the two main axes of AI: knowledge representation and reasoning (KRR) and machine learning (ML). Moreover, analogical reasoning contributes to the transparency of AI, as it is close to human reasoning and enables explanations based on examples and counter-examples. The objective of the ANNa project is to provide an online platform to detect, solve, and reason with analogies, with noteworthy applications in NLP, medical sciences, and industry. It is a project of the LORIA ORPAILLEUR team.
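As a minimal sketch of what detecting and solving an analogy can look like, the Python below uses one common Boolean model of analogical proportion, in which “A is to B as C is to D” holds on a binary attribute when A - B equals C - D. The function names and the binary-vector encoding are illustrative assumptions, not the API of the ANNa platform.

```python
from typing import Optional, Sequence, Tuple


def is_analogy(a: int, b: int, c: int, d: int) -> bool:
    """Boolean proportion a:b::c:d, with all values in {0, 1}.

    It holds when a differs from b exactly as c differs from d.
    """
    return a - b == c - d


def solve_analogy(
    a: Sequence[int], b: Sequence[int], c: Sequence[int]
) -> Optional[Tuple[int, ...]]:
    """Find d such that a:b::c:d holds on every component.

    Returns None when some component has no Boolean solution.
    """
    d = []
    for ai, bi, ci in zip(a, b, c):
        di = ci - ai + bi          # the only candidate satisfying ai - bi == ci - di
        if di not in (0, 1):       # e.g. 0:1::1:? has no Boolean answer
            return None
        d.append(di)
    return tuple(d)


# Detection: the same change (0 -> 1) on both sides is an analogy,
# opposite changes are not.
print(is_analogy(0, 1, 0, 1))
print(is_analogy(0, 1, 1, 0))

# Solving: b adds the second attribute to a, so d adds it to c.
print(solve_analogy((1, 0, 1), (1, 1, 1), (0, 0, 1)))
```

Under this model, 6 of the 16 possible Boolean 4-tuples are valid proportions, and an equation a:b::c:x is solvable only when a equals b or a equals c on each attribute.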

Visit our website!

Contact {miguel.couceiro, esteban.marquer} at

FaIrness through eXplanations and feature dropOut:

Algorithmic decisions are increasingly present in several aspects of our lives, e.g., loan approval, terrorism detection, prediction of criminal recidivism, and similar social and economic applications. Many of these decisions are taken without human supervision and through decision-making processes that are not transparent. This raises concerns regarding the potential bias of these processes towards certain groups of society. Such unfair outcomes not only affect human rights, but also undermine public trust in Machine Learning (ML).
FixOut addresses fairness issues of ML models based on decision outcomes, and shows how the simple idea of “feature dropout” followed by an “ensemble approach” can improve model fairness. It was originally conceived to tackle the process fairness of ML models. To that end, FixOut uses an explanation method to assess the reliance of a model M on salient or sensitive features. This assessment is integrated in a human-centered workflow that outputs a classifier M’ that does not compromise the performance of M while improving process fairness as well as other fairness metrics. It is a project of the LORIA ORPAILLEUR team.
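The “feature dropout” followed by an “ensemble approach” idea can be sketched as follows. This is one plausible reading, not the FixOut implementation: it assumes one ensemble member is trained per removed sensitive feature, plus one member with all sensitive features removed, and that members are combined by averaging their predicted probabilities. The explanation step that decides which features count as sensitive is not shown. The names `dropout_ensemble` and `fit_threshold`, and the toy data, are hypothetical.

```python
from typing import Callable, List, Sequence, Set

Row = Sequence[float]
Model = Callable[[Row], float]                      # estimate of P(y = 1 | x)
Fit = Callable[[List[List[float]], List[int]], Model]


def drop(row: Row, removed: Set[int]) -> List[float]:
    """Copy of row without the feature indices in `removed`."""
    return [v for i, v in enumerate(row) if i not in removed]


def dropout_ensemble(fit: Fit, X: List[Row], y: List[int],
                     sensitive: Sequence[int]) -> Model:
    """Train one model per dropped sensitive feature (plus one with all
    sensitive features dropped) and average their probabilities."""
    if not sensitive:
        raise ValueError("at least one sensitive feature index is required")
    removed_sets = [{i} for i in sensitive]
    if len(sensitive) > 1:
        removed_sets.append(set(sensitive))
    members = [(removed, fit([drop(r, removed) for r in X], y))
               for removed in removed_sets]

    def predict(row: Row) -> float:
        probs = [model(drop(row, removed)) for removed, model in members]
        return sum(probs) / len(probs)

    return predict


def fit_threshold(X: List[List[float]], y: List[int]) -> Model:
    """Toy learner (assumption): thresholds the feature sum midway
    between the two class means. Any real classifier could be used."""
    pos = [sum(r) for r, label in zip(X, y) if label == 1]
    neg = [sum(r) for r, label in zip(X, y) if label == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda row: 1.0 if sum(row) > threshold else 0.0


# Toy data: features 0 and 1 are treated as sensitive, feature 2 is informative.
X = [[1, 0, 0.9], [1, 1, 0.8], [0, 0, 0.2], [0, 1, 0.1]]
y = [1, 1, 0, 0]
predict = dropout_ensemble(fit_threshold, X, y, sensitive=[0, 1])
print(predict([1, 1, 0.9]), predict([0, 0, 0.9]))
```

Because each member sees the data without at least one sensitive feature, no single sensitive feature can influence every vote, which is the intuition behind reducing the ensemble's reliance on those features.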

Visit our website!

Contact {miguel.couceiro, guilherme.alves-da-silva} at

Benchmarking of clustering tools for mixed data

Recently accepted in Nature/Scientific Reports, this study examines the performance of various clustering strategies for mixed data, i.e. data with both continuous and categorical variables. It is a joint project between the LORIA CAPSID, the CHRU Nancy CIC-P, and the LORIA ORPAILLEUR teams.

« We conducted a benchmark analysis of “ready-to-use” tools in R comparing 4 model-based (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based (Gower distance or Unsupervised Extra Trees dissimilarity followed by hierarchical clustering or Partitioning Around Medoids, K-prototypes) clustering methods. Clustering performances were assessed by Adjusted Rand Index (ARI) on 1000 generated virtual populations consisting of mixed variables using 7 scenarios with varying population sizes, number of clusters, number of continuous and categorical variables, proportions of relevant (non-noisy) variables and degree of variable relevance (low, mild, high). »
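For readers unfamiliar with the Adjusted Rand Index used as the performance measure above, here is a minimal pure-Python sketch of its standard pair-counting definition. The benchmark itself was run with tools in R, so this code is only illustrative, and the function name is an assumption.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index between two partitions of the same items.

    1.0 means identical partitions (up to label renaming); values near 0
    are what random labelings would give; negative values are possible.
    """
    if len(truth) != len(pred):
        raise ValueError("both labelings must cover the same items")
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Pairs of items grouped together in both, in truth only, in pred only.
    together_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(pred).values())

    expected = together_truth * together_pred / comb(n, 2)  # chance agreement
    maximum = (together_truth + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)


# Same grouping with swapped labels: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Grouping that splits every true cluster: worse than chance.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The correction for chance is what makes the index comparable across scenarios with different numbers of clusters and population sizes.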

Simulated datasets and links to the tested tools are available here.

Further contributions and feedback on existing “ready-to-use” tools for mixed data are welcome.

Contact Nicolas Girerd at CIC-P or Marie-Dominique Devignes at LORIA.

Resources for pharmacogenomics analyses

PGxLOD is a semantic web resource (Linked Open Data) intended to host pharmacogenomic knowledge extracted from various sources (PharmGKB, literature, and Electronic Health Records).

PGxLOD uses the PGxO ontology. A full description of the motivation, implementation and instantiation of PGxO and PGxLOD is available in [1].

PGxLOD and PGxO were developed during the ANR PraktikPharma project (ANR-15-CE23-0028).

Go to PGxLOD main page.

Rare diseases

Integration of rare diseases, genes and phenotypes from Orphanet: Orphamine

Orphamine is a tool for visualizing data from Orphanet: 8496 diseases, 1360 clinical signs, and 3129 genes. It integrates cross-references with OMIM, ICD-10, HGNC, UniProtKB, and GeneAtlas. It is a project of the LORIA ORPAILLEUR team.

Go to the main Orphamine web site.

Knowledge graphs