Quantifying biological heterogeneity from single-cell data.png

Computational Systems Biology

Developing predictive models for precision medicine.
Archived

Overview

With the advances of high-throughput experimental techniques, biomedical research is turning into information science. This requires the use of machine and deep-learning approaches, statistics and mathematical modelling. Individual cellular processes that comprise the interplay of several molecular players, such as cell signaling, can now be quantitatively characterized to allow a systematic view of biological processes. A better understanding of biological processes is crucial in order to provide robust predictive models that improve disease prognoses and treatment strategies. Our group is exploiting a large variety of data — multi-omics datasets, single-cell proteomics and mass spectrometry-based quantitative proteomics — to dissect the molecular mechanisms of cancer. Our goal is to develop predictive models for precision medicine.

Data

  • Omics: RNAseq, CNV, SNP, miRNA, SWATH–MS
  • Single cell: Mass cytometry
  • Clinical data: Survival outcome, treatment
  • Literature: Publications
  • Compound structure: SMILES, graph representation of molecules
  • Networks: protein–protein interactions, pathways

Methods

  • Machine learning: Deep learning, dimensionality reduction, clustering, classification, generative models
  • Statistical inference: Probabilistic models, network inference
  • Mathematical modeling: Stochastic hybrid models, Boolean networks
compsysbio.png

Research goals

  • Tumor heterogeneity
  • Leading drivers of cancer
  • Molecular mechanisms

Personalized medicine

  • Patient stratification
  • Early diagnosis
  • Targeted treatment

Research

At IBM Research in Zurich, we develop novel approaches to analyze different molecular levels of high-throughput data. From single-cell to cell population-averaged data (proteomics, transcriptomics), we aim to integrate multiple layers of genome-scale information. This, in combination with clinical information and prior knowledge through literature mining, enables us to understand molecular mechanisms and explore applications to personalised medicine.

Our main research projects include, but not are limited to, studying cell-to-cell heterogeneity, integrative multi-omics analysis, dynamic network inference and robust biomarker discovery, most of which are applied in the case of cancer. Recently, we focused on anticancer drug modelling, specifically on leveraging biomarker information into generative models for de-novo drug design, attempting to bridge systems biology and anticancer drug discovery.

We gratefully acknowledge our numerous collaborations with university hospitals, research institutes and universities that work alongside our team in many of our projects.

Research topics

Interpretability for machine learning and computational biology

Understanding real-world datasets is often challenging due to their size, complexity and/or poor knowledge about the problem to be tackled (i.e. electronic health records, OMICS data, etc.).

To achieve high accuracy for important tasks, equally complex machine/deep-learning models are usually used. In many situations, the decisions achieved by such automated systems can have significant—and potentially deleterious—consequences.

In biology and healthcare, interpretability becomes important for three main reasons.

1. Trust

For example, doctors and patient need to be confident about the decision achieved by a deployed model. By providing the rationale behind a decision could make a model more trustable.

2. Debugging

A model could return unexpected predictions, possibly indicating poor performance. Interpretability could help by shedding light on the causes behind poor performance, such as unfair dataset bias or poor model training.

3. Generating biological hypotheses

Surprising results do not always have a negative connotation. Rather, they might be due to the trained model leveraging a true pattern in the data that is unknown even to field experts, such as an unknown protein–protein interaction. Interpretable methods can potentially uncover these patterns, which can then be used as the basis for novel biological hypotheses.

Tumor heterogeneity

Tumor cells exhibit a high degree of variability in terms of morphology, phenotype, metastatic potential and underlying molecular profile. This heterogeneity is present not only across different patients (inter-tumor heterogeneity) but also within the same tumor (intra-tumor heterogeneity) and has emerged as an inherent property of cancer.

Identifying the sources of heterogeneity and its implications in clinical outcomes, such as response to therapy or ability to metastasize, has become a cornerstone for the development of effective disease management strategies.

Read more about modeling spatial heterogeneity of the tumor microenvironment.

Read more about quantifying biological heterogeneity from single-cell data.

Multimodal data integration

Developing a predictive computational technology to exploit and integrate multiple molecular and clinical data.

Read more about multimodal data integration.

Research assets

PaccMann

Anticancer drug modelling for precision medicine.

INtERAcT

Automatic text mining and analysis.

PIMKL

Pathway-induced multiple kernel learning.

CellCycleTRACER

A novel computational method to quantify cell cycle and cell volume variability.

Chimaera

Estimating the frequency of genetic alterations.

COSIFER

Consensus inference of molecular networks.

Technical resources

Funding

We gratefully acknowledge generous funding from SystemsX.ch, SNF and the European Union.