Computational systems biology

Developing predictive models for precision medicine

To make diagnostics truly personalized (assigning each patient to a disease subtype, determining the aggressiveness of the disease, and suggesting individualized treatment options), modern precision medicine requires biomarkers.

Advances in high-throughput data generation have dramatically expanded the search space for biomarker discovery, making it challenging to select an optimal biomarker signature from large and noisy datasets. Selecting a biomarker signature, a small set of predictors able to predict the response, has therefore become an important task in omics research.


Our goals

Our research aims to develop a framework for selecting biomarker signatures from omics datasets.

For this task we:

  • Connect a variety of feature selection algorithms with supervised regression and classification methods.
  • Introduce stability as a metric for biomarker signature robustness in the selection process.
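
As an illustration of how stability could be quantified, the minimal sketch below repeats a feature selection step on random subsamples of the data and reports the mean pairwise Jaccard index of the selected signatures. The correlation filter, the Jaccard index, the function names and the synthetic data are all assumptions made for this example; they are not a description of the framework itself.

```python
import itertools

import numpy as np


def select_top_k(X, y, k):
    """Univariate filter: indices of the k features most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    # Guard against constant features (zero norm) to avoid division by zero.
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, 1.0)
    return set(np.argsort(corr)[-k:].tolist())


def jaccard(a, b):
    """Jaccard index of two non-empty sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


def selection_stability(X, y, k, n_resamples=20, frac=0.8, seed=0):
    """Mean pairwise Jaccard index of signatures selected on subsamples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(frac * n)
    signatures = []
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)  # subsample without replacement
        signatures.append(select_top_k(X[idx], y[idx], k))
    pairs = itertools.combinations(signatures, 2)
    return float(np.mean([jaccard(a, b) for a, b in pairs]))


# Synthetic stand-in for an omics matrix (samples x features); not real data.
rng = np.random.default_rng(42)
n_samples, n_features = 70, 2100
X = rng.normal(size=(n_samples, n_features))
latent = rng.normal(size=n_samples)
# The first five features share a latent factor that also drives the response.
X[:, :5] = latent[:, None] + 0.3 * rng.normal(size=(n_samples, 5))
y_signal = latent + 0.3 * rng.normal(size=n_samples)
y_noise = rng.normal(size=n_samples)  # response unrelated to any feature

stab_signal = selection_stability(X, y_signal, k=5)
stab_noise = selection_stability(X, y_noise, k=5)
print(f"stability with signal: {stab_signal:.2f}")
print(f"stability without signal: {stab_noise:.2f}")
```

A signature that reflects real signal should be selected consistently across subsamples, whereas one chosen from noise changes from subsample to subsample, which is why stability can complement predictive accuracy as a selection criterion.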


To benchmark biomarker signature selection, we use measurements from 35 murine strains of the BXD genetic reference panel [1]. Animals of each strain were exposed to either a high-fat or a chow diet [2], yielding 70 samples in total.

We use 2100 liver proteins measured with SWATH-MS to predict seven metabolism-related continuous phenotypic traits: body weight, fat mass, lean mass, blood glucose and insulin levels, body temperature during the cold test, and respiration volume.

[1] Peirce, J.L., et al., “A new set of BXD recombinant inbred lines from advanced intercross populations in mice,” BMC Genetics 5(7), 2004.

[2] Wu, Y., et al., “Multilayered genetic and omics dissection of mitochondrial activity in a mouse reference population,” Cell 158(6), 1415-1430, 2014.

Our challenges

  • The number of biological samples is much smaller than the number of features (proteins, transcripts and genes measured), so standard machine-learning procedures must be specially adapted.
  • Features in omics datasets are highly intercorrelated. Selecting only a few biomarker features that adequately represent the whole dataset therefore requires background knowledge such as network connectivity.
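
One widely used adaptation for the setting with few samples and many features is to regularize the model and to repeat feature selection inside every cross-validation fold, so that held-out samples never influence the chosen signature. The sketch below illustrates this on synthetic data, assuming a correlation filter followed by closed-form ridge regression. These choices, the function names and the parameter values are illustrative assumptions, not the adaptations actually used in the project.

```python
import numpy as np


def top_k_by_correlation(X, y, k):
    """Indices of the k features with the largest absolute correlation to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, 1.0)
    return np.argsort(corr)[-k:]


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


def cross_validated_r2(X, y, k=5, alpha=1.0, n_folds=5, seed=0):
    """Out-of-fold R^2 with feature selection repeated inside each fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pred = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # Select features on the training samples only to avoid leakage.
        cols = top_k_by_correlation(X[train_idx], y[train_idx], k)
        coef, intercept = ridge_fit(X[train_idx][:, cols], y[train_idx], alpha)
        pred[test_idx] = X[test_idx][:, cols] @ coef + intercept
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data with far more features than samples; not real measurements.
rng = np.random.default_rng(7)
n_samples, n_features = 70, 2100
X = rng.normal(size=(n_samples, n_features))
latent = rng.normal(size=n_samples)
# Five intercorrelated features carry the signal.
X[:, :5] = latent[:, None] + 0.3 * rng.normal(size=(n_samples, 5))
y_signal = latent + 0.3 * rng.normal(size=n_samples)
y_noise = rng.normal(size=n_samples)  # response unrelated to any feature

r2_signal = cross_validated_r2(X, y_signal)
r2_noise = cross_validated_r2(X, y_noise)
print(f"out-of-fold R^2 with signal: {r2_signal:.2f}")
print(f"out-of-fold R^2 without signal: {r2_noise:.2f}")
```

Selecting features on the full dataset before cross-validation would let the held-out samples shape the signature and could make even a pure-noise response look predictable. Keeping the selection inside each fold avoids that optimistic bias.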

Ask the expert



Jelena Čhuklina


IBM Research scientist

We gratefully acknowledge generous funding from PrECISE.