To benchmark the procedure of biomarker signature selection, we use measurements of 35 murine strains from the BXD genetic reference strain panel . Animals of each strain were respectively exposed to high-fat and chow diets , yielding 70 samples in total.
We use 2100 liver proteins measured with SWATH-MS to predict seven metabolism-related continuous phenotypic traits: body weight, fat mass, lean mass, blood glucose and insulin levels, body temperature during the cold test, and respiration volume.
 Peirce, J.L., et al.,
“A new set of BXD recombinant inbred lines from advanced intercross populations in mice,”
BMC Genetics 5(7) 2004.
 Wu Y., et al.,
“Multilayered genetic and omics dissection of mitochondrial activity in a mouse reference population,”
Cell 158(6) 1415-1430, 2014.
- The number of biological samples is much less than the number of features (proteins, transcripts and genes measured). Therefore we must make special adaptations of machine-learning procedures.
- Features in the omics datasets are highly intercorrelated. Therefore, selecting only a few biomarker features that adequately represent the whole dataset requires background knowledge such as network connectivity.