Developing predictive models for precision medicine
Personalized medicine relies heavily on the analysis of patient data, including but not limited to the genomic datasets that are becoming increasingly available. With this in mind, we take prostate cancer as a case study for an interdisciplinary project that combines hypothesis-driven diagnostic strategies with data-driven estimations within a novel computational framework.
The aim of this project is to develop a novel computational framework that identifies informative genomic variants in prostate cancer, addressing the urgent clinical need to stratify primary prostate cancer tissues into two classes: aggressive and insignificant disease. The framework will help classify patients into discriminative groups and generate the associated genotypic profiles.
The increasing availability of omics datasets has opened new ways to characterize and categorize cancers to guide therapeutic strategies. Many efforts have focused on identifying recurrent alterations in different cancer types and on improving the stratification of patients by disease aggressiveness.
In many cases, despite current knowledge of the most common alterations, it is yet to be shown that patients with these alterations follow a significantly different prognostic profile than those without. Thus, there is still a need to establish a molecular characterization of these tumors to improve the stratification of patients.
Our ultimate goal is to identify genomic alterations that can help stratify patients and avoid unnecessary medical intervention.
Prostate cancer (PCa) is the second most common cancer type and the fourth leading cause of cancer death in men worldwide. These numbers are even more severe when developed countries are considered alone. At the same time, although PCa is a serious disease, clinically insignificant PCa appears to be more prevalent in older men, whose life expectancy is shorter than the time the disease needs to manifest symptoms.
One of the biggest challenges with primary prostate tumors is overdiagnosis, compounded by the fact that treatment is associated with potentially major and debilitating side effects. Current prognostic factors are not sufficient to determine patients’ survival risk. New tests are therefore required to distinguish between aggressive and insignificant disease, thus reducing the overdiagnosis rate and the number of avoidable deaths.
This project aims to work towards the molecular characterization of primary prostate tumors in order to improve the stratification of these patients. More specifically, its primary goal is to identify subtypes of PCa with different survival risks and their associated genotypic profiles. Several studies have addressed this issue to date and, although significant progress has been made, deeper investigations are needed to characterize these tumors.
Most previous studies have used a single-locus analysis strategy to identify variants strongly associated with the disease. When too few samples are available, this strategy requires a pre-selection of genomic features to increase the statistical power of the tests. Consequently, these studies limit their search to a few candidate modifications known a priori and do not take full advantage of high-throughput datasets. More advanced methods that reduce the dimensionality of these datasets and cluster them, in turn, usually work as “black boxes,” which makes the biological interpretation of their findings challenging.
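The loss of statistical power in single-locus analysis can be illustrated with a minimal sketch. It assumes binary carrier flags per patient, a two-sided Fisher exact test per variant, and a Bonferroni correction; the function names, the variant names, and the toy data are all hypothetical and are not taken from any of the studies mentioned above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as likely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def single_locus_scan(genotypes, is_aggressive, alpha=0.05):
    """Test each variant separately; Bonferroni-correct for the number of tests.

    genotypes: dict mapping a variant name to a list of 0/1 carrier flags.
    Returns a dict mapping each variant to (p_value, is_significant).
    """
    threshold = alpha / len(genotypes)  # stricter as more variants are tested
    results = {}
    for variant, carriers in genotypes.items():
        pairs = list(zip(carriers, is_aggressive))
        a = sum(1 for g, y in pairs if g and y)          # carrier, aggressive
        b = sum(1 for g, y in pairs if g and not y)      # carrier, insignificant
        c = sum(1 for g, y in pairs if not g and y)      # non-carrier, aggressive
        d = sum(1 for g, y in pairs if not g and not y)  # non-carrier, insignificant
        p = fisher_exact_two_sided(a, b, c, d)
        results[variant] = (p, p <= threshold)
    return results


# Toy data (made up): 8 patients, 4 aggressive and 4 insignificant.
is_aggressive = [1, 1, 1, 1, 0, 0, 0, 0]
genotypes = {
    "variant_A": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly associated
    "variant_B": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the class
}

alone = single_locus_scan({"variant_A": genotypes["variant_A"]}, is_aggressive)
together = single_locus_scan(genotypes, is_aggressive)
print(alone["variant_A"], together["variant_A"])
```

In this toy example the perfectly associated variant passes the threshold when it is the only candidate tested, but no longer does once a second variant is added to the scan. With so few samples, pre-selection is therefore the only way to retain power, which is the limitation described above.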
Novel computational framework
Motivated by these limitations, the goal of this project is to develop a novel computational framework that can combine and analyze high-throughput datasets of different molecular data types measured in the same patient samples. Our proposed method compresses a sparse, underdetermined dataset to extract knowledge from it and then links this knowledge to a more interpretable outcome.
In the case of PCa, the expected outcome should provide strong associations between groups of patients with a specific clinical property and groups of genomic features from multi-omics datasets.
Finally, once an efficient and robust method has been produced, this project aims to strengthen the capacity being built up in the field of systems biology, not only at the methodological and algorithmic level but also at the clinical level, in support of precise, predictive, and personalized medicine (3P medicine).
The computational pipeline constructs a phenotype–genotype association network. It summarizes the basic building blocks of an object, together with where and how much each block is used.
A dictionary is learned from the omics dataset. Its entries are the building blocks of the object.
The network of the dictionary is built and its topology is investigated to expose multivariate relationships between the dictionary entries. This shows how the building blocks are combined to build the object.
The phenotypic and genomic traits are mapped onto the network. This determines where and how much each building block is used overall.
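The three pipeline steps can be sketched on synthetic data as follows. This is only an illustration under stated assumptions, not the project's actual method: non-negative matrix factorization with multiplicative updates stands in for dictionary learning, the network links dictionary entries whose usage across patients is correlated, and the trait mapping compares mean usage between the two patient groups. All function names, thresholds, and the toy matrix are hypothetical.

```python
import numpy as np


def learn_dictionary(X, n_atoms, n_iter=200, seed=0):
    """Step 1 (stand-in): factor a non-negative matrix X (patients x features)
    as X ~ W @ H using NMF multiplicative updates.
    Rows of H are dictionary entries; W holds each patient's usage of them."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_atoms)) + 0.1
    H = rng.random((n_atoms, X.shape[1])) + 0.1
    eps = 1e-9  # avoids division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def entry_network(W, threshold=0.5):
    """Step 2 (assumed rule): connect two dictionary entries when their usage
    across patients has an absolute correlation of at least `threshold`."""
    corr = np.nan_to_num(np.corrcoef(W.T))
    corr = (corr + corr.T) / 2  # enforce exact symmetry
    adjacency = np.abs(corr) >= threshold
    np.fill_diagonal(adjacency, False)  # no self-loops
    return adjacency


def map_traits(W, H, phenotype, n_top=3):
    """Step 3 (assumed rule): for each entry, report the difference in mean
    usage between the two patient groups and its highest-weighted features."""
    phenotype = np.asarray(phenotype, dtype=bool)
    usage_diff = W[phenotype].mean(axis=0) - W[~phenotype].mean(axis=0)
    top_features = np.argsort(-H, axis=1)[:, :n_top]
    return usage_diff, top_features


# Synthetic data (made up): 20 patients, 12 features, two planted groups.
rng = np.random.default_rng(1)
phenotype = np.array([1] * 10 + [0] * 10)  # 1 = aggressive, 0 = insignificant
X = rng.random((20, 12)) * 0.1
X[:10, :4] += 1.0   # features 0-3 elevated in the first group
X[10:, 8:] += 1.0   # features 8-11 elevated in the second group

W, H = learn_dictionary(X, n_atoms=3)
adjacency = entry_network(W)
usage_diff, top_features = map_traits(W, H, phenotype)
print(usage_diff)
print(top_features)
```

Keeping the factors non-negative is a deliberate choice for this sketch: each patient is then described as an additive mix of entries, which keeps the "how much a building block is used" reading direct. A sparse-coding dictionary learner could be substituted in the first step without changing the other two.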