Modeling 3D genome structure from Hi-C data with deep learning
Over the past decade, numerous studies have shown that the 3D structure of the genome influences critical cell functions such as DNA replication, gene expression, cell fate decisions and differentiation [Bonev&Cavalli 2016]. Several experimental techniques are used to determine how chromatin is folded within the nucleus; among them, chromosome conformation capture methods, notably Hi-C, are gaining popularity [Lieberman-Aiden et al. 2009]. Chromatin interactions captured by Hi-C are represented as a contact matrix, in which each entry records the frequency of interactions between a pair of genomic bins in a population of cells [Dekker et al. 2013]. One of the main applications of Hi-C is building realistic 3D models of chromatin structure from these contact matrices. Numerous methods have been proposed in recent years; however, they suffer from important limitations in their underlying assumptions, resolution or scalability. To address these limitations, early efforts in our group resulted in REACH-3D (REcurrent Autoencoders for CHromatin 3D structure prediction) [Cristescu et al. 2018], a novel deep-learning approach based on autoencoders with recurrent neural units that infers an ensemble of structures from a Hi-C matrix.
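To make the contact-matrix representation concrete, the following is a minimal sketch in Python of how pairwise contacts could be counted per pair of genomic bins. The function name `build_contact_matrix`, its parameters and the toy positions are hypothetical and chosen for illustration only; they are not taken from REACH-3D or from any Hi-C processing tool, and real pipelines additionally handle multiple chromosomes, filtering and normalization.

```python
def build_contact_matrix(contacts, chrom_length, bin_size):
    """Count contacts per pair of genomic bins and return a symmetric matrix.

    contacts     -- iterable of (pos_a, pos_b) base-pair positions (hypothetical input format)
    chrom_length -- chromosome length in base pairs
    bin_size     -- bin width in base pairs, i.e. the matrix resolution
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in contacts:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # a contact is unordered, so mirror the count
    return matrix


# Made-up contacts on a single 1000 bp toy chromosome, binned at 250 bp.
toy_contacts = [(120, 950), (130, 980), (400, 450), (910, 150)]
toy_matrix = build_contact_matrix(toy_contacts, chrom_length=1000, bin_size=250)

for row in toy_matrix:
    print(row)
```

Halving `bin_size` doubles the number of bins per side and quadruples the number of matrix entries, which is one reason why scalability with respect to Hi-C resolution matters for structure inference.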
The proposed project aims to develop a deep-learning framework for inferring 3D chromatin structure. To achieve this, the student will extend our prior REACH-3D model by exploiting recent developments in attention mechanisms [Bahdanau et al. 2020]. Alternative neural-network architectures based on graph neural networks [Veličković et al. 2020] or transformers [Vaswani et al. 2020] will also be considered. The goal of the Master’s thesis is not only to infer the 3D genome structure, but also to indicate, in a location-specific manner, which chromatin contacts drive genome folding. The methods will be tested on publicly available datasets [HiC 2018], and the inferred structures will be benchmarked against those produced by established methods. The evaluation will focus not only on the accuracy of the results, but also on how well the methods scale with genome size and Hi-C resolution.
We invite applications from ETH Master's students with a background in Computer Science, Computational Biology/Bioinformatics or related fields. The ideal candidate has a solid background in machine learning, deep learning and data analysis. Strong programming skills in Python and practical experience with at least one deep-learning framework (TensorFlow, PyTorch, Keras) are essential. Prior knowledge of molecular biology is not a prerequisite.
IBM is committed to diversity at the workplace. With us you will find an open, multicultural environment. Excellent flexible working arrangements enable all genders to strike the desired balance between their professional development and their personal lives.
How to apply
Interested candidates are welcome to submit an application including CV and transcript of grades.