Master’s student position

Interpretability for computational biology

Ref. 2019-08

Project description

Understanding real-world datasets (e.g. electronic health records, omics data) is often challenging because of their size, their complexity and/or limited knowledge of the problem at hand. Achieving high accuracy on important tasks usually requires equally complex machine-learning or deep-learning models. In many situations, the decisions made by such automated systems can have significant, and potentially deleterious, consequences, so it is important to understand how these models reach their decisions.

The goal of this project is to analyze and benchmark existing interpretability methods and strategies [1,2] on various tasks in computational biology (e.g. transcription-factor binding prediction, cancer type classification), with a particular focus on mass cytometry (CyTOF) data [3]. Specifically, the student will perform the following tasks:

  • Conduct an extensive review of the literature on existing interpretability methods and, if interested, of philosophical and psychological essays on interpretability.
  • Reimplement selected existing methods where no code is available.
  • Define the desiderata of an interpretability method in the context of cell subpopulation identification from CyTOF data, and design corresponding performance metrics to benchmark and compare existing methods systematically.
  • Develop a new interpretability method and compare it to existing methods.
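As an illustration of the kind of method that might be reimplemented, the following is a minimal, simplified sketch of a LIME-style local surrogate explanation [1] for tabular data such as per-cell marker intensities. It is not the official `lime` package and not a prescribed part of the project. It assumes a black-box `predict_fn` that returns one score per row, perturbs the instance with Gaussian noise scaled by an assumed per-feature `feature_std`, weights samples with an exponential kernel whose width is a heuristic assumption, and fits a weighted ridge regression whose coefficients serve as feature attributions. All function and parameter names are hypothetical.

```python
import numpy as np


def lime_tabular_sketch(predict_fn, x, feature_std, n_samples=2000,
                        kernel_width=None, ridge=1e-3, seed=0):
    """Simplified LIME-style local surrogate (illustrative; not the `lime` package).

    predict_fn  -- black box mapping an (n, d) array to n scores (assumed).
    x           -- (d,) instance to explain, e.g. one cell's marker intensities.
    feature_std -- (d,) per-feature scale for perturbations (assumed input).
    Returns a (d,) vector of local attributions in standardized units.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    feature_std = np.asarray(feature_std, dtype=float)
    d = x.shape[0]
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)  # heuristic width (assumption)

    # 1. Perturb x in standardized units and query the black box.
    z = rng.normal(size=(n_samples, d))
    samples = x + z * feature_std
    y = np.asarray(predict_fn(samples), dtype=float)

    # 2. Weight each sample by proximity to x (exponential kernel).
    dist = np.linalg.norm(z, axis=1)
    w = np.exp(-dist ** 2 / kernel_width ** 2)

    # 3. Weighted ridge regression with an unpenalized intercept column.
    Z = np.hstack([np.ones((n_samples, 1)), z])
    reg = ridge * np.eye(d + 1)
    reg[0, 0] = 0.0
    A = Z.T @ (w[:, None] * Z) + reg
    b = Z.T @ (w * y)
    coef = np.linalg.solve(A, b)
    return coef[1:]  # drop intercept; one attribution per feature


if __name__ == "__main__":
    # Toy linear "black box" used only as a sanity check.
    beta = np.array([2.0, 0.0, -1.0])
    attributions = lime_tabular_sketch(lambda X: X @ beta,
                                       x=np.array([1.0, 0.5, -0.3]),
                                       feature_std=np.ones(3))
    print(np.round(attributions, 3))
```

For a linear black box the surrogate should recover the model's weights scaled by `feature_std`, which gives a simple correctness check before benchmarking such a reimplementation on real, nonlinear classifiers.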

Requirements

  • Working knowledge of Python or an equivalent programming language.
  • Solid knowledge of statistics and machine learning.

In addition, a background in systems biology, applied mathematics or cognitive science is beneficial but not essential.

Diversity

IBM is committed to diversity in the workplace. With us you will find an open, multicultural environment. Excellent flexible working arrangements enable both women and men to strike the desired balance between their professional development and their personal lives.

How to apply

Interested candidates, please contact: .


[1] Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “Why should I trust you?: Explaining the predictions of any classifier.” Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 2016.
[2] Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “Anchors: High-precision model-agnostic explanations.” Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[3] Spitzer, Matthew H., and Garry P. Nolan. “Mass cytometry: Single cells, many features.” Cell 165.4 (2016): 780-791.