Topics at the Zurich Lab

Ref. code  
Z-2017-1

Systems security for cloud storage systems

Department:   Cloud & Computing Infrastructure
Short Description:  

The cloud storage & security research team at IBM Research – Zurich is looking for outstanding graduate students in the area of systems security. Candidates should have a good understanding of operating-system architecture and distributed systems, as well as C/C++ systems-programming skills. Knowledge of memory-corruption prevention and fuzz-testing techniques is beneficial.

During the six-month internship, the successful candidate will develop novel strategies for fuzz testing of complex OS components and evaluate them on actual software such as distributed file systems.
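
To give a flavor of the topic, here is a minimal mutation-based fuzzing loop in Python; the seed file and target binary names are hypothetical placeholders, and a real campaign would add coverage feedback and corpus management:

    import random
    import subprocess

    def mutate(data: bytes, n_flips: int = 8) -> bytes:
        """Randomly overwrite a few bytes of a seed input."""
        buf = bytearray(data)
        for _ in range(n_flips):
            buf[random.randrange(len(buf))] = random.randrange(256)
        return bytes(buf)

    with open("seed.img", "rb") as f:          # hypothetical seed input
        seed = f.read()

    for i in range(10_000):
        case = mutate(seed)
        with open("case.img", "wb") as f:
            f.write(case)
        # Run the (hypothetical) target and watch for signal-induced deaths.
        result = subprocess.run(["./target_parser", "case.img"])
        if result.returncode < 0:              # e.g. killed by SIGSEGV
            with open(f"crash_{i}.img", "wb") as f:
                f.write(case)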

Z-2017-2

Blockchain application web development

Department:   Cloud & Computing Infrastructure
Short Description:  

We are looking for motivated interns to join our work on industry platforms and blockchain. Ideal candidates are familiar with blockchain systems (e.g., Hyperledger) and have a track record in end-to-end web development (JavaScript+frameworks, Node.js, Nginx, NoSQL, Golang).

During the six-month internship, candidates will be working on a blockchain application for identity management. We expect applicants to be self-taught, organized and open. They should have a passion for agile web development as well as a strong interest in scientific teamwork.

Z-2017-3

Cognitive storage / Brain-inspired large-scale data storage systems

Department:   Cloud & Computing Infrastructure
Short Description:  

Storing and retrieving large amounts of data will be one of the most challenging aspects of future data systems. Major advances in storage technologies and storage management have enabled storage administrators to keep up with the exponential data growth over the decades and, so far, to keep the ever-increasing complexity of storage systems manageable. However, these advances appear insufficient to accommodate future data growth and to handle that complexity effectively. A promising approach to tackling these challenges is cognitive storage, in which machine-learning techniques are applied to optimize the efficiency of data storage in a content-dependent manner.

In this 4- to 6-month project, we would like to explore advancing the state-of-the-art algorithms in the context of cognitive storage. Just as our brains can trace back old memories through a series of thoughts, each linked to the next by some notion of “similarity”, the idea is to leverage machine-learning algorithms and graph theory to link the different pieces of data stored in the system through their metadata, so that data relevance can be assessed and the storage policies tuned accordingly.
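
As a minimal sketch of this idea (with entirely made-up metadata), one can build a similarity graph over files and use a graph-centrality measure as a crude proxy for relevance:

    import itertools
    import networkx as nx

    # Hypothetical per-file metadata records.
    files = {
        "a.dat": {"owner": "alice", "project": "genomics", "type": "csv"},
        "b.dat": {"owner": "alice", "project": "genomics", "type": "hdf5"},
        "c.dat": {"owner": "bob",   "project": "astro",    "type": "csv"},
    }

    def similarity(m1, m2):
        """Fraction of metadata fields on which two files agree."""
        keys = set(m1) | set(m2)
        return sum(m1.get(k) == m2.get(k) for k in keys) / len(keys)

    G = nx.Graph()
    for (f1, m1), (f2, m2) in itertools.combinations(files.items(), 2):
        s = similarity(m1, m2)
        if s > 0:
            G.add_edge(f1, f2, weight=s)

    # Central files are "well connected" to much of the data and could be
    # kept on faster tiers; peripheral files could be demoted.
    relevance = nx.pagerank(G, weight="weight")
    print(sorted(relevance.items(), key=lambda kv: -kv[1]))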

Candidates choosing this topic are expected to have a background and interest in machine learning and graph theory, and to be capable of implementing their ideas during their internship in a suitable programming language (e.g., Python, Scala, Java), preferably in Hadoop/Spark-type environments that can handle large amounts of data.

Z-2017-4

Computational memory

Department:   Cloud & Computing Infrastructure
Short Description:  

For decades, conventional computers based on the von Neumann architecture have performed computations by repeatedly transferring data between their memory and their processing units. As computation becomes increasingly data-centric and as scalability limits are being reached in terms of performance and power, alternative computing paradigms are needed that collocate computation and storage.

One such paradigm is that of computational memory, where the physics of nanoscale resistive memory devices is used to collocate computation and storage. At IBM Research – Zurich, we have shown that millions of such devices can be organized to perform high-level computational primitives. Such coexistence of computation and storage at the nanometer scale could be the enabler for new ultra-dense, massively parallel computing systems.

We would like to invite applications from highly qualified interns who can contribute to this research effort. The ideal candidate should have a sound foundation in mathematics. A background in the theory of computation or machine learning is highly desirable. The candidates should also be well versed in programming languages such as Matlab and Python. Familiarity with resistive memory devices is desirable but not necessary.
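
To illustrate the principle, an idealized crossbar of resistive devices computes a matrix-vector product via Ohm's and Kirchhoff's laws; the conductance values below are invented for the sketch:

    import numpy as np

    # Conductance matrix G (siemens): one entry per programmed device.
    G = np.array([[1.0e-6, 2.0e-6],
                  [0.5e-6, 1.5e-6]])

    V = np.array([0.2, 0.1])   # read voltages applied to the columns

    # Each row wire sums the device currents, so the measured currents
    # are I = G @ V: the memory array itself performs the multiplication.
    I = G @ V
    print(I)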

Z-2017-5

Effect of reduced precision arithmetic

Department:   Cloud & Computing Infrastructure
Short Description:  

This project investigates the effect of reduced-precision arithmetic in the context of training and inference of probabilistic recommendation models. The intern will develop and compare new implementations using fixed-point arithmetic as well as reduced-precision floating-point arithmetic. In order to achieve maximum performance, these implementations should leverage SIMD/AVX CPU extensions as well as the new half-precision floating-point arithmetic provided by the latest NVIDIA GPUs (Pascal architecture).
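
A quick way to build intuition for the accuracy side (before writing any SIMD code) is to emulate the reduced formats in NumPy; the model matrix here is random stand-in data:

    import numpy as np

    def to_fixed_point(x, frac_bits=8):
        """Round to a signed fixed-point grid with frac_bits fractional bits."""
        scale = 2.0 ** frac_bits
        return np.round(x * scale) / scale

    rng = np.random.default_rng(0)
    W = rng.normal(size=(1000, 64))      # hypothetical model factors
    v = rng.normal(size=64)

    exact = W @ v
    half  = (W.astype(np.float16) @ v.astype(np.float16)).astype(np.float64)
    fixed = to_fixed_point(W) @ to_fixed_point(v)

    for name, approx in [("float16", half), ("fixed<8>", fixed)]:
        err = np.max(np.abs(approx - exact)) / np.max(np.abs(exact))
        print(f"{name}: relative error {err:.2e}")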

Z-2017-6

Accelerated recursive graph algorithms

Department:   Cloud & Computing Infrastructure
Short Description:  

This project will focus on developing and implementing accelerated recursive graph algorithms on various CPU and GPU platforms. In particular, the intern is expected to exploit the parallelism inherent in such algorithms, and in minimum-cost flow algorithms in general, in order to unleash potential performance benefits. Current applications include natural language processing and image retrieval.
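
For reference, a tiny minimum-cost flow instance (all capacities and costs invented) can be solved with networkx as a correctness baseline against which accelerated implementations can be checked:

    import networkx as nx

    G = nx.DiGraph()
    G.add_node("s", demand=-4)   # negative demand = supply of 4 units
    G.add_node("t", demand=4)
    G.add_edge("s", "a", capacity=3, weight=1)
    G.add_edge("s", "b", capacity=2, weight=2)
    G.add_edge("a", "t", capacity=3, weight=1)
    G.add_edge("b", "t", capacity=2, weight=1)

    flow = nx.min_cost_flow(G)
    print(flow, nx.cost_of_flow(G, flow))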

Z-2017-7

Electrostatic gating of Brownian motors

Department:   Science & Technology
Short Description:  

Recently we have developed a Brownian motor that transports nanoscale particles in water without fluid flow. The motor relies on a permanent electrostatic potential and is driven by an external oscillating electric field. In this project, we want to study whether local electrostatic potentials are suitable to control the motor function.

Tasks

  • Build a sample and cover glass holder that enables electric contacts to be fed to a silicon sample.
  • Conduct first experiments on the effect of local electric fields on the motor function.

Desired skills

  • Creative and self-motivated
  • Good experimental skills

Z-2017-8

Advanced single-particle tracking

Department:   Science & Technology
Short Description:  

We use interference to detect small particles with high frame rates. Depending on the distance of the particle from the reflective surface, the particle contrast may change sign or become vanishingly small.

Task

  • Extend existing Python libraries to enable tracking of blinking particles (see the sketch below).
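
A minimal linking strategy with gap closing, as a starting point (the detection coordinates are synthetic and the heuristics are illustrative only):

    import numpy as np

    def link(tracks, detections, max_dist=5.0, max_gap=3):
        """Greedily extend tracks with detections, tolerating frames in
        which a particle has blinked off (up to max_gap in a row)."""
        for tr in tracks:
            if not detections:
                tr["gap"] += 1
                continue
            d = [np.hypot(x - tr["pos"][0], y - tr["pos"][1])
                 for x, y in detections]
            i = int(np.argmin(d))
            # Widen the search radius after a blink: the particle keeps
            # diffusing while it is invisible.
            if d[i] <= max_dist * (1 + tr["gap"]):
                tr["pos"] = detections.pop(i)
                tr["points"].append(tr["pos"])
                tr["gap"] = 0
            else:
                tr["gap"] += 1
        # Unmatched detections start new tracks.
        tracks += [{"pos": p, "points": [p], "gap": 0} for p in detections]
        return [t for t in tracks if t["gap"] <= max_gap]

    tracks = []
    frames = [[(10.0, 10.0)], [], [(11.5, 10.4)]]  # blink in frame 2
    for dets in frames:
        tracks = link(tracks, list(dets))
    print(tracks[0]["points"])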

Desired skills

  • Creative and self-motivated
  • Good knowledge of Python

Z-2017-9

Managing networks of connected electrical/thermal energy conversion systems

Department:   Science & Technology
Short Description:  

Smart grids are considered a key component of future electrical grids in order to incorporate a growing proportion of renewable energy sources. In comparison, the management of thermal grids has been mostly neglected, although thermal demand exceeds electricity demand. Our team is developing adsorption heat pumps that work as thermal transformers between different temperature levels. In combination with conventional mechanical compression heat pumps, it will be possible to incorporate both renewable electricity and waste heat in electrical/thermal grids with maximum flexibility while reducing the need for electrical energy storage. New metering concepts and smart-grid management tools will be required to enable this vision of a “holistic” energy grid. During this internship, the successful candidate will work together with our experimental heat pump and smart grid teams to develop a computational framework to describe such heterogeneous networks.

Recommended background: Proficiency in MATLAB, possibly Simulink, basic knowledge of energy science.

Z-2017-10

Nanoparticle-based all-copper electrical interconnects for future high-performance microprocessors

Department:   Science & Technology
Short Description:  

The power density of future microprocessors will continue to increase. However, current electrical interconnects do not support such high current densities. Therefore, the Smart System Integration group at IBM Research – Zurich is exploring all-copper electrical interconnects to improve electromigration resistance and reliability compared to currently used solder joints.

These novel electrical interconnections are formed by a dip-transfer process of a nanoparticle-based copper paste, followed by low-temperature sintering. First feasibility studies indicate substantial electrical and mechanical improvements with respect to state-of-the-art joints.

Accordingly, we aim to continue exploring the all-copper interconnect technology by investigating the underlying physical principles of the dipping and sintering processes. Moreover, a morphological study of the resulting nano-porous copper from the nano- to the macro-scale (by TEM, FIB, SEM, EBSD) and a detailed electrical and mechanical characterization, including electromigration and corrosion effects, are required. The study will be performed experimentally in our cutting-edge research facilities. In addition, data analysis and theoretical considerations will complete the work.

Finally, we believe that, once fully controlled, all-copper interconnect technology can be a breakthrough with worldwide impact on the electronic packaging industry.

Z-2017-11

Cognitive healthcare

Department:   Cognitive Computing & Industry Solutions
Short Description:  

IBM Research is creating fundamental technology to enable humans to engage with advanced IT systems capable of learning from existing data—structured and unstructured—as well as from the interactions between people and the systems themselves. Thanks to the leadership of IBM Research, IBM’s Watson family of cognitive computing solutions is the most advanced artificial intelligence platform, which provides the basis for our work.

A strategic focus area for the Foundations of Cognitive Solutions team at IBM Research – Zurich is to develop cognitive technology and solutions to support patient care. By exploiting the large amount of existing public and clinic-specific medical data, we are creating systems that can improve patient care by assisting medical professionals in a seamless way.

We currently have a range of exciting opportunities for student projects that would be suitable for both internships and Master’s projects.

Required skills

Applicants to our team should have a passion for both developing new algorithms and technology, and developing solutions to apply our results to exciting new applications. Skills are required in the following areas:

  • natural language processing
  • graph algorithms
  • machine learning
  • software development in languages suited to the problem at hand (e.g., R, Python, Java)

Z-2017-12

Medical knowledge representation

Department:   Cognitive Computing & Industry Solutions
Short Description:  

The most fundamental challenge in building such cognitive healthcare solutions is to gather and represent data from diverse sources so that it can be used to provide insight. Existing ontologies provide the basis for organizing information, and concept expansion makes it possible to extend the ontologies to capture more detailed domain-specific concepts. Our work in this space is focused on developing new algorithms for extracting and organizing information and on implementing them in our existing environment. The structure of our knowledge graph also needs to be improved continuously to support queries that arise from new applications.

Another challenge in this space is the development of robust and reusable techniques that make it possible to relate and exploit information that has been ingested from sources in a variety of different languages.

Required skills

Applicants to our team should have a passion for both developing new algorithms and technology, and developing solutions to apply our results to exciting new applications. Skills are required in the following areas:

  • natural language processing
  • graph algorithms
  • machine learning
  • software development in languages suited to the problem at hand (e.g., R, Python, Java)

Z-2017-13

Next best question

Department:   Cognitive Computing & Industry Solutions
Short Description:  

Cognitive systems that interact with people (for example, patients or medical professionals) frequently provide assistance based on input provided by the user. As an example, a medical triage system can help a non-professional find the right care provider based on a description of symptoms in everyday language. For such a system to be useful, it needs to prompt the patient to provide the most important input in a natural way, building on the information that the user has already provided. Graph-clustering algorithms need to be developed or extended that can identify the user input required to relate the user’s needs to the most relevant information in the knowledge graph. This provides the basis for assisting the user while keeping the interactions with the system at a reasonable level.
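
One classical way to frame "the next best question" is greedy expected-entropy reduction over the current belief state; a toy sketch with an invented condition-symptom table:

    import numpy as np

    symptoms = ["fever", "cough", "rash"]
    conditions = np.array([        # hypothetical incidence table
        [1, 1, 0],                 # flu
        [0, 1, 0],                 # common cold
        [1, 0, 1],                 # measles
        [1, 0, 1],                 # rubella
    ], dtype=float)
    belief = np.ones(len(conditions)) / len(conditions)   # uniform prior

    def expected_entropy_after(q):
        """Expected posterior entropy if we ask about symptom q (yes/no)."""
        h = 0.0
        for answer in (0.0, 1.0):
            like = np.where(conditions[:, q] == answer, 1.0, 1e-6)
            post = belief * like
            p_ans = post.sum()
            post /= p_ans
            h += p_ans * -(post * np.log2(post)).sum()
        return h

    best = min(range(len(symptoms)), key=expected_entropy_after)
    print("next best question:", symptoms[best])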

Required skills

Applicants to our team should have a passion for both developing new algorithms and technology, and developing solutions to apply our results to exciting new applications. Skills are required in the following areas:

  • natural language processing
  • graph algorithms
  • machine learning
  • software development in languages suited to the problem at hand (e.g., R, Python, Java)

Z-2017-14

Automatic testing of learning systems

Department:   Cognitive Computing & Industry Solutions
Short Description:  

The goal of a cognitive-based system is to learn the answers to questions from existing data and day-to-day interactions with users. In most cases, the answer to a user’s question is not known in advance. This creates interesting challenges for testing and verification, since we need to verify that the system is providing the best possible answers despite not knowing the answer ourselves. Advances in testing are therefore required at all levels. First, we have to verify that the key concepts are correctly recognized from the available data. Next, we have to ensure that the knowledge representation allows the most relevant information to be used to generate hypotheses and, finally, answers to new questions. Lastly, testing the answers will rely on a combination of checking the answers to known questions from test data not used for learning, and of developing tools that allow experts to evaluate and give feedback to the system in a natural manner.
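
The held-out-question part of that strategy looks like an ordinary evaluation harness; a minimal sketch (the stand-in system and gold pairs are invented):

    def evaluate(system, gold_pairs, k=3):
        """Exact-match rate on the top answer plus recall of the gold
        answer within the system's top-k candidates."""
        exact = in_topk = 0
        for question, gold in gold_pairs:
            ranked = system(question)        # ranked candidate answers
            exact += ranked[0] == gold
            in_topk += gold in ranked[:k]
        n = len(gold_pairs)
        return exact / n, in_topk / n

    toy_system = lambda q: ["paris", "london", "rome"]
    gold = [("capital of france?", "paris"), ("capital of italy?", "rome")]
    print(evaluate(toy_system, gold))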

Required skills

Applicants to our team should have a passion for both developing new algorithms and technology, and developing solutions to apply our results to exciting new applications. Skills are required in the following areas:

  • natural language processing
  • graph algorithms
  • machine learning
  • software development in languages suited to the problem at hand (e.g., R, Python, Java)

Z-2017-15

Accelerated deep neural networks for cognitive systems

Department:   Cognitive Computing & Industry Solutions
Short Description:  

Deep neural networks are powerful methods used in a variety of large-scale real-world problems such as image classification, object detection, natural language processing, and human action recognition. Although state-of-the-art results on easier tasks exceed human accuracy, these methods still have drawbacks that researchers have been able to address with only limited success and that still require a considerable time investment.

Among them we mention:

  • The need for more robust methods to reduce the time/costs of the training phase.
  • Transfer learning from one network to another in order to accommodate new available datasets without retraining from scratch.

To reduce the cost of training, researchers have proposed solutions such as better initialization methods, smarter update rules in the back-propagation process, adaptive learning rates, batch normalization, and dropout techniques. However, the problem of transferring learning between networks is still largely unexplored.

Working together on this project, we intend first to gain a better understanding of what a deep neural network learns by analyzing the correlations of neuron activations on various input data, and then to customize the network in an informed way so that it can rapidly accommodate changes in the dataset. We will also incorporate fine-tuning and approximate-computing techniques in order to compensate for the increased size of the network.

Our goal is to converge faster to a better accuracy when tuning the network with reasoning-based features than when using a completely unsupervised learning method, and to eliminate the need to retrain a network from scratch whenever new input becomes available.
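
The activation-correlation analysis can be prototyped framework-independently once activations have been recorded (here random stand-ins with one planted redundant neuron):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical hidden-layer activations: n_samples x n_neurons, as a
    # forward hook in any DNN framework would record them.
    acts = rng.normal(size=(512, 64))
    acts[:, 1] = 0.9 * acts[:, 0] + 0.1 * rng.normal(size=512)

    corr = np.corrcoef(acts, rowvar=False)   # neuron-by-neuron correlation
    redundant = np.argwhere(np.triu(np.abs(corr), k=1) > 0.8)
    print("strongly correlated neuron pairs:", redundant)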

Requirements

  • Strong C++ and Python programming skills
  • Good understanding of numerical linear algebra and functional analysis
  • Experience with a deep neural network framework is a plus

Z-2017-16

Fast, approximate computation of complex math functions with SIMD on IBM POWER architectures

Department:   Cognitive Computing & Industry Solutions
Short Description:  

The candidate will develop new algorithms to approximate computationally expensive mathematical functions on IBM POWER architectures. State-of-the-art techniques are based on look-up tables and search algorithms, so performance is limited by the size and number of the tables and the consequent large amount of memory accesses. This leads to sub-optimal performance on modern chips, which are primarily designed to deliver a large number of floating-point operations per second (Flops). In addition, existing algorithms offer no flexibility regarding the final accuracy of the function. Indeed, the classical 16 digits associated with double-precision floating-point numbers are generally far beyond the actual user requirements, which are typically from 4 to 9 digits.

The new algorithms developed during the project will exploit modern IBM POWER hardware. They will make extensive use of single-instruction, multiple-data (SIMD) operations as well as classical floating-point additions/multiplications, drastically reducing the amount of memory accesses compared with classical techniques. In addition, the algorithms will offer the flexibility to specify the desired number of correct digits, without wasting time and energy to obtain more accurate digits than necessary. During the development, the candidate will also evaluate the energy impact of the new algorithms. Upon completion, the new algorithms will become part of a library for approximate computation of mathematical functions that is currently under development at IBM Research – Zurich.
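
The accuracy/degree trade-off behind such table-free methods can be explored quickly in NumPy: fit a polynomial at Chebyshev nodes and increase the degree until the requested number of digits is met (Horner evaluation of the result is pure adds/multiplies, which maps well onto SIMD units):

    import numpy as np

    def approx_poly(f, a, b, digits):
        """Lowest-degree Chebyshev-node polynomial fit of f on [a, b]
        whose maximum error is below 10**(-digits)."""
        target = 10.0 ** (-digits)
        grid = np.linspace(a, b, 10_000)
        for deg in range(1, 40):
            k = np.arange(deg + 1)
            nodes = np.cos((2 * k + 1) * np.pi / (2 * (deg + 1)))
            x = 0.5 * (b - a) * nodes + 0.5 * (a + b)
            coeffs = np.polyfit(x, f(x), deg)
            err = np.max(np.abs(np.polyval(coeffs, grid) - f(grid)))
            if err < target:
                return coeffs, deg, err
        raise ValueError("degree limit reached")

    # exp on [0, 1] to ~6 correct digits.
    coeffs, deg, err = approx_poly(np.exp, 0.0, 1.0, 6)
    print(f"degree {deg}, max error {err:.1e}")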

Requirements

  • Strong in C or Fortran programming
  • Good familiarity with Linux and shell editors (e.g., vi or emacs)

Z-2017-17

Vision-based 3D scene representation & segmentation

Department:   Cognitive Computing & Industry Solutions
Short Description:  

State-of-the-art (computer vision-based) mapping approaches allow the construction of geometrically consistent representations of the physical world as perceived over a short window of time by a single robot only. However, the longer-term observation and faithful mapping of a scene would provide important spatial and temporal cues on its constituent parts (namely objects and object assemblies), as well as their interrelations.

The goal of this project is to explore a first step in this direction by designing and implementing a suitable framework & data representation enabling long-term geometrically consistent mapping in practice. The applicant will work with and adapt the existing dense visual odometry asset at IBM Research – Zurich to produce a working proof of concept implementation.

Requirements

The applicant is expected to have a solid working knowledge of C++ on Linux. Prior experience with computer vision (classical and deep learning based) as well as robotic SLAM frameworks is desirable.

Z-2017-18

Novel approaches to optimize the number of auxiliary qubits and computational resources for the simulation of generic fermionic systems

Department:   Cognitive Computing & Industry Solutions
Short Description:  

With Moore’s Law coming to an end, the scientific community is facing new challenges that require changing computing paradigms. On the other hand, science has reached an exciting time where devices can be built that make use of quantum effects. The simulation of fermionic systems on today’s quantum computing architectures is still a challenge. Quantum computers, based on qubits that do not natively obey fermionic statistics, make such simulations problematic because encoding fermionic statistics requires the use of complicated transformation schemes (such as Jordan–Wigner or Bravyi–Kitaev). Unfortunately, these transformations introduce many-body interactions (k-local terms) into the Hamiltonian, which cannot be implemented in the current experimental quantum computing setups. A solution to this problem consists in unfolding the k-local terms into a sum of 2-local terms in an enlarged Hilbert space. This has the undesired effect of exponentially increasing the complexity of the problem (an NP-hard problem). In this project, we will study a series of novel approaches with the aim of optimizing the number of auxiliary qubits and the computational resources required for simulating generic fermionic systems (e.g., solving the electronic-structure problem in molecular systems).
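
As a sketch of where the k-locality comes from: under the standard Jordan–Wigner mapping, each fermionic mode drags a string of Pauli-Z operators over all preceding qubits,

    a_j = \Big(\prod_{k<j} Z_k\Big)\,\frac{X_j + i\,Y_j}{2}, \qquad
    a_j^\dagger = \Big(\prod_{k<j} Z_k\Big)\,\frac{X_j - i\,Y_j}{2},

so even a simple hopping term a_p^\dagger a_q becomes a Pauli string acting on |p - q| + 1 qubits, i.e. a k-local term.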

Qualifications needed

  • BSc or MSc in Physics (preferably with a major in theoretical physics)
  • Good knowledge of quantum mechanics
  • Good knowledge of linear algebra
  • Good handling of Python / Matlab and/or C++

Z-2017-19

Computational methods for cancer personalized medicine

Department:   Cognitive Computing & Industry Solutions
Short Description:  

Despite their great promise, high-throughput technologies in cancer research have often failed to translate into major therapeutic advances in the clinic. One challenge lies in the high level of tumor heterogeneity displayed by human cancers, which renders the identification of driving molecular alterations difficult, and thus often results in therapies that only target subsets of aggressive tumor cells. Another challenge lies in the difficulty of integrating disparate types of molecular data into mathematical disease models that can make actionable clinical statements.

The systems biology group at the IBM Research – Zurich Lab aims to develop new mathematical and computational approaches to analyze and exploit the latest generation of biomedical data. In the context of cancer, the group focuses on the integration of high-throughput molecular datasets to build comprehensive molecular disease models; the development of new approaches to reconstruct signalling protein networks from single-cell time-series proteomic data; and the application of Bayesian approaches and high-performance computing to the problem of network reconstruction.

An active line of research focuses on prostate cancer, a leading cause of cancer death among men in Europe, but also one prone to over-treatment. This internship will focus on the analysis of molecular (genomic, transcriptomic, and proteomic) and clinical data and on the use of the latest generation of cognitive technologies developed at IBM, with the goal of characterizing tumor heterogeneity. In addition, we will aim to develop new methodologies to integrate disparate types of data into models that can help risk-stratify patients. Candidates should have a strong background in computer science, machine learning, mathematics or physics, and be interested in cancer-related research.

Z-2017-20

Identifying mechanisms of action in association studies

Department:   Cognitive Computing & Industry Solutions
Short Description:  

Genome-wide association studies (GWAS) have emerged as a powerful tool to identify genetic variants associated with complex phenotypes and disease. Despite the many newly discovered associations, the variants identified by these studies typically explain only a small fraction of the heritable component of disease risk. Furthermore, few genetic variants are found within coding regions of genes, and the elucidation of the molecular mechanism by which these loci influence the phenotype remains challenging.

The systems biology group at IBM Research – Zurich has developed a new computational approach to elucidate the molecular mechanisms behind the association of genetic variants with breast cancer susceptibility. Specifically, the algorithm tests whether a genetic variant (e.g., a SNP) modulates the activity of a transcription factor by perturbing its interaction with specific targets. The proposed internship will focus on applying the approach to two independent breast cancer genomic and transcriptomic datasets and on analyzing the results in a disease context.
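
This is not the group's algorithm, but a common baseline for the same question is a genotype-by-TF interaction test, sketched here on simulated data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 300

    genotype = rng.integers(0, 3, size=n)        # SNP coded 0/1/2
    tf = rng.normal(size=n)                      # TF expression
    # Simulate a variant that weakens the TF-target interaction.
    strength = np.where(genotype == 0, 0.8, 0.2)
    target = strength * tf + rng.normal(scale=0.5, size=n)

    # Least-squares fit of target ~ tf + genotype + tf*genotype.
    X = np.column_stack([np.ones(n), tf, genotype, tf * genotype])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    print("interaction coefficient:", round(beta[3], 3))

    # Equivalently, compare TF-target correlations across genotype groups.
    r0 = stats.pearsonr(tf[genotype == 0], target[genotype == 0])[0]
    r2 = stats.pearsonr(tf[genotype == 2], target[genotype == 2])[0]
    print(f"r(G=0)={r0:.2f}  r(G=2)={r2:.2f}")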

Required expertise

  • Working knowledge of C or C++.
  • Working knowledge of Matlab, R or equivalent.
  • Solid knowledge of statistics and mathematical modeling.
  • In addition, some knowledge of molecular biology, genetics and systems biology, as well as of high-throughput technologies for the molecular characterization of cancer samples, would be beneficial, although it is not essential.

Z-2017-21

Computational framework for digital pathology

Department:   Cognitive Computing & Industry Solutions
Short Description:  

In digital pathology we focus on the analysis of digitized histopathology and molecular expression images, as well as cytology images. Imaging of tissue specimens is a powerful tool for extracting quantitative metrics of phenotypic properties while preserving the morphology and spatial relationships of the tissue microenvironment. Novel staining technologies such as immunohistochemistry (IHC) and in situ hybridization (ISH) further enable the evidencing of molecular expression patterns through multicolor visualization. Such techniques are thus commonly used for predicting disease susceptibility, for stratification, and for treatment selection and monitoring.

However, translating molecular expression imaging into direct health benefits has been slow, and two major factors contribute to this. On the one hand, disease susceptibility and progression is a complex, multifactorial molecular process. Diseases like cancer exhibit tissue and cell heterogeneity, impeding the differentiation between different stages or types of cell formations, most prominently between inflammatory response and malignant cell transition. On the other hand, the relative quantification of selected features in the stained tissue is ambiguous and tedious, and thus time-consuming and prone to clerical error, leading to intra- and inter-observer variability and low throughput. At IBM Research – Zurich we are developing advanced image analytics to address both of these limitations, aiming to transform the analysis of stained tissue images into a high-throughput, robust, quantitative and data-driven science.

For our growth area on digital pathology we are looking for motivated candidates for the enhancement and advancement of our computational framework. Candidates should be majoring in Computer Science, Electrical Engineering or related fields, with experience and interest in image processing, pattern recognition and/or machine learning.

Z-2017-22

Video analytics for patient monitoring

Department:   Cognitive Computing & Industry Solutions
Short Description:  

In patient care, particularly in intensive care units (ICUs), monitoring patients on a 24-hour basis to detect signs of state deterioration or imminent complications is a critical and, in acute cases, a life-saving task. Manual monitoring by specially trained and highly experienced ICU personnel complements the vital-signs monitoring devices, such as EEG, to address issues such as false alarms, missed detections and overall patient-state assessment. Full-time bedside care is demanding and cost-ineffective. ICU personnel rely on the alarm systems of the vital-signs monitoring devices which, as described above, tend to favor sensitivity over specificity and thus generate numerous false alarms. ICU personnel are trained to filter those alarms subconsciously, but may inevitably miss critical alarms due to conflicting priorities or mis-filtering. Video monitoring can complement the existing monitoring systems by covering the idle times of manual inspection and thus detecting signs of patient-state deterioration that are missed either by the vital-signs monitoring devices or by the ICU personnel.

Video monitoring of patients generates petabytes of rich data, carrying critical information that can, in isolation but even more in combination with the vital signals and patient information, enhance the system’s capability to detect critical states in an early, robust and personalized way, and thus significantly improve patient care and personnel effectiveness. Extracting critical information from a video stream, detecting patterns of patient-state deterioration, and linking those patterns to the basic principles that underlie state changes and intervention-critical cases are part of the cognitive computing that leads to data-driven insights. At IBM Research – Zurich we focus on detecting epileptic seizures in patients in status epilepticus, based on the analysis of the patient-monitoring video streams. We employ sparse coding to model the non-seizure states of neuro-ICU patients and novelty detection to detect epileptic seizure episodes.
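
The sparse-coding/novelty-detection combination can be sketched with scikit-learn on invented per-frame motion features: learn a dictionary on normal frames only, then flag frames the dictionary reconstructs poorly:

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning

    rng = np.random.default_rng(0)

    normal = rng.normal(size=(500, 20))             # non-seizure frames
    test = np.vstack([rng.normal(size=(5, 20)),
                      rng.normal(loc=4.0, size=(5, 20))])  # last 5 anomalous

    dico = MiniBatchDictionaryLearning(n_components=15, alpha=1.0,
                                       random_state=0).fit(normal)
    codes = dico.transform(test)
    recon = codes @ dico.components_

    # High reconstruction error = the frame does not fit the learned
    # "normal" dictionary, i.e. a candidate seizure episode.
    score = np.linalg.norm(test - recon, axis=1)
    print(score.round(2))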

For our growth area on video-based patient monitoring we are looking for motivated candidates for the enhancement and advancement of our computational framework. Candidates should be majoring in Computer Science, Electrical Engineering or related fields, with experience and interest in image processing, pattern recognition and/or machine learning.

Z-2017-23

Computational framework for analysis of line and scatter plots

Department:   Cognitive Computing & Industry Solutions
Short Description:  

Scientific documents such as papers, reports and patents, but also other professional documents such as financial or medical reports, very often include numerous graphs. The purpose of those graphs is to illustrate, in a graphical way, the data sets that explain, describe or emphasize the textual content of those documents. Those data sets may be generated through experiments, measurements, observations or other means, and are uniquely depicted in those graphs so that the reader can extract the message they convey in a fast and efficient way. With the emergence of internet search and archival storage, and given the speed at which new scientific documents are created, a tool that can automatically scan through numerous documents, extract the main scientific knowledge from them, and present it to us in a concise and meaningful way is of great value.

For a document to be analyzed completely and thoroughly, however, its graphs also need to be processed and the main knowledge, as presented by the depicted data sets, extracted. As those graphs are mostly stored as bitmap images, the data sets are often noisy, with the graphical symbols used to depict them, such as lines, markers and text, overlapping or intersecting. At IBM Research – Zurich we develop computational techniques, based on image processing and machine learning, to automatically identify the graphical symbols used, extract their semantics and, finally, the data (knowledge) they represent. From the taxonomy of the various graph types, we are currently focusing mostly on line and scatter plots.
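
The final step, mapping recovered curve pixels back to data coordinates once the axes have been calibrated, reduces to a linear transformation; a synthetic end-to-end sketch:

    import numpy as np

    # Synthetic binary mask of one extracted curve in a 200x300 plot image.
    mask = np.zeros((200, 300), dtype=bool)
    cols = np.arange(40, 280)
    rows = (150 - 0.4 * (cols - 40)).astype(int)
    mask[rows, cols] = True

    # Hypothetical axis calibration: column 40 -> x=0, column 280 -> x=10;
    # row 170 -> y=0, row 20 -> y=5.
    ys, xs = np.nonzero(mask)
    y_pix = np.array([ys[xs == c].mean() for c in cols])  # row per column
    x = (cols - 40) / (280 - 40) * 10.0
    y = (170 - y_pix) / (170 - 20) * 5.0
    print(list(zip(x[:3].round(2), y[:3].round(2))))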

For our growth in the area of extracting knowledge from scientific graphs we are looking for motivated candidates to enhance our computational framework in the analysis of line and scatter plots. Candidates should be majoring in Computer Science, Electrical Engineering or related fields, with experience and interest in image processing, pattern recognition and/or machine learning.

Z-2017-24

Algorithmic re-engineering for POWER9/NVIDIA

Department:   Cognitive Computing & Industry Solutions
Short Description:  

A key aspect of any linear scaling electronic structure theory is the circumvention of the diagonalization bottleneck through purification or density matrix minimization methods, in which the computationally expensive kernel is based on sparse matrix-matrix multiply.

Recently, IBM Research – Zurich published a midpoint-based parallel sparse matrix-matrix multiplication algorithm, which looks very promising for practical applications. The method has several advantages, such as a) reduced communication volume, b) effective load balancing, and c) better communication latency (communication is only with nearby processes, leading to better usage of modern computer networks).

In this project, we propose to port our sparse matrix-matrix multiplication kernel to modern GPU hardware. The candidate will work on the upcoming IBM POWER9 CPU coupled to NVIDIA Volta-based Tesla GPUs via NVLink. To decrease the computation time, the candidate will explore different techniques to boost the low-level atomic kernels and investigate asynchronous communication to hide data transfers behind local computations.
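
A CPU-side correctness and performance baseline for the sparse matrix-matrix product is easy to set up with SciPy (sizes and density invented); the GPU port can then be validated against it:

    import numpy as np
    from scipy import sparse

    # Random sparse matrices as stand-ins for density-matrix operands.
    A = sparse.random(4000, 4000, density=0.001, format="csr", random_state=0)
    B = sparse.random(4000, 4000, density=0.001, format="csr", random_state=1)

    C = A @ B        # reference sparse matrix-matrix product on the CPU
    print(C.nnz, "nonzeros in the product")

On the GPU side, the same product could then be issued through cuSPARSE-backed libraries, for example, to compare results and timings.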

Requirement: Programming expertise on GPU.

Z-2017-25

Advanced machine learning for predicting organic chemical reactions

Department:   Cognitive Computing & Industry Solutions
Short Description:  

Organic synthesis is heavily used in many different industrial sectors, such as the oil, food and pharmaceutical industries. Organic molecules usually contain a very high degree of complexity, and thus very complex synthetic paths are needed to produce them.

Mapping the organic reaction space by varying input variables such as reactants, temperature and pressure requires an enormous experimental effort, which most of the time limits the discovery of new synthetic schemes.

In this project, we will explore advanced machine-learning techniques for predicting organic chemical synthesis, as well as efficient data structures to encode organic chemical information.
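
As a toy illustration of the encoding question (not the project's actual method), reactions can be embedded as character-count vectors of their SMILES strings and compared by cosine similarity; real systems would use learned fingerprints or graph representations:

    import numpy as np

    reactions = ["CCO>>CC=O", "CCCO>>CCC=O", "c1ccccc1>>c1ccccc1O"]
    labels    = ["oxidation", "oxidation", "hydroxylation"]

    vocab = sorted(set("".join(reactions)))

    def encode(rxn):
        v = np.zeros(len(vocab))
        for ch in rxn:
            v[vocab.index(ch)] += 1
        return v / np.linalg.norm(v)

    X = np.stack([encode(r) for r in reactions])
    query = encode("CCCCO>>CCCC=O")           # unseen oxidation
    sims = X @ query                          # cosine similarities
    print("predicted class:", labels[int(np.argmax(sims))])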

Requirement: Machine Learning and programming experience.

Z-2017-26

Improving description of excited states at the DFT level of theory

Department:   Cognitive Computing & Industry Solutions
Short Description:  

Characterizing the optical properties of complex molecular systems is of paramount importance for understanding and improving organic photovoltaic devices. To this end, TDDFT-based molecular dynamics calculations offer a very good balance between accuracy and efficiency. However, in order to reach a good level of predictivity, high-rank range-separated functionals are required.

The aim of this project is to design an efficient implementation and optimization of this class of functionals within the plane-wave code CPMD using the most recent algorithms (Lin L, JCTC, 12, 2241 (2016)) and HPC advances (parallelization).

Requirements

  • C++ and Fortran programming expertise.
  • Knowledge of DFT and excited-state calculations.
Z-2017-27

Designing novel solid state electrolytes for battery applications

Department:   Cognitive Computing & Industry Solutions
Short Description:  

The fabrication of efficient and long-lasting batteries is one of the main challenges in modern energy-storage technology. Solid-state electrolyte conductors (SSECs) are a very promising class of materials that exhibit an unprecedented combination of (1) high ionic conductivity for Li ions (>3×10−4 S cm−1) and (2) chemical stability.

Molecular dynamics simulations are a valuable tool for investigating the transport mechanism, which is key for the design of better-performing batteries. However, owing to the size and time scales of the diffusion process, ab-initio MD techniques are not able to capture the physics of the problem, and therefore force-field (FF) based simulations are required. In this project, we will work on the design of a polarizable FF model for the description of ionic transport in the garnet family (LLZO, Li7La3Zr2O12). We will use DFT-based techniques to derive the FF parameters and investigate the effect of doping (e.g., Zr → W) on the overall performance of the SSEC.

Requirements

  • Knowledge of DFT and classical molecular dynamics simulations.
  • Basic programming expertise.

Z-2017-28

From coarse-grained to finite-element methods for the modelling of oil recovery

Department:   Cognitive Computing & Industry Solutions
Short Description:  

In the coarse-grained (CG) approach, groups of atoms are identified and treated as a single entity, also called a bead, thus reducing the number of degrees of freedom of the system. The idea is to reduce the complexity of the system, allowing a large computational acceleration compared with all-atom simulations at the cost of a loss of accuracy, which can be controlled by a careful selection of the groups of atoms forming each bead and by proper parameterisation. This project is focused on the application of CG molecular-dynamics simulations to the modelling of complex oil and water mixtures within the reservoir. Our aim is to use CG simulations to extract parameters that are not experimentally available for complex multi-component, multi-phase mixtures, in order to improve the reliability of finite-element (FE) simulations so that the physics is not only qualitatively but also quantitatively correct. Examples of such parameters are capillary pressures, relative permeabilities, densities, solubilities, viscosities, compressibilities, and equations of state for the two-phase, black-oil, volatile-oil, and compositional models.

Requirements

  • Experience with classical molecular dynamics simulations and computational fluid dynamics.
  • Basic programming expertise.

Z-2017-29

Flexible hierarchical smart contracts for blockchain technology

Department:   Cognitive Computing & Industry Solutions
Short Description:  

Blockchain technology as the underlying technology of cryptocurrencies has gained significant interest well beyond the finance industry. Recently, the potential of this technology is being explored in many industries, especially whenever multiple parties with limited mutual trust have to interact and an indelible and immutable record of their interactions (transactions) needs to be kept. One typical example is the tracking and tracing of the lifecycle of parts from their current state back to their production, for instance in the aircraft industry.

Blockchain technology also includes the concept of smart contracts: essentially program code contained in the blockchain that is executed to run certain logic to verify conditions or “contracts”. Smart contracts are expressed as procedural code (Go or Java on Hyperledger) that runs on the distributed nodes of the blockchain network.

The scope of this internship is the development of flexible hierarchical smart contracts in Go or Java. Hierarchical smart contracts are relevant, for instance, in the above-mentioned parts-tracking example, where hierarchies exist (e.g., a product consists of assemblies, which in turn consist of parts) and (trans)actions may affect one, multiple, or all hierarchy levels. The focus will be on contracts that can easily be adapted to a broad variety of use cases.
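
The data model behind such contracts can be sketched independently of the ledger; here is a toy in-memory Python version (actual chaincode would be written in Go or Java and would persist its state in the ledger):

    from dataclasses import dataclass, field

    @dataclass
    class Asset:
        name: str
        children: list = field(default_factory=list)
        status: str = "in_service"

    def apply(asset, action, levels):
        """Apply an action to `asset` and to `levels` hierarchy levels
        below it, mirroring a transaction scoped to part of the tree."""
        asset.status = action
        if levels > 0:
            for child in asset.children:
                apply(child, action, levels - 1)

    wing = Asset("wing", [Asset("flap", [Asset("actuator"), Asset("hinge")])])
    apply(wing, "recalled", levels=1)   # wing and flap, but not the parts
    print(wing.status,
          wing.children[0].status,
          wing.children[0].children[0].status)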

Requirements

Candidates should be self-taught and motivated to learn new things. Ideally, they are familiar with the basic principles of blockchain technology (Byzantine failures, distributed databases, etc.) and have a track record in programming in Go and/or Java. Familiarity with end-to-end web development (JavaScript + frameworks, Node.js, etc.) is desirable.

Z-2017-30

Predictive analytics: From correlations to causality

Department:   Cognitive Computing & Industry Solutions
Short Description:  

Configuring enterprise relational databases is a notoriously hard problem, driven by the large number of configuration parameters that come with enterprise relational database systems, such as IBM DB2 or Oracle, and their respective widely varying ranges. These factors typically prevent database administrators from identifying and applying the optimal configurations for specific workload types, such as OLTP or OLAP. In an ideal world, this problem would be tackled by using full factorial design, i.e., running all possible combinations of configurations and choosing the best one. However, as the number of parameters increases, the number of combinations explodes exponentially, making this approach intractable in practice. For example, IBM DB2 comes with over 100 configuration parameters. In the simplest scenario, if every one of the 100 parameters can take only one of two possible values, the number of combinations that need to be evaluated is 2^100.

In the PASIR4DB project, we are looking into tackling the problem of database reconfiguration through machine-learning techniques, as neither top-down nor bottom-up approaches are suited. Not only can machine learning be successfully used to explore and understand the huge configuration space, but it can also generalize beyond specific configuration scenarios. We collect database incident tickets from IT environments, as well as server and database configuration data, to use as a data set. Then, using analytics, we identify those databases whose configurations are correlated with incidents, as well as the specific configuration parameters that should be reconfigured and how to reconfigure them. The pipeline is shown in Fig. 1.

[Fig. 1: Overview of the PASIR4DB pipeline.]

On the one hand, our extensive study in [1] shows that non-linear models, such as random forests, achieve the highest performance in terms of accuracy and generalization, since they capture non-linear relationships between configuration parameters, as opposed to linear models, which assume all parameters are independent. On the other hand, non-linear models suffer from high computational cost and lower interpretability. However, random forests built from decision trees already provide easily interpretable rules that can be used out of the box by any database administrator.
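
A minimal version of that modeling step, on synthetic configuration data with a planted non-linear interaction, shows why tree ensembles fit the problem:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)

    # One row per database: columns = configuration parameters,
    # label = whether the database later produced incidents.
    n, p = 2000, 20
    X = rng.normal(size=(n, p))
    y = ((X[:, 3] > 1.0) & (X[:, 7] < 0.0)).astype(int)   # interaction

    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    ranked = np.argsort(clf.feature_importances_)[::-1]
    print("parameters most associated with incidents:", ranked[:3])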

In the next steps, we plan to investigate causality relationships between configuration parameters. This challenging problem has two main benefits:

  • Simplifying multivariate models by including only those configuration parameters that cause changes in other parameters, thereby reducing computation time and increasing interpretability;
  • Infusing causal knowledge into the model, such that it also recommends reconfigurations of parameters that are caused by others.

Publications

[1] Giurgiu, M. Botezatu, D. Wiesmann. Comprehensible Models for Reconfiguring Enterprise Relational Databases to Avoid Incidents. In Proceedings of ACM CIKM, 2015.
[2] Giurgiu, A. Almasi, D. Wiesmann. Do you know how to configure your enterprise relational database to avoid incidents? In Proceedings of IFIP/IEEE IM, 2015.

Z-2017-31

Performance optimization of predictive analytics tool suite

Department:   Cognitive Computing & Industry Solutions
Short Description:  

IBM’s Predictive Analytics for Server Incident Reduction (PASIR) uses text mining and multivariate statistical analysis to identify problem servers and to forecast the expected improvements from a range of upgrade scenarios (see the figure below). The PASIR tool has been applied to over a hundred IT environments, where modernization actions have resulted in significant reductions in incident volumes and a corresponding increase in environment availability. The primary use cases of PASIR are planning a refresh program, identifying at-risk application environments, identifying servers for cloud migration, and contributing to proposal cost-penalty analysis for at-risk servers.

The project will mainly focus on contributing to the next generation of PASIR. This will involve the re-architecting, parallelizing, and optimizing of a complex predictive analytics application. The successful applicant will need to be comfortable working across multiple languages and technologies including Python, PHP, Scala, Hadoop, and SQL. Experience working on large-scale production software is highly desirable.

[Figure: Overview of the PASIR concept.]

Z-2017-32

Mining and classifying semi-natural text data

Department:   Cognitive Computing & Industry Solutions
Short Description:  

We are looking for applicants in the area of mining and classification of semi-natural text data. The project description can be found here.

Z-2017-33

Revealing novel subpopulations of cancer cells using single-cell data

Department:   Cognitive Computing & Industry Solutions
Short Description:  

Tumor cells exhibit a high degree of variability in phenotypic traits such as their morphology, metastatic potential and molecular profile, present not only across different patients but also within the same tumor (Marusyk et al., 2012). Tumor heterogeneity has emerged as a fundamental characteristic of multiple cancer types and a missing link in our understanding of the mechanisms underlying disease complexity. Mass cytometry (CyTOF) is a single-cell proteomics technique that allows the simultaneous quantification of dozens of proteins at the single-cell level (Bandura et al., 2009). Computational analysis of the resulting high-dimensional data enables the identification of cell subpopulations associated with different stages of tumor progression and metastasis. To date, a number of state-of-the-art methods for identifying cell subpopulations exist in the literature, each with its strengths and weaknesses (Qiu et al., 2011; Amir et al., 2013; Levine et al., 2015).

This project involves the study of tumor heterogeneity in a population of breast cancer cells analyzed by CyTOF, available through a collaboration of the Systems Biology group of ZRL with the Bodenmiller lab at the University of Zurich. More specifically, the goals of the project are (i) to review state-of-the-art methods for the identification of cell subpopulations from CyTOF data, (ii) to apply a selection of the methods to the abovementioned experimental data and compare the results, and (iii) to produce a final report that emphasizes the evaluation of the accuracy and performance of the different methods.
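
The simplest end of the method spectrum looks like the following scikit-learn sketch on simulated marker intensities (the arcsinh transform with co-factor 5 is standard practice for CyTOF data):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)

    # Simulated CyTOF matrix: cells x proteins, two planted subpopulations.
    pop_a = rng.normal(loc=0.0, size=(800, 30))
    pop_b = rng.normal(loc=1.5, size=(200, 30))
    X = np.arcsinh(np.vstack([pop_a, pop_b]) / 5.0)

    emb = PCA(n_components=10).fit_transform(X)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
    print("subpopulation sizes:", np.bincount(labels))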

Requirements

The ideal intern should have a good background in machine learning, particularly in clustering and dimensionality-reduction methods, and must be fluent in MATLAB or Python. A background in biology is not necessary.

Z-2017-34

Development of web tools for the analysis of single-cell data

Department:   Cognitive Computing & Industry Solutions
Short Description:  

Advancements in single-cell experimental methods have made it possible to quantify a variety of biological entities (e.g., proteins, genes) in thousands of individual cells and thus to study complex biological processes such as differentiation or disease progression. Mass cytometry (CyTOF) is a single-cell proteomics technique that allows the simultaneous quantification of dozens of proteins at the single-cell level (Bandura et al., 2009). However, single-cell resolution comes at the expense of unwanted variability, such as that originating from confounding factors. In the Systems Biology group of the IBM Research – Zurich Lab we have been developing computational methods to analyze and account for these effects, focusing specifically on cell-cycle and cell-volume variability (Rapsomaniki et al., ISMB 2016, Orlando, USA). Our method, implemented as a web app, includes a number of features such as data processing and transformation, cell-cycle classification and reconstruction of continuous trajectories. This project involves the implementation of computational methods in the existing web-based platform. More specifically, the goals of the project are: (i) benchmarking the existing methods using diverse experimental datasets, (ii) integrating additional functionalities into the web app, and (iii) comparing results across different data and/or methods.

Requirements

The ideal intern should have experience in deploying web applications in Django. Knowledge of Python and HTML, CSS and Javascript is also required. A background in Biology is not necessary.

Z-2017-35

Implementation of a data-driven consensus interactome

Department:   Cognitive Computing & Industry Solutions
Short Description:  

Deciphering system-wide molecular interaction networks is crucial for understanding the biological processes of cells. Disease progression has been viewed as the perturbation of such networks [1,2]. Knowing how molecules are interconnected forms the basis for studying the progression of complex diseases such as cancer. An integrated molecular map is also called a network of molecular interactions, or interactome. We use multiple inference methods to build a data-driven consensus interactome, inspired by DREAM5 [3]. The aim of the project is to add state-of-the-art methods that contribute to the consensus approach, to test different ways of combining methods and data types, and to validate the performance of the consensus method using simulated data sets. After validation, the consensus method will be run on real omics data sets.
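
A minimal sketch of the consensus step in the DREAM5 "wisdom of crowds" spirit: convert each method's edge scores to ranks and average them, so that no single method's score scale dominates (the scores below are random stand-ins):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_edges = 10                     # candidate regulator-target edges

    # Hypothetical confidence scores from three inference methods.
    scores = rng.random(size=(3, n_edges))

    ranks = np.vstack([stats.rankdata(-s) for s in scores])  # 1 = best
    consensus = ranks.mean(axis=0)
    print("edges by consensus confidence:", np.argsort(consensus))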

Requirements

Basic knowledge of object-oriented programming (OOP), Python, and applied statistics.

References

[1] Vidal, M. et al. 2011. “Interactome Networks and Human Disease.” Cell 144 (6): 986-998.
[2] Barabási, A.-L. et al. 2011. “Network Medicine: A Network-Based Approach to Human Disease.” Nature Reviews. Genetics 12 (1): 56-68.
[3] Marbach, D. et al. 2012. “Wisdom of Crowds for Robust Gene Network Inference.” Nature Methods 9 (8): 796-804.

Z-2017-36

Text understanding for domain-specific question answering

Department:   Cognitive Computing & Industry Solutions
Short Description:  

At IBM Research – Zurich, we are developing cognitive solutions for challenging NLP and text mining problems on very large, domain-specific, unstructured text documents. One way to represent the gist of the information available in a certain domain of interest is to use an ontology that captures all important entity types (concepts) in the domain and their interrelations along with their instances. The goal of one of our cognitive projects is to develop a domain-specific automatic question answering system, where the development of a domain-specific ontology is a very important component.

Although very rich ontologies (e.g., DBpedia, Freebase, YAGO) exist for the common-knowledge domain, this is not the case for many other domains. Thus the main focus of the internship is to work on approaches for (semi-)automatic ontology extraction for a given target domain from a large amount of relevant domain-specific text corpora. This is a complex task that involves the adaptation and application of various state-of-the-art NLP, information extraction and machine-learning techniques for solving tasks such as efficient entity-mention detection and type assignment, efficient relation-phrase detection and type assignment, discovery of synonymous entity mentions (and relation phrases) via deep neural network embeddings, and fast graph clustering, to name a few. A team of domain experts is also available for any potential evaluation or labeling tasks.
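
The synonym-discovery subtask, for instance, often boils down to nearest-neighbor search in an embedding space; a toy sketch with random vectors and one planted synonym pair:

    import numpy as np

    rng = np.random.default_rng(0)

    mentions = ["myocardial infarction", "heart attack", "diabetes"]
    E = rng.normal(size=(3, 50))              # stand-in mention embeddings
    E[1] = E[0] + 0.05 * rng.normal(size=50)  # plant a synonym pair

    E /= np.linalg.norm(E, axis=1, keepdims=True)
    sim = E @ E.T                             # cosine similarities
    i, j = np.unravel_index(np.argmax(sim - np.eye(len(E))), sim.shape)
    print("candidate synonyms:", mentions[i], "<->", mentions[j])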

Requirements

Candidates should have a Bachelor’s degree in Computer Science with strong programming skills in Java/Python or similar, strong analytical and problem-solving skills, and excellent communication and team skills. Experience with NLP, machine learning, text mining, deep learning, software engineering and Big Data analytics is a plus.

Z-2017-37

Deep learning for time series and event series analysis

Department:   Cognitive Computing & Industry Solutions
Short Description:  

The analysis of continuous- and discrete-valued time series is an essential technique for the intelligent management of complex systems across industries such as IT, aviation and energy. It powers predictive maintenance, in which system failures are predicted and prevented before they occur, avoiding the consequences of an outage or costly repairs.

Machine learning is widely applied for understanding, forecasting and predicting based on time-series data. Deep-learning models such as long short-term memory (LSTM) exhibit interesting properties for time-series analysis, such as powerful pattern recognition, implicit state-space modeling and accounting for long-range dependencies. However, their application is often not straightforward and requires careful consideration based on the data.

The successful candidate will get the opportunity to apply and perfect state-of-the-art machine-learning methods for event-series data to predict failures of real-world industrial systems. The candidate will get access to the Lab’s GPU-backed compute cluster and potentially the chance to test their model on live system data.
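
A minimal supervised failure-prediction setup with an LSTM, on synthetic windows (all shapes and the planted signal are invented; the Keras API shown is standard):

    import numpy as np
    from tensorflow import keras

    rng = np.random.default_rng(0)

    # Event-series windows: (samples, timesteps, features); the label says
    # whether a failure followed the window.
    X = rng.normal(size=(1000, 50, 8)).astype("float32")
    y = (X[:, -10:, 0].mean(axis=1) > 0.3).astype("float32")

    model = keras.Sequential([
        keras.layers.LSTM(32, input_shape=(50, 8)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=3, batch_size=64, validation_split=0.2)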

Requirements

  • Experience in running a research project or a solid background in statistics, probability theory, machine learning and time-series analysis.
  • Hands-on experience implementing machine-learning algorithms.
  • Familiarity with Theano, TensorFlow, Torch or other GPU-accelerated scientific libraries.
  • Great confidence in coding Python, C++, Matlab or R.