2018 Great Minds student internships

Pitch your vision of the most exciting IT challenges and win an internship at IBM Research

Topics at the Africa Labs

Ref. code  
A‑2018‑01

Understanding breast cancer through Big Data (South Africa)

This project seeks to understand the composition of breast cancer cases (tumor location and biomarkers present) for different types of breast cancer, based on pathology reports.
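
As a rough, hypothetical illustration of the kind of pipeline this project implies, the sketch below pulls biomarker mentions and tumor location out of free-text pathology reports and aggregates them by cancer type. The report texts, labels and patterns are assumptions; a real system would rely on proper NLP rather than hand-written regular expressions.

```python
# Illustrative only: extract biomarker status and tumor location from
# hypothetical pathology report snippets and aggregate by cancer type.
import re
from collections import Counter, defaultdict

reports = [
    ("IDC", "Invasive ductal carcinoma, left breast. ER positive, PR positive, HER2 negative."),
    ("DCIS", "Ductal carcinoma in situ, right breast. ER negative."),
]

BIOMARKER = re.compile(r"\b(ER|PR|HER2)\s+(positive|negative)", re.IGNORECASE)
LOCATION = re.compile(r"\b(left|right)\s+breast", re.IGNORECASE)

composition = defaultdict(Counter)
for cancer_type, text in reports:
    for marker, status in BIOMARKER.findall(text):
        composition[cancer_type][f"{marker.upper()} {status.lower()}"] += 1
    for side in LOCATION.findall(text):
        composition[cancer_type][f"{side.lower()} breast"] += 1

for cancer_type, counts in composition.items():
    print(cancer_type, dict(counts))
```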

Requirements

  • Prior research and publications in some area of Natural Language Processing (NLP) or biostatistics
  • Hands-on experience implementing statistical machine learning algorithms
  • Coding experience in Python, Java, JavaScript (e.g. Node.js), C++, MATLAB or R.
A‑2018‑02

Facial and emotional recognition (South Africa)

This project will analyze labelled video data of standard emotions for subjects from over a dozen countries and multiple races. This rich dataset contains naturally expressed emotions and on-command expressions.
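
As a non-authoritative sketch of the kind of model such a dataset supports, the snippet below trains a small convolutional network on dummy video frames. The frame size, class count and architecture are assumptions for illustration only.

```python
# Minimal sketch: classify emotions from (dummy) 64x64 RGB frames with a small CNN.
import numpy as np
import tensorflow as tf

NUM_CLASSES = 7  # placeholder for the number of labelled emotion classes

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# dummy stand-ins for real (frame, emotion label) pairs extracted from the videos
frames = np.random.rand(32, 64, 64, 3).astype("float32")
labels = np.random.randint(0, NUM_CLASSES, size=32)
model.fit(frames, labels, epochs=1, batch_size=8)
```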

Requirements

  • Prior research and publications in computer vision or affective computing
  • Hands-on experience implementing computer vision machine learning algorithms, and deep learning
  • Coding experience in Python, Java, JavaScript (e.g. Node.js) or similar.
A‑2018‑03

Accelerated language learning in classrooms (South Africa)

This project seeks to improve English proficiency in disadvantaged communities where there is a lack of skilled teachers and a mismatch between the predominant language of instruction and the language of assessment. This is done through technology-assisted daily reading and writing exercises in classrooms.

Requirements

  • Demonstrable research proficiency
  • Programming experience in Python, Java and JavaScript
  • Working knowledge of popular and current machine learning methods and implementation techniques.
A‑2018‑04

Healthy conversations (Kenya)

This project seeks to develop a conversational system for patient and care provider interaction via text. Various health-based channels, such as adverse drug event detection, medication compliance and disease education, are being developed to extend the reach of care of health systems in Africa. Specifically, the system is aimed at supporting individuals with lifelong chronic illnesses such as HIV, hypertension and diabetes. A major challenge is developing a conversational system that can interact with a multilingual, literate to semi-literate and otherwise diverse population across Africa.
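
A deliberately simple sketch of the intent-routing core of such a system is shown below. The intents, example utterances and canned responses are hypothetical; a production system would add multilingual NLU, dialogue state and clinical review.

```python
# Toy intent classifier that routes a patient's text message to a canned response.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_utterances = [
    ("report_side_effect", "I feel dizzy after taking my pills"),
    ("report_side_effect", "the new medication gives me a rash"),
    ("medication_reminder", "did I already take my dose today"),
    ("medication_reminder", "when should I take my medicine"),
]
responses = {
    "report_side_effect": "Thanks for telling me. Can you describe when it started?",
    "medication_reminder": "Your next dose is scheduled for this evening.",
}

labels, texts = zip(*training_utterances)
intent_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
intent_model.fit(texts, labels)

def reply(message: str) -> str:
    intent = intent_model.predict([message])[0]
    return responses[intent]

print(reply("my tablets make me feel dizzy"))
```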

Requirements

  • Experience in NLP, Conversational Agents, and related technologies
  • Hands-on experience implementing statistical machine learning algorithms
  • Coding experience in Python, Java, JavaScript (e.g. Node.js), C++, MATLAB or R.
A‑2018‑05

Non-communicable disease monitoring in sub-Saharan Africa (Kenya)

This project seeks to assist in the diagnosis, treatment and monitoring of non-communicable diseases such as hypertension and diabetes in sub-Saharan Africa, where there are startlingly few doctors and nurses per capita. We are developing a suite of interactive tools either to assist medical personnel with limited training or to interact directly with the patient.

Requirements

  • Prior research and publications in healthcare, biology or biostatistics
  • Hands-on experience implementing statistical machine learning algorithms
  • Coding experience in Python, Java, JavaScript (e.g. Node.js), C++, MATLAB or R.
A‑2018‑06

Adding transparency and accountability to government processes (Kenya)

We seek to deeply understand various cumbersome and manual government processes and then automate and streamline them in an effort to improve the ease of doing business in Kenya and other countries in sub-Saharan Africa.

Requirements

  • Prior research in process modeling and optimization, AI planning or any related area
  • Hands-on experience implementing algorithms in AI planning, control theory or any related area
  • Coding experience in Python, Java, JavaScript (e.g. Node.js) or similar.
A‑2018‑07

Data and service marketplace platform on a blockchain network (Kenya)

This project seeks to understand and develop a data and service marketplace platform on a blockchain network. The platform will act as a repository of verifiable data, models and services (e.g., AI services) shared among participants of the blockchain network, enabling faster data creation, real-time data sharing, richer shared datasets, and collaboration across various system-of-record platforms.
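
The toy sketch below illustrates the append-only, hash-linked record of data and service listings that such a marketplace relies on. It is a conceptual stand-in only; the actual platform would be built on a blockchain framework such as Hyperledger Fabric, and the owners, asset types and URIs shown are invented.

```python
# Conceptual sketch: a hash-linked, append-only record of marketplace listings.
import hashlib, json, time

ledger = []

def add_listing(owner: str, asset_type: str, uri: str) -> dict:
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    payload = {"owner": owner, "asset_type": asset_type, "uri": uri,
               "timestamp": time.time(), "prev_hash": prev_hash}
    # hash the entry so any later tampering breaks the chain
    payload["hash"] = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    ledger.append(payload)
    return payload

add_listing("hospital_a", "dataset", "ipfs://example-dataset-cid")
add_listing("startup_b", "ai_service", "https://example.org/model-endpoint")
print(len(ledger), "listings; last entry links to", ledger[-1]["prev_hash"][:8])
```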

Requirements

  • Prior research and publications in machine learning, security protocols, data science, blockchain or software engineering
  • Hands-on experience implementing distributed systems, distributed/statistical machine learning algorithms, or blockchain protocols (e.g., Hyperledger)
  • Coding experience in Go, Python, Java, JavaScript (e.g. Node.js).
A‑2018‑08

Composable consent directive (Kenya)

This project seeks to understand, develop and experiment with generalized consent directive protocols on a blockchain network that can be composed given a context specification (e.g., healthcare data sharing, banking data sharing).

Requirements

  • Prior research and publications in protocols, blockchain, software engineering, ontology, knowledge engineering or formal languages
  • Hands-on experience implementing distributed systems, algorithms, blockchain protocols (e.g., Hyperledger), knowledge engineering or protocol verification
  • Coding experience in Go, Python, Java, JavaScript (e.g. Node.js) and ontology languages.
A‑2018‑09

Understanding online user-generated data in emerging markets (Kenya)

This project seeks to analyze and leverage alternative sources of data (e.g. cell phones, social media) to create novel financial profiles to help identify and recommend products.
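
As a purely illustrative sketch with synthetic data, the snippet below derives simple behavioural features from mobile-money-style transaction records and fits a model that scores a hypothetical outcome such as loan repayment. The feature names, outcome and data are assumptions, not the project's actual design.

```python
# Toy feature-engineering pipeline over synthetic transaction records.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
transactions = pd.DataFrame({
    "user_id": rng.integers(0, 50, size=1000),
    "amount": rng.exponential(10.0, size=1000),
    "hour": rng.integers(0, 24, size=1000),
})

# one row of behavioural features per user
features = transactions.groupby("user_id").agg(
    n_transactions=("amount", "size"),
    mean_amount=("amount", "mean"),
    night_share=("hour", lambda h: (h < 6).mean()),
)

repaid = rng.integers(0, 2, size=len(features))   # hypothetical outcome labels
model = LogisticRegression(max_iter=1000).fit(features, repaid)
print("training accuracy:", model.score(features, repaid))
```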

Requirements

  • Prior research and publications in data mining
  • Hands-on experience implementing statistical machine learning algorithms
  • Coding experience in Python, Java, JavaScript, C++, MATLAB or R.

Topics at the Zurich Lab

Ref. code  
Z-2018-1

An interplay of machine learning, deep learning and NLP

At IBM Research – Zurich, we are developing cognitive solutions for challenging NLP and text-mining problems on very large, domain-specific text documents. In one of our cognitive projects, we first aim to discover specific information that is of interest in the target domain in large text documents. We then aim to provide an understandable, short text summarization of the discovered information of interest. Furthermore, locating text excerpts very similar to a given short target text is also of great interest because it can enable a great amount of automation in the creation of the documents in the target domain.

To achieve these goals, one needs to address challenging text classification, text summarization, text similarity and text search problems that require adaptation and application of state-of-the-art machine learning, deep learning and NLP techniques. There is another challenge when dealing with domain-specific text: Although very rich ontologies to improve the quality of text search results exist for the common knowledge domain (e.g. DBpedia, Freebase, YAGO), this is not the case for many other domains. Therefore, approaches for (semi-)automatic ontology extraction for the target domain from a large amount of relevant domain-specific text corpora are also of interest in this project. A team of domain experts is available for any potential evaluation or labeling tasks.
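
As a minimal sketch of one sub-problem mentioned above, the snippet below locates the corpus excerpt most similar to a short target text using TF-IDF cosine similarity. The corpus and query are placeholders; the project itself targets much larger domain-specific documents and stronger, e.g. neural, representations.

```python
# Toy text-similarity search: rank corpus excerpts against a short target text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "The warranty excludes damage caused by improper installation.",
    "Payment is due within thirty days of the invoice date.",
    "The supplier shall deliver the goods to the agreed location.",
]
query = "Invoices must be paid within 30 days."

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
doc_vectors = vectorizer.fit_transform(corpus)
query_vector = vectorizer.transform([query])

scores = cosine_similarity(query_vector, doc_vectors).ravel()
best = scores.argmax()
print(f"Most similar excerpt ({scores[best]:.2f}): {corpus[best]}")
```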

Requirements

Candidates should have

  • A Computer Science background (Bachelor's degree completed or graduation imminent)
  • Strong programming skills in Python/Java or similar
  • Strong analytical and problem-solving skills
  • Excellent communication and team skills
  • Experience with machine learning, NLP, text mining, deep learning, software engineering and Big Data analytics is a plus.
Z-2018-2

Connecting blockchain technology with the physical world

Blockchain technology as the underlying technology of cryptocurrencies has gained significant interest well beyond the finance industry. Currently, the potential of this technology is being explored in many industries, especially whenever multiple parties with limited mutual trust have to interact, and indelible and immutable records of their interactions (transactions) must be kept. One example where blockchain technology can be used beneficially is the tracking and tracing of goods throughout the supply chain, for example to prevent the distribution of counterfeit products. In addition to creating a trustworthy record of transactions for asset traceability in a blockchain, it is crucial that the physical products to be tracked are uniquely identifiable, that the products and their identities cannot be cloned, and that a trusted link between the physical goods and the entries in the blockchain system is provided.

The scope of this internship is to extend an existing blockchain to create a supply chain solution for the pharma industry to provide a trusted link to the physical products using the concept of physical markers. Another aspect of the internship is to help investigate physically unclonable identifier concepts for IoT/Industry 4.0.

Requirements

Candidates should be self-taught and motivated to learn new things. Ideally, they are familiar with the basic principles of blockchain technology and have a track record in programming in Go and/or Java. Familiarity with end-to-end web development (JavaScript + frameworks, Node.js, etc.) is desirable.

Z-2018-3

Big Data time series analysis using deep learning

Analysis of continuous and discrete-valued time series is essential for the intelligent management of complex systems in a range of industries. Predictive maintenance aims to predict system failures before they occur, preventing the consequences of outages and costly repairs.

Machine learning is widely applied for understanding, forecasting and predicting from time series data. Deep-learning techniques excel at discovering hidden patterns when large amounts of data are available. However, the applicability and business value of such techniques are strongly affected by the subtleties of modelling and by the quantity, quality and freshness of the data used for training.

The successful candidate will have the opportunity to apply and perfect state-of-the-art machine-learning methods for time-series analysis to predict failures of real-world industrial systems, and/or work on a highly scalable Big Data infrastructure that enables training and deployment of machine-learning models in a reliable manner.
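
The sketch below shows one simple formulation of the failure-prediction task, assuming a univariate sensor signal and binary labels: the series is cut into sliding windows and a classifier flags windows that precede an event. The synthetic data, window length and horizon are illustrative assumptions only.

```python
# Sliding-window failure prediction on a synthetic sensor signal.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
signal = rng.normal(size=2000)            # stand-in for a real sensor reading

WINDOW, HORIZON = 64, 10
n = len(signal) - WINDOW - HORIZON
X = np.stack([signal[i:i + WINDOW] for i in range(n)])
# toy label: "failure" if the signal spikes within the next HORIZON steps
y = np.array([signal[i + WINDOW:i + WINDOW + HORIZON].max() > 2.0 for i in range(n)]).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)
clf = GradientBoostingClassifier().fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```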

Requirements

  • Ability to run a research project, or a solid background in statistics, probability theory, machine learning and time-series analysis
  • Hands-on experience building machine-learning algorithms and/or large-scale data processing techniques
  • Preferred: Familiarity with big data technologies such as Spark and Kafka and/or GPU-accelerated scientific libraries for machine learning
  • Confident coding in Python, Scala or R.
Z-2018-4

Explaining machine-learning classifier predictions

Despite widespread adoption, machine-learning models, especially those with high complexity, essentially remain black boxes. Understanding why a model has made a specific prediction is important if action is to be taken based on that prediction. Such understanding also provides insights into the model, which is essential when deciding whether the model is expected to behave reasonably when deployed in the wild or whether a new model needs to be developed.

The scope of this internship is to build a method that can explain individual predictions in the classification space, as well as a more general approach to quantify the model’s trust level prior to applying it to real-world data. Specifically, the focus will be on explaining various models, ranging from SVM to random forest and, potentially, deep-learning networks. The preferred implementation language is Python.
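
A minimal sketch of a perturbation-based local explanation (in the spirit of LIME) is given below: each feature of a single instance is nudged and the shift in the predicted probability is reported. The model and data are illustrative stand-ins for the SVM, random-forest and deep-learning models mentioned above.

```python
# Explain one prediction by perturbing each feature and measuring the probability shift.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

instance = X[0]
base_prob = model.predict_proba([instance])[0, 1]

# nudge each feature by one standard deviation and record the probability shift
stds = X.std(axis=0)
effects = []
for j in range(X.shape[1]):
    perturbed = instance.copy()
    perturbed[j] += stds[j]
    effects.append(model.predict_proba([perturbed])[0, 1] - base_prob)

for j in np.argsort(np.abs(effects))[::-1][:5]:
    print(f"feature {j}: {effects[j]:+.3f}")
```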

Requirements

Candidates should be self-taught and motivated to learn new things. Ideally, they are familiar with classic supervised machine-learning algorithms and have a track record in programming in Python. Experience with deep-learning networks is desirable.

Z-2018-5

Hardware security for cloud systems

A Hardware Security Module (HSM) is a general-purpose computing environment that withstands both physical and logical attacks and has special hardware to perform cryptographic operations and protect keys. An HSM is accessed from a host computer system using a carefully designed set of API functions. Today, however, HSMs are tied to individual host systems in a way that contradicts the cloud philosophy.

The hardware security research team at IBM Research – Zurich is bringing IBM's HSMs into cloud infrastructures. The candidate will first implement the communication channel through a standard network. He or she will then integrate a service channel for system and key management. If time permits, the candidate can start to implement a means for ensuring secure creation and management of key material and for managing the HSMs from a host.

Requirements

Candidates should have a good understanding of Linux operating systems and C/C++ system programming skills. Knowledge of system security and cryptography is beneficial.

Z-2018-6

Cloud-scale key service Docker engine

Multi-tenant security is a steeply rising concern for cloud services. Data encryption both in storage and network is becoming ubiquitous. One largely underestimated challenge in this context is related to the handling of encryption keys, be it in key generation and retrieval, key storage, support for deep key hierarchies, or key lifecycle management, all with sufficient performance to enable cloud-scale operation in a multi-tenant environment.

This project aims to leverage a new key-handling infrastructure available on state-of-the-art Intel Purley platforms with the Lewisburg C62x bridge chip, either as an integral part of our newest Purley servers or as PCIe endpoints. In particular, a secure and high-performance Docker service engine should be implemented using Intel QAT/PTT/KPT(/IE) technologies.

Requirements

C and C++ programming and Docker; some basic security know-how is considered beneficial.

Duration

Minimum 3 months, 6 months preferred, thesis work possible.

Z-2018-7

Blockchain application development

We are looking for motivated interns to join our work on industry platforms and blockchain. Ideal candidates are familiar with blockchain systems (e.g. Hyperledger) and have a track record in end-to-end web development (JavaScript+frameworks, Node.js, Nginx, NoSQL, Golang).

During the internship, candidates will work on a blockchain application for supply-chain management (and potentially other areas).

Requirements

We expect applicants to be self-taught, organized and open-minded. They should have a passion for agile web development as well as a strong interest in scientific teamwork.

Z-2018-8

Learning logs and finding anomalies

Machine-generated logs are large sequences of semi-structured data, which sometimes need to be understood quickly and reliably in response to an incident such as a service outage. This project develops and refines algorithms to create a log digest that is useful to a human investigator. In addition to discovering structure, this includes the automated discovery of anomalies in the log that may hint at the cause of an incident. The performance of new and existing algorithms is compared.
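
The sketch below illustrates one ingredient of such a digest: log lines are collapsed into templates by masking variable tokens, and rare templates are flagged as anomaly candidates. The masking rules and the sample log are illustrative assumptions.

```python
# Collapse log lines into templates and flag rare templates as anomaly candidates.
import re
from collections import Counter

log_lines = [
    "2018-01-12 10:00:01 INFO request 8812 served in 12 ms",
    "2018-01-12 10:00:02 INFO request 8813 served in 9 ms",
    "2018-01-12 10:00:03 ERROR connection to 10.0.0.7 refused",
    "2018-01-12 10:00:04 INFO request 8814 served in 11 ms",
]

def template(line: str) -> str:
    line = re.sub(r"\d+\.\d+\.\d+\.\d+", "<ip>", line)   # IP addresses
    line = re.sub(r"\d+", "<num>", line)                  # numbers and timestamps
    return line

counts = Counter(template(line) for line in log_lines)
for line in log_lines:
    if counts[template(line)] == 1:   # template seen only once -> candidate anomaly
        print("anomaly candidate:", line)
```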

Requirements

Strong algorithmic and coding skills required.

Z-2018-9

Automated knowledge-based deep learning

Deep neural networks are powerful tools used in a variety of large-scale, real-world problems such as image classification, object detection, natural language processing, and human action recognition. Although state-of-the-art results on simpler tasks exceed human accuracy, deep networks applied to more complex applications (e.g. in healthcare, finance and manufacturing) face issues that researchers have so far solved with only limited success.

Such issues include:

  • Extremely long training times (up to several months)
  • Extremely challenging optimization of hyper-parameters (e.g. network topology, learning configuration and data augmentation configuration)
  • Unbalanced datasets or missing data.

Moreover, relevant machine-learning publications are appearing at an incredible pace, making it difficult to keep track of all algorithms and their best implementations. Many publications promise improved classification or execution performance, but do not release code and are not evaluated on all datasets of interest.

To cope with the drawbacks listed above, we at the IBM Research – Zurich Laboratory are developing a framework that facilitates data preprocessing, training and visualization of the results. A common representation of hundreds of experiments allows us to learn from previous research at a higher pace and make automated suggestions in terms of hyper-parameters when new datasets are at hand.

In this Great Minds project, the student will have access to our current infrastructure and framework, and will be asked to extend it. The student will work on state-of-the-art networks, with the aim of improving our framework's hyper-parameter suggestions. Our infrastructure is based on a large set of POWER systems with latest-generation GPUs. Maintaining and optimizing code running on such a large infrastructure becomes a real scalability challenge.
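
As a very rough, hypothetical illustration of the "learn from previous experiments" idea, the sketch below stores past runs as (dataset descriptor, hyper-parameters, score) records and suggests the hyper-parameters of the most similar known dataset as a starting point. The records, descriptors and similarity measure are made up and far simpler than the actual framework.

```python
# Suggest starting hyper-parameters from the most similar previously seen dataset.
import numpy as np

# past runs: (dataset descriptor [n_samples, n_classes, n_features], hyper-parameters, score)
experiments = [
    (np.array([50_000, 10, 3072]), {"lr": 0.01, "batch": 128, "depth": 20}, 0.91),
    (np.array([60_000, 10, 784]),  {"lr": 0.10, "batch": 256, "depth": 8},  0.99),
    (np.array([5_000, 2, 1024]),   {"lr": 0.001, "batch": 32, "depth": 50}, 0.84),
]

def suggest(new_descriptor):
    all_desc = np.array([d for d, _, _ in experiments], dtype=float)
    scale = all_desc.max(axis=0)                       # crude per-dimension normalisation
    dists = np.linalg.norm((all_desc - new_descriptor) / scale, axis=1)
    _, params, score = experiments[int(dists.argmin())]
    return params, score

params, past_score = suggest(np.array([55_000, 10, 1024]))
print("suggested starting point:", params, "best past score:", past_score)
```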

Requirements

  • Strong programming skills, C++ or Python preferred
  • Experience with a machine-learning framework, TensorFlow or Theano preferred.
Z-2018-10

Developing AI models to formulate consumer goods

Optimizing the formulations of consumer goods is a very important industrial task for any product that reaches end users. In this project, we will explore different approaches in machine learning and neural networks for optimizing the formulations of consumer goods against pre-assigned key performance indicators.
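
A minimal sketch of the optimization loop described above: fit a surrogate model mapping formulation variables to a key performance indicator on synthetic data, then search the formulation space for the best predicted value. The ingredient names, bounds and toy KPI are assumptions.

```python
# Surrogate-model optimization of a (synthetic) product formulation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
# columns: fraction of surfactant, fragrance, thickener (hypothetical ingredients)
formulations = rng.uniform(0.0, 1.0, size=(200, 3))
kpi = (1.0 - (formulations[:, 0] - 0.4) ** 2
       - 0.5 * (formulations[:, 2] - 0.6) ** 2
       + 0.05 * rng.normal(size=200))          # toy performance indicator

surrogate = RandomForestRegressor(random_state=0).fit(formulations, kpi)

candidates = rng.uniform(0.0, 1.0, size=(5000, 3))   # random search over the space
best = candidates[surrogate.predict(candidates).argmax()]
print("suggested formulation (surfactant, fragrance, thickener):", best.round(2))
```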

Requirements

Knowledge of basic machine learning / neural network methodologies is required. No specific knowledge of materials or chemistry is required.

Z-2018-11

AI-based construction of Hamiltonians in quantum mechanical simulations

The IBM team recently disclosed a parallel sparse matrix–matrix multiplication algorithm [1] that is very promising for quantum mechanical simulations of large biological systems. In this project, we will use the same computational framework to explore different AI methodologies for constructing the underlying Hamiltonians during the quantum mechanical simulations.
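
As a very rough sketch of one possible AI ingredient, the snippet below regresses Hamiltonian matrix elements from a simple pair descriptor (here just an interatomic distance) so that expensive elements could be predicted rather than recomputed. The data are synthetic, and real descriptors and matrix structure are far more involved.

```python
# Toy regression of (synthetic) Hamiltonian matrix elements from a pair descriptor.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(2)
distances = rng.uniform(1.0, 5.0, size=(300, 1))                    # toy pair descriptor
elements = np.exp(-distances[:, 0]) + 0.01 * rng.normal(size=300)   # toy matrix elements

model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=1.0).fit(distances, elements)
print("predicted element at distance 2.5:", model.predict([[2.5]])[0])
```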

Requirements

Knowledge of basic machine learning / neural network methodologies is required. No specific knowledge of computational material science or chemistry is required.

[1] J. Chem. Theory Comput., 11(7), 3145, 2015.