Great Minds student internships
2022 Projects
Topics at the Africa Lab in Johannesburg
Ref. code Project description |
---|
SA-2022-01 Neuro-Symbolic AI for Natural Language Understanding
Neural/sub-symbolic interpretations of logical inference and reasoning date back to the first description of artificial neural networks and their use as threshold logic. However, for decades after the advent of AI, symbolic AI was the dominant paradigm, because it offered interpretable, general, human-like reasoning. Neuro-symbolism aims to combine the fault tolerance, parallelism, and learning of connectionism with the logical abstractions and inference of symbolism, so that logical abstractions can be performed within connectionist settings. Neuro-symbolic integration can be applied, for example, to 1) propositionalization of raw data for symbolic interpretation; 2) predicate implementation, to perform logical functions on ground propositions; 3) predicate invention, for rule induction and theory learning; and 4) implementing logical reasoning constructs such as modus ponens, inference, implication, entailment, and modal logic.
Requirements
|
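The threshold-logic view of neurons mentioned in SA-2022-01 can be made concrete with a small sketch. It assumes McCulloch-Pitts style units with hand-picked weights (not learned, and not this project's method). It shows how such units implement Boolean connectives, from which material implication and a truth-table check of modus ponens follow.

```python
from itertools import product


def threshold_unit(weights, bias):
    """Build a McCulloch-Pitts style unit that outputs 1 iff w.x + bias >= 0."""
    def unit(*inputs):
        activation = sum(w * x for w, x in zip(weights, inputs)) + bias
        return 1 if activation >= 0 else 0
    return unit


# Hand-picked weights; a neuro-symbolic system would typically learn these.
AND = threshold_unit([1, 1], -2)  # fires only when both inputs are 1
OR = threshold_unit([1, 1], -1)   # fires when at least one input is 1
NOT = threshold_unit([-1], 0)     # fires when the input is 0


def implies(a, b):
    """Material implication a -> b, built as (NOT a) OR b."""
    return OR(NOT(a), b)


def modus_ponens_holds():
    """Check over the full truth table: whenever a and (a -> b) hold, b holds."""
    return all(b == 1 for a, b in product((0, 1), repeat=2) if AND(a, implies(a, b)))


print([implies(a, b) for a, b in product((0, 1), repeat=2)])  # [1, 1, 0, 1]
print(modus_ponens_holds())  # True
```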
SA-2022-02 Improving Sub-Seasonal to Seasonal Climate Predictions
Sub-seasonal to seasonal (S2S) climate prediction has long been a gap in operational weather forecasting. Its timescale ranges from two weeks to an entire season, although some authors have recently used the term S2S more broadly to include seasonal forecasts up to 12 months ahead. S2S prediction is considered more challenging than both numerical weather prediction (NWP, 1-15 days) and seasonal forecasting (2-6 months): at this range the atmosphere provides only a weak predictive signal, while the land and ocean provide only limited predictive information. Improving S2S forecasts would significantly benefit downstream applications such as streamflow forecasting, heatwave prediction, water resource management, and in-season, climate-aware crop modeling on the sub-seasonal time scale.
Requirements
|
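Progress on S2S prediction is usually judged by how much a forecast improves on a simple reference such as climatology. The sketch below shows one such measure, the mean-squared-error skill score (MSSS = 1 - MSE_forecast / MSE_climatology). It assumes that forecasts, observations, and climatology are given as aligned lists of anomalies (for example, weekly means at a weeks 3-4 lead). The choice of metric and the numbers in the example are illustrative assumptions, not project data.

```python
def mse(predicted, observed):
    """Mean squared error between two equally long sequences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def mse_skill_score(forecast, observed, climatology):
    """MSSS = 1 - MSE(forecast) / MSE(climatology).

    1 means a perfect forecast, 0 means no better than climatology,
    and negative values mean worse than climatology.
    """
    reference_error = mse(climatology, observed)
    return 1.0 - mse(forecast, observed) / reference_error


# Made-up weekly-mean temperature anomalies (degrees C) at four verification times.
observed = [1.0, -0.5, 0.8, -1.2]
forecast = [0.6, -0.2, 0.5, -0.7]
climatology = [0.0, 0.0, 0.0, 0.0]  # climatology forecasts a zero anomaly
print(round(mse_skill_score(forecast, observed, climatology), 3))
```

A skill score is used here rather than raw error so that improvements are reported relative to a baseline that is hard to beat at S2S lead times.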
Topics at the Africa Lab in Nairobi
Ref. code Project description |
---|
K-2022-01 Automated Subgroup Analysis of Post-COVID Condition Risk Factors and Interventions
Post-COVID conditions are a wide range of new, returning, or ongoing symptoms that occur in individuals previously infected with the SARS-CoV-2 virus, even if they did not have COVID-19 symptoms. Post-COVID symptoms typically occur 3 months from the onset of COVID-19, last for at least 2 months, and cannot be explained by an alternative diagnosis. To date, little is known about the prevalence, incidence, and risk factors of post-COVID conditions, or about interventions that ameliorate them. The overarching goal of this research is to evaluate variations in care associated with post-COVID conditions and related interventions. The specific objectives are multifold:
In this project, we will analyze the National COVID Cohort Collaborative (N3C) dataset provided and maintained by the National Center for Advancing Translational Sciences (NCATS), a component of the National Institutes of Health, United States (https://ncats.nih.gov/n3c/about/data-overview). As of October 2021, the N3C dataset contained clinical, laboratory, and diagnostic data on over 8 million persons, including 2.7 million positive COVID-19 cases, from multiple institutions in the United States. The data are de-identified, aggregated, and harmonized in the NCATS N3C Data Enclave and have been made available to the research community to study COVID-19 outcomes, treatments, and interventions.
Data Preprocessing:
References |
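The post-COVID definition in K-2022-01 is essentially a set of date rules. Subgroup analysis then compares how often those rules are met across strata. The following is a minimal sketch under stated assumptions. "3 months from onset" is approximated as a symptom start at least 90 days after onset, and "at least 2 months" as a duration of at least 60 days. The `Patient` record and its fields are hypothetical and are not the N3C schema. The `subgroup` key stands in for whatever stratification (age band, sex, intervention) an analysis would use.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional

MIN_DELAY_DAYS = 90     # assumption: "3 months from onset" approximated as 90 days
MIN_DURATION_DAYS = 60  # assumption: "at least 2 months" approximated as 60 days


@dataclass
class Patient:
    """Hypothetical, simplified patient record (not the N3C data model)."""
    subgroup: str
    covid_onset: date
    symptom_start: Optional[date]  # None if no persisting symptoms were recorded
    symptom_end: Optional[date]
    alternative_diagnosis: bool = False


def meets_definition(p: Patient) -> bool:
    """Apply the simplified post-COVID case definition to one patient."""
    if p.symptom_start is None or p.symptom_end is None or p.alternative_diagnosis:
        return False
    delay = (p.symptom_start - p.covid_onset).days
    duration = (p.symptom_end - p.symptom_start).days
    return delay >= MIN_DELAY_DAYS and duration >= MIN_DURATION_DAYS


def prevalence_by_subgroup(patients):
    """Fraction of patients in each subgroup who meet the definition."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [cases, total]
    for p in patients:
        counts[p.subgroup][1] += 1
        if meets_definition(p):
            counts[p.subgroup][0] += 1
    return {group: cases / total for group, (cases, total) in counts.items()}


cohort = [
    Patient("18-39", date(2021, 1, 1), date(2021, 4, 5), date(2021, 6, 10)),
    Patient("18-39", date(2021, 1, 1), None, None),
    Patient("40-64", date(2021, 1, 1), date(2021, 4, 5), date(2021, 6, 10), True),
]
print(prevalence_by_subgroup(cohort))
```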
K-2022-02 Future of Health: Transformation of Health Data in the Generation of Contextual Predictions
Our team is focused on improving the process of evidence-informed decision-making, and we develop or extend artificial intelligence and machine learning (AI/ML) tools to complement computational models already familiar in the domain of interest.
Related Reading |
K-2022-03 Cross-Modal Representation Analysis in Dermatology Academic Materials
Images depicting dark skin tones are significantly under-represented in the educational materials used to teach primary care physicians and dermatologists to recognize skin diseases. This could contribute to disparities in skin disease diagnosis across racial groups. Previously, domain experts have assessed textbooks manually to estimate the diversity of skin images, but manual assessment does not scale to large numbers of educational materials and is prone to human error. To automate this process, we are working on a project that aims to automatically analyze the representation of skin tones in dermatology academic materials, such as textbooks. This project is a cross-lab collaboration among the IBM Research labs in Nairobi (Kenya), Zurich (Switzerland), and New York (USA), together with external academic collaborators, including researchers from Stanford University. Current work focuses on extracting images from documents, selecting skin images, segmenting skin pixels, and estimating skin tones[1]. A promising extension of this work is to analyze the textual content of the academic materials in addition to their image content, in a cross-modal setting, to evaluate the representation of subgroups (e.g., by skin tone, sex, and age). The proposed project requires familiarity with recent natural language and image processing techniques.
Related Reading |
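One common way to turn segmented skin pixels into a skin-tone estimate is the Individual Typology Angle (ITA). ITA is computed from the CIELAB L* (lightness) and b* (yellow-blue) coordinates as ITA = arctan((L* - 50) / b*) in degrees. The sketch below assumes ITA-based estimation with the widely used six-category thresholds. The description above does not say which method the project uses. Converting RGB pixels to CIELAB and aggregating over a skin region are left out.

```python
import math

# Lower bounds (degrees) of the commonly used ITA skin-tone categories, lightest first.
ITA_CATEGORIES = [
    (55.0, "very light"),
    (41.0, "light"),
    (28.0, "intermediate"),
    (10.0, "tan"),
    (-30.0, "brown"),
]


def individual_typology_angle(l_star: float, b_star: float) -> float:
    """ITA = arctan((L* - 50) / b*) in degrees; atan2 gives the same value for b* > 0."""
    return math.degrees(math.atan2(l_star - 50.0, b_star))


def skin_tone_category(l_star: float, b_star: float) -> str:
    """Map a (mean) skin color in CIELAB to an ITA category label."""
    ita = individual_typology_angle(l_star, b_star)
    for lower_bound, label in ITA_CATEGORIES:
        if ita > lower_bound:
            return label
    return "dark"


print(skin_tone_category(65.0, 15.0))  # ITA = 45 degrees -> "light"
```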
Topics at the Europe Lab in Zurich
Ref. code Project description |
---|
Z-2022-01 Advancing AI Models for Document Conversion
Documents are ubiquitous in everyday life. They are created at an ever-increasing rate and often encode very valuable information. Unfortunately, they are frequently stored in formats such as PDF, which do not preserve their logical structure.
Requirements
|
Z-2022-02 Graph Convolutional Networks to Find Hidden Knowledge in Large Document Graphs
Knowledge can be categorized into two components: a factual part and a hypothesized part. For the factual part, one can use graph structures, in which nodes represent entities (e.g., materials, properties, value ranges) and links represent the facts. For example, the statement `Material A has property B of value C.` can be represented in a graph as `node A` -> `node B` -> `node C`.
Requirements
|
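The statement-to-graph mapping in Z-2022-02 can be combined with a graph-convolution step in a short sketch. Each `(entity, property, value)` statement becomes the path A -> B -> C. One GCN-style propagation step with self-loops and symmetric normalization then mixes each node's features with those of its neighbors. This is a simplified sketch under stated assumptions: edges are treated as undirected, features are scalars, and the learned weight matrix and nonlinearity of a full GCN layer are omitted.

```python
import math
from collections import defaultdict


def build_graph(statements):
    """Turn (entity, property, value) statements into paths A -> B -> C.

    Edges are stored in both directions so that information can flow either
    way during propagation (a simplifying assumption).
    """
    adjacency = defaultdict(set)
    for a, b, c in statements:
        for u, v in ((a, b), (b, c)):
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def gcn_propagate(adjacency, features):
    """One normalized propagation step: h'_i = sum_j h_j / sqrt(d_i * d_j).

    j ranges over the neighbors of i plus i itself (self-loop), and d counts
    neighbors plus the self-loop. No learned weights or nonlinearity.
    """
    degree = {node: len(neighbors) + 1 for node, neighbors in adjacency.items()}
    updated = {}
    for i, neighbors in adjacency.items():
        total = features[i] / degree[i]  # self-loop term h_i / sqrt(d_i * d_i)
        for j in neighbors:
            total += features[j] / math.sqrt(degree[i] * degree[j])
        updated[i] = total
    return updated


graph = build_graph([("Material A", "property B", "value C")])
h = gcn_propagate(graph, {node: 1.0 for node in graph})
print({node: round(value, 3) for node, value in h.items()})
```

Stacking several such steps, each followed by a learned linear map and a nonlinearity, is what lets a GCN relate entities that are several hops apart in a large document graph.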
Z-2022-03 Computer Vision for Deep Search in Bioactive Molecule Images
Computer vision methods, in particular object detection and instance segmentation, are of high importance for Deep Search in the bioactive molecule domain and in organic chemistry generally, because images of molecules (in the scientific literature) and of so-called Markush structures (in patents) contain crucial information that is not available in the documents' text.
Requirements
|
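Object detectors such as those mentioned in Z-2022-03 are typically evaluated by matching predicted bounding boxes to ground-truth boxes using intersection over union (IoU). The sketch below assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` tuples and a greedy matching rule with a 0.5 IoU threshold. These conventions are common but are assumptions here, not the project's evaluation protocol.

```python
def box_area(box):
    """Area of an axis-aligned box; zero for empty or inverted boxes."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    intersection = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter_area = box_area(intersection)
    union = box_area(a) + box_area(b) - inter_area
    return inter_area / union if union > 0 else 0.0


def count_true_positives(predicted, ground_truth, threshold=0.5):
    """Greedily match each prediction to the best remaining ground-truth box."""
    unmatched = list(ground_truth)
    true_positives = 0
    for p in predicted:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= threshold:
            unmatched.remove(best)  # each ground-truth box can be matched only once
            true_positives += 1
    return true_positives


# Hypothetical detections of molecule drawings on a page (pixel coordinates).
print(count_true_positives([(0, 0, 2, 2), (10, 10, 11, 11)], [(0, 0, 2, 2.2)]))  # 1
```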
Z-2022-04 NLP for Material Science
Natural Language Processing is a cornerstone technology for extracting valuable information from documents. Despite impressive recent progress in this field, NLP still faces grand challenges, especially in extracting data in specific technical fields.
Requirements
|
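To show what "extracting data in specific technical fields" can mean for materials text, here is a deliberately naive rule-based sketch. It pulls (property, value, unit) triples out of sentences. The property and unit vocabularies are hypothetical and tiny. Hand-written patterns like this are exactly what breaks down on real literature, which is one reason the task calls for learned NLP models.

```python
import re

# Hypothetical vocabularies; a real system would need far larger, curated lists
# (or a learned model) rather than a hand-written pattern.
PROPERTIES = r"band gap|melting point|density|yield strength"
UNITS = r"eV|°C|K|g/cm3|MPa|GPa"

PATTERN = re.compile(
    rf"(?P<property>{PROPERTIES})"    # a known property name
    r"\D{0,30}?"                      # short non-digit text, e.g. " of silicon is "
    r"(?P<value>-?\d+(?:\.\d+)?)\s*"  # a number
    rf"(?P<unit>{UNITS})\b",          # followed by a known unit
    re.IGNORECASE,
)


def extract_measurements(text):
    """Return (property, value, unit) triples found in `text`."""
    return [
        (m.group("property").lower(), float(m.group("value")), m.group("unit"))
        for m in PATTERN.finditer(text)
    ]


print(extract_measurements(
    "The band gap of silicon is 1.12 eV, and its melting point is 1414 °C."
))
```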
Z-2022-05 NLP for Business-Insights
Natural Language Processing is a cornerstone technology for extracting valuable information from documents. Despite impressive recent progress in this field, NLP still faces grand challenges, especially in extracting data in specific technical fields.
Requirements
|
Z-2022-06 Enterprise NLP powered by ML and DL
The IBM Research Laboratory in Zurich is leading the design of novel, cutting-edge solutions tailored to challenging, industry-specific Natural Language Processing (NLP) problems in specialized domains. The main goal is to replace or accelerate traditional human-supervised procedures with automated services that leverage Machine Learning and Deep Learning methods. Toward this goal, we are looking to strengthen our team with highly motivated interns who will contribute to the design and development of such solutions. The successful candidate will join our team at the Zurich Research Laboratory, work in a unique research-corporate environment, and gain first-hand experience in developing novel AI services based on advanced Machine Learning and Deep Learning methods in the NLP domain.
Core activities
Minimum qualifications
Preferred qualifications
|
Z-2022-07 AI for Civil Engineering Applications
Aging and deteriorating infrastructure (bridges, tunnels, and dams, among others) is a growing problem for companies around the world. With the costs of physical inspection and ongoing maintenance steadily rising, these companies need a better way to manage their existing infrastructure. Indeed, roughly 50 billion dollars and two billion civil-engineering labor hours are spent annually monitoring bridges for defects. Asset managers need to quickly identify elements to be repaired or replaced, minimizing the lifetime maintenance cost of their asset portfolio without compromising safety or regulatory compliance. However, correct risk assessment and prioritization become a challenge when inspecting a single bridge takes from days to months.
Minimum qualifications
Preferred qualifications
|
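The prioritization problem in Z-2022-07 can be sketched under strong simplifying assumptions. Suppose each element's risk is its estimated failure probability times the cost of failure, and repairing an element removes its risk. Repairs can then be chosen greedily by risk reduction per unit of repair cost, within a budget. The `Element` fields, the numbers, and the greedy rule are all hypothetical. In practice the failure probabilities would come from inspection data or defect detection, and this heuristic is not guaranteed to be optimal.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical structural element with illustrative cost figures."""
    name: str
    failure_probability: float  # e.g. estimated from inspection or defect detection
    consequence_cost: float     # cost incurred if the element fails
    repair_cost: float          # cost of repairing or replacing it now

    @property
    def risk(self) -> float:
        """Expected cost of failure (probability times consequence)."""
        return self.failure_probability * self.consequence_cost


def prioritize(elements, budget):
    """Greedy heuristic: repair elements with the best risk reduction per unit cost."""
    ranked = sorted(elements, key=lambda e: e.risk / e.repair_cost, reverse=True)
    selected, spent = [], 0.0
    for e in ranked:
        if spent + e.repair_cost <= budget:
            selected.append(e.name)
            spent += e.repair_cost
    return selected


bridge = [
    Element("deck", 0.2, 1_000_000, 50_000),
    Element("bearing", 0.05, 200_000, 5_000),
    Element("pier", 0.01, 5_000_000, 100_000),
]
print(prioritize(bridge, budget=60_000))  # ['deck', 'bearing']
```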
Z-2022-08 Extracting Chemical Information from the Chemical Literature
We have developed numerous machine-learning algorithms for predicting the precursors or products of chemical reactions and for recommending the procedures required to carry out reactions in the laboratory. Millions of patents provided the data essential to train these models. Thousands of additional chemical reactions are described in articles published in the chemical literature, and these could further improve the algorithms' performance. However, such articles typically come in a different format, which makes them inaccessible to programs designed to extract information from patents. The goal of this project is to design new tools to extract chemical information from the text and images of articles published in the chemical literature.
Requirements
|
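Reactions extracted from articles need a machine-readable target representation before they can be used to train prediction models. One widespread format is reaction SMILES, `reactants>agents>products`, with molecules inside each part separated by `.`. The sketch below assumes this format as the extraction target. The description does not specify the format, and the `Reaction` class is hypothetical. The parser only splits the string and does not validate the chemistry. Note also that a `.` can separate fragments of a single salt, which this naive split does not distinguish.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reaction:
    """Hypothetical container for a parsed reaction."""
    reactants: List[str]
    agents: List[str]
    products: List[str]


def parse_reaction_smiles(rxn: str) -> Reaction:
    """Split 'reactants>agents>products' into lists of molecule SMILES strings."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '>' separators, got: {rxn!r}")

    def molecules(part: str) -> List[str]:
        # Molecules within one part are separated by '.'; empty parts yield [].
        return [m for m in part.split(".") if m]

    return Reaction(*(molecules(p) for p in parts))


# Acid-catalyzed esterification: acetic acid + ethanol -> ethyl acetate + water.
print(parse_reaction_smiles("CC(=O)O.OCC>[H+]>CC(=O)OCC.O"))
```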
Z-2022-09 Design of Novel Chemical Reactions
Synthetic organic chemistry has always been concerned with the discovery of novel chemical reactions. Each new reaction adds to the arsenal of available synthetic tools and expands the possibilities for developing and optimising novel molecules. The majority of novel reactions have been discovered by chance, and it has been up to chemists to identify them and investigate them in depth using their "chemical intuition." The goal of this research is to investigate machine-learning-enhanced methodologies for designing new chemical reactions from publicly available chemical data.
Requirements
|
Z-2022-10 Automating AI for Advanced Data-Driven Material Manufacturing
The manufacture of materials generates a vast amount of data, including processing conditions, quality checks, and property measurements. The information contained in these data is often not fully explored, because their complexity hides significant correlations. Machine learning algorithms help extract knowledge from complex data, revealing previously undetectable insights. However, properly tuning the model parameters can be time-consuming, depending on the structure of the data.
Requirements
|
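The tuning cost mentioned in Z-2022-10 is what automated approaches try to reduce. The simplest baseline is random search over hyperparameter ranges. The sketch below is a minimal version. `toy_validation_error` is a hypothetical stand-in: in a real manufacturing-data setting, each call would train and validate a model, which is why every trial is expensive. Random search is named here only as a baseline, not as the project's method.

```python
import random


def random_search(objective, search_space, n_trials, seed=0):
    """Sample each hyperparameter uniformly from its (low, high) range; keep the best."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = {name: rng.uniform(low, high) for name, (low, high) in search_space.items()}
        score = objective(config)  # in practice: train a model, return validation error
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_validation_error(config):
    """Hypothetical stand-in for an expensive train-and-validate run."""
    return (config["learning_rate"] - 0.1) ** 2 + (config["regularization"] - 0.01) ** 2


space = {"learning_rate": (0.0, 1.0), "regularization": (0.0, 0.1)}
best, error = random_search(toy_validation_error, space, n_trials=200)
print(best, error)
```

More sample-efficient strategies (e.g., Bayesian optimization) replace the uniform sampling step while keeping the same objective interface.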
Z-2022-11 Decentralized Digital Identity Platform and Use Cases
IBM has a long history in identity management, a core requirement of any trusted business relationship: the actors in a business relationship must be properly identified, and their messages to other parties must be authenticated. The scope of our research includes, but is not restricted to:
As an intern, you will investigate identity solutions for client use cases and gain first-hand experience in building identity solutions for real-world systems.
|
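The requirement in Z-2022-11 that messages between parties be authenticated can be illustrated with a message authentication code. The swap-in should be named plainly. Decentralized identity systems typically rely on public-key signatures bound to an identifier, whereas the sketch below uses HMAC-SHA256 from the Python standard library with a shared secret key, because that needs no third-party packages. The message layout (`sender`, `payload`, `tag`) is hypothetical.

```python
import hashlib
import hmac
import json


def _canonical_body(sender: str, payload: dict) -> bytes:
    """Serialize deterministically so that signer and verifier hash identical bytes."""
    return json.dumps({"sender": sender, "payload": payload}, sort_keys=True).encode()


def sign_message(key: bytes, sender: str, payload: dict) -> dict:
    """Attach an HMAC-SHA256 tag computed over the sender and payload."""
    tag = hmac.new(key, _canonical_body(sender, payload), hashlib.sha256).hexdigest()
    return {"sender": sender, "payload": payload, "tag": tag}


def verify_message(key: bytes, message: dict) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(
        key, _canonical_body(message["sender"], message["payload"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


shared_key = b"demo-key-not-for-production"
msg = sign_message(shared_key, "alice", {"amount": 10})
print(verify_message(shared_key, msg))                                   # True
print(verify_message(shared_key, dict(msg, payload={"amount": 1000})))  # False
```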
Z-2022-12 Secure execution on a blockchain: Hyperledger Fabric Private Chaincode
Hyperledger Fabric is a permissioned blockchain platform that executes programs (chaincode) on an infrastructure shared by multiple parties, none of which is assumed to be trusted. Hyperledger Fabric Private Chaincode (FPC) enables the secure execution of chaincode on Hyperledger Fabric using Intel SGX. Intel SGX is the most prominent trusted execution environment (TEE) available today; it provides secure execution contexts on the CPU, called enclaves, which isolate data and programs from the host operating system in hardware. The FPC project builds on technology from a research project at IBM Research Europe - Zurich. |
Z-2022-13 CBDC-DID
The rise of digital payments to the detriment of cash has stirred interest in a digital alternative that is as resilient and reliable as cash, especially in the face of natural disasters or large-scale infrastructure outages. This digital alternative is Central Bank Digital Currency (CBDC). CBDC is governments' response to a fragmented payment landscape that is primarily controlled by the private sector. CBDC aims to replace cash and offer similar guarantees: from being a store of value and a medium of exchange to enabling offline payments and (to a degree) anonymous transactions. The scope of our research includes, but is not restricted to:
As an intern, you will investigate identity solutions for CBDC and gain first-hand experience in building blockchain solutions for real-world systems. |
Z-2022-14 Deep Learning Incorporating Biologically-Inspired Neural Dynamics and Learning
Neural networks are the key artificial intelligence technology behind breakthroughs in many important applications. These breakthroughs were achieved primarily by artificial neural networks (ANNs) that are loosely inspired by the structure of the brain: neurons interconnected by synapses, trained offline and fixed after deployment. Meanwhile, the neuroscientific community has developed the spiking neural network (SNN) model, which additionally incorporates biologically realistic temporal dynamics into the neuron. Although ANNs achieve impressive results, there is a significant gap in power efficiency and learning capabilities between deep ANNs and biological brains. One promising avenue to reduce this gap is to incorporate biologically inspired dynamics and synaptic plasticity mechanisms into common deep-learning architectures. Recently, the IBM team demonstrated a new type of ANN unit, called the Spiking Neural Unit (SNU), that incorporates SNN dynamics directly into deep ANNs. Our results demonstrate competitive performance, surpassing state-of-the-art RNNs and LSTM- and GRU-based networks.
|
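To make "SNN dynamics inside an ANN unit" concrete, here is a minimal sketch of a single spiking neural unit run over a short input sequence. It follows our understanding of the SNU formulation: a state s_t = ReLU(W x_t + decay * s_{t-1} * (1 - y_{t-1})) that leakily integrates input and resets after a spike, and an output y_t = step(s_t + b) with a negative bias acting as the firing threshold. The constant `decay` and the parameter values are illustrative assumptions. Training (e.g., backpropagation through time with a surrogate gradient for the step) is not shown.

```python
def snu_forward(inputs, weights, bias, decay):
    """Run one SNU over a sequence of input vectors; return (states, spikes)."""
    s_prev, y_prev = 0.0, 0.0
    states, spikes = [], []
    for x_t in inputs:
        drive = sum(w * x for w, x in zip(weights, x_t))
        # Leaky integration; the (1 - y_prev) factor resets the state after a spike.
        s_t = max(0.0, drive + decay * s_prev * (1.0 - y_prev))  # ReLU activation
        y_t = 1.0 if s_t + bias > 0 else 0.0  # step output; bias acts as -threshold
        states.append(s_t)
        spikes.append(y_t)
        s_prev, y_prev = s_t, y_t
    return states, spikes


states, spikes = snu_forward(
    inputs=[[1.0], [1.0], [1.0], [0.0]], weights=[1.0], bias=-1.5, decay=0.8
)
print(spikes)  # [0.0, 1.0, 0.0, 0.0]
```

Because the recurrence uses only standard operations (weighted sum, ReLU, elementwise products), the unit can be dropped into a deep network like an RNN cell, which is the point of the SNU idea.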
Z-2022-15 Neurosymbolic Architectures to Approach Human-like AI
Neither symbolic AI nor deep neural nets alone have reproduced the kind of intelligence expressed by humans. This is because symbolic AI fundamentally lacks the ability to learn directly from examples, while neural nets cannot dynamically bind information, an open problem behind their persistent failure to reuse knowledge and generalize systematically. In this project, we plan to combine the best of both worlds to approach human-level intelligence. Specifically, we will take a novel look at data-driven representations, their associated operations, and the analog computing substrates that naturally support them. For benchmarking, we will focus on solving abstract visual reasoning problems that mainly involve two aspects of intelligence: visual perception and abstract reasoning. |
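One family of representations that addresses the binding problem mentioned in Z-2022-15 is vector-symbolic architectures (hyperdimensional computing). This is named here as an illustrative technique; the description above does not say which representations the project uses. In the sketch below, roles and fillers are random bipolar hypervectors. Binding is elementwise multiplication, which is its own inverse. Bundling is an elementwise majority. A scene such as "a red square" is encoded into a single vector, and its color is recovered by unbinding the color role.

```python
import random


def random_hypervector(dim, rng):
    """Random bipolar (+1/-1) vector; independent ones are nearly orthogonal."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Bind by elementwise multiplication; bind(bind(a, b), b) == a for bipolar b."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Superpose vectors by elementwise majority (ties broken towards +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
DIM = 10_000
COLOR, SHAPE, RED, SQUARE = (random_hypervector(DIM, rng) for _ in range(4))

# Encode "a red square" as a bundle of role-filler bindings.
scene = bundle(bind(COLOR, RED), bind(SHAPE, SQUARE))

# Ask "what is the color?" by unbinding the COLOR role from the scene.
answer = bind(scene, COLOR)
print(round(similarity(answer, RED), 2), round(similarity(answer, SQUARE), 2))
```

With two bundled bindings, the answer's similarity to `RED` should come out around 0.5 and its similarity to `SQUARE` close to 0. This is enough to identify the bound filler by nearest-neighbor lookup.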
Z-2022-16 In-Network Computing
Computing in the network is a system architecture paradigm that promises benefits such as reduced CPU load (freeing up cores for other tasks), more predictable latency, and the ability to cope with high network bandwidths. Once viewed primarily as a control-plane connectivity paradigm, in-network computing is rapidly emerging as an intelligent data-processing accelerator for more complex processes and applications operating beyond the traditional perimeter. This internship aims to investigate the integration of domain-specific accelerators with cloud FPGAs, targeting extreme-scale data processing. In addition, to meet the demands of modern cloud economics, we will offload the control-plane provisioning of the standalone FPGAs to a serverless platform (e.g., Knative, OpenWhisk). The candidate will have the opportunity to develop and evaluate their in-network computing solution on off-the-shelf FPGAs (e.g., Xilinx Alveo) or to study its scalability potential on the disaggregated cloudFPGA research platform, which features a world-record density of 64 network-attached FPGAs per 2U node.
Requirements |