Topics at the Dublin Lab

Data privacy for health care

Organizations, public bodies, institutes and companies gather enormous volumes of data that contain personal information. For reputational, compliance and legal reasons, this personal information needs to be de-identified before being shared with third parties, such as analytics teams or research scientists. The healthcare domain is particularly challenging since it deals with highly sensitive information. The de-identification process aims to achieve three goals: a) significantly and provably minimize the re-identification risk; b) maintain a high level of data utility so that intended secondary purposes are still supported; and c) maintain the truthfulness of the data at the record level to the largest possible extent.

This project aims to explore innovative ways to provide a framework for calculating re-identification risk in meaningful and realistic settings and for generating reports for a mixed technical, legal and compliance audience. The project will build the foundational metrics for capturing the balance between information loss and risk. The end goal is a research prototype that demonstrates the framework in various scenarios.
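
As an illustration of the kind of foundational metric involved, here is a minimal sketch (not the project's actual framework) of a prosecutor-style re-identification risk score based on equivalence classes over quasi-identifiers; the record fields are hypothetical:

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers):
    # Prosecutor model: a record's risk is 1 / (size of its equivalence
    # class over the quasi-identifier attributes).
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    risks = [1.0 / classes[tuple(r[q] for q in quasi_identifiers)]
             for r in records]
    return max(risks), sum(risks) / len(risks)  # worst-case and average risk

# hypothetical de-identified health records
records = [
    {"age": "30-40", "zip": "D04", "diagnosis": "flu"},
    {"age": "30-40", "zip": "D04", "diagnosis": "asthma"},
    {"age": "50-60", "zip": "D02", "diagnosis": "flu"},
]
worst, avg = reidentification_risk(records, ["age", "zip"])
```

Here the third record is unique on (age, zip), so its risk is 1.0; generalizing attributes further would lower the risk at the cost of data utility, which is exactly the trade-off the project's metrics must capture.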

Required skills

  • Programming (Java 8 or Python)
  • Basic background on information theory and statistical disclosure control
  • Basic database management and query
  • Basic knowledge of Spark-based computing (optional)
  • Familiar with cloud services (optional)
  • Good presentation skills

Multi-modal information retrieval for annotating text documents with relevant images

Recent advances in word embedding approaches have taken the natural language processing and speech processing communities by storm. Joint embedding of text (ranging from character n-grams to whole documents) with images has opened up avenues to explore multi-modal data processing. In particular, this project will investigate the potential effectiveness of joint embedding approaches, i.e. images with words (Frome et al., NIPS '13), for multi-modal information retrieval. The specific application we are interested in is automatically enhancing the readability of a text document, e.g. a Wiki page, by inserting relevant images at appropriate places in the text.
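
As a toy illustration of retrieval in a joint embedding space (the vectors below are made up, not real learned embeddings), candidate images can be ranked by cosine similarity to a text passage's embedding:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def rank_images(text_vec, image_vecs):
    # Rank candidate images by cosine similarity to the text embedding,
    # assuming both modalities live in the same joint space.
    return sorted(range(len(image_vecs)),
                  key=lambda i: cosine(text_vec, image_vecs[i]), reverse=True)

text = [1.0, 0.0, 0.5]                       # toy embedding of a text passage
images = [[0.9, 0.1, 0.4], [0.0, 1.0, 0.0]]  # toy image embeddings
order = rank_images(text, images)
```

In the full system the interesting work is in learning the mapping into the shared space and deciding where in the document each retrieved image should be inserted.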

Required skills

  • Working knowledge in recurrent neural networks, and word/document embeddings.
  • Working knowledge of extracting image features using convolutional nets.
  • Knowledge in information retrieval models.
  • Strong programming skills in a high level programming language, such as Java, Python or C/C++.
  • Quick text manipulation with shell programming, e.g. bash/awk etc.

Decision gisting

This project is concerned with developing a system that “listens” to a meeting where decisions are to be made and creates a summary of the discussion pertaining to each decision: it identifies the decision itself, the proposed alternatives (and who proposed them), and the criteria and constraints discussed.

This project offers a variety of options in terms of the specific content of the internship: (i) developing a new set of annotations on an existing corpus, (ii) starting the development of a new corpus, (iii) developing new algorithms for alternative and preference extraction, or (iv) starting a new task such as discussion segmentation (when are people discussing a decision and when are they discussing other topics) or agreement detection (detecting when an agreement, if any, has been reached in the conversation).

Required skills

Natural Language Processing, Text mining, Machine Learning


Machine learning models for the humanitarian sector

Some 65 million people are displaced globally, the highest number ever in human history. Humanitarian aid budgets are also the largest they have ever been, yet only ~20% of aid recipients feel their needs have been met. The intern will contribute to the ongoing effort in humanitarian needs assessment by building models that leverage data sources from humanitarian agencies to estimate the different types of relief needed during a crisis.

Required skills

Data mining, machine learning


Probabilistic preference model to account for incomparability

When comparing options that are judged on several attributes (e.g. apartments or jobs), some comparisons are more difficult than others. For instance, it is difficult to choose between two apartments if one is well located but very expensive and the other is affordable but poorly located. When faced with such comparisons, which involve a significant trade-off across attributes, decision-makers are more likely to express incomparability or indifference. We propose to use a random utility model to represent this effect. In this model, attribute weights and marginal utility function parameters are drawn from probability distributions whose parameters represent the decision-maker's (DM's) preferences.
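
A minimal Monte Carlo sketch of this idea, under simplifying assumptions not taken from the project itself (linear utilities, weights sampled from truncated Gaussians, and a fixed indifference threshold `tau`):

```python
import random

def choice_probabilities(a, b, weight_means, noise=0.3, tau=0.05,
                         n=20000, seed=0):
    # Monte Carlo estimate of P(prefer a), P(prefer b) and
    # P(indifferent/incomparable): attribute weights are sampled around
    # their means, and a small |utility gap| (< tau) is read as an
    # incomparability or indifference response.
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n):
        w = [max(0.0, rng.gauss(m, noise)) for m in weight_means]
        gap = sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b))
        counts[0 if gap > tau else (1 if gap < -tau else 2)] += 1
    return [c / n for c in counts]

# hypothetical apartments scored on (location, affordability)
p_a, p_b, p_none = choice_probabilities((0.9, 0.2), (0.3, 0.8),
                                        weight_means=(0.5, 0.5))
```

Because the two options trade off sharply across attributes, the sampled utility gap is often near zero, so a non-trivial fraction of simulated responses fall into the indifference/incomparability category, which is the behavioural effect the model is meant to capture.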

The intern will contribute to the development of the model and test its accuracy on real experimental data. The internship provides an excellent opportunity to learn about decision analytics, and how to improve preference elicitation by taking behavioral results under consideration.

Required skills

Machine learning, Bayesian inference, Multi-attribute preference models, familiarity with Matlab.


Virtual testing and hardware-in-the-loop simulation

Vehicles are undergoing a huge revolution. They are transitioning from being isolated entities operating on the road to being connected, informed devices. The goal of this project is to design new collaborative services for connected vehicles that leverage technological IoT innovation and mathematical rigour. In particular, the focus of the project is on designing and developing a Hardware-in-the-Loop (HiL) platform to validate large-scale systems arising in a number of applications for partially-autonomous driving functions and cognitive automotive analytics.

In this context, the duties of the intern include:

  • Contributing to the development of the HiL platform;
  • Testing and validating cognitive automotive functions;
  • Implementing HiL software in Python.

Required skills

  • Excellent knowledge of Python;
  • Excellent knowledge of Markov Processes, with a focus on asymptotic behaviours;
  • Familiarity with SUMO;
  • Familiarity with the Android SDKs;
  • Track record on control/systems/CS top journals;
  • Enthusiastic and committed to deliver results in a fast paced environment.

Data-driven robust optimization

IBM Research Ireland is seeking a summer intern in the area of Robust Optimization (RO). Specifically, the candidate will be required to further develop a distributionally robust optimization approach from a data-driven perspective. While some RO approaches build uncertainty sets directly from data, most of the models in the Robust Optimization literature are not directly connected to data. Recent work on this issue has started to lay a foundation for this perspective. Further developing a data-driven theory of RO is interesting from a theoretical perspective, and also compelling in a practical sense, as many real-world applications are data-rich. The candidate will be required to scope, improve and apply existing algorithms to a set of applications that are of relevance to IBM. Those include, but are not restricted to, cognitive IoT, portfolio optimisation and air traffic management.
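
One very simple data-driven RO construction, far short of the distributionally robust models this project targets, is to take the observed scenarios themselves as the uncertainty set and pick the decision with the best worst-case outcome; the return data and candidate portfolios below are hypothetical:

```python
def worst_case_return(weights, return_scenarios):
    # Worst realised portfolio return over an uncertainty set built
    # directly from observed return scenarios.
    return min(sum(w * r for w, r in zip(weights, scenario))
               for scenario in return_scenarios)

# hypothetical observed returns for two assets across three periods
scenarios = [(0.05, 0.01), (-0.10, 0.02), (0.12, 0.00)]
# candidate allocations (asset 1 weight, asset 2 weight)
candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
best = max(candidates, key=lambda w: worst_case_return(w, scenarios))
```

The robust choice here is the low-variance asset, because the max-min criterion penalises the scenario in which the riskier asset loses 10%; richer data-driven approaches replace this finite scenario set with statistically calibrated uncertainty sets.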

Required skills

  • Experience with Matlab
  • Experience with modeling optimization problems (e.g. Yalmip)
  • Experience with optimization software
  • Expertise in statistics preferred
  • Evidence of relevant publications
  • Ideally, PhD candidate in area of robust optimization.

Distributed optimization and consensus

We are now in a time when everything can be interconnected and programmed: the IoT is rapidly leading to a new industrial revolution in which networks of objects communicate and collaborate in order to fulfil a common goal that no individual object could achieve in isolation.

This new technological paradigm is also leading to a new paradigm in Control Theory: for these networked applications, it is more convenient to design “local”, decentralized control protocols residing on each node, rather than a central controller orchestrating the behaviour of all the objects in the network.

We seek to explore new methodologies to design novel decentralized control protocols for one or more of the following cases:

  • Networks where each node has asynchronous sampling times and communications;
  • Networks where a subset of objects “lies” to the others and indeed becomes malicious for other nodes;
  • Networks with evolving topologies.
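
For the synchronous, trustworthy-node baseline, a decentralized protocol can be as simple as linear consensus, in which each node repeatedly moves toward its neighbours' values; this sketch uses a hypothetical four-node ring and a hand-picked step size:

```python
def consensus_step(values, neighbours, alpha=0.3):
    # One synchronous step of a linear consensus protocol: each node
    # moves toward the values of its neighbours with step size alpha.
    return [x + alpha * sum(values[j] - x for j in neighbours[i])
            for i, x in enumerate(values)]

# ring of four nodes with hypothetical initial measurements
neighbours = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
values = [1.0, 5.0, 3.0, 7.0]
for _ in range(100):
    values = consensus_step(values, neighbours)
```

On this symmetric graph the update matrix is doubly stochastic, so all nodes converge to the network average (4.0) using only local communication; asynchronous sampling, lying nodes and evolving topologies, the cases listed above, each break one of the assumptions that make this simple protocol work.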

The duties of the intern include:

  • Formalization of control and distributed optimization problems;
  • Synthesis of decentralized control protocols;
  • Simulation via Python and/or Matlab/Simulink;
  • Benchmarking the decentralized algorithms on real-world applications.

Required skills

  • Excellent knowledge of Matlab/Simulink and/or Python;
  • Excellent knowledge of nonlinear dynamical systems and control (excellent knowledge of stability analysis techniques such as Lyapunov, passivity, contraction is a plus);
  • Track record on control/systems top journals;
  • Enthusiastic and committed to deliver results in a fast paced environment.

Real-time car sharing

The overall goal is to enable commercial and individual car sharing using automated driving cars. More specifically, the project aims to develop real-time car sharing concepts. This includes the optimisation of automated vehicle allocation, pick-up and drop-off, based on end-user needs and on real-time, reliable information about the actual vehicles' statuses and their scheduled routes. Work should build on existing IBM assets and previous projects.

Required skills

  • Knowledge of, and interest in, optimisation
  • Programming skills in Java, Python, or JavaScript (NodeJS)
  • Ability to read, write and debug code
  • Familiarity with at least one IDE, such as Eclipse, NetBeans, or IntelliJ

Real-time filters for monitoring driver behaviour

Poor car-following behaviour is responsible for a significant number of car accidents. In this work, we aim to design and implement a system that trains drivers to drive better, utilising measurements from the vehicle's proximity sensors, e.g. relative distance and speed to the leading vehicle(s). Recent work has shown that offline parameter identification of car-following models is a tedious task that requires precise knowledge of the model's specificities, i.e. parameters may only be identifiable in specific traffic regimes.

The work will consist of designing an algorithm that efficiently performs online parameter identification of car-following models given the available measurements. The first step is to study the mapping between the identifiability of car-following parameters and their corresponding traffic flow regimes. The second step is to design an online filter that integrates this mapping. Then, a risk model based on safety indicators and simulation analysis will be derived. Finally, a prototype system will be implemented. The system will only intervene if dangerous behaviour is detected according to the risk model.
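
A minimal sketch of the online-identification step, assuming a toy car-following law gap ≈ T · speed with a single parameter T; real car-following models and their regime-dependent identifiability are far richer:

```python
import math
import random

def estimate_headway(speeds, gaps, n=500, seed=1):
    # Toy particle filter: infer the desired time-headway T in the law
    # gap ≈ T * speed from noisy gap measurements, one update per sample.
    rng = random.Random(seed)
    particles = [rng.uniform(0.5, 3.0) for _ in range(n)]
    for v, g in zip(speeds, gaps):
        # small process noise keeps the filter adaptive to drift
        particles = [p + rng.gauss(0.0, 0.02) for p in particles]
        # weight each particle by the Gaussian likelihood of the observation
        weights = [math.exp(-0.5 * (g - p * v) ** 2) for p in particles]
        # multinomial resampling proportional to the weights
        particles = rng.choices(particles, weights=weights, k=n)
    return sum(particles) / n

# synthetic measurements with true headway T = 1.5 s
data_rng = random.Random(7)
speeds = [10.0 + 5.0 * data_rng.random() for _ in range(40)]
gaps = [1.5 * v + data_rng.gauss(0.0, 0.3) for v in speeds]
t_hat = estimate_headway(speeds, gaps)
```

The same filtering machinery carries over to real car-following models, with the extra difficulty that some parameters only become identifiable when the traffic regime excites them.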

Required skills

System identification, particle filters, Python.


Feedback control of the modal shift considering priority queues in traffic light control

The modal shift towards public transport is a priority for many city policy makers. New techniques of traffic optimisation in cities look into prioritising traffic that is less prone to deteriorating pollution and noise levels, e.g. pedestrians, cyclists and buses. Recent work has been done on queue-based traffic light optimisation.

This work will adapt recent work on prioritising specific traffic classes. It will assess how this new optimisation policy affects the travel times of pedestrians, buses, vehicles, etc. in a city, considering a realistic demand model for trips in the city. Then, using the elasticity coefficients between travel times and OD trips available in the literature, it will investigate how many citizens actually contribute to the modal shift towards public transport. A feedback mechanism to control the modal shift will be proposed.
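
The elasticity step can be illustrated with a constant-elasticity demand calculation; the trip counts and elasticity value below are hypothetical, not taken from the literature:

```python
def modal_shift(car_trips, elasticity, travel_time_change_pct):
    # Constant-elasticity demand response: a relative increase in car
    # travel time scales car trips by elasticity * (% change); the lost
    # car trips are assumed to shift to public transport.
    delta = car_trips * elasticity * travel_time_change_pct / 100.0
    return car_trips + delta, -delta  # remaining car trips, trips shifted

# hypothetical: 100k daily car trips, elasticity -0.4, +10% car travel time
remaining, shifted = modal_shift(100000, elasticity=-0.4,
                                 travel_time_change_pct=10)
```

Closing the loop, the feedback mechanism would adjust the signal-priority policy until the shifted volume tracks a target, which is what makes this a control problem rather than a one-off assessment.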

Required skills

Strong interest in optimisation, genuine interest in code development (Python, SUMO, GIS) and in transport policy.


City-scale real-time pollution estimation from roadway traffic

Pollution monitoring is only starting to take place in cities, thanks to increasing environmental data sources, e.g. air pollution measurements, weather data and precise knowledge of traffic volumes. Recent work in this area includes a data assimilation framework for urban air pollution monitoring and a modelling chain to estimate pollution levels from highway traffic.

This project will take advantage of available data feeds for traffic volumes, weather conditions and air pollution levels at stations across a city. The goal of this project is to assess accurately how much city traffic contributes to air pollution levels. Pollutant emission and dispersion levels will be integrated in a data assimilation framework. This will provide insights into how to better manage air pollution levels on critical days by leveraging city traffic.
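
At its simplest, a data assimilation update blends a model estimate with an observation by inverse-variance weighting; this scalar optimal-interpolation sketch (with hypothetical NO2 values) shows the principle that the full framework generalizes to fields:

```python
def assimilate(model_value, model_var, obs_value, obs_var):
    # Scalar optimal-interpolation update: blend a model estimate and an
    # observation, weighting each by the inverse of its error variance.
    k = model_var / (model_var + obs_var)  # gain: trust in the observation
    analysis = model_value + k * (obs_value - model_value)
    analysis_var = (1.0 - k) * model_var   # analysis is less uncertain
    return analysis, analysis_var

# hypothetical NO2 concentrations in ug/m3 at one monitoring station
analysis, var = assimilate(model_value=40.0, model_var=9.0,
                           obs_value=46.0, obs_var=3.0)
```

Because the observation is assumed less uncertain than the model here, the analysis (44.5) sits closer to the measurement, and its variance (2.25) is smaller than either input's.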

Required skills

Code development (interfacing different types of models, different types of data sources), critical thinking, genuine interest in sustainable cities.


Cognitive disruption control in complex schedules

Schedule disruptions are a significant source of unplanned costs in various transport services (e.g., airlines, rail). Operations controllers can mitigate the financial and service impacts of disruptions if they can estimate the costs of the disruptions and have suggestions as to what to do.

In this project, the student will work with IBM Research staff to develop, implement and test cognitive disruption control algorithms for complex schedules, which allow controllers to mitigate the effect of disruptions. This includes (i) using disruption models to explore shortcomings of existing strategies for selecting control actions, (ii) modifying strategies for selecting control actions, (iii) implementing these strategies as algorithms and (iv) assessing their performance on real data. The final goal of the project is to develop and test the prototype and implement it as a general-purpose C++ library.

Required skills

  • PhD student in Optimisation, Control Theory, or Applied Mathematics.
  • Background in numerical analysis and/or optimisation algorithms.
  • Experience in C++/Python/Matlab programming.

Ensemble based forecasting of wave conditions

Ensemble techniques have been demonstrated to outperform individual models in operational forecasting and minimising prediction errors (Mallet and Sportisse, 2006). This is particularly relevant for forecasting wave conditions in coastal ocean regions subject to model errors arising from incorrect forcing data, model parametrizations and model structural errors (Rogers et al., 2005).

In this study we aim to combine physics models of near-shore circulation and wave characteristics with ensemble forecasting methods to generate optimal forecasts with defined uncertainty. The approach is applied to a case-study site, Santa Cruz, California. The system involves a coupled wave model and circulation model. Circulation patterns are resolved by EFDC, a 3D circulation model, while wave information is computed using SWAN, a third-generation wave model that computes wind-generated waves in coastal and inland waters. Input data includes a high-resolution meteorological field with predictions highly sensitive to the accuracy of wind fields.

We aim to investigate methodologies to optimally combine multiple forecasts of wave characteristics. We will investigate different linear combinations of models to improve the performance of model-data comparisons. The weights attached to these models will be investigated, and techniques to select and forecast optimum weights will be evaluated.
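
A least-squares sketch of the linear-combination step for two ensemble members; the forecast and observation values below are synthetic, and the real system would fit weights over long hindcast records:

```python
def two_member_weights(f1, f2, obs):
    # Solve the 2x2 normal equations for weights (w1, w2) minimising
    # sum((w1*f1 + w2*f2 - obs)^2) over the calibration period.
    a11 = sum(x * x for x in f1)
    a12 = sum(x * y for x, y in zip(f1, f2))
    a22 = sum(y * y for y in f2)
    b1 = sum(x * o for x, o in zip(f1, obs))
    b2 = sum(y * o for y, o in zip(f2, obs))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

# synthetic wave-height observations and two member forecasts
obs = [1.0, 2.0, 3.0, 4.0]
f1 = [1.2, 1.9, 3.3, 4.0]
f2 = [0.7, 2.2, 2.9, 4.1]
w1, w2 = two_member_weights(f1, f2, obs)
blend = [w1 * a + w2 * b for a, b in zip(f1, f2)]
```

By construction the fitted blend cannot have a larger calibration error than either member alone, which is the basic reason ensemble combinations outperform individual models; the harder research question is selecting and forecasting weights that hold up outside the calibration window.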

Required skills

The ideal candidate will have experience in numerical modelling and the Linux/UNIX environment. In addition, the ability to analyse large datasets, combining basic statistics with a choice of analysis software (R, Python, etc.), is useful.


[1] Mallet, V., Sportisse, B., 2006. Ensemble-based air quality forecasts: A multimodel approach applied to ozone. J. Geophys. Res. Atmospheres 111.
[2] Rogers, W.E., Wittmann, P.A., Wang, D.W., Clancy, R.M., Hsu, Y.L., 2005. Evaluations of Global Wave Prediction at the Fleet Numerical Meteorology and Oceanography Center. Weather Forecast. 20, 745–760.


Forking events analysis in blockchain protocols

Blockchain is the fabric at the core of crypto-currencies, and it is based on the concept of algorithmic consensus. A methodology to preserve the fabric in the presence of consensus-breaking inconsistencies (generated by an adversary or by inconsistent sub-versioning of the core software) is for a blockchain to be forked. While this can be considered a brute-force patch, it has been implemented more than once in widely used crypto-currencies. This project aims at applying probability theory and advanced computational complexity to study current and past fork events in crypto-currencies, their root causes, their impact on the blockchain fabric and the possibility of alternative consensus-repairing solutions.

Required skills

  • Currently enrolled in a PhD in Cybersecurity, Cryptography or Computer Science.
  • Deep knowledge of probability theory, advanced computational complexity.
  • Coding skills: C, C++, Go

Cognitive inverse modelling and its application to hydraulic diffusivity inversion

Extraction of fluids from porous media is critical both for petroleum resource management and for supplying drinking water to a global population. Sparse sampling through wells and the heterogeneity of geologic formations make inverse estimation of the permeability field a difficult, under-determined problem. In this project we propose a cognitive strategy for estimating the heterogeneous diffusion coefficient – the permeability field – of a 2D confined aquifer. The aquifer is modeled using a 2D linear Darcy equation within a relatively simple geometry. The permeability field is to be inferred from given measurements of water levels from a network of wells.

The key idea behind this project is to use state-of-the-art spectral clustering algorithms to learn clusters in the input space, e.g. massive amounts of geo-physically plausible samples of permeability fields, from the data available in the output space, e.g. observed water heights at well locations. The expected outcomes are:

  • learning the number and centers of the clusters in the input space
  • using the cluster central points as starting points for the subsequent inversion
  • uncertainty quantification

Uncertainty quantification is a second important point in the project: clusters are similar in terms of the misfit function, hence we can expect that, after the inversion, the members of the clusters will allow one to quantify the misfit variance associated with the optimized cluster centers. The student will work with IBM Research staff to develop, implement and test a cognitive inversion prototype for 2D Darcy flows. The workflow of the project will be as follows:

  • use existing conditional Gaussian samplers to generate samples of the permeability field
  • compute the corresponding solutions of the Darcy equation (existing simulator)
  • create the similarity matrix and learn the clusters (existing Matlab prototype)
  • run the inversion (a combination of the L-BFGS method with a numerical gradient for the Darcy equation)
  • sample from the inverted permeability field (to be developed)
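
The clustering step consumes a similarity matrix built from the samples; here is a minimal sketch, assuming a Gaussian kernel over a single scalar misfit value per sample (the actual prototype operates on full permeability fields, not scalars):

```python
import math

def similarity_matrix(misfits, sigma):
    # Gaussian-kernel similarity between samples based on a misfit-style
    # distance; this is the matrix a spectral clustering step would consume.
    n = len(misfits)
    return [[math.exp(-((misfits[i] - misfits[j]) ** 2) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

# toy misfit values for five permeability-field samples: two clear groups
misfits = [0.10, 0.12, 0.11, 0.90, 0.95]
S = similarity_matrix(misfits, sigma=0.1)
```

Samples with similar misfit get similarity near 1 while dissimilar ones get near 0, so the spectral decomposition of this matrix separates the two groups; their centers then serve as starting points for the L-BFGS inversion.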

The final goal of the project is to develop and test the prototype and implement it as a general-purpose C++ library.

Ideal intern skills

  • PhD student in Applied Mathematics, Control Theory, or Civil Engineering.
  • Background in numerical analysis.
  • Experience in C++ programming.

Automatic predictive maintenance

Predictive maintenance plays an important role in different industries, including manufacturing, healthcare and transportation. In practice, predictive maintenance is very time-demanding and requires skilled data scientists to extract hand-crafted features from sensor data. In this project, the intern will tackle these problems by automating this process. In particular, the automation for predictive maintenance must work with raw sensor data collected from IoT devices; such data might be erroneous, missing or sampled at different resolutions. The intern will design algorithms to efficiently search for features relevant to predicting device failures, using state-of-the-art feature and representation learning techniques from the data mining and deep learning fields.

Required skills

Machine learning, statistics, data mining, time series analysis, deep learning, programming (Python, Spark (optional)).


Learning like humans using deep learning

Help us push the boundaries of deep learning and AI. We use deep learning on large-scale data to help cognitive systems understand the relationships between data of different types: images, graphs, text, time-series, and more. You will learn how to use DNNs for simple cognitive tasks such as zero-shot learning and knowledge discovery. An ideal outcome of this internship would be a publication at a top conference.

Required skills

  • Knowledge of machine learning or data mining techniques
  • Programming in Python
  • Some familiarity with deep learning preferred

Can an AI understand the weather?

The weather affects everyone; however, finding the relationships between weather and human behaviour is challenging, owing both to the complexity of human behaviour and to the massive scale of weather data. We use deep learning and other machine learning tools to predict the impact of weather on transportation, airlines, agriculture, and more. You will learn how to train machine learning models at scale. This internship will work towards having an impact on IBM's clients.

Required skills

  • Knowledge of machine learning or data mining techniques
  • Programming in Python
  • Some familiarity with deep learning preferred

Large-scale cognitive workload optimization

Machine learning and applied AI techniques are broadly used to provide prompt answers and insights for challenging problems across industries and societal challenges. IBM is spearheading the transition to cognitive computing. As part of your internship, you will be given the opportunity to create impact by optimizing performance-versus-efficiency tradeoffs at the intersection of machine/deep learning runtimes and OpenPOWER systems, when solving real-life problems.

Required skills

  • Programming in C/C++ or Scala, CUDA, scripting
  • Nice to have: Background on machine learning/AI techniques/algorithms

More info: see the blog article “Putting the AI in PowerAI”.


Pathfinding in new cloud architectures

As the transition to cloud computing progresses, we are exploring new cloud architectures in multiple directions, such as bringing new levels of efficiency through resource pooling, or extracting more value through hardware/software specialization to the common-denominator needs of cloud workloads and services. As part of your internship, you will have the opportunity to work with us on related pathfinding activities, testing ideas on experimental architectures featuring resource disaggregation and near-data computing.

Required skills

  • Programming in C/C++, Linux programming, OS concepts
  • Nice to have: FPGA programming/workflows

Healthcare & social care

Are you looking to perform cutting-edge research with real-world impact? The IBM Research Lab in Dublin seeks talented and enthusiastic research interns to join our team.

We are interested in candidates with excellent technical skills who are eager to apply their research skills to solve real-world problems. You will work with cutting-edge technologies, researching novel techniques to acquire, represent and exploit urban, environmental, social and health data and information to improve how health is managed and delivered.

You will be expected to participate in defining a challenging problem, designing and developing a new solution, and learning about new technologies and domains. If you want a challenge -- and feel you have what it takes to work with one of the world's top industrial research organizations -- then we would like to hear from you.

We are looking for PhD students in the domains of health informatics, nursing informatics, social care informatics or related disciplines. Experience with artificial intelligence, data management, machine learning or data mining techniques is preferred.

Required skills

  • Excellent problem-solving skills
  • Ability to execute, starting from problem definition, to a working implementation
  • Previous experience working in team environments with limited supervision is a plus
  • Excellent written and verbal communication and presentation skills