Master’s student or intern

Toward ML-based database tuning and optimization

Ref. 2021_006

Databases are the foundation of virtually every large-scale online service. Their performance is therefore crucial for the services they power, and their provisioning costs represent a significant share of those services' operational costs. Unfortunately, maximizing the performance of the database underlying a specific service while also minimizing its provisioning costs is far from trivial.

In fact, several databases exist that offer similar capabilities, but they embrace different designs that deliver very different performance depending on the workload. In addition, each database has its own set of internal parameters, which by itself has a huge impact on the database's performance and on its resource demands, e.g., in terms of storage capacity. The complexity of the problem is further exacerbated in cloud environments, where the choice of virtual machine flavor and cloud provider may also affect both performance and provisioning cost.

The Cloud Data Platforms group is working towards a fully automated solution that jointly identifies the right database and configuration for a given workload, as well as the cloud deployment that minimizes its provisioning costs. The goal of this internship is to advance the state of the art in autonomous database systems by investigating the use of machine learning to address issues such as automatic database selection, techniques for self-tuning databases, and cost optimization in the cloud. The group has consolidated experience in the field, having investigated machine learning approaches to self-tuning databases and to performance/cost optimization of cloud applications.

Specific use cases for the project include FoundationDB, an open-source distributed database that powers some business-critical IBM Cloud database services, as well as embedded key-value stores such as RocksDB and WiredTiger, which are widely used in many production systems and research works. These systems embrace designs that target different workloads, and they expose tens to hundreds of tuning knobs.

The project has a strong research component, since it aims to jointly address several challenging research questions that have so far been tackled mostly in isolation.


IBM is committed to diversity at the workplace. With us you will find an open, multicultural environment. Excellent flexible working arrangements enable all genders to strike the desired balance between their professional development and their personal lives.

How to apply

We are inviting applications from students to conduct their master’s thesis work or an internship project at the IBM Research lab in Zurich on this exciting topic. The research focus will be on advancing the state of the art in AI-based database tuning and optimization. It also involves interactions with several researchers focusing on various aspects of the project and with the IBM Cloud data services team. The ideal candidate should have experience in distributed systems, databases, and machine learning, and have strong programming skills (C++, Python). Hands-on experience with distributed database systems or ML frameworks is a bonus but not required.

For more information on technical questions, please contact Dr. Diego Didona.