Master's thesis

Efficient storage and retrieval of geospatial data in SQL data lakes

Ref. 2023_012

The IBM Zurich Research Laboratory in Rueschlikon continuously offers positions to motivated students in the fields of software engineering and machine/deep learning.

A master’s student position is available for a candidate with a passion for efficient storage and retrieval of geospatial data in SQL data lakes. For six months, the candidate will join the group "AI for Climate Impact (ACT)", working with satellite (Earth Observation) and climate data at a triple-digit petabyte scale.

IBM Research has a long heritage of expertise in storing high-dimensional data in key-value stores (e.g., HBase). As part of our research, we want to understand how to efficiently store and retrieve high-dimensional data (geospatial climate and satellite data) using modern data lake technology (e.g., Apache Iceberg, Presto). Ideally, some of the methods of the XArray interface would be translated to SQL and made available as a reference implementation in a prototype.

Minimum Qualifications

  • Enrolled in a Master's degree in computer science, physics or engineering
  • Excellent Python coding skills
  • SQL skills
  • A high degree of creativity and outstanding problem-solving ability

Preferred Qualifications

  • Experience with big data technologies (e.g., Apache Spark, HBase, Presto, Impala)
  • Experience with data lakehouse technologies (e.g., Apache Iceberg, Hudi, Delta Lake)
  • Experience with geospatial frameworks (e.g., XArray, GeoPandas)
  • Experience in parallel computing (e.g., LSF, Slurm, Dask, Ray)
  • Excellent oral and written English
  • Strong interpersonal skills


How to apply

If you are interested in this exciting position, please submit your most recent curriculum vitae and your latest diploma.