Research and Engineering Positions

Hybrid Cloud Data

Ref. 2022_012

Focus Area

Our team develops innovative solutions to the challenging data management and compliance problems facing many organizations today, particularly against the backdrop of the increased use of cloud computing. As computing resources become more distributed, as data needs continue to grow (e.g., data-intensive applications, AI), and as regulations focus increasingly on how personal data is handled (e.g., the EU General Data Protection Regulation and emerging regulation governing AI), the need for efficient, scalable data management tools and automation is growing rapidly.

To answer questions such as “what data do we have?” and “are our data sets stored and processed in line with regulations and company policies?”, we have developed Pathfinder, a system that creates enterprise data maps from automatically collected metadata. The metadata is linked and enriched to form an overview (a map, realized as a knowledge graph) of data assets: how they are stored and accessed, and where they are processed. Metadata is gathered from sources such as data catalogs, data stores (file systems, databases, and cloud object stores), and systems that copy, transform, and process data. Data maps can be tuned for specific purposes, for example combining data governance with AI model monitoring to establish trustworthy AI systems.
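To illustrate the idea behind such a data map, the following is a minimal, hypothetical sketch: metadata records from different sources are linked into a small graph of data assets, stores, and processing jobs, which can then be queried for questions like "where does this data set end up?". All names, relations, and the schema here are illustrative assumptions, not Pathfinder's actual design or API.

```python
from collections import defaultdict

# (subject, relation, object) triples, as they might be extracted from
# a data catalog, a database, and a replication tool (illustrative only).
metadata = [
    ("customer_records", "stored_in", "postgres_db_eu"),
    ("customer_records", "contains", "personal_data"),
    ("customer_records", "copied_by", "replication_job_7"),
    ("replication_job_7", "writes_to", "object_store_us"),
    ("clickstream_logs", "stored_in", "object_store_us"),
]

# Index the triples as an adjacency list keyed by (subject, relation).
graph = defaultdict(list)
for subject, relation, obj in metadata:
    graph[(subject, relation)].append(obj)

def locations(asset):
    """Follow storage and copy edges to find everywhere an asset may reside."""
    places = set(graph[(asset, "stored_in")])
    for job in graph[(asset, "copied_by")]:
        places.update(graph[(job, "writes_to")])
    return places

print(locations("customer_records"))  # {'postgres_db_eu', 'object_store_us'}
```

A governance check built on such a graph might, for instance, flag assets tagged `personal_data` whose locations include stores outside an approved region.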

We are looking for outstanding candidates to help us further develop and realize our vision of the future of hybrid cloud data management.

How We Work

Team members participate in all aspects of our projects, from developing new ideas and designing solutions through to their realization. We collaborate with other teams in IBM's global research labs and business units to take advantage of our colleagues' experience and existing technologies. We build our systems by combining in-house development with open-source technologies such as Kubernetes and Apache Kafka, and we contribute to open-source projects such as Debezium for efficient database replication and Strimzi for running Kafka on Kubernetes. Publishing our results in leading conferences and journals is an important part of our work.

Requirements

Candidates should have experience with distributed systems (e.g., Kubernetes, Apache Hadoop and Spark), software engineering, and database systems (SQL, NoSQL, and graph databases such as Neo4j), as well as with data orchestration engines such as Apache Airflow, ETL platforms such as IBM DataStage, data warehousing, and data virtualization. Experience in specifying policies and developing compliant systems would be a strong asset. A passion for both designing and implementing software systems is essential, including experience with modern development practices and tools (e.g., GitHub, CI/CD, Docker).

Candidates should have a PhD or Master's degree in Computer Science, or equivalent experience.

We would like to hear about related projects that you have worked on and we would appreciate links to any open source projects that you have contributed to.

Diversity

IBM is committed to diversity in the workplace. With us you will find an open, multicultural environment. Excellent flexible working arrangements enable all genders to strike the desired balance between their professional development and their personal lives.

How to apply

Please submit your CV below, along with a short description of why this position interests you and how you think you could contribute.

For technical questions, please contact Dr. Doug Dykeman, Manager, Hybrid Cloud Data Platforms.

Below are some of our recent publications that give a detailed view of aspects of our ongoing work:
1. Pathfinder: Building the Enterprise Data Map, 2021 IEEE Conference on Big Data
2. Securing Kafka with Encryption-at-Rest, 2021 IEEE Conference on Big Data
3. Building and Operating a Large-Scale Enterprise Data Analytics Platform, Journal of Big Data Research, Volume 23, 15 February 2021