Master Thesis, Internship
Internship or Master's thesis: What roles can modern NVMe storage play in accelerating LLM/RAG pipelines?
Ref. 2024_022
Project description
LLM/RAG pipelines have attracted significant community attention due to their widespread use, effectiveness, and potential applicability in a variety of domains. Ideally, all data and metadata used in these workflows would be kept in memory. However, DRAM technology is facing challenges on multiple fronts: it is not scaling, it has a high cost ($/GB), and it is energy-inefficient. As NVMe devices reach tens of GB/s of bandwidth, single-digit-microsecond I/O latencies, and millions of small I/O operations per second in a single machine, the key research question we are interested in is: how can the modern NVMe hardware/software stack help run data-intensive RAG pipelines (with a focus on RAG, i.e., information retrieval and LLM inference)?
In this context, we aim to answer the following research questions:
- What dependencies do RAG pipelines have on storage, and how do they access it? Are there specific access patterns, locality (temporal or spatial), or storage formats that make the I/O generated by RAG workflows unique?
- Vector DBs (VdBs) play a critical role in a RAG pipeline. Can VdBs benefit from high-performance NVMe flash arrays that can support millions of small I/O operations per second? What are the best I/O strategies (scheduling, placement, polling) over multiple NVMe devices (see the sketch after this list)?
- With disaggregated storage, a VdB can also be deployed as a distributed service. Here, not only storage but also networking performance comes into play. How efficiently do distributed VdB deployments use state-of-the-practice I/O hardware (networking and storage)?
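To give a concrete flavour of the storage microbenchmarking this project involves, below is a minimal sketch (not part of the project deliverables) that measures small random reads against a local NVMe device. It assumes Linux, Python 3, and read access to a hypothetical namespace at /dev/nvme0n1, and it uses simple synchronous O_DIRECT reads as a baseline; a real study would use tools such as fio or asynchronous interfaces (io_uring, SPDK) to approach the millions of IOPS mentioned above.

    # Minimal sketch: synchronous 4 KiB random-read baseline on one NVMe device.
    # Assumptions: Linux, Python 3.7+, and permission to read the raw block device.
    import os, time, random, mmap

    DEV = "/dev/nvme0n1"   # hypothetical device path; adjust to your system
    BLOCK = 4096           # 4 KiB reads, typical of small-I/O index lookups
    N_IOS = 100_000

    fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)
    dev_size = os.lseek(fd, 0, os.SEEK_END)
    buf = mmap.mmap(-1, BLOCK)   # anonymous mapping: page-aligned, as O_DIRECT requires

    start = time.perf_counter()
    for _ in range(N_IOS):
        off = random.randrange(dev_size // BLOCK) * BLOCK   # block-aligned random offset
        os.preadv(fd, [buf], off)                           # one synchronous 4 KiB read
    elapsed = time.perf_counter() - start
    os.close(fd)

    print(f"{N_IOS / elapsed:,.0f} IOPS, {elapsed / N_IOS * 1e6:.1f} us mean latency")

A single synchronous thread like this will stay far below the device's capability, which is exactly the gap (queue depth, polling, scheduling across devices) that the research questions above target.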
Qualifications:
- Enrolled in a Master's program or in possession of a Master's degree in computer science, with a keen interest in data storage research, cloud computing, and performance engineering.
- Excellent coding skills: familiarity with Linux environments and software development tools (git/GitHub, IDEs, gcc, gdb, QEMU, virtual machines and containers, etc.).
- A high degree of creativity and outstanding problem-solving ability.
Preferred Qualifications:
- Experience with systems programming and internals (kernel, memory management, hardware, CPU).
- Experience with data storage and NVMe storage internals and specifications.
- Experience with machine learning or cloud computing.
- Experience with performance engineering tools (perf, fio, eBPF).
- Excellent oral and written English, strong interpersonal skills, and good presentation skills.
Diversity
IBM is committed to diversity in the workplace. With us you will find an open, multicultural environment. Excellent flexible working arrangements enable all genders to strike their desired balance between professional development and personal life.
How to apply
Please submit your application through the link below. The position can start immediately or at a later date.