Master Thesis, Semester project or Internship
Large Vision-Language Models for Enterprise Visual Inspection
Ref. 2024_017
Project description
Are you looking for an opportunity to be part of an innovative computer vision project, where you can challenge your knowledge and skills on real enterprise applications and data? You have come to the right place! Our team is developing a radically new way to build computer vision models based on interactive visual prompting.
In less than a year, we created a new pipeline that outperformed every other state-of-the-art approach, including on the most recent few-shot learning benchmarks. Our pipeline makes use of some of the most popular Large Vision Models, such as SAM/SAM2 and DINOv2, in a novel way. We have implemented it in our new service, Visual Prompting Lab, which has already been used successfully in enterprise applications, including discrete manufacturing, quality inspection, and asset inspection (particularly in civil infrastructure).
We now have plenty of ideas on how to bring this work to the next level by using Vision-Language Models, Agentic AI and other recent AI developments. More specifically, we can offer research and innovation projects on topics such as:
- Interactive Prompting
- Large Vision-Language Models
- Anomaly Detection
- Pre-training/Fine-Tuning of Large Vision Models (aka Vision Foundation Models)
- Agentic AI
- Generative AI
A unique aspect of all our projects is the opportunity to work on client data that is not available to the public and represents a big challenge even for the most successful state-of-the-art methods. As part of our team, you will collaborate with experienced Research Scientists and AI Software Engineers who will guide and help you to successfully complete the challenges of the proposed task. You will also have access to modern GPUs and cloud infrastructure. The technology created in our team is powering IBM mainstream products, such as Maximo Visual Inspection and soon watsonx. We will also encourage you to file your first patent with us and to publish your work at a top AI conference.
This opportunity is particularly designed for students from ETH Zurich, as our IBM Research Laboratory is located just 40 minutes away from there. We also welcome applications from students at EPFL, USI and other prestigious universities, including those abroad, provided the students are able to finance their stay in Zurich for the duration of the project, which is generally 6 months.
Minimum qualifications
- Bachelor’s degree in computer science, machine learning or a related technical field, or equivalent practical experience
- Excellent coding skills
- Proficiency in working in Unix/Linux environments
- Excellent communication and presentation skills in English
- Team player, self-motivated, able to solve problems autonomously
Preferred qualifications
- Experience with one or more of the following: computer vision, natural language processing, algorithms and data structures, test automation, distributed computing, CI/CD
- Practical experience with PyTorch
- Advanced programming experience, for example in C/C++
- Independent worker, able to operate effectively and flexibly in a fast-paced, constantly evolving team environment
Diversity
IBM is committed to diversity in the workplace. With us you will find an open, multicultural environment. Excellent flexible working arrangements enable all genders to strike the desired balance between their professional development and their personal lives.
How to apply
If you are interested in this exciting position, please submit your most recent curriculum vitae. We also encourage candidates to share a 3-minute video in which they introduce themselves and highlight their motivation and expertise. The video is not mandatory.
Interview process
After an initial screening based on the submitted documents, shortlisted candidates will be contacted for a first technical discussion on their experience, background, and motivation, followed by a coding interview and a project matching discussion.