Bosch Group
Generative AI Systems Engineer – Vision-Language Models
Job type: Full-time
Role Summary
We are seeking a Generative AI Systems Engineer to design, evaluate, and optimize Vision-Language Model (VLM) systems for real-world applications.
This role requires a combination of:
- Model understanding
- Experimental rigor
- Systems and production thinking
You will work on benchmarking, fine-tuning, and deploying multimodal models, with a strong emphasis on tradeoff analysis across accuracy, latency, and cost.
Key Responsibilities
Model Evaluation & Benchmarking
- Evaluate pretrained VLMs on domain-specific datasets
- Define and justify appropriate evaluation metrics
- Analyze model behavior, including systematic failure modes
Model Adaptation & Fine-Tuning
- Implement parameter-efficient fine-tuning techniques (e.g., LoRA, QLoRA)
- Optimize training under limited data and compute constraints
- Make data-centric and model-centric improvements with clear justification
Experimental Rigor
- Design controlled experiments that compare baseline and improved models
- Quantify improvements across:
  - accuracy
  - latency
  - cost
- Provide clear, defensible explanations for observed outcomes
System Design & Deployment
- Architect scalable inference pipelines for multimodal models
- Optimize for:
  - low latency
  - high throughput
  - cost efficiency
- Implement serving layers (API/service) with reproducible environments
Data Engineering
- Build pipelines to process and align:
  - images
  - textual queries
  - structured metadata
- Analyze dataset characteristics, including biases and distribution gaps
Required Qualifications
- B.E./B.Tech
- 5–7 years of industry experience in ML/AI systems
- Strong proficiency in Python and ML frameworks (e.g., PyTorch)
- Experience with LLMs, VLMs, or other multimodal models
- Understanding of model evaluation and experimentation practices
- Familiarity with ML system design (inference, scaling, optimization)
Preferred Qualifications
- Experience with Vision-Language Models (e.g., LLaVA, BLIP, Flamingo-style architectures)
- Hands-on experience with parameter-efficient fine-tuning methods
- Knowledge of model optimization techniques:
  - quantization
  - batching
  - caching (e.g., embedding reuse)
- Experience with Docker / containerized deployments
- Exposure to large-scale or real-world datasets


