AECOM
Senior Director, AI Operations (AI/LLM Production Systems)
Job type
Full-time
Posted
21 hours ago
Job description
We’re defining how AI runs in production across the enterprise.
As AI adoption scales, the challenge shifts from building models to operating them reliably. This role owns how AI/LLM and agentic systems are run, supported, and governed, ensuring they are reliable, observable, cost-efficient, and continuously improving in real-world environments.
You will lead the development of the enterprise AI Operations practice, establishing the standards, operating model, and visibility required to support AI at scale. This includes defining how systems are monitored, how incidents are managed, how risks are controlled, and how performance is continuously improved.
Working closely with Engineering, AI Platform, Product, and Delivery teams, you will ensure all production AI systems meet clear operational standards and that leadership has consistent visibility into system health, performance, and risk.
This is a hands-on, senior leadership role with end-to-end accountability for how AI systems perform in production.
This position offers a hybrid work schedule that combines in-office presence with remote work, and is based in either Houston or Dallas, TX.
Key Responsibilities
- Define and scale the enterprise AI Operations practice, including operating model, standards, and governance
- Establish production readiness and operability standards across AI/LLM and agentic systems
- Own production reliability, including SLAs/SLOs, incident management, and support models
- Implement observability and monitoring for AI systems (latency, drift, behavior, failures, cost)
- Ensure clear ownership, escalation paths, and accountability across production AI systems
- Build controls for agent behavior, model usage, and operational risk
- Drive performance, reliability, and cost optimization across AI workloads
- Lead operational reviews and reporting, providing visibility into system health, risks, and trends
- Identify systemic issues and drive continuous improvement across AI systems and processes
- Partner with Engineering, Product, and Platform teams to ensure production readiness and alignment
Minimum Qualifications
- Bachelor's degree plus extensive experience in SRE, MLOps, production operations, or platform engineering, including 6 years of leadership experience, or a demonstrated equivalent combination of experience and/or education
- Experience operating AI/ML/LLM systems in production (serving real users at scale) with clear ownership and accountability
- Background in SRE, MLOps, or distributed systems, with depth in reliability and operational excellence
- Strong understanding of AI production failure modes (e.g., drift, hallucinations, orchestration issues, cost inefficiencies)
- Experience building and scaling observability, monitoring, and telemetry systems (e.g., OpenTelemetry, Datadog, Prometheus, Grafana)
- Proven track record defining SLAs/SLOs, incident management, and operational frameworks for complex systems
- Experience leading cross-functional efforts across engineering, platform, and product teams
- Ability to operate at both strategic and hands-on levels, setting direction while driving execution
Preferred Qualifications
- Experience with LLM platforms or frameworks (e.g., Azure AI, AWS Bedrock, LangChain)
- Experience with agentic systems, RAG pipelines, or orchestration frameworks
- Background in ITIL or service management, applied to modern distributed systems
- Familiarity with Responsible AI and governance frameworks
Additional Information
- Relocation assistance is not available for this position
- Sponsorship for US work authorization is not available for this position, now or in the future
About AECOM
AECOM is proud to offer comprehensive benefits to meet the diverse needs of our employees. Depending on your employment status, AECOM benefits may include medical, dental, vision, life, AD&D, and disability benefits, paid time off, leaves of absence, voluntary benefits, perks, flexible work options, well-being resources, an employee assistance program, business travel insurance, service recognition awards, a retirement savings plan, and an employee stock purchase plan.
AECOM is the global infrastructure leader, committed to delivering a better world. As a trusted professional services firm powered by deep technical abilities, we solve our clients’ complex challenges in water, environment, energy, transportation and buildings. Our teams partner with public- and private-sector clients to create innovative, sustainable and resilient solutions throughout the project lifecycle – from advisory, planning, design and engineering to program and construction management. AECOM is a Fortune 500 firm that had revenue of $16.1 billion in fiscal year 2025. Learn more at aecom.com.
What makes AECOM a great place to work
You will be part of a global team that champions your growth and career ambitions. Work on groundbreaking projects - both in your local community and on a global scale - that are transforming our industry and shaping the future. With cutting-edge technology and a network of experts, you’ll have the resources to make a real impact. Our award-winning training and development programs are designed to expand your technical expertise and leadership skills, helping you build the career you’ve always envisioned. Here, you’ll find a welcoming workplace built on respect, collaboration and community—where you have the freedom to grow in a world of opportunity.
As an Equal Opportunity Employer, we believe in your potential and are here to help you achieve it. All your information will be kept confidential according to EEO guidelines.


