Athenahealth

Senior MLOps Engineer - Analytics & AI

Job type

Full-time

Posted

Yesterday

Salary

$145k - $247k per year

Job description

Grow your career internally or refer a friend to athenahealth!

Role summary: Help power the infrastructure behind athenahealth’s AI platform. The Senior MLOps Engineer is a Senior Associate-level position based in Boston, MA, in a hybrid work model, responsible for engineering, maintaining, and advancing the centralized AI platforms that support model training, serving, monitoring, and operational reliability. The role partners with engineering, data science, and platform teams to improve platform performance, security, and availability while enabling scalable AI development and deployment, and reports to a Senior Manager.

Team summary: Core AI is at the center of athenahealth’s company-wide initiative to unlock the value of healthcare information through data science, machine learning, and generative AI. Working with large-scale healthcare data, the team develops platforms and capabilities that support innovative AI use cases across the business and help improve the healthcare experience for providers and patients.

The MLOps team designs, develops, deploys, monitors, and supports the cloud-based platform that powers frontier, foundational, and customized models at athenahealth. This team focuses on operational reliability, platform performance, secure delivery, and scalable engineering practices. The Senior MLOps Engineer will help build and improve the tools, infrastructure, and workflows that make it easier for teams to train, deploy, observe, and manage AI services in production. This role works closely with AI engineers, data scientists, and infrastructure partners to ensure the platform is resilient, efficient, and ready to support evolving business and technical needs.

Essential Job Responsibilities

  • Engineer and maintain centralized AI and MLOps platforms that support model training, deployment, and serving at scale.
  • Ensure the availability, performance, and reliability of cloud-based ML training and inference platforms through monitoring, alerting, and proactive issue identification.
  • Build automation and platform tools that improve deployment speed, service stability, and engineering confidence.
  • Deploy and maintain containerized services and ML workloads in Kubernetes-based environments.
  • Integrate security practices into software and infrastructure delivery workflows to support secure, reliable platform operations.
  • Collaborate across engineering, infrastructure, and AI teams to support high availability, disaster recovery, and strong customer outcomes.
  • Evaluate and integrate emerging AI tools and technologies from providers such as OpenAI, Anthropic, Google, Microsoft, and AWS where they align with platform needs.
  • Develop microservices and platform components in public cloud environments such as AWS, Azure, or GCP.
  • Use AI tools and platform capabilities in day-to-day engineering work to improve troubleshooting, automation, deployment workflows, and operational efficiency, while continuing to learn and apply new tools as they become relevant to the role.

Additional Job Responsibilities

  • Support incident response, root cause analysis, and follow-up remediation efforts.
  • Document platform architecture, operational procedures, and engineering standards.
  • Contribute to platform roadmap discussions and technical planning activities.
  • Partner with data scientists and AI engineers to improve model training and deployment workflows.
  • Assist with evaluation of new frameworks, tooling, and observability capabilities.
  • Improve system visibility through dashboards, metrics, and log aggregation practices.
  • Participate in design reviews and cross-team technical discussions.
  • Contribute to continuous improvement of MLOps, DevOps, and SRE practices.

Expected Education & Experience

  • Bachelor’s degree in Computer Science or an equivalent field, or equivalent professional experience.
  • 4 to 6 years of experience in Software Engineering, Data Engineering, MLOps, DevOps, SRE, or a related technical area.
  • Strong experience with Kubernetes, including designing, deploying, and maintaining enterprise-class services that host and serve ML models.
  • Proficiency in Python and experience developing microservices in public cloud environments such as AWS, Azure, or GCP.
  • Experience with ML platform and infrastructure technologies such as Terraform, Spark, service mesh architectures including Istio, and cloud security practices.
  • Experience deploying and maintaining Linux-based, scalable, fault-tolerant software platforms.
  • Experience with Azure AI Foundry or Amazon Bedrock, and familiarity with services such as LiteLLM, LangSmith, Arize, or Braintrust.
  • Experience with monitoring and observability tools such as Grafana, Prometheus, and CloudWatch.
  • Experience with databases and data platforms such as Snowflake, Postgres, MySQL, Redis, and DynamoDB.
  • Familiarity with CI/CD, configuration management, and orchestration tools such as Jenkins, Puppet, Bottlerocket, or Chef.
  • Experience working with Data Scientists and AI Engineers, including supporting model training pipelines built with tools such as Kubeflow.

Expected Compensation: $145,000 - $247,000

The base salary range shown reflects the full range for this role from minimum to maximum. At athenahealth, base pay depends on multiple factors, including job-related experience, relevant knowledge and skills, how your qualifications compare to others in similar roles, and geographical market rates. Base pay is only one part of our competitive Total Rewards package - depending on role eligibility, we offer both short and long-term incentives by way of an annual discretionary bonus plan, variable compensation plan, and equity plans.
