SecuvyAI · Remote (US) · Full-Time

> ⚠️ No recruiting agencies. Direct applicants only. Open to US Citizens and Green Card holders only. No visa sponsorship available.

About SecuvyAI

SecuvyAI is a cutting-edge Data Privacy and Security Intelligence platform trusted by enterprises to discover, classify, and govern sensitive data at scale. We combine AI-driven automation with deep compliance expertise to help organizations stay ahead of privacy regulations — GDPR, CCPA, HIPAA, and beyond. Our platform ingests and classifies petabytes of structured and unstructured data across cloud, on-prem, and hybrid environments, powered by a sophisticated ML/AI engine at its core.

The Opportunity

We're looking for a highly experienced Senior Python Developer with deep roots in data engineering and a strong track record building and operating ML/AI pipelines in production. You'll sit at the intersection of data infrastructure and applied machine learning — designing the systems that power SecuvyAI's intelligent data classification, PII detection, and privacy risk scoring capabilities.

This is a high-impact, hands-on engineering role. You'll work closely with data scientists, platform engineers, and product teams to take models from experimentation to production at scale — and keep them running reliably.

What You'll Do

Data Engineering

Design, build, and maintain large-scale data ingestion, transformation, and processing pipelines using Python-native and distributed frameworks
Architect reliable, scalable ETL/ELT workflows that handle structured, semi-structured, and unstructured data across cloud and on-prem sources
Optimize pipeline performance for throughput, latency, and cost at petabyte scale
Build and maintain data quality frameworks — validation, lineage tracking, anomaly detection, and alerting
Partner with platform engineers to ensure pipelines are observable, testable, and production-grade

ML/AI Pipeline Development

Build and operationalize end-to-end ML pipelines — data ingestion, feature engineering, model training, evaluation, deployment, and monitoring
Develop and maintain feature stores, training data pipelines, and model serving infrastructure
Collaborate with data scientists to productionize models for PII classification, sensitive data detection, entity recognition, and risk scoring
Implement MLOps best practices — experiment tracking (MLflow/W&B), model versioning, A/B testing, and automated retraining pipelines
Integrate LLM-based and NLP-based components into the SecuvyAI data intelligence engine
Monitor deployed models for drift, degradation, and data quality issues in production

Collaboration & Code Quality

Write clean, well-tested, production-grade Python with a focus on maintainability and performance
Participate actively in code reviews and contribute to engineering standards for data and ML code
Work cross-functionally with Data Science, Platform Engineering, and Product to align on data contracts and pipeline SLAs
Contribute to technical documentation, runbooks, and internal knowledge sharing

Required Qualifications

8–10 years of professional software engineering experience with Python as your primary language
Deep, hands-on data engineering expertise (a must): ETL/ELT, pipeline orchestration, data modeling, and distributed data processing at scale
Proven experience building and maintaining ML/AI pipelines in production — not just experimentation, but reliable, monitored, production deployments
Strong experience with Apache Spark (PySpark) for large-scale batch and streaming data processing
Hands-on experience with workflow orchestration tools — Apache Airflow, Prefect, or Dagster
Solid understanding of stream processing (Kafka, Kinesis, or Flink) and real-time data architectures
Experience with ML frameworks and tooling: scikit-learn, PyTorch, TensorFlow, Hugging Face Transformers, or equivalent
Familiarity with MLOps platforms and practices — MLflow, Weights & Biases, Kubeflow, or SageMaker Pipelines
Proficiency with cloud data platforms: AWS (Glue, EMR, S3, SageMaker), GCP (Dataflow, BigQuery, Vertex AI), or Azure (ADF, Synapse, Azure ML)
Strong command of SQL and experience with both relational (PostgreSQL, MySQL) and analytical (Redshift, BigQuery, Snowflake) databases
Experience with data quality and observability tooling (Great Expectations, Monte Carlo, or similar)
Comfortable working in a fully remote, async-first engineering environment
US Citizen or Permanent Resident (Green Card holder) required — no visa sponsorship

Preferred Qualifications

Experience in data privacy, security, or compliance domains — PII detection, data classification, or sensitive data governance
Hands-on experience with NLP pipelines — named entity recognition (NER), text classification, or document understanding at scale
Experience working with LLMs in production — prompt engineering, fine-tuning, RAG architectures, or LLM-integrated data pipelines
Familiarity with feature stores (Feast, Tecton, or Hopsworks)
Experience with dbt for data transformation and data modeling in analytical pipelines
Knowledge of data catalog or metadata management tools (Apache Atlas, DataHub, Collibra, or similar)
Prior experience mentoring junior data engineers or working with distributed remote teams
Contributions to open-source data or ML projects, or public technical writing

Compensation & Benefits

Base salary: $150,000–$175,000, depending on experience
Meaningful equity in a high-growth AI startup
Fully remote with flexible hours — async-first culture
Comprehensive health, dental, and vision insurance

No recruiting agencies. We will not respond to agency outreach.

Senior Python Developer — Data Engineering & ML/AI Pipelines
