Senior Python Developer — Data Engineering & ML/AI Pipelines
Company
Secuvy
Role
Senior Python Developer — Data Engineering & ML/AI Pipelines
Location
California, US
Job type
Full-time
Found on Mokaru
2 months ago
Job description
SecuvyAI · Remote (US) · Full-Time
> ⚠️ No recruiting agencies. Direct applicants only. Open to US Citizens and Green Card holders only. No visa sponsorship available.
About SecuvyAI
SecuvyAI is a cutting-edge Data Privacy and Security Intelligence platform trusted by enterprises to discover, classify, and govern sensitive data at scale. We combine AI-driven automation with deep compliance expertise to help organizations stay ahead of privacy regulations — GDPR, CCPA, HIPAA, and beyond. Our platform ingests and classifies petabytes of structured and unstructured data across cloud, on-prem, and hybrid environments, powered by a sophisticated ML/AI engine at its core.
The Opportunity
We're looking for a highly experienced Senior Python Developer with deep roots in data engineering and a strong track record building and operating ML/AI pipelines in production. You'll sit at the intersection of data infrastructure and applied machine learning — designing the systems that power SecuvyAI's intelligent data classification, PII detection, and privacy risk scoring capabilities.
This is a high-impact, hands-on engineering role. You'll work closely with data scientists, platform engineers, and product teams to take models from experimentation to production at scale — and keep them running reliably.
What You'll Do
Data Engineering
- Design, build, and maintain large-scale data ingestion, transformation, and processing pipelines using Python-native and distributed frameworks
- Architect reliable, scalable ETL/ELT workflows that handle structured, semi-structured, and unstructured data across cloud and on-prem sources
- Optimize pipeline performance for throughput, latency, and cost at petabyte scale
- Build and maintain data quality frameworks — validation, lineage tracking, anomaly detection, and alerting
- Partner with platform engineers to ensure pipelines are observable, testable, and production-grade
ML/AI Pipeline Development
- Build and operationalize end-to-end ML pipelines — data ingestion, feature engineering, model training, evaluation, deployment, and monitoring
- Develop and maintain feature stores, training data pipelines, and model serving infrastructure
- Collaborate with data scientists to productionize models for PII classification, sensitive data detection, entity recognition, and risk scoring
- Implement MLOps best practices — experiment tracking (MLflow/W&B), model versioning, A/B testing, and automated retraining pipelines
- Integrate LLM-based and NLP-based components into the SecuvyAI data intelligence engine
- Monitor deployed models for drift, degradation, and data quality issues in production
Collaboration & Code Quality
- Write clean, well-tested, production-grade Python with a focus on maintainability and performance
- Participate actively in code reviews and contribute to engineering standards for data and ML code
- Work cross-functionally with Data Science, Platform Engineering, and Product to align on data contracts and pipeline SLAs
- Contribute to technical documentation, runbooks, and internal knowledge sharing
Required Qualifications
- 8–10 years of professional software engineering experience with Python as your primary language
- Deep, hands-on data engineering expertise (ETL/ELT, pipeline orchestration, data modeling, and distributed data processing at scale) is a must
- Proven experience building and maintaining ML/AI pipelines in production — not just experimentation, but reliable, monitored, production deployments
- Strong experience with Apache Spark (PySpark) for large-scale batch and streaming data processing
- Hands-on experience with workflow orchestration tools — Apache Airflow, Prefect, or Dagster
- Solid understanding of stream processing (Kafka, Kinesis, or Flink) and real-time data architectures
- Experience with ML frameworks and tooling: scikit-learn, PyTorch, TensorFlow, Hugging Face Transformers, or equivalent
- Familiarity with MLOps platforms and practices — MLflow, Weights & Biases, Kubeflow, or SageMaker Pipelines
- Proficiency with cloud data platforms: AWS (Glue, EMR, S3, SageMaker), GCP (Dataflow, BigQuery, Vertex AI), or Azure (ADF, Synapse, Azure ML)
- Strong command of SQL and experience with both relational (PostgreSQL, MySQL) and analytical (Redshift, BigQuery, Snowflake) databases
- Experience with data quality and observability tooling (Great Expectations, Monte Carlo, or similar)
- Comfortable working in a fully remote, async-first engineering environment
- US Citizen or Permanent Resident (Green Card holder) required — no visa sponsorship
Preferred Qualifications
- Experience in data privacy, security, or compliance domains — PII detection, data classification, or sensitive data governance
- Hands-on experience with NLP pipelines — named entity recognition (NER), text classification, or document understanding at scale
- Experience working with LLMs in production — prompt engineering, fine-tuning, RAG architectures, or LLM-integrated data pipelines
- Familiarity with feature stores (Feast, Tecton, or Hopsworks)
- Experience with dbt for data transformation and data modeling in analytical pipelines
- Knowledge of data catalog or metadata management tools (Apache Atlas, DataHub, Collibra, or similar)
- Prior experience mentoring junior data engineers or working with distributed remote teams
- Contributions to open-source data or ML projects, or public technical writing
Compensation & Benefits
- Base salary: $150,000 – $175,000 depending on experience
- Meaningful equity in a high-growth AI startup
- Fully remote with flexible hours — async-first culture
- Comprehensive health, dental, and vision insurance
No recruiting agencies. We will not respond to agency outreach.


