Background
- Architected an end-to-end ML delivery platform: designed SQL-based data extraction pipelines with automated validation checks, curated model-ready datasets, and established reproducible build artefacts using DVC and Docker—reducing data processing cycle time by 30% within three months.
- Engineered and deployed a production-grade RAG service on AWS (ECS / ALB / ECR): implemented hybrid retrieval with dense reranking, citation/audit trails, and an evaluation harness with regression-style quality gates—achieving sub-200 ms P95 inference latency at launch.
- Implemented a full CI/CD pipeline (GitHub Actions) with automated unit, integration, and acceptance tests; enforced accuracy/latency trade-off thresholds as mandatory quality gates before every production promotion.
- Established release discipline through versioned runbooks, blue/green rollback patterns, Kubernetes-compatible health-check probes, and operational telemetry (structured logs, Prometheus metrics)—achieving zero unplanned downtime across all releases.
- Exposed versioned inference endpoints via FastAPI, containerised all runtimes with Docker, and adopted Terraform for repeatable infrastructure provisioning across environments.
- Designed and operationalised end-to-end predictive analytics systems for LV/MV grid operations: architected SQL + Python pipelines ingesting 5 TB+ of operational data, engineered domain-specific features, and delivered batch inference workflows sustaining 99.9% operational uptime.
- Built and standardised a data quality framework (schema validation, referential-integrity checks, automated refresh logic) that unified KPI definitions across teams and reduced decision-making latency by 40% through monitoring-friendly dashboards.
- Orchestrated the fault-detection modelling lifecycle from raw sensor data through feature engineering, model selection (AUC/F1 evaluation), and deployment of batch inference jobs integrated with operational monitoring systems.
- Scaled and led a technical unit of 10+ engineers within six months; established execution cadences, incident-response runbooks, and delivery-predictability frameworks that reduced mean time to recovery (MTTR) for grid incidents.
- Received the internal Enel Innovation Award (2017) for designing an AR-enabled smart helmet prototype to monitor subcontractor work quality and enhance operational safety compliance.