Lead Data Platform Engineer
Petaling Jaya, MY
Job Description
Role Mission: To lead and scale the Data Platform Engineering and Operations function within StarHub’s Digital Experience Platform (DXP) Data organization. This role ensures the continuous reliability, scalability, and security of StarHub’s C360 cloud data platform built on AWS, Snowflake, SageMaker, and Datapipe. The incumbent drives operational excellence, automation, and engineering maturity across the platform, while prototyping and rolling out new platform capabilities that enable agility, innovation, and performance for data and AI workloads across the enterprise.
Accountabilities:
- Own end-to-end infrastructure and platform operations of the DXP Data Platform across AWS, Snowflake, and SageMaker environments (DEV, SIT, PROD).
- Lead the design, build, and automation of data platform engineering and DevOps practices, ensuring continuous improvement and zero-downtime operations.
- Lead the prototyping, implementation, and rollout of new platform capabilities and services across AWS, Snowflake, and SageMaker.
- Implement governance, security, and compliance standards & improvements for cloud infrastructure, data access, and network controls.
- Drive operational excellence through monitoring, alerting, cost optimization, and performance tuning.
- Manage a hybrid team of internal platform engineers and vendor-augmented resources supporting Day 2 operations and enhancements.
- Partner with Data Engineering, Architecture, Security, Infrastructure & Tooling teams to ensure aligned technical roadmaps, compliance readiness, and audit traceability.
Responsibilities:
- Platform Engineering & Operations:
  - Own AWS infrastructure for the DXP Data Platform (multi-AZ, multi-VPC setup).
  - Administer Snowflake environments, including user roles, RBAC, performance optimization, warehouse lifecycle, and cost controls.
  - Manage SageMaker environments (Studio, Canvas, Notebooks) to enable multi-domain ML use cases.
  - Operate Datapipe (EKS + Airflow) for ingestion orchestration, ensuring high availability and version-controlled configurations via IaC (CloudFormation/CDK).
  - Maintain and enhance logging, monitoring, observability, and DevOps automation via tools such as CloudWatch, Splunk, PagerDuty, Slack, ServiceNow, and Snowflake observability features.
- Platform Prototyping & Enhancement:
  - Design, prototype, and implement new platform features across AWS, Snowflake, and SageMaker to support innovation in data processing, analytics, and ML operations.
  - Lead rollout and production hardening of new platform components (e.g., new Snowflake Cortex AI features, SageMaker pipelines, AWS-native services).
  - Evaluate and integrate new services or capabilities aligned to StarHub’s data platform roadmap.
  - Develop technical design standards and documentation for new features and automation processes.
- Automation & Continuous Improvement:
  - Implement Day 2 platform enhancements including auto-scaling, self-healing workflows, and CI/CD automation.
  - Enhance EKS cluster performance, pipeline automation, and integration efficiency across cloud and on-prem data sources.
  - Drive infrastructure-as-code (IaC) adoption for all environments and standardize rollout strategies (blue-green/canary).
- Governance, Security & Compliance:
  - Enforce enterprise security policies for IAM, VPC isolation, PrivateLink, and encryption (KMS, Secrets Manager).
  - Coordinate vulnerability remediation (EKS upgrades, CVE patching, EC2 AMI refresh, Docker image hardening).
  - Ensure infrastructure audit readiness in partnership with Information Security (ITSec) and Compliance teams.
- Operations & Cost Management:
  - Monitor and optimize Snowflake warehouse utilization, compute spend, and S3 data lifecycle management.
  - Maintain tagging, dashboards, and cost visibility frameworks across AWS and Snowflake.
  - Implement cost governance guardrails and usage quotas across platform components.
- Team & Vendor Leadership:
  - Lead a blended team of StarHub engineers and partner vendors responsible for platform sustainment and evolution.
  - Manage augmented vendor teams, ensuring consistent delivery quality and knowledge transfer to internal engineers.
  - Build in-house capability in platform engineering, IaC, and automation disciplines.
Team Scope / Stakeholders:
- Scope: DXP Data Platform infrastructure (AWS, Snowflake, SageMaker, Datapipe) supporting C360, AI/ML, and analytics workloads enterprise-wide.
- Decision Rights: Platform design approval, selection of new AWS/Snowflake/SageMaker capabilities, DevOps tooling and IaC framework choices, and vendor performance oversight.
- Stakeholders: Platform Engineering, Data Engineering, Data Science, Architecture & Governance, Information Security, and Infrastructure teams.
- Resources: Hybrid team of ~2-4 engineers (StarHub and partner resources) managing infrastructure, platform automation, enhancements, and operations.
Minimum Profile / Track Record:
- 8-10 years of experience in cloud and platform engineering, with extensive experience on AWS-based data platforms.
- Proven leadership of cross-functional engineering teams managing production-grade, multi-environment platforms.
- Hands-on expertise in:
  - AWS Services: VPC, EC2, S3, RDS, Lambda, KMS, CloudFormation/CDK, Transfer Family, CloudWatch, CloudTrail.
  - Snowflake: administration, RBAC, warehouse optimization, DevOps automation, Cortex AI, and Streamlit integration.
  - EKS / Airflow / Airbyte (Datapipe): container orchestration, CI/CD pipelines, and deployment automation.
  - SageMaker: multi-domain setup, pipeline management, Studio/Canvas lifecycle, and MLOps enablement.
  - Monitoring & Observability: CloudWatch, Splunk, Snowflake Account Usage, cost dashboards, PagerDuty, Slack, ServiceNow.
- Demonstrated success in prototyping, implementing, and scaling new cloud and data platform features into production.
- Experience managing Day 2 operations, incident response, and SRE-driven performance stabilization.
- Familiarity with machine learning integration and model lifecycle management.
- Experience enforcing ITSec and compliance standards (IAM, KMS, PDPA/GDPR).
- Proven success in transitioning platform operations from vendor-managed to in-house ownership.