Cloud Engineer, Data Platform
Date: 20 May 2025
Location: SG
Company: StarHub Ltd
Job Description
JOB PURPOSE
- The Data Platform Team is responsible for designing, implementing, and managing a modern data platform that embraces the principles of data mesh, empowering teams to create and manage their own data products. Our mission is to deliver high-quality, scalable data solutions that drive business value across the organization.
- As a key member of this team, you will be responsible for building scalable, stable, and secure data pipelines that support both batch and streaming workloads. Your work ensures reliable data delivery across domains and supports the development of reusable, self-serve data products.
- In this role, you will collaborate with business owners, engineers, and data stewards to implement ingestion frameworks and transformation jobs that align with the data-as-a-product vision. You will apply best practices in data engineering to enable efficient data integration across cloud and on-prem environments.
KEY RESPONSIBILITIES
- Design and develop scalable, secure, and efficient data ingestion pipelines for structured and unstructured data from internal and external systems across AWS and on-prem environments.
- Work closely with architects and business domain teams to translate data requirements into robust data pipelines and process workflows.
- Design, build, and maintain real-time and batch data pipelines using Kafka, Spark Streaming, AWS EMR, Glue, Lambda, and other AWS services to ingest and process high-frequency data from diverse internal and external sources.
- Implement data partitioning, compaction, and optimization techniques to improve data processing performance and reduce cloud storage costs.
- Assist in incident investigations, root cause analysis, and resolution of data pipeline failures or performance bottlenecks.
- Document data flow designs, ingestion standards, and transformation logic clearly for use by other engineers, data stewards, and auditors.
Qualifications
Required:
- Minimum 2 years of experience (or 5 years for a senior position) in Data Engineering, Software Engineering or related fields.
- Proven experience building and managing real-time and batch data pipelines on AWS using services such as EMR, Glue, Lambda, S3, and EC2.
- Strong knowledge of Python and Spark, with hands-on experience designing low-latency ETL/ELT pipelines.
- Experience handling large-scale datasets and optimizing cloud storage formats and query performance.
- Familiarity with infrastructure components such as IAM roles, Security Groups, and VPC networking to support secure data access and movement.
- Hands-on experience with Linux environments, shell scripting, and AWS CLI for managing data and computation resources.
- Strong communication and collaboration skills to work across data, engineering, and network teams.
- Ability to maintain clean, structured documentation of ingestion logic, transformation steps, and data flow dependencies.
Preferred:
- Certifications in cloud technology platforms (such as cloud architecture, container platforms, systems, and/or network virtualization).
- Knowledge of telecom networks, including mobile and fixed networks, will be an added advantage.
- Familiarity with data fabric and data mesh concepts, including their implementation and benefits in distributed data environments, is a bonus.