Data Engineer at DXC Technology Philippines
2020–2022
Built production ETL and ELT pipelines for finance, gaming, and digital operations clients using Python, SQL, AWS Glue, Airflow, Kafka, Redshift, and S3, combining transactional, event, customer, and operational datasets into curated models used by reporting, segmentation, and risk-monitoring teams.
Developed API and file-based ingestion jobs that processed JSON payloads, relational extracts, event logs, and third-party data feeds, adding validation rules, retry handling, and audit columns so downstream users could check data freshness, source ownership, and load history.
Tuned Spark and SQL workloads (joins, partitioning, indexes, materialized views, and window-function queries) across Redshift, SQL Server, PostgreSQL, and MySQL, reducing recurring reporting bottlenecks and making large operational datasets easier to query during peak business hours.
Built dbt transformations and Kimball-style dimensional models for customer activity, payments, campaigns, and service operations, organizing staging, intermediate, and mart layers so analysts could trace business metrics from raw sources to dashboard-ready tables without relying on ad-hoc SQL copies.
Supported streaming ingestion with Kafka and Python consumers for high-volume events, applying deduplication, late-arrival handling, and sequence checks before loading curated datasets into Redshift and S3, which helped stabilize analytics feeds used by customer engagement and operational teams.
Implemented data quality checks with SQL validations, Airflow task monitoring, and exception reports to flag missing files, row-count mismatches, broken joins, and unexpected value changes, reducing manual investigation time and making recurring pipeline failures easier for support teams to diagnose.
Created Looker, Tableau, and Redash dashboards on top of curated warehouse tables, turning complex SQL outputs into usable views for product, operations, and finance teams while keeping metric definitions documented and consistent across recurring business reviews.
Automated cloud resource provisioning and deployment workflows with Terraform, Docker, Jenkins, and Git-based release processes, making data jobs easier to promote across environments and reducing configuration drift between development, staging, and production workloads.
Worked with Salesforce, SAP, HubSpot, and internal application data to build customer and campaign analytics models, combining behavioral and transactional signals that supported segmentation, retention analysis, and more timely marketing performance reporting.
Documented pipeline ownership, data contracts, transformation logic, incident notes, and recovery steps in shared runbooks, improving handover quality across remote teams and helping analysts understand which datasets were reliable for monthly reporting, operational reviews, and ad-hoc analysis.