@michaelhariesnamucoibea - Senior Data Engineer

Data Engineer

Available for hire

Years of experience

8+ years

Experience level

Senior

Available for

Full-time, Part-time, Contract

Available from

07 Aug 2026

I am a Senior Data Engineer with 8 years of experience building production data pipelines, warehouse models, and analytics-ready datasets across finance, fintech, healthcare, insurance, and logistics environments. I work mainly with SQL, Python, Airflow, dbt, Snowflake, Amazon Redshift, Spark, and cloud data platforms.

What sets me apart is that I focus on more than moving data from one place to another. I care about making data reliable, well-controlled, easy to maintain, and useful for real business decisions. I have strong experience improving data quality, adding reconciliation checks, documenting data lineage, and supporting reporting platforms that analysts and business teams can trust.

I am looking for a Senior Data Engineer role where I can work on production pipelines, analytics platforms, and regulated data environments, while collaborating closely with analysts, engineers, and business stakeholders. I bring a calm, practical, and accountable working style, and I enjoy solving data problems in a way that creates long-term value for the business.

Languages

English

Employment History

Senior Data Engineer at Full Scale 2022 - 2026

Designed financial-data lakehouse pipelines for fintech and insurance clients using Python, PySpark, Spark Structured Streaming, Kafka, Airflow, dbt, AWS S3, Snowflake, and Redshift, normalizing high-volume transaction, pricing, behavioral, and reference datasets into canonical models for analytics, risk reporting, and downstream data products.

Built event-time ingestion workflows with Kafka and Spark to handle late-arriving records, duplicate messages, schema drift, and missing partitions across streaming and batch sources, improving daily data freshness from multi-hour delays to near-real-time availability for key reporting and research datasets.

Optimized dbt and Airflow pipelines with incremental models, snapshots, Jinja macros, XCom-driven dependency checks, sensor tasks, and partition-aware SQL patterns, reducing processing time by 30% while keeping critical datasets available before analyst and stakeholder reporting windows.

Designed S3 and Delta Lake storage layouts using practical lakehouse patterns similar to Iceberg table design, including partition pruning, compaction, schema evolution, retention rules, and snapshot-style recovery, improving large-table scan performance and reducing avoidable storage growth across historical datasets.

Created validation and reconciliation workflows with Great Expectations, dbt tests, Airflow checks, and SQL-based anomaly detection to compare source counts, landing-zone files, curated tables, and BI outputs, reducing recurring data quality incidents and giving analysts clearer confidence in production datasets.

Developed shared Python utilities for data loading, schema validation, metadata checks, and reusable DataFrame transformations across Pandas, PySpark, and Polars-style workflows, helping analytics and data science users consume normalized datasets without repeatedly rewriting ingestion or cleanup logic.

Modeled slowly changing reference data, customer activity history, revenue events, and operational metrics in Snowflake, Redshift, and BigQuery using dimensional models, point-in-time joins, and optimized window functions so analysts could reproduce historical views without breaking backtests or trend analysis.

Built automated test coverage for pipeline changes using dbt tests, Great Expectations suites, SQL regression checks, and Airflow validation tasks, then wired the checks into Jenkins CI/CD so schema changes, null-rate spikes, and late-arriving data defects were caught before production release.

Partnered with data science, backend, product, and business stakeholders to translate vague reporting and modeling needs into stable datasets, access patterns, and runbooks, improving trust in self-service analytics and contributing to client retention and data-platform revenue expansion of 11%.

Data Engineer at DXC Technology Philippines 2020 - 2022

Built production ETL and ELT pipelines for finance, gaming, and digital operations clients using Python, SQL, AWS Glue, Airflow, Kafka, Redshift, and S3, combining transactional, event, customer, and operational datasets into curated models used by reporting, segmentation, and risk-monitoring teams.

Developed API and file-based ingestion jobs that processed JSON payloads, relational extracts, event logs, and third-party data feeds, adding validation rules, retry handling, and audit columns so downstream users could understand data freshness, source ownership, and load history more easily.

Improved Spark and SQL workloads by tuning joins, partitions, indexes, materialized views, and window-function queries across Redshift, SQL Server, PostgreSQL, and MySQL, reducing repeated reporting bottlenecks and making large operational datasets easier to query during peak business hours.

Built dbt transformations and Kimball-style dimensional models for customer activity, payments, campaigns, and service operations, organizing staging, intermediate, and mart layers so analysts could trace business metrics from raw sources to dashboard-ready tables without relying on ad-hoc SQL copies.

Supported streaming ingestion with Kafka and Python consumers for high-volume events, applying deduplication, late-arrival handling, and sequence checks before loading curated datasets into Redshift and S3, which helped stabilize analytics feeds used by customer engagement and operational teams.

Implemented data quality checks with SQL validations, Airflow task monitoring, and exception reports to flag missing files, row-count mismatches, broken joins, and unexpected value changes, reducing manual investigation time and making recurring pipeline failures easier for support teams to diagnose.

Created Looker, Tableau, and Redash dashboards on top of curated warehouse tables, turning complex SQL outputs into usable views for product, operations, and finance teams while keeping metric definitions documented and consistent across recurring business reviews.

Automated cloud resource provisioning and deployment workflows with Terraform, Docker, Jenkins, and Git-based release processes, making data jobs easier to promote across environments and reducing configuration drift between development, staging, and production workloads.

Worked with Salesforce, SAP, HubSpot, and internal application data to build customer and campaign analytics models, combining behavioral and transactional signals that supported segmentation, retention analysis, and more timely marketing performance reporting.

Documented pipeline ownership, data contracts, transformation logic, incident notes, and recovery steps in shared runbooks, improving handover quality across remote teams and helping analysts understand which datasets were reliable for monthly reporting, operational reviews, and ad-hoc analysis.

Data Analyst at Lalamove 2018 - 2020

Analyzed logistics, delivery, customer, driver, and payment datasets using SQL, Python, PostgreSQL, MySQL, and SQL Server, cleaning inconsistent fields and preparing analytics-ready tables that supported operations reporting, route performance analysis, and recurring business reviews.

Built ETL workflows with SSIS, Pentaho Kettle, Python scripts, and scheduled SQL jobs to consolidate CSV files, relational tables, and operational extracts into reporting databases, reducing repetitive manual preparation work for weekly logistics and customer-service reports.

Optimized reporting queries by improving SQL joins, indexes, filters, date logic, and aggregation patterns across OLTP and OLAP workloads, helping dashboards load more consistently and giving stakeholders faster access to delivery performance and customer activity metrics.

Created exploratory analyses and dashboard datasets for delivery volume, cancellation trends, driver availability, customer behavior, and regional performance, translating raw operational data into clear summaries that non-technical managers could use for planning and issue follow-up.

Supported early Kafka, MongoDB, Cassandra, S3, RDS, and DynamoDB data workflows by validating extracts, checking record counts, reviewing sample payloads, and helping senior engineers confirm that event and NoSQL datasets matched expected reporting definitions.

Built reusable Python and SQL scripts for data cleaning, deduplication, timestamp standardization, and basic anomaly checks, improving consistency across recurring analysis and reducing confusion caused by mismatched time zones, duplicate rows, and incomplete operational records.

Worked closely with operations, finance, and engineering teams to clarify metric definitions, document data assumptions, and explain analysis results in simple language, building the practical SQL, data modeling, and stakeholder communication foundation used in later data engineering roles.

Python Developer Intern at Lalamove 2018 - 2018

Cleaned and prepared customer, marketing, and operations datasets using Python, Pandas, NumPy, and SQL, standardizing inconsistent columns, handling missing values, and producing structured files that senior analysts could use for reporting and exploratory analysis.

Wrote basic SQL queries and Python transformation scripts to pull data from relational tables, join operational extracts, and convert raw CSV files into cleaner formats for dashboards, internal reporting, and small analytical experiments.

Assisted with early machine learning prototypes in scikit-learn by preparing features, splitting datasets, running simple classification models, and reviewing precision, recall, confusion matrices, and other evaluation outputs under guidance from senior team members.

Built small automation scripts for recurring reporting tasks, including file checks, data formatting, and scheduled extracts, giving the team a more repeatable process for handling weekly operational reports and reducing manual spreadsheet cleanup.

Supported documentation for data definitions, script usage, source files, and known data issues, helping the team keep track of assumptions and making it easier for other interns and analysts to understand existing reporting workflows.

Learned production data practices by supporting senior engineers with debugging, test data preparation, SQL validation, and basic pipeline monitoring, building a practical foundation in Python, SQL, data cleaning, and analytics workflow discipline.

Education

Bachelor of Computer Science at The Hong Kong University of Science and Technology 2014 - 2018
