Senior Data Engineer at FullStack, 2023 - 2026
Designed and maintained ETL and ELT data pipelines for fintech, insurance, and customer analytics clients, using Python, PySpark, SQL, Airflow, dbt, Databricks, Azure Synapse Analytics, Snowflake, and Redshift to support enterprise reporting, operational analytics, and data integration across cloud environments.
Built PySpark transformation jobs and SQL models to process transactional, behavioral, CRM, and event-based datasets from Salesforce, HubSpot, SAP, partner APIs, and product systems, preparing clean and reusable datasets for analysts, data scientists, and business stakeholders.
Worked on Azure-based data engineering workflows using Azure Data Lake, Azure Synapse Analytics, Azure SQL, and Databricks, organizing raw and curated layers so reporting teams could trace source data, transformation logic, and business definitions more easily.
Optimized Spark, SQL, and dbt workloads by improving incremental loading logic, partitioning strategy, join patterns, and warehouse model design, reducing processing time for several recurring pipelines by roughly 32% while keeping daily reporting refreshes stable.
Implemented data integration best practices across batch and near-real-time pipelines, including schema validation, source-to-target checks, freshness monitoring, late-arriving data handling, and clear documentation for ownership, dependencies, and recovery steps.
Collaborated with client engineering and analytics teams to clarify vague data requirements, translate business questions into technical pipeline logic, and deliver maintainable datasets for revenue reporting, customer segmentation, product usage analysis, and operational dashboards.
Used Git-based version control, Jenkins pipelines, and code review workflows to manage changes across Airflow DAGs, dbt models, PySpark scripts, and SQL transformations, keeping deployments controlled across development and production environments.
Added automated tests for data pipelines, including dbt tests, PySpark validation scripts, row-count comparisons, null checks, duplicate-key checks, and Airflow task-level alerts, to catch data quality issues before they reached reporting users.
Supported client-facing data platform improvements by making pipelines more reliable, documenting integration logic, and improving trusted reporting datasets used by product and revenue teams, contributing to stronger renewal visibility and a 9% improvement in business planning accuracy.