From Brazil 10:45 AM (GMT-03:00)
$25/hr or $40,000/yr

Active over a week ago


Member since May 2026


Senior Data Engineer

Data Engineer
Available for hire
Years of experience
9+ years
Experience level
Senior
Available for
Full-time, Part-time, Contract

Senior Data Engineer with 9+ years of experience building ETL pipelines, cloud data platforms, and analytics-ready datasets across financial, operational, customer, and product data environments. Strong hands-on background with Python, PySpark, SQL, Azure Synapse Analytics, Azure Data Lake, Databricks, Airflow, dbt, Snowflake, and distributed data processing. Experienced in designing reliable data integration workflows, improving data quality, optimizing warehouse performance, and supporting business teams with trusted reporting datasets. Known for clear communication, practical problem-solving, and steady delivery in remote, client-facing environments where data systems need to be maintainable, well documented, and dependable.


Employment History

Senior Data Engineer at FullStack 2023 - 2026
Designed and maintained ETL and ELT data pipelines for fintech, insurance, and customer analytics clients, using Python, PySpark, SQL, Airflow, dbt, Databricks, Azure Synapse Analytics, Snowflake, and Redshift to support enterprise reporting, operational analytics, and data integration across cloud environments. Built PySpark transformation jobs and SQL models to process transactional, behavioral, CRM, and event-based datasets from Salesforce, HubSpot, SAP, partner APIs, and product systems, preparing clean and reusable datasets for analysts, data scientists, and business stakeholders. Worked on Azure-based data engineering workflows using Azure Data Lake, Azure Synapse Analytics, Azure SQL, and Databricks, organizing raw and curated layers so reporting teams could trace source data, transformation logic, and business definitions more easily. Optimized Spark, SQL, and dbt workloads by improving incremental loading logic, partitioning strategy, join patterns, and warehouse model design, reducing processing time for several recurring pipelines by roughly 32% while keeping daily reporting refreshes stable. Implemented data integration best practices across batch and near-real-time pipelines, including schema validation, source-to-target checks, freshness monitoring, late-arriving data handling, and clear documentation for ownership, dependencies, and recovery steps. Collaborated with client engineering and analytics teams to clarify vague data requirements, translate business questions into technical pipeline logic, and deliver maintainable datasets for revenue reporting, customer segmentation, product usage analysis, and operational dashboards. Used Git-based version control, Jenkins pipelines, and code review workflows to manage changes across Airflow DAGs, dbt models, PySpark scripts, and SQL transformations, keeping deployments controlled across development and production environments. 
Added automated test processes for data pipelines, including dbt tests, PySpark validation scripts, row-count comparisons, null checks, duplicate-key checks, and Airflow task-level alerts to catch data quality issues before they reached reporting users. Supported client-facing data platform improvements by making pipelines more reliable, documenting integration logic, and improving trusted reporting datasets used by product and revenue teams, contributing to stronger renewal visibility and business planning accuracy by 9%.
Data Engineer at Mismo 2020 - 2023
Developed data engineering solutions for gaming, sports betting, and digital product clients, using Python, SQL, Airflow, AWS Glue, Kafka, Redshift, dbt, and Databricks to support ETL workflows, customer analytics, and operational reporting across multiple business units. Rebuilt older ETL jobs that depended on manual exports and long-running SQL scripts, moving them into scheduled Airflow DAGs with Python transformations, clearer dependencies, and better failure handling for recurring reporting datasets. Created ingestion pipelines from CRM systems, payment records, campaign platforms, product events, and third-party APIs into AWS S3 and Amazon Redshift, supporting analytics workflows for roughly 450,000 active users across customer engagement and transaction reporting. Designed dimensional models for customer, transaction, campaign, and product usage data, using fact tables, dimension tables, incremental dbt models, materialized views, and SQL optimization techniques to make reporting datasets easier to query and maintain. Worked closely with analysts and business users to define KPI logic, event definitions, and customer lifecycle metrics, then converted those definitions into reusable SQL models and dashboard-ready tables for Looker, Tableau, and Redash. Improved data quality by adding validation checks for missing identifiers, duplicate events, late-arriving records, and mismatched totals between source applications and warehouse tables, reducing repeated reconciliation work across recurring reports. Supported cloud data infrastructure using AWS S3, Redshift, Kafka, Docker, Terraform, and Jenkins, helping package pipeline changes, provision resources, and move data engineering updates through controlled deployment workflows. Used Git and peer review practices to manage Python, SQL, and dbt changes, keeping transformation logic traceable and reducing the risk of undocumented updates in shared reporting and analytics pipelines. 
Integrated Salesforce, SAP ERP, HubSpot, product activity, and transactional data into Redshift models that supported segmentation, customer behavior analysis, and campaign reporting, improving campaign conversion tracking quality by about 15%. Documented pipeline schedules, source-to-target mappings, known data limitations, and troubleshooting steps so support teams and non-technical stakeholders could understand how datasets were produced and when data issues needed escalation.
Data Analyst at IBTI IT solutions 2017 - 2020
Supported data analysis and reporting workflows for telecommunications and operational datasets, using SQL, Python, SSIS, Pentaho Kettle, PostgreSQL, MySQL, and SQL Server to clean, transform, and prepare data for recurring business reports. Built ETL routines with SSIS Data Flow Tasks, Pentaho transformations, and Python scripts to consolidate customer, billing, support, and usage data from relational systems into structured reporting tables used by finance, operations, and service teams. Optimized SQL queries, joins, indexes, stored procedures, and reporting views across PostgreSQL, MySQL, and SQL Server, improving performance for frequently used analytics reports by about 28% without changing the underlying business logic. Created recurring reporting extracts and dashboard datasets for service volumes, billing exceptions, customer activity trends, and operational KPIs, helping business users review data issues without depending on repeated manual pulls from the data team. Used Python with Pandas and NumPy for exploratory data analysis, identifying missing values, inconsistent identifiers, duplicate records, and unusual patterns in source files before they were loaded into reporting workflows. Assisted senior engineers with SQL Server Agent jobs, Cron-based scripts, batch schedules, and early data storage processes, gaining practical experience with orchestration, refresh dependencies, and production reporting support. Prepared validation summaries and reconciliation checks between source systems and reporting outputs, documenting whether differences were caused by timing, transformation rules, missing records, or source-system data quality issues.
Python Developer Intern at IBTI IT solutions 2016 - 2017
Developed Python scripts to clean customer, marketing, and operational CSV files using Pandas and NumPy, standardizing inconsistent fields, removing duplicate records, and preparing raw data for reporting and analysis by senior team members. Wrote SQL queries and basic transformation scripts to join relational tables, extract reporting datasets, and organize raw information into cleaner structures for spreadsheets, dashboards, and internal business review. Built small automation utilities in Python to reduce repeated manual work around file validation, column mapping, data formatting, and simple exception reporting for internal users who depended on recurring data outputs. Assisted with early machine learning prototypes in scikit-learn by preparing training datasets, running basic classification models, and reviewing evaluation outputs such as confusion matrices, precision, and recall with senior engineers. Supported API and database integration tasks by parsing JSON responses, loading structured data into relational tables, and troubleshooting basic issues with missing fields, inconsistent formats, or failed scheduled scripts. Documented Python scripts, SQL logic, and data preparation steps clearly so other developers and analysts could reuse the work, creating a practical foundation for later ETL, data pipeline, and cloud data engineering responsibilities.

Education

Bachelor's Degree in Software Engineering at Federal University of São Carlos 2013 - 2017