Data Engineer II at Beyondsoft, 2020 - 2024
● Assisted in building ETL pipelines using Snowflake, Snowpark (Python), PySpark, dbt, Airbyte, and AWS
S3, following the Medallion Architecture (Bronze, Silver, Gold).
● Wrote Python scripts and Snowflake UDFs to standardize business logic and support transformations.
● Supported pipeline orchestration with Apache Airflow and Azure Data Factory (ADF), monitoring jobs,
troubleshooting failures, and coordinating ingestion from on-prem and cloud systems.
● Helped implement dbt tests, documentation, and automated ELT workflows to ensure high-quality,
analytics-ready datasets.
● Queried and validated data stored in AWS S3 using Athena for reporting and ad-hoc analysis.
● Designed and delivered scalable batch and streaming pipelines using Python, PySpark, and SQL,
integrating Apache Kafka and Kafka Connect for real-time ingestion from APIs, IoT, and transactional
sources.
● Implemented CDC in Snowflake using streams, tasks, Python UDFs, and stored procedures for incremental
loads.
● Executed large-scale transformations in PySpark, tuning performance with partitioning, caching, and
adaptive query execution.
● Designed data models and optimized queries in Snowflake, leveraging Snowpark for advanced
in-warehouse processing.
● Architected AWS S3 data lakehouse solutions with lifecycle policies that transition data to Glacier,
supporting raw, curated, and enriched layers.
● Automated pipeline build, test, and deployment using GitHub Actions and Azure DevOps.
● Partnered with data scientists and analysts to convert business requirements into production-grade
datasets for reporting and ML use cases.
● Monitored pipeline performance and optimized PySpark, SQL, and Snowflake workloads to improve
freshness and reduce costs.
● Strengthened data quality with validation rules, schema enforcement, and logging frameworks across the
platform.