Remote Data Engineer

This job is closed

This job post is closed and the position has likely been filled. Please do not apply. The posting was closed automatically after the apply link was detected as broken.

Description:

  • The Data Engineer will report to the Senior Manager of AI & Data Platform and will be responsible for building tools and infrastructure to support the Data Products and Insights & Innovation teams.
  • This role requires a talented, curious self-starter who is driven to solve complex problems and can manage multiple domains and stakeholders.
  • The Data Engineer will collaborate with all levels of the Data and AI team and various engineering teams to develop data solutions, scale data infrastructure, and advance the organization towards a data-centric model.
  • Key responsibilities include designing, building, and deploying components of a modern data stack, including CDC ingestion using Debezium, a centralized Hudi data lake, and various data pipelines (an illustrative sketch follows this list).
  • The role involves maintaining legacy Python ELT scripts while transitioning to dbt models in Redshift, ensuring operational stability while enabling innovation.
  • Collaboration within a cross-functional team is essential for planning and rolling out data infrastructure and processing pipelines that support analytics, machine learning, and GenAI services.
  • The Data Engineer must thrive in ambiguous conditions, independently identifying opportunities to optimize pipelines and improve data workflows under tight deadlines.
  • Responsibilities also include responding to PagerDuty alerts, implementing monitoring solutions, and ensuring high availability and reliability of data systems.
  • Strong communication skills are necessary to communicate with technical and non-technical audiences and to help internal teams surface actionable insights that enhance customer satisfaction.
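
To give a flavour of the CDC-to-lake work described above, here is a minimal sketch of writing change records into a Hudi table on S3 with PySpark. It assumes the Hudi bundle is on the Spark classpath and that the Debezium change events have already been read and flattened into a DataFrame; the table name, key fields, and S3 paths are hypothetical placeholders, not details from the posting.

    # Minimal sketch: upserting CDC records into a Hudi table on S3.
    # Assumes PySpark with the Hudi bundle available; paths and fields are hypothetical.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("cdc-to-hudi")
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .getOrCreate()
    )

    # In practice this DataFrame would come from the Debezium topic (e.g. via
    # Structured Streaming against MSK); shown here as a batch read for brevity.
    cdc_df = spark.read.json("s3://example-staging/cdc/orders/")  # hypothetical input

    hudi_options = {
        "hoodie.table.name": "orders",                             # hypothetical table
        "hoodie.datasource.write.recordkey.field": "order_id",     # primary key of the source table
        "hoodie.datasource.write.precombine.field": "updated_at",  # latest change wins on upsert
        "hoodie.datasource.write.partitionpath.field": "order_date",
        "hoodie.datasource.write.operation": "upsert",
    }

    (
        cdc_df.write
        .format("hudi")
        .options(**hudi_options)
        .mode("append")
        .save("s3://example-data-lake/orders/")  # hypothetical lake location
    )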

Requirements:

  • Candidates must have 3+ years of experience in building data pipelines and managing a secure, modern data stack, including CDC streaming ingestion using tools like Debezium into a Hudi data lake.
  • At least 3 years of experience with AWS cloud infrastructure is required, including Kafka (MSK), Spark/AWS Glue, and infrastructure as code (IaC) using Terraform.
  • Strong coding skills in Python, SQL, and dbt are necessary, with the ability to write and review high-quality, maintainable code.
  • Prior experience in building data lakes on S3 using Apache Hudi with various file formats such as Parquet, Avro, JSON, and CSV is essential.
  • Candidates should have experience building and managing multi-stage workflows using serverless Lambdas and AWS Step Functions (see the sketch after this list).
  • Familiarity with data governance practices, including data quality, lineage, and privacy, is required, along with experience using cataloging tools.
  • Experience developing and deploying data pipeline solutions using CI/CD best practices is necessary.
  • Working knowledge of data integration tools such as Stitch and Segment CDP is required.
  • Practical knowledge and experience with analytical and machine learning tools like Athena, Redshift, or SageMaker Feature Store is a bonus.
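
As a rough illustration of the Lambda/Step Functions requirement above, the following sketch kicks off a multi-stage workflow from Python with boto3. It assumes credentials and region are already configured and that a state machine chaining several Lambdas exists; the ARN, execution name, and input payload are hypothetical placeholders.

    # Minimal sketch: starting an existing Step Functions workflow from Python.
    # Assumes boto3 credentials/region are configured; ARN and payload are hypothetical.
    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    response = sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:nightly-elt",  # hypothetical
        name="nightly-elt-2024-01-01",  # execution names must be unique per state machine
        input=json.dumps({"run_date": "2024-01-01", "full_refresh": False}),
    )

    print(response["executionArn"])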

Benefits:

  • Employees have the flexibility to work from the office in downtown Toronto or remotely, allowing them to choose their preferred work environment.
  • The company offers diverse learning experiences, educational allowances, mentorship, and support for personal growth.
  • The company invests strongly in health and wellness, addressing body, mind, and soul.
  • Fair compensation and various office perks are provided, along with the expected benefits of a growing tech company.
  • Wave promotes a diverse and inclusive culture, valuing individuality and encouraging open feedback, fostering an environment where innovation flourishes.
  • The company has been recognized as one of Canada's Top Ten Most Admired Corporate Cultures and one of Canada’s Great Places to Work in various categories.