Remote Senior Data Engineer at Daniel J Edelman Holdings

Description:

We are currently seeking a Senior Data Engineer with 5-7 years of experience.
The ideal candidate will be able to work independently within an Agile environment.
Experience working with cloud infrastructure and tools such as Apache Airflow, Databricks, dbt, and Snowflake is required.
Familiarity with real-time data processing and AI implementation, including generative AI, is highly advantageous.
Responsibilities include designing, building, and maintaining scalable and robust data pipelines to support analytics and machine learning models, ensuring high data quality and reliability for both batch and real-time use cases.
The candidate will design, maintain, and optimize data models and data structures in tools such as Snowflake and Databricks.
They will use Databricks and cloud-native solutions for big data processing, ensuring efficient management of Spark jobs and seamless integration with other data services.
The role involves utilizing PySpark and/or Ray to build and scale distributed computing tasks, enhancing the performance of machine learning model training and inference processes.
Monitoring, troubleshooting, and resolving issues within data pipelines and infrastructure while implementing best practices for data engineering and continuous improvement is essential.
The candidate will integrate generative AI capabilities into data pipelines and workflows to support advanced use cases such as data enrichment, automated content generation, and natural language processing.
Collaboration with machine learning engineers to optimize generative AI workflows, ensuring seamless deployment and scalability in production environments, is required.
Developing APIs and tools to enable internal teams to consume generative AI models and services efficiently is part of the role.
Staying informed about advancements in generative AI technologies and recommending their adoption to improve business processes and analytics capabilities is expected.
The candidate will document data engineering workflows and generative AI integrations in diagram form.
Collaboration with other Data Engineers, Product Owners, Software Developers, and Machine Learning Engineers to implement new product features, understanding their needs and delivering on time, is crucial.

Requirements:

A minimum of 5 years of experience deploying enterprise-level scalable data engineering solutions is required.
Strong examples of data pipelines developed independently end-to-end, from problem formulation and raw data through implementation, optimization, and results, are necessary.
A proven track record of building and managing scalable cloud-based infrastructure on AWS (including S3, DynamoDB, EMR) is essential.
Experience implementing and managing AI model lifecycles in production, including generative AI models, is required.
Familiarity with tools like the OpenAI API, Hugging Face Transformers, or equivalent platforms for generative AI is advantageous.
Strong experience using Apache Airflow (or equivalent), Snowflake, and Lucene-based search engines is necessary.
Advanced SQL and Python knowledge with associated coding experience is required.
Experience with Databricks (Delta format, Unity Catalog) is essential.
Strong experience with DevOps practices for continuous integration and continuous delivery (CI/CD) is necessary.
Experience wrangling structured and unstructured file formats (Parquet, CSV, JSON) is required.
Understanding and implementation of best practices for ETL and ELT processes is essential.
Implementation of data quality best practices using tools like Great Expectations is necessary.
Real-time data processing experience using Apache Kafka (or equivalent) is advantageous.
Knowledge of generative AI model architectures and their integration into scalable systems is required.
A proven ability to work independently with minimal supervision is essential.
The candidate should take initiative and be action-focused.
Mentoring and sharing knowledge with junior team members is expected.
A strong ability to collaborate within cross-functional teams is necessary.
Excellent communication skills, including the ability to communicate with stakeholders across varying interest groups, are required.
Fluency in spoken and written English is essential.

Benefits:

The position offers the opportunity to work in a global, multidisciplinary research, analytics, and data consultancy.
Employees will be part of a team dedicated to building trusting relationships with people through data and intelligence.
The company promotes a diverse, inclusive, and authentic workplace.
Candidates are encouraged to apply even if their experience does not perfectly align with every qualification, as they may be the right fit for this or other roles.