This job post is closed and the position is probably filled. Please do not apply.
🤖 Automatically closed by a robot after the apply link was detected as broken.
Description:
As a Senior DevOps Engineer, you will be responsible for ensuring that our platform is stable and healthy.
You will foster developer-run ownership and empower developers to build resilient products.
Your role includes supporting developers during the application build phase with operational design, automation, capacity planning, and monitoring.
You will plan, manage, and oversee all aspects of the production environment for all merchant loyalty use cases.
You will define strategies for all facets of observability and identify areas of improvement in production.
You will respond to incidents and improve the platform based on feedback, measuring the reduction of incidents over time.
Your responsibilities include ensuring reliable, fault-tolerant, efficiently scalable, and cost-effective services and infrastructure.
You will maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
You will practice sustainable incident response and conduct blameless postmortems.
You will ensure that batch production scheduling and processes are accurate and timely.
You will analyze the platform's ITSM activities and provide feedback to development teams on operational gaps or resiliency concerns.
You will support services before they go live through system design consulting, capacity planning, and launch reviews.
You will scale systems sustainably through automation and push for changes that improve reliability and velocity.
You will work with a global team spread across multiple geographies and time zones.
Requirements:
A Bachelor’s degree in computer science, software engineering, or a similar field is required.
You must have experience with Splunk and SignalFx.
Proficiency with Amazon Web Services, including RDS, is necessary.
Relevant data DevOps, SRE, or general systems engineering experience is required.
You should have experience managing large production platforms.
Experience architecting and implementing data governance processes and tooling is essential.
Strong coding ability in Python or other languages such as Java, C#, Golang, C, C++, Perl, or Ruby is required.
A systematic problem-solving approach, strong communication skills, and a sense of ownership and drive are desired.
You should be able to help debug and optimize code and automate routine tasks.
Experience in dealing with difficult situations and making urgent decisions is needed.
An interest in designing, analyzing, and troubleshooting large-scale distributed systems is preferred.
You should have a good grasp of change management and release management for software.
Benefits:
3Pillar offers a flexible work environment, allowing you to work from the office, home, or a blend of both.
You will be part of a global team, learning from top talent around the world and across cultures.
The company emphasizes well-being, offering fitness programs, mental health plans, and generous time off.
There are ample career growth and development opportunities across various projects, offerings, and industries.
3Pillar is an equal-opportunity employer, committed to diversity and to values such as Intrinsic Dignity and Open Collaboration.