Faptic Technology is a leading provider of IT consulting and managed services, specializing in Azure cloud solutions, software development, and site reliability engineering (SRE).
The company is looking for an Azure Site Reliability Engineer (SRE) to join its dynamic team on a part-time, B2B basis.
The ideal candidate will have expertise in managing cloud infrastructure, monitoring services, and ensuring system reliability using Microsoft Azure.
Key responsibilities include applying SRE principles, implementing log management and diagnostics, configuring firewall services, and ensuring the reliable operation of Azure-based services.
The role also involves using Terraform for infrastructure automation, developing CI/CD pipelines in Azure DevOps, and conducting incident response and root cause analysis.
The engineer will collaborate with teams to improve observability, establish incident management processes, and track service uptime and performance metrics.
Ensuring compliance with security and governance policies within Azure environments and optimizing cost management strategies are also essential tasks.
The candidate will prepare, test, and execute disaster recovery plans to ensure business continuity.
Requirements:
Candidates must hold a Microsoft certification at the intermediate (Associate) level or above, such as Microsoft Certified: Azure Administrator Associate, Azure DevOps Engineer Expert, or Azure Solutions Architect Expert.
Hands-on experience with Terraform for infrastructure automation is required.
Strong expertise in Azure DevOps for CI/CD pipeline management is essential.
Proficiency in Azure Monitor, Log Analytics, and Application Insights is necessary.
Experience in setting up alerts and monitoring solutions for proactive issue resolution is required.
A strong background in incident response and troubleshooting in cloud environments is essential.
Exposure to Palo Alto Networks services and systems is preferred.
Candidates should have a solid understanding of cloud security best practices and networking in Azure.
Familiarity with PowerShell, Bash, or Python for automation and scripting tasks is required.
Experience working in an on-call rotation for incident response is necessary.
The ability to generate and analyze service reports on uptime, performance, and reliability is essential.
Experience in disaster recovery planning, testing, and execution to ensure business continuity is required.
Preferred qualifications include experience with configuration management tools (Ansible, Chef, or Puppet), knowledge of other cloud platforms, and experience with Azure Functions, Logic Apps, and Event Grid.
Benefits:
The position offers the opportunity to work with a leading provider of IT consulting and managed services.
The engineer will be part of a dynamic team focused on solving complex challenges in cloud solutions and software development.
The role allows for part-time work on a B2B basis, providing flexibility.
The engineer will gain experience with cutting-edge technologies and practices in Azure cloud solutions.
The company fosters a culture of innovation and collaboration that supports professional growth and development.