Remote Site Reliability Engineer

at Faptic Technology

Posted 2 days ago, 2 applied

Description:

  • Faptic Technology is a leading provider of IT consulting and managed services, specializing in Azure cloud solutions, software development, and site reliability engineering (SRE).
  • The company is looking for an Azure Site Reliability Engineer (SRE) to join its team on a part-time, B2B basis.
  • The ideal candidate will have expertise in managing cloud infrastructure, monitoring services, and ensuring system reliability using Microsoft Azure.
  • Key responsibilities include applying SRE principles, implementing log management and diagnostics, configuring firewall services, and ensuring the reliable operation of Azure-based services.
  • The role also involves utilizing Terraform for infrastructure automation, developing CI/CD pipelines using Azure DevOps, and conducting incident response and root cause analysis.
  • The engineer will collaborate with teams to improve observability, establish incident management processes, and track service uptime and performance metrics.
  • Ensuring compliance with security and governance policies within Azure environments and optimizing cost management are also essential tasks.
  • The candidate will prepare, test, and execute disaster recovery plans to ensure business continuity.

Requirements:

  • Candidates must hold a Microsoft certification at an intermediate level or above, such as Microsoft Certified: Azure Administrator Associate, DevOps Engineer Expert, or Azure Solutions Architect Expert.
  • Hands-on experience with Terraform for infrastructure automation is required.
  • Strong expertise in Azure DevOps for CI/CD pipeline management is essential.
  • Proficiency in Azure Monitor, Log Analytics, and Application Insights is necessary.
  • Experience in setting up alerts and monitoring solutions for proactive issue resolution is required.
  • A strong background in incident response and troubleshooting in cloud environments is essential.
  • Exposure to Palo Alto services and systems is preferred.
  • Candidates should have a solid understanding of cloud security best practices and networking in Azure.
  • Familiarity with PowerShell, Bash, or Python for automation and scripting tasks is required.
  • Experience working in an on-call rotation for incident response is necessary.
  • The ability to generate and analyze service reports on uptime, performance, and reliability is essential.
  • Experience in disaster recovery planning, testing, and execution to ensure high availability is required.
  • Preferred qualifications include experience with configuration management tools (Ansible, Chef, or Puppet), knowledge of other cloud platforms, and experience with Azure Functions, Logic Apps, and Event Grid.

Benefits:

  • The position offers the opportunity to work with a leading provider of IT consulting and managed services.
  • Employees will be part of a dynamic team focused on solving complex challenges in cloud solutions and software development.
  • The role allows for part-time work on a B2B basis, providing flexibility.
  • Employees will gain experience in cutting-edge technologies and practices in Azure cloud solutions.
  • The company fosters a culture of innovation and collaboration, enhancing professional growth and development.