Job Search Results
Featured Companies are employers who have come directly to FlexJobs, been approved by our staff, and have directly posted their jobs to the FlexJobs site.
- 100% Remote Work
- Full-Time
- Employee
- US National
Maintain and engineer Kubernetes clusters, deploy monitoring and logging solutions, work with development teams to foster site reliability principles, define and manage SLOs/SLIs/error budgets. Debug production/non-production issues and take part in ..
- 100% Remote Work
- Full-Time
- Employee
- Canada, or US National
Design long-term technical solutions and cross-team mechanisms to achieve reliability goals. Define a roadmap for engineering teams to utilize automated, self-service, scalable, efficient, observable, and reliable infrastructure services as a product.
- 100% Remote Work
- Full-Time
- Employee
- Indonesia
Enhance infrastructure observability through monitoring, logging, and tracing. Implement best practices, maintain monitoring tools, and collaborate with development teams. Improve system reliability, operational transparency, and resource utilization.
- 100% Remote Work
- Full-Time
- Employee
- WA, OR, CA, NV, ID, UT, AZ, MT, WY, CO, NM
Create and maintain automated systems to ensure reliability and consistency of software delivery. Troubleshoot and resolve production errors, establish and maintain system uptime, and develop and implement standards and best practices for software de..
- 100% Remote Work
- Full-Time
- Employee
- 120,000.00 USD Annually
- US National
Build CI/CD pipelines. Optimize and scale workloads. Secure containers and web services. Knowledge of Docker, Kubernetes, GCP, AWS, Go, Postgres, and Redis is preferred, along with familiarity with JavaScript and excellent communication skills (English)...
- 100% Remote Work
- Full-Time
- Employee
- France
Design, develop, deploy, and maintain reliable and scalable infrastructure. Manage large Kubernetes clusters. Measure and optimize system performance. Provide primary operational support and engineering for multiple teams. 7+ years of experience in a...
- 100% Remote Work
- Full-Time
- Employee
- Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Guyana, Paraguay, Peru, Suriname, Uruguay, Venezuela
Work with a team of DevOps/SRE and DBA professionals. Improve infrastructure and processes. Monitor and maintain cloud infrastructure. Aid in reconfiguring existing architecture for rapid deployments. Take ownership and responsibility for our cloud oper...
- 100% Remote Work
- Full-Time
- Employee
- Work from Anywhere
Develop infrastructure-as-code (IaC) practices, automate software operations for private and public clouds, improve the cloud and container portfolio, maintain core services, collaborate with development teams, troubleshoot, and provide assistance.
- 100% Remote Work
- Full-Time
- Employee
- India
Implement scalable and reliable systems, scale existing backend systems, collaborate with developers to set up tooling for CI/CD practices, build and operate infrastructure to support website and ML projects.
- 100% Remote Work
- Full-Time
- Employee
- US National
Seeking a full-time reliability engineer for a remote role. The incumbent must possess experience with Kubernetes and AWS, with Terraform or CloudFormation, and with Linux and shell scripting. A security background is needed.
- 100% Remote Work
- Full-Time
- Employee
- United Kingdom
Always up-to-speed on the latest technologies. Constantly on the lookout for new and innovative ways to solve complex problems through rigorous experimentation. Open, transparent and direct communication style while working in tight collaboration with...
- 100% Remote Work
- Full-Time
- Employee
- US National
Install, upgrade, and manage systems powering customer infrastructure running Circonus software. Communicate with management and customers regarding aberrant system behavior. Participate in an on-call schedule.
- 100% Remote Work
- Full-Time
- Employee
- A range of 104,000.00 - 146,800.00 USD Annually
- US National
Create and manage reusable and flexible infrastructure solutions. Support production and lower environment systems. Troubleshoot infrastructure issues and analyze performance. Participate in feature planning and automate tasks for speed and consistency.
- 100% Remote Work
- Full-Time
- Freelance
- Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Guyana, Paraguay, Peru, Suriname, Uruguay, Venezuela
Provide senior-level site reliability engineering services in LATAM, including incident response, infrastructure maintenance, monitoring, automation, scalability planning, and product roadmap influence. Requires 5+ years of site-reliability experienc..
- 100% Remote Work
- Full-Time
- Employee
- GA
Manage public cloud infrastructure, implement alarming and observability mechanisms, define backup strategy, deploy application metrics tooling, and write post-mortem documentation for production incidents. AWS, Docker, Kubernetes, and GitHub knowled..
- 100% Remote Work
- Full-Time
- Employee
- A range of 144,000.00 - 278,000.00 USD Annually
- Topeka, KS
Engage with teams to improve service delivery and reliability. Measure and monitor production systems for availability and system health. Drive teams towards better operational excellence and improve reliability, resilience, and observability.
- 100% Remote Work
- Full-Time
- Employee
- Bulgaria
Design, build, and maintain Linux-based infrastructure. Ensure reliability, availability, and performance of systems. Collaborate with development team to meet application needs. Monitor system performance and automate tasks for streamlined operations.
- 100% Remote Work
- Full-Time
- Employee
- A range of 85,000.00 - 95,000.00 USD Annually
- US National
Design and automate infrastructure, monitor and log systems, and write code. Collaborate with cross-functional teams to support cloud-based solutions and drive meaningful change. Full-time position with competitive salary and benefits.
- 100% Remote Work
- Full-Time
- Employee
- Canada, Mexico, Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Guyana, Paraguay, Peru, Suriname, Uruguay, Venezuela, Australia, Bangladesh, China, Hong Kong, India, Indonesia, Japan, Malaysia, New Zealand, Pakistan, Philippines, Singapore, Sri Lanka, Taiwan, Thailand, Vietnam, or US National
Optimize, automate, and improve performance of the cloud environment. Develop solutions to enhance key performance indicators. Gather and analyze metrics for system optimization and fault resolution. Drive innovation and scalability of the platform.
- 100% Remote Work
- Full-Time
- Employee
- A range of 129,500.00 - 142,000.00 USD Annually
- US National
Design, develop, and maintain self-service tools for a streamlined development process, continuous integration pipelines, improved observability, and cloud infrastructure provisioning. Create documentation and tutorials. Stay updated on latest software...
- 100% Remote Work
- Full-Time
- Employee
- United Kingdom
Be on-call to respond to incidents, handle production incidents and author postmortems, create and test system disaster recovery process, develop tools for engineering efficiency, advocate for GitOps methodology.
- 100% Remote Work
- Full-Time
- Employee
- A range of 175,000.00 - 205,000.00 USD Annually
- US National
Ensure availability and reliability of critical services, collaborate with engineering teams, drive incident management process, optimize systems and workflows, mentor engineers, and contribute to reliability engineering direction.
- Full-Time
- Employee
- US National
Option for telecommuting. Candidate will handle daily monitoring and maintenance of applications, deploy new releases across multiple SaaS customers, and assist software engineering with core applications. Must have experience building systems in AWS.
- 100% Remote Work
- Full-Time
- Employee
- US National
Work with technologies like Kubernetes, Helm, Docker, AWS, Terraform, Datadog, Prometheus, Ansible, StrongDM, Python, Go, Ruby, GitLab, and GitLab CI. Engage with engineering community, address production and development issues, work cross-functional..
- 100% Remote Work
- Full-Time
- Employee
- Puerto Rico, or US National
Collaborate on the design, build, and maintenance of reliable and scalable infrastructure and software systems. Track error budgets against service level agreements and identify opportunities for improvement in terms of reliability.
- 100% Remote Work
- Full-Time
- Employee
- A range of 115,000.00 - 130,000.00 USD Annually
- US National
As a Site Reliability Engineer, build, manage, and deploy infrastructure as code for digital media delivery and supply chain management. Ensure reliable, scalable, and performant Kubernetes clusters and automate infrastructure provisioning and management.
- 100% Remote Work
- Full-Time
- Employee
- US National
Provide feedback to developers on how their products operate at scale; write code, submit bugs, and work with other teams within the company. Must have a bachelor's degree and experience with software development and Python. WFH.
- 100% Remote Work
- Full-Time
- Employee
- Miami, FL
Automate infrastructure, improve tooling, provide support for incidents and be part of on-call rotation. Troubleshoot system design and communicate effectively. Knowledge of cloud providers, containers, networking and security technologies.
- 100% Remote Work
- Full-Time
- Employee
- Milan, Italy
Manage and monitor installed systems and infrastructure both on premises and in the cloud. Implement security, backup, and redundancy strategies. Install, configure, test and maintain complex technical systems and architectures.
- 100% Remote Work
- Full-Time
- Employee
- Bulgaria
Collaborate with cross-functional teams to enhance system reliability, performance, and scalability. Implement and maintain automation processes to streamline operations. Drive security initiatives and adhere to best practices in cybersecurity.
- 100% Remote Work
- Full-Time
- Employee
- Hungary
Collaborate with teams to improve system reliability, performance, and scalability. Automate processes, ensure cybersecurity best practices, and analyze system performance for enhancements.
- 100% Remote Work
- Full-Time
- Employee
- US National
Define and implement best practices for site reliability, scalability, performance, and managing costs. Design and implement monitoring solutions to proactively identify issues and prevent disruptions. Work with development engineers to help troubles..
- 100% Remote Work
- Full-Time
- Employee
- A range of 115,000.00 - 135,000.00 USD Annually
- US National
Improve fault-tolerance and maintainability of code in proprietary data pipelines and trading systems. Diagnose and fix bugs in code. Lead complex deployments. Automate manual workflows.
- 100% Remote Work
- Full-Time
- Employee
- A range of 117,120.00 - 201,300.00 USD Annually
- US National (Not hiring in AK, ND, WY)
Own and manage Splunk Cloud in FedRAMP environments. Collaborate across the organization to deliver high-quality products. Lead teams of engineers in building a scalable cloud-based environment. Mentor and support new engineers to achieve their potential.
- 100% Remote Work
- Full-Time
- Employee
- A range of 65,000.00 - 115,000.00 USD Annually
- Boston, MA
Manage production and pre-production environments, security, change management, deployment, architecture, and tools. Analyze performance and ensure scalability and reliability of applications hosted in AWS. Automate deployment, monitoring, and incide..
- 100% Remote Work
- Full-Time
- Employee
- Seattle, WA
Develop healthcare software and applications to improve patient care, streamline healthcare workflows, and enhance healthcare processes. Collaborate with diverse functions, lead cross-functional teams, and utilize data and service architectures to enh..
- 100% Remote Work
- Full-Time
- Employee
- Seattle, WA
Engineer modern healthcare technologies to improve patient care, streamline workflows, and integrate healthcare applications. Collaborate with diverse functions to develop products and tools, and lead projects involving data analysis to optimize trea..
- 100% Remote Work
- Full-Time
- Employee
- Cyprus
Engage in and improve the entire service lifecycle, from design to optimization. Develop, test, and maintain tools to support developers in building better software. Enhance cloud infrastructure and the internal developer platform. Fluent in English and Russian.
- 100% Remote Work
- Full-Time
- Employee
- US National
Investigate and identify issues in production infrastructure and application layer. Lead automation efforts to monitor and triage database performance issues. Contribute to release management and continuous delivery.
- 100% Remote Work
- Full-Time
- Employee
- US National
Match customer requirements to advanced capabilities in VNF/CNF/NFVi/NFVO/VNFM/VIM/MEC. Provide technical pre-sales consulting. Design and implement telco-grade open source multi-tenant private clouds and micro clouds. Investigate, report, and fix so..
- 100% Remote Work
- Full-Time
- Employee
- US National
Design, implement, and support public cloud solutions for clients. Provide technical guidance to junior team members and stakeholders on AWS-related matters. Research and develop new solutions and create documentation.
- 100% Remote Work
- Full-Time
- Employee
- A range of 65,000.00 - 115,000.00 USD Annually
- Columbus, OH
Manage production and pre-production environments, analyze performance, troubleshoot issues, and automate deployment and incident response processes. Work closely with cross-functional teams to ensure optimal system reliability and scalability.
- 100% Remote Work
- Full-Time
- Employee
- Dhaka, Bangladesh
Manage IT infrastructure, troubleshoot IT issues, upgrade and install hardware and software, maintain networks and servers, and optimize performance. Bachelor's degree and 2+ years of experience required.
- 100% Remote Work
- Full-Time
- Employee
- Hanoi, Vietnam
Manage IT infrastructure, upgrade and install hardware and software, troubleshoot IT issues, maintain networks and servers, act as a cloud system admin, automate alerting and monitoring system logs, implement security protocols, mentor IT department ..
- 100% Remote Work
- Full-Time
- Employee
- Bengaluru, India
Manage IT infrastructure, troubleshoot and resolve IT issues, maintain networks and servers, automate alerting and monitoring systems, implement security protocols, mentor IT department employees, and stay up to date with advancements in IT administr..
- Full-Time
- Employee
- A range of 110,300.00 - 190,300.00 USD Annually
- Denver, CO
Develop and maintain observability tools, monitor and respond to incidents, collaborate with engineering teams to plan and coordinate new observability requirements, educate and lead efforts to improve observability among all engineering teams.
- 100% Remote Work
- Employee
- Portland, OR
Design system architecture for application deployment across multiple environments. Develop tools for monitoring web services and online products. Manage applications on cloud platforms. Improve security policies and processes for web and cloud-based applications.
- 100% Remote Work
- Full-Time
- Employee
- A range of 125,269.00 - 179,597.00 USD Annually
- New York, NY, or US National
Guide observability, incident response, and postmortems. Implement incident response tools to minimize outages. Build collaborative monitoring solutions and improve existing systems for scale and performance.
- 100% Remote Work
- Full-Time
- Employee
- A range of 124,300.00 - 266,400.00 USD Annually
- Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Guyana, Paraguay, Peru, Suriname, Uruguay, Venezuela, Canada, Mexico
Automate operational tasks, maintain warning and maintenance systems, develop monitoring and alerting systems, respond to emergencies, enhance security measures, act as a subject matter expert, collaborate with stakeholders, and work on various projects.
- 100% Remote Work
- Full-Time
- Employee
- US National
Automate software operations for reusability and consistency across private and public clouds, taking into consideration the complexities of distributed systems. Python software development experience with large projects.