Site Reliability Engineer (SRE) - EMEA

Blockdaemon · Dubai

Completely Remote · Full Time · Mid Level, Senior · Information Technology

Posted 22 months ago

This role is no longer accepting applications.

Job description

As a Site Reliability Engineer (SRE) based in EMEA, you will play a critical role supporting our Blockdaemon team by ensuring the reliability, scalability, and performance of our systems and services. You will collaborate closely with cross-functional teams to design, implement, and maintain robust and resilient infrastructure solutions. The ideal candidate is passionate about automation, possesses strong analytical skills, and thrives in a fast-paced, dynamic environment.

Responsibilities

System Architecture and Design: Collaborate with software engineering teams to design scalable, highly available, and resilient systems. Drive architectural improvements to enhance system reliability and performance.
Automation and Tooling: Develop automation tools and scripts to streamline deployment, monitoring, and incident response processes. Implement and maintain infrastructure as code frameworks.
Monitoring and Alerting: Configure and maintain monitoring systems to detect and mitigate potential issues proactively. Define alerting thresholds and response procedures to ensure timely incident resolution.
Incident Management: Respond to and resolve critical incidents, perform root cause analysis, and implement preventive measures to minimize the likelihood of recurrence. Participate in an on-call rotation to provide 24/7 support as needed.
Capacity Planning and Performance Optimization: Analyze system performance metrics, identify bottlenecks, and propose optimizations to improve resource utilization and efficiency.
Security and Compliance: Work closely with security teams to implement best practices for data protection, access control, and compliance with regulatory requirements. Conduct periodic security audits and vulnerability assessments.
Documentation and Knowledge Sharing: Document system configurations, procedures, and troubleshooting steps. Share knowledge and best practices with team members to foster a culture of continuous learning and improvement.

Requirements

Proven experience with cloud platform technologies (AWS, GCP, Azure, etc.), Infrastructure-as-Code tooling (Terraform, Pulumi, etc.), and CI/CD orchestration platforms (CircleCI, GitHub Actions, etc.).
Proficiency in scripting and programming languages such as Python, Golang, or TypeScript.
Experience with container and scheduling technologies (Docker, Kubernetes) and microservices architecture.
Hands-on experience with monitoring tools such as Prometheus, Grafana, and the ELK stack.
Excellent problem-solving skills and the ability to independently troubleshoot complex issues.
Strong understanding of Linux/Unix systems administration and networking concepts.
Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams.

Skills & tools

AWS, Terraform, Python, Docker, Prometheus

What the team is looking for

Use this list as a quick fit check before you apply.

01 Cloud technologies
02 Infrastructure-as-Code
03 CI/CD
04 Scripting
05 Container technologies
06 Monitoring tools
07 Problem-solving
08 Linux/Unix
09 Networking
10 Communication
11 Collaboration

Job details

Work model: Completely Remote
Commitment: Full Time
Experience: Mid Level, Senior
Category: Information Technology
Posted: 22 months ago
