SRE (Site Reliability Engineer)

Management and IT services provider

Remote

07-09-2020

Semi Senior / Senior

Our client is hiring an SRE to join its Site Operations team. You will work as an individual contributor, delivering exceptional customer experiences and leading the implementation of critical features to accelerate the delivery of our client’s products on both web and mobile.

You will have daily exposure to a range of technical challenges and customer experiences which will stretch and develop your technical and leadership skills.

So, if you are a motivated SRE looking to join a collaborative, close-knit team, drop us a line. You might be a perfect fit!

Tech skills/knowledge:

Windows Server or Linux (Red Hat and/or Debian-based distributions) administration. [required]
Experience with at least one of the following programming languages: Python, Go. [required]
Application monitoring, troubleshooting, log analysis, system metrics analysis. [required]
Strong understanding of networking concepts (switching and routing, the OSI model). [required]
Experience working with version control systems (VCS) such as Git. [required]
Handle second-level real-time alerts.
Resolve high-impact incidents together with an incident response team.
Confidence learning native scripting languages (Bash, PowerShell) to implement solutions.
Experience coordinating resources to achieve service restoration (incident management).
Basic knowledge of Cloud infrastructure (AWS, Azure).
Operating System Monitoring.
Read and interpret monitoring system graphs.
Knowledge of application servers such as Red Hat's JBoss/WildFly.

Requirements:

Availability to travel to the USA for at least 2 weeks.
Advanced English (writing and speaking skills) to communicate with technical teams and customers.
Availability to be on a passive on-call schedule.
Availability to work from Monday to Friday, and on national holidays.

Nice to have:

Experience working with configuration management tools such as Puppet or Ansible (preferred).
Experience managing both on-premises and cloud-based infrastructure.
Experience tracking problems with ticketing systems (Jira Service Desk preferred).
Experience working with containerization software (Docker Engine).
English certifications, a driver's license, a U.S. visa, or a European passport are a big plus.

You will be:

Working closely with a cross-functional team of SREs, DBAs, developers, and engineers to ensure the reliability of the platform.
Participating in an Agile team, with daily scrum meetings as well as planning and grooming meetings.
Developing your monitoring skills by using different monitoring tools.
Developing custom tools to automate processes as you see fit in order to reduce toil and increase engineering work.
Monitoring metrics for overall reliability of a distributed SaaS product.
Interacting with Azure cloud services, working mostly on Windows platforms and on some Linux platforms.
Troubleshooting distributed systems and applications.

English level: Upper intermediate

Shift: 11 p.m. to 7 a.m., Sunday to Thursday

Location: Remote
