SRE (Site Reliability Engineer) Ssr/Sr

Management and IT services provider
Site Reliability Engineer Semi Senior / Senior
Buenos Aires - Palermo

Our client is hiring an SRE (Site Reliability Engineer) SSR/SR to join its team.


  • Windows Server or Linux (RedHat and/or Debian based distributions) Administration. [required]
  • Experience with at least one of the following programming languages: Python, Go. [required]
  • Application monitoring, troubleshooting, log analysis, system metrics analysis. [required]
  • Strong understanding of networking concepts (switching and routing, OSI model). [required]
  • Experience working with version control systems (VCS) such as Git. [required]
  • Handle second level real-time alerts.
  • Resolve high-impact incidents together with an incident response team.
  • Feel confident learning native scripting languages (Bash, PowerShell) to implement solutions.
  • Experience coordinating resources to achieve service restoration (Incident Management).
  • Basic knowledge of Cloud infrastructure (AWS, Azure).
  • Operating System Monitoring.
  • Read and interpret monitoring system graphs.
  • Knowledge of application servers such as Red Hat JBoss/WildFly.

It's a plus:

  • Experience working with configuration management tools such as Puppet or Ansible (Preferred).
  • Experience in on-premise infrastructure management and cloud-based infrastructure.
  • Experience tracking problems with ticketing systems such as Jira Service Desk (Preferred).
  • Experience working with containerization software - Docker Engine.
  • English certifications, a driver's license, a U.S. visa or a European passport are a big plus.

It's required:

  • Availability to travel to the USA for at least 2 weeks.
  • Advanced English (writing and speaking skills) to communicate with technical teams and customers.
  • Availability to be on a passive on-call schedule.


You will be:

  • Working closely with a cross-functional team of SREs, DBAs, developers, and engineers to ensure the reliability of the platform.
  • Participating in an Agile team, with daily scrum meetings, as well as planning and grooming meetings.
  • Developing your monitoring skills by using different monitoring tools.
  • Developing custom tools to automate processes as you see fit in order to reduce toil and increase engineering work.
  • Monitoring metrics for overall reliability of a distributed SaaS product.
  • Interacting with Azure cloud services, working mostly on Windows platforms and on some Linux platforms.
  • Troubleshooting distributed systems and applications.

Location: Palermo.