Identify sources of instability in large-scale distributed systems and drive operational excellence
Analyze complex systems from a reliability and resilience perspective
Improve reliability and drive down the burden of toil with tooling and automation
Resolve complex technical issues as necessary
Use modern tools to streamline configuration management
Diagnose complex system performance problems using dumps, traces, or other diagnostic aids
Third-party integrations
Implement and continually improve application and system monitoring.
Participate in on-call rotations
At least 2 years of work experience on an enterprise application operations team.
Advanced database knowledge (MySQL preferred).
Intermediate programming knowledge (PHP preferred).
Familiar with Linux.
Strong troubleshooting skills will be a plus.
Familiar with CI/CD principles.
Knowledge of source control tools.
Experience with monitoring tools such as Grafana and Zabbix
Intermediate knowledge of at least one high-level scripting language
Good written and verbal communication skills in English
University degree in Computer Science, Computer Engineering, or another relevant field.
Good interpersonal communication and presentation skills.
Ability to be a team player.
Ability to work effectively in multiple cultures and at a range of levels.
Ability to continually build skills through a mix of self-motivated and course-based learning.
Ability to work independently and proactively, see the big picture, and work through solutions as needed.