Since 2010, Playtika has been a pioneer in the games industry. We were among the first to offer free-to-play social games on social networks and, shortly after, on mobile platforms. We were also one of the originators of live game operations, offering users personalized, daily game experiences with new events and exciting new features 365 days a year.
September 3, 2025

Site Reliability Engineer (vacancy inactive)

Kyiv, Vinnytsia

Responsibilities:

  • Maintain and improve existing monitoring configurations (alerts, dashboards, service discovery, scrape configs, etc.)
  • Implement and enhance alerting logic, including threshold tuning and dynamic alert conditions
  • Troubleshoot monitoring and metrics-related issues (e.g., missing data, false alerts, broken dashboards)
  • Support and improve self-developed metrics collectors and Python-based monitoring services
  • Assist NOC and SRE teams with alert deduplication, escalation rules, and alert quality improvements
  • Participate in the design and implementation of observability improvements for new services and infrastructure components
  • Review, modify, and extend existing scripts and plugins (primarily Python and Bash)
  • Provide monitoring-related guidance to development, infrastructure, and operations teams
  • Ensure monitoring tools and services operate reliably within Kubernetes clusters and Linux systems
  • Maintain monitoring configuration in Git and follow internal version control best practices
  • Participate in cross-team initiatives to improve the overall monitoring and incident response ecosystem

Requirements:

  • Strong hands-on experience with Linux systems (primarily Ubuntu)
  • Practical knowledge of the Prometheus ecosystem, VictoriaMetrics, Grafana, and Zabbix
  • Experience supporting monitoring systems in Kubernetes-based infrastructure
  • Solid scripting skills (Bash)
  • Familiarity with Git and common version control workflows
  • Good understanding of networking and infrastructure concepts (ports, protocols, DNS, etc.)
  • Ability to troubleshoot metric collection, alert firing, and data visualization issues
  • Basic knowledge of SQL (e.g., for querying time-series or metadata stores)
  • Strong communication skills for cross-functional collaboration

Nice to have:

  • Understanding of high-availability and failover patterns in observability systems
  • Experience working with SLO/SLA-based alerting or anomaly detection mechanisms
  • Exposure to automation and CI/CD pipelines for monitoring infrastructure