Site Reliability Engineer

Company: Hunter Technical Resources

Experience: At least 1 year
Job Description

Senior Site Reliability Engineer

Job Overview:

The team is devoted to providing automated solutions and services for our client to measure, evaluate, and plan for visible, reliable application delivery. As a site reliability engineer, you will work as a member of software engineering teams to build and run large-scale, widely distributed, fault-tolerant solutions. You will collaborate with an extremely talented and diverse infrastructure, operations, and development team to scale and evolve an existing platform. Our Engineering team handles billions of transactions each day in an extremely latency-sensitive environment. The tools and use cases are diverse, and our challenge is to increase development velocity by optimizing various parts of the delivery pipeline while emphasizing reliability, uptime, capacity, and performance.

If you love figuring out how all the pieces fit together in a build environment, or if automation and building tools to monitor and manage your applications sound interesting to you, we want to talk to you.


Required experience:

  • At least one scripting language (Bash, Python, or similar)
  • Configuration management systems (Puppet, Ansible, and Docker knowledge preferred)
  • Distributed version control system experience (Git preferred)
  • Database operations at scale (MySQL, MongoDB, DynamoDB, RDS)
  • Linux (CentOS/RHEL/Amazon Linux) system engineering expertise
  • Networking knowledge (AWS VPC experience is a plus)
  • High-availability approaches including load balancing, dynamic scaling, and capacity planning
  • Experience using metrics and monitoring to ensure customer SLA objectives are met
  • Experience operating cloud computing platforms (e.g., AWS, Google Compute, Azure) and their PaaS-based components (Elastic Beanstalk, CloudFront, S3, RDS, etc.)
  • Excellent written communication, problem-solving, and process management skills
  • Desire to work in a fast-paced, evolving, growing, and dynamic environment
Preferred experience:

  • Containerization platforms (Docker, Rancher, Kubernetes)
  • Agile development, testing, and deployment expertise
  • Experience in Java including Spring Boot
  • Maven, Gradle, and Jenkins
  • Experience with application telemetry tools such as InfluxDB, Prometheus, Grafana, Datadog, or New Relic
  • Experience with log aggregation and anomaly detection platforms such as Splunk, Sumo Logic, Graphite, CloudWatch, or the ELK stack
  • Operating in a developer-empowered environment where software delivery teams deploy and monitor their applications throughout the application lifecycle
  • Big data platforms such as Cloudera, Vertica, Hadoop, Amazon Redshift, or Elastic MapReduce
  • Package management platforms such as npm, pip, RubyGems, rpm, and others
