Limitless High-tech career opportunities - Expoint

IBM Site Reliability Engineer
United States, California, San Jose
562899057

06.05.2024

Your Role and Responsibilities
Automation: Develop and maintain automation tools and scripts to streamline deployment, monitoring, and management of the infrastructure and/or services.
Documentation: Maintain up-to-date documentation for the infrastructure, processes, and procedures.
Collaboration: Work closely with development teams, product managers, and other stakeholders to understand requirements and ensure the reliability of the platform.
Continuous Improvement: Participate in post-incident reviews, retrospectives, and other forums to identify areas for improvement and drive continuous improvement initiatives.

Required Technical and Professional Expertise

Experience with Cloud Platforms: Strong experience with cloud platforms such as AWS, Azure, or Google Cloud Platform, including expertise in deploying and managing services in these environments and in managing and troubleshooting containerized applications.
Automation and Scripting: Strong scripting skills (e.g., Python, Bash) and experience with configuration management tools (e.g., Ansible, Chef, Puppet) to automate deployment and management tasks.

Troubleshooting and Problem Solving: Strong troubleshooting skills and the ability to quickly identify and resolve complex issues in a production environment, including experience with incident response and post-incident analysis.

Preferred Technical and Professional Expertise

DevOps Culture: Experience working in a DevOps culture and mindset, including a strong understanding of the collaboration between development and operations teams to achieve business goals.
Container Orchestration: Proficiency in container orchestration tools such as Kubernetes and OpenShift, including experience in deploying, managing, and troubleshooting containerized applications.
Monitoring and Logging: Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) to monitor the health and performance of infrastructure and applications.
Experience with Scalable Architectures: Experience designing and implementing scalable architectures for cloud-based applications, including knowledge of best practices for scalability, performance, and reliability.
Experience with Monitoring and Observability: Experience with advanced monitoring and observability practices, including using tools such as Prometheus, Grafana, and Kubernetes-native monitoring solutions to gain insights into system performance and behavior.
