Sr. DevOps Engineer (Disaster Recovery) Job

Job Overview

Toronto, Ontario
Job Type
Full Time
Salary / Compensation
Details Not Provided
Date Posted
1 year ago

Additional Details

Good Exp. Required (5 - 9 Years)

Job Description

Why you’ll love working here:

  • high-performance, people-focused culture

  • our commitment that equity, diversity and inclusion are fundamental to our work environment and business success, which helps employees feel valued and empowered to be their authentic selves

  • learning and development initiatives, including workshops, Speaker Series events and access to LinkedIn Learning, that support employees’ career growth

  • competitive, 100% company-paid extended health and dental benefits for permanent employees, with recent additions to promote inclusive coverage for a diverse employee population. These additions include gender affirmation and fertility drug and treatment coverage

  • membership in HOOPP’s world-class defined benefit pension plan, which can serve as an important part of your retirement security

  • access to an annual wellness reimbursement program for health and wellness-related expenses for permanent employees

  • virtual fitness, yoga and meditation classes, nutritional consultations and wellness seminars

  • a hybrid, flexible work model that embraces remote work in Ontario for eligible roles

  • the opportunity to make a difference and help take care of those who care for us, by providing a financially secure retirement for Ontario healthcare workers

What you will do:

  • Work as a member of an Agile cross-functional infrastructure team while collaborating with various customers and stakeholders from the business and IT teams.
  • Actively participate in Agile Scrum practices such as daily standups, backlog refinement, planning and sprint retrospectives.
  • Create a safe, supportive and participatory environment that produces ongoing mutual respect
  • Play a lead role in delivering backup and DR solutions on AWS for new and existing features, supporting and collaborating with our application developers, business analysts, and other IT Infrastructure and Governance teams. This includes leadership and orchestration across multiple teams.
  • Fully understand business requirements and use best practices and knowledge of internal or external business issues to plan and improve systems and applications in terms of reliability, scalability, recoverability, resiliency, and supportability.
  • Lead and participate in disaster recovery planning, design and tests; write wiki articles; and provide support during system outage/outbreak incidents and disaster scenarios.
  • Assess, adapt and evolve operational strategies and design new solutions for DR/BCM for application/process resilience.
  • Partner with DR/BCM leads in the IT4Enterprise and Governance & Risk teams.
  • Enhance operational incident response management and handle incidents in alignment with HOOPP’s IT standards and policy.
  • Promote and provide education on DR and business continuity processes and best practices throughout the ISG, IT partners and business stakeholders, and build good working relationships.
  • Interface and collaborate with internal business partners and vendors to gather requirements, design and implement solutions, manage technical operations, triage and fix operational issues
  • Support other development teams by helping their developers, learning about their needs, and educating them about platform/application resilience best practices.
  • Support and advocate cloud cost management
  • Enhance proactive monitoring and support of cloud infrastructure, backup and recovery process, services and network.
  • Design and implement automation solutions for backup and restore and for disaster recovery in the AWS cloud; develop CI/CD pipelines to maximize efficiency from an Infrastructure as Code (IaC) standpoint.
  • Continuously evaluate existing systems and DR processes against industry standards, make recommendations for improvement, and drive automation.
  • Share knowledge and provide cross-training within the team.
  • Participate in 24/7 on-call rotation for incident response and escalations.
  • Perform system administration analysis, troubleshoot and resolve complex production issues spanning multiple systems and technologies.
  • Work closely with IT and Development teams to answer technical questions or resolve issues within system administration, data protection and system resilience.
  • Work with internal HOOPP IT architecture, infrastructure, security, risk and governance teams to ensure that all system, governance, and disaster recovery and business continuity requirements are met.
  • Participate in cloud cost reduction and optimizations
  • Research and maintain up-to-date knowledge of current technology trends and best practices for cloud site reliability engineering.
  • Learn our technical cloud infrastructure stack and our systems through operational support activities, by applying changes to each of them.

Who you are:

  • 7+ years’ experience in IT Infrastructure Operations and/or Software Development in progressively more senior roles
  • College or university degree in the field of computer science and information technology
  • 5+ years’ experience in setting up general application backup, recovery and operations (including VM-hosted applications and database backup/recovery)
  • Technical writing and diagramming skills
  • In-depth knowledge and expertise in designing, building, implementing, and maintaining complex and automated DR/BCM solutions on cloud infrastructure.
  • Knowledge of service and hosting solutions in a private/public cloud using IaaS, PaaS and SaaS platforms and their integrations.
  • Knowledge of disaster recovery technologies such as data replication, Commvault, AWS Backup, Data Lifecycle Manager (DLM), AWS Resilience Hub, AWS Elastic Disaster Recovery, CloudEndure, etc.
  • Knowledge of the various techniques for meeting Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) in the cloud, based on application criticality.
  • Knowledge of database consistency and duplication, MS SQL, Oracle, and cloud-based SQL (Azure SQL or AWS RDS SQL and Oracle engines).
  • Knowledge of configuration management and automation tools such as Ansible, Terraform, Puppet or cloud-based native services for provisioning and recovery of cloud workloads.
  • General understanding of cloud networking principles, vulnerability management controls, and identity services (Active Directory/Azure AD, Centrify/Ping, ADFS, etc.)
  • Experience with and understanding of health and system monitoring (AWS CloudWatch, Datadog, Splunk or similar)
  • Hands-on operations experience with cloud platforms (AWS) or the ability to learn within a short period of time
  • Expertise in Windows and Linux operating systems
  • Experience in Windows PowerShell, Python, Bash or another mainstream scripting language
  • Strong analytical and troubleshooting abilities to resolve issues
  • Experience leading delivery on major initiatives.
  • Strong interpersonal, coordination and communication skills with an ability to take end-to-end ownership.
  • Highly motivated and passionate about learning, with the ability to work in a cross-functional, team-based environment
  • Understanding of virtualization and container technologies (Docker, Kubernetes, AWS EKS, etc.)
  • Strong knowledge of Agile practices such as Scrum or Kanban, MVP, etc.
  • Some or growing experience with DevOps automation tools such as Terraform, Microsoft Azure DevOps (Repos, Pipelines and branching strategy) and AWS services (CodeCommit, CodeDeploy, etc.), and with cloud-based serverless or microservice-based architectures (AWS, Azure or GCP)
  • Relevant industry certifications, including but not limited to one or more of the following AWS certifications:
  • AWS Certified Solutions Architect – Associate
  • AWS Certified SysOps Administrator – Associate
  • AWS Certified DevOps Engineer – Professional
  • AWS Certified Solutions Architect – Professional

