Incident Response Manager

Posted 17 Hours Ago
Hiring Remotely in Seattle, WA
Remote
Senior level
Fintech • Payments
The Role
As an Incident Response Manager at Stripe, you will be responsible for driving incident response and management, determining impact, mitigating incidents, communicating with users, and orchestrating Root Cause Analysis. You will collaborate with global teams to ensure 24/7 incident coverage and work on scaling incident response capabilities and technical expertise in products.
Summary Generated by Built In

Who we are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies—from the world’s largest enterprises to the most ambitious startups—use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.

About the team

The Incident Ops team is a global 24/7 team responsible for driving incident response and management from detection to resolution. Stripe is proud of its five 9s reliability and this team is at the forefront of ensuring we keep it that way - working hand-in-hand with Reliability Eng and across the Tech Org. This team of incident response managers (IRMs) is defined by our sense of ownership and how we drive incidents to resolution - marshaling the necessary cross-functional resources to respond to and resolve service outages, critical bugs, security attacks and anything that significantly impacts the users of our products. The team is user-first and ensures appropriate external communications from Stripe and senior management to keep our users informed of disruption to their experience of Stripe. The team is skilled in communications and incident handling and is technically adept, as incidents can arise from anywhere and cut across products and orgs in Stripe.

What you’ll do

As an Incident Response Manager (IRM), you’ll play the key role in driving the right level of response from Stripes to incidents, determining impact, rallying Stripes to mitigate, communicating to users, ensuring appropriate remediations, and orchestrating the Root Cause Analysis (RCA) process. You’ll work hand-in-hand with IRMs and engineers globally to ensure solid 24/7 coverage of how we monitor, detect, respond to, communicate about, and mitigate incidents. When not managing incidents, you'll help scale our ability to respond to incidents, improve our operations, analyze data to provide insights and deepen our technical expertise in products. As a result, you’ll be seen as the protector of our users, minimizing the impact of incidents on their business and ensuring that Stripe is always thinking of our users.

Responsibilities

  • Act as an on-call Incident Commander, responsible for driving and managing incident resolution with a high level of urgency, cross-functional collaboration, and accuracy, while partnering with a global and diverse set of teams, including Engineering, Product, Policy, Risks, PR, Legal, Execs, etc.
  • Lead all user-facing incidents across domains at Stripe - including reliability, technical, security, and data privacy
  • Take a "User First" approach to determining impact, providing accurate situation reports, facilitating comms bridges, and ensuring useful and timely external communications to users
  • Proactively update internal stakeholders, make decisions through data and influence by partnering with Engineering, Sales, Support and other cross-functional teams
  • Contribute to the root cause analysis process by conducting post-mortems, identifying remediations, and ensuring problem management tasks meet SLA and user expectations
  • Drive improvements in the incident handling process and incident management metrics and tooling based on trends and data of Stripe's incidents in collaboration with engineering, product and operations teams
  • Collaborate closely with leadership for building team strategy based on the team vision
  • Collaborate and coach other Incident Response Managers on the team

Who you are

We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.

Minimum requirements

  • 5+ years of demonstrable major incident experience for organizations that run mission-critical applications or always-on SaaS environments.
  • Demonstrated ability to lead multiple incidents concurrently with authority, and to influence responders with agency and reasoning skills to resolve ambiguous problems and drive to root cause.
  • Strong full-stack technical skills with development/support experience with cloud-based technologies
  • Demonstrated experience developing code and automation using Python, Ruby, JavaScript or shell scripting.
  • Solid understanding of infrastructure, including physical, virtual, and container-based compute platforms
  • Strong quantitative and analytical skills in data manipulation using SQL, Splunk or other tools.
  • Excellent task management skills; must be detail-oriented, with the ability to remain composed, be methodical, and think fast in a high-pressure environment.
  • Exceptional written and verbal English communication skills, with the ability to translate complex technical issues for internal and external stakeholders

Preferred qualifications

  • Domain expertise in classes of incidents such as technical, privacy, security or crisis, with a strong desire to continuously learn about Stripe's products, technical issues and systems.
  • Ability to review complex technical details regarding ongoing issues/events and convey the key details to senior stakeholders to facilitate real-time decision making.
  • Experience with broad user-facing communications (e.g. status pages, tweets) and/or targeted communications (e.g. direct emails, support ticket responses).
  • Familiarity with operating or managing distributed architectures, with the ability to correlate system behaviors based on known inter-dependencies.
  • Demonstrated experience with full stack development and support

Top Skills

Python
The Company
Seattle, WA
5,600 Employees
Hybrid Workplace
Year Founded: 2010

What We Do

Stripe is a technology company that builds economic infrastructure for the internet. Businesses of every size—from new startups to public companies—use our software to accept payments and manage their businesses online. Our mission is to increase the GDP of the internet.

Learn more at www.stripe.com/jobs.
