AWS Incident Management Specialist Job at JustinBradley, Reston, VA

  • JustinBradley
  • Reston, VA

Job Description

JustinBradley’s client, a leading source of mortgage financing, is seeking an AWS Incident Management Specialist to join their team and manage IT production incidents to resolution in a 24/7/365 environment using our client’s incident management processes. You will guide incident triage calls from a technical perspective, utilize monitoring tools and dashboards to aid troubleshooting, share technical insights, outline resolution activities, and drive improvements in incident management processes. You will also provide regular status updates to stakeholders, assist with postmortem activities, and support efforts related to operational enhancements and application maintenance in production.

Key Responsibilities:

  • Incident Management: Lead and manage IT production incidents to resolution using incident management processes. Communicate the incident status, impact, and resolution actions effectively to stakeholders. Participate in triage calls and manage incident response in a timely and accurate manner.
  • AWS Expertise: Utilize hands-on experience managing and monitoring AWS-based applications. Troubleshoot and resolve incidents related to AWS cloud infrastructure (EC2, ELB, RDS, Redshift, DynamoDB, Aurora, Route 53, ECS, Lambda, S3, CloudWatch, WAF, etc.) in real time.
  • Performance Engineering: Conduct performance engineering for AWS cloud applications. Utilize tools like Dynatrace and Splunk for transaction-level monitoring and troubleshooting. Leverage AWS tools and resources to analyze and resolve incidents promptly.
  • Monitoring Tools Management: Manage and monitor AWS cloud applications and underlying infrastructure using monitoring tools like ExtraHop, SolarWinds, Netcool, Catchpoint, Moogsoft, and others. Analyze dashboards and monitoring data to identify trends and patterns in application performance and health.
  • Incident Triage & Resolution: Lead and guide technical incident triage calls, analyze various components of the infrastructure (AWS, UNIX, DNS, LDAP, SSL, etc.), and perform detailed root-cause analysis using wire data analytics, event correlation, and performance management tools.
  • Documentation & Postmortems: Assist with the creation of Root Cause Analysis (RCA) and Correction of Errors (COE) documentation. Participate in postmortem activities and recommend improvements to prevent future incidents. Ensure effective follow-up on items that could negatively impact production operations.
  • Process Improvement: Recommend and implement improvements to incident management processes. Provide recommendations on process changes, create reports, and respond to ad-hoc requests from senior management.
  • On-Call Support: Participate in an on-call rotation, working nights, weekends, and holidays as required to provide continuous support for incident management and resolution.
  • Stakeholder Communication: Report incident details and metrics to senior leadership. Effectively communicate complex technical issues to non-technical stakeholders.

Education & Experience:

  • Education: Bachelor’s Degree or equivalent required.
  • Experience: Minimum of 6 years of relevant experience managing IT incidents and troubleshooting in a cloud environment, particularly AWS.

Specialized Knowledge & Skills:

  • Extensive experience managing AWS cloud environments, including services like EC2, RDS, Lambda, DynamoDB, CloudWatch, and more.
  • Hands-on experience troubleshooting infrastructure and application incidents on AWS.
  • Experience with transaction-level monitoring using tools like Dynatrace and Splunk.
  • Expertise in analyzing various components of the application and infrastructure, including AWS, UNIX, LDAP, DNS, SSL, and databases (Oracle/MS SQL).
  • Proven ability to manage complex incidents and lead triage calls with cross-functional technical teams.
  • Strong communication skills, including the ability to convey technical details to non-technical stakeholders.
  • Ability to multi-task and perform well under pressure in high-stress situations.
  • Familiarity with monitoring and observability tools such as SolarWinds, ExtraHop, Moogsoft, and Catchpoint.
  • AWS certifications (e.g., AWS Certified Solutions Architect – Associate or higher) preferred.

Preferred Qualifications:

  • Familiarity with tools like CloudFormation or Terraform.
  • Experience troubleshooting middleware products in UNIX/Linux environments and knowledge of Service-Oriented Architecture (SOA), Java, etc.
  • Exposure to other cloud platforms like Azure or Google Cloud.
  • Experience with OpenTelemetry and monitoring dashboards for incident detection and alerting.

Work Environment:

  • 24/7/365 operational support environment.
  • Ability to work various shifts, including nights, weekends, and holidays, as required.

JustinBradley is an EO employer - Veterans/Disabled and other protected employees.
