Amazon outage shows one bad day can break the internet
A recent Amazon Web Services failure shut down thousands of sites and left hundreds of millions of users offline. (CNS)
When Amazon Web Services went down this week, much of the internet went with it.
The outage began in northern Virginia at AWS’ largest data center and ultimately affected over 1,000 organizations and more than 100 million users worldwide. While most services returned to normal after the 15-hour outage, the system failure highlights how a single error at a major tech firm can ripple across global internet operations.
“Day-to-day, people don’t even realize that there’s so many services that rely on a single entity’s infrastructure,” said Alan Liu, an assistant professor of computer science at the University of Maryland. “When this happens, the impact is so large.”
The outage affected both Amazon-owned organizations and companies that rely on its technological infrastructure.
“These companies paid computing costs to Amazon, like we pay utility fees to [Baltimore Gas and Electric] to get gas and electricity,” said Liu. “These companies are paying money to Amazon for their computer storage and network services.”
This isn’t the first outage of its kind. In 2021, Meta experienced a nearly six-hour global disruption with similar consequences. And experts say it probably won’t be the last.
“We all need to be more resilient and robust in the way we plan,” said David Mussington, a University of Maryland public policy professor of the practice. “Practicing incident recovery and response and protecting key data assets is the way forward.”
What happened?
The Monday outage occurred at US-East-1 in northern Virginia, the largest AWS data center in the United States. It impacted thousands of organizations globally, including leading airlines, media outlets, real-time communication platforms and home security systems.
AWS announced the outage at 12:11 AM PDT on Oct. 20, saying “we are investigating increased error rates and latencies for multiple AWS services in the US-EAST-1 Region.”
Amazon periodically updated its statements throughout the day, saying the disruption was caused by a Domain Name System (DNS) malfunction that led to database errors and cascading failures affecting networks worldwide. DNS is the system that translates user-friendly domain names, like “CNSMaryland.org,” into the numerical IP addresses that computers use to communicate with each other.
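To make the name-to-address translation concrete, here is a minimal sketch in Python that asks the local resolver for the addresses behind a hostname. The function name `resolve` is an illustrative choice, not anything AWS uses, and the example looks up `localhost` so that it runs without a network connection.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IP addresses the local resolver reports."""
    # getaddrinfo returns tuples of (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address as a string.
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" is used only so the sketch works offline.
    print(resolve("localhost"))
```

When a lookup like this fails, software that depends on the name cannot find the server it needs, even if that server is running normally.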
Sites including Delta Air Lines, Netflix, Venmo, Snapchat, Duolingo, Canvas and many others experienced “increased error rates.” Affected entities experienced stalls, delayed communication and posting and, in some cases, complete shutdowns.
AWS identified the issue and began restoring services by early afternoon on Oct. 20. While most services have now returned to normal, some may have experienced residual delays.
What are the consequences?
This outage caused significant financial losses for AWS and affected companies. Liu said the potential financial impact could surpass $100 billion.
When organizations that rely on digital website traffic experience a shutdown, their revenue stream is interrupted. Multiply that loss across the hundreds of companies affected, and the $100 billion figure becomes highly plausible.
“The issue is risk and recovery of loss of business continuity due to an infrastructure failure, and that is something you can mitigate through planning and appropriate contracting,” said Mussington.
According to Mussington, planning should include establishing secondary and tertiary providers outside of the tech market leaders as a contingency.
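The contingency idea can be sketched in a few lines of Python: try the primary provider first and fall back to the next one only when a call fails. This is an illustration under stated assumptions, not a description of how any affected company operates; `fetch_with_failover`, `primary_down` and `secondary_up` are hypothetical names invented for the example.

```python
from typing import Callable, Sequence


def fetch_with_failover(providers: Sequence[Callable[[], str]]) -> str:
    """Call each provider in order and return the first successful result."""
    errors = []
    for provider in providers:
        try:
            return provider()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} providers failed")


def primary_down() -> str:
    # Hypothetical stand-in for a primary provider during an outage.
    raise ConnectionError("primary provider unavailable")


def secondary_up() -> str:
    # Hypothetical stand-in for a healthy backup provider.
    return "served by secondary"


if __name__ == "__main__":
    print(fetch_with_failover([primary_down, secondary_up]))
```

The fallback only helps if the backup providers do not share the same underlying infrastructure as the primary.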
Will this happen again?
AWS has multiple industry competitors, including Microsoft Azure, Google Cloud Platform and IBM Cloud. These companies face identical vulnerabilities and could have experienced similar outages, Mussington said.
“I think the major thing is usually human error,” Liu said. “Engineers are using some code that was not fully tested, or some code that is causing a configuration error.”
“It didn’t happen to them this time, but it could,” Mussington said.
AWS has the most extensive, reliable and secure global cloud infrastructure in the world, according to its website. Yet it still experienced a large-scale outage.
This raises the question: How secure are the world’s most secure platforms?
“Amazon is the leading global player in this field,” Liu said. “But the leading player needs to take bigger responsibility on safeguarding the services that depend on it.”
___
Ruby Siefken writes for the Capital News Service.