Topic: Mass worldwide IT outage affects airlines, media and banks

Nearly Sane

  • Administrator
  • Hero Member
  • *****
  • Posts: 63199
Mass worldwide IT outage affects airlines, media and banks
« on: July 19, 2024, 08:14:25 AM »

Roses

  • Hero Member
  • *****
  • Posts: 7947
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #1 on: July 19, 2024, 08:31:41 AM »
I have just seen that on the BBC news channel.  :o
"At the going down of the sun and in the morning we will remember them."

splashscuba

  • Hero Member
  • *****
  • Posts: 1955
  • might be an atheist, I just don't believe in gods
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #2 on: July 19, 2024, 09:23:12 AM »
Looks like Crowdstrike is the issue
I have an infinite number of belief systems cos there are an infinite number of things I don't believe in.

I respect your right to believe whatever you want. I don't have to respect your beliefs.

Nearly Sane

  • Administrator
  • Hero Member
  • *****
  • Posts: 63199
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #3 on: July 19, 2024, 09:32:14 AM »
Looks like Crowdstrike is the issue
Yes, a company that really doesn't want to be in the news.


https://mashable.com/article/crowdstrike-microsoft-outage-windows-blue-screen-explained

Sebastian Toe

  • Hero Member
  • *****
  • Posts: 7684
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #4 on: July 19, 2024, 11:36:38 AM »
Hope they have a robust blackout/recovery process!
"The word God is for me nothing more than the expression and product of human weaknesses, the Bible a collection of honourable, but still primitive legends.'
Albert Einstein

Nearly Sane

  • Administrator
  • Hero Member
  • *****
  • Posts: 63199
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #5 on: July 19, 2024, 12:10:26 PM »
Hope they have a robust blackout/recovery process!
They are part of the infrastructure in many large companies, so the recovery process is also about those companies' processes.

Sebastian Toe

  • Hero Member
  • *****
  • Posts: 7684
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #6 on: July 19, 2024, 01:09:26 PM »
By "they" I meant all of those as well!

Nearly Sane

  • Administrator
  • Hero Member
  • *****
  • Posts: 63199
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #7 on: July 19, 2024, 01:15:55 PM »
When it was mentioned earlier this morning that the govt was working on the support for this, I have to note that I felt relieved it was this one and not the last one.

splashscuba

  • Hero Member
  • *****
  • Posts: 1955
  • might be an atheist, I just don't believe in gods
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #8 on: July 20, 2024, 08:44:12 AM »
When it was mentioned earlier this morning that the govt was working on the support for this, I have to note that I felt relieved it was this one and not the last one.
Not sure what support the government could have offered here. It'll be down to Crowdstrike to push an update and individual IT departments etc. to recover servers and PCs. Small businesses would use whatever support Crowdstrike offers, or Google/YouTube, to resolve it.

Nearly Sane

  • Administrator
  • Hero Member
  • *****
  • Posts: 63199
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #9 on: July 20, 2024, 10:23:06 AM »
Not sure what support the government could have offered here. It'll be down to Crowdstrike to push an update and individual IT departments etc. to recover servers and PCs. Small businesses would use whatever support Crowdstrike offers, or Google/YouTube, to resolve it.
There's the impact on govt depts and agencies, e.g. the NHS, and communication about the problems, and areas like banking payments where impacts could involve govt decisions.

Stranger

  • Hero Member
  • *****
  • Posts: 8236
  • Lightly seared on the reality grill.
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #10 on: July 20, 2024, 03:09:10 PM »
I was actually a little surprised at how many relatively simple devices seem to be using Windows at all. I mean, it's a massive overkill for a point of sale terminal, for example, and unnecessary complexity leads to unreliability.

About a decade ago, I know that one company (probably shouldn't mention which) was busy porting a prototype self-driving car system (obviously far more complicated than a POS terminal) from Windows to Linux, so it could eventually be cut down further.

I suppose standard Windows is cheaper to develop on, and computing power is relatively cheap too (compared to then), but we've now seen the consequences...
∃x(∅ ∈ x ∧ ∀y(y ∈ x → y ∪ {y} ∈ x))

Roses

  • Hero Member
  • *****
  • Posts: 7947
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #11 on: July 21, 2024, 12:19:55 PM »
Have any of you been to the supermarket since this problem began, and were you able to pay by card?

I do my weekly Tesco supermarket shopping at about 7am on a Monday. When I go tomorrow I hope I can use my debit card.

Stranger

  • Hero Member
  • *****
  • Posts: 8236
  • Lightly seared on the reality grill.
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #12 on: July 21, 2024, 12:37:40 PM »
I do my weekly Tesco supermarket shopping at about 7am on a Monday. When I go tomorrow I hope I can use my debit card.

According to the BBC, Tesco was operating normally on Friday, so clearly weren't affected much, if at all. Some other supermarkets had problems, but it looks like all the bigger ones have recovered.

Roses

  • Hero Member
  • *****
  • Posts: 7947
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #13 on: July 21, 2024, 01:47:27 PM »
According to the BBC, Tesco was operating normally on Friday, so clearly weren't affected much, if at all. Some other supermarkets had problems, but it looks like all the bigger ones have recovered.

Thanks, Stranger.

jeremyp

  • Admin Support
  • Hero Member
  • *****
  • Posts: 32012
  • Blurb
    • Sincere Flattery: A blog about computing
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #14 on: July 21, 2024, 02:29:45 PM »
I was actually a little surprised at how many relatively simple devices seem to be using Windows at all. I mean, it's a massive overkill for a point of sale terminal, for example, and unnecessary complexity leads to unreliability.

About a decade ago, I know that one company (probably shouldn't mention which) was busy porting a prototype self-driving car system (obviously far more complicated than a POS terminal) from Windows to Linux, so it could eventually be cut down further.

I suppose standard Windows is cheaper to develop on, and computing power is relatively cheap too (compared to then), but we've now seen the consequences...

Windows is actually pretty reliable these days. In fact, a lot of its poor reputation stems from the old Windows 95 line, which wasn't very stable, or from buggy third-party drivers. The question is (as long as you have powerful enough hardware): do you target an operating system used by billions with the resources of Microsoft behind it, or the offering of some smaller company that can't respond to bugs as quickly?
This post and all of JeremyP's posts words certified 100% divinely inspired* -- signed God.
*Platinum infallibility package, terms and conditions may apply

Stranger

  • Hero Member
  • *****
  • Posts: 8236
  • Lightly seared on the reality grill.
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #15 on: July 21, 2024, 04:14:49 PM »
Windows is actually pretty reliable these days. In fact, a lot of its poor reputation stems from the old Windows 95 line, which wasn't very stable, or from buggy third-party drivers. The question is (as long as you have powerful enough hardware): do you target an operating system used by billions with the resources of Microsoft behind it, or the offering of some smaller company that can't respond to bugs as quickly?

Yeah, I'm well aware of how much better it is now than it was. Hell, I've been using PCs since they ran MS-DOS. Before that, the first time I had a computer on my desk at work, it ran CP/M.

I don't really disagree, it's just that it's still primarily designed as a desktop OS, and even my brand new Windows 11 laptop had a 'Blue Screen of Death' while I was setting it up, and I wasn't even doing anything non-standard on it (apart from tweaking the disk encryption in a way documented by Microsoft themselves, and it wasn't at that stage that it happened). It recovered with a single reboot, but nevertheless...

I'm also aware of how you can make software very much more reliable using the methodologies and practices used in the nuclear, aerospace, and (to an extent) automotive industries. That is, where software is safety critical and a bug can literally, directly kill people.

I'm not suggesting that it would be practical to develop every POS terminal to those standards, but we probably need to think more carefully about how things are developed and tested, depending on how critical the relevant systems are.

Apparently there is something that used to be called 'Embedded Windows' and is now 'Windows for IoT' that's designed for single dedicated devices, not sure if that was affected by this problem though. There is also a MS version called 'Azure Sphere', which is based on the Linux kernel. Again, not sure if this would be vulnerable.

It's also the case that Microsoft themselves sometimes completely fuck things up and don't offer proper solutions (as long as it's only a few customers who are affected). They did one update not so long ago (KB5034441) that changed the way the recovery partition was used, which meant that some machines (including my desktop) didn't have enough space in their WinRE partition to install it, with only the helpful message "Error 0x80070643". All they offered was a PowerShell script (with hardly any instructions) or manual instructions to fix it using the command prompt. How many users even know these command line interfaces exist?

Using a big corporation isn't a guarantee of reliability.

Anyway, more of a few musings and a bit of a rant. As you were...  :)

Nearly Sane

  • Administrator
  • Hero Member
  • *****
  • Posts: 63199
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #16 on: July 21, 2024, 05:06:49 PM »
The point here is surely that given the spread of Crowdstrike's impact then coexistence testing is enormously important?
« Last Edit: July 21, 2024, 05:28:58 PM by Nearly Sane »

Stranger

  • Hero Member
  • *****
  • Posts: 8236
  • Lightly seared on the reality grill.
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #17 on: July 21, 2024, 07:10:19 PM »
The point here is surely that given the spread of Crowdstrike's impact then coexistence testing is enormously important?

Yes, of course, but I imagine that is already part of their normal procedures. This has all the hallmarks of a massive cock-up that meant that either the wrong build was released, one that hadn't gone through the normal testing process, or some last-minute 'minor' change was made that had, err... unintended consequences.

splashscuba

  • Hero Member
  • *****
  • Posts: 1955
  • might be an atheist, I just don't believe in gods
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #18 on: July 22, 2024, 12:45:20 PM »
Yes, of course, but I imagine that is already part of their normal procedures. This has all the hallmarks of a massive cock-up that meant that either the wrong build was released, one that hadn't gone through the normal testing process, or some last-minute 'minor' change was made that had, err... unintended consequences.
It was a bit more nuanced than that. They wrote their code as a Kernal driver that was fully certified and tested, which it has to be to be allowed in the Windows Kernal (Ring 0) as you can imagine. What Crowdstrike did was circumvent this certification process for code updates by including code changes in a config file that the unchanged driver picked up. Apparently a config file with the wrong or no data was deployed which caused a kernal addressing issue by the driver.
Crowdstrike was using this loophole to circumvent a process intended to protect Ring 0 (kernal).
I imagine there will be some comeback and rolling back of this method of deploying code.
Deleting this config file in Safe Mode fixes the problem but obviously needs to be physically done and not remotely.
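
To illustrate the general point about a driver picking up a config file with "the wrong or no data", here is a minimal sketch of validating such a file before using it. This is not Crowdstrike's actual format or code: the `CFG1` marker, the header and entry layout, and the names `parse_config` and `load_or_keep` are all made up for the example.

```python
import struct

# Hypothetical file layout, invented for illustration only:
# an 8-byte header (4-byte marker + entry count), then 8-byte entries.
MAGIC = b"CFG1"
HEADER = struct.Struct("<4sI")  # marker, number of entries
ENTRY = struct.Struct("<II")    # (rule_id, flags), both hypothetical


class InvalidConfig(ValueError):
    """Raised when a config blob fails validation."""


def parse_config(blob: bytes) -> list[tuple[int, int]]:
    """Parse a config blob, rejecting empty, zeroed or truncated data."""
    if len(blob) < HEADER.size:
        raise InvalidConfig("file is empty or truncated")
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise InvalidConfig("unexpected marker bytes")
    expected = HEADER.size + count * ENTRY.size
    if len(blob) != expected:
        raise InvalidConfig(f"expected {expected} bytes, got {len(blob)}")
    # Every read below is now known to be inside the blob.
    return [
        ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size)
        for i in range(count)
    ]


def load_or_keep(blob: bytes, last_good):
    """Use the new config if it validates, otherwise keep the old one."""
    try:
        return parse_config(blob)
    except InvalidConfig:
        return last_good
```

The idea is simply that a bad file gets rejected and the last known good config stays in place, rather than the reader walking off the end of the data.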

jeremyp

  • Admin Support
  • Hero Member
  • *****
  • Posts: 32012
  • Blurb
    • Sincere Flattery: A blog about computing
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #19 on: July 22, 2024, 01:39:01 PM »
It was a bit more nuanced than that. They wrote their code as a Kernal driver that was fully certified and tested, which it has to be to be allowed in the Windows Kernal (Ring 0) as you can imagine. What Crowdstrike did was circumvent this certification process for code updates by including code changes in a config file that the unchanged driver picked up. Apparently a config file with the wrong or no data was deployed which caused a kernal addressing issue by the driver.
Crowdstrike was using this loophole to circumvent a process intended to protect Ring 0 (kernal).
I imagine there will be some comeback and rolling back of this method of deploying code.
Deleting this config file in Safe Mode fixes the problem but obviously needs to be physically done and not remotely.

On a slight tangent, your mis-spelling of "kernel" reminds me that Commodore once did the same thing with the Commodore 64 and it stuck. Fortunately, I don't think Crowdstrike is targeted at the C64.

splashscuba

  • Hero Member
  • *****
  • Posts: 1955
  • might be an atheist, I just don't believe in gods
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #20 on: July 22, 2024, 01:45:28 PM »
On a slight tangent, your mis-spelling of "kernel" reminds me that Commodore once did the same thing with the Commodore 64 and it stuck. Fortunately, I don't think Crowdstrike is targeted at the C64.
I had a C16. Was my first 'PC'

Stranger

  • Hero Member
  • *****
  • Posts: 8236
  • Lightly seared on the reality grill.
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #21 on: July 22, 2024, 02:51:53 PM »
It was a bit more nuanced than that. They wrote their code as a Kernal driver that was fully certified and tested, which it has to be to be allowed in the Windows Kernal (Ring 0) as you can imagine. What Crowdstrike did was circumvent this certification process for code updates by including code changes in a config file that the unchanged driver picked up. Apparently a config file with the wrong or no data was deployed which caused a kernal addressing issue by the driver.
Crowdstrike was using this loophole to circumvent a process intended to protect Ring 0 (kernal).
I imagine there will be some comeback and rolling back of this method of deploying code.
Deleting this config file in Safe Mode fixes the problem but obviously needs to be physically done and not remotely.

Thanks for the extra detail, but it doesn't really change the point that the updated config file should have been tested with the existing code. I would be amazed if that wasn't supposed to have been done according to their normal release procedure.

jeremyp

  • Admin Support
  • Hero Member
  • *****
  • Posts: 32012
  • Blurb
    • Sincere Flattery: A blog about computing
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #22 on: July 23, 2024, 11:24:32 AM »
It was a bit more nuanced than that. They wrote their code as a Kernal driver that was fully certified and tested, which it has to be to be allowed in the Windows Kernal (Ring 0) as you can imagine. What Crowdstrike did was circumvent this certification process for code updates by including code changes in a config file that the unchanged driver picked up. Apparently a config file with the wrong or no data was deployed which caused a kernal addressing issue by the driver.
Crowdstrike was using this loophole to circumvent a process intended to protect Ring 0 (kernal).
I imagine there will be some comeback and rolling back of this method of deploying code.
Deleting this config file in Safe Mode fixes the problem but obviously needs to be physically done and not remotely.

This is a slight mischaracterisation of what happened. See this technical note from Crowdstrike:

https://www.crowdstrike.com/blog/falcon-update-for-windows-hosts-technical-details/

The config file is really analogous to the virus definition files that AV software uses. Their use is not a "loophole" but a necessary feature to enable the vendor to keep pace with all the new methods of attack that are being discovered daily.


Nearly Sane

  • Administrator
  • Hero Member
  • *****
  • Posts: 63199

jeremyp

  • Admin Support
  • Hero Member
  • *****
  • Posts: 32012
  • Blurb
    • Sincere Flattery: A blog about computing
Re: Mass worldwide IT outage affects airlines, media and banks
« Reply #24 on: July 25, 2024, 01:55:29 PM »
Not great at PR

https://www.bbc.co.uk/news/articles/ce58p0048r0o

Apparently the reason why many people found it didn't work is that Uber Eats detected the flurry of redemptions as unusual activity and ironically assumed it was a cyber attack.

And yes, if you've worked a 24 hour shift rebooting all your Windows servers, a $10 voucher as recompense doesn't really cut it.