I'm going to go ahead and take issue with the claim that you were the first to come up with microarchitectural attacks, and that their story begins in 2004:
- Dan Page published [1] in 2002, describing an attack on DES exploiting cache timings. In 2003, he published [2] describing countermeasures to this class of attacks.
- Concurrently, a Japanese team also attacked DES (and MISTY1) with cache timing [3, 4].
- Dan Bernstein published the first version of his AES cache timing attack in late 2004 [5].
Thanks for writing this. I didn't like the opening paragraph either, and your comment is a better rebuttal than I could have written.
Colin, I think this is a good perspective on Spectre and Meltdown, but it feels like the entire opening paragraph was an unnecessary attempt at a "me too" that implies you're the modern father of these sorts of attacks when that isn't really the case.
I understand you're establishing credibility and expertise (and you have both), but it's disingenuous to begin with "The story of these attacks starts in late 2004" and then go on to describe your own work. I think the rest of your post has its own utility without writing a narrative of these attacks that inserts you at the beginning of them.
I spent a long time wondering about this. Honestly, I think I am (along with Osvik and Tromer) the modern father of these sorts of attacks. (See my reply to pbsd for why.)
I think people in the field recognize my credibility and expertise here, and for them that paragraph is superfluous; but I was aiming this blog post at a wider audience (hence the effort spent on non-technical analogies to help them understand the issues) so I thought it was important to explain my background to people who had never heard of side channel attacks before.
If you're writing for a broader audience, then all the more reason to omit this history. It's like hearing about Darth Vader's childhood; people just want you to get to the good part.
I like to think of it this way: readers from a general audience have a certain mental budget for understanding that they bring to an article like this. Once they exhaust that budget, you've lost them. So it helps to take special care to spend it on your key points.
It also, if I can be frank, makes you look hungry for credit, which you'll get more of if you play it cool.
Fair enough. My personal writing style tends towards the narrative "start at the beginning and tell the story in chronological order" but I certainly understand the desire to "get to the good part".
If I ever write a blog post about scrypt and all the work which has come after it, maybe I'll skip the lengthy analysis of how Tarsnap customers inspired me to investigate the topic of password-based key derivation. :-)
I have no background in computer architecture security or security at all, and appreciated the background of how he saw his work fit into the current day. Especially so because cperciva's original paper was being widely shared on HN in the ensuing Spectre/Meltdown threads.
Another straightforward way to handle this is to bury the lede a little bit, and begin by briefly describing the previous cache work. That saves you from having to make an elaborate explanation of your specific claim on the history here. "Show, don't tell".
That's a much more defensible claim---one I have no issue with---but it was easy to misread the post and believe it was claiming something more general.
Thanks! It's getting late here (3 AM) so this might not be the best wording, but would it help if I added a note after the third paragraph along the lines of "Note that there have been previous side channel attacks which depended on how microarchitectural features (usually caches) affected code execution; but my work was the first to demonstrate information leaking from a program into the microarchitectural state and then being extracted from there." ?
(Feel free to suggest other wording too -- as I said, it's getting late here and my word-putting-together skills are currently subpar.)
That seems fine on its own, but it somewhat undermines the paragraph with "but they have all followed the same basic mechanism", seeing as these earlier attacks relying on caches etc did not follow this mechanism. Changing "all" to "most" or "many" would probably suffice.
I think Meltdown can best be characterised as being the same process that leaks and retrieves the information. The bug is that a process is able to leak data into the micro-architectural state that it isn't allowed to read directly.
Fair enough. The point remains, information is leaking into the microarchitectural state and then being extracted from there -- quite different from earlier attacks which simply exploited the fact that the microarchitecture resulted in certain operations during a cryptographic computation being faster or slower depending on the data being handled.
But to be honest, whether something was published 14 or 17 or 36 years ago... is kind of interesting, but only mildly so.
How about this question: was Intel aware of these publications?
If they were, then... what happened?
Or maybe they were not aware, or maybe someone was, but failed to communicate it to the right people, or maybe it was assigned JIRA x2399827348 and forgotten.
Intel was absolutely aware of microarchitectural side-channel attacks before Spectre and Meltdown; an Intel employee (Onur Aciicmez) published the first branch predictor side channel paper (at least that I'm aware of).
What would it mean for a corporation of this size to be aware of a highly specialized finding in the research literature? Intel has 100k+ employees, so it's quite likely that an Intel employee has read this paper, but what kind of effort and incentive does it take to successfully escalate this kind of thing to high enough that the ship turns?
edit: apparently I just restated the latter part of your post, nevermind :)
I don't know. What would it take for an HN commenter to continue reading an HN thread before making suppositions about what Intel was and wasn't keeping track of?
I already countered him on microarchitectural covert channels. This goes back further, to the VAX Security Kernel (1992) being designed to certify to A1-class, which mandated covert-channel analysis that regular "secure coders" didn't do. INFOSEC pioneers had already found them in software and in hardware like disks. A person on that project reported them for cache timing as well. Aside from the Trojan model, they also described them back then as inherent design flaws where shared resources leak details about one computation to another. In other words, cperciva's model. They talked about the Trojan model mostly because solving the threat of the superset model, subversion, solved the other one as a side effect. Knocking out all backdoors and leaks was what the culture of the time highlighted the most. I pointed it out, plus some follow-up commentary, here:
I didn't respond to the later nonsense dismissals on HN and elsewhere about the Intel CPU Security submission since I had surgery shortly after that on an impacted tooth. Didn't want to be online all drugged up talking about these things. ;) Suffice it to say, high-assurance security had already found piles of risk areas for both penetration and side channels in Intel CPU's with some attempts at mitigation (including avoid Intel CPU's) by the mid-1990's. They encouraged Intel, purchasers, and security community to deal with it as part of routine work in improving security.
As usual, the mainstream security community just ignored everything they said when they bumped into each other. It's not like high-security folks stopped trying to tell them about prior successes and problems:
Then, a bright researcher independently discovered the side channels in caches later. They started reacting to that claim. They found some similar issues in other stuff looking narrowly. Now, we have another clever attack that started with a shared resource as would've been identified in the 1992 methodology that got stretched in really creative ways. It was still same root cause they ignored or justified for things like lowest price/performance versus alternatives doing it securely. Or just physical separation of different security domains which highest-security setups stuck with grudgingly.
My prediction in one of the Lobsters comments was that piles of comments would happen about this that didn't involve actually solving it (social gratification); more people would similarly write articles boasting their understanding to generate extra rep, since talking about problems is rewarded in mainstream INFOSEC more than preventing/solving them; a few mitigations would show up that were narrowly focused on just this new kind of problem (as happened with caches); they'd still ignore prior work in high assurance, like Kemmerer's or Wray's analyses that found similar problems 20+ years before by analyzing the whole system; they'd mostly ignore the new work on information-flow analysis (some in link below); and we'd at best get some time until the next problem that could've been prevented by 1990's or recent methods, since that's how the mainstream security industry and culture works.
They're right on track since that's about all I saw while in recovery for a week. Endless articles using the buzzwords to their advantage plus people who don't know we could've beaten this in the 1990's because security professionals in industry suppress that knowledge for some reason. That needs to stop. At least they're rediscovering 60's-90's knowledge at an accelerated rate now.
Your argument here being that a mysterious group of old-school TCSEC people understood microarchitectural side channel attacks years ago, so much so that we could have eliminated them in the 1990s, but chose to say nothing about them --- until now.
My rebuttal would be that if these people had discovered cache timing attacks (and not covert channel issues that weren't directly relevant to the x86 server threat model, but are six-degrees-of-Kevin-Bacon away from cache timing), they would have discovered cache timing attacks. It's a little like those people who keep saying that if we only had length-delimited strings instead of ASCIIZ, we'd have no memory corruption.
"Your argument here being that a mysterious group of old-school TCSEC people understood microarchitectural side channel attacks years ago, so much so that we could have eliminated them in the 1990s, but chose to say nothing about them --- until now."
There you go misrepresenting and slandering folks again. What I've said since I joined HN is that a group of people in military, private companies, and academia did the following: invented INFOSEC, did assessments/pentests, and tried to build clean-slate systems that were secure; regularly published that stuff in security publications and/or conferences [1][2]; built products with those methods [3][4][5]; put it in computer security texts to teach the next generation [6]; led others to do the same [7][8][9]. This large group of people doing tens of millions of dollars of research, development, education, and outreach over a 40-50 year period was anything but a "mysterious group." They were a group you and a lot of other people were ignoring, dismissing, mischaracterizing, and so on. That's an entirely different thing, one that says more about you and others that ignored them than about the people working hard to secure our systems, publishing how to do it for everyone's benefit for decades on end.
I find it interesting that many in INFOSEC ignore or trash talk their pioneers so much. These people invented the field, made the landmark contributions, and addressed many fundamental problems from early 60's to early 90's. Mainstream INFOSEC and hacker culture seems to start giving credit from about 90's on for just specific kinds of attack and defense publications in specific places. A few, rare exceptions exist but it's the general rule. That's like aeronautical engineers not recognizing Wright Brothers, computer scientists ignoring Turing, electrical engineers ignoring work in boolean logic, mathematicians ignoring algebra/calculus, and so on. While ignoring that, they claim to be concerned about the same things wanting solutions to those problems. Then, when pioneers' solutions are shown to them, they keep insisting there were no pioneers, no solutions, or claim they were ineffective without reading their work. Then, they give credit to some like themselves for re-solving a tiny part of the same problems. It's a disgraceful thing to do with no rational or moral justification, esp as computer security suffers for it. So, I'll continue calling it out for as long as I see any of you do it. "Mysterious group..." Perfect example...
"My rebuttal would be that if these people had discovered cache timing attacks (and not covert channel issues that weren't directly relevant to the x86 server threat model, but are six-degrees-of-Kevin-Bacon away from cache timing), they would have discovered cache timing attacks. "
My argument is that the security assessment of a major project contained a covert channel analysis, that they reported [10] the cache leaked secrets via a timing channel, that they kept finding more shared resources throughout hardware, and that those needed to be mitigated. The predecessors I mentioned above had invented covert-channel analysis techniques to spot leaks. They used them. Previous work found them in software and hardware, but not CPU's. To make it extra clear, the work I cited found (adding emphasis) a "timing channel" using the "CPU cache" that could "leak secrets" across security boundaries. Later in 2005 or so, Colin notices a "timing channel" in the "CPU cache" that can "leak secrets" across security boundaries. That's the same thing. Timing channels via CPU cache. It's even modeled the same way in a security lattice.
Starting in 1992, it was published, discussed by members of the field, put in a patent by one, and that noted "cache-related" channels as a general problem to always look for. The field then knew that caches leaked secrets unless they were designed not to. We assume that about all of them per TCSEC rules until standard, mandatory analysis shows they don't have the weakness. Intel contains a CPU cache that's not designed for security, has code using secrets, and might run hostile code. Intel's cache therefore can or does leak secrets to hostile code. The end.
They didn't stop there. Another group in security field analyzed Intel CPU's [11] in mid-1990's. It identified numerous risk areas for both penetration and covert channels. It includes a whole section, "Cache and TLB Timing Channels," that warns of their existence referencing Wray's 1991 work. Throughout the paper, it argued the x86 CPU's had numerous security weaknesses with some obvious attacks and more coming. With all that said, the status quo was to not use x86 for secure systems unless absolutely forced to by the market because they were insecure. Those that did have to build on x86 attempted timing channel mitigations but they didn't work. It has to be fixed at the root cause. So the very work you dismissed here on HN had the x86 side channels in caches you just asked me for. I wonder if you even read it.
While folks like you ignored that research, those paying attention freaked the hell out. Caches are everywhere. Turning them off made performance horrible. After software mitigations failed, the solution was fall-back to separate CPU's for trusted and untrusted respectively. Many in military stayed with fully-separate hardware. After it became popular again in 2000's, partitioning and masking caches were designed by academics hoping to stop the leaks. Big vendors didn't use them. Mainstream security sector ignored them so no crowd-sourcing was going to happen. Now, we have The Big One hitting that starts with cache problems warned about in general by 1992 with x86-specific warnings by 1995. And well-known members of the security community of 2018 are still pretending that work didn't happen. Fortunately, there's CompSci folks that stay building on the prior work you thought didn't exist with recent attempts at cache protection having a lot better penalties than it did in the 1990's. Maybe someone will build it into a RISC-V or something later if we're lucky.
In the meantime, you might want to study the prior methods that seemed to find a lot of problems and solutions you later talked about or were looking for. You can bet those lines of research have found and solved even more problems than that. It's how I keep citing old work when people want solutions to "current" problems. I'm willing to learn about problems and solutions from anyone in any style of INFOSEC if it will protect our systems. Try reading pioneers' work instead of dismissing or smearing it sometime. Who knows: you might learn how to solve another root problem with some clever approach, theirs or yours building on it, that I'll be citing in the future. I hope so since we can use all the eyes we can get.
I think you're standing pretty far outside the mainstream of academic cryptography when you suggest that covert channels are the "the same thing" as side channel coerced extraction of secrets, but then, maybe all the academic cryptographers are misrepresenting and slandering the pioneers too.
That's just evolution of terms over time. Those that published original work in INFOSEC called leaks through shared resources covert channels. Kemmerer and Wray devised methods for identifying storage and timing channels respectively. Since it fit the definition, those that discovered cache-based timing channels called them covert, timing channels. Since it fit and it was their discovery, I call them what they called them. It's only fair to let the inventor decide, esp. if using the same term as prior versions of a concept.
Later, people re-discovering these things wanted a new term for the passive ones. They're still covert channels if we use the already-established definitions, since they're unintentional leakage through shared resources between a secret container and an observer. At least, that's what some of us argued, with others disagreeing. The other side won, with the majority of the field starting to use "side channels." So, it wins out just because it's popular. It's kind of how "programmer is in control" is called the "C philosophy" even though Thompson got it from BCPL. C and UNIX momentum are what people remember until I tell them about Richards actually inventing the key concepts.
No slander, misrepresentation, or anything there. Just people missing or trying to change definitions in a field. That's an annoying, normal part of human language. It would be different if they said prior researchers didn't find leaks in caches, didn't improve security with their certification standards, and other provably-false claims. Those would be ignorance or slander. See the difference?
> Google discovered a problem and reported it to Intel, AMD, and ARM on June 1st. Did they then go around contacting all of the operating systems which would need to work on fixes for this? Not even close. FreeBSD was notified the week before Christmas, over six months after the vulnerabilities were discovered.
Absolutely shitty behavior by Google, AMD, Intel and ARM. Notifying the FreeBSD devs so late is a slap in the face to the FreeBSD community and makes the internet as a whole less safe.
I am now looking forward to the time a FreeBSD (or Linux) developer finds a complex-to-mitigate architectural bug, only pushes it to their kernel, and the world burns down around us because of this petty shit (including these posters, Google and hardware vendors in this). So thank you for that.
Linux as a whole? I doubt it; but I can definitely imagine individual Linux distributions being contacted and asked to agree to strict terms (including "no patches get shared") before they are given details of future issues. Nobody wants to cut Linux out of things, so there's going to be an inclination to err on the side of "well maybe they didn't understand what the embargo meant...".
Somebody clearly wanted to cut BSD out of things... And succeeded.
Also, while the Linux devs failing to honour the embargo ended up being a quite public and problematic thing, I _seriously_ doubt some of the other OS vendor dev teams didn't talk about the problem out of school - and I strongly suspect state-actors and blackhats have pipelines of information on security problems being worked on at Microsoft, Apple, Google, and many other big players. Overenthusiastic devs at local bars/cafes, boastful or bignoting devs online, or disgruntled employees/ex-employees, they are almost certainly targeted by both blackhat and state level actors.
I'm pretty sure a security problem of this magnitude, requiring remediation work by this many different teams, work which took six months or more of planning and executing, could not _possibly_ have stayed completely unknown externally right up to the agreed embargo date. People just don't work that way.
If that's what it was, they were both mixing up FreeBSD and OpenBSD and had a time machine.
OpenBSD has been clear in the past about not respecting embargoes, however. That's their right (Linus takes the same position) but I would completely understand people deciding to not notify OpenBSD as a result.
But as far as I'm aware, FreeBSD has an exceptional track record when it comes to respecting vulnerability embargoes.
This is false: OpenBSD agreed to the initial embargo on KRACK, the author wanted it extended, OpenBSD asked if they could publish a silent fix/patch, the author agreed, then later reneged, OpenBSD published it anyways, then the author got pissy.
The idea that OpenBSD broke a bug embargo is misinformation.
The analogies with the visa/passport are good, except the one about Meltdown:
Because she doesn't have a North Korean visa, she (somehow) checks the expiry date on someone else's North Korean visa, and then (if it is about to expire) runs out to renew it
Maybe try this version: she calls the embassy to ask if she needs to renew her visa (which she doesn't have). The clerk, who has access to all the visas ever issued, tells her that no one has a visa that expires in the next few months, so she is good. This way, you find out about other people's expiration dates.
I agree that analogy makes more sense, but I wanted an analogy which involved my girlfriend doing something impossible -- because a process accessing memory it doesn't have permission to access is supposed to be impossible.
These side-channel attacks -- which, as illustrated, are not new -- are becoming perceived as more acute because more so than before we're intermingling processes with vastly different purposes, responsibilities, and origins, on the same hardware. When you do this, it's exceedingly difficult to cloak intrinsic attributes of execution, such as cache timing, or even execution timing, from other contexts of execution that have access to observe enough facets of the system.
This applies at all levels of reading, really. We're mixing the trafficking of sensitive authentication information in the same process as code that executes instantly upon clicking on hyperlinks. We're mixing business and pleasure on the same filesystem, in the same OS and memory space. We're mixing mutually untrusting entities' private data and code on timesharing systems that we're renting in someone else's datacenter.
Process boundaries, both in terms of physical reality and design choices on what constitutes a system boundary, are critical, and will need additional thought. At the same time, a boundary that's more robust to side-channel attacks than most is having all the hardware exclusive to yourself.
Also, this entire issue neatly demonstrates the problems with executing untrusted code. Solving the issue of vetting code and evaluating trust, especially by average users, is extremely difficult; but the current culture of the Web largely consists of running untrusted code immediately from a single user-entered string, or through automatic or manual navigation thenceforth. This is a frightening proposition, and yet entirely mainstream today.
This problem is made worse by technologies like JavaScript, which resemble system programming languages in the lack of restrictions placed on untrusted code. JS code can create threads, sockets, timers, perform I/O, loop or allocate forever. This is complete overkill for something that is mostly needed for interactive websites. Web Assembly doesn't fix this problem either. JavaScript has killed accessibility for the web and created numerous security problems. I don't want it and try to block it when I can.
In section 5.2 of Bernstein's paper, he discusses isolating single-source transformations. His example is locking down jpegtopnm using standard Unix mechanisms so it can only read stdin and write stdout and do absolutely nothing else. It seems this pattern really should be one that is trivial for the programmer to use, without having to go through the process of managing a pool of UIDs, remember about setting process limits, etc. Then it might be used correctly more often. Would be a useful building block for lots of code, especially if it were portable across many versions of *nix.
This is basically what Capsicum (FreeBSD + Linux) and pledge (OpenBSD) are for. I agree that it would be nice to have a standard interface, but the functionality provided by those is sufficiently different that it's hard to imagine how a single API could be created which usefully covers both.
The reason I thought it was so important to make this effort is that this bug was in the news for days, everybody is going to do upgrades in one way or another, and even pay the performance cost, yet most of the people out there will not understand what it is about. I have the feeling that this disconnects non-technical people from technology in a very bad way. That's the reason I believe that popularization is so crucial.
The particular code path which I found was leaking information was fixed in OpenSSL. Some operating systems (including FreeBSD) turned off HyperThreading by default for a while to allow software authors to fix their code, and then re-enabled it again a few years later.
Intel stopped using HyperThreading for a while (it was in the Pentium 4, but that architecture was abandoned), and also made some changes to the caches which they claimed would help, but they never provided any details and I've never seen any verification of their claims in that regard. (Obviously whatever changes they made weren't enough to prevent the Spectre and Meltdown attacks conveying information through the cache!)
So the best summary is that people threw up their hands and said "yeah, it's a problem, let's hope that nobody writes code which leaks any sensitive information into the microarchitectural state". Not exactly a good position to be in...
> For my Tarsnap online backup service I compile and cryptographically sign the packages on a system which has never been connected to the Internet. Before I turned it on for the first time, I opened up the case and pulled out the wifi card; and I copy files on and off the system on a USB stick.
How can you be sure the system you copied those files from is not compromised, the micro-controller on the USB stick has not been compromised, and now it's not going to take control of the build system and replace the compiler chain with a modified one which will make all your software vulnerable?
Good write up. The following blurb got me thinking:
"In the industry we refer to "airgapped" systems; this is a reference back to the days when connecting to a network required wires, so if there was a literal gap with just air between two systems, there was no way they could communicate."
If we know the Internet is a scary place why do nuclear plants, power plants, etc. have Internet connectivity?
Because it's the most efficient way to communicate with the corporate headquarters and clients/customers? These places are fundamentally businesses and they have the same communication needs as any other.
If you're worried about servers being cracked, the best thing is to avoid shared cloud servers and use dedicated private servers.
These attacks rely on some other user-process malware running on the server to gain information via side channels. If every process in your server is your own, for your web services or whatever, you don't have to worry about it.
As usual, nothing beats dedicated private servers.
If you plan to run one VM on one physical machine, perhaps you don’t have to worry about your neighbor reading your data. But a malicious program running on your machine CAN still exploit the bugs. In fact, one can exploit the browser via JavaScript by taking advantage of Spectre. There is no way to avoid the vulnerabilities until you have patched your system and your software. The three bugs we know of are not a cloud vs. dedicated choice.
I was illustrating the point that these bugs are not limited to a cloud vs. dedicated server service model. I don’t think OP would risk not patching their dedicated server, which echoes my point that these bugs are just as dangerous in the dedicated-server model too.
I've read the papers and lots of articles on the subject and still can't figure out how I know where I land using Spectre variant 1. I know I can read some byte in memory, but do I know where exactly it comes from? I know that I can increase the x value so that I can read other values, but up to which point? Thanks
[1] https://eprint.iacr.org/2002/169
[2] https://doi.org/10.1016/S1363-4127(03)00104-3
[3] https://link.springer.com/chapter/10.1007/978-3-540-45238-6_...
[4] https://web.archive.org/web/20060906064630/http://web.engr.o...
[5] https://cr.yp.to/antiforgery/cachetiming-20041121.pdf