Dan Lorenc, CEO of Chainguard, a software supply chain security company, joins SE Radio editor Robert Blumen to talk about software supply chain attacks. They start with a review of software supply chain basics; how outputs become inputs of someone else’s supply chain; and techniques for attacking the supply chain, including compromising the compilers, injecting code into installers, dependency confusion, and typo squatting. They also consider Ken Thompson’s paper on injecting a backdoor into the C compiler. The episode then considers some well-known supply chain attacks: researcher Alex Birsan’s dependency confusion attack; the Log4Shell attack on the Java Virtual Machine; the pervasiveness of compilers and interpreters where you don’t expect them; the SolarWinds attack on a network security product; and the Codecov attack, in which the installer was compromised with code that exfiltrated environment variables. The conversation ends with some lessons learned, including how to protect your supply chain and the challenge of dependencies with modern languages.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.
Robert Blumen 00:00:17 For Software Engineering Radio, this is Robert Blumen. Today I have with me Dan Lorenc. Dan is the founder and CEO of Chainguard, a startup in the software supply chain security area. Prior to founding Chainguard, Dan was a software engineer at Google, Disqus, and Microsoft. Dan, welcome to Software Engineering Radio.
Dan Lorenc 00:00:42 Thanks for having me.
Robert Blumen 00:00:43 Today, Dan and I will be discussing attacks on the software supply chain. We have some other content in this area, number 498 on CD, 338 on Jenkins, and several others on CD that you can see in the show notes. This episode will be all gloom and doom, but don’t despair, we will publish another one later this year about securing the software supply chain. There’s so much here to talk about. I wanted to do an entire episode on attacks. Dan, before we get started, is there anything else you’d like listeners to know about your background that I didn’t cover?
Dan Lorenc 00:01:25 No, that was a pretty good summary.
Robert Blumen 00:01:27 Okay. We have covered this before, but let’s do a brief review. When we’re talking about software supply chain, what are the main pieces?
Dan Lorenc 00:01:37 Yeah, so software supply chain is very similar to a physical one. It is all the other companies, people, individuals, communities responsible for taking all of the dependencies and other systems that you use to build your software; getting those to you, keeping them up to date, keeping them secure and letting you use them in the course of your development of your software. And then the downstream side of that as well. We’re all in this massive software supply chain together. Nobody is building code on an island. Nobody’s building code by themselves. So most people working on software are somewhere in the middle of that chain. So all of your consumers, all of those people taking and using your software in their day to day life. That’s how I think of the software supply chain.
Robert Blumen 00:02:16 If I understand, then there are parts that you run, like perhaps a build server. There are dependencies that you pull in and then if you publish software or an API, you become part of the supply chain for other people. Did I get that right?
Dan Lorenc 00:02:31 Yep. Yeah, that’s a great summary.
Robert Blumen 00:02:33 What is the attack surface of the supply chain?
Dan Lorenc 00:02:37 It is massive, right? So it’s all those groups, all those systems, all those companies, all those build servers, all those organizations involved in getting you your code that you use, getting you your dependencies and your libraries and your services. Any one of them can be attacked. So the attack surface is absolutely massive.
Robert Blumen 00:02:53 As I’ve been reading about this, it seems that certain things tend to get mentioned a lot, one of them being Jenkins and another one being NPM. Am I making somewhat of a biased or disproportionate reading with the literature, or are those really the points that people are attacking the most?
Robert Blumen 00:04:07 I found a report from Sonatype called “state of the software supply chain.” According to this report, software supply chain attacks have increased 650% and are having a severe impact on business operations. Some attacks reportedly have caused billions of dollars of damage. Why have attackers turned their attention to the supply chain in recent years?
Dan Lorenc 00:04:32 Yeah, I think there’s no clear commonly accepted answer here. I have my pet theory and some folks have shared it, but these aren’t new, right? Sonatype is picking up these trends and the trends are new, but software supply chain attacks aren’t very new. They go all the way back to the early eighties, actually. The first one that I found was from Ken Thompson’s famous paper “Reflections on Trusting Trust,” which we can talk about more later if you want. We’ve known about these for going on 40 years, but what we are seeing now is attackers actually targeting them. The best answer I’ve heard for why now is a combination of a few factors, but the biggest one is that we’ve finally just gotten good enough at locking down and applying basic security hygiene everywhere else. Attackers are lazy on purpose. They take the easiest way in when they want to target an organization.
Dan Lorenc 00:05:16 Supply chain attacks haven’t gotten much easier. They’ve gotten a little bit easier with the rise of open source and the more interconnected web of services that we’re using today, but not markedly easier. They’ve become much easier in comparison to all of the other methods, though. We’re finally using SSL everywhere across the internet. If you look back 5 or 10 years, we weren’t quite at that level of ubiquity. MFA is finally taking off, even though it’s been slow and somewhat controversial in some circles. Strong password hygiene is spreading, too. Before all of these things, there were much easier ways to attack, such as basic phishing campaigns. But as we’ve gotten good enough at preventing these other methods of intrusion, the supply chain becomes more attractive relatively.
Robert Blumen 00:05:55 Is it possible to generalize what are the intentions of the attackers, or is supply chain simply a mode of attack and the usual reasons may not have changed?
Dan Lorenc 00:06:08 Yeah, I don’t think there’s anything new about the motivations here. We’re seeing all the same usual suspects performing supply chain attacks: nation states, cryptocurrency mining, ransomware, all of the above.
Robert Blumen 00:06:22 How are supply chain attacks detected?
Dan Lorenc 00:06:25 The interesting part about supply chain attacks is that there’s no one type of attack. It’s a whole bunch of problems, like we talked about. It’s a whole bunch of different attack points because the attack surface is so large, so all the attacks look very different. If you look back just over the last couple of years, the two most famous examples that got the most headlines were the attack on SolarWinds, back at the end of 2020, in which that company’s build system was compromised, and then obviously Log4Shell, or Log4j, at the end of the following year. These two are both categorized as supply chain attacks. People keep saying we need to improve supply chain security to prevent issues like these, but when you actually zoom in, they’re completely different.
Dan Lorenc 00:07:03 It’s not even really fair to categorize Log4Shell as an attack. It was just a bug that was left sitting around in a widely used code base for a decade that nobody knew was there. When it was found out, then attackers tried to escalate it; the bug itself wasn’t any kind of attack. So yeah, I don’t think there’s an easy answer for fixing these or detecting them. They’re all very different. The basic patterns of intrusion detection are things that you would use to detect something like the attack SolarWinds faced, whereas with Log4Shell, it’s about asset inventory, static code analysis, SBOMs, and understanding what code you’re running so you can apply upgrades faster. So they’re all very different.
Robert Blumen 00:07:40 In reading about this area, many of these attacks were discovered in some cases years after the intruder had penetrated the network. Do you think that’s characteristic of supply chain attacks, or that could equally well be said of all the other attacks that exist on networks?
Dan Lorenc 00:08:01 I think it depends. A lot of the attacks that we’ve seen get detected, like the SolarWinds one, for example, weren’t detected until after the exploit was triggered. This was a piece of malware that was smart enough to sit around and wait for a while before doing anything. So that made it hard to detect until it actually started misbehaving. If it hadn’t had that timer built in, it would’ve been detected a lot quicker. Take the Log4Shell example (jumping back to what’s not really an attack, quote-unquote): that bug was present for a decade, and then all of a sudden once it was found, researchers went and found a whole bunch of similar ones nearby, which caused the fix rollouts to be a little bit slower. So it’s possible somebody knew about the exploit earlier and just didn’t use it or didn’t share it, so it remained hidden. So yeah, I don’t think there’s anything remarkably different about supply chain attacks in general, but there are certain ones that can lurk around for a lot longer.
Robert Blumen 00:08:53 You mentioned SolarWinds, Log4Shell. I do want to come back in a bit to talk about some of the more well-known attacks. I want to talk briefly about some of the techniques that are used. As you pointed out, supply chain is not a technique, it’s a part of the system that can be attacked many different ways. I have a list here of about 10 or 12, but maybe you could start with your list. What are some of the top techniques or attack vectors that are used to attack the supply chain?
Dan Lorenc 00:09:27 Yeah, the easiest way I like to frame this is by looking at the steps in a supply chain because they’re all attacked and they’re all attacked pretty commonly. You start out with that classic “shift left” philosophy. So if we start out left, where left is developers, developers get attacked, individual ones; they’re outside of your company working on open-source packages or inside of your company. That’s a whole other angle known as insider threats. But if developers’ passwords get compromised or their laptops get stolen and they happen to be maintainers of a large project on, say, PyPI or NPM, now malicious code can get uploaded there. We see stuff like that happen very commonly, and that’s why registries like PyPI from the Python Software Foundation and NPM are now rolling out mandatory multifactor authentication to help protect against those threats, because we do see them, whether it’s phishing or targeted attacks.
Robert Blumen 00:10:16 Let’s drill down into that a little bit. Somebody gets the laptop of a developer who commits to a well-known Python repository. Now they would be able to commit something that shouldn’t be there into the repository. Walk us through the steps, how that results in an attack on some other part of the ecosystem.
Dan Lorenc 00:10:37 Sure, yeah, there’s a couple different ways this can happen. Say somebody’s a maintainer of a package directly, on PyPI, for example. One thing people don’t quite realize with open-source code in most of these languages is that you don’t consume the code directly from the Git repository. You can, but it’s a lot of extra work and isn’t necessarily encouraged or easy. Instead, most people consume this intermediate form called a package. So if you’re a Python developer, you write your code on GitHub, let’s say, and then you turn that into an artifact. You don’t really compile it, but you package it up into a wheel, or a zip file, or something like that, as they’re called in Python. And then you upload that to the Python Package Index and people download that. And so, if you’re compromised, depending on exactly what permissions you have, an attacker could push code directly to the repository and wait for that to get packaged up and sent to PyPI.
Dan Lorenc 00:11:27 Or if you have access to the package index directly, they could just slip something into a package and upload that. Depending on how users have their systems set up, they’d pull down that update right away the very next time they build and deploy. We see this commonly used to install crypto miners or phish for credentials on a developer’s machine, to steal Amazon tokens or something like that. In a lot of these cases, attackers compromise one developer and then use that to move laterally and attack all of the people depending on that package.
Robert Blumen 00:11:54 Once you get this bad package then, if it’s trying to steal credentials, does it have a technique to exfiltrate them back to the attacker?
Dan Lorenc 00:12:05 Yeah, this is kind of how a lot of them end up getting detected. They might use some form of code obfuscation to hide exactly what’s going on, but it would usually look something like a little script that runs, scans the home directory to look for SSH keys or other secret variables you have stored there and then send them to an IP address somewhere. Some people have gotten a little more clever with it. I think the famous dependency confusion attack used DNS requests or something like that that aren’t commonly flagged by firewalls to exfiltrate data that way. But as soon as you have a network connection, you can’t really trust that the data stays private.
Robert Blumen 00:12:38 Just now you mentioned dependency confusion, that’s also on my list. Explain what that is.
Dan Lorenc 00:12:44 Yeah, that was a really interesting attack, or class of attacks I guess, depending on how you want to characterize it, that a researcher found some time last year. It affected multiple different programming languages. Thankfully it was a researcher doing this to report the bugs and close the loops, not really to steal data from companies, but now we do see copycats trying to steal data using this technique. The basic premise here is that a lot of companies have rightly recognized that publishing code and using code directly from open source and public repositories does come with some risks. They try to use private repositories or private mirrors where they’ve vetted things and where they publish their own code. But it turns out a lot of these package managers had some features built in to make it really, really easy to install stuff, where they would just try all these different mirrors to look for a package until they found one. And the order there kind of surprised some folks.
Dan Lorenc 00:13:29 So if you have an internal registry at your large company where you publish code, it turns out that it actually checked the public one first for all of these packages. And normally that’s not a problem if you have an internal package name that nobody is using publicly to store your own code. But if somebody finds out what those names are and happens to upload something to PyPI or RubyGems or something like that with the same name, it turns out you’re going to get their code instead of yours. And as soon as you grab that, that code starts running and it’s basically handing out remote code execution, one of the worst types of vulnerabilities, to attackers, as long as they can guess the names of your packages. And that’s not something people normally protect that closely. You don’t really see names as incredibly sensitive data. Sometimes the code is, but the name of the package is something that people copy around all the time and post in log messages and errors on Stack Overflow when they’re debugging. So it’s not something that’s widely considered a secret.
Robert Blumen 00:14:19 If I understand this then, suppose I work at large company XYZ and we have an internal repository and perhaps if we’re in a typical perimeter network, the DNS of that repository, it’s not public DNS, it’s private DNS within the corporate network and it’s called XYZ Python Registry. And in that registry we have a package, it’s called XYZ credit card charge, something like that. And according to what you said, the package resolver in Python might look for that name XYZ credit card charge in a range of different repositories, including public repositories and it would not necessarily prefer the private one ahead of public ones. So, you can get ahead of the private one in the line and hopefully it will pull your code down if you’re the bad guy?
Dan Lorenc 00:15:19 Yeah, that was basically the technique. It sort of makes sense if you don’t think about it too closely. If you’re installing 200 packages, 198 of them probably do come from the public, open-source registry. So let’s try that first and then fall back to the private one the other two times. This wasn’t put in intentionally; it was just something that sat around for the better part of a decade before somebody noticed that it could be abused in this manner.
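The lookup-order problem described here can be sketched as a toy model. This is only an illustration under stated assumptions: the registries are plain dictionaries, the package name `xyz-credit-card-charge` and the functions `resolve` and `resolve_pinned` are hypothetical, and the code does not reproduce the actual resolution algorithm of pip or any other real package manager.

```python
from typing import Dict, List, Optional, Set

# Hypothetical registries: package name -> label describing whose code is served.
PUBLIC_INDEX: Dict[str, str] = {
    "requests": "public:requests",
    # An attacker has uploaded a package that reuses an internal name.
    "xyz-credit-card-charge": "public:ATTACKER-UPLOAD",
}
PRIVATE_INDEX: Dict[str, str] = {
    "xyz-credit-card-charge": "private:xyz-credit-card-charge",
}


def resolve(name: str, indexes: List[Dict[str, str]]) -> Optional[str]:
    """Return the first match, trying each index in the order given."""
    for index in indexes:
        if name in index:
            return index[name]
    return None


def resolve_pinned(name: str, internal_names: Set[str]) -> Optional[str]:
    """One possible mitigation: internal names are only ever looked up privately."""
    if name in internal_names:
        return PRIVATE_INDEX.get(name)
    return PUBLIC_INDEX.get(name)


if __name__ == "__main__":
    # Public-first ordering hands back the attacker's upload.
    print(resolve("xyz-credit-card-charge", [PUBLIC_INDEX, PRIVATE_INDEX]))
    # Pinning internal names to the private index avoids the confusion.
    print(resolve_pinned("xyz-credit-card-charge", {"xyz-credit-card-charge"}))
```

The point of the sketch is that nothing is "broken" in either registry; the outcome depends entirely on which index is consulted first for a name that exists in both.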
Robert Blumen 00:15:38 I’ve heard of a technique, which I believe is related, called typo squatting. Can you talk about that?
Dan Lorenc 00:15:45 Yeah, very similar. This kind of bleeds into the social engineering category of attacks, where it’s hard to exactly classify it. But the general technique there is you find the name of a commonly used package for a website or tool or something, and then you upload something with a very similar name, whether it’s a small typo, or replacing a character with a Unicode version that looks the same unless you actually look at the raw bytes, or even more social engineering versions. This is something we faced a lot when I was at Google. We’d upload libraries with the name of something like Google Cloud Ruby Client. Somebody else would upload one with a name like Google Ruby Client or GCP Ruby Client, switching around all these acronyms. Creativity is endless here; there are an infinite number of ways to make something look real, and the naming conventions are all kind of just made up. These get uploaded, and then you have to sit and wait (and this is where the social engineering part comes in) for somebody to either typo it or copy paste it or have it show up in a search engine somewhere, and grab your copy instead of the correct one.
Robert Blumen 00:16:41 If you’re the bad guy then you might post some Stack Overflow questions about that package, just try to get it out there in the search engines and hopefully somebody else will see that on Stack Overflow and copy paste that into their. . .?
Dan Lorenc 00:16:56 Exactly.
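A defensive check for the lookalike names discussed above might be sketched as follows. The allowlist `KNOWN_PACKAGES`, the function `check`, and the 0.8 similarity cutoff are all assumptions chosen for illustration; only the standard-library `difflib` module is used, and this is not how any particular registry screens uploads.

```python
import difflib
from typing import List

# Hypothetical allowlist of package names the organization actually uses.
KNOWN_PACKAGES: List[str] = ["google-cloud-ruby-client", "requests", "numpy"]


def check(name: str) -> str:
    """Classify a requested package name against the allowlist."""
    if name in KNOWN_PACKAGES:
        return "ok"
    # Lookalike Unicode characters (e.g. Cyrillic 'е' for Latin 'e') are
    # invisible to the eye but show up as non-ASCII code points.
    if not name.isascii():
        return "suspicious: non-ASCII characters"
    # Small typos score as close matches to a known name.
    matches = difflib.get_close_matches(name.lower(), KNOWN_PACKAGES, n=1, cutoff=0.8)
    if matches:
        return f"suspicious: close to {matches[0]}"
    return "unknown"


if __name__ == "__main__":
    for candidate in ["requests", "reqeusts", "r\u0435quests", "gcp-ruby-client"]:
        print(repr(candidate), "->", check(candidate))
```

Note that a rearranged name such as `gcp-ruby-client` is too different from `google-cloud-ruby-client` to trip a similarity threshold like this one, which matches the point that the social engineering variants are harder to catch than plain typos.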
Robert Blumen 00:16:56 Okay. Another technique, which if you want to use this as a launchpad to talk about the Ken Thompson paper, would be injecting things into the build.
Dan Lorenc 00:17:09 Yeah, so this is kind of what happened in the SolarWinds case, but this is really what Ken pointed out back in the 80s. So it’s a really interesting paper; again, the title is “Reflections on Trusting Trust.” It’s very short. I think he gave the talk actually during his Turing Award acceptance speech or something. Yeah, you should really read the paper. I’d encourage anybody working with computers to do it. It’s got a funny story too. The story is, he was at Bell Labs at the time in the group that invented most modern programming languages, the Unix operating system, all this stuff that we still use today. He wanted to prank his coworkers, who were all also incredibly smart folks like him, and what he decided to do was insert a backdoor into the compiler they were all using.
Dan Lorenc 00:17:47 When any code got built with that compiler, it would insert a little backdoor into that code. So, when you executed a program you built, it would do something funny like print out the user’s password or something like that before it ran the rest of the program. That was the little backdoor that he stuck in. He knew that these folks were really smart and that they’d assume it was a compiler bug, so he went another level here. If he just had this backdoor in the source code, built a compiler, and handed that to folks, they’d immediately go build a new compiler to work around it. So he made it propagate. The compiler, when it was compiling a normal program, would insert this backdoor, but if it was compiling a new compiler it would insert the backdoor again into that compiler so it continued to propagate.
Dan Lorenc 00:18:28 So he did this, gave everyone the compiler, had to kind of hide and sit and wait for a little bit, and deleted all the source code. Now there’s no more evidence this backdoor existed; the compiler just had it there in the binary, and it would propagate backdoors into every program it built. Now he knew the folks were also smart enough to look at the raw assembly and figure out what was happening and be able to remove it by patching the program directly. So he went one more level. This isn’t in the original paper, I swear I saw this somewhere in one of the little talks but I haven’t been able to find it again: he also made it so that when you were compiling the disassembler that people would use to read the raw machine code, it would insert a backdoor into the disassembler to hide the backdoors in all of the programs. So imagine these folks stepping through the code in the disassembler, getting to the section, seeing no evidence of any backdoor anywhere, and then their password’s still getting printed out, because the compiler, the disassembler, and all the programs have been backdoored at that level.
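The self-propagating behavior can be illustrated with a deliberately tiny model. Everything here is an assumption made for the sketch: "compilers" are plain Python functions, "binaries" are strings, and the marker `COMPILER_SOURCE` stands in for recognizing that the input is the compiler's own (clean) source. It is not Thompson's code, only the shape of the idea.

```python
from typing import Callable, Union

# A "binary" is either a string (ordinary program) or a callable (a compiler).
Binary = Union[str, Callable[[str], "Binary"]]

# Hypothetical marker standing in for the clean source code of the compiler.
COMPILER_SOURCE = "clean-compiler-source"


def clean_compiler(source: str) -> Binary:
    """An honest compiler: output corresponds exactly to the source."""
    if source == COMPILER_SOURCE:
        return clean_compiler
    return f"binary({source})"


def trojaned_compiler(source: str) -> Binary:
    """A tampered compiler binary with two special cases."""
    if source == COMPILER_SOURCE:
        # Compiling a compiler: emit another trojaned compiler, even though
        # the source being compiled contains no trace of the backdoor.
        return trojaned_compiler
    # Compiling anything else: silently prepend the backdoor.
    return f"binary(backdoor + {source})"


if __name__ == "__main__":
    rebuilt = trojaned_compiler(COMPILER_SOURCE)  # "rebuild from clean source"
    print(rebuilt("login"))                       # still backdoored
    print(clean_compiler(COMPILER_SOURCE)("login"))
```

Rebuilding the compiler from clean source does not help, because the rebuild itself is performed by the tampered binary.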
Robert Blumen 00:19:16 This reminds me of things I’ve heard about root kits that can intercept system calls, so when you try to list files to see if you have a malicious file, it will intercept the LS and not show you the file.
Dan Lorenc 00:19:29 Yeah, very similar to something like that, where the backdoor’s operating at too low a level for you to even be able to detect it. He basically showed that unless you have trust in every piece of software and tool and service that was used to build the software you’re using, recursively, all the way back to the first compilers that bootstrapped every programming language, then it’s hard to have any trust in the programs that we’re running today, because everything could be capable of being backdoored and then hiding those backdoors. There have been some techniques to mitigate this with multiple reproducible builds and using different compilers and different outputs and things like that, but it’s all very complicated and scary.
Robert Blumen 00:20:05 What about the role of code obfuscation? This example you’re talking about with Ken Thompson could be considered an example of code obfuscation. Are there others?
Dan Lorenc 00:20:15 Yeah, yeah, these are used a lot. A lot of security scanners and static analysis tools just read code and look for things it shouldn’t be doing, at a cursory level, and thankfully a lot of attackers are lazy and don’t go through the trouble of hiding stuff too much. So you can see stuff like things getting uploaded to random IP addresses or domains in other countries. But some folks do try to obfuscate it and hide these strings that are commonly searched for, with base64 encoding or something like this. And that has a drawback too, because there are also scanners that are really good at looking for stuff that’s been intentionally obfuscated. So yeah, it’s kind of a trade-off either way.
Dan Lorenc 00:20:56 You can take it farther though, right? These are all kind of automated obfuscation techniques that leave some kind of fingerprints of what they do. There are manual ways to do this as well. “Bug doors,” I think, is the name of the technique there. If you could read code and see every bug, then you’d be the best programmer in the world. Nobody can do that, and it’s possible to write code that leaves a bug in place that you knew was there and that a reviewer or somebody else might not notice. There’s a great competition each year called the International Obfuscated C Code Contest. I’m not sure if you’re familiar with this. In it, every year people are challenged to write C code that does one task but then does something else as malicious or funny as possible that people can’t see upon a cursory read. If you’ve ever seen some of these submissions then, yeah, you’d probably be terrified at the idea of obfuscated code sitting in plain sight.
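A scanner of the kind mentioned here, one that looks through base64-encoded strings for the things an attacker tried to hide, could be sketched like this. The function name `find_hidden_indicators`, the 16-character minimum token length, and the two indicator patterns (URLs and dotted IPv4 addresses) are assumptions for illustration; real tools use far richer heuristics.

```python
import base64
import binascii
import re
from typing import List

# Runs of base64 alphabet characters long enough to be worth decoding.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
# Things a cursory scan would flag if they appeared in plain text.
INDICATOR = re.compile(rb"https?://|\b\d{1,3}(?:\.\d{1,3}){3}\b")


def find_hidden_indicators(source: str) -> List[str]:
    """Decode base64-looking tokens and report those hiding URLs or IPs."""
    hits: List[str] = []
    for token in B64_TOKEN.findall(source):
        try:
            decoded = base64.b64decode(token, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid base64 after all (e.g. a long identifier)
        if INDICATOR.search(decoded):
            hits.append(decoded.decode("ascii", errors="replace"))
    return hits


if __name__ == "__main__":
    # 203.0.113.7 is from a documentation-only address range.
    hidden = base64.b64encode(b"http://203.0.113.7/upload").decode()
    sample = f'target = decode("{hidden}")'
    print(find_hidden_indicators(sample))
```

The trade-off from the conversation shows up directly: the encoding hides the address from a plain string search, but the encoded blob is itself a pattern that a scanner can look for.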
Robert Blumen 00:21:39 I have looked at some of those submissions. I did at one point know how to program in C, and looking at these programs I absolutely could not tell what any of them did.
Dan Lorenc 00:21:49 Yeah, and the operating systems that we all use today are millions of lines of code of C written these same ways. It’s a miracle any of it works.
Robert Blumen 00:21:58 We have talked about a couple of examples here: the Ken Thompson and the dependency confusion attack, which was launched by a researcher named Alex Birsan. He has a great article about that on Medium. Let’s talk now more about some of the attacks you’ve mentioned that I said I’d come back to, starting with the Log4Shell.
Dan Lorenc 00:22:22 Sure. Yeah, that was really a worst-case scenario, and these types of things are just inevitable over time. This was a vulnerability in an incredibly commonly used library, basically used for logging across the entire Java ecosystem, and Java is one of the most commonly used programming languages around the world. I say around the world, but I think Log4j is actually running on the Mars Rover, so not even just across the world. A little bit of hyperbole, but this was across the solar system at this point. That’s how commonly used this code was. And it was just a bug sitting there for a long time: when the logging library tried to log a specific string, it could be exploited to enable remote code execution. Again, that’s the worst form of vulnerability, because it means it’s downloading code from some untrusted person and running it in your trusted environment.
Dan Lorenc 00:23:12 It was discovered by a researcher, it was reported, and the fixes were rolled out as quickly as possible. There was some chaos obviously involved because then researchers realized this class of attack was possible and found a bunch more at the same time that the maintainers were trying to fix the first one. So it took a little while to get them all patched, but in the meantime, attackers found it pretty quickly and started trying to exploit this over the internet. And it was as simple as typing one of these strings into the password field on a website or something like that to trigger an error message that might get logged. So they were trying this across the internet, basically, and achieving great results over a couple days until organizations were able to roll out these fixes.
Robert Blumen 00:23:49 One of my questions was going to be, I would think that the programmers who wrote the code have control over what gets logged. I’m typically writing log messages like ‘cannot connect to database.’ So my question was going to be how does an attacker get information to appear in the log? The way they would do that is they’re entering fields in forms which they know are wrong and they are making a guess, which is going to be true in many cases that the programmer is going to log either all inputs or incorrect input.
Dan Lorenc 00:24:27 Yeah, that’s basically correct. You can do this in HTTP headers, and a lot of servers will log those; you can stick it in IP address fields and stuff like that to trigger intentional errors. When developers want to debug something in production, they want as much data as possible, so it’s common to log a lot of this stuff. In recent years, because of all the privacy constraints in GDPR, people have started scrubbing log messages for PII (personally identifiable information), but before that it was pretty common practice to log everything, which might include usernames and sometimes clear text passwords, and stuff like this, which were a whole boon for attackers trying to steal data, too. For the most part, log entries are not considered sensitive and people don’t sanitize them to the extent they should.
Robert Blumen 00:25:06 So, following this down the chain, I enter the bad string in the password, I’m guessing correctly that the developer has a statement that says log-level warning: incorrect password. How does that translate into some bad code being able to run on the Java virtual machine?
Dan Lorenc 00:25:27 Yeah, so these are some pretty technical details in Java. I think the term I saw is ‘intersection vulnerability,’ where it wasn’t really one commit or one thing that added the bug; it was kind of the intersection of two commits that were both fine by themselves but, when operating together, led to unintended behavior, and this happens all the time. But yeah, the Java library here supports macros or template expansion or things like this in log messages to make it easier to use, and that’s a great feature. And then at the same time the JVM and Java itself were designed to run in all sorts of environments, right? Some even include browsers, where you can embed a JVM in a browser, and there’s a little feature where it could go load an applet or something over the internet and run that in your browser tab. It turned out that that behavior, to go dynamically load some code from a URL and run it, was just left on by default in a lot of these cases.
Dan Lorenc 00:26:17 And it turned out that depending on what template strings you passed into this logging library, you might be able to trigger it to go download code and run it from the internet as it expands these templates to fill in other variables and other contexts into the logging message. So that was basically it. There were a couple other things necessary to get full remote code exploitation, like the process needed to have access to the internet to be able to make a request to go download some code and execute it, things like that. But at a minimum, people were able to trigger crashes and other types of bad behavior — availability attacks that, even if the process didn’t have internet connection, could still take down the process and trigger bad behavior.
Robert Blumen 00:26:56 If I understand this, if I’m the bad guy then I put a string in my malicious password or my malicious http header, and that string has in it a small computer program that says something like ‘http get www.bagguy.com/backdoor,’ it will load that code into the JVM, it would maybe have a dollar sign or something around it to tell the interpreter that it’s code, and the interpreter will then run that code and do whatever it does. Is that it, more or less?
Dan Lorenc 00:27:35 Pretty similar. Yeah, basically people built a small programming language into these logging libraries. So you can do stuff like maybe split a string or uppercase it or something like that before it gets logged, and there’s a bunch of built-in functions for, say, uppercasing a string, adding spaces, or formatting as HTML, the kinds of things that you might want to do before logs get written. And one of the features of the JVM is that you could also load in other functions rather than just these built-in ones. You could have custom formatters or custom helpers in your logging library, and if you pass in a URL to that rather than the name of a built-in function, it would go fetch a jar from that URL and then try to execute that function from the jar that it just downloaded from the internet. So there was no guarantee that it came from a server you trusted, and there was no guarantee you knew anything about that code. And so that’s kind of how this was triggered. People would just host a malicious jar at a URL and then put the URL to that in this logging string.
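The pattern described above, a small template language in the logger that also expands whatever the user typed, can be modeled safely in a few lines. This is a toy under stated assumptions: the `${name:argument}` syntax, the lookup names `upper` and `remote`, and the functions `log_vulnerable` and `log_safer` are all hypothetical, nothing is downloaded or executed, and the sketch does not reproduce Log4j's real lookup mechanism or API.

```python
import re
from typing import Callable, Dict, List

# Records what a real exploit would have fetched; nothing is downloaded here.
FETCHED: List[str] = []


def upper(arg: str) -> str:
    """A harmless built-in formatter."""
    return arg.upper()


def remote(arg: str) -> str:
    """Stand-in for 'load a helper from this URL and run it'."""
    FETCHED.append(arg)
    return "<remote result>"


ALL_LOOKUPS: Dict[str, Callable[[str], str]] = {"upper": upper, "remote": remote}
SAFE_LOOKUPS: Dict[str, Callable[[str], str]] = {"upper": upper}
PATTERN = re.compile(r"\$\{(\w+):([^}]*)\}")


def expand(message: str, lookups: Dict[str, Callable[[str], str]]) -> str:
    """Replace each ${name:arg} with the result of the named lookup."""
    def repl(match: "re.Match[str]") -> str:
        handler = lookups.get(match.group(1))
        return handler(match.group(2)) if handler else match.group(0)
    return PATTERN.sub(repl, message)


def log_vulnerable(template: str, user_input: str) -> str:
    # Problem: attacker-controlled text is expanded like trusted template text.
    return expand(template + user_input, ALL_LOOKUPS)


def log_safer(template: str, user_input: str) -> str:
    # Only the developer's template is expanded; user input is appended as data.
    return expand(template, SAFE_LOOKUPS) + user_input


if __name__ == "__main__":
    payload = "${remote:http://attacker.example/helper.jar}"
    print(log_vulnerable("Incorrect password: ", payload), FETCHED)
```

In the vulnerable variant the developer never wrote a `remote` lookup anywhere; it arrives entirely through the password field, which is the surprising part of the real incident.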
Robert Blumen 00:28:47 Another podcast I listen to, Security Now, it’s a common theme of bugs they discuss that somewhere along the line there is an interpreter or compiler involved, and in some cases where you wouldn’t expect it. I remember one example of a program that displays images like JPEGs or something like that was running an interpreter, and somebody used that as an attack vector. Now, if I know that I’m compiling code — we’re not going to get away from having compilers — I’m going to put it on Jenkins, and if I know that Jenkins is vulnerable, I’m going to take a lot of steps to secure it. What’s disarming about this is the presence of these compilers and interpreters in places where you really don’t expect them so your guard is down and you’re not doing all the things you would do to protect a compiler.
Dan Lorenc 00:29:44 Exactly, yeah, that’s a great way to put it. There’s a long spectrum, I guess, between a full Turing-complete interpreter that can do everything and a very restricted interpreter that can only do the couple of things we’ve told it it can do. And it’s not always clear exactly where you are. A lot of these compression algorithms, JPEG and some of the other formats that you brought up, are like little interpreters. The way that they compress an image is, instead of storing every single pixel and its value, they’ll kind of generate this little program that can spit out the full resulting image, and in a lot of cases that can take up a lot less space. A simple example to think through in your head: if you had a thousand-by-thousand image and all the pixels were black, you could either store a thousand by a thousand little bytes saying this pixel is black, or you could just write two little for loops, something like ‘for i in range, for j in range, print black.’ That second one is much, much smaller to store, and that’s basically one of the fundamental principles behind a lot of these fancy compression algorithms.
Dan Lorenc 00:30:44 And if they’re not implemented perfectly correctly, then you don’t know that that’s what it’s doing; you’re executing some arbitrary code. And if that triggers a bug then you’ve got an interpreter running against untrusted code. It might not be able to do everything, but it might be able to do enough to cause some havoc.
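The all-black image example can be made concrete with run-length encoding, which is a much simpler scheme than what JPEG actually does and is used here only as a stand-in. The function names and the `max_pixels` guard are assumptions for this sketch. The decoder is effectively a tiny interpreter for a list of "repeat this value N times" instructions, so it checks each instruction against the declared image size before acting on it.

```python
def rle_encode(pixels):
    """Collapse consecutive equal pixel values into (count, value) runs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p:
            runs[-1][0] += 1
        else:
            runs.append([1, p])
    return [(count, value) for count, value in runs]


def rle_decode(runs, max_pixels):
    """Expand runs back into pixels, refusing anything beyond max_pixels."""
    out = []
    for count, value in runs:
        # Untrusted input: reject a run that is negative or would overflow
        # the declared image size, before allocating anything for it.
        if count < 0 or len(out) + count > max_pixels:
            raise ValueError("run exceeds declared image size")
        out.extend([value] * count)
    return out
```

A million black pixels encode to the single run `(1000000, 0)`. Without the size check, a crafted file claiming a run of a trillion pixels would make the decoder try to allocate far more memory than the image could ever need, which is the kind of havoc a buggy decoder invites.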
Robert Blumen 00:31:01 Are you aware of any examples of how the Log4J was exploited in the wild?
Dan Lorenc 00:31:07 So, there was just a recent report that came out of the DOD and kind of an advisory council in the US government, doing kind of a postmortem on the overall attack. Luckily, they found nothing terribly serious happened, which is somewhat surprising. In the immediate wake of the attack, there were some fun kind of examples happening. I think somebody was referring to it as like a vaccine or something like this, where you’re running arbitrary code. There were some, like, good Samaritans that are kind of in this gray area, but they were purposefully triggering this exploit and instead of doing anything bad they were patching the exploit. So, there were a bunch of people kind of racing against attackers in those couple days, spamming requests everywhere with those malicious user names to patch servers that were vulnerable. So that was a fun little example, but I think this is one where we’re going to see a long-tail fallout.
Dan Lorenc 00:31:52 I don’t think there’s any chance at all that the entire world has patched every instance vulnerable to Log4Shell. There are a bunch of kind of shadow IT systems or machines that people forgot about that are still running and holding up load-bearing systems. This exploit is so simple to do that it’s just going to sit there in every attacker’s toolbox, and as they try to move laterally inside organizations, they’re going to test everything they can find against Log4Shell, and I guarantee someone’s going to continue to find these probably for the next decade.
Robert Blumen 00:32:19 It’s not unusual you read about an attack where the company had a system that contained a bug for which a patch had been available for quite some time and for whatever reason they hadn’t applied it.
Dan Lorenc 00:32:34 Yeah, yeah. This is incredibly common. There’s a bunch of problems here that make this really hard to solve. It’s not as simple as ‘Why didn’t you fix it? We told you to.’ Shadow IT is the big term thrown around a lot here. There’s a lot of infrastructure inside organizations that doesn’t show up on those spreadsheets and asset management databases. So, if you patch everything inside your company, it’s like the known unknowns kind of thing. You only patch the things you knew about. No CISO is going to sit in front of Congress and say that they patched everything; they’re going to say they patched everything they’re aware of. By definition, you can only patch the things you know about. And then at the same time, there are so many patches and so much software flying around that people do have to do triage.
Dan Lorenc 00:33:12 You can’t just patch everything and apply every patch that comes in. People need to make risk-based decisions here because the signal-to-noise ratio is so low. If you take a very up-to-date, very commonly used container image today, the kind used all over cloud, like Docker images or something, and you run all these scanners against it, you’re going to find hundreds of vulnerabilities. Some have patches, some don’t. Most are marked as low or medium severity, and unless you read every single one to figure out the exact circumstances in which it can be triggered, you don’t know if you need to kind of stop what you’re doing and patch it. So for the most part people set thresholds and monitoring based on criticality numbers and scores and basically try to do the best they can with what they know about.
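Threshold-based triage of scanner output might look roughly like the sketch below. The `Finding` record, the 7.0 cutoff, and the three buckets are illustrative assumptions, not the behavior of any particular scanner.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    cve_id: str          # identifier reported by the scanner
    score: float         # CVSS-style base score, 0.0 to 10.0
    fix_available: bool  # whether a patched version exists


def triage(findings, threshold=7.0):
    """Split findings into (urgent, watch, backlog) using a score threshold."""
    # At or above the threshold with a patch: fix now.
    urgent = [f for f in findings if f.score >= threshold and f.fix_available]
    # At or above the threshold with no patch yet: monitor or mitigate.
    watch = [f for f in findings if f.score >= threshold and not f.fix_available]
    # Everything else waits for routine maintenance.
    backlog = [f for f in findings if f.score < threshold]
    urgent.sort(key=lambda f: f.score, reverse=True)
    return urgent, watch, backlog
```

A score threshold is a blunt instrument: it says nothing about whether the vulnerable code path is reachable in a given deployment, which is why the low and medium findings still carry some risk.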
Robert Blumen 00:33:53 I want to move on to another one of these attacks that I promised to come back to: SolarWinds. What was that about?
Dan Lorenc 00:34:01 Sure, yeah, so the SolarWinds organization, it’s a company, they make a whole bunch of different pieces of software. One of them was this kind of network monitoring software. Software like that is typically installed in very sensitive environments and monitors networks to look for attacks. So it’s kind of looking through lots of packets and seeing lots of sensitive information fly by as it does its job. What happened is the build server at SolarWinds was compromised through some kind of chain of traditional attacks, and an attacker got a foothold on the actual build server. This was the server where the source code was uploaded to; it ran some compilation step and signed and sent out the kind of executable at the end, and that’s how the code was delivered to end users. The attackers, instead of just compromising the SolarWinds organization, doing ransomware or stealing their data or something, had their little backdoor on the server, which watched for the compiler to start, dropped in some extra source code files, waited for the compiler to finish, and then deleted them at the end.
Dan Lorenc 00:34:55 So not really backdooring the compiler itself, but passing in some bad input right before it started. So it’s slightly different from the Ken Thompson example but pretty similar in effect. So if you looked, it fetched the right source code, it ran the build, and here’s the thing it got in the end; it just also had this little malicious element inside of it. Then that software was uploaded and shipped to all the paying customers, they installed it, and the code got to do whatever it wanted at that point. And this is one where it waited some kind of random number of days after installation, but a pretty long period of time to avoid any immediate detection, and then would start sniffing, collecting data, and uploading it to some endpoints. It was eventually caught because of that, when it actually became active. They saw network traffic they didn’t expect. It’s a little hard to detect because this system was installed or updated days or weeks before, not immediately, right? If you update to a new version and all of a sudden network traffic you don’t expect happens immediately, it’s pretty easy to pinpoint what happened. But waiting a little bit makes it a little bit harder to pin down the root cause. The company figured out what happened, did a bunch of research, figured out exactly how the attack was carried out, tore down that build system, did a bunch of work to improve security there … but at that point, a lot of damage had been done to all of the users.
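One way to reason about "extra files appear only while the compiler runs" is to compare a hash manifest of the reviewed source against the tree the build actually sees. The sketch below is an assumption-laden illustration, not a description of what SolarWinds did before or after the incident, and an attacker who controls the build server could tamper with a check like this too, so in practice it would have to run somewhere the attacker does not control.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff(expected, actual):
    """Return (added, removed, changed) file lists between two snapshots."""
    added = sorted(set(actual) - set(expected))
    removed = sorted(set(expected) - set(actual))
    changed = sorted(
        name for name in expected.keys() & actual.keys()
        if expected[name] != actual[name]
    )
    return added, removed, changed
```

Because the injected files were deleted again after compilation, a comparison made before or after the build would see nothing; the snapshot has to be taken of the inputs the compiler really consumed, or the outputs of two independent builds have to be compared.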
Robert Blumen 00:36:02 This example illustrates the point you made at the beginning about how everybody’s output is part of the supply chain, somebody else’s input. So although the original attack was on the vendor, that was used to inject the back door into the supply chain further downstream of their customers.
Dan Lorenc 00:36:24 Exactly. These attacks take a little bit more patience, you can’t quite be as targeted in them, but they have much broader ranging consequences, right? You can target one organization with a traditional attack; with a supply chain attack, you’re kind of left to who applies updates and who that organization’s customers are. But instead of one organization, you’re getting dozens, hundreds, thousands, however many folks use this software.
Robert Blumen 00:36:46 I think I read Alex Birsan — the “dependency confusion” researcher — when he put out some of these packages, he didn’t know which enterprises would be pulling his package. He only figured that out when he was able to exfiltrate from within those enterprises and see where his code ended up.
Dan Lorenc 00:37:07 Yeah, I think he, I’m trying to remember the original blog post. I think there might have been a few. Yeah, I think it was a mix of guessing, and then also there were some targeted ones where companies would just put their name as a prefix on the package or something like that to trigger it to go to the internal one. So I think it was a mix of semi-targeted versus just ‘let’s upload stuff and see who downloads it.’
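A simple audit for this class of problem is to check that every package name an organization considers internal was actually resolved from the internal registry. The sketch below assumes a mapping from package name to the index it was installed from; the package names and index URLs used with it are hypothetical, and real package managers expose this information in different ways.

```python
def find_confusable(resolved, internal_names, internal_index):
    """Return internal package names that were resolved from another index.

    resolved       -- dict of package name -> index URL it was fetched from
    internal_names -- set of names that should only ever come from inside
    internal_index -- URL of the organization's private registry
    """
    return sorted(
        name for name, index in resolved.items()
        if name in internal_names and index != internal_index
    )
```

Any name this returns means a public package with an internal name won the resolution, which is exactly the situation a dependency confusion attack relies on.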
Robert Blumen 00:37:25 Moving on then, another one of these attacks that came in through a development tool is known as Codecov. Are you familiar with that one?
Dan Lorenc 00:37:36 Yep. So Codecov is a product, and they also offer like a free version of it for open-source repositories to do code coverage analysis. So, when you run your tests it attempts to figure out what percentage of your code your tests exercised. Generally, the more the better, and it’s very commonly used across open source. If you’re running on GitHub or something like that in the CI systems, you can just drop this plugin in and you get a neat little UI showing you your code coverage over time. They had an installer for this in CI systems that was just a bash script. Basically, the installation instructions were to download and run this bash script from a URL, and it was a similar case where an attacker kind of pivoted.
Dan Lorenc 00:38:20 They targeted Codecov, found — I think the root cause was they found a secret to an S3 bucket or something like that for Codecov — used that to look around what was in the bucket, saw that this install script was in there, realized that whatever was in this install script is what was getting downloaded and run by all of these CI jobs. They just inserted a couple lines to that script every time it was updated to grab all of the environment variables, grab whatever was on disk that it could find in the server and upload it to a URL. And this went undetected for a while. They would put it in, take it back out for a little while; the attacker would change it on again and off again over time, so it wasn’t always present. And anyone with CI systems using Codecov during this breach had to evaluate the impact of having all of their other secrets and data from that CI job, exfiltrated into some organization.
Dan Lorenc 00:39:01 So this was a supply chain attack that also attacked other supply chains, I guess. These are all other tools that are used. Some of the examples I found with the Codecov script right before and after the Codecov script in CI were secrets to sign and upload code to Maven Central for certain open-source projects. And these are the types of things that got exfiltrated during this attack. So it was one pivot from the organization to their users and then I’d be surprised if there weren’t other secrets stolen in this that are currently being held or have been used for further attacks down the supply chain.
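Two generic mitigations for "download a script and run it in CI" are to pin the script to a known digest and to hand it only the environment variables it needs. The sketch below shows both in a hedged form; the function names are made up, and it does not describe how Codecov's tooling works.

```python
import hashlib
import hmac


def verify_installer(script_bytes, pinned_sha256):
    """Return True only if the downloaded script matches the pinned digest."""
    actual = hashlib.sha256(script_bytes).hexdigest()
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(actual, pinned_sha256.lower())


def minimal_env(environ, allowed):
    """Keep only explicitly allowed variables, so a compromised script
    cannot read unrelated secrets such as signing or publishing keys."""
    return {key: value for key, value in environ.items() if key in allowed}
```

A CI job would refuse to execute the script when `verify_installer` returns False and would pass `minimal_env(os.environ, {...})` as the child process environment. Pinning only helps if the digest is stored and reviewed somewhere other than the server the script is downloaded from.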
Robert Blumen 00:39:34 Do you know any more about how that was detected? You said people noticed it was exfiltrating.
Dan Lorenc 00:39:41 I believe, I can’t say for sure, but I believe somebody just after months and months, some user actually just downloaded the script from the URL and read it and saw some weird code at the bottom and filed some bug saying hey what are these two lines doing? And that triggered the detection.
Robert Blumen 00:39:56 Another well-known incident was known as IconBurst. Are you familiar with that one?
Dan Lorenc 00:40:01 Yeah, so I believe this was a compromised package on NPM that had some malicious code inserted inside of it. NPM is, like I said, the most widespread and largest repository by far. So most of the headlines you see about compromises like this do happen in NPM just because of the sheer numbers. But this type of thing happens in all of the other package managers and registries too. I don’t remember the root cause for that one, exactly how the package was compromised. There’s a bunch of different patterns we see, like an individual developer getting compromised. We see people compromise their own packages over time. These kind of got called ransomware, or not ransomware, “protestware,” over the last couple of years. We’ve seen that a few times, but there’s tons of different ways it can happen, and depending on how widely used these packages are, the impact varies a lot. Sometimes they’re caught before anybody uses them; sometimes they’re caught much later.
Robert Blumen 00:40:56 Just one more, this will be the last incident. It’s a little different in that it came in through a chat application. This one is called Iron Tiger. Do you have a background in that one?
Dan Lorenc 00:41:07 Yeah, so I think Iron Tiger was the group that was suspected for doing this — the code name for the APT or advanced persistent threat. Yeah, so this was a chat application, I think it was called Mimi, commonly used in China. And the chat application was for all sorts of different phones and desktop operating systems and everything. And some malware was inserted into one of the installers for Mimi at the distribution server. So very similar to the Codecov example, just instead of a development tool, this was a chat application. So it was built, uploaded to the server, and somebody had compromised that server. So it wasn’t the build server, it was the place that the packages were stored and downloaded from. Every time a new version got uploaded the attackers grabbed that, added some malware to it, and then put it back in this modified form. So anybody installing it and using that installer actually grabbed a compromised version rather than the intended version.
Robert Blumen 00:42:02 I want to wrap up here. In reviewing these different attacks, it’s hard for me to see much commonality other than that in some way they involve the supply chain, but I’m having trouble drawing any really top 10 lessons learned. What’s your perspective on that? Are there any real takeaways from this, or is this more just about doing all the things that people already know like patching and two-factor and protecting credentials and everything else?
Dan Lorenc 00:42:35 Yeah, I think there’s a lot of low-hanging fruit that folks already know, kind of brush-your-teeth, eat-your-vegetables style advice that people know they should have been doing but never really prioritized until now. That stuff you mentioned is good. Yeah, use two-factor auth to prevent phishing, patch your software, that kind of stuff. The other big, really overlooked one, I think, is just general build system security. Not to pick on Jenkins, it’s just the most commonly used one, but most organizations for the last decade have been fine with people just grabbing a couple old pieces of hardware, throwing Jenkins on them, sticking them in a closet somewhere and using that as their official build and deployment machine. You would never run production that way, right? You would never run your production servers on a couple servers that nobody looked at or patched or even really knew were there sitting in a closet.
Dan Lorenc 00:43:17 But for some reason people have been fine doing that for the build and deployment systems. Those are the gateway to production. Everything that goes into production comes through those systems. So it only makes sense that you should apply the same type of production hygiene and security and rules to those that you do to production. So I think that’s the big shift. Nothing crazy that has to happen there. Like we know what to do, just run your build systems like production systems and you’ll be immune to a lot of these attacks, but people just haven’t prioritized that work.
Robert Blumen 00:43:45 One other topic that came up in Software Engineering Radio 489 on package management: we got into a discussion about the recursive nature of package management, where your package manager pulls in the packages that you asked for and then it cascades down to the packages that those packages asked for and so on and so on, more or less forever, until you’ve pulled in hundreds or thousands of packages, and if you looked at the full list you might not even know what half of them do or why they’re there. And yet, we have to trust all that code. Is that an insolvable problem, or do we just have to trust that the internet is good? Are there ways to be a little more confident that we’re not pulling in all kinds of back doors when we run our package manager?
Dan Lorenc 00:45:16 But what those systems really provided was curation, right? You couldn’t grab any package. You only had the ones that the distribution maintainers agreed to provide and patch and maintain, which was a small set, but it was curated, it was maintained. They would provide fixes for it; you knew who you were getting it from, whether it was a company you had a contract with or a trusted group of maintainers that have worked together for 10 years and care about security. But when you run pip install or npm install, it’s now from anybody on the internet that’s signed up for that repository. The command looks the same, but the implications are completely different. There is no trust anymore. So, you’re getting all of the convenience, but none of the trust or guarantees.
Dan Lorenc 00:45:56 Then containers and other forms of higher-level infrastructure came, which are like meta package managers, and they grab all of these together and bundle them, and you can do pip installs and npm installs and apt-get installs all in the same environment and zip that up. Another one called Helm is a package manager for containers. So, you’re getting a bunch of containers and a bunch of other Helm charts in kind of the Kubernetes world. You’re multiple layers deep at this point and it kind of explodes combinatorially. So, it’s one of those problems where it’s grown gradually over time. There hasn’t been one moment when it kind of got out of control, but now we’re looking back at it and there’s tens of thousands of things from random people on the internet getting run and used for a hello world application.
Dan Lorenc 00:46:35 I like the way you framed it. Like, do we just have to trust that the internet is good? Anybody that’s spent time on the internet knows that’s not a good strategy. Just trusting that everyone is nice on the internet, that’s not going to work forever. I think there’s a couple things we just have to do. We have to get more aware of what’s getting pulled in. A lot of the effort from the US government in the executive order from last year around this is focused on transparency. So, Software Bills of Materials are now a thing. You can’t just distribute software with tens of thousands of things inside without telling anyone or without knowing what’s in there. Organizations are required to provide that Bill of Materials so people can at least see what’s inside of it and decide if they trust it. With that, I think, is going to come panic when people realize exactly how much is in there. People will have to start getting more rigorous about it. You can’t grab thousands of things for a small application. People are going to push back and you’re going to pay more attention to the trustworthiness of the code that you’re using. But it’s going to be gradual.
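The cascade of dependencies is a graph traversal: start from what was asked for and keep following edges until nothing new turns up. The sketch below uses a hand-written, hypothetical dependency graph; real package managers also resolve version constraints, which this ignores.

```python
def transitive_deps(graph, root):
    """Return every package reachable from root, excluding root itself.

    graph -- dict mapping a package name to the list of names it depends on
    """
    seen = set()
    stack = list(graph.get(root, ()))
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue  # already visited, also protects against cycles
        seen.add(pkg)
        stack.extend(graph.get(pkg, ()))
    seen.discard(root)  # a cycle back to root does not make it its own dependency
    return seen
```

Comparing `len(graph[root])` with `len(transitive_deps(graph, root))` for a real project is a quick way to see how much code is trusted without ever being named in the project's own manifest.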
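Once a Software Bill of Materials exists, even a few lines of code can answer "what is actually in here?" The sketch below reads a CycloneDX-style JSON document, which lists its entries under a top-level `components` array with `name` and `version` fields. The sample data used with it is invented, and real SBOMs carry many more fields than this reads.

```python
import json


def list_components(sbom_text):
    """Return sorted (name, version) pairs from a CycloneDX-style JSON SBOM."""
    doc = json.loads(sbom_text)
    return sorted(
        (component.get("name", "?"), component.get("version", "?"))
        for component in doc.get("components", [])
    )
```

Printing `len(list_components(text))` for a small service is often the moment of panic described above; the list is also the starting point for deciding which of those components are trusted.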
Robert Blumen 00:47:23 Dan, what does your company do?
Dan Lorenc 00:47:25 Sure. My company’s name is Chainguard. We have a bunch of open-source tools and products to help developers solve all of these supply chain security problems easily. That’s a great jumping-off point: a lot of this is really just about awareness and knowing what is going into your code. And it turns out that is actually a great benefit for developers; it’s not something that makes your life harder. It actually makes life easier if everything is done correctly. All the complicated bookkeeping about dependencies, which versions, and whether they’re up to date applies to your code too. And if you have a really good understanding of what’s running where, you can get a more productive development cycle rather than getting in people’s way. So that’s what we’re trying to solve.
Robert Blumen 00:48:03 Dan, where can people find you if they would like to reach out or follow what you do?
Dan Lorenc 00:48:09 Sure. My company’s URL is chainguard.dev, and you can find me on Twitter @Lorenc_Dan.
Robert Blumen 00:48:17 Dan, it’s been a fascinating discussion. Thank you so much for speaking to Software Engineering Radio.
Dan Lorenc 00:48:23 Yeah, thank you for having me.
Robert Blumen 00:48:25 For Software Engineering Radio, this has been Robert Blumen and thank you for listening. [End of Audio]