The article seems to assume that containers appeared to solve the software distribution problem and then somehow got repurposed for virtualization, isolation, and management of production services. I think this view is very far from the truth.
The virtualization/isolation aspect came first; SWsoft's Virtuozzo was doing it quite well in the early 2000s. They even had [some] IO isolation, which I think took around a decade to appear elsewhere. Then pieces of Virtuozzo/OpenVZ gradually reached the mainline kernel in the form of cgroups/LXC, and the whole thing slowly brewed for a while until Docker added the two missing pieces: fast image rebuilds and an out-of-the-box user experience.
Docker was of course a revolution, but by then sufficiently advanced companies had already been using containers for isolation for a full decade.
Containers (meaning Docker) happened because CGroups and namespaces were arcane and required lots of specialized knowledge to create what most of us can intuitively understand as a "sandbox".
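For the curious, the "arcane" primitives in question are visible directly in /proc: every Linux process carries a handle to one namespace of each type, and tools like Docker manipulate these via clone/unshare/setns. A minimal, Linux-only sketch of what a "sandbox" is made of:

```python
import os

# Every Linux process holds a reference to one namespace of each type;
# a container "sandbox" is just a process whose handles differ from the host's.
ns_dir = "/proc/self/ns"
for entry in sorted(os.listdir(ns_dir)):
    # Each entry is a symlink like "mnt:[4026531841]" identifying the namespace.
    target = os.readlink(os.path.join(ns_dir, entry))
    print(f"{entry:10s} -> {target}")
```

Two processes are "in the same container" roughly when these links point at the same namespace IDs.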
Cgroups and namespaces were added to Linux in an attempt to add security to a design (UNIX) which has a fundamentally poor approach to security (shared global namespace, users, etc.).
It's really not going all that well, and I hope something like SEL4 can replace Linux for cloud server workloads eventually.
Most applications use almost none of the Linux kernel's features. We could have very secure, high performance web servers, which get capabilities to the network stack as initial arguments, and don't have access to anything more.
Drivers for virtual devices are simple, we don't need Linux's vast driver support for cloud VMs. We essentially need a virtual ethernet device driver for SEL4, a network stack that runs on SEL4, and a simple init process that loads the network stack with capabilities for the network device, and loads the application with a capability to the network stack.
Make building an image for that as easy as compiling a binary, and you could eliminate maybe 10s of millions of lines of complexity from the deployment of most server applications. No Linux, no docker.
Because SEL4 is actually well designed, you can run a sub kernel as a process on SEL4 relatively easily. Tada, now you can get rid of K8s too.
Containers and namespaces are not about security. They are about not having singleton objects at the OS level. I would have called it virtualization if the word weren't so overloaded already. There is a big difference here that somehow everyone misses: a bypassable security mechanism is worse than useless, but a bypassable virtualization mechanism is useful. It is useful to be able to have a separate root filesystem just for this program - even if a malicious program is still able to detect that it's not the true root.
As for seL4 - it is so elegant because it leaves all the difficult problems to the upper layers (coincidentally making them much more difficult).
> Containers and namespaces are not about security
True. Yet containers, or more precisely the immutable images endemic to container systems, directly address the hardest part of application security: the supply chain. Between the low effort and risk entailed when revising images to address endlessly emerging vulnerabilities, and enabling systematized auditing of immutable images, container images provide invaluable tools for security processes.
I know about Nix and other such approaches. I also know these are more fragile than the deeply self-contained nature of containers and their images. That's why containers and their image paradigm have won, despite all the well-meaning and admirable alternatives.
> A bypassable security mechanism is worse than useless
Also true. Yet this is orthogonal to the issues of supply chain management. If tomorrow, all the problems of escapable containers were somehow solved, whether by virtual machines on flawless hypervisors, or formally verified microkernels, or any other conceivable isolation mechanism, one would still need some means to manage the "content" of disparate applications, and container systems and the image paradigm would still be applicable.
> As for seL4 - it is so elegant because it leaves all the difficult problems to the upper layers (coincidentally making them much more difficult).
I completely buy this as an explanation for why SEL4 for user environments hasn't taken off (and probably never will).
But there's just not that much to do to connect a server application to the network, where it can access all of its resources.
I think a better explanation for the lack of server side adoption is poor marketing, lack of good documentation, and no company selling support for it as a best practice.
> But there's just not that much to do to connect a server application to the network, where it can access all of its resources.
If you only care to run stateless stuff that never writes anything (or at least never reads what it wrote), it's comparatively easy. You still gotta deal with the thousand drivers - even on servers there's a lot of quirky hardware. But then you gotta run the database somewhere. And once you run a database, you get all the problems Linus warned about. So you either run the database on a separate Linux box (at which point, what do you win vs. using Linux for everything?) or develop a new database tailored for SeL4 (and that's quite a bit more complex than an OS kernel). An elegant solution that only covers a narrow set of cases stands no chance against a crude solution that covers every case.
Also, with the current sexy containerized stacks it's easy to forget, but having the same kind of environment on the programmer's workbench and on the server was once Unix's main selling point. It's kinda expensive to support a separate abstraction stack for a single purpose.
The lack of adoption is because it’s not a complete operating system.
Using sel4 on a server requires complex software development to produce an operating environment in which you can actually do anything.
I’m not speaking ill of sel4; I’m a huge fan, and things like its take-grant capability model are extremely interesting and valuable contributions.
It’s just not a usable standalone operating system. It’s a tool kit for purpose-built appliances, or something that you could, with an enormous amount of effort, build a complete operating system on top of.
Yes. I really hope someone builds a nice, usable OS with SeL4 as a base. If SeL4 is like the Linux kernel, we need a userland (GNU). And a distribution that's simple to install and make use of.
It needs device drivers for modern x86 hardware. And filesystems, and a TCP stack. All of that code can be done in "SeL4 userland", but yeah - I see your point.
Are there any projects like that going on? It feels like an obvious thing.
A lot of deployments essentially virtualize Linux or run portions of NetBSD (e.g. via their "rump" kernel mechanism) to achieve driver support, file systems, etc. That's not really a general-purpose solution, though.
There is work within major consumer product companies building such things (either with sel4, or things based on sel4's ideas), and there's Genode on seL4.
> Containers and namespaces are not about security
An escape from a properly configured container/namespace setup is a kernel 0day. Or a 0day in whatever protocol the isolated workload uses to talk to the outside.
Is that why containers started? I seem to recall them taking off because of dependency hell, back in the weird time when easy virtualization wasn't insanely available to everyone.
Trying to get the versions of software you needed to use all running on the same server was an exercise in fiddling.
I think there were multiple reasons why containers started to gain traction. If you ask 3 people why they started using containers, you're likely to get 4 answers.
For me, it was avoiding dependencies and making it easier to deploy programs (not services) to different servers w/o needing to install dependencies.
I seem to remember a meetup in SF around 2013 where Docker (was it still dotCloud back then?) was describing a primary use-case was easier deployment of services.
I'm sure for someone else, it was deployment/coordination of related services.
The big selling points for me were what you said about simplifying deployments, but also the fact that a container has significantly less resource overhead than a full-blown virtual machine. Containers really only work if your code runs in user space and doesn't need anything super low-level (e.g. a custom TCP network stack), but as long as you stay in user space it's amazing.
The main initial drive for me was that it let me separately run many things without a) trying to manage separate dependency sets, and b) having to physically allocate large amounts of memory to virtual machines just to share RAM; on an 8GB machine, at a couple of GB per VM, that doesn't let you get far.
"making it easier to deploy" is a rather... clinical description for fixing the "but it works on my machine!" issue. We could go into detail on how it solved that, but imo it comes down to that.
This matches my recollection. Easily repeatable development and test environments that would save developers headaches with reproduction. That then led logically to replacing Ansible etc. on the server side with the same methodology.
There were many use cases that rapidly emerged, but this eclipsed the rest.
Docker Hub then made it incredibly easy to find and distribute base images.
IIRC full virtualization was expensive (VMware) and paravirtualization was pretty heavyweight and slow (Xen). I think Docker was like a user-friendlier cgroups, and everyone loved it. I can't remember the name, but there was a "web hosting company in a box" software package that relied heavily on LXC and probably was some inspiration for containerization too.
edit: came back in to add reference to LXC, it's been probably 2 decades since i've thought about that.
On a personal level, that's why I started using them for self hosting. At work, I think the simplicity of scaling from a pool of resources is a huge improvement over having to provision a new device. Currently at an on-prem team and even moving to kubernetes without going to cloud would solve some of the more painful operational problems that send us pages or we have to meet with our prod support team about.
Yes, totally agree that's a contributor too. I should expand that by namespaces I mean user, network, and mount table namespaces. The initial contents of those is something you would have to provide when creating the sandbox. Most of it is small enough to be shipped around in a JSON file, but the initial contents of a mount table require filesystem images to be useful.
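For reference, that JSON file exists today as the OCI runtime spec's config.json: the namespace setup fits in a small document, and only the root filesystem field points at bulky image state. A hedged Python sketch of its shape (field names follow the OCI spec; the values are purely illustrative):

```python
import json

# Illustrative skeleton of an OCI runtime config: the namespaces and the
# process setup fit in a small JSON document; only "root.path" refers to
# bulky state (the unpacked filesystem image).
config = {
    "ociVersion": "1.0.2",
    "process": {"args": ["/bin/sh"], "cwd": "/"},
    "root": {"path": "rootfs"},  # the one part that needs a filesystem image
    "linux": {
        "namespaces": [
            {"type": "user"},
            {"type": "mount"},
            {"type": "network"},
        ]
    },
}
print(json.dumps(config, indent=2))
```

Runtimes like runc consume exactly this kind of document plus an unpacked rootfs directory.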
This makes sense if you look at containers as simply a means to an end of setting up a sandbox, but not really much sense at all if you think of containers as a way to make it easy to get an arbitrary application up and running on an arbitrary server without altering host system dependencies.
I suspect that containers would have taken off even without isolation. I think the important innovation of Docker was the image. It let people deploy a consistent version of their software or download outside software.
All of the hassle of installing things was captured in the Dockerfile, and it ran in containers, so it was more reliable.
I honestly think that the Dockerfile was the biggest driver. Containers as a technology are useful, for the many reasons outlined in this thread. But what Dockerfiles achieved was to make the technology accessible to a much wider and much less technically deep audience. The syntax is easy to follow, the vocabulary available in the DSL is limited, and the results are immediately usable.
Oh, and the layer caching made iterative development with _very_ rapid cycles possible. That lowered the bar for entry and raised the floor for everyone to get going easier.
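The caching model is easy to sketch: each layer's identity is (roughly) a hash of the instruction and its parent layer, so an unchanged prefix of a Dockerfile is a chain of cache hits. A toy illustration (not Docker's actual hashing, which also covers file content):

```python
import hashlib

def layer_id(parent: str, instruction: str) -> str:
    """Toy content-addressed layer ID: hash of parent ID plus instruction."""
    return hashlib.sha256((parent + instruction).encode()).hexdigest()[:12]

def build(instructions, cache):
    """'Build' a list of instructions, counting how many layers were cached."""
    parent, hits = "scratch", 0
    for ins in instructions:
        lid = layer_id(parent, ins)
        if lid in cache:
            hits += 1          # cache hit: reuse the stored layer
        else:
            cache.add(lid)     # cache miss: "build" and store the layer
        parent = lid
    return hits

cache = set()
base = ["FROM python:3.12",
        "COPY requirements.txt .",
        "RUN pip install -r requirements.txt"]
build(base + ["COPY app.py ."], cache)            # first build: all misses
hits = build(base + ["COPY app_v2.py ."], cache)  # change only the last step
print(hits)  # -> 3: the unchanged leading steps all hit the cache
```

This is also why the classic advice is to order Dockerfile steps from least to most frequently changed.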
But back to Dockerfiles. The configuration language used made it possible for anyone[tm] to build a container image, to ship a container image and to run the container. Fire-and-forget style. (Operating the things in practice and at any scale was left as an exercise for the reader.)
And because Anyone[tm] could do it, pretty much anyone did. For good and ill alike.
- most languages don't really do static linking in the same way as C
- things like "a network port" can also be a dependency, but can't be "linked". And so on for all sorts of software that expects particular files to be in particular places, or requires deploying multiple communicating executables
- some dependencies really do not like being statically linked (this includes glibc, the GNU C library!), for things like nsswitch
I can't tell if this is a genuine question or not but if it is.. deploying a Ruby on Rails app with a pile of gems that have c deps isn't fixed with static linking. This is true for python and node and probably other things I'm not thinking of.
Surprisingly "my software" depends on a lot of other stuff. Python, Ruby, PHP, JS, etc all need tens to hundreds of native libraries that have to be deployed.
I agree: I think the container image is what matters. As it turns out, getting more (or less) isolation given that image format is not a very hard problem.
Agreed. There was a point where I thought AMIs would become the unit of open source deployment packaging, and I think docker filled that niche in a cloud-agnostic way
> Cgroups and namespaces were added to Linux in an attempt to add security to a design (UNIX) which has a fundamentally poor approach to security (shared global namespace, users, etc.)
Namespacing of all resources (no restriction to a shared global namespace) was actually taken directly from plan9. It does enable better security but it's about more than that; it also sets up a principled foundation for distributed compute. You can see this in how containerization enables the low-level layers of something like k8s - setting aside for the sake of argument the whole higher-level adaptive deployment and management that it's actually most well-known for.
> Drivers for virtual devices are simple, we don't need Linux's vast driver support for cloud VMs. We essentially need a virtual ethernet device driver for SEL4, a network stack that runs on SEL4, and a simple init process that loads the network stack with capabilities for the network device, and loads the application with a capability to the network stack. Make building an image for that as easy as compiling a binary, and you could eliminate maybe 10s of millions of lines of complexity from the deployment of most server applications. No Linux, no docker.
Wasn't this what unikernels were attempting a decade ago? I always thought they were neat but they never really took off.
I would totally be onboard with moving to seL4 for most cloud applications. I think Linux would be nearly impossible to get into a formally-verified state like seL4, and as you said most cloud stuff doesn't need most of the features of Linux.
It would be great if we got "kernel independent" Nvidia drivers. I have some experience with bare-metal development and it really seems like most of what an operating system provides could be provided in a much better way as a set of libraries that make specific pieces of hardware work, plus a very good "build" system.
> which has a fundamentally poor approach to security
Unix was not designed to be convenient for VPS providers. It was designed to allow a single computer to serve an entire floor of a single company. The security approach is appropriate for the deployment strategy.
As it did with all OSes, the Internet showed up, and promptly ruined everything.
The K8s master is just a scheduling application. It can run anywhere, and doesn't depend on much (just etcd). The kubelet (which runs on each node) is what manages the local resources. It has a plugin architecture, and once you include one of each necessary plugin, it gets very complicated. There are plugins for networking, containerization, storage.
If you are already running SEL4 and you want to spawn an application that is totally isolated, or even an entire sub-kernel it's not different than spawning a process on UNIX. There is no need for the containerization plugins on SEL4.
Additionally the isolation for the storage and networking plugins would be much better on SEL4, and wouldn't even really require additional specialized code. A reasonable init system would be all you need to wire up isolated components that provide storage and networking.
Kubernetes is seen as this complicated and impressive piece of software, but it's only impressive given the complexity of the APIs it is built on. Providing K8s functionality on top of SEL4 would be trivial in comparison.
I understand what you're saying, and I'm a fan of SEL4. But isolation isn't one of the primary points of k8s.
Containerization is after all, as you mentioned, a plugin. As is network behavior. These are things that k8s doesn't have a strong opinion on beyond compliance with the required interface. You can switch container plugin and barely notice the difference. The job of k8s is to have control loops that manage fleets of resources.
That's why containers are called "containers". They're for shipping services around like containers on boats. Isolation, especially security isolation, isn't (or at least wasn't originally) the main idea.
You manage a fleet of machines and a fleet of apps. k8s is what orchestrates that. SEL4 is a microkernel -- it runs on a single machine. From the point of view of k8s, a single machine is disposable. From the point of view of SEL4, the machine is its whole world.
So while I see your point that SEL4 could be used on k8s nodes, it performs a very different function than k8s.
> Kubernetes is seen as this complicated and impressive piece of software, but it's only impressive given the complexity of the APIs it is built on.
There are other reasons it's impressive. Its API and core design is incredibly well-designed and general, something many other projects could and should learn from.
But the fact that it's impressive because of the complexity of the APIs it's built on is certainly a big part of its value. It means you can use a common declarative definition to define and deploy entire distributed systems, across large clusters, handling everything from ingress via load balancers to scaling and dynamic provisioning at the node level. It's essentially a high-level abstraction for entire data centers.
seL4 overlaps with that in a pretty minimal way. Would it be better as underlying infrastructure than the Linux kernel? Perhaps, but "providing K8s functionality on top of SEL4" would require reimplementing much of what Linux and various systems on top of it currently provide. Hardly "trivial in comparison".
cgroups first came from resource management frameworks that IIRC came out of IBM and got into some distro kernels for a time but not upstream.
Namespaces were not an attempt to add security, but just grew out of work to make interfaces more flexible, like bind mounts. And Unix security is fundamentally good, not having namespaces isn't much of a point against it in the first place, but now it does have them.
And it's going pretty well indeed. All applications use many kernel features, and we do have very secure high performance web and other servers.
L4 systems have been around for as long as Linux, and SEL4 in particular for about two decades. They haven't moved the needle much, so I'd say it's not really going all that well for them so far. SEL4 is a great project that has done some important things, don't get me wrong, but it doesn't seem to be a Unix replacement poised for a coup.
I kid, but seriously, good how? Because it ensures cybersecurity engineers will always have a job?
seL4 is not the final answer, but something close to it absolutely will be. Capability-based security is an irreducible concept at a mathematical level, meaning you can’t do better than it, at best you can match it, and it’s certainly not matched by anything else we’ve discovered in this space.
I don't think Docker came about due to cgroups and namespaces being arcane, LXC was already abstracting that away.
Docker's claim to fame was connecting that existing stuff with layered filesystem images and packaging based off that. Docker even started off using LXC to cover those container runtime parts.
You say applications and web servers kind of interchangeably. I don't know anything about SEL4. What if your application needs to spawn and manage executables as child processes? Is it Linux-like enough to run those and handle stuff like that so that those of us coding at the application layer don't need to worry about it too much?
I think the whole thing has been levels of abstraction around a runtime environment.
in the beginning we had the filesystem. We had /usr/bin, /usr/local/bin, etc.
then chroot where we could run an environment
then your cgroups/namespaces
then docker build and docker run
then swarm/k8s/etc
I think there was a parallel evolution around administration, like configure/make, then apt/yum/pacman, then ansible/puppet/chef and then finally dockerfile/yaml
> Containers (meaning Docker) happened because CGroups and namespaces were arcane and required lots of specialized knowledge to create what most of us can intuitively understand as a "sandbox".
That might be why Docker was originally implemented, but why it "happened" is because everyone wanted to deploy Python and pre-uv Python package management sucks so bad that Docker was the least bad way to do that. Even pre-kubernetes, most people using Docker weren't using it for sandboxing, they were using it as fat jars for Python.
Not only Python, although Python is particularly bad.
Even with Java, where fat JARs exist, you at some point end up with OS-level dependencies: "this logging thing needs to be set up, these dirs need these rights, this user needs to be in place", etc. Nowadays you can shove all of that into a container.
> namespaces were added to Linux in an attempt to add security to a design (UNIX) which has a fundamentally poor approach to security (shared global namespace, users, etc.)
If the "fundamentally poor approach to security" is a shared global namespace, why are namespaces not just a fix that means the fundamental approach to security is no longer poor?
My headcanon is that Docker exists because Python packaging and dependency management was so bad that dotCloud had no choice but to invent some porcelain on top of Linux containers, just to provide a pleasant experience for deploying Python apps.
That's basically correct. But the more general problem is that engineers simply lost the ability to succinctly package applications and their dependencies into simple-to-distribute-and-run bundles. Somehow, around the same time Java made .jar files mainstream (just zip all the crap with a manifest), the rest of the world completely forgot how to do the equivalent of statically linking in libraries, and forgot that we're all running highly scheduled multithreaded operating systems now.
The "solution" for a long time was to spin up single-application virtual machines, which was a heavy way to solve it and reduced the overall system resources available to the application, making it a stupidly inefficient approach. The modern cloud was invented during this phase, which is why one of the base primitives of all current cloud systems is the VM.
Containers both "solved" the dependency distribution problem as well as the resource allocation problem sort of at once.
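For pure-Python code, the "zip all the crap with a manifest" equivalent does exist in the stdlib (zipapp); what it can't bundle is the native libraries discussed elsewhere in the thread, which is the gap containers filled. A minimal sketch:

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "myapp"
    src.mkdir()
    # __main__.py is the entry point zipapp looks for, much like a jar
    # manifest's Main-Class.
    (src / "__main__.py").write_text('print("hello from a single-file app")\n')

    target = pathlib.Path(tmp) / "myapp.pyz"
    zipapp.create_archive(src, target)  # zip the tree into one runnable file

    # The .pyz runs anywhere a compatible interpreter exists -- but any C
    # extension dependencies would still have to be present on the host.
    out = subprocess.run([sys.executable, str(target)],
                         capture_output=True, text=True).stdout.strip()
    print(out)
```

So the single-file story was never truly lost; it just stops working the moment a dependency needs compiled native code.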
which is great when you realize that not all software is updated at the same time.
how managing multiple java runtime versions is supposed to work is still beyond me... it's a different tool at every company, and the instructions never seem to work
I would argue that the traditional way to install applications (particularly servers) on UNIX wasn’t very compatible with the needs that arose in the 2000s.
The traditional way tends to assume that there will be only one version of something installed on a system. It also assumes that when installing a package you distribute binaries, config files, data files, libraries and whatnot across lots and lots of system directories. I grew up on traditional UNIX. I’ve spent 35+ years using perhaps 15-20 different flavors of UNIX, including some really, really obscure variants. For what I did up until around 2000, this was good enough. I liked learning about new variants. And more importantly: it was familiar to me.
It was around that time I started writing software for huge collections of servers sitting in data centers on a different continent. Out of necessity I had to make my software more robust and easier to manage. It had to coexist with lots of other stuff I had no control over.
It would have to be statically linked, everything I needed had to be in one place so you could easily install and uninstall. (Eventually in all-in-one JAR files when I started writing software in Java). And I couldn’t make too many assumptions about the environment my software was running in.
UNIX could have done with a re-thinking of how you deal with software, but that never happened. I think an important reason for this is that when you ask people to re-imagine something, it becomes more complex. We just can’t help ourselves.
Look at how we reimagined managing services with systemd. Yes, now that it has matured a bit and people are getting used to it, it isn’t terrible. But it also isn’t good. No part of it is simple. No part of it is elegant. Even the command line tools are awkward. Even the naming of the command line tools fail the most basic litmus test (long prefixes that require too many keystrokes to tab-complete says a lot about how people think about usability - or don’t).
Again, systemd isn’t bad. But it certainly isn’t great.
As for blaming Python, well, blame the people who write software for _distribution_ in Python. Python isn’t a language that lends itself to writing software for distribution and the Python community isn’t the kind of community that will fix it.
Point out that it is problematic and you will be pointed to whatever mitigation that is popular at the time (to quote Queen “I've fallen in love for the first time. And this time I know it's for real”), and people will get upset with you, downvote you and call you names.
I’m too old to spend time on this so for me it is much easier to just ban Python from my projects. I’ve tried many times, I’ve been patient, and it always ends up biting me in the ass. Something more substantial has to happen before I’ll waste another minute on it.
Sure they definitely were using Docker for their own applications, but also dotCloud was itself a PaaS, so they were trying to compete with Heroku and similar offerings, which had buildpacks.
The problem is/was that buildpacks aren't as flexible and only work if the buildpack exists for your language/runtime/stack.
Exactly this, but not just Python. The traditional way most Linux apps work is that they are splayed over your filesystem with hard coded references to absolute paths and they expect you to provide all of their dependencies for them.
Basically, the Linux world was actively designed in a way that makes apps difficult to distribute.
It wasn't about making apps difficult to distribute at all, that's a later side effect. Originally distros were built around making a coherent unified system of package management that made it easier to manage a system due to everything being built on the same base. Back then Linux users were sysadmins and/or C programmers managing (very few) code dependencies via tarballs. With some CPAN around too.
For a sysadmin, distros like Debian were an innovative godsend for installing and patching stuff. Especially compared to the hell that was Windows server sysadmin back in the 90s.
The developer oriented language ecosystem dependency explosion was a more recent thing. When the core distros started, apps were distributed as tarballs of source code. The distros were the next step in distribution - hence the name.
Right but those things are not unrelated. Back in the day if you suggested to the average FOSS developer that maybe it should just be possible to download a zip of binaries, unzip it anywhere and run it with no extra effort (like on Windows), they would say that that is actively bad.
You should be installing it from a distro package!!
What about security updates of dependencies??
And so on. Docker basically overrules these impractical ideas.
It was more like, library writers forgot how to provide stable APIs for their software, and applications decided they just wanted to bundle all the dependencies they needed together and damn the consequences on the rest of the system. Hence we got static linked binaries and then containers.
> Basically the Linux world was actively designed to apps difficult to distribute.
It has "too many experts", meaning that everyone has too much decision making power to force their own tiny variations into existing tools. So you end up needing 5+ different Python versions spread all over the file system just to run basic programs.
NodeJS, Ruby, etc. also have this problem, as does Go with CGO. So the problem is binary dependencies on C/C++ code and make, configure, autotools, etc... The whole C/C++ compilation story is such a mess that, after almost five decades, inventing containers was pretty much the only sane way of tackling it.
Java at least uses binary dependencies very rarely, and they usually have the decency of bundling the compiled dependencies... But it seems Java and Go just saw the writing on the wall and mostly just reimplement everything. I did have problems with the Snappy compression in the Kafka libraries, though, for instance
The issue is with cross platform package management without proper hooks for the platform themselves. That may be ok if the library is pure, but as soon as you have bindings to another ecosystem (C/C++ in most cases), then it should be user/configurable instead of the provider doing the configuration with post installs scripts and other hacky stuff.
If you look at most projects in the C world, they only provide the list of dependencies and some build config Makefile/Meson/Cmake/... But the latter is more of a sample and if your platform is not common or differs from the developer, you have the option to modify it (which is what most distros and port systems do).
But good luck doing that with the sprawling tree of modern packages managers. Where there's multiple copies of the same libraries inside the same project just because.
I don't agree with this. Java systems were one of the earliest beneficiaries of container-based systems, which essentially obsoleted those ridiculously over-complicated, and language-specific, application servers that you mentioned.
Java users largely didn't bother with containers IME, for much the same reasons that most Java users didn't bother with application servers. Those who did want that functionality already had it available, making the move from an existing Java application server to Docker-style containers a minor upgrade at best.
Tomcat and Jetty are application servers, and they're in almost every Spring application. There are heavier application servers of the kind you mention, like WildFly, but application servers as a whole are not obsolete.
Pyinstaller predates Docker. It's not about any individual language not being able to do packaging, it's about having a uniform interface for running applications in any language/architecture. That's why platforms like K8s don't have to know a thing about Python or anything else and they automatically support any future languages too.
My take: containers forced developers to declare various aspects of the application in a standardized, opinionated way:
- Persistent state? Must declare a volume.
- IO with external services? Must declare the ports (and maybe addresses).
- Configurable parameters? Must declare some env variables.
- Transitive dependencies? Must declare them, but using a mechanism of your choosing (e.g. via the package manager of your base image distro).
Separation of state (as in persistency) and application (as in binaries, assets) makes updates easy. Backups also.
Having almost all IO visible and explicit simplifies operation and integration.
And a single, (too?!?) simple config mechanism increases reusability, by enabling e.g. lightweight tailoring of generic application service containers (such as mariadb).
Together, this bunch of forced yet leaky abstractions is just good enough to foster immense reuse and composability across a plethora of applications, all while allowing us to treat them almost entirely like black boxes. IMHO that is why OCI containers became this big, compared to other virtualization and (application-)container technologies.
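Of the declarations listed above, the env-variable one is the piece application code touches most directly; a minimal sketch of the pattern in Python (the variable names like `DB_HOST` are hypothetical, not from this thread):

```python
import os

def load_config(env=os.environ):
    """Read configuration from environment variables, the way a
    containerized service typically does, with explicit defaults
    so the same image runs unmodified in dev and prod."""
    return {
        "db_host": env.get("DB_HOST", "localhost"),
        "db_port": int(env.get("DB_PORT", "5432")),
        "debug": env.get("DEBUG", "false").lower() == "true",
    }

# The container runtime injects the environment; the app never hardcodes it.
print(load_config({"DB_HOST": "db", "DEBUG": "true"}))
```

The point is exactly the "forced abstraction" above: the container boundary makes every tunable parameter an explicit, inspectable declaration rather than a file buried in the image.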
Containers happened because nobody can be bothered to build an entire application into a single distributable executable anymore (heck, the tooling barely even exists anymore). But instead of solving problems like dependency management and linking, today's engineers simply pile abstractions into the problem space until the thing you want to do more than anything (i.e. execute an application) becomes a single call.
Of course you now need to build and maintain those abstract towers, so more jobs for everybody!
Put another way: stuff like Electron makes a pretty good case for the "cheap hardware leads to shitty software quality/distribution mechanisms" claim. But does Docker? Containers aren't generally any more expensive in hardware to run than any other app, apart from disk space. And disk space was always (at least since the advent of the discrete HDD) one of the cheapest parts of a computer to scale up.
If you go back to the Sun days, you literally could not afford enough servers to run one app per server so instead you'd hire sysadmins to figure out how to run Sendmail and Oracle and whatever on one server without conflicting. Then x86/Linux 1Us came out and people started just running one app per server ("server sprawl") which was easy because there was nothing to conflict. This later became VM sprawl and containers were an optimization on that.
We had to have multiple apps per server before, and now we have containers which offer a convenient way to have multiple apps per server? That seems like the same thing. Could you explain more re: what you meant?
What blew my mind and convinced me to only use immutable distros is the immutability of it.
For instance, I could create my own login screen for a web service without having to worry about the package manager overriding my code, because I inject it into the container, which is already updated.
I can also reroute ports or network connections much more easily, the way I want.
The author suggests that Docker doesn't help development and that devs just spin up databases, but I have to disagree with that, and I'm pretty sure I am not the only one.
All my projects (primarily web apps) are using docker compose which configures multiple containers (php/python/node runtime, nginx server, database, scheduler, etc) and run as a dev environment on my machine. The source code is mounted as a volume. This same compose file is then also used for the deployment to the production server (with minor changes that remove debug settings for example).
This approach has worked well for me as a solo dev creating web apps for my clients.
It has also enabled extreme flexibility in the stacks that I use, I can switch dev environments easily and quickly.
I agree with you 100%, though arguably what you could be describing is how docker changed your deployment workflow, not your development workflow (although with devcontainers that line is blurry, as you say).
I guess it's worth keeping in mind that Justin only quit Docker a few months ago, and his long tenure as CTO there will have (obviously) informed the majority of the opinions in the article. I think the deployment over development spin and some of the other takes there more closely reflect the conversations he had with large corp paying customers at the exec level than the workflows of solo devs that switch dev environments much more frequently than most etc.
> Application composition from open source components became the dominant way of constructing applications over the last decade.
I'm just as interested in why this ^ happened. I imagine it's pretty unique to software? I don't hear of car companies publishing component designs free for competitors to use, or pharmaceuticals freely waiving the IP in their patents or processes. Certainly not as "the dominant way of" doing business.
I wonder if LLM coding assistants had come about earlier, whether this would have been as prevalent. Companies might have been more inclined to create more of their own tooling from scratch since LLMs make it cheap (in theory). Individuals might have been less inclined to work on open source as hobbies because LLMs make it less personal. Companies might be less inclined to adopt open-source LLM-managed libraries because it's too chaotic.
I think open source software took off because it’s more standalone than the other things you listed and this makes the rewards much higher.
If I write some code, it needs a computer and environment to run. If I’m writing for what’s popular, that’s pretty much a given. In short, for code the design is the product.
If I design a pharmaceutical, someone still has to make it. Same for car parts. This effort is actually greater than the effort of design. If you include regulation, it’s way higher.
So, this great feedback loop of creation-impact-collaboration never forms. The loop would be too big and involve too much other stuff.
The closest thing isn’t actually manufacturing, it’s more like writing and music. People have been reusing each other’s stuff forever in those spaces.
Linux CGroups specifically were started at Google because their cluster management system Borg (or maybe it was still Babysitter at the time) needed a way to do resource tracking and admission control. Here's a comment by one of original devs: https://news.ycombinator.com/item?id=25017753
Containers happened because running an ad network and search engine means serving a lot of traffic for as little cost as possible, and part of keeping the cost down is bin packing workloads onto homogeneous hardware as efficiently as possible.
What does the 'ad network and search engine' have to do with it? Wouldn't any organization who serves lots of traffic have the same cost cutting goals you mentioned?
Yes, to expand: Both search and ads mean serving immense amounts of traffic and users while earning tiny amounts of revenue per unit of each. The dominant mid-90s model of buying racks of Sun and NetApp gear, writing big checks to Oracle, etc, would have been too expensive for Google. Instead they made a big investment in Linux running on large quantities of commodity x86 PC hardware, and building software on top of that to get the most out of it. That means things like combining workloads with different profiles onto the same servers, and cgroups kind of falls out of that.
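The bin packing mentioned here is a classic optimization problem; a toy first-fit-decreasing sketch along a single resource dimension (not Borg's actual algorithm, which handles multiple dimensions, priorities, and preemption):

```python
def first_fit_decreasing(workloads, capacity):
    """Pack workloads (by resource demand) onto as few servers as
    possible: sort demands descending, then place each one on the
    first server with enough remaining capacity."""
    servers = []     # remaining capacity per server
    placement = []   # (demand, server index) pairs
    for demand in sorted(workloads, reverse=True):
        for i, free in enumerate(servers):
            if demand <= free:
                servers[i] -= demand
                placement.append((demand, i))
                break
        else:
            # no existing server fits; provision a new one
            servers.append(capacity - demand)
            placement.append((demand, len(servers) - 1))
    return len(servers), placement

# Eight workloads with mixed demands, servers of capacity 10:
n, _ = first_fit_decreasing([5, 7, 5, 2, 4, 2, 5, 1], 10)
print(n)  # 4 servers, matching the lower bound ceil(31/10)
```

cgroups are what make this safe in practice: without enforced resource limits, co-locating workloads this tightly would let one noisy tenant starve the others.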
Other companies like Yahoo, Whatsapp, Netflix also followed interesting patterns of using strong understanding of how to be efficient on cheap hardware. Notably those three all were FreeBSD users at least in their early days.
> I was always surprised someone didn't invent a tool for ftping to your container and updating the PHP
We thought of it, and were thankful that it was not obvious to our bosses, because lord forbid they would make it standard process and we would be right back where we started, with long lived images and filesystem changes, and hacks, and managing containers like pets.
This and the comments may miss the forest for the trees.
Enterprise software vendors sold libraries and then "application servers", essentially promising infrastructure (typically tied to databases).
Enterprise software developers -- Google in particular -- got tired of depending on others' licensed infrastructure. This birthed Spring and Docker, splitting the market.
(Fun aside: when is a container a vm? When it runs via Apple containerization.)
When you launch a container (either through Docker or manually through namespaces) you are effectively representing yourself to the kernel as a separate thing. This allows you to construct a completely separate environment when interacting with the kernel, where none of your concerns are going to leak out and nothing you don't care for is going to leak in.
When people say that static executables would solve the problem they are wrong, a static executable just means that you can eschew constructing a separate file-system inside your container - and you will probably need to populate some locations anyway.
Properly configured containers are actually supposed to be secure sandboxes, such that any violation is a kernel exploit. However the Linux kernel attack surface is very large so no one serious who offers multi-tenant hosting can afford to rely on containers for isolation. They have to assume that a container escape 0day can be sourced. It may be more accurate to say that a general kernel 0day can be sourced since the entire kernel surface area is open for anyone to poke. seccomp can mitigate the surface area but also narrow down the usefulness.
Not... really. The Linux kernel has no concept of a container; you have to be super careful to avoid "mixing" host stuff in. I'm yet to see a case where "leaking in" would be prevented by default. Docker "leaks in" as much as you want. Containers also do not nest gracefully (due to, e.g., uids), so they cannot be used as a software component. It's mostly a Linux sysadmin thing right now.
It happened because the story of dependencies (system & application) was terrible; running the app on a different distribution/kernel/compiler/etc. was hard. There were different solutions like Vagrant, but they were heavy and the DX wasn't there.
I love this sentence about DevOps "Somehow it seems easier for people to relate to technology than culture, and the technology started working against the culture."
Containers happened because nobody knew what the hell they were doing and still have no clue what the hell they are doing. Software by the deranged for the deranged.
Ironically, also because Go is one of few popular languages for web applications that can produce a single executable binary and does not require a container to deploy with ease.
I think there's a pretty big citation needed on that part of the article. I'm not clear that Docker contributed to that anywhere near as much as a general increase of momentum around Go as it became better known in the industry.
Containers happened because the original execution isolation environment (the process) was considered a lost cause. Processes shared too much with each other, so additional isolation primitives had to be added, but they had to be sort of tacked on to the side, because more important than security or correctness is backwards compatibility. So now containers are considered a different thing from processes, when really they are processes with these additional isolation primitives enabled.
In the early 2000s (yes, long after the original jails), containers were pitched as an alternative to VMware's VMs. They lost out for a variety of reasons--but mostly because as purely a different (and somewhat lighter-weight) encapsulation technique they weren't that interesting.
For me the main reason to use containers is "one-line install any linux distro userspace". So much simpler than installing a dozen VirtualBox boxes to test $APP on various versions of ubuntu, debian, nixos, arch, fedora, suse, centos etc.
The article is just wrong. Before Docker, there was OpenVZ and Virtuozzo. They were used to provide cheaper "dedicated machine" hosting back around 2005.
Then the technology from OpenVZ slowly made its way into mainline Linux, in the form of cgroups and namespaces. LWN called it a "container puzzle", with tens of moving pieces. And it was largely finished by the early 2010s.
I built my own container system in 2012 that used cgroups to oversubscribe the RAM, with simple chroot-based file namespaces for isolation. We even used XFS projects (raise your hand if you know what this is!) for the disk quota management. I remember that I had to use systemtap to patch the kernel to be able to find out which process died as a result of the OOM killer, there were no standard ways to do that.
We sold it as a part of our biotech startup to Illumina. Then we sold it again to Amazon as a part of another startup :)
The genius of Docker was the layered overlayfs-based image building. This one simple innovation made it possible to build images in a constructive way, without having to waste half an hour for each minor change. I was floored with its simplicity and power when I first saw it.
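The layered image build can be modeled as a content-addressed cache: each step's identity depends on everything before it, so changing one instruction only invalidates the layers after it. A toy sketch of the idea (not Docker's actual implementation):

```python
import hashlib

class LayerCache:
    """Toy model of Docker-style layer caching: each build step is
    keyed by a hash of the previous layer's id plus the step text,
    so an unchanged prefix of steps is never rebuilt."""
    def __init__(self):
        self.built = set()   # layer ids we have already produced
        self.builds = 0      # how many steps actually executed

    def build(self, steps):
        layer_id = "scratch"
        for step in steps:
            key = hashlib.sha256((layer_id + "\n" + step).encode()).hexdigest()
            if key not in self.built:
                self.builds += 1   # cache miss: "run" the step
                self.built.add(key)
            layer_id = key         # next step chains off this layer
        return layer_id

cache = LayerCache()
cache.build(["FROM debian", "RUN apt-get install -y nginx", "COPY app /app"])
assert cache.builds == 3   # first build runs every step
cache.build(["FROM debian", "RUN apt-get install -y nginx", "COPY app2 /app"])
assert cache.builds == 4   # only the changed final step is rebuilt
```

This is why putting the rarely-changing steps first in a Dockerfile matters: everything after the first changed instruction is invalidated, and everything before it is free.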
In some situations the cloud is a no-go and the lead time to install new hardware is quite lengthy. In those cases VMs or containers allow for rapid change at the software/OS level while the meat space still moves like a glacier.
My guess is Linux started getting requests from various orgs for a while, so in true Linux fashion, we got a few different container-type methods years later.
I still think Jails are the best of the bunch, but they can be a bit hard to set up. Once set up, Jails work great.
Why it happened is not nearly as important as what it unveiled: that versioned immutable systems are the most powerful system design concept in history. Most people have not yet grasped what an insanely powerful concept it is. At some point in the future, maybe 50-100 years from now, someone will look back and say "holy shit; deploying web apps is where this concept came from?" I hope in my lifetime that people get it and start applying it to other system designs/components.
I loved the assertion that AI ate up all the budget and that K8s is now "boring" technology. That's fine because it was getting pretty annoying with all the clone competitors for practically everything that were popping up every month!
Do you use K8s? No! That's old! I use Thrumba! It's just a clone of K8s by some startup because people figured out that the easiest way to make gobs of money is/was to build platform products and then get people to use them.
Fascinating documentary on Kubernetes for those who have 50 minutes. Gives more background to the "Container Wars". The filmmakers also have documentaries on the history of Python, Argo, etc.
Some highlights:
- How far behind Kubernetes was at the time of launch. Docker Swarm was significantly simpler to use, and the Apache Mesos scheduler could already handle 10,000 nodes (and was being used by Netflix).
- RedHat's early contributions were key, despite having the semi-competing project of OpenShift.
- The decision to open source K8S came down to one brief meeting at Google. Many of the senior engineers attended remotely from Seattle, not bothering to fly out because they thought their request to go OS was going to get shut down.
- Brief part at the end where Kelsey Hightower talks about what he thinks might come after Kubernetes. He mentions, and I thought this was very interesting ... Serverless making a return. It really seemed like Serverless would be "the thing" in 2016-2017 but containers were too powerful. Maybe now with KNative or some future fusing of Container Orchestration + K8S?
Containers were invented because Linux is incapable of reliably running software programs. The global pool of shared libraries is an abject failure and containers are a heavy handed workaround.
I think "fixing distro packaging" is more apropos.
In a past life, I remember having to juggle third-party repositories in order to get very specific versions of various services, which resulted in more than few instances of hair-pull-inducing untangling of dependency weirdness.
This might be controversial, but I personally think that distro repos being the assumed first resort of software distribution on Linux has done untold amounts of damage to the software ecosystem on Linux. Containers, alongside Flatpak and Steam, are thankfully undoing the damage.
> This might be controversial, but I personally think that distro repos being the assumed first resort of software distribution on Linux has done untold amounts of damage to the software ecosystem on Linux.
Hard agree. After getting used to "system updates are... system updates; user software that's not part of the base system is managed by a separate package manager from system updates, doesn't need root, and approximately never breaks the base system (to include the graphical environment); development/project dependencies are not and should not be managed by either of those but through project-specific means" on macOS, the standard Linux "one package manager does everything" approach feels simply wrong.
> development/project dependencies are not and should not be managed by either of those but through project-specific means" on macOS, the standard Linux "one package manager does everything" approach feels simply wrong.
This predates macOS. The mainframe folks did this separation eons ago (see IBM VM/CMS).
On Unix, it's mostly the result of getting rid of your sysadmins who actually had a clue. Even in Unix-land in the Bad Old Days(tm), we used to have "/usr/local" for a reason. You didn't want the system updating your Perl version and bringing everything to a screeching halt; you used the version of Perl in /usr/local that was under your control.
I wonder if it can be traced back to something RedHat did somewhere, because it may have all begun once you COULDN'T be absolutely certain that anything even remotely "enterprise" was running on a RedHat.
I think it's a natural outgrowth of what Linux is.
Linux is just a kernel - you need to ship your own userland with it. Therefore, early distros had to assemble an entire OS around this newfangled kernel from bits and pieces, and those bits and pieces needed a way to be installed and removed at will. Eventually this installation mechanism gets scope creep and suddenly things like FreeCiv and XBill are distributed using the same underlying system that bash and cron use.
This system of distro packaging might be good as a selling point for a distro - so people can brag about their distro comes with 10,000 packages or whatever. That said, I can think of no other operating system out there where the happiest path of releasing software is to simply release a tarball of the source, hope a distro maintainer packages it for you, hope they do it properly, and hope that nobody runs into a bug due to a newer or older version of a dependency you didn't test against.
Yours is a philosophy I encounter more and more: there should be one unified, ideally fast-moving platform, where software is only tested against $latest. Stability is a thing of the past. The important thing is more features.
Instead of designing a solution and perfecting it over time, it's endless tweaking, with a new redesign every year. And you're supposed to use the exact same computer as the dev to get their code to work.
Red Hat was actually doing something more directly based on a variety of existing Linux projects than Docker, but switched to OCI/Docker when that came about - rather than jumping on the Cloud Foundry bandwagon (which many argued was obviously the future for container orchestration).
Kubernetes was also not the obvious winner in its time, with Mesos in particular seeming like a possible alternative when it wasn't clear whether orchestration and resource management were different product categories.
I was at Red Hat at the time and my impression was they did a pretty good job of jumping onto where the community momentum at the time was--while doubtless influencing that momentum at the time.
Never grew popular, perhaps. But I'm not sure how it failed, and not sure how much overlap the Venn diagram of concerns Plan 9 had really shares with containers.
Yes, there was an idea of creating bespoke filesystems for apps, the custom mount structures that Plan 9 had, and containers also did something semi-parallel to that. But container images as read-only overlays (with a final rw top overlay) feel like a very narrow craft. Plan 9 had a lot more to it (everything as a file), and containers have a lot more to them (process, user, net namespaces; container images as pre-assembled layers).
I can see some shared territory, but these concerns feel mostly orthogonal. I could easily imagine a Plan 9-like entity arising amid the containerized world: these aren't really in tension with each other. There's also a decade-and-a-half+ gap between Plan 9's heyday and the rise of containers.
I just wanted to have two instances of two versions of Postgres installed, and have their data directories documented in a YAML file, and know that they aren't gonna fuck up anything else on my system if someone gets RCE or something odd
The article seems to assume that containers appeared to solve the software distribution problem and then somehow got repurposed into virtualization, isolation and management of production services. I think this view is very far from truth.
The virtualization/isolation aspect came first; SWSoft's Virtuozzo was doing that quite well in the early 2000s. They even had [some] IO isolation, which I think took around a decade to be supported elsewhere. Then gradually pieces of Virtuozzo/OpenVZ reached the mainline in the form of cgroups/LXC, and the whole thing slowly brewed for a while until Docker added the two missing pieces: fast image rebuilds and an out-of-the-box user experience.
Docker of course was the revolution, but by then sufficiently advanced companies had already been using containers for isolation for a full decade.
Containers (meaning Docker) happened because CGroups and namespaces were arcane and required lots of specialized knowledge to create what most of us can intuitively understand as a "sandbox".
Cgroups and namespaces were added to Linux in an attempt to add security to a design (UNIX) which has a fundamentally poor approach to security (shared global namespace, users, etc.).
It's really not going all that well, and I hope something like SEL4 can replace Linux for cloud server workloads eventually. Most applications use almost none of the Linux kernel's features. We could have very secure, high performance web servers, which get capabilities to the network stack as initial arguments, and don't have access to anything more.
Drivers for virtual devices are simple, we don't need Linux's vast driver support for cloud VMs. We essentially need a virtual ethernet device driver for SEL4, a network stack that runs on SEL4, and a simple init process that loads the network stack with capabilities for the network device, and loads the application with a capability to the network stack. Make building an image for that as easy as compiling a binary, and you could eliminate maybe 10s of millions of lines of complexity from the deployment of most server applications. No Linux, no docker.
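The capability-passing wiring described above can be modeled in a few lines. This is a conceptual sketch in Python, not real seL4 API (all of the class names here are hypothetical); it only illustrates the principle that each component can reach exactly the capabilities handed to it, with no ambient authority:

```python
class NetworkDevice:
    """Stands in for a virtual ethernet device capability."""
    def send(self, frame):
        return f"tx:{frame}"

class NetworkStack:
    """A network stack that can only reach hardware through the
    capability it was constructed with."""
    def __init__(self, device):
        self._device = device
    def send_packet(self, payload):
        return self._device.send(payload)

class WebServer:
    """The application: it holds a capability to the network stack
    and nothing else - no filesystem, no device, no other process."""
    def __init__(self, net):
        self._net = net
    def respond(self, body):
        return self._net.send_packet(body)

def init():
    """Minimal 'init': wires the capabilities together, then keeps
    no authority of its own."""
    device = NetworkDevice()
    stack = NetworkStack(device)
    return WebServer(stack)

server = init()
print(server.respond("hello"))  # the server never sees the device directly
```

On seL4 the same structure would be enforced by the kernel rather than by convention: a component physically cannot name a capability it was never granted.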
Because SEL4 is actually well designed, you can run a sub kernel as a process on SEL4 relatively easily. Tada, now you can get rid of K8s too.
Containers and namespaces are not about security. They are about not having singleton objects at the OS level. Would have called it virtualization if the word wasn't so overloaded already. There is a big difference that somehow everyone misses. A bypassable security mechanism is worse than useless. A bypassable virtualization mechanism is useful. It is useful to be able to have a separate root filesystem just for this program - even if a malicious program is still able to detect it's not the true root.
As about SEL4 - it is so elegant because it leaves all the difficult problems to the upper layer (coincidentally making them much more difficult).
> Containers and namespaces are not about security
True. Yet containers, or more precisely the immutable images endemic to container systems, directly address the hardest part of application security: the supply chain. Between the low effort and risk entailed when revising images to address endlessly emerging vulnerabilities, and enabling systematized auditing of immutable images, container images provide invaluable tools for security processes.
I know about Nix and other such approaches. I also know these are more fragile than the deeply self-contained nature of containers and their images. That's why containers and their image paradigm have won, despite all the well-meaning and admirable alternatives.
> A bypassable security mechanism is worse than useless
Also true. Yet this is orthogonal to the issues of supply chain management. If tomorrow, all the problems of escapable containers were somehow solved, whether by virtual machines on flawless hypervisors, or formally verified microkernels, or any other conceivable isolation mechanism, one would still need some means to manage the "content" of disparate applications, and container systems and the image paradigm would still be applicable.
> I also know these are more fragile than the deeply self-contained nature of containers and their images
Not really. People only use Nix because it doesn't randomly break, bitrot or require arcane system setup.
Unlike containers. You really need k8s or something like it to mould Docker containers into something manageable.
> As about SEL4 - it is so elegant because it leaves all the difficult problems to the upper layer (coincidentally making them much more difficult).
I completely buy this as an explanation for why SEL4 for user environments hasn't (and probably will never) take off. But there's just not that much to do to connect a server application to the network, where it can access all of its resources. I think a better explanation for the lack of server side adoption is poor marketing, lack of good documentation, and no company selling support for it as a best practice.
> But there's just not that much to do to connect a server application to the network, where it can access all of its resources.
If you only care to run stateless stuff that never write anything (or at least never read what they wrote) - it's comparatively easy. Still gotta deal with the thousand drivers - even on the server there are a lot of quirky stuff. But then you gotta run the database somewhere. And once you run a database you get all the problems Linus warned about. So you gotta run the database on a separate Linux box (at that point - what do you win vs. using Linux for everything?) or develop a new database tailored for SeL4 (and that's quite a bit more complex than an OS kernel). An elegant solution that only solves a narrow set of cases stands no chance over a crude solution that solves every case.
Also, with the current sexy containerized stacks it's easy to forget, but having the same kind of environment on the programmer's workbench and on the server was once Unix's main selling point. It's kinda expensive to support a separate abstraction stack for a single purpose.
The lack of adoption is because it’s not a complete operating system.
Using sel4 on a server requires complex software development to produce an operating environment in which you can actually do anything.
I’m not speaking ill of seL4; I’m a huge fan, and things like its take-grant capability model are extremely interesting and valuable contributions.
It’s just not a usable standalone operating system. It’s a tool kit for purpose-built appliances, or something that you could, with an enormous amount of effort, build a complete operating system on top of.
Yes. I really hope someone builds a nice, usable OS with SeL4 as a base. If SeL4 is like the linux kernel, we need a userland (GNU). And a distribution that's simple to install and make use of.
I'd love to work on this. It'd be a fun problem!
seL4 needs a ‘the rest of the kernel’ to be like linux
It needs device drivers for modern x86 hardware. And filesystems, and a TCP stack. All of that code can be done in "SeL4 userland", but yeah - I see your point.
Are there any projects like that going on? It feels like an obvious thing.
Are you aware of https://genode.org ?
A lot of deployments essentially virtualize Linux or run portions of NetBSD (e.g. via their "rump" kernel mechanism) to achieve driver support, file systems, etc. That's not really a general-purpose solution, though.
There is work within major consumer product companies building such things (either with sel4, or things based on sel4's ideas), and there's Genode on seL4.
> Containers and namespaces are not about security
An escape from properly configured container/namespaces is a kernel 0day. Or a 0day in whatever protocol the isolated workload talks to the outside with.
Is that why containers started? I seem to recall them taking off because of dependency hell, back in the weird time when easy virtualization wasn't insanely available to everyone.
Trying to get the versions of software you needed to use all running on the same server was an exercise in fiddling.
I think there were multiple reasons why containers started to gain traction. If you ask 3 people why they started using containers, you're likely to get 4 answers.
For me, it was avoiding dependencies and making it easier to deploy programs (not services) to different servers w/o needing to install dependencies.
I seem to remember a meetup in SF around 2013 where Docker (was it still dotCloud back then?) was describing a primary use-case was easier deployment of services.
I'm sure for someone else, it was deployment/coordination of related services.
The big selling points for me were what you said about simplifying deployments, but also the fact that a container uses significantly less resource overhead than a full blown virtual machine. Containers really only work if your code works in user space and doesn't need anything super low level (eg TCP network stack), but as long as you stay in user space it's amazing.
The main initial drive for me was that it let me separately run many things without a) trying to manage separate dependency sets, and b) Sharing RAM - Without having to physically allocate large amounts of memory to virtual machines; on an 8GB machine at a couple per VM that doesn’t let you get far.
"making it easier to deploy" is a rather... clinical description for fixing the "but it works on my machine!" issue. We could go into detail on how it solved that, but imo it comes down to that.
There's a classic joke where it turns out the solution to "it works on my machine" was to ship my machine
This matches my recollection. Easily repeatable development and test environments that would save developers headaches with reproduction. That then lead logically to replacement of Ansible etc for the server side with the same methodology.
There were many use cases that rapidly emerged, but this eclipsed the rest.
Docker Hub then made it incredibly easy to find and distribute base images.
Google also made it “cool” by going big with it.
And what was the reason for the dependency hell?
Was it always so hard to build the software you needed on a single system?
IIRC full virtualization was expensive (VMware) and paravirtualization was pretty heavyweight and slow (Xen). I think Docker was like a user-friendlier cgroups and everyone loved it. I can't remember the name, but there was a "web hosting company in a box" software that relied heavily on LXC and probably was some inspiration for containerization too.
edit: came back in to add reference to LXC, it's been probably 2 decades since i've thought about that.
LXD?
On a personal level, that's why I started using them for self hosting. At work, I think the simplicity of scaling from a pool of resources is a huge improvement over having to provision a new device. Currently at an on-prem team and even moving to kubernetes without going to cloud would solve some of the more painful operational problems that send us pages or we have to meet with our prod support team about.
Yes, totally agree that's a contributor too. I should expand that by namespaces I mean user, network, and mount table namespaces. The initial contents of those is something you would have to provide when creating the sandbox. Most of it is small enough to be shipped around in a JSON file, but the initial contents of a mount table require filesystem images to be useful.
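As an illustration of how the sandbox pieces above fit together (a generic sketch, not from the thread), the namespace plumbing can be poked at directly with util-linux's `unshare`. This assumes the kernel permits unprivileged user namespaces, which most desktop distros do by default:

```shell
# Create a new user namespace and map the current user to root inside it,
# then report the uid as seen from within the namespace: it is 0 (root),
# even though no privileges were needed on the host.
unshare --user --map-root-user sh -c 'id -u'

# Stacking a mount namespace on top gives a private mount table; this
# tmpfs is visible only inside the namespace, never to the host.
unshare --user --map-root-user --mount sh -c 'mount -t tmpfs tmpfs /tmp && ls /tmp'
```

This is roughly the "initial contents you provide when creating the sandbox" that the comment describes: the uid map is a tiny bit of configuration, while a useful mount table needs filesystem images behind it.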
This makes sense if you look at containers as simply a means to an end of setting up a sandbox, but not really much sense at all if you think of containers as a way to make it easy to get an arbitrary application up and running on an arbitrary server without altering host system dependencies.
I suspect that containers would have taken off even without isolation. I think the important innovation of Docker was the image. It let people deploy consistent versions of their software or download outside software.
All of the hassle of installing things was captured in the Dockerfile, and it ran in containers, so it was more reliable.
I honestly think that Dockerfile was the biggest driver. Containers as a technology are useful, for the many reasons outlined in this thread. But what Dockerfiles achieved was to make the technology accessible to much wider and much less technically deep audience. The syntax is easy to follow, the vocabulary available for the DSL is limited, and the results are immediately usable.
Oh, and the layer caching made iterative development with _very_ rapid cycles possible. That lowered the bar for entry and raised the floor for everyone to get going easier.
But back to Dockerfiles. The configuration language used made it possible for anyone[tm] to build a container image, to ship a container image and to run the container. Fire-and-forget style. (Operating the things in practice and at any scale was left as an exercise for the reader.)
And because Anyone[tm] could do it, pretty much anyone did. For good and ill alike.
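To make the "limited vocabulary, immediately usable" point concrete, here is a hedged sketch of a typical Dockerfile (the app, file names, and base image are illustrative, not from any specific project). Each instruction produces a cached layer, so editing only the application code reuses every layer above it on rebuild, which is what made the rapid iteration cycles possible:

```dockerfile
# Hypothetical Python web app; names are for illustration only.
FROM python:3.12-slim

WORKDIR /app

# Dependencies change rarely, so this layer stays cached across most rebuilds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes often; only these final layers get rebuilt.
COPY . .
CMD ["python", "app.py"]
```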
> I think the important innovation of Docker was the image. It let people deploy consistent version of their software or download outside software.
What did it let people do that they couldn't already do with static linking?
- most languages don't really do static linking in the same way as C
- things like "a network port" can also be a dependency, but can't be "linked". And so on for all sorts of software that expects particular files to be in particular places, or requires deploying multiple communicating executables
- some dependencies really do not like being statically linked (this includes the GNU standard library!), for things like nsswitch
I can't tell if this is a genuine question or not but if it is.. deploying a Ruby on Rails app with a pile of gems that have c deps isn't fixed with static linking. This is true for python and node and probably other things I'm not thinking of.
What about the ton of languages that don't have static linking?
Because it doesn’t need you to delve deep into the build system of every dependency and application you ever want to package?
>> It let people deploy consistent version of their software
Surprisingly "my software" depends on a lot of other stuff. Python, Ruby, PHP, JS, etc all need tens to hundreds of native libraries that have to be deployed.
I agree: I think the container image is what matters. As it turns out, getting more (or less) isolation given that image format is not a very hard problem.
Agreed. There was a point where I thought AMIs would become the unit of open source deployment packaging, and I think docker filled that niche in a cloud-agnostic way
ps I still miss the alternate universe where Kenton won the open source deployment battle :-)
> Cgroups and namespaces were added to Linux in an attempt to add security to a design (UNIX) which has a fundamentally poor approach to security (shared global namespace, users, etc.)
Namespacing of all resources (no restriction to a shared global namespace) was actually taken directly from plan9. It does enable better security but it's about more than that; it also sets up a principled foundation for distributed compute. You can see this in how containerization enables the low-level layers of something like k8s - setting aside for the sake of argument the whole higher-level adaptive deployment and management that it's actually most well-known for.
> Drivers for virtual devices are simple, we don't need Linux's vast driver support for cloud VMs. We essentially need a virtual ethernet device driver for SEL4, a network stack that runs on SEL4, and a simple init process that loads the network stack with capabilities for the network device, and loads the application with a capability to the network stack. Make building an image for that as easy as compiling a binary, and you could eliminate maybe 10s of millions of lines of complexity from the deployment of most server applications. No Linux, no docker.
Wasn't this what unikernels were attempting a decade ago? I always thought they were neat but they never really took off.
I would totally be onboard with moving to seL4 for most cloud applications. I think Linux would be nearly impossible to get into a formally-verified state like seL4, and as you said most cloud stuff doesn't need most of the features of Linux.
Also seL4 is just cool.
It would be great if we got "kernel independent" Nvidia drivers. I have some experience with bare-metal development and it really seems like most of what an operating system provides could be provided in a much better way as a set of libraries that make specific pieces of hardware work, plus a very good "build" system.
> which has a fundamentally poor approach to security
Unix was not designed to be convenient for VPS providers. It was designed to allow a single computer to serve an entire floor of a single company. The security approach is appropriate for the deployment strategy.
As it did with all OSes, the Internet showed up, and promptly ruined everything.
> Because SEL4 is actually well designed, you can run a sub kernel as a process on SEL4 relatively easily. Tada, now you can get rid of K8s too.
k8s is about managing clusters of machines as if they were a single resource. Hence the name "borg" of its predecessor.
AFAIK, this isn't a use case handled by SEL4?
The K8s master is just a scheduling application. It can run anywhere, and doesn't depend on much (just etcd). The kublet (which runs on each node) is what manages the local resources. It has a plugin architecture, and when you include one of each necessary plugin, it gets very complicated. There are plugins for networking, containerization, storage.
If you are already running SEL4 and you want to spawn an application that is totally isolated, or even an entire sub-kernel it's not different than spawning a process on UNIX. There is no need for the containerization plugins on SEL4. Additionally the isolation for the storage and networking plugins would be much better on SEL4, and wouldn't even really require additional specialized code. A reasonable init system would be all you need to wire up isolated components that provide storage and networking.
Kubernetes is seen as this complicated and impressive piece of software, but it's only impressive given the complexity of the APIs it is built on. Providing K8s functionality on top of SEL4 would be trivial in comparison.
I understand what you're saying, and I'm a fan of SEL4. But isolation isn't one of the primary points of k8s.
Containerization is after all, as you mentioned, a plugin. As is network behavior. These are things that k8s doesn't have a strong opinion on beyond compliance with the required interface. You can switch container plugin and barely notice the difference. The job of k8s is to have control loops that manage fleets of resources.
That's why containers are called "containers". They're for shipping services around like containers on boats. Isolation, especially security isolation, isn't (or at least wasn't originally) the main idea.
You manage a fleet of machines and a fleet of apps. k8s is what orchestrates that. SEL4 is a microkernel -- it runs on a single machine. From the point of view of k8s, a single machine is disposable. From the point of view of SEL4, the machine is its whole world.
So while I see your point that SEL4 could be used on k8s nodes, it performs a very different function than k8s.
The scheduler is the least interesting thing about k8s. The extensible API common to all operating environments is the real value add.
As others mentioned containers aren’t about security either, I think you’re rather missing the whole purpose of the cloud native ecosystem here.
> Kubernetes is seen as this complicated and impressive piece of software, but it's only impressive given the complexity of the APIs it is built on.
There are other reasons it's impressive. Its API and core design is incredibly well-designed and general, something many other projects could and should learn from.
But the fact that it's impressive because of the complexity of the APIs it's built on is certainly a big part of its value. It means you can use a common declarative definition to define and deploy entire distributed systems, across large clusters, handling everything from ingress via load balancers to scaling and dynamic provisioning at the node level. It's essentially a high-level abstraction for entire data centers.
seL4 overlaps with that in a pretty minimal way. Would it be better as underlying infrastructure than the Linux kernel? Perhaps, but "providing K8s functionality on top of SEL4" would require reimplementing much of what Linux and various systems on top of it currently provide. Hardly "trivial in comparison".
cgroups first came from resource management frameworks that IIRC came out of IBM and got into some distro kernels for a time but not upstream.
Namespaces were not an attempt to add security, but just grew out of work to make interfaces more flexible, like bind mounts. And Unix security is fundamentally good, not having namespaces isn't much of a point against it in the first place, but now it does have them.
And it's going pretty well indeed. All applications use many kernel features, and we do have very secure high performance web and other servers.
L4 systems have been around for as long as Linux, and SEL4 in particular for 2 decades. They haven't moved the needle much so I'd say it's not really going all that well for them so far. SEL4 is a great project that has done some important things don't get me wrong, but it doesn't seem to be a unix replacement poised for a coup.
> Unix security is fundamentally good
L. Ron Hubbard is fundamentally good!
I kid, but seriously, good how? Because it ensures cybersecurity engineers will always have a job?
seL4 is not the final answer, but something close to it absolutely will be. Capability-based security is an irreducible concept at a mathematical level, meaning you can't do better than it, at best you can match it, and it's certainly not matched by anything else we've discovered in this space.
I don't think Docker came about due to cgroups and namespaces being arcane, LXC was already abstracting that away.
Docker's claim to fame was connecting that existing stuff with layered filesystem images and packaging based off that. Docker even started off using LXC to cover those container runtime parts.
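A hedged sketch of what "layered filesystem images" buy you (file and layer names invented for illustration): an image is essentially a stack of tarball layers applied in order, each addressable by its content digest, so unchanged layers can be shared, cached, and deduplicated across images:

```shell
# Two "layers": a base layer and an app layer, each packaged as a tarball.
mkdir -p layer-base layer-app
echo "os files" > layer-base/os.txt
echo "app code" > layer-app/app.txt
tar -cf base.tar -C layer-base .
tar -cf app.tar -C layer-app .

# Layers are content-addressed; identical layers hash the same and dedupe.
sha256sum base.tar app.tar

# "Pulling" an image amounts to extracting the layers in order onto one root.
mkdir -p rootfs
tar -xf base.tar -C rootfs
tar -xf app.tar -C rootfs
ls rootfs   # os.txt and app.txt, merged into a single filesystem tree
```

Real runtimes use a union filesystem like overlayfs instead of extracting in place, so layers stay read-only and writes go to a separate upper layer, but the stacking model is the same.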
You say applications and web servers kind of interchangeably. I don't know anything about SEL4. What if your application needs to spawn and manage executables as child processes? Is it Linux-like enough to run those and handle stuff like that so that those of us coding at the application layer don't need to worry about it too much?
seems like all this was part of a long evolution.
I think the whole thing has been levels of abstraction around a runtime environment.
in the beginning we had the filesystem. We had /usr/bin, /usr/local/bin, etc.
then chroot where we could run an environment
then your cgroups/namespaces
then docker build and docker run
then swarm/k8s/etc
I think there was a parallel evolution around administration, like configure/make, then apt/yum/pacman, then ansible/puppet/chef and then finally dockerfile/yaml
> Containers (meaning Docker) happened because CGroups and namespaces were arcane and required lots of specialized knowledge to create what most of us can intuitively understand as a "sandbox".
That might be why Docker was originally implemented, but why it "happened" is because everyone wanted to deploy Python and pre-uv Python package management sucks so bad that Docker was the least bad way to do that. Even pre-kubernetes, most people using Docker weren't using it for sandboxing, they were using it as fat jars for Python.
Not only python, although python is particularly bad.
Even with Java things, where fat jars exist, you at some point end up with OS-level dependencies like "and this logging thing needs to be set up, and these dirs need these rights, and this user needs to be in place" etc. Nowadays you can shove that into a container.
> which get capabilities to the network stack as initial arguments, and don't have access to anything more
Systemd does this and it is widely used.
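For reference, a generic sketch of the systemd sandboxing directives in question (the service name and binary path are hypothetical); the service gets the network capability it needs and little else:

```ini
# /etc/systemd/system/myapp.service  -- hypothetical service
[Unit]
Description=Sandboxed web app

[Service]
ExecStart=/usr/local/bin/myapp
# Drop everything except the ability to bind privileged ports.
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
# Only IP sockets; no netlink, packet, or other address families.
RestrictAddressFamilies=AF_INET AF_INET6
# Private /tmp and a read-only view of the rest of the filesystem.
PrivateTmp=yes
ProtectSystem=strict
NoNewPrivileges=yes
DynamicUser=yes
```

It is not a capability system in the seL4 sense, since the kernel attack surface remains, but it does approximate "here is your network access as an initial grant, and nothing more."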
> namespaces were added to Linux in an attempt to add security to a design (UNIX) which has a fundamentally poor approach to security (shared global namespace, users, etc.)
If the "fundamentally poor approach to security" is a shared global namespace, why are namespaces not just a fix that means the fundamental approach to security is no longer poor?
https://archive.kernel.org/oldwiki/tiny.wiki.kernel.org/
> What people do with Docker is spin up a database or another service to develop or test against.
Yep. Being able to run a throwaway service with a single command and then get rid of it with a simple Ctrl-C is a godsend.

My headcanon is that Docker exists because Python packaging and dependency management was so bad that dotCloud had no choice but to invent some porcelain on top of Linux containers, just to provide a pleasant experience for deploying Python apps.
That's basically correct. But the more general problem is that engineers simply lost the ability to succinctly package applications and their dependencies into simple to distribute and run packages. Somehow around the same time Java made .jar files mainstream (just zip all the crap with a manifest), the rest of the world completely forgot how to do the equivalent of statically linking in libraries and that we're all running highly scheduled multithreaded operating systems now.
The "solution" for a long time was to spin up single application Virtual Machines, which was a heavy way to solve it and reduced the overall system resources available to the application making them stupidly inefficient solutions. The modern cloud was invented during this phase, which is why one of the base primitives of all current cloud systems is the VM.
Containers both "solved" the dependency distribution problem as well as the resource allocation problem sort of at once.
> engineers simply lost the ability to succinctly package applications and their dependencies into simple to distribute and run packages.
but this is what docker is
If anything, java kinda showed it doesn't have to suck, but as not all things are java, you need something more general
With the difference that with docker you are shipping the runtime to your source code as well.
which is great when you realize that not all software is updated at the same time.
how managing multiple java runtime versions is supposed to work is still beyond me... it's a different tool at every company, and the instructions never seem to work
I would argue that the traditional way to install applications (particularly servers) on UNIX wasn’t very compatible with the needs that arose in the 2000s.
The traditional way tends to assume that there will be only one version of something installed on a system. It also assumes that when installing a package you distribute binaries, config files, data files, libraries and whatnot across lots and lots of system directories. I grew up on traditional UNIX. I’ve spent 35+ years using perhaps 15-20 different flavors of UNIX, including some really, really obscure variants. For what I did up until around 2000, this was good enough. I liked learning about new variants. And more importantly: it was familiar to me.
It was around that time I started writing software for huge collections of servers sitting in data centers on a different continent. Out of necessity I had to make my software more robust and easier to manage. It had to coexist with lots of other stuff I had no control over.
It would have to be statically linked, everything I needed had to be in one place so you could easily install and uninstall. (Eventually in all-in-one JAR files when I started writing software in Java). And I couldn’t make too many assumptions about the environment my software was running in.
UNIX could have done with a re-thinking of how you deal with software, but that never happened. I think an important reason for this is that when you ask people to re-imagine something, it becomes more complex. We just can’t help ourselves.
Look at how we reimagined managing services with systemd. Yes, now that it has matured a bit and people are getting used to it, it isn’t terrible. But it also isn’t good. No part of it is simple. No part of it is elegant. Even the command line tools are awkward. Even the naming of the command line tools fail the most basic litmus test (long prefixes that require too many keystrokes to tab-complete says a lot about how people think about usability - or don’t).
Again, systemd isn’t bad. But it certainly isn’t great.
As for blaming Python, well, blame the people who write software for _distribution_ in Python. Python isn’t a language that lends itself to writing software for distribution and the Python community isn’t the kind of community that will fix it.
Point out that it is problematic and you will be pointed to whatever mitigation that is popular at the time (to quote Queen “I've fallen in love for the first time. And this time I know it's for real”), and people will get upset with you, downvote you and call you names.
I’m too old to spend time on this so for me it is much easier to just ban Python from my projects. I’ve tried many times, I’ve been patient, and it always ends up biting me in the ass. Something more substantial has to happen before I’ll waste another minute on it.
Sure they definitely were using Docker for their own applications, but also dotCloud was itself a PaaS, so they were trying to compete with Heroku and similar offerings, which had buildpacks.
The problem is/was that buildpacks aren't as flexible and only work if the buildpack exists for your language/runtime/stack.
Exactly this, but not just Python. The traditional way most Linux apps work is that they are splayed over your filesystem with hard coded references to absolute paths and they expect you to provide all of their dependencies for them.
Basically, the Linux world was actively designed in a way that makes apps difficult to distribute.
It wasn't about making apps difficult to distribute at all, that's a later side effect. Originally distros were built around making a coherent unified system of package management that made it easier to manage a system due to everything being built on the same base. Back then Linux users were sysadmins and/or C programmers managing (very few) code dependencies via tarballs. With some CPAN around too.
For a sysadmin, distros like Debian were an innovative godsend for installing and patching stuff. Especially compared to the hell that was Windows server sysadmin back in the 90s.
The developer oriented language ecosystem dependency explosion was a more recent thing. When the core distros started, apps were distributed as tarballs of source code. The distros were the next step in distribution - hence the name.
Right but those things are not unrelated. Back in the day if you suggested to the average FOSS developer that maybe it should just be possible to download a zip of binaries, unzip it anywhere and run it with no extra effort (like on Windows), they would say that that is actively bad.
You should be installing it from a distro package!!
What about security updates of dependencies??
And so on. Docker basically overrules these impractical ideas.
It was more like, library writers forgot how to provide stable APIs for their software, and applications decided they just wanted to bundle all the dependencies they needed together and damn the consequences on the rest of the system. Hence we got static linked binaries and then containers.
even if you have a stable interface... the user might not want to install it and then forget to remove it down the line
> Basically the Linux world was actively designed to apps difficult to distribute.
It has "too many experts", meaning that everyone has too much decision making power to force their own tiny variations into existing tools. So you end up needing 5+ different Python versions spread all over the file system just to run basic programs.
Pretty much this; systems with coherent isolated dependency management, like Java, never required OS-level container solutions.
They did have what you could call userspace container management via application servers, though.
NodeJS, Ruby, etc. also have this problem, as does Go with CGO. So the problem is binary dependencies on C/C++ code and make, configure, autotools, etc... The whole C/C++ compilation story is such a mess that, after almost five decades, inventing containers was pretty much the only sane way of tackling it.
Java at least uses binary dependencies very rarely, and they usually have the decency of bundling the compiled dependencies... But it seems Java and Go just saw the writing on the wall and mostly reimplement everything. I did have problems with the Snappy compression in the Kafka libraries, though, for instance.
The issue is with cross platform package management without proper hooks for the platform themselves. That may be ok if the library is pure, but as soon as you have bindings to another ecosystem (C/C++ in most cases), then it should be user/configurable instead of the provider doing the configuration with post installs scripts and other hacky stuff.
If you look at most projects in the C world, they only provide the list of dependencies and some build config Makefile/Meson/Cmake/... But the latter is more of a sample and if your platform is not common or differs from the developer, you have the option to modify it (which is what most distros and port systems do).
But good luck doing that with the sprawling tree of modern packages managers. Where there's multiple copies of the same libraries inside the same project just because.
I don't agree with this. Java systems were one of the earliest beneficiaries of container-based systems, which essentially obsoleted those ridiculously over-complicated, and language-specific, application servers that you mentioned.
Java users largely didn't bother with containers IME, largely for the same reasons that most Java users didn't bother with application servers. Those who did want that functionality already had it available, making the move from an existing Java application server to Docker-style containers a minor upgrade at best.
Tomcat and Jetty are application servers, and one of them is embedded in almost every Spring application. There are heavier application servers of the kind you mention, like WildFly, but application servers are not obsolete as a whole.
Pyinstaller predates Docker. It's not about any individual language not being able to do packaging, it's about having a uniform interface for running applications in any language/architecture. That's why platforms like K8s don't have to know a thing about Python or anything else and they automatically support any future languages too.
My take: containers forced developers to declare various aspects of the application in a standardized, opinionated way:
- Persistent state? Must declare a volume.
- IO with external services? Must declare the ports (and maybe addresses).
- Configurable parameters? Must declare some env variables.
- Transitive dependencies? Must declare them, but using a mechanism of your choosing (e.g. via the package manager of your base image distro).
Separation of state (as in persistency) and application (as in binaries, assets) makes updates easy. Backups also.
Having almost all IO visible and explicit simplifies operation and integration.
And a single, (too?!?) simple config mechanism increases reusability, by enabling e.g. lightweight tailoring of generic application service containers (such as mariadb).
Together this bunch of forced, yet leaky, abstractions is just good enough to foster immense reuse & composability across a plethora of applications, all while allowing them to be treated almost entirely like black boxes. IMHO that is why OCI containers became this big, compared to other virtualization and (application-)container technologies.
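Those forced declarations map one-to-one onto image metadata. A hypothetical sketch (package and binary names are invented for illustration):

```dockerfile
FROM debian:stable-slim            # transitive deps come via the base distro
RUN apt-get update \
    && apt-get install -y --no-install-recommends libmariadb3 \
    && rm -rf /var/lib/apt/lists/*
ENV APP_LOG_LEVEL=info             # configurable parameter, declared up front
EXPOSE 8080                        # IO with external services, declared
VOLUME /var/lib/myapp              # persistent state, declared
CMD ["myapp"]                      # hypothetical application binary
```

Everything the blackbox touches is visible in a dozen lines, which is exactly the leaky-but-good-enough contract described above.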
Yep, this is why. Containers are a way to package more of the OS environment than you can do otherwise
Containers happened because nobody can be bothered to build an entire application into a single distributable executable anymore; heck, even the tooling barely exists. But instead of solving problems like dependency management and linking, today's engineers simply pile abstraction onto the problem space until the thing you want to do more than anything (i.e. execute an application) becomes a single call.
Of course you now need to build and maintain those abstract towers, so more jobs for everybody!
That's why Docker and layers exist. Containers predate them by more than a decade.
this is what happens when hw is too cheap
You sure? Which hardware?
Put another way: stuff like Electron makes a pretty good case for the "cheap hardware leads to shitty software quality/distribution mechanisms" claim. But does Docker? Containers aren't generally any more expensive in hardware other than disk-space to run than any other app. And disk space was always (at least since the advent of the discrete HDD) one of the cheapest parts of a computer to scale up.
If you go back to the Sun days, you literally could not afford enough servers to run one app per server so instead you'd hire sysadmins to figure out how to run Sendmail and Oracle and whatever on one server without conflicting. Then x86/Linux 1Us came out and people started just running one app per server ("server sprawl") which was easy because there was nothing to conflict. This later became VM sprawl and containers were an optimization on that.
I'm not getting it, sorry.
We had to have multiple apps per server before, and now we have containers which offer a convenient way to have multiple apps per server? That seems like the same thing. Could you explain more re: what you meant?
What blew my mind and convinced me to only use immutable distros is the immutability of it.
For instance, I could create my own login screen for a web service without having to worry about the package manager overriding my code, because I inject it into the container, which is already updated.
I can also reroute ports or network connections much more easily, the way I want.
The author suggests that Docker doesn't help development and that devs just spin up databases, but I have to disagree with that, and I'm pretty sure I am not the only one.
All my projects (primarily web apps) are using docker compose which configures multiple containers (php/python/node runtime, nginx server, database, scheduler, etc) and run as a dev environment on my machine. The source code is mounted as a volume. This same compose file is then also used for the deployment to the production server (with minor changes that remove debug settings for example).
This approach has worked well for me as a solo dev creating web apps for my clients.
It has also enabled extreme flexibility in the stacks that I use, I can switch dev environments easily and quickly.
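A minimal sketch of that kind of compose file (service names, images, and paths are illustrative, not the commenter's actual setup); the source bind mount is what makes it a dev environment, and dropping it plus the debug setting is the "minor change" for production:

```yaml
# docker-compose.yml -- dev flavor; prod removes the bind mount and debug env
services:
  app:
    image: php:8.3-fpm
    volumes:
      - ./src:/var/www/html    # source mounted for live editing in dev
    environment:
      APP_DEBUG: "1"           # removed or set to "0" for production
  web:
    image: nginx:stable
    ports:
      - "8080:80"
    depends_on:
      - app
  db:
    image: mariadb:11
    environment:
      MARIADB_ROOT_PASSWORD: example
    volumes:
      - dbdata:/var/lib/mysql  # named volume: state survives rebuilds
volumes:
  dbdata:
```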
I agree with you 100%, though arguably what you could be describing is how docker changed your deployment workflow, not your development workflow (although with devcontainers that line is blurry, as you say).
I guess it's worth keeping in mind that Justin only quit Docker a few months ago, and his long tenure as CTO there will have (obviously) informed the majority of the opinions in the article. I think the deployment over development spin and some of the other takes there more closely reflect the conversations he had with large corp paying customers at the exec level than the workflows of solo devs that switch dev environments much more frequently than most etc.
> Application composition from open source components became the dominant way of constructing applications over the last decade.
I'm just as interested in why this ^ happened. I imagine it's pretty unique to software? I don't hear of car companies publishing component designs free for competitors to use, or pharmaceuticals freely waiving the IP in their patents or processes. Certainly not as "the dominant way of" doing business.
I wonder if LLM coding assistants had come about earlier, whether this would have been as prevalent. Companies might have been more inclined to create more of their own tooling from scratch since LLMs make it cheap (in theory). Individuals might have been less inclined to work on open source as hobbies because LLMs make it less personal. Companies might be less inclined to adopt open-source LLM-managed libraries because it's too chaotic.
I think open source software took off because it’s more standalone than the other things you listed and this makes the rewards much higher.
If I write some code, it needs a computer and environment to run. If I’m writing for what’s popular, that’s pretty much a given. In short, for code the design is the product.
If I design a pharmaceutical, someone still has to make it. Same for car parts. This effort is actually greater than the effort of design. If you include regulation, it’s way higher.
So, this great feedback loop of creation-impact-collaboration never forms. The loop would be too big and involve too much other stuff.
The closest thing isn’t actually manufacturing, it’s more like writing and music. People have been reusing each other’s stuff forever in those spaces.
Linux CGroups specifically were started at Google because their cluster management system Borg (or maybe it was still Babysitter at the time) needed a way to do resource tracking and admission control. Here's a comment by one of original devs: https://news.ycombinator.com/item?id=25017753
This is also somewhat highlighted in Google's paper "Borg, Omega, and Kubernetes" which they published in 2016.
https://static.googleusercontent.com/media/research.google.c...
Containers happened because running an ad network and search engine means serving a lot of traffic for as little cost as possible, and part of keeping the cost down is bin packing workloads onto homogeneous hardware as efficiently as possible.
https://en.wikipedia.org/wiki/Cgroups
(arguably FreeBSD jails and various mainframe operating systems preceded Linux containers but not by that name)
What does the 'ad network and search engine' have to do with it? Wouldn't any organization who serves lots of traffic have the same cost cutting goals you mentioned?
It's an oblique way to say that Linux cgroups and namespaces were developed by Google.
Yes, to expand: Both search and ads mean serving immense amounts of traffic and users while earning tiny amounts of revenue per unit of each. The dominant mid-90s model of buying racks of Sun and NetApp gear, writing big checks to Oracle, etc, would have been too expensive for Google. Instead they made a big investment in Linux running on large quantities of commodity x86 PC hardware, and building software on top of that to get the most out of it. That means things like combining workloads with different profiles onto the same servers, and cgroups kind of falls out of that.
Other companies like Yahoo, WhatsApp, and Netflix followed similar patterns, built on a strong understanding of how to be efficient on cheap hardware. Notably, all three were FreeBSD users, at least in their early days.
Yup and just to add timelines - Google Borg and containerization was what... 2003-2005? Docker was 2011-2013?
*cgroups v1. We have Facebook to thank for v2, right?
> I was always surprised someone didn't invent a tool for ftping to your container and updating the PHP
We thought of it, and were thankful that it was not obvious to our bosses, because lord forbid they would make it standard process and we would be right back where we started, with long lived images and filesystem changes, and hacks, and managing containers like pets.
This and the comments may miss the forest for the trees.
Enterprise software vendors sold libraries and then "application servers", essentially promising infrastructure (typically tied to databases).
Enterprise software developers -- Google in particular -- got tired of depending on others' licensed infrastructure. This birthed Spring and Docker, splitting the market.
(Fun aside: when is a container a vm? When it runs via Apple containerization.)
When you launch a container (either through Docker or manually through namespaces) you are effectively presenting yourself to the kernel as a separate thing. This allows you to construct a completely separate environment for interacting with the kernel, where none of your concerns leak out and nothing you don't care about leaks in.
When people say that static executables would solve the problem they are wrong, a static executable just means that you can eschew constructing a separate file-system inside your container - and you will probably need to populate some locations anyway.
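A minimal way to see this from a shell: the kernel exposes each process's namespace memberships under /proc, and util-linux's unshare(1) can launch a process inside fresh ones. This is only a sketch; the unshare invocation typically needs root, so it is left commented.

```shell
# Every process already belongs to a set of namespaces; the kernel lists
# them as files. A "container" is essentially a process whose namespace
# set differs from the host's.
ls -l /proc/self/ns

# Sketch: launch a bare "container" by hand with new mount, UTS, net and
# PID namespaces and a fresh /proc (typically requires root):
#   sudo unshare --mount --uts --net --pid --fork --mount-proc /bin/sh
```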
Properly configured containers are actually supposed to be secure sandboxes, such that any violation is a kernel exploit. However the Linux kernel attack surface is very large so no one serious who offers multi-tenant hosting can afford to rely on containers for isolation. They have to assume that a container escape 0day can be sourced. It may be more accurate to say that a general kernel 0day can be sourced since the entire kernel surface area is open for anyone to poke. seccomp can mitigate the surface area but also narrow down the usefulness.
Not... really. The Linux kernel has no concept of a container; you have to be super careful to avoid "mixing" host stuff in. I've yet to see a case where "leaking in" would be prevented by default. Docker "leaks in" as much as you want. Containers also don't nest gracefully (due to, e.g., uids), so they can't be used as a software component. It's mostly a Linux sysadmin thing right now.
It happened because the story of dependencies (system and application) was terrible: running the same app on a different distribution/kernel/compiler/etc. was hard. There were other solutions like Vagrant, but they were heavy and the DX wasn't there.
Because dependencies on Unix are terrible for some languages that assume things are installed globally.
I love this sentence about DevOps "Somehow it seems easier for people to relate to technology than culture, and the technology started working against the culture."
how did solaris zones get left out of the story?
> There was one key innovation, which was Docker Hub
Vagrant had that for VMs long before.
Dockerfiles and Docker Hub are directly inspired from Vagrantfiles and the Vagrant Box library
"The compute we are wasting is at least 10x cheaper, but we have automation to waste it at scale now."
So much this. keep it simple, stupid (muah)
Fwiw the actual video that he links to is well worth a watch.
Featuring one of the most Justin intros ever.
> was always surprised someone didn't invent a tool for ftping to your container and updating the PHP.
No FTP needed, you can just mount the application directory.
Containers happened because nobody knew what the hell they were doing and still have no clue what the hell they are doing. Software by the deranged for the deranged.
...and they don't care about security.
> Docker also made Go credible as a programming language,
Can someone explain why Docker was beneficial for Go?
Because docker was written in golang.
And before docker, not many large applications were.
Ironically, also because Go is one of the few popular languages for web applications that can produce a single executable binary, and so does not require a container to deploy with ease.
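That property is what made the `FROM scratch` pattern work; a hedged sketch (file and binary names are illustrative, not from the thread):

```dockerfile
# Illustrative multi-stage build: compile a static Go binary, ship it alone.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 produces a fully static binary with no libc dependency.
RUN CGO_ENABLED=0 go build -o /server .

# The final image holds nothing but the binary: no distro, no shared libraries.
FROM scratch
COPY --from=build /server /server
ENTRYPOINT ["/server"]
```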
I think there's a pretty big citation needed on that part of the article. I'm not clear that Docker contributed to that anywhere near as much as a general increase of momentum around Go as it became better known in the industry.
Containers happened because the original execution isolation environment (the process) was considered a lost cause. Processes shared too much with each other, so additional isolation primitives had to be added, but they had to be sort of tacked on to the side, because more important than security or correctness is backwards compatibility. So now containers are considered a different thing from processes, when really they are just processes with these additional isolation primitives enabled.
In the early 2000s (yes, long after the original jails), containers were pitched as an alternative to VMware's VMs. They lost out for a variety of reasons--but mostly because as purely a different (and somewhat lighter-weight) encapsulation technique they weren't that interesting.
For me the main reason to use containers is "one-line install any linux distro userspace". So much simpler than installing a dozen VirtualBox boxes to test $APP on various versions of ubuntu, debian, nixos, arch, fedora, suse, centos etc.
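For example (image tags and the test script name are illustrative assumptions; requires a working Docker install):

```shell
# Throwaway userspace for any distro in one line (--rm discards the
# container on exit; image tags are illustrative):
docker run --rm -it ubuntu:24.04 bash
docker run --rm -it fedora:40 bash

# Test $APP against a specific userland by mounting the working tree
# (./run-tests.sh is a hypothetical test entry point):
docker run --rm -v "$PWD":/src -w /src debian:12 ./run-tests.sh
```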
Yeah nowadays we have the distrobox(1) command. Super useful. But certainly that's not why containers happened.
The article is just wrong. Before Docker, there were OpenVZ and Virtuozzo. They were used to provide cheaper "dedicated machine" hosting back around 2005.

Then the technology from OpenVZ slowly made its way into mainline Linux, in the form of cgroups and namespaces. LWN called it a "container puzzle", with tens of moving pieces. And it was largely finished by the early 2010s.
I built my own container system in 2012 that used cgroups to oversubscribe the RAM, with simple chroot-based file namespaces for isolation. We even used XFS projects (raise your hand if you know what this is!) for the disk quota management. I remember that I had to use systemtap to patch the kernel to be able to find out which process died as a result of the OOM killer, there were no standard ways to do that.
We sold it as a part of our biotech startup to Illumina. Then we sold it again to Amazon as a part of another startup :)
The genius of Docker was the layered overlayfs-based image building. This one simple innovation made it possible to build images in a constructive way, without having to waste half an hour for each minor change. I was floored with its simplicity and power when I first saw it.
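That layer caching is easy to see in any Dockerfile; this sketch assumes a hypothetical Node.js app (names illustrative). Each instruction becomes an overlay layer, and a rebuild reuses every cached layer up to the first one whose inputs changed:

```dockerfile
FROM node:20-slim
WORKDIR /app

# Copy only the dependency manifests first: this layer, and the install
# below, stay cached across rebuilds as long as these files are unchanged.
COPY package.json package-lock.json ./
RUN npm ci

# Source code changes invalidate only the layers from here down, so an
# edit-rebuild cycle takes seconds instead of re-running the full install.
COPY . .
CMD ["node", "server.js"]
```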
In some situations the cloud is a no-go and the lead time to install new hardware is quite lengthy. In those cases VMs or containers allow for rapid change at the software/OS level while meatspace still moves like a glacier.
sounds minor, but it is a Big Deal for some
You can laugh or not, but it's because they never finished GNU/Hurd :D
FreeBSD jails happened years ago, based upon a user request.
>hosting provider's ... desire to establish a clean, clear-cut separation between their own services and those of their customers
https://en.wikipedia.org/wiki/FreeBSD_jail
My guess is Linux started getting requests from various orgs for a while, so in true Linux fashion, we got a few different container-type mechanisms years later.

I still think jails are the best of the bunch, but they can be a bit hard to set up. Once set up, jails work great.
So here we are :)
Why it happened is not nearly as important as what it unveiled: that versioned immutable systems are the most powerful system design concept in history. Most people have not yet grasped what an insanely powerful concept it is. At some point in the future, maybe 50-100 years from now, someone will look back and say "holy shit; deploying web apps is where this concept came from?" I hope in my lifetime that people get it and start applying it to other system designs/components.
radmind had this philosophy. https://radmind.org
It let you create diffs of a filesystem, and layer them with configurations, similar to containers. Useful for managing computer labs at the time.
I love boring.
Because Linux devs generally suck at making portable packages that are easy to install.
I loved the assertion that AI ate up all the budget and that K8s is now "boring" technology. That's fine because it was getting pretty annoying with all the clone competitors for practically everything that were popping up every month!
Do you use K8s? No! That's old! I use Thrumba! It's just a clone of K8s by some startup because people figured out that the easiest way to make gobs of money is/was to build platform products and then get people to use them.
Fascinating documentary on Kubernetes for those who have 50 minutes. Gives more background to the "Container Wars". The filmmakers also have documentaries on the history of Python, Argo, etc.
Some highlights:
- How far behind Kubernetes was at the time of launch. Docker Swarm was significantly simpler to use, and the Apache Mesos scheduler could already handle 10,000 nodes (and was being used by Netflix).
- RedHat's early contributions were key, despite having the semi-competing project of OpenShift.
- The decision to open-source K8S came down to one brief meeting at Google. Many of the senior engineers attended remotely from Seattle, not bothering to fly out because they thought their request to go OS was going to get shut down.
- Brief part at the end where Kelsey Hightower talks about what he thinks might come after Kubernetes. He mentions, and I thought this was very interesting... serverless making a return. It really seemed like serverless would be "the thing" in 2016-2017, but containers were too powerful. Maybe now with Knative or some future fusing of container orchestration + K8S?
[1] - https://youtu.be/BE77h7dmoQU
I feel that's going to be more interesting than this video. The speaker is very unpracticed.
Containers were invented because Linux is incapable of reliably running software programs. The global pool of shared libraries is an abject failure and containers are a heavy handed workaround.
Likely because Plan9's 'everything-is-a-filesystem' failed.
The standard answer is, "because inventing and implementing them was easier than fixing Python packaging."
I think "fixing distro packaging" is more apropos.
In a past life, I remember having to juggle third-party repositories in order to get very specific versions of various services, which resulted in more than a few instances of hair-pull-inducing untangling of dependency weirdness.
This might be controversial, but I personally think that distro repos being the assumed first resort of software distribution on Linux has done untold amounts of damage to the software ecosystem on Linux. Containers, alongside Flatpak and Steam, are thankfully undoing the damage.
> This might be controversial, but I personally think that distro repos being the assumed first resort of software distribution on Linux has done untold amounts of damage to the software ecosystem on Linux.
Hard agree. After getting used to "system updates are... system updates; user software that's not part of the base system is managed by a separate package manager from system updates, doesn't need root, and approximately never breaks the base system (to include the graphical environment); development/project dependencies are not and should not be managed by either of those but through project-specific means" on macOS, the standard Linux "one package manager does everything" approach feels simply wrong.
> development/project dependencies are not and should not be managed by either of those but through project-specific means" on macOS, the standard Linux "one package manager does everything" approach feels simply wrong.
This predates macOS. The mainframe folks did this separation eons ago (see IBM VM/CMS).
On Unix, it's mostly the result of getting rid of your sysadmins who actually had a clue. Even in Unix-land in the Bad Old Days(tm), we used to have "/usr/local" for a reason. You didn't want the system updating your Perl version and bringing everything to a screeching halt; you used the version of Perl in /usr/local that was under your control.
I wonder if it can be traced back to something RedHat did somewhere, because it may have all begun once you COULDN'T be absolutely certain that anything even remotely "enterprise" was running on a RedHat.
I think it's a natural outgrowth of what Linux is.
Linux is just a kernel - you need to ship your own userland with it. Therefore, early distros had to assemble an entire OS around this newfangled kernel from bits and pieces, and those bits and pieces needed a way to be installed and removed at will. Eventually this installation mechanism gets scope creep, and suddenly things like FreeCiv and XBill are distributed using the same underlying system that bash and cron use.
This system of distro packaging might be good as a selling point for a distro - so people can brag about their distro comes with 10,000 packages or whatever. That said, I can think of no other operating system out there where the happiest path of releasing software is to simply release a tarball of the source, hope a distro maintainer packages it for you, hope they do it properly, and hope that nobody runs into a bug due to a newer or older version of a dependency you didn't test against.
Yours is a philosophy I encounter more and more: there should be one unified platform, ideally fast-moving, where software is only tested against $latest. Stability is a thing of the past. The important thing is more features.

Instead of designing a solution and perfecting it over time, it's endless tweaking, with a new redesign every year. And you're supposed to use the exact same computer as the dev to get their code to work.
Red Hat was actually doing something more directly based on a variety of existing Linux projects than Docker, but switched to OCI/Docker when that came about, rather than jumping on the Cloud Foundry bandwagon (which many argued was obviously the future for container orchestration).

Kubernetes was also not the obvious winner in its time, with Mesos in particular seeming like a possible alternative back when it wasn't clear whether orchestration and resource management were separate product categories.

I was at Red Hat at the time, and my impression was they did a pretty good job of jumping onto where the community momentum was--while doubtless also influencing that momentum.
Ngl this is why I started using them
Never grew popular, perhaps. But I'm not sure how it failed, and not sure how much of a Venn-diagram overlap of concerns Plan 9 really has with containers.

Yes, Plan 9 had the idea of creating bespoke filesystems for apps, custom mount structures, and containers do something semi-parallel to that. But container images as read-only overlays (with a final rw top overlay) are a very narrow craft. Plan 9 had a lot more to it (everything as a file), and containers have a lot more to them (process, user, and net namespaces; images as pre-assembled layers).

I can see some shared territory, but these concerns feel mostly orthogonal. I could easily imagine a Plan 9-like entity arising amid the containerized world: they aren't really in tension with each other. There's also a decade-and-a-half-plus gap between Plan 9's heyday and the rise of containers.
I just wanted to have two instances of two versions of Postgres installed, and have their data directories documented in a YAML file, and know that they aren't gonna fuck up anything else on my system if someone gets RCE or something odd
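A sketch of exactly that with Docker Compose (ports, paths, and the placeholder password are illustrative assumptions):

```yaml
# docker-compose.yml: two Postgres majors side by side, each with its own
# documented data directory, neither touching anything else on the host.
services:
  pg15:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example   # placeholder only
    volumes:
      - ./data/pg15:/var/lib/postgresql/data
    ports:
      - "5415:5432"
  pg16:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example   # placeholder only
    volumes:
      - ./data/pg16:/var/lib/postgresql/data
    ports:
      - "5416:5432"
```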
I mean, containers do lend themselves to cargo culting by their very nature.
https://youtube.com/watch?v=TwxetMVdYTc&t=30s
Original sin.