As Linux runs on everything from cellphones, radio-controlled helicopters, desktops, and servers to 73 percent of the world's largest supercomputers, scaling the driver model was very important and always in the backs of our minds. As development progressed, it was nice to see that the basic structures used to hold devices, struct kobject and struct device, were relatively small. The number of devices connected to most systems is directly proportional to the size of the system. So small, embedded systems had only a few (one to ten) different devices connected and in their device tree. Larger "enterprise" systems had many more devices connected, but these systems also had a lot of memory to spare, so the increased number of devices was still only a very small proportion of the kernel's overall memory usage.
This comfortable scaling model, unfortunately, was found to be completely false when it came to one class of "enterprise" system, the s390 mainframe computer. This computer could run Linux in a virtual partition (up to 1,024 instances at the same time on a single machine) and had a huge number of different storage devices connected to it. Overall, the system had a lot of memory, but each virtual partition would have only a small slice of that memory. Each virtual partition wanted to see all of the different storage devices (20,000 could be typical), while being allocated only a few hundred megabytes of RAM.
On these systems, the device tree was quickly found to suck up a huge percentage of memory that was never released back to the user processes. It was time to put the driver model on a diet, and some very smart IBM kernel developers went to work on the problem.
What the developers found was initially surprising. It turned out that the main struct device structure was only around 160 bytes (for a 32-bit processor). With 20,000 devices in the system, that amounted to only 3 to 4 MB of RAM being used, a very manageable usage of memory. The big memory hog was the RAM-based filesystem mentioned earlier, sysfs, which showed all of these devices to userspace. For every device, sysfs created both a struct inode and a struct dentry. These are both fairly heavy structures, with the struct inode weighing in around 256 bytes and struct dentry about 140 bytes.[§]
[§] Both of these structures have since been shrunk, and therefore are smaller in current kernel versions.
For every struct device, at least one struct dentry and one struct inode were being created. Generally, many different copies of these filesystem structures were created, one for every virtual file per device in the system. As an example, a single block device would create about 10 different virtual files, each with its own struct inode and struct dentry, so a single structure of 160 bytes would end up using about 4 KB. In a system of 20,000 devices, about 80 MB were wasted on the virtual filesystem. This memory was consumed by the kernel and could not be used by any user programs, even if they never wanted to look at the information stored in sysfs.
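The arithmetic behind these figures can be checked with a short sketch. The structure sizes are the approximate 32-bit values quoted in the text. The function names are hypothetical and exist only for this illustration, and the model assumes exactly one struct inode and one struct dentry per virtual file.

```c
#include <assert.h>

/* Approximate 32-bit structure sizes quoted in the text, in bytes. */
enum {
    DEVICE_BYTES = 160,
    INODE_BYTES  = 256,
    DENTRY_BYTES = 140
};

/* Memory used by the struct device instances alone. */
unsigned long device_bytes(unsigned long devices)
{
    return devices * DEVICE_BYTES;
}

/*
 * Memory pinned by sysfs when every device exposes files_per_device
 * virtual files, each backed by one inode and one dentry (an assumption
 * of this sketch, not a statement about the real sysfs layout).
 */
unsigned long sysfs_bytes(unsigned long devices,
                          unsigned long files_per_device)
{
    assert(files_per_device > 0);
    return devices * files_per_device * (INODE_BYTES + DENTRY_BYTES);
}
```

With 20,000 devices, `device_bytes` gives 3,200,000 bytes, in line with the 3 to 4 MB quoted earlier. With 10 files per device, `sysfs_bytes` gives 3,960 bytes per device and 79,200,000 bytes in total, which matches the roughly 4 KB and 80 MB figures above.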
The solution for this was to rewrite the sysfs code to put these struct inode and struct dentry structures in the kernel's caches, creating them on the fly when the filesystem was accessed. In other words, the directories and files were created dynamically as a user walked through the tree, instead of being preallocated when the device was originally created. Because these structures are in the main caches of the kernel, if memory pressure is placed on the system by userspace programs or by other parts of the kernel, the caches are freed and the memory is returned to those who need it at the time. This was all done by touching the backend sysfs code, and not the main struct device structures.
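The idea of creating the heavy structures on access and dropping them under memory pressure can be illustrated with a small userspace sketch. This is not the sysfs implementation. Every name here (`fs_node`, `attr_entry`, `attr_lookup`, `attr_shrink`, `attr_cached_count`) is hypothetical, and `calloc`/`free` stand in for the kernel's caches.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the heavy per-file state (inode + dentry). */
struct fs_node {
    char payload[396];
};

/* Hypothetical attribute: small permanent metadata plus a cached node. */
struct attr_entry {
    const char *name;
    struct fs_node *cached;   /* NULL until the file is first accessed */
};

/* Return the node for an entry, allocating it on first access.
 * Returns NULL only if the allocation fails. */
struct fs_node *attr_lookup(struct attr_entry *e)
{
    assert(e != NULL);
    if (e->cached == NULL)
        e->cached = calloc(1, sizeof(*e->cached));
    return e->cached;
}

/* Simulate memory pressure: drop every cached node and report how many
 * bytes were released. The small attr_entry records stay in place. */
size_t attr_shrink(struct attr_entry *entries, size_t count)
{
    size_t freed = 0;

    for (size_t i = 0; i < count; i++) {
        if (entries[i].cached != NULL) {
            free(entries[i].cached);
            entries[i].cached = NULL;
            freed += sizeof(struct fs_node);
        }
    }
    return freed;
}

/* Count how many entries currently hold a cached node. */
size_t attr_cached_count(const struct attr_entry *entries, size_t count)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++)
        if (entries[i].cached != NULL)
            n++;
    return n;
}
```

In this sketch, an entry that is never looked up costs only its small `attr_entry` record. After `attr_shrink` runs, a later `attr_lookup` simply recreates the node, which mirrors how a cached structure can be discarded and rebuilt on demand.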