
Booting up Linux Kernel - Linux for Engineers #1

Ever wondered what happens when your Linux machine starts? There is a fascinating song and dance between different components that brings a machine to life. Let's dive into the world of booting the Linux kernel.

The Linux kernel is a monumental achievement of collaboration in the world of open-source software.
It has resulted in a powerful, versatile operating system base that powers an estimated 80-90% of the modern tech world.
From smartphones in our pockets to the vast cloud infrastructure, Linux is the backbone of many technologies we rely on daily.
This series chronicles different aspects of a modern-day Linux system.
Each article aims to be a tl;dr while linking to amazing talks, presentations, and articles by some great minds.
Treat this as a crash course for the lazy and a jump-off point for the curious.
Happy reading!
Have feedback or questions, or want to be notified about more such articles? Follow me on Twitter @wiresurfer

⚡ And then there was light! ⚡

Ever wondered what happens when you press the power button on your machine?
The fans come to life, the RGB lights turn on, there is a beep, and the screen shows a sea of gibberish [or, if you grew up with laptops, you probably see a fancy logo and a spinner. How boring is that!]
By the end of this article, I want to unwrap the song and dance that goes on between hardware and different types of software to finally bring you a usable PC.

I won't hold back on some technical details, but I will gloss over others in the interest of brevity.

This post in particular starts with a buildup about hardware, the BIOS, and GRUB.
While it isn't absolutely necessary for the Linux kernel bootup saga, I do feel it's important for a well-rounded understanding.

For the impatient, feel free to jump to Linux Kernel Bootup Sequence


🖥️ Modern PC Architecture

A modern-day PC hasn't changed its core design philosophy in over 30 years. Yes, there has been miniaturization, and electronics design has improved by leaps and bounds, but the basic building blocks have remained the same.
Here is a picture of a motherboard annotated to show different subsystems.


Even this can be simplified down to the following block diagram.
If you pay attention, most of the peripherals used with modern machines connect over the PCI express bus.


  • The most demanding, high-throughput devices are the graphics card and memory modules. They attach directly to the CPU over dedicated lanes, traditionally referred to as the Northbridge.

  • The Southbridge, on the other hand, is a separate bus controller which manages other peripherals and connects to the CPU with a dedicated link. [Direct Media Interface, or DMI 3.0, is an Intel-specific link]

One thing omitted in this block diagram is the plethora of I2C, UART, serial, PWM, and GPIO interfaces 1 which help the motherboard maintain its function.

Notable peripherals in a motherboard include:

  • CMOS clock [helps with timekeeping, especially in embedded devices. Modern OSes often use NTP to sync time]
  • Temperature Sensors 2
  • PWM controllers for fan control

The first step in bringing the computer to working order is performed by a special program called the BIOS, which is programmed onto a ROM chip on the motherboard right from the factory. Let's see what happens in the BIOS.

High-level sequence of boot-up steps

🧩 BIOS and Power-On Self Test (POST)

The BIOS is an embedded program responsible for starting the computer and performing a POST (Power-On Self Test). UEFI is the modern reincarnation of the BIOS and the new kid on the block. From the Linux kernel bootup perspective, BIOS vs UEFI has limited implications for the startup process 3

Minimum POST checks at startup. Pic credit: CGDirector.com

  1. Power-On Self Test (POST): During POST, the BIOS initializes the CPU and memory subsystem. It checks if these components are functioning correctly. If they start properly, the computer is ready to boot. However, this depends on the rest of the hardware functioning correctly as well.
  2. Hardware Initialization: The BIOS initializes and lists all other peripheral hardware attached to the system. These peripherals are generally connected to the CPU via buses like I2C, PCI Express, and SATA.
  3. CMOS Battery Check: Keeping correct time is important. Not so critical these days, but back in the day, a depleted CMOS battery could lead to a lot of mess. Often a motherboard would refuse to boot automatically if it detected a fault in the CMOS Clock.
  4. RAM check: Verify RAM speed and make sure the bus speeds of various subsystems are compatible. Overclockers tinker with the voltage levels of the RAM and the CPU to force them to run at higher-than-prescribed clock speeds. BIOS/UEFI these days protects such folks from frying their Threadripper/i9 CPU.
  5. Boot a storage device: Try to find bootable storage devices, either a disk with Master Boot Record entries or a GPT (GUID Partition Table) partitioned disk.
BIOS and UEFI both perform POST on motherboards. UEFI just happens to be with the times.
Handing over to a Bootloader

As we learned, the BIOS acts as the first sanity check, initializing the system and performing POST. It then hands over control to a bootloader, a somewhat more advanced program that is tasked with loading operating systems.


💽 Bootloader and GRUB

After POST, BIOS looks for special I/O devices to provide the next program to run, typically the bootloader.
Locating the bootloader is done by following a few conventions which have been around in the PC world for decades. In short, the BIOS has a configured set of boot devices, and this preference is stored in the CMOS.

After POST, the BIOS goes through each boot device and checks whether it has a Master Boot Record. This is where GRUB enters the picture: GRUB is a program residing in the Master Boot Record of a bootable device.

Here is a quick anatomy of the Master Boot Record. It's the first sector of a disk. A disk sector is traditionally 512 bytes. Modern disks have larger sectors, up to 4096 bytes [Advanced Format disks], but even these disks often expose the physical media in 512-byte emulated mode [512e].

The 512 bytes of the MBR are broken up as follows (a small parsing sketch follows the list):

  • Bytes 1-446 - bootloader code (GRUB's boot.img).
  • 64 bytes - primary partition table, up to 4 entries of 16 bytes each.
  • Last 2 bytes - a special magic number, 0xAA55. This indicates to the BIOS that there is a bootloader on this disk device.
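
To make that layout concrete, here is a minimal sketch in C (assuming you point it at a raw disk or disk image you are allowed to read, e.g. one created with dd) that checks the 0xAA55 signature and dumps the four partition entries:

/* mbr_dump.c - minimal sketch: read the first 512-byte sector and decode it.
   Pass a readable raw disk or image path as the first argument. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(int argc, char **argv) {
    uint8_t sector[512];
    FILE *f = fopen(argc > 1 ? argv[1] : "disk.img", "rb");
    if (!f || fread(sector, 1, sizeof sector, f) != sizeof sector) {
        perror("read MBR");
        return 1;
    }
    fclose(f);

    /* The last two bytes must be 0x55 0xAA (0xAA55 little-endian)
       for the BIOS to treat this disk as bootable. */
    if (sector[510] != 0x55 || sector[511] != 0xAA) {
        printf("no MBR boot signature found\n");
        return 1;
    }

    /* Four 16-byte partition entries start at offset 446. */
    for (int i = 0; i < 4; i++) {
        const uint8_t *e = sector + 446 + i * 16;
        uint32_t lba_start, sectors;
        memcpy(&lba_start, e + 8, 4);   /* little-endian fields, fine on x86 */
        memcpy(&sectors,   e + 12, 4);
        printf("partition %d: boot=0x%02x type=0x%02x start_lba=%u sectors=%u\n",
               i + 1, e[0], e[4], lba_start, sectors);
    }
    return 0;
}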

With that out of the way, let's look at GRUB as a bootloader.

Now, let's be honest: 512 bytes is a tight space, and adding things like a splash image, nice graphics, and menus to dual-boot an OS will take way more than 512 bytes. For example, EasyBMP, a tiny library to display images on screen, is 20 KB when statically linked!

To work around this limitation, GRUB does a multi-stage boot.
Stage 1: boot.img fits in the 440-byte code area of the MBR. It's a simple stub which loads Stage 1.5 by jumping to an LBA address.
Stage 1.5: core.img fits in roughly 32 KB. It has just enough file system modules to load /boot/grub.
Stage 2: /boot/grub is a special folder on the /boot partition or a path on the root / partition. This is where all GRUB modules are available. It contains module drivers for a large set of I/O devices, including network booting. It also brings some nice things like a GUI, selection menus, and splash pages!


Now, with all that theory out of the way, how do we boot the kernel itself? We usually just select a menu entry and boom! Linux starts loading up.
Well, there is a bunch of configuration files and menu entries which make that happen. Just peek into /boot/grub/grub.cfg. But we aren't relying on GRUB and its magic here. Let's try to boot a system from first principles.

Here is the minimal set of commands you can use on a GRUB prompt to start a Linux system.

insmod linux
set root=(hd0,1)
linux /boot/vmlinuz-3.13.0-29-generic root=/dev/sda1
initrd /boot/initrd.img-3.13.0-29-generic
boot

A quick description of what's happening.

  • Load the linux GRUB module. This module tells GRUB how to start a Linux-like operating system.
  • hd0,1 - This is the tricky bit! It is GRUB's way of selecting a hard drive partition, and it varies from machine to machine depending on how the disk was partitioned. This StackExchange discussion should come in handy.
  • /boot/vmlinuz-3.13.0-29-generic is the compressed, compiled Linux kernel. Tab completion is your friend; the version number 3.13.0 will likely differ in your case.
  • /dev/sda1 is the root file system device.
  • /boot/initrd.img-3.13.0-29-generic is the initial RAM disk.
  • boot kicks off the boot process and runs the kernel we just loaded.
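
For comparison, the menu entries GRUB generates in /boot/grub/grub.cfg wrap essentially these same commands. A typical entry looks roughly like this (the device name, kernel version, and flags here are illustrative):

menuentry 'Ubuntu, with Linux 3.13.0-29-generic' {
    insmod ext2
    set root='hd0,msdos1'
    linux /boot/vmlinuz-3.13.0-29-generic root=/dev/sda1 ro quiet splash
    initrd /boot/initrd.img-3.13.0-29-generic
}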

Once we type boot, GRUB officially hands control over to the kernel. Phew!

Future Topics
If you're interested, I can cover BIOS and bootloader in more detail in future posts. However, for practicality, most of us won't be dealing directly with building BIOS or bootloaders. This high-level overview should provide a clear understanding of the initial steps in the boot sequence.


🐧 Linux Kernel Bootup Sequence

Many seasoned developers have observed a Linux system boot up, witnessing a stream of text scrolling by on the screen. This output primarily consists of driver initializations and service startups. Despite its apparent complexity, the Linux kernel boot process is relatively straightforward. Let's delve into the most prevalent Linux kernel boot sequence.

Linux Boot Process in a Nutshell

Bare Minimum Linux Kernel?

As we saw in the GRUB commands, achieving a minimal boot for a practical PC experience requires:

  • vmlinuz: the compressed, compiled Linux kernel binary
  • initramfs or initrd: a compressed archive which the bootloader places into memory and the kernel expands
  • (additionally) a root filesystem

Linux follows a two-stage booting process. The bootloader first loads the stage 1 kernel.

The stage 1 kernel's aims are simple and can be listed as follows:

  • Load just enough kernel modules to mount a proper filesystem.
  • Mount said root file system
  • Hand over control to an init executable in the root filesystem. (/init, /sbin/init or configured paths in linux/init/main.c at torvalds/linux · GitHub )

Because storage hardware comes in so many formats (the curse of Linux's ubiquity), kernel developers have chosen to break the booting process into an initial in-memory file system load (initramfs or initrd), which then mounts and loads the root filesystem.

The stage 2 kernel boot is where the init process starts configuring hardware and running services, usually by running userspace programs which invoke kernel-space syscalls. This ensures the boot process can be configured without having to recompile the kernel for every small customization. This design choice is the reason we aren't all kernel developers (yet).

Note: About minimal Linux
For a truly minimal boot experience, we could have a stripped-down vmlinuz which runs a statically compiled /sbin/init program to print hello world, all in under 10 MB of RAM, if you configure your kernel boot correctly! A sketch of such an init follows.
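
Here is a minimal sketch of what such an init could look like. This is a toy, not a real init system: it prints a message and then must never exit, because if PID 1 exits the kernel panics.

/* tiny_init.c - toy /sbin/init sketch for a minimal initramfs.
   Build it statically so it needs no shared libraries:
       gcc -static -o init tiny_init.c */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("hello world from PID %d\n", getpid());  /* should report PID 1 */

    /* PID 1 must never exit, otherwise the kernel panics.
       A real init would now mount filesystems and spawn services. */
    for (;;)
        pause();   /* sleep until a signal arrives */
}

Packed into a cpio archive as /init (or dropped into a tiny root filesystem as /sbin/init), this is enough for the kernel to finish booting into a hello world.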

Initramfs and initrd (initial RAM disk) serve a similar purpose; initramfs is the modern take on initrd.
Write to me if you think I should write about the internals of initramfs/initrd.
PS: A quick search would point to some great resources.
I do find What’s the Difference Between initrd and initramfs? | Baeldung on Linux a good reference.

🌴 Root File System : Linux Kernel Directory Structure

Upon starting, the Linux kernel prepares and virtually presents some special file systems in a specific way. The root file system and the initial RAM file system (initramfs) are crucial parts of this structure.
Initramfs is the first file system that the kernel mounts, and it operates in memory rather than on disk.
The root filesystem is then mounted and provides more kernel modules, libraries, binary utilities, and daemons.
From the root filesystem, various daemons are started which can further mount other storage devices and filesystems.

We need to be aware of /proc and /sys directories, which contain information about the running kernel and allow certain kernel parameters to be modified.
We also need to learn how all these file systems are combined together to provide a singular working view of the running system using overlayfs.
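
As a quick taste of /proc, here is a minimal sketch that reads two of these virtual files. Note that /proc/cmdline echoes back the very kernel arguments the bootloader passed in (root=..., etc.):

/* proc_peek.c - read a couple of virtual files the kernel exposes under /proc. */
#include <stdio.h>

static void dump(const char *path) {
    char line[1024];
    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return; }
    printf("%s:\n", path);
    while (fgets(line, sizeof line, f))
        printf("  %s", line);
    fclose(f);
}

int main(void) {
    dump("/proc/version");   /* kernel version and build info */
    dump("/proc/cmdline");   /* boot arguments passed by the bootloader */
    return 0;
}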

Init Process and the Role of PID 1: systemd and upstart in modern linux distros

In the world of Linux, PID 1 holds a special place. PID 1, or Process ID 1, is the first process started by the Linux kernel during the boot sequence and is the ancestor of all other processes. Understanding PID 1 and its role is crucial for grasping how a Linux system initializes and manages services.

When the Linux kernel finishes its initial setup, it launches the first user-space process, which is assigned PID 1. Traditionally, this process was the init system, responsible for starting system processes, handling system initialization, and managing services. The init system follows a predefined sequence to bring the system to a usable state.

PID 1 is critical because it remains running as long as the system is up and serves as the parent for all other processes. If PID 1 terminates, the kernel will panic, causing the system to halt or reboot, as there would be no process to adopt orphaned child processes.
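
A tiny sketch makes this adoption behavior tangible: fork a child, let the parent exit, and see who adopts the orphan. On a classic init setup the new parent is PID 1; on modern systemd desktops a per-session subreaper may adopt it instead, so the exact number is machine-dependent.

/* orphan.c - show that an orphaned process gets re-parented (classically to PID 1). */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {
        /* Child: wait for the parent to die, then ask who adopted us. */
        sleep(1);
        printf("child %d adopted by new parent %d\n", getpid(), getppid());
        return 0;
    }

    /* Parent exits immediately, orphaning the child. */
    printf("parent %d exiting, child is %d\n", getpid(), pid);
    return 0;
}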

systemd as init

In modern Linux distributions, systemd has largely replaced the traditional init system as the default system and service manager. systemd is designed to provide a more efficient and feature-rich way of managing system processes and services. It still occupies the PID 1 slot and takes on the responsibilities of its predecessor but with enhanced capabilities.

How Does systemd Work?

When the kernel passes control to systemd as PID 1, systemd begins its initialization process by mounting the initial file systems and starting essential services. It reads its configuration from unit files located in /etc/systemd/system/ and /usr/lib/systemd/system/ . These unit files describe how to manage services, sockets, devices, and other system components.

systemd also sets up cgroups (control groups) to manage resource allocation and limits for processes.
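
To give a flavour of those unit files, here is a minimal, illustrative service unit; the service name and binary path are made up for this example. Dropped into /etc/systemd/system/hello.service, it lets systemd start, supervise, and restart the process.

[Unit]
Description=Hello service (illustrative example)
After=network.target

[Service]
ExecStart=/usr/local/bin/hello-daemon
Restart=on-failure

[Install]
WantedBy=multi-user.target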


Kernel Privilege Rings and Security: PID 1's Security Implications

In the architecture of modern computer systems, privilege rings play a crucial role in maintaining security and stability. The Linux kernel operates in these rings to control access to resources and enforce security policies. Understanding the concept of privilege rings and the security implications of PID 1 helps in comprehending how Linux ensures a secure environment.

Privilege Rings Explained

Privilege rings are hierarchical levels of privileges that a system's processes can have. They range from Ring 0, the highest level of privilege, to Ring 3, the lowest.

  • Ring 0 (Kernel Mode): The most privileged level, where the operating system kernel operates. It has unrestricted access to all system resources and hardware.
  • Ring 3 (User Mode): The least privileged level, where user applications run. It has restricted access and must go through system calls to request privileged operations from the kernel.

The kernel mode (Ring 0) allows the operating system to execute critical tasks that require direct access to hardware and memory. User mode (Ring 3) provides a restricted environment for applications, preventing them from directly accessing hardware and system memory, thus protecting the system from malicious software and user errors.
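
A small, x86-specific sketch shows both sides of this boundary from a normal ring 3 process: a system call is the sanctioned gate into ring 0, while executing a privileged instruction directly (cli, which disables interrupts) earns the process a fault instead.

/* rings.c - poke at the user/kernel boundary from ring 3 (x86/x86-64 Linux only). */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/syscall.h>

static void on_fault(int sig) {
    /* The CPU raised a general protection fault; the kernel delivered SIGSEGV. */
    printf("caught signal %d: privileged instruction refused in user mode\n", sig);
    _exit(0);
}

int main(void) {
    /* Allowed: ask the kernel to do privileged work on our behalf via a syscall. */
    long pid = syscall(SYS_getpid);
    printf("syscall into ring 0 worked, kernel says our pid is %ld\n", pid);

    /* Not allowed: try to disable interrupts directly from ring 3. */
    signal(SIGSEGV, on_fault);
    __asm__ volatile("cli");
    printf("this line is never reached\n");
    return 0;
}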

Ironically, Ring 1 and Ring 2 have fallen out of favor.

As a hobbyist Linux nerd, I discovered that the benefits of rings 1 and 2 in the modern protection model are greatly diminished due to paging only distinguishing between privileged (ring 0, 1, 2) and unprivileged levels.

Anecdotally, Intel designed rings 1 and 2 to house device drivers, providing them with some privileges while keeping them somewhat separate from kernel code. Although rings 1 and 2 can access supervisor pages, they still trigger a General Protection Fault (GPF) if they use a privileged instruction, similar to ring 3. Despite this, rings 1 and 2 are useful in certain designs.

For instance, VirtualBox places guest kernel code in ring 1, and some operating systems do utilize them, though it is not a widely popular design choice at present.

🏁 Conclusion

Dear reader, I hope this gives you a glimpse into the fascinating engineering that powers your PC.
We followed a popular path through the woods of bootloaders and Linux kernels.
In reality, there are many ways to boot a machine. We've got three components to mix and match: the BIOS, the bootloader, and the OS/kernel.

In modern devices built around a single powerful System-on-Chip design, the BIOS and bootloader often become very slim, with features for OEM locking and security. In most such systems there is a proprietary "BIOS". 4

Keen readers would have noticed how GRUB is a multi-stage bootloader and the Linux kernel is a two-stage OS boot system. There have been efforts like Direct Kernel Boot 5 to merge these components and their many steps into a streamlined process. This speeds up the boot process and is usually seen as an option in virtualization solutions and hypervisors.

In no particular order or importance, here are some interesting projects powering the modern device bootup space.

  • U-Boot - Usually used in embedded devices. Gives fine-grained control of where the kernel gets loaded into device memory. As seen in:
    • SpaceX Dragon/Falcon
    • Ubiquiti/TP-Link network devices
  • Coreboot
    • Chrome OS devices and some Lenovo ThinkPads
  • Android Devices :
    • While Android uses a variant of the Linux kernel, there is no standard bootloader prescribed for running Android OS.
    • OEMs implement their own boot loader depending on the storage options available and the SoC used.
    • Qualcomm chipsets use Little Kernel and try to be UEFI compatible.
    • MediaTek chipsets use a variant of U-Boot.
    • Major players like Samsung have their own variants of the bootloader.
  • Cloud VMs : AWS/Azure/GCP support custom virtualized BIOS/UEFI for their VMs.
    • Cloud BIOS supports emulating a keyboard/serial console during the boot process, even before the bootloader or OS has loaded. This makes supporting any standard bootloader possible on cloud infrastructure. Critical for disaster recovery scenarios where your server, 7,000 km away, stops booting!
    • Virtio and OS images: Cloud boot disks are virtualized and attached on demand. You can explore virtio locally with QEMU/KVM. Virtualized storage enables using pre-built images for different operating systems, a superpower for running repeatable, consistent infrastructure at scale!
    • Cloud-init: Cloud platforms also run additional initialization steps after the kernel and operating system start, called cloud-init. This helps set up networking, user accounts, passwords, SSH keys, and services. It also lets platform operators further extend the boot process; a minimal example follows this list.
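
For a taste of cloud-init, here is a minimal, illustrative user-data snippet (the user name and key are placeholders) that creates a user and installs an SSH key on first boot:

#cloud-config
users:
  - name: wiresurfer
    groups: sudo
    shell: /bin/bash
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... example@laptop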

Have feedback or questions, or want to be notified about more such articles? Follow me on Twitter @wiresurfer


Footnotes


  1. GPIOs are common in embedded boards like the Raspberry Pi or industrial PCs. Here is an insightful Stack Overflow discussion about hacks to use GPIOs on desktop motherboards.

  2. PWM Controllers and Temperature Sensors together play a critical role in thermal management. Most modern UEFI motherboards offer fine grained control over the sensor readings and the Fan control curve.

  3. This is a loose statement and I accept it's far from true. For all you advanced practitioners out there, remember we are writing this guide to be approachable. I am shying away from a lot of complexity and keeping things simple. Adam Williamson from Red Hat, who has written about Linux, has a great post about this.

  4. "BIOS" is a symbolic name here. Each SoC needs to start the hardware and do some rudimentary POST. Most Mobile devices also need to initialize a special telephony subsystem which runs its own Baseband OS. A traditional BIOS/UEFI introduced here would mean pushing a square peg through a round hole.

  5. 20.2.3. Direct kernel boot Red Hat Enterprise Linux 6 | Red Hat Customer Portal