CentOS Stream 8 software RAID1 and UEFI GPT Boot configuration

This post has nothing to do with the usual content of this website (not K6 related), but I’m sharing my experience of configuring a CentOS Stream 8 Linux software RAID 1 installation on a UEFI system that actually boots when you replace a failed disk.

Most of the tutorials and articles online cover installations where the disks use an MBR (legacy BIOS) partition table and not GPT (UEFI), and this has a huge impact on how your system boots and on what you have to do when a disk fails. We’ll have a look at a scenario where we:

  • Start by installing CentOS Stream 8 in software RAID 1 mode (on 2 disks)
  • Simulate the failure of one drive
  • Replace the drive so the RAID arrays are synchronized again
  • Configure the UEFI firmware so it can boot from the new replacement drive

For demonstration purposes, this will be done using a VM in VirtualBox, but it also applies to any PC or Server booting in UEFI mode, as most machines do nowadays.

Step 1 – Configure the VirtualBox VM for RAID 1 and UEFI boot

The VirtualBox configuration is quite easy: create your VM as usual (I named the first disk disk1.vdi for clarity), then make 2 changes to the hardware (a VBoxManage equivalent is sketched after the list):

  • In the “System” tab, select “Enable EFI (special OSes only)”
  • In the “Storage” tab, add a second disk, here it is “disk2.vdi”
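
If you prefer the command line, the same two changes can be made with VBoxManage. This is only a minimal sketch: the VM name “testbox”, the controller name “SATA” and the 20 GB disk size are assumptions, so adjust them to your own setup.

VBoxManage modifyvm "testbox" --firmware efi
VBoxManage createmedium disk --filename disk2.vdi --size 20480
VBoxManage storageattach "testbox" --storagectl "SATA" --port 1 --device 0 --type hdd --medium disk2.vdi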

Step 2 – Install CentOS Stream 8 in software RAID 1 mode

We can now proceed with the CentOS Stream 8 installation; I will only show the steps concerning the RAID 1 configuration. In our setup, all partitions, including /boot and /boot/efi, will be RAIDed.

The first step is to select the 2 disks as installation destinations; both need to be checked. Then select a “Custom” storage configuration and click Done.

Then, for demonstration purposes, I let the installer create the partition scheme automatically.

Now comes the part where we can convert the standard partition scheme to a RAID 1 configuration. For each of the 4 mount points created (/, /boot/efi, /boot and swap), change the field under “Device Type” from LVM to RAID. Note that RAID1 mode is automatically selected.

After that you can proceed with the CentOS installation as usual; you now have a system running fully on RAID 1.
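
If you ever need to reproduce this layout unattended, the same scheme can also be described in a kickstart file. The fragment below is only an untested sketch of the idea: the raid.* identifiers and the sizes are assumptions derived from the layout above, and the usual kickstart directives (clearpart, bootloader, …) are omitted.

# member partitions on both disks
part raid.root1 --ondisk=sda --size=1024 --grow
part raid.root2 --ondisk=sdb --size=1024 --grow
part raid.boot1 --ondisk=sda --size=1024
part raid.boot2 --ondisk=sdb --size=1024
part raid.efi1 --ondisk=sda --size=600
part raid.efi2 --ondisk=sdb --size=600
part raid.swap1 --ondisk=sda --size=1024
part raid.swap2 --ondisk=sdb --size=1024
# RAID 1 arrays assembled from those members
raid / --device=root --level=1 --fstype=xfs raid.root1 raid.root2
raid /boot --device=boot --level=1 --fstype=xfs raid.boot1 raid.boot2
raid /boot/efi --device=boot_efi --level=1 --fstype=efi raid.efi1 raid.efi2
raid swap --device=swap --level=1 raid.swap1 raid.swap2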

Step 3 – Checking the configuration in the OS – know the tools

The first thing we can do after the OS has started is to verify that we are in GPT (UEFI) mode; this can be checked with parted. The partition table type should be “gpt”. If you see “msdos” instead, you are not in UEFI mode but in MBR mode.

In the current example, both our drives (sda and sdb) are in gpt mode, which is what we expect.

[root@testbox ~]# parted -l
Model: ATA VBOX HARDDISK (scsi)
Disk /dev/sda: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
1 1049kB 18.7GB 18.7GB raid
2 18.7GB 19.7GB 1074MB raid
3 19.7GB 20.4GB 629MB fat32 raid
4 20.4GB 21.5GB 1103MB raid

Model: ATA VBOX HARDDISK (scsi)
Disk /dev/sdb: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
1 1049kB 18.7GB 18.7GB raid
2 18.7GB 19.7GB 1074MB raid
3 19.7GB 20.4GB 629MB fat32 raid
4 20.4GB 21.5GB 1103MB raid

The second thing we can look at is the configuration of our RAID 1 arrays. We have a status file and a utility for that: /proc/mdstat, which shows the status of the RAID arrays, and lsblk, which gives a better view of which partitions belong to which array and which mount point each array serves.

Let’s start with mdstat. Here we can see our 4 RAID 1 arrays and we can learn that:

  • Array md124 is composed of partitions sda3 and sdb3
  • Array md125 is composed of partitions sda2 and sdb2
  • Array md126 is composed of partitions sda1 and sdb1
  • Array md127 is composed of partitions sda4 and sdb4
  • All arrays are operational, this is shown by the “[UU]” status representing the 2 drives in the array

[root@testbox ~]# cat /proc/mdstat
Personalities : [raid1]
md124 : active raid1 sdb3[1] sda3[0]
614336 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk

md125 : active raid1 sdb2[1] sda2[0]
1046528 blocks super 1.2 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sda1[0] sdb1[1]
18211840 blocks super 1.2 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sda4[0] sdb4[1]
1075200 blocks super 1.2 [2/2] [UU]

lsblk shows some of the same information (which partitions are in which arrays), but it also tells us which mount point each array is associated with:

[root@testbox ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 20G 0 disk
├─sda1 8:1 0 17.4G 0 part
│ └─md126 9:126 0 17.4G 0 raid1 /
├─sda2 8:2 0 1G 0 part
│ └─md125 9:125 0 1022M 0 raid1 /boot
├─sda3 8:3 0 600M 0 part
│ └─md124 9:124 0 600M 0 raid1 /boot/efi
└─sda4 8:4 0 1G 0 part
└─md127 9:127 0 1G 0 raid1 [SWAP]
sdb 8:16 0 20G 0 disk
├─sdb1 8:17 0 17.4G 0 part
│ └─md126 9:126 0 17.4G 0 raid1 /
├─sdb2 8:18 0 1G 0 part
│ └─md125 9:125 0 1022M 0 raid1 /boot
├─sdb3 8:19 0 600M 0 part
│ └─md124 9:124 0 600M 0 raid1 /boot/efi
└─sdb4 8:20 0 1G 0 part
└─md127 9:127 0 1G 0 raid1 [SWAP]
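
If you need more detail on a single array than /proc/mdstat provides (array UUID, state, and the role of each member device), mdadm can print it. For example, for the /boot/efi array:

[root@testbox ~]# mdadm --detail /dev/md124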

Step 4 – Fail a disk

We will now purposely fail a disk so our RAID 1 arrays become degraded. Note that in real life you may have to do the same if a disk still works but reports SMART errors; if a disk fails completely, this part is done automatically for you :)
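
If you suspect a drive is failing rather than already dead, you can check its SMART health before pulling it. smartctl comes from the smartmontools package (dnf install smartmontools); it will not report much on a VirtualBox virtual disk, but on real hardware it is the first thing to look at:

[root@testbox ~]# smartctl -H /dev/sdb
[root@testbox ~]# smartctl -a /dev/sdb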

In this example, we will fail the disk “sdb”. This takes 2 steps: marking the partitions belonging to sdb as failed in every array, and then removing those partitions from the arrays. mdadm is the tool for the job, and we can use the output of the previous commands to help us:

[root@testbox ~]# mdadm --manage /dev/md124 --fail /dev/sdb3
mdadm: set /dev/sdb3 faulty in /dev/md124
[root@testbox ~]# mdadm --manage /dev/md125 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md125
[root@testbox ~]# mdadm --manage /dev/md126 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md126
[root@testbox ~]# mdadm --manage /dev/md127 --fail /dev/sdb4
mdadm: set /dev/sdb4 faulty in /dev/md127

If you run cat /proc/mdstat at this point, you will notice that the array status has changed to [U_]. Now we can remove the sdb partitions from the arrays:

[root@testbox ~]# mdadm --manage /dev/md124 --remove /dev/sdb3
mdadm: hot removed /dev/sdb3 from /dev/md124
[root@testbox ~]# mdadm --manage /dev/md125 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2 from /dev/md125
[root@testbox ~]# mdadm --manage /dev/md126 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1 from /dev/md126
[root@testbox ~]# mdadm --manage /dev/md127 --remove /dev/sdb4
mdadm: hot removed /dev/sdb4 from /dev/md127
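
As a side note, mdadm accepts several manage-mode operations in a single invocation, so the fail and remove steps can also be combined per array, for example:

[root@testbox ~]# mdadm --manage /dev/md124 --fail /dev/sdb3 --remove /dev/sdb3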

And again, if you look at /proc/mdstat, you will see that only partitions belonging to sda are listed; the ones from sdb are gone. Note that the mdX numbers are not guaranteed to stay attached to the same array over time, so always check which partitions an array contains rather than relying on the number:

[root@testbox ~]# cat /proc/mdstat
Personalities : [raid1]
md124 : active raid1 sda3[0]
614336 blocks super 1.0 [2/1] [U_]
bitmap: 0/1 pages [0KB], 65536KB chunk

md125 : active raid1 sda2[0]
1046528 blocks super 1.2 [2/1] [U_]
bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sda4[0]
1075200 blocks super 1.2 [2/1] [U_]

md127 : active raid1 sda1[0]
18211840 blocks super 1.2 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk

Step 5 – Replace the failed disk with a new one and re-synchronize the RAID1

To replace the disk:

  • Power off your system
  • Replace the failed disk (in VirtualBox, remove “disk2.vdi” and add a new “newdisk.vdi”; see the VBoxManage sketch after this list). The new disk needs to be at least the same size as the one it replaces, but it can be bigger
  • Boot your system
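
In VirtualBox, the disk swap can also be scripted with VBoxManage while the VM is powered off. Again, this is a sketch with assumed names (VM “testbox”, controller “SATA”, failed disk on port 1):

VBoxManage storageattach "testbox" --storagectl "SATA" --port 1 --device 0 --medium none
VBoxManage closemedium disk disk2.vdi --delete
VBoxManage createmedium disk --filename newdisk.vdi --size 20480
VBoxManage storageattach "testbox" --storagectl "SATA" --port 1 --device 0 --type hdd --medium newdisk.vdi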

Now we are going to re-configure the new drive (still seen as /dev/sdb) to re-enable it in the RAID arrays. Note that this step differs in GPT mode compared to MBR. For that, we need to install a disk utility called sgdisk on our machine; on CentOS Stream 8 this is done by running dnf install gdisk.

The first step involves copying the partition table from sda to sdb, and then generating new random GUIDs for sdb so the two disks do not share the same identifiers:

[root@testbox ~]# sgdisk --backup=sda_parttable_gpt.bak /dev/sda
The operation has completed successfully.
[root@testbox ~]# sgdisk --load-backup=sda_parttable_gpt.bak /dev/sdb
Creating new GPT entries.
The operation has completed successfully.
[root@testbox ~]# sgdisk -G /dev/sdb
The operation has completed successfully.
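
The backup/restore approach above leaves you with a partition table backup file, which is nice to keep around, but sgdisk can also replicate the table in a single step. Mind the argument order: the destination disk follows -R and the source disk comes last:

[root@testbox ~]# sgdisk -R /dev/sdb /dev/sda
[root@testbox ~]# sgdisk -G /dev/sdb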

We can now run lsblk again and see that sdb has 4 partitions of the same size as the ones on sda.

[root@testbox ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 20G 0 disk
├─sda1 8:1 0 17.4G 0 part
│ └─md125 9:125 0 17.4G 0 raid1 /
├─sda2 8:2 0 1G 0 part
│ └─md126 9:126 0 1022M 0 raid1 /boot
├─sda3 8:3 0 600M 0 part
│ └─md124 9:124 0 600M 0 raid1 /boot/efi
└─sda4 8:4 0 1G 0 part
└─md127 9:127 0 1G 0 raid1 [SWAP]
sdb 8:16 0 20G 0 disk
├─sdb1 8:17 0 17.4G 0 part
├─sdb2 8:18 0 1G 0 part
├─sdb3 8:19 0 600M 0 part
└─sdb4 8:20 0 1G 0 part

We can now restore the RAID 1 arrays by adding the matching sdb partitions to the mdX arrays. Note that the mdX numbers have changed since the reboot (compare this lsblk output with the one from step 3), so match the partitions by size and mount point rather than by array number. Looking at the partition sizes, this means:

[root@testbox ~]# mdadm --manage /dev/md124 --add /dev/sdb3
mdadm: added /dev/sdb3
[root@testbox ~]# mdadm --manage /dev/md125 --add /dev/sdb1
mdadm: added /dev/sdb1
[root@testbox ~]# mdadm --manage /dev/md126 --add /dev/sdb2
mdadm: added /dev/sdb2
[root@testbox ~]# mdadm --manage /dev/md127 --add /dev/sdb4
mdadm: added /dev/sdb4

And if you look at /proc/mdstat you actually see the data rebuild happening:


[root@testbox ~]# cat /proc/mdstat
Personalities : [raid1]
md124 : active raid1 sdb3[2] sda3[0]
614336 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk

md125 : active raid1 sdb1[2] sda1[0]
18211840 blocks super 1.2 [2/1] [U_]
[==>………………] recovery = 14.6% (2667456/18211840) finish=1.2min speed=205188K/sec
bitmap: 1/1 pages [4KB], 65536KB chunk

md126 : active raid1 sdb2[2] sda2[0]
1046528 blocks super 1.2 [2/1] [U_]
resync=DELAYED
bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active raid1 sdb4[2] sda4[0]
1075200 blocks super 1.2 [2/1] [U_]
resync=DELAYED
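
If you want to follow the rebuild without re-running the command by hand, you can keep an eye on it with watch:

[root@testbox ~]# watch -n 5 cat /proc/mdstat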

At this point you have replaced your disk and all data has been successfully re-synchronized. But there is still one problem: your UEFI firmware only knows how to boot from /dev/sda, not from your new /dev/sdb. This is fine until /dev/sda fails at some point; then you will be stuck with an unbootable system.

Step 6 – Configure the UEFI so we can boot on the replaced disk

Configuring the UEFI firmware so it can boot from either /dev/sda or /dev/sdb is probably the most obscure part of this process, and it isn’t really well documented anywhere.

For this process, the first step is to know which partitions hold the EFI boot image, i.e. where the /boot/efi array lives; from the previous lsblk output we know it is on /dev/sda3 and /dev/sdb3. Another way to see this is to use blkid and look for the partitions whose LABEL ends in “boot_efi”:

[root@testbox ~]# blkid
/dev/sda1: UUID="92264c45-1a4c-bb6c-a067-4c9f725de306" UUID_SUB="2b300c6f-a3d7-4a53-71e0-028db9956603" LABEL="testbox:root" TYPE="linux_raid_member" PARTUUID="1b1f3eec-1565-4717-a345-81c6d8dfaf63"
/dev/sda2: UUID="6cdc6268-22f1-1073-be0a-a422b024611c" UUID_SUB="fab4355b-cd48-2cdf-b390-563cd4922d29" LABEL="testbox:boot" TYPE="linux_raid_member" PARTUUID="89edf2e6-b556-46d0-8a99-610c6e042bee"
/dev/sda3: UUID="88aadcb5-b05d-75f3-41f3-0b9a4f2a7292" UUID_SUB="1ae6db87-d6cb-a667-33ee-2fad253f64c8" LABEL="testbox:boot_efi" TYPE="linux_raid_member" PARTUUID="e9bc076b-14de-4480-8122-45e5cf47357e"
/dev/sda4: UUID="521841ff-95fc-5d9b-5d41-96c9a0001a03" UUID_SUB="4a77e747-7ae1-33f2-08dd-531d56d23d46" LABEL="testbox:swap" TYPE="linux_raid_member" PARTUUID="62e8b9ee-2ba7-4923-adad-4c04fff8af10"
/dev/sdb1: UUID="92264c45-1a4c-bb6c-a067-4c9f725de306" UUID_SUB="899fa261-df1c-c0e9-9d0d-98c25b98c451" LABEL="testbox:root" TYPE="linux_raid_member" PARTUUID="0a4b8678-e6e3-4d74-821b-1ad2d1263659"
/dev/sdb2: UUID="6cdc6268-22f1-1073-be0a-a422b024611c" UUID_SUB="20078c1e-9aa6-131a-a42b-33655048d1d7" LABEL="testbox:boot" TYPE="linux_raid_member" PARTUUID="e582c6f1-0318-4d94-b917-f089f5b94492"
/dev/sdb3: UUID="88aadcb5-b05d-75f3-41f3-0b9a4f2a7292" UUID_SUB="70f5a3a3-5413-c7dd-1c45-08af93516dff" LABEL="testbox:boot_efi" TYPE="linux_raid_member" PARTUUID="ba95fcb4-57d4-4463-8f9c-e7efaf825df8"
/dev/sdb4: UUID="521841ff-95fc-5d9b-5d41-96c9a0001a03" UUID_SUB="e23ebb0e-b22a-914a-c025-5ef17b26bfbe" LABEL="testbox:swap" TYPE="linux_raid_member" PARTUUID="db22ded1-6d51-4bda-be6a-0ab02f1d0f1d"
/dev/md127: UUID="0a290b0e-9842-4ebd-80a8-8f62d0b78e41" TYPE="swap"
/dev/md126: UUID="f5bcc5a4-b6ea-4da5-b333-26a70046e8db" BLOCK_SIZE="512" TYPE="xfs"
/dev/md125: UUID="06f67d10-5473-4144-8912-783fe3bc22f1" BLOCK_SIZE="512" TYPE="xfs"
/dev/md124: UUID="EEE5-4DFF" BLOCK_SIZE="512" TYPE="vfat"

Armed with that knowledge, we can look at the current EFI Boot Manager configuration:

[root@testbox ~]# efibootmgr -v
BootCurrent: 0005
Timeout: 0 seconds
BootOrder: 0006,0005,0000,0001,0002,0004,0003
Boot0000* UiApp FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(462caa21-7614-4503-836e-8ab6f4662331)
Boot0001* UEFI VBOX CD-ROM VB2-01700376 PciRoot(0x0)/Pci(0x1,0x1)/Ata(1,0,0)N…..YM….R,Y.
Boot0002* UEFI VBOX HARDDISK VBad6c2a23-54cbc127 PciRoot(0x0)/Pci(0xd,0x0)/Sata(0,65535,0)N…..YM….R,Y.
Boot0003* UEFI VBOX HARDDISK VB6bb70033-086ede82 PciRoot(0x0)/Pci(0xd,0x0)/Sata(1,65535,0)N…..YM….R,Y.
Boot0004* EFI Internal Shell FvVol(7cb8bdc9-f8eb-4f34-aaea-3ee4af6516a1)/FvFile(7c04a583-9e3e-4f1c-ad65-e05268d0b4d1)
Boot0005* CentOS Linux HD(3,GPT,e9bc076b-14de-4480-8122-45e5cf47357e,0x24c5800,0x12c000)/File(\EFI\centos\shimx64.efi)
Boot0006* CentOS Linux HD(3,GPT,ef341aae-78f3-48ed-b850-e5725c54e85d,0x24c5800,0x12c000)/File(\EFI\centos\shimx64.efi)

The last 2 lines (Boot0005 and Boot0006) correspond to /dev/sda3 and the old /dev/sdb3 we replaced. You can see this by comparing the string after “GPT,” with the PARTUUIDs in the blkid output: the one for /dev/sda3 matches, but the one for /dev/sdb3 does not (ef341aae… for the old drive, ba95fcb4… for the new drive).
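
If you only want the PARTUUIDs of the two ESP members for this comparison, you can ask blkid for them directly instead of scanning the full output:

[root@testbox ~]# blkid -s PARTUUID -o value /dev/sda3 /dev/sdb3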

So, we can start by removing the entry for the old /dev/sdb3 (Boot0006):

[root@testbox ~]# efibootmgr -Bb 0006

And add an entry for the new disk, still /dev/sdb3:

[root@testbox ~]# efibootmgr --create --disk /dev/sdb --part 3 --label "CentOS Linux on sdb" --loader "\EFI\centos\shimx64.efi"

By default, this will set the new entry as the default boot device. To make /dev/sda the default again, you can change the boot order:

[root@testbox ~]# efibootmgr -o 0005,0006,0000,0001,0002,0004,0003
BootCurrent: 0005
Timeout: 0 seconds
BootOrder: 0005,0006,0000,0001,0002,0004,0003
Boot0000* UiApp
Boot0001* UEFI VBOX CD-ROM VB2-01700376
Boot0002* UEFI VBOX HARDDISK VBad6c2a23-54cbc127
Boot0003* UEFI VBOX HARDDISK VB6bb70033-086ede82
Boot0004* EFI Internal Shell
Boot0005* CentOS Linux
Boot0006* CentOS Linux on sdb

Step 7 – Test that it actually works

The last thing to do is power off your VM, remove “disk1.vdi” so only our “newdisk.vdi” remains, and test if it boots! If it does, you did everything correctly!
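
If you prefer to do this last test from the command line too, detaching the first disk is a single VBoxManage call (same assumed VM and controller names as before); to put the disk back afterwards, attach it again with --type hdd --medium disk1.vdi:

VBoxManage storageattach "testbox" --storagectl "SATA" --port 0 --device 0 --medium none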

I hope this can help someone!
