License: CC BY 4.0

Recently, one of my favourite YouTube channels decided to leave the platform and removed their channel. I stumbled across some old pictures that I had once compressed so heavily - down to 600 megabytes, CD-ROM sized - that faces were barely recognizable. And when I visited a bookmark from 2010, I found that the page was no longer served at that link. The time for me to become a datahoarder is now. 🕸️

Disclaimer: I’m not an expert in this field and merely want to record and share my experience - that’s why I chose to license this text under the formidable CC BY 4.0.

What does that practically entail for me? Setting up some long-term reliable storage on my Linux desktop, in the form of a RAID-1 array. As I found existing tutorials a bit confusing and wanted to retain the steps I took for future reference, I decided to make them publicly available here.

Steps to take:

  • Set-up Multiple-Device (MD) RAID array
  • Set-up Logical Volume Management (LVM)
  • Quiet disks
  • Dry-run with disk images

Set-up Multiple-Device (MD) RAID array

I am talking software RAID here. Motherboards may provide some RAID mode but on consumer and entry-level boards this is typically software RAID in disguise, e.g. HostRAID. Software RAID in Linux is implemented through the md Multiple Devices device driver, and mdadm is the tool to administer these RAID devices.

mdadm - manage MD devices aka Linux Software RAID

Install mdadm. (Ubuntu, Debian, Arch: package mdadm)

Find the devices that you want to use in your array with lsblk. I will assume /dev/sda and /dev/sdb. If there is any data on these devices, the following will remove that data.
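
A sketch of what lsblk might print here (names, sizes and models are illustrative; yours will differ):

$ lsblk -o NAME,SIZE,TYPE,MODEL
NAME  SIZE TYPE MODEL
sda   3.6T disk WDC_WD40EFRX
sdb   3.6T disk WDC_WD40EFRX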

Now, partition the disks: create a GPT partition table with a single Linux partition on it that will be the RAID member. For each disk:

  • g create a new GPT partition table
  • n create a new partition, choose the defaults to fill the disk
  • w write to disk ⚠️ this destroys all data on the disk
$ fdisk /dev/sda
Welcome to fdisk (util-linux 2.34).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): g
Created a new GPT disklabel (GUID: FD7A7EC5-CC10-A140-B39D-2C8D3E43A007).

Command (m for help): n
Partition number (1-128, default 1): 
First sector (2048-1953091, default 2048): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-1953091, default 1953091): 

Created a new partition 1 of type 'Linux filesystem' and of size 952.7 MiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
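
If you would rather script this than repeat the interactive steps per disk, sfdisk can apply the same layout - a sketch, and just as destructive as the interactive variant:

$ # ⚠️ destroys all data on the disk: GPT label with one partition filling it
$ printf 'label: gpt\n,\n' | sfdisk /dev/sdb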

Repeat for all disks. Note that lsblk will now show the partition devices /dev/sda1, /dev/sdb1, etc. beneath the disks.
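
For example (illustrative output):

$ lsblk /dev/sda /dev/sdb
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  3.6T  0 disk
└─sda1   8:1    0  3.6T  0 part
sdb      8:16   0  3.6T  0 disk
└─sdb1   8:17   0  3.6T  0 part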

Now, for creating the array, invoke mdadm --create with these partition devices. Again, make sure to create the array with the partition devices, not with the whole-disk devices! Choose the target location (e.g. /dev/md0), the RAID level (I chose 1 for redundancy), the number of devices, and the list of partition devices.

$ mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

Great. lsblk will now show the device md0 in the tree beneath both partition devices.

We created a software RAID device and can continue to set up LVM on it!

However, the device is not yet persistent.

Configure mdadm to reassemble the array on reboot, then update the initramfs so the array is already assembled early in the boot process (run both commands as root):

$ mdadm --detail --scan | tee -a /etc/mdadm/mdadm.conf
$ update-initramfs -u
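
To double-check, the line that tee appended to /etc/mdadm/mdadm.conf should look roughly like this (name and UUID are placeholders; yours will differ):

ARRAY /dev/md0 metadata=1.2 name=myhost:0 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx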

Set-up Logical Volume Management (LVM)

Create a physical volume from the md-device:

$ pvcreate /dev/md0

Create a volume-group (e.g. vgmd0) on the physical volume:

$ vgcreate vgmd0 /dev/md0

Create a logical volume (e.g. media, 1TB) on the volume-group:

$ lvcreate -n media -L 1T vgmd0
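
To verify that the LV exists, a quick check with lvs - a sketch of the expected output:

$ lvs -o lv_name,vg_name,lv_size vgmd0
  LV    VG    LSize
  media vgmd0 1.00t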

Create a file-system (here, ext4) on the new partition:

$ sudo mkfs.ext4 -F /dev/vgmd0/media

Mount the LV somewhere (e.g. /mnt/vgmd0/media), optionally chowning the mount point.

$ mkdir -p /mnt/vgmd0/media
$ mount /dev/vgmd0/media /mnt/vgmd0/media
$ # chown someuser:somegroup /mnt/vgmd0/media

Great, the volume is mounted! Repeat for all LVs.

However, the volume is not yet persistent.

Append the mounting instruction to fstab:

echo '/dev/mapper/vgmd0-media /mnt/vgmd0/media ext4 defaults,nofail,discard 0 0' | sudo tee -a /etc/fstab

Repeat for all LVs.
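
Before rebooting, it is worth verifying that the fstab entry actually works (the findmnt output here is illustrative):

$ sudo mount -a
$ findmnt /mnt/vgmd0/media
TARGET           SOURCE                  FSTYPE OPTIONS
/mnt/vgmd0/media /dev/mapper/vgmd0-media ext4   rw,relatime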

In the meantime, keep an eye on /proc/mdstat to see how your array is behaving:

$ watch cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 loop20p1[1] loop19p1[0]
      974464 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      3906885440 blocks super 1.2 [2/2] [UU]
      [=>...................]  resync =  6.6% (261045632/3906885440) finish=2129.1min speed=28537K/sec
      bitmap: 30/30 pages [120KB], 65536KB chunk

Quiet disks

Let’s see what your drives are doing right now:

$ hdparm -C /dev/sda /dev/sdb

/dev/sda:
 drive state is:  active/idle

/dev/sdb:
 drive state is:  active/idle

My data-hoarding disks are often doing… nothing. Wouldn’t it be nice to lose a few dB during extended times of idleness? 🤫

If your disk supports Advanced Power Management (APM), spin-down is easy to configure: see hdparm -B. hdparm -I /dev/sda should list an Advanced power management level entry if your disk does support it.
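
APM levels of 127 and below permit spin-down. For example (output format may vary between hdparm versions, and the setting may not persist across reboots):

$ sudo hdparm -B 127 /dev/sda

/dev/sda:
 setting Advanced Power Management level to 0x7f (127)
 APM_level	= 127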

Even if your disk does not support APM (like mine), you’re not lost. Install hd-idle (package hd-idle on Debian/Ubuntu). It checks /proc/diskstats to detect idleness. Edit /etc/default/hd-idle and uncomment HD_IDLE_OPTS.

E.g. -i 0 -a sda -i 300 -a sdb -i 300 -l /var/log/hd-idle.log disables the default spin-down timeout (0), then sets a 300-second timeout for the disks sda and sdb, and writes logs to /var/log/hd-idle.log.
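
Put together, /etc/default/hd-idle could contain a line like the following sketch; restart the service afterwards (file and service names may differ slightly depending on your packaging):

HD_IDLE_OPTS="-i 0 -a sda -i 300 -a sdb -i 300 -l /var/log/hd-idle.log"

$ sudo systemctl restart hd-idle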

Dry-run with disk images

You might want to test and try out different settings and set-ups without risking real disks. That’s where loop devices come in.

# Create two empty images of 1024 1M blocks
dd if=/dev/zero of=disk-a.img iflag=fullblock bs=1M count=1024
dd if=/dev/zero of=disk-b.img iflag=fullblock bs=1M count=1024
sync

# Check the currently assigned loop devices
losetup

# Attach the images to free loop devices; `losetup -f` prints the first unused one
losetup /dev/loop6 disk-a.img
losetup /dev/loop7 disk-b.img

The loop devices are now available.
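
They behave like real disks, so you can run the exact same fdisk, mdadm and LVM steps against them (e.g. building the array /dev/md1 seen in the mdstat output above). When you are done experimenting, tear everything down again - a sketch assuming the names used here (deactivate any LVM on top first, e.g. with vgchange -an):

$ mdadm --stop /dev/md1
$ losetup -d /dev/loop6 /dev/loop7
$ rm disk-a.img disk-b.img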