Salvaging a system with a broken disk

Once there was a server which contained two Xen virtual machines. All Ubuntu.
Then, suddenly (what else), disk failures. The backup was still on the todo list, so…

Many hours of recovery attempts later, this is what I learned.

Rescuing the domU images

(GNU) ddrescue to the rescue. This is a command similar to dd (which does low-level byte-by-byte disk copying), with added features for reading failing disks. I used the command
ddrescue -v --no-split /dev/sdc sdc.img sdc.log
You can add further passes that retry the failed parts, hoping to recover more. All of this only works if the destination has enough free space to hold an image of the whole failing disk. Making the copy takes a lot of time: copying files several GB in size is slow even between healthy disks, and reading from a failing disk only makes it slower.
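A retry pass could look roughly like the sketch below. It only prints the two commands so they can be reviewed before running anything against a failing disk. The function name rescue_plan is made up for this example, and the -d (direct disk access) and -r3 (three retry passes) options are my assumption about a sensible second pass, not what I ran at the time. As far as I know, newer ddrescue releases call --no-split --no-scrape instead.

```shell
#!/bin/sh
# Hypothetical helper: print a two-pass ddrescue plan instead of executing it.
# Arguments: source device, image file, log file.
rescue_plan() {
    src=$1
    img=$2
    log=$3
    # Pass 1: grab everything that reads easily, skip the slow splitting phase.
    printf '%s\n' "ddrescue -v --no-split $src $img $log"
    # Pass 2: reuse the same log file so only the failed areas are retried,
    # up to 3 times, bypassing the kernel cache (-d).
    printf '%s\n' "ddrescue -v -d -r3 $src $img $log"
}

rescue_plan /dev/sdc sdc.img sdc.log
```

Because the second pass reuses the same log file, ddrescue knows which areas were already copied and only goes back to the bad ones.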

The image can then be accessed using the loopback device. For starters, have a look at the partition table; the output contains information that is needed later:

$ fdisk -l -u -C 9039 sdc.img 

Disk sdc.img: 0 MB, 0 bytes
255 heads, 63 sectors/track, 9039 cylinders, total 0 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000c6c37

  Device Boot      Start         End      Blocks   Id  System
sdc.img1   *   119684250   139219289     9767520   83  Linux
sdc.img2       139219290   145211534     2996122+   5  Extended
sdc.img3              63   119684249    59842093+  83  Linux
sdc.img5       139219353   145211534     2996091   82  Linux swap / Solaris

You need to know the number of cylinders of your original disk to view this info (the “-C 9039” parameter), which you can read by running fdisk on the original disk. So this command serves more as a sanity check that the copy is at least partially OK.

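You can now access the partitions using the loopback device (as root; the mount point “/test” in this example should already exist). Mount one partition at a time: the first command below mounts sdc.img3, the second sdc.img1, so unmount in between or use separate mount points.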

mount -o loop,offset=32256 -t ext3 sdc.img /test/
mount -o loop,offset=61278336000 -t ext3 sdc.img /test/

The offset values need to be given in bytes and can be calculated from the fdisk output. For example for sdc.img3 the offset is 63 (from the fdisk output) times the sector size (512 – also displayed in fdisk output), giving 32256.
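The calculation can be left to the shell. This is a small sketch; the function name offset_bytes is made up for this example, and the sector size defaults to the 512 bytes reported by fdisk above.

```shell
#!/bin/sh
# Convert a partition start sector (from "fdisk -l -u") into the byte offset
# expected by "mount -o loop,offset=...".
# Arguments: start sector, optional sector size (default 512).
offset_bytes() {
    echo $(( $1 * ${2:-512} ))
}

offset_bytes 63          # sdc.img3, prints 32256
offset_bytes 119684250   # sdc.img1, prints 61278336000
```

The two results match the offsets used in the mount commands above.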

Once mounted, you can go through the disk and retrieve the Xen disk images. These are images just like the ones created using dd (or ddrescue) and can be mounted in the same way (as these are partition images, you don’t need the offset).

The rescue may have left the Xen disk image with a damaged filesystem. To fix this (please do take another copy of the disk image first), unmount the image and check/fix it:

losetup /dev/loop0 disk.img
e2fsck -f /dev/loop0
losetup -d /dev/loop0

Fresh dom0 installation

While I love Ubuntu, recent versions don’t have Xen support anymore (the dom0 ran 8.04, the last version where it was still included). The guides to get it installed did not work for me (including recompiling Xen, the kernel, etc.). I finally ended up installing Debian Squeeze. See the guide at http://wiki.debian.org/Xen.

One point where the guide is not clear: you have to add the line
(network-script 'network-bridge antispoof=yes')
in /etc/xen/xend-config.sxp to make the network bridge work (the file seems to indicate using a network bridge is the default, but it made a difference for me).

Getting the domU system up and running again

In hindsight this was quite easy, though it took a lot of trial and error and googling to get to this point.

In the domU configuration file, change the kernel and ramdisk to point to the versions the dom0 is booted with. In my case this was

kernel = '/boot/vmlinuz-2.6.32-5-xen-amd64'
ramdisk = '/boot/initrd.img-2.6.32-5-xen-amd64'

I could now create the virtual machines, but they could not yet access the disk. Apparently Debian does not include the blocktap device for licensing reasons, so accessing the disk using the (supposedly faster) “tap:aio:” protocol fails. This should be replaced by “file:”. Secondly, my original configuration exposed the disk as “sda1”. For some reason this failed and needs to be replaced by “xvda1” (also in the “root” configuration).
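Both replacements can be done with sed. This is a minimal sketch: the function name fix_domu_cfg and the sample disk path are made up for illustration, and it assumes the tap prefix is spelled tap:aio: in the configuration file (as far as I know that is the Xen spelling). It reads a configuration on standard input and writes the adjusted version to standard output, so the original file is left alone.

```shell
#!/bin/sh
# Hypothetical filter: rewrite a domU config so it works without blocktap.
#  - "tap:aio:" disk prefix becomes "file:"
#  - device name "sda1" becomes "xvda1" (in both the disk and root lines)
fix_domu_cfg() {
    sed -e 's/tap:aio:/file:/g' -e 's/sda1/xvda1/g'
}

# Example with a made-up disk path:
printf '%s\n' \
    "disk = ['tap:aio:/xen/domains/vm1/disk.img,sda1,w']" \
    "root = '/dev/sda1 ro'" | fix_domu_cfg
```

For a real file you would run something like fix_domu_cfg < old.cfg > new.cfg and compare the two before starting the domU.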

Backups?

This time around, I did fix the backups. As I want these to be remote, I used Dropbox (http://wiki.dropbox.com/TipsAndTricks/TextBasedLinuxInstall). This works nicely, but I don’t see how you can restrict what data is coming in without a GUI, so it is best to create a specific Dropbox account for the machine; the data can always be shared with other Dropbox accounts.
