HFS+ on Linux

My MacBook Pro seems to have died. Its hard disk holds a lot of data; the important bits have mostly been backed up, but the latest changes haven't. On top of that, I had to check the hard disk for bad sectors and the like, and since I don't have another Mac to test it on, I had to do it on my Fedora box. My system (Fedora 15) would automatically detect and mount the HFS+ hard disk in read-only mode. It was also missing the fsck tool for HFS+ partitions; getting it required downloading and installing hfsplus-tools via yum. However, fsck.hfs will not let you scan the partition while it has journalling on, which is the default for HFS+ partitions, unless you use the --force option.
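For the record, installing the tools is a one-liner:

# yum install hfsplus-tools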

But how would I turn off journalling without a Mac system to attach my disk to? After a couple of minutes I came across a post on the Ubuntu forum which linked to this blog. The author provides C code that turns journalling off; a fixed version of this code can be found here. The compiled executable takes as its only argument the partition on which you want to turn journalling off.

# gcc journalling_off.c -o journalling_off
# ./journalling_off /dev/sdg2

The next step was to run the fsck check on the target disk:

# fsck.hfs /dev/sdg2
** /dev/sdg2
** Checking HFS Plus volume.
** Checking Extents Overflow file.
** Checking Catalog file.
** Checking multi-linked files.
** Checking Catalog hierarchy.
** Checking Extended Attributes file.
** Checking volume bitmap.
** Checking volume information.
** The volume OS X appears to be OK.

Disk looks OK. Next step is to mount it with read-write permissions:

# mount -t hfsplus -o rw,user /dev/sdg2 /mnt/osx
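(If /mnt/osx doesn't exist yet, create it first with mkdir -p /mnt/osx.)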

The next issue was the mismatch between my UID on the OS X system and the one on the Linux system. The next step, therefore, was to change the ownership of everything under my user directory on the OS X disk, so I could access it with write permissions from my Linux box without problems:

# find panoskrt/ -uid 501 -exec chown panoskrt {} \;
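(On a reasonably recent find you can also use the batch form, find panoskrt/ -uid 501 -exec chown panoskrt {} +, which runs chown once for many files rather than once per file.)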

4th IC-SCCE

Last week I was in Athens for the 4th International Conference from Scientific Computing to Computational Engineering. There were many interesting talks from a wide range of areas; mine was about "HPC Applications Performance on Virtual Clusters". The main outcome from my perspective is that we need to investigate GPU virtualisation: more and more scientists and researchers want to exploit such systems, and in the near future we'll need to deploy virtualised GPU systems the same way we do with CPUs.

My paper
My presentation

Linux buffer cache state

Following Faidon’s comment on an earlier post, I came across this informative site concerning Linux’s buffer cache state. In a nutshell, the following code will release all the cached data of a specified file.

#define _XOPEN_SOURCE 600
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char *argv[]) {
    int fd;
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <file>\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    fdatasync(fd);                                /* flush any dirty pages to disk first */
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED); /* drop the whole file from the page cache */
    close(fd);
    return 0;
}

There are some useful samples and examples on the mentioned web page; the posix_fadvise description is here.
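Using the program is straightforward; the binary and file names below are just examples:

$ gcc drop_cache.c -o drop_cache
$ ./drop_cache /var/tmp/somebigfile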

Linux software RAID

Recently I got two Maxtor 80GB disks. The existing external hard disk sometimes runs out of space, which is easily sorted by erasing unnecessary data, but it has also been in constant use for around four years. So I thought it would be a good idea to use the additional Maxtors to build a software RAID and back up the existing external disk that stores the data.

My setup is pretty simple, as all of the disks are external and connected to the main system via USB. The two 80GB disks are in RAID-1 (mirroring), syncing all the required data from the existing external hard disk via rsync in a cron job. To keep syncing simple, I created a single partition on each of the RAID disks (/dev/sdb1 and /dev/sdc1) and then created a RAID-1 device, /dev/md0.
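The cron entry looks roughly like this (the mount points here are placeholders for my actual ones):

0 3 * * * rsync -a --delete /mnt/external/ /mnt/raid/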

I tried two different ways of configuring the RAID, one with raidtools2 and one with mdadm.

mdadm is straightforward and can be used directly from the command line, as below:

mdadm --create --verbose /dev/md0 --level=raid1 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm: chunk size defaults to 64K
mdadm: /dev/sdb1 appears to contain an ext2fs file system
    size=78156192K  mtime=Thu Jan  1 01:00:00 1970
mdadm: /dev/sdc1 appears to contain an ext2fs file system
    size=78156192K  mtime=Thu Jan  1 01:00:00 1970
Continue creating array? (y/n) y
mdadm: array /dev/md0 started.

There is no need to explain the mdadm parameters, as it is pretty obvious what is happening; a look at the man page reveals all the possible options and what they stand for.
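One extra step worth noting: if you want the array assembled automatically at boot, the usual approach is to append its definition to mdadm's configuration file (the location varies by distribution, e.g. /etc/mdadm.conf or /etc/mdadm/mdadm.conf):

# mdadm --detail --scan >> /etc/mdadm.conf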

You can also check in /proc/mdstat to see if the RAID is running:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc1[1] sdb1[0]
      78156096 blocks [2/2] [UU]

The other way is to use raidtools and declare the RAID setup in /etc/raidtab:

$ cat /etc/raidtab
raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          0
        chunk-size              4
        persistent-superblock   1
        device                  /dev/sdb1
        raid-disk               0
        device                  /dev/sdc1
        raid-disk               1

And then the RAID can be created:

# mkraid /dev/md0
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sdb1, 78156193kB, raid superblock at 78156096kB
disk 1: /dev/sdc1, 78156193kB, raid superblock at 78156096kB

Either way, raidtools or mdadm, you can then create and format partitions on /dev/md0 in the normal way, using fdisk and mkfs.ext*. Once that's done, the partitions can be mounted like any other partition, and syncing between the external storage disk and the RAID can start.
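In the simplest case you can even skip fdisk and put a filesystem straight onto the array; a minimal sketch, with /mnt/raid as a made-up mount point:

# mkfs.ext3 /dev/md0
# mkdir -p /mnt/raid
# mount /dev/md0 /mnt/raid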

I think that I’ll stick with mdadm as it is easier and more flexible than raidtools.

Some useful links:

How to replace a failed disk on Linux software RAID-1
mdadm: A new tool for Linux software RAID management
The Software-RAID HOW-TO

Report disk space usage

A quick shell script that reports which filesystems use more than a specified percentage of their disk space. It can be run as a cron job to mail the results.

#!/bin/bash
############################################################################
# Copyright (C) 2009  Panagiotis Kritikakos <panoskrt@gmail.com>           #
#                                                                          #
#    This program is free software: you can redistribute it and/or modify  #
#    it under the terms of the GNU General Public License as published by  #
#    the Free Software Foundation, either version 3 of the License, or     #
#    (at your option) any later version.                                   #
#                                                                          #
#    This program is distributed in the hope that it will be useful,       #
#    but WITHOUT ANY WARRANTY; without even the implied warranty of        #
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the         #
#    GNU General Public License for more details.                          #
#                                                                          #
#    You should have received a copy of the GNU General Public License     #
#    along with this program.  If not, see <http://www.gnu.org/licenses/>. #
############################################################################

if [ $# -ne 1 ]; then
  echo "Usage: ./disk_usage <percentage>"
  echo " Example: ./disk_usage 50"
  exit 1
fi
# Collect the Use% column from df, with the % sign stripped
space=($(df -h | awk '{sub(/%/,"");print $5}' | grep -v / | grep -v Use | grep -v '^$'))
spaceLen=${#space[*]}

i=0
echo "The filesystems below use more than ${1}% of their space" > /tmp/diskSpace
echo "-----------------------------------------------------" >> /tmp/diskSpace
while [ $i -lt $spaceLen ]; do
   checkval=${space[$i]}
   if [ $checkval -ge $1 ]; then
        # Print device, usage percentage and mount point
        df -h | grep $checkval | awk '{print $1 "\t" $5 "\t" $6}' >> /tmp/diskSpace
   fi
   let i++
done
/usr/bin/Mail -s "Disk space usage" foo@bar.com < /tmp/diskSpace
rm -f /tmp/diskSpace
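For a daily report, a crontab entry along these lines will do (the install path and threshold are just examples):

0 8 * * * /usr/local/bin/disk_usage 80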

Xen and loop devices continued

I created one more guest, bringing the total to nine, but when I tried to power it on I got the following error message:

# xm create node9
Using config file "/etc/xen/node9".
Error: Device 51712 (vbd) could not be connected. Failed to find an unused loop device

Once again a problem with a disk image in combination with a loop device, but a different one from yesterday's.
I checked which loop devices were in use:

# losetup -a
/dev/loop0: [0811]:161693699 (/data/guests/node1.img)
/dev/loop1: [0811]:161693702 (/data/guests/node2.img)
/dev/loop2: [0811]:161693703 (/data/guests/node3.img)
/dev/loop3: [0811]:161693704 (/data/guests/node4.img)
/dev/loop4: [0811]:161693705 (/data/guests/node5.img)
/dev/loop5: [0811]:161693706 (/data/guests/node6.img)
/dev/loop6: [0811]:161693707 (/data/guests/node7.img)
/dev/loop7: [0811]:161693708 (/data/guests/node8.img)
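(A quick way to confirm the shortage is losetup -f, which prints the first unused loop device and complains when none is left.)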

And then how many are available in total:

# ls -l /dev/loop*
brw-r----- 1 root disk 7, 0 Mar  9 22:08 /dev/loop0
brw-r----- 1 root disk 7, 1 Mar  9 22:08 /dev/loop1
brw-r----- 1 root disk 7, 2 Mar  9 22:08 /dev/loop2
brw-r----- 1 root disk 7, 3 Mar  9 22:08 /dev/loop3
brw-r----- 1 root disk 7, 4 Mar  9 22:08 /dev/loop4
brw-r----- 1 root disk 7, 5 Mar  9 22:08 /dev/loop5
brw-r----- 1 root disk 7, 6 Mar  9 22:08 /dev/loop6
brw-r----- 1 root disk 7, 7 Mar  9 22:08 /dev/loop7

All eight loop devices are used. If each guest had a second disk image as well, I'd have hit the problem after node4. The problem comes down to the fact that the kernel provides only eight loop devices by default, and every Xen guest that doesn't use the Xen blktap driver takes up a loop device for each disk image it is assigned. The solution is to increase the number of loop devices, which can be done by editing /etc/modprobe.conf and adding a definition for the maximum number of loop devices:

options loop max_loop=24

For that to take effect, you'll need to either reboot the system or reload the loop kernel module. To reload the module, the current loop devices must be free, so every guest should be powered off first. Then simply:

# rmmod loop
# modprobe loop
# ls -l /dev/loop*
brw-r----- 1 root disk 7,  0 Mar 24 12:25 /dev/loop0
brw-r----- 1 root disk 7,  1 Mar 24 12:25 /dev/loop1
brw-r----- 1 root disk 7, 10 Mar 24 12:27 /dev/loop10
brw-r----- 1 root disk 7, 11 Mar 24 12:27 /dev/loop11
brw-r----- 1 root disk 7, 12 Mar 24 12:27 /dev/loop12
brw-r----- 1 root disk 7, 13 Mar 24 12:27 /dev/loop13
brw-r----- 1 root disk 7, 14 Mar 24 12:27 /dev/loop14
brw-r----- 1 root disk 7, 15 Mar 24 12:27 /dev/loop15
brw-r----- 1 root disk 7, 16 Mar 24 12:27 /dev/loop16
brw-r----- 1 root disk 7, 17 Mar 24 12:27 /dev/loop17
brw-r----- 1 root disk 7, 18 Mar 24 12:27 /dev/loop18
brw-r----- 1 root disk 7, 19 Mar 24 12:27 /dev/loop19
brw-r----- 1 root disk 7,  2 Mar 24 12:25 /dev/loop2
brw-r----- 1 root disk 7, 20 Mar 24 12:27 /dev/loop20
brw-r----- 1 root disk 7, 21 Mar 24 12:27 /dev/loop21
brw-r----- 1 root disk 7, 22 Mar 24 12:27 /dev/loop22
brw-r----- 1 root disk 7, 23 Mar 24 12:27 /dev/loop23

Powering the guests back on and running losetup -a:

# losetup -a
/dev/loop0: [0811]:161693699 (/data/guests/node1.img)
/dev/loop1: [0811]:161693702 (/data/guests/node2.img)
/dev/loop10: [0811]:161693703 (/data/guests/node3.img)
/dev/loop11: [0811]:161693704 (/data/guests/node4.img)
/dev/loop12: [0811]:161693705 (/data/guests/node5.img)
/dev/loop13: [0811]:161693706 (/data/guests/node6.img)
/dev/loop14: [0811]:161693707 (/data/guests/node7.img)
/dev/loop15: [0811]:161693708 (/data/guests/node8.img)
/dev/loop16: [0811]:161693709 (/data/guests/node9.img)

Paravirtualised guests can make use of the blktap driver to access the virtual block device directly, without using a loop device. To do so, the guest’s configuration file must specify ‘tap:aio:’ instead of ‘file:’ in the disk entry. This is not the case for fully virtualised guests. However, both paravirtualised and fully virtualised guests can use physical partitions (defined as ‘phy:’), which eliminates the use of loop devices, among other advantages.
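For reference, the corresponding disk entries in a guest configuration file look roughly like this (the image path matches my guests above; the phy: device is made up):

disk = [ 'tap:aio:/data/guests/node9.img,xvda,w' ]
disk = [ 'phy:/dev/vg0/node9,xvda,w' ]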