Odd booting problem

on 08.10.2011 01:33:55 by Maurice

Good day all, and Happy Thanksgiving for those in Canada!

An odd problem, and hopefully someone reading this can tell me what to
do differently:

I am converting a machine from a single disk to one with 2 disks and 4
md RAIDs.

I do not know how much detail people might like, so I will keep it as
terse as possible to start:
Original install CentOS 6.0, with all updates.
x86_64 kernel

mdadm v3.2.2

I have added the second physical disk, duplicated the partition table
from the existing system disk, and created 4 md RAID1 arrays, to be used
for /boot, /, /var, and /home.
All were created with the second member missing:
mdadm --create /dev/md0 --metadata=0.90 --level=1 --raid-disks=2 /dev/sda1 missing
mdadm --create /dev/md1 --level=1 --raid-disks=2 /dev/sda2 missing
mdadm --create /dev/md2 --level=1 --raid-disks=2 /dev/sda5 missing
mdadm --create /dev/md3 --level=1 --raid-disks=2 /dev/sda6 missing

System is booting on GRUB 0.97.
Therefore I made md0 (/boot) with 0.90 metadata,
and then made md1, md2 and md3 with 1.2.

Everything looks good in /proc/mdstat.
I made file systems (ext for boot and ext for others), then copied
over all data from the original disk.

Created the mdadm.conf file:
mdadm --detail --scan > /etc/mdadm.conf

I used dracut to rebuild the initramfs with the new mdadm.conf:

dracut --mdadmconf --force /boot/initramfs-$(uname -r).img $(uname -r)
This completed uneventfully.

Edited /etc/fstab and rebooted.
Now I have all my filesystems on the md devices with no complaints.

Ran grub and installed it on the new disk:
root (hd0,0)
setup (hd0)
Completed successfully

Next, rebooted again, and this time manually edited the grub command
line to change
from root=/dev/sdb2 to root=/dev/md1

And here it gets funny:

Boots a bit, then I see:

dracut Warning: No root device "block: /dev/md1" found.

And a few seconds later we get a kernel panic.





--
Cheers,
Maurice Hilarius
eMail: /mhilarius@gmail.com/
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Odd booting problem

on 10.10.2011 18:02:00 by Maurice

Good morning,

I am top and bottom posting this reply to myself, so as to satisfy the
tastes of each camp:

I just tried using only one of the disks, and NOT as md RAID devices.
This works perfectly.
Which, at least to my eyes, implies there is some interaction with mdadm
that is causing the failure of dracut and of booting.

Is there anyone here interested in helping me with this?






--
Cheers,
Maurice Hilarius
eMail: /mhilarius@gmail.com/

Re: Odd booting problem

on 10.10.2011 19:19:25 by Jim Schatzman

Maurice-

I am not an expert, but I have observed that initrd/initramfs/dracut is fairly stupid, and can't handle what would seem like benign changes to the operating environment. For example, if you install a Linux kernel on a system where the / filesystem is on an LVM volume, then you move the / filesystem to a raw partition, change the grub boot commands and /etc/fstab appropriately, initrd/initramfs still fails with a "/bin/lvm exited abnormally with error 5" message. It does not seem to me that there is any reason that initrd/initramfs should care what return code lvm returns as long as it can mount the / filesystem. It also messes about with raids, generally unnecessarily.

Again, initrd/initramfs does not allow for many changes in the system configuration from the state when the initrd/initramfs was created. My guess is that you are going to have to create a new initrd/initramfs. Here is the procedure at a high level, as best I can recall from the last time I had to do it:

1) Boot the computer with some recent linux kernel. For example, use Knoppix or Fedora live.

2) Mount your target root filesystem and mount the target boot filesystem under it, if you use a separate filesystem for boot.

3) chroot to the target root filesystem.

4) Run mkinitrd to create a new initrd/initramfs. mkinitrd has a zillion options that I do not pretend to understand. I generally find that it works well enough with default options.

5) If needed, update your grub.conf to point to the new initrd/initramfs.
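
The steps above can be sketched as a small shell function. This is only a sketch under stated assumptions, not a tested recipe: the device names (/dev/md1, /dev/md0), the mount point /mnt/target and the kernel version string are placeholders for illustration, and dracut is used for step 4 because that is what the original post used to build its image. The function only prints the commands, so they can be reviewed and then run by hand as root from the live system.

```shell
#!/bin/sh
# Print (do not run) the commands for rebuilding the initramfs from a
# live/rescue system. All four arguments are placeholders chosen by the caller.
rescue_plan() {
    root_dev=$1   # device holding the target / filesystem (assumed: /dev/md1)
    boot_dev=$2   # device holding the target /boot (assumed: /dev/md0)
    target=$3     # where the target root gets mounted (assumed: /mnt/target)
    kver=$4       # kernel version installed on the target system

    echo "mount $root_dev $target"
    echo "mount $boot_dev $target/boot"
    # Bind-mount the pseudo filesystems so tools inside the chroot work.
    # These need the dev, proc and sys directories to exist on the target.
    for fs in dev proc sys; do
        echo "mount --bind /$fs $target/$fs"
    done
    # Build the image for the target's kernel, not the live system's kernel.
    echo "chroot $target dracut --mdadmconf --force /boot/initramfs-$kver.img $kver"
}

# Example call; the kernel version here is illustrative only.
rescue_plan /dev/md1 /dev/md0 /mnt/target 2.6.32-71.el6.x86_64
```

Passing the kernel version explicitly matters inside a chroot, because $(uname -r) would report the live system's kernel rather than the one installed on the target.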

Good luck!

Jim




Re: Odd booting problem

on 10.10.2011 20:11:35 by Michal Soltys

On 11-10-10 18:02, maurice wrote:
>
>> Created the mdadm.conf file:
>> mdadm --detail --scan > /etc/mdadm.conf
>>
>> from root=/dev/sdb2 to root=/dev/md1
>>
>> dracut Warning: No root device "block: /dev/md1" found.
>>
>> And a few seconds later we get a kernel panic.
>>
>
> Is there anyone here interested in helping me with this?
>
>

Well, dracut will do forced assembly if no root can be found after a few
udev loops (I have some big changes staged for submission, but that is
for a bit later), so technically the degraded arrays you created are not
likely to be the reason.

If I had to guess the cause, it's probably naming: modern mdadm /will
not/ export device names to the file, and leaves that to udev, and by
default udev will use decreasing minor numbers, starting with 127. Don't
expect the same (minor number ~ device) relation on subsequent boots
either.

Try this - get the UUID of actual filesystem, and change the boot
commandline to something with root=UUID=

You can get the uuid with e.g.

blkid -o udev

and use the ID_FS_UUID_ENC value. Or peek into /dev/disk/by-uuid and get
the value from there. You can also use LABEL, and quite a few other
alternatives. dracut is pretty flexible.
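
As a minimal sketch of the UUID approach, the helper below reads KEY=VALUE lines in the style printed by blkid -o udev and turns the ID_FS_UUID_ENC value into a root= argument. The function name root_arg_from_blkid is hypothetical, and the UUID in the demonstration input is made up.

```shell
#!/bin/sh
# Read "blkid -o udev" style KEY=VALUE lines on stdin and print the
# matching kernel command line argument (first match only).
root_arg_from_blkid() {
    sed -n 's/^ID_FS_UUID_ENC=\(.*\)$/root=UUID=\1/p' | head -n 1
}

# Illustrative input only; this UUID is invented for the example.
printf '%s\n' \
    'ID_FS_UUID=0f3e4a52-7c1d-4b8e-9a6f-2d5c8e1b7a90' \
    'ID_FS_UUID_ENC=0f3e4a52-7c1d-4b8e-9a6f-2d5c8e1b7a90' \
    | root_arg_from_blkid
```

On the real system the input would come from something like blkid -o udev /dev/md1, and the printed argument would replace root=/dev/md1 on the kernel line in grub.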

Another possibility is to create (or update) the array's name, and use
/dev/md/ for root= which should work just fine as well (with sane
udev rules and modern mdadm, /dev/md/ should contain an array-specific
name symlinked to the proper /dev entry).

If the above fails, then there's some other reason lurking.

Btw, what version of dracut are you using? Anything relatively current
should drop you into an emergency shell, not end with a kernel panic.

Btw2: stock md rules (3.2.x) will try to assemble anything as udev goes.
If you want to limit the assembly to something specific, use mdadm.conf
with:

AUTO -all
ARRAY ....

see mdadm.conf(5) for details.

Re: Odd booting problem

on 11.10.2011 04:12:59 by Maurice

On 10/10/2011 11:19 AM, Jim Schatzman wrote:
> Maurice-
>
> I am not an expert, but I have observed that initrd/initramfs/dracut is fairly stupid, and can't handle what would seem like benign changes to the operating environment. For example, if you install a Linux kernel on a system where the / filesystem is on an LVM volume, then you move the / filesystem to a raw partition, change the grub boot commands and /etc/fstab appropriately, initrd/initramfs still fails with a "/bin/lvm exited abnormally with error 5" message. It does not seem to me that there is any reason that initrd/initramfs should care what return code lvm returns as long as it can mount the / filesystem. It also messes about with raids, generally unnecessarily.
> ..
>
> Good luck!
>
> Jim
>

Thanks Jim,

I will try some of your suggestions.

--
Cheers,
Maurice Hilarius
eMail: /mhilarius@gmail.com/

Re: Odd booting problem

on 12.10.2011 22:55:10 by Maurice

Problem Solved!

Thanks VERY much to Michal Soltys who offered me lots of useful suggestions.

Also thanks to Jim Schatzman who chipped in with some good ideas as well.

In the end, while I tried those suggestions, they did not solve the problem.
I did a lot of reading and some experiments and here is what turned out
to be the problem:

Dracut does the following:

+ [ -d /sysroot/proc ]
+ . /mount/99mount-root.sh
....
+ [ -n block:/dev/md1 -a -z ]
+ mount -t auto -o ro /dev/md1 /sysroot
+ ROOTFS_MOUNTED=yes

So far so good.

BUT: as I found, and for unclear reasons, there was no 'proc' directory
on the file system on /dev/md1 (which was intended as our new /).

So dracut gets into a loop despite ROOTFS_MOUNTED already being set:

+ [ -d /sysroot/proc ]
+ . /mount/99mount-root.sh
....
+ [ -n block:/dev/md1 -a -z ]
+ mount -t auto -o ro /dev/md1 /sysroot
mount: /dev/md1 already mounted or /sysroot busy
mount: according to mtab, /dev/md1 is already mounted on /sysroot

That is absolutely true but not particularly illuminating.
Eventually this ends, quite late,

+ i=21
+ [ 21 -gt 20 ]
+ flock -s 9
+ emergency_shell Can't mount root filesystem

which is not particularly informative.

Bailing out quite a bit earlier, before everything scrolled off the
screen, with something like:
"Cannot find /proc on a candidate for a root filesystem; that cannot be
right."
would be quite a bit more helpful.

Alternatively, if ROOTFS_MOUNTED is "yes" and that filesystem looks
like a reasonable candidate for / (say 'etc/issue' and 'sbin' are present),
then why not offer to run 'mkdir -p /sysroot/{dev,proc,sys}'
after making sysroot writeable, and then continue?

That is possibly going too far, but a notification with some useful
information pointing to the error would be much better than "Can't mount".
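
A check along these lines can also be run by hand on the copied filesystem before rebooting. The following is a minimal sketch, not part of dracut: the function name missing_mountpoints is hypothetical, and the list of directories (dev, proc, sys) is taken from the mkdir suggestion above.

```shell
#!/bin/sh
# Report which mount-point directories are missing from a mounted
# candidate root filesystem. Prints the missing names, one per line,
# and returns non-zero if any are missing.
missing_mountpoints() {
    sysroot=$1
    status=0
    for d in dev proc sys; do
        if [ ! -d "$sysroot/$d" ]; then
            echo "$d"
            status=1
        fi
    done
    return $status
}

# Demonstration on a scratch directory standing in for the new root.
demo=$(mktemp -d)
mkdir "$demo/dev" "$demo/sys"          # deliberately no proc
missing_mountpoints "$demo" || echo "candidate root is incomplete"
rm -rf "$demo"
```

Pointed at the mount point of the new root (for example wherever /dev/md1 is mounted after the copy), an empty result would mean the directories dracut looks for are in place.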


--
Cheers,
Maurice Hilarius
eMail: /mhilarius@gmail.com/

Re: Odd booting problem

on 13.10.2011 00:41:01 by Michal Soltys

On 11-10-12 22:55, maurice wrote:
> Problem Solved!
>
> So far so good.
>
> BUT: as I found, and for unclear reasons, there was no 'proc' on the
> file system present
> on /dev/md1 (which was intended as our new /).
>
> So dracut gets into a loop despite of ROOTFS_MOUNTED being already set:
>
> + [ -d /sysroot/proc ]
> + . /mount/99mount-root.sh
> ...
> + [ -n block:/dev/md1 -a -z ]
> + mount -t auto -o ro /dev/md1 /sysroot
> mount: /dev/md1 already mounted or /sysroot busy
> mount: according to mtab, /dev/md1 is already mounted on /sysroot
>
> That is absolutely true but not particularly illuminating.
> Eventually this ends, quite late,
>

For the record, the loop you're talking about sources the mount hooks of
each module that installs one, and those hooks are internally
responsible for mounting. The "success" check is whether a /proc
directory showed up under /sysroot. So yes, for certain unusual
situations it could be more precise/verbose. I'll prep a patch for that.
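
To make the failure mode concrete, here is a simplified model of that loop. It is an illustration reconstructed from the trace quoted in this thread, not dracut's actual code: the function name try_mount_root and the scratch directories are assumptions, while the /proc test and the limit of 20 come from the trace.

```shell
#!/bin/sh
# Simplified model of the mount loop, NOT dracut's real implementation.
# hook_dir holds *.sh mount hooks; sysroot is where they should mount root.
try_mount_root() {
    hook_dir=$1
    sysroot=$2
    i=0
    # Success is defined only as "a proc directory appeared under sysroot".
    while [ ! -d "$sysroot/proc" ]; do
        for hook in "$hook_dir"/*.sh; do
            [ -e "$hook" ] && . "$hook"
        done
        i=$((i + 1))
        if [ "$i" -gt 20 ]; then
            echo "Can't mount root filesystem (no $sysroot/proc after $i tries)"
            return 1
        fi
    done
    echo "root filesystem mounted"
}

# Demonstration: a stand-in hook that "mounts" a root lacking proc.
hooks=$(mktemp -d)
root=$(mktemp -d)
echo 'mkdir -p "$sysroot/etc" "$sysroot/sbin"' > "$hooks/99mount-root.sh"
try_mount_root "$hooks" "$root" || echo "would drop to the emergency shell"
rm -rf "$hooks" "$root"
```

In this model the hook succeeds every time, yet the loop keeps re-running it until the counter expires, which matches the repeated "already mounted" messages in the trace.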

Btw, the version you reported off-list is relatively old (current is 013,
with quite a few commits ahead). FYI.

> Then why not offer to run 'mkdir -p /sysroot/{dev,proc,sys}'?
> Do this after making sysroot writeable and then continue?
>

IMHO the init script itself should keep its hands off the filesystem.

It's perfectly fine though to create a module that would do that through
e.g. a pre-pivot hook - but that's distro's/user's policy, or module's
specific necessity.

PS. If you run into other dracut-related issues, initramfs@vger will
likely get you an answer faster.