mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 17:13:07 by Christopher White
Greetings.
I have spent TEN hours trying everything other than regressing to a
REALLY old version. I started out on 3.1.4 and have also tried manually
upgrading to 3.2.1, but the bug still exists.
Somewhere along the way, the "auto" partitionable flag has broken.
sudo mdadm --create --level=raid5 --auto=part2 /dev/md1 --metadata=1.2
--raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
This only creates /dev/md1. It is of course possible to create one big
partition as /dev/md1p1 with any partitioning program, but FORGET about
trying to create /dev/md1p2.
The problem is that the RAID array is NOT created in partitionable mode,
and only supports one large partition, despite ALL attempts at EVERY
format of the --auto option: -a part2, --auto=mdp2, --auto=part2,
--auto=p2, --auto=mdp, --auto=part, --auto=p, --auto=p4 - you name it
and I've tried it!
My guess is the functionality of creating partitionable arrays literally
DID break somewhere prior to/at version 3.1.4, which is the earliest
version I tried.
I'm giving up and creating physical n-1 sized partitions on the source
disks and creating two RAID 5 arrays from those partitions instead, but
decided I really MUST report this bug so that other people don't bang
their head against the wall for ten hours of their life as well. ;-)
Christopher
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 18:49:29 by Phil Turmel
Hi Christopher,
On 05/13/2011 11:13 AM, Christopher White wrote:
> Greetings.
>
> I have spent TEN hours trying everything other than regressing to a REALLY old version. I started out on 3.1.4 and have also tried manually upgrading to 3.2.1, but the bug still exists.
>
> Somewhere along the way, the "auto" partitionable flag has broken.
>
> sudo mdadm --create --level=raid5 --auto=part2 /dev/md1 --metadata=1.2 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
>
> This only creates /dev/md1. It is of course possible to create one big partition as /dev/md1p1 with any partitioning program, but FORGET about trying to create /dev/md1p2.
What exactly did fdisk or parted report when you tried to partition /dev/md1?
> The problem is that the RAID array is NOT created in partitionable mode, and only supports one large partition, despite ALL attempts at EVERY format of the --auto option, you name it, -a part2, --auto=mdp2, --auto=part2, --auto=p2, --auto=mdp, --auto=part, --auto=p, --auto=p4, you name it and I've tried it!
>
> My guess is the functionality of creating partitionable arrays literally DID break somewhere prior to/at version 3.1.4 which is the earliest version I tried.
The mdadm <==> kernel interface for this might be broken, but as a side-effect of the change to make all md devices support conventional partition tables. I don't recall exactly when this changed, but it was several kernels ago.
What kernel are you running?
> I'm giving up and creating physical n-1 sized partitions on the source disks and creating two RAID 5 arrays from those partitions instead, but decided I really MUST report this bug so that other people don't bang their head against the wall for ten hours of their life as well. ;-)
Consider trying "mdadm --create" without the "--auto" option at all, then fdisk on the resulting array.
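Something like this (just a sketch, reusing the device names from your
original command):

mdadm --create /dev/md1 --level=raid5 --metadata=1.2 \
    --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
fdisk /dev/md1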
Phil
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 19:18:39 by Christopher White
Hi Phil, thanks for the response!
On 5/13/11 6:49 PM, Phil Turmel wrote:
> Hi Christopher,
>
> On 05/13/2011 11:13 AM, Christopher White wrote:
>> Greetings.
>>
>> I have spent TEN hours trying everything other than regressing to a REALLY old version. I started out on 3.1.4 and have also tried manually upgrading to 3.2.1, but the bug still exists.
>>
>> Somewhere along the way, the "auto" partitionable flag has broken.
>>
>> sudo mdadm --create --level=raid5 --auto=part2 /dev/md1 --metadata=1.2 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
>>
>> This only creates /dev/md1. It is of course possible to create one big partition as /dev/md1p1 with any partitioning program, but FORGET about trying to create /dev/md1p2.
> What exactly did fdisk or parted report when you tried to partition /dev/md1 ?
I run "sudo gparted /dev/md1" to access the whole RAID array, since I
like the GUI for precisely creating partitions. When making two ext4
partitions and applying the changes, it successfully creates /dev/md1p1
(which does not exist before this operation is performed). It then goes
on to trying to create md1p2 and it sends the commands to the md1
device, but md1p2 is never created. After the step of creating the
partition (which failed, but gparted does not know that), it tries to
set up the file system, which fails since there is no md1p2:
mkfs.ext4 -j -O extent -L "" /dev/md1p2
"mke2fs 1.41.14 (22-Dec-2010)
Could not stat /dev/md1p2 --- No such file or directory"
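For reference, a quick way to confirm that the kernel never registered
the partition (standard tools, so this should work anywhere):

cat /proc/partitions    # the kernel's view of block devices and partitions
ls -l /dev/md1*         # the device nodes udev has created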
>> The problem is that the RAID array is NOT created in partitionable mode, and only supports one large partition, despite ALL attempts at EVERY format of the --auto option, you name it, -a part2, --auto=mdp2, --auto=part2, --auto=p2, --auto=mdp, --auto=part, --auto=p, --auto=p4, you name it and I've tried it!
>>
>> My guess is the functionality of creating partitionable arrays literally DID break somewhere prior to/at version 3.1.4 which is the earliest version I tried.
> The mdadm <==> kernel interface for this might be broken, but as a side-effect of the change to make all md devices support conventional partition tables. I don't recall exactly when this changed, but it was several kernels ago.
>
> What kernel are you running?
Linux Mint 11 RC, which uses 2.6.38-8-generic.
>> I'm giving up and creating physical n-1 sized partitions on the source disks and creating two RAID 5 arrays from those partitions instead, but decided I really MUST report this bug so that other people don't bang their head against the wall for ten hours of their life as well. ;-)
> Consider trying "mdadm --create" without the "--auto" option at all, then fdisk on the resulting array.
>
> Phil
I've tried that as well during my testing, since some postings suggested
that leaving out the option would create a partitionable array, but it
made no difference.
Christopher
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 19:32:23 by Christopher White
I forgot to mention that I've also tried "sudo fdisk /dev/md1" and
creating two partitions that way. It fails too.
This leads me to conclude that /dev/md1 was never created in
partitionable mode and that the kernel refuses to create anything beyond
a single partition on it.
On 5/13/11 7:18 PM, Christopher White wrote:
> Hi Phil, thanks for the response!
>
> On 5/13/11 6:49 PM, Phil Turmel wrote:
>> Hi Christopher,
>>
>> On 05/13/2011 11:13 AM, Christopher White wrote:
>>> Greetings.
>>>
>>> I have spent TEN hours trying everything other than regressing to a
>>> REALLY old version. I started out on 3.1.4 and have also tried
>>> manually upgrading to 3.2.1, but the bug still exists.
>>>
>>> Somewhere along the way, the "auto" partitionable flag has broken.
>>>
>>> sudo mdadm --create --level=raid5 --auto=part2 /dev/md1
>>> --metadata=1.2 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
>>>
>>> This only creates /dev/md1. It is of course possible to create one
>>> big partition as /dev/md1p1 with any partitioning program, but
>>> FORGET about trying to create /dev/md1p2.
>> What exactly did fdisk or parted report when you tried to partition
>> /dev/md1 ?
> I run "sudo gparted /dev/md1" to access the whole RAID array, since I
> like the GUI for precisely creating partitions. When making two ext4
> partitions and applying the changes, it successfully creates
> /dev/md1p1 (which does not exist before this operation is performed).
> It then goes on to trying to create md1p2 and it sends the commands to
> the md1 device, but md1p2 is never created. After the step of creating
> the partition (which failed, but gparted does not know that), it tries
> to set up the file system, which fails since there is no md1p2:
> mkfs.ext4 -j -O extent -L "" /dev/md1p2
> "mke2fs 1.41.14 (22-Dec-2010)
> Could not stat /dev/md1p2 --- No such file or directory"
>>> The problem is that the RAID array is NOT created in partitionable
>>> mode, and only supports one large partition, despite ALL attempts at
>>> EVERY format of the --auto option, you name it, -a part2,
>>> --auto=mdp2, --auto=part2, --auto=p2, --auto=mdp, --auto=part,
>>> --auto=p, --auto=p4, you name it and I've tried it!
>>>
>>> My guess is the functionality of creating partitionable arrays
>>> literally DID break somewhere prior to/at version 3.1.4 which is the
>>> earliest version I tried.
>> The mdadm <==> kernel interface for this might be broken, but as a
>> side-effect of the change to make all md devices support conventional
>> partition tables. I don't recall exactly when this changed, but it
>> was several kernels ago.
>>
>> What kernel are you running?
> Linux Mint 11 RC, which uses 2.6.38-8-generic.
>>> I'm giving up and creating physical n-1 sized partitions on the
>>> source disks and creating two RAID 5 arrays from those partitions
>>> instead, but decided I really MUST report this bug so that other
>>> people don't bang their head against the wall for ten hours of their
>>> life as well. ;-)
>> Consider trying "mdadm --create" without the "--auto" option at all,
>> then fdisk on the resulting array.
>>
>> Phil
> I've tried that as well during my testing since some postings
> suggested that leaving out the option will create a partitionable
> array, but it didn't.
>
>
> Christopher
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 19:40:55 by Roman Mamedov
On Fri, 13 May 2011 19:32:23 +0200
Christopher White wrote:
> I forgot to mention that I've also tried "sudo fdisk /dev/md1" and
> creating two partitions that way. It fails too.
>
> This leads me to conclude that /dev/md1 was never created in
> partitionable mode and that the kernel refuses to create anything beyond
> a single partition on it.
Did you try running "blockdev --rereadpt /dev/md1"?
--
With respect,
Roman
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 19:43:57 by Phil Turmel
On 05/13/2011 01:32 PM, Christopher White wrote:
> I forgot to mention that I've also tried "sudo fdisk /dev/md1" and creating two partitions that way. It fails too.
Please show "partx --show /dev/md1" after the fdisk operation above.
Phil
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 20:04:31 by Christopher White
On 5/13/11 7:40 PM, Roman Mamedov wrote:
> On Fri, 13 May 2011 19:32:23 +0200
> Christopher White wrote:
>
>> I forgot to mention that I've also tried "sudo fdisk /dev/md1" and
>> creating two partitions that way. It fails too.
>>
>> This leads me to conclude that /dev/md1 was never created in
>> partitionable mode and that the kernel refuses to create anything beyond
>> a single partition on it.
> Did you try running "blockdev --rereadpt /dev/md1"?
>
Hmm. Hmmmm. One more for good measure: Hmmmmmmm.
That's weird! Here's the thing: Fdisk is *just* for creating the
partitions, not formatting them, so for that one it makes sense that you
must re-read the partition table before you have a partition device to
execute "mkfs.XXX" on.
However, Gparted on the other hand is BOTH for creating partition tables
AND for executing the "make filesystem" commands (formatting).
Therefore, Gparted is supposed to tell the kernel about partition table
changes BEFORE trying to access the partitions it just created.
Basically, Gparted goes: Blank disk, create partition table, create
partitions, notify OS to re-scan the table, THEN access the new
partition devices and format them. But instead, it skips the "notify OS"
part when working with md-arrays!
When you use Gparted on PHYSICAL hard disks, it properly creates the
partition table and the OS is updated to immediately see the new
partition devices, to allow them to be formatted.
Therefore, what this has shown is that the necessary procedure in
Gparted is:
* sudo gparted /dev/md1
* Create the partition table (gpt for instance)
* Create as many partitions as you need BUT SET THEIR TYPE TO
"unformatted" (extremely important).
* Go back to a terminal and execute "sudo blockdev --rereadpt /dev/md1"
to let the kernel see the new partition devices
* Now go back to Gparted and format the partitions, or just do it the
CLI way with mkfs.ext4 manually. Either way, it will now work.
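For anyone who prefers the command line, the equivalent workaround should
be roughly this (a sketch; choose your own partition layout in fdisk):

sudo fdisk /dev/md1                  # create the partition table and partitions
sudo blockdev --rereadpt /dev/md1    # make the kernel re-read the new table
sudo mkfs.ext4 /dev/md1p1            # the partition devices now exist
sudo mkfs.ext4 /dev/md1p2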
So how should we sum up this problem? Well, that depends. What is
responsible for auto-discovering the new partitions when you use Gparted
on a PHYSICAL disk (which works perfectly without manual re-scan
commands)? 1) Is it Gparted telling the kernel to re-scan, or 2) is it
the kernel that auto-watches physical disks for changes?
If 1), it means Gparted needs a bug fix to tell the kernel to re-scan
the partition table for md-arrays when you re-partition them.
If 2), it means the kernel doesn't watch md-arrays for partition table
changes, which debatably it should be doing.
Thoughts?
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 20:18:38 by Phil Turmel
Hi Christopher,
On 05/13/2011 02:04 PM, Christopher White wrote:
> On 5/13/11 7:40 PM, Roman Mamedov wrote:
>> On Fri, 13 May 2011 19:32:23 +0200
>> Christopher White wrote:
>>
>>> I forgot to mention that I've also tried "sudo fdisk /dev/md1" and
>>> creating two partitions that way. It fails too.
>>>
>>> This leads me to conclude that /dev/md1 was never created in
>>> partitionable mode and that the kernel refuses to create anything beyond
>>> a single partition on it.
>> Did you try running "blockdev --rereadpt /dev/md1"?
>>
> Hmm. Hmmmm. One more for good measure: Hmmmmmmm.
>
> That's weird! Here's the thing: Fdisk is *just* for creating the partitions, not formatting them, so for that one it makes sense that you must re-read the partition table before you have a partition device to execute "mkfs.XXX" on.
>
> However, Gparted on the other hand is BOTH for creating partition tables AND for executing the "make filesystem" commands (formatting). Therefore, Gparted is supposed to tell the kernel about partition table changes BEFORE trying to access the partitions it just created. Basically, Gparted goes: Blank disk, create partition table, create partitions, notify OS to re-scan the table, THEN access the new partition devices and format them. But instead, it skips the "notify OS" part when working with md-arrays!
>
> When you use Gparted on PHYSICAL hard disks, it properly creates the partition table and the OS is updated to immediately see the new partition devices, to allow them to be formatted.
Indeed. I suspect (g)parted is in fact requesting a rescan, but is being ignored.
I just tried this on one of my servers, and parted (v2.3) choked on an assertion. Hmm.
> Therefore, what this has shown is that the necessary procedure in Gparted is:
> * sudo gparted /dev/md1
> * Create the partition table (gpt for instance)
> * Create as many partitions as you need BUT SET THEIR TYPE TO "unformatted" (extremely important).
> * Go back to a terminal and execute "sudo blockdev --rereadpt /dev/md1" to let the kernel see the new partition devices
> * Now go back to the Gparted and format the partitions, or just do it the CLI way with mkfs.ext4 manually. Either way, it will now work.
>
> So how should we sum up this problem? Well, that depends. What is responsible for auto-discovering the new partitions when you use Gparted on a PHYSICAL disk (which works perfectly without manual re-scan commands)? 1) Is it Gparted telling the kernel to re-scan, or 2) is it the kernel that auto-watches physical disks for changes?
Generally, udev does it. But based on my little test, I suspect parted is at fault. fdisk did just fine.
> If 1), it means Gparted needs a bug fix to tell the kernel to re-scan the partition table for md-arrays when you re-partition them.
> If 2), it means the kernel doesn't watch md-arrays for partition table changes, which debatably it should be doing.
What is ignored or acted upon is decided by udev rules, as far as I know. You might want to monitor udev events while running some of your tests (physical disk vs. MD).
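For example (assuming your udev is recent enough to ship udevadm), in one
terminal run:

udevadm monitor --kernel --udev

then repartition the device in another terminal and compare the event
stream you get for a physical disk against the one for the MD array.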
> Thoughts?
Phil
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 20:54:48 by Christopher White
Hello again Phil (and Roman). Thanks to your back-and-forth, the bug has
now finally been completely narrowed down: It is a bug in (g)parted!
The issue is that (g)parted doesn't properly call the kernel API for
re-scanning the device when you operate on md disks as compared to
physical disks.
Your information (Phil) that (g)parted chokes on an assertion is good
information for when I report this bug. It's not impossible that you
must handle md-disks differently from physical disks and that (g)parted
is not aware of that distinction, therefore choking on the partition
table rescan API.
Either way, this is fantastic news, because it means it's not an md
kernel bug, where waiting for a fix would have severely pushed back my
current project. I'm glad it was simply (g)parted failing to tell the
kernel to re-read the partition tables.
---
With this bug out of the way (I'll be reporting it to parted's mailing
list now), one thing that's been bugging me during my hours of research
is that the vast majority of users either use a single, large RAID array
and virtually partition that with LVM, or alternatively break each disk
into many small partitions and make multiple smaller arrays out of those
partitions. Very few people seem to use md's built-in support for
partitionable raid arrays.
This makes me a tiny bit wary to trust the stability of md's
partitionable implementation, even though I suspect it is rock solid. I
suspect the reason that most people don't use the feature is for
legacy/habit reasons, since md used to support only a single partition,
so there's a vast amount of guides telling people to use LVM. Do any of
you know anything about this and can advise on whether I should go for a
single-partition MD array with LVM, or a partitionable MD array?
As far as performance goes, the CPU overhead of LVM is in the 1-5% range
from what I've heard, and I have zero need for the other features LVM
provides (snapshots, backups, online resizing, clusters of disks acting
as one disk, etc), so it just feels completely overkill and worthless
when all I need is a single, partitionable RAID array.
All I need is the ability to (in the future) add more disks to the
array, grow the array, and then resize and move the partitions around
using regular partitioning tools, treating the RAID array as a single
disk. md's partitionable arrays support this since they act as a disk:
if you add more hard disks to your array, the available, unallocated
space on that array simply grows, and partitions on it can be expanded
and relocated to take advantage of it. I don't need LVM for any of
that, as long as md's implementation is stable.
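To make that concrete, the future workflow I have in mind would look
roughly like this (a sketch; /dev/sde2 is a hypothetical new member):

sudo mdadm --add /dev/md1 /dev/sde2            # add the new disk as a spare
sudo mdadm --grow /dev/md1 --raid-devices=5    # reshape the array onto it
# then grow/move the partitions with a regular partitioning tool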
Christopher
On 5/13/11 8:18 PM, Phil Turmel wrote:
> Hi Christopher,
>
> On 05/13/2011 02:04 PM, Christopher White wrote:
>> On 5/13/11 7:40 PM, Roman Mamedov wrote:
>>> On Fri, 13 May 2011 19:32:23 +0200
>>> Christopher White wrote:
>>>
>>>> I forgot to mention that I've also tried "sudo fdisk /dev/md1" and
>>>> creating two partitions that way. It fails too.
>>>>
>>>> This leads me to conclude that /dev/md1 was never created in
>>>> partitionable mode and that the kernel refuses to create anything beyond
>>>> a single partition on it.
>>> Did you try running "blockdev --rereadpt /dev/md1"?
>>>
>> Hmm. Hmmmm. One more for good measure: Hmmmmmmm.
>>
>> That's weird! Here's the thing: Fdisk is *just* for creating the partitions, not formatting them, so for that one it makes sense that you must re-read the partition table before you have a partition device to execute "mkfs.XXX" on.
>>
>> However, Gparted on the other hand is BOTH for creating partition tables AND for executing the "make filesystem" commands (formatting). Therefore, Gparted is supposed to tell the kernel about partition table changes BEFORE trying to access the partitions it just created. Basically, Gparted goes: Blank disk, create partition table, create partitions, notify OS to re-scan the table, THEN access the new partition devices and format them. But instead, it skips the "notify OS" part when working with md-arrays!
>>
>> When you use Gparted on PHYSICAL hard disks, it properly creates the partition table and the OS is updated to immediately see the new partition devices, to allow them to be formatted.
> Indeed. I suspect (g)parted is in fact requesting a rescan, but is being ignored.
>
> I just tried this on one of my servers, and parted (v2.3) choked on an assertion. Hmm.
>
>> Therefore, what this has shown is that the necessary procedure in Gparted is:
>> * sudo gparted /dev/md1
>> * Create the partition table (gpt for instance)
>> * Create as many partitions as you need BUT SET THEIR TYPE TO "unformatted" (extremely important).
>> * Go back to a terminal and execute "sudo blockdev --rereadpt /dev/md1" to let the kernel see the new partition devices
>> * Now go back to the Gparted and format the partitions, or just do it the CLI way with mkfs.ext4 manually. Either way, it will now work.
>>
>> So how should we sum up this problem? Well, that depends. What is responsible for auto-discovering the new partitions when you use Gparted on a PHYSICAL disk (which works perfectly without manual re-scan commands)? 1) Is it Gparted telling the kernel to re-scan, or 2) is it the kernel that auto-watches physical disks for changes?
> Generally, udev does it. But based on my little test, I suspect parted is at fault. fdisk did just fine.
>
>> If 1), it means Gparted needs a bug fix to tell the kernel to re-scan the partition table for md-arrays when you re-partition them.
>> If 2), it means the kernel doesn't watch md-arrays for partition table changes, which debatably it should be doing.
> What is ignored or acted upon is decided by udev rules, as far as I know. You might want to monitor udev events while running some of your tests (physical disk vs. MD).
>
>> Thoughts?
> Phil
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 21:01:35 by Rudy Zijlstra
Hi Chris,
I've run partitioned MD disks for several years now. I do that on systems
where I use md for the system partitions. One mirror with partitions for
the different system aspects. I prefer that, as it reflects best the
actual physical configuration, and all partitions will be degraded at
the same time when 1 disk develops a problem (which is unfortunately not
the case when you partition the disk and then mirror the partitions).
As I am a bit lazy and have only limited wish to fight with
BIOS/bootloader conflicts / vagaries, these systems typically boot from
the network (kernel gets loaded from the network, from there onwards all
is on the local disk).
Cheers,
Rudy
On 05/13/2011 08:54 PM, Christopher White wrote:
> Hello again Phil (and Roman). Thanks to your back-and-forth, the bug
> has now finally been completely narrowed down: It is a bug in (g)parted!
>
> The issue is that (g)parted doesn't properly call the kernel API for
> re-scanning the device when you operate on md disks as compared to
> physical disks.
>
> Your information (Phil) that (g)parted chokes on an assertion is good
> information for when I report this bug. It's not impossible that you
> must handle md-disks differently from physical disks and that
> (g)parted is not aware of that distinction, therefore choking on the
> partition table rescan API.
>
> Either way, this is fantastic news, because it means it's not an md
> kernel bug, where waiting for a fix would have severely pushed back my
> current project. I'm glad it was simply (g)parted failing to tell the
> kernel to re-read the partition tables.
>
> ---
>
> With this bug out of the way (I'll be reporting it to parted's mailing
> list now), one thing that's been bugging me during my hours of research
> is that the vast majority of users use either a single, large RAID
> array and virtually partition that with LVM, or alternatively breaking
> each disk into many small partitions and making multiple smaller
> arrays out of those partitions. Very few people seem to use md's
> built-in support for partitionable raid arrays.
>
> This makes me a tiny bit wary to trust the stability of md's
> partitionable implementation, even though I suspect it is rock solid.
> I suspect the reason that most people don't use the feature is for
> legacy/habit reasons, since md used to support only a single
> partition, so there's a vast amount of guides telling people to use
> LVM. Do any of you know anything about this and can advise on whether
> I should go for a single-partition MD array with LVM, or a
> partitionable MD array?
>
> As far as performance goes, the CPU overhead of LVM is in the 1-5%
> range from what I've heard, and I have zero need for the other
> features LVM provides (snapshots, backups, online resizing, clusters
> of disks acting as one disk, etc), so it just feels completely
> overkill and worthless when all I need is a single, partitionable RAID
> array.
>
> All I need is the ability to (in the future) add more disks to the
> array, grow the array, and then resize+move the partitions around
> using regular partitioning tools treating the RAID array as a single
> disk, and md's partitionable arrays support doing this since they act
> as a disk, where if you add more hard disks to your array; the
> available, unallocated space on that array simply grows and partitions
> on it can be expanded and relocated to take advantage of this. I don't
> need LVM for any of that, as long as md's implementation is stable.
>
>
> Christopher
>
> On 5/13/11 8:18 PM, Phil Turmel wrote:
>> Hi Christopher,
>>
>> On 05/13/2011 02:04 PM, Christopher White wrote:
>>> On 5/13/11 7:40 PM, Roman Mamedov wrote:
>>>> On Fri, 13 May 2011 19:32:23 +0200
>>>> Christopher White wrote:
>>>>
>>>>> I forgot to mention that I've also tried "sudo fdisk /dev/md1" and
>>>>> creating two partitions that way. It fails too.
>>>>>
>>>>> This leads me to conclude that /dev/md1 was never created in
>>>>> partitionable mode and that the kernel refuses to create anything
>>>>> beyond
>>>>> a single partition on it.
>>>> Did you try running "blockdev --rereadpt /dev/md1"?
>>>>
>>> Hmm. Hmmmm. One more for good measure: Hmmmmmmm.
>>>
>>> That's weird! Here's the thing: Fdisk is *just* for creating the
>>> partitions, not formatting them, so for that one it makes sense that
>>> you must re-read the partition table before you have a partition
>>> device to execute "mkfs.XXX" on.
>>>
>>> However, Gparted on the other hand is BOTH for creating partition
>>> tables AND for executing the "make filesystem" commands
>>> (formatting). Therefore, Gparted is supposed to tell the kernel
>>> about partition table changes BEFORE trying to access the partitions
>>> it just created. Basically, Gparted goes: Blank disk, create
>>> partition table, create partitions, notify OS to re-scan the table,
>>> THEN access the new partition devices and format them. But instead,
>>> it skips the "notify OS" part when working with md-arrays!
>>>
>>> When you use Gparted on PHYSICAL hard disks, it properly creates the
>>> partition table and the OS is updated to immediately see the new
>>> partition devices, to allow them to be formatted.
>> Indeed. I suspect (g)parted is in fact requesting a rescan, but is
>> being ignored.
>>
>> I just tried this on one of my servers, and parted (v2.3) choked on
>> an assertion. Hmm.
>>
>>> Therefore, what this has shown is that the necessary procedure in
>>> Gparted is:
>>> * sudo gparted /dev/md1
>>> * Create the partition table (gpt for instance)
>>> * Create as many partitions as you need BUT SET THEIR TYPE TO
>>> "unformatted" (extremely important).
>>> * Go back to a terminal and execute "sudo blockdev --rereadpt
>>> /dev/md1" to let the kernel see the new partition devices
>>> * Now go back to the Gparted and format the partitions, or just do
>>> it the CLI way with mkfs.ext4 manually. Either way, it will now work.
>>>
>>> So how should we sum up this problem? Well, that depends. What is
>>> responsible for auto-discovering the new partitions when you use
>>> Gparted on a PHYSICAL disk (which works perfectly without manual
>>> re-scan commands)? 1) Is it Gparted telling the kernel to re-scan,
>>> or 2) is it the kernel that auto-watches physical disks for changes?
>> Generally, udev does it. But based on my little test, I suspect
>> parted is at fault. fdisk did just fine.
>>
>>> If 1), it means Gparted needs a bug fix to tell the kernel to
>>> re-scan the partition table for md-arrays when you re-partition them.
>>> If 2), it means the kernel doesn't watch md-arrays for partition
>>> table changes, which debatably it should be doing.
>> What is ignored or acted upon is decided by udev rules, as far as I
>> know. You might want to monitor udev events while running some of
>> your tests (physical disk vs. MD).
>>
>>> Thoughts?
>> Phil
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 21:22:09 by Phil Turmel
Hi Christopher,
On 05/13/2011 02:54 PM, Christopher White wrote:
> Hello again Phil (and Roman). Thanks to your back-and-forth, the bug has now finally been completely narrowed down: It is a bug in (g)parted!
Good to know. A pointer to the formal bug report would be a good followup, when you have it.
> The issue is that (g)parted doesn't properly call the kernel API for re-scanning the device when you operate on md disks as compared to physical disks.
>
> Your information (Phil) that (g)parted chokes on an assertion is good information for when I report this bug. It's not impossible that you must handle md-disks differently from physical disks and that (g)parted is not aware of that distinction, therefore choking on the partition table rescan API.
>
> Either way, this is fantastic news, because it means it's not an md kernel bug, where waiting for a fix would have severely pushed back my current project. I'm glad it was simply (g)parted failing to tell the kernel to re-read the partition tables.
>
> ---
>
> With this bug out of the way (I'll be reporting it to parted's mailing list now), one thing that's been bugging me during my hours of research is that the vast majority of users use either a single, large RAID array and virtually partition that with LVM, or alternatively breaking each disk into many small partitions and making multiple smaller arrays out of those partitions. Very few people seem to use md's built-in support for partitionable raid arrays.
>
> This makes me a tiny bit wary to trust the stability of md's partitionable implementation, even though I suspect it is rock solid. I suspect the reason that most people don't use the feature is for legacy/habit reasons, since md used to support only a single partition, so there's a vast amount of guides telling people to use LVM. Do any of you know anything about this and can advise on whether I should go for a single-partition MD array with LVM, or a partitionable MD array?
>
> As far as performance goes, the CPU overhead of LVM is in the 1-5% range from what I've heard, and I have zero need for the other features LVM provides (snapshots, backups, online resizing, clusters of disks acting as one disk, etc), so it just feels completely overkill and worthless when all I need is a single, partitionable RAID array.
I always use LVM. While the lack of attention to MD partitions might justify that, the real reason is the sheer convenience of creating, manipulating, and deleting logical volumes on the fly. While you may not need it *now*, when you discover that you *do* need it, you won't be able to use it. Online resizing of any of your LVs is the killer feature.
Also, I'd be shocked if the LVM overhead for plain volumes was close to 1%. In fact, I'd be surprised if it was even 0.1%. Do you have any benchmarks that show otherwise?
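A crude way to compare would be raw sequential reads from the bare array
and from an LV on top of it (a sketch; vg0/test is a hypothetical LV):

dd if=/dev/md1 of=/dev/null bs=1M count=4096 iflag=direct
dd if=/dev/vg0/test of=/dev/null bs=1M count=4096 iflag=direct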
> All I need is the ability to (in the future) add more disks to the array, grow the array, and then resize+move the partitions around using regular partitioning tools treating the RAID array as a single disk, and md's partitionable arrays support doing this since they act as a disk, where if you add more hard disks to your array; the available, unallocated space on that array simply grows and partitions on it can be expanded and relocated to take advantage of this. I don't need LVM for any of that, as long as md's implementation is stable.
If you can take the downtime, this is true.
Phil
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 21:32:21 by Roman Mamedov
On Fri, 13 May 2011 15:22:09 -0400
Phil Turmel wrote:
> I always use LVM. While the lack of attention to MD partitions might
> justify that, the real reason is the sheer convenience of creating,
> manipulating, and deleting logical volumes on the fly. While you may not
> need it *now*, when you discover that you *do* need it, you won't be able to
> use it. Online resizing of any of your LVs is the killer feature.
Can it defragment non-contiguous LVs yet?
--
With respect,
Roman
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 21:39:54 by Phil Turmel
On 05/13/2011 03:32 PM, Roman Mamedov wrote:
> On Fri, 13 May 2011 15:22:09 -0400
> Phil Turmel wrote:
>
>> I always use LVM. While the lack of attention to MD partitions might
>> justify that, the real reason is the sheer convenience of creating,
>> manipulating, and deleting logical volumes on the fly. While you may not
>> need it *now*, when you discover that you *do* need it, you won't be able to
>> use it. Online resizing of any of your LVs is the killer feature.
>
> Can it defragment non-contiguous LVs yet?
Automatically, no (so far as I've seen). If you have a suitable free space chunk in the group, though, you can manually create a contiguous mirror, let it sync, then remove the original segments. Do it twice if placement is critical.
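In LVM terms that should look roughly like this (a sketch; vg0/data and
the PV name are placeholders, and it assumes enough contiguous free space
in the group):

lvconvert -m1 --alloc contiguous vg0/data    # add a contiguous mirror leg
# wait for the sync to finish (watch 'lvs -a')
lvconvert -m0 vg0/data /dev/original_pv      # drop the fragmented original leg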
Phil
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 21:49:06 by Christopher White
On 5/13/11 9:01 PM, Rudy Zijlstra wrote:
> Hi Chris,
>
>
> I've run partitioned MD disks for several years now. I do that on
> systems where i use md for the system partitions. One mirror with
> partitions for the different system aspects. I prefer that, as it
> reflects best the actual physical configuration, and all partitions
> will be degraded at the same time when 1 disk develops a problem
> (which is unfortunately not the case when you partition the disk and
> then mirror the partitions).
>
> As I am a bit lazy and have only limited wish to fight with
> BIOS/bootloader conflicts / vagaries, these systems typically boot
> from the network (kernel gets loaded from the network, from there
> onwards all is on the local disk).
>
> Cheers,
>
>
>
> Rudy
Thank you for the information, Rudy,
Your experience of running partitioned MD arrays for years shows that it
is indeed stable. The reason for wanting to skip LVM was that it's one
less performance-penalty layer, one less layer to configure, one less
possible point of failure, etc.
However, Phil again brings up the main fear that's been nagging me, and
that is that MD's partitioning support receives less love (use) and
therefore risks having bugs that go undiscovered for ages and (gasp) may
even risk corrupting the data. People are just so used to LVM, since MD
used to be single-partition only, that the LVM + single-partition MD
setup is far more mature and far more widely used.
My main reason against LVM was the performance penalty, where I had read
that it was in the 1-5% range, but I just did a new search and saw
threads showing that any performance hit claim is outdated and that LVM2
is extremely efficient. In fact the CPU load didn't seem to be impacted
more than 0.1% or so in the graphs I saw.
By the way, Rudy, as for your boot conflicts and the fact that you
resort to running a network boot, that was only a problem in the past
when bootloaders did not support software RAID. Grub2 supports GPT, MD
arrays with metadata 1.2, and can fully boot from a system (with /boot)
installation located on your MD array. All you'll have to do is make
sure your /boot partition (and the whole system if you want to) is on a
RAID 1 (mirrored) array, and that you install the Grub2 bootloader on
every physical disk. This means that it goes:
Computer starts up -> BIOS/EFI picks any of the hard drives to boot from
-> GRUB2 loads -> GRUB2 sees the MD RAID1 array and picks ANY of the
disks to boot from (since they are all mirrored) and treats it as a
regular, raw disk as if you didn't use an array at all.
I think you may have to do some slight extra work to get the system disk
to mount as RAID 1 for the OS and RAID 5 for your other array(s) after
the kernel has booted; you may have to boot into a RAM filesystem first
so the disk can be unmounted and re-mounted as a RAID 1 array, but it's
not hard, and there are guides for it. Just get a 2.6-series kernel,
grub2, a RAID 1 array for the OS, and a guide, and you will be set. That
will remove the need for you to keep a network PXE boot server.
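Installing the bootloader on every member disk is then just a loop
(device names are examples):

for disk in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    sudo grub-install "$disk"
done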
On 5/13/11 9:22 PM, Phil Turmel wrote:
> Hi Christopher,
>
> On 05/13/2011 02:54 PM, Christopher White wrote:
>> Hello again Phil (and Roman). Thanks to your back-and-forth, the bug has now finally been completely narrowed down: It is a bug in (g)parted!
> Good to know. A pointer to the formal bug report would be a good followup, when you have it.
I've submitted a detailed report to the bug-parted mailing list and want
to sincerely thank ALL of you for your discussion to help narrow it
down. Thank you very much! The bug-parted archive seems slow to refresh,
but the posting is called "[Confirmed Bug] Parted does not notify kernel
when modifying partition tables in partitionable md arrays" and was
posted about 40 minutes ago. It contains a list of the steps to
reproduce the bug and the theories of why it happens. It should show up
here eventually:
http://lists.gnu.org/archive/html/bug-parted/2011-05/threads.html
> I always use LVM. While the lack of attention to MD partitions might justify that, the real reason is the sheer convenience of creating, manipulating, and deleting logical volumes on the fly. While you may not need it *now*, when you discover that you *do* need it, you won't be able to use it. Online resizing of any of your LVs is the killer feature.
>
> Also, I'd be shocked if the LVM overhead for plain volumes was close to 1%. In fact, I'd be surprised if it was even 0.1%. Do you have any benchmarks that show otherwise?
As I wrote above, you reinforce the fear I have that the lack of
attention to MD partitions is an added risk, compared to the
ultra-well-maintained LVM2 layer. Now that the performance question is
out of the way, I will actually go for it. Online resizing and so on
isn't very interesting since I use the partitions for storage that isn't
in need of 100% availability, but the fact that LVM can be trusted with
my life whereas MD partitions are rarely used, and the fact that LVM(2)
turned out to be extremely effective CPU-wise, just settles it.
Christopher
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 13.05.2011 22:00:07 by Rudy Zijlstra
On 05/13/2011 09:49 PM, Christopher White wrote:
> On 5/13/11 9:01 PM, Rudy Zijlstra wrote:
>> Hi Chris,
>>
>>
>> I've run partitioned MD disks for several years now. I do that on
>> systems where i use md for the system partitions. One mirror with
>> partitions for the different system aspects. I prefer that, as it
>> reflects best the actual physical configuration, and all partitions
>> will be degraded at the same time when 1 disk develops a problem
>> (which is unfortunately not the case when you partition the disk and
>> then mirror the partitions).
>>
>> As I am a bit lazy and have only limited wish to fight with
>> BIOS/bootloader conflicts / vagaries, these systems typically boot
>> from the network (kernel gets loaded from the network, from there
>> onwards all is on the local disk).
>>
>> Cheers,
>>
>>
>>
>> Rudy
> Thank you for the information, Rudy,
>
> Your experience of running partitioned MD arrays for years shows that
> it is indeed stable. The reason for wanting to skip LVM was that it's
> one less performance-penalty layer, one less layer to configure, one
> less possible point of failure, etc.
I skip LVM because, for my usage pattern, it only gives me an additional
management layer... an additional layer to configure.
>
> However, Phil again brings up the main fear that's been nagging me,
> and that is that MD's partitioning support receives less love (use)
> and therefore risks having bugs that go undiscovered for ages and
> (gasp) may even risk corrupting the data. People are just so used to
> LVM since MD used to be single-partition only, that
> LVM+single-partition MD array is far more mature and far more in use.
The MD layer and the LVM layer are independently maintained. Their
regular use together would surface any bugs more quickly, though.
>
> My main reason against LVM was the performance penalty, where I had
> read that it was in the 1-5% range, but I just did a new search and
> saw threads showing that any performance hit claim is outdated and
> that LVM2 is extremely efficient. In fact the CPU load didn't seem to
> be impacted more than 0.1% or so in the graphs I saw.
>
> By the way, Rudy, as for your boot conflicts and the fact that you
> resort to running a network boot, that was only a problem in the past
> when bootloaders did not support software RAID. Grub2 supports GPT, MD
> arrays with metadata 1.2, and can fully boot from a system (with
> /boot) installation located on your MD array. All you'll have to do is
> make sure your /boot partition (and the whole system if you want to)
> is on a RAID 1 (mirrored) array, and that you install the Grub2
> bootloader on every physical disk. This means that it goes:
>
I know... but I happen to dislike grub2, and my network boot environment
is stable and well maintained.
Grub2 is for me a step backwards: more difficult to configure, and I've
gone back to lilo as my main bootloader.
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 14.05.2011 12:10:56 by David Brown
On 13/05/11 21:32, Roman Mamedov wrote:
> On Fri, 13 May 2011 15:22:09 -0400
> Phil Turmel wrote:
>
>> I always use LVM. While the lack of attention to MD partitions might
>> justify that, the real reason is the sheer convenience of creating,
>> manipulating, and deleting logical volumes on the fly. While you may not
>> need it *now*, when you discover that you *do* need it, you won't be able to
>> use it. Online resizing of any of your LVs is the killer feature.
>
> Can it defragment non-contiguous LVs yet?
>
What is perhaps more relevant is: can filesystems see the fragmentation
of the LVs? I don't know the answer.
Fragmentation of files is not a problem unless files are split into
/lots/ of small pieces. The bad reputation of fragmentation has come
from the DOS/Windows world, where poor filesystems combined with
shotgun-style allocators give you much slower performance than necessary.
Modern Linux filesystems have various techniques to keep fragmentation
to a minimum. But (AFAIK) they make the assumption that the underlying
device is contiguous. If the filesystem /knows/ that the device is in
bits, then it could take that into account in its allocation policy (in
the same way that it takes raid stripes into account).
Still, you don't usually have many segments in an LV - if you want the
LV to be fast, you can request it to be contiguous when creating it.
Then you only get a fragment for each time it is grown. It's a price
often worth paying for the flexibility.
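For example, something like this (volume group, name and size made up):

lvcreate -L 100G --alloc contiguous -n fastlv vg0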
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 14.05.2011 12:24:27 by Roman Mamedov
On Sat, 14 May 2011 12:10:56 +0200
David Brown wrote:
> What is perhaps more relevant is: can filesystems see the fragmentation
> of the LVs? I don't know the answer.
No, of course they can't.
> Still, you don't usually have many segments in an LV - if you want the
> LV to be fast, you can request it to be contiguous when creating it.
> Then you only get a fragment for each time it is grown. It's a price
> often worth paying for the flexibility.
From what I see, the key selling point for LVM is the ability to 'easily'
add/remove/resize LVs. And then if you buy that and start to actively use
these features, you end up in a situation (badly fragmented LVs) from which
there isn't a proper way out. No - backup and restore, or 'have enough
contiguous free space to mirror your entire LV and then nuke the original' are
not the answer. What's sad is that there isn't any fundamental technical
reason LVs can't be defragmented. They can, just no one has bothered to write
the corresponding code yet.
--
With respect,
Roman
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 14.05.2011 14:56:55 by David Brown
On 14/05/11 12:24, Roman Mamedov wrote:
> On Sat, 14 May 2011 12:10:56 +0200
> David Brown wrote:
>
>> What is perhaps more relevant is: can filesystems see the fragmentation
>> of the LVs? I don't know the answer.
>
> No, of course they can't.
>
>> Still, you don't usually have many segments in an LV - if you want the
>> LV to be fast, you can request it to be contiguous when creating it.
>> Then you only get a fragment for each time it is grown. It's a price
>> often worth paying for the flexibility.
>
> From what I see, the key selling point for LVM is the ability to 'easily'
> add/remove/resize LVs. And then if you buy that and start to actively use
> these features, you end up in a situation (badly fragmented LVs) from which
> there isn't a proper way out. No - backup and restore, or 'have enough
> contiguous free space to mirror your entire LV and then nuke the original' are
> not the answer. What's sad is that there isn't any fundamental technical
> reason LVs can't be defragmented. They can, just no one has bothered to write
> the corresponding code yet.
>
I'm sure that LV's could be defragmented - there is already code to move
them around on the disks (such as to move them out of a PV before
deleting the PV). I don't know why it hasn't been implemented - maybe
there are too few people working on LVM, or that it is a low priority,
or that LV fragmentation makes very little measurable difference in
practice.
Personally, I find LVM to be a hugely useful tool. I like being able to
make new logical volumes when I need them, and resize them as
convenient. For servers, I make heavy use of openvz lightweight virtual
servers, and I make a new LV for each "machine". So setting up a new
"server" with its own "disk" is done in a couple of minutes. And if the
needs of the "server" outgrow its bounds, it's easy to extend it.
I've had plenty of other cases where LVM has saved me a lot of time and
effort. It wasn't that long ago that I temporarily needed a bit more
space on a server, and didn't have the time or spare disk to build it
out. So I added a USB disk I had lying around, made a PV on it, and
extended the server's LVs onto the disk. Obviously this sort of thing
gives a performance hit - but it was better to be slow than not working.
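The whole operation is only a handful of commands (a sketch; the device
and volume names are made up, and it assumes an ext3/ext4 filesystem on
the LV):

pvcreate /dev/sdx1          # prepare the USB disk as a physical volume
vgextend vg0 /dev/sdx1      # add it to the volume group
lvextend -L +50G vg0/srv    # grow the LV onto the new PV
resize2fs /dev/vg0/srv      # grow the filesystem online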
For me, LVM's flexibility is worth the minor performance cost.
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 14.05.2011 15:27:14 by Drew
> I'm sure that LV's could be defragmented - there is already code to move
> them around on the disks (such as to move them out of a PV before deleting
> the PV). I don't know why it hasn't been implemented - maybe there are too
> few people working on LVM, or that it is a low priority, or that LV
> fragmentation makes very little measurable difference in practice.
I've always figured it was because fragmentation in the LV's caused
little performance degradation. If we were talking about LV's composed
of hundreds of fragments I would expect to see degradation but I've
never come across a scenario where LV's have been that bad.
Someone referred to DOS in an earlier post and I think that's a good
example of relevance. I maintain a bunch of Windows based machines at
work and I did some performance benchmarking between a traditional
defrag utility and some of the "professional" versions. Bells and
whistles aside, what set most of the Pro versions apart from the
standard defrag utilities was the concept of "good enough" defrag,
which basically puts files into several larger fragments as opposed to
a complete defrag. I ran tests on filesystem performance before and
after defragging drives with both options, and the change in performance
between a full defrag and a "good enough" defrag was minimal.
--
Drew
"Nothing in life is to be feared. It is only to be understood."
--Marie Curie
"This started out as a hobby and spun horribly out of control."
-Unknown
Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
on 14.05.2011 20:21:48 by David Brown
On 14/05/11 15:27, Drew wrote:
>> I'm sure that LV's could be defragmented - there is already code to move
>> them around on the disks (such as to move them out of a PV before deleting
>> the PV). I don't know why it hasn't been implemented - maybe there are too
>> few people working on LVM, or that it is a low priority, or that LV
>> fragmentation makes very little measurable difference in practice.
>
> I've always figured it was because fragmentation in the LV's caused
> little performance degradation. If we were talking about LV's composed
> of hundreds of fragments I would expect to see degradation but I've
> never come across a scenario where LV's have been that bad.
>
I too think the fragmentation is probably a small effect - but it is
maybe a measurable effect nonetheless. I've never seen any benchmarks
on it. However, I know that some filesystems (such as xfs in
particular, and ext4 to a lesser extent) go out of their way to reduce
the risk of fragmentation in files - if it is worth their effort, then
perhaps it is also worth it for LVM.
> Someone refered to DOS in an earlier post and I think that's a good
> example of relevance. I maintain a bunch of Windows based machines at
> work and I did some performance benchmarking between a traditional
> defrag utility and some of the "professional" versions. Bells and
> whistles aside, what set most of the Pro versions apart from the
> standard defrag utilities was the concept of "good enough" defrag,
> which basically puts files into several larger fragments as opposed to
> a complete defrag. I ran tests on filesystem performance before and
> after defraging drives with both options and the change in performance
> between a full defrag and a "good enough" defrag was minimal.
>
You will probably also find that the real-world difference between no
defrag and "good enough" defrag is also minimal.
There are some heavily used files and directories that get so badly
fragmented in windows systems that they can benefit from a defrag - for
example the registry files, the windows directory, and some of the NTFS
structures. Of course, these are the parts that normal defrag utilities
can't help - they can't be defragged while the system is running. But
for most other parts of the system, defrag makes very little real
difference, especially as it is so temporary.
In the old days, before DOS and Windows had any sort of file or disk
cache, defragging had a bigger effect. But now you are far better off
spending money on some extra ram for more cache space than on
"professional" defrag programs.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html