stripe cache question

am 24.02.2011 22:06:43 von Piergiorgio Sartor

Hi all,

A few posts ago it was mentioned that the unit of the stripe
cache is "pages per device", usually 4K pages.

Questions:

1) Does "device" means raid (md) device or component
device (HDD)?

2) The max possible value seems to be 32768, which
means, in case of 4K page per md device, a max of
128MiB of RAM.
Is this by design? Would it be possible to increase
up to whatever is available?

Thanks,

bye,

--

piergiorgio
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: stripe cache question

am 25.02.2011 04:51:25 von NeilBrown

On Thu, 24 Feb 2011 22:06:43 +0100 Piergiorgio Sartor
wrote:

> Hi all,
>
> A few posts ago it was mentioned that the unit of the stripe
> cache is "pages per device", usually 4K pages.
>
> Questions:
>
> 1) Does "device" means raid (md) device or component
> device (HDD)?

component device.
In drivers/md/raid5.[ch] there is a 'struct stripe_head'.
It holds one page per component device (ignoring spares).
Several of these comprise the 'cache'. The 'size' of the cache is the number
of 'struct stripe_head' and associated pages that are allocated.
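To make the arithmetic concrete, a small sketch of the memory cost (assuming the usual 4K page size; stripe_cache_mib is just an illustrative helper, not an existing tool):

```shell
# Rough stripe cache memory estimate: one page per component device
# for each of 'stripe_cache_size' stripe_head entries.
stripe_cache_mib() {
    cache_size=$1   # value of /sys/block/mdX/md/stripe_cache_size
    ndisks=$2       # active component devices (spares excluded)
    page_size=4096  # assumed page size
    echo $(( cache_size * page_size * ndisks / 1024 / 1024 ))
}

stripe_cache_mib 32768 1    # the 32768 maximum costs 128 MiB per disk
```

With ten component devices at the maximum setting this already exceeds 1 GiB.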


>
> 2) The max possible value seems to be 32768, which
> means, in case of 4K page per md device, a max of
> 128MiB of RAM.
> Is this by design? Would it be possible to increase
> up to whatever is available?

32768 is just an arbitrary number. It is there in raid5.c and is easy to
change (for people comfortable with recompiling their kernels).
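For reference, within that compiled-in maximum the value can be tuned at runtime through sysfs (md0 below is a placeholder for the actual array):

```shell
# Current stripe cache size, in pages per component device
cat /sys/block/md0/md/stripe_cache_size

# Enlarge the cache (as root); writes above the compiled-in
# 32768 ceiling are rejected until raid5.c is changed and rebuilt
echo 8192 > /sys/block/md0/md/stripe_cache_size
```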

I wanted an upper limit because setting it too high could easily cause your
machine to run out of memory and become very sluggish - or worse.

Ideally the cache should be automatically sized based on demand and memory
size - with maybe just a tunable to select between "use as much memory as you
need - within reason" versus "use as little memory as you can manage with".

But that requires thought and design and code and .... it just never seemed
like a priority.

NeilBrown


>
> Thanks,
>
> bye,
>


Re: stripe cache question

am 26.02.2011 11:21:28 von Piergiorgio Sartor

On Fri, Feb 25, 2011 at 02:51:25PM +1100, NeilBrown wrote:
> On Thu, 24 Feb 2011 22:06:43 +0100 Piergiorgio Sartor
> wrote:
>
> > Hi all,
> >
> > A few posts ago it was mentioned that the unit of the stripe
> > cache is "pages per device", usually 4K pages.
> >
> > Questions:
> >
> > 1) Does "device" means raid (md) device or component
> > device (HDD)?
>
> component device.
> In drivers/md/raid5.[ch] there is a 'struct stripe_head'.
> It holds one page per component device (ignoring spares).
> Several of these comprise the 'cache'. The 'size' of the cache is the number
> of 'struct stripe_head' and associated pages that are allocated.
>
>
> >
> > 2) The max possible value seems to be 32768, which
> > means, in case of 4K page per md device, a max of
> > 128MiB of RAM.
> > Is this by design? Would it be possible to increase
> > up to whatever is available?
>
> 32768 is just an arbitrary number. It is there in raid5.c and is easy to
> change (for people comfortable with recompiling their kernels).

Ah! I found it. Maybe, considering currently
available memory, you should think about increasing
it to, for example, 128K or 512K.

> I wanted an upper limit because setting it too high could easily cause your
> machine to run out of memory and become very sluggish - or worse.
>
> Ideally the cache should be automatically sized based on demand and memory
> size - with maybe just a tunable to select between "use as much memory as you
> need - within reason" versus "use as little memory as you can manage with".
>
> But that requires thought and design and code and .... it just never seemed
> like a priority.

You're somewhat contradicting your philosophy of
"let's do the smart things in user space"... :-)

IMHO, if really necessary, it could be enough to
have this "upper limit" available in sysfs.

Then user space can decide what to do with it.

For example, at boot the amount of memory is checked
and the upper limit set.
I see a duplication here; maybe it is better to just remove
the upper limit and let user space deal with it.

bye,

--

piergiorgio

Re: stripe cache question

am 27.02.2011 05:43:50 von NeilBrown

On Sat, 26 Feb 2011 11:21:28 +0100 Piergiorgio Sartor
wrote:

> On Fri, Feb 25, 2011 at 02:51:25PM +1100, NeilBrown wrote:
> > On Thu, 24 Feb 2011 22:06:43 +0100 Piergiorgio Sartor
> > wrote:
> >
> > > Hi all,
> > >
> > > A few posts ago it was mentioned that the unit of the stripe
> > > cache is "pages per device", usually 4K pages.
> > >
> > > Questions:
> > >
> > > 1) Does "device" means raid (md) device or component
> > > device (HDD)?
> >
> > component device.
> > In drivers/md/raid5.[ch] there is a 'struct stripe_head'.
> > It holds one page per component device (ignoring spares).
> > Several of these comprise the 'cache'. The 'size' of the cache is the number
> > of 'struct stripe_head' and associated pages that are allocated.
> >
> >
> > >
> > > 2) The max possible value seems to be 32768, which
> > > means, in case of 4K page per md device, a max of
> > > 128MiB of RAM.
> > > Is this by design? Would it be possible to increase
> > > up to whatever is available?
> >
> > 32768 is just an arbitrary number. It is there in raid5.c and is easy to
> > change (for people comfortable with recompiling their kernels).
>
> Ah! I found it. Maybe, considering currently
> available memory, you should think about increasing
> it to, for example, 128K or 512K.
>
> > I wanted an upper limit because setting it too high could easily cause your
> > machine to run out of memory and become very sluggish - or worse.
> >
> > Ideally the cache should be automatically sized based on demand and memory
> > size - with maybe just a tunable to select between "use as much memory as you
> > need - within reason" versus "use as little memory as you can manage with".
> >
> > But that requires thought and design and code and .... it just never seemed
> > like a priority.
>
> You're somewhat contradicting your philosophy of
> "let's do the smart things in user space"... :-)
>
> IMHO, if really necessary, it could be enough to
> have this "upper limit" avaialable in sysfs.
>
> Then user space can decide what to do with it.
>
> For example, at boot the amount of memory is checked
> and the upper limit set.
> I see a duplication here; maybe it is better to just remove
> the upper limit and let user space deal with it.


Maybe.... I still feel I want some sort of built-in protection...

Maybe if I did all the allocations with "__GFP_WAIT" clear, so that it would
only allocate memory that is easily available. It wouldn't be a hard
guarantee against running out, but it might help...

Maybe you could try removing the limit and see what actually happens when
you set a ridiculously large size?

NeilBrown


Re: stripe cache question

am 27.02.2011 12:37:11 von Piergiorgio Sartor

> > > Ideally the cache should be automatically sized based on demand and memory
> > > size - with maybe just a tunable to select between "use as much memory as you
> > > need - within reason" versus "use as little memory as you can manage with".
> > >
> > > But that requires thought and design and code and .... it just never seemed
> > > like a priority.
> >
> > You're somewhat contradicting your philosophy of
> > "let's do the smart things in user space"... :-)
> >
> > IMHO, if really necessary, it could be enough to
> > have this "upper limit" avaialable in sysfs.
> >
> > Then user space can decide what to do with it.
> >
> > For example, at boot the amount of memory is checked
> > and the upper limit set.
> > I see a duplication here; maybe it is better to just remove
> > the upper limit and let user space deal with it.
>
>
> Maybe.... I still feel I want some sort of built-in protection...

As I wrote, I think a second sysfs entry, holding the upper
limit, could be enough.
It allows flexibility and some degree of protection.
Two _coordinated_ sysfs accesses would be required in order
to exceed the limit, which is unlikely to happen by accident.
That is, at boot /sys/block/mdX/md/stripe_cache_limit would
be 32768 and the "cache_size" would be 256.
If someone wants to play with the cache size, they will be able
to raise it up to 32768. Beyond that, the first entry has to be
changed to a higher value (its minimum should be the cache_size).

This is, of course, a duplication, but it enforces a certain
process (two accesses), thus giving some degree of protection.
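As a sketch of that proposal (stripe_cache_limit is hypothetical here; it does not exist in the kernel), breaking the boot-time ceiling would take two separate writes:

```shell
# Hypothetical two-step protocol: 'stripe_cache_limit' is the
# proposed second sysfs entry, not an existing kernel interface.
echo 65536 > /sys/block/md0/md/stripe_cache_limit   # step 1: raise the ceiling
echo 65536 > /sys/block/md0/md/stripe_cache_size    # step 2: now permitted
```

A single accidental write to stripe_cache_size alone would still be capped at the old ceiling.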

I guess, but you're the expert, this should be easier than
other solutions.

> Maybe if I did all the allocations with "__GFP_WAIT" clear so that it would
> only allocate memory that is easily available. It wouldn't be a hard
> guarantee against running out, but it might help..

Again, I think you're over-designing it.

BTW, I hope that is unswappable memory, right?

> Maybe you could try removing the limit and see what actually happens when
> you set a ridiculously large size?

Yes and no. The home PC has a RAID-10f2, the work PC has
a RAID-5, but I do not want to play with the kernel on the latter.
I guess using loop devices would not be meaningful.

As soon as I manage to build the RAID-6 NAS I could give it
a try, but this has no "schedule" right now.

bye,

--

piergiorgio

Re: stripe cache question

am 06.03.2011 21:08:56 von Piergiorgio Sartor

> Maybe you could try removing the limit and see what actually happens when
> you set a ridiculously large size?

Actually, I tried (unintentionally) something similar.

The storage I have consists of 7 RAID-6 arrays, so I tried to
increase the stripe_cache_size to 32768 on each array.
Of course, while writing to the array...

The first 3 have 10, 10 and 9 HDDs, so with the third
array about 3.6GiB were allocated out of the 4GiB the PC has.

At this point the PC was completely unresponsive, but
still working, i.e. it was still writing to the array.

Also ssh did not answer in time.

Nevertheless, it was not dead or locked, just extremely
slow, in fact, once the writing finished, the PC was
again working as before.

I guess there was a lot of swapping going on...

In any case, an upper limit seems to be necessary, but
it should be consistent with the total available RAM.
It does not seem to help to limit arrays independently;
there should be a "global" limit, so that the _sum_ of
the caches does not exceed it.
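Such a user-space check could, for instance, sum the current settings over all arrays and compare the total against RAM. A sketch, assuming 4K pages and the existing stripe_cache_size and raid_disks sysfs entries:

```shell
# Estimate total stripe cache memory (KiB) across all md arrays.
# The sysfs root can be overridden, e.g. for testing.
sum_stripe_cache_kib() {
    base=${1:-/sys/block}
    total=0
    for d in "$base"/md*/md; do
        # only raid4/5/6 arrays expose a stripe cache
        [ -f "$d/stripe_cache_size" ] || continue
        size=$(cat "$d/stripe_cache_size")
        disks=$(cat "$d/raid_disks")
        total=$(( total + size * 4 * disks ))   # 4 KiB per page
    done
    echo "$total"
}

sum_stripe_cache_kib    # total KiB over all local arrays
```

A boot script could compare this total (plus headroom) against MemTotal from /proc/meminfo before raising any per-array value.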

Hope this helps,

bye,

--

piergiorgio