Questions about 4k sector drives

on 25.04.2010 13:45:27 by Florian Kusche

Hello,

I have a few questions about Software RAID and 4k-sector drives. I have already searched the web, but some questions remain.

1) (this one is not directly raid-related)
I understand that newer kernels can determine whether a drive uses 4k physical sectors even if it presents 512-byte logical sectors to the outside (via the sector_size_supported() patch by Matthew Wilcox).

Is this information only passed on to userland tools, or will the Linux kernel change its behavior (e.g. use 4k blocks for such block devices)? (I guess it is only informational, for userland tools.)

2)
Will Software RAID work with physical 4k-sector drives...
- that simulate logical 512 byte sectors?
- that also have logical 4k-sectors?
(I'm pretty sure the answer to both questions is yes.)

3)
Will Software RAID work in mixed setups? i.e.: what combinations of the following drive types are possible?
- 512b physical / 512b logical
- 4k physical / 512b logical
- 4k physical / 4k logical
And: Will the resulting md block device have 512 byte blocks or 4k blocks?
(I would guess that you need to have the same logical sector size for all disks.)

I am aware of the performance problems due to read-modify-write cycles and potential misalignment. This has been discussed in plenty of articles on the web.

It would be great if someone could clear things up a little (and tell me if my guesses are correct).

Thanks,
Florian
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Questions about 4k sector drives

on 25.04.2010 16:43:21 by Phillip Susi

On Sun, 2010-04-25 at 13:45 +0200, Florian Kusche wrote:
> Hello,
>
> I have a few questions about Software RAID and 4k-sector drives. I
> already searched the web, but some questions are left.
>
> 1) (this one is not directly raid-related) I understand, that newer
> kernels can determine whether a drive uses 4k physical sectors even if
> it presents 512-byte logical sectors to the outside. (via the
> sector_size_supported() patch by Matthew Wilcox)
>
> Is this only an information given to userland tools, or will the linux
> kernel change its behavior (e.g. use 4k-blocks for such block
> devices)? (I guess it's only an information for userland tools.)

If the device reports it, then the kernel knows it and exports it to
user space. Unfortunately, the new 4k drives from WD lie and report
that their physical sector size is 512 bytes.
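
As a rough illustration of what "exports it to user space" means in
practice: kernels with block topology support publish these values as
text files under /sys/block/<dev>/queue/. The sketch below reads the two
sector sizes. The function names and the sysfs_root parameter are
illustrative assumptions, not part of any existing tool.

```python
from pathlib import Path


def read_sector_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) sector sizes in bytes for a block device.

    `device` is a kernel name such as "sda"; `sysfs_root` is only a
    parameter so the function can be pointed at a test directory.
    """
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text().strip())
    physical = int((queue / "physical_block_size").read_text().strip())
    return logical, physical


def emulates_small_sectors(logical, physical):
    """True when the drive admits to physical sectors larger than logical ones."""
    return physical > logical
```

Calling read_sector_sizes("sda") on a drive that reports honestly would
return something like (512, 4096). A drive that misreports itself, as
described above, would return (512, 512), and nothing in user space can
tell the difference from these files alone.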

> 2)
> Will Software RAID work with physical 4k-sector drives...
> - that simulate logical 512 byte sectors?
> - that also have logical 4k-sectors?
> (I'm pretty sure, the answer to both questions is yes.)

Yes.



Re: Questions about 4k sector drives

on 03.05.2010 01:04:50 by Bill Davidsen

Phillip Susi wrote:
> On Sun, 2010-04-25 at 13:45 +0200, Florian Kusche wrote:
>
>> Hello,
>>
>> I have a few questions about Software RAID and 4k-sector drives. I
>> already searched the web, but some questions are left.
>>
>> 1) (this one is not directly raid-related) I understand, that newer
>> kernels can determine whether a drive uses 4k physical sectors even if
>> it presents 512-byte logical sectors to the outside. (via the
>> sector_size_supported() patch by Matthew Wilcox)
>>
>> Is this only an information given to userland tools, or will the linux
>> kernel change its behavior (e.g. use 4k-blocks for such block
>> devices)? (I guess it's only an information for userland tools.)
>>
>
> If the device reports it, then the kernel knows it and exports it to
> user space. The new 4k drives from WD unfortunately, lie and report
> their physical sector size is 512 bytes.
>
>
>> 2)
>> Will Software RAID work with physical 4k-sector drives...
>> - that simulate logical 512 byte sectors?
>> - that also have logical 4k-sectors?
>> (I'm pretty sure, the answer to both questions is yes.)
>>
>
> Yes.
>

Is there any reason not to just align partitions for all drives on 32kB
boundaries and expect that to work on 512b, 4kB and SSD drives? Cautious
testing here says it works fine: no anomalies, no exciting performance
data, it just works.

--
Bill Davidsen
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein


Re: Questions about 4k sector drives

on 03.05.2010 07:54:47 by Luca Berra

On Sun, May 02, 2010 at 07:04:50PM -0400, Bill Davidsen wrote:
> Is there any reason not to just align partitions for all drives on 32kB
> sectors and expect that to work on 512b 4kB and SSD? Cautious testing here
> says it works fine, no anomalies, no exciting performance data, just works.

No reason at all. Consider that even Microsoft decided to align
partitions at a 1 MB boundary with Windows Server 2008.

L.

--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \

Re: Questions about 4k sector drives

on 03.05.2010 15:17:54 by Phillip Susi

On 5/2/2010 7:04 PM, Bill Davidsen wrote:
> Is there any reason not to just align partitions for all drives on 32kB
> sectors and expect that to work on 512b 4kB and SSD? Cautious testing
> here says it works fine, no anomalies, no exciting performance data,
> just works.

SSDs usually have an erase block size of 512k, which is why Windows 7
aligns partitions to a 1 MB boundary and parted has followed suit.
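
The arithmetic behind this can be sketched as follows. A partition start
is aligned to a granularity when its byte offset is an exact multiple of
it. The 512 KiB erase block is the "usual" figure quoted above and varies
between devices; the names in the sketch are illustrative only.

```python
KIB = 1024
MIB = 1024 * KIB

# Granularities discussed in this thread (the erase block size is an assumption).
GRANULARITIES = {
    "512 B sector": 512,
    "4 KiB sector": 4 * KIB,
    "512 KiB erase block": 512 * KIB,
}


def is_aligned(start_bytes, granularity_bytes):
    """A start offset is aligned if it is a whole multiple of the granularity."""
    return start_bytes % granularity_bytes == 0


def alignment_report(start_bytes):
    """Map each granularity name to whether the given start offset is aligned to it."""
    return {name: is_aligned(start_bytes, size) for name, size in GRANULARITIES.items()}
```

alignment_report(32 * KIB) shows a 32 KiB start satisfying both sector
sizes but not the erase block, while alignment_report(MIB) satisfies all
three. The traditional start at sector 63, alignment_report(63 * 512),
satisfies only the 512-byte case.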

Re: Questions about 4k sector drives

on 03.05.2010 15:30:01 by Greg Freemyer

Adding Martin Petersen in cc since he knows as much about 4K sector
drives as anyone.

On Sun, May 2, 2010 at 7:04 PM, Bill Davidsen wrote:

> Is there any reason not to just align partitions for all drives on 32kB
> sectors and expect that to work on 512b 4kB and SSD? Cautious testing here
> says it works fine, no anomalies, no exciting performance data, just works.

In theory 4K physical sector drives with XP alignment will eventually
ship and possibly have already.

The alignment may be controlled via a jumper, or could be set at the
factory. It's up to the manufacturer, so there is no way to predict.

These drives will need partitions/stripes etc. aligned to 31.5K, not 1MB.
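
To see where 31.5K comes from, here is a small worked sketch. It assumes
a drive with 512-byte logical and 4 KiB physical sectors whose layout is
shifted so that logical sector 63 (the traditional XP partition start)
begins a physical sector. The helper names are hypothetical, and the
resulting offset follows from that assumption rather than from any
particular drive.

```python
LOGICAL = 512     # bytes per logical sector (assumed)
PHYSICAL = 4096   # bytes per physical sector (assumed)
MIB = 1024 * 1024


def alignment_offset_for(aligned_lba, logical=LOGICAL, physical=PHYSICAL):
    """Byte offset of the first physical-sector boundary on the device,
    given one logical sector number known to start a physical sector."""
    return (aligned_lba * logical) % physical


def starts_on_physical_boundary(start_bytes, offset, physical=PHYSICAL):
    """True if a partition starting at start_bytes begins a physical sector
    on a device whose boundaries are shifted by `offset` bytes."""
    return (start_bytes - offset) % physical == 0


xp_start = 63 * LOGICAL                   # 32256 bytes = 31.5 KiB
shifted = alignment_offset_for(63)        # offset on the shifted drive
```

On such a shifted drive a start at 31.5 KiB lands on a physical sector
boundary and a start at 1 MiB does not; on an unshifted drive (offset 0)
it is the other way round.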

I don't know if any of those drives exist yet, or if they ever will. But
the kernel topology info specifically supports providing the above
info, and AIUI parted uses it to choose the best partition layout.

mdadm should as well, rather than blindly saying 1MB is the magic alignment
point (i.e. Linux can do better than Win2008/Win2003, which simply
disagree with each other on how to align).

Greg

Re: Questions about 4k sector drives

on 03.05.2010 15:38:58 by Phillip Susi

On 5/3/2010 9:30 AM, Greg Freemyer wrote:
> In theory 4K physical sector drives with XP alignment will eventually
> ship and possibly have already.
>
> The alignment maybe controlled via a jumper, or could be set in the
> factory. Its up to the manufacturer so there is no way to predict.
>
> These drives will need partitions/stripes etc. aligned to 31.5K, not 1MB.

The WD drives have such a jumper, but it is not set by default and WD
highly recommends NOT using it since it will only produce optimal
results with XP.

> I don't know it any those drives exist yet, or if they ever will. But
> the kernel topology info specifically supports providing the above
> info and aiui parted uses it to choose the best partition layout.

AFAICS the kernel has a means of providing that information to user
space, and parted will use it if it is provided. But the kernel has no
means of obtaining that information from the drive, so it is always
left as unknown, and parted defaults to 1 MB alignment like Windows 7.

Re: Questions about 4k sector drives

on 03.05.2010 22:19:47 by martin.petersen

>>>>> "Greg" == Greg Freemyer writes:

Greg> In theory 4K physical sector drives with XP alignment will
Greg> eventually ship and possibly have already.

Well, that's a definite maybe :)

The 4K transition took much longer than anticipated and Vista and beyond
know how to query the drives for alignment. So I'm guessing that we'll
only see 1-alignment via a jumper at this point.


Greg> I don't know it any those drives exist yet, or if they ever will.

I have a bunch, but obviously they are mostly prototypes.


Greg> mdadm should as well, not just blindly say 1MB is the magic
Greg> alignment point. (ie. linux can do better than Win2008/Win2003
Greg> which simply disagree with each other on how to align.)

We're going with 1MB as default because that's the new storage industry
consensus. It's a less formalized number than - say - IDEMA sector
counts, but it appears to have reached critical mass among the vendors.

And obviously we'll compensate if the storage device reports a different
alignment via the relevant ATA or SCSI knobs.
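
One plausible reading of "1MB by default, compensate if the device
reports otherwise" can be sketched as below. The exact policy used by any
given tool is not stated in this thread, so treat the function and its
parameters as assumptions for illustration.

```python
MIB = 1024 * 1024


def first_aligned_start(min_start, granularity, alignment_offset=0):
    """Smallest byte offset >= min_start that falls on a boundary of
    `granularity` bytes, where boundaries are shifted by `alignment_offset`
    bytes from the start of the device."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    # Distance past the previous shifted boundary.
    remainder = (min_start - alignment_offset) % granularity
    if remainder == 0:
        return min_start
    return min_start + (granularity - remainder)
```

With no reported offset, first_aligned_start(MIB, 4096) simply stays at
1 MiB. If a device with 4 KiB physical sectors reported an offset of 3584
bytes, first_aligned_start(MIB, 4096, 3584) would move the start up to
the next shifted boundary instead.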

--
Martin K. Petersen Oracle Linux Engineering

Re: Questions about 4k sector drives

on 03.05.2010 22:27:16 by martin.petersen

>>>>> "Phillip" == Phillip Susi writes:

>> I don't know it any those drives exist yet, or if they ever will.
>> But the kernel topology info specifically supports providing the
>> above info and aiui parted uses it to choose the best partition
>> layout.

Phillip> AFAICS the kernel has a means of providing that information to
Phillip> user space, and parted will use it if it is provided, but the
Phillip> kernel has no means of obtaining that information from the
Phillip> drive, so it is always left as unknown, so parted defaults to 1
Phillip> MB alignment like Windows 7.

We have means of obtaining alignment and physical sector size
information from both SCSI and ATA drives. But only if the drive
firmware provides the information, of course.
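
For ATA drives, the information travels in the IDENTIFY DEVICE data. The
sketch below decodes the physical/logical sector relationship from word
106, using the bit layout as I understand it from the ATA8-ACS
specification (bit 14 set and bit 15 clear marks the word as valid, bit
13 flags multiple logical sectors per physical sector, bits 3:0 hold the
power-of-two exponent). Please check it against the specification before
relying on it; the function name is hypothetical.

```python
def logical_per_physical(word_106):
    """Number of logical sectors per physical sector, decoded from
    IDENTIFY DEVICE word 106. Returns 1 when the word is not valid or the
    drive does not claim larger physical sectors."""
    # The word is only meaningful when bit 15 is 0 and bit 14 is 1.
    if (word_106 & 0xC000) != 0x4000:
        return 1
    # Bit 13: device has multiple logical sectors per physical sector.
    if not (word_106 & (1 << 13)):
        return 1
    # Bits 3:0: exponent X, meaning 2**X logical sectors per physical sector.
    return 1 << (word_106 & 0x000F)
```

A drive with 512-byte logical and 4 KiB physical sectors would report an
exponent of 3, i.e. eight logical sectors per physical sector. A drive
whose firmware leaves the word unset decodes as one-to-one, which is why
the kernel cannot see through a drive that does not report.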

One currently shipping drive model on the market isn't reporting the
bigger physical block size. But there are several other 4KB sector
products out there that are working just fine.

--
Martin K. Petersen Oracle Linux Engineering

Re: Questions about 4k sector drives

on 04.05.2010 15:24:42 by Phillip Susi

On 5/3/2010 4:27 PM, Martin K. Petersen wrote:
> We have means of obtaining alignment and physical sector size
> information from both SCSI and ATA drives. But only if the drive
> firmware provides the information, of course.

How? I don't see any such information in the output of hdparm -I for
instance.

> One currently shipping drive model on the market isn't reporting the
> bigger physical block size. But there are several other 4KB sector
> products out there that are working just fine.

The WD drive indeed reports a 512 byte sector size. I also have an
SSD with a 512 KB erase block size, and it seems like these knobs were
intended to cover that as well. But again, the values exported by the
kernel in /sys are 0, and I don't see a way for the drive to report this
information to the kernel.

Re: Questions about 4k sector drives

on 04.05.2010 17:29:47 by martin.petersen

>>>>> "Phillip" == Phillip Susi writes:

>> We have means of obtaining alignment and physical sector size
>> information from both SCSI and ATA drives. But only if the drive
>> firmware provides the information, of course.

Phillip> How?

http://oss.oracle.com/~mkp/docs/linux-advanced-storage.pdf


Phillip> I don't see any such information in the output of hdparm -I for
Phillip> instance.

You need hdparm-9.27 or later.


>> One currently shipping drive model on the market isn't reporting the
>> bigger physical block size. But there are several other 4KB sector
>> products out there that are working just fine.

Phillip> The WD drive indeed reports a 512 byte sector size, but I also
Phillip> have an SSD with a 512kb erase block size and it seems like
Phillip> these knobs were intended to cover that as well, but again, the
Phillip> values exported by the kernel in /sys are 0 and I don't see a
Phillip> way for the drive to report this information to the kernel.

There are no means to report things like the erase block size. A few
years ago there was a push in the industry to define a set of parameters
that would make sense for flash drives. For a variety of reasons,
however, this effort never really took off. There are some things in
the pipeline but it's mostly statistics and life expectancy stuff.

On well-designed drives the erase block size and other physical
characteristics do not matter because the firmware uses an approach akin
to a log-structured filesystem.

For low-end devices (where we could potentially benefit from knowing the
physical characteristics) the problem is that this information is often
considered part of the vendor's secret sauce. Another common concern is
that exporting a set of metrics squarely puts the drive in the "poorly
designed" bucket and that's a marketing disaster.

It is a lengthy process to get stuff pushed through the standards
organizations. Even if the industry had been successful in defining a
set of SSD characteristics it would have taken quite a while for things
to get ratified and show up in devices. The expectation was that early
SSD designs exhibiting side effects from being flash-based would be
obsolete by then.

And as it turns out you can get an SSD with a sane firmware for $100 and
change these days...

--
Martin K. Petersen Oracle Linux Engineering

Re: Questions about 4k sector drives

on 05.05.2010 09:44:48 by John Robinson

Perhaps slightly o/t, but...

On 04/05/2010 16:29, Martin K. Petersen wrote:
[...]
> And as it turns out you can get an SSD with a sane firmware for $100 and
> change these days...

You can? Which one(s)? Would they be good for putting md bitmaps and
filesystem journals on?

Cheers,

John.


Re: Questions about 4k sector drives

on 05.05.2010 09:47:14 by Mikael Abrahamsson

On Wed, 5 May 2010, John Robinson wrote:

> You can? Which one(s)? Would they be good for putting md bitmaps and
> filesystem journals on?

Yes. The Intel X25-V 40G drive is the one I would recommend. I use it as a
system drive in one box. It's not as fast (linear write speed) as the
X25-M drives, but it's definitely a step up from the 5400rpm 2.5" drive I
used in the system before :P

--
Mikael Abrahamsson email: swmike@swm.pp.se

Re: Questions about 4k sector drives

on 10.05.2010 19:13:28 by Bill Davidsen

Mikael Abrahamsson wrote:
> On Wed, 5 May 2010, John Robinson wrote:
>
>> You can? Which one(s)? Would they be good for putting md bitmaps and
>> filesystem journals on?
>
> Yes. The Intel X25-V 40G drive is the one I would recommend, I use it
> as a system drive in one box, it's not as fast (linear write speed) as
> the X25-M drives, but it's definitely a step up from the 5400rpm 2.5"
> drive I used in the system before :P
>
Seconded: nice drive, cheap, many uses for it. In addition to bitmap and
journal (pick the right journal options and see a huge boost), putting
swap out there makes hibernate one of those "wanna see it again" operations.

I really want to use it for write cache, but I guess putting a big
journal there has a similar effect.

--
Bill Davidsen
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein
