[md PATCH 00/34] md patches for 3.1

[md PATCH 00/34] md patches for 3.1 - part 1

on 21.07.2011 04:32:24 by NeilBrown

As 3.0 is fast approaching, it is long past time to publish my patch
queue for 3.1. Much of it has been in -next for a while, and while
that enabled testing, it doesn't really encourage review.

So this is the first of 2 patch-bombs I will be sending, which comprise
all of the changes I (currently) plan to submit for 3.1.

This set is mostly ad-hoc bits and pieces: code clean-up, minor
bug fixes and so forth. Probably the strongest theme is some
refactoring in raid5 to reduce code duplication between RAID5 and
RAID6. (Almost) all similar code has been factored into common
routines.

All "Reviewed-by" lines gratefully accepted ... until early-ish next
week, when I will submit that code to Linus. If anyone is keen to do
some review, the RAID5 changes are where I would be most happy to have
it focused. Remember: if something doesn't look right, it probably
isn't.

Some of these patches have been seen on the list already, but I
thought it best to simply include them all for completeness.

NeilBrown

---

Akinobu Mita (1):
md: use proper little-endian bitops

Christian Dietrich (1):
md/raid: use printk_ratelimited instead of printk_ratelimit

Jonathan Brassow (2):
MD bitmap: Revert DM dirty log hooks
MD: raid1 s/sysfs_notify_dirent/sysfs_notify_dirent_safe

Namhyung Kim (11):
md/raid10: move rdev->corrected_errors counting
md/raid5: move rdev->corrected_errors counting
md/raid1: move rdev->corrected_errors counting
md: get rid of unnecessary casts on page_address()
md: remove ro check in md_check_recovery()
md: introduce link/unlink_rdev() helpers
md/raid5: get rid of duplicated call to bio_data_dir()
md/raid5: use kmem_cache_zalloc()
md/raid10: share pages between read and write bio's during recovery
md/raid10: factor out common bio handling code
md/raid10: get rid of duplicated conditional expression

NeilBrown (19):
md/raid5: Avoid BUG caused by multiple failures.
md/raid10: Improve decision on whether to fail a device with a read error.
md/raid10: Make use of new recovery_disabled handling
md: change managed of recovery_disabled.
md/raid5: finalise new merged handle_stripe.
md/raid5: move some more common code into handle_stripe
md/raid5: move more common code into handle_stripe
md/raid5: unite handle_stripe_dirtying5 and handle_stripe_dirtying6
md/raid5: unite fetch_block5 and fetch_block6
md/raid5: rearrange a test in fetch_block6.
md/raid5: move more code into common handle_stripe
md/raid5: Move code for finishing a reconstruction into handle_stripe.
md/raid5: move stripe_head_state and more code into handle_stripe.
md/raid5: add some more fields to stripe_head_state
md/raid5: unify stripe_head_state and r6_state
md/raid5: move common code into handle_stripe
md/raid5: replace sh->lock with an 'active' flag.
md/raid5: Protect some more code with ->device_lock.
md/raid5: Remove use of sh->lock in sync_request

drivers/md/bitmap.c | 137 +++-----
drivers/md/bitmap.h | 5
drivers/md/md.c | 77 ++---
drivers/md/md.h | 28 +-
drivers/md/raid1.c | 66 ++--
drivers/md/raid1.h | 6
drivers/md/raid10.c | 214 +++++++------
drivers/md/raid10.h | 5
drivers/md/raid5.c | 832 ++++++++++++++-------------------------------------
drivers/md/raid5.h | 45 +--
10 files changed, 505 insertions(+), 910 deletions(-)

--
Signature

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

[md PATCH 02/34] md/raid10: factor out common bio handling code

on 21.07.2011 04:32:24 by NeilBrown

From: Namhyung Kim

When a normal-write or sync-read/write bio completes, we need to
find out which disk the bio belongs to. Factor this common
code out into a separate function.
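The lookup that find_bio_disk() performs can be sketched in userspace as follows. This is a minimal illustration assuming simplified, hypothetical stand-ins (`struct bio`, `struct r10dev`, `struct r10bio`, `find_bio_disk_sketch()`) for the kernel types; `assert()` plays the role of BUG_ON(), and the update_head_pos() side effect is left out.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct bio { int id; };

struct r10dev {
	struct bio *bio;   /* bio issued for this copy */
	int devnum;        /* disk number holding this copy */
};

struct r10bio {
	struct r10dev devs[4];
};

/*
 * Return the disk number of the slot whose bio matches @bio.
 * assert() mirrors BUG_ON(): a completing bio must belong to
 * one of the first @copies slots.
 */
static int find_bio_disk_sketch(const struct r10bio *r10, int copies,
				const struct bio *bio)
{
	int slot;

	for (slot = 0; slot < copies; slot++)
		if (r10->devs[slot].bio == bio)
			break;

	assert(slot < copies);
	return r10->devs[slot].devnum;
}
```

Each completion handler then needs a single call instead of its own copy of the loop.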

Signed-off-by: Namhyung Kim
Signed-off-by: NeilBrown
---

drivers/md/raid10.c | 44 +++++++++++++++++++++++---------------------
1 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d55ae12..e434f1e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -244,6 +244,23 @@ static inline void update_head_pos(int slot, r10bio_t *r10_bio)
r10_bio->devs[slot].addr + (r10_bio->sectors);
}

+/*
+ * Find the disk number which triggered given bio
+ */
+static int find_bio_disk(conf_t *conf, r10bio_t *r10_bio, struct bio *bio)
+{
+ int slot;
+
+ for (slot = 0; slot < conf->copies; slot++)
+ if (r10_bio->devs[slot].bio == bio)
+ break;
+
+ BUG_ON(slot == conf->copies);
+ update_head_pos(slot, r10_bio);
+
+ return r10_bio->devs[slot].devnum;
+}
+
static void raid10_end_read_request(struct bio *bio, int error)
{
int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
@@ -289,13 +306,10 @@ static void raid10_end_write_request(struct bio *bio, int error)
{
int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
r10bio_t *r10_bio = bio->bi_private;
- int slot, dev;
+ int dev;
conf_t *conf = r10_bio->mddev->private;

- for (slot = 0; slot < conf->copies; slot++)
- if (r10_bio->devs[slot].bio == bio)
- break;
- dev = r10_bio->devs[slot].devnum;
+ dev = find_bio_disk(conf, r10_bio, bio);

/*
* this branch is our 'one mirror IO has finished' event handler:
@@ -316,8 +330,6 @@ static void raid10_end_write_request(struct bio *bio, int error)
*/
set_bit(R10BIO_Uptodate, &r10_bio->state);

- update_head_pos(slot, r10_bio);
-
/*
*
* Let's see if all mirrored write operations have finished
@@ -1173,14 +1185,9 @@ static void end_sync_read(struct bio *bio, int error)
{
r10bio_t *r10_bio = bio->bi_private;
conf_t *conf = r10_bio->mddev->private;
- int i,d;
+ int d;

- for (i=0; i<conf->copies; i++)
- if (r10_bio->devs[i].bio == bio)
- break;
- BUG_ON(i == conf->copies);
- update_head_pos(i, r10_bio);
- d = r10_bio->devs[i].devnum;
+ d = find_bio_disk(conf, r10_bio, bio);

if (test_bit(BIO_UPTODATE, &bio->bi_flags))
set_bit(R10BIO_Uptodate, &r10_bio->state);
@@ -1211,18 +1218,13 @@ static void end_sync_write(struct bio *bio, int error)
r10bio_t *r10_bio = bio->bi_private;
mddev_t *mddev = r10_bio->mddev;
conf_t *conf = mddev->private;
- int i,d;
+ int d;

- for (i = 0; i < conf->copies; i++)
- if (r10_bio->devs[i].bio == bio)
- break;
- d = r10_bio->devs[i].devnum;
+ d = find_bio_disk(conf, r10_bio, bio);

if (!uptodate)
md_error(mddev, conf->mirrors[d].rdev);

- update_head_pos(i, r10_bio);
-
rdev_dec_pending(conf->mirrors[d].rdev, mddev);
while (atomic_dec_and_test(&r10_bio->remaining)) {
if (r10_bio->master_bio == NULL) {

[md PATCH 03/34] md/raid10: share pages between read and write bio's during recovery

on 21.07.2011 04:32:24 by NeilBrown

From: Namhyung Kim

When performing a recovery, only the first 2 slots in r10_bio are in use,
for read and write respectively. However, the pages allocated for the
write bio are never used: they are simply swapped with the read bio's
pages when the read completes.

Get rid of those unused pages and let the write bio share the read
bio's pages instead.
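A rough userspace sketch of the page-sharing idea, assuming a hypothetical refcounted `struct page` and helper names (`alloc_page_sketch()`, `get_page_sketch()`, `put_page_sketch()`, `fill_pages()`) in place of the kernel allocator and get_page(): during recovery the write bio borrows each read page and takes an extra reference, so a page is freed only after both bios have dropped it.

```c
#include <stdbool.h>
#include <stdlib.h>

#define NPAGES 4  /* stand-in for RESYNC_PAGES */

/* Hypothetical refcounted page; the kernel's struct page is far richer. */
struct page { int refcount; };

struct page *alloc_page_sketch(void)
{
	struct page *p = malloc(sizeof(*p));

	if (p)
		p->refcount = 1;
	return p;
}

void get_page_sketch(struct page *p) { p->refcount++; }

void put_page_sketch(struct page *p)
{
	if (--p->refcount == 0)
		free(p);
}

/*
 * Fill page vectors for a read bio and a write bio. In recovery (!sync)
 * the write bio shares the read pages; in resync both get private pages.
 * On failure, everything allocated so far is released.
 */
bool fill_pages(struct page *rpages[NPAGES], struct page *wpages[NPAGES],
		bool sync)
{
	int nr = 0, nw = 0;

	for (; nr < NPAGES; nr++)
		if (!(rpages[nr] = alloc_page_sketch()))
			goto fail;
	for (; nw < NPAGES; nw++) {
		if (!sync) {
			/* recovery: borrow the read page, taking a reference */
			wpages[nw] = rpages[nw];
			get_page_sketch(wpages[nw]);
		} else if (!(wpages[nw] = alloc_page_sketch())) {
			goto fail;
		}
	}
	return true;
fail:
	while (nw--)
		put_page_sketch(wpages[nw]);
	while (nr--)
		put_page_sketch(rpages[nr]);
	return false;
}
```

With sharing in place, recovery_request_write() no longer needs to swap pages between the two bios.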

Signed-off-by: Namhyung Kim
Signed-off-by: NeilBrown
---

drivers/md/raid10.c | 23 ++++++++++++-----------
1 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index e434f1e..3715e22 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -123,7 +123,14 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
for (j = 0 ; j < nalloc; j++) {
bio = r10_bio->devs[j].bio;
for (i = 0; i < RESYNC_PAGES; i++) {
- page = alloc_page(gfp_flags);
+ if (j == 1 && !test_bit(MD_RECOVERY_SYNC,
+ &conf->mddev->recovery)) {
+ /* we can share bv_page's during recovery */
+ struct bio *rbio = r10_bio->devs[0].bio;
+ page = rbio->bi_io_vec[i].bv_page;
+ get_page(page);
+ } else
+ page = alloc_page(gfp_flags);
if (unlikely(!page))
goto out_free_pages;

@@ -1360,20 +1367,14 @@ done:
static void recovery_request_write(mddev_t *mddev, r10bio_t *r10_bio)
{
conf_t *conf = mddev->private;
- int i, d;
- struct bio *bio, *wbio;
-
+ int d;
+ struct bio *wbio;

- /* move the pages across to the second bio
+ /*
+ * share the pages with the first bio
* and submit the write request
*/
- bio = r10_bio->devs[0].bio;
wbio = r10_bio->devs[1].bio;
- for (i=0; i < wbio->bi_vcnt; i++) {
- struct page *p = bio->bi_io_vec[i].bv_page;
- bio->bi_io_vec[i].bv_page = wbio->bi_io_vec[i].bv_page;
- wbio->bi_io_vec[i].bv_page = p;
- }
d = r10_bio->devs[1].devnum;

atomic_inc(&conf->mirrors[d].rdev->nr_pending);

[md PATCH 01/34] md/raid10: get rid of duplicated conditional expression

on 21.07.2011 04:32:24 by NeilBrown

From: Namhyung Kim

Variable 'first' is initialized to zero and updated to @rdev->raid_disk
only if that is non-negative. Thus the condition '>= first' always
implies '>= 0', so the latter is not needed.
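The claim can be checked exhaustively over a small range with a throwaway program. `compute_first()`, `old_check()` and `new_check()` are hypothetical helpers that mirror only the relevant lines of raid10_add_disk(); the remaining condition (the mirrors[] slot being empty) is ignored.

```c
#include <stdbool.h>

/* Mirrors raid10_add_disk(): first starts at 0 and only takes
 * raid_disk's value when raid_disk >= 0, so it is never negative. */
int compute_first(int raid_disk)
{
	int first = 0;

	if (raid_disk >= 0)
		first = raid_disk;
	return first;
}

/* Range check on saved_raid_disk before and after the patch. */
bool old_check(int saved, int first) { return saved >= 0 && saved >= first; }
bool new_check(int saved, int first) { return saved >= first; }
```

Because first >= 0 holds in every case, dropping the 'saved >= 0' term cannot change the result.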

Signed-off-by: Namhyung Kim
Signed-off-by: NeilBrown
---

drivers/md/raid10.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6e84668..d55ae12 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1093,8 +1093,7 @@ static int raid10_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
if (rdev->raid_disk >= 0)
first = last = rdev->raid_disk;

- if (rdev->saved_raid_disk >= 0 &&
- rdev->saved_raid_disk >= first &&
+ if (rdev->saved_raid_disk >= first &&
conf->mirrors[rdev->saved_raid_disk].rdev == NULL)
mirror = rdev->saved_raid_disk;
else

[md PATCH 09/34] md/raid5: move common code into handle_stripe

on 21.07.2011 04:32:25 by NeilBrown

There is common code at the start of handle_stripe5 and
handle_stripe6. Move it into handle_stripe.
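A minimal sketch of the resulting shape of handle_stripe(): the shared preamble runs once, then control is dispatched by RAID level. The bit numbers, the non-atomic `test_and_clear()` helper and the stub handlers are hypothetical; the kernel uses atomic bitops on sh->state.

```c
#include <stdbool.h>

/* Hypothetical bit numbers; the kernel's enum assigns its own values. */
enum { SYNC_REQUESTED, SYNCING, INSYNC, DELAYED };

static bool test_and_clear(unsigned long *state, int bit)
{
	bool was_set = (*state >> bit) & 1UL;

	*state &= ~(1UL << bit);
	return was_set;
}

/* Preamble formerly duplicated in handle_stripe5 and handle_stripe6. */
static void stripe_preamble(unsigned long *state)
{
	if (test_and_clear(state, SYNC_REQUESTED)) {
		*state |= 1UL << SYNCING;
		*state &= ~(1UL << INSYNC);
	}
	*state &= ~(1UL << DELAYED);
}

static int handled_by;  /* records which stub ran, for illustration */
static void handle5(unsigned long *state) { (void)state; handled_by = 5; }
static void handle6(unsigned long *state) { (void)state; handled_by = 6; }

void handle_stripe_sketch(unsigned long *state, int level)
{
	stripe_preamble(state);
	if (level == 6)
		handle6(state);
	else
		handle5(state);
}
```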

Signed-off-by: NeilBrown
---

drivers/md/raid5.c | 18 ++++++------------
1 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f8275b5..dfb3d9f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3016,12 +3016,6 @@ static void handle_stripe5(struct stripe_head *sh)
atomic_read(&sh->count), sh->pd_idx, sh->check_state,
sh->reconstruct_state);

- if (test_and_clear_bit(STRIPE_SYNC_REQUESTED, &sh->state)) {
- set_bit(STRIPE_SYNCING, &sh->state);
- clear_bit(STRIPE_INSYNC, &sh->state);
- }
- clear_bit(STRIPE_DELAYED, &sh->state);
-
s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
@@ -3310,12 +3304,6 @@ static void handle_stripe6(struct stripe_head *sh)
sh->check_state, sh->reconstruct_state);
memset(&s, 0, sizeof(s));

- if (test_and_clear_bit(STRIPE_SYNC_REQUESTED, &sh->state)) {
- set_bit(STRIPE_SYNCING, &sh->state);
- clear_bit(STRIPE_INSYNC, &sh->state);
- }
- clear_bit(STRIPE_DELAYED, &sh->state);
-
s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
@@ -3607,6 +3595,12 @@ static void handle_stripe(struct stripe_head *sh)
return;
}

+ if (test_and_clear_bit(STRIPE_SYNC_REQUESTED, &sh->state)) {
+ set_bit(STRIPE_SYNCING, &sh->state);
+ clear_bit(STRIPE_INSYNC, &sh->state);
+ }
+ clear_bit(STRIPE_DELAYED, &sh->state);
+
if (sh->raid_conf->level == 6)
handle_stripe6(sh);
else

[md PATCH 07/34] md/raid5: Protect some more code with ->device_lock.

on 21.07.2011 04:32:25 by NeilBrown

Other places that change or follow dev->towrite and dev->written take
the device_lock as well as sh->lock, so it should really be held in
these places too. Doing so will also allow sh->lock to be discarded.
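The handoff in ops_run_biodrain() can be illustrated with a userspace spinlock built on C11 `atomic_flag`, standing in for conf->device_lock. `struct dev_sketch`, `queue_write()` and `drain_towrite()` are hypothetical simplifications (bio chaining and the sh->lock that this patch still takes are omitted); the point is only that both the producer and the consumer of ->towrite/->written use the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical reduction of struct r5dev: just the two bio list heads. */
struct dev_sketch {
	void *towrite;   /* bios queued by the request path */
	void *written;   /* bios handed to the drain/write path */
};

/* Userspace spinlock standing in for conf->device_lock. */
static atomic_flag device_lock = ATOMIC_FLAG_INIT;

static void lock_device(void)
{
	while (atomic_flag_test_and_set_explicit(&device_lock,
						 memory_order_acquire))
		;  /* spin until the holder releases the flag */
}

static void unlock_device(void)
{
	atomic_flag_clear_explicit(&device_lock, memory_order_release);
}

/* Request path: queue a bio (list chaining omitted for brevity). */
void queue_write(struct dev_sketch *dev, void *bio)
{
	lock_device();
	dev->towrite = bio;
	unlock_device();
}

/*
 * Drain path: move ->towrite to ->written in one critical section, so
 * the request path never observes a half-done handoff.
 */
void *drain_towrite(struct dev_sketch *dev)
{
	void *chosen;

	lock_device();
	chosen = dev->towrite;
	dev->towrite = NULL;
	assert(dev->written == NULL);  /* mirrors BUG_ON(dev->written) */
	dev->written = chosen;
	unlock_device();
	return chosen;
}
```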

with merged fixes by: Namhyung Kim

Signed-off-by: NeilBrown
---

drivers/md/raid5.c | 30 ++++++++++++++++--------------
1 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f2f2ab3..9985138 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1021,10 +1021,12 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
struct bio *wbi;

spin_lock(&sh->lock);
+ spin_lock_irq(&sh->raid_conf->device_lock);
chosen = dev->towrite;
dev->towrite = NULL;
BUG_ON(dev->written);
wbi = dev->written = chosen;
+ spin_unlock_irq(&sh->raid_conf->device_lock);
spin_unlock(&sh->lock);

while (wbi && wbi->bi_sector <
@@ -2141,7 +2143,7 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
raid5_conf_t *conf = sh->raid_conf;
int firstwrite=0;

- pr_debug("adding bh b#%llu to stripe s#%llu\n",
+ pr_debug("adding bi b#%llu to stripe s#%llu\n",
(unsigned long long)bi->bi_sector,
(unsigned long long)sh->sector);

@@ -2167,19 +2169,6 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
bi->bi_next = *bip;
*bip = bi;
bi->bi_phys_segments++;
- spin_unlock_irq(&conf->device_lock);
- spin_unlock(&sh->lock);
-
- pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
- (unsigned long long)bi->bi_sector,
- (unsigned long long)sh->sector, dd_idx);
-
- if (conf->mddev->bitmap && firstwrite) {
- bitmap_startwrite(conf->mddev->bitmap, sh->sector,
- STRIPE_SECTORS, 0);
- sh->bm_seq = conf->seq_flush+1;
- set_bit(STRIPE_BIT_DELAY, &sh->state);
- }

if (forwrite) {
/* check if page is covered */
@@ -2194,6 +2183,19 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
if (sector >= sh->dev[dd_idx].sector + STRIPE_SECTORS)
set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags);
}
+ spin_unlock_irq(&conf->device_lock);
+ spin_unlock(&sh->lock);
+
+ pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
+ (unsigned long long)(*bip)->bi_sector,
+ (unsigned long long)sh->sector, dd_idx);
+
+ if (conf->mddev->bitmap && firstwrite) {
+ bitmap_startwrite(conf->mddev->bitmap, sh->sector,
+ STRIPE_SECTORS, 0);
+ sh->bm_seq = conf->seq_flush+1;
+ set_bit(STRIPE_BIT_DELAY, &sh->state);
+ }
return 1;

overlap:

[md PATCH 11/34] md/raid5: add some more fields to stripe_head_state

on 21.07.2011 04:32:25 by NeilBrown

Adding these three fields (return_bi, blocked_rdev and
dec_preread_active) will allow more common code to be moved
to handle_stripe().
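With these fields in stripe_head_state, helpers can record results through a single state pointer instead of separate locals. A hypothetical sketch of the blocked-device scan, using stand-in types (`struct rdev_sketch`, `struct state_sketch`, `scan_disks()`) and omitting the atomic_inc() of nr_pending that the real loop performs:

```c
#include <stddef.h>

/* Hypothetical stand-in for mdk_rdev_t: only the Blocked flag matters. */
struct rdev_sketch { int blocked; };

/* Hypothetical reduction of struct stripe_head_state's new fields. */
struct state_sketch {
	void *return_bi;                  /* bios to complete at the end */
	struct rdev_sketch *blocked_rdev; /* first blocked device seen */
	int dec_preread_active;           /* drop preread count after I/O */
};

/*
 * Scan disks from the highest index down, as the kernel loop
 * "for (i=disks; i--; )" does, remembering the first blocked rdev.
 * NULL entries model missing devices.
 */
void scan_disks(struct state_sketch *s, struct rdev_sketch **rdevs, int disks)
{
	for (int i = disks; i--; ) {
		struct rdev_sketch *rdev = rdevs[i];

		if (s->blocked_rdev == NULL && rdev && rdev->blocked)
			s->blocked_rdev = rdev;
	}
}
```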

Signed-off-by: NeilBrown
---

drivers/md/raid5.c | 54 +++++++++++++++++++++++-----------------------------
drivers/md/raid5.h | 4 ++++
2 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c32ffb5..3327e82 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3003,12 +3003,9 @@ static void handle_stripe5(struct stripe_head *sh)
{
raid5_conf_t *conf = sh->raid_conf;
int disks = sh->disks, i;
- struct bio *return_bi = NULL;
struct stripe_head_state s;
struct r5dev *dev;
- mdk_rdev_t *blocked_rdev = NULL;
int prexor;
- int dec_preread_active = 0;

memset(&s, 0, sizeof(s));
pr_debug("handling stripe %llu, state=%#lx cnt=%d, pd_idx=%d check:%d "
@@ -3058,9 +3055,9 @@ static void handle_stripe5(struct stripe_head *sh)
if (dev->written)
s.written++;
rdev = rcu_dereference(conf->disks[i].rdev);
- if (blocked_rdev == NULL &&
+ if (s.blocked_rdev == NULL &&
rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
- blocked_rdev = rdev;
+ s.blocked_rdev = rdev;
atomic_inc(&rdev->nr_pending);
}
clear_bit(R5_Insync, &dev->flags);
@@ -3088,15 +3085,15 @@ static void handle_stripe5(struct stripe_head *sh)
spin_unlock_irq(&conf->device_lock);
rcu_read_unlock();

- if (unlikely(blocked_rdev)) {
+ if (unlikely(s.blocked_rdev)) {
if (s.syncing || s.expanding || s.expanded ||
s.to_write || s.written) {
set_bit(STRIPE_HANDLE, &sh->state);
goto unlock;
}
/* There is nothing for the blocked_rdev to block */
- rdev_dec_pending(blocked_rdev, conf->mddev);
- blocked_rdev = NULL;
+ rdev_dec_pending(s.blocked_rdev, conf->mddev);
+ s.blocked_rdev = NULL;
}

if (s.to_fill && !test_bit(STRIPE_BIOFILL_RUN, &sh->state)) {
@@ -3112,7 +3109,7 @@ static void handle_stripe5(struct stripe_head *sh)
* need to be failed
*/
if (s.failed > 1 && s.to_read+s.to_write+s.written)
- handle_failed_stripe(conf, sh, &s, disks, &return_bi);
+ handle_failed_stripe(conf, sh, &s, disks, &s.return_bi);
if (s.failed > 1 && s.syncing) {
md_done_sync(conf->mddev, STRIPE_SECTORS,0);
clear_bit(STRIPE_SYNCING, &sh->state);
@@ -3128,7 +3125,7 @@ static void handle_stripe5(struct stripe_head *sh)
!test_bit(R5_LOCKED, &dev->flags) &&
test_bit(R5_UPTODATE, &dev->flags)) ||
(s.failed == 1 && s.failed_num[0] == sh->pd_idx)))
- handle_stripe_clean_event(conf, sh, disks, &return_bi);
+ handle_stripe_clean_event(conf, sh, disks, &s.return_bi);

/* Now we might consider reading some blocks, either to check/generate
* parity, or to satisfy requests
@@ -3166,7 +3163,7 @@ static void handle_stripe5(struct stripe_head *sh)
}
}
if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
- dec_preread_active = 1;
+ s.dec_preread_active = 1;
}

/* Now to consider new write requests and what else, if anything
@@ -3264,15 +3261,15 @@ static void handle_stripe5(struct stripe_head *sh)
unlock:

/* wait for this device to become unblocked */
- if (unlikely(blocked_rdev))
- md_wait_for_blocked_rdev(blocked_rdev, conf->mddev);
+ if (unlikely(s.blocked_rdev))
+ md_wait_for_blocked_rdev(s.blocked_rdev, conf->mddev);

if (s.ops_request)
raid_run_ops(sh, s.ops_request);

ops_run_io(sh, &s);

- if (dec_preread_active) {
+ if (s.dec_preread_active) {
/* We delay this until after ops_run_io so that if make_request
* is waiting on a flush, it won't continue until the writes
* have actually been submitted.
@@ -3282,19 +3279,16 @@ static void handle_stripe5(struct stripe_head *sh)
IO_THRESHOLD)
md_wakeup_thread(conf->mddev->thread);
}
- return_io(return_bi);
+ return_io(s.return_bi);
}

static void handle_stripe6(struct stripe_head *sh)
{
raid5_conf_t *conf = sh->raid_conf;
int disks = sh->disks;
- struct bio *return_bi = NULL;
int i, pd_idx = sh->pd_idx, qd_idx = sh->qd_idx;
struct stripe_head_state s;
struct r5dev *dev, *pdev, *qdev;
- mdk_rdev_t *blocked_rdev = NULL;
- int dec_preread_active = 0;

pr_debug("handling stripe %llu, state=%#lx cnt=%d, "
"pd_idx=%d, qd_idx=%d\n, check:%d, reconstruct:%d\n",
@@ -3345,9 +3339,9 @@ static void handle_stripe6(struct stripe_head *sh)
if (dev->written)
s.written++;
rdev = rcu_dereference(conf->disks[i].rdev);
- if (blocked_rdev == NULL &&
+ if (s.blocked_rdev == NULL &&
rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
- blocked_rdev = rdev;
+ s.blocked_rdev = rdev;
atomic_inc(&rdev->nr_pending);
}
clear_bit(R5_Insync, &dev->flags);
@@ -3376,15 +3370,15 @@ static void handle_stripe6(struct stripe_head *sh)
spin_unlock_irq(&conf->device_lock);
rcu_read_unlock();

- if (unlikely(blocked_rdev)) {
+ if (unlikely(s.blocked_rdev)) {
if (s.syncing || s.expanding || s.expanded ||
s.to_write || s.written) {
set_bit(STRIPE_HANDLE, &sh->state);
goto unlock;
}
/* There is nothing for the blocked_rdev to block */
- rdev_dec_pending(blocked_rdev, conf->mddev);
- blocked_rdev = NULL;
+ rdev_dec_pending(s.blocked_rdev, conf->mddev);
+ s.blocked_rdev = NULL;
}

if (s.to_fill && !test_bit(STRIPE_BIOFILL_RUN, &sh->state)) {
@@ -3400,7 +3394,7 @@ static void handle_stripe6(struct stripe_head *sh)
* might need to be failed
*/
if (s.failed > 2 && s.to_read+s.to_write+s.written)
- handle_failed_stripe(conf, sh, &s, disks, &return_bi);
+ handle_failed_stripe(conf, sh, &s, disks, &s.return_bi);
if (s.failed > 2 && s.syncing) {
md_done_sync(conf->mddev, STRIPE_SECTORS,0);
clear_bit(STRIPE_SYNCING, &sh->state);
@@ -3425,7 +3419,7 @@ static void handle_stripe6(struct stripe_head *sh)
(s.q_failed || ((test_bit(R5_Insync, &qdev->flags)
&& !test_bit(R5_LOCKED, &qdev->flags)
&& test_bit(R5_UPTODATE, &qdev->flags)))))
- handle_stripe_clean_event(conf, sh, disks, &return_bi);
+ handle_stripe_clean_event(conf, sh, disks, &s.return_bi);

/* Now we might consider reading some blocks, either to check/generate
* parity, or to satisfy requests
@@ -3461,7 +3455,7 @@ static void handle_stripe6(struct stripe_head *sh)
}
}
if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
- dec_preread_active = 1;
+ s.dec_preread_active = 1;
}

/* Now to consider new write requests and what else, if anything
@@ -3561,8 +3555,8 @@ static void handle_stripe6(struct stripe_head *sh)
unlock:

/* wait for this device to become unblocked */
- if (unlikely(blocked_rdev))
- md_wait_for_blocked_rdev(blocked_rdev, conf->mddev);
+ if (unlikely(s.blocked_rdev))
+ md_wait_for_blocked_rdev(s.blocked_rdev, conf->mddev);

if (s.ops_request)
raid_run_ops(sh, s.ops_request);
@@ -3570,7 +3564,7 @@ static void handle_stripe6(struct stripe_head *sh)
ops_run_io(sh, &s);

- if (dec_preread_active) {
+ if (s.dec_preread_active) {
/* We delay this until after ops_run_io so that if make_request
* is waiting on a flush, it won't continue until the writes
* have actually been submitted.
@@ -3581,7 +3575,7 @@ static void handle_stripe6(struct stripe_head *sh)
md_wakeup_thread(conf->mddev->thread);
}

- return_io(return_bi);
+ return_io(s.return_bi);
}

static void handle_stripe(struct stripe_head *sh)
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index d3c61d3..9ceb574 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -248,6 +248,10 @@ struct stripe_head_state {
int failed_num[2];
unsigned long ops_request;
int p_failed, q_failed;
+
+ struct bio *return_bi;
+ mdk_rdev_t *blocked_rdev;
+ int dec_preread_active;
};

/* Flags */

[md PATCH 05/34] md/raid5: get rid of duplicated call to bio_data_dir()

on 21.07.2011 04:32:25 by NeilBrown

From: Namhyung Kim

In raid5's make_request(), once bio_data_dir(@bi) has been determined
it never changes (and cannot), so always use the saved result.

Signed-off-by: Namhyung Kim
Signed-off-by: NeilBrown
---

drivers/md/raid5.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0f71aa9..7148064 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4014,7 +4014,7 @@ static int make_request(mddev_t *mddev, struct bio * bi)
}
}

- if (bio_data_dir(bi) == WRITE &&
+ if (rw == WRITE &&
logical_sector >= mddev->suspend_lo &&
logical_sector < mddev->suspend_hi) {
release_stripe(sh);
@@ -4032,7 +4032,7 @@ static int make_request(mddev_t *mddev, struct bio * bi)
}

if (test_bit(STRIPE_EXPANDING, &sh->state) ||
- !add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
+ !add_stripe_bio(sh, bi, dd_idx, rw)) {
/* Stripe is busy expanding or
* add failed due to overlap. Flush everything
* and wait a while

[md PATCH 08/34] md/raid5: replace sh->lock with an "active" flag.

on 21.07.2011 04:32:25 by NeilBrown

sh->lock is now mainly used to ensure that two threads aren't running
in the locked part of handle_stripe[56] at the same time.

That can more neatly be achieved with an 'active' flag which we set
while running handle_stripe. If we find the flag is set, we simply
requeue the stripe for later by setting STRIPE_HANDLE.

For safety, we take ->device_lock while examining the state of the
stripe and creating a summary in 'stripe_head_state / r6_state'.
This possibly isn't needed, but as shared fields like ->toread and
->towrite are checked, it is safer for now at least.

We leave the label that follows the old 'unlock' called "unlock",
because it will disappear in a few patches, so renaming it seems
pointless.

This leaves the stripe 'locked' for longer as we clear STRIPE_ACTIVE
later, but that is not a problem.
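The 'active' flag can be sketched with C11 atomics, assuming hypothetical bit numbers and helper names; `atomic_fetch_or()` and `atomic_fetch_and()` stand in here for the kernel's test_and_set_bit(), set_bit() and clear_bit(). A second caller that finds ACTIVE already set only sets HANDLE and returns, so the stripe is requeued rather than handled concurrently.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit numbers mirroring STRIPE_ACTIVE and STRIPE_HANDLE. */
enum { ACTIVE = 0, HANDLE = 1 };

static bool test_and_set(atomic_ulong *state, int bit)
{
	return atomic_fetch_or(state, 1UL << bit) & (1UL << bit);
}

static void set_flag(atomic_ulong *state, int bit)
{
	atomic_fetch_or(state, 1UL << bit);
}

static void clear_flag(atomic_ulong *state, int bit)
{
	atomic_fetch_and(state, ~(1UL << bit));
}

/*
 * Returns true if this caller ran @body, false if another thread was
 * already handling the stripe (HANDLE is then set so it is requeued).
 */
bool handle_stripe_once(atomic_ulong *state, void (*body)(void))
{
	clear_flag(state, HANDLE);
	if (test_and_set(state, ACTIVE)) {
		/* already being handled: make sure it is handled again */
		set_flag(state, HANDLE);
		return false;
	}
	if (body)
		body();
	clear_flag(state, ACTIVE);
	return true;
}
```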

Signed-off-by: NeilBrown
---

drivers/md/raid5.c | 26 +++++++++++++-------------
drivers/md/raid5.h | 2 +-
2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9985138..f8275b5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1020,14 +1020,12 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
if (test_and_clear_bit(R5_Wantdrain, &dev->flags)) {
struct bio *wbi;

- spin_lock(&sh->lock);
spin_lock_irq(&sh->raid_conf->device_lock);
chosen = dev->towrite;
dev->towrite = NULL;
BUG_ON(dev->written);
wbi = dev->written = chosen;
spin_unlock_irq(&sh->raid_conf->device_lock);
- spin_unlock(&sh->lock);

while (wbi && wbi->bi_sector <
dev->sector + STRIPE_SECTORS) {
@@ -1322,7 +1320,6 @@ static int grow_one_stripe(raid5_conf_t *conf)
return 0;

sh->raid_conf = conf;
- spin_lock_init(&sh->lock);
#ifdef CONFIG_MULTICORE_RAID456
init_waitqueue_head(&sh->ops.wait_for_ops);
#endif
@@ -1442,7 +1439,6 @@ static int resize_stripes(raid5_conf_t *conf, int newsize)
break;

nsh->raid_conf = conf;
- spin_lock_init(&nsh->lock);
#ifdef CONFIG_MULTICORE_RAID456
init_waitqueue_head(&nsh->ops.wait_for_ops);
#endif
@@ -2148,7 +2144,6 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
(unsigned long long)sh->sector);

- spin_lock(&sh->lock);
spin_lock_irq(&conf->device_lock);
if (forwrite) {
bip = &sh->dev[dd_idx].towrite;
@@ -2184,7 +2179,6 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags);
}
spin_unlock_irq(&conf->device_lock);
- spin_unlock(&sh->lock);

pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
(unsigned long long)(*bip)->bi_sector,
@@ -2201,7 +2195,6 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
overlap:
set_bit(R5_Overlap, &sh->dev[dd_idx].flags);
spin_unlock_irq(&conf->device_lock);
- spin_unlock(&sh->lock);
return 0;
}

@@ -3023,12 +3016,10 @@ static void handle_stripe5(struct stripe_head *sh)
atomic_read(&sh->count), sh->pd_idx, sh->check_state,
sh->reconstruct_state);

- spin_lock(&sh->lock);
if (test_and_clear_bit(STRIPE_SYNC_REQUESTED, &sh->state)) {
set_bit(STRIPE_SYNCING, &sh->state);
clear_bit(STRIPE_INSYNC, &sh->state);
}
- clear_bit(STRIPE_HANDLE, &sh->state);
clear_bit(STRIPE_DELAYED, &sh->state);

s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
@@ -3037,6 +3028,7 @@ static void handle_stripe5(struct stripe_head *sh)

/* Now to look around and see what can be done */
rcu_read_lock();
+ spin_lock_irq(&conf->device_lock);
for (i=disks; i--; ) {
mdk_rdev_t *rdev;

@@ -3099,6 +3091,7 @@ static void handle_stripe5(struct stripe_head *sh)
s.failed_num = i;
}
}
+ spin_unlock_irq(&conf->device_lock);
rcu_read_unlock();

if (unlikely(blocked_rdev)) {
@@ -3275,7 +3268,6 @@ static void handle_stripe5(struct stripe_head *sh)
handle_stripe_expansion(conf, sh, NULL);

unlock:
- spin_unlock(&sh->lock);

/* wait for this device to become unblocked */
if (unlikely(blocked_rdev))
@@ -3318,12 +3310,10 @@ static void handle_stripe6(struct stripe_head *sh)
sh->check_state, sh->reconstruct_state);
memset(&s, 0, sizeof(s));

- spin_lock(&sh->lock);
if (test_and_clear_bit(STRIPE_SYNC_REQUESTED, &sh->state)) {
set_bit(STRIPE_SYNCING, &sh->state);
clear_bit(STRIPE_INSYNC, &sh->state);
}
- clear_bit(STRIPE_HANDLE, &sh->state);
clear_bit(STRIPE_DELAYED, &sh->state);

s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
@@ -3332,6 +3322,7 @@ static void handle_stripe6(struct stripe_head *sh)
/* Now to look around and see what can be done */

rcu_read_lock();
+ spin_lock_irq(&conf->device_lock);
for (i=disks; i--; ) {
mdk_rdev_t *rdev;
dev = &sh->dev[i];
@@ -3395,6 +3386,7 @@ static void handle_stripe6(struct stripe_head *sh)
s.failed++;
}
}
+ spin_unlock_irq(&conf->device_lock);
rcu_read_unlock();

if (unlikely(blocked_rdev)) {
@@ -3580,7 +3572,6 @@ static void handle_stripe6(struct stripe_head *sh)
handle_stripe_expansion(conf, sh, &r6s);

unlock:
- spin_unlock(&sh->lock);

/* wait for this device to become unblocked */
if (unlikely(blocked_rdev))
@@ -3608,10 +3599,19 @@ static void handle_stripe6(struct stripe_head *sh)

static void handle_stripe(struct stripe_head *sh)
{
+ clear_bit(STRIPE_HANDLE, &sh->state);
+ if (test_and_set_bit(STRIPE_ACTIVE, &sh->state)) {
+ /* already being handled, ensure it gets handled
+ * again when current action finishes */
+ set_bit(STRIPE_HANDLE, &sh->state);
+ return;
+ }
+
if (sh->raid_conf->level == 6)
handle_stripe6(sh);
else
handle_stripe5(sh);
+ clear_bit(STRIPE_ACTIVE, &sh->state);
}

static void raid5_activate_delayed(raid5_conf_t *conf)
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index a330011..217a9d4 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -209,7 +209,6 @@ struct stripe_head {
short ddf_layout;/* use DDF ordering to calculate Q */
unsigned long state; /* state flags */
atomic_t count; /* nr of active thread/requests */
- spinlock_t lock;
int bm_seq; /* sequence number for bitmap flushes */
int disks; /* disks in stripe */
enum check_states check_state;
@@ -290,6 +289,7 @@ struct r6_state {
* Stripe state
*/
enum {
+ STRIPE_ACTIVE,
STRIPE_HANDLE,
STRIPE_SYNC_REQUESTED,
STRIPE_SYNCING,
