Re: myisamchk error (duplicate key records)
am 19.09.2011 13:19:52 von Johan De Meersman
----- Original Message -----
> From: "Hank"
>
> While running a -rq on a large table, I got the following error:
>
> myisamchk: warning: Duplicate key for record at 54381140 against
> record at 54380810
>
> How do I find which records are duplicated (without doing the typical
> self-join or "having cnt(*)>1" query)? This table has 144 million
> rows, so that's not really feasible.
Given that the error is a duplicate *key*, "select from group by having count() > 1" is an index-covered query, and should thus be perfectly feasible :-)
What I'm not so sure about, is wether the duplicate key will show up correctly in the index - as that index may be marked corrupt - and so, if it falls back to a full tablescan, it's indeed going to take a long time. If it does, however, there's no other option anyway: the only way to do it fast is an index, and that index is untrustworthy.
--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org
Re: myisamchk error (duplicate key records)
am 19.09.2011 14:31:11 von Hank
--0016364d23efad048004ad4a84a6
Content-Type: text/plain; charset=ISO-8859-1
On Mon, Sep 19, 2011 at 7:19 AM, Johan De Meersman wrote:
> ----- Original Message -----
> > From: "Hank"
> >
> > While running a -rq on a large table, I got the following error:
> >
> > myisamchk: warning: Duplicate key for record at 54381140 against
> > record at 54380810
> >
> > How do I find which records are duplicated (without doing the typical
> > self-join or "having cnt(*)>1" query)? This table has 144 million
> > rows, so that's not really feasible.
>
> Given that the error is a duplicate *key*, "select from group
> by having count() > 1" is an index-covered query, and should thus
> be perfectly feasible :-)
>
> What I'm not so sure about, is wether the duplicate key will show up
> correctly in the index - as that index may be marked corrupt - and so, if it
> falls back to a full tablescan, it's indeed going to take a long time. If it
> does, however, there's no other option anyway: the only way to do it fast is
> an index, and that index is untrustworthy.
Exactly - I can't create an index on the table until I remove the duplicate
records. I suppose I could create a non-unique index on the key fields, and
try that, but then I'd have to (1) create that index, and then (2) do the
full table scan query. Either way, it's going to take a tremendous amount
of time to do that.
Alternatively, it would be most helpful if the tools provided that find and
output the offending record block #s also provided a quick way to actually
print out those offending rows so I could track down how they got in there
in the first place.
-Hank
--0016364d23efad048004ad4a84a6--
Re: myisamchk error (duplicate key records)
am 19.09.2011 14:56:01 von Johan De Meersman
----- Original Message -----
> From: "Hank"
>
> Exactly - I can't create an index on the table until I remove the
> duplicate records.
I was under the impression you were seeing this during a myisamchk run - which indicates you should *already* have a key on that field. Or am I interpreting that wrong?
> I suppose I could create a non-unique index on the key
> fields, and try that, but then I'd have to (1) create that index, and then (2) do
> the full table scan query.
No, if you create the index, you will not do a full tablescan afterwards. That's the whole point of an index :-)
> Either way, it's going to take a tremendous amount of time to do that.
Yes, building the index is, that's true.
> Alternatively, it would be most helpful if the tools provided that
> find and output the offending record block #s also provided a quick way to
> actually print out those offending rows so I could track down how they got in
> there in the first place.
I don't know about that, I'm afraid.
--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org
Re: myisamchk error (duplicate key records)
am 19.09.2011 16:47:05 von Hank
--001485f6cd3eb005d704ad4c6a0f
Content-Type: text/plain; charset=ISO-8859-1
>
> > Exactly - I can't create an index on the table until I remove the
> > duplicate records.
>
> I was under the impression you were seeing this during a myisamchk run -
> which indicates you should *already* have a key on that field. Or am I
> interpreting that wrong?
>
>
I'm trying to rebuild an index after disabling all keys using myisamchk and
adding all 144 million records, so there is no current index on the table.
> I suppose I could create a non-unique index on the key
> > fields, and try that, but then I'd have to (1) create that index, and
> then (2) do
> > the full table scan query.
>
> No, if you create the index, you will not do a full tablescan afterwards.
> That's the whole point of an index :-)
>
>
But in order to create the index, mysql has to do a full table scan AND a
sort, which for 144 million records, is going to take a very long time. So
an un-indexed full table scan without an index (i.e ..."HAVING count(*)>1")
will actually take less time.
So like I said, it would be intuitive and helpful if the tool finding the
dup records actually provided enough info to view, find, and fix them so I
don't have to index, sort, or table scan the table a second time.
--001485f6cd3eb005d704ad4c6a0f--
Re: myisamchk error (duplicate key records)
am 19.09.2011 18:34:44 von Johan De Meersman
----- Original Message -----
> From: "Hank"
>
> I'm trying to rebuild an index after disabling all keys using
> myisamchk and adding all 144 million records, so there is no current index on the
> table.
Ahhh... I didn't realise that.
> But in order to create the index, mysql has to do a full table scan
> AND a sort, which for 144 million records, is going to take a very long
> time. So an un-indexed full table scan without an index (i.e ..."HAVING
> count(*)>1") will actually take less time.
>
> So like I said, it would be intuitive and helpful if the tool finding
> the dup records actually provided enough info to view, find, and fix them
> so I don't have to index, sort, or table scan the table a second time.
Hah, yes. I have to say I'm not aware of anything that does that, maybe one of the Percona guys knows something.
--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org