How to Shuffle data

How to Shuffle data

am 13.07.2011 12:40:55 von Adarsh Sharma

Dear all,

I have million of sites stored in url column of a mysql table.

Some few examples are :-

www.facebook.com/home
adelaide.yourguide.com/news/local/news/entertainment/cd-revi ew-day-and-age-the-killers/1401702.aspx
abclive.in/abclive_business/2393.html
abclive.in/abclive_business/assocham_manufacturing_companies .html
abclive.in/abclive_business/b-ramalinga-raju-satyam-financia l-irregularities.html
aktualne.centrum.cz/report/krimi/clanek.phtml?id=635342
aktualne.centrum.cz/report/krimi/clanek.phtml?id=635306

I want to take the output in a tsv file the sites url in the below forms :

com.faebook.com/home
com.yourguide.adelaide/news/local/news/entertainment/cd-revi ew-day-and-age-the-killers/1401702.aspx
in.abclive/abclive_business/2393.html
in.abclive/abclive_business/assocham_manufacturing_companies .html
in.abclive/abclive_business/b-ramalinga-raju-satyam-financia l-irregularities.html
cz.centrum.aktualne/report/krimi/clanek.phtml?id=635306
cz.centrum.aktualne/report/krimi/clanek.phtml?id=635342

I need to shuffle the . words . Is there any in built function in mysql
to achieve this.

Thanks


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: How to Shuffle data

am 13.07.2011 13:04:25 von Reindl Harald

--------------enig498CBC0032390A3FEFFEE811
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable



Am 13.07.2011 12:40, schrieb Adarsh Sharma:
> I need to shuffle the . words . Is there any in built function in mysql=
to achieve this.
>=20
> Thanks

do not use any random-functions of mysql even if they exists
http://bugs.mysql.com/bug.php?id=3D59253



--------------enig498CBC0032390A3FEFFEE811
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk4de7kACgkQhmBjz394Anm+/wCeIBUntcww6LaHBdRtwEeA jzV0
8v0AniNcCJt98zkl/+K+HD56TmnxBcuG
=X+zm
-----END PGP SIGNATURE-----

--------------enig498CBC0032390A3FEFFEE811--

Re: How to Shuffle data

am 13.07.2011 15:55:42 von walter

What you want to do is not shuffle, but reverse the domain order, starting
from the TLD (top level domain). You may pick the string functions you
need from the string functions page of the manual, but the problem is that
there may be any number of subdomains, so that in principle you would need
a loop construct. If I were in your position, I wouldn't use MySQL to do
this.
ciao
Walter

> Dear all,
>
> I have million of sites stored in url column of a mysql table.
>
> Some few examples are :-
>
> www.facebook.com/home
> adelaide.yourguide.com/news/local/news/entertainment/cd-revi ew-day-and-age-the-killers/1401702.aspx
> abclive.in/abclive_business/2393.html
> abclive.in/abclive_business/assocham_manufacturing_companies .html
> abclive.in/abclive_business/b-ramalinga-raju-satyam-financia l-irregularities.html
> aktualne.centrum.cz/report/krimi/clanek.phtml?id=635342
> aktualne.centrum.cz/report/krimi/clanek.phtml?id=635306
>
> I want to take the output in a tsv file the sites url in the below forms :
>
> com.faebook.com/home
> com.yourguide.adelaide/news/local/news/entertainment/cd-revi ew-day-and-age-the-killers/1401702.aspx
> in.abclive/abclive_business/2393.html
> in.abclive/abclive_business/assocham_manufacturing_companies .html
> in.abclive/abclive_business/b-ramalinga-raju-satyam-financia l-irregularities.html
> cz.centrum.aktualne/report/krimi/clanek.phtml?id=635306
> cz.centrum.aktualne/report/krimi/clanek.phtml?id=635342
>
> I need to shuffle the . words . Is there any in built function in mysql
> to achieve this.
>
> Thanks
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
> http://lists.mysql.com/mysql?unsub=walter@waltertross.com
>



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: How to Shuffle data

am 13.07.2011 15:58:59 von Adarsh Sharma

--------------000805080203010303060807
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Do i need to move output to a CSV file & then write a Java or C code for it.

I think a procedure can do it easily , If I know the proper function for
that.

Thanks

walter@waltertross.com wrote:
> What you want to do is not shuffle, but reverse the domain order, starting
> from the TLD (top level domain). You may pick the string functions you
> need from the string functions page of the manual, but the problem is that
> there may be any number of subdomains, so that in principle you would need
> a loop construct. If I were in your position, I wouldn't use MySQL to do
> this.
> ciao
> Walter
>
>
>> Dear all,
>>
>> I have million of sites stored in url column of a mysql table.
>>
>> Some few examples are :-
>>
>> www.facebook.com/home
>> adelaide.yourguide.com/news/local/news/entertainment/cd-revi ew-day-and-age-the-killers/1401702.aspx
>> abclive.in/abclive_business/2393.html
>> abclive.in/abclive_business/assocham_manufacturing_companies .html
>> abclive.in/abclive_business/b-ramalinga-raju-satyam-financia l-irregularities.html
>> aktualne.centrum.cz/report/krimi/clanek.phtml?id=635342
>> aktualne.centrum.cz/report/krimi/clanek.phtml?id=635306
>>
>> I want to take the output in a tsv file the sites url in the below forms :
>>
>> com.faebook.com/home
>> com.yourguide.adelaide/news/local/news/entertainment/cd-revi ew-day-and-age-the-killers/1401702.aspx
>> in.abclive/abclive_business/2393.html
>> in.abclive/abclive_business/assocham_manufacturing_companies .html
>> in.abclive/abclive_business/b-ramalinga-raju-satyam-financia l-irregularities.html
>> cz.centrum.aktualne/report/krimi/clanek.phtml?id=635306
>> cz.centrum.aktualne/report/krimi/clanek.phtml?id=635342
>>
>> I need to shuffle the . words . Is there any in built function in mysql
>> to achieve this.
>>
>> Thanks
>>
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:
>> http://lists.mysql.com/mysql?unsub=walter@waltertross.com
>>
>>
>
>
>
>


--------------000805080203010303060807--

Re: How to Shuffle data

am 13.07.2011 23:03:10 von (Halász Sándor) hsv

>>>> 2011/07/13 16:10 +0530, Adarsh Sharma >>>>
www.facebook.com/home
adelaide.yourguide.com/news/local/news/entertainment/cd-revi ew-day-and-age-the-killers/1401702.aspx
abclive.in/abclive_business/2393.html
abclive.in/abclive_business/assocham_manufacturing_companies .html
abclive.in/abclive_business/b-ramalinga-raju-satyam-financia l-irregularities.html
aktualne.centrum.cz/report/krimi/clanek.phtml?id=635342
aktualne.centrum.cz/report/krimi/clanek.phtml?id=635306

I want to take the output in a tsv file the sites url in the below forms :

com.faebook.com/home
com.yourguide.adelaide/news/local/news/entertainment/cd-revi ew-day-and-age-the-killers/1401702.aspx
in.abclive/abclive_business/2393.html
in.abclive/abclive_business/assocham_manufacturing_companies .html
in.abclive/abclive_business/b-ramalinga-raju-satyam-financia l-irregularities.html
cz.centrum.aktualne/report/krimi/clanek.phtml?id=635306
cz.centrum.aktualne/report/krimi/clanek.phtml?id=635342

I need to shuffle the . words . Is there any in built function in mysql to achieve this.
<<<<<<<<


Well, this will give you the domain name: SUBSTRING_INDEX(url, '/', 1). After that, you reallie want a version of "FIND_IN_SET" that takes a number and yields a string, but I have not seen such in MySQL. That leaves you with "LOCATE" to find each dot, one by one, and "SUBSTRING" to pick each word out--or nested cases of "SUBSTRING_INDEX":
SUBSTRING_INDEX(SUBSTRING_INDEX(dom, '.', i), '.', -1)
--and "SUBSTRING_INDEX" is very obliging, the only way, using only it, to determine that one has reached the limit of separators is that
SUBSTRING_INDEX(dom, ',', i) = SUBSTRING_INDEX(dom, '.', i+1)
.. And yes, this is a loop within an SQL procedure or function.

Are you, aside from 'com.faebook.com', only reversing the words? That is much easier than randomly picking them for the outcome--and guaranteed to be different from the original, relevant because most domain names are so short that a random permutation of their words is quite likly to be the same as the original: with only three, the probability is one sixth, with only twain, one half.


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: How to Shuffle data

am 14.07.2011 00:18:13 von Shawn Wilson

--90e6ba53a4a8d32acb04a7faca0a
Content-Type: text/plain; charset=ISO-8859-1

On Jul 13, 2011 6:41 AM, "Adarsh Sharma" wrote:
>
> Dear all,
>
> I have million of sites stored in url column of a mysql table.
>

> I need to shuffle the . words . Is there any in built function in mysql to
achieve this.
>

Why would you want to do this in mysql? What's your programming language of
choice? Do you have a favorite db api and csv library for that language?

Then its just something like:
(+\.^\.){3}(/*$)
And do something with them. (warning, I just attempted at regex on a phone)

--90e6ba53a4a8d32acb04a7faca0a--

Re: How to Shuffle data

am 14.07.2011 02:34:54 von rajlist

In infinite wisdom Adarsh Sharma wrote:

> Dear all,
>
> I have million of sites stored in url column of a mysql table.
>
> Some few examples are :-
>
> www.facebook.com/home
> I want to take the output in a tsv file the sites url in the below forms :
>
> com.faebook.com/home
> com.yourguide.adelaide/news/local/news/entertainment/cd-revi ew-day-and-age-the-killers/1401702.aspx
> I need to shuffle the . words . Is there any in built function in mysql
> to achieve this.

Dealing with urls is a non trivial problem. Have a look at google-url
library for a robust solution.


--
Raj Shekhar
-
Tonight at 11 - DOOM!
-
Read the latest at my blog: "a bag of tips"



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

RE: How to Shuffle data

am 14.07.2011 16:50:12 von Jerry Schwartz

There are a couple of problems with using any database for doing this.

- Rows in a table are inherently in no particular order. That means they are
neither sorted nor random.
- Depending upon the keys you are using (an auto-increment field for example),
you might be able to select a random row; but that wouldn't guarantee that you
wouldn't get the same row twice.

There is a way around this that sounds tempting, but it would be hideously
impractical:

1. Create a copy of your original table, `t2`, with an auto-increment index
`aui` (either from the original table or created at this point.
2. Create yet another table, `t3`, with the same structure but no data in it.
3. Select a random record from `t2` using
SELECT ROUND(RAND()*MAX(`t2`.`aui`)) FROM `t2` INTO @RANDREC;
4. Copy that randomly-selected record into `t3` using
INSERT INTO `t3` SELECT * FROM `t2` WHERE `t2`.`aui` = @RANDREC;
5. Remove the selected record from `t2` using
DELETE FROM `t2` WHERE `t2`.`aui` = @RANDREC;
6. Loop back to step 3 until `t2` is empty.

Here's why, tempting as it is, it wouldn't work: as the number of records in
`t2` dwindles, it will become less and less likely that @RANDREC will actually
match an existing value of `t2`.`aui`. By the time you got down to a handful
of records, it might take years to find one.

Now if the LIMIT clause accepted variables instead of constants, you could
replace steps 3 and 4 with something like this:

SELECT ROUND(RAND()*COUNT(`t2`.`aui`)) INTO @RANDSTART;
SELECT FROM `t2` LIMIT @RANDSTART, 1;

And that would actually work (give or take some corner cases I haven't thought
about very hard).

In point of fact, that's basically what you need to do whether you do it in
MySQL or in an external program. The difference is that in an external program
you can effectively renumber your rows as you remove them, so that it won't
get harder and harder to find an actual row. Even then, a random shuffle is
likely to be time-consuming.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: jerry@gii.co.jp
Web site: www.the-infoshop.com

>-----Original Message-----
>From: Adarsh Sharma [mailto:adarsh.sharma@orkash.com]
>Sent: Wednesday, July 13, 2011 9:59 AM
>To: walter@waltertross.com
>Cc: mysql@lists.mysql.com
>Subject: Re: How to Shuffle data
>
>Do i need to move output to a CSV file & then write a Java or C code for it.
>
>I think a procedure can do it easily , If I know the proper function for
>that.
>
>Thanks
>
>walter@waltertross.com wrote:
>> What you want to do is not shuffle, but reverse the domain order, starting
>> from the TLD (top level domain). You may pick the string functions you
>> need from the string functions page of the manual, but the problem is that
>> there may be any number of subdomains, so that in principle you would need
>> a loop construct. If I were in your position, I wouldn't use MySQL to do
>> this.
>> ciao
>> Walter
>>
>>
>>> Dear all,
>>>
>>> I have million of sites stored in url column of a mysql table.
>>>
>>> Some few examples are :-
>>>
>>> www.facebook.com/home
>>> adelaide.yourguide.com/news/local/news/entertainment/cd-revi ew-day-and-age-
>the-killers/1401702.aspx
>>> abclive.in/abclive_business/2393.html
>>> abclive.in/abclive_business/assocham_manufacturing_companies .html
>>> abclive.in/abclive_business/b-ramalinga-raju-satyam-financia l-
>irregularities.html
>>> aktualne.centrum.cz/report/krimi/clanek.phtml?id=635342
>>> aktualne.centrum.cz/report/krimi/clanek.phtml?id=635306
>>>
>>> I want to take the output in a tsv file the sites url in the below forms :
>>>
>>> com.faebook.com/home
>>> com.yourguide.adelaide/news/local/news/entertainment/cd-revi ew-day-and-age-
>the-killers/1401702.aspx
>>> in.abclive/abclive_business/2393.html
>>> in.abclive/abclive_business/assocham_manufacturing_companies .html
>>> in.abclive/abclive_business/b-ramalinga-raju-satyam-financia l-
>irregularities.html
>>> cz.centrum.aktualne/report/krimi/clanek.phtml?id=635306
>>> cz.centrum.aktualne/report/krimi/clanek.phtml?id=635342
>>>
>>> I need to shuffle the . words . Is there any in built function in mysql
>>> to achieve this.
>>>
>>> Thanks
>>>
>>>
>>> --
>>> MySQL General Mailing List
>>> For list archives: http://lists.mysql.com/mysql
>>> To unsubscribe:
>>> http://lists.mysql.com/mysql?unsub=walter@waltertross.com
>>>
>>>
>>
>>
>>
>>





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: How to Shuffle data

am 14.07.2011 17:42:29 von Shawn Wilson

On Thu, Jul 14, 2011 at 10:50, Jerry Schwartz wrote:
> There are a couple of problems with using any database for doing this.
>

you're probably right. that said....

> - Rows in a table are inherently in no particular order. That means they =
are
> neither sorted nor random.
> - Depending upon the keys you are using (an auto-increment field for exam=
ple),
> you might be able to select a random row; but that wouldn't guarantee tha=
t you
> wouldn't get the same row twice.
>
> There is a way around this that sounds tempting, but it would be hideousl=
y
> impractical:
>

this actually doesn't sound very temping to me :)

> 1. Create a copy of your original table, `t2`, with an auto-increment ind=
ex
> `aui` (either from the original table or created at this point.
> 2. Create yet another table, `t3`, with the same structure but no data in=
it.
> 3. Select a random record from `t2` using
> =A0 SELECT ROUND(RAND()*MAX(`t2`.`aui`)) FROM `t2` INTO @RANDREC;
> 4. Copy that randomly-selected record into `t3` using
> =A0 INSERT INTO `t3` SELECT * FROM `t2` WHERE `t2`.`aui` =3D @RANDREC;
> 5. Remove the selected record from `t2` using
> =A0 DELETE FROM `t2` WHERE `t2`.`aui` =3D @RANDREC;
> 6. Loop back to step 3 until `t2` is empty.
>
> Here's why, tempting as it is, it wouldn't work: as the number of records=
in
> `t2` dwindles, it will become less and less likely that @RANDREC will act=
ually
> match an existing value of `t2`.`aui`. By the time you got down to a hand=
ful
> of records, it might take years to find one.
>
> Now if the LIMIT clause accepted variables instead of constants, you coul=
d
> replace steps 3 and 4 with something like this:
>
> =A0 SELECT ROUND(RAND()*COUNT(`t2`.`aui`)) INTO @RANDSTART;
> =A0 SELECT FROM `t2` LIMIT @RANDSTART, 1;
>
> And that would actually work (give or take some corner cases I haven't th=
ought
> about very hard).
>

i've actually done find_in_string (or whatever it was under sql
server) and then a substr -1 to get all of the titles where pages were
stored in a table. that said, it took me a few hours to figure it out,
debug, and generate /something/ that looked right. it would've been
hella easier to just write a program that did a: select data from
table
while( data ) {
new_data =3D regex of data
print new_data
}

(i wrote that in pseudo code but i'd do 10 lines in perl and probably
have it done in 10 minutes)

> In point of fact, that's basically what you need to do whether you do it =
in
> MySQL or in an external program. The difference is that in an external pr=
ogram
> you can effectively renumber your rows as you remove them, so that it won=
't
> get harder and harder to find an actual row. Even then, a random shuffle =
is
> likely to be time-consuming.
>
> Regards,
>
> Jerry Schwartz
> Global Information Incorporated
> 195 Farmington Ave.
> Farmington, CT 06032
>
> 860.674.8796=A0/ FAX: 860.674.8341
> E-mail: jerry@gii.co.jp
> Web site: www.the-infoshop.com
>
>>-----Original Message-----
>>From: Adarsh Sharma [mailto:adarsh.sharma@orkash.com]
>>Sent: Wednesday, July 13, 2011 9:59 AM
>>To: walter@waltertross.com
>>Cc: mysql@lists.mysql.com
>>Subject: Re: How to Shuffle data
>>
>>Do i need to move output to a CSV file & then write a Java or C code for =
it.

if you want, you can also select your pk from your data and write the
pk and data out in a new table. i've never done java and never looked
at any c db api's, so ymmv. but yes, do it externally.

>>
>>I think a procedure can do it easily , If I know the proper function for
>>that.

if you need this done on the fly, an sp will be what you'll want (or
memcache and an event handler but i hate plugging more db's into a
project until i'm killing my current setup). however, an sp to do this
will make your db slower. so, if you don't need data on the fly, write
a program and a cron (or at) job to run when you sleep and be done
with it.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=3Dgcdmg-mysql-2@m.gmane.o rg

Re: How to Shuffle data

am 15.07.2011 02:07:43 von (Halász Sándor) hsv

>>>> 2011/07/13 19:28 +0530, Adarsh Sharma >>>>
I think a procedure can do it easily , If I know the proper function for that.
<<<<<<<<
Well, here is a procedure that copies from one table, strips off the leading slash-separated part and reverses it by dots, and inserts the original, the reversed, and the trailing into a temporary table:

CREATE PROCEDURE URLR()
MODIFIES SQL DATA
BEGIN
DECLARE F, G INT;
DECLARE R, S, T VARCHAR(199);
DECLARE EOF BOOL DEFAULT 0;
DECLARE X CURSOR FOR SELECT URL FROM URL;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET EOF = 1;
CREATE TEMPORARY TABLE UU
( A VARCHAR(199) COMMENT 'original leading part'
, B VARCHAR(199) COMMENT 'dot-reversed leading part'
, C VARCHAR(199) COMMENT 'trailing part'
);

OPEN X;
FETCH X INTO S;
WHILE EOF = 0 DO
SET T = SUBSTRING_INDEX(S, '/', 1);
SET S = SUBSTRING(S, CHAR_LENGTH(T)+2);
SET G = 1, F = LOCATE('.', T), R = NULL;
WHILE F > 0 DO
SET R = CONCAT_WS('.', SUBSTRING(T, G, F-G), R);
SET G = F+1;
SET F = LOCATE('.', T, G);
END WHILE;
SET R = CONCAT_WS('.', SUBSTRING(T, G), R);
INSERT INTO UU VALUE (T, R, S);
FETCH X INTO S;
END WHILE;
CLOSE X;
END

It seemed to me that in your examples you only reversed the domain name around its dot-separated words, and that this does.


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org