Sorting an extremely LARGE file
Sorting an extremely LARGE file
on 07.08.2011 17:28:14 by Ramprasad Prasad
I have a file that contains records of customer interaction.
The first column of the file is the batch number (INT), and the other columns
are date time, close time, etc.

I have to sort the entire file in order of the first column, but the
problem is that the file is extremely huge.

For the largest customer it contains 1,100 million records and the file is
44 GB!
How can I sort this big a file?
--
Thanks
Ram
Re: Sorting an extremely LARGE file
on 07.08.2011 17:42:54 by Shawn H Corey
On 11-08-07 11:28 AM, Ramprasad Prasad wrote:
> I have a file that contains records of customer interaction
> The first column of the file is the batch number(INT) , and other columns
> are date time , close time etc etc
>
> I have to sort the entire file in order of the first column .. but the
> problem is that the file is extremely huge.
>
> For the largest customer it contains 1100 million records and the file is
> 44GB !
> how can I sort this big a file
>
First, consider putting it in a database.
Split the file into little ones, sort them, merge-sort them back together.
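A minimal Perl sketch of that split-and-sort phase (the file name, chunk size and key format are assumptions; the merge step is spelled out in the next message):

#!/usr/bin/perl
use strict;
use warnings;

# Read the big file a chunk at a time, sort each chunk in memory by its
# first (integer) column, and write it out as chunk-0000.txt, chunk-0001.txt, ...
my $chunk_size = 1_000_000;          # lines per chunk; tune to available RAM
my $chunk_no   = 0;
my @buffer;

open my $in, '<', 'huge.txt' or die "huge.txt: $!";
while ( my $line = <$in> ) {
    push @buffer, $line;
    flush_chunk() if @buffer >= $chunk_size;
}
flush_chunk() if @buffer;
close $in;

sub flush_chunk {
    my $name = sprintf 'chunk-%04d.txt', $chunk_no++;
    open my $out, '>', $name or die "$name: $!";
    print {$out} sort { ($a =~ /^(\d+)/)[0] <=> ($b =~ /^(\d+)/)[0] } @buffer;
    close $out or die "$name: $!";
    @buffer = ();
}

With 1,100 million rows and one million lines per chunk, that leaves roughly 1,100 sorted chunks to merge afterwards.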
--
Just my 0.00000002 million dollars worth,
Shawn
Confusion is the first step of understanding.
Programming is as much about organization and communication
as it is about coding.
The secret to great software: Fail early & often.
Eliminate software piracy: use only FLOSS.
"Make something worthwhile." -- Dear Hunter
Re: Sorting an extremely LARGE file
on 07.08.2011 17:54:05 by Shawn H Corey
On 11-08-07 11:46 AM, Ramprasad Prasad wrote:
> I used a mysql database , but the order by clause used to hang the
> process indefinitely
> If I sort files in smaller chunks how can I merge them back ??
>
Please use "Reply All" when responding to a message on this list.
You need two temporary files and lots of disk space.
1. Open the first and second sorted files.
2. Read one record from each.
3. Write the lesser record to the first temporary file.
4. Read another record from the file where you got the record you wrote.
5. If not eof, goto 3.
6. Write the remaining of the other file to the end of the temporary file.
Repeat the above with the first temporary file and the third sorted
file, writing the result to the second temporary file.
Repeat the above with the second temporary file and the fourth sorted
file, writing the result to the first temporary file.
And so on...
Rename the final temporary file to your sorted file name.
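A Perl sketch of that two-file merge, assuming the sort key is the leading integer column and the file names are placeholders:

#!/usr/bin/perl
use strict;
use warnings;

# Merge two files that are each already sorted by their first (integer)
# column into one sorted output, following the steps above.
sub key { ( $_[0] =~ /^(\d+)/ )[0] }

open my $in1, '<', 'sorted-1.txt' or die "sorted-1.txt: $!";
open my $in2, '<', 'sorted-2.txt' or die "sorted-2.txt: $!";
open my $out, '>', 'merged.txt'   or die "merged.txt: $!";

my $x = <$in1>;
my $y = <$in2>;
while ( defined $x and defined $y ) {
    if ( key($x) <= key($y) ) { print {$out} $x; $x = <$in1>; }
    else                      { print {$out} $y; $y = <$in2>; }
}
# One input is exhausted; copy whatever remains of the other (step 6).
print {$out} $x, <$in1> if defined $x;
print {$out} $y, <$in2> if defined $y;

Repeating this pairwise, as described above, works; an N-way merge that keeps all the chunk files open at once and always emits the smallest current line avoids the repeated passes.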
Re: Sorting an extremely LARGE file
on 07.08.2011 18:01:17 by Ramprasad Prasad
On 7 August 2011 21:24, Shawn H Corey wrote:
> On 11-08-07 11:46 AM, Ramprasad Prasad wrote:
>
>> I used a mysql database , but the order by clause used to hang the
>> process indefinitely
>> If I sort files in smaller chunks how can I merge them back ??
>>
>>
> Please use "Reply All" when responding to a message on this list.
>
> You need two temporary files and lots of disk space.
>
> 1. Open the first and second sorted files.
> 2. Read one record from each.
> 3. Write the lesser record to the first temporary file.
> 4. Read another record from the file where you got the record you wrote.
> 5. If not eof, goto 3.
> 6. Write the remaining of the other file to the end of the temporary file.
>
> Repeat the above with the first temporary file and the third sorted file,
> writing the result to the second temporary file.
>
> Repeat the above with the second temporary file and the fourth sorted file,
> writing the result to the first temporary file.
>
> And so on...
>
> Rename the final temporary file to your sorted file name.
>
>
>
Would there be a CPAN module already doing this?
Re: Sorting an extremely LARGE file
on 07.08.2011 18:26:55 by rvtol+usenet
On 2011-08-07 17:28, Ramprasad Prasad wrote:
> I have a file that contains records of customer interaction
> The first column of the file is the batch number(INT) , and other columns
> are date time , close time etc etc
>
> I have to sort the entire file in order of the first column .. but the
> problem is that the file is extremely huge.
>
> For the largest customer it contains 1100 million records and the file is
> 44GB !
> how can I sort this big a file
I would use MySQL.
An alternative is the Linux sort executable.
To split up the file, as Shawn suggested, you could use Perl.
Split for example based on a few initial characters.
Then sort each file independently, and concat them.
(BTW, are the rows representing fixed-width records?)
Using a database is fine for this. I think you must have been using it
wrongly.
--
Ruud
Re: Sorting an extremely LARGE file
on 07.08.2011 20:12:05 by Rajeev Prasad
Hi, you can try this: first get only that field (sed/awk/perl) which you want
to sort on in a file. Sort that file, which I assume would be a lot less in
size than your current file/table. Then run a loop on the main file using the
sorted file as a variable.

Here is the logic in shell:

awk '{print $}' > tmp-file

sort

for id in `cat `;do grep $id >> sorted-large-file;done
Re: Sorting an extremely LARGE file
on 07.08.2011 20:14:40 by Paul Johnson
On Sun, Aug 07, 2011 at 08:58:14PM +0530, Ramprasad Prasad wrote:
> I have a file that contains records of customer interaction
> The first column of the file is the batch number(INT) , and other columns
> are date time , close time etc etc
>
> I have to sort the entire file in order of the first column .. but the
> problem is that the file is extremely huge.
>
> For the largest customer it contains 1100 million records and the file is
> 44GB !
> how can I sort this big a file
Is there any reason not to use the system sort? GNU sort uses an
external R-way merge. It's designed for this sort of thing.
--
Paul Johnson - paul@pjcj.net
http://www.pjcj.net
Re: Sorting an extremely LARGE file
on 07.08.2011 21:20:02 by Shawn Wilson
On Aug 7, 2011 1:15 PM, "Paul Johnson" wrote:
>
> On Sun, Aug 07, 2011 at 08:58:14PM +0530, Ramprasad Prasad wrote:
>
> > I have a file that contains records of customer interaction
> > The first column of the file is the batch number(INT) , and other
columns
> > are date time , close time etc etc
> >
> > I have to sort the entire file in order of the first column .. but the
> > problem is that the file is extremely huge.
> >
> > For the largest customer it contains 1100 million records and the file
is
> > 44GB !
> > how can I sort this big a file
>
> Is there any reason not to use the system sort? GNU sort uses an
> external R-way merge. It's designed for this sort of thing.
>
The Unix sort is pretty fast and it will work. The problem with it is that
it seems to buffer overflow somewhere between 2 and 4 gigs, IIRC. A database
is perfect for this. However, I think the problem was that mysql's order by
is slow as hell. It can be sped up (slightly) with an index. You might
consider postgresql as their order by /should/ be quite a bit faster. You
might also try mongo or couch - though you'll put the sort logic in the
script and I haven't used either in perl.
If you've already got it in a db, I'd create the index, start the query,
watch your resources get pegged, and wait. You'll get it eventually. :)
Re: Sorting an extremely LARGE file
on 07.08.2011 21:30:40 by Shawn H Corey
On 11-08-07 03:20 PM, shawn wilson wrote:
> It can be sped up (slightly) with an index.
Indexes in SQL don't normally speed up sorting. What they're best at is
selecting a limited number of records, usually less than 10% of the
total. Otherwise, they just get in the way.
The best you can do with a database is to keep the table sorted by the
key most commonly used. This is different than an index. An index is
an additional file that records the keys and the offset to the record in
the table file. The index file is sorted by its key.
Re: Sorting an extremely LARGE file
on 07.08.2011 21:58:35 by Rob Dixon
On 07/08/2011 20:30, Shawn H Corey wrote:
> On 11-08-07 03:20 PM, shawn wilson wrote:
>>
>> It can be sped up (slightly) with an index.
>
> Indexes in SQL don't normally speed up sorting. What they're best at is
> selecting a limited number of records, usually less than 10% of the
> total. Otherwise, they just get in the way.
>
> The best you can do with a database is to keep the table sorted by the
> key most commonly used. This is different than an index. An index is an
> additional file that records the keys and the offset to the record in
> the table file. The index file is sorted by its key.
Exactly. So to sort a database in the order of its key field all that is
necessary is to read sequentially through the index and pull out the
corresponding record.
I would suggest that the OP could do this 'manually'. i.e. build a
separate index file with just the key fields and pointers into the
primary file. Once that is done the operation is trivial: even more so
if the primary file has fixed-length records (and if not I would like a
word with the person who decided on a 44G file that must be read
sequentially!).
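A rough sketch of that manual index in Perl. It assumes the key is the leading integer and that records are newline-terminated, and it keeps the index in memory for brevity; for 1,100 million rows the index itself would run to many gigabytes, so in practice it would have to be packed or written to its own file and sorted externally.

#!/usr/bin/perl
use strict;
use warnings;

# Pass 1: note the byte offset of every record, keyed by its batch number.
open my $in, '<', 'huge.txt' or die "huge.txt: $!";
my @index;                                   # [ batch_number, byte_offset ]
my $offset = 0;
while ( my $line = <$in> ) {
    my ($batch) = $line =~ /^(\d+)/;
    push @index, [ $batch, $offset ];
    $offset = tell $in;                      # start of the next record
}

# Pass 2: walk the index in key order and pull the records out in that order.
open my $out, '>', 'sorted.txt' or die "sorted.txt: $!";
for my $entry ( sort { $a->[0] <=> $b->[0] } @index ) {
    seek $in, $entry->[1], 0 or die "seek: $!";
    print {$out} scalar <$in>;
}

The second pass is one seek per record, so it only pays off if the disk can absorb that much random I/O; with fixed-length records, as Rob notes, the offsets can simply be computed from the record number.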
Cheers,
Rob
Re: Sorting an extremely LARGE file
on 07.08.2011 22:53:22 by Shawn Wilson
On Sun, Aug 7, 2011 at 15:58, Rob Dixon wrote:
> On 07/08/2011 20:30, Shawn H Corey wrote:
>>
>> On 11-08-07 03:20 PM, shawn wilson wrote:
>>>
>>> It can be sped up (slightly) with an index.
>>
>> Indexes in SQL don't normally speed up sorting. What they're best at is
>> selecting a limited number of records, usually less than 10% of the
>> total. Otherwise, they just get in the way.
>>
>> The best you can do with a database is to keep the table sorted by the
>> key most commonly used. This is different than an index. An index is an
>> additional file that records the keys and the offset to the record in
>> the table file. The index file is sorted by its key.
>
> Exactly. So to sort a database in the order of its key field all that is
> necessary is to read sequentially through the index and pull out the
> corresponding record.
>
> I would suggest that the OP could do this 'manually'. i.e. build a
> separate index file with just the key fields and pointers into the
> primary file. Once that is done the operation is trivial: even more so
> if the primary file has fixed-length records (and if not I would like a
> word with the person who decided on a 44G file that must be read
> sequentially!).
>
I really do think it could be done in Perl pretty easily:

my %idx;
while ( my $line = <> ) {
    my ($key) = split ' ', $line;    # first column is the batch number
    push @{ $idx{$key} }, $.;        # keep every line number, duplicates included
}
then you have a nice data structure of your values and duplicates
along with line numbers. you can then go and loop again and pull out
your lines.
i still think this is the wrong approach as it is in a db, should be
in a db and should never have been put in a 44G flat file in the first
place. but....
Re: Sorting an extremely LARGE file
on 07.08.2011 23:07:20 by Uri Guttman
>>>>> "RP" == Rajeev Prasad writes:
RP> hi, you can try this: first get only that field (sed/awk/perl)
RP> which you want to sort on in a file. sort that file which i assume
RP> would be a lot less in size than your current file/table. then run a
RP> loop on the main file using sorted file as variable.
RP>
RP> here is the logic in shell:
RP>
RP> awk '{print $}' > tmp-file
RP>
RP> sort
RP>
RP> for id in `cat `;do grep $id >> sorted-large-file;done
have you thought about the time this will take? you are doing an O( N**2
) grep there. you are looping over all N keys and then scanning the file
N lines for each key. that will take a very long time for such a large
file. as others have said, either use the sort utility or do a
merge/sort on the records. your way is effectively a slow bubble sort!
uri
--
Uri Guttman -- uri AT perlhunter DOT com --- http://www.perlhunter.com
------------ Perl Developer Recruiting and Placement Services -------------
----- Perl Code Review, Architecture, Development, Training, Support -------
Re: Sorting an extremely LARGE file
on 08.08.2011 07:10:12 by Ramprasad Prasad
Using the system Linux sort ... does not help.
On my dual quad core machine (8 GB RAM), sort -n file takes 10
minutes and in the end produces no output.

When I put this data in MySQL, there is an index on the order-by
field ... but I guess keys don't help when you are selecting the
entire table.

I guess there is a serious need for re-architecting, rather than
creating such monstrous files, but when people work with legacy systems
which worked fine when there was lower usage, and now you tell them they
need an overhaul because the current system doesn't scale ... that
takes a lot of convincing.
On 8/8/11, Uri Guttman wrote:
>>>>>> "RP" == Rajeev Prasad writes:
>
> RP> hi, you can try this: first get only that field (sed/awk/perl)
> RP> whihc you want to sort on in a file. sort that file which i assume
> RP> would be lot less in size then your current file/table. then run a
> RP> loop on the main file using sorted file as variable.
>
> RP>
> RP> here is the logic in shell:
> RP>
> RP> awk '{print $}' > tmp-file
> RP>
> RP> sort
> RP>
>
> RP> for id in `cat `;do grep $id >>
> sorted-large-file;done
>
> have you thought about the time this will take? you are doing an O( N**2
> ) grep there. you are looping over all N keys and then scanning the file
> N lines for each key. that will take a very long time for such a large
> file. as others have said, either use the sort utility or do a
> merge/sort on the records. your way is effectively a slow bubble sort!
>
> uri
>
> --
> Uri Guttman -- uri AT perlhunter DOT com --- http://www.perlhunter.com
> --
> ------------ Perl Developer Recruiting and Placement Services
> -------------
> ----- Perl Code Review, Architecture, Development, Training, Support
> -------
>
--
Sent from my mobile device
Thanks
Ram
Re: Sorting an extremely LARGE file
on 08.08.2011 07:27:18 by Kenneth Wolcott
On Sun, Aug 7, 2011 at 22:10, Ramprasad Prasad wrote:
>
[snip]
> I guess there is a serious need for re-architecting , rather than
> create such monstrous files, but when people work with legacy systems
> which worked fine when there was lower usage and now you tell then you
> need a overhaul because the current system doesn't scale ... That
> takes a lot of convincing
That's the nature of many jobs: you get what the people before you did.
They might not have been very good at what they did, or maybe they had
very short-sighted management, but that's the job: to do your best
to work with what you have.
I have to undo/fix/replace ten years (plus) of short-sighted damage in
my work. Hey, I have a job and I'm thrilled.
Do what you can so that the people who replace you won't curse you and
throw poisoned darts at a picture of you.
Ken Wolcott
Re: Sorting an extremely LARGE file
on 08.08.2011 13:11:37 by Paul Johnson
On Mon, Aug 08, 2011 at 10:40:12AM +0530, Ramprasad Prasad wrote:
> Using the system linux sort ... Does not help.
> On my dual quad core machine , (8 gb ram) sort -n file takes 10
> minutes and in the end produces no output.
Did you set any other options?
At a minimum you should set -T to tell sort where to put its temporary
files. Otherwise they will go into /tmp which you probably don't want.
I expect this was your problem here.
You probably want to set --compress-program=gzip too. This will
compress the temporary files, reducing IO (which would likely be the
limiting factor otherwise) and making use of some of those cores (which
would likely be sitting idle otherwise). This will probably both speed
up the sort and reduce the disk space required.
This really is your solution if you just want to sort that file.
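Something along these lines, for example (a minimal sketch; the scratch directory, buffer size, and file names are placeholders, and it assumes GNU coreutils sort):

# Invoke GNU sort with a roomy temporary directory and compressed run files.
system( 'sort',
        '-n', '-k1,1',                 # numeric sort on the first column
        '-T', '/scratch/sorttmp',      # temporary files go here, not /tmp
        '--compress-program=gzip',     # compress the temporary runs to cut I/O
        '-S', '4G',                    # in-memory buffer before spilling to disk
        '-o', 'sorted.txt',
        'huge.txt' ) == 0
    or die "sort failed: exit status $?";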
--
Paul Johnson - paul@pjcj.net
http://www.pjcj.net
Re: Sorting an extremely LARGE file
on 08.08.2011 15:25:48 by Shawn Wilson
On Aug 8, 2011 12:11 AM, "Ramprasad Prasad" wrote:
>
> Using the system linux sort ... Does not help.
> On my dual quad core machine , (8 gb ram) sort -n file takes 10
> minutes and in the end produces no output.
>
I had a smaller file and 32g to play with on a dual quad core (dl320). Sort
just can't handle more than 2~4 gigs.
> when I put this data in mysql , there is an index on the order by
> field ... But I guess keys don't help when you are selecting the
> entire table.
>
> I guess there is a serious need for re-architecting , rather than
> create such monstrous files, but when people work with legacy systems
> which worked fine when there was lower usage and now you tell then you
> need a overhaul because the current system doesn't scale ... That
> takes a lot of convincing
>
You're dealing with a similar issue to the one I had in this respect. The only
difference is that I created my own issue out of ignorance (having never
dealt with that much data; by setting my dl320 to splice, sort, and merge I
got through that). Well, with this data I just threw 30+ fields of a hundred
thousand lines (yes, you've still got more data to deal with) into one
table. This worked OK until my queries got a bit more complex, at which
point it took me 8+ hours to generate a report. I rethought the tables (or,
more accurately, read a bit and thought about what the hell I was doing),
created a half dozen relationships, and got the report down to a little under
2 hours.

My advice is to think about rethinking your db. This is probably going to
mean rethinking the software too (or, at least, the queries it makes).
You might want to check out the #mysql freenode irc channel - most of them
are pompous but you'll get your answers. I think perl is less related to
your issue but the people in the #dbi and dbic perl irc channels are much
more easy going with their business.
> On 8/8/11, Uri Guttman wrote:
> >>>>>> "RP" == Rajeev Prasad writes:
> >
> > RP> hi, you can try this: first get only that field (sed/awk/perl)
> > RP> whihc you want to sort on in a file. sort that file which i assume
> > RP> would be lot less in size then your current file/table. then run a
> > RP> loop on the main file using sorted file as variable.
> >
> > RP>
> > RP> here is the logic in shell:
> > RP>
> > RP> awk '{print $}' > tmp-file
> > RP>
> > RP> sort
> > RP>
> >
> > RP> for id in `cat `;do grep $id >>
> > sorted-large-file;done
> >
> > have you thought about the time this will take? you are doing an O( N**2
> > ) grep there. you are looping over all N keys and then scanning the file
> > N lines for each key. that will take a very long time for such a large
> > file. as others have said, either use the sort utility or do a
> > merge/sort on the records. your way is effectively a slow bubble sort!
> >
> > uri
> >
> > --
> > Uri Guttman -- uri AT perlhunter DOT com ---
http://www.perlhunter.com
> > --
> > ------------ Perl Developer Recruiting and Placement Services
> > -------------
> > ----- Perl Code Review, Architecture, Development, Training, Support
> > -------
> >
Re: Sorting an extremely LARGE file
on 08.08.2011 16:10:03 by Paul Johnson
On Mon, Aug 08, 2011 at 09:25:48AM -0400, shawn wilson wrote:
> On Aug 8, 2011 12:11 AM, "Ramprasad Prasad" wrote:
> >
> > Using the system linux sort ... Does not help.
> > On my dual quad core machine , (8 gb ram) sort -n file takes 10
> > minutes and in the end produces no output.
>
> I had a smaller file and 32g to play with on a dual quad core (dl320). Sort
> just can't handle more than 2~4 gigs.
You keep saying this ...
Gnu sort really can handle very large files. I have even tested it to
make sure. You may need to configure things slightly. You may need to
locate some temporary disk space. You may prefer to do things another
way. But if you just want to sort a file, sort will do it for you.
--
Paul Johnson - paul@pjcj.net
http://www.pjcj.net
Re: Sorting an extremely LARGE file
on 08.08.2011 16:23:40 by Shlomi Fish
Hi Ramprasad,
On Sun, 7 Aug 2011 20:58:14 +0530
Ramprasad Prasad wrote:
> I have a file that contains records of customer interaction
> The first column of the file is the batch number(INT) , and other columns
> are date time , close time etc etc
>
> I have to sort the entire file in order of the first column .. but the
> problem is that the file is extremely huge.
>
> For the largest customer it contains 1100 million records and the file is
> 44GB !
> how can I sort this big a file
>
I suggest splitting the file into bins. Each bin will contain the records with
the batch numbers in a certain range (say 0-999,999 ; 1,000,000-1,999,999,
etc.). You should select the bins so the numbers are spread more or less
evenly. Then you sort each bin separately, and then append the bins in order.
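A rough Perl sketch of the binning pass (the bin width and file names are arbitrary assumptions):

#!/usr/bin/perl
use strict;
use warnings;

# Distribute each record into a bin file according to its batch-number range.
my $bin_width = 1_000_000;               # batch numbers per bin
my %bin_fh;                              # bin number => output filehandle

open my $in, '<', 'huge.txt' or die "huge.txt: $!";
while ( my $line = <$in> ) {
    my ($batch) = $line =~ /^(\d+)/ or next;
    my $bin = int( $batch / $bin_width );
    $bin_fh{$bin} //= do {
        my $name = sprintf 'bin-%06d.txt', $bin;
        open my $fh, '>', $name or die "$name: $!";
        $fh;
    };
    print { $bin_fh{$bin} } $line;
}
close $_ for values %bin_fh;

Each bin-*.txt can then be sorted on its own, and concatenating the sorted bins in numeric bin order gives the fully sorted file.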
Let me know if there's anything else you don't understand, and if you're
interested, I can be commissioned to write it for you (but it shouldn't be too
hard.).
Regards,
Shlomi Fish
--
-------------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
Why I Love Perl - http://shlom.in/joy-of-perl
Chuck Norris refactors 10 million lines of Perl code before lunch.
Please reply to list if it's a mailing list post - http://shlom.in/reply .
Re: Sorting an extremely LARGE file
on 08.08.2011 16:49:09 by Shawn H Corey
On 11-08-08 10:23 AM, Shlomi Fish wrote:
> I suggest splitting the files into bins. Each bin will contain the records with
> the batch numbers in a certain range (say 0-999,999 ; 1,000,000-1,999,999,
> etc.). You should select the bins so the numbers are spread more or less
> evenly. Then you sort each bin separately, and then append the bins in order.
Well, if you want a Linux version rather than Perl, see:
man split
man sort
man comm
When you use comm(1), set its --output-delimiter to the empty string.
--output-delimiter=''
Re: Sorting an extremely LARGE file
on 08.08.2011 16:58:37 by Shawn Wilson
On Mon, Aug 8, 2011 at 10:10, Paul Johnson wrote:
> On Mon, Aug 08, 2011 at 09:25:48AM -0400, shawn wilson wrote:
>> On Aug 8, 2011 12:11 AM, "Ramprasad Prasad" wrote:
>> >
>> > Using the system linux sort ... Does not help.
>> > On my dual quad core machine , (8 gb ram) sort -n file takes 10
>> > minutes and in the end produces no output.
>>
>> I had a smaller file and 32g to play with on a dual quad core (dl320). Sort
>> just can't handle more than 2~4 gigs.
>
> You keep saying this ...
>
> Gnu sort really can handle very large files.  I have even tested it to
> make sure.  You may need to configure things slightly.  You may need to
> locate some temporary disk space.  You may prefer to do things another
> way.  But if you just want to sort a file, sort will do it for you.
>
>
Very well then (let's assume you're right). What are you saying is the
maximum file size that sort can handle, and with what amount of RAM and disk?