count the number of occurrences of each different word of a text

on 26.12.2007 15:29:29 by gniagnia

Hi all

I have a text that contains nearly 30000 words.
I'd like to know the number of occurrences of each word of this text
and sort the result in descending order.
Is this easily feasible using Perl? If so, how? (I am a huge noob...)

Thanks in advance

Re: count the number of occurrences of each different word of a text

on 26.12.2007 16:53:52 by jurgenex

Mr_Noob wrote:
>I have a text that contains nearly 30000 words.
>I'd like to know the number of occurrences of each word of this text
>and sort the result in descending order.
>Is this easily feasible using Perl?

Very simple actually.

> If so, how? (I am a huge noob...)

Just split() the string into words and count each word in a hash,
using the word as the key and its count as the value.
Then sort the keys of that hash into an array by order of their values,
and then print them.

my $unhelpfulFAQ =
' How can I count the number of occurrences of a substring within a
string?
There are a number of ways, with varying efficiency. If you want
a count of a certain single character (X) within a string, you
can use the "tr///" function like so:

$string = "ThisXlineXhasXsomeXxsXinXit";
$count = ($string =~ tr/X//);
print "There are $count X characters in the string";

This is fine if you are just looking for a single character.
However, if you are trying to count multiple character
substrings within a larger string, "tr///" wont work. What you
can do is wrap a while() loop around a global pattern match. For
example, lets count negative integers:

$string = "-9 55 48 -2 23 -76 4 14 -44";
while ($string =~ /-\d+/g) { $count++ }
print "There are $count negative numbers in the string";';

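my %count;
# Count each word; grep { length } drops the empty first field that
# split returns because the text begins with a non-word character.
for (grep { length } split /\W+/, $unhelpfulFAQ) {
    $count{$_}++;
}
# Order the words by descending count.
my @sorted = sort {
    $count{$b} <=> $count{$a}
} keys %count;
for (@sorted) {
    print "$_: \t $count{$_}\n";
}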

Re: count the number of occurrences of each different word of a text

on 26.12.2007 16:58:34 by it_says_BALLS_on_your forehead

On Dec 26, 9:29 am, Mr_Noob wrote:
> Hi all
>
> I have a text that contains nearly 30000 words.
> I'd like to know the number of occurrences of each word of this text
> and sort the result in descending order.
> Is this easily feasible using Perl? If so, how? (I am a huge noob...)
>

use strict; use warnings;

my %word_count;
my $text = 'the quick brown fox jumped over the lazy dog. the dog
liked it.';

# Split on whitespace, then strip a trailing period from each word.
my @words = map { s/\.$//; $_; } split ' ', $text;
for ( @words ) {
    $word_count{$_}++;
}

# Print the words in descending order of frequency.
for my $word ( sort { $word_count{$b} <=> $word_count{$a} }
               keys %word_count ) {
    print "$word => $word_count{$word}\n";
}

__OUTPUT__
the => 3
dog => 2
jumped => 1
over => 1
it => 1
liked => 1
lazy => 1
brown => 1
fox => 1
quick => 1

...you may want to sort on the word as well when the counts are equal, so
the output will be consistent.
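A minimal sketch of that suggestion, reusing the same sample sentence: the `or $a cmp $b` clause only comes into play when two counts are equal, so ties are listed alphabetically and the output no longer depends on hash ordering. The trailing period is stripped as in the code above, but on a copy of each token.

```perl
use strict;
use warnings;

# Same sample sentence as in the post above.
my $text = 'the quick brown fox jumped over the lazy dog. the dog liked it.';

my %word_count;
for my $token ( split ' ', $text ) {
    ( my $word = $token ) =~ s/\.$//;    # copy the token, drop a trailing period
    $word_count{$word}++;
}

# Sort by descending count; break ties alphabetically so the order is stable.
my @sorted = sort { $word_count{$b} <=> $word_count{$a} or $a cmp $b }
             keys %word_count;

print "$_ => $word_count{$_}\n" for @sorted;
```

The same `or`-chained comparison extends to as many sort keys as needed.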

Re: count the number of occurrences of each different word of a text

on 26.12.2007 17:28:45 by gniagnia

On 26 Dec, 16:58, nolo contendere wrote:
> ...you may want to sort on the word if the counts match up as well, so
> the output will be consistent.

Thanks a lot for your answers.
I gave your script a try and it works perfectly!
However, I'd like to tell $text to look for a file, but I can't find
a way to do so:

#!/usr/bin/perl -w
use strict; use warnings;
my %word_count;
my $text = system ("cat /Users/test/Desktop/mytext.txt");
my @words = map { s/\.$//; $_; } split ' ', $text;
for ( @words ) {
    $word_count{$_}++;
}
for my $word ( sort { $word_count{$b} <=> $word_count{$a} }
               keys %word_count ) {
    print "$word => $word_count{$word}\n";
}

Any idea?

Thank you again
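Why the `system` line does not work: `system` lets the command write straight to STDOUT and returns the command's wait status (0 on success), not its output, so `$text` ends up holding a number. Backticks (`qx//`) capture the output as a string instead. A minimal illustration, assuming a Unix-like system with `echo` available:

```perl
use strict;
use warnings;

# system(): the command writes to our STDOUT; the return value is the
# wait status (0 on success), not the command's output.
my $status = system( 'echo', 'hello' );

# Backticks / qx//: the command's standard output is captured as a string.
my $output = qx(echo hello);

print "system returned: $status\n";
print "qx captured: $output";
```

Reading the file from Perl itself, rather than shelling out to `cat`, avoids the extra process altogether.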

Re: count the number of occurrences of each different word of a text

on 26.12.2007 17:33:30 by it_says_BALLS_on_your forehead

On Dec 26, 11:28 am, Mr_Noob wrote:
> Thanks a lot for your answers.
> I gave your script a try and it works perfectly!
> However, I'd like to tell $text to look for a file, but I can't find
> a way to do so:
>
> #!/usr/bin/perl -w
> use strict; use warnings;
> my %word_count;
> my $text = system ("cat /Users/test/Desktop/mytext.txt");

Instead of what you have above, create a function to "slurp" the whole
file into a scalar variable.

sub slurp_file {

    ##------------------------------------------------------------------
    ## Reads contents of a text file into a string
    ##
    ## INPUTS:  1) Filename
    ##
    ## OUTPUTS: 1) String which contains contents of input file
    ##
    my ( $filename ) = @_;
    open my $fh, '<', $filename or die "can't open $filename: $!";
    my $text = do { local $/; <$fh> };    # undef $/ => read whole file
    close $fh;
    return $text;
}

Then, just call it:

my $text = slurp_file( '/Users/test/Desktop/mytext.txt' );
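Putting the pieces together, a sketch of the whole program reading from a file. It reuses the `slurp_file` sub from the post; the counting loop is wrapped in a helper named `count_words` (a hypothetical name, for illustration), and the file name is taken from the command line rather than hard-coded (also an assumption).

```perl
use strict;
use warnings;

sub slurp_file {
    my ( $filename ) = @_;
    open my $fh, '<', $filename or die "can't open $filename: $!";
    my $text = do { local $/; <$fh> };    # undef $/ => read the whole file
    close $fh;
    return $text;
}

# Hypothetical helper: returns a hash ref of word => count.
sub count_words {
    my ( $text ) = @_;
    my %word_count;
    for my $token ( split ' ', $text ) {
        ( my $word = $token ) =~ s/\.$//;    # drop a trailing period
        $word_count{$word}++;
    }
    return \%word_count;
}

# Only runs when a file name is given, e.g.:  perl count.pl mytext.txt
if (@ARGV) {
    my $counts = count_words( slurp_file( $ARGV[0] ) );
    for my $word ( sort { $counts->{$b} <=> $counts->{$a} } keys %$counts ) {
        print "$word => $counts->{$word}\n";
    }
}
```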

Re: count the number of occurrences of each different word of a text

on 26.12.2007 17:48:52 by gniagnia

On 26 Dec, 17:33, nolo contendere wrote:
> Then, just call it:
>
> my $text = slurp_file( '/Users/test/Desktop/mytext.txt' );

Yes! Perfect! Thanks a lot!

Re: count the number of occurrences of each different word of a text

on 26.12.2007 17:56:31 by jurgenex

nolo contendere wrote:
>instead of what you have above, create a function to "slurp" the file
>into a scalar variable.

Would you mind explaining your reasoning, please?
You are processing the file/string in a totally linear manner (no going
back). Therefore I don't see any reason to slurp in the whole file in one
piece instead of just processing it line by line.

jue

Re: count the number of occurrences of each different word of a text

on 26.12.2007 18:03:54 by it_says_BALLS_on_your forehead

On Dec 26, 11:56 am, Jürgen Exner wrote:
> nolo contendere wrote:
> >instead of what you have above, create a function to "slurp" the file
> >into a scalar variable.
>
> Would you mind explaining your reasoning, please?
> You are processing the file/string in a totally linear manner (no going
> back). Therefore I don't see any reason to slurp in the whole file in one
> piece instead of just processing it line by line.
>
> jue

Well, it was laziness really. I was giving the OP a fish instead of
teaching how to fish. The code I delivered was the easiest way (not
the best, I agree) I could think of to plug into the existing code,
and that fulfilled the OP's need.

It doesn't really matter for small amounts of data. I wouldn't do it
the same way if the nature of the problem were different.

Re: count the number of occurrences of each different word of a text

on 26.12.2007 18:09:32 by gniagnia

On 26 Dec, 17:48, Mr_Noob wrote:
> Yes! Perfect! Thanks a lot!

Well, I still have a little problem. Here is a sample of my output:

dog => 5
cat => 3
dog, => 2
...

How can I avoid the distinction between a word and the same word
followed by a comma?

Re: count the number of occurrences of each different word of a text

on 26.12.2007 19:10:12 by it_says_BALLS_on_your forehead

On Dec 26, 12:09 pm, Mr_Noob wrote:
> Well, I still have a little problem. Here is a sample of my output:
>
> dog => 5
> cat => 3
> dog, => 2
> ...
>
> How can I avoid the distinction between a word and the same word
> followed by a comma?

In this line:

my @words = map { s/\.$//; $_; } split ' ', $text;

...I strip a trailing period from each word. You can change this to strip
commas as well, either with an alternation (a logical 'or' in the regex),
a character class, or a separate s/// statement.
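One way to do the character-class variant, as a sketch on a made-up sentence: `[.,]` matches either a period or a comma, so a single substitution handles both endings.

```perl
use strict;
use warnings;

my $text = 'the dog, the cat and the dog.';

my %word_count;
for my $token ( split ' ', $text ) {
    ( my $word = $token ) =~ s/[.,]$//;    # strip one trailing period or comma
    $word_count{$word}++;
}

print "$_ => $word_count{$_}\n"
    for sort { $word_count{$b} <=> $word_count{$a} or $a cmp $b } keys %word_count;
```

Adding characters to the class (for example `[.,;:!?]`) covers more punctuation, and `s/[.,]+$//` would also strip runs such as `...`.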

Re: count the number of occurrences of each different word of a text

on 26.12.2007 19:28:58 by Uri Guttman

>>>>> "N" == Noob writes:

N> On 26 Dec, 17:33, nolo contendere wrote:
>>
>> instead of what you have above, create a function to "slurp" the file
>> into a scalar variable.
>>
>> sub slurp_file {
>>
>> ##------------------------------------------------------------------
>> ## Reads contents of a text file into a string
>> ##
>> ## INPUTS: 1) Filename
>> ##
>> ## OUTPUTS: 1) String which contains contents of input file
>> ##
>> my ( $filename ) = @_;
>> open my $fh, '<', $filename or die "can't open $filename: $!";
>> my $text = do { local $/; <$fh> };
>> close $fh;
>> return $text;
>>
>> }
>>
>> then, just call it:
>>
>> my $text = slurp_file( '/Users/test/Desktop/mytext.txt' );

N> Yes! Perfect! Thanks a lot!

Not so perfect. It is slow and doesn't support various useful options.
Check out File::Slurp on CPAN and you won't have to cut and paste that
code. And nolo, you should use it too.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: count the number of occurrences of each different word of a text

on 26.12.2007 19:29:51 by Uri Guttman

>>>>> "nc" == nolo contendere writes:

nc> On Dec 26, 11:56 am, Jürgen Exner wrote:
>> nolo contendere wrote:
>> >instead of what you have above, create a function to "slurp" the file
>> >into a scalar variable.
>>
>> Would you mind explaining your reasoning, please?
>> You are processing the file/string in a totally linear manner (no going
>> back). Therefore I don't see any reason to slurp in the whole file in one
>> piece instead of just processing it line by line.
>>
>> jue

nc> Well, it was laziness really. I was giving the OP a fish instead of
nc> teaching how to fish. The code I delivered was the easiest way (not
nc> the best, I agree) I could think of to plug into the existing code,
nc> and that fulfilled the OP's need.

That was easier than use File::Slurp?

uri


Re: count the number of occurrences of each different word of a text

on 26.12.2007 19:36:12 by it_says_BALLS_on_your forehead

On Dec 26, 1:29 pm, Uri Guttman wrote:
> that was easier than use File::Slurp?
>

Uri, no, that was not easier than use File::Slurp, but I like to avoid
using modules for trivial tasks if I can, mainly due to bureaucratic
restrictions imposed by sysadmins, etc. If the OP had total control
over his environment, installing tested and optimized modules would be
the preferred solution.

Re: count the number of occurrences of each different word of a text

on 26.12.2007 20:15:46 by jurgenex

nolo contendere wrote:

>my @words = map { s/\.$//; $_; } split ' ', $text;
>
>...I strip any words of a period. you can change this to strip them of
>commas as well, either with a logical 'or', or a character class, or
>with a separate s/// statement.

Is there a specific reason why you are using this awful map and s///
instead of just split()ting at non-word characters?

my @words = split /\W+/, $text;

jue
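A quick check of what `split /\W+/` does with the comma problem, as a sketch on a made-up sentence. Two caveats: `split` returns a leading empty field when the text starts with a non-word character (filtered out here with `grep { length }`), and `\W` also splits contractions such as "don't" into two pieces.

```perl
use strict;
use warnings;

my $text = '"Well, the dog, the cat and the dog," I said.';

# Split on runs of non-word characters; the leading quote produces an
# empty first field, which grep { length } discards.
my @words = grep { length } split /\W+/, $text;

my %word_count;
$word_count{$_}++ for @words;

print "$_ => $word_count{$_}\n"
    for sort { $word_count{$b} <=> $word_count{$a} or $a cmp $b } keys %word_count;
```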

Re: count the number of occurrences of each different word of a text

on 26.12.2007 20:21:35 by it_says_BALLS_on_your forehead

On Dec 26, 2:15 pm, Jürgen Exner wrote:
> nolo contendere wrote:
> >my @words = map { s/\.$//; $_; } split ' ', $text;
>
> >...I strip any words of a period. you can change this to strip them of
> >commas as well, either with a logical 'or', or a character class, or
> >with a separate s/// statement.
>
> Is there a specific reason why you are using this awful map and s///
> instead of just split()ting at non-word characters?
>
>         my @words = split /\W+/, $text;
>

That works very well, except when dealing with my ex-wife and her
cohorts.

Re: count the number of occurrences of each different word of a text

on 26.12.2007 20:24:45 by jurgenex

nolo contendere wrote:

>On Dec 26, 11:56 am, Jürgen Exner wrote:
>> nolo contendere wrote:
>> >instead of what you have above, create a function to "slurp" the file
>> >into a scalar variable.
>>
>> Would you mind explaining your reasoning, please?
>
>Well, it was laziness really. I was giving the OP a fish instead of
>teaching how to fish. The code I delivered was the easiest way (not
>the best, I agree) I could think of to plug into the existing code,
>and that fulfilled the OP's need.

A valid reason, although I don't quite agree. Wrapping this piece of code
> > my @words = map { s/\.$//; $_; } split ' ', $text;
> > for ( @words ) {
> > $word_count{$_}++;
> > }
into a
while (my $text = ) {
...
}
loop would have been even easier than defining a new sub. At least IMO.

jue
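A sketch of the line-by-line approach, assuming the file has been opened into a lexical handle `$fh`; the helper name `count_words_fh` is hypothetical. Only one line is held in memory at a time. The demo reads from an in-memory filehandle (opening a reference to a scalar) so it runs without a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: counts words read line by line from an open handle.
sub count_words_fh {
    my ( $fh ) = @_;
    my %word_count;
    while ( my $line = <$fh> ) {
        for my $token ( split ' ', $line ) {       # ' ' also discards the newline
            ( my $word = $token ) =~ s/[.,]$//;    # strip a trailing period or comma
            $word_count{$word}++;
        }
    }
    return \%word_count;
}

# Demo on an in-memory file; for a real file use
#   open my $fh, '<', $filename or die "can't open $filename: $!";
my $sample = "the dog, the cat\nand the dog.\n";
open my $fh, '<', \$sample or die "can't open in-memory file: $!";
my $counts = count_words_fh($fh);
close $fh;

print "$_ => $counts->{$_}\n"
    for sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts;
```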

Re: count the number of occurrences of each different word of a text

on 27.12.2007 22:37:27 by Ben Morrow

Quoth nolo contendere :
> On Dec 26, 2:15 pm, Jürgen Exner wrote:
> > nolo contendere wrote:
> > >my @words = map { s/\.$//; $_; } split ' ', $text;
> >
> > >...I strip any words of a period. you can change this to strip them of
> > >commas as well, either with a logical 'or', or a character class, or
> > >with a separate s/// statement.
> >
> > Is there a specific reason, why you are using this awful map and s///
> > instead of just splitt()ing at non-word characters?
> >
> > my @words = split /\W+/, $text;
>
> That works very well, except when dealing with my ex-wife and her
> cohorts.

So use your own definition of 'word' (\w is not a good idea in this case
anyway, as it includes '_'):

my @words = split /[^[:alnum:]-]+/, $text;

Ben
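A sketch of that custom definition in action on a made-up sentence, with a `+` on the class so that a run of separators (such as ", ") counts as a single split point. `[:alnum:]` is the POSIX class for letters and digits, so hyphenated words stay whole while an underscore now separates words.

```perl
use strict;
use warnings;

my $text = 'A well-known dog_name, the dog and the well-known cat.';

# Words are runs of letters, digits and hyphens; anything else separates.
my @words = grep { length } split /[^[:alnum:]-]+/, $text;

my %word_count;
$word_count{$_}++ for @words;

print "$_ => $word_count{$_}\n"
    for sort { $word_count{$b} <=> $word_count{$a} or $a cmp $b } keys %word_count;
```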

Re: count the number of occurrences of each different word of a text

on 28.12.2007 19:48:27 by Ted Zlatanov

On Wed, 26 Dec 2007 10:36:12 -0800 (PST) nolo contendere wrote:

nc> Uri, no, that was not easier than use File::Slurp, but I like to avoid
nc> using modules for trivial tasks if I can, mainly due to bureaucratic
nc> restrictions imposed by sysadmins, etc. If the OP had total control
nc> over his environment, installing tested and optimized modules would be
nc> the preferred solution.

Slurping files is not trivial, it only looks that way :) Look at
File::Slurp to see how complicated it is when done right.

As far as CPAN goes, I often hear the complaint about bureaucracy getting
in the way. Is there something other than CPAN::AutoINC (which just
calls CPAN to install the missing modules) that will do run-time
retrieval of the modules, put them in a temporary place, and load them?
For pure Perl modules that would work well, especially if a local mirror
was used. I looked on CPAN but couldn't find something like this.

Ted

Re: count the number of occurrences of each different word of a text

on 28.12.2007 20:06:03 by it_says_BALLS_on_your forehead

On Dec 28, 1:48 pm, Ted Zlatanov wrote:
> On Wed, 26 Dec 2007 10:36:12 -0800 (PST) nolo contendere wrote:
>
> nc> Uri, no, that was not easier than use File::Slurp, but I like to avoid
> nc> using modules for trivial tasks if I can, mainly due to bureaucratic
> nc> restrictions imposed by sysadmins, etc. If the OP had total control
> nc> over his environment, installing tested and optimized modules would be
> nc> the preferred solution.
>
> Slurping files is not trivial, it only looks that way :)  Look at
> File::Slurp to see how complicated it is when done right.

Hmm, I suppose I should amend my earlier statement to the effect that
slurping files for the most common cases is trivial. Uri himself
stated this 4 years ago (http://www.perl.com/pub/a/2003/11/21/slurp.html):

Traditional Slurping

Perl has always supported slurping files with minimal code. Slurping
of a file to a list of lines is trivial, just call the <> operator
in a list context:

my @lines = ;

and slurping to a scalar isn't much more work. Just set the built in
variable $/ (the input record separator) to the undefined value and
read in the file with <>:

open( my $fh, $file ) or die "sudden flaming death\n";
my $text = do { local( $/ ) ; <$fh> } ;

>
> As far as CPAN goes, I often hear the complaint about bureaucracy getting
> in the way.  Is there something other than CPAN::AutoINC (which just
> calls CPAN to install the missing modules) that will do run-time
> retrieval of the modules, put them in a temporary place, and load them?
> For pure Perl modules that would work well, especially if a local mirror
> was used.  I looked on CPAN but couldn't find something like this.
>
> Ted

I'm unaware of anything like that, Ted. This seems messy though,
particularly when one considers that there are multiple environments to
think of, and a user won't always have access to certain directory
structures, or those structures won't even exist in the prod
environment. Then there are the permission issues, etc. Seems much
simpler just to use the trivial self-rolled code.

Re: count the number of occurrences of each different word of a text

on 28.12.2007 21:18:58 by Ted Zlatanov

On Fri, 28 Dec 2007 11:06:03 -0800 (PST) nolo contendere wrote:

nc> On Dec 28, 1:48 pm, Ted Zlatanov wrote:
>> Slurping files is not trivial, it only looks that way :) Look at
>> File::Slurp to see how complicated it is when done right.

nc> Hmm, I suppose I should amend my earlier statement to the effect that
nc> Slurping files for the most common cases is trivial.

Agreed. Know your inputs and you'll know what's right :)

>> As far as CPAN goes, I often hear the complaint about bureaucracy
>> getting in the way. Is there something other than CPAN::AutoINC
>> (which just calls CPAN to install the missing modules) that will do
>> run-time retrieval of the modules, put them in a temporary place, and
>> load them? For pure Perl modules that would work well, especially if
>> a local mirror was used. I looked on CPAN but couldn't find
>> something like this.

nc> I'm unaware of anything like that, Ted. This seems messy though,
nc> particularly when one considers that there are multiple environments
nc> to think of, and a user won't always have access to certain
nc> directory structures, or those structures won't even exist in the
nc> prod environment. Then there are the permission issues, etc. Seems
nc> much simpler just to use the trivial self-rolled code.

These are the things that Perl makes easy, though. File::Temp for
instance will work in most cases to give you temporary storage (or
IO::Scalar in a pinch, to do I/O to a scalar). I think it's an
interesting idea and I could swear it's been implemented already (but
Google and CPAN searches didn't turn anything up).

Ted
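The "put them in a temporary place, and load them" half of Ted's idea can be sketched with core modules alone: File::Temp's `tempdir` creates a scratch directory, that directory goes to the front of `@INC`, and `require` loads the module from there. Fetching the module from a mirror is left out entirely, and the module name `Hypothetical::Greeter` and its code are made up for illustration; they stand in for a downloaded pure-Perl module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Scratch directory, removed automatically when the program exits.
my $dir = tempdir( CLEANUP => 1 );

# Stand-in for a downloaded pure-Perl module (hypothetical name and code).
my $mod_dir = File::Spec->catdir( $dir, 'Hypothetical' );
make_path($mod_dir);
my $mod_file = File::Spec->catfile( $mod_dir, 'Greeter.pm' );
open my $out, '>', $mod_file or die "can't write $mod_file: $!";
print $out <<'END_MODULE';
package Hypothetical::Greeter;
use strict;
use warnings;
sub greet { return "hello, $_[0]" }
1;
END_MODULE
close $out or die "can't close $mod_file: $!";

# Search the scratch directory first, then load the module from it.
unshift @INC, $dir;
require Hypothetical::Greeter;

print Hypothetical::Greeter::greet('world'), "\n";
```

Modules with compiled (XS) parts would additionally need to be built, which is presumably why Ted limits the idea to pure Perl modules.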