file processing
on 23.04.2006 06:03:18 by amit_h123
Hello all,
I have a large text file with lots of spaces and newline characters in
it, which I want to remove.
After that I need to construct hash tables for the unique words
that are present in the file. I need one hash for
unigrams (one word at a time), one for bigrams (2 words at a time),
and one for trigrams (3 words at a time).
I am completely lost trying to remove the spaces in the text file, and
I am not able to access each word one at a time.
A simple example of what I need to do:
If the text in my file is:
hello how are you all hello how are.
so my unigrams will be like:
hello 2
how 2
are 2
you 1...
bigrams will be
hello how 2
how are 2
are you 1
you all 1
trigrams
hello how are 2
how are you 1
are you all 1
.....so on
Can anyone help me with this code?
-Thanks
Re: file processing
on 23.04.2006 09:36:54 by Gunnar Hjalmarsson
amit_h123@yahoo.co.in wrote (in alt.perl):
> Hello all ,
> I have a large text file ...
You posted the same job spec. in clpmisc a couple of hours ago, and I
suggested that you learn a programming language. Was that a bad idea?
Didn't you like the hints you were given by another poster either?
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Re: file processing
on 23.04.2006 09:37:39 by Joe Smith
amit_h123@yahoo.co.in wrote:
> I am all lost in removing and accessing the spaces in the text file but
That's super trivial - it's one of the homework assignments given on
the first day of any respectable Perl class.
@words = split; # Where $_ contains the line of text
> And after that i need to construct the hash tables for the unique word
$unigram{$_}++ foreach @words;
> unigrams (one word at a time), a hash for bigrams (2 words at a time)
> and same as for 3 words.
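my ($prev1, $prev2) = ('', '');   # previous word, and the word before it
my (%unigram, %bigram, %trigram);
while (<>) {
    my @words = split;            # split $_ on whitespace
    foreach my $word (@words) {
        $unigram{$word}++;
        # Skip n-grams until enough preceding words have been seen
        $bigram{"$prev1 $word"}++ if length $prev1;
        $trigram{"$prev2 $prev1 $word"}++ if length $prev2;
        $prev2 = $prev1;
        $prev1 = $word;
    }
}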
So what's the problem? It almost sounds as if you never heard of
the split() function, or how it works when given no arguments.
-Joe
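A self-contained sketch of the same counting idea, run on the sample sentence from the original question. The helper name count_ngrams is hypothetical and not from the thread, and the trailing period of the sample text is left out, on the assumption that punctuation should not count as part of a word.

```perl
use strict;
use warnings;

# count_ngrams is a hypothetical helper: it takes a list of text lines and
# returns references to the unigram, bigram and trigram count hashes.
sub count_ngrams {
    my @lines = @_;
    my (%unigram, %bigram, %trigram);
    my @prev;    # the (up to) two words seen just before the current one
    for my $line (@lines) {
        for my $word (split ' ', $line) {    # split on runs of whitespace
            $unigram{$word}++;
            $bigram{"$prev[-1] $word"}++            if @prev >= 1;
            $trigram{"$prev[-2] $prev[-1] $word"}++ if @prev >= 2;
            push @prev, $word;
            shift @prev if @prev > 2;        # keep only the last two words
        }
    }
    return (\%unigram, \%bigram, \%trigram);
}

my ($uni, $bi, $tri) = count_ngrams("hello how are you all hello how are");
printf "%-16s %d\n", $_, $tri->{$_} for sort keys %$tri;
```

Because @prev is kept across lines, n-grams that straddle a line break are counted too, which matches the goal of ignoring newlines in the file.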
Re: file processing
on 23.04.2006 14:32:05 by amit_h123
Hi,
Thanks for the help. I knew about split, but not that it works without arguments.
This really helped.
Re: file processing
on 23.04.2006 15:19:02 by amit_h123
Hello,
But I still have one problem. Is there a way to access the
bigram hash values on the basis of the trigram keys? I have to calculate
the following value
for each key in the trigram hash:
trigram{hello how are} / bigram{hello how}
How do I access these values simultaneously?
Any suggestions?
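One possible way to look up both counts together, sketched under the assumption that both hashes use space-joined words as keys (as in the counting loop earlier in the thread): for each trigram key, rebuild the bigram key from its first two words and use that for the second lookup. The name trigram_ratios and the small sample hashes are hypothetical.

```perl
use strict;
use warnings;

# trigram_ratios is a hypothetical helper: it takes references to the bigram
# and trigram count hashes and returns a hash reference mapping each trigram
# key to trigram count / count of its leading bigram.
sub trigram_ratios {
    my ($bigram, $trigram) = @_;
    my %ratio;
    for my $key (keys %$trigram) {
        my ($w1, $w2) = split ' ', $key;    # first two words of the trigram
        my $prefix = "$w1 $w2";             # same key format as the bigram hash
        next unless $bigram->{$prefix};     # avoid dividing by a missing or zero count
        $ratio{$key} = $trigram->{$key} / $bigram->{$prefix};
    }
    return \%ratio;
}

# Sample counts, assumed for illustration only
my %bigram  = ('hello how' => 2, 'how are' => 2, 'are you' => 1);
my %trigram = ('hello how are' => 2, 'how are you' => 1);

my $ratio = trigram_ratios(\%bigram, \%trigram);
printf "%-16s %.2f\n", $_, $ratio->{$_} for sort keys %$ratio;
```

With these sample counts, "hello how are" gives 2 / 2 and "how are you" gives 1 / 2.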
Re: file processing
on 23.04.2006 15:56:48 by Matt Garrish
<amit_h123@yahoo.co.in> wrote in message
news:1145798342.838627.116070@i39g2000cwa.googlegroups.com...
> hello ,
> But i still have one problem. Its like is there a way to access the
> bigram hash values on the basis of trigram since i have to calculate
> the value as :
>
> for each key in trigram: I have to do the following thing.
> trigram{ hello how are} / bigram{hello how}
>
> how do i access these values simultaneously..
> any suggestions
>
I suggest you learn to quote some context when posting so people have some
idea what you're talking about. Usenet is not a bulletin board, even if
*you* happen to be using Google Groups and see it that way.
Matt