file processing

on 23.04.2006 06:03:18 by amit_h123

Hello all,
I have a large text file with lots of spaces and newline characters in
it, which I want to remove.
After that I need to construct hash tables for the unique words
present in the file: a hash for unigrams (one word at a time), a hash
for bigrams (2 words at a time), and the same for 3 words (trigrams).
I am lost trying to remove the spaces in the text file, and I am
not able to access each word one at a time.
A simple example of what I need to do:

If the text in my file is:

hello how are you all hello how are.

then my unigrams will be:
hello 2
how 2
are 2
you 1...

bigrams will be
hello how 2
how are 2
are you 1
you all 1

trigrams
hello how are 2
how are you 1
are you all 1
.....so on

Can anyone help me with this code?
-Thanks

Re: file processing

on 23.04.2006 09:36:54 by Gunnar Hjalmarsson

amit_h123@yahoo.co.in wrote (in alt.perl):
> Hello all,
> I have a large text file ...

You posted the same job spec. in clpmisc a couple of hours ago, and I
suggested that you learn a programming language. Was that a bad idea?

Didn't you like the hints you were given by another poster either?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: file processing

on 23.04.2006 09:37:39 by Joe Smith

amit_h123@yahoo.co.in wrote:
> I am all lost in removing and accessing the spaces in the text file but

That's super trivial - it's one of the homework assignments given on
the first day in any respectable Perl class.

@words = split; # Where $_ contains the line of text

> And after that i need to construct the hash tables for the unique word

$unigram{$_}++ foreach @words;

> unigrams (one word at a time), a hash for bigrams (2 words at a time)
> and same as for 3 words.

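use strict;
use warnings;

my (%unigram, %bigram, %trigram);
my ($prev1, $prev2);            # previous word, and the word before that
while (<>) {
    foreach my $word (split) {  # split with no arguments splits $_ on whitespace
        $unigram{$word}++;
        # Only count a bigram/trigram once enough words have been seen
        $bigram{"$prev1 $word"}++         if defined $prev1;
        $trigram{"$prev2 $prev1 $word"}++ if defined $prev2;
        $prev2 = $prev1;
        $prev1 = $word;
    }
}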

So what's the problem? It almost sounds as if you never heard of
the split() function, or how it works when given no arguments.

-Joe
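
The same idea can be written as a small reusable routine. The following is only a sketch, not part of the original reply: the names count_ngrams and tokenize are hypothetical, and the choice to lowercase the text and strip punctuation (so that "are." is counted as "are", as the example in the first post implies) is an assumption.

```perl
use strict;
use warnings;

# Count n-grams of size $n in a list of words; returns a hash reference
# mapping each space-joined n-gram to its number of occurrences.
sub count_ngrams {
    my ($n, @words) = @_;
    my %count;
    for my $i (0 .. $#words - $n + 1) {
        $count{ join ' ', @words[$i .. $i + $n - 1] }++;
    }
    return \%count;
}

# Assumption: lowercase the text and drop punctuation before splitting,
# so that "are." and "are" are treated as the same word.
sub tokenize {
    my ($text) = @_;
    $text = lc $text;
    $text =~ s/[^\w\s]//g;
    return split ' ', $text;
}

my @words  = tokenize("hello how are you all hello how are.");
my $bigram = count_ngrams(2, @words);
print "$_ $bigram->{$_}\n" for sort keys %$bigram;
```

With the sample sentence from the first post, this counts "hello how" and "how are" twice each, matching the expected bigram table. Unlike the line-by-line loop, this version needs all words in memory at once, which may matter for a very large file.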

Re: file processing

on 23.04.2006 14:32:05 by amit_h123

Hi,
Thanks for the help. I knew about split, but not that it works without arguments.
This really helped.

Re: file processing

on 23.04.2006 15:19:02 by amit_h123

Hello,
But I still have one problem. Is there a way to access the
bigram hash values based on the trigram keys? I have to calculate
a value as follows:

for each key in the trigram hash, I have to compute
$trigram{"hello how are"} / $bigram{"hello how"}

How do I access these values simultaneously?
Any suggestions?

Re: file processing

on 23.04.2006 15:56:48 by Matt Garrish

wrote in message
news:1145798342.838627.116070@i39g2000cwa.googlegroups.com...
> Hello,
> But I still have one problem. Is there a way to access the
> bigram hash values based on the trigram keys? I have to calculate
> a value as follows:
>
> for each key in the trigram hash, I have to compute
> $trigram{"hello how are"} / $bigram{"hello how"}
>
> How do I access these values simultaneously?
> Any suggestions?
>

I suggest you learn to quote some context when posting so people have some
idea what you're talking about. Usenet is not a bulletin board, even if
*you* happen to be using google groups and see it that way.

Matt
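
The question about dividing each trigram count by the count of its leading bigram goes unanswered in the thread. One possible approach, sketched here under assumptions, is to rebuild the bigram key from the first two words of each trigram key. The sub name trigram_ratios is hypothetical, the sample counts are hand-built for the sentence "hello how are you all hello how are", and the sketch assumes keys are words joined by single spaces, as in the loop posted earlier.

```perl
use strict;
use warnings;

# Hypothetical sample counts for "hello how are you all hello how are".
my %bigram  = ('hello how' => 2, 'how are' => 2, 'are you' => 1,
               'you all'   => 1, 'all hello' => 1);
my %trigram = ('hello how are' => 2, 'how are you' => 1, 'are you all' => 1,
               'you all hello' => 1, 'all hello how' => 1);

# For each trigram key, look up the bigram formed by its first two words
# and divide the trigram count by that bigram count.
sub trigram_ratios {
    my ($trigram, $bigram) = @_;
    my %ratio;
    for my $key (keys %$trigram) {
        my ($w1, $w2) = split ' ', $key;
        my $prefix = "$w1 $w2";
        next unless $bigram->{$prefix};   # skip missing prefixes (avoids division by zero)
        $ratio{$key} = $trigram->{$key} / $bigram->{$prefix};
    }
    return \%ratio;
}

my $ratio = trigram_ratios(\%trigram, \%bigram);
printf "%-15s %.2f\n", $_, $ratio->{$_} for sort keys %$ratio;
```

Because both hashes use the same space-joined key format, no extra data structure is needed: the bigram key is recoverable from the trigram key itself.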