SpellCheck in perl

on 18.04.2006 11:20:37 by SRIKANTH

Hi,



We need to check the spelling of a word that is actually a domain
name. For example, we have to check the word "onlinetradeing".
When we run it through spell checkers, we get unrelated words such as
on, obliterating, incinerating, intruding, etc. What we actually want
is "online trading". So we would like the word to be split into
phrases and spell-checked as well. Normal spell checkers just look up
words in the dictionary; they do not split a word into phrases.



Regards
L.Srikanth.

Re: SpellCheck in perl

on 18.04.2006 20:48:13 by TTK Ciar

Once upon a time, "Srikanth" said:
>
>We need to check the spelling of a word that is actually a domain
>name. For example, we have to check the word "onlinetradeing".
>When we run it through spell checkers, we get unrelated words such as
>on, obliterating, incinerating, intruding, etc. What we actually want
>is "online trading". So we would like the word to be split into
>phrases and spell-checked as well. Normal spell checkers just look up
>words in the dictionary; they do not split a word into phrases.

I think this is the wrong newsgroup for this kind of request. You
should probably take this kind of request to comp.lang.perl.misc.

That aside, it sounds like you need to write a little code that
tries to match the first N characters of your domain names against
words in your dictionary, and for each match try the same against
the remainder of the domain name (after the matched word), and so
on until some combination of matches matches the entire phrase.

Perhaps something like this (warning, untested code):

    my %DICT = ();  # initialize this with dictionary words (word => 1)

    # Greedy splitter: take the longest dictionary word that is a prefix
    # of $phrase, then recurse on whatever follows it.
    sub matcher
    {
        my ( $phrase, @components ) = @_;
        my $plen = length ( $phrase );
        for ( my $i = $plen; $i > 0; $i-- )  # try the longest prefix first
        {
            my $frag = substr ( $phrase, 0, $i );
            next unless ( exists $DICT{$frag} );
            return ( "MATCH FOUND", @components, $frag ) if ( $i == $plen );
            push ( @components, $frag );
            # the remainder starts at offset $i (substr offsets are 0-based)
            return ( matcher ( substr ( $phrase, $i ), @components ) );
        }
        return ( "NO MATCH POSSIBLE", @components, $phrase );
    }

    # With "online" and "trading" in %DICT:
    my ( $result, @word_list ) = matcher ( "onlinetrading" );
    # $result should now be "MATCH FOUND"
    # @word_list should now be ( "online", "trading" )

A problem with this "greedy" approach is that a subphrase might
match too much, rendering the remaining fragment unmatchable. For
instance, matcher("maileditorial") would fail to parse the entire
phrase if "mailed" were in the dictionary. The alternative would
be to build up a list of intermediate results, one for each substring
that matched some word in the dictionary, and call matcher() on
each component of that list iteratively. This would explore all
possible matches.
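
A minimal sketch of that exhaustive alternative follows. It is an
illustration only, not code from the original post: it uses plain
recursion rather than an explicit list of intermediate results, the
name all_splits is hypothetical, and the toy dictionary is an
assumption standing in for a real word list.

```perl
use strict;
use warnings;

# Hypothetical toy dictionary; a real one would be loaded from a word list.
my %DICT = map { $_ => 1 } qw( mail mailed editorial online trading );

# Return every way of splitting $phrase into dictionary words.
# Each result is an array reference holding one complete segmentation.
sub all_splits
{
    my ( $phrase ) = @_;
    return ( [] ) if $phrase eq '';    # nothing left: one (empty) split
    my @results;
    for my $i ( 1 .. length($phrase) )
    {
        my $frag = substr ( $phrase, 0, $i );
        next unless exists $DICT{$frag};
        # Try every split of the remainder, not just the first one found,
        # so a too-long prefix such as "mailed" cannot block the answer.
        for my $tail ( all_splits ( substr ( $phrase, $i ) ) )
        {
            push @results, [ $frag, @$tail ];
        }
    }
    return @results;
}

print join ( ' ', @$_ ), "\n" for all_splits ( "maileditorial" );
# prints "mail editorial"
```

The "mailed" prefix is still tried, but since "itorial" cannot be
split it simply contributes no results. For long phrases the number of
splits can grow quickly, so caching results per remainder would
probably be worth adding.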

Good luck!
-- TTK

Re: SpellCheck in perl

on 19.04.2006 05:09:33 by SRIKANTH

Thanks for the idea.... I already tried this and, as you said, I got a lot
of suggested words, and it is a big problem to handle all those word
lists and find the best suggestions. But Google is one example of what
we wanted but unfortunately the code is unreachable for us to do this
kind.

Re: SpellCheck in perl

on 20.04.2006 06:58:01 by TTK Ciar

Once upon a time, "Srikanth" said:
>
>Thanks for the idea.... I already tried this and, as you said, I got a lot
>of suggested words, and it is a big problem to handle all those word
>lists and find the best suggestions.

I'm not sure what you mean. Do you need a word list? Word lists
are sometimes called "lemma lists" (strictly, a lemma is the dictionary
form of a word). I have a pretty good one left over from an AI project
which has 247266 words in it that you can use. It is available for
download at:
http://aux.ciar.org/ttk/lemma.ttkciar.01.txt

This file is text, and has a number and a word on each line,
separated by a tab. The number is the relative frequency of the
word in the domain of the original project. If you can't use the
frequency, then it's pretty easy for you to strip it out.
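
As a rough sketch of reading a file in that format (a number, a tab,
then a word on each line), something like the following might do. The
name load_lemmas and the sample data are hypothetical, and the in-memory
filehandle merely stands in for the downloaded file.

```perl
use strict;
use warnings;

# Read "frequency<TAB>word" lines from a filehandle into a hash that
# maps each word to its frequency.
sub load_lemmas
{
    my ( $fh ) = @_;
    my %dict;
    while ( my $line = <$fh> )
    {
        chomp $line;
        $line =~ s/\r\z//;                 # tolerate CRLF line endings
        my ( $freq, $word ) = split /\t/, $line, 2;
        next unless defined $word && length $word;   # skip malformed lines
        $dict{$word} = $freq;
    }
    return %dict;
}

# Hypothetical sample data in the described format.
my $sample = "120\tonline\n45\ttrading\n7\tmailed\n";
open ( my $fh, '<', \$sample ) or die "cannot open sample: $!";
my %DICT = load_lemmas ( $fh );
close ( $fh );

print "$_ => $DICT{$_}\n" for sort keys %DICT;
```

To read the real file, open it by name instead of by reference to
$sample. If the frequency is not needed, storing 1 instead of $freq
gives a plain membership hash.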

>But Google is one example of what
>we wanted but unfortunately the code is unreachable for us to do this
>kind.

I have no idea what this means. Can you try saying it in a
different way?

Good luck,
-- TTK

Re: SpellCheck in perl

on 20.04.2006 13:45:06 by SRIKANTH

Thanks...
I will explain my problem....
I am working on a spell checker which takes a wrongly spelt keyword as
input (a single keyword only, not multiple keywords or text) and
suggests some correct words. For example, if I enter "tradeing", my
spell checker suggests that the keyword should be "trading". But if I
try to spell-check a compound word without a delimiter, like
"onlinetradeing", which is wrongly spelt, it suggests "unlaundered",
which is irrelevant. It does not recognize onlinetradeing as "online
trading". Another example of this kind is "virtaulflowers", which
should be "Virtual Flowers".
If you have any idea, please let me know....

Thanks for replying...

Regards,
Srikanth.

Re: SpellCheck in perl

on 20.04.2006 23:02:11 by rvtol+news

Srikanth wrote:
> Thanks...
> I will explain my problem....
> I am working on a spell checker which takes a wrongly spelt keyword as
> input (a single keyword only, not multiple keywords or text) and
> suggests some correct words. For example, if I enter "tradeing", my
> spell checker suggests that the keyword should be "trading". But if I
> try to spell-check a compound word without a delimiter, like
> "onlinetradeing", which is wrongly spelt, it suggests "unlaundered",
> which is irrelevant. It does not recognize onlinetradeing as "online
> trading". Another example of this kind is "virtaulflowers", which
> should be "Virtual Flowers".
> If you have any idea, please let me know....
>
> Thanks for replying...


> NNTP-Posting-Host: 202.63.122.130
http://cbl.abuseat.org/lookup.cgi?ip=202.63.122.130

This same question was asked by you in news:comp.lang.perl.misc and has
already grown a thread there. You were already told that you shouldn't
multi-post. Now you do it again. Bye.

--
Affijn, Ruud

"Gewoon is een tijger."

Re: SpellCheck in perl

on 25.04.2006 06:43:10 by SRIKANTH

Thanks Ruud.... Everyone is giving some help regarding this spell check, but
you have given far better help for me... This is the way of helping
people.... Right?