related pages ...

on 08.11.2007 22:08:34 by Animesh Kumar

Hello All:

While the "related-pages" idea is common, I wasn't sure how to implement
it. Here is the problem:

1) Let's say my website has 100 or 1000 or 10000 articles.
2) Each article has a bunch of keywords (e.g., "php, mysql, database,
navigation")
3) Two articles j and k have l(j, k) common keywords.
4) Each article j should display a few related articles, drawn randomly
from articles having common keywords.
5) The method should be robust to new entries (in keywords, as well as
articles).

I wonder if there is an efficient solution to this problem.

Best regards,
Animesh
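
One possible way to make points 1) to 5) concrete is an inverted index from keyword to articles. The sketch below is only an illustration under stated assumptions: it is written in Python rather than PHP for brevity, the sample data and the names build_index, shared_keyword_counts and related are hypothetical, and nothing in the thread prescribes this design.

```python
import random
from collections import Counter, defaultdict

# Hypothetical sample data: article id -> set of keywords.
articles = {
    1: {"php", "mysql", "database", "navigation"},
    2: {"php", "sessions"},
    3: {"mysql", "database", "indexing"},
    4: {"gardening", "roses"},
}


def build_index(articles):
    """Map each keyword to the set of article ids that carry it."""
    index = defaultdict(set)
    for article_id, keywords in articles.items():
        for keyword in keywords:
            index[keyword].add(article_id)
    return index


def shared_keyword_counts(article_id, articles, index):
    """Return {k: l(j, k)} for every article k sharing a keyword with j."""
    counts = Counter()
    for keyword in articles[article_id]:
        for other_id in index[keyword]:
            if other_id != article_id:
                counts[other_id] += 1
    return counts


def related(article_id, articles, index, how_many=3, rng=random):
    """Draw up to how_many related articles at random from the candidates."""
    candidates = sorted(shared_keyword_counts(article_id, articles, index))
    return rng.sample(candidates, min(how_many, len(candidates)))


index = build_index(articles)
counts = shared_keyword_counts(1, articles, index)  # article 3 shares 2, article 2 shares 1
picks = related(1, articles, index, how_many=2)     # random order of the candidates
```

Only articles that share a keyword with j are ever visited, so the cost depends on how many articles carry each keyword, not on the total number of articles. A new article or keyword needs only a few extra entries in the index, which is one reading of the robustness asked for in point 5).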

Re: related pages ...

on 08.11.2007 22:57:38 by Jerry Stuckle

Animesh K wrote:
> Hello All:
>
> While the "related-pages" idea is common, I wasn't sure how to implement
> it. Here is the problem:
>
> 1) Let's say my website has 100 or 1000 or 10000 articles.
> 2) Each article has bunch of keywords (e.g., "php, mysql, database,
> navigation")
> 3) Two articles j and k have l(j, k) common keywords.
> 4) Each article j should display a few related articles, drawn randomly
> from articles having common keywords.
> 5) The method should be robust to new entries (in keywords, as well as
> articles).
>
> I wonder if there is an efficient solution around this problem.
>
> Best regards,
> Animesh
>

That's not easy. Are you keeping the articles in a database or text
files? If the former, you can search the database.

But the real problem here is defining what your keywords are. And
keywords might not be enough. For instance, if you have an article on
flowers, 'rose' might be a keyword. But you wouldn't call an article
about how the 'space shuttle rose into the sky' related. So a fully
automated solution isn't great.

Maybe just scan the articles as they come in and pick out certain
keywords, similar to metatags in html. Then build a database of keywords.

Or, do it automatically and come back and modify the list manually.
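
A minimal sketch of that "scan, then review by hand" idea, assuming a small hand-maintained vocabulary. The VOCABULARY set and the extract_keywords name are hypothetical, and the code is Python rather than PHP for brevity.

```python
import re

# Hypothetical controlled vocabulary, maintained by hand.
VOCABULARY = {"php", "mysql", "database", "navigation"}


def extract_keywords(text, vocabulary=VOCABULARY):
    """Suggest the vocabulary terms that occur as whole words in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted(words & vocabulary)


suggested = extract_keywords("Storing PHP sessions in a MySQL database")
```

The result is only a suggestion list. Plain word matching cannot tell the flower 'rose' from the verb, so a manual pass over the suggestions is still needed.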

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: related pages ...

on 09.11.2007 00:18:41 by Animesh Kumar

Jerry Stuckle wrote:

>>
>
> That's not easy. Are you keeping the articles in a database or text
> files? If the former, you can search the database.
>

To keep the problem simpler, let's assume that each article has
tags/authors/topic and that it is stored in a database.

One can view this as a graph-theoretic problem (with the graph being
computed by php and stored in a database). But doing it in an efficient
way would be interesting.
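
One way to read the graph view: each (article, keyword) row is half of an edge, and a self-join on the keyword column yields the edge weight l(j, k). The sketch below assumes a hypothetical article_keyword table and uses Python's sqlite3 module only so that it runs standalone; the same SQL idea carries over to PHP with MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article_keyword (
        article_id INTEGER NOT NULL,
        keyword    TEXT    NOT NULL,
        PRIMARY KEY (article_id, keyword)
    );
    CREATE INDEX idx_keyword ON article_keyword (keyword);
""")

# Hypothetical sample rows: (article_id, keyword).
rows = [
    (1, "php"), (1, "mysql"), (1, "database"),
    (2, "php"),
    (3, "mysql"), (3, "database"),
    (4, "roses"),
]
conn.executemany("INSERT INTO article_keyword VALUES (?, ?)", rows)


def related_articles(conn, article_id, limit=5):
    """Return (other_article_id, shared_keyword_count), most shared first."""
    query = """
        SELECT b.article_id, COUNT(*) AS shared
        FROM article_keyword AS a
        JOIN article_keyword AS b
          ON a.keyword = b.keyword AND b.article_id <> a.article_id
        WHERE a.article_id = ?
        GROUP BY b.article_id
        ORDER BY shared DESC, b.article_id
        LIMIT ?
    """
    return conn.execute(query, (article_id, limit)).fetchall()


result = related_articles(conn, 1)
```

The graph is never materialized here: the index on keyword lets the database compute the neighbours of one article on demand, so new articles and keywords are just new rows. For a random draw instead of a ranking, the ORDER BY clause could become ORDER BY RANDOM() in SQLite, or ORDER BY RAND() in MySQL.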

Scanning the article for probable keywords is the next (and much harder)
step :)

Thanks for answering,
Animesh

Re: related pages ...

on 09.11.2007 01:12:53 by Jerry Stuckle

Animesh K wrote:
> To keep the problem simpler, let's assume that each article has
> tags/authors/topic and it is stored in database.
>
> One can view this as a graph-theoretic problem (with the graph being
> computed by php and stored in a database). But doing it in an efficient
> way would be interesting.
>
> Scanning the article for probable keywords is the next (and much harder)
> step :)
>
>
>
> Thanks for answering,
> Animesh
>

OK, then you need to be in a database newsgroup. How to efficiently
search a database is not a PHP problem.

Re: related pages ...

on 09.11.2007 02:13:52 by Animesh Kumar

Jerry Stuckle wrote:
> OK, then you need to be in a database newsgroup. How to efficiently
> search a database is not a PHP problem.
>

It is neither a database problem nor a PHP problem. It is an
algorithmic problem, but it *is* implemented by a few websites running
scripts similar to PHP.

I asked here because someone may know from experience.

Thanks again,
Animesh

Re: related pages ...

on 09.11.2007 02:38:46 by Jerry Stuckle

Animesh K wrote:
> It is neither a database problem, nor a php problem. It is an
> algorithmic problem, but it *is* implemented by a few websites running
> scripts similar to Php.
>
> I asked here because someone may be knowing by experience.
>
> Thanks again,
> Animesh
>

And this group is for PHP coding problems, not esoteric algorithms.

So this isn't the right group, and now you're saying a database group
isn't the right one.

Guess it's off to the library for you.

Re: related pages ...

on 09.11.2007 03:22:56 by Steve

"Jerry Stuckle" wrote in message
news:Y6mdnQo6xrC1J67anZ2dnUVZ_tXinZ2d@comcast.com...
> And this group is for PHP coding problems, not esoteric algorithms.
>
> So this isn't the right group, and now you're saying a database group
> isn't the right one.
>
> Guess it's off to the library for you.

what is your major malfunction today, jerry?

Re: related pages ...

on 09.11.2007 03:36:12 by Steve

"Animesh K" wrote in message
news:fh0c8g$1u1s$1@agate.berkeley.edu...
> Jerry Stuckle wrote:
>> Animesh K wrote:
>>> Jerry Stuckle wrote:
>>>
>>>>>
>>>>
>>>> That's not easy. Are you keeping the articles in a database or text
>>>> files? If the former, you can search the database.
>>>>
>>>
>>> To keep the problem simpler, let's assume that each article has
>>> tags/authors/topic and it is stored in database.
>>>
>>> One can view this as a graph-theoretic problem (with the graph being
>>> computed by php and stored in a database). But doing it in an efficient
>>> way would be interesting.
>>>
>>> Scanning the article for probable keywords is the next (and much harder)
>>> step :)

why not approach it on a curve rather than a linear graph, without defining
(manually) specific words that should identify the page. that may be what
you're talking about anyway. in that case, it should be less of a big deal
than you think.

if you parse the text of the pages, exclude common words (like adjectives,
articles, and verbs), and reduce the page content to nouns essentially, you
can then rank each one based on occurrence. you could also assist
yourself in this process by creating a mapping table. in that table, you
could define certain jargon that will be found in, or unique to, your site.
that would better correlate the ranking that i just described. you could
also define the rank in other ways, like the 'common-ness' of the words left
in the reduced content. 'theory' is not a very common term in most settings,
so it may need to be seen as a more predominant descriptor of what the page
is about. make sense?

that's a content based way to rank similarities between pages. as for tags,
authors, and topics? well, that's pretty specific and less guessing has to
be done.
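
The content-based ranking described above can be sketched with TF-IDF weights and cosine similarity. This is a standard technique swapped in as one way to formalize "rank by occurrence, boost uncommon words", not necessarily what the poster had in mind. The tiny STOPWORDS set stands in for real part-of-speech filtering, the sample documents are hypothetical, the jargon mapping table is left out, and the code is Python rather than PHP for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (no real noun detection).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "with", "for", "into"}


def tokenize(text):
    """Lowercase the text and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Weight each word by its count in the page times log(N / pages containing it)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))
    n_docs = len(docs)
    vectors = {}
    for doc_id, words in tokenized.items():
        counts = Counter(words)
        vectors[doc_id] = {
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse word -> weight vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample pages.
docs = {
    "roses": "Pruning roses and feeding roses in the garden",
    "tulips": "Planting tulips in the garden",
    "shuttle": "The space shuttle rose into the sky",
}
vectors = tfidf_vectors(docs)
sim_garden = cosine(vectors["roses"], vectors["tulips"])
sim_shuttle = cosine(vectors["roses"], vectors["shuttle"])
```

In this sample the two gardening pages share the word "garden" and score above zero, while the shuttle page shares no word with the roses page because nothing is stemmed. Add stemming and 'roses' would match 'rose', which brings back exactly the ambiguity raised earlier in the thread.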

anyway, that's just an initial theoretical approach that stays abstract:
you don't have to know what any one page is about, which would require you
to read the page and create the relationships manually.

what would also be helpful is to look at case studies of
web crawlers and search engines. a great deal has been written
about what google is doing that makes them so successful compared to others.
i mean the specific tactics and algorithms they use...not just conceptual stuff.
ironically, you can find these by googling google. :)

hth,

me

Re: related pages ...

on 09.11.2007 04:17:32 by Jerry Stuckle

Steve wrote:
> what is your major malfunction today, jerry?
>
>
>

Trolls like you.

Re: related pages ...

on 09.11.2007 06:34:14 by Steve

"Jerry Stuckle" wrote in message
news:hZCdnaHTjuvQTK7anZ2dnUVZ_tjinZ2d@comcast.com...
> Steve wrote:
>> what is your major malfunction today, jerry?
>
> Trolls like you.

wow. so i'm trolling now? rofl.

Re: related pages ...

on 09.11.2007 06:37:44 by Steve

Re: related pages ...

on 10.11.2007 03:16:35 by skatermatt99

The WordPress blogging software has a good related-posts feature.
I can't remember whether it's a plugin or not.

And I see no reason for not discussing algorithms.
PHP is a programming language and can make use of many algorithms.

Maybe you could email Matt Mullenweg or one of the core WordPress
developers about it.

I think more sites should have this feature; it's very useful.