Perl CGI efficiency
am 14.06.2005 12:13:10 von Justin C
We have a website hosted by our ISP, within the site is a gallery which
changes daily (additions and deletions) and is growing. There are a
little over 2000 images so far. The pages that show the images are
generated by a Perl script, with the information taken from a CSV file
which also contains details about each image, details which are shown on
the page with the image. The CSV file is currently almost 200k. As I
don't host the site myself, and the site is not extremely busy, the file
has to be loaded each time an image is called (I have no way to persuade
it to stay in memory).
What is the most efficient way of accessing the CSV data? It's not
necessary that it stays in CSV format, I can hack it with just about
anything as long as it's scriptable (bash, Perl, etc).
I did think of creating a separate index, just the barest minimum of
data to identify the image and, maybe, a line number referring to its
location in the main CSV file.
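Something like this, roughly, is what I have in mind (a completely untested
sketch; the file names and the pipe-delimited index format are made up, and
I've used a byte offset rather than a line number so the script can seek
straight to the record):

    # Build the index whenever the CSV changes:
    # each line is "image_name|byte_offset".
    open my $csv, '<', 'gallery.csv' or die "gallery.csv: $!";
    open my $idx, '>', 'gallery.idx' or die "gallery.idx: $!";
    while (1) {
        my $offset = tell $csv;
        defined(my $line = <$csv>) or last;
        my ($name) = split /,/, $line;
        print $idx "$name|$offset\n";
    }
    close $idx;

    # In the CGI script: scan the small index, then seek straight
    # to the matching record in the big CSV.
    sub record_for {
        my ($wanted) = @_;
        open my $idx, '<', 'gallery.idx' or return;
        while (my $entry = <$idx>) {
            chomp $entry;
            my ($name, $offset) = split /\|/, $entry;
            next unless $name eq $wanted;
            open my $csv, '<', 'gallery.csv' or return;
            seek $csv, $offset, 0;
            return scalar <$csv>;
        }
        return;
    }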
Any suggestions on this will be gratefully received - I'm not trying to
reduce the loading for my ISP (though nor am I looking to increase it),
I just want to speed things up as much as possible for visitors to the
site.
Thank you for your replies.
Oh, if this has been discussed to death before, I apologise. If you can
just let me know what sort of phrases I should be searching for, I'll take
myself off to Google.
Justin.
--
Justin C by the sea.
Re: Perl CGI efficiency
am 14.06.2005 18:33:03 von Jim Gibson
In article , Justin
C wrote:
[website CGI program using 200K CSV file]
> What is the most efficient way of accessing the CSV data? It's not
> necessary that it stays in CSV format, I can hack it with just about
> anything as long as it's scriptable (bash, Perl, etc).
>
> I did think of creating a separate index, just the barest minimum of
> data to identify the image and, maybe, a line number referring to its
> location in the main CSV file.
I would say an index is a good idea. Another idea would be to break up
the large file into a number of smaller files and use some sort of
hashing algorithm that finds the correct file from the image name (or
whatever identifier you use to find the right image). Pick a
randomizing hash algorithm so all of the smaller files have about equal
probability of containing the image and are therefore all about the
same size. Of course, these two methods may be combined for even
greater speed.
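For example (untested; the bucket count, file names, and use of MD5 are
arbitrary choices), something like this would split the records into 16
smaller files based on a digest of the image name, and the CGI script then
only has to read the one small file the name hashes to:

    use Digest::MD5 qw(md5);

    my $buckets = 16;

    # Which small file a given image name lives in.
    sub bucket_for {
        my ($name) = @_;
        my $n = unpack 'N', md5($name);   # first 4 bytes of the digest as an integer
        return sprintf 'gallery_%02d.csv', $n % $buckets;
    }

    # One-off split of the big CSV into the smaller files.
    open my $csv, '<', 'gallery.csv' or die "gallery.csv: $!";
    my %out;
    while (my $line = <$csv>) {
        my ($name) = split /,/, $line;
        my $file   = bucket_for($name);
        open $out{$file}, '>>', $file or die "$file: $!"
            unless $out{$file};
        print { $out{$file} } $line;
    }

The CGI script uses the same bucket_for() to decide which file to read, so
both sides always agree on where a given image's record lives.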
Re: Perl CGI efficiency
am 14.06.2005 21:23:26 von Justin C
On 2005-06-14, Jim Gibson wrote:
> In article , Justin
> C wrote:
>
>
> [website CGI program using 200K CSV file]
>
>> What is the most efficient way of accessing the CSV data? It's not
>> necessary that it stays in CSV format, I can hack it with just about
>> anything as long as it's scriptable (bash, Perl, etc).
>>
>> I did think of creating a separate index, just the barest minimum of
>> data to identify the image and, maybe, a line number referring to its
>> location in the main CSV file.
>
> I would say an index is a good idea. Another idea would be to break up
> the large file into a number of smaller files and use some sort of
> hashing algorithm that finds the correct file from the image name (or
> whatever identifier you use to find the right image). Pick a
> randomizing hash algorithm so all of the smaller files have about equal
> probability of containing the image and are therefore all about the
> same size. Of course, these two methods may be combined for even
> greater speed.
Yeah, I can see the logic in splitting the file... and I can see how to
do that. I don't, however, know what a hashing algorithm is...
"A hashing algorithm takes a variable length data message and creates
a fixed size message digest."
Hmmm... OK, I think I understand that, IIRC there's something about this
in the Llama book - but that's on the desk at work. Randomizing hash
algorithms - now you really are losing me. Could you explain in more
detail or point me at some web resources that'll explain?
Thank you for your reply.
Justin.
--
Justin C by the sea.
Re: Perl CGI efficiency
am 14.06.2005 21:24:03 von brian d foy
In article , Justin
C wrote:
> We have a website hosted by our ISP, within the site is a gallery which
> changes daily (additions and deletions) and is growing.
If the data only changes once a day, simply churn through it to update
a bunch of static pages.
Otherwise, consider caching the output. Your trouble isn't so much
that you have a big database but that you're doing the same thing
more than once when you don't have to.
Good luck :)
Re: Perl CGI efficiency
am 15.06.2005 11:27:16 von Justin C
On 2005-06-14, _brian_d_foy wrote:
> In article , Justin
> C wrote:
>
>> We have a website hosted by our ISP, within the site is a gallery which
>> changes daily (additions and deletions) and is growing.
>
> If the data only changes once a day, simply churn through it to update
> a bunch of static pages.
This is what we did in the early days when there were fewer images;
however, each page is linked to the "previous" and "next", and new images
don't just get tagged on the end, they are inserted alphabetically.
This would mean uploading all 2000+ pages every day; at the moment we
just upload the new images and CSV file and delete any old images.
> Otherwise, consider caching the output. Your trouble isn't so much
> that you have a big database but that you're doing the same thing
> more than once when you don't have to.
The problem here is I don't have control over what the server chooses to
cache; the site is hosted for us by our ISP.
Thank you for your comments.
Justin.
--
Justin C by the sea.
Re: Perl CGI efficiency
am 15.06.2005 16:49:03 von brian d foy
In article , Justin
C wrote:
> On 2005-06-14, _brian_d_foy wrote:
> > In article , Justin
> > C wrote:
> >> We have a website hosted by our ISP, within the site is a gallery which
> >> changes daily (additions and deletions) and is growing.
> > If the data only changes once a day, simply churn through it to update
> > a bunch of static pages.
> This is what we did in the early days when there were fewer images;
> however, each page is linked to the "previous" and "next", and new images
> don't just get tagged on the end, they are inserted alphabetically.
> This would mean uploading all 2000+ pages every day,
You only have to upload the pages that change.
> > Otherwise, consider caching the output. Your trouble isn't so much
> > that you have a big database but that you're doing the same thing
> > more than once when you don't have to.
> The problem here is I don't have control over what the server chooses to
> cache; the site is hosted for us by our ISP.
You can build your own caching mechanism. Once the CGI script creates
a new page, it saves the result. When it needs that same page again, it
grabs the static page it saved.
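A rough sketch of that idea (untested; it assumes a writable cache/
directory, and build_page() stands in for whatever your script already
does to generate the HTML):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI qw(param);

    # restrict the cache key to word characters so it is safe as a file name
    my ($image) = (param('image') || '') =~ /^(\w+)$/
        or die "bad or missing image name";
    my $cached = "cache/$image.html";

    print "Content-type: text/html\n\n";

    if (-e $cached) {
        # serve the copy saved by an earlier request
        open my $fh, '<', $cached or die "$cached: $!";
        print while <$fh>;
    }
    else {
        my $html = build_page($image);   # your existing page-building code
        open my $fh, '>', $cached or die "$cached: $!";
        print $fh $html;
        print $html;
    }

When the daily update adds or removes images, delete the affected files
from cache/ (or just empty the directory) and they get rebuilt on the
next request.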
It's up to you though. Figuring out some odd or tricky hashing thing
is just going to lead you further down the road you're trying to get
off of.
Re: Perl CGI efficiency
am 16.06.2005 16:23:21 von Justin C
On 2005-06-15, _brian_d_foy wrote:
> In article , Justin
> C wrote:
>
>> On 2005-06-14, _brian_d_foy wrote:
>> > In article , Justin
>> > C wrote:
>
>> >> We have a website hosted by our ISP, within the site is a gallery which
>> >> changes daily (additions and deletions) and is growing.
>
>> > If the data only changes once a day, simply churn through it to update
>> > a bunch of static pages.
>
>> This is what we did in the early days when there were fewer images;
>> however, each page is linked to the "previous" and "next", and new images
>> don't just get tagged on the end, they are inserted alphabetically.
>> This would mean uploading all 2000+ pages every day,
>
> You only have to upload the pages that change.
That is true; where there is a new page, it's only that page and the
previous and next pages that have to be uploaded, and for deletions
it's just the ones on each side.
Hmmm... You've set the old grey matter buzzing now.
Justin.
--
Justin C by the sea.