This question isn't so much a mod_perl question, as it is a question
about building high performance websites with Perl.
We have a large, busy, database application that relates to millions of
photos, which we also need to store and display. We've been keeping the
meta data about the photos in PostgreSQL, and the files on the file
system. (Implemented much like CGI::Uploader ).
This has worked great in terms of performance, but with so much data to
manage, over time we have run into data inconsistency issues between the
file system and the database.
So, I'm asking if anyone has had experience successfully storing photos
(or othe files) directly in database? That would solve the consistency
issue, but may create a performance issue. Perhaps the right kind of
caching layer could solve that.
Re: Any success with storing photos in a database?
am 29.09.2008 22:16:12 von aw
Mark Stosberg wrote:
> This question isn't so much a mod_perl question, as it is a question
> about building high performance websites with Perl.
>
> We have a large, busy, database application that relates to millions of
> photos, which we also need to store and display. We've been keeping the
> meta data about the photos in PostgreSQL, and the files on the file
> system. (Implemented much like CGI::Uploader ).
>
> This has worked great in terms of performance, but with so much data to
> manage, over time we have run into data inconsistency issues between the
> file system and the database.
>
> So, I'm asking if anyone has had experience successfully storing photos
> (or othe files) directly in database? That would solve the consistency
> issue, but may create a performance issue. Perhaps the right kind of
> caching layer could solve that.
>
I am curious about your application, because we have something similar
and similar volumes, not with photos but with documents in general.
The following is an opinion piece.
We have also looked at various data organisations over time.
Regarding storing large objects directly in a database, the one issue is
always that (because of the object sizes), it makes any operation on the
rows of such a database or table very heavy. Imagine having to dump or
reload a table that contains 500,000 "blobs" of 2-3 MB each.
(Don't know about PostgreSQL, but many db systems require a dump and a
reload when you change a table structure).
Or simply take a backup of that table. And you cannot make an "rsync"
of a database as easily as of a filesystem.
It also means that any intermediate buffer (which are often used to
improve retrieval of "nearby" rows) is quickly full, with few rows in it.
(On the other hand, if you just keep thumbnails of a couple of Kb, I
guess it would not matter much.)
Another issue is that it happens that databases get screwed up, and the
likelihood probably increases as you push them to their limits (for
example with very large row sizes). Resolving some inconsistencies
between database rows and files on disk may be no fun, but resolving
inconsistencies within a database may be even less so.
One point I am curious about, is what kind of file structure you use to
store the millions of images on the filesystem. I can't imagine that
you do it really into one flat directory ?
And are you storing the real paths directly in the database ?
To get back to your issue of inconsistency : maybe the best strategy is
just to check for such inconsistencies as early as possible ? For
example, when you add or delete objects, write this information
somewhere in a daily "transactions" file, which is then analysed at
night by some job which checks that everything is really where it is
supposed to be, and lets you know when not.
Regarding efficiency : when you think about it, a modern filesystem is
about the fastest, most efficient in space and most reliable database
system that one can think of, with the lowest overhead, as long as you
know the exact path of an object, and as long as all the directories in
the path are kept to a reasonable size (important). It has its inherent
buffering at various levels, optimised to access files. It has a whole
bunch of utilities to manipulate it; it it shareable, but can be locked
when you need it. It is portable.
It does have one inconvenient : it has a single "key" to access an
object (the path). But for that, you have your database system.
Oh, and I've thought of another advantage, in an Apache/web context : to
send the content of a file to a browser, you can take advantage of the
sendfile() call, which is very efficient. Now if your file is a blob in
a row of a database, you have to read it yourself in memory, and send
it, don't you ?
I've just re-convinced myself not to change our storage structure.
Re: Any success with storing photos in a database?
I'm pretty sure that the consensus is to never actually store the files in
the database. What actual inconsistencies are you seeing that you are
trying to fix?
On Mon, Sep 29, 2008 at 1:00 PM, Mark Stosberg wrote:
>
> This question isn't so much a mod_perl question, as it is a question
> about building high performance websites with Perl.
>
> We have a large, busy, database application that relates to millions of
> photos, which we also need to store and display. We've been keeping the
> meta data about the photos in PostgreSQL, and the files on the file
> system. (Implemented much like CGI::Uploader ).
>
> This has worked great in terms of performance, but with so much data to
> manage, over time we have run into data inconsistency issues between the
> file system and the database.
>
> So, I'm asking if anyone has had experience successfully storing photos
> (or othe files) directly in database? That would solve the consistency
> issue, but may create a performance issue. Perhaps the right kind of
> caching layer could solve that.
>
> Mark
>
> --
> . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Mark Stosberg Principal Developer
> mark@summersault.com Summersault, LLC
> 765-939-9301 ext 202 database driven websites
> . . . . . http://www.summersault.com/ . . . . . . . .
>
>
>
I'm pretty sure that the consensus is to never actually store the files in the database. What actual inconsistencies are you seeing that you are trying to fix?
On Mon, Sep 29, 2008 at 1:00 PM, Mark Stosberg <> wrote:
This question isn't so much a mod_perl question, as it is a question
about building high performance websites with Perl.
We have a large, busy, database application that relates to millions of
photos, which we also need to store and display. We've been keeping the
meta data about the photos in PostgreSQL, and the files on the file
system. (Implemented much like CGI::Uploader ).
This has worked great in terms of performance, but with so much data to
manage, over time we have run into data inconsistency issues between the
file system and the database.
So, I'm asking if anyone has had experience successfully storing photos
(or othe files) directly in database? That would solve the consistency
issue, but may create a performance issue. Perhaps the right kind of
Re: Any success with storing photos in a database?
am 29.09.2008 22:49:48 von Frank Wiles
On Mon, 29 Sep 2008 15:00:41 -0400
Mark Stosberg wrote:
> This question isn't so much a mod_perl question, as it is a question
> about building high performance websites with Perl.
>
> We have a large, busy, database application that relates to millions
> of photos, which we also need to store and display. We've been
> keeping the meta data about the photos in PostgreSQL, and the files
> on the file system. (Implemented much like CGI::Uploader ).
>
> This has worked great in terms of performance, but with so much data
> to manage, over time we have run into data inconsistency issues
> between the file system and the database.
>
> So, I'm asking if anyone has had experience successfully storing
> photos (or othe files) directly in database? That would solve the
> consistency issue, but may create a performance issue. Perhaps the
> right kind of caching layer could solve that.
Actually you're already doing it correctly. Andre already mentioned
many of the pitfalls of trying to store large binary data in a database,
so I won't rehash them again.
The only issue you seem to be having is the inconsistency. That issue
is going to be much easier to solve than trying to scale by putting the
photos in the database.
Usually people just make sure inserts/updates to the photo table is
done in a transaction and if that transaction succeeds or fails, it
does the appropriate write/delete on the file system.
But since you're using PostgreSQL ( my favorite database and a large
part of my consulting practice ) you could even go so far as to write a
few pl/perl stored procedures to handle keeping the file system in sync
with the database.
-------------------------------------------------------
Frank Wiles, Revolution Systems, LLC.
Personal : frank@wiles.org http://www.wiles.org
Work : frank@revsys.com http://www.revsys.com
Re: Any success with storing photos in a database?
am 29.09.2008 22:52:03 von Frank Wiles
On Mon, 29 Sep 2008 22:16:12 +0200
Andr=E9 Warnier wrote:
> We have also looked at various data organisations over time.
> Regarding storing large objects directly in a database, the one issue
> is always that (because of the object sizes), it makes any operation
> on the rows of such a database or table very heavy. Imagine having
> to dump or reload a table that contains 500,000 "blobs" of 2-3 MB
> each. (Don't know about PostgreSQL, but many db systems require a
> dump and a reload when you change a table structure).
FYI PostgreSQL doesn't require a dump/reload when altering the table
structure.=20
> Oh, and I've thought of another advantage, in an Apache/web context :
> to send the content of a file to a browser, you can take advantage of
> the sendfile() call, which is very efficient. Now if your file is a
> blob in a row of a database, you have to read it yourself in memory,
> and send it, don't you ?
That is one advantage, but I'm going to take it a step further and say
by having the files outside of the database you don't even need Apache,
but should in fact use a lighter weight web server like tux, nginx, or
lighttpd for serving up static media like that.=20
-------------------------------------------------------
Frank Wiles, Revolution Systems, LLC.=20
Personal : frank@wiles.org http://www.wiles.org
Work : frank@revsys.com http://www.revsys.com=20
Re: Any success with storing photos in a database?
am 29.09.2008 23:04:29 von Perrin Harkins
On Mon, Sep 29, 2008 at 3:00 PM, Mark Stosberg wrote:
> We have a large, busy, database application that relates to millions of
> photos, which we also need to store and display.
Have you read Cal Henderson's book about how Flickr works? It's a bit
extreme, but interesting. A smaller version of the "many photos"
problem is LiveJournal, who use their custom file storage API
(MogileFS) and serve the files with their own web server (perlbal).
Definitely another extreme solution that should probably be considered
a last resort after trying to make easier stuff work, but it sounds
like the easier stuff is not working for you.
- Perrin
Re: Any success with storing photos in a database?
am 29.09.2008 23:04:37 von Cosimo Streppone
In data 29 settembre 2008 alle ore 21:00:41, Mark Stosberg
ha scritto:
> This question isn't so much a mod_perl question, as it is a question
> about building high performance websites with Perl.
>
> We have a large, busy, database application that relates to millions of
> photos, which we also need to store and display. We've been keeping the
> meta data about the photos in PostgreSQL, and the files on the file
> system. (Implemented much like CGI::Uploader ).
We have:
a) ~150,000 avatar tiny pictures (50x50);
b) ~300,000 user photos (320x240 originals), but
also available in 4 more sizes;
c) tens of millions of album pictures in original
and thumbnail sizes;
We're using MySQL 5.0 with MyISAM storage engine. Yes.
Until recently, a) & b) were stored into a MySQL blob field. Yes.
Did you hear me screaming? :-)
Problems I found when I started working here:
- our mod_perl backends were serving 20-40% of picture requests,
which is completely insane;
- our picture-serving code was fetching the picture from the database,
eventually scaling it on the fly (!), saving it in memcached
and $r->print()ing it out down the wire.
That's completely insane. The scaling even disabled caching.
- when you update a picture metadata (this is MySQL), you _LOCK_
the _ENTIRE_ table with hundreds of thousands images.
- fetching from a blob field in MySQL is expensive.
Now instead:
- avatars were our first experiment. They are stored as static resources
with a hierachical and balanced filesystem structure
(using digests and splitting them up);
We completely removed the caching layer from our mod_perl code,
because caching happens directly in the browser for static
resources;
We managed to move away 500,000 req/day from mod_perl to static
HTTP servers.
- User photos are coming. We wrote a nice application layer that
can upload a single resource to many pools of static servers and
in different sizes with automatic thumbnailing. The filesystem path
scheme can be defined into the resource perl class, but basically
is the same digest + splitting, like in:
http://static.myapp.com/pool1/a1/b2c/d3f4g5h6.../123456_m.jp g
We are mass-exporting pictures from the database blobs to our
filesystems via DAV. Using DAV is not the most efficient way but
allows you to attach arbitrary metadata to the filesystem.
We now use that to resolve inconsistencies and "sync" the
metadata in the database.
This should go live with next release. I'll let you know :)
I'm not sure what to do for album pictures. They are already out
of the database, thank god, but the "design" guys now want 3-4
thumbnails even for those pics. Suggestions?
> This has worked great in terms of performance, but with so much data to
> manage, over time we have run into data inconsistency issues between the
> file system and the database.
Can you explain the issues you found?
I'd really like to know, so I'm prepared. :-)
> So, I'm asking if anyone has had experience successfully storing photos
> (or othe files) directly in database? That would solve the consistency
> issue, but may create a performance issue.
Yes, performance issues.
> Perhaps the right kind of caching layer could solve that.
I'm not sure. If you throw caching into the "pics-in-the-db" mess,
IMHO you only make the situation worse.
--
Cosimo
Re: Any success with storing photos in a database?
am 29.09.2008 23:45:05 von James Smith
There are good reasons to store images (especially small ones) in
databases (and with careful management of headers in your mod_perl).
Some of you have missed inherent problems with the file systems
even balanced heirarchical tree - ones in a shared server
environment which can lead to gross efficiencies - in your cases
you may not be doing multiple deletes - but in the examples I work
with it is not the creation and storage which breaks the file
system, but the requirement to clear our old files before filling
up the file system.
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
Re: Any success with storing photos in a database?
am 29.09.2008 23:55:39 von Cosimo Streppone
In data 29 settembre 2008 alle ore 23:45:05, James Smith ha scritto:
> There are good reasons to store images (especially small ones) in
> databases (and with careful management of headers in your mod_perl).
>
> Some of you have missed inherent problems with the file systems
> even balanced heirarchical tree - ones in a shared server
> environment which can lead to gross efficiencies - in your cases
> you may not be doing multiple deletes - but in the examples I work
> with it is not the creation and storage which breaks the file
> system, but the requirement to clear our old files before filling
> up the file system.
If you have "proper" metadata, you can go and delete your files.
In our case, we chose to hash our paths by basically user-id,
so every file owned by a user is in the same folder and
can be deleted without any problems.
Maybe I didn't get your point.
--
Cosimo
Re: Any success with storing photos in a database?
am 30.09.2008 00:07:39 von aw
James Smith wrote:
>
>
> There are good reasons to store images (especially small ones) in
> databases (and with careful management of headers in your mod_perl).
>
> Some of you have missed inherent problems with the file systems
> even balanced heirarchical tree - ones in a shared server
> environment which can lead to gross efficiencies - in your cases
> you may not be doing multiple deletes - but in the examples I work
> with it is not the creation and storage which breaks the file
> system, but the requirement to clear our old files before filling
> up the file system.
>
>
Finally, someone who does not agree with the general line.
That makes it more interesting.
As for the size, the OP indicated that his objects were pictures (photos
I believe), which tend to be in the multi-MB range, and growing as
cameras get more pixels. The documents we handle also, since there is a
notable tendency to inflation of office document sizes, what with
embedded pictures and graphics and such.
I guess indeed it also depends on whether the stored images/documents
are transient, or stay forever. In our case they stay forever, because
they are part of a kind of archive. Once loaded, a document is never
deleted.
That is why we do not just store the original documents in a purely
hierarchical file structure, but have developed a system based on a
notion of "logical volumes", which can be moved without affecting the
link between the database records and the stored objects.
This being said, the organisation in question is still on top of
classical filesystems.
In recent years, the universal support under Unix/Linux of "LVM's" much
simplifies the question of space management, physical location etc.
Disk storage per volume unit is also getting ever larger and cheaper, so
it does not cause us big concern.
At the limit, if a certain document collection is not needed anymore,
the corresponding records can be moved to an archive table or database,
but the documents themselves stay.
Re: Any success with storing photos in a database?
am 30.09.2008 00:09:52 von James Smith
On Mon, 29 Sep 2008, Cosimo Streppone wrote:
> In data 29 settembre 2008 alle ore 23:45:05, James Smith
> ha scritto:
>
>> There are good reasons to store images (especially small ones) in databases
>> (and with careful management of headers in your mod_perl).
>>
>> Some of you have missed inherent problems with the file systems
>> even balanced heirarchical tree - ones in a shared server
>> environment which can lead to gross efficiencies - in your cases
>> you may not be doing multiple deletes - but in the examples I work
>> with it is not the creation and storage which breaks the file
>> system, but the requirement to clear our old files before filling
>> up the file system.
>
> If you have "proper" metadata, you can go and delete your files.
> In our case, we chose to hash our paths by basically user-id,
> so every file owned by a user is in the same folder and
> can be deleted without any problems.
>
We have "proper" meta data - deleting files is in all file systems an
expensive operation, if you have a large number of files to delete,
the overhead of deleting files can become excessive - we produce and
delete anwhere up to and including 1/2 million files per day - and
the deletion is the crippling stage on a journalled file system
> Maybe I didn't get your point.
>
> --
> Cosimo
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
Re: Any success with storing photos in a database?
am 30.09.2008 00:24:07 von Cosimo Streppone
In data 30 settembre 2008 alle ore 00:09:52, James Smith ha scritto:
> On Mon, 29 Sep 2008, Cosimo Streppone wrote:
>
>> In data 29 settembre 2008 alle ore 23:45:05, James Smith
>> ha scritto:
>>
>>> There are good reasons to store images (especially small ones) in
>>> databases (and with careful management of headers in your mod_perl).
>> If you have "proper" metadata, you can go and delete your files.
>
> We have "proper" meta data
Yes, sorry. I was thinking about our case.
> we produce and delete anwhere up to and including 1/2
> million files per day - and
> the deletion is the crippling stage on a journalled file system
I see. Again, our case is very different though.
99,99% is create/add/modify and we almost never delete.
What filesystem/os do you use?
--
Cosimo
Re: Any success with storing photos in a database?
am 30.09.2008 01:11:45 von James Smith
On Tue, 30 Sep 2008, Cosimo Streppone wrote:
> In data 30 settembre 2008 alle ore 00:09:52, James Smith
> ha scritto:
>
>> On Mon, 29 Sep 2008, Cosimo Streppone wrote:
>>
>>> In data 29 settembre 2008 alle ore 23:45:05, James Smith
>>> ha scritto:
>>>
>>>> There are good reasons to store images (especially small ones) in
>>>> databases (and with careful management of headers in your mod_perl).
>
>>> If you have "proper" metadata, you can go and delete your files.
>>
>> We have "proper" meta data
>
> Yes, sorry. I was thinking about our case.
>
>> we produce and delete anwhere up to and including 1/2
>> million files per day - and
>> the deletion is the crippling stage on a journalled file system
>
> I see. Again, our case is very different though.
> 99,99% is create/add/modify and we almost never delete.
>
> What filesystem/os do you use?
It has to be a shared filesystem - so at the moment GPFS/red had; we are
moving over to a memcached system to store and server the temporary images;
otherwise we would be stuck with NFS or Lustre both of which fail
quite badly with small files.. all this is backed by fibre attached
SAN storage
(they really are temporary and can be easily restored)
James
>
> --
> Cosimo
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
Re: Any success with storing photos in a database?
> Message du 29/09/08 23:05
> De : "Perrin Harkins"
> A : "Mark Stosberg"
> Copie à : modperl@perl.apache.org
> Objet : Re: Any success with storing photos in a database?
>
>
> On Mon, Sep 29, 2008 at 3:00 PM, Mark Stosberg wrote:
> > We have a large, busy, database application that relates to millions of
> > photos, which we also need to store and display.
>
> Have you read Cal Henderson's book about how Flickr works? It's a bit
> extreme, but interesting. A smaller version of the "many photos"
> problem is LiveJournal, who use their custom file storage API
> (MogileFS) and serve the files with their own web server (perlbal).
> I had used MogileFS for storing photos, that's a good application.See:
http://www.danga.com/mogilefs/
AFAIK, Perlbal is reverse proxy before the webserver, not a real web server=
..
e=3D"border-left: 2px solid #ff0000; padding-left: 5px; margin-left: 5px">&=
gt; Message du 29/09/08 23:05 > De : "Perrin Harkins"
/>> A : "Mark Stosberg" > Copie à : modperl@p=
erl.apache.org > Objet : Re: Any success with storing photos in a d=
atabase? > > > On Mon, Sep 29, 2008 at 3:00 PM, Ma=
rk Stosberg wrote: > > We have a large, busy, database applicat=
ion that relates to millions of > > photos, which we also need t=
o store and display. > > Have you read Cal Henderson's boo=
k about how Flickr works? It's a bit > extreme, but interesting. =
A smaller version of the "many photos" > problem is LiveJ=
ournal, who use their custom file storage API > (MogileFS) and serv=
e the files with their own web server (perlbal). > I h=
ad used MogileFS for storing photos, that's a good application.See: ht=
tp://www.danga.com/mogilefs/ AFAIK, Perlbal is reverse proxy before th=
e webserver, not a real web server.
Re: Any success with storing photos in a database?
am 30.09.2008 15:11:16 von mpeters
Himanshu wrote:
> A newbie question in the insightful thread. Can serving static files
> and cookie authentication go together?
Yes. You can do this with something like mod_auth_tkt in a proxy web server (very light) and have a
backend mod_perl (or php, or Java, etc) server set the cookie after someone has logged in. Then you
just configure your auth in your proxy for which groups have access to which files.
> Second what
> are the sizes at which one should start moving to files. e.g. how
> about plain text blog entries containing no more than say 5 KBytes.
It really depends on you database. In MySQL using BLOBS won't really cause
problems unless you use queries that involve full table scans or queries that actually pull or
manipulate the BLOBS a lot. So if you have the proper indexes and only pull the BLOBS rarely, then
you should be ok for most uses.
But as far as blogs go, you really shouldn't be pulling the text of the blog out of the DB for every
request. Since it changes very rarely, especially compared to the number of times it's read then
storing it in a "published" state on the filesystem is usually best. And if you still need to add
dynamic bits to the page at run time, then instead of publishing a .html file, you can publish a
template (for Template Toolkit, HTML::Template, etc).
--
Michael Peters
Plus Three, LP
Re: Any success with storing photos in a database?
am 30.09.2008 15:28:44 von Mark Stosberg
> One point I am curious about, is what kind of file structure you use to
> store the millions of images on the filesystem. I can't imagine that
> you do it really into one flat directory ?
Thanks for the response.
We use the 'md5' scheme in CGI::Uploader. From the docs:
"[We] will create three levels of directories based on the first three letters
of the ID's md5 sum. The result may look like this:"
2/0/2/123.jpg"
> And are you storing the real paths directly in the database ?
No, we generate the path based on the unique ID.
You points were all interesting and steer me back in the direction of trying to better
understand and eliminate how inconsistency is creeping into our system.
Re: Any success with storing photos in a database?
am 30.09.2008 15:55:15 von Mark Stosberg
> Usually people just make sure inserts/updates to the photo table is
> done in a transaction and if that transaction succeeds or fails, it
> does the appropriate write/delete on the file system.
I could perhaps be better about "rolling back" file system actions
if a DB transaction files.
I'll be looking more into the cause today. Initial findings suggest that
there was import issue with legacy data long ago, and a significant
issue may not even remain in in the current code.
> But since you're using PostgreSQL ( my favorite database and a large
> part of my consulting practice ) you could even go so far as to write a
> few pl/perl stored procedures to handle keeping the file system in sync
> with the database.
Ah, interesting idea. I'll keep that in mind. I'll have to deal with the
hoop that the image storage server is currently not accessible from the
master PostgreSQL server.
Re: Any success with storing photos in a database?
am 30.09.2008 16:06:26 von Mark Stosberg
On Mon, 29 Sep 2008 17:04:29 -0400
"Perrin Harkins" wrote:
> On Mon, Sep 29, 2008 at 3:00 PM, Mark Stosberg wrote:
> > We have a large, busy, database application that relates to millions of
> > photos, which we also need to store and display.
>
> Have you read Cal Henderson's book about how Flickr works?
It looks familiar, but I don't think I've read it. I've bookmarked it
now. Thanks for the suggestion, Perrin!
If I find anything "interesting" about our inconsistency, I'll report back.
I was just reviewing the code flow of 'store_upload()' in CGI::Uploader:
The key part I see is for the 'update' case where we:
1. Delete old generated files (like Thumbnails)
2. Run the DB update
3. Regenerate the thumbnails.
The DB update failed, we couldn't "undelete" the thumbnails. That design could perhaps could be improved.
Also, I see that this code doesn't use transactions, I think because it expects that the outer application code may wrap this work in a larger action.
With current DBD::Pg and PostgreSQL versions, I could add some "savepoints" here, to make some parts
of this more transactional (like inserting a photo and all of it's thumbnails in the same in one transactions).
The savepoint could would be specific to using the PostgreSQL driver, as I'm not aware that other DBs support the same syntax.
Re: Any success with storing photos in a database?
am 30.09.2008 18:40:43 von Perrin Harkins
On Tue, Sep 30, 2008 at 8:11 AM, Jeff Pang wrote:
> AFAIK, Perlbal is reverse proxy before the webserver, not a real web server.
I don't use it, but it can do auth and serve images. If you read the
presentations about LiveJournal's backend, they explain this.
- Perrin
Re: Any success with storing photos in a database?
am 30.09.2008 19:11:55 von David Nicol
On Tue, Sep 30, 2008 at 11:40 AM, Perrin Harkins wrote:
> I don't use it, but it can do auth and serve images.
as can Apache itself, with appropriate access control. The two steps
(this should not be news to anyone here) are checking the auth then
something like "exec cat $filename" for which modperl is kind of
heavy.
Best to let Apache serve the images from a static directory, after
devising some kind of dynamic .htaccess scheme driven by the smarter
pieces of your system.
Re: Any success with storing photos in a database?
am 30.09.2008 19:24:48 von Perrin Harkins
On Tue, Sep 30, 2008 at 1:11 PM, David Nicol wrote:
> On Tue, Sep 30, 2008 at 11:40 AM, Perrin Harkins wrote:
>> I don't use it, but it can do auth and serve images.
>
> as can Apache itself, with appropriate access control. The two steps
> (this should not be news to anyone here) are checking the auth then
> something like "exec cat $filename" for which modperl is kind of
> heavy.
Perlbal does have some unique features. It can check the auth for an
image with your mod_perl backend and then serve the image itself.
It's built on an event loop model and it has some direct ties to
MogileFS. There's a good feature list here:
http://www.danga.com/perlbal/
I would typically stick with apache and mod_auth_tkt, but for people
who already use MogileFS this must be a pretty attractive setup.
- Perrin
Re: Any success with storing photos in a database? (preventsdouble-submits)
am 15.10.2008 18:31:24 von Mark Stosberg
On Tue, 30 Sep 2008 10:06:26 -0400
Mark Stosberg wrote:
> On Mon, 29 Sep 2008 17:04:29 -0400
> "Perrin Harkins" wrote:
>
> > On Mon, Sep 29, 2008 at 3:00 PM, Mark Stosberg wrote:
> > > We have a large, busy, database application that relates to millions of
> > > photos, which we also need to store and display.
>
> If I find anything "interesting" about our inconsistency, I'll report back.
I said I'd report back if I found out where the inconsistency was appearing, between files
on the file system and meta-data in the database.
So far, I've found the following:
We had a "double submit" bug that allowed a form to be submitted twice when we
weren't fully prepared for that. We are still researching the best practices to
address this a general case. One approach we are considering is change the
submit action on forms with JavaScript, so it disables the submit button, and
then actually submit the form, preventing one kind of double-submission. It
seems like I don't see this approach happening in the wild much, though. I
suspect there is a better solution.
Beyond this, I think we've found the remaining inconsistencies to be happening
at extremely low rates, to the point where it might be worth completely
tracking down the final issue.
In summary, I think the majority advice found on this list holds true: storing
image files on the file system and meta-data in the database is a good way to
go.
Re: Any success with storing photos in a database? (prevents double-submits)
am 15.10.2008 18:41:55 von Perrin Harkins
On Wed, Oct 15, 2008 at 12:31 PM, Mark Stosberg wrote:
> We had a "double submit" bug that allowed a form to be submitted twice when we
> weren't fully prepared for that. We are still researching the best practices to
> address this a general case. One approach we are considering is change the
> submit action on forms with JavaScript, so it disables the submit button, and
> then actually submit the form, preventing one kind of double-submission. It
> seems like I don't see this approach happening in the wild much, though. I
> suspect there is a better solution.
JavaScript is okay, but can be a problem when people hit back
expecting to use the form again and the button is still disabled.
Another approach is a unique ID in the form that you track in the
user's session (i.e. this ID was seen before). If the problem is
large uploads with no feedback until they finish, you can use one of
the upload progress tools.
- Perrin
Re: Any success with storing photos in a database? (preventsdouble-submits)
am 15.10.2008 19:49:59 von Mark Stosberg
On Wed, 15 Oct 2008 12:41:55 -0400
"Perrin Harkins" wrote:
> On Wed, Oct 15, 2008 at 12:31 PM, Mark Stosberg wrote:
> > We had a "double submit" bug that allowed a form to be submitted twice when we
> > weren't fully prepared for that. We are still researching the best practices to
> > address this a general case. One approach we are considering is change the
> > submit action on forms with JavaScript, so it disables the submit button, and
> > then actually submit the form, preventing one kind of double-submission. It
> > seems like I don't see this approach happening in the wild much, though. I
> > suspect there is a better solution.
>
> JavaScript is okay, but can be a problem when people hit back
> expecting to use the form again and the button is still disabled.
Thanks for the response.
That's what I was concerned about. I don't have a sense of how
much this would happen in practice.
> Another approach is a unique ID in the form that you track in the
> user's session (i.e. this ID was seen before). If the problem is
> large uploads with no feedback until they finish, you can use one of
> the upload progress tools.
At one point in the past I did a variation of this where we put the next ID of
a related database sequence in the form, and this would become the new primary
key when inserted, and it would of course not allow the same primary key to be
used twice. That worked, but I realized was open to abuse if a user tweaked the
number to be larger than the sequence. Then, eventually a legitimate user would
eventually be assigned that value by the sequence, and it would fail.
Are their specific modules that you recommend to help with this?
Mark
Re: Any success with storing photos in a database? (prevents double-submits)
am 15.10.2008 19:51:21 von mpeters
Mark Stosberg wrote:
> At one point in the past I did a variation of this where we put the next ID of
> a related database sequence in the form, and this would become the new primary
> key when inserted, and it would of course not allow the same primary key to be
> used twice. That worked, but I realized was open to abuse if a user tweaked the
> number to be larger than the sequence. Then, eventually a legitimate user would
> eventually be assigned that value by the sequence, and it would fail.
>
> Are their specific modules that you recommend to help with this?
You can try a UUID (look at Data::UUID). It's not sequential, but someone could still tweak the form
to create a value that could potentially conflict with another value in the future. But I'd say it's
much less likely than a sequential id.
--
Michael Peters
Plus Three, LP
Re: Any success with storing photos in a database? (prevents double-submits)
am 15.10.2008 20:02:41 von Simon Forster
On 15 Oct 2008, at 18:49, Mark Stosberg wrote:
> On Wed, 15 Oct 2008 12:41:55 -0400
> "Perrin Harkins" wrote:
>
>> On Wed, Oct 15, 2008 at 12:31 PM, Mark Stosberg
>> wrote:
>>> We had a "double submit" bug that allowed a form to be submitted
>>> twice when we
>>> weren't fully prepared for that. We are still researching the best
>>> practices to
>>> address this a general case. One approach we are considering is
>>> change the
>>> submit action on forms with JavaScript, so it disables the submit
>>> button, and
>>> then actually submit the form, preventing one kind of double-
>>> submission. It
>>> seems like I don't see this approach happening in the wild much,
>>> though. I
>>> suspect there is a better solution.
>>
>> JavaScript is okay, but can be a problem when people hit back
>> expecting to use the form again and the button is still disabled.
>
> Thanks for the response.
>
> That's what I was concerned about. I don't have a sense of how
> much this would happen in practice.
Take some elements from the form and search for an add in the past
Re: Any success with storing photos in a database? (preventsdouble-submits)
am 15.10.2008 20:10:12 von Mark Stosberg
On Wed, 15 Oct 2008 13:51:21 -0400
Michael Peters wrote:
> Mark Stosberg wrote:
>
> > At one point in the past I did a variation of this where we put the next ID of
> > a related database sequence in the form, and this would become the new primary
> > key when inserted, and it would of course not allow the same primary key to be
> > used twice. That worked, but I realized was open to abuse if a user tweaked the
> > number to be larger than the sequence. Then, eventually a legitimate user would
> > eventually be assigned that value by the sequence, and it would fail.
> >
> > Are their specific modules that you recommend to help with this?
>
> You can try a UUID (look at Data::UUID). It's not sequential, but someone
> could still tweak the form to create a value that could potentially conflict
> with another value in the future. But I'd say it's much less likely than a
> sequential id.
So how might an implementation look?
We could create table of "form_uuids". Each time a form is submitted, we check
if the UUID we have is has already been used.
If not, we insert the new UUID and proceed.
Every so often, the table could cleaned up via cron, (since we probably don't
care about seeing the same UUID weeks apart, just seconds or minutes apart).
There is still room for a small race condition in between checking to see if we
used the UUID and inserting it, but I think that may be acceptable.
Mark
Re: Any success with storing photos in a database? (preventsdouble-submits)
am 15.10.2008 20:11:53 von Cosimo Streppone
On Wed, Oct 15, 2008 at 12:31 PM, Mark wrote:
> We had a "double submit" bug that allowed a form to be submitted twice
> when we weren't fully prepared for that. [...]
> One approach we are considering is change the submit action on forms
> with JavaScript, so it disables the submit button
That's the cheapest option IMO, and it works very well
for 95% of the typical double submit cases (like Windows
users with the "double-click" syndrome :).
On submit event I checked for a flag to be "1".
If "1", then form has been already submitted.
If "0", set to "1" and submit.
Then Perrin wrote:
> JavaScript is okay, but can be a problem when people hit back
> expecting to use the form again and the button is still disabled.
You could always trigger some javascript event on page load
the re-enables the submit control, even if already enabled,
resetting the flag to "0". This is what I did.
Then, if you want to be 100% sure, you have to work
on the server side...
--
Cosimo
Re: Any success with storing photos in a database? (preventsdouble-submits)
am 15.10.2008 20:15:57 von Mark Stosberg
> Take some elements from the form and search for an add in the past
>
Re: Any success with storing photos in a database? (prevents double-submits)
am 15.10.2008 21:46:35 von mpeters
Mark Stosberg wrote:
> So how might an implementation look?
I would either make the uuid the primary key (might affect performance since it's not an integer,
but a string) or a unique key for the same table. Then you don't have anything else to keep track of
(no extra tables, etc).
> Every so often, the table could cleaned up via cron, (since we probably don't
> care about seeing the same UUID weeks apart, just seconds or minutes apart).
UUID's should never collide.
> There is still room for a small race condition in between checking to see if we
> used the UUID and inserting it, but I think that may be acceptable.
If you're really worried about someone attacking you in this way then insert the record with the
uuid first and then let them upload. If you don't find the uuid they are trying to upload to, then
they changed it so just disallow the upload.
--
Michael Peters
Plus Three, LP
Re: Any success with storing photos in a database? (prevents double-submits)
am 16.10.2008 05:45:58 von Eric Howe
On Oct 15, 2008, at 11:11, Cosimo Streppone wrote:
> On Wed, Oct 15, 2008 at 12:31 PM, Mark wrote:
[...]
> Then Perrin wrote:
>
>> JavaScript is okay, but can be a problem when people hit back
>> expecting to use the form again and the button is still disabled.
>
> You could always trigger some javascript event on page load
> the re-enables the submit control, even if already enabled,
> resetting the flag to "0". This is what I did.
I do it this way (via jQuery) and it works okay even when faced with
the browser's back-button:
function returnFalse() {
return false;
}
$(document).ready(function() {
// Take care of back-button issues first.
$('form').find(':submit').unbind('click', returnFalse);
// Then wire up the auto-disabling stuff.
$('form').submit(function() {
$(this).find(':submit').click(returnFalse);
return true;
});
});
I don't use the 'disabled' attribute as that prevents the browser (at
least some of them) from sending the button back to the server with
the rest of the form.
Eric Howe
eric@pieinsky.ca
Re: Any success with storing photos in a database?
am 24.11.2008 22:40:59 von jonathan vanasco
chiming in a few months late...
from my experience, and responding to some thoughts in the thread:
- storing photos in mysql/pgsql is not a good idea. aside from misc
issues that always arise, you end up stressing the db through searches
- the better way would be to store photo meta-data in the system,
and then search for that to serve. its a much leaner search and
faster/more lightweight.
-we played around with a lot of things
-- mogilefs
--- i think flickr also used pre-yahoo. someone mentioned a
flickrbook, there is also a powerpoint from about 1year after they
launched
-- directory hashing
--- create an md5 and store in db; store on filesystem as ab/cd/ef/
gh/abcdefg + suffix ( [original,-thumb ,-medium].jpg )
---- use rewrite rules on nginx to loadbalance across cluster by
bucket and serve
-- i didn't like perlbal, but it can do some neat in-time
bitshifting on gifs to change the color palatte
-- the BEST solution we found and upgraded to was.. Amazon s3 +
Panther Express. this is before amazon offered cdn services
--- store files on amazon -- originals in private bucket, copies in
public buckets
--- serve files from panther, which was a kickass cdn. they cache
everything in their network , which offsets instabilities from amazon
--- don't worry about disk usage, clustering, load balancing,
BACKUP, and miscellaneous stuff designed around images
--- migrating to that setup took us 2 days... and it was a better
performer than 3 months of code and constant maintenance