Net::Server + IPC::Shareable == decent performance ?
on 08.11.2007 03:24:22 by David Jacobowitz
Hello, this is a question for all the perl people out there who have
written internet servers.
I currently have a perl-based server that acts as a hub for a simple
message passing scheme. Clients periodically connect, send a message
to a user (the server puts the message in an in-memory queue for the
recipient), and check for their own messages, with the server splatting
back out the client's message queue and then deleting it.
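
A minimal sketch of the queue handling described above, assuming a plain hash of arrays keyed by recipient. The names %queue_for, leave_message and collect_messages are hypothetical and not taken from the poster's server.

```perl
use strict;
use warnings;

# Hypothetical in-memory store: recipient name => array ref of pending messages.
my %queue_for;

# Append a message to the recipient's queue (created on first use).
sub leave_message {
    my ($recipient, $message) = @_;
    push @{ $queue_for{$recipient} }, $message;
    return;
}

# Return all pending messages for a user and delete that user's queue.
sub collect_messages {
    my ($user) = @_;
    my $pending = delete $queue_for{$user};
    return $pending ? @$pending : ();
}

leave_message('bob', 'hello from alice');
leave_message('bob', 'lunch?');
my @for_bob = collect_messages('bob');   # both messages, oldest first
my @again   = collect_messages('bob');   # empty, because the queue was deleted
```

Both operations are a single hash lookup plus a push or delete, which fits the observation that there is little work per transaction beyond the socket I/O.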
Each transaction is small and short, and I have everything working
pretty well with a single instance server of Net::Server. And I'm not
doing anything funky with select(); I'm just answering and completing
each transaction in order. This seems to make sense to me, because
there is really not much work per transaction over and above reading
and writing data to the socket.
The thing is, I want this server to be able to take hundreds or maybe
thousands of connections per second. This will never work with a
single process, I don't think. (My laptop seems to saturate around 600
xactions/sec, but then the laptop is also running the client
processes.)
With Net::Server, turning a server into a pre-forked server is pretty
easy. But then each process is independent from that point forward,
and so obviously message queues won't be shared between them. So, I'm
thinking of using IPC::Shareable.
But, with all the overhead of IPC will this be any faster than the
single process? Is there an easy way to see where the cycles are
going? If most of the cycles are going to data-structure maintenance,
then I don't see a point in doing this work. If most of them are going
to handling socket stuff, then it would be a win, assuming it works.
Has anyone here made such a server? I'm curious for hints.
As an aside, my application does not require that any user be able to
leave a message for any other user, so it would be okay to segment the
message queues into groups. But for this to work, I'd need a way to
make sure that each client connection matches up with the same server
process on sequential accesses. I could do this, of course, by putting
another server in front of the other servers whose only job in life is
to track which back-end server the client first connected with and
then keep sending the client to the same back-end server on subsequent
connections. But in this case, I'm just creating more or less the same
bottleneck again.
Hints and ideas very much welcome.
thanks,
-- dave j
Re: Net::Server + IPC::Shareable == decent performance ?
on 09.11.2007 21:27:36 by xhoster
David Jacobowitz wrote:
> Hello, this is a question for all the perl people out there who have
> written internet servers.
>
> I currently have a perl-based server that acts as a hub for a simple
> message passing scheme. Clients periodically connect, send a message
> to a user (the server puts the message in an in-memory queue for the
> recipient), and check for their own messages, with the server splatting
> back out the client's message queue and then deleting it.
>
> Each transaction is small and short, and I have everything working
> pretty well with a single instance server of Net::Server. And I'm not
> doing anything funky with select(); I'm just answering and completing
> each transaction in order. This seems to make sense to me, because
> there is really not much work per transaction over and above reading
> and writing data to the socket.
>
> The thing is, I want this server to be able to take hundreds or maybe
> thousands of connections per second.
Do you have simple model code that does just the guts of what you have
described with all auxiliary stuff stripped out? If you do, please post it
so we can see exactly what you are doing, and try it out for ourselves. If
not, you should probably make one.
> This will never work with a
> single process, I don't think. (My laptop seems to saturate around 600
> xactions/sec, but then the laptop is also running the client
> processes.)
Is each "transaction" a single TCP connect-send-receive-disconnect
sequence?
At about 600 TCP connections per second on IO::Socket::INET on one of my
machines, the client starts getting refusals with "Cannot assign requested
address". This seems like a kernel thing, not a Perl thing.
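
One possible explanation, offered as an assumption rather than a diagnosis of either machine: every short-lived TCP connection consumes an ephemeral port on the client, and that port stays in TIME_WAIT for a while after the close. Once the range is used up, connect() fails with EADDRNOTAVAIL, which is reported as "Cannot assign requested address". The sketch below estimates the sustainable connection rate from one client address to one server address. The port range and the 60-second TIME_WAIT are assumed typical Linux defaults; on Linux the actual range can be read from /proc/sys/net/ipv4/ip_local_port_range.

```perl
use strict;
use warnings;

# Estimate how many new connections per second one client address can
# sustain to a single server address before it runs out of ephemeral ports.
sub max_sustained_rate {
    my ($port_low, $port_high, $time_wait_secs) = @_;
    my $ports = $port_high - $port_low + 1;   # usable ephemeral ports
    return $ports / $time_wait_secs;          # ports freed per second
}

# Assumed values, not measurements: a common Linux default port range
# and TIME_WAIT length.
my $rate = max_sustained_rate(32768, 61000, 60);
printf "about %.0f connections/sec\n", $rate;
```

Under those assumptions the ceiling is a few hundred connections per second, the same order of magnitude as the figures reported in this thread.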
> With Net::Server, turning a server into a pre-forked server is pretty
> easy. But then each process is independent from that point forward,
> and so obviously message queues won't be shared between them. So, I'm
> thinking of using IPC::Shareable.
>
> But, with all the overhead of IPC will this be any faster than the
> single process?
I suspect it will be slower.
> Is there an easy way to see where the cycles are
> going?
I'd start (on linux) by running the OS "time" on the server and client
and see how much time is spent in system space rather than user space.
Then just change the server so it throws away the client's message and
just sends back a hard-coded dummy message, rather than doing any real
message-queue work. If that makes it run faster, then the message queue
is taking a lot of time. If it doesn't, then it doesn't.
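
The same user-versus-system split can be taken from inside Perl with the built-in times() function. This is a rough sketch: cpu_cost and the stand-in workload are hypothetical, and the loop only imitates the queue bookkeeping so that its cost can be seen separately from any socket handling.

```perl
use strict;
use warnings;

# Run a code ref and return the user and system CPU seconds it consumed,
# measured with Perl's built-in times().
sub cpu_cost {
    my ($work) = @_;
    my ($user0, $sys0) = times;
    $work->();
    my ($user1, $sys1) = times;
    return ($user1 - $user0, $sys1 - $sys0);
}

# Stand-in for the real message-queue work (hypothetical workload):
# queue one message per user, then delete that user's queue.
my %scratch_queue;
my ($user, $sys) = cpu_cost(sub {
    for my $i (1 .. 100_000) {
        push @{ $scratch_queue{"user$i"} }, "message $i";
        delete $scratch_queue{"user$i"};
    }
});
printf "queue work: %.2fs user, %.2fs system\n", $user, $sys;
```

If the queue work alone accounts for only a small share of the server's total CPU time, the data structure is unlikely to be the bottleneck.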
> If most of the cycles are going to data-structure maintenance,
> then I don't see a point in doing this work. If most of them are going
> to handling socket stuff, then it would be a win, assuming it works.
I'm not sure of that. A lot of the "handling socket stuff" goes on in the
kernel, and I suspect much of it is serialized at that point, even if the
user-level processes think it is happening in parallel.
> As an aside, my application does not require that any user be able to
> leave a message for any other user, so it would be okay to segment the
> message queues into groups.
What is a "user" as distinct from a "client"?
> But for this to work, I'd need a way to
> make sure that each client connection matches up with the same server
> process on sequential accesses.
The easiest way is not to close the connection in the first place. Keep
using the same one.
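
On the server side, that amounts to looping over requests until the peer closes the connection instead of handling one request and hanging up. A minimal sketch, assuming a made-up line protocol ("SEND <user> <text>" and "FETCH <user>") that is not the poster's actual protocol; in-memory filehandles stand in for the socket so the example runs on its own.

```perl
use strict;
use warnings;

# Serve every request that arrives on one connection until the peer closes it.
# $in and $out are any filehandles; with a real socket both would be the socket.
# The line protocol used here is hypothetical.
sub serve_connection {
    my ($in, $out, $queue_for) = @_;
    while (defined(my $line = <$in>)) {
        chomp $line;
        if ($line =~ /^SEND (\S+) (.*)$/) {
            push @{ $queue_for->{$1} }, $2;
            print {$out} "OK\n";
        }
        elsif ($line =~ /^FETCH (\S+)$/) {
            my $pending = delete($queue_for->{$1}) || [];
            print {$out} "MSG $_\n" for @$pending;
            print {$out} "END\n";
        }
        else {
            print {$out} "ERR\n";
        }
    }
    return;
}

# Demonstration: three requests travel over what would be a single connection.
my $requests = "SEND bob hi\nSEND bob lunch?\nFETCH bob\n";
my $replies  = '';
open my $in,  '<', \$requests or die "open: $!";
open my $out, '>', \$replies  or die "open: $!";
serve_connection($in, $out, {});
close $out;
```

With this shape, a client pays the TCP setup and teardown cost once rather than once per transaction, and it stays attached to the same server process for as long as the connection lives.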
> I could do this, of course, by putting
> another server in front of the other servers whose only job in life is
> to track which back-end server the client first connected with and
> then keep sending the client to the same back-end server on subsequent
> connections. But in this case, I'm just creating more or less the same
> bottleneck again.
I think there is a piece missing here. If the client is long-lived, then
it shouldn't need to be told again and again which server it should use--it
should only ask once and then remember which server to use from then on
(or better yet, remember not only the server, but also the connection to
that server.)
On the other hand, if the client is short-lived, then on what basis can it
be considered the "same" as some previous incarnation of itself, in order
for the intermediate server to know what server to route it to?
Whatever it is that defines sameness, perhaps you could compute a hash value
on that and use that to compute which server to connect to, omitting the
intermediate server altogether.
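
A minimal sketch of that idea, assuming the client has some stable key (a user name, say) and that every client knows the same ordered list of servers. The addresses and the server_for name are hypothetical; Digest::MD5 is used only as a convenient, evenly spreading hash.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical back-end addresses; a real list would come from configuration.
my @servers = ('10.0.0.1:7000', '10.0.0.2:7000', '10.0.0.3:7000');

# Map a stable client key to one server. The same key always yields the
# same server as long as the server list does not change.
sub server_for {
    my ($client_key) = @_;
    # Take the first 32 bits of the MD5 digest as an unsigned integer.
    my $number = unpack('N', md5($client_key));
    return $servers[ $number % scalar @servers ];
}

print server_for('alice'), "\n";
```

One caveat with plain modulo hashing: adding or removing a server remaps most keys, so any queues held on the old servers would need to be drained or migrated first.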
Xho