Preventing SSH timeouts . Some clarification needed

on 08.06.2010 11:36:18 by Query

Hi,

We are seeing dropped SSH connections, which cause some of our
processes to fail. The most likely reason for the drops is that both
the client and the server are 100% busy during a certain time interval,
and during that interval we see the occasional "connection closed" by
the server.

===
ssh_exchange_identification: Connection closed by remote host
Return_status (65280)
Exit_value (255)
End_time (03 Jun 2010 22:41:41)
====

One workaround I can see is to set a timeout using the
ClientAliveInterval option in /etc/ssh/sshd_config on the server side.
Assume I have set it to 300 seconds.

According to the sshd_config(5) man page, this option "sets a timeout
interval in seconds after which if no data has been received from the
client, sshd(8) will send a message through the encrypted channel to
request a response from the client."

Let's take a situation where the SSH client is 100% busy or idle and
has not communicated with the server for around 300 seconds. With the
above option set, the server should send a message to the client after
300 seconds. Two scenarios come to mind:

1) If the server is also 100% busy during that time and cannot send the
message to the client, will the SSH connection be dropped?
2) Or, suppose the server becomes somewhat free after 350 seconds. In
that case, will it drop the connection, or will it send the message to
check whether the client is still active, since it could not send it
at 300 seconds because it was busy?


Please clarify SSH's behaviour in the above scenarios. I hope my
question is clear and the scenarios make sense.


Thanks
Zaman
--
To unsubscribe from this list: send the line "unsubscribe linux-admin" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Preventing SSH timeouts . Some clarification needed

on 08.06.2010 11:48:22 by Michal Nazarewicz


query writes:
> We are seeing some dropped SSH connections because of which some of
> the process are failing . The main likely reason for the connection
> drops is that both the client and server remains 100% busy during
> a certain time interval and during that time interval we see those
> occassional connection closed by the server.

Are you sure it's not because of some NATing which may have a shorter
timeout than the one used by SSH's keep-alive?

> Let's take a situation where the SSH client is 100% busy or idle and
> it had communicated to the server for around 300 seconds , then in
> this case if the above option is there , the server should send a
> message to the client after 300 secs. [...]

300 seconds is a very long time. I consider it unlikely that a process
that was idle for 300 seconds wouldn't quickly get a few CPU cycles just
to send a simple packet. I find that possible only if you use
non-standard scheduling policies.

I hope this helps, even though I haven't answered your original question.

--
Best regards, _ _
.o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michal "mina86" Nazarewicz (o o)
ooo +------ooO--(_)--Ooo--


Re: Preventing SSH timeouts . Some clarification needed

on 08.06.2010 12:39:28 by Glynn Clements

query wrote:

> One work around I could see is adding a timeout value using
> ClientAliveInterval option in /etc/ssh/sshd_config on the server side
> . Assume I have set the timeout value to 300 .
>
>
> " The above option as per the sshd man page tells that it sets a
> timeout interval in seconds after which if no data has been received
> from the client, sshd(8) will send a message through the encrypted
> channel to request a response from the client. "
>
> Let's take a situation where the SSH client is 100% busy or idle and
> it had communicated to the server for around 300 seconds , then in
> this case if the above option is there , the server should send a
> message to the client after 300 secs . The following two scenarios are
> coming to my mind.
>
> 1) if the server is also 100% busy during that time and could not
> send the message to the client , will the ssh connection will be
> dropped .
> 2) or Suppose the server was somewhat free after 350 secs , in that
> case will it drop the connection or it will send a message to the
> client to check whether the client is active or not since it could not
> send the message at 300 s as it was busy during the time .

According to the sshd_config(5) manpage, the server will close the
connection after ClientAliveCountMax messages (default value: 3) have
been sent without any response from the client.
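To make the two scenarios above concrete, here is a simplified Python model of the client-alive mechanism as sshd_config(5) describes it: after ClientAliveInterval seconds without data, sshd sends a probe, and once ClientAliveCountMax probes go unanswered it disconnects. This is an illustrative assumption, not sshd's source; the function name client_alive_drop_time and the server_delay parameter (extra seconds a busy server needs before it gets the CPU to send each probe) are hypothetical.

```python
from typing import Optional, Sequence


def client_alive_drop_time(
    interval: float,
    count_max: int,
    answered: Sequence[bool],
    server_delay: float = 0.0,
) -> Optional[float]:
    """Seconds after the client went quiet at which the connection would be
    dropped, or None if it survives every probe listed in `answered`.

    Simplified model (an assumption, not sshd's actual code):
    - each probe is due `interval` seconds after the previous one, but a busy
      server only sends it `server_delay` seconds later than that;
    - an answered probe resets the count of unanswered probes;
    - the connection is dropped once `count_max` probes in a row go
      unanswered (the "approximately interval x count" figure from the
      sshd_config(5) man page).
    """
    if count_max < 1:
        raise ValueError("this model only covers ClientAliveCountMax >= 1")
    t = 0.0
    unanswered = 0
    for reply in answered:
        t += interval + server_delay  # a late probe is postponed, not skipped
        if reply:
            unanswered = 0
        else:
            unanswered += 1
            if unanswered >= count_max:
                return t
    return None


# Scenario 1 (client never answers, server on time): dropped after 3 x 300 s.
print(client_alive_drop_time(300, 3, [False] * 3))                   # 900.0
# Scenario 2 (server busy, each probe goes out 50 s late): dropped later.
print(client_alive_drop_time(300, 3, [False] * 3, server_delay=50))  # 1050.0
# A client that answers now and then keeps the connection open.
print(client_alive_drop_time(60, 3, [True, False, True, False]))     # None
```

Under this model a busy server only postpones its probes; what ends the connection is the client failing to answer ClientAliveCountMax of them in a row.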

I can't see how this can be caused by load. If you haven't yet enabled
ClientAliveInterval, then the connection isn't being closed by sshd
but by the kernel, due to TCP keep-alives not being acknowledged.

By default, the kernel doesn't start sending keep-alives until the
connection has been idle for 2 hours, after which it sends 9 probes at
an interval of 75 seconds, so the system would need to be
non-responsive for over 11 minutes. And the responses are generated by
the kernel, so they'll be sent even if the process is suspended.
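The "over 11 minutes" figure follows directly from the three kernel settings. A small Python sketch of the arithmetic, using the Linux defaults quoted above (2 hours idle, 9 probes, 75 seconds apart) as parameters; the function name keepalive_give_up_times is illustrative.

```python
from __future__ import annotations


def keepalive_give_up_times(idle: int = 7200, probes: int = 9, interval: int = 75) -> tuple[int, int]:
    """Return (seconds spent probing, seconds from last activity until the
    kernel drops the connection) for a peer that never answers, given
    tcp_keepalive_time, tcp_keepalive_probes and tcp_keepalive_intvl."""
    probing = probes * interval  # every probe goes unanswered
    return probing, idle + probing


probing, total = keepalive_give_up_times()
print(f"{probing} s of probing ({probing / 60:.2f} min), {total} s in total")
# 675 s of probing (11.25 min), 7875 s in total
```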

As Michal suggests, the most likely reason for this is a NAT timeout.
If you're using NAT, you probably want to set the keep-alive time
(/proc/sys/net/ipv4/tcp_keepalive_time) to a value less than the NAT
timeout. Even then, that will only work for programs which enable
keep-alive (ssh and sshd both do by default; this is controlled by the
TCPKeepAlive option).
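To illustrate the point that keep-alive only helps programs which enable it: a program opts in per socket with SO_KEEPALIVE, and on Linux it can also override the three tcp_keepalive_* sysctls for that socket alone. A minimal Python sketch, assuming Linux (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific); the helper name enable_keepalive and the 60/10/5 values are hypothetical choices, not what ssh itself does.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int, interval: int, count: int) -> None:
    """Turn on TCP keep-alive for one socket and override the system-wide
    tcp_keepalive_* defaults for that socket only (Linux-specific options)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)       # like tcp_keepalive_time
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)  # like tcp_keepalive_intvl
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)       # like tcp_keepalive_probes


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# e.g. start probing after 60 s of idleness, well under a hypothetical NAT timeout
enable_keepalive(s, idle=60, interval=10, count=5)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0,
      s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
s.close()
```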

--
Glynn Clements

Re: Preventing SSH timeouts . Some clarification needed

on 08.06.2010 17:10:39 by Query

2010/6/8 Michal Nazarewicz :
>> 2010/6/8 Michal Nazarewicz :
>>> Are you sure it's not because of some NATing which may have a shorter
>>> timeout then the one used by SSH's keep alive?
>
> query writes:
>> I am not 100% sure but during the connection dropout time, the CPU
>> is 100% busy as shown by our own reporting utility. Reg NAT ing , I
>> don't think so those hosts are behind NAT as there was no requirement
>> like that for those hosts to access Internet . Anyway , I will
>> confirm regarding this from the network-admin .
>>
>> P.S: Is there is any utility that can tell us whether we are behind
>> NAT or not .
>
> If “ifconfig” on one host gives you different IP addresses then the
> other host see as incoming IP then you are behind NAT.
>

Sorry, I didn't understand the above statement, but I have an idea that
I will try tomorrow. From the source machine, I send a packet to a
remote host on a different network. If I capture the packet on the
remote host and its source IP address differs from the source host's
address, then I am probably behind NAT. These hosts have private IP
addresses.

Will this tell me whether I am behind NAT?

> There may be some other services that close the connection like
> firewalls and some such. You should consult if there are any on the
> path and whether thous could drop connections with your network
> administrator.
>
> Also Glynn's suggestion of making keep alive timeout shorter may work.
>
> I find it hard to believe that high CPU usage could cause connection
> dropping unless you have some *really* busy machine but then you should
> consider upgrading hardware or rethinking what services those serves
> provide.

These hosts do not provide any Internet service; they are mainly
responsible for processing gigabits of data. The processing runs for
around 24 hours, and during that time it uses around 100% CPU for about
4 hours. The connection drops happened during those hours. I'm not sure
what processing goes on during that time to take all the CPU. The user
we are talking about here is the user under whom these processes run.

I hope this helps you understand the scenario.
Thanks
Zaman

Re: Preventing SSH timeouts . Some clarification needed

on 08.06.2010 17:10:57 by Query

Thanks for the suggestion.

On Tue, Jun 8, 2010 at 4:09 PM, Glynn Clements wrote:
> I can't see how this can be caused by load. If you haven't yet enabled
> ClientAliveInterval, then the connection isn't being closed by sshd
> but by the kernel, due to TCP keep-alives not being acknowledged.

Okay, that may be the cause. The client host was also busy, which is
why the TCP keep-alives were not acknowledged.


> As Michal suggests, the most likely reason for this is a NAT timeout.
> If you're using NAT, you probably want to set the keep-alive time
> (/proc/sys/net/ipv4/tcp_keepalive_time) to a value less than the NAT
> timeout. Even then, that will only work for programs which enable
> keep-alive (ssh and sshd both do by default; this is controlled by the
> TCPKeepAlive option).

How do I determine the NAT timeout value? Is it set at the host level or
on the device where NAT is implemented? I was able to find the
keep-alive timeout values on the host:

====
$ sudo sysctl -a | grep -i keepalive
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
=====
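sysctl reads these values from files under /proc/sys/net/ipv4, so a program can check them the same way. A short Python sketch, assuming Linux; read_keepalive_settings is a hypothetical helper, and its base argument exists only so it can be pointed at a different directory (for example in a test).

```python
from __future__ import annotations

from pathlib import Path

# The files behind the three sysctl keys shown above.
KEEPALIVE_KEYS = ("tcp_keepalive_time", "tcp_keepalive_probes", "tcp_keepalive_intvl")


def read_keepalive_settings(base: str = "/proc/sys/net/ipv4") -> dict[str, int]:
    """Return the kernel's default TCP keep-alive settings (the values that
    `sysctl -a | grep -i keepalive` prints), read from /proc. Linux only."""
    return {key: int((Path(base) / key).read_text().strip()) for key in KEEPALIVE_KEYS}


if __name__ == "__main__":
    # Only try the real /proc files if this kernel exposes all three.
    if all((Path("/proc/sys/net/ipv4") / key).is_file() for key in KEEPALIVE_KEYS):
        # On the host in this thread the result would match the sysctl output:
        # {'tcp_keepalive_time': 7200, 'tcp_keepalive_probes': 9, 'tcp_keepalive_intvl': 75}
        print(read_keepalive_settings())
```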

Most likely I am not behind NAT; I will confirm it tomorrow. If that is
the case, which should I change to increase the timeout: the kernel
keep-alive values, the TCPKeepAlive option, or the ClientAliveInterval
option? For some reason, TCPKeepAlive is disabled in our sshd config
file. Please clarify.



Once again, thanks, all.

--Zaman

Re: Preventing SSH timeouts . Some clarification needed

on 08.06.2010 18:19:18 by Glynn Clements

query wrote:

> > I can't see how this can be caused by load. If you haven't yet enabled
> > ClientAliveInterval, then the connection isn't being closed by sshd
> > but by the kernel, due to TCP keep-alives not being acknowledged.
>
> okay...that may be the cause . The client host was also busy because
> of which TCP keep-alive were not acknowledged.

Load won't have any effect upon TCP keep-alives, as it's the kernel
which acknowledges keep-alive packets, not the user process.

Keep-alive allows you to detect that a host is unreachable (e.g.
network failure, system crash, power failure, etc). It doesn't tell
you anything about an individual process.

> > As Michal suggests, the most likely reason for this is a NAT timeout.
> > If you're using NAT, you probably want to set the keep-alive time
> > (/proc/sys/net/ipv4/tcp_keepalive_time) to a value less than the NAT
> > timeout. Even then, that will only work for programs which enable
> > keep-alive (ssh and sshd both do by default; this is controlled by the
> > TCPKeepAlive option).
>
> How to determine the value of NAT timeout . Is it at the host level or
> the device where NATing is implemented .

The device which performs NAT.

> I was able to find the keepalive timeout value at the host .
>
> ====
> $ sudo sysctl -a | grep -i keepalive
> net.ipv4.tcp_keepalive_time = 7200
> net.ipv4.tcp_keepalive_probes = 9
> net.ipv4.tcp_keepalive_intvl = 75
> =====
>
> Most likely I am not behind NAT , I will confirm it tomorrow . If that
> is the case , then which should I consider to increase the timeout
> value.
> The kernel timeout value or implement either TCPKeepAlive option or
> the ClientAliveInterval interval . TCPKeepAlive option is somehow
> disabled in the sshd config file . Please clarify regarding this.

TCPKeepAlive is enabled by default. But even if it's enabled, the
2-hour wait before any keep-alives are sent typically won't be enough
to prevent NAT entries from expiring.

Even the 5-minute interval between SSH keep-alives may be longer than
the NAT expiry time. Low-end router/modem devices with built-in NAT
seem to base their default configuration on the assumption that you're
using HTTP from Win95 boxes, where a connection being idle for more
than 30 seconds usually means that the Win95 box has crashed.

Another possibility is a really cheap ISP which uses (a heavily
oversubscribed pool of) dynamic IP addresses, which expire whenever
the connection is idle for more than a minute.
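The reasoning above boils down to one comparison: a keep-alive only preserves an idle NAT (or firewall) entry if it is sent before that entry expires. A tiny Python sketch; keeps_entry_alive is a hypothetical helper, and the 30-second expiry is the low-end-router example from the text, not a measured value.

```python
def keeps_entry_alive(keepalive_interval: float, idle_expiry: float) -> bool:
    """True if traffic every keepalive_interval seconds arrives before an
    idle NAT/firewall entry with an idle_expiry-second timeout is removed."""
    return keepalive_interval < idle_expiry


LOW_END_NAT_EXPIRY = 30  # hypothetical expiry, from the "Win95 boxes" example

for label, interval in [
    ("kernel TCP keep-alive after tcp_keepalive_time", 7200),
    ("SSH keep-alive every 5 minutes", 300),
    ("SSH keep-alive every 15 seconds", 15),
]:
    print(f"{label}: {keeps_entry_alive(interval, LOW_END_NAT_EXPIRY)}")
```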

--
Glynn Clements

Re: Preventing SSH timeouts . Some clarification needed

on 08.06.2010 21:48:44 by Michal Nazarewicz


> 2010/6/8 Michal Nazarewicz :
>> If “ifconfig” on one host gives you different IP addresses then the
>> other host see as incoming IP then you are behind NAT.

query writes:
> Sorry , I failed to understand the above statement . But I have
> something in mind , I will try it tomorrow .
> From the source machine , I send a packet to remote host on a
> different network . Now If I capture packet on the remote host and
> it comes out to be different ip address than the source host , then
> probably I am behind NAT. These hosts are having private ip address.

Yes, that's what I've written. ;)

You run “ifconfig” on one machine, and it will show you its IP address
(there may be several interfaces). Then you connect from this machine
to the other machine and check the source address on the other
machine. Repeat for the other direction, even though, if you can
actually connect from one machine to the other, it is likely that the
one you are connecting to is not behind NAT.
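The procedure can be partly scripted. In this Python sketch (function names hypothetical), local_source_address uses the common trick of connecting a UDP socket, which sends no packet, to ask the kernel which local address it would use to reach a peer; looks_natted then compares that with the source address the other machine reports seeing, for example in a packet capture. As described above, the check should be run in both directions.

```python
import ipaddress
import socket


def local_source_address(peer_ip: str, port: int = 22) -> str:
    """Local IPv4 address the kernel would use as the source when talking to
    peer_ip. Connecting a UDP socket only selects a route; nothing is sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer_ip, port))
        return s.getsockname()[0]


def looks_natted(local_addr: str, seen_by_peer: str) -> bool:
    """True if the peer saw a different source address than the one we sent
    from, i.e. something on the path rewrote it."""
    return ipaddress.ip_address(local_addr) != ipaddress.ip_address(seen_by_peer)


print(local_source_address("127.0.0.1"))           # 127.0.0.1
print(looks_natted("10.0.0.5", "10.0.0.5"))         # False: address unchanged
print(looks_natted("192.168.1.20", "203.0.113.7"))  # True: address rewritten
```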

> These hosts are not providing any Internet service and mainly
> responsible for processing around Gigabits of data . The processing
> continues for around 24 hours , During the processing , it utilizes
> around 100% CPU for around 4 hours and the connection drop happened
> during that time . Not sure what processig goes on during the time
> which takes all the CPU . The user we are talking here is the user
> under whom these processes runs .

If the processing takes so much time and is so CPU-intensive, I'd try
running it with nice, i.e. “nice -n 20 process-data” rather than the
plain “process-data” command.

If the drops really happen only during high CPU usage, then maybe it is
in fact the culprit. Running the processing via nice gives it a lower
priority (so, in effect, everything else gets a higher priority in
comparison). This could help SSH get its CPU time when it needs it.
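The same idea can be applied from a program that launches the heavy job. A minimal Python sketch, assuming a POSIX system: run_niced is a hypothetical helper that raises the child's niceness before exec, like nice -n; since process-data is only the placeholder name from the message, the example starts a small Python child that reports its own niceness. On Linux the maximum niceness is 19, so nice -n 20 ends up at 19.

```python
from __future__ import annotations

import os
import subprocess
import sys


def run_niced(cmd: list[str], increment: int = 19) -> subprocess.CompletedProcess:
    """Run cmd with its niceness raised by `increment` (lower CPU priority),
    the in-program equivalent of `nice -n <increment> cmd`. POSIX only."""
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),  # runs in the child before exec
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in for the placeholder `process-data` job: report the child's niceness.
child = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
print("parent niceness:", os.nice(0), "child niceness:", child.stdout.strip())
```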


Re: Preventing SSH timeouts . Some clarification needed

on 09.06.2010 07:33:19 by Query

Hi guys,

Finally got the clarification. We are not behind NAT. I checked this
myself by injecting some packets and sniffing them on the destination
host, as Michal suggested. I tried the experiment in both directions and
saw no change in the IP addresses. I also cross-checked with our network
admin that those hosts are not behind NAT.
So, the most likely cause of these connection drops is

2010/6/9 Michal Nazarewicz :
> If the processing takes so much time and is so CPU consuming I'd try
> running it with nice, ie.: “nice -n 20 process-data” rather then plain
> “process-data” command.
>
> If the dropping in fact happens only while high CPU usage than maybe in
> fact it is a culprit. By running the processing via nice it gets lower
> priority (so in effect everything else gets higher priority in
> comparison). This could help SSH get its desired CPU interval in time.

Thanks for this suggestion, but we probably won't be able to do that.
Our application itself makes the SSH connection to the other host, and
during high load the SSH connection drops and our application fails.


Thanks
Zaman

Re: Preventing SSH timeouts . Some clarification needed

on 09.06.2010 08:44:31 by Query

Guys, since it is now clear that we are not behind NAT, we can forget
about reducing the keep-alive time (/proc/sys/net/ipv4/tcp_keepalive_time)
to a value less than the NAT timeout. But anyway, I learned something
new :). The most likely cause, as Michal also agreed, is the high load
on both systems.

So, do you now suggest enabling the ClientAliveInterval option? Also,
since ClientAliveCountMax defaults to 3, I will probably keep
ClientAliveInterval below 300 seconds, perhaps at 60 seconds. The
connection will then drop after about 180 seconds if there is no
response.

Also, somewhat strangely, the TCPKeepAlive option is disabled in our
sshd_config file; I'm not sure why. If ClientAliveInterval is enabled,
can we leave TCPKeepAlive disabled? Will that serve our purpose?
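Before deciding, it helps to know which values sshd actually uses. The authoritative check is sshd's extended test mode (sshd -T, usually run as root), which prints the effective configuration. As a rough offline approximation, here is a hypothetical Python helper that scans sshd_config text for the three options discussed here. It assumes OpenSSH's defaults (ClientAliveInterval 0, meaning disabled; ClientAliveCountMax 3; TCPKeepAlive yes), assumes the first occurrence of a keyword wins, stops at the first Match block, and does not follow Include.

```python
from __future__ import annotations

import re

# OpenSSH defaults for the keep-alive options discussed in this thread.
DEFAULTS = {"clientaliveinterval": "0", "clientalivecountmax": "3", "tcpkeepalive": "yes"}

# "Keyword value" or "Keyword=value"; keywords are case-insensitive.
LINE_RE = re.compile(r"(\w+)\s*(?:=\s*|\s+)(\S+)")


def keepalive_settings(config_text: str) -> dict[str, str]:
    """Approximate effective keep-alive values from sshd_config text.

    Simplifications (assumptions): the first occurrence of a keyword wins,
    parsing stops at the first Match block, and Include is not followed.
    """
    found: dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        m = LINE_RE.match(line)
        if not m:
            continue  # blank or unrecognised line
        key, value = m.group(1).lower(), m.group(2)
        if key == "match":
            break  # the rest of the file belongs to conditional Match blocks
        if key in DEFAULTS and key not in found:
            found[key] = value
    return {key: found.get(key, default) for key, default in DEFAULTS.items()}


# Hypothetical config mirroring this message: TCPKeepAlive off, interval 60.
example = """\
# TCPKeepAlive yes
TCPKeepAlive no
ClientAliveInterval 60
"""
print(keepalive_settings(example))
# {'clientaliveinterval': '60', 'clientalivecountmax': '3', 'tcpkeepalive': 'no'}
```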


Thanks
Zaman


Re: Preventing SSH timeouts . Some clarification needed

on 09.06.2010 10:15:39 by adamb

Hi,

Another setting to try, on the client side, is ServerAliveInterval.
This causes a keep-alive message to be sent within the SSH protocol, as
opposed to TCPKeepAlive, which works at the level of the underlying TCP
connection. I have had the misfortune to be behind firewalls that
harvested "dead" connections far too quickly, in my opinion, and this
setting worked for me where TCPKeepAlive didn't. Worth a try.
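Since the application in this thread runs ssh itself, this option can also be set per invocation with ssh's -o flag instead of in ssh_config. A hedged Python sketch: build_ssh_command is a hypothetical helper, the host and remote command are placeholders, and the 60-second/3-message values simply mirror the ClientAliveInterval numbers proposed earlier in the thread.

```python
from __future__ import annotations

import shlex


def build_ssh_command(host: str, remote_cmd: str,
                      alive_interval: int = 60, alive_count: int = 3) -> list[str]:
    """Build an ssh argv that sends SSH-level keep-alives from the client.

    ServerAliveInterval: seconds without data from the server before ssh
    sends a request for a response through the encrypted channel.
    ServerAliveCountMax: unanswered requests before ssh disconnects.
    """
    return [
        "ssh",
        "-o", f"ServerAliveInterval={alive_interval}",
        "-o", f"ServerAliveCountMax={alive_count}",
        host,
        remote_cmd,
    ]


argv = build_ssh_command("remote-host", "uptime")  # placeholder host and command
print(shlex.join(argv))
# ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=3 remote-host uptime
# To actually run it: subprocess.run(argv, check=True)
```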

Cheers,

Adam

ZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSBQb2xpY3kgYXZhaWxhYmxlIGF0IGh0 dHA6Ly93d3cu
c3RhcnRzc2wuY29tL3BvbGljeS5wZGYwYwYDVR0fBFwwWjAroCmgJ4YlaHR0 cDovL3d3dy5z
dGFydHNzbC5jb20vY3J0dTEtY3JsLmNybDAroCmgJ4YlaHR0cDovL2NybC5z dGFydHNzbC5j
b20vY3J0dTEtY3JsLmNybDCBjgYIKwYBBQUHAQEEgYEwfzA5BggrBgEFBQcw AYYtaHR0cDov
L29jc3Auc3RhcnRzc2wuY29tL3N1Yi9jbGFzczEvY2xpZW50L2NhMEIGCCsG AQUFBzAChjZo
dHRwOi8vd3d3LnN0YXJ0c3NsLmNvbS9jZXJ0cy9zdWIuY2xhc3MxLmNsaWVu dC5jYS5jcnQw
IwYDVR0SBBwwGoYYaHR0cDovL3d3dy5zdGFydHNzbC5jb20vMA0GCSqGSIb3 DQEBBQUAA4IB
AQB6n/FPjLdJCm4ZxTtPsAYISHen2tp2bEziOKBJ7IXLMipZBPEha9S//QzQ sXRYDEzPhfU4
3QXPaj1wH5Rh8FVZSFE/9ArVjBSo1IcVsfoUHXTcCN4EYQwBNzPQHthQ0cSU GxpHBShh9UPa
zFhUFhrd4AWmFbqkCxCnPEb7Omwq0lxry3nvsH48/k1Jk2SDj5e55b+mhqMB UlsIWf1RAM9b
nbasjClZH5pUDReI0bOFW5ouMa6YPbWb3UV60KMBoyhkncOWBXxRt0MgsHIt QqgLsNcYUfSs
S0MC3OTckdI7KPTstBgghQucTmAlOuvAroRByj0RLA9JGDpl+8YgkJhIMIIH 4jCCBcqgAwIB
AgIBDTANBgkqhkiG9w0BAQUFADB9MQswCQYDVQQGEwJJTDEWMBQGA1UEChMN U3RhcnRDb20g
THRkLjErMCkGA1UECxMiU2VjdXJlIERpZ2l0YWwgQ2VydGlmaWNhdGUgU2ln bmluZzEpMCcG
A1UEAxMgU3RhcnRDb20gQ2VydGlmaWNhdGlvbiBBdXRob3JpdHkwHhcNMDcx MDI0MjEwMTU0
WhcNMTIxMDIyMjEwMTU0WjCBjDELMAkGA1UEBhMCSUwxFjAUBgNVBAoTDVN0 YXJ0Q29tIEx0
ZC4xKzApBgNVBAsTIlNlY3VyZSBEaWdpdGFsIENlcnRpZmljYXRlIFNpZ25p bmcxODA2BgNV
BAMTL1N0YXJ0Q29tIENsYXNzIDEgUHJpbWFyeSBJbnRlcm1lZGlhdGUgQ2xp ZW50IENBMIIB
IjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxwmDzM4t2BqxKaQuE6uW vooyg4ymiEGW
VUet1G8SD+rqvyNH4QrvnEIaFHxOhESip7vMz39ScLpNLbL1QpOlPW/tFIzN HS3qd2XRNYG5
Sv9RcGE+T4qbLtsjjJbi6sL7Ls/f/X9ftTyhxvxWkf8KW37iKrueKsxw2Hqo lH7GM6FX5UfN
AwAu4ZifkpmZzU1slBhyWwaQPEPPZRsWoTb7q8hmgv6Nv3Hg9rmA1/VPBIOQ 6SKRkHXG0Hhm
q1dOFoAFI411+a/9nWm5rcVjGcIWZ2v/43Yksq60jExipA4l5uv9/+Hm33mb gmCszdj/Dthf
13tgAv2O83hLJ0exTqfrlwIDAQABo4IDWzCCA1cwDAYDVR0TBAUwAwEB/zAL BgNVHQ8EBAMC
AaYwHQYDVR0OBBYEFFNy7ZKc4NrLAVx8fpY1TvLUuFGCMIGoBgNVHSMEgaAw gZ2AFE4L7xqk
QFulF2mHMMo0aEPQQa7yoYGBpH8wfTELMAkGA1UEBhMCSUwxFjAUBgNVBAoT DVN0YXJ0Q29t
IEx0ZC4xKzApBgNVBAsTIlNlY3VyZSBEaWdpdGFsIENlcnRpZmljYXRlIFNp Z25pbmcxKTAn
BgNVBAMTIFN0YXJ0Q29tIENlcnRpZmljYXRpb24gQXV0aG9yaXR5ggEBMAkG A1UdEgQCMAAw
PQYIKwYBBQUHAQEEMTAvMC0GCCsGAQUFBzAChiFodHRwOi8vd3d3LnN0YXJ0 c3NsLmNvbS9z
ZnNjYS5jcnQwYAYDVR0fBFkwVzAsoCqgKIYmaHR0cDovL2NlcnQuc3RhcnRj b20ub3JnL3Nm
c2NhLWNybC5jcmwwJ6AloCOGIWh0dHA6Ly9jcmwuc3RhcnRzc2wuY29tL3Nm c2NhLmNybDCC
AV0GA1UdIASCAVQwggFQMIIBTAYLKwYBBAGBtTcBAQQwggE7MC8GCCsGAQUF BwIBFiNodHRw
Oi8vY2VydC5zdGFydGNvbS5vcmcvcG9saWN5LnBkZjA1BggrBgEFBQcCARYp aHR0cDovL2Nl
cnQuc3RhcnRjb20ub3JnL2ludGVybWVkaWF0ZS5wZGYwgdAGCCsGAQUFBwIC MIHDMCcWIFN0
YXJ0IENvbW1lcmNpYWwgKFN0YXJ0Q29tKSBMdGQuMAMCAQEagZdMaW1pdGVk IExpYWJpbGl0
eSwgcmVhZCB0aGUgc2VjdGlvbiAqTGVnYWwgTGltaXRhdGlvbnMqIG9mIHRo ZSBTdGFydENv
bSBDZXJ0aWZpY2F0aW9uIEF1dGhvcml0eSBQb2xpY3kgYXZhaWxhYmxlIGF0 IGh0dHA6Ly9j
ZXJ0LnN0YXJ0Y29tLm9yZy9wb2xpY3kucGRmMBEGCWCGSAGG+EIBAQQEAwIA BzBQBglghkgB
hvhCAQ0EQxZBU3RhcnRDb20gQ2xhc3MgMSBQcmltYXJ5IEludGVybWVkaWF0 ZSBGcmVlIFNT
TCBFbWFpbCBDZXJ0aWZpY2F0ZXMwDQYJKoZIhvcNAQEFBQADggIBAKqa4eBb jM4dG/wdxiww
IKC3kyb98QK2zREovyn/xzDP/4H/Bc8FFDTgoJR+nX2Li0EP3U7TsjG+CaIi 90+8YlShADpk
Prfm/8SzjGtJtfM6EaluJOhpcqMr3OyzK3aYGJP5RIeZ6vLT3fQaDZsIooXl 6YSFR/0HpU4F
JDc0wuyFaZmFbCrjTp8RNYyRWTTX6mWSv+TraOwuj3zrrddSpgUEi2WqwM9G /5o4IXQbGHx7
oXTvL6zrw9IOYO3QOKZDgFNhHeKUgqMAUiLcg/+WhcGe+Y4umKuxghtwaYsg D/bLfIfop3NC
/u5JqwDCWizAJruhmbOV4LG859MFCb2w/YeY55zDPVGmQ3MZdriwdOKrhlFj OjYihmm28UHO
vND2G3kK0LvnuieLqjQMc6GuUcZAQOWv96pW4BfbiQXpAqibMMeb0PZISa7P FEzGiBc2xAuV
RkM4kB9/+iieA1D/OTiRJwsf6rkoVgOsN9fCw522tzOmuVfiqDS4bFYv00sX /dFGwasHUUf3
DsLhpDSYdejb74SKjtuqLDIOuAm2bA1axA6+7kjFeNIngSU6OPSMre+xAjoc /6coaMGthFD+
mimr/i/8F8wDwdyzas7oxkdCtaW8hVir8mJnbp4CbckllDMPkeQ6qQNmxSDh OeqX1jyx2cTi
/vPq+/TyxV/stlehMYID0DCCA8wCAQEwgZQwgYwxCzAJBgNVBAYTAklMMRYw FAYDVQQKEw1T
dGFydENvbSBMdGQuMSswKQYDVQQLEyJTZWN1cmUgRGlnaXRhbCBDZXJ0aWZp Y2F0ZSBTaWdu
aW5nMTgwNgYDVQQDEy9TdGFydENvbSBDbGFzcyAxIFByaW1hcnkgSW50ZXJt ZWRpYXRlIENs
aWVudCBDQQIDAMwuMAkGBSsOAwIaBQCgggIQMBgGCSqGSIb3DQEJAzELBgkq hkiG9w0BBwEw
HAYJKoZIhvcNAQkFMQ8XDTEwMDYwOTA4MTUzOVowIwYJKoZIhvcNAQkEMRYE FH0ZDgnSSNb8
SFovYUhpEc7Y0p8uMF8GCSqGSIb3DQEJDzFSMFAwCwYJYIZIAWUDBAECMAoG CCqGSIb3DQMH
MA4GCCqGSIb3DQMCAgIAgDANBggqhkiG9w0DAgIBQDAHBgUrDgMCBzANBggq hkiG9w0DAgIB
KDCBpQYJKwYBBAGCNxAEMYGXMIGUMIGMMQswCQYDVQQGEwJJTDEWMBQGA1UE ChMNU3RhcnRD
b20gTHRkLjErMCkGA1UECxMiU2VjdXJlIERpZ2l0YWwgQ2VydGlmaWNhdGUg U2lnbmluZzE4
MDYGA1UEAxMvU3RhcnRDb20gQ2xhc3MgMSBQcmltYXJ5IEludGVybWVkaWF0 ZSBDbGllbnQg
Q0ECAwDMLjCBpwYLKoZIhvcNAQkQAgsxgZeggZQwgYwxCzAJBgNVBAYTAklM MRYwFAYDVQQK
Ew1TdGFydENvbSBMdGQuMSswKQYDVQQLEyJTZWN1cmUgRGlnaXRhbCBDZXJ0 aWZpY2F0ZSBT
aWduaW5nMTgwNgYDVQQDEy9TdGFydENvbSBDbGFzcyAxIFByaW1hcnkgSW50 ZXJtZWRpYXRl
IENsaWVudCBDQQIDAMwuMA0GCSqGSIb3DQEBAQUABIIBAECcL4BRt8+86i27 NQAtWrIiYYmL
VgAKL8O5OpUwFsHj1/EOVjQCV1jWnnPgSFdJuonXFpFp9rCE+ITTvs/emM9x MMWwWkNtYWxs
B3+qW34S1EJ6N4+7XQ8gLBt4W+MQXFGrC76XOLLBsWK330Wzm+od0oxkgVwr A5sau3ksheQm
cngabzCRlWo6KnTzv4Z/xTaz+4bStaVbXAwKakNitYgembL2zw0qX6XrWBjn OHP/rC5ZjPgn
vkcwVqYRm16r8ik7I6t1YplBOJwPM/9pAVTYHt6D6tGVtoX7dzsCfpT2KWyb kijgVkQkaTO+
w6fsjE4iuVHBGs5eIzcj2wjrurIAAAAAAAA=
--------------ms090002090100040106090204--

Re: Preventing SSH timeouts . Some clarification needed

am 09.06.2010 12:14:33 von Glynn Clements

query wrote:

> Guys, since we are now clear that we are not behind NAT, we can
> forget about reducing the keep-alive time

Note that the same issue applies for firewalls which use connection
tracking to determine which packets to allow. Ultimately, it's the
connection tracking that's the issue, not NAT per se.

If the router "forgets" a connection because it hasn't seen any
traffic in a long time, and the result of this is that subsequent
packets are silently discarded, then the connection will cease to
work, resulting in a timeout occurring the next time either side tries
to send data.

This isn't an issue if connection tracking is only used for
scheduling (prioritising traffic) rather than for deciding which
packets to allow.
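To make that failure mode concrete, here is a toy Python model of a connection-tracking filter. The class, the 600-second idle timeout and the other numbers are all hypothetical; real routers differ, and Linux's own netfilter default for established TCP flows is much longer (5 days). A flow that stays silent longer than the idle timeout is forgotten and its next packet is silently dropped; a keep-alive sent more often than the timeout keeps refreshing the entry.

```python
from __future__ import annotations


class ConntrackTable:
    """Toy stateful filter (hypothetical): forgets a flow after `idle_timeout`
    seconds without any traffic and then drops its packets."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self.last_seen: dict[str, float] = {}

    def open(self, flow: str, now: float) -> None:
        # The connection handshake creates the tracking entry.
        self.last_seen[flow] = now

    def forward(self, flow: str, now: float) -> bool:
        """Return True if a packet for `flow` at time `now` is let through."""
        seen = self.last_seen.get(flow)
        if seen is None or now - seen > self.idle_timeout:
            # Entry expired (or never existed): the packet is silently discarded.
            self.last_seen.pop(flow, None)
            return False
        self.last_seen[flow] = now  # every forwarded packet refreshes the entry
        return True


def survives_idle(idle_timeout: float, idle_period: float,
                  keepalive_every: float | None) -> bool:
    """Does the first real packet after `idle_period` seconds of user
    inactivity still get through, with or without periodic keep-alives?"""
    table = ConntrackTable(idle_timeout)
    table.open("ssh", now=0.0)
    if keepalive_every is not None:
        t = keepalive_every
        while t < idle_period:
            if not table.forward("ssh", now=t):  # a keep-alive was itself dropped
                return False
            t += keepalive_every
    return table.forward("ssh", now=idle_period)


print(survives_idle(600, 3600, None))  # False: an hour of silence outlives the entry
print(survives_idle(600, 3600, 300))   # True: a probe every 5 minutes refreshes it
print(survives_idle(600, 3600, 900))   # False: probes are too far apart
```

The point of the model is only the inequality: keep-alives help against connection tracking exactly when their interval is shorter than the tracker's idle timeout.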

> So, do you now suggest enabling the ClientAliveInterval option? Also,
> since ClientAliveCountMax is enabled by default with a value of 3,
> I will probably keep ClientAliveInterval below 300 secs, probably
> at 60 secs. So the connection will drop out after 180 secs if there
> is no response.
>
> Also, somewhat strangely, the TCPKeepAlive option is disabled in our
> sshd_config file; not sure why. So, if ClientAliveInterval is
> enabled, can we leave TCPKeepAlive disabled? Will that serve our
> purpose?

The main reason to disable TCP keep-alives is that they can cause a
connection to be dropped unnecessarily. A secondary reason is that
they will cause an on-demand link-layer connection to be opened
unnecessarily, but that's less of an issue nowadays.

Without keep-alives, an idle TCP connection doesn't cause any packets
to be sent. The physical link could be down for days, but the TCP
connection will remain open so long as no packets are sent while the
link is down. Enabling keep-alives will cause the connection to fail
in this situation.

The main purpose of keep-alives is to prevent the situation where the
client system crashes, leaving the server process listening for
packets which will never arrive. Without keep-alives, there's no way to
distinguish between a client which has permanently vanished and one
which has merely been idle for a long time.
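Here is what enabling the kernel mechanism looks like from a program, as a Python sketch aimed at Linux (other platforms may lack some of the constants, hence the `hasattr` checks). As far as I know, sshd's TCPKeepAlive option only switches `SO_KEEPALIVE` on and leaves the timing to the system-wide `net.ipv4.tcp_keepalive_time`, `tcp_keepalive_intvl` and `tcp_keepalive_probes` settings; the per-socket overrides, the helper names and the example values below are illustrative assumptions, not something sshd does.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle: int, interval: int, probes: int) -> None:
    """Turn on kernel TCP keep-alives for `sock` and, where the platform exposes
    the per-socket knobs, override the system-wide defaults."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe is sent.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Unanswered probes before the kernel declares the connection dead.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def worst_case_detection(idle: int, interval: int, probes: int) -> int:
    """Approximate seconds from the last traffic until a vanished peer is detected."""
    return idle + interval * probes


# Linux defaults for tcp_keepalive_time/_intvl/_probes are 7200, 75 and 9.
print(worst_case_detection(7200, 75, 9))  # 7875 seconds, a little over two hours
```

With the defaults, nothing at all is sent for the first two hours of idleness, which is why the default TCP keep-alive timing rarely helps against a router that expires idle connections sooner.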

The SSH keep-alives (ClientAliveInterval and ServerAliveInterval)
serve a similar purpose (to force a connection to be terminated when
the other end disappears without explicitly closing the connection),
except that the SSH protocol prevents spoofing. This prevents the
situation where an attacker "silences" one side of the connection and
spoofs TCP keep-alives to prevent the server from realising that
something has happened.

--
Glynn Clements

Re: Preventing SSH timeouts . Some clarification needed

am 10.06.2010 08:02:09 von Query

On Wed, Jun 9, 2010 at 9:22 PM, Glynn Clements wrote:
>
> query wrote:
>
>> Okay, so what I understand is that keep-alives and similar options
>> (ClientAliveInterval and ServerAliveInterval) are never going to
>> help prevent those timeouts. Enabling those options will only make
>> the situation worse.
>
> Not necessarily. If the problem is caused by connection tracking
> expiring the connection, keep-alives may prevent this from happening,
> although the default settings for TCP keep-alives are probably
> insufficient.
>
>> So, if the client host is busy for a long time and is not able to
>> send any messages to the SSH server, then the server will drop the
>> connection, assuming that the client has crashed for whatever reason,
>> if keep-alive-like options are enabled.
>
> Yes, for SSH keep-alives. TCP keep-alives are handled by the kernel,
> and only require that the host is functioning and connected. Even if
> the ssh or sshd processes were completely suspended (in the sense of
> "kill -STOP ..."), TCP keep-alives will continue to be sent and/or
> acknowledged.
>
>> On the other hand, if the keep-alive option is disabled, the server
>> is never going to drop the SSH connection, even if the client
>> crashes, is 100% busy (could not send a message to the server) or is
>> idle. The SSH connection drop was initiated by the kernel, as you
>> mentioned in your first comment, and we can do nothing in the SSH
>> configuration to avoid those timeouts.
>
> If the problem is due to connection tracking, enabling frequent
> keep-alives should prevent the connection from expiring. However, this
> can cause a connection to be dropped if the system is under heavy
> load, even if the connection is otherwise idle. The risk can be
> reduced by increasing the value for the ClientAliveCountMax or
> ServerAliveCountMax options, so that the connection is only dropped if
> the process stops responding for an extended period.

Okay, thanks for the clarification. Since the host sometimes remains
busy for around 2 hours, ClientAliveCountMax should be set to a high
value in our case.

==========
Time          CPU %util   Mem %util
06/07-23:00       100.0        17.4
06/07-23:30       100.0        18.1
06/08-00:00       100.0        18.0
06/08-00:30       100.0        17.4
==========


Since I am not sure of the connection tracking timeout value, I am
planning to set ClientAliveInterval and ServerAliveInterval to 300
secs and the corresponding CountMax values to 24, so that even in the
worst case of high load the keep-alives continue to be sent and the
connection does not break. Since in our case both the client and the
server remain busy at the same time, I am planning to use the options
on both the client and the server, so that either of them can send an
SSH keep-alive message to inform the router that the connection is
alive. But I hope this will not add any extra load on the server,
since the CPU is already at 100%.
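To sanity-check the numbers in this plan, here is a small Python sketch. It assumes the rule of thumb from the sshd_config man page (an unresponsive client is disconnected after roughly ClientAliveInterval × ClientAliveCountMax seconds; the man page's own example is 15 × 3, about 45 seconds) and that ServerAliveInterval/ServerAliveCountMax behave the same way on the ssh client side. The function names are hypothetical.

```python
import math


def alive_timeout(interval: int, count_max: int) -> int:
    """Approximate seconds without any response before sshd (ClientAlive*)
    or ssh (ServerAlive*) gives up on the other end."""
    return interval * count_max


def count_max_for_stall(interval: int, stall_seconds: int) -> int:
    """Smallest CountMax for which interval x CountMax covers a stall of
    `stall_seconds` during which the other end cannot answer."""
    return math.ceil(stall_seconds / interval)


print(alive_timeout(60, 3))                # 180: the 60-second plan discussed earlier
print(alive_timeout(300, 24))              # 7200: two hours
print(count_max_for_stall(300, 2 * 3600))  # 24
```

Because the result is approximate, a CountMax chosen this way leaves essentially no margin; rounding up by one or two probes is a cheap safety factor.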

Thanks
Zaman



Re: Preventing SSH timeouts . Some clarification needed

am 10.06.2010 15:03:30 von Glynn Clements

query wrote:

> Okay, thanks for the clarification. Since the host sometimes
> remains busy for around 2 hours,

Busy to the point that ssh/sshd doesn't get *any* CPU time for 2
hours? Either you're misunderstanding something, or that's a seriously
misconfigured server.

In general, processes which need a lot of CPU cycles should have a
lower priority than those which need little. The relative "importance"
of processes doesn't matter here. A system where the key process gets
95% CPU while support processes get the other 5% is preferable to one
where the key process gets 100% CPU and support processes are
suspended for long periods.
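A minimal sketch of that advice in Python, assuming a POSIX system: the CPU-hungry job is started with its niceness raised by 10 (the same effect as running it under the nice(1) command), so sshd and other light support processes still get scheduled. The helper name and the stand-in "job" are hypothetical.

```python
from __future__ import annotations

import os
import subprocess
import sys


def run_at_lower_priority(cmd: list[str], increment: int = 10) -> subprocess.CompletedProcess:
    """Run a CPU-heavy command with its niceness raised by `increment`
    (higher niceness = lower scheduling priority)."""
    return subprocess.run(
        cmd,
        # Runs in the child between fork and exec (POSIX only); the
        # parent's own priority is left unchanged.
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in for the batch job: a child Python that reports its own niceness.
result = run_at_lower_priority([sys.executable, "-c", "import os; print(os.nice(0))"])
print("child niceness:", result.stdout.strip())
```

Niceness is capped at 19, so a job that is already niced cannot be pushed further than that.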

--
Glynn Clements

Re: Preventing SSH timeouts . Some clarification needed

am 10.06.2010 18:35:38 von Query

On Thu, Jun 10, 2010 at 6:33 PM, Glynn Clements wrote:
>
> query wrote:
>
>> Okay, thanks for the clarification. Since the host sometimes
>> remains busy for around 2 hours,
>
> Busy to the point that ssh/sshd doesn't get *any* CPU time for 2
> hours? Either you're misunderstanding something, or that's a seriously
> misconfigured server.

That is just my misunderstanding. The CPU is 100% busy, but it is not
the case that our process uses all of it and no other process gets
any CPU time. I will work out an optimal value by going over the
system once more during peak CPU utilization.
But I am still confused about who is terminating the connection in
our case and how the timeout value is calculated.
As you mentioned in your first comment, it is the kernel that is
terminating the connection, but on what basis does it terminate it?
As you said earlier, keep-alives allow us to detect that a host is
unreachable (e.g. network failure, system crash, power failure,
etc.); they are not going to kill sshd.

Apologies for repeating the same question, but I am still confused
about this.

Thanks
Zaman


Re: Preventing SSH timeouts . Some clarification needed

am 11.06.2010 01:52:59 von Glynn Clements

query wrote:

> >> Okay, thanks for the clarification. Since the host sometimes
> >> remains busy for around 2 hours,
> >
> > Busy to the point that ssh/sshd doesn't get *any* CPU time for 2
> > hours? Either you're misunderstanding something, or that's a seriously
> > misconfigured server.
>
> That is just my misunderstanding. The CPU is 100% busy, but it is not
> the case that our process uses all of it and no other process gets
> any CPU time. I will work out an optimal value by going over the
> system once more during peak CPU utilization.
> But I am still confused about who is terminating the connection in
> our case and how the timeout value is calculated.
> As you mentioned in your first comment, it is the kernel that is
> terminating the connection, but on what basis does it terminate it?
> As you said earlier, keep-alives allow us to detect that a host is
> unreachable (e.g. network failure, system crash, power failure,
> etc.); they are not going to kill sshd.

It won't kill sshd; however, if packets (data or keep-alives) which
are sent to the client stop being acknowledged, operations on the
socket will eventually fail with ETIMEDOUT. At this point, sshd will
close the connection of its own accord.

The relevant setting is /proc/sys/net/ipv4/tcp_retries2:

tcp_retries2 (integer; default: 15; since Linux 2.2)
The maximum number of times a TCP packet is retransmitted in
established state before giving up. The default value is 15,
which corresponds to a duration of approximately between 13 to
30 minutes, depending on the retransmission timeout. The
RFC 1122 specified minimum limit of 100 seconds is typically
deemed too short.

The initial retransmission timeout is determined by the measured
round-trip latency for the connection. Subsequent retransmissions
occur at exponentially increasing intervals, capped at 120 seconds.
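The arithmetic behind "13 to 30 minutes" can be reproduced with a short Python sketch of the capped exponential backoff described above. Assumptions: the wait doubles after every unacknowledged retransmission, it never exceeds 120 seconds, there is one final wait after the last retransmission before the error is reported, and 200 ms is used as the smallest initial timeout (Linux's minimum RTO). The real kernel logic has varied between versions, so treat the results as approximations; the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

TCP_RTO_MAX_MS = 120_000  # the 120-second cap on the retransmission interval


def give_up_after_ms(retries: int, initial_rto_ms: int) -> int:
    """Approximate time from the first transmission of a segment until the
    socket fails with ETIMEDOUT after `retries` unacknowledged retransmissions."""
    # retries + 1 waits: one after the original send and one after each retry,
    # each double the previous one but capped at TCP_RTO_MAX_MS.
    return sum(min(initial_rto_ms * 2**k, TCP_RTO_MAX_MS) for k in range(retries + 1))


def read_tcp_retries2(path: str = "/proc/sys/net/ipv4/tcp_retries2") -> int | None:
    """Current tcp_retries2 on Linux, or None where the file does not exist."""
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else None


# Fast, low-latency link: initial RTO at the 200 ms minimum.
print(give_up_after_ms(15, 200) / 60_000)    # about 15.4 minutes
# Slower link whose initial RTO is 1 second.
print(give_up_after_ms(15, 1_000) / 60_000)  # about 20.1 minutes
print(read_tcp_retries2())                   # e.g. 15, or None off Linux
```

Most of the total comes from the capped 120-second waits, which is why the overall figure is far less sensitive to the measured round-trip time than the first few retransmissions are.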

If keep-alives aren't being sent, the connection can only time out as
a result of data being sent. If keep-alives are being sent, a timeout
can occur even if the connection is otherwise idle (that's the purpose
of keep-alives).

--
Glynn Clements

Re: Preventing SSH timeouts . Some clarification needed

am 11.06.2010 09:22:00 von Query

Clements, thanks for all the detailed explanations. I think things
are clear to me now. I will try to apply the changes in sshd_config.

And thanks to Michael and everyone else for providing insights on the issue.

Thanks
Zaman