VOIP (fun and relevant to comp.unix.shell)

on 27.11.2007 22:08:58 by Edward Rosten

On Nov 26, 4:08 pm, john.f.kl...@gmail.com wrote:

> Here is the pipe flow
>
> audio/video signal | VOIP type compression | more compression like
> arithmetic compression ? | encryption | netcat ---> listener on the
> other computer to reverse all these.

Well, I've always been interested in this, but never had time to test
it.

So, something very basic like this may work:

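# sender: connect to port 6666 on the other machine
cat /dev/audio | netcat host 6666
# receiver: listen on port 6666 and play whatever arrives
netcat -lp 6666 > /dev/audio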

One could layer in compression using something like this (does ogg*
understand - as stdout/in?):

oggenc -B 8 -C 1 -R 8000 -o - /dev/audio | netcat host 6666
netcat -lp 6666 | ogg123 -

Encryption can then be added using ssh port forwarding, or a one-liner
like this:

oggenc -B 8 -C 1 -R 8000 -o - /dev/audio | ssh user@host 'ogg123 -'

There are problems. All delays will add up, rapidly giving
unacceptable latency. One needs a "squishing buffer" on the receiving
end:

netcat -lp 6666 | ogg123 | squish > /dev/audio

If squish is reading data faster than it can write it, it will overwrite
old data with new data. This will remove latency, but will drop chunks
of audio (no choice there). Can anyone think of a "squish" shell
program?
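
As a starting point, here is a minimal sketch of just the dropping policy, not a full streaming squish. Given a backlog that has already piled up on stdin, it keeps only the newest chunk and discards the rest. The name squish_once and the 800-byte default are assumptions for illustration. It only returns once stdin reaches end of file, so a real squish would still need non-blocking reads, which plain shell does not make easy.

```shell
#!/bin/sh
# squish_once: hypothetical helper, not a standard tool.
# Reads everything queued on stdin and emits only the last CHUNK bytes,
# i.e. old audio is thrown away in favour of new audio.
squish_once() {
    chunk=${1:-800}          # bytes to keep; 800 is an arbitrary default
    tail -c "$chunk"
}

# Three 4-byte "chunks" are queued; only the newest one survives.
printf 'aaaabbbbcccc' | squish_once 4
echo
```

The tail -c call does all the work here; the open problem is deciding when a backlog exists, not what to do with it.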

I expect though that UDP is a rather better choice than TCP for this
application. But I can't think of any easy way of doing it with
encryption. Missing blocks from compression doesn't matter since ogg
is a streaming format and can pick a stream back up.

Anyway, 1 way encrypted, compressed VOIP in one line of shell. Not a
bad illustration of the power of shell scripts.

Anyone want to make suggestions for easy ways of doing squish/
encrypted UDP? Bonus points for rejecting packets older than the most
recent.

-Ed

Re: VOIP (fun and relevant to comp.unix.shell)

on 27.11.2007 22:34:28 by Edward Rosten

On Nov 27, 2:08 pm, Edward Rosten wrote:

> oggenc -B 8 -C 1 -R 8000 -o - /dev/audio | netcat host 6666
> netcat -lp 6666 | ogg123 -

The speex codec would be better for this.

One could replace oggenc with ffmpeg to grab video to have 1 way video
chat in 1 line of shell. You'd want to make ffmpeg use a streaming
container, rather than avi, though.

-Ed

Re: VOIP (fun and relevant to comp.unix.shell)

on 28.11.2007 04:28:48 by Maxwell Lol

> > Here is the pipe flow
> >
> > audio/video signal | VOIP type compression | more compression like
> > arithmetic compression ? | encryption | netcat ---> listener on the
> > other computer to reverse all these.

Is this realtime? Or are you using complete files/streams/mp3s/videos?

Audio/video compression is time based. And if you want to have
listenable conversations, you need to have a latency (ideally) of less
than 200 msec. It's usually better to include the compression in the
CODEC, rather than add it afterwards.

The network overhead is often more significant than the payload with
high compression of audio. Say the VoIP packet is 20 bytes of
data. How much will compression help, when you are compressing a
packet at a time? Data compression works by volume. So it only makes
sense if you compress an entire file/conversation.

>I expect though that UDP is a rather better choice than TCP for this
>application.

Because of the latency. Better to drop a packet/word than to pause for
several seconds waiting for data to be retransmitted. But if it's not
realtime, then TCP would provide reliability at the cost of latency.

>But I can't think of any easy way of doing it with
>encryption.

Encryption is also not too useful in realtime. Either it encrypts the
entire stream (and if you miss a packet, you can't decrypt), or else
it works a packet at a time. TCP encryption makes sense (i.e. ssh),
but not for UDP/realtime. It's tricky to make a good system. Bad
systems are easy.

If realtime VOIP packets contain short phrases like "the" then you are
going to be encrypting the same phrase over and over, and this makes
the cryptanalysis easy. You can pad data with random data, making
traffic analysis harder.

Re: VOIP (fun and relevant to comp.unix.shell)

on 28.11.2007 17:54:21 by Edward Rosten

On Nov 27, 8:28 pm, Maxwell Lol wrote:
> > > Here is the pipe flow
>
> > > audio/video signal | VOIP type compression | more compression like
> > > arithmetic compression ? | encryption | netcat ---> listener on the
> > > other computer to reverse all these.
>
> Is this realtime? Or are you using complete files/streams/mp3s/videos?
>
> audio/video compression is time based. And if you want to have
> listenable conversations, you need to have a latency (ideally) of less
> than 200 msec. It's usually better to include the compression in the
> CODEC, rather than add it afterwards.

I do not understand what you mean by that. By the way, the speex codec
looks like it has been designed with this in mind. It appears to
introduce a 30ms latency, which is acceptable. I would imagine that
the network latency will dominate. It also appears to deal gracefully
with packet loss, so that makes life easier.

http://www.speex.org/comparison/

> The network overhead is often more significant than the payload with
> high compression of audio. Say the VoIP packet is 20 bytes of
> data. How much will compression help, when you are compressing a
> packet at a time? Data compression works by volume. So it only makes
> sense if you compress an entire file/conversation.

I'm going to do a back-of-the-envelope calculation here:

Consider 8 kHz, 8 bit mono audio (telephone quality). That corresponds
to 64 kbit/s, whereas compressed it can get down to 2 kbit/s.

Assume that 10 packets are being sent per second (this introduces
100ms latency, allows for 100ms network latency to come in under your
requirement above).

Uncompressed requires 800 bytes per packet, versus 25 bytes per packet
for compressed. UDP packets carry 28 bytes of overhead over IPv4
(an 8-byte UDP header plus a 20-byte IP header), so I would say that the
compression does make a significant difference in this case.
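
The estimate is easy to redo with shell arithmetic. This is only a sketch: payload_bytes is a made-up helper, and the 28-byte overhead assumes a plain IPv4 header (20 bytes) plus a UDP header (8 bytes), with no IP options.

```shell
#!/bin/sh
# payload_bytes BITRATE PACKETS_PER_SECOND
# Bytes of audio carried by each packet at the given bit rate.
payload_bytes() {
    echo $(( $1 / 8 / $2 ))
}

overhead=28   # assumed: 20-byte IPv4 header + 8-byte UDP header
pps=10        # packets per second, as in the estimate above

for rate in 64000 2000; do
    p=$(payload_bytes "$rate" "$pps")
    echo "$rate bit/s: $p payload bytes, $(( p + overhead )) bytes on the wire"
done
```

With these numbers the headers are larger than the compressed payload, yet the compressed packet is still far smaller than the uncompressed one.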

As an aside, looking at the feature list from speex, the compression
buys you a whole bunch of useful things like robustness to packet loss
and echo cancellation "for free".

> >I expect though that UDP is a rather better choice than TCP for this
> >application.
>
> Because of the latency. Better to drop a packet/word than to pause for
> several seconds waiting for data to be retransmitted. But if it's not
> realtime, then TCP would provide reliability at the cost of latency.

Yep. Now, If I could figure out how to set the packet size with
netcat, I could make sure that exactly one speex packet (assuming CBR
encoding) was sent per UDP packet.

> >But I can't think of any easy way of doing it with
> >encryption.
>
> Encryption is also not too useful in realtime. Either it encrypts the
> entire stream (and if you miss a packet, you can't decrypt), or else
> it works a packet at a time. TCP encryption makes sense (i.e. ssh),
> but not for UDP/realtime. It's tricky to make a good system. Bad
> systems are easy.

One would want to encrypt for privacy, but I'm not an encryption
expert. However, the newest TLS seems to be able to secure UDP
communications (as DTLS).

socat looks like a beefed up netcat, and supports TLS, but it doesn't
seem to do TLS over UDP yet.

http://www.dest-unreach.org/socat/doc/socat.html

> If realtime VOIP packets contain short phrases like "the" then you are
> going to be encrypting the same phrase over and over, and this makes
> the cryptanalysis easy. You can pad data with random data, making
> traffic analysis harder.

Well, it makes the cryptanalysis easier, at any rate. The variability
inherent in speech and the compression will make it harder, though.

You say designing a bad system is easy (true), but in this case, most
of the parts (e.g. speex, TLS) have been designed well by other people.
They might combine to make a passable or even decent VOIP system with
compression and encryption.

It would certainly be the shortest VOIP program available...

-Ed

Re: VOIP (fun and relevant to comp.unix.shell)

on 29.11.2007 02:27:13 by Maxwell Lol

Edward Rosten writes:

> On Nov 27, 8:28 pm, Maxwell Lol wrote:
> > > > Here is the pipe flow
> >
> > > > audio/video signal | VOIP type compression | more compression like
> > > > arithmetic compression ? | encryption | netcat ---> listener on the
> > > > other computer to reverse all these.

>> audio/video compression is time based. And if you want to have
>> listenable conversations, you need to have a latency (ideally) of less
>> than 200 msec. It's usually better to include the compression in the
>> CODEC, rather than add it afterwards.
>
>I do not understand what you mean by that.

I mean that the CODEC must decide when to output based on the elapsed
time of the input stream. Some might even output when there is continuous
silence (time passes with no input).

>By the way, the speex codec
>looks like it has been designed with this in mind. It appears to
>introduce a 30ms latency, which is acceptable.

This is ADDED to the latency in the network. And you need it
on both ends.

So if your latency budget is 200 msec with no compression,
and you then use speex, which adds 30ms on both ends, the network
latency now has to be at most 200-(2*30), or 140ms.

>I would imagine that
>the network latency will dominate.

Makes sense.

>It also appears to deal gracefully
>with packet loss, so that makes life easier.
>
>
>http://www.speex.org/comparison/

Packet loss is an issue with UDP, so it looks like speex is designed
with UDP in mind. I am not an expert, but realtime VoIP over TCP seems
unlikely except if you have a fast LAN.

> > The network overhead is often more significant than the payload with
> > high compression of audio. Say the VoIP packet is 20 bytes of
> > data. How much will compression help, when you are compressing a
> > packet at a time? Data compression works by volume. So it only makes
> > sense if you compress an entire file/conversation.
>
> Uncompressed requires 800 bytes per packet, versus 25 bytes per packet
> for compressed. UDP packets carry 28 bytes of overhead over IPv4
> (an 8-byte UDP header plus a 20-byte IP header), so I would say that the
> compression does make a significant difference in this case.

I was talking about the "more compression like arithmetic compression?"
part - not the compression by the CODEC.

I just did a

% dd count=25 ibs=1/tmp/a
% gzip /tmp/a

and the data grew from 25 bytes to 48 bytes.
So using arithmetic compression doesn't make sense if the CODEC is decent.
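
For anyone who wants to repeat the experiment, here is a rough sketch. It assumes /dev/urandom as a stand-in for well-compressed codec output, and gzip_size is a made-up helper name. The exact size can differ from the 48 bytes above, since gzip also stores the file name when it compresses a named file rather than a pipe.

```shell
#!/bin/sh
# gzip_size N: gzip N bytes from /dev/urandom and print the output size.
# /dev/urandom is an assumed stand-in for one packet of codec output.
gzip_size() {
    dd if=/dev/urandom bs=1 count="$1" 2>/dev/null | gzip -c | wc -c | tr -d ' '
}

echo "25 bytes in, $(gzip_size 25) bytes out"
```

The growth is mostly fixed cost: a gzip stream carries a 10-byte header and an 8-byte trailer regardless of how little data is inside.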

> You say designing a bad system is easy (true), but in this case, most
> of the parts (eg speex, TLS) have been designed well by other people.
> They might combine to make a passable or even decent VOIP system with
> compression and encryption.

Well, it would be somewhat private, but security experts won't think
much of the privacy. It will defeat casual observers.

I'd add random bytes both before and after the payload,
and encrypt each packet. But this also eats into the latency. If it
takes 30ms per encryption/decryption, then the 140ms above drops to
80ms.
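
The budget arithmetic, spelled out as a small sketch. The 30ms figures are the assumed per-end costs from this discussion, not measurements, and the variable names are only for illustration.

```shell
#!/bin/sh
total=200    # ms, target end-to-end latency
codec=30     # ms per end, codec delay as assumed in this thread
crypto=30    # ms per end, hypothetical encryption/decryption cost

after_codec=$(( total - 2 * codec ))
after_crypto=$(( after_codec - 2 * crypto ))

echo "left for the network after codec:  ${after_codec} ms"
echo "left for the network after crypto: ${after_crypto} ms"
```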

The 200ms max is ideal. It allows untrained people to talk without
stepping on the delayed response from the other side.

Now - you can get by with greater latency, but the more latency you
allow, the more training you need. You can use the military style and
say "Over" when you are done speaking.

I'd be interested if anyone tries this.

Re: VOIP (fun and relevant to comp.unix.shell)

on 29.11.2007 18:15:33 by Edward Rosten

On Nov 28, 6:27 pm, Maxwell Lol wrote:

> >By the way, the speex codec
> >looks like it has been designed with this in mind. It appears to
> >introduce a 30ms latency, which is acceptable.
>
> This is ADDED to the latency in the network. And you need it
> on both ends.

Yes it is added, but not on both ends. With a very fast CPU, decoding
can be done more or less instantly. For encoding, the encoder is
limited since it needs 30ms worth of sound to encode.

> >It also appears to deal gracefully
> >with packet loss, so that makes life easier.
>
> >http://www.speex.org/comparison/
>
> Packet loss is an issue with UDP, so it looks like speex is designed
> with UDP in mind. I am not an expert, but realtime VoIP over TCP seems
> unlikely except if you have a fast LAN.

Agreed.

> I was talking about the "more compression like arithmetic compression?"
> part - not the compression by the CODEC.

> and the data grew from 25 bytes to 48 bytes.
> So using arithmetic compression doesn't make sense if the CODEC is decent.

Yes: if the codec is any good, then that extra layer of encoding will
already have been done. Speex looks well designed, so I doubt that any
simple post processing compression hacks will help.

> > You say designing a bad system is easy (true), but in this case, most
> > of the parts (eg speex, TLS) have been designed well by other people.
> > They might combine to make a passable or even decent VOIP system with
> > compression and encryption.
>
> Well, it would be somewhat private, but security experts won't think
> much of the privacy. It will defeat casual observers.

I don't know how good

> I'd add random bytes both before and after the payload,
> and encrypt each packet. But this also eats into the latency. If it
> takes 30ms per encryption/decryption, then the 140ms above drops to
> 80ms.

Assuming that you can generate crypto-secure random numbers much faster
than you need them (reasonable on a computer with a hardware RNG), and
that encryption is reasonably fast (this seems to be the case with a
modern CPU), this should not harm latency much. Of course, the data
rate goes up, so you'll need larger packets, or more frequent packets.

> The 200ms max is ideal. It allows untrained people to talk without
> stepping on the delayed response from the other side.
>
> Now - you can get by with greater latency, but the more latency you
> allow, the more training you need. You can use the military style and
> say "Over" when you are done speaking.
>
> I'd be interested if anyone tries this.

Which? Military style or VOIP shell scripts?

-Ed

Re: VOIP (fun and relevant to comp.unix.shell)

on 29.11.2007 19:27:49 by Maxwell Lol

Edward Rosten writes:

> Yes it is added, but not on both ends. With a very fast CPU, decoding
> can be done more or less instantly. For encoding, the encoder is
> limited since it needs 30ms worth of sound to encode.

Ah, Thanks for the update.

> > I'd add random bytes both before and after the payload,
> > and encrypt each packet. But this also eats into the latency. If it
> > takes 30ms per encryption/decryption, then the 140ms above drops to
> > 80ms.
>
> Assuming that you can generate crypto-secure random numbers much faster
> than you need them (reasonable on a computer with a hardware RNG), and
> that encryption is reasonably fast (this seems to be the case with a
> modern CPU), this should not harm latency much. Of course, the data
> rate goes up, so you'll need larger packets, or more frequent packets.

Yup.
> > I'd be interested if anyone tries this.
>
> Which? military style or VOIP shell scripts?

Military VoIP is just people training. As kids we would pretend to do this.
No big deal.

I'm interested in the VoIP shell scripts, and how they turn out.
The entire concept of Unix pipes doesn't really match what VoIP must do.

Think of
A | B | C | D

Normally it's block buffered. Even if the applications use unbuffered
I/O, the programs would have to generate output based on elapsed time.

Consider someone saying "Ummmmmmmmmmmmmmmmmmmmmmmmmmmmm" for a full
second. The CODEC needs to output something even when the input
doesn't "change." And the next program on the other end of the pipe
has to recognize the end of a unit of time as well.

Perhaps if each timeslice was treated as a single line (ASCII HEX?),
and the line terminator determined the end of the time unit.
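
A rough sketch of that framing idea, assuming fixed-size timeslices. The helper name frame_hex and the 2-byte slice size are made up for the example. Each slice becomes one line of hex, so the next program in the pipe can read a line and know it has exactly one unit of time.

```shell
#!/bin/sh
# frame_hex SLICE_BYTES: hypothetical helper. Reads raw bytes on stdin and
# prints one line of hex per SLICE_BYTES-sized timeslice.
frame_hex() {
    od -An -v -tx1 | tr -d ' \t\n' | fold -w $(( $1 * 2 ))
    echo    # terminate the final line
}

# Six bytes framed as three 2-byte slices; the reader sees one slice per line.
printf 'abcdef' | frame_hex 2 | while read -r slice; do
    echo "slice: $slice"
done
```

The cost is that hex doubles the byte count, and od only emits output as its own buffering allows, so this shows the framing rather than solving the timing.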

Re: VOIP (fun and relevant to comp.unix.shell)

on 29.11.2007 23:26:21 by Edward Rosten

On Nov 29, 11:27 am, Maxwell Lol wrote:

> Think of
> A | B | C | D
>
> Normally it's block buffered. Even if the applications use unbuffered
> I/O, the programs would have to generate output based on elapsed time.

It should be easy to try. I have no microphone and broken sound,
though (I think I busted the output on the soundcard with a static
shock).

> Consider someone saying "Ummmmmmmmmmmmmmmmmmmmmmmmmmmmm" for a full
> second. The CODEC needs to output something even when the input
> doesn't "change." And the next program on the other end of the pipe
> has to recognize the end of a unit of time as well.

> Perhaps if each timeslice was treated as a single line (ASCII HEX?),
> and the line terminator determined the end of the time unit.

I think the constant bitrate codecs will do the job. They give worse
compression, since a VBR codec could represent mmmmmmm and
mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm with about the same number of
bits. The CBR ones lose that, but have the advantage that they're
more suitable for low latency work. I think speex, mp3 and vorbis can
all encode CBR.

I'm still looking at netcat to try to figure out how to change the UDP
packet size.

-Ed