extract lines between two tags

on 22.10.2007 14:55:17 by dave

I have a data file which has as its first column a time and a voltage
as the second field. A volt meter measured a voltage multiple times per
second. The resolution of the time is only one second, but the sample
rate is > 1 Hz.

I'd like to extract from this file all lines between any two times. As
you can see below, there are multiple lines with the same time in them.
If necessary, I can assume the data is continuously sampled, so I could
get the data between t1 and t2 by extracting those between the first
occurrence of t1 and the first occurrence of t2 + 1 second, then just
ignoring the very last line, which contains the unwanted data point at
t2 + 1 second.

Any thoughts on the best way to do this?


20:23:51 -56.64
20:23:51 -56.96
20:23:51 -58.40
20:23:51 -56.96
20:23:52 -57.12
20:23:52 -57.92
20:23:52 -56.64
20:23:52 -56.80
20:23:52 -58.08
20:23:52 -56.64
20:23:52 -57.60
20:23:52 -57.76
20:23:53 -56.48
20:23:53 -57.12
20:23:53 -57.76
20:23:53 -56.64
20:23:53 -57.44
20:23:53 -57.60
20:23:53 -56.32
20:23:53 -57.60
20:23:53 -57.60
20:23:53 -56.64
20:23:53 -57.60
20:23:54 -57.60
20:23:54 -56.16
20:23:54 -57.28
20:23:54 -56.96
20:23:54 -56.32
20:23:54 -57.60
20:23:54 -57.28
20:23:54 -56.64
20:23:54 -57.60
20:23:54 -56.96
20:23:54 -56.16
20:23:55 -57.76
20:23:55 -56.96
20:23:55 -56.48
20:23:55 -58.08
20:23:55 -57.28
20:23:55 -56.64
20:23:55 -58.08
20:23:55 -57.12
20:23:55 -56.80
20:23:55 -58.24

Re: extract lines between two tags

on 22.10.2007 17:14:00 by Janis Papanagnou

Dave wrote:
> I have a data file which has as its first column a time and a voltage as
> the second field. A volt meter measured a voltage multiple times per
> second. The resolution of the time is only one second, but the sample
> rate is > 1 Hz.
>
> I'd like to extract from this file all lines between any two times. As
> you can see below, there are multiple lines with the same time in them.
> If necessary, I can assume the data is continuously sampled, so I could
> get the data between t1 and t2 by extracting those between the first
> occurrence of t1 and the first occurrence of t2 + 1 second, then just
> ignoring the very last line, which contains the unwanted data point at
> t2 + 1 second.
>
> Any thoughts on the best way to do this?

Use awk. The most primitive (therefore not perfect) way is, e.g.,...

awk -v t1=20:23:52 -v t2=20:23:53 '$1>=t1 && $1<=t2' yourdatafile

or (without variables) just hard coded and inline expanded numbers...

awk '$1>=20:23:52 && $1<=20:23:53' yourdatafile

but that won't work in case you have a 23:59:59 -> 00:00:00 transition.
To catch that case you can, e.g., implement a state automaton in awk;
if you have the above requirement and need assistance come back and ask.

Janis
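
A minimal sketch of the state-automaton idea mentioned above, assuming the
file is in chronological order and that both t1 and t2 actually occur as
timestamps in the data (Dave's continuous-sampling assumption). It never
compares times by order, only by equality, so a 23:59:59 -> 00:00:00
transition inside the range should not matter. The function name
extract_between and its argument order are made up for illustration.

```shell
# Hypothetical helper: extract_between T1 T2 FILE
# Prints every line from the first line stamped T1 through the end of
# the first run of lines stamped T2. Assumes both stamps occur in FILE.
extract_between() {
  awk -v t1="$1" -v t2="$2" '
    !on && $1 == t1        { on = 1 }  # state change: outside -> inside
    on && seen && $1 != t2 { exit }    # the run of t2 lines has ended
    on { print; if ($1 == t2) seen = 1 }
  ' "$3"
}
```

Usage would look like: extract_between 23:59:58 00:00:02 yourdatafile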

Re: extract lines between two tags

on 22.10.2007 17:16:00 by Janis Papanagnou

Janis Papanagnou wrote:
> Dave wrote:
>
>> I have a data file which has as its first column a time and a voltage
>> as the second field. A volt meter measured a voltage multiple times
>> per second. The resolution of the time is only one second, but the
>> sample rate is > 1 Hz.
>>
>> I'd like to extract from this file all lines between any two times. As
>> you can see below, there are multiple lines with the same time in
>> them. If necessary, I can assume the data is continuously sampled, so
>> I could get the data between t1 and t2 by extracting those between
>> the first occurrence of t1 and the first occurrence of t2 + 1 second,
>> then just ignoring the very last line, which contains the unwanted
>> data point at t2 + 1 second.
>>
>> Any thoughts on the best way to do this?
>
>
> Use awk. The most primitive (therefore not perfect) way is, e.g.,...
>
> awk -v t1=20:23:52 -v t2=20:23:53 '$1>=t1 && $1<=t2' yourdatafile
>
> or (without variables) just hard coded and inline expanded numbers...
>
> awk '$1>=20:23:52 && $1<=20:23:53' yourdatafile

Sorry, the values should be quoted...

awk '$1>="20:23:52" && $1<="20:23:53"' yourdatafile


Janis

Re: extract lines between two tags

on 22.10.2007 17:16:59 by Icarus Sparry

On Mon, 22 Oct 2007 13:55:17 +0100, Dave wrote:

> I have a data file which has as its first column a time and a voltage as
> the second field. A volt meter measured a voltage multiple times per
> second. The resolution of the time is only one second, but the sample
> rate is > 1 Hz.
>
> I'd like to extract from this file all lines between any two times. As
> you can see below, there are multiple lines with the same time in them.
> If necessary, I can assume the data is continuously sampled, so I could
> get the data between t1 and t2 by extracting those between the first
> occurrence of t1 and the first occurrence of t2 + 1 second, then just
> ignoring the very last line, which contains the unwanted data point at
> t2 + 1 second.
>
> Any thoughts on the best way to do this?
>
>
> 20:23:51 -56.64
> 20:23:51 -56.96
> 20:23:51 -58.40
> 20:23:51 -56.96
> 20:23:52 -57.12
[sample data snipped]

If you can afford to lose the first value and gain an extra value at the
end (an off-by-one error), then the following is trivial:

sed -e '1,/^20:23:52/d' -e '/^20:23:55/q'

which deletes up to and including the first line starting 20:23:52, and
stops processing when it sees 20:23:55 at the start of the line.

To fix the off-by-one error:

sed -n -e '1,/^20:23:51/d' -e '/^20:23:51/d' -e '/^20:23:55/q' -e p

which deletes up to the first line starting 20:23:51, deletes all lines
that match 20:23:51 at the start, quits when it sees 20:23:55, and prints
anything that is left. The -n tells it not to print by default.
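
A possible variation, sketched here under assumptions rather than taken
from the post: two sed stages, so the start time itself can be given
instead of the second before it. The first stage prints from the first
line starting with the start time to the end of input; the second deletes
from the first line starting with the stop time onwards. The stop time is
exclusive (t2 + 1 second in Dave's terms) and must occur in the data. The
function name sed_between and its arguments are hypothetical.

```shell
# Hypothetical helper: sed_between START STOP FILE
# START is the first timestamp wanted; STOP is the first timestamp
# NOT wanted (exclusive end). Both are assumed to occur in FILE.
sed_between() {
  # Stage 1: print from the first START line through end of input.
  # Stage 2: delete from the first STOP line through end of input.
  sed -n "/^$1/,\$p" "$3" | sed "/^$2/,\$d"
}
```

For the sample data, sed_between 20:23:52 20:23:55 yourdatafile would be
the equivalent call.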

Re: extract lines between two tags

on 22.10.2007 18:13:55 by dave

Dave wrote:
> I have a data file which has as its first column a time and a voltage as
> the second field. A volt meter measured a voltage multiple times per
> second. The resolution of the time is only one second, but the sample
> rate is > 1 Hz.
>
> I'd like to extract from this file all lines between any two times. As
> you can see below, there are multiple lines with the same time in them.
> If necessary, I can assume the data is continuously sampled, so I could
> get the data between t1 and t2 by extracting those between the first
> occurrence of t1 and the first occurrence of t2 + 1 second, then just
> ignoring the very last line, which contains the unwanted data point at
> t2 + 1 second.
>
> Any thoughts on the best way to do this?

Thanks everyone.

After asking, I came up with this way. It is a bit of a hack, but seems
to work.

It needed the GNU version of grep (ggrep) on my system.

ggrep -A 1000000 00:59:24 filename

prints each line matching 00:59:24 plus up to 1000000 lines after it.
Combine that with the -B option, which prints the lines before a match,
and it works.

ggrep -A 1000000 23:22:03 filename > /tmp/foo.$$
ggrep -B 1000000 23:24:18 /tmp/foo.$$


One could use a pipe rather than a temp file, of course.
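
The pipe variant might look like the sketch below. It assumes a grep that
supports -A and -B (GNU grep, installed as ggrep on some systems) and
files shorter than the 1000000-line context limit. The function name
grep_between is hypothetical, and the ^ anchors are an addition so that
the times only match at the start of a line.

```shell
# Hypothetical helper: grep_between T1 T2 FILE
# Assumes a grep with -A/-B context options (e.g. GNU grep).
grep_between() {
  # Stage 1: the first T1 line and (up to 1000000 lines of) what follows.
  # Stage 2: every T2 line and (up to 1000000 lines of) what precedes it.
  grep -A 1000000 "^$1" "$3" | grep -B 1000000 "^$2"
}
```

Because -B keeps everything before every matching line, the output runs
through the last line stamped T2, so both end times are inclusive.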