extract lines between two tags
on 22.10.2007 14:55:17 by dave
I have a data file which has as its first column a time and a voltage
as the second field. A volt meter measured a voltage multiple times per
second. The resolution of the time is only one second, but the sample
rate is > 1 Hz.
I'd like to extract from this file all lines between any two times. As
you can see below, there are multiple lines with the same time in them.
If necessary, I can assume the data is continuously sampled, so I could
get the data between t1 and t2 by extracting those between the first
occurrence of t1 and the first occurrence of t2 + 1 second, then just
ignoring the very last line, which contains the unwanted data point at
t2 + 1 second.
Any thoughts on the best way to do this?
20:23:51 -56.64
20:23:51 -56.96
20:23:51 -58.40
20:23:51 -56.96
20:23:52 -57.12
20:23:52 -57.92
20:23:52 -56.64
20:23:52 -56.80
20:23:52 -58.08
20:23:52 -56.64
20:23:52 -57.60
20:23:52 -57.76
20:23:53 -56.48
20:23:53 -57.12
20:23:53 -57.76
20:23:53 -56.64
20:23:53 -57.44
20:23:53 -57.60
20:23:53 -56.32
20:23:53 -57.60
20:23:53 -57.60
20:23:53 -56.64
20:23:53 -57.60
20:23:54 -57.60
20:23:54 -56.16
20:23:54 -57.28
20:23:54 -56.96
20:23:54 -56.32
20:23:54 -57.60
20:23:54 -57.28
20:23:54 -56.64
20:23:54 -57.60
20:23:54 -56.96
20:23:54 -56.16
20:23:55 -57.76
20:23:55 -56.96
20:23:55 -56.48
20:23:55 -58.08
20:23:55 -57.28
20:23:55 -56.64
20:23:55 -58.08
20:23:55 -57.12
20:23:55 -56.80
20:23:55 -58.24
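The approach described above (print from the first line at t1 through the first line at t2 + 1 second, then drop that last line) can be sketched with two sed commands. This is a minimal sketch under stated assumptions: the function name `extract_between` is hypothetical, times are HH:MM:SS at the start of each line, sampling is continuous so a line at t2 + 1 second really exists, and t1 does not reappear later in the file (which would restart the sed range).

```sh
# Hypothetical helper, a sketch of the approach described above.
# Usage: extract_between T1 T2_PLUS_1 [file...]
#   T1        = first time wanted (HH:MM:SS)
#   T2_PLUS_1 = the second just after the last time wanted
# Assumes a line starting with T2_PLUS_1 exists; otherwise the range
# runs to end of input and the last wanted line is dropped by mistake.
extract_between() {
    _t1=$1 _stop=$2
    shift 2
    # First sed: print from the first line starting with T1 through the
    # first line starting with T2_PLUS_1 (an inclusive range).
    # Second sed: delete the last line, i.e. the unwanted T2_PLUS_1 sample.
    sed -n "/^$_t1/,/^$_stop/p" "$@" | sed '$d'
}
```

With the sample data, `extract_between 20:23:52 20:23:54 yourdatafile` prints all the 20:23:52 and 20:23:53 lines.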
Re: extract lines between two tags
on 22.10.2007 17:14:00 by Janis Papanagnou
Dave wrote:
> I have a data file which has as its first column a time and a voltage as
> the second field. A volt meter measured a voltage multiple times per
> second. The resolution of the time is only one second, but the sample
> rate is > 1 Hz.
>
> I'd like to extract from this file all lines between any two times. As
> you can see below, there are multiple lines with the same time in them.
> If necessary, I can assume the data is continuously sampled, so I could
> get the data between t1 and t2 by extracting those between the first
> occurrence of t1 and the first occurrence of t2 + 1 second, then just
> ignoring the very last line, which contains the unwanted data point at
> t2 + 1 second.
>
> Any thoughts on the best way to do this?
Use awk. The most primitive (therefore not perfect) way is, e.g.,...
awk -v t1=20:23:52 -v t2=20:23:53 '$1>=t1 && $1<=t2' yourdatafile
or (without variables), with the values simply hard-coded inline...
awk '$1>=20:23:52 && $1<=20:23:53' yourdatafile
but that won't work if the data crosses midnight (a 23:59:59 -> 00:00:00
transition), because the string comparison then no longer follows time order.
To catch that case you can, e.g., implement a state automaton in awk;
if you have that requirement and need assistance, come back and ask.
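A minimal sketch of such a state automaton, assuming continuous sampling as the original poster offered (so both t1 and t2 actually occur in the file). It only tests times for equality, so a 23:59:59 -> 00:00:00 transition inside the range does no harm. The function name `extract_range_fsm` is hypothetical, and the program stops after the first matching range it finds.

```sh
# Hypothetical helper: print from the first line at T1 through the last
# consecutive line at T2, even if the range crosses midnight.
# Usage: extract_range_fsm T1 T2 [file...]
# Assumes continuous sampling, i.e. both T1 and T2 occur in the input.
extract_range_fsm() {
    _t1=$1 _t2=$2
    shift 2
    # state 0: before T1; state 1: inside the range; state 2: inside the T2 block
    awk -v t1="$_t1" -v t2="$_t2" '
        state == 0 && $1 == t1 { state = 1 }  # first line at T1: start printing
        state == 1 && $1 == t2 { state = 2 }  # reached the T2 block (also when T1 == T2)
        state == 2 && $1 != t2 { exit }       # first line after T2: stop reading
        state > 0              { print }
    ' "$@"
}
```

For example, `extract_range_fsm 23:59:59 00:00:01 yourdatafile` prints everything from the first 23:59:59 line through the last consecutive 00:00:01 line.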
Janis
>
>
> [sample data snipped]
Re: extract lines between two tags
on 22.10.2007 17:16:00 by Janis Papanagnou
Janis Papanagnou wrote:
> Dave wrote:
>
>> I have a data file which has as its first column a time and a voltage
>> as the second field. A volt meter measured a voltage multiple times
>> per second. The resolution of the time is only one second, but the
>> sample rate is > 1 Hz.
>>
>> I'd like to extract from this file all lines between any two times. As
>> you can see below, there are multiple lines with the same time in
>> them. If necessary, I can assume the data is continuously sampled, so
>> I could get the data between t1 and t2 by extracting those between
>> the first occurrence of t1 and the first occurrence of t2 + 1 second,
>> then just ignoring the very last line, which contains the unwanted
>> data point at t2 + 1 second.
>>
>> Any thoughts on the best way to do this?
>
>
> Use awk. The most primitive (therefore not perfect) way is, e.g.,...
>
> awk -v t1=20:23:52 -v t2=20:23:53 '$1>=t1 && $1<=t2' yourdatafile
>
> or (without variables) just hard coded and inline expanded numbers...
>
> awk '$1>=20:23:52 && $1<=20:23:53' yourdatafile
Sorry, the values must be quoted (unquoted, 20:23:52 is a syntax error
in awk; quoted, the times are compared as strings)...
awk '$1>="20:23:52" && $1<="20:23:53"' yourdatafile
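The same string comparison can be wrapped in a small shell function, passing the times in as awk variables instead of pasting them into the program text. This is a sketch; the name `extract_range` is hypothetical. It relies on zero-padded HH:MM:SS times within a single day, so that string order matches time order; it does not handle a range that crosses midnight.

```sh
# Hypothetical helper: print lines whose first field lies in [T1, T2].
# Usage: extract_range T1 T2 [file...]
# Assumes zero-padded HH:MM:SS times within one day, so that awk's
# string comparison orders them chronologically.
extract_range() {
    _t1=$1 _t2=$2
    shift 2
    # "20:23:52" does not look like a number, so awk compares $1
    # against t1 and t2 as strings.
    awk -v t1="$_t1" -v t2="$_t2" '$1 >= t1 && $1 <= t2' "$@"
}
```

Usage: `extract_range 20:23:52 20:23:53 yourdatafile`, or feed the data on standard input.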
Janis
>
> but that won't work in case you have a 23:59:59 -> 00:00:00 transition.
> To catch that case you can, e.g., implement a state automaton in awk;
> if you have the above requirement and need assistance come back and ask.
>
> Janis
>
>>
>>
>> [sample data snipped]
Re: extract lines between two tags
on 22.10.2007 17:16:59 by Icarus Sparry
On Mon, 22 Oct 2007 13:55:17 +0100, Dave wrote:
> I have a data file which has as its first column a time and a voltage as
> the second field. A volt meter measured a voltage multiple times per
> second. The resolution of the time is only one second, but the sample
> rate is > 1 Hz.
>
> I'd like to extract from this file all lines between any two times. As
> you can see below, there are multiple lines with the same time in them.
> If necessary, I can assume the data is continuously sampled, so I could
> get the data between t1 and t2 by extracting those between the first
> occurrence of t1 and the first occurrence of t2 + 1 second, then just
> ignoring the very last line, which contains the unwanted data point at
> t2 + 1 second.
>
> Any thoughts on the best way to do this?
>
>
> 20:23:51 -56.64
> 20:23:51 -56.96
> 20:23:51 -58.40
> 20:23:51 -56.96
> 20:23:52 -57.12
[sample data snipped]
If you can afford to lose the first value and gain an extra value at the
end (an off-by-one error), then the following is trivial:
sed -e '1,/^20:23:52/d' -e '/^20:23:55/q'
which deletes everything up to and including the first line starting with
20:23:52, and stops processing (after printing that line) when it sees
20:23:55 at the start of a line.
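To see the off-by-one behaviour concretely, the one-liner can be parameterised. This is a sketch; the function name `extract_quick` is hypothetical, and it takes t1 and the second just after the last one wanted.

```sh
# Hypothetical wrapper around the quick sed one-liner.
# Usage: extract_quick T1 T2_PLUS_1 [file...]
# Known off-by-one: the first line at T1 is dropped, and the first line
# at T2_PLUS_1 is printed before sed quits. If line 1 itself starts with
# T1, the range only closes at the next T1 line, so one more line is lost.
extract_quick() {
    _t1=$1 _stop=$2
    shift 2
    # 1,/^T1/d : delete from line 1 through the first line starting with T1
    # /^STOP/q : print that line (default output) and quit
    sed -e "1,/^$_t1/d" -e "/^$_stop/q" "$@"
}
```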
To fix the off-by-one error:
sed -n -e '1,/^20:23:51/d' -e '/^20:23:51/d' -e '/^20:23:55/q' -e p
which deletes everything up to and including the first line starting with
20:23:51, deletes all remaining lines that start with 20:23:51, quits
(without printing) when it sees 20:23:55 at the start of a line, and prints
anything that is left. The -n tells sed not to print by default.
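The second sed command can likewise be wrapped so that the two bounding seconds become parameters. Again a sketch with a hypothetical name, `extract_exclusive`; it takes the second just before t1 and the second just after t2, and assumes both occur in the data. One caveat: if the only line at the earlier second is line 1 of the file, the `1,/re/` range never closes (sed starts testing the end address on line 2), so nothing is printed; GNU sed's `0,/re/` address form avoids that.

```sh
# Hypothetical wrapper around the off-by-one-free sed command.
# Usage: extract_exclusive BEFORE AFTER [file...]
#   BEFORE = the second just before t1, AFTER = the second just after t2
# -n           : do not print anything by default
# 1,/^BEFORE/d : delete from line 1 through the first BEFORE line after line 1
# /^BEFORE/d   : delete any remaining BEFORE lines
# /^AFTER/q    : quit at the first AFTER line (not printed, because of -n)
# p            : print everything that survives
extract_exclusive() {
    _before=$1 _after=$2
    shift 2
    sed -n -e "1,/^$_before/d" -e "/^$_before/d" -e "/^$_after/q" -e p "$@"
}
```

With the sample data, `extract_exclusive 20:23:51 20:23:55 yourdatafile` prints every line from 20:23:52 through 20:23:54.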
Re: extract lines between two tags
on 22.10.2007 18:13:55 by dave
Dave wrote:
> I have a data file which has as its first column a time and a voltage as
> the second field. A volt meter measured a voltage multiple times per
> second. The resolution of the time is only one second, but the sample
> rate is > 1 Hz.
>
> I'd like to extract from this file all lines between any two times. As
> you can see below, there are multiple lines with the same time in them.
> If necessary, I can assume the data is continuously sampled, so I could
> get the data between t1 and t2 by extracting those between the first
> occurrence of t1 and the first occurrence of t2 + 1 second, then just
> ignoring the very last line, which contains the unwanted data point at
> t2 + 1 second.
>
> Any thoughts on the best way to do this?
Thanks everyone.
After asking, I came up with this approach. It is a bit of a hack, but it
seems to work.
It needed the GNU version of grep (called ggrep on my system).
ggrep -A 1000000 00:59:24 filename
prints the lines matching 00:59:24 and up to 1000000 lines after each
match. Combining that with the -B option, which prints the lines before a
match, does the job:
ggrep -A 1000000 23:22:03 filename > /tmp/foo.$$
ggrep -B 1000000 23:24:18 /tmp/foo.$$
Of course, one could use a pipe rather than a temp file.
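The pipe variant might look like this. It is a sketch: the function name `extract_with_grep` is hypothetical, it assumes a grep with the -A and -B options (GNU grep has them), and, unlike the commands above, it anchors the patterns with ^ so a time can only match at the start of a line. As with the original commands, lines more than 1000000 past the last t1 match are silently cut off.

```sh
# Hypothetical helper: pipe version of the grep -A / -B trick.
# Usage: extract_with_grep T1 T2 [file...]
# Use ggrep instead of grep where GNU grep is installed under that name.
extract_with_grep() {
    _t1=$1 _t2=$2
    shift 2
    # First grep: lines starting with T1 plus up to 1000000 lines after.
    # Second grep: lines starting with T2 plus up to 1000000 lines before.
    grep -A 1000000 -e "^$_t1" "$@" | grep -B 1000000 -e "^$_t2"
}
```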