While loop slowness

on 01.04.2008 19:41:42 by eskgwin

I have a while inside a while inside a while that is very slow for
large reads. Here is the code (it is really long):

{ while read myline; do
    if [[ $myline = ""* ]];then
      read dayline
      firstpass="${dayline##}"
      daytime="${firstpass%%}"
      linevals=$daytime$comma
      read vehicletime
      read meascolumn
      read measvalue
      read limitsflag
      i=0
      while [[ $i -le $meascount ]];do
        firstpass="${meascolumn##}"
        meascol="${firstpass%%}"
        if [[ $meascol = $i ]];then
          firstpass="${measvalue##}"
          measval="${firstpass%%}"
          linevals=$linevals$measval$comma
          read meascolumn
          if [[ $meascolumn = ""* ]];then
            while [[ $i < $(($meascount-1)) ]];do
              linevals=$linevals$comma
              i=$(($i+1))
            done
            break
          fi
          read measvalue
          read limitsflag
        else
          linevals=$linevals$comma
        fi
        i=$(($i+1))
      done
    fi
    if [[ $myline = ""* ]];then
      break
    fi
    if [[ $myline = ""* || $meascolumn = " meas>"* ]];then
      while [[ $i -le $(($meascount-1)) ]];do
        linevals=$linevals$comma
        i=$(($i+1))
      done
      linevals1="${linevals%,,}"
      print $linevals1 >> $3
      continue
    fi
done } < $dspbfile

So, what this does is take data from one file that looks like this
(and this is just a partial file):

2008/035:23:08:09.803
83289.803
8
-25.0335
<
2008/035:23:08:25.333
83305.333
9
0

11
3.22123e+09

/limits-flag>

And prints it into a file that looks like this:
2008/035:23:08:09.803,,,,,,,,,-25.0335,,,
2008/035:23:08:25.333,,,,,,,,,,0,,3.22123e+09

The meas-column field says which column the value goes in. If there
is no value for a column (they are in order), that column just gets
a comma. There need to be commas for every mnemonic (I know how
many there are), even if it has no value.
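
To make the placement rule concrete, here is a minimal sketch (not the
script from this thread). It assumes columns are numbered from 0, that the
column count is passed in (12 matches the sample output above), and that
the input for one row arrives as "column value" pairs, one per line. The
name build_row is made up for illustration.

```shell
# build_row DAYTIME NCOLS: read "column value" pairs on stdin and print
# one CSV row. Hypothetical helper; columns are assumed to be 0-based.
build_row() {
    awk -v daytime="$1" -v ncols="$2" '
        { val[$1] = $2 }               # remember the value for this column
        END {
            row = daytime
            for (c = 0; c < ncols; c++)
                row = row "," val[c]   # empty string if the column has no value
            print row
        }'
}
```

For example, printf '8 -25.0335\n' | build_row '2008/035:23:08:09.803' 12
prints the first line of the sample output.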

When I have only 60 samples in the first file, it runs very quickly.
When I have 274,100 samples in the first file, it takes 2-3 hours to
run.

Is there a quicker way to do this? If not, that is ok. I just can't
seem to find one. Thanks for any help.

Allyson

Re: While loop slowness

on 01.04.2008 19:56:12 by Janis Papanagnou

Have a look at xgawk (XML extended GNU awk) to process such data.

Janis

Re: While loop slowness

on 01.04.2008 20:05:53 by Ed Morton

Shell loops are usually the wrong approach. I don't think your sample input is
quite right, as it has things in it like "<" and "/limits-flag>". It
appears that you're trying to get everything between each record's opening and
closing tags onto a single line. If so, take a look at this, using GNU awk on a
modified version of your input file:

$ cat file

2008/035:23:08:09.803
83289.803
8
-25.0335

2008/035:23:08:25.333
83305.333
9
0

11
3.22123e+09

$ gawk -v RS="[[:space:]]*" -F'\n' '{
for (i=2;i split($i,arr,"[<> ]+")
printf "%s=\"%s\"\n",arr[2],arr[3]
}
print "----"
}' file
day-time="2008/035:23:08:09.803"
vehicle-time="83289.803"
meas-column="8"
meas-value="-25.0335"
----
day-time="2008/035:23:08:25.333"
vehicle-time="83305.333"
meas-column="9"
meas-value="0"
limits-flag="/limits-flag"
----
meas-column="11"
meas-value="3.22123e+09"
limits-flag="/limits-flag"
----

and if it seems to be roughly pulling out and grouping the right information, we
could tidy it up and figure out how to deal with the missing fields for each record.

Ed.

Re: While loop slowness

on 01.04.2008 20:18:18 by eskgwin

The only data I actually need is the day-time and meas-value. I need
the meas-column to figure out where to put each value in the line. The
hard part seems to be dealing with the missing fields and getting the
values into the right places. Thanks.

Allyson

Re: While loop slowness

on 01.04.2008 20:27:08 by Ed Morton

OK, so given the input file I show above, we can do this:

gawk -v OFS="," -v RS="[[:space:]]*" -F'\n' '{
dayTime=measColumn=measValue=""
for (i=2;i split($i,arr,"[<> ]+")
if (arr[2] == "day-time") {
dayTime=arr[3]
}
if (arr[2] == "meas-column") {
measColumn=arr[3]
}
if (arr[2] == "meas-value") {
measValue=arr[3]
}
}
print dayTime,measColumn,measValue
}' file
2008/035:23:08:09.803,8,-25.0335
2008/035:23:08:25.333,9,0
,11,3.22123e+09

What needs to be done now?

Ed.

Re: While loop slowness

on 01.04.2008 20:36:41 by eskgwin

It needs to look like this:

2008/035:23:08:09.803,,,,,,,,,-25.0335,,,
2008/035:23:08:25.333,,,,,,,,,,0,,3.22123e+09

with the 8 in meas-column being the position for the meas-value of
-25.0335. In the second line, 9 is the column where the meas-value 0
goes, and 11 is the column where 3.22123e+09 goes.
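
As a sketch of that last step, the three-field lines from the previous
message (day-time, column, value) can be folded into this layout with plain
awk. The assumptions here are mine: columns are 0-based, the column count is
passed as an argument (12 for the sample), and a line with an empty day-time
belongs to the row before it. The name to_rows is hypothetical.

```shell
# to_rows NCOLS: read "day-time,column,value" lines on stdin and print
# one CSV row per day-time. Hypothetical helper; uses no gawk extensions.
to_rows() {
    awk -F',' -v ncols="$1" '
        function emit_row(   c, row) {
            if (!have) return            # nothing collected yet
            row = daytime
            for (c = 0; c < ncols; c++)
                row = row "," val[c]     # empty when the column has no value
            print row
            split("", val)               # portable way to clear the array
        }
        $1 != "" { emit_row(); daytime = $1; have = 1 }   # new day-time starts a new row
        { val[$2] = $3 }                 # place the value in its column
        END { emit_row() }'
}
```

Fed the three lines shown earlier with to_rows 12, this produces the two
rows above.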

Also, when I try to use gawk on my unix box:

Machine hardware: sun4u
OS version: 5.8
Processor type: sparc
Hardware: SUNW,Sun-Blade-100

I get this:
a.ksh[3]: gawk: not found

I can't even do a man on it:
No manual entry for gawk.

Is there something equivalent that I can use? Thanks.

Allyson

Re: While loop slowness

on 01.04.2008 21:03:01 by Ed Morton

> It needs to look like this:
>
> 2008/035:23:08:09.803,,,,,,,,,-25.0335,,,
> 2008/035:23:08:25.333,,,,,,,,,,0,,3.22123e+09
>
> with the 8 of the meas-column being the place to put the meas-value of
> -25.0335. In the second line, the 9 is the column where the 0 meas-
> value goes and the 11 is the column where the 3.22123e+09 goes.

That's not a problem but before I do any more: is my guess at your input file
format correct or should it instead be this (deleted the 2 lines immediately
before 11):

2008/035:23:08:09.803
83289.803
8
-25.0335

2008/035:23:08:25.333
83305.333
9
0

11
3.22123e+09

or should it really be something else? There are different solutions depending on
the correct input format.

> Also, when I try to use gawk on my unix box:
>
> Machine hardware: sun4u
> OS version: 5.8
> Processor type: sparc
> Hardware: SUNW,Sun-Blade-100
>
> I get this:
> a.ksh[3]: gawk: not found

Then gawk isn't in your PATH or it may not already be installed on your machine.

> I can't even do a man on it:
> No manual entry for gawk.
>
> Is there something equivalent that I can use? Thanks.

You can use any awk that allows a regular expression as its
record separator (RS), but the only awk I personally know of that supports that
is gawk. We could come up with workarounds, but gawk has many, many features that
make it a good choice of awk to use, so if I were you I'd download and install it
from http://www.gnu.org/software/gawk/ if you don't already have it.
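
One possible workaround, as a rough sketch only: read the file line by line
and keep state, so no regular-expression RS is needed. This assumes each
element sits on its own line in the form <day-time>...</day-time>,
<meas-column>...</meas-column> and <meas-value>...</meas-value> (tag names
taken from the output shown earlier in the thread; the exact file layout is
an assumption), that columns are 0-based, and that the column count is
passed in. The name xml_to_csv is made up. It needs an awk with user-defined
functions, which on Solaris typically means nawk or /usr/xpg4/bin/awk rather
than /usr/bin/awk.

```shell
# xml_to_csv NCOLS < input: print one CSV row per <day-time> element,
# with each <meas-value> placed in the column named by <meas-column>.
xml_to_csv() {
    awk -v ncols="$1" '
        function emit_row(   c, row) {
            if (daytime == "") return    # no record collected yet
            row = daytime
            for (c = 0; c < ncols; c++)
                row = row "," val[c]     # empty when the column has no value
            print row
            split("", val)               # portable way to clear the array
        }
        # Return the text between the opening and closing tag on one line.
        function inner(line) {
            sub(/^[^>]*>/, "", line)     # drop everything up to the first ">"
            sub(/<.*$/, "", line)        # drop the closing tag
            return line
        }
        /<day-time>/    { emit_row(); daytime = inner($0) }
        /<meas-column>/ { col = inner($0) + 0 }
        /<meas-value>/  { val[col] = inner($0) }
        END { emit_row() }'
}
```

Called as xml_to_csv 12 < "$dspbfile" > outfile, it makes a single pass over
the input and writes the output file once, instead of reopening it for every
row.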

Ed.

Re: While loop slowness

on 01.04.2008 23:18:30 by Chris Mattern

On 2008-04-01, Janis Papanagnou wrote:
> eskgwin@gmail.com wrote:

>
> Have a look at xgawk (XML extended GNU awk) to process such data.
>
Perl has a number of XML handling packages that have a lot of
use and polishing behind them; I'd recommend going that route.

--
Christopher Mattern

NOTICE
Thank you for noticing this new notice
Your noticing it has been noted
And will be reported to the authorities

Re: While loop slowness

on 02.04.2008 18:44:14 by Michael Tosch

Ed Morton wrote:
>
> On 4/1/2008 1:36 PM, eskgwin@gmail.com wrote:
>> On Apr 1, 11:27 am, Ed Morton wrote:
>>
>>> On 4/1/2008 1:18 PM, eskg...@gmail.com wrote:
>>>
>>>
>>>
>>>
>>>
>>>
>>>> On Apr 1, 11:05 am, Ed Morton wrote:
>>>>> On 4/1/2008 12:41 PM, eskg...@gmail.com wrote:
>>>>>> I have a while inside a while inside a while that is very slow for
>>>>>> large reads. Here is the code (it is really long):
>>>>>> { while read myline; do
>>>>>> if [[ $myline = ""* ]];then
>>>>>> read dayline
>>>>>> firstpass="${dayline##}"
>>>>>> daytime="${firstpass%%}"
>>>>>> linevals=$daytime$comma
>>>>>> read vehicletime
>>>>>> read meascolumn
>>>>>> read measvalue
>>>>>> read limitsflag
>>>>>> i=0
>>>>>> while [[ $i -le $meascount ]];do
>>>>>> firstpass="${meascolumn##}"
>>>>>> meascol="${firstpass%%}"
>>>>>> if [[ $meascol = $i ]];then
>>>>>> firstpass="${measvalue##}"
>>>>>> measval="${firstpass%%}"
>>>>>> linevals=$linevals$measval$comma
>>>>>> read meascolumn
>>>>>> if [[ $meascolumn = ""* ]];then
>>>>>> while [[ $i < $(($meascount-1)) ]];do
>>>>>> linevals=$linevals$comma
>>>>>> i=$(($i+1))
>>>>>> done
>>>>>> break
>>>>>> fi
>>>>>> read measvalue
>>>>>> read limitsflag
>>>>>> else
>>>>>> linevals=$linevals$comma
>>>>>> fi
>>>>>> i=$(($i+1))
>>>>>> done
>>>>>> fi
>>>>>> if [[ $myline = ""* ]];then
>>>>>> break
>>>>>> fi
>>>>>> if [[ $myline = ""* || $meascolumn = " >>>>>> meas>"* ]];then
>>>>>> while [[ $i -le $(($meascount-1)) ]];do
>>>>>> linevals=$linevals$comma
>>>>>> i=$(($i+1))
>>>>>> done
>>>>>> linevals1="${linevals%,,}"
>>>>>> print $linevals1 >> $3
>>>>>> continue
>>>>>> fi
>>>>>> done } < $dspbfile
>>>>>> So, what this does is takes data from one file that looks like this
>>>>>> (and this is just a partial file):
>>>>>>
>>>>>> 2008/035:23:08:09.803
>>>>>> 83289.803
>>>>>> 8
>>>>>> -25.0335
>>>>>> <
>>>>>> 2008/035:23:08:25.333
>>>>>> 83305.333
>>>>>> 9
>>>>>> 0
>>>>>>
>>>>>> 11
>>>>>> 3.22123e+09
>>>>>>
>>>>>>
>>>>>> /limits-flag>
>>>>>>
>>>>>>
>>>>>> And prints it into a file that looks like this:
>>>>>> 2008/035:23:08:09.803,,,,,,,,,-25.0335,,,
>>>>>> 2008/035:23:08:25.333,,,,,,,,,,0,,3.22123e+09
>>>>>> Where the meas-column field is where the value gets put and if there
>>>>>> is no value for the column (they are in order), then it will just get
>>>>>> a comma. And there needs to be commas for each mnemonic (which I do
>>>>>> know how many there are) even if it has no value.
>>>>>> When I have only 60 samples in the first file, it runs very quickly.
>>>>>> When I have 274,100 samples in the first file, it takes 2-3 hours to
>>>>>> run.
>>>>>> Is there a quicker way to do this? If not, that is ok. I just can't
>>>>>> seem to find one. Thanks for any help.
>>>>> shell loops are usually the wrong approach. I don't think your sample input is
>>>>> quite right as it has things in it like "<" and "/limits-flag>". It
>>>>> appears that you're trying to get all the between "" and ""
>>>>> into a single line. If so, take a look at this using GNU awk on a modified
>>>>> version of your input file:
>>>>> $ cat file
>>>>>
>>>>> 2008/035:23:08:09.803
>>>>> 83289.803
>>>>> 8
>>>>> -25.0335
>>>>>
>>>>>
>>>>> 2008/035:23:08:25.333
>>>>> 83305.333
>>>>> 9
>>>>> 0
>>>>>
>>>>>
>>>>>
>>>>> 11
>>>>> 3.22123e+09
>>>>>
>>>>>
>>>>> $ gawk -v RS="[[:space:]]*" -F'\n' '{
>>>>> for (i=2;i
>>>>> split($i,arr,"[<> ]+")
>>>>> printf "%s=\"%s\"\n",arr[2],arr[3]
>>>>> }
>>>>> print "----"}' file
>>>>> day-time="2008/035:23:08:09.803"
>>>>> vehicle-time="83289.803"
>>>>> meas-column="8"
>>>>> meas-value="-25.0335"
>>>>> ----
>>>>> day-time="2008/035:23:08:25.333"
>>>>> vehicle-time="83305.333"
>>>>> meas-column="9"
>>>>> meas-value="0"
>>>>> limits-flag="/limits-flag"
>>>>> ----
>>>>> meas-column="11"
>>>>> meas-value="3.22123e+09"
>>>>> limits-flag="/limits-flag"
>>>>> ----
>>>>> and if it seems to be roughly pulling out and grouping the right information, we
>>>>> could tidy it up and figure out how to deal with the missing fields for each record.
>>>>> Ed.
>>>> The only data I need actually is the day-time and meas-value. I need
>>>> the meas-column to figure out where to put each value in the line. The
>>>> part that seems hard is to figure out how to deal with the missing
>>>> fields and getting the values in the right places. Thanks.
>>> OK, so given the input file I show above, we can do this:
>>>
>>> gawk -v OFS="," -v RS="[[:space:]]*" -F'\n' '{
>>> dayTime=measColumn=measValue=""
>>> for (i=2;i
>>> split($i,arr,"[<> ]+")
>>> if (arr[2] == "day-time") {
>>> dayTime=arr[3]
>>> }
>>> if (arr[2] == "meas-column") {
>>> measColumn=arr[3]
>>> }
>>> if (arr[2] == "meas-value") {
>>> measValue=arr[3]
>>> }
>>> }
>>> print dayTime,measColumn,measValue}' file
>>>
>>> 2008/035:23:08:09.803,8,-25.0335
>>> 2008/035:23:08:25.333,9,0
>>> ,11,3.22123e+09
>>>
>>> What needs to be done now?
>>>
>>> Ed.
>> It needs to look like this:
>>
>> 2008/035:23:08:09.803,,,,,,,,,-25.0335,,,
>> 2008/035:23:08:25.333,,,,,,,,,,0,,3.22123e+09
>>
>> with the 8 of the meas-column being the place to put the meas-value of
>> -25.0335. In the second line, the 9 is the column where the 0 meas-
>> value goes and the 11 is the column where the 3.22123e+09 goes.
>
> That's not a problem but before I do any more: is my guess at your input file
> format correct or should it instead be this (deleted the 2 lines immediately
> before 11):
>
>
> 2008/035:23:08:09.803
> 83289.803
> 8
> -25.0335
>
>
> 2008/035:23:08:25.333
> 83305.333
> 9
> 0
>
> 11
> 3.22123e+09
>
>
>
> or should it really be something else? There's different solutions depending on
> the correct input format.
>
>> Also, when I try to use gawk on my unix box:
>>
>> Machine hardware: sun4u
>> OS version: 5.8
>> Processor type: sparc
>> Hardware: SUNW,Sun-Blade-100
>>
>> I get this:
>> a.ksh[3]: gawk: not found
>
> Then gawk isn't in your PATH or it may not already be installed on your machine.
>
>> I can't even do a man on it:
>> No manual entry for gawk.
>>
>> Is there something equivalent that I can use? Thanks.
>
> You can use any awk that allows you to use a regular-expression as its
> record-separator (RS) but the only awk I personally know of that supports that
> is gawk. We could come up with workarounds but gawk has many, many features that
> make it a good choice of awk to use so if I were you I'd download and install it
> from http://www.gnu.org/software/gawk/ if you don't already have it.
>
> Ed.
>

IMHO nawk and /usr/xpg4/bin/awk would work in this case.

But it's a good idea to download GNU awk,
I would choose http://www.sunfreeware.com/
which will guide you to the needed GNU libraries.
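
A minimal sketch of a line-oriented alternative that does not need a
regular-expression RS, so it should not depend on gawk. It assumes each
element sits on its own line in the form <name>value</name>, that a new
<day-time> line starts a new output row, and that columns are numbered
from 0. The tag names are the ones shown in the output quoted above; the
function name meas_to_csv and the ncols variable are made up for
illustration.

```shell
# Hypothetical helper: meas_to_csv NCOLS FILE
# Assumes one <name>value</name> element per line (an assumption; the
# real file layout may differ).
meas_to_csv() {
  awk -v ncols="$1" '
    # Print the pending row: day-time, then one slot per column 0..ncols-1.
    function emit(   i, line) {
      if (!have) return
      line = day
      for (i = 0; i < ncols; i++) line = line "," val[i]
      print line
      split("", val)          # portable way to empty the array
      have = 0
    }
    { split($0, a, /[<>]/) }  # a[2] = tag name, a[3] = text after the tag
    a[2] == "day-time"    { emit(); day = a[3]; have = 1 }
    a[2] == "meas-column" { col = a[3] + 0 }
    a[2] == "meas-value"  { val[col] = a[3] }
    END { emit() }
  ' "$2"
}
```

It could be called as: meas_to_csv 12 "$dspbfile" > out.csv
On the Solaris box mentioned above, awk may need to be spelled nawk or
/usr/xpg4/bin/awk, since the old /usr/bin/awk lacks user-defined
functions. The file is read once, line by line, so the run time should
grow roughly linearly with the number of samples.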

--
Michael Tosch @ hp : com