separating parameters in loop other than space

on 12.04.2008 07:23:54 by alexus

I have a few huge flat files; for example, one of them is ~150 MB zipped or
~610 MB uncompressed. It contains a bunch of records whose fields are
separated by | (pipe). Between the pipes there is text, and some of the
fields contain spaces. I need to import all of it into MySQL in the most
efficient way on the shell side. First things first: I need to parse it.
What would be the best way to read the file line by line?

for i in `cat /path/to/file`; do
echo $i
done

wouldn't really work here, as some of the lines contain spaces: the loop
splits on every space, so echo $i shows only the text up to the next space
and I can't break a line into its pieces.

Is there an easy way to read by line, or to separate the loop's parameters
by line instead of by spaces?

Re: separating parameters in loop other than space

on 12.04.2008 07:56:43 by jak

On Fri, 11 Apr 2008 22:23:54 -0700 (PDT), alexus
wrote:

>i have few huge flat files, for example one of them is ~150Mb.zip or
>~610Mb uncompressed and it contains bunch of records seperated by |
>(pipe), between pipes there is a text, some of the fields contains
>spaces and I need to import all of it into MySQL in most optimized way

>is there an easy way to read by line or seperate parametrs in loop by
>line instead of spaces?

cat flatfile |
while read -r -d '|'; do    # -d is a bash extension; -r keeps backslashes
  echo "$REPLY"
done


--
Webmail for Dialup Users
http://www.isp2dial.com/freeaccounts.html

Re: separating parameters in loop other than space

on 12.04.2008 11:40:08 by PK

alexus wrote:

> is there an easy way to read by line or seperate parametrs in loop by
> line instead of spaces?

Please provide sample input and expected output.

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.

Re: separating parameters in loop other than space

on 12.04.2008 12:54:57 by Maxwell Lol

alexus writes:

> is there an easy way to read by line or seperate parametrs in loop by
> line instead of spaces?

Generally it's best to have a single process read the file and parse
it a line at a time. You can use a posix shell, AWK, perl or whatever
depending upon the complexity.

Re: separating parameters in loop other than space

on 12.04.2008 17:17:09 by cfajohnson

On 2008-04-12, alexus wrote:
> i have few huge flat files, for example one of them is ~150Mb.zip or
> ~610Mb uncompressed and it contains bunch of records seperated by |
> (pipe), between pipes there is a text, some of the fields contains
> spaces and I need to import all of it into MySQL in most optimized way
> in terms of shell side, so first things first i need to parse it, what
> would be the best way to read file line by line?
>
> for i in `cat /path/to/file`; do
> echo $i
> done
>
> wouldn't really work here, as some of the lines contains spaces, and
> echo $i would show everything before space and i cant break line into
> multiple peaces.
>
> is there an easy way to read by line or seperate parametrs in loop by
> line instead of spaces?

The way to read a file in a shell loop is:

while IFS= read -r line
do
  printf "%s\n" "$line"
done < /path/to/file

The 'while ... read ... line' can be adjusted to break the line
into parts.

However, for files as large as the ones you are dealing with, it is
better to use awk.
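
As a rough sketch of what "adjusted to break the line into parts" could look like: setting IFS to a pipe for the read command makes it split each line on pipes only, so spaces inside a field survive. The variable names (isbn, title, author, rest) and the sample record are assumptions chosen for illustration, not part of the post above.

```shell
#!/bin/sh
# Split each pipe-delimited line into named fields.
# The field names are illustrative only; "rest" collects whatever
# follows the third pipe.
split_demo() {
    while IFS='|' read -r isbn title author rest
    do
        printf '%s / %s / %s\n' "$isbn" "$title" "$author"
    done
}

printf '%s\n' '9780001046405|Wuthering Heights|Bronte, Emily; Shaw, Martin|Audio Cassette' | split_demo
```

Because IFS is set only for the read command itself, the rest of the script keeps its normal word splitting.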

--
Chris F.A. Johnson, author
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence

Re: separating parameters in loop other than space

on 12.04.2008 18:17:14 by mop2

I think "while read" is a good solution, but if you like "for":

IFS=\|
for i in `cat /path/to/file`; do
echo $i
done

Re: separating parameters in loop other than space

on 12.04.2008 18:17:27 by Dan Mercer

"alexus" wrote in message
news:261d338e-bf14-474d-a7c0-75aee779a932@8g2000hse.googlegroups.com...
>i have few huge flat files, for example one of them is ~150Mb.zip or
> ~610Mb uncompressed and it contains bunch of records seperated by |
> (pipe), between pipes there is a text, some of the fields contains
> spaces and I need to import all of it into MySQL in most optimized way
> in terms of shell side, so first things first i need to parse it, what
> would be the best way to read file line by line?
>
> for i in `cat /path/to/file`; do
> echo $i
> done
>

while IFS='|' read v1 v2 v3 ...
do
...
done < file

Don't use:

cat file | while read

which is worse than a UUOC (useless use of cat), particularly if you are
using bash.

Your version will in all likelihood crash your shell.

Dan Mercer
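
One likely reason for the bash remark, not spelled out in the post, is that bash by default runs every part of a pipeline in a subshell, so variables set inside a piped while loop are lost when the loop ends. The sketch below counts lines both ways; the sample input and the counter names are made up for the demonstration.

```shell
#!/bin/bash
# Count lines two ways; the sample input is invented for the demo.
input=$(printf 'a|b\nc|d\ne|f')

count_pipe=0
printf '%s\n' "$input" | while IFS= read -r line; do
    count_pipe=$((count_pipe + 1))          # updated in a subshell (bash default)
done

count_redirect=0
while IFS= read -r line; do
    count_redirect=$((count_redirect + 1))  # updated in the current shell
done <<EOF
$input
EOF

echo "pipe: $count_pipe  redirect: $count_redirect"
```

In bash with default settings the piped counter stays at 0 while the redirected one reaches 3. Shells that run the last pipeline stage in the current shell (ksh, or bash with lastpipe enabled) behave differently.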


> wouldn't really work here, as some of the lines contains spaces, and
> echo $i would show everything before space and i cant break line into
> multiple peaces.
>
> is there an easy way to read by line or seperate parametrs in loop by
> line instead of spaces?

Re: separating parameters in loop other than space

on 12.04.2008 19:24:02 by jak

On Sat, 12 Apr 2008 11:17:27 -0500, "Dan Mercer"
wrote:

>while IFS='|' read v1 v2 v3 ...
> do
> ...
> done < file
>
>Don't use :
>
>cat file | while read
>
>which is a worse than UUOC

Not UUOC when you don't want an input file name embedded in the
script. Why obsess about it? There's more than one way to skin a
cat. I like cats.


>particularly if you are using bash. Your version will in all
>likelihood crash your shell .

???



Re: separating parameters in loop other than space

on 12.04.2008 19:48:34 by Chris Mattern

On 2008-04-12, www.isp2dial.com wrote:
> On Sat, 12 Apr 2008 11:17:27 -0500, "Dan Mercer"
> wrote:
>
>>while IFS='|' read v1 v2 v3 ...
>> do
>> ...
>> done < file
>>
>>Don't use :
>>
>>cat file | while read
>>
>>which is a worse than UUOC
>
> Not UUOC when you don't want an input file name embedded in the
> script.

Er, why is "cat file" any less embedding the file name in the script
than "< file"? Yes, you can say "cat $variable", but you can say
"< $variable", too.


--
Christopher Mattern

NOTICE
Thank you for noticing this new notice
Your noticing it has been noted
And will be reported to the authorities

Re: separating parameters in loop other than space

on 12.04.2008 20:04:01 by PK

www.isp2dial.com wrote:

>>Don't use :
>>
>>cat file | while read
>>
>>which is a worse than UUOC
>
> Not UUOC when you don't want an input file name embedded in the
> script.

How does UUOC help with that? Can you provide an example?
(just curious...)


Re: separating parameters in loop other than space

on 12.04.2008 20:08:22 by jak

On Sat, 12 Apr 2008 12:48:34 -0500, Chris Mattern
wrote:

>>>Don't use :
>>>
>>>cat file | while read
>>>
>>>which is a worse than UUOC
>>
>> Not UUOC when you don't want an input file name embedded in the
>> script.
>
>Er, why is "cat file" any less embedding the file name in the script
>than "< file"?

I don't mean "cat file | while read" is all in one script.

Suppose you have a while read loop in a small self sufficient script,
and you want to connect various input files to that script.

In that case, cat is the man for the job.

That *is* the unix philosophy. One task, one tool.



Re: separating parameters in loop other than space

on 12.04.2008 20:08:22 by jak

On Sat, 12 Apr 2008 20:04:01 +0200, pk wrote:

>www.isp2dial.com wrote:
>
>>>Don't use :
>>>
>>>cat file | while read
>>>
>>>which is a worse than UUOC
>>
>> Not UUOC when you don't want an input file name embedded in the
>> script.
>
>How does UUOC help with that? Can you provide an example?
>(just curious...)

See previous post, thanks.



Re: separating parameters in loop other than space

on 12.04.2008 21:11:31 by jak

On Sat, 12 Apr 2008 21:11:38 +0200, pk wrote:

>www.isp2dial.com wrote:

>> Suppose you have a while read loop in a small self sufficient script,
>> and you want to connect various input files to that script.
>>
>> In that case, cat is the man for the job.
>
>But then you're not doing a UUOC, since you're using cat to concatenate many
>files.

Right. Or it could be one file at a time, manually from the command
line, depending on the desired input.

Cat is quick. Unless you're calling it 10,000 times in some kind of
loop, why worry about UUOC. OTOH, anything done 10,000+ times should
be optimized for speed and efficiency.



Re: separating parameters in loop other than space

on 12.04.2008 21:11:38 by PK

www.isp2dial.com wrote:

>>Er, why is "cat file" any less embedding the file name in the script
>>than "< file"?
>
> I don't mean "cat file | while read" is all in one script.
>
> Suppose you have a while read loop in a small self sufficient script,
> and you want to connect various input files to that script.
>
> In that case, cat is the man for the job.

But then you're not doing a UUOC, since you're using cat to concatenate many
files.


Re: separating parameters in loop other than space

on 12.04.2008 21:46:57 by PK

www.isp2dial.com wrote:

>>But then you're not doing a UUOC, since you're using cat to concatenate
>>many files.
>
> Right. Or it could be one file at a time, manually from the command
> line, depending on the desired input.
>
> Cat is quick. Unless you're calling it 10,000 times in some kind of
> loop, why worry about UUOC. OTOH, anything done 10,000+ times should
> be optimized for speed and efficiency.

I guess it's just a matter of getting into the right habit. Writing

< file script

or

script < file

is just as easy as writing

cat file | script

and avoids the UUOC (actually, it's also less typing).
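
A small sketch of the three invocations compared above, assuming a stand-in for "script" that only reads standard input. The function name first_field and the temporary sample file are hypothetical.

```shell
#!/bin/sh
# first_field stands in for the "script" in the post: it reads stdin only.
first_field() {
    while IFS='|' read -r first rest; do
        printf '%s\n' "$first"
    done
}

tmp=$(mktemp)
printf 'a|b\nc|d\n' > "$tmp"

r1=$(first_field < "$tmp")        # redirection after the command
r2=$(< "$tmp" first_field)        # redirection before the command
r3=$(cat "$tmp" | first_field)    # extra cat process, same result
rm -f "$tmp"
```

All three variables end up holding the same text; the only difference is that the third form starts one more process and a pipe.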


Re: separating parameters in loop other than space

on 12.04.2008 21:49:30 by Janis Papanagnou

www.isp2dial.com wrote:
> On Sat, 12 Apr 2008 21:11:38 +0200, pk wrote:
>
>
>>www.isp2dial.com wrote:
>
>
>
>>>Suppose you have a while read loop in a small self sufficient script,
>>>and you want to connect various input files to that script.
>>>
>>>In that case, cat is the man for the job.
>>
>>But then you're not doing a UUOC, since you're using cat to concatenate many
>>files.
>
>
> Right. Or it could be one file at a time, manually from the command
> line, depending on the desired input.
>
> Cat is quick.

Grep is also very quick. So let's add another useless grep process?
Seriously, if you can avoid an unnecessary process and IPC then just
avoid it. Why try to rationalize irrational habits?

> Unless you're calling it 10,000 times in some kind of
> loop, why worry about UUOC. OTOH, anything done 10,000+ times should
> be optimized for speed and efficiency.

Here, in the context of UUOC, we're not talking about optimization.
We're talking about inventing additional code that doesn't contribute
to the task done.

Janis

Re: separating parameters in loop other than space

on 13.04.2008 00:43:15 by Chris Mattern

On 2008-04-12, www.isp2dial.com wrote:
> On Sat, 12 Apr 2008 12:48:34 -0500, Chris Mattern
> wrote:
>
>>>>Don't use :
>>>>
>>>>cat file | while read
>>>>
>>>>which is a worse than UUOC
>>>
>>> Not UUOC when you don't want an input file name embedded in the
>>> script.
>>
>>Er, why is "cat file" any less embedding the file name in the script
>>than "< file"?
>
> I don't mean "cat file | while read" is all in one script.
>
> Suppose you have a while read loop in a small self sufficient script,
> and you want to connect various input files to that script.

Then you can use "< file" to pipe the file to the script on the command
line you're invoking the script on. You're still not making any sense
as to why you need cat.
>
> In that case, cat is the man for the job.

Actually, no, it's not.
>
> That *is* the unix philosophy. One task, one tool.
>
Correct. And the task here is input redirection, and the tool is "<".


Re: separating parameters in loop other than space

on 13.04.2008 01:51:05 by jak

On Sat, 12 Apr 2008 17:43:15 -0500, Chris Mattern
wrote:

>> I don't mean "cat file | while read" is all in one script.
>>
>> Suppose you have a while read loop in a small self sufficient script,
>> and you want to connect various input files to that script.
>
>Then you can use "< file" to pipe the file to the script on the command
>line you're invoking the script on. You're still not making any sense
>as to why you need cat.
>>
>> In that case, cat is the man for the job.
>
>Actually, no, it's not.
>>
>> That *is* the unix philosophy. One task, one tool.
>>
>Correct. And the task here is input redirection, and the tool is "<".

I often have a sort, tr, etc., in the pipe between cat and while. But
if UUOC is a religion to you, don't let my beliefs trouble you ...



Re: separating parameters in loop other than space

on 13.04.2008 03:19:01 by cfajohnson

On 2008-04-12, www.isp2dial.com wrote:
> On Sat, 12 Apr 2008 17:43:15 -0500, Chris Mattern
> wrote:
>
>>> I don't mean "cat file | while read" is all in one script.
>>>
>>> Suppose you have a while read loop in a small self sufficient script,
>>> and you want to connect various input files to that script.
>>
>>Then you can use "< file" to pipe the file to the script on the command
>>line you're invoking the script on. You're still not making any sense
>>as to why you need cat.
>>>
>>> In that case, cat is the man for the job.
>>
>>Actually, no, it's not.
>>>
>>> That *is* the unix philosophy. One task, one tool.
>>>
>>Correct. And the task here is input redirection, and the tool is "<".
>
> I often have a sort, tr, etc., in the pipe between cat and while.

Why does that make any difference? Redirect the input to the first
command in the pipeline.

One time it is useful, which you alluded to but didn't explain, is
when there are multiple files to be read into a command that can
only take its input from stdin. E.g.:

cat "$@" | tr ...
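
To make the elided example concrete, here is one possible completion under stated assumptions: the tr arguments (lowercase to uppercase) and the function name upcase_all are invented for illustration, since the post leaves the tr operands out.

```shell
#!/bin/sh
# upcase_all is a hypothetical helper. tr reads only standard input, so cat
# concatenates every file named on the command line for it. With no
# arguments, cat falls back to reading stdin.
upcase_all() {
    cat -- "$@" | tr '[:lower:]' '[:upper:]'
}
```

Called as `upcase_all part1.txt part2.txt`, it feeds both files through a single tr process, which is the case where cat is doing real work.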


> But if UUOC is a religion to you, don't let my beliefs trouble you
> ...


Re: separating parameters in loop other than space

on 13.04.2008 04:18:15 by alexus

On Apr 12, 5:40 am, pk wrote:
> alexus wrote:
> > is there an easy way to read by line or seperate parametrs in loop by
> > line instead of spaces?
>
> Please provide sample input and expected output.
>
> --
> All the commands are tested with bash and GNU tools, so they may use
> nonstandard features. I try to mention when something is nonstandard (if
> I'm aware of that), but I may miss something. Corrections are welcome.

for an example

alexus@jot ~/alexus/yourinternetbookstore.com/df/ecampus 516$ head biggerbookspml.txt
ISBN|TITLE|AUTHOR|MEDIATYPE|COPYRIGHTDATE|LISTPRICE|CONDITION|SITEPRICE|AVAILABILITY|PRODUCTURL|IMAGEURL
9780001046405|Wuthering Heights|Bronte, Emily; Shaw, Martin|Audio Cassette|1/1/1900|18.00|N|13.75|Usually Ships in 5-7 Business Days|http://www.biggerbooks.com/bk_detail.asp?isbn=9780001046405|http://images.biggerbooks.com/images/d/6/405/9780001046405.jpg
9780001047587|The Adventures of Tom Sawyer|Twain, Mark|Audio Cassette|1/1/1900|17.99|N|13.74|Usually Ships in 5-7 Business Days|http://www.biggerbooks.com/bk_detail.asp?isbn=9780001047587|http://images.biggerbooks.com/images/d/7/587/9780001047587.jpg
9780001048072|Nostromo|Conrad, Joseph; Ackland, Joss|Audio Cassette|1/1/1900|22.99|N|17.57|Usually Ships in 5-7 Business Days|http://www.biggerbooks.com/bk_detail.asp?isbn=9780001048072|http://images.biggerbooks.com/images/d/8/072/9780001048072.jpg
9780001049352|Uncle Tom's Cabin|Stowe, Harriet Beecher; Ross, Ricco|Audio Cassette|1/1/1900|14.95|N|11.42|Usually Ships in 5-7 Business Days|http://www.biggerbooks.com/bk_detail.asp?isbn=9780001049352|http://images.biggerbooks.com/images/d/9/352/9780001049352.jpg
9780002000086|All Heart: The Autobiography of Michael Pinball Clemons|Clemons, Michael; Loney, Don|Hardcover|9/1/1998|21.00|N|20.0|Usually Ships in 5-7 Business Days|http://www.biggerbooks.com/bk_detail.asp?isbn=9780002000086|http://images.biggerbooks.com/images/d/0/086/9780002000086.jpg
9780002154123|France: The Beautiful Cookbook : Authentic Recipes from the Regions of France|SCOTTO SISTERS|Hardcover|10/18/1989|50.00|N|38.2|Usually Ships in 5-7 Business Days|http://www.biggerbooks.com/bk_detail.asp?isbn=9780002154123|http://images.biggerbooks.com/images/d/4/123/9780002154123.jpg
9780002199186|Insects of Britain & Northern Europe|Michael Chinery; M. Chinery|Hardcover|1/1/1900|45.00|N|34.39|Usually Ships in 5-7 Business Days|http://www.biggerbooks.com/bk_detail.asp?isbn=9780002199186|http://images.biggerbooks.com/images/d/9/186/9780002199186.jpg
9780002200370|Birds Song & Calls of Britain & Northern Europe|Sample, Geoff|Audio Cassette|1/1/1900|45.00|N|34.39|Usually Ships in 5-7 Business Days|http://www.biggerbooks.com/bk_detail.asp?isbn=9780002200370|http://images.biggerbooks.com/images/d/0/370/9780002200370.jpg
9780002245449|Edmund and Hillary: A Tale from China Plate Farm|Jackson, Chris|Hardcover|1/1/1997|16.00|N|12.2|Usually Ships in 5-7 Business Days|http://www.biggerbooks.com/bk_detail.asp?isbn=9780002245449|http://images.biggerbooks.com/images/d/5/449/9780002245449.jpg
alexus@jot ~/alexus/yourinternetbookstore.com/df/ecampus 517$

This is what I get as input.

The output should separate each field in the following format:

'content of the field','content of next field','content of next field'
and so on

Re: separating parameters in loop other than space

on 13.04.2008 04:32:38 by alexus

On Apr 12, 11:17 am, "Chris F.A. Johnson"
wrote:
> On 2008-04-12, alexus wrote:
> > i have few huge flat files, for example one of them is ~150Mb.zip or
> > ~610Mb uncompressed and it contains bunch of records seperated by |
> > (pipe), between pipes there is a text, some of the fields contains
> > spaces and I need to import all of it into MySQL in most optimized way
> > in terms of shell side, so first things first i need to parse it, what
> > would be the best way to read file line by line?
>
> > for i in `cat /path/to/file`; do
> >  echo $i
> > done
>
> > wouldn't really work here, as some of the lines contains spaces, and
> > echo $i would show everything before space and i cant break line into
> > multiple peaces.
>
> > is there an easy way to read by line or seperate parametrs in loop by
> > line instead of spaces?
>
>     The way to read a file in a shell loop is:
>
> while IFS= read -r line
> do
>   printf "%s\n" "$line"
> done < /path/to/file
>
>     The 'while ... read ... line' can be adjusted to break the line
>     into parts.
>
>     However, for files as large as you are dealing with, it is better
>     to use awk.

Really? I'd have thought awk would be heavier.

I couldn't get your script to work :( but I did get the one from
www.isp2dial.com to work, though.

Re: separating parameters in loop other than space

on 13.04.2008 04:33:50 by alexus

On Apr 12, 12:17 pm, mop2
wrote:
> I think "while read" is a good solution, but if you like "for":
>
> IFS=\|
> for i in `cat /path/to/file`; do
>  echo $i
> done

thanks! it works

Re: separating parameters in loop other than space

on 13.04.2008 05:03:47 by jak

On Sun, 13 Apr 2008 01:19:01 +0000, "Chris F.A. Johnson"
wrote:

>> I often have a sort, tr, etc., in the pipe between cat and while.
>
> Why does that make any difference? Redirect the input to the first
> command in the pipeline.

When I design a thing, I look for ways to fold special cases into the
general case.

Using cat to pipe two or more files is the general case. Piping one
file is the special case.

I don't strain my brain trying to remember the quirky redirection
syntax for the first command in the pipe. Always using cat as the
first command avoids that.

I like to work smart, not hard. Let the machine do the hard work.

This argument over UUOC shows why not everyone has an aptitude for
design. Good designers eliminate special cases whenever possible.



Re: separating parameters in loop other than space

on 13.04.2008 05:15:18 by alexus

On Apr 12, 1:23 am, alexus wrote:
> i have few huge flat files, for example one of them is ~150Mb.zip or
> ~610Mb uncompressed and it contains bunch of records seperated by |
> (pipe), between pipes there is a text, some of the fields contains
> spaces and I need to import all of it into MySQL in most optimized way
> in terms of shell side, so first things first i need to parse it, what
> would be the best way to read file line by line?
>
> for i in `cat /path/to/file`; do
>  echo $i
> done
>
> wouldn't really work here, as some of the lines contains spaces, and
> echo $i would show everything before space and i cant break line into
> multiple peaces.
>
> is there an easy way to read by line or seperate parametrs in loop by
> line instead of spaces?


IFS=\|
for i in `head -2 biggerbookspml.txt`; do
echo \'$i\' >> biggerbookspml2.txt
done

This "sort of" worked, yet I'm facing the following problem:

cat biggerbookspml2.txt
'ISBN'
'TITLE'
'AUTHOR'
'MEDIATYPE'
'COPYRIGHTDATE'
'LISTPRICE'
'CONDITION'
'SITEPRICE'
'AVAILABILITY'
'PRODUCTURL'
'IMAGEURL
9780001046405'
'Wuthering Heights'
'Bronte, Emily; Shaw, Martin'
'Audio Cassette'
'1/1/1900'
'18.00'
'N'
'13.75'
'Usually Ships in 5-7 Business Days'
'http://www.biggerbooks.com/bk_detail.asp?isbn=9780001046405'
'http://images.biggerbooks.com/images/d/6/405/9780001046405.jpg

1) I need each line taken from the original file to become one line in the
new file, not one output line per field
2) for some reason the last parameter processed doesn't get its closing
single quote, although I asked for one
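
A possible shell-only sketch for both points, under stated assumptions. The first problem arises because IFS=\| no longer contains a newline, so the last field of one line and the first field of the next become a single word. Reading line by line and splitting each line separately avoids that. For the second problem, one explanation the thread does not confirm is that the file has CRLF line endings, so the sketch strips a trailing carriage return. The function name quote_fields is hypothetical.

```shell
#!/bin/sh
# quote_fields is a hypothetical name. It reads pipe-delimited records on
# stdin and writes one output line per input line, each field wrapped in
# single quotes. The subshell body keeps "set -f" and IFS changes local.
quote_fields() (
    cr=$(printf '\r')
    set -f                          # no glob expansion while splitting
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%"$cr"}          # drop a trailing CR, if any (assumed CRLF)
        IFS='|'
        set -- $line                # split this one line on pipes only
        out=
        for field in "$@"; do
            out="$out${out:+,}'$field'"
        done
        printf '%s\n' "$out"
    done
)
```

It would be used as `quote_fields < biggerbookspml.txt > biggerbookspml2.txt`. For a 610 MB file a shell loop like this is likely to be slow compared with sed or awk, as noted elsewhere in the thread.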

Re: separating parameters in loop other than space

on 13.04.2008 11:48:39 by PK

alexus wrote:

>> is there an easy way to read by line or seperate parametrs in loop by
>> line instead of spaces?
>
>
> IFS=\|
> for i in `head -2 biggerbookspml.txt`; do
> echo \'$i\' >> biggerbookspml2.txt
> done
>
> this "sort of" worked, yet i'm facing following problem
>[cut]
> 1) i need each line that i take from original file go to new line into
> new file, not line by line into new file
> 2) for some reason last time i process param it wont add single quote
> as i asked

Using the output you pasted in your other reply to me:

ISBN|TITLE|AUTHOR|MEDIATYPE|COPYRIGHTDATE|LISTPRICE|CONDITION|SITEPRICE|
AVAILABILITY|PRODUCTURL|IMAGEURL
9780001046405|Wuthering Heights|Bronte, Emily; Shaw, Martin|Audio Cassette
1/1/1900|18.00|N|13.75|Usually Ships in 5-7 Business Days|
http://www.biggerbooks.com/bk_detail.asp?isbn=9780001046405
http://images.biggerbooks.com/images/d/6/405/9780001046405.jpg

Where the above, without line wraps, represents two input lines (header and
a sample line). If I'm correct, then:

sed "s/^/'/;s/$/'/;s/|/','/g" yourfile.txt

or, with awk:

awk -F'|' -v OFS="','" -v sq=\' '{$1=$1; print sq $0 sq}' yourfile.txt

(the $1=$1 assignment is needed: modifying a field makes awk rebuild $0,
joining the fields with the new OFS)

--->>>> Please note that you may have problems processing the output if some
field contains single quotes in the original file.
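
The sample data does contain such a field ("Uncle Tom's Cabin"). A minimal sketch of one way to cope, assuming the output is meant for SQL string literals: double each single quote before wrapping the fields. The function name sql_quote is hypothetical, and backslashes, which MySQL also treats as an escape character by default, are not handled here.

```shell
#!/bin/sh
# sql_quote is a hypothetical helper. Doubling the quote ('' for ') is the
# standard SQL way to embed a single quote in a string literal.
sql_quote() {
    awk -F'|' -v OFS="','" -v sq="'" '{
        gsub(sq, sq sq)      # double every single quote in the record
        $1 = $1              # rebuild $0 with the new OFS
        print sq $0 sq
    }'
}
```

It reads from standard input, for example `sql_quote < yourfile.txt`.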


Re: separating parameters in loop other than space

on 13.04.2008 11:51:34 by PK

pk wrote:

> ISBN|TITLE|AUTHOR|MEDIATYPE|COPYRIGHTDATE|LISTPRICE|CONDITIO N|SITEPRICE|
> AVAILABILITY|PRODUCTURL|IMAGEURL
> 9780001046405|Wuthering Heights|Bronte, Emily; Shaw, Martin|Audio Cassette
> 1/1/1900|18.00|N|13.75|Usually Ships in 5-7 Business Days|
> http://www.biggerbooks.com/bk_detail.asp?isbn=9780001046405
> http://images.biggerbooks.com/images/d/6/405/9780001046405.j pg

For some reason the "|"s were removed at the end of the third and
penultimate line, but they should be there.


Re: separating parameters in loop other than space

on 13.04.2008 20:32:47 by gerg

www.isp2dial.com writes:
>
>When I design a thing, I look for ways to fold special cases into the
>general case.
>
>Using cat to pipe two or more files is the general case. Piping one
>file is the special case.
>

You make a good point with respect to your preference to apply the
general solution to as many specific cases as you can.

However, your original answer in the thread was a mere code example
without any explanation. You gave a general-case answer to a special-
case question, without saying so.

So it's no wonder people thought you were suggesting a non-optimal
use of 'cat' for the special case of the original question.

-Greg
--
::::::::::::: Greg Andrews ::::: gerg@panix.com :::::::::::::
I have a map of the United States that's actual size.
-- Steven Wright

Re: separating parameters in loop other than space

on 14.04.2008 03:12:12 by mop2

Hi alexus,
another idea (works with bash):

sep="','"
while IFS= read -r line; do
  printf "'%s'\n" "${line//|/$sep}"
done < biggerbookspml.txt


alexus wrote:
> for an example
>
> alexus@jot ~/alexus/yourinternetbookstore.com/df/ecampus 516$ head
> biggerbookspml.txt
> ISBN|TITLE|AUTHOR|MEDIATYPE|COPYRIGHTDATE|LISTPRICE|CONDITIO N|
> SITEPRICE|AVAILABILITY|PRODUCTURL|IMAGEURL
> 9780001046405|Wuthering Heights|Bronte, Emily; Shaw, Martin|Audio
> Cassette|1/1/1900|18.00|N|13.75|Usually Ships in 5-7 Business Days|
> http://www.biggerbooks.com/bk_detail.asp?isbn=9780001046405| http://images.biggerbooks.com/images/d/6/405/9780001046405.j pg
> 9780001047587|The Adventures of Tom Sawyer|Twain, Mark|Audio Cassette|
> 1/1/1900|17.99|N|13.74|Usually Ships in 5-7 Business Days|http://
> www.biggerbooks.com/bk_detail.asp?isbn=9780001047587|http:// images.biggerbooks.com/images/d/7/587/9780001047587.jpg
>
> this is what I get as input
>
> output should separate each filed in following format
>
> 'content of the field','content of next field','content of next field'
> and so on

Re: separating parameters in loop other than space

on 14.04.2008 03:22:54 by mop2

An idea with bash:

sep="','"
while IFS= read -r line; do
  printf "'%s'\n" "${line//|/$sep}"
done < biggerbookspml.txt

alexus wrote:
> output should separate each filed in following format
>
> 'content of the field','content of next field','content of next field'
> and so on

Re: separating parameters in loop other than space

on 15.04.2008 02:16:16 by brian_hiles

On Apr 11, 10:23 pm, alexus wrote:
> i have few huge flat files, for example one of them is ~150Mb.zip or
> ~610Mb uncompressed and it contains bunch of records seperated by |
> (pipe), between pipes there is a text, some of the fields contains
> spaces and I need to import all of it into MySQL in most optimized way
> in terms of shell side, so first things first i need to parse it, what
> would be the best way to read file line by line?
> ...

Cat? While read? Echo? Yuck!

N/awk is plainly the tool to use in this circumstance; indeed,
this is its canonical design purpose -- its raison d'être.

# not tested
awk 'BEGIN { FS="|" }
{   # translate record into SQL INSERT syntax
    printf "INSERT DELAYED INTO mytable VALUES ("
    for (i = 1; i <= NF; i++)    # quote each field, comma-separated
        printf "%s'\''%s'\''", (i > 1 ? ", " : ""), $i
    print ");"
}'

(Will you not be desiring the DELAYED attribute in your
INSERTs?)

Is it necessary to read from the uncompressed file? I think
that a convenient and acceptably efficient "end-run" around
the whole UUOC issue is to feed awk through the stdout of
unzip:

unzip -p thefile.zip | awk -f code.awk >code.sql # or -c option

BTW, you should at least check for disallowed characters:

http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html

=Brian