reformatting a large list

on 08.10.2007 15:07:47 by juicymixx

Hey everyone,

I have a large delimited list of names, information and
addresses. The different entries in the list are separated by at
least one blank line. I'm not interested in all of the entries,
rather I'm trying to collect only the entries that contain a line with
the word "Special:" in it. If this word is in an entry, then after
this word is a mailing name (the rest of that line). So, my first
problem is selecting out only the entries which contain this word.
My second problem is one of formatting. If the mailing name contains
the word "Family" then the mailing name should be the first line in
the new entry, if not, then the mailing name needs to be whatever is
on the "Special:" line plus the last name of the person in the
entry. The format for the rest of the new entry should be the normal
street_address on one line, city, state and zip on another. I've
been plugging away at this one for a while now, and I can't seem to
get it working well. The problem is that the starting list contains
several hundred entries...
I know this doesn't make much sense, so I have an example:

sample entry list:

"
NAME

ADDRESS

Jane Smith
NOTE: has an old car


Some Place
NOTE:


Another Place
NOTE: good place
street_address001, city001, state001 zip001

Cindy Loo Who
NOTE: can sing
also knows karate


Super Big Company
NOTE: lots of stuff
more stuff
and even more
maybe that's all


Lance J. Armstrong
NOTE: has a good name
Special: Lance & Jenny
street_address002, city002, state002 zip002

Vance Sanders
NOTE:


Bessie Maple
NOTE: some_info
Special: Bessie
street_address003
apt003, city003, state003 zip003

Susan B. Anthony
NOTE: yes something here
street_address004, city004, state004 zip004

Benny Hill
NOTE: on tv
funny guy
likes people
Special: The Hill Family
street_address005, city005, state005 zip005

Shawn Smith
NOTE: has a wife
has a car
Special: Shawn, Sharon, Tracy and Matthew
street_address006, city006, state006 zip006

Sharon Smith
NOTE: married to Shawn
street_address006, city006, state006 zip006

a different place
NOTE:
"

I'm trying to get the sample list to end up looking like this:

"
Lance & Jenny Armstrong
street_address002
city002, state002 zip002

Bessie Maple
street_address003
city003, state003 zip003

The Hill Family
street_address005
city005, state005 zip005

Shawn, Sharon, Tracy and Matthew Smith
street_address006
city006, state006 zip006
"

??

If anyone could help me, I'd appreciate it.

Re: reformatting a large list

on 08.10.2007 15:54:21 by Ed Morton

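$ cat select.awk
# Paragraph mode: blank-line-separated entries are records, each line is a field.
BEGIN { RS=""; ORS="\n\n"; FS=OFS="\n" }

# Only process entries that have a line (after the first) starting with "Special:".
/\nSpecial:/ {
    # Surname: the last space-separated word of the entry's first line.
    surname = $1
    sub(/.* /, "", surname)

    # Mailing name: the text after "Special:" up to the end of that line.
    forenames = $0
    sub(/.*Special: */, "", forenames)
    sub(/\n.*/, "", forenames)

    name = (forenames ~ /Family/ ? forenames : forenames " " surname)

    # Drop everything through the "Special:" line; what remains is the address.
    sub(/.*Special:[^\n]+\n/, "")
    street = rest = $0
    sub(/[[:space:],].*/, "", street)           # keep text before the first space or comma
    sub(/[^[:space:]]+[[:space:]]+/, "", rest)  # keep everything after the first word

    print name, street, rest
}
$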
$ awk -f select.awk file
Lance & Jenny Armstrong
street_address002
city002, state002 zip002

Bessie Maple
street_address003
apt003, city003, state003 zip003

The Hill Family
street_address005
city005, state005 zip005

Shawn, Sharon, Tracy and Matthew Smith
street_address006
city006, state006 zip006


Note the "apt003" for "Bessie Maple". You left it out of your desired
output, but it seems like you would need it, so I kept it. I didn't
know whether you'd want it with the street address or with the other
address fields.
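
For readers new to awk, the key to the script is its BEGIN line: setting
RS to the empty string puts awk in paragraph mode, where runs of blank
lines separate records, and FS="\n" then makes every line of an entry its
own field. A minimal sketch of just that part, wrapped in a shell function
whose name (special_entries) is made up here for illustration:

```sh
# special_entries is a hypothetical helper name, not part of the script above.
special_entries() {
    awk '
    BEGIN { RS = ""; FS = "\n" }   # records = blank-line-separated entries, fields = lines
    /\nSpecial:/ { print $1 " has " NF " lines" }
    ' "$@"
}

# Two entries from the sample list, separated by two blank lines.
printf '%s\n' \
    'Jane Smith' \
    'NOTE: has an old car' \
    '' \
    '' \
    'Benny Hill' \
    'NOTE: on tv' \
    'funny guy' \
    'likes people' \
    'Special: The Hill Family' \
    'street_address005, city005, state005 zip005' |
special_entries
# prints: Benny Hill has 6 lines
```

Only the Benny Hill entry is printed, because it is the only one with a
line starting with "Special:". $1 is the entry's first line and NF is its
line count.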

Ed.

Re: reformatting a large list

on 08.10.2007 20:36:46 by juicymixx

Thanks so much, Ed!

It works wonderfully. There's only one small problem: the street
address will most likely contain spaces, like
2400 SW 13th St
or
106 Jefferson Place
The script you gave me splits the street address onto two lines at the
first space:
2400
SW 13th St
or
106
Jefferson Place
I just don't understand awk well enough to see the fix...

Thanks again for all your help!

Re: reformatting a large list

on 08.10.2007 20:57:18 by Ed Morton

Change the final two sub()s to this:

sub(/[\n,].*/,"",street)
sub(/[^,]+,[[:space:]]*/,"",rest)
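
Put together, the script with those two lines swapped in might look like
the sketch below. It is wrapped in a shell function (reformat_special, a
name made up here) and fed one entry whose street address contains spaces.
Everything inside the awk program other than the two replaced sub() calls
is the script from earlier in the thread.

```sh
# reformat_special is a hypothetical wrapper name; the awk program is the
# earlier select.awk with only the last two sub() calls replaced.
reformat_special() {
    awk '
    BEGIN { RS = ""; ORS = "\n\n"; FS = OFS = "\n" }
    /\nSpecial:/ {
        surname = $1
        sub(/.* /, "", surname)

        forenames = $0
        sub(/.*Special: */, "", forenames)
        sub(/\n.*/, "", forenames)

        name = (forenames ~ /Family/ ? forenames : forenames " " surname)

        sub(/.*Special:[^\n]+\n/, "")
        street = rest = $0
        sub(/[\n,].*/, "", street)            # street: text before the first newline or comma
        sub(/[^,]+,[[:space:]]*/, "", rest)   # rest: text after the first comma

        print name, street, rest
    }' "$@"
}

printf '%s\n' \
    'Lance J. Armstrong' \
    'NOTE: has a good name' \
    'Special: Lance & Jenny' \
    '2400 SW 13th St, city002, state002 zip002' |
reformat_special
# prints:
# Lance & Jenny Armstrong
# 2400 SW 13th St
# city002, state002 zip002
```

One side effect worth checking against real data: for the two-line
Bessie Maple address, the second sub() removes everything up to the first
comma, which spans both "street_address003" and "apt003", so the output
keeps "street_address003" as the street and drops "apt003".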

Regards,

Ed.

Re: reformatting a large list

on 08.10.2007 23:12:45 by juicymixx

Thank you SOOO much!

I just realized two situations which make the output funny, mainly
because I didn't give more/better examples:

1) sometimes "Special:" is on the same line as "NOTE:" as in
NOTE: Special: Fred & Wilma
I fixed this by changing
/\nSpecial:/
to
/Special:/

2) This one I can't figure out. If I have an entry like
"
Uriah Heep
NOTE: character in a novel
Special: Uriah & Jenny, Magda and Sally
another note
street_address006, city006, state006 zip006
"
the output includes "another note". In all the entries the address
information will be on the last line. Is there (another) quick fix
to get the address info from the last line?

Thanks again!

Re: reformatting a large list

on 08.10.2007 23:33:13 by juicymixx

Oh, I think I got it. I changed:
street=rest=$0
to
street=rest=$NF

Is that the right way to do it?

Thanks again for all your help!

Re: reformatting a large list

on 08.10.2007 23:40:26 by Ed Morton

No, because your initial example:

Bessie Maple
NOTE: some_info
Special: Bessie
street_address003
apt003, city003, state003 zip003

contradicts your statement that the address is on the last line, since
it's partially on the second-last line. How can you tell that
"street_address003" isn't "another note"?

Ed.

Re: reformatting a large list

am 08.10.2007 23:42:32 von William James

On Oct 8, 8:07 am, "juicym...@mailinator.com"
wrote:
> If anyone could help me, I'd appreciate it.

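#!ruby
# Read all input and split it into entries on one or more blank lines.
gets(nil).split( /\n(?:[ \t]*\n)+/ ).each{ |text|
  # Captures: first line (person), rest of the "Special:" line (mailing
  # name), and the last line of the entry (address).
  if text.strip =~ /(.*?)\n.*Special: *(.*?)\n.*(^.*?$)/m
    person,mailing,address = $~.captures
    # A "Family" mailing name is used as is; otherwise append the surname.
    if mailing =~ / Family/
      name = mailing
    else
      name = mailing + " " + person.split.last
    end
    # Break the address line after the street (at the first comma).
    puts name, address.sub( /, */, "\n" )
    puts
  end
}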

Re: reformatting a large list

am 09.10.2007 14:31:00 von juicymixx

On Oct 8, 5:40 pm, Ed Morton wrote:
> No, because your initial example:
>
> Bessie Maple
> NOTE: some_info
> Special: Bessie
> street_address003
> apt003, city003, state003 zip003
>
> contradicts your statement that the address is on the last line since
> it's partially on the second-last line. How can you tell that
> "street_address003" isn't "another note"?
>
> Ed.


Oops! You just pointed out a problem with the entries... Every
entry ends with the address. If there is an apartment number, then
the end of the entry is:
street_address
apt_num, city, state
If there is no apartment number, then the end of the entry is:
street_address, city, state

Luckily, all apartment numbers start with
APT

so I should be able to check for that, just as you checked for
Family

Thanks for all your help!
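
The APT check described above could be sketched roughly as follows, in Ruby like the script William James posted. This is only a sketch under stated assumptions: the helper name address_lines is made up for illustration, apartment lines are assumed to start with the literal upper-case text "APT", and the address is assumed to take up the last one or two lines of the entry. The apartment number is left on the city line, as in Ed's output.

```ruby
# Hypothetical helper (not from the thread): returns [street, rest] for one
# entry given as a multi-line string.
# Assumptions: the address ends the entry, and an apartment line starts
# with the literal text "APT".
def address_lines(entry)
  lines = entry.strip.split("\n").map(&:strip)
  last = lines[-1]
  if last =~ /\AAPT/
    # Two-line address: the street is on the second-last line and the
    # last line holds "APT..., city, state zip".
    [lines[-2], last]
  else
    # One-line address: split it at the first comma only.
    street, rest = last.split(/,\s*/, 2)
    [street, rest]
  end
end

entry = <<~ENTRY
  Bessie Maple
  NOTE: some_info
  Special: Bessie
  106 Jefferson Place
  APT 3, city003, state003 zip003
ENTRY

# Prints the street on one line and the remaining address on the next.
puts address_lines(entry)
```

Any "another note" lines between the "Special:" line and the address are ignored, because only the last one or two lines are examined.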