Parsing a Text File using regex

am 04.08.2011 03:38:14 von Ryan Lagola

Hello,
I have been scratching my head over this problem and was wondering if someone
can help me out. Basically, I need to take a raw list of data (a snippet of
it is below my email) and create another file with the information in the
format "Date: Category: Winner". Here is an example of the finished file:

1934: Actor: Clark Gable
1934: Actress: Claudette Colbert
1934: Art Direction: The Merry Widow

As I am not a programmer by nature, I'm trying to work out the logic of
this program. The "Date" does not repeat with each category; it only
changes when the next year of results appears in the data file. How do
I set up my logic to support this? Any help would be much appreciated.

Just an FYI, here is my attempt at finding the lines that match each
attribute:
Date: print $_ if $_ =~ /^(\d{4})*/
#look for four digits at the beginning of a string
Category: print $_ if $_ =~ /^[A-Z]+/
#look for one or more capital letters at the beginning of a string
Winner: print $_ if $_ =~ /(\*)*(--)/
#look for a field that starts with an asterisk and contains "--"

I am open to comments on my regular expressions. Thanks!
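Some comments on those three patterns, shown next to tighter (hypothetical) versions on four lines taken from the snippet. Because of the "*", /^(\d{4})*/ also accepts zero digits and therefore matches every line; /^[A-Z]+/ also accepts wrapped text such as "Cut Disc Method ..."; and in /(\*)*(--)/ only the "--" is actually required. On this snippet the winner pattern happens to work, since every line containing "--" starts with "*,".

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %loose = (
    date     => qr/^(\d{4})*/,    # "*" allows zero years, so ANY line matches
    category => qr/^[A-Z]+/,      # also matches wrapped text like "Cut Disc Method"
    winner   => qr/(\*)*(--)/,    # only "--" is required; the "*" is optional
);
my %tight = (
    date     => qr/^\d{4}\b/,          # exactly four digits at the start
    category => qr/^[A-Z]{2,}[^,]*,/,  # an ALL-CAPS word, then text up to a comma
    winner   => qr/^\*,.*--/,          # starts with "*," and contains "--"
);

my @lines = (
    '1934 (7th),',
    'ACTOR,',
    '*,"Clark Gable -- It Happened One Night {""Peter Warne""}"',
    'Cut Disc Method (hill and dale recording) to actual studio production, with',
);

my %count;
for my $kind (qw(date category winner)) {
    $count{$kind}{loose} = grep { $_ =~ $loose{$kind} } @lines;
    $count{$kind}{tight} = grep { $_ =~ $tight{$kind} } @lines;
    printf "%-8s loose: %d line(s), tight: %d line(s)\n",
        $kind, $count{$kind}{loose}, $count{$kind}{tight};
}
```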


==================== SNIPPET OF RAW DATA FILE ====================

1934 (7th),
ACTOR,
*,"Clark Gable -- It Happened One Night {""Peter Warne""}"
ACTRESS,
*,"Claudette Colbert -- It Happened One Night {""Ellie Andrews""}"
ART DIRECTION,
*,"The Merry Widow -- Cedric Gibbons, Fredric Hope"
,[NOTE: won by two votes]
ASSISTANT DIRECTOR,
*,Viva Villa! -- John Waters
CINEMATOGRAPHY,per
*,Cleopatra -- Vicxtor Milner
DIRECTING,
*,It Happened One Night -- Frank Capra
FILM EDITING,
*,Eskimo -- Conrad Nervig
MUSIC (Scoring),
*,"One Night of Love -- Columbia Studio Music Department, Louis Silvers,
head of department (Thematic Music by Victor Schertzinger and Gus Kahn)"
MUSIC (Song),
*,"""The Continental"" from The Gay Divorcee -- Music by Con Conrad; Lyrics
by Herb Magidson"
OUTSTANDING PRODUCTION,
*,It Happened One Night -- Columbia
SHORT SUBJECT (Cartoon),
*,"The Tortoise and the Hare -- Walt Disney, Producer"
SHORT SUBJECT (Comedy),
*,"La Cucaracha -- Kenneth Macgowan, Producer"
SHORT SUBJECT (Novelty),
*,"City of Wax -- Stacy Woodard and Horace Woodard, Producers"
SOUND RECORDING,
*,"One Night of Love -- Columbia Studio Sound Department, John Livadary,
Sound Director"
WRITING (Adaptation),
*,It Happened One Night -- Robert Riskin
WRITING (Original Story),
*,Manhattan Melodrama -- Arthur Caesar
SPECIAL AWARD,
*,"To Shirley Temple, in grateful recognition of her outstanding
contribution to screen entertainment during the year 1934."
SCIENTIFIC OR TECHNICAL AWARD (Class II),
*,"To ELECTRICAL RESEARCH PRODUCTS, INC. for their development of the
Vertical Cut Disc Method of recording sound for motion pictures (hill and
dale recording). [Sound]"
SCIENTIFIC OR TECHNICAL AWARD (Class III),
*,"To COLUMBIA PICTURES CORPORATION for their application of the Vertical
Cut Disc Method (hill and dale recording) to actual studio production, with
their recording of the sound on the picture One Night of Love. [Sound]"
*,To BELL AND HOWELL COMPANY for their development of the Bell and Howell
Fully Automatic Sound and Picture Printer. [Laboratory]
,
1935 (8th),
ACTOR,
*,"Victor McLaglen -- The Informer {""Gypo Nolan""}"
ACTRESS,
*,"Bette Davis -- Dangerous {""Joyce Heath""}"
ART DIRECTION,
*,The Dark Angel -- Richard Day
ASSISTANT DIRECTOR,
*,"The Lives of a Bengal Lancer -- Clem Beauchamp, Paul Wing"
CINEMATOGRAPHY,
*,A Midsummer Night's Dream -- Hal Mohr
,[NOTE: THIS IS NOT AN OFFICIAL NOMINATION. Write-in candidate.]
DANCE DIRECTION,
*,"Dave Gould -- ""I've Got a Feeling You're Fooling"" number from Broadway
Melody of 1936; and ""Straw Hat"" number from Folies Bergere"
DIRECTING,
*,The Informer -- John Ford
FILM EDITING,
*,A Midsummer Night's Dream -- Ralph Dawson
MUSIC (Scoring),
*,"The Informer -- RKO Radio Studio Music Department, Max Steiner, head of
department (Score by Max Steiner)"
MUSIC (Song),
*,"""Lullaby of Broadway"" from Gold Diggers of 1935 -- Music by Harry
Warren; Lyrics by Al Dubin"
OUTSTANDING PRODUCTION,
*,Mutiny on the Bounty -- Metro-Goldwyn-Mayer
SHORT SUBJECT (Cartoon),
*,"Three Orphan Kittens -- Walt Disney, Producer"
SHORT SUBJECT (Comedy),
*,"How to Sleep -- Jack Chertok, Producer"
SHORT SUBJECT (Novelty),
*,Wings over Mt. Everest -- Gaumont British and Skibo Productions
SOUND RECORDING,
*,"Naughty Marietta -- Metro-Goldwyn-Mayer Studio Sound Department, Douglas
Shearer, Sound Director"
WRITING (Original Story),
*,"The Scoundrel -- Ben Hecht, Charles MacArthur"
WRITING (Screenplay),
*,The Informer -- Dudley Nichols
,"[NOTE: Mr. Nichols initially refused the award, but Academy records
indicate that he was in possession of a statuette by 1949.]"
SPECIAL AWARD,
*,"To David Wark Griffith, for his distinguished creative achievements as
director and producer and his invaluable initiative and lasting
contributions to the progress of the motion picture arts."
SCIENTIFIC OR TECHNICAL AWARD (Class II),
*,To AGFA ANSCO CORPORATION for their development of the Agfa infra-red
film. [Film]
*,To EASTMAN KODAK COMPANY for their development of the Eastman Pola-Screen.
[Lenses and Filters]
SCIENTIFIC OR TECHNICAL AWARD (Class III),
*,"To METRO-GOLDWYN-MAYER STUDIO for the development of anti-directional
negative and positive development by means of jet turbulation, and the
application of the method to all negative and print processing of the entire
product of a major producing company. [Laboratory]"
*,"To WILLIAM A. MUELLER of Warner Bros.-First National Studio Sound
Department for his method of dubbing, in which the level of the dialogue
automatically controls the level of the accompanying music and sound
effects. [Sound]"
*,"To MOLE-RICHARDSON COMPANY for their development of the ""Solar-spot""
spot lamps. [Lighting]"
*,To DOUGLAS SHEARER and METRO-GOLDWYN-MAYER STUDIO SOUND DEPARTMENT for
their automatic control system for cameras and sound recording machines and
auxiliary stage equipment. [Stage Operations]
*,"To ELECTRICAL RESEARCH PRODUCTS, INC. for their study and development of
equipment to analyze and measure flutter resulting from the travel of the
film through the mechanisms used in the recording and reproduction of sound.
[Sound]"
*,"To PARAMOUNT PRODUCTIONS, INC. for the design and construction of the
Paramount transparency air turbine developing machine. [Laboratory]"
*,"To NATHAN LEVINSON, Director of Sound Recording for Warner Bros.-First
National Studio, for the method of intercutting variable density and
variable area sound tracks to secure an increase in the effective volume
range of sound recorded for motion pictures. [Sound]"
,

Re: Parsing a Text File using regex

am 04.08.2011 10:41:29 von timothy adigun

Hi Ryan,
Try the code below; it should help.

=======================

#!/usr/bin/perl -w
use strict;

my $ln="";
my ($yr,$cat,$win)=("","","");
my $filename="New_output.txt";
chomp(my $raw_file=<@ARGV>);

open READFILE,"<","$raw_file" or die "can't open $!";
open OUTPUTFILE,">","$filename" or die "cannot read $!";
while(<READFILE>){chomp;
$ln.="\n" if /^\W.?+$/;
if(/^\d{4}/){$yr=$&;} # get the year
if(/^[A-Z].+/){ $cat=$&; # get the Category
$cat=join"",split /,/,$cat; # remove the comma in front
$ln.=" $yr: ".$cat; # add both the year and Category
}
if(/\--.+/){$win=$`; # get the winner
$win=join"",split /[\*,\"]/,$win;
$ln.=": ".$win."\n"; #### If you use "$ln.=": ".$win. $&. "\n";"
#### you get "-- It Happened One Night {""Peter Warne""}", etc. added to what you have
}
}
print OUTPUTFILE $ln;
close OUTPUTFILE;
close READFILE;
============================================================
I used the special match variables ($`, $&, and $'):
$` ==> the part of the string before the match,
$& ==> the matched text itself, and
$' ==> the part of the string after the match.
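To see what the three variables hold, the sketch below matches " -- " against a shortened winner line from the snippet, copies the variables out immediately (the next successful match overwrites them), and checks that ordinary capture groups produce the same three pieces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = '*,"Clark Gable -- It Happened One Night"';   # shortened from the snippet

my ( $pre, $match, $post ) = ( '', '', '' );
if ( $line =~ /\s*--\s*/ ) {
    # Copy them straight away: the next successful match overwrites them.
    ( $pre, $match, $post ) = ( $`, $&, $' );
}

# The same three pieces with ordinary capture groups.
my ( $c_pre, $c_match, $c_post ) = $line =~ /^(.*?)(\s*--\s*)(.*)$/;

print "[$pre] [$match] [$post]\n";
print "captures agree\n"
    if $c_pre eq $pre && $c_match eq $match && $c_post eq $post;
```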
If the code doesn't work the way you want it to, you might have to play
around with the regular expressions!
Regards.

Re: Parsing a Text File using regex

am 04.08.2011 13:25:05 von Ryan Lagola

Timothy,
That worked like a charm. Thank you so much for the help.
-Ryan



Re: Parsing a Text File using regex

am 04.08.2011 19:21:30 von jwkrahn

timothy adigun wrote:
> Hi Ryan,
> Try the the code below, it should help.
>
> =======================
>
> #!/usr/bin/perl -w
> use strict;
>
> my $ln="";
> my ($yr,$cat,$win)=("","","");
> my $filename="New_output.txt";
> chomp(my $raw_file=<@ARGV>);

That is the same as saying:

chomp( my $raw_file = glob "@ARGV" );

Why are you copying the contents of @ARGV to a string and then globbing
that string?

If @ARGV contains more than one element then this will not work correctly.

And why chomp() a string that will not contain newlines?

What you want is something like:

my $raw_file = $ARGV[ 0 ];

Or:

my $raw_file = shift;

But you should probably verify that @ARGV is not empty first.
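Putting those two suggestions together, a minimal sketch that checks for an argument before using it and opens the file with a lexical filehandle and three-argument open might look like the following. At file scope this is just die "Usage: ..." unless @ARGV; followed by my $raw_file = shift;. The open_input wrapper is a hypothetical name that only exists so the check can be exercised on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: refuse to run without a file name, then open it
# with a lexical filehandle and an error message that names the file.
sub open_input {
    my @args = @_;
    die "Usage: $0 RAW_FILE\n" unless @args;
    my $raw_file = $args[0];    # what "my $raw_file = shift;" gives at file scope
    open my $in, '<', $raw_file
        or die "Cannot open '$raw_file' for reading: $!\n";
    return $in;
}

if (@ARGV) {
    my $in = open_input(@ARGV);
    while ( my $line = <$in> ) {
        print $line;            # the real processing would go here
    }
    close $in;
}
```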


> open READFILE,"<","$raw_file" or die "can't open $!";

Why are you copying $raw_file to a string?


> open OUTPUTFILE,">","$filename" or die "cannot read $!";

Why are you copying $filename to a string?


> while(<READFILE>){chomp;
> $ln.="\n" if /^\W.?+$/;
> if(/^\d{4}/){$yr=$&;} # get the year
> if(/^[A-Z].+/){ $cat=$&; # get the Category
> $cat=join"",split /,/,$cat; # remove the comma in front
> $ln.=" $yr: ".$cat; # add both the year and Category
> }
> if(/\--.+/){$win=$`; # get the winner

The use of $&, $' and $` will slow down *ALL* regular expressions in the
program. Better to just use capturing parentheses.

if (/^(\d{4})/ ) { $yr = $1 } # get the year
if ( /^([A-Z].+)/ ) {
$cat = $1; # get the Category
$cat = join "", split /,/, $cat; # remove the comma in front
$ln.= " $yr: " . $cat; # add both the year and Category
}
if ( /(.*?)\--.+/ ) { $win = $1; # get the winner
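The same three tests written with capturing parentheses, run on three lines from the snippet, are sketched below. As far as I know, the global slowdown described above applies to perls older than 5.20, which switched to copy-on-write strings; perl 5.10 and later also offer the /p flag with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} as a per-match alternative. Note that /^(.*?)\s*--/ also leaves out the space before "--" that $` would keep.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    '1934 (7th),',
    'ACTOR,',
    '*,"Clark Gable -- It Happened One Night {""Peter Warne""}"',
);

my ( $yr, $cat, $win ) = ( '', '', '' );
for (@lines) {
    if (/^(\d{4})/)    { $yr  = $1 }    # replaces $yr = $&
    if (/^([A-Z].+)/)  { $cat = $1 }    # replaces $cat = $&
    if (/^(.*?)\s*--/) { $win = $1 }    # replaces $win = $`, minus the trailing space
}
print "$yr | $cat | $win\n";
```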

And the line:

> $cat = join "", split /,/, $cat; # remove the comma in front

Says "remove the comma in front" but it will remove ALL commas.

A more efficient way to remove all commas is:

$cat =~ tr/,//d; # remove all commas
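The difference between "the comma in front" and "all commas" shows up on the "CINEMATOGRAPHY,per" line in the snippet. The sketch below compares the split/join line, tr///d, and a hypothetical third option, s/,.*//s, that keeps only the text before the first comma.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub split_join_commas { my ($s) = @_; return join '', split /,/, $s }  # original line
sub tr_commas         { my ($s) = @_; $s =~ tr/,//d;  return $s }      # removes every comma
sub cut_at_comma      { my ($s) = @_; $s =~ s/,.*//s; return $s }      # keeps text before 1st comma

for my $raw ( 'ACTOR,', 'CINEMATOGRAPHY,per' ) {
    printf "%-20s split/join: %-18s tr: %-18s cut: %s\n",
        $raw, split_join_commas($raw), tr_commas($raw), cut_at_comma($raw);
}
```

tr///d and split/join always agree, since joining with '' makes the empty fields vanish either way; only cutting at the comma drops the stray "per".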


> $win=join"",split /[\*,\"]/,$win;

Again, a more efficient way to remove all '*', ',' and '"' characters is:

$win =~ tr/*,"//d;
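A small check, assuming $win holds the text before "--" as $` leaves it on the Clark Gable line: both versions give the same result, and both keep the space that preceded "--", which a final s/\s+\z// can trim.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The text before "--" on a winner line, as $` leaves it.
my $before = '*,"Clark Gable ';

my $via_split = join '', split /[\*,\"]/, $before;   # split/join version
( my $via_tr = $before ) =~ tr/*,"//d;               # tr///d version

# Both keep the space that sat in front of "--"; trim it if needed.
( my $trimmed = $via_tr ) =~ s/\s+\z//;

print "[$via_split] [$via_tr] [$trimmed]\n";
```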


> $ln.=": ".$win."\n"; #### If you use "$ln.=": ".$win. $&. "\n";"
> #### you get "-- It Happened One Night {""Peter Warne""}",etc added to what you have
> }
> }
> print OUTPUTFILE $ln;
> close OUTPUTFILE;
> close READFILE;



John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein

--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/

Re: Parsing a Text File using regex

am 05.08.2011 02:33:13 von timothy adigun

Hi John,
I believe you know that in Perl there is more than one way to do it (i.e.
to solve a problem), and that no one way is better than another; it only
depends on what the programmer prefers to use, as long as the syntax is
correct.
Secondly, most of your "why"s would have been answered if only you had
checked Ryan Lagola's request.

>>If @ARGV contains more than one element then this will not work correctly.

*Not true!* Using $ARGV[0] selects only one file to generate your report
from, but using @ARGV one has all the files listed. Moreover, how do you
know how many files he/she intended to use from the CLI at once?! So, for
me, it is safer to use @ARGV. Please don't misunderstand this; there are
several ways of doing things!

> open READFILE,"<","$raw_file" or die "can't open $!";

>>Why are you copying $raw_file to a string?

> open OUTPUTFILE,">","$filename" or die "cannot read $!";

>>Why are you copying $filename to a string?
I don't know what you mean by "copying both $raw_file and $filename into a
string"! If you mean putting double quotes around $raw_file and $filename,
then I should explain that that is called interpolation in Perl: these two
variables are scalars, so when they are double-quoted they are interpolated,
i.e. the value of the scalar (in this context) is inserted.
Lastly, code is written to be improved on. That is one of the reasons we
have different books!
Regards.

Re: Parsing a Text File using regex

am 05.08.2011 03:14:25 von Uri Guttman

>>>>> "ta" == timothy adigun <2teezperl@gmail.com> writes:

ta> I believe you know that in Perl there are more than one way to do
ta> it! i.e solve a problem. And that one way is no better than the
ta> other, it only depend on what the programmer preferred to use, as
ta> long as the syntax are correct. Secondly, most of your why would
ta> have been answered, if only you check Ryan Lagola's request.

>>> If @ARGV contains more than one element then this will not work correctly.

ta> *Not true!* Using $ARGV[0] selects only one file to generate your
ta> report from, but using @ARGV, one has all the files
ta> listed. Moreover, how do you know how many files he/she intended
ta> to use from the CLI at once?! So, for me it is safer to use
ta> @ARGV. Please don't misunderstand this; there are several ways of
ta> doing things!

no, you had this code:

chomp(my $raw_file=<@ARGV>);

and that will not work if @ARGV has more than one filename. try it out
and see.
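You can see Uri's point without touching the disk. Assume two hypothetical file names (awards1934.txt and awards1935.txt) were passed on the command line. <@ARGV> is treated as glob("@ARGV"), and in scalar context glob returns only the first name. These names contain no wildcard characters, so glob hands them back unchanged whether or not the files exist.

```perl
use strict;
use warnings;

# Hypothetical command-line arguments, set here only for the demonstration.
@ARGV = ('awards1934.txt', 'awards1935.txt');

# The original line: <@ARGV> means glob("@ARGV"), evaluated in scalar context.
my $raw_file = <@ARGV>;        # gets only the first name

# The same glob in list context returns every name.
my @all = glob "@ARGV";

# Usually no glob is needed at all: just walk @ARGV.
my @processed;
for my $file (@ARGV) {
    push @processed, $file;    # a real script would open and parse $file here
}

print "scalar: $raw_file\n";   # scalar: awards1934.txt
print "list:   @all\n";        # list:   awards1934.txt awards1935.txt
```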

>> open READFILE,"<","$raw_file" or die "can't open $!";

>>> Why are you copying $raw_file to a string?

ta> open OUTPUTFILE,">","$filename" or die "cannot read $!";
>>

>>> Why are you copying $filename to a string?

ta> I don't know what you mean by "copying both $raw_file and $filename into a
ta> string"! If you mean using double quotes around $raw_file and $filename,
ta> then I should explain that this is called interpolation in Perl! These
ta> two variables are scalars, so when double-quoted they are interpolated, i.e.
ta> the value of the scalar (in this context) is inserted.

you are telling someone who knows perl well about interpolation. but
what you didn't get (and john didn't explain clearly enough it seems),
is that quoting a scalar like that isn't needed and it makes an extra
useless copy of the data. you can pass a scalar anywhere you want
without quoting it. also in some cases like with objects, quoting it
will actually be a bug.
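Uri's last point can be made concrete with a minimal sketch using a hypothetical Award class (a plain blessed hash, not a real module). Putting a reference or object in double quotes turns it into a string like Award=HASH(0x...), unless the class overloads stringification. The quoted copy can then no longer be used as an object. For a plain file name the quotes are merely redundant, because open takes the scalar directly: open my $in, '<', $raw_file or die "Can't open $raw_file: $!";

```perl
use strict;
use warnings;

# A hypothetical object, standing in for any blessed reference.
my $award = bless { year => 1934, category => 'ACTOR' }, 'Award';

my $same   = $award;      # no quotes: still the same object
my $copied = "$award";    # quotes: a new string such as "Award=HASH(0x...)"

print ref($same)   ? "object\n" : "string\n";    # object
print ref($copied) ? "object\n" : "string\n";    # string
print $same->{year}, "\n";                        # 1934
# $copied->{year} would die under strict refs: it is only text now.
```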

ta> Lastly, code is written to be improved upon. That is one of the reasons we have
ta> different books!

huh?? this has nothing to do with books. it is just poor coding and john
was correcting you.

also please learn to edit quoted posts as there is no reason to see all
of the previous emails.

uri

--
Uri Guttman -- uri AT perlhunter DOT com --- http://www.perlhunter.com --
------------ Perl Developer Recruiting and Placement Services -------------
----- Perl Code Review, Architecture, Development, Training, Support -------


Re: Parsing a Text File using regex

on 06.08.2011 16:14:46 by Emeka


John,

Thanks for making things pretty simple for mere mortals.


>>
> chomp( my $raw_file = glob "@ARGV" );
>

I am of the view that the glob sub is used to walk a tree (that is, to get all the
files in a folder and all its sub-folders). From the above, it seems like it
could be used for something else... Someone should help me out here.


> Why are you copying the contents of @ARGV to a string and then globbing
> that string?
>
> If @ARGV contains more than one element then this will not work correctly.
>
> And why chomp() a string that will not contain newlines?
>
> What you want is something like:
>
> my $raw_file = $ARGV[ 0 ];
>
>
>
> while(){chomp;
>> $ln.="\n" if /^\W.?+$/;
>> if(/^\d{4}/){$yr=$&;} # get the year
>> if(/^[A-Z].+/){ $cat=$&; # get the Category
>> $cat=join"",split /,/,$cat; # remove the comma in front
>> $ln.=" $yr: ".$cat; # add both the year and Category
>> }
>> if(/\--.+/){$win=$`; # get the winner
>>
>
> The use of $&, $' and $` will slow down *ALL* regular expressions in the
> program. Better to just use capturing parentheses.
>
> if (/^(\d{4})/ ) { $yr = $1 } # get the year
> if ( /^([A-Z].+)/ ) {
> $cat = $1; # get the Category
>
> $cat = join "", split /,/, $cat; # remove the comma in front
> $ln.= " $yr: " . $cat; # add both the year and Category
> }
> if ( /(.*?)\--.+/ ) { $win = $1; # get the winner
>


What is the idiomatic Perl: $1, or $`, $& and $'? And what makes $&, $' and $`
slow down *ALL* regular expressions in the program?

>
>
>
>


--
Satajanus Nig. Ltd


Re: Parsing a Text File using regex

on 06.08.2011 20:29:35 by Rob Dixon

On 06/08/2011 15:14, Emeka wrote:
> John,
>
> Thanks for making things pretty simple for mere mortals.

Hi Emeka

>>>
>> chomp( my $raw_file = glob "@ARGV" );
>>
>
> I am of the view that the glob sub is used to walk a tree (that is, to get all the
> files in a folder and all its sub-folders). From the above, it seems like it
> could be used for something else... Someone should help me out here.

my @file_list = glob "@ARGV";

is a simple way of getting a list of all files that match the list of
filename patterns passed on the command line. But in scalar context (as
Timothy originally posted) it will fetch only the first file in that
list, and it is wrong to chomp it as it is not terminated by an
additional newline.

You are mostly correct, except that glob will not search a directory
tree of files. You can use wildcards, such as

glob '~/*/*'

which will list all files in any directory immediately within the home
directory, but to search throughout a directory tree of arbitrary depth
you need File::Find or something similar.
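Here is a minimal sketch of the difference Rob describes. It builds a throwaway directory tree with File::Temp and File::Path (both core modules) so it runs anywhere, assuming a Unix-like temp path without spaces. The helper txt_files_under is a made-up name. The glob pattern '*/*.txt' reaches exactly one directory down, while File::Find's find() visits every level.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Collect every *.txt file below $top, however deeply it is nested.
sub txt_files_under {
    my ($top) = @_;
    my @found;
    find( sub { push @found, $File::Find::name if -f $_ && /\.txt\z/ }, $top );
    return sort @found;
}

# Build a small throwaway tree (removed at exit) to compare the two approaches.
my $top = tempdir( CLEANUP => 1 );
make_path("$top/sub/deeper");
for my $file ( "$top/a.txt", "$top/sub/b.txt", "$top/sub/deeper/c.txt" ) {
    open my $fh, '>', $file or die "Can't create $file: $!";
    close $fh;
}

# glob: each '*' covers exactly one path level, so only sub/b.txt matches.
my @by_glob = glob "$top/*/*.txt";

# File::Find: descends into every subdirectory.
my @by_find = txt_files_under($top);

print scalar(@by_glob), " file(s) via glob, ", scalar(@by_find), " via File::Find\n";
```

On this tree the script should report 1 file via glob and 3 via File::Find.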

That is all glob does. It is of no use in any other way. What can be
confusing is that <*.pl> calls glob '*.pl', whereas a filehandle in angle
brackets calls readline on that filehandle.
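This sketch shows the two meanings of angle brackets. It uses an in-memory filehandle so no real file is needed, and the sample text is just two category lines from the data snippet. <$fh> contains a simple scalar holding a filehandle, so it means readline. <*.pl> contains a pattern, so it means glob.

```perl
use strict;
use warnings;

# An in-memory "file" holding two category lines from the sample data.
my $text = "ACTOR,\nACTRESS,\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

my $first  = <$fh>;           # same as readline($fh): "ACTOR,\n"
my $second = readline($fh);   # the explicit form:    "ACTRESS,\n"
close $fh;

my @via_brackets = <*.pl>;         # same as glob('*.pl')
my @via_glob     = glob '*.pl';    # the explicit form

print "first line:  $first";
print "second line: $second";
print "Perl scripts here: @via_brackets\n";
```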

Take a look at

perldoc -f glob

and

perldoc File::Glob

(which is the module that implements the glob operator).
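There is a related trap when globbing "@ARGV". The core glob operator splits its pattern on whitespace, so a file name containing spaces falls apart. File::Glob exports bsd_glob, which treats its argument as a single pattern. This minimal sketch assumes a hypothetical file name with spaces in it. Because the name contains no wildcard characters, both calls return their words unchanged without checking the disk.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# A hypothetical file name containing spaces, as it might arrive in @ARGV.
my @args = ('Oscar winners 1934.txt');

# Core glob splits "@args" on whitespace into three separate words.
my @split_up = glob "@args";

# bsd_glob keeps each argument whole, spaces included.
my @intact = map { bsd_glob($_) } @args;

print scalar(@split_up), " piece(s) from glob\n";
print scalar(@intact),   " piece(s) from bsd_glob: $intact[0]\n";
```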

>> The use of $&, $' and $` will slow down *ALL* regular expressions in the
>> program. Better to just use capturing parentheses.
>>
>> if (/^(\d{4})/ ) { $yr = $1 } # get the year
>> if ( /^([A-Z].+)/ ) {
>> $cat = $1; # get the Category
>>
>> $cat = join "", split /,/, $cat; # remove the comma in front
>> $ln.= " $yr: " . $cat; # add both the year and Category
>> }
>> if ( /(.*?)\--.+/ ) { $win = $1; # get the winner
>>
>
>
> What is the idiomatic Perl: $1, or $`, $& and $'? And what makes $&, $' and $`
> slow down *ALL* regular expressions in the program?

perldoc perlre says this:

> WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in
> the program, it has to provide them for every pattern match. This may
> substantially slow your program. Perl uses the same mechanism to produce
> $1, $2, etc, so you also pay a price for each pattern that contains
> capturing parentheses.

so no recent well-written code will use $& etc., although they are still
available for backward compatibility.
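Since Perl 5.10, the /p modifier and ${^MATCH} give you the matched text without the program-wide cost of $&. perlvar also notes that the copy-on-write mechanism enabled by default in Perl 5.20 removed the performance penalty of $&, $` and $' altogether. On the 2011-era Perls discussed here, captures were still the safe choice. This minimal sketch compares the two approaches on the year line from the sample data.

```perl
use strict;
use warnings;

my $line = '1934 (7th),';

# Idiomatic: a capture group, returned directly in list context.
my ($year) = $line =~ /^(\d{4})/;

# Perl 5.10+ alternative to $&: the /p modifier plus ${^MATCH}.
my $matched;
if ( $line =~ /^\d{4}/p ) {
    $matched = ${^MATCH};
}

print "capture: $year, match: $matched\n";    # capture: 1934, match: 1934
```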

Reading the same source again, it also says this

> So avoid $&, $', and $` if you can, but if you can't (and some
> algorithms really appreciate them), once you've used them once, use
> them at will, because you've already paid the price.

so there is a niche for their continued use, but I have never come
across anything that isn't better written using captures.
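This minimal sketch of Ryan's original task puts the thread's advice together: captures instead of $&, $ARGV[0] instead of globbing @ARGV, lexical filehandles, and no needless quoting. It assumes the raw file looks like the snippet in his first post, that a category line starts with a capital letter, and that a winner line starts with '*,'. The helper names parse_awards and title_case are made up. To handle the "date does not repeat" issue, the year stays in a variable until the next year line replaces it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "ART DIRECTION" -> "Art Direction", "MUSIC (Scoring)" -> "Music (Scoring)"
sub title_case {
    my ($s) = @_;
    $s = lc $s;
    $s =~ s/\b(\w)/\u$1/g;
    return $s;
}

# Read raw award lines from $fh and return "Year: Category: Winner" strings.
# The year (and category) persist until a new year (or category) line appears.
sub parse_awards {
    my ($fh) = @_;
    my ( $year, $category, @out );

    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^(\d{4})/ ) {              # 1934 (7th),
            $year = $1;
        }
        elsif ( $line =~ /^([A-Z][^,]*)/ ) {      # ART DIRECTION,
            $category = title_case($1);
        }
        elsif ( $line =~ /^\*,"?(.*?)\s+--/ ) {   # *,"The Merry Widow -- ..."
            my $winner = $1;
            push @out, "$year: $category: $winner"
                if defined $year && defined $category;
        }
        # Anything else (notes, wrapped continuation lines) is ignored.
    }
    return @out;
}

# Sample input copied from the data snippet; a real run would instead use
#   open my $in, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
my $sample = <<'END';
1934 (7th),
ACTOR,
*,"Clark Gable -- It Happened One Night {""Peter Warne""}"
ACTRESS,
*,"Claudette Colbert -- It Happened One Night {""Ellie Andrews""}"
ART DIRECTION,
*,"The Merry Widow -- Cedric Gibbons, Fredric Hope"
,[NOTE: won by two votes]
END

open my $in, '<', \$sample or die "Can't open sample: $!";
print "$_\n" for parse_awards($in);
close $in;
```

Run against the snippet, this should print exactly the three lines Ryan gave as his target format, starting with "1934: Actor: Clark Gable".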

HTH,

Rob



