Help extracting something from a string
on 19.11.2007 18:03:08 by bone
I am having a hard time figuring this one out as the records I am
asked to work with "seem" rather arbitrary.
I have a stream of text and I need to extract a filename in the form
(bash wildcards) "*-*-*-*-*-*.pdf"
including the double quotes; the characters surrounding it could be
anything at all.
I won't get into how I have tried to do this so far but let's just say
cut isn't cutting it and I am pretty unskilled with sed apparently.
Any help is appreciated.
Re: Help extracting something from a string
on 19.11.2007 23:02:22 by Ed Morton
On 11/19/2007 11:03 AM, bone wrote:
> I am having a hard time figuring this one out as the records I am
> asked to work with "seem" rather arbitrary.
>
> I have a stream of text and I need to extract a filename in the form
> (bash wildcards) "*-*-*-*-*-*.pdf"
> including the double-quotes, the characters surrounding it could be
> anything at all.
>
> I won't get into how I have tried to do this so far but let's just say
> cut isn't cutting it and I am pretty unskilled with sed apparently.
>
> any help is appreciated.
man grep
If that doesn't do it, post some sample input and expected output.
Ed.
Re: Help extracting something from a string
on 19.11.2007 23:06:21 by Edward Rosten
On Nov 19, 10:03 am, bone wrote:
> I am having a hard time figuring this one out as the records I am
> asked to work with "seem" rather arbitrary.
>
> I have a stream of text and I need to extract a filename in the form
> (bash wildcards) "*-*-*-*-*-*.pdf"
> including the double-quotes, the characters surrounding it could be
> anything at all.
Do you have more than one per line? If so, that pattern will not work.
Suppose the pattern were:
"*-*.pdf"
Then the whole line below would match it, because each * is free to
match quotes and the junk between the two names:
"a-b.pdf" junk junk junk junk junk junk "b-c.pdf"
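To see the difference in practice, here is a small sketch. It assumes a grep with the -o option (GNU and BSD grep have it; POSIX does not require it). With .* one match runs from the first quote to the last; replacing each * with [^"]*, which cannot cross a quote character, gives one match per quoted name.

```shell
# Sample line from the example above: two quoted names with junk in between.
line='"a-b.pdf" junk junk junk junk junk junk "b-c.pdf"'

# Greedy .* can match quote characters, so the single match is the whole line.
printf '%s\n' "$line" | grep -o '".*-.*\.pdf"'

# [^"]* stops at the next quote, so each quoted name is a separate match:
# "a-b.pdf" and "b-c.pdf" are printed on separate lines.
printf '%s\n' "$line" | grep -o '"[^"]*-[^"]*\.pdf"'
```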
> I won't get into how I have tried to do this so far but let's just say
> cut isn't cutting it and I am pretty unskilled with sed apparently.
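If you insist on space separation and disallow spaces in the
filename, the following will work:
set -f    # stop the shell from globbing words that contain *
while IFS= read -r line
do
  for word in $line    # unquoted on purpose: split the line on whitespace
  do
    case "$word" in
    \"*-*-*-*-*-*.pdf\")
      echo "$word";;
    esac
  done
done
set +f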
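As a quick check, the same loop can be wrapped in a function and fed a test line. The name quoted_pdf_words is made up for this sketch; it uses the six-part pattern from the original question, and its ( ... ) body runs in a subshell so that set -f does not leak into the calling shell.

```shell
# quoted_pdf_words is a hypothetical helper name; it reads stdin.
# The ( ... ) body is a subshell, so set -f (no globbing) stays local.
quoted_pdf_words() (
  set -f
  while IFS= read -r line
  do
    for word in $line          # unquoted on purpose: split on whitespace
    do
      case "$word" in
        \"*-*-*-*-*-*.pdf\") printf '%s\n' "$word" ;;
      esac
    done
  done
)

# Only the first quoted name has the required five dashes.
printf '%s\n' 'junk "a-b-c-d-e-f.pdf" junk "x-y.pdf"' | quoted_pdf_words
```

A name glued to the surrounding text (for example "a-b-c-d-e-f.pdf"junk) is not found, which is exactly the space-separation limitation stated above.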
If you want to allow spaces, and you have only one per line, this sed
script will do:
sed -ne's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/;tp;d;:p;p'
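The tp;d;:p;p tail prints a line only if the substitution succeeded. A shorter way to say the same thing is -n together with the p flag of s///; as far as I know this form is also more portable, since not every sed accepts a semicolon directly after a label. A sketch, with a made-up function name extract_quoted_pdf:

```shell
# extract_quoted_pdf is a hypothetical helper name; it reads stdin.
# -n suppresses default output; the p flag prints only when s/// matched.
extract_quoted_pdf() {
  sed -n 's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/p'
}

# Single quotes keep the double quotes in the data away from the shell.
printf '%s\n' 'ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg' | extract_quoted_pdf
# prints: "ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"
```

Note that the captured group includes the surrounding double quotes, as the original question asked.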
Re: Help extracting something from a string
on 20.11.2007 20:10:41 by bone
On Nov 19, 5:06 pm, Edward Rosten wrote:
> On Nov 19, 10:03 am, bone wrote:
>
> > I am having a hard time figuring this one out as the records I am
> > asked to work with "seem" rather arbitrary.
>
> > I have a stream of text and I need to extract a filename in the form
> > (bash wildcards) "*-*-*-*-*-*.pdf"
> > including the double-quotes, the characters surrounding it could be
> > anything at all.
>
> Do you have more than one per line? If so, that pattern will not work.
> Consider that the pattern is:
> "*-*.pdf"
>
> Then, the whole line will match the pattern:
>
> "a-b.pdf" junk junk junk junk junk junk "b-c.pdf"
>
> > I won't get into how I have tried to do this so far but let's just say
> > cut isn't cutting it and I am pretty unskilled with sed apparently.
>
> If you insist on space separation, and disallow spaces in the
> filename, the following will work:
>
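> set -f    # stop the shell from globbing words that contain *
> while IFS= read -r line
> do
>   for word in $line    # unquoted on purpose: split the line on whitespace
>   do
>     case "$word" in
>     \"*-*-*-*-*-*.pdf\")
>       echo "$word";;
>     esac
>   done
> done
> set +f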
I don't control the input; in general it will not be space-delimited,
though.
>
> If you want to allow spaces, and you have only one per line, this sed
> script will do:
> sed -ne's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/;tp;d;:p;p'
This doesn't seem to work:
$ echo ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg| sed -ne's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/;tp;d;:p;p'
It doesn't return anything.
Re: Help extracting something from a string
on 20.11.2007 20:18:44 by bone
On Nov 19, 5:02 pm, Ed Morton wrote:
> On 11/19/2007 11:03 AM, bone wrote:
>
> > I am having a hard time figuring this one out as the records I am
> > asked to work with "seem" rather arbitrary.
>
> > I have a stream of text and I need to extract a filename in the form
> > (bash wildcards) "*-*-*-*-*-*.pdf"
> > including the double-quotes, the characters surrounding it could be
> > anything at all.
>
> > I won't get into how I have tried to do this so far but let's just say
> > cut isn't cutting it and I am pretty unskilled with sed apparently.
>
> > any help is appreciated.
>
> man grep
>
> If that doesn't do it, post some sample input and expected output.
>
> Ed.
I don't think grep is what I need at all.
Sample input:
ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(6262646294626&^*&@^$*4":":#2sg
Expected output:
ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf
Record length is arbitrary, there is no record separator character,
and the "matching filename" itself is not known in advance.
Re: Help extracting something from a string
on 21.11.2007 08:25:19 by pgas
bone wrote:
>
> sample input:
>
> ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(6262646294626&^*&@^$*4":":#2sg
>
> expected output:
>
> ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf
>
> Record length is arbitrary, there is no record separator character,
> and the "matching filename" itself is not known in advance.
Isn't " a record separator?
In case it is:
awk -F\" '{print $2}'
In case it is not a separator, how do you know the file name is
'ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf' and not
'sdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf'?
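If " can be treated as a separator, a slightly more defensive variant (a sketch, not the exact command above) prints every "-separated field that ends in .pdf instead of assuming the name is always field 2. The helper name pdf_fields is invented here.

```shell
# pdf_fields is a hypothetical helper name; it reads stdin.
# Split each line on " and print any field that ends in .pdf.
pdf_fields() {
  awk -F'"' '{ for (i = 1; i <= NF; i++) if ($i ~ /\.pdf$/) print $i }'
}

# bone's sample record on one line; single quotes protect $, &, ^ and " from the shell.
sample='ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(6262646294626&^*&@^$*4":":#2sg'
printf '%s\n' "$sample" | pdf_fields
# prints: ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf
```

It does not check that the field was really enclosed in quotes on both sides, so a line starting with something like x.pdf" would also print x.pdf; that is the ambiguity raised above.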
--
pgas @ SDF Public Access UNIX System - http://sdf.lonestar.org
Re: Help extracting something from a string
on 23.11.2007 00:22:58 by William Park
bone wrote:
> I don't think grep is what I need at all.
>
> sample input:
>
> ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(6262646294626&^*&@^$*4":":#2sg
>
>
> expected output:
>
> ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf
>
> Record length is arbitrary, there is no record separator character,
> and the "matching filename" itself is not known in advance.
Actually, it's the first thing you should try, for example:
grep -o -e '"[^"]*\.pdf"'
or
tr '"' '\n' | grep '\.pdf$'
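To get exactly the expected output (no quotes) and to insist on the six dash-separated parts from the original question, the grep idea can be tightened and followed by tr. This is a sketch assuming grep -o (GNU/BSD); quoted_pdf_names is a made-up name.

```shell
# quoted_pdf_names is a hypothetical helper name; it reads stdin.
# Each [^"]* stays inside one pair of quotes; at least five dashes are
# required, and tr removes the surrounding quotes from each match.
quoted_pdf_names() {
  grep -o '"[^"]*-[^"]*-[^"]*-[^"]*-[^"]*-[^"]*\.pdf"' | tr -d '"'
}

printf '%s\n' 'ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(6262646294626&^*&@^$*4":":#2sg' | quoted_pdf_names
# prints: ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf
```

Because grep works line by line, a name that happens to be split by a newline in the stream would be missed.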
--
William Park , Toronto, Canada
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
Re: Help extracting something from a string
on 26.11.2007 23:37:00 by thomasriise
Try sed:
echo 'ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"hjkdlha' | sed 's/.*"\([^"]*\.pdf\)".*/\1/'
?
Re: Help extracting something from a string
on 27.11.2007 22:38:31 by Edward Rosten
On Nov 20, 12:10 pm, bone wrote:
> On Nov 19, 5:06 pm, Edward Rosten wrote:
> $ echo ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg| sed -ne's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/;tp;d;:p;p'
To see why this does not work, type:
echo ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg
The shell is eating your double quotes.
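Side by side, the two commands below show the difference: unquoted, the shell performs quote removal and the double quotes never reach the pipe; inside single quotes they are passed through literally, so the sed script earlier in the thread has quotes to match.

```shell
# Unquoted: quote removal strips the " characters before echo runs.
echo ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg
# prints: ksdjfglsdfgddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdfsdgsg

# Single-quoted: the double quotes are part of the argument.
echo 'ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg'
# prints: ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg
```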
-Ed