Help extracting something from a string

on 19.11.2007 18:03:08 by bone

I am having a hard time figuring this one out as the records I am
asked to work with "seem" rather arbitrary.

I have a stream of text and I need to extract a filename of the form
(bash wildcards) "*-*-*-*-*-*.pdf"
including the double quotes. The characters surrounding it could be
anything at all.

I won't get into how I have tried to do this so far, but let's just say
cut isn't cutting it and I am apparently pretty unskilled with sed.

Any help is appreciated.

Re: Help extracting something from a string

on 19.11.2007 23:02:22 by Ed Morton

On 11/19/2007 11:03 AM, bone wrote:
> I am having a hard time figuring this one out as the records I am
> asked to work with "seem" rather arbitrary.
>
> I have a stream of text and I need to extract a filename in the form
> (bash wildcards) "*-*-*-*-*-*.pdf"
> including the double-quotes, the characters surrounding it could be
> anything at all.
>
> I won't get into how I have tried to do this so far but let's just say
> cut isn't cutting it and I am pretty unskilled with sed apparently.
>
> any help is appreciated.

man grep

If that doesn't do it, post some sample input and expected output.

Ed.

Re: Help extracting something from a string

on 19.11.2007 23:06:21 by Edward Rosten

On Nov 19, 10:03 am, bone wrote:
> I am having a hard time figuring this one out as the records I am
> asked to work with "seem" rather arbitrary.
>
> I have a stream of text and I need to extract a filename in the form
> (bash wildcards) "*-*-*-*-*-*.pdf"
> including the double-quotes, the characters surrounding it could be
> anything at all.

Do you have more than one per line? If so, that pattern will not work,
because * matches as much as it can. Suppose the pattern is:
"*-*.pdf"

Then the whole of the following line matches the pattern, not each
name separately:

"a-b.pdf" junk junk junk junk junk junk "b-c.pdf"
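
The greedy-match problem described above can be reproduced with a short sketch. The sample line is made up for illustration, and grep -o is assumed to be available (it is a GNU/BSD extension, not POSIX). The second pattern swaps the unbounded wildcard for [^"]*, which cannot cross a double quote:

```shell
# Hypothetical sample line with two quoted names (not real data from the thread).
line='"a-b.pdf" junk junk "b-c.pdf"'

# Greedy ".*" runs from the first quote to the last .pdf", giving one long match.
greedy=$(printf '%s\n' "$line" | grep -o '".*-.*\.pdf"')

# [^"]* cannot cross a quote, so each name is matched separately.
bounded=$(printf '%s\n' "$line" | grep -o '"[^"]*-[^"]*\.pdf"')

printf 'greedy:\n%s\n' "$greedy"
printf 'bounded:\n%s\n' "$bounded"
```

The greedy variant reports the entire line as a single match, while the bounded variant reports the two names on separate lines.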

> I won't get into how I have tried to do this so far but let's just say
> cut isn't cutting it and I am pretty unskilled with sed apparently.


If you insist on space separation, and disallow spaces in the
filename, the following will work:

# put each space-separated word on its own line, then test it
tr ' ' '\n' | while read -r i
do
    case "$i" in
    \"*-*-*-*-*-*.pdf\")
        echo "$i";;
    esac
done

If you want to allow spaces, and you have only one per line, this sed
script will do:
sed -ne's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/;tp;d;:p;p'

Re: Help extracting something from a string

on 20.11.2007 20:10:41 by bone

On Nov 19, 5:06 pm, Edward Rosten wrote:
> On Nov 19, 10:03 am, bone wrote:
>
> > I am having a hard time figuring this one out as the records I am
> > asked to work with "seem" rather arbitrary.
>
> > I have a stream of text and I need to extract a filename in the form
> > (bash wildcards) "*-*-*-*-*-*.pdf"
> > including the double-quotes, the characters surrounding it could be
> > anything at all.
>
> Do you have more than one per line? If so, that pattern will not work.
> Consider that the pattern is:
> "*-*.pdf"
>
> Then, the whole line will match the pattern:
>
> "a-b.pdf" junk junk junk junk junk junk "b-c.pdf"
>
> > I won't get into how I have tried to do this so far but let's just say
> > cut isn't cutting it and I am pretty unskilled with sed apparently.
>
> If you insist on space separation, and disallow spaces in the
> filename, the following will work:
>
> # put each space-separated word on its own line, then test it
> tr ' ' '\n' | while read -r i
> do
>     case "$i" in
>     \"*-*-*-*-*-*.pdf\")
>         echo "$i";;
>     esac
> done

I don't control the input, and in general it will not be
space-delimited.

>
> If you want to allow spaces, and you have only one per line, this sed
> script will do:
> sed -ne's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/;tp;d;:p;p'

This doesn't seem to work:

$ echo ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg | sed -ne's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/;tp;d;:p;p'

It doesn't return anything.

Re: Help extracting something from a string

on 20.11.2007 20:18:44 by bone

On Nov 19, 5:02 pm, Ed Morton wrote:
> On 11/19/2007 11:03 AM, bone wrote:
>
> > I am having a hard time figuring this one out as the records I am
> > asked to work with "seem" rather arbitrary.
>
> > I have a stream of text and I need to extract a filename in the form
> > (bash wildcards) "*-*-*-*-*-*.pdf"
> > including the double-quotes, the characters surrounding it could be
> > anything at all.
>
> > I won't get into how I have tried to do this so far but let's just say
> > cut isn't cutting it and I am pretty unskilled with sed apparently.
>
> > any help is appreciated.
>
> man grep
>
> If that doesn't do it, post some sample input and expected output.
>
> Ed.

I don't think grep is what I need at all.

Sample input:

ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(6262646294626&^*&@^$*4":":#2sg


Expected output:

ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf

The record length is arbitrary, there is no record separator
character, and the matching filename is not known in advance.

Re: Help extracting something from a string

on 21.11.2007 08:25:19 by pgas

bone wrote:
>
> sample input:
>
> ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(6262646294626&^*&@^$*4":":#2sg
>
> expected output:
>
> ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf
>
> record length is arbitrary, record separator character is nonexistent,
> what the "matching filename" is is unknown.

Is " not a record separator?
In case it is:
awk -F\" '{print $2}'

In case it is not a separator, how do you know the file name is
'ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf' and not
'sdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf'?
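
If the double quote can be treated as a field separator, as the awk one-liner above assumes, the quoted name need not be the second field. A minimal sketch that checks every field with a quote on both sides follows; extract_pdf_fields is a hypothetical helper name, and the five-hyphen test is taken from the pattern in the original question:

```shell
# extract_pdf_fields: hypothetical helper, reads records on stdin.
extract_pdf_fields() {
    awk -F'"' '{
        # Fields 2..NF-1 each have a double quote on both sides.
        for (i = 2; i < NF; i++)
            # Keep fields with at least five hyphens that end in .pdf.
            if ($i ~ /-.*-.*-.*-.*-.*\.pdf$/) print $i
    }'
}

# Usage with the sample record from the thread (single-quoted for the shell).
printf '%s\n' 'ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(6262646294626&^*&@^$*4":":#2sg' |
    extract_pdf_fields
```

This prints the name without the surrounding quotes, which matches the expected output given in the sample.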


--
pgas @ SDF Public Access UNIX System - http://sdf.lonestar.org

Re: Help extracting something from a string

on 23.11.2007 00:22:58 by William Park

bone wrote:
> I don't think grep is what I need at all.
>
> sample input:
>
> ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(6262646294626&^*&@^$*4":":#2sg
>
>
> expected output:
>
> ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf
>
> record length is arbitrary, record separator character is nonexistent,
> what the "matching filename" is is unknown.

Actually it's the first thing you should try... like

grep -o -e '"[^"]*\.pdf"'
or
tr '"' '\n' | grep '\.pdf$'
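
As a sketch of how the grep -o suggestion above might be tightened to the original "*-*-*-*-*-*.pdf" shape, the pattern below requires at least five hyphens between the quotes and then strips the quotes with tr. The variable names are illustrative only, and grep -o is assumed to be the GNU or BSD extension:

```shell
# Sample record from the thread, single-quoted so the shell keeps the double quotes.
record='ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(6262646294626&^*&@^$*4":":#2sg'

# Require at least five hyphens between the quotes, then drop the quotes.
name=$(printf '%s\n' "$record" |
    grep -o '"[^"]*-[^"]*-[^"]*-[^"]*-[^"]*-[^"]*\.pdf"' |
    tr -d '"')

printf '%s\n' "$name"
```

Because [^"]* never crosses a double quote, this should also cope with several quoted names in one record, printing one per line.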

--
William Park , Toronto, Canada
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/

Re: Help extracting something from a string

on 26.11.2007 23:37:00 by thomasriise

Try sed:

echo 'ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"hjkdlha' | sed 's/\(.*\)\(\"\)\(.*\)\(\.pdf\)\(\"\)\(.*\)/\3\4/g'

?

Re: Help extracting something from a string

on 27.11.2007 22:38:31 by Edward Rosten

On Nov 20, 12:10 pm, bone wrote:
> On Nov 19, 5:06 pm, Edward Rosten wrote:

> $ echo ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg| sed
> -ne's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/;tp;d;:p;p'

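To see why this does not work, type:

echo ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg

The shell is eating your double quotes: they are unquoted on the
command line, so the shell removes them before echo (and therefore
sed) ever sees them. Put the whole argument in single quotes to keep
them.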
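
The difference can be seen side by side in a short sketch. The variable names are illustrative, and the sed command is a simplified form of the script from earlier in the thread (using the p flag of s instead of the tp/d/:p/p sequence), not the exact original:

```shell
# Unquoted: the shell removes the double quotes before echo runs.
unquoted=$(echo ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg)

# Single-quoted: the double quotes reach echo (and sed) intact.
quoted=$(echo 'ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg')

# With the quotes preserved, the pattern has something to anchor on.
found=$(printf '%s\n' "$quoted" |
    sed -n 's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/p')

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
printf 'found:    %s\n' "$found"
```

The unquoted form contains no double quotes at all, which is why the sed pattern never matched it.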

-Ed