OK, I give up here. I am DEFINITELY not a Regex expert, and have been
working on this for hours with no luck.
Basically I need to parse a page for certain information which will be
fed back into CURL to post to a site. I need to find four types of tags
on the page:
Re: Regex help
on 15.10.2007 08:25:19 by Steve
"Jerry Stuckle" wrote in message
news:KaadnQnnGt0WT4_anZ2dnUVZ_tajnZ2d@comcast.com...
> OK, I give up here. I am DEFINITELY not a Regex expert, and have been
> working on this for hours with no luck.
>
> Basically I need to parse a page for certain information which will be fed
> back into CURL to post to a site. I need to find four types of tags on
> the page:
>
>
>
>
>
Re: Regex help
on 15.10.2007 10:44:27 by Captain Paralytic
On 15 Oct, 03:37, Jerry Stuckle wrote:
> OK, I give up here. I am DEFINITELY not a Regex expert, and have been
> working on this for hours with no luck.
>
> Basically I need to parse a page for certain information which will be
> fed back into CURL to post to a site. I need to find four types of tags
> on the page:
>
>
>
>
>
Re: Regex help
on 15.10.2007 12:15:21 by Jerry Stuckle
Steve wrote:
> "Jerry Stuckle" wrote in message
> news:KaadnQnnGt0WT4_anZ2dnUVZ_tajnZ2d@comcast.com...
>> OK, I give up here. I am DEFINITELY not a Regex expert, and have been
>> working on this for hours with no luck.
>>
>> Basically I need to parse a page for certain information which will be fed
>> back into CURL to post to a site. I need to find four types of tags on
>> the page:
>>
>>
>>
>>
>>
Re: Regex help
on 15.10.2007 12:17:00 by Jerry Stuckle
Captain Paralytic wrote:
> On 15 Oct, 03:37, Jerry Stuckle wrote:
>> OK, I give up here. I am DEFINITELY not a Regex expert, and have been
>> working on this for hours with no luck.
>>
>> Basically I need to parse a page for certain information which will be
>> fed back into CURL to post to a site. I need to find four types of tags
>> on the page:
>>
>>
>>
>>
>>
>>
>> I don't need any other tags.
>>
>> From the hidden and submit types, I need name and value. From the text
>> and select types, I just need the name.
>>
>> I can assume the attributes will always show up in this order, but there
>> may be other things between the < and > delimiters. Additionally, the
>> actual type and name may have single or double quotes around them, or
>> neither.
>>
>> Does anyone have some code for this? It doesn't have to be all one regex.
>>
>> TIA.
>>
>> --
>> ==================
>> Remove the "x" from my email address
>> Jerry Stuckle
>> JDS Computer Training Corp.
>> jstuck...@attglobal.net
>> ==================
>
> Could you use the php dom functionality for this?
>
> Wouldn't it be good if php had the equivalent of
> getElementsByTagName()!
>
>
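The original question quoted above says values may be double-quoted, single-quoted, or bare. Its core step is reading one attribute under those rules, and that fits one pattern with three alternatives.

Below is a minimal sketch. It is written in Python only so it runs standalone, and `get_attr` and `ATTR_VALUE` are hypothetical names. The pattern uses only PCRE-compatible syntax, so it should carry over to `preg_match` with the `i` modifier, though that port is untested here. One caution for PHP: unmatched groups come back as empty strings rather than null there, so the last step would need adjusting.

```python
import re

# One alternative per quoting style the question allows:
# "double-quoted", 'single-quoted', or bare (no quotes).
ATTR_VALUE = r'''\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))'''


def get_attr(tag, attr):
    """Return the value of attribute `attr` in one tag's text, or None.

    Hypothetical helper; the attribute name is matched case-insensitively.
    """
    # \b stops e.g. "name" from matching the tail of "classname".
    pattern = r'\b' + re.escape(attr) + ATTR_VALUE
    m = re.search(pattern, tag, re.IGNORECASE)
    if m is None:
        return None
    # Exactly one of the three value groups took part in the match.
    return next(g for g in m.groups() if g is not None)


print(get_attr("<input type='hidden' name=sid value=\"42\">", 'value'))  # 42
```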
On 15 Oct, 11:17, Jerry Stuckle wrote:
> Hi, Paul,
>
> How I wish I could - it was the first thing I tried. However, this page
> is not well formed html, and DOM throws up all over it.
"Jerry Stuckle" wrote in message
news:K-qdnTSkY4NaoI7anZ2dnUVZ_j6dnZ2d@comcast.com...
> Steve wrote:
>> alright, jer. let's see what we can do...
>>
>> here's an eyeballed attempt:
>>
>> <(select\s?[^> ].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit) \3[^>].*?)>
>>
>> to keep it easier, i'd think about using that to get your general
>> matches. iterating through those, i'd apply another regex to break out
>> the name, type, and value. you could very well catch it all in the above,
>> however, it's not as straightforward and hence, not easily maintained. if
>> you need additional help on writing this, let me know. i'll pseudo-code
>> the whole enchilada if you want. this should be sufficient in getting
>> only those tags you listed above...which is a good start.
>>
>> btw, make the search caseINsensitive.
>
> Hi, Steve,
>
> Yep, it's a start. Some problems (output below), but I think it will get
> me a little farther.
>
> And you're right, I already gave up on getting everything in one pass. I
> was thinking of trying to just get everything for a single element type
> (i.e. all elements), but this gives me another idea,
> also.
>
> And the output from the first try:
>
> Array
> (
>     [0] => Array
>         (
>             [0] =>
>             [1] =>
>             [2] =>
>         )
>
>     [1] => Array
>         (
>             [0] => select n
>             [1] => select n
>             [2] => select n
>         )
>
>     [2] => Array
>         (
>             [0] =>
>             [1] =>
>             [2] =>
>         )
>
>     [3] => Array
>         (
>             [0] =>
>             [1] =>
>             [2] =>
>         )
>
>     [4] => Array
>         (
>             [0] =>
>             [1] =>
>             [2] =>
>         )
>
> )
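Steve suggests two passes: one pattern to find the candidate tags, then a second regex over each match to break out name, type, and value. That might look roughly like the sketch below, which uses hypothetical names (`TAG_RE`, `ATTR_RE`, `form_fields`). Here `re.IGNORECASE` plays the role of the case-insensitive search he recommends. Treating an `input` with no `type` attribute as text is an assumption based on the HTML default; drop it if the page never omits `type`.

```python
import re

# Pass 1: candidate tags. [^>]* keeps each match inside one tag.
TAG_RE = re.compile(r'<(input|select)\b[^>]*>', re.IGNORECASE)

# Pass 2: every name=value pair in one tag, in any order, any quoting.
ATTR_RE = re.compile(
    r'''([a-z][\w:-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))''',
    re.IGNORECASE)


def form_fields(html):
    """Return (element, type, name, value) tuples; value only for hidden/submit."""
    fields = []
    for tag in TAG_RE.finditer(html):
        element = tag.group(1).lower()
        attrs = {}
        for m in ATTR_RE.finditer(tag.group(0)):
            val = next(g for g in m.groups()[1:] if g is not None)
            attrs.setdefault(m.group(1).lower(), val)  # first occurrence wins
        if element == 'select':
            kind = 'select'
        else:
            # Assumption: an input without type= counts as text (HTML default).
            kind = attrs.get('type', 'text').lower()
        if kind not in ('hidden', 'submit', 'text', 'select'):
            continue
        # Only hidden and submit inputs need their value.
        value = attrs.get('value') if kind in ('hidden', 'submit') else None
        fields.append((element, kind, attrs.get('name'), value))
    return fields
```

The attribute pass collects every pair into a dict. That makes attribute order and unrelated attributes irrelevant, which is what keeps the two-pass version easier to maintain than one large pattern.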
Steve wrote:
> well, that's not so good a start! i'll break out the old regex ide and fix
> that...if you want.
Captain Paralytic wrote:
> Of course, when I said: "Wouldn't it be good if php had the equivalent
> of getElementsByTagName()!", I meant that it would be good if its
> version was as tolerant as javascript's one.
>
>
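On Paul's point about tolerance, a parser does not have to build a tree to be useful here. As a swapped-in technique (not the PHP DOM extension discussed above), Python's standard `html.parser.HTMLParser` is event-based. It does not check that end tags match start tags, so unclosed or mis-nested markup does not stop it. `FieldCollector` is a hypothetical sketch built on it.

```python
from html.parser import HTMLParser


class FieldCollector(HTMLParser):
    """Hypothetical sketch: collect form fields from tag soup."""

    def __init__(self):
        super().__init__()
        self.fields = []  # (type, name, value) tuples

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names and strips the
        # quotes from values; a bare attribute's value is None.
        a = dict(attrs)
        if tag == 'select':
            self.fields.append(('select', a.get('name'), None))
        elif tag == 'input':
            kind = (a.get('type') or 'text').lower()
            if kind in ('hidden', 'submit'):
                self.fields.append((kind, a.get('name'), a.get('value')))
            elif kind == 'text':
                self.fields.append((kind, a.get('name'), None))


# Unclosed cells, a stray </i>, no </select>: the parser carries on regardless.
page = ('<table><tr><td><form><INPUT type=hidden name=sid value=42>'
        "<td><input type=\"text\" name='user'></form></tr>"
        '<select name=country><p>unclosed <b>bold</i>')
collector = FieldCollector()
collector.feed(page)
collector.close()
print(collector.fields)
```

Self-closing tags such as `<input ... />` need no extra code. `HTMLParser`'s default `handle_startendtag` forwards them to `handle_starttag`.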
"Jerry Stuckle" wrote in message
news:u9WdnU2yhZ2Q5o7anZ2dnUVZ_trinZ2d@comcast.com...
> If you have the time, I would appreciate it. Otherwise I can struggle
> through this myself :-)
Steve wrote:
> "Jerry Stuckle" wrote in message
> ok, here's the one to get the select:
>
> (select)\s*?[^n].*?(name)\s*?=\s*?(?:\'|")?([^\3>]*)?\3?\s*? [^>]
>
> here's the one to break out the inputs and capture each type, name,
> and value:
>
> (input)\s*?[^n].*?(?:(name|type|value)\s*?=\s*?(?:'|")?([^\2 >]*?)\2?(?:\s)?)*?>

Strange, when I tried to find this thread in OE instead of on Google, it was
up before a 2007-06-22 post instead of at the bottom of the list. Weird.

I have to say that, on the occasions when I ask for help on a REGEX and the
answer comes back from someone far more expert than me (doesn't take much),
looking as horrendous as Steve's solution (no offence meant), I feel sort of
vindicated. One always suspects that one's problem should be solvable with a
really short clever bit of regex. One forgets that the really clever stuff
looks like Steve's solution!
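A note on the select pattern quoted above. As posted, the opening quote sits in a non-capturing group `(?:\'|")`, so `\3` refers to group 3, the value itself. Also, inside a character class such as `[^\3>]`, PCRE reads `\3` as an octal character escape rather than a backreference. One way to reuse the quote is to capture it and refer back to it outside any class. A Python sketch follows; `SELECT_NAME` and `select_names` are hypothetical names.

```python
import re

# Group 1 captures the opening quote, or nothing if the value is bare.
# \1 then demands the same quote after the value; it sits outside any
# [...] class, where it really is a backreference. The lookahead requires
# the value to end at whitespace, / or >, which is what stops a bare value.
SELECT_NAME = re.compile(
    r'''<select\b[^>]*?\bname\s*=\s*(["']?)([^>]*?)\1(?=[\s/>])''',
    re.IGNORECASE)


def select_names(html):
    """Return the name of every <select> tag (hypothetical helper)."""
    return [m.group(2) for m in SELECT_NAME.finditer(html)]


print(select_names('<select name="a"><SELECT id=x name=\'b c\' size=1>'))  # ['a', 'b c']
```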
Re: Regex help
on 15.10.2007 22:47:22 by Jerry Stuckle
Steve wrote:
> ok, here's the one to get the select:
>
> (select)\s*?[^n].*?(name)\s*?=\s*?(?:\'|")?([^\3>]*)?\3?\s*? [^>]
>
> here's the one to break out the inputs and capture each type, name, and
> value:
>
> (input)\s*?[^n].*?(?:(name|type|value)\s*?=\s*?(?:'|")?([^\2 >]*?)\2?(?:\s)?)*?>
>
> the problem with this one though, is that it debugs fine in 'the regulator'
> regex ide. however, some of the captures are being overwritten under
> preg_match_all.
>
> the implementation would have been an array of these two patterns. preg
> should return the type (select or input)...from that point, you'd know where
> in the matches to find the type, name, and value regardless of the order in
> which it came. as it is, you can use $matches[0][...n] on the input pattern
> matches to iterate the full input match.
>
> hope that helps.
>
>
>
Paul Lautman wrote:
> Strange, when I tried to find this thread in OE instead of on Google, it was
> up before a 2007-06-22 post instead of at the bottom of the list. Weird.
>
> I have to say that, on the occasions when I ask for help on a REGEX and the
> answer comes back from someone far more expert than me (doesn't take much),
> looking as horrendous as Steve's solution (no offence meant), I feel sort of
> vindicated. One always suspects that one's problem should be solvable with a
> really short clever bit of regex. One forgets that the really clever stuff
> looks like Steve's solution!
>
>
>
>
Paul,
Yes, I had the same topic as an older thread, and evidently Giganews
attached my post to the thread.
Strange... I didn't even have the thread displayed (it was long since
read).
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
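Steve notes that some captures are overwritten under preg_match_all, and that has a general cause. A capturing group inside a repeated group, like `(name|type|value)` inside `(?:...)*?`, is a single slot, and PCRE keeps only what the last repetition matched. If, as I understand it, The Regulator runs on the .NET engine, that may explain why the pattern looked fine there: .NET also records every repetition in a group's Captures collection. Python's `re` behaves like PCRE here, so it can show both the effect and the usual fix, which is to match one attribute at a time and collect all the matches.

```python
import re

tag = '<input type="hidden" name="sid" value="42">'

# A capturing group inside a repeated group is one slot: every repetition
# overwrites it, so only the last attribute survives.
repeated = re.match(r'<input(?:\s+(\w+)="([^"]*)")*\s*>', tag)
print(repeated.groups())  # ('value', '42')

# Matching one attribute at a time and collecting every match keeps them all
# (in PHP, preg_match_all over the tag plays the role of findall).
pairs = re.findall(r'(\w+)="([^"]*)"', tag)
print(pairs)  # [('type', 'hidden'), ('name', 'sid'), ('value', '42')]
```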
Re: Regex help
on 15.10.2007 23:55:25 by Steve
"Paul Lautman" wrote in message
news:5nhpj9Fi9q28U1@mid.individual.net...
> Strange, when I tried to find this thread in OE instead of on Google, it
> was up before a 2007-06-22 post instead of at the bottom of the list.
> Weird.
>
> I have to say that, on the occasions when I ask for help on a REGEX and
> the answer comes back from someone far more expert than me (doesn't take
> much), looking as horrendous as Steve's solution (no offence meant), I
> feel sort of vindicated. One always suspects that one's problem should be
> solvable with a really short clever bit of regex. One forgets that the
> really clever stuff looks like Steve's solution!
keep in mind, please, that regex is another language in itself and is fully
capable of the same indentation and commenting as any other language. were i
to have known i was going to be graded, i may have done both.
if you are uncomfortable using regex, don't. as for the rest of us who
understand it, it is an irreplaceable tool.
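Steve's point that a regex can be laid out and commented like any other code is easy to show. PHP's PCRE functions accept the `x` modifier for this. The sketch below uses the Python equivalent, `re.VERBOSE`, under which whitespace in the pattern is ignored and `#` starts a comment. `INPUT_TAG` is a hypothetical name, and the pattern only picks out input tags of the three types asked about.

```python
import re

INPUT_TAG = re.compile(r'''
    <input\b                # the element name
    [^>]*?                  # other attributes before type=, same tag only
    \btype \s* = \s*        # the type attribute
    (["']?)                 # group 1: opening quote, if any
    (hidden|text|submit)\b  # group 2: the input types asked about
    \1                      # matching closing quote (empty if unquoted)
    [^>]* >                 # rest of the tag
''', re.IGNORECASE | re.VERBOSE)

page = '<input type="hidden" name=a><INPUT TYPE=text><input type=checkbox>'
print([m.group(2).lower() for m in INPUT_TAG.finditer(page)])  # ['hidden', 'text']
```

The `\b` after the type alternation stops an unquoted value that merely starts with one of the wanted words (say `type=textfield`) from being taken as a match.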
Re: Regex help
on 15.10.2007 23:56:15 by Steve
"Jerry Stuckle" wrote in message
news:bbudnS3Qb-29T47anZ2dnUVZ_uzinZ2d@comcast.com...
> Thanks much, Steve! I think I can make it from here.
On 15 Oct, 22:55, "Steve" wrote:
> "Paul Lautman" wrote in message
> news:5nhpj9Fi9q28U1@mid.individual.net...
> keep in mind, please, that regex is another language in itself and is fully
> capable of the same indentation and commenting as any other language. were i
> to have known i was going to be graded, i may have done both.
>
> if you are uncomfortable using regex, don't. as for the rest of us who
> understand it, it is an irreplaceable tool.
Read again, Steve. I was complimenting you on your skill, not deriding
your style.
Re: Regex help
on 16.10.2007 14:38:01 by Steve
"Captain Paralytic" wrote in message
news:1192526378.172236.167230@y27g2000pre.googlegroups.com...
On 15 Oct, 22:55, "Steve" wrote:
> "Paul Lautman" wrote in message
>
> news:5nhpj9Fi9q28U1@mid.individual.net...
>
>
>
>
>
> > Steve wrote:
> >> "Jerry Stuckle" wrote in message
> >> ok, here's the one to get the select:
>
> >> (select)\s*?[^n].*?(name)\s*?=\s*?(?:\'|")?([^\3>]*)?\3?\s*? [^>]
>
> >> here's the one to break out the inputs and capture each type, name,
> >> and value:
>
> >> (input)\s*?[^n].*?(?:(name|type|value)\s*?=\s*?(?:'|")?([^\2 >]*?)\2?(?:\s)?)*?>
>
> > Strange, when I tried to find this thread in OE instead of on Google, it
> > was up before a 2007-06-22 post instead of at the bottom of the list.
> > Wierd.
>
> > I have to say that, on the occations when I ask for help on a REGEX and
> > the answer comes back from someone far more expert than me (doesn't take
> > much), looking as horrendous as Steve's solution (no offence meant), I
> > feel sort of vindicated. One always suspects that one's problem should
> > be
> > solvable with a really short clever bit of regex. One forgets that the
> > really clever stuff looks like Steve's solution!
>
> keep in mind, please, that regex is another language in itself and is
> fully
> capable of the same indentation and commenting as any other language. were
> i
> to have known i was going to be graded, i may have done both.
>
> if you are uncomfortable using regex, don't. as for the rest of us who
> understand it, it is an irreplaceable tool.
Read again, Steve. I was complimenting you on your skill, not deriding
your style.
i know paul. there was no venom in my response either. just a nod that while
regex can be daunting, tasks can be accomplished in other ways if it doesn't
suit. anyway, i didn't take offense...hope you didn't either.
cheers
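A detail in the two patterns quoted above may account for some of the strange results: `[^\3>]` and `[^\2 >]` try to use a backreference inside a character class. In PCRE, and in Python's `re` used here for a runnable sketch, `\2` inside `[...]` is read as an octal escape for the control character U+0002, not as "whatever group 2 matched", so the class never excludes the closing quote. (In the input pattern the quote also sits in a non-capturing `(?:'|")`, so group 2 there is actually `(name|type|value)`.) The sketch below shows the effect on a simplified pattern and one workaround that gives each quoting style its own branch; `naive`, `ATTR_RE` and `parse_attrs` are hypothetical names.

```python
import re

# Inside [...], "\2" is an octal escape (the control character U+0002), not a
# backreference, so [^\2 >] excludes only U+0002, space and '>'. Here group 2
# is the opening quote, yet the closing quote is still swallowed.
naive = re.compile(r"""(name)\s*=\s*(["'])?([^\2 >]*)""")
print(naive.search('name="sid" value="42"').group(3))  # sid"

# Hypothetical replacement: give each quoting style its own branch, so no
# backreference is needed at all.
ATTR_RE = re.compile(
    r"""
    (?P<key> [a-z][\w-]* ) \s* = \s*
    (?: " (?P<dq> [^"]* ) "        # double-quoted value
      | ' (?P<sq> [^']* ) '        # single-quoted value
      | (?P<bare> [^\s"'>]+ )      # unquoted value
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_attrs(attr_text):
    """Hypothetical helper: map lower-cased attribute names to values."""
    attrs = {}
    for m in ATTR_RE.finditer(attr_text):
        # Exactly one of the three value groups took part in the match.
        value = next(v for v in (m.group("dq"), m.group("sq"), m.group("bare")) if v is not None)
        attrs[m.group("key").lower()] = value
    return attrs


print(parse_attrs("""type=hidden NAME="sid" value='4 2'"""))
# {'type': 'hidden', 'name': 'sid', 'value': '4 2'}
```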
Re: Regex help
am 16.10.2007 14:43:56 von Jerry Stuckle
Steve wrote:
> "Jerry Stuckle" wrote in message
> news:bbudnS3Qb-29T47anZ2dnUVZ_uzinZ2d@comcast.com...
>> Steve wrote:
>>> "Jerry Stuckle" wrote in message
>>> news:u9WdnU2yhZ2Q5o7anZ2dnUVZ_trinZ2d@comcast.com...
>>>> Steve wrote:
>>>>> "Jerry Stuckle" wrote in message
>>>>> news:K-qdnTSkY4NaoI7anZ2dnUVZ_j6dnZ2d@comcast.com...
>>>>>> Steve wrote:
>>>>>>> "Jerry Stuckle" wrote in message
>>>>>>> news:KaadnQnnGt0WT4_anZ2dnUVZ_tajnZ2d@comcast.com...
>>>>>>>> OK, I give up here. I am DEFINITELY not a Regex expert, and have
>>>>>>>> been working on this for hours with no luck.
>>>>>>>>
>>>>>>>> Basically I need to parse a page for certain information which will
>>>>>>>> be fed back into CURL to post to a site. I need to find four types
>>>>>>>> of tags on the page:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I don't need any other tags.
>>>>>>>>
>>>>>>>> From the hidden and submit types, I need name and value. From the
>>>>>>>> text and select types, I just need the name.
>>>>>>>>
>>>>>>>> I can assume the attributes will always show up in this order, but
>>>>>>>> there may be other things between the < and > delimiters.
>>>>>>>> Additionally, the actual type and name may have single or double
>>>>>>>> quotes around them, or neither.
>>>>>>>>
>>>>>>>> Does anyone have some code for this? It doesn't have to be all one
>>>>>>>> regex.
>>>>>>> alright, jer. let's see what we can do...
>>>>>>>
>>>>>>> here's an eyeballed attempt:
>>>>>>>
>>>>>>> <(select\s?[^> ].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit) \3[^>].*?)>
>>>>>>>
>>>>>>> to keep it easier, i'd think about using that to get your general
>>>>>>> matches. iterating through those, i'd apply another regex to break
>>>>>>> out the name, type, and value. you could very well catch it all in
>>>>>>> the above, however, it's not as straightforward and hence, not easily
>>>>>>> maintained. if you need additional help on writing this, let me know.
>>>>>>> i'll pseudo-code the whole enchilada if you want. this should be
>>>>>>> sufficient in getting only those tags you listed above...which is a
>>>>>>> good start.
>>>>>>>
>>>>>>> btw, make the search caseINsensitive.
>>>>>> Hi, Steve,
>>>>>>
>>>>>> Yep, it's a start. Some problems (output below), but I think it will
>>>>>> get me a little farther.
>>>>>>
>>>>>> And you're right, I already gave up on getting everything in one pass.
>>>>>> I was thinking of trying to just get everything for a single element
>>>>>> type (i.e. all elements), but this gives me
>>>>>> another idea, also.
>>>>>>
>>>>>> And the output from the first try:
>>>>>>
>>>>>> Array
>>>>>> (
>>>>>> [0] => Array
>>>>>> (
>>>>>> [0] =>
>>>>>> [1] =>
>>>>>> [2] =>
>>>>>> )
>>>>>>
>>>>>> [1] => Array
>>>>>> (
>>>>>> [0] => select n
>>>>>> [1] => select n
>>>>>> [2] => select n
>>>>>> )
>>>>>>
>>>>>> [2] => Array
>>>>>> (
>>>>>> [0] =>
>>>>>> [1] =>
>>>>>> [2] =>
>>>>>> )
>>>>>>
>>>>>> [3] => Array
>>>>>> (
>>>>>> [0] =>
>>>>>> [1] =>
>>>>>> [2] =>
>>>>>> )
>>>>>>
>>>>>> [4] => Array
>>>>>> (
>>>>>> [0] =>
>>>>>> [1] =>
>>>>>> [2] =>
>>>>>> )
>>>>>>
>>>>>> )
>>>>> well, that's not so good a start! i'll break out the old regex ide and
>>>>> fix that...if you want.
>>>> If you have the time, I would appreciate it. Otherwise I can struggle
>>>> through this myself :-)
>>> ok, here's the one to get the select:
>>>
>>> (select)\s*?[^n].*?(name)\s*?=\s*?(?:\'|")?([^\3>]*)?\3?\s*? [^>]
>>>
>>> here's the one to break out the inputs and capture each type, name, and
>>> value:
>>>
>>> (input)\s*?[^n].*?(?:(name|type|value)
>
> hey...did you notice this above? it should be [^ntv]
>
> that may account for some of the weirdness. ;^)
>
>
>
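Steve's two-pass suggestion (find the tags first, then break out name, type and value, matching case-insensitively) can be sketched as follows, again in Python's `re` as a stand-in for PHP's `preg_match_all`. The first lines reproduce the `select n` output above: `|` has the lowest precedence in a regex, so the `<` belongs only to the select branch and the `>` only to the input branch, and the lazy `.*?` ending the select branch is satisfied by an empty match. `TAG_RE`, `ATTR_RE`, `attributes`, `form_fields` and the sample page are hypothetical; the sketch assumes attribute values contain no `>` and applies the HTML rule that an `<input>` without a `type` is a text field.

```python
import re

# Why the first attempt printed "select n": '|' has the lowest precedence,
# so the leading '<' belongs only to the select branch and the trailing '>'
# only to the input branch. The lazy .*? ending the select branch is then
# satisfied by matching nothing at all. (Pattern as quoted in the thread.)
FIRST_TRY = re.compile(
    r"""<(select\s?[^> ].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit) \3[^>].*?)>"""
)
print(FIRST_TRY.search('<select name="color">').group(1))  # select n

# Pass 1 (hypothetical): find candidate tags. The alternation is grouped so
# '<' and '>' apply to both element types; IGNORECASE handles <INPUT>.
TAG_RE = re.compile(r"<(input|select)\b([^>]*)>", re.IGNORECASE)

# Pass 2 (hypothetical): name=value pairs with "double", 'single' or bare values.
ATTR_RE = re.compile(
    r"""([a-z][\w-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""",
    re.IGNORECASE,
)


def attributes(attr_text):
    """Map lower-cased attribute names to their (unquoted) values."""
    attrs = {}
    for key, dq, sq, bare in ATTR_RE.findall(attr_text):
        # findall yields '' for the two value branches that did not match.
        attrs[key.lower()] = dq or sq or bare
    return attrs


def form_fields(html):
    """Hypothetical helper returning (kind, name, value) tuples.

    hidden/submit inputs keep their value; text inputs and selects only
    need a name, so their value is None. Other input types are skipped.
    """
    fields = []
    for tag, attr_text in TAG_RE.findall(html):
        attrs = attributes(attr_text)
        if "name" not in attrs:
            continue
        if tag.lower() == "select":
            kind = "select"
        else:
            # An <input> without a type attribute is a text field in HTML.
            kind = attrs.get("type", "text").lower()
        if kind in ("hidden", "submit"):
            fields.append((kind, attrs["name"], attrs.get("value", "")))
        elif kind in ("text", "select"):
            fields.append((kind, attrs["name"], None))
    return fields


page = """<form method=post>
<INPUT TYPE='hidden' NAME=sid VALUE="abc123">
<input type="text" class="wide" name="user">
<select id=s name="color"><option>red</option></select>
<input type=checkbox name=remember>
<input type="submit" name="go" value="Log in">
</form>"""

for field in form_fields(page):
    print(field)
```

For pages that are less regular than this, an HTML parser (in PHP, `DOMDocument::loadHTML`) avoids most of the quoting and stray-`>` edge cases that the regex approach has to assume away.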
"Jerry Stuckle" wrote in message
news:NZqdncA6nNuKL4nanZ2dnUVZ_h_inZ2d@comcast.com...
> Yep, and I got it working. Thanks again!