Character Entity References

on 30.03.2008 16:50:16 by tooheys

We had a consultant write some PHP software for us which in most
respects is working, but we have one issue we are having trouble
comprehending.

If the following company name "Jones & Jones" were entered into this
software, it appears in our database as {Jones &amp; Jones} (brackets
mine).

How do we characterize this, and what is {&amp;} called?

Thanks

Frank

Re: Character Entity References

on 30.03.2008 16:59:32 by Michael Fesser

.oO(ft310)

>We had a consultant write some PHP software for us which in most
>respects is working, but we have one issue we are having trouble
>comprehending.
>
>If the following company name "Jones & Jones" were entered into this
>software, it appears in our database as {Jones &amp; Jones} (brackets
>mine).
>
>How do we characterize this and what is {&amp;} called.

The value in the DB should be "Jones & Jones" and nothing else. The
escaping is taking place only when you output the name to an HTML page.

Actually the &amp; is called a named character reference (entity). Some
characters like the ampersand '&' for example have a special meaning in
HTML, which is why you have to write them as character references if you
want to output them as a literal string.
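
A minimal sketch of that output-time escaping, assuming the name comes out of the database as the raw string (the variable names are only illustrative):

```php
<?php
// Raw value, exactly as it should be stored in the database.
$companyName = 'Jones & Jones';

// Escape only at the moment the value is written into an HTML page.
// ENT_QUOTES also escapes quotes, which matters inside attribute values.
$html = '<td>' . htmlspecialchars($companyName, ENT_QUOTES, 'UTF-8') . '</td>';

echo $html, "\n"; // <td>Jones &amp; Jones</td>
```

The browser then renders the character reference as a plain ampersand, so the visitor sees "Jones & Jones".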

But what's the problem or the question?

Micha

Re: Character Entity References

on 30.03.2008 17:19:34 by Michael Fesser

.oO(Jerry Stuckle)

>It sounds like his problem is the consultant put the entity into the
>database instead of an ampersand.

Yep, I was about to say something like that (started with "incompetent"
and ended on "consultant"), but removed that line before posting. ;-)

But it shouldn't be too difficult to fix. The data can be fixed within
seconds on the DBMS command line with an UPDATE query and the output
scripts just need an htmlspecialchars() call here and there ...
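
A rough sketch of the data repair, done here in PHP rather than on the DBMS command line. The table and column names in the SQL comment are hypothetical, and the helper name decodeStoredValue() is made up for illustration:

```php
<?php
// One-off repair: turn values that were stored HTML-encoded back into raw data.
// A comparable SQL fix (hypothetical table/column names) might look like:
//   UPDATE companies SET name = REPLACE(name, '&amp;', '&');

function decodeStoredValue(string $stored): string
{
    // html_entity_decode() reverses htmlentities() / htmlspecialchars().
    return html_entity_decode($stored, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

$rows  = ['Jones &amp; Jones', '&lt;Acme&gt; Ltd', 'Plain Ltd'];
$fixed = array_map('decodeStoredValue', $rows);

print_r($fixed);
```

Run something like this once, and only after the input code has been fixed. Any value a user deliberately typed as "&amp;" would be decoded as well, so check the data before and after.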

Micha

Re: Character Entity References

on 30.03.2008 18:03:14 by George Maicovschi

The problem started with escaping the input data using htmlentities()
and from my point of view, escaping data before it goes to the DB is a
rather good thing not a bad one.

If the data displays right in the output of the script no worries
there, he decoded it with html_entity_decode().

Why do you guys say it's a lousy consultant because he escaped the
input? Should he have just made the insert with whatever data came to
him? I would like to hear a strong point of view on this matter, since
escaping inputs is in my opinion (as well in many other devs' opinion)
a very good programming practice and a must.

Re: Character Entity References

on 30.03.2008 18:09:44 by Jerry Stuckle

Michael Fesser wrote:
> .oO(ft310)
>
>> We had a consultant write some PHP software for us which in most
>> respects is working, but we have one issue we are having trouble
>> comprehending.
>>
>> If the following company name "Jones & Jones" were entered into this
>> software, it appears in our database as {Jones &amp; Jones} (brackets
>> mine).
>>
>> How do we characterize this and what is {&amp;} called.
>
> The value in the DB should be "Jones & Jones" and nothing else. The
> escaping is taking place only when you output the name to an HTML page.
>
> Actually the &amp; is called a named character reference (entity). Some
> characters like the ampersand '&' for example have a special meaning in
> HTML, which is why you have to write them as character references if you
> want to output them as a literal string.
>
> But what's the problem or the question?
>
> Micha
>

Micha,

It sounds like his problem is the consultant put the entity into the
database instead of an ampersand.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: Character Entity References

on 30.03.2008 18:27:50 by Michael Fesser

.oO(George Maicovschi)

>The problem started with escaping the input data using htmlentities()
>and from my point of view, escaping data before it goes to the DB is a
>rather good thing not a bad one.

Escaping yes, but not in this way. Data in a DB should never be stored
in an output-specific or media-dependent encoding, but in a raw format.
Pure data, nothing else. Just think about things like

* output to something else than HTML, for example a PDF or a plain text
newsletter
* a fulltext search

Both tasks will be almost impossible or at least much more complicated
with HTML data in the DB, but pretty easy to do with raw data.
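
To illustrate, a small sketch of what goes wrong with a search on encoded data, and how the raw value serves both plain text and HTML. The variable names are assumptions made for the example:

```php
<?php
$raw     = 'Jones & Jones';                          // stored as plain data
$encoded = htmlentities($raw, ENT_QUOTES, 'UTF-8');  // what got stored instead

$needle = 'Jones & Jones'; // what a user would type into a search box

var_dump(strpos($raw, $needle) !== false);     // bool(true)
var_dump(strpos($encoded, $needle) !== false); // bool(false)

// Plain-text output (e.g. a newsletter) uses the raw value untouched,
// while HTML output encodes it on the way out.
$textLine = "Customer: $raw";
$htmlLine = '<p>Customer: ' . htmlspecialchars($raw, ENT_QUOTES, 'UTF-8') . '</p>';
```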

>If the data displays right in the output of the script no worries
>there, he decoded it with html_entity_decode().

There's nothing to decode, but to _encode_ if - and only if - necessary.
The current encoding is _not_ necessary.

>Why do you guys say it's a lousy consultant because he escaped the
>input?

Because it's simply wrong and just shows that the consultant obviously
didn't really understand what escaping is for and where it has to be
used. Currently it's just the wrong method at the wrong place.

>Should he have just made the insert with whatever data came to
>him? I would like to hear a strong point of view on this matter, since
>escaping inputs is in my opinion (as well in many other devs' opinion)
>a very good programming practice and a must.

The escaping in this case doesn't prevent anything, but causes new
problems. Proper escaping for data that goes into a DB has to be done
with functions like mysql_real_escape_string() or prepared statements,
not with an HTML output(!) function.
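
A minimal sketch of the prepared-statement variant, using PDO. The table "companies", its columns, and both function names are hypothetical. Note that the old mysql_* functions, including mysql_real_escape_string(), were removed in PHP 7, so on current PHP versions PDO or mysqli is the way to go:

```php
<?php
// Sketch only: the table name "companies" and its columns are hypothetical.
function insertCompany(PDO $pdo, string $name): void
{
    // The placeholder keeps the value out of the SQL text entirely,
    // so no manual escaping (and no HTML encoding) is needed.
    $stmt = $pdo->prepare('INSERT INTO companies (name) VALUES (:name)');
    $stmt->execute([':name' => $name]);
}

function fetchCompanyNames(PDO $pdo): array
{
    // Returns the raw names exactly as they were inserted.
    return $pdo->query('SELECT name FROM companies ORDER BY id')
               ->fetchAll(PDO::FETCH_COLUMN);
}
```

Calling insertCompany($pdo, 'Jones & Jones') stores the literal ampersand. The HTML encoding then happens only in the output script.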

Micha

Re: Character Entity References

on 30.03.2008 18:33:49 by Jerry Stuckle

Michael Fesser wrote:
> .oO(Jerry Stuckle)
>
>> It sounds like his problem is the consultant put the entity into the
>> database instead of an ampersand.
>
> Yep, I was about to say something like that (started with "incompetent"
> and ended on "consultant"), but removed that line before posting. ;-)
>
> But it shouldn't be too difficult to fix. The data can be fixed within
> seconds on the DBMS command line with an UPDATE query and the output
> scripts just need an htmlspecialchars() call here and there ...
>
> Micha
>

Yep, but then he needs to get another consultant in to fix the code so it
doesn't continue to happen.

And if it's in the database incorrectly, there's probably more code
required to display it correctly on a web page.

BTW - part of the problem is there are too many incompetent
"consultants" preying on innocent clients who don't know how to get a
good consultant.



Re: Character Entity References

on 30.03.2008 18:58:01 by lingoboyd

Michael Fesser posted in comp.lang.php:

> .oO(George Maicovschi)
>
>>The problem started with escaping the input data using htmlentities()
>>and from my point of view, escaping data before it goes to the DB is a
>>rather good thing not a bad one.
>
> Escaping yes, but not in this way. Data in a DB should never be stored
> in an output-specific or media-dependent encoding, but in a raw format.
> Pure data, nothing else. Just think about things like
>
> * output to something else than HTML, for example a PDF or a plain text
> newsletter
> * a fulltext search
>
> Both tasks will be almost impossible or at least much more complicated
> with HTML data in the DB, but pretty easy to do with raw data.

You, Jerry, and others espouse this idea and I certainly understand the
merits. But it leaves me with a question.

How do you deal with display data that may be required in both HTML and/or
PDF? ie: italic word(s) within the data.

My current solution is storing the tags in the DB, but I don't really
like it for the very reasons you stated.


--
Mark A. Boyd
Keep-On-Learnin' :)

Re: Character Entity References

on 30.03.2008 19:09:25 by Jerry Stuckle

George Maicovschi wrote:
> The problem started with escaping the input data using htmlentities()
> and from my point of view, escaping data before it goes to the DB is a
> rather good thing not a bad one.
>

Definitely NOT. htmlentities() is a display attribute, and has no
business in a database.

Do you work for Jones & Jones?

> If the data displays right in the output of the script no worries
> there, he decoded it with html_entity_decode().
>

Not necessary if it's not encoded in the first place.

> Why do you guys say it's a lousy consultant because he escaped the
> input? Should he have just made the insert with whatever data came to
> him? I would like to hear a strong point of view on this matter, since
> escaping inputs is in my opinion (as well in many other devs' opinion)
> a very good programming practice and a must.
>

Because someone who does that does not understand programming and
databases and is totally incompetent.

What goes in the database is DATA. It should NEVER be mixed with
display-specific attributes.



Re: Character Entity References

on 30.03.2008 19:51:07 by George Maicovschi

On Mar 30, 7:27 pm, Michael Fesser wrote:
> .oO(George Maicovschi)
>
> >The problem started with escaping the input data using htmlentities()
> >and from my point of view, escaping data before it goes to the DB is a
> >rather good thing not a bad one.
>
> Escaping yes, but not in this way. Data in a DB should never be stored
> in an output-specific or media-dependent encoding, but in a raw format.
> Pure data, nothing else. Just think about things like
>
> * output to something else than HTML, for example a PDF or a plain text
> newsletter
> * a fulltext search
>
> Both tasks will be almost impossible or at least much more complicated
> with HTML data in the DB, but pretty easy to do with raw data.
>
> >If the data displays right in the output of the script no worries
> >there, he decoded it with html_entity_decode().
>
> There's nothing to decode, but to _encode_ if - and only if - necessary.
> The current encoding is _not_ necessary.
>
> >Why do you guys say it's a lousy consultant because he escaped the
> >input?
>
> Because it's simply wrong and just shows that the consultant obviously
> didn't really understand what escaping is for and where it has to be
> used. Currently it's just the wrong method at the wrong place.
>
> >Should he have just made the insert with whatever data came to
> >him? I would like to hear a strong point of view on this matter, since
> >escaping inputs is in my opinion (as well in many other devs' opinion)
> >a very good programming practice and a must.
>
> The escaping in this case doesn't prevent anything, but causes new
> problems. Proper escaping for data that goes into a DB has to be done
> with functions like mysql_real_escape_string() or prepared statements,
> not with an HTML output(!) function.
>
> Micha

You're kinda right here, I jumped to conclusions. But if you use it
only for outputting to HTML, then maybe htmlentities() is a good way to
go. I use it combined with other techniques in one of my projects
and there haven't been any problems so far.

Re: Character Entity References

on 30.03.2008 21:32:19 by Alexey Kulentsov

Mark A. Boyd wrote:
> How do you deal with display data that may be required in both HTML and/or
> PDF? ie: italic word(s) within the data.
>
> My current solution is storing the tags in the DB, but I don't really
> like it for the very reasons you stated.

Data must be in its native form for every medium it passes through. If a
medium has no native form for a given type of data, you have to develop
one or, better, use an already existing form. An ampersand has the native
form '&' in SQL and '&amp;' in HTML, so you need to convert it when you
change media. But an italic text fragment has no native form in SQL, so
you can use an existing form such as '<i>...</i>' or '<em>...</em>' and
convert it only when transferring the data between PDF and HTML; keeping
it in that form in the database is O.k.

Re: Character Entity References

on 30.03.2008 23:52:52 by Jerry Stuckle

Mark A. Boyd wrote:
> Michael Fesser posted in comp.lang.php:
>
>> .oO(George Maicovschi)
>>
>>> The problem started with escaping the input data using htmlentities()
>>> and from my point of view, escaping data before it goes to the DB is a
>>> rather good thing not a bad one.
>> Escaping yes, but not in this way. Data in a DB should never be stored
>> in an output-specific or media-dependent encoding, but in a raw format.
>> Pure data, nothing else. Just think about things like
>>
>> * output to something else than HTML, for example a PDF or a plain text
>> newsletter
>> * a fulltext search
>>
>> Both tasks will be almost impossible or at least much more complicated
>> with HTML data in the DB, but pretty easy to do with raw data.
>
> You, Jerry, and others espouse this idea and I certainly understand the
> merits. But it leaves me with a question.
>
> How do you deal with display data that may be required in both HTML and/or
> PDF? ie: italic word(s) within the data.
>
> My current solution is storing the tags in the DB, but I don't really
> like it for the very reasons you stated.
>
>

That's a bit more difficult, because you're talking about font
information vs. character encoding.

The problem here is also related to searching - for instance, if you have:

'John and Mary hosted a <i>New Year's Eve</i> party for their friends.'

Now - is someone ever going to want to search on "John and Mary hosted a
New Year's Eve party"? If so, they'll never find it in the database
because of the embedded <i> tags.

However, in this case there is no really good answer. You can put the
<i> tags in the text as above. You could have multiple rows, each with
its own font information in a separate column (no embedded font info,
but still can't easily search for phrases). You can keep all of the
font information in the column as above and add a second column for
searching with "sanitized" text (worst case, IMHO).

Depending on the overall needs, I'll generally pick one of the first
two. But neither is ideal.
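
For the search problem, a rough sketch of how "sanitized" text could be derived from the tagged text. It assumes the stored markup is limited to simple HTML-style tags such as <i>; the position of the italic span and the function name searchableText() are made up for illustration:

```php
<?php
// Assumes the stored markup consists only of simple HTML-style tags.
function searchableText(string $stored): string
{
    // strip_tags() removes the markup and leaves the plain words behind.
    return strip_tags($stored);
}

$stored = "John and Mary hosted a <i>New Year's Eve</i> party for their friends.";
$phrase = "John and Mary hosted a New Year's Eve party";

var_dump(strpos($stored, $phrase) !== false);                 // bool(false)
var_dump(strpos(searchableText($stored), $phrase) !== false); // bool(true)
```

strip_tags() can also swallow a literal "<" in ordinary text, so this only works if the stored text never contains one outside of a tag.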


Re: Character Entity References

on 31.03.2008 11:38:18 by Erwin Moller

Jerry Stuckle wrote:
> George Maicovschi wrote:
>> The problem started with escaping the input data using htmlentities()
>> and from my point of view, escaping data before it goes to the DB is a
>> rather good thing not a bad one.
>>
>
> Definitely NOT. htmlentities() is a display attribute, and has no
> business in a database.
>
> Do you work for Jones & Jones?

Do you work for Jones & Jones?
Lol. :-) :-)
Thanks for a good laugh at the start of the week. ;-)

Erwin