Outputting HTML safely using PHP

I have long used Stack Overflow to find solutions to my problems, so I have not needed to post a question until now. I searched for a way to output HTML code safely, and as many of you have answered, HTMLPurifier is the best solution around.

I find it hard to believe that this is the only way. Shouldn't PHP itself offer a way to clean input of XSS attacks while still outputting the data?

htmlentities, htmlspecialchars, and strip_tags are not the best candidates for this.

So, the question is: What is?

What I am trying to do is safely output users' HTML data stored in MySQL.

asked Aug 28, 2011 at 02:08

@afuzzyllama: What I am trying to do is safely output users' HTML data stored in MySQL.

Define "safely". Do you mean you want to clean it of certain tags? Do you want to escape it?

Typically, you should sufficiently sanitize input data rather than output data.

@adlawson And how do I sanitize input data containing tags like <script>?

strip_tags('<script>alert("Some very naughty script injection")</script>').
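
To make the suggestion above concrete, here is a minimal sketch of what strip_tags() does with that exact input, next to htmlspecialchars() for comparison. The variable names are only illustrative. Note that strip_tags() removes the tags but keeps the text between them.

```php
<?php
// Input taken from the comment above.
$input = '<script>alert("Some very naughty script injection")</script>';

// strip_tags() drops the <script> tags but leaves their text content in place.
$stripped = strip_tags($input);
echo $stripped, PHP_EOL;
// prints: alert("Some very naughty script injection")

// htmlspecialchars() keeps everything but turns it into inert text.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $escaped, PHP_EOL;
// prints: &lt;script&gt;alert(&quot;Some very naughty script injection&quot;)&lt;/script&gt;
```

Neither call keeps harmless formatting such as <b> while dropping dangerous markup, which is the gap the question is asking about.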

1 Answer

htmlentities works just fine in many cases. However, I believe the best method to prevent things like XSS is whitelisting acceptable characters. For example:

A person's name can have uppercase and lowercase letters, spaces, hyphens, and possibly apostrophes. So full names entered into your system must match the regex /^[a-z' -]+$/i (the hyphen goes last in the character class so that it is read as a literal hyphen rather than as a range).
Examples: Henry Smith, John O'Neil, Heather Fischer-Gardener.

An email address can contain uppercase and lowercase letters A-Z, digits, plus signs, dashes, periods, and the at symbol. So the regex for the email would be: /[a-z0-9-.+]@[a-z0-9-.]+/i.

You can expand this to fit any data input. Just think about which characters could legitimately be typed. The best part about this system is that you can allow inputs that match the regexes and record inputs that don't. You can then look at the log of blocked inputs and see whether you need to adjust the regexes to allow valid characters, or block users who are attempting to circumvent your security measures.
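
A minimal sketch of the allow-and-log idea described above. The helper name validate_whitelisted() and the in-memory $rejected array are assumptions made for illustration; a real system would more likely write rejected input to a file or a database table.

```php
<?php
// Hypothetical helper: returns true when $value matches $pattern,
// otherwise records the rejected value for later review and returns false.
function validate_whitelisted(string $field, string $value, string $pattern, array &$rejected): bool
{
    if (preg_match($pattern, $value) === 1) {
        return true;
    }
    $rejected[] = ['field' => $field, 'value' => $value];
    return false;
}

// Name pattern based on the one in the answer; the hyphen sits last in the
// character class so it is treated as a literal hyphen, not as a range.
$namePattern = "/^[a-z' -]+$/i";

$rejected = [];
$names = [
    'Henry Smith',
    "John O'Neil",
    'Heather Fischer-Gardener',
    '<script>alert(1)</script>',
    'José García',
];

foreach ($names as $name) {
    $ok = validate_whitelisted('name', $name, $namePattern, $rejected);
    echo ($ok ? 'accepted: ' : 'rejected: '), $name, PHP_EOL;
}
```

In this sketch the last two entries end up in $rejected: one is an injection attempt, the other is a legitimate name with diacritics, which is the kind of entry the log is meant to surface so the pattern can be widened.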

answered Aug 28, 2011 at 07:08

That's only for the extraordinarily narrow situation when you expect pure English input. "A person's name can have only alphanumeric characters + spaces, hyphen and apostrophe"? Really? - deceze ♦

Your email regex would match +@9. Don't validate email with regex.… - Adlawson

@deceze This isn't intended for broad inputs, only for limited inputs (usually text fields) where only certain characters are acceptable. And yes, I'm sure there are more characters that can be used in valid names. The beauty of this system is that it logs what it doesn't accept, so you can teach it more valid inputs. For example, if a user tried a name with diacritics, you would learn that they needed to be added as valid input. - Bailey Parker

@adlawson I wasn't intending for my email regex to be used in a real situation. There are better regexes out there for validating email, but as you mention, they might not be the way to go. You could use filter_var() with FILTER_VALIDATE_EMAIL. - Bailey Parker
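
A short sketch of the filter_var() approach mentioned in the comment above. The sample addresses are made up for illustration; filter_var() returns the filtered value on success and false on failure.

```php
<?php
// Illustrative inputs; '+@9' is the string from the earlier comment.
$candidates = ['henry.smith@example.com', '+@9', 'not an email'];

$results = [];
foreach ($candidates as $candidate) {
    // A strict comparison with false distinguishes failure from a valid address.
    $results[$candidate] = filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
    echo $candidate, ' => ', ($results[$candidate] ? 'valid' : 'invalid'), PHP_EOL;
}
```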

@ExoVillaro Well, that makes things slightly more complicated. Malicious JS could be embedded in a script tag or in any of the inline DOM event attributes. You could use an XML parser to remove every script tag and every on* attribute (such as onload, onclick, onkeyup) from user input. However, if you just want to let the user enter formatted text (bold, italics, etc.), consider using a formatting system (BBCode, or maybe something similar to what SO uses). This way the user won't be entering actual HTML that could contain XSS or other bad things. - Bailey Parker
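
A rough sketch of the parser-based idea in the comment above, using PHP's DOMDocument HTML parser rather than a strict XML parser, since user input is rarely well-formed XML. The function name strip_scripts_and_handlers() is hypothetical, and this is an illustration only, not a complete XSS filter: it does nothing about javascript: URLs, <iframe>, <style>, and similar vectors, which is why HTMLPurifier keeps being recommended.

```php
<?php
// Hypothetical helper illustrating the comment above. NOT a complete XSS filter.
function strip_scripts_and_handlers(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings about sloppy markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The leading processing instruction is a common way to make libxml read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Remove every <script> element (copy the list first, then mutate the tree).
    foreach (iterator_to_array($xpath->query('//script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    // Remove every attribute whose name starts with "on" (onclick, onload, ...).
    foreach ($xpath->query('//*') as $element) {
        foreach (iterator_to_array($element->attributes) as $attr) {
            if (stripos($attr->name, 'on') === 0) {
                $element->removeAttribute($attr->name);
            }
        }
    }

    // The parser wraps fragments in <html><body>; serialize only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$clean = strip_scripts_and_handlers(
    '<p onclick="steal()">Hello <b>world</b><script>alert(1)</script></p>'
);
echo $clean, PHP_EOL;
```

The formatting tags survive while the script element and the onclick attribute are dropped. A blacklist like this is easy to get wrong, so treat it as a demonstration of the mechanism rather than something to deploy.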
