JavaScript regex to remove illegal characters from a DOM ID

I have a number of DOM elements being dynamically created on a web page. Their IDs are generated from an external list and sometimes these names may contain illegal characters for an ID like "@" or "&".

I need to remove characters that do not match the following rules:

  • The string must begin with a letter
  • The first character may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".")

So, if the original string is:

99% of People are not the 1%

Then the resulting string with illegal characters removed would be:

ofPeoplearenotthe1

Can anyone help me to write the regex in Javascript that will remove characters from a string that do not follow the above requirements?

asked Mar 9 '12 at 14:03

You mean it should be ofPeoplearenotthe1? -

You're absolutely correct. I've updated the question. -

6 Answers

var str = "99% of People are not the 1%";
str = str.replace(/^[^a-z]+|[^\w:.-]+/gi, "");
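The pattern above can be read as two alternatives applied globally. As a rough sketch, here it is wrapped in a helper with each part commented; `sanitizeId` is a hypothetical name and not part of the original answer.

```javascript
// Hypothetical helper wrapping the regex from the answer above.
function sanitizeId(name) {
  return name.replace(
    //  ^[^a-z]+    one or more non-letters at the very start of the string
    //  [^\w:.-]+   any run of characters other than letters, digits, "_", ":", "." or "-"
    //  g = replace every match, i = let a-z also match A-Z
    /^[^a-z]+|[^\w:.-]+/gi,
    ""
  );
}

console.log(sanitizeId("99% of People are not the 1%")); // "ofPeoplearenotthe1"
console.log(sanitizeId("@user&name"));                   // "username"
```

Note that an input with no letters at all comes back as an empty string, which is not a usable ID, so callers may want to check for that case.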

answered Mar 9 '12 at 14:03

Note that IDs must also be unique. If you're removing the illegal characters to comply with standards, you will also need to maintain a list of "used" IDs, so that you can avoid collisions. - Matt
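One way to keep such a list of "used" IDs is sketched below. It is only an illustration: `usedIds` and `uniqueId` are hypothetical names, and the "-2", "-3" suffix scheme is an assumption, not something from this thread.

```javascript
// Hypothetical registry of IDs already handed out on this page.
const usedIds = new Set();

// Returns `base` if it is free, otherwise appends "-2", "-3", ... until unique.
function uniqueId(base) {
  let candidate = base;
  let counter = 2;
  while (usedIds.has(candidate)) {
    candidate = base + "-" + counter;
    counter += 1;
  }
  usedIds.add(candidate);
  return candidate;
}

console.log(uniqueId("ofthe1")); // "ofthe1"
console.log(uniqueId("ofthe1")); // "ofthe1-2"
console.log(uniqueId("ofthe1")); // "ofthe1-3"
```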

hello...can you provide the C# version of this regular expression please...?? - umair.ali

@umair.ali, it would be pretty much the same, and could be quoted like so: @"(?i:^[^a-z]+|[^\w:.-]+)" - Qtax

This does not appear to remove periods from the ID. I'm not sure whether a period is invalid per the HTML spec; however, it does prevent jQuery from accessing the element using the ID selector. I ended up using this: str.replace(/^[^a-z]+|[^\w]+/gi, "") - jeffryhouser
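If periods and colons should stay in the ID, an alternative is to escape them when building the selector, since "." and ":" have special meaning in CSS/jQuery selectors. A minimal sketch, where `idToSelector` is a hypothetical helper name:

```javascript
// Hypothetical helper: backslash-escape "." and ":" so the ID can be used in a selector.
function idToSelector(id) {
  return "#" + id.replace(/([.:])/g, "\\$1");
}

console.log(idToSelector("user.name:1")); // prints #user\.name\:1
```

The result can be passed to `$()` or `document.querySelector()`. `document.getElementById()` takes the raw ID and needs no escaping. Browsers also provide `CSS.escape()`, and jQuery 3.0+ provides `$.escapeSelector()`, which cover more characters than this sketch does.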

The accepted answer uses the i flag when it is not really needed and may unnecessarily increase the regex run time. A more specific (and thus more efficient) regex would be: str = str.replace(/^[^a-zA-Z]+|[^\w:.-]+/g, ""); - Nadav

The HTML5 specification has been updated, and according to it, id attributes can now contain literally any character in their value except whitespace.

When specified on HTML elements, the id attribute value must be unique amongst all the IDs in the element's tree and must contain at least one character. The value must not contain any ASCII whitespace.
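As a sketch, the two per-value rules in that passage (at least one character, no ASCII whitespace) could be checked like this. `isValidHtml5Id` is a hypothetical name; uniqueness within the tree would still have to be checked separately.

```javascript
// ASCII whitespace per the HTML spec: tab, line feed, form feed, carriage return, space.
const ASCII_WHITESPACE = /[\t\n\f\r ]/;

// Hypothetical helper: checks that the value is non-empty and has no ASCII whitespace.
function isValidHtml5Id(id) {
  return id.length > 0 && !ASCII_WHITESPACE.test(id);
}

console.log(isValidHtml5Id("99%ofPeople")); // true
console.log(isValidHtml5Id("99% of"));      // false
console.log(isValidHtml5Id(""));            // false
```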

I'm not sure at what point elements could be assigned two id attributes, nor what the reasoning for it was (perhaps a less mature understanding at the time), but that has been removed from the standard. This has been common knowledge in the web development community for years now.

answered Feb 18 '17 at 18:02

I think the "uniqueness" mentioned in the spec is not about a possible assignment of two IDs to one element, but about the requirement that the ID be unique within the DOM tree, so that it can serve its main purpose: helping to identify and reference elements. In most cases classes would be enough for that (and are mostly more flexible). But one example where IDs are still needed is when connecting form field inputs with labels via the label's "for" attribute: <input id="myUniqueId" /><label for="myUniqueId" /> - mwld

var id = "99% of People are not the 1%";
// The anchored alternative comes first so that a leading run such as "@9" is removed as a whole.
id = id.replace(/^[^a-z]+|[^a-z0-9\-_:.]/gi, "");


The idea is to remove one or more non-alphabetic characters at the beginning and then remove all other illegal characters in the remaining part of the string.
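The same idea can be written as two separate replacements, which makes the order of the steps explicit. This is only a sketch, and `cleanId` is a hypothetical name:

```javascript
// Hypothetical two-step version of the same idea.
function cleanId(name) {
  return name
    .replace(/^[^a-z]+/i, "")          // step 1: drop leading characters until the first letter
    .replace(/[^a-z0-9\-_:.]/gi, "");  // step 2: drop every remaining illegal character
}

console.log(cleanId("99% of People are not the 1%")); // "ofPeoplearenotthe1"
console.log(cleanId("@9abc"));                        // "abc"
```

Because step 2 never removes a letter, whatever step 1 leaves at the front of the string stays there, so a non-empty result always begins with a letter.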

One might ask what is the point of even having an id that is not known ahead of time and is dynamically generated based on content. You can't very well use it in CSS if it's based on some content that can change.

answered Mar 9 '12 at 14:03

This will output "9ofPeoplearenotthe1"; IDs can't start with a number. - Rocket Hazmat

@Rocket - you are too quick. It was already edited to correct that even before you posted your comment. - amigo00

I am really quick today, probably got something to do with all the caffeine. - Rocket Hazmat

@umair.ali - sorry, don't know c# - amigo00

If anybody needs this in Java:

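    // The ID must start with a letter, followed by letters, digits, "_", "-", ":" or ".".
    if (!htmlId.matches("[A-Za-z][\\w:.-]*")) {
        LOG.warn("html id " + htmlId + " is not valid, have to remove all invalid chars");

        // Drop leading non-letters, then any other character outside the allowed set.
        htmlId = htmlId.replaceAll("^[^A-Za-z]+|[^\\w:.-]+", "");
    }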

In my case I checked the String and replaced all invalid characters with an empty string. Thanks to Qtax.

answered Jan 4 '16 at 08:01

If you want something that is resistant to conflicts, try using btoa to convert the name to base64:

var badId1 = "99% of the 1%";
var badId2 = "999% of the 1%";
var validId1 = "ID_OTklIG9mIHRoZSAxJQ";
var validId2 = "ID_OTk5JSBvZiB0aGUgMSU";

// Strip only the "=" padding; slicing off a fixed two characters can drop real data.
var makeId = function(text) { return "ID_" + btoa(text).replace(/=+$/, ""); };


Notice how the two IDs generate different keys, where the regex trim would not.
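To see the difference concretely, the sketch below compares the two approaches on the same inputs. It assumes `btoa` is available (browsers and recent Node.js versions) and that the input contains only Latin-1 characters, since `btoa` throws on anything else. `regexId` and `base64Id` are hypothetical names. Base64 output can contain "+" and "/", which the rules in the question do not allow, so this version maps them to "-" and "_" and strips only the "=" padding.

```javascript
// Regex approach from the earlier answers: strips illegal characters.
function regexId(text) {
  return text.replace(/^[^a-z]+|[^\w:.-]+/gi, "");
}

// Base64 approach: encode, then make the output fit the ID rules in the question.
function base64Id(text) {
  return "ID_" + btoa(text)
    .replace(/=+$/, "")   // strip padding only
    .replace(/\+/g, "-")  // "+" is not allowed by the question's rules
    .replace(/\//g, "_"); // neither is "/"
}

console.log(regexId("99% of the 1%") === regexId("999% of the 1%"));   // true (collision)
console.log(base64Id("99% of the 1%") === base64Id("999% of the 1%")); // false
```

The trade-off is that the base64 IDs are no longer human-readable, which may matter when debugging markup.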

answered Feb 18 '17 at 18:02

As John mentioned, the HTML5 spec allows all characters for IDs except whitespace.

That means the following RegEx (in JavaScript) would be enough to follow the HTML5 spec:

let str = "99% of People are not the 1%";
str = str.replace(/\s+/g, "");
// "99%ofPeoplearenotthe1%"

answered May 5 '17 at 11:05
