Excluding words with a forward slash in a Java regexp

I'm trying to allow only certain words through a regexp filter in Java, i.e.:

Pattern p = Pattern.compile("^[a-zA-Z0-9\\s\\.-_]{1," + s.length() + "}$");

But I find that it allows through 140km/h because forward slash isn't handled. Ideally, this word should not be allowed.

Can anyone suggest a fix to my current version?

I'm new to regexp and don't particularly follow it fully yet.

The regexp is in a utils class method as follows:

public static boolean checkStringAlphaNumericChars(String s) {
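   // Test for null before calling trim(); otherwise a null argument throws a NullPointerException.
   if ((s == null) || (s.trim().equals(""))) {
        return false;
   }
   s = s.trim();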

   Pattern p = Pattern.compile("^[a-zA-Z0-9\\s\\.-_]{1," + s.length() + "}$");
   // Pattern p = Pattern.compile("^[a-zA-Z0-9_\\s]{1," + s.length() + "}");
   Matcher m = p.matcher(s);
   if (m.matches()) {
       return true;
   }
   else {
       return false;
   }
}

I want to allow strings containing letters, digits, underscores, spaces, periods and minus signs, so that strings like 123.45 or -500.00 are accepted but 5,000.00 is not.

asked Aug 27 '11 at 15:08

There's really no need for this: {1," + s.length() + "}

So what might replace it to guarantee that each character of the string is checked correctly?

Why are you escaping the dot? Why aren't you using \w? Why are you specifying {1,? Why are you using the range of all code points from dot through underscore to specify those 50 code points? Why are you using the number of code units to specify code points? What do you do when those numbers mismatch? And so on. What are you trying to do in plain English, since we'll never figure it out from your pattern?

NullUserException is right, and you can replace it with the '+' quantifier (Kleene plus). Since your lower bound is currently 1, I'm assuming you don't want to allow zero-length strings (which you could do by using '*' instead).

Also, you don't need the '^' and '$' when you use the matches() method. It tests whether the input is completely matched by the regex. The anchors are useful when scanning progressively through a string with find().
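To illustrate the point about anchors, here is a minimal sketch comparing matches() with find(). The class and method names (MatchesVersusFind, wholeMatch, partialMatch) are made up for this example; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

final class MatchesVersusFind {
    // The same expression, once without and once with explicit anchors.
    static final Pattern UNANCHORED = Pattern.compile("[a-z]+");
    static final Pattern ANCHORED = Pattern.compile("^[a-z]+$");

    // matches() succeeds only if the pattern consumes the whole input.
    static boolean wholeMatch(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    // find() succeeds if any part of the input matches the pattern.
    static boolean partialMatch(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch(UNANCHORED, "abc123"));   // false
        System.out.println(partialMatch(UNANCHORED, "abc123")); // true
        System.out.println(partialMatch(ANCHORED, "abc123"));   // false
        System.out.println(wholeMatch(UNANCHORED, "abc"));      // true
    }
}
```

With matches() the anchors are redundant; with find() they are what turns a partial search into a whole-string check.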

2 Answers

Is it because the hyphen is second-to-last in your character class and therefore defines a range from '.' to '_', which includes '/'?

Try this:

Pattern p = Pattern.compile("^[a-zA-Z0-9\\s\\._-]+$");

Also, NullUserException is right that there is no need for {1," + s.length() + "}; replace it with +. Starting your expression with '^' and ending it with '$' ensures that the entire string is consumed.

Finally, you can use \w as a substitute for [a-zA-Z_0-9], simplifying your expression to "^[\\w\\s\\.-]+$"
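The effect of the hyphen's position can be checked directly. The following is only a sketch: the class name HyphenRangeDemo and the helper accepts are invented for this illustration, and the two patterns differ only in where the hyphen sits.

```java
import java.util.regex.Pattern;

final class HyphenRangeDemo {
    // As in the question: "\\.-_" is read as the range '.' (U+002E) to '_' (U+005F).
    static final Pattern RANGE = Pattern.compile("[a-zA-Z0-9\\s\\.-_]+");
    // With the hyphen last, it is a literal hyphen and no range is formed.
    static final Pattern LITERAL = Pattern.compile("[a-zA-Z0-9\\s._-]+");

    // True if the whole string consists of characters from the class.
    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts(RANGE, "140km/h"));   // true:  '/' (U+002F) lies inside the range
        System.out.println(accepts(LITERAL, "140km/h")); // false: '/' is not in the class
        System.out.println(accepts(RANGE, "-500.00"));   // false: '-' (U+002D) lies just below the range
        System.out.println(accepts(LITERAL, "-500.00")); // true:  '-' is now a literal member
    }
}
```

Note that the accidental range not only lets '/' through, it also leaves the minus sign itself out of the class, so -500.00 is rejected by the question's pattern.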

answered Aug 27 '11 at 20:08

I find it odd that if I remove the {1," + s.length() + "}, previously valid strings are now rejected. - Mr Morgan

If you find a [^\w\s.-], it invalidates the string. - tchrist

@Mr Morgan - Don't simply remove it, replace it with +. - erickson

You can use

public static boolean checkStringAlphaNumericChars(String s) { 
    return (s != null) && s.matches("[\\w\\s.-]+"); 
}
  • The short-circuited null check ensures s is not null when you call .matches() on it.
  • Use \w to match alphanumerics plus the underscore. tchrist will also be the first to point out this is more correct than [A-Za-z0-9_]
  • The + at the very end ensures you have at least one character (i.e. the string is not empty).
  • There is no need to use ^ and $, since .matches() tries to match the pattern against the whole string.
  • There's also no need to escape the dot (.) in a character class.
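As a quick check, the method can be run against the strings mentioned in the question. This is only a sketch: the wrapper class AlphaNumericCheck is an invented name, and the method body is the one given in this answer.

```java
final class AlphaNumericCheck {
    // Accepts non-null strings made only of word characters, whitespace, '.' and '-'.
    static boolean checkStringAlphaNumericChars(String s) {
        return (s != null) && s.matches("[\\w\\s.-]+");
    }

    public static void main(String[] args) {
        System.out.println(checkStringAlphaNumericChars("123.45"));   // true
        System.out.println(checkStringAlphaNumericChars("-500.00"));  // true
        System.out.println(checkStringAlphaNumericChars("5,000.00")); // false: comma not allowed
        System.out.println(checkStringAlphaNumericChars("140km/h"));  // false: slash not allowed
        System.out.println(checkStringAlphaNumericChars(""));         // false: + needs one character
        System.out.println(checkStringAlphaNumericChars(null));       // false: null check short-circuits
    }
}
```

One difference from the question's method: there is no trim() here, so a string consisting only of whitespace is accepted, because \s is part of the character class.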

New demo: http://ideone.com/qraob

answered Aug 27 '11 at 20:08

This is good. But could it be extended to include double-barrelled names like fitzwilliam-smythe, or to accept 5,000.00 as 5000.00? - Mr Morgan

@Mr Morgan: This already matches fitzwilliam-smythe, and you can just include the comma in the character class (e.g. [\\w\\s.,-]+) if you want to allow commas. - NullUserException

Checking whether a comma is being used as a thousands separator, and only then accepting it, would add unnecessary complexity to the regex IMO, but it can be done. - NullUserException
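For what it's worth, one way this could look is sketched below. It is not from the thread: the class ThousandsSeparatorSketch, its method names and the pattern itself are assumptions made for illustration, and it only covers plain decimal numbers, not general words.

```java
import java.util.regex.Pattern;

final class ThousandsSeparatorSketch {
    // Optional minus sign, then either correctly grouped digits (1,234,567)
    // or ungrouped digits (1234567), then an optional fractional part.
    static final Pattern NUMBER =
            Pattern.compile("-?(?:\\d{1,3}(?:,\\d{3})+|\\d+)(?:\\.\\d+)?");

    // True only when any commas present sit at valid thousands positions.
    static boolean isNumberWithOptionalThousands(String s) {
        return s != null && NUMBER.matcher(s).matches();
    }

    // Hypothetical helper: validates first, then removes the separators.
    static String stripThousandsSeparators(String s) {
        if (!isNumberWithOptionalThousands(s)) {
            throw new IllegalArgumentException("Not a valid number: " + s);
        }
        return s.replace(",", "");
    }

    public static void main(String[] args) {
        System.out.println(isNumberWithOptionalThousands("5,000.00")); // true
        System.out.println(isNumberWithOptionalThousands("50,00.00")); // false: bad grouping
        System.out.println(stripThousandsSeparators("5,000.00"));      // 5000.00
    }
}
```

The extra alternation is the added complexity mentioned above; if misplaced commas do not matter, adding the comma to the character class is much simpler.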

Thanks NullUserException. I hate regexp though! - Mr Morgan
