Is the hash code of Java Strings locale-independent?

Is Java's String.hashCode() completely independent of the Locale? In other words, if someone changes the default Locale, are we 100% sure that this will not affect the hash code?

We know that such changes do affect toUpperCase() and toLowerCase().

asked Aug 27 '11 at 23:08

@Bryan I saw that question, but it does not answer the problem I am raising. -

4 Answers

The locale does not affect a String's hash code (directly). The hash code is based solely on the characters stored in the String. The hashCode is generated by

char[] val;

for (int i = 0; i < len; i++) {
    h = 31*h + val[off++];
}

but the problem is how the String was produced. If it is, for example, the result of toUpperCase, which depends on the Locale, then the resulting String obviously depends on the Locale, and so does its hashCode.
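
To make that indirect dependency concrete, here is a minimal sketch (the class name LocaleCaseHashDemo, the helper upper and the sample word are made up for illustration). It upper-cases the same word under Turkish rules and under the root locale. Turkish case mapping turns 'i' into the dotted capital 'İ' (U+0130), so the two results are different Strings with different hash codes, even though hashCode() itself never consults a Locale.

```java
import java.util.Locale;

public class LocaleCaseHashDemo {

    // Illustrative helper: upper-case with an explicit locale instead of the default one.
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        String word = "title";

        String turkish = upper(word, Locale.forLanguageTag("tr")); // 'i' becomes U+0130
        String root = upper(word, Locale.ROOT);                    // 'i' becomes 'I'

        // The two results contain different characters, so they are not equal...
        System.out.println("equal strings: " + turkish.equals(root));
        // ...and their hash codes are computed from different character sequences.
        System.out.println("hash (tr):   " + turkish.hashCode());
        System.out.println("hash (root): " + root.hashCode());
    }
}
```

Passing an explicit Locale (for example Locale.ROOT) to toUpperCase and toLowerCase is one way to keep the resulting String, and therefore its hash code, independent of the default locale.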

answered Aug 28 '11 at 4:08

Good question. I ran a quick test, and it seems that changing the default locale (fortunately) does not change the hash code:

import java.util.Locale;

public class HashCodeTester {

    public static void main(String[] args) {

        String test = "test";
        int hashCode = test.hashCode();

        System.out.println("hashcode [" + hashCode + "] - locale [" + Locale.getDefault() + "]");

        Locale[] availableLocales = Locale.getAvailableLocales();
        for (int i = 0; i < availableLocales.length; i++) {
            Locale.setDefault(availableLocales[i]);
            System.out.println("hashcode [" + test.hashCode() + "] - locale [" + Locale.getDefault() + "]");
        }

    }
}

The output is

hashcode [3556498] - locale [en_IE]
hashcode [3556498] - locale [ja_JP]
hashcode [3556498] - locale [es_PE]
hashcode [3556498] - locale [en]
hashcode [3556498] - locale [ja_JP_JP]
hashcode [3556498] - locale [es_PA]
hashcode [3556498] - locale [sr_BA]
hashcode [3556498] - locale [mk]
hashcode [3556498] - locale [es_GT]
hashcode [3556498] - locale [ar_AE]
hashcode [3556498] - locale [no_NO]
hashcode [3556498] - locale [sq_AL]
hashcode [3556498] - locale [bg]
hashcode [3556498] - locale [ar_IQ]
hashcode [3556498] - locale [ar_YE]
hashcode [3556498] - locale [hu]
hashcode [3556498] - locale [pt_PT]
hashcode [3556498] - locale [el_CY]
hashcode [3556498] - locale [ar_QA]
hashcode [3556498] - locale [mk_MK]
hashcode [3556498] - locale [sv]
hashcode [3556498] - locale [de_CH]
hashcode [3556498] - locale [en_US]
hashcode [3556498] - locale [fi_FI]
hashcode [3556498] - locale [is]
hashcode [3556498] - locale [cs]
hashcode [3556498] - locale [en_MT]
hashcode [3556498] - locale [sl_SI]
hashcode [3556498] - locale [sk_SK]
hashcode [3556498] - locale [it]
hashcode [3556498] - locale [tr_TR]
hashcode [3556498] - locale [zh]
hashcode [3556498] - locale [th]
hashcode [3556498] - locale [ar_SA]
hashcode [3556498] - locale [no]
hashcode [3556498] - locale [en_GB]
hashcode [3556498] - locale [sr_CS]
hashcode [3556498] - locale [lt]
hashcode [3556498] - locale [ro]
hashcode [3556498] - locale [en_NZ]
hashcode [3556498] - locale [no_NO_NY]
hashcode [3556498] - locale [lt_LT]
hashcode [3556498] - locale [es_NI]
hashcode [3556498] - locale [nl]
hashcode [3556498] - locale [ga_IE]
hashcode [3556498] - locale [fr_BE]
hashcode [3556498] - locale [es_ES]
hashcode [3556498] - locale [ar_LB]
hashcode [3556498] - locale [ko]
hashcode [3556498] - locale [fr_CA]
hashcode [3556498] - locale [et_EE]
hashcode [3556498] - locale [ar_KW]
hashcode [3556498] - locale [sr_RS]
hashcode [3556498] - locale [es_US]
hashcode [3556498] - locale [es_MX]
hashcode [3556498] - locale [ar_SD]
hashcode [3556498] - locale [in_ID]
hashcode [3556498] - locale [ru]
hashcode [3556498] - locale [lv]
hashcode [3556498] - locale [es_UY]
hashcode [3556498] - locale [lv_LV]
hashcode [3556498] - locale [iw]
hashcode [3556498] - locale [pt_BR]
hashcode [3556498] - locale [ar_SY]
hashcode [3556498] - locale [hr]
hashcode [3556498] - locale [et]
hashcode [3556498] - locale [es_DO]
hashcode [3556498] - locale [fr_CH]
hashcode [3556498] - locale [hi_IN]
hashcode [3556498] - locale [es_VE]
hashcode [3556498] - locale [ar_BH]
hashcode [3556498] - locale [en_PH]
hashcode [3556498] - locale [ar_TN]
hashcode [3556498] - locale [fi]
hashcode [3556498] - locale [de_AT]
hashcode [3556498] - locale [es]
hashcode [3556498] - locale [nl_NL]
hashcode [3556498] - locale [es_EC]
hashcode [3556498] - locale [zh_TW]
hashcode [3556498] - locale [ar_JO]
hashcode [3556498] - locale [be]
hashcode [3556498] - locale [is_IS]
hashcode [3556498] - locale [es_CO]
hashcode [3556498] - locale [es_CR]
hashcode [3556498] - locale [es_CL]
hashcode [3556498] - locale [ar_EG]
hashcode [3556498] - locale [en_ZA]
hashcode [3556498] - locale [th_TH]
hashcode [3556498] - locale [el_GR]
hashcode [3556498] - locale [it_IT]
hashcode [3556498] - locale [ca]
hashcode [3556498] - locale [hu_HU]
hashcode [3556498] - locale [fr]
hashcode [3556498] - locale [en_IE]
hashcode [3556498] - locale [uk_UA]
hashcode [3556498] - locale [pl_PL]
hashcode [3556498] - locale [fr_LU]
hashcode [3556498] - locale [nl_BE]
hashcode [3556498] - locale [en_IN]
hashcode [3556498] - locale [ca_ES]
hashcode [3556498] - locale [ar_MA]
hashcode [3556498] - locale [es_BO]
hashcode [3556498] - locale [en_AU]
hashcode [3556498] - locale [sr]
hashcode [3556498] - locale [zh_SG]
hashcode [3556498] - locale [pt]
hashcode [3556498] - locale [uk]
hashcode [3556498] - locale [es_SV]
hashcode [3556498] - locale [ru_RU]
hashcode [3556498] - locale [ko_KR]
hashcode [3556498] - locale [vi]
hashcode [3556498] - locale [ar_DZ]
hashcode [3556498] - locale [vi_VN]
hashcode [3556498] - locale [sr_ME]
hashcode [3556498] - locale [sq]
hashcode [3556498] - locale [ar_LY]
hashcode [3556498] - locale [ar]
hashcode [3556498] - locale [zh_CN]
hashcode [3556498] - locale [be_BY]
hashcode [3556498] - locale [zh_HK]
hashcode [3556498] - locale [ja]
hashcode [3556498] - locale [iw_IL]
hashcode [3556498] - locale [bg_BG]
hashcode [3556498] - locale [in]
hashcode [3556498] - locale [mt_MT]
hashcode [3556498] - locale [es_PY]
hashcode [3556498] - locale [sl]
hashcode [3556498] - locale [fr_FR]
hashcode [3556498] - locale [cs_CZ]
hashcode [3556498] - locale [it_CH]
hashcode [3556498] - locale [ro_RO]
hashcode [3556498] - locale [es_PR]
hashcode [3556498] - locale [en_CA]
hashcode [3556498] - locale [de_DE]
hashcode [3556498] - locale [ga]
hashcode [3556498] - locale [de_LU]
hashcode [3556498] - locale [de]
hashcode [3556498] - locale [es_AR]
hashcode [3556498] - locale [sk]
hashcode [3556498] - locale [ms_MY]
hashcode [3556498] - locale [hr_HR]
hashcode [3556498] - locale [en_SG]
hashcode [3556498] - locale [da]
hashcode [3556498] - locale [mt]
hashcode [3556498] - locale [pl]
hashcode [3556498] - locale [ar_OM]
hashcode [3556498] - locale [tr]
hashcode [3556498] - locale [th_TH_TH]
hashcode [3556498] - locale [el]
hashcode [3556498] - locale [ms]
hashcode [3556498] - locale [sv_SE]
hashcode [3556498] - locale [da_DK]
hashcode [3556498] - locale [es_HN]

answered Aug 28 '11 at 4:08

I tried your code with special characters (accents, etc.), and the hash code does not change with the locale there either. - Jérôme Verstrynge

The hash code of a given String object does not depend on the locale. That should be obvious from the javadoc you linked.

However, any transformation that produces different characters in the string will result in a different (non-equal) String and, barring a collision, a different hash code. For example, translating a bunch of bytes to a String using a different default character encoding can result in different characters.
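
As a small sketch of that byte-decoding case (the class name CharsetHashDemo and the helper decode are made up for illustration, and the charsets are passed explicitly so the result does not depend on the platform default): the same two bytes decode to one character under UTF-8 and to two characters under ISO-8859-1, so the resulting Strings and their hash codes differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetHashDemo {

    // Illustrative helper: decode bytes with an explicit charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // These two bytes are the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9};

        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);        // one char: U+00E9
        String asLatin1 = decode(bytes, StandardCharsets.ISO_8859_1); // two chars: U+00C3 U+00A9

        System.out.println(asUtf8.length() + " char(s), hash " + asUtf8.hashCode());     // 1 char(s), hash 233
        System.out.println(asLatin1.length() + " char(s), hash " + asLatin1.hashCode()); // 2 char(s), hash 6214
    }
}
```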


In summary: changing the locale does not directly affect String hash codes, but it could cause your application to produce different String values, and THAT will affect their hash codes.

answered Aug 28 '11 at 4:08

+1 for explaining that. Can you give an example of what you said here: 'translating a bunch of bytes to a String using a different default character encoding can result in different characters'? - eon

@eon: the main problem is that if you choose the wrong encoding, the translation will either give you "random weird characters" or replace the untranslatable bytes with some character (e.g. '?') that denotes an unrecognized character. The actual behavior for unrecognized input is "unspecified" if you are using (for example) a String constructor to convert the bytes. - Stephen C

The documentation of the equals method in String clearly states that strings are equal only if they represent the same sequence of characters (i.e. no conversions are involved here).

While that does not by itself guarantee that the hash code uses no locale information (in general, it could), the implementation in the Oracle JVM looks like this:

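public int hashCode() {
    int h = hash;               // cached value; 0 means "not computed yet"
    int len = count;
    if (h == 0 && len > 0) {
        int off = offset;
        char val[] = value;

        // Polynomial hash over the characters only
        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;               // cache the result for later calls
    }
    return h;
}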

This uses only the characters and no locale information.
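
As a quick cross-check, the same loop can be reimplemented outside the JDK and compared against String.hashCode(). This is only a sketch (the class ManualHashDemo and the method manualHash are hypothetical names), and it relies on the formula that the String.hashCode() Javadoc documents, s[0]*31^(n-1) + ... + s[n-1] in int arithmetic. For "test" it yields 3556498, the value shown in the output of the test program in the other answer.

```java
public class ManualHashDemo {

    // Hypothetical reimplementation of the loop above; uses only the chars of s.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow wraps, as in String.hashCode()
        }
        return h;
    }

    public static void main(String[] args) {
        String test = "test";
        System.out.println(manualHash(test));                    // 3556498
        System.out.println(manualHash(test) == test.hashCode()); // true
    }
}
```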

answered Aug 28 '11 at 4:08
