Latin vs utf8 charset and index usage (MySQL 5.5)

My understanding of latin vs utf8 is as follows:

"latin supports only Latin characters (as in English), but utf8 supports international languages such as French, Chinese, Arabic, etc. (though not fully, since it uses at most 3 bytes per character while 4 bytes per character are needed to cover all international UTF-8 characters). By the standard, latin stores 1 character in 1 byte, while utf8 stores 1 character in 1-3 bytes. But if we store only latin characters in a utf8 column, each character will take 1 byte."

latin vs utf8 index: "A column value takes up bytes per character according to the column's charset, but an index always stores its value in bytes."

Could someone clarify the questions below? I would be very thankful.

Suppose there is a title varchar(250) column in a table with the utf8 charset, with an index on it created as ALTER TABLE mytable ADD INDEX (title(16));

If this column contains the string "This is my Title", which has 16 characters, all latin, then please clarify the following:

1) Since the string contains 16 characters and all of them are latin, does it take only 16 bytes of storage even though the table charset is utf8, or something else?

2) Is an index on 16 bytes sufficient to cover this 16-character string, or not?

Thanks,

Zafar

asked Feb 12 '14 at 07:02

2 Answers

1) Yes. 2) Yes.

Note that "latin" is not a character encoding. Encodings people usually call latin-something, like MySQL's "latin1," include characters that need 2 or 3 bytes when encoded in UTF-8. It's ASCII characters that can be stored with one byte in UTF-8.
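As a rough illustration of the point above, here is a small Python sketch that measures encoded byte lengths. It assumes Python's "cp1252" codec is a close enough stand-in for MySQL's latin1; the helper name byte_len is made up for this example, and the numbers describe encoded text only, not MySQL's on-disk row format.

```python
# Compare how many bytes the same text needs in different encodings.
# Assumption: Python's "cp1252" codec approximates MySQL's latin1.

def byte_len(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies once encoded."""
    return len(text.encode(encoding))

title = "This is my Title"          # the 16-character ASCII string from the question
title_utf8 = byte_len(title, "utf-8")

# A non-ASCII character that exists in latin1: 1 byte there, 2 bytes in UTF-8.
e_acute = (byte_len("é", "cp1252"), byte_len("é", "utf-8"))

# The euro sign exists in cp1252: 1 byte there, 3 bytes in UTF-8.
euro = (byte_len("€", "cp1252"), byte_len("€", "utf-8"))

print(title_utf8, e_acute, euro)
```

So the ASCII-only title needs one byte per character in UTF-8, while other characters from a latin-style encoding need more.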

answered Feb 12 '14 at 08:02

@Tim : Thanks for this explanatory answer. - Zafar Malik

1) latin1 (ISO-8859-1) characters can be more than 1 byte in utf8. If the characters are ASCII (as in your example string), then it would only need 1 byte for each character in utf8. If they're non-ASCII but still latin1, then more bytes would be needed.

2) Again, assuming the characters in the 16 byte string are always ASCII, then 16 bytes in the utf8 index would cover it. However, note that for indexes on a char/varchar/text column, the index length is in characters, not bytes. So (16) would mean that your index could be up to 48 bytes for utf8. Your column definition works the same way (so varchar(250) is 250 characters, which is up to 750 bytes for utf8).
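The arithmetic behind those figures can be sketched in a few lines of Python. The table of maximum bytes per character and the function name worst_case_bytes are illustrative assumptions for this example, not anything MySQL exposes.

```python
# Worst-case byte size of a length declared in characters.
# Assumption: maximum bytes per character for each MySQL charset name.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

def worst_case_bytes(char_count: int, charset: str) -> int:
    """Upper bound on bytes needed for `char_count` characters in `charset`."""
    return char_count * MAX_BYTES_PER_CHAR[charset]

index_prefix_bytes = worst_case_bytes(16, "utf8")    # index on title(16)
column_bytes = worst_case_bytes(250, "utf8")         # varchar(250)

print(index_prefix_bytes, column_bytes)
```

These are upper bounds; an ASCII-only value such as "This is my Title" still occupies only 16 bytes.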

Note that MySQL also supports the utf8mb4 encoding which is proper UTF-8 - i.e. characters can take up to 4 bytes to encode. However, if you use this and want longer indexes you'll need to mess around with table and row format/creation and InnoDB settings because indexes etc. will take up more than the standard 767 bytes (e.g. 250 character index would need space for 1000 bytes).
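To see why the 767-byte figure matters, here is a hedged Python sketch of the same calculation. The constant and function names are invented for illustration, and the limit is simply the value quoted above, not something read from a server.

```python
# How many characters fit in an index key under a byte limit.
# Assumption: 767 is the default key-length limit quoted in the answer.
KEY_LIMIT_BYTES = 767

def max_prefix_chars(bytes_per_char: int, limit: int = KEY_LIMIT_BYTES) -> int:
    """Largest character count whose worst-case size stays within `limit`."""
    return limit // bytes_per_char

def fits(char_count: int, bytes_per_char: int, limit: int = KEY_LIMIT_BYTES) -> bool:
    """True if `char_count` characters cannot exceed `limit` bytes."""
    return char_count * bytes_per_char <= limit

utf8_max = max_prefix_chars(3)        # 3-byte utf8
utf8mb4_max = max_prefix_chars(4)     # 4-byte utf8mb4
full_250_utf8 = fits(250, 3)          # 750 bytes
full_250_utf8mb4 = fits(250, 4)       # 1000 bytes

print(utf8_max, utf8mb4_max, full_250_utf8, full_250_utf8mb4)
```

Under these assumptions a 250-character index fits for utf8 but not for utf8mb4, which is why the answer mentions changing table and InnoDB settings.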

answered Feb 12 '14 at 09:02
