Django admin input causes UnicodeDecodeError, how?

Today I received data via the Django admin which couldn't be encoded. Somehow the encoding of the data is not in unicode. How is this possible?

I have a name property on my Client model which returns the data in unicode:

@property
def name(self):
    return u'{0} {1}'.format(self.firstname, self.lastname).strip()

But this doesn't work:

>>> client
<Client: [Bad Unicode data]>

>>> client.lastname
'Dani\xc3\xabl'

>>> client.lastname.__class__
<type 'str'>

>>> u"{0} {1}".format(client.firstname, client.lastname)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

Strangely enough, formatting the first/lastname as a regular string does work:

>>> "{0} {1}".format(client.firstname, client.lastname)
'Test Dani\xc3\xabl'

>>> "{0} {1}".format(client.firstname, client.lastname).decode('utf-8')
u'Test Dani\xebl'

What happened here? And how did this input get into my model via the admin?

System stack (it's an external server):

  • Debian 6.0.5 (Squeeze)
  • Django 1.4.1
  • Python 2.6.6
  • MySQL 5.1.49
  • MySQL-python == 1.2.2

This is the relevant model code:

class Client(models.Model):
    firstname = models.CharField(_("Firstname"), max_length=255)
    lastname = models.CharField(_("Lastname"), max_length=255)
    email = models.EmailField(_("Email"), unique=True, max_length=255)

    class Meta:
        db_table = u'clients'
        ordering = ('firstname', 'lastname', 'email')

    def __unicode__(self):
        return u'{0} <{1}>'.format(self.name, self.email)

    @property
    def name(self):
        return u'{0} {1}'.format(self.firstname, self.lastname).strip()

asked Sep 1 '12 at 12:09

Just to be sure, I take it that firstname and lastname are fields? Could you post the relevant model code? -

1 Answer

This is probably due to the collation you are using for your MySQL database.

Indeed, Django's behavior is to always return unicode strings when retrieving data from the database, which would work with your code, as there is nothing wrong with it.

However, as you can see in the Django documentation on database settings (section "Collation settings"), using MySQLdb version 1.2.2 with a utf8_bin-collated MySQL database means you get bytestrings, not unicode strings, when retrieving CharFields from the database.

You might want to investigate this issue (that is, check your MySQL collation settings), but it is likely that your problem is coming from there.
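To see why a bytestring breaks the name property: in Python 2, u'...'.format() implicitly decodes any byte string argument with the ASCII codec before inserting it. The sketch below makes that implicit step explicit, using the lastname value from the question. It assumes the stored bytes are UTF-8, which the question's own .decode('utf-8') output suggests.

```python
# -*- coding: utf-8 -*-
# The raw value MySQLdb handed back in the question (a byte string).
raw_lastname = b'Dani\xc3\xabl'

# Python 2 performs this ASCII decode implicitly inside u'...'.format().
# Byte 0xc3 is outside the ASCII range, so the decode fails.
try:
    raw_lastname.decode('ascii')
    ascii_error_position = None
except UnicodeDecodeError as exc:
    ascii_error_position = exc.start  # index of the first offending byte

# Decoding with the encoding the bytes are actually in works fine.
decoded_lastname = raw_lastname.decode('utf-8')
```

The failing position matches the traceback in the question ("byte 0xc3 in position 4").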

If this is the case, you will have to decode by hand any input that you get from MySQL. Alternatively, you could change the collation settings of your database.
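Decoding by hand could look roughly like this. It is only a sketch: to_unicode and full_name are hypothetical names, not part of Django or MySQLdb, and the UTF-8 default is an assumption based on the bytes shown in the question.

```python
# -*- coding: utf-8 -*-

def to_unicode(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, pass everything else through.

    The default encoding is an assumption; use whatever encoding the
    database connection actually delivers.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value


def full_name(firstname, lastname):
    # Same formatting as the name property in the question, but safe for
    # both bytestrings and unicode strings.
    return u'{0} {1}'.format(to_unicode(firstname), to_unicode(lastname)).strip()


# Values as they came back from the database in the question.
name = full_name(b'Test', b'Dani\xc3\xabl')
```

Inside the model, the property body would call to_unicode(self.firstname) and to_unicode(self.lastname) in the same way. Django 1.4 also ships django.utils.encoding.force_unicode, which does a similar job.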

You can use SHOW TABLE STATUS FROM %YOURDB% to get the collation of the tables in your database.


Excerpt from the relevant documentation section:

By default, with a UTF-8 database, MySQL will use the utf8_general_ci_swedish collation. This results in all string equality comparisons being done in a case-insensitive manner. That is, "Fred" and "freD" are considered equal at the database level. If you have a unique constraint on a field, it would be illegal to try to insert both "aa" and "AA" into the same column, since they compare as equal (and, hence, non-unique) with the default collation.

In many cases, this default will not be a problem. However, if you really want case-sensitive comparisons on a particular column or table, you would change the column or table to use the utf8_bin collation. The main thing to be aware of in this case is that if you are using MySQLdb 1.2.2, the database backend in Django will then return bytestrings (instead of unicode strings) for any character fields it receives from the database. This is a strong variation from Django's normal practice of always returning unicode strings.

answered Jun 20 '20 at 10:06

Ouch... this seems to be the case. Every attempt to change the connection OPTIONS has failed so far. The tables are indeed in latin1_swedish_ci collation. So far, however, I haven't been able to fix the output of MySQLdb/Django. - vdboor

@vdboor You can always change the collation of a table / database: dev.mysql.com/doc/refman/5.1/en/charset-table.html - Tomás Orozco
