Only part of a long string is written to the database with pandas write_frame

I'm writing a pandas DataFrame to a MySQL database. This is how it displays on the screen:

                IP                                              Agent
0  108.225.156.214  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) G...
1   125.214.169.32  Mozilla/5.0 (Symbian/3; Series60/5.3 NokiaN8-0...
2   125.214.169.32  Mozilla/5.0 (compatible; MSIE 9.0; Windows Pho...

I write the DataFrame, which contains user-agent strings, to the database as follows:

import MySQLdb
import pandas.io.sql as sql

db = MySQLdb.connect("host", "user", "", "db")
cursor = db.cursor()
cursor.execute("DROP TABLE IF EXISTS Pattern")

sql.write_frame(df, con=db, name='Pattern', flavor='mysql')
db.close()

The problem is that only the first part of each user-agent string is written to the database (just like what is displayed on the screen). How can I avoid this?

UPDATE

An example DataFrame:

df = pd.DataFrame({
    'IP': ['108.225.156.214', '141.0.8.111', '94.174.16.147'],
    'UserAgent': [
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13',
        'Mozilla/5.0 (Linux; Android 4.1.2; GT-I9300T Build/JZO54K) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.58 Mobile Safari/537.31',
        'Opera/9.80 (J2ME/MIDP; Opera Mini/4.4.28684/29.3530; U; en) Presto/2.8.119 Version/11.10',
    ],
})

asked Nov 27 '13 at 5:11

@alko: This DataFrame is constructed from an Apache log file. Example user agent: "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)"

@alko: Please refer to my update to the question.

1 Answer

Since you drop the table manually (the same behaviour can be achieved with the recreate=True parameter), the cause seems to lie in the CREATE TABLE statement, which in your case is generated as

CREATE TABLE pattern (
  `IP` VARCHAR (63),
  `UserAgent` VARCHAR (63)
);

Here, 63 is a hardcoded constant from pandas.io.sql.get_sqltype, so every string column is created as VARCHAR (63) regardless of how long the values are.
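To see which columns would be affected, it may help to compare the longest string in each column with that limit. The following is a minimal sketch in plain Python using the example data from the question. The names VARCHAR_LIMIT and max_lengths are hypothetical. It also assumes the MySQL server is not in strict SQL mode, so it cuts off over-long values instead of rejecting them.

```python
# Limit taken from the generated CREATE TABLE statement; treated here as an assumption.
VARCHAR_LIMIT = 63

# Example data from the question, as plain lists instead of a DataFrame.
data = {
    "IP": ["108.225.156.214", "141.0.8.111", "94.174.16.147"],
    "UserAgent": [
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 "
        "(KHTML, like Gecko) Version/6.0.4 Safari/536.29.13",
        "Mozilla/5.0 (Linux; Android 4.1.2; GT-I9300T Build/JZO54K) AppleWebKit/537.31 "
        "(KHTML, like Gecko) Chrome/26.0.1410.58 Mobile Safari/537.31",
        "Opera/9.80 (J2ME/MIDP; Opera Mini/4.4.28684/29.3530; U; en) "
        "Presto/2.8.119 Version/11.10",
    ],
}


def max_lengths(columns):
    """Return the length of the longest string in each column."""
    return {name: max(len(value) for value in values)
            for name, values in columns.items()}


lengths = max_lengths(data)
# Columns whose longest value would not fit into VARCHAR(63).
too_long = [name for name, length in lengths.items() if length > VARCHAR_LIMIT]
print(too_long)
```

For the example data only the UserAgent column exceeds the limit. The IP column, at 15 characters, fits comfortably.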

I think one solution would be to create the table in advance with the structure you need, for example

CREATE TABLE pattern (
  `IP` VARCHAR (15),
  `UserAgent` VARCHAR (1000));

and to empty it with delete from pattern rather than with a DROP statement (dropping the table is bad practice from a DBA's point of view).
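The sequence of steps (create the table once, empty it with DELETE, insert the rows again) could look roughly like the sketch below. It uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs without a server. With MySQLdb the placeholder would be %s rather than ?. SQLite does not enforce VARCHAR lengths, so the sketch shows only the workflow, not the truncation itself. The helper name reload_pattern is hypothetical.

```python
import sqlite3

# Two rows from the question's example data.
rows = [
    ("108.225.156.214",
     "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 "
     "(KHTML, like Gecko) Version/6.0.4 Safari/536.29.13"),
    ("141.0.8.111",
     "Mozilla/5.0 (Linux; Android 4.1.2; GT-I9300T Build/JZO54K) AppleWebKit/537.31 "
     "(KHTML, like Gecko) Chrome/26.0.1410.58 Mobile Safari/537.31"),
]


def reload_pattern(con, rows):
    """Empty the pre-created table and insert the rows again."""
    cur = con.cursor()
    # DELETE keeps the hand-written table definition and removes only the data.
    cur.execute("DELETE FROM pattern")
    cur.executemany("INSERT INTO pattern (IP, UserAgent) VALUES (?, ?)", rows)
    con.commit()


con = sqlite3.connect(":memory:")
# Created once, by hand, with a column wide enough for user-agent strings.
con.execute("CREATE TABLE pattern (IP VARCHAR(15), UserAgent VARCHAR(1000))")

reload_pattern(con, rows)
reload_pattern(con, rows)  # a second run replaces the rows instead of duplicating them

stored = con.execute("SELECT UserAgent FROM pattern ORDER BY IP").fetchall()
print(len(stored))
```

With pandas doing the insert instead, the write call would have to append to the existing table rather than create it. Whether and how that is possible depends on the pandas version in use.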

For future reference, here is the simple mock connection class I used to inspect the generated SQL:

class MockConnection(object):
    """Stand-in for a DB connection that records every statement it receives."""
    def __init__(self):
        self.query = []
    def executemany(self, *args):
        self.query.append(args)
    def cursor(self):
        # The connection doubles as its own cursor.
        return self
    def execute(self, *args):
        self.query.append(args)
    def close(self): pass
    def commit(self): pass

It is used as follows:

>>> con = MockConnection()
>>> pd.io.sql.write_frame(df, 'test', con, flavor ='mysql')
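After such a call, the recorded statements are available in con.query as tuples of the arguments passed to execute and executemany. The sketch below exercises the mock directly, without pandas, to show what ends up in that list. The SQL strings in it are made up for illustration and are not the exact statements pandas generates.

```python
class MockConnection(object):
    """Stand-in for a DB connection that records every statement it receives."""
    def __init__(self):
        self.query = []
    def executemany(self, *args):
        self.query.append(args)
    def cursor(self):
        # The connection doubles as its own cursor.
        return self
    def execute(self, *args):
        self.query.append(args)
    def close(self): pass
    def commit(self): pass


con = MockConnection()
cur = con.cursor()

# Illustrative statements; any code written against the DB-API would do the same.
cur.execute("CREATE TABLE test (IP VARCHAR (63))")
cur.executemany("INSERT INTO test (IP) VALUES (%s)", [("141.0.8.111",)])
con.commit()

# Each entry is the tuple of arguments; the SQL text comes first.
for args in con.query:
    print(args[0])
```

Because nothing is sent to a real server, this is a cheap way to check column types such as VARCHAR (63) before touching the database.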

answered Nov 27, 13:09

Yes, it always creates a VARCHAR (63) column for the user agent. - Nilani Algiriyage

@NilaniAlgiriyage Try creating the table manually, as proposed in my answer. - alko
