Handling non-comma-delimited data in a txt file with PHP and MySQL

I have a substantial amount of data that needs to be imported into a MySQL database. I'm used to importing comma- and tab-delimited data, but this is different. I will try to explain the format in the simplest terms below.

So it could be:

XXX XXX XXX

or it could be:

XXX   X XXX

or it could be:

X     X  XX

I hope the spaces show correctly! If so, you'll see that each character has its own allocated position, rather than there being 3 sections delimited by spaces, tabs or commas. (They are technically delimited by spaces, but by differing numbers of them.)

What I need to do is say that characters 1, 2 and 3 go into field_1 in the DB. Character 4 is always blank, and characters 5, 6 and 7 go into field_2, etc. Essentially, each section must be in its own field.

Now, I suppose I could import each line as a single field into a temporary table, and then use SUBSTR() to split the data into the correct fields as described above.

But that seems a bit long-winded.

Is there a better way to do this? Ideally, I'd like to import it from the text file directly into the correct format in the DB, without any additional steps.

Many thanks

asked May 22 '12 at 14:05

You can use substr, but I wouldn't do it in pure SQL. Just read each line, use php.net/manual/en/function.substr.php to get your 3 variables, and insert. -

This is called "fixed-width data", btw, and is pretty common. See for example here: stackoverflow.com/questions/3876092/… -

Thanks Mellamokb, that's mega helpful! One of the reasons I posted the question here is that I didn't know what it was called, and thus was really struggling to search for anything useful on the subject. I've never come across it before. -

Nanne, this was one of my first thoughts but assumed it would be really resource hungry and potentially take longer? -

Compared to a loadfile kinda thing in mysql, sure, but if you're reading the file in PHP anyway (I mean, you've tagged the question php) I don't see the issue. You could always try it (it's not a big deal, you need 3 substr's, so it's not that much work) and see how fast it is :) -

2 Answers

I tried the PHP substr() route, and although it worked, looping through each row took a long time. Given that I have hundreds of thousands of records to process, I felt it was too slow.
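For reference, the substr() route might look roughly like the sketch below. It is not the exact code I ran: the FIELD_LAYOUT constant and the parse_fixed_width_line() function are hypothetical names, and the offsets assume the three-field layout from the question. Note that PHP's substr() counts from 0, whereas MySQL's SUBSTR() counts from 1.

```php
<?php
// Hypothetical layout: field name => [zero-based offset, length].
// The offsets mirror the question (characters 1-3, 5-7, 9-11).
const FIELD_LAYOUT = [
    'field_1' => [0, 3],
    'field_2' => [4, 3],
    'field_3' => [8, 3],
];

// Split one fixed-width line into an associative array of fields.
function parse_fixed_width_line(string $line, array $layout = FIELD_LAYOUT): array
{
    // Strip only the line ending, so padding spaces inside the line survive.
    $line = rtrim($line, "\r\n");

    $row = [];
    foreach ($layout as $name => [$offset, $length]) {
        // The cast covers older PHP versions, where substr() returns false
        // when the offset lies beyond the end of a short line.
        $row[$name] = (string) substr($line, $offset, $length);
    }
    return $row;
}
```

Each returned row would then be passed to an INSERT, one row at a time, which is the per-row loop that turned out to be slow on a file this size.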

As an alternative, I found this simple SQL solution, which processes the data very quickly:

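-- Read each line into the user variable @line, then slice it by position.
-- SUBSTR() positions are 1-based: SUBSTR(@line, 5, 3) is characters 5 to 7.
LOAD DATA LOCAL
    INFILE 'fixed-width-data.txt'
INTO TABLE
    my_table (@line)
SET
    field_1 = SUBSTR(@line, 1, 3),
    field_2 = SUBSTR(@line, 5, 3),
    field_3 = SUBSTR(@line, 9, 3);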

answered May 25 '12 at 13:05

substr() is one option, but regular expressions might be more elegant to work with. For your example where characters 1 through 3 are one field and 5 through 7 are another, you could do…

// Expects a 7-character line: 3 for field one, 1 filler, 3 for field two.
if (preg_match('/^(.{3}).(.{3})$/', $line_of_data, $matches)) {
    $field_one = $matches[1];
    $field_two = $matches[2];
}

This is obviously a simplified example, but I think that if you have many "fields" of data to work with, you'll find regular expressions a lot more pleasant to work with in the long run than calling substr() over and over.
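Extending the same idea to the full three-field layout from the question could look something like the sketch below. The LINE_PATTERN constant and the match_fixed_width_line() function are hypothetical names, and the pattern assumes every line is exactly 11 characters wide with single filler characters at positions 4 and 8.

```php
<?php
// Hypothetical pattern: three 3-character fields separated by one filler
// character each. Named groups keep the field names next to their widths.
const LINE_PATTERN = '/^(?<field_1>.{3}).(?<field_2>.{3}).(?<field_3>.{3})$/';

// Return the three fields of a line, or null if the line does not fit
// the assumed 11-character layout.
function match_fixed_width_line(string $line): ?array
{
    // Strip only the line ending, so padding spaces inside the line survive.
    $line = rtrim($line, "\r\n");

    if (preg_match(LINE_PATTERN, $line, $m) !== 1) {
        return null;
    }

    return [
        'field_1' => $m['field_1'],
        'field_2' => $m['field_2'],
        'field_3' => $m['field_3'],
    ];
}
```

Returning null for lines that do not match makes malformed rows easy to skip or log, which a chain of substr() calls would not flag by itself.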

answered May 22 '12 at 14:05

Thanks Garret. I keep meaning to use regular expressions more often, I think you're right. Not sure why your answer has been voted down - a comment would be useful, but I suspect it was unintentional? - user1100149

Who knows. But feel free to mete justice on the scoundrel by voting it back up or maybe even accepting it. =P - Garret Albright
