php, mysql, mi memoria pierde

I didn't expect this script (throw-away) to be leaking and I haven't figured out what the culprit is. Can you spot anything? Although this is throw-away code, I'm concerned that I'll repeat this in the future. I've never had to manage memory in PHP, but with the number of rows in the db, it's blowing up my php instance (already upped the memory to 1Gb).

The california table is especially larger than the others (currently 2.2m rows, less as I delete duplicate rows). I get a memory error on line 31 ($row = mysql_fetch_assoc($res))

Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocat e 24 bytes) in C:\Documents and Settings\R\My Documents\My Webpages\cdiac\cdiac_ dup.php on line 31

PHP 5.3.0, mysql 5.1.36. part of a wamp install.

here's the entire code. the purpose of this script is to delete duplicate entries (data was acquired into segmented tables, which was far faster at the time, but now I have to merge those tables.)

what's causing it? something I'm overlooking? or do I just need to watch the memory size and call garbage collection manually when it gets big?


define('DBSERVER', 'localhost');
define('DBNAME', '---');
define('DBUSERNAME', '---');
define('DBPASSWORD', '---');

$dblink = mysql_connect(DBSERVER, DBUSERNAME, DBPASSWORD);
mysql_select_db(DBNAME, $dblink);

$state = "AL";
//if (isset($_GET['state'])) $state=mysql_real_escape_string($_GET['state']); 
if (isset($argv[1]) ) $state = $argv[1];

echo "Scanning $state\n\n";

// interate through listing of a state to check for duplicate entries (same station_id, year, month, day)
$DBTABLE = "cdiac_data_". $state;
$query = "select * from $DBTABLE ";
$query .= " order by station_id, year, month, day ";

$res = mysql_query($query) or die ("could not run query '$query': " . mysql_errno() . " " . mysql_error());

$last = "";
$i = 1;
$counter = 0;
echo ".\n";
while ($row = mysql_fetch_assoc($res)) {  
  $current = $row["station_id"] . "_" . $row["year"] . "_" . sprintf("%02d",$row["month"]) . "_" . sprintf("%02d",$row["day"]);
  echo str_repeat(chr(8), 80) . "$i  $current ";
  if ($last == $current) {
    //echo implode(', ', $row) . "\n";

    // merge $row and $prev_row
    // data_id  station_id, state_abbrev, year, month,  day,  TMIN, TMIN_flags, TMAX, TMAX_flags, PRCP, PRCP_flags, SNOW, SNOW_flags, SNWD, SNWD_flags

    printf("%-13s %8s %8s\n", "data_id:", $prev_row["data_id"], $row["data_id"]);
    if ($prev_row["data_id"] == $row["data_id"]) echo " + ";

    $set = "";
    if (!$prev_row["TMIN"] && $row["TMIN"])  $set .= "TMIN = " . $row["TMIN"] . ", ";
    if (!$prev_row["TMIN_flags"] && $row["TMIN_flags"])   $set .= "TMIN_flags = '" . $row["TMIN_flags"] . "', ";
    if (!$prev_row["TMAX"] && $row["TMAX"])   $set .= "TMAX = " . $row["TMAX"] . ", ";
    if (!$prev_row["TMAX_flags"] && $row["TMAX_flags"])   $set .= "TMAX_flags = '" . $row["TMAX_flags"] . "', ";
    if (!$prev_row["PRCP"] && $row["PRCP"])   $set .= "PRCP = " . $row["PRCP"] . ", ";
    if (!$prev_row["PRCP_flags"] && $row["PRCP_flags"])   $set .= "PRCP_flags = '" . $row["PRCP_flags"] . "', ";
    if (!$prev_row["SNOW"] && $row["SNOW"])   $set .= "SNOW = " . $row["SNOW"] . ", ";
    if (!$prev_row["SNOW_flags"] && $row["SNOW_flags"])   $set .= "SNOW_flags = '" . $row["SNOW_flags"] . "', ";
    if (!$prev_row["SNWD"] && $row["SNWD"])   $set .= "SNWD = " . $row["SNWD"] . ", ";
    if (!$prev_row["SNWD_flags"] && $row["SNWD_flags"])   $set .= "SNWD_flags = '" . $row["SNWD_flags"] . "', ";

    $delete = "";
    $update = "";
    if ($set = substr_replace( $set, "", -2 )) $update = "UPDATE $DBTABLE SET $set WHERE data_id=".$prev_row["data_id"]." and year=".$row["year"]." and month=".$row["month"]." and day=".$row["day"].";\n";
    if ($row["data_id"] != $prev_row["data_id"]) $delete = "delete from $DBTABLE where data_id=".$row["data_id"]." and year=".$row["year"]." and month=".$row["month"]." and day=".$row["day"].";\n\n";

    if ($update) {
      $r = mysql_query($update) or die ("could not run query '$update' \n".mysql_error());
    if ($delete) {
      $r = mysql_query($delete) or die ("could not run query '$delete' \n".mysql_error());

    //if ($counter++ > 5) exit(0);
  else {
    $last = $current;
    //copy $row to $prev_row
    foreach ($row as $key => $val) $prev_row[$key] = $val;


    echo "\n\nDONE\n"; 

preguntado el 27 de agosto de 11 a las 23:08

though it gets smaller as I delete things, my California table has 2,200,000+ rows. I run out of memory (set at 1Gb) around 1.7m rows. -

You should use one of the modern mysql extensions MySQLi or PDO_MYSQL. -

@kingcruch, I'm sure you're right. do you have any preference or defining reason for using one over the other? -

@fbas: mysqli is lower level and PDO is higher. mysqli allows more access to MySQL-specific features. PDO has a better interface and supports other DBs. See the question "mysqli or PDO - what are the pros and cons?". -

3 Respuestas

Intentaría dos cosas:

1) Instead of running the UPDATE and DELETE queries inside the loop using mysql_query, save them in a text file, to execute later. For example: file_put_contents('queries.sql', $update, FILE_APPEND );

2) Instead of doing everything inside the while ($row = mysql_fetch_assoc($res)) loop, first grab all SELECT query results, then close database connection freeing all database resources, including the query result. Only after this, perform the loop process.

If you run out of memory while storing the database results in one array, you can try saving the results in a temporary file instead (one record per line / FILE_APPEND), and then use this file in the loop (reading one line per record, using fgets función).

Respondido 28 ago 11, 04:08

I agree with J. Bruni, plus the last "else" block really smells. Why don't you just do $prev_row = $row instead of that foreach? I know this is not the cause of the memory leak anyway. - Sebastián Grignoli

@sebastian, I originally used $prev_row = $row, but when I got memory errors, that was the first thing I changed, wondering if that was the reason for the leak - so I was making a copy of the row rather than reference. It has no effect. - fbas

Work smarter, not harder:

SELECT station_id, year, month FROM table
    GROUP BY station_id, year, month
    HAVING COUNT(*) > 1

That'll get you all the station_id/year/month tuples that appear in the table more than once. Assuming that most of your data is not duplicates, that'll save you a lot of memory, since now you just have to go through these tuples and fix up the rows matching them.

Respondido 28 ago 11, 07:08

I tried something similar to that. Took forever. I cancelled it. - fbas

even with appropriate indexes? - glglgl

sorry, didn't realize my answer sounded so dismissive. your solution works. and doesn't gobble up memory because the result set is appropriately small. I'm still wondering if there's a way to release the memory glommed in the loop, but your query is good. - fbas

If you're using PHP 5.2, try upgrading to 5.3. It's got much better memory management. - user149341

it's 5.3.0, and I figure it's just an issue of having a large result set. it won't release it's memory. I guess the solution is to make smaller LIMIT queries of the larger set. this can be an issue if I'm editing rows in such a way that they will be re-ordered in subsequent queries. - fbas

I found this when trying to trace down a memory use problem on a script of mine. Having solved the issue for mine I thought it worth adding a reply here for the next person who comes along with the same issue.

I was using mysqli, but much the same applies for mysql.

The problem I found was the queries not freeing their results. The solution was to use mysqli_free_result() after executing the update and delete queries. But more importantly on the mysqli_query for the loop I used the extra parameter of *MYSQLI_USE_RESULT* . There are side effects of this, so use a separate connection for the update and delete queries.

contestado el 10 de mayo de 13 a las 18:05

No es la respuesta que estás buscando? Examinar otras preguntas etiquetadas or haz tu propia pregunta.