Change the numbers in the first column

I know sed or awk could perhaps tackle this kind of problem more elegantly, but I went the Python way. I would like to renumber the first column of my data file from 1 to the number of lines in the file. Is it a good idea to read the file with readlines()? Perhaps for small files, but not for large ones, I suppose. Here is what I came up with as a first attempt; any comments are appreciated.

#!/usr/bin/env python

import sys

try:
    infilename = sys.argv[1]; outfilename = sys.argv[2];
except:
    print "Usage is <script> inFile outFile"

ifile = open(infilename,'r')
ofile = open(outfilename, 'w')

lines = ifile.readlines();

i=1
for line in lines: 
    list = line.split();
    list[0] = i
    i += 1 
    for val in list:
        ofile.write("%d " % int(val))
    ofile.write('\n')
    del list

ifile.close()
ofile.close()

asked Jan 8 '11 at 23:01

Don't use semicolons to end statements. They are not required in Python and are generally considered bad practice (a statement is more cleanly ended by a line break). -

@tokland, :S right, I should improve on that :) -

5 Answers

You can iterate over the file to keep only the current line in memory:

#!/usr/bin/env python
import sys

try:
    # don't use ; !
    infilename = sys.argv[1]
    outfilename = sys.argv[2]
except IndexError:
    # exit here, otherwise the code below runs with undefined names
    sys.exit("Usage is <script> inFile outFile")


# you could use `with` here (Python 2.6+; opening both files in one `with` needs 2.7)
ifile = open(infilename,'r')
ofile = open(outfilename, 'w')

# no need to count yourself, enumerate does that
# plus when you iterate over a file you get lines too
for i, line in enumerate(ifile, start=1):
    # don't shadow builtins like `list`
    parts = line.split()
    parts[0] = i
    # join is the inverse function to split
    new_line = ' '.join("%d" % int(val) for val in parts)
    ofile.write(new_line + '\n')

ifile.close()
ofile.close()

@Umut Tabak: ("%d" % int(val) for val in parts) is a generator expression; generator expressions are something like lazy lists. It yields the same items as the list comprehension ["%d" % int(val) for val in parts] but without actually building the list.
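To make that difference concrete, here is a small sketch in Python 3 syntax; the sample row is made up, with its first column already replaced by an int as in the loop above:

```python
# Hypothetical row: first column already replaced by the counter (an int)
parts = [7, "2", "3"]

gen = ("%d" % int(val) for val in parts)   # lazy: nothing computed yet
lst = ["%d" % int(val) for val in parts]   # eager: a real list is built

print(type(gen).__name__)  # generator
print(type(lst).__name__)  # list

# str.join accepts any iterable, so both produce the same line
assert " ".join(gen) == " ".join(lst) == "7 2 3"

# A generator can only be consumed once; it is exhausted now
print(list(gen))  # []
```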

Btw, the for block can be written even shorter, but it behaves slightly differently because it no longer enforces that every value is an integer:

for i, line in enumerate(ifile, start=1):
    parts = line.split()
    parts[0] = "%d" % i
    new_line = ' '.join(parts)
    ofile.write(new_line + '\n')
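For testing, it can help to wrap that loop in a function that accepts any file-like objects. A minimal sketch, assuming Python 3 and using io.StringIO in place of real files; the name renumber is hypothetical:

```python
import io

def renumber(ifile, ofile, start=1):
    """Replace the first whitespace-separated field of each line with a counter.

    Reads one line at a time, so memory use does not grow with file size.
    Runs of whitespace are collapsed to a single space on output, and a
    blank line raises IndexError (parts[0] on an empty list).
    """
    for i, line in enumerate(ifile, start=start):
        parts = line.split()
        parts[0] = "%d" % i
        ofile.write(" ".join(parts) + "\n")

# Usage: in-memory files stand in for the real input and output files
src = io.StringIO("10 4 5\n20 6 7\n30 8 9\n")
dst = io.StringIO()
renumber(src, dst)
print(dst.getvalue())
```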

answered Jan 9 '11 at 03:01

"%d" % int(val) for val in parts: how is this interpreted? Is int(val) for val in parts a list comprehension? I guess not? - Umut Tabak

This will split on all whitespace but joins with just one space. Although I guess that probably doesn't matter. Could also use a csv.reader for more flexibility if the file format is well-specified. - Katriel
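A rough sketch of the csv.reader route, assuming Python 3 and a well-specified format with exactly one delimiter between fields. renumber_csv is a hypothetical name; for the question's space-separated file you would pass delimiter=" ", and with real files the csv documentation recommends opening them with newline="":

```python
import csv
import io

def renumber_csv(ifile, ofile, delimiter=","):
    """Renumber the first field of each record (assumes one delimiter per gap).

    The csv module handles quoted fields that contain the delimiter.
    """
    reader = csv.reader(ifile, delimiter=delimiter)
    writer = csv.writer(ofile, delimiter=delimiter, lineterminator="\n")
    # csv.reader yields [] for a blank line; drop those before numbering
    rows = (row for row in reader if row)
    for i, row in enumerate(rows, start=1):
        row[0] = str(i)
        writer.writerow(row)

# Usage: the quoted field "x, y" survives the round trip intact
src = io.StringIO('a,"x, y",1\nb,z,2\n\nc,w,3\n')
dst = io.StringIO()
renumber_csv(src, dst)
print(dst.getvalue())
```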

Don't do the readlines() at all, and instead:

for line in ifile: 

Also, avoid naming variables list. Since list() is a built-in, you're shadowing that name, which is poor practice.

There's no need to del a local variable as you've done with del list; Python's garbage collector takes care of that automatically. (In CPython, objects are freed by reference counting, which is deterministic.)
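A tiny sketch of why shadowing bites, assuming Python 3; the functions bad and good are hypothetical. Once a local is called list, the built-in can no longer be called in that scope:

```python
def bad(line):
    list = line.split()           # local `list` now hides the built-in type
    return list(reversed(list))   # TypeError: 'list' object is not callable

def good(line):
    parts = line.split()          # descriptive name; built-in `list` still works
    # no `del parts` needed: the local is released when the function returns
    return list(reversed(parts))

try:
    bad("1 2 3")
except TypeError as exc:
    print(exc)  # 'list' object is not callable

print(good("1 2 3"))  # ['3', '2', '1']
```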

answered Jan 9 '11 at 02:01

with open(infilename, 'r') as ifile:
    with open(outfilename, 'w') as ofile:
        # start at 1, as the question asks (enumerate defaults to 0)
        for nr, line in enumerate(ifile, 1):
            line = line.split()
            line[0] = str(nr)  # join() needs strings, not ints
            ofile.write(' '.join(line) + '\n')
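Since Python 2.7 the two nested with statements can be merged into one. A sketch in Python 3 syntax that uses a temporary directory so it runs anywhere; renumber_file and the file names are hypothetical:

```python
import os
import tempfile

def renumber_file(infilename, outfilename):
    # One `with` manages both files; both are closed even if an error occurs
    with open(infilename) as ifile, open(outfilename, "w") as ofile:
        for nr, line in enumerate(ifile, start=1):
            fields = line.split()
            fields[0] = str(nr)
            ofile.write(" ".join(fields) + "\n")

# Usage with throwaway files
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.txt")
    dst = os.path.join(tmp, "out.txt")
    with open(src, "w") as f:
        f.write("5 a b\n9 c d\n")
    renumber_file(src, dst)
    with open(dst) as f:
        result = f.read()

print(result)
```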

answered Jan 9 '11 at 03:01

I think you need a .split() somewhere. - Greg Hewgill

@Greg Hewgill thanks, I mistook the first column for the first character ;) - virhilo

what about without in-place statements? ofile.write(" ".join([nr]+line.split()[1:])) - tokland

but then you must use '+' on the lists which is bad;) - virhilo

@virhilo, there's nothing wrong with using + to concatenate two strings. It's the fastest way to do it. - aaronasterling

#!/usr/bin/env python
import sys

try:
    ifile = open(sys.argv[1], 'r')
    ofile = open(sys.argv[2], 'w+')
except:
    print "Usage is <script> inFile outFile"
else:
    for i, line in enumerate(ifile, start=1):
        items = [str(i)] + line.split()[1:]
        ofile.write(' '.join(items) + '\n')

    ifile.close()
    ofile.close()

There are a few points I'd like to discuss about my answer. The first is the try block, where I check that I can open the files. If no filenames are given, or if either file can't be opened, you'll get the usage message. You could of course break this up: first check that the arguments are present and print the usage message if they aren't, then try opening the files and report a file-opening failure if that goes wrong. Or you could catch specific exceptions and print a different message for each.
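One way that split could look, as a sketch assuming Python 3 (where IOError has been an alias of OSError since 3.3). main, USAGE and the exit codes 2 and 1 are hypothetical choices, not part of the answer:

```python
import sys

USAGE = "Usage is <script> inFile outFile"

def main(argv):
    """Renumber column 1 of argv[1] into argv[2]; return an exit status."""
    # Step 1: validate the arguments before touching the filesystem
    if len(argv) != 3:
        sys.stderr.write(USAGE + "\n")
        return 2
    infilename, outfilename = argv[1], argv[2]

    # Step 2: open and process the files; OSError covers "no such file",
    # "permission denied" and similar failures
    try:
        with open(infilename) as ifile, open(outfilename, "w") as ofile:
            for i, line in enumerate(ifile, start=1):
                items = [str(i)] + line.split()[1:]
                ofile.write(" ".join(items) + "\n")
    except OSError as exc:
        # exc.filename tells the user which of the two files failed
        sys.stderr.write("I/O error on %s: %s\n" % (exc.filename, exc.strerror))
        return 1
    return 0
```

From a script you would then call sys.exit(main(sys.argv)), so the shell sees a distinct status for bad arguments and for file errors.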

Next, enumerate is a convenient way to have the interpreter keep track of the index. In the loop itself, I combine the index (converted to a string) with a slice of the split line (everything but the first item). I then join those with a space and write the result followed by a newline.

This is clear and short.

answered Jan 9 '11 at 03:01

You don't need to split the whole line; just split off the first column:

for i,line in enumerate(ifile,1):
    first,remaining = line.split(' ',1)
    ofile.write("{0} {1}".format(i,remaining))

Also, your except needs to exit, or the rest of the file will run anyway.
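A variant of the split-once idea, sketched in Python 3 under the assumption that columns may be separated by tabs or by several spaces. Using split(None, 1) splits only at the first run of whitespace and leaves the rest of the line, including its newline, untouched; renumber_first_column is a hypothetical name:

```python
import io

def renumber_first_column(ifile, ofile):
    for i, line in enumerate(ifile, start=1):
        # split once, on the first run of any whitespace (spaces or tabs)
        fields = line.split(None, 1)
        if len(fields) == 2:
            ofile.write("%d %s" % (i, fields[1]))  # rest keeps its newline
        else:
            ofile.write("%d\n" % i)  # blank or single-column line

# Usage: a tab, a double space, and a single-column line
src = io.StringIO("7\tx y\n8  z\n9\n")
dst = io.StringIO()
renumber_first_column(src, dst)
print(dst.getvalue())
```

Unlike line.split(' ', 1), this does not raise ValueError on a line with only one column.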

answered Jan 9 '11 at 03:01
