Python: reading files in a directory

I have a .csv with 3000 rows of data in 2 columns like this:

uc007ayl.1  ENSMUSG00000041439
uc009mkn.1  ENSMUSG00000031708
uc009mkn.1  ENSMUSG00000035491

In another folder I have graphs with names like this:

uc007csg.1_nt_counts.txt
uc007gjg.1_nt_counts.txt

Note that the graph file names start with an identifier in the same format as my 1st column.

I am trying to use Python to identify the rows that have a graph and write the value from the 2nd column of each such row to a new .txt file.

This is the code I have so far:

import csv
with open("C:/*my dir*/UCSC to Ensembl.csv", "r") as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        print row[0]

But this is as far as I can get, and I am stuck.

asked Jul 31 '12 at 11:07

5 Answers

You're almost there:

import csv
import os.path
with open("C:/*my dir*/UCSC to Ensembl.csv", "rb") as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        graph_filename = os.path.join("C:/folder", row[0] + "_nt_counts.txt")
        if os.path.exists(graph_filename):
            print (row[1])

Note that the repeated calls to os.path.exists may slow down the process, especially if the directory lies on a remote filesystem and does not contain significantly more files than the number of lines in the CSV file. You may want to use os.listdir instead:

import csv
import os

graphs = set(os.listdir("C:/graph folder"))
with open("C:/*my dir*/UCSC to Ensembl.csv", "rb") as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        if row[0] + "_nt_counts.txt" in graphs:
            print (row[1])
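
For readers on Python 3, here is a minimal sketch of the same os.listdir idea that also writes the matches to a file, as the question asked. The function names names_with_graphs and write_names and the suffix parameter are made up for illustration, and the sketch assumes a comma-delimited CSV; it is not the answerer's code. On Python 3 the csv module expects a text-mode file opened with newline="" rather than "rb".

```python
import csv
import os


def names_with_graphs(csv_path, graph_dir, suffix="_nt_counts.txt"):
    """Return the 2nd-column values of rows whose 1st column has a graph file.

    Hypothetical helper: the name and the suffix default are assumptions.
    """
    # List the directory once; set membership tests are cheap afterwards.
    graphs = set(os.listdir(graph_dir))
    # Python 3: open in text mode with newline="" for the csv module.
    with open(csv_path, newline="") as f:
        return [row[1] for row in csv.reader(f)
                if len(row) >= 2 and row[0] + suffix in graphs]


def write_names(names, out_path):
    """Write one name per line to out_path (hypothetical helper)."""
    with open(out_path, "w") as out:
        for name in names:
            out.write(name + "\n")
```

It could be used as write_names(names_with_graphs(csv_path, graph_dir), "output.txt"), with the paths replaced by your own. If the graph files carry an extra extension, pass a different suffix.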

answered May 23 '17 at 13:05

@phihag I tried your code, but I get this error: 'import sitecustomize' failed; use -v for traceback - ivanhoifung

I think I found the error: I forgot to put .png after .txt, because those files are graphs. - ivanhoifung

The file should be opened in 'rb' mode on Python 2.x. You don't need to check every file; a single os.listdir() call is enough. - jfs

@J.F.Sebastian You're right about "rb", fixed. os.listdir is only faster if the number of entries in the CSV file is not significantly smaller than the number of files in the target directory. Added the alternative solution. - Phihag

First, try to see if print row[0] really gives the correct file identifier.

Second, concatenate the path to the files with row[0] and check whether this full path exists (that is, whether the file exists) with os.path.exists(path) (see the documentation at http://docs.python.org/library/os.path.html#os.path.exists ).

If it exists, you can write row[1] (the second column) to a new file with f2.write("%s\n" % row[1]) (first you have to open f2 for writing, of course).
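
The steps above can be sketched roughly as follows. This is only an illustration under assumptions: it targets Python 3, it assumes the CSV is comma-delimited and the graph files end in _nt_counts.txt as in the question, and the function name write_matches and its parameters are invented here.

```python
import csv
import os


def write_matches(csv_path, graph_dir, out_path, suffix="_nt_counts.txt"):
    """Write column 2 of each row whose graph file exists; return the count.

    Hypothetical helper: name, parameters and suffix default are assumptions.
    """
    written = 0
    with open(csv_path, newline="") as f, open(out_path, "w") as f2:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            # Build the full path of the graph that would belong to this row.
            path = os.path.join(graph_dir, row[0] + suffix)
            if os.path.exists(path):
                f2.write("%s\n" % row[1])
                written += 1
    return written
```

The return value is just a convenience for checking how many rows matched; the output file holds one second-column value per line.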

answered Jul 31 '12 at 11:07

Well, the next step would be to check whether the file exists. There are a few ways, but I like the EAFP approach.

import os

try:
    # Opening succeeds only if the graph file exists and is readable
    with open(os.path.join(the_dir, row[0] + '_nt_counts.txt')) as f:
        pass
except IOError:
    print('Oops no file')

the_dir is the directory where the files are.

answered Jul 31 '12 at 11:07

In this case, EAFP is not a good idea. There is a fundamental difference between a file being there and actually opening the file. Opening the file will usually trigger it actually being preread, requires file handle management (and locking on Windows), and requires the file to be readable by the current user. - Phihag

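result = open('result.txt', 'w')
for line in open('C:/*my dir*/UCSC to Ensembl.csv', 'r'):
    # strip the trailing newline so the last field is clean
    line = line.strip().split(',')
    try:
        # succeeds only if the graph file exists and is readable
        open('/path/to/dir/' + line[0] + '_nt_counts.txt', 'r')
    except IOError:
        continue
    else:
        result.write(line[1] + '\n')
result.close()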

answered Jul 31 '12 at 11:07

This implementation can leak file handles on some Python implementations, and requires the current user to be able to read the file. - Phihag
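
One way to avoid the leaked handles while keeping the open-and-see approach is to wrap every open call in a with block. The following is a sketch for Python 3, not the answerer's code; the function name copy_openable and its parameters are hypothetical, and it still requires the graph files to be readable by the current user.

```python
import os


def copy_openable(csv_path, graph_dir, out_path, suffix="_nt_counts.txt"):
    """Write column 2 of each row whose graph file can be opened.

    Hypothetical helper: name, parameters and suffix default are assumptions.
    """
    with open(csv_path) as src, open(out_path, "w") as result:
        for line in src:
            fields = line.rstrip("\r\n").split(",")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            try:
                # The with block closes the handle as soon as it is left.
                with open(os.path.join(graph_dir, fields[0] + suffix)):
                    pass
            except IOError:  # alias of OSError on Python 3
                continue
            result.write(fields[1] + "\n")
```

If the files only need to exist and never need to be read, os.path.exists or a single os.listdir call avoids opening them at all.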

import csv
import os

# get prefixes of all graphs in another directory
suff = '_nt_counts.txt'
graphs = set(fn[:-len(suff)] for fn in os.listdir('another dir') if fn.endswith(suff))

with open(r'c:\path to\file.csv', 'rb') as f:
    # extract 2nd column if the 1st one is a known graph prefix
    names = (row[1] for row in csv.reader(f, delimiter='\t') if row[0] in graphs)
    # write one name per line
    with open('output.txt', 'w') as output_file:
        for name in names:
            print >>output_file, name

answered Jul 31 '12 at 11:07
