Python split string on quotes

I'm a Python learner. If I have lines of text in a file that look like this:

"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"

Can I split the lines around the inverted commas? The only constant would be their position in the file relative to the data lines themselves. The data lines could range from 10 to 100+ characters (they'll be nested network folders). I cannot see any other markers I could split on, but my lack of Python knowledge is making this difficult. I've tried

optfile=line.split("")

and other variations, but I keep getting ValueError: empty separator. I can see why it's saying that; I just don't know how to change it. Any help is, as always, very much appreciated.

Many thanks

asked May 17, 2013 at 08:05

10 Answers

You have to escape the ":

input.split("\"")

results in

['\n',
 'Y:\\DATA\x0001\\SERVER\\DATA.TXT',
 ' ',
 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT',
 '\n']

To drop the resulting empty and whitespace-only items:

[line for line in [line.strip() for line in input.split("\"")] if line]

results in

['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']
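For what it's worth, the original line.split("") fails because the separator is an empty string, not because of the quotes in the data. A minimal sketch, assuming the line was read from a file, so it is written here as a raw string to keep the backslashes literal (the names line, error_message and parts are only for illustration):

```python
# Sample line, written as a raw string so the backslashes stay literal.
line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

# An empty separator is rejected outright.
try:
    line.split("")
except ValueError as exc:
    error_message = str(exc)

# Two equivalent ways to write a one-character string holding a double quote.
assert "\"" == '"'

# Split on the quote, then drop empty and whitespace-only pieces.
parts = [p for p in line.split('"') if p.strip()]
print(error_message)
print(parts)
```

Writing the separator as '"' inside single quotes avoids the backslash escape altogether.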

answered May 17, 2013 at 08:05

This one is smarter and possibly faster than regex extraction. With some polishing it could even split some "mixed CSV" (or space-separated values): 'not quoted "quoted token"' -> ['not', 'quoted', '"quoted token"'], while preserving the information about the presence of quotes. - tomasz gandor

I'll just add that if you were dealing with lines that look like they could be command line parameters, then you could possibly take advantage of the shlex module:

import shlex

with open('somefile') as fin:
    for line in fin:
        print(shlex.split(line))

Would give:

['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']

answered Sep 9, 2013 at 23:09

The shlex module has limits: its proper purpose is parsing [quoted] command line [arguments]. However, it is very easy to use and often gets you from A to B. If you want to keep the quotes around the quoted tokens, specify shlex.split(line, posix=False). This causes other possible problems (How"about"stray"quotes, huh?), but again: in some use cases it will do the trick. - tomasz gandor
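To make the posix=False remark concrete, here is a small sketch comparing the two modes on the line from the question. The variable names are made up for the example, and the last call is only there to show one of the limits mentioned above.

```python
import shlex

line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

# Default (POSIX) mode: the surrounding quotes are removed.
without_quotes = shlex.split(line)

# Non-POSIX mode: the quotes stay attached to each token.
with_quotes = shlex.split(line, posix=False)

# Caveat: in POSIX mode an unquoted backslash acts as an escape character
# and is consumed, so an unquoted Windows path loses its separators.
unquoted = shlex.split(r'Y:\DATA\FILE.TXT')

print(without_quotes)
print(with_quotes)
print(unquoted)
```

So for Windows paths, POSIX mode is only safe as long as every path is wrapped in double quotes.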

No regex, no split, just use csv.reader

import csv

sample_line = '10.0.0.1 foo "24/Sep/2015:01:08:16 +0800" www.google.com "GET /" -'

def main():
    for l in csv.reader([sample_line], delimiter=' ', quotechar='"'):
        print(l)

main()

The output is

['10.0.0.1', 'foo', '24/Sep/2015:01:08:16 +0800', 'www.google.com', 'GET /', '-']
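The same approach should work on the paths from the question. A minimal sketch, assuming the fields are separated by exactly one space; the sample line (with a space inside the first path) and the names rows and paths are only illustrative:

```python
import csv

line = r'"Y:\DATA\00001\SERVER\DATA MINER.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

# csv.reader accepts any iterable of strings, so a one-element list works.
# Space is the field delimiter; double quotes protect spaces inside a field.
rows = list(csv.reader([line], delimiter=' ', quotechar='"'))
paths = rows[0]
print(paths)
```

Note that with a space delimiter, several consecutive spaces between fields would show up as empty strings in the result.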

answered Sep 25, 2015 at 05:09

The shlex module can help you.

import shlex

my_string = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
shlex.split(my_string)

This will output

['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']

Reference: https://docs.python.org/2/library/shlex.html

answered Jun 18, 2016 at 11:06

Finding all regular expression matches will do it:

import re

input = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

re.findall('".+?"', input)  # or '"[^"]+"'

This will return the list of file names:

['"Y:\\DATA\\00001\\SERVER\\DATA.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"']

To get the file name without quotes use:

[f[1:-1] for f in re.findall('".+?"', input)]

or use re.finditer:

[f.group(1) for f in re.finditer('"(.+?)"', input)]

answered May 17, 2013 at 08:05

Nope, the output is ['"Y:\\DATA\x0001\\SERVER\\DATA.TXT" "V:\\DATA2\x0002\\SERVER2\\DATA2.TXT"'] (a single item) - nhahtdh

I first had the more complex '"[^"]+"', but edited a bit too much. - Tomas Jung

Note that your output still does not match what Python actually outputs. That would be quite misleading. - nhahtdh

What do you mean? I forgot to define the input as a raw string. Is there something else? - Tomas Jung

This is what Python outputs: ['"Y:\\DATA\\00001\\SERVER\\DATA MINER.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"'] Note the ' - nhahtdh
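The single-item result mentioned in the comments is what a greedy pattern produces. A short sketch of the difference, assuming the input is defined as a raw string (the names text, greedy, lazy and bare are just for the example):

```python
import re

text = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

greedy = re.findall(r'".+"', text)      # runs from the first quote to the last quote
lazy = re.findall(r'".+?"', text)       # stops at the nearest closing quote
bare = re.findall(r'"([^"]+)"', text)   # the capture group drops the quotes

print(len(greedy), len(lazy))
print(bare)
```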

The following code splits the line at each occurrence of the inverted comma character (") and removes empty strings and those consisting only of whitespace.

[s for s in line.split('"') if s.strip() != '']

There is no need to use regular expressions, an escape character, some module or assume a certain number of whitespace characters between the paths.

Test:

line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
output = [s for s in line.split('"') if s.strip() != '']
print(output)
>>> ['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']

answered Sep 14, 2018 at 10:09

I think what you want is to extract the file paths, which are separated by spaces. That is, you want to split the line into the items contained within quotation marks. I.e., with a line

"FILE PATH" "FILE PATH 2"

you want

["FILE PATH","FILE PATH 2"]

In that case:

import re
with open('file.txt') as f:
    for line in f:
        print(re.split(r'(?<=")\s(?=")',line))

With file.txt:

"Y:\DATA\00001\SERVER\DATA MINER.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"

Outputs:

>>> 
['"Y:\\DATA\\00001\\SERVER\\DATA MINER.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"']

answered May 17, 2013 at 08:05

File names could contain spaces. You would split them in two pieces. - Tomas Jung
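As far as I can tell, the lookbehind/lookahead pair means a space inside a path is left alone, since the split only happens on whitespace that has a quote on both sides. A sketch of that, with two changes that are my own assumptions rather than part of the answer: \s+ instead of \s so that several spaces between paths are tolerated, and strip() to drop the trailing newline of a file line.

```python
import re

# Three spaces between the paths and a trailing newline, as a file line might have.
line = r'"Y:\DATA\00001\SERVER\DATA MINER.TXT"   "V:\DATA2\00002\SERVER2\DATA2.TXT"' + '\n'

# Split only on whitespace that has a quote on both sides.
items = re.split(r'(?<=")\s+(?=")', line.strip())
print(items)
```

The quotes stay on each item; slice with item[1:-1] if they are not wanted.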

This was my solution. It parses most sane input exactly the same as if it was passed into the command line directly.

import re
def simpleParse(input_):
    def reduce_(quotes):
        return '' if quotes.group(0) == '"' else '"'
    rex = r'("[^"]*"(?:\s|$)|[^\s]+)'

    return [re.sub(r'"{1,2}',reduce_,z.strip()) for z in re.findall(rex,input_)]

Use case: Collecting a bunch of single shot scripts into a utility launcher without having to redo command input much.
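A quick usage sketch, with the function copied from above so the snippet runs on its own. The two sample inputs and the names plain and paths are made up for illustration.

```python
import re

def simpleParse(input_):
    # Copied from the answer above: a doubled quote collapses to one quote,
    # a lone quote is removed.
    def reduce_(quotes):
        return '' if quotes.group(0) == '"' else '"'
    rex = r'("[^"]*"(?:\s|$)|[^\s]+)'
    return [re.sub(r'"{1,2}', reduce_, z.strip()) for z in re.findall(rex, input_)]

plain = simpleParse('copy "my file.txt" backup')
paths = simpleParse(r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"')
print(plain)
print(paths)
```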

Edit: Got OCD about the stupid way that the command line handles crappy quoting and wrote the below:

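import re
# `trial` is the string to tokenize; this sample value is only an illustration.
trial = 'run "two words" last'
tokens = list()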
reading = False
qc = 0
lq = 0
begin = 0
for z in range(len(trial)):
    char = trial[z]
    if re.match(r'[^\s]', char):
        if not reading:
            reading = True
            begin = z
            if re.match(r'"', char):
                begin = z
                qc = 1
            else:
                begin = z - 1
                qc = 0
            lc = begin
        else:
            if re.match(r'"', char):
                qc = qc + 1
                lq = z
    elif reading and qc % 2 == 0:
        reading = False
        if lq == z - 1:
            tokens.append(trial[begin + 1: z - 1])
        else: 
            tokens.append(trial[begin + 1: z])
if reading:
    tokens.append(trial[begin + 1: len(trial) ])
tokens = [re.sub(r'"{1,2}',lambda y:'' if y.group(0) == '"' else '"', z) for z in tokens]

answered Sep 9, 2013 at 23:09

I know this got answered a million years ago, but this works too:

input = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
input = input.replace('" "','"').split('"')[1:-1]

Should output it as a list containing:

['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']

answered Sep 15, 2016 at 11:09

My question, Python - Error Caused by Space in argv Arument, was marked as a duplicate of this one. We have a number of Python books going back to Python 2.3. The oldest referred to using a list for argv, but with no example, so I changed things to:-

repoCmd = ['Purchaser.py', 'task', repoTask, LastDataPath]
SWCore.main(repoCmd)

and in SWCore to:-

sys.argv = args

The shlex module worked but I prefer this.

answered Sep 20, 2017 at 14:09

Thanks for answering, but this addresses your own issue (making sure you don't have the split problem in the first place), not the issue in this question. - Jean-Francois Fabre
