python: regex only gets the last occurrence

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import re

text = "aaaa[ab][cd][ef]"

a = re.compile(r"^(\w+)(\[\w+\])*$").findall(text)

print(a)

I need all of them, but it returns:

[('aaaa', '[ef]')]


a = re.compile(r"\[\w+\]").findall(text)

I get all of them, but the first word is left out:

['[ab]', '[cd]', '[ef]']


asked Feb 01 '12 at 22:02

4 Answers

Here's how you can do it:

In [14]: a = re.compile(r"(\w+|\[\w+\])").findall(text)

In [15]: print a
['aaaa', '[ab]', '[cd]', '[ef]']

Each match returns one word (a run of \w characters), with or without brackets, so findall gives them back as separate list items.
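To see why this works, a minimal Python 3 sketch (the thread itself uses Python 2 `print`) that walks the individual matches with `re.finditer`. It also shows one assumption worth checking: because the pattern has no anchors, it does not verify the overall shape of the string, so a stray bare word is simply returned as another token.

```python
import re

# Same alternation as in the answer: a bare word OR a bracketed word.
token = re.compile(r"(\w+|\[\w+\])")

text = "aaaa[ab][cd][ef]"

# finditer yields each separate match; findall returns group 1 of each one.
for m in token.finditer(text):
    print(m.start(), m.group(1))

# No anchors, so the overall shape is not checked: a bare word
# after a bracketed group is accepted as just another token.
loose = token.findall("aaaa[ab]zz[cd]")
print(loose)
```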

answered Feb 02 '12 at 02:02

There is only one match: "^(\w+)" matches the "aaaa" part and "(\[\w+\])*$" matches the "[ab][cd][ef]" part. Note that you get a list with one element (which is a tuple), so there is only one match. Each pair of capturing parentheses in the regexp produces one element of the tuple, holding the text that matched whatever was inside them. There are two pairs, so the tuple has two elements. The second pair of parentheses is starred, but that only causes that group to be "assigned" multiple times, and each repetition overwrites the previous value, so only the last one ("[ef]") is kept. The star does not multiply the parentheses themselves, so you don't get a larger tuple.
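A small Python 3 sketch of the behaviour described above, using the question's own pattern with `re.match` so the capture groups can be inspected directly. The expected values in the comments follow from the positions in `"aaaa[ab][cd][ef]"`.

```python
import re

text = "aaaa[ab][cd][ef]"
m = re.match(r"^(\w+)(\[\w+\])*$", text)

# Group 2 participated three times, but each repetition overwrote it,
# so only the last capture survives.
print(m.groups())   # ('aaaa', '[ef]')
print(m.span(2))    # (12, 16): the position of '[ef]'

# findall returns one tuple per match, one element per capturing group.
print(re.findall(r"^(\w+)(\[\w+\])*$", text))  # [('aaaa', '[ef]')]
```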

I'm not sure what you expect, so I don't know what regexp to suggest.

answered Feb 02 '12 at 02:02

I'll do it in two steps, no problem :) Thanks for the info. - ZiTAL

Based on your comment on aix's answer, it appears that you want to require the non-bracketed part to match. Maybe something like this is what you are looking for?

>>> a = re.compile(r"^(\w+)((?:\[\w+\])*)").findall(text)
>>> print a
[('aaaa', '[ab][cd][ef]')]

If you need the result ['aaaa', '[ab]', '[cd]', '[ef]'] instead of what is shown above, here is one method:

>>> match = re.compile(r"^(\w+)((?:\[\w+\])*)").search(text)
>>> a = [match.group(1)] + match.group(2).replace("][", "] [").split()
>>> print a
['aaaa', '[ab]', '[cd]', '[ef]']
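A variation on the same idea, sketched in Python 3 (3.4+ for `fullmatch`): validate the whole string with the answer's pattern, then pull the bracketed parts out of group 2 with `findall` instead of the `replace`/`split` trick. The helper name `split_tokens` is hypothetical, and returning `None` for malformed input is an assumption about what you'd want.

```python
import re

# Pattern from the answer above, applied with fullmatch so the
# whole string must be: a word followed by zero or more [word] parts.
SHAPE = re.compile(r"(\w+)((?:\[\w+\])*)")
BRACKETED = re.compile(r"\[\w+\]")

def split_tokens(text):
    """Hypothetical helper: return [word, '[..]', ...] or None if text has the wrong shape."""
    m = SHAPE.fullmatch(text)
    if m is None:
        return None
    # Group 2 holds the whole run of bracketed parts; split it with findall.
    return [m.group(1)] + BRACKETED.findall(m.group(2))

print(split_tokens("aaaa[ab][cd][ef]"))
print(split_tokens("aaaa[ab]zz"))
```

Wrapping the repetition in a non-capturing group `(?:...)` inside a capturing group is what keeps the whole run, rather than only the last repetition.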

answered Feb 02 '12 at 02:02

In the end, I did it with this code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import re

text = "aaaa[ab][cd][ef]"

var = []
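# Only split the string if it has the expected shape: a word followed by [..] groups
if re.match(r"^(\w+)(\[\w+\])*$", text):
    a = re.findall(r"^\w+", text)[0]  # leading word: 'aaaa'
    b = re.findall(r"\[\w+\]", text)  # every bracketed group
    var.append(a)
    for i in b:
        var.append(i)
print(var)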


['aaaa', '[ab]', '[cd]', '[ef]']

All these solutions are great, thanks :)

answered Feb 02 '12 at 12:02
