Select certain rows (condition met), but only some columns in Python/NumPy
Viewed 91,176 times
27
I have a NumPy array with 4 columns and want to select columns 1, 3 and 4 for the rows where the value of the second column meets a certain condition (i.e. equals a fixed value). I first tried to select only the rows, keeping all 4 columns, via:
I = A[A[:,1] == i]
which works. Then I tried the following (similar to MATLAB, which I know very well):
I = A[A[:,1] == i, [0,2,3]]
which doesn't work. How can I do this?
EXAMPLE DATA:
>>> import numpy as np
>>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
>>> print(A)
[[1 2 3 4]
 [6 1 3 4]
 [3 2 5 6]]
>>> i = 2
# I want to get columns 1, 3 and 4
# for every row which has the value i in the second column.
# In this case, this would be rows 1 and 3 with columns 1, 3 and 4:
[[1 3 4]
 [3 5 6]]
I am now currently using this:
I = A[A[:,1] == i]
I = I[:, [0,2,3]]
But I thought that there had to be a nicer way of doing it... (I am used to MATLAB)
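A note on why the single-bracket attempt fails: NumPy converts the boolean row mask to integer positions and then tries to pair them element-wise with the column list, whereas MATLAB takes the cross product of the two. The sketch below uses the example data from the question to show the error and one way to get the cross product in a single indexing expression, by turning the row positions into a column vector. The names `rows`, `cols` and `failed` are illustrative only.

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [6, 1, 3, 4], [3, 2, 5, 6]])
i = 2
cols = [0, 2, 3]

# The mask selects 2 rows and the list names 3 columns; NumPy tries to pair
# them element-wise and cannot broadcast shapes (2,) and (3,).
try:
    A[A[:, 1] == i, cols]
    failed = False
except IndexError:
    failed = True

# Row positions shaped (2, 1) broadcast against the (3,) column list,
# giving a (2, 3) result in one indexing step.
rows = np.flatnonzero(A[:, 1] == i)
I = A[rows[:, None], cols]
print(I)
```

This is the same broadcasting that `np.ix_` sets up, written out by hand.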
5 Answers
38
>>> import numpy as np
>>> a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
>>> a
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> a[a[:,0] > 3] # select rows where first column is greater than 3
array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> a[a[:,0] > 3][:,np.array([True, True, False, True])] # select columns
array([[ 5,  6,  8],
       [ 9, 10, 12]])
# fancier equivalent of the previous
>>> a[np.ix_(a[:,0] > 3, np.array([True, True, False, True]))]
array([[ 5,  6,  8],
       [ 9, 10, 12]])
For an explanation of the obscure np.ix_(), see https://stackoverflow.com/a/13599843/4323
Finally, we can simplify by giving the list of column numbers instead of the tedious boolean mask:
>>> a[np.ix_(a[:,0] > 3, (0,1,3))]
array([[ 5,  6,  8],
       [ 9, 10, 12]])
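Applied to the data from the question, the np.ix_() approach might look like the sketch below. It also prints the shapes of the two index arrays that np.ix_() builds, which is what makes the row and column selections combine as a cross product. The names `row_idx` and `col_idx` are illustrative only.

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [6, 1, 3, 4], [3, 2, 5, 6]])
i = 2

# np.ix_ turns the boolean mask into row positions shaped (2, 1) and the
# column tuple into an array shaped (1, 3), so they broadcast to (2, 3).
row_idx, col_idx = np.ix_(A[:, 1] == i, (0, 2, 3))
print(row_idx.shape, col_idx.shape)

# Rows where the second column equals i, restricted to columns 0, 2 and 3.
I = A[np.ix_(A[:, 1] == i, (0, 2, 3))]
print(I)
```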
answered May 23, 2017 at 13:05
So two consecutive selections really are necessary? - Tim
If you're wishing you could do a[x][y] where x and y are boolean masks, yeah, I wish that too, but it does not work. This seems to be a known problem, and I don't know why, but it's hardly important here. - Juan Zwinck
Not only that, I wished to be able to select the rows and columns in ONE single statement like this: A[row_indices_to_select, colum_indices_to_select], where row_indices_to_select would come from the condition I wanted to apply.. :( - Tim
I've added some more solutions--I like the last one using ix_() with a tuple. - Juan Zwinck
6
If you do not want to use boolean positions but the indexes, you can write it this way:
A[:, [0, 2, 3]][A[:, 1] == i]
Going back to your example:
>>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
>>> print(A)
[[1 2 3 4]
 [6 1 3 4]
 [3 2 5 6]]
>>> i = 2
>>> print(A[:, [0, 2, 3]][A[:, 1] == i])
[[1 3 4]
 [3 5 6]]
Sincerely,
answered May 28, 2014 at 14:05
Boolean positions actually are okay for me, I just would have wanted to do the selection in ONE step and not in two consecutive selections (which your solution is doing, isn't it?) because of performance reasons. - Tim
3
>>> a=np.array([[1,2,3], [1,3,4], [2,2,5]])
>>> a[a[:,0]==1][:,[0,1]]
array([[1, 2],
       [1, 3]])
answered Mar 15, 2016 at 04:03
1
This also works.
I = np.array([row[[x for x in range(A.shape[1]) if x != i-1]] for row in A if row[i-1] == i])
print(I)
Edit: Since indexing starts from 0, i-1 must be used.
answered May 28, 2014 at 14:05
The algorithm must be correct, but it is not very pythonic. - Taha
@Taha maybe not, but it saves you the double selection. The idea is actually simple: first choose the columns, then iterate over the rows. - genclik27
@genclik27 I understood what you did. But lately, I am doing some numerical computation with large matrices. I always was in need of vectorized calculations. The problem of what you are proposing is that you create a new list. You cannot change the values directly in the matrix this way. It is indeed useful if you don't need to change the values of A. - Taha
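The point about changing values in the matrix also applies to the two-step selection. As a rough sketch on the question's data: assigning through two chained index operations writes into a temporary copy, while assigning through a single np.ix_() index writes into the array itself. The names `B`, `C`, `mask`, `cols` and `unchanged` are illustrative only.

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [6, 1, 3, 4], [3, 2, 5, 6]])
mask = A[:, 1] == 2
cols = [0, 2, 3]

# Chained indexing: B[mask] is a copy, so the assignment never reaches B.
B = A.copy()
B[mask][:, cols] = 0
unchanged = np.array_equal(B, A)
print(unchanged)

# Single indexing expression: the assignment writes into C directly.
C = A.copy()
C[np.ix_(mask, cols)] = 0
print(C)
```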
1
I hope this answers your question. A snippet I have used with pandas is:
df_targetrows = df.loc[df[col2filter]*somecondition*, [col1,col2,...,coln]]
For example,
targets = stockdf.loc[stockdf['rtns'] > .04, ['symbol','date','rtns']]
This will return a dataframe with only the columns ['symbol','date','rtns'] from stockdf, where the row value of rtns satisfies stockdf['rtns'] > .04
Hope this helps.
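A self-contained version of the pandas example could look like the sketch below. The data, including the extra `volume` column, is made up purely for illustration; only the column names `symbol`, `date` and `rtns` and the `.loc` call come from the answer.

```python
import pandas as pd

# Made-up data for illustration only; column names follow the answer.
stockdf = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC"],
    "date": ["2014-12-01", "2014-12-01", "2014-12-02"],
    "rtns": [0.05, 0.01, 0.07],
    "volume": [100, 200, 300],
})

# Boolean row condition and list of column labels in a single .loc call.
targets = stockdf.loc[stockdf["rtns"] > 0.04, ["symbol", "date", "rtns"]]
print(targets)
```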
answered Dec 21, 2014 at 06:12
A[A[:,1] == i][0,2,3] didn't work either? - Aprillion
I = A[A[:,1] == i][0,2,3] --> IndexError: too many indices - tim
And apart from that, I have to admit that I wouldn't really understand that indexing either; it is very different from MATLAB... - tim
@tim: Could you please post the array and the output you expect? - Ankur Ankan
@Ankur Ankan: edited into the question. - tim