Keep a range in a data.frame (or table)

I would like to do this

set.seed(667)
df <- data.frame(a = sample(c(4, 7, 11, NA), 10, replace = TRUE),
                 b = sample(c(1, 2, 3, NA, 5, 6), 10, replace = TRUE),
                 c = sample(c(11, 12, 13, 14, 15, 16), 10, replace = TRUE))

but instead of getting this:

df
    a  b  c
1   4 NA 12
2   7  6 12
3  NA NA 14
4  11  1 16
5  NA  2 14
6  NA  3 13
7  11 NA 13
8  NA  6 15
9   7  3 16
10  7  5 16

I would like to get something like this, where some cells hold a range:

     a  b  c
1  4-7 NA 12
2  4-7  6 12
3   NA NA 14
4   11  1 16
5   NA  2 14
6   NA  3 13
7   11 NA 13
8   NA  6 15
9  4-7  3 16
10 4-7  5 16

I'm confused and tired and asking for help.

Update after reading SimonO101's comments at 2013-09-09 22:30:14Z

I think my question could also be stated like this: I would like this data frame

data.frame(A = c(4:7, 9),B = c(1,2))

to show up like this:

    A B
1 4:7 1
2   9 2

asked Sep 9 '13 at 23:09

Cannot do it. c(4:7, 9) is always the same as c(4, 5, 6, 7, 9). Once it is evaluated, the : function returns a vector of integers and leaves no trace of how that result was produced. I suppose you could pull out the contiguous sequences, but that does not appear to be your goal.
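A small base-R sketch of both points: first that c(4:7, 9) is indistinguishable from the spelled-out vector, then the "pull out the contiguous sequences" idea. The helper collapse_runs() is hypothetical (not part of any package); it only infers "4-7" style labels from runs of consecutive integers, it does not recover how the vector was originally written.

```r
# c(4:7, 9) evaluates to a plain numeric vector; the 4:7 structure is gone
x <- c(4:7, 9)
identical(x, c(4, 5, 6, 7, 9))  # TRUE

# Hypothetical helper: infer contiguous integer runs after the fact
collapse_runs <- function(x) {
  x <- sort(unique(x))                # sort() also drops NA values
  run_id <- cumsum(c(1, diff(x) != 1)) # a new run starts where the gap is not 1
  labels <- vapply(split(x, run_id), function(r) {
    if (length(r) == 1) as.character(r) else paste0(r[1], "-", r[length(r)])
  }, character(1))
  unname(labels)
}

collapse_runs(x)  # "4-7" "9"
```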

3 Answers

Maybe you want this?

library(data.table)

d = data.table(A = list(c(4,7), 9),B = c(1,2))
#     A B
#1: 4,7 1
#2:   9 2
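For comparison, a base-R sketch of the same list-column idea without data.table (variable name d reused for illustration only): wrapping the list in I() stops data.frame() from trying to spread it over several columns, so each cell of A can hold a whole vector.

```r
# Base R: a list column whose cells hold either a range (4:7) or a single value (9)
d <- data.frame(A = I(list(4:7, 9)), B = c(1, 2))

# Each cell keeps the full vector, so the values behind the range are still there
d$A[[1]]

# Per-row summaries, e.g. the smallest and largest value each cell covers
sapply(d$A, min)
sapply(d$A, max)
```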

One more possibility is to store the unevaluated expression (it's really not clear what OP wants, so I'm just shooting in the dark here):

d = data.table(A = list(quote(4:7), 9), B = c(1,2))
#        A B
#1: <call> 1
#2:      9 2
d[,A]
#[[1]]
#4:7
#
#[[2]]
#[1] 9
lapply(d[, A], eval)
#[[1]]
#[1] 4 5 6 7
#
#[[2]]
#[1] 9
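The unevaluated call also keeps the "4:7" spelling the question asks for. A base-R sketch (plain list instead of a data.table column, names A, labels and values chosen for illustration): deparse() turns each element back into text for display, while eval() still gives the numbers.

```r
# A plain list mixing an unevaluated call and a number
A <- list(quote(4:7), 9)

# deparse() turns each element back into source text, e.g. for printing
labels <- vapply(A, function(e) paste(deparse(e), collapse = ""), character(1))
labels  # "4:7" "9"

# eval() recovers the underlying values when they are needed
values <- lapply(A, eval)
```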

answered Sep 10 '13 at 15:09

You could use cut to convert the values to whatever intervals you like, and also set an appropriate label for each interval, like so:

newdf <- sapply(df, cut, breaks = c(0:3, 7, 8:16), labels = c(1:3, "4-7", 8:16), right = TRUE)
#      a     b     c   
# [1,] "4-7" NA    "12"
# [2,] "4-7" "4-7" "12"
# [3,] NA    NA    "14"
# [4,] "11"  "1"   "16"
# [5,] NA    "2"   "14"
# [6,] NA    "3"   "13"
# [7,] "11"  NA    "13"
# [8,] NA    "4-7" "15"
# [9,] "4-7" "3"   "16"
#[10,] "4-7" "4-7" "16"
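A quick check of how cut lines labels up with values: with right = TRUE each interval is (lower, upper], so each break must sit at the top of its label's values. The breaks c(0:3, 7, 8:16) are an assumption chosen so that 1 to 3 and 8 to 16 keep their own labels and 4 to 7 collapse into "4-7". The sketch also uses lapply() on a small made-up data frame (small), which keeps each column a factor (integer codes plus labels) instead of flattening everything into a character matrix as sapply() does.

```r
breaks <- c(0:3, 7, 8:16)       # 14 breaks -> 13 right-closed intervals
labels <- c(1:3, "4-7", 8:16)   # 13 labels, one per interval
f <- cut(c(1, 3, 4, 7, 8, 16, NA), breaks = breaks, labels = labels, right = TRUE)
as.character(f)  # "1" "3" "4-7" "4-7" "8" "16" NA

# lapply() keeps each column a factor; sapply() would flatten to a character matrix
small <- data.frame(a = c(4, 7, NA, 11), b = c(NA, 6, NA, 1))
small[] <- lapply(small, cut, breaks = breaks, labels = labels, right = TRUE)
```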

answered Sep 9 '13 at 23:09

Thank you for responding to my question. Is it possible to hold it as a numeric rather than a character variable? - Eric Fail

What is 4-7? It would be interpreted as a binary mathematical operation. - Simon O'Hanlon

4-7 is a range, the range I would like to hold/store in the data frame. Maybe it would be more correct to write 4:7. - Eric Fail

@EricFail I realise that; I was asking rhetorically. A column of a data.frame holds an atomic vector, so you can't do this. - Simon O'Hanlon

What exactly do you want to do with these ranges?

One simple option is to replace each column with 2 columns, the first is the minimum, the second is the maximum (so you would have a.min, a.max, b.min, etc.). You could represent exact values by either having the max be NA or by having the min and the max be the same.
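A minimal sketch of the two-column option for column a, assuming (as in the desired output in the question) that the values 4 and 7 both stand for the range 4-7. The vector is copied from the printed df so the example does not depend on the random seed; the names a.min, a.max, a.width and a.label are illustrative. Both bound columns stay numeric, and exact values are stored with min equal to max.

```r
# Column a as printed in the question (copied so the example is deterministic)
a <- c(4, 7, NA, 11, NA, NA, 11, NA, 7, 7)

# Assumption: 4 and 7 both mean "somewhere in 4-7"
is_range <- a %in% c(4, 7)

ranges <- data.frame(
  a.min = ifelse(is_range, 4, a),  # exact values get min == max
  a.max = ifelse(is_range, 7, a)
)

# Both columns are numeric, so arithmetic and comparisons still work
ranges$a.width <- ranges$a.max - ranges$a.min

# A "4-7" style label is only built when needed for display
a.label <- ifelse(ranges$a.min == ranges$a.max,
                  as.character(ranges$a.min),
                  paste0(ranges$a.min, "-", ranges$a.max))
```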

Another option is to create a new object that is stored as a list with each row being either a vector of length 1 (exact value) or length 2 (the range). Write a method for format for your object that creates a character vector of either the single value or the range (e.g. 4-7) and when you print the data frame it calls the format function and ends up printing something like you show above. You will need other methods for working with those columns in whatever way you plan to work with this data.
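A rough sketch of the custom-object option using a hypothetical S3 class range_col (the class name and the helpers range_min() and range_max() are made up for illustration). print.data.frame() formats each column through format(), so a format method is enough to get "4-7" style output while the column still holds numbers; a `[` method keeps the class when rows are subset. Other operations (sorting, merging, and so on) would need further methods.

```r
# Hypothetical S3 class: a list whose elements are either a single number
# (exact value) or a length-2 numeric vector c(min, max) (a range)
range_col <- function(...) {
  x <- list(...)
  stopifnot(all(lengths(x) %in% c(1L, 2L)))
  structure(x, class = "range_col")
}

# format() is called on each column when a data frame is printed
format.range_col <- function(x, ...) {
  vapply(unclass(x), function(v) {
    if (length(v) == 2L) paste0(v[1], "-", v[2]) else as.character(v)
  }, character(1))
}

# Keep the class when rows are subset, e.g. d[1, ]
`[.range_col` <- function(x, i, ...) structure(unclass(x)[i], class = "range_col")

# Example helpers for working with the column numerically
range_min <- function(x) vapply(unclass(x), function(v) v[1], numeric(1))
range_max <- function(x) vapply(unclass(x), function(v) v[length(v)], numeric(1))

d <- data.frame(B = c(1, 2))
d$A <- range_col(c(4, 7), 9)  # $<- stores the classed list as a single column
print(d)
range_min(d$A)
range_max(d$A)
```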

answered Sep 9 '13 at 23:09
