Replacing values in R

I am working on a large dataset, an example of which is shown below:

Df1 <- data.frame(ID = 1:7,
                  home_pc = c("VB2 4RF", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                  start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                  end_pc = c(NA, "CB5 4FG", "Home", "Home", "Home", "GH6 8HG", NA))

I want to do two things:

  1. Firstly, delete rows which have an NA in the columns "start_pc" and "end_pc".
  2. When "Home" is written in either the "start_pc" or "end_pc" column, I want to replace it with the postcode in "home_pc".

What is the best way to tackle this problem? Could anyone give me some ideas?

Many thanks.

Asked 02 Feb 12, 10:02

Are your NAs really the character string "NA", or real NA values? -

I'm afraid I don't really know; the data was imported from an SPSS file that was sent to us. However, I have selected these NA values by using is.na(), if that helps. -

It sounds like they are NA values then. I will edit your example data to reflect this. -
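One way to settle whether the missing entries are real NA values or the literal string "NA" is to count both kinds per column. A minimal sketch, using a small hypothetical data frame `df` whose column `real` holds a real NA and whose column `text` holds the string "NA":

```r
# Hypothetical columns: `real` has a true missing value, `text` the string "NA"
df <- data.frame(real = c(NA, "Home"), text = c("NA", "Home"),
                 stringsAsFactors = FALSE)

na_counts  <- colSums(is.na(df))                 # real missing values per column
str_counts <- colSums(df == "NA", na.rm = TRUE)  # literal "NA" strings per column
col_types  <- sapply(df, class)                  # "character" or "factor"

print(na_counts)
print(str_counts)
print(col_types)
```

On the real data, running the same three lines on Df1 shows which kind of NA the SPSS import produced and whether the columns came through as factors.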

1 Answer

Okay, here's one starting point; others will surely give you more elaborate answers.

First, getting rid of NA values:

  Df1 <- na.omit(Df1)

This drops every row that has an NA in any column of the data frame.
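Because na.omit() checks every column, a row is dropped even when its NA sits in a column you do not care about. A minimal sketch on a cut-down copy of the question's data, with a hypothetical extra column `notes` added purely to show this:

```r
df <- data.frame(ID = 1:3,
                 start_pc = c(NA, "Home", "FC5 7YH"),
                 end_pc = c(NA, "CB5 4FG", "Home"),
                 notes = c("a", NA, "c"),   # hypothetical unrelated column
                 stringsAsFactors = FALSE)

kept <- na.omit(df)
kept$ID   # 3: row 1 has NA postcodes, row 2 only an NA note, yet both go
```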

Second, to replace "Home" in the start and end columns, try the vectorised ifelse() function:

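Df1 <- within(Df1, 
{
  # Where the value is "Home", take home_pc; otherwise keep the original value.
  # as.character() guards against factor columns (the data.frame() default before R 4.0)
  start_pc <- ifelse(start_pc == 'Home', as.character(home_pc), as.character(start_pc))
  end_pc <- ifelse(end_pc == 'Home', as.character(home_pc), as.character(end_pc))
})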
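A caveat worth checking: before R 4.0, data.frame() converted strings to factors by default (stringsAsFactors = TRUE), and ifelse() applied to factor columns returns the underlying integer level codes rather than the postcodes. A minimal sketch reproducing this on hypothetical two-row data with stringsAsFactors = TRUE set explicitly, plus the usual fix of converting both branches to character:

```r
# Force factor columns, as data.frame() did by default before R 4.0
df <- data.frame(home_pc = c("CB4 2DT", "NE5 7TH"),
                 start_pc = c("Home", "FC5 7YH"),
                 stringsAsFactors = TRUE)

# ifelse() on factors: the result holds integer level codes, not postcodes
bad <- ifelse(df$start_pc == "Home", df$home_pc, df$start_pc)

# Converting both branches to character keeps the actual postcode strings
good <- ifelse(df$start_pc == "Home",
               as.character(df$home_pc), as.character(df$start_pc))
good   # "CB4 2DT" "FC5 7YH"
```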

Hope I understood your question correctly! Some additional comments: if you want to test whether something is NA (e.g. within the ifelse() function), use is.na(); the opposite is !is.na(). A plain comparison such as x == NA does not work, because any comparison with NA gives NA. You can also build subsets of the data frame this way; for example, subset(Df1, !is.na(home_pc)) should work. Of course, check the help files for these functions if you need more hints: ?ifelse, ?subset, etc.
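To illustrate why is.na() is needed, and how subset() can keep only the rows where both postcode columns are present, here is a short sketch on a hypothetical vector `x` and a cut-down copy of the question's data:

```r
x <- c("Home", NA)
eq  <- x == NA    # NA NA: any comparison with NA is itself NA
na  <- is.na(x)   # FALSE TRUE
not <- !is.na(x)  # TRUE FALSE

df <- data.frame(ID = 1:3,
                 start_pc = c(NA, "Home", "FC5 7YH"),
                 end_pc = c(NA, "CB5 4FG", "Home"),
                 stringsAsFactors = FALSE)

# Keep rows where neither postcode column is missing
complete <- subset(df, !is.na(start_pc) & !is.na(end_pc))
complete$ID   # 2 3
```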

Answered 02 Feb 12, 15:02

And if I only wanted to delete the rows that have an NA in those two columns, rather than in any column of my data frame (which has about 200 columns), how would I do this? - KT_1

Df1 <- Df1[!apply(Df1[, c('start_pc', 'end_pc')], 1, function(x) any(is.na(x))), ] should work, considering only NAs in the columns of interest. Alternatively, if end_pc will always be NA when start_pc is NA, then Df1 <- Df1[-which(is.na(Df1$start_pc)), ] will also be fine. - jbaums
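A caveat about the -which() form: when no row matches, which() returns an empty integer vector, and indexing with a negated empty vector selects zero rows, so every row is dropped. A minimal sketch contrasting it with plain logical indexing and with the apply() version, on hypothetical data that has no missing start_pc values:

```r
df <- data.frame(ID = 1:2,
                 start_pc = c("Home", "FC5 7YH"),
                 end_pc = c("CB5 4FG", "Home"),
                 stringsAsFactors = FALSE)

# No NAs, so which() is integer(0) and -integer(0) selects nothing
n_which <- nrow(df[-which(is.na(df$start_pc)), ])   # 0: all rows lost

# Logical indexing keeps every non-NA row, even when there are no NAs
n_logical <- nrow(df[!is.na(df$start_pc), ])        # 2

# The apply() version, restricted to the two postcode columns
keep <- !apply(df[, c("start_pc", "end_pc")], 1, function(x) any(is.na(x)))
n_apply <- nrow(df[keep, ])                          # 2
```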

The way the data frame was created in the original post, "NA" is a string stored as a factor level, not a missing value. In that case, Df1 <- Df1[ as.character( Df1$start_pc ) != "NA" & as.character( Df1$end_pc ) != "NA", ] would remove those rows. It may be a good idea to post a relevant part of the original data using dput(). - vaettchen
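The string comparison only suits the literal "NA" case: with real NA values, as.character(NA) != "NA" evaluates to NA, and a logical index containing NA returns rows filled with NA. If the data might contain either kind, one option (not taken from the comments above) is %in%, which treats NA as an ordinary value to match. A sketch on hypothetical data mixing both kinds:

```r
df <- data.frame(ID = 1:3,
                 start_pc = c(NA, "NA", "FC5 7YH"),
                 end_pc = c("Home", "CB5 4FG", "Home"),
                 stringsAsFactors = FALSE)

# %in% gives TRUE for a real NA and for the string "NA", and never returns NA
missing_start <- df$start_pc %in% c(NA, "NA")
missing_end <- df$end_pc %in% c(NA, "NA")

cleaned <- df[!missing_start & !missing_end, ]
cleaned$ID   # 3
```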

@KatieT You can use complete.cases: Df1[complete.cases(Df1[3:4]),] - James
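complete.cases() returns TRUE for rows with no NA in the columns it is given. Selecting those columns by name instead of by position (3:4) keeps the call correct if the column order differs in the full 200-column file. A minimal sketch on the question's example data:

```r
Df1 <- data.frame(ID = 1:7,
                  home_pc = c("VB2 4RF", "CB4 2DT", "NE5 7TH", "BY5 8IB",
                              "DH4 6PB", "MP9 7GH", "KN4 5GH"),
                  start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
                  end_pc = c(NA, "CB5 4FG", "Home", "Home", "Home", "GH6 8HG", NA),
                  stringsAsFactors = FALSE)

# Keep rows with no NA in the two postcode columns, chosen by name
Df1 <- Df1[complete.cases(Df1[, c("start_pc", "end_pc")]), ]
Df1$ID   # 2 3 4 5 6
```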
