Match a column of one data frame against the columns of another data frame and, if they match, add a new column

I have two big data frames. One (df1) has this structure:

    V1    V2    V3
1  Chr1  7507 10944
2  Chr1 10944 13170
3  Chr1 13170 20065
4  Chr1 20065 28273
5  Chr1 28273 29960
6  Chr1 29960 36599
7  Chr1 36599 37513
8  Chr1 37513 40360
9  Chr1 40360 48796
10 Chr1 48796 50661

The other (df2) has this structure:

     V1    V2    V3 V4  V5
1  Chr1  7507  7507  1   1
2  Chr1 10944 10944  1   2
3  Chr1 13170 13170  1  22
4  Chr1 20065 20065  1   3
5  Chr1 28273 28273  1 161
6  Chr1 29960 29960  1  10
7  Chr1 36599 36599  1 604
8  Chr1 37513 37513  1 117
9  Chr1 40360 40360  1   8
10 Chr1 48796 48796  1   3

What I'm trying to do is check whether V2 of df2 (V3 is identical to it) is equal to, or falls within, the range given by V2 and V3 of df1. If it does, I want to write the value of V5 from df2 into a new column in df1; if not, write 0. The result I want would look like this:

Chr1    7507    10944   1
Chr1    10944   13170   2   
Chr1    13170   20065   22  
Chr1    20065   28273   3   
Chr1    28273   29960   161 
Chr1    29960   36599   10  
Chr1    36599   37513   604 
Chr1    37513   40360   117 
Chr1    40360   48796   8
.
.
.

Do you know any good way to do this? Thank you very much.

asked May 28 '14 at 14:05

I think you need to use merge. You can check the examples in the R help files (?merge).

In your example, all V2 and V3 values of df2 have an exact match in V2 of df1. If this applies to your whole data, then a relatively simple merge is appropriate, as suggested by @dickoa. If your actual data is different (so that you would need to check ranges), it would help if you edited your sample data to show that.
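For the exact-match case described in these comments, a minimal sketch with merge could look like the following. The small data frames are cut-down copies of the sample data, and `merged` is just an illustrative name. `all.x = TRUE` keeps every row of df1; rows without a match get NA, which is then replaced by 0 as the question asks.

```r
# Cut-down copies of the sample data (only the columns needed here)
df1 <- data.frame(V1 = "Chr1",
                  V2 = c(7507, 10944, 13170),
                  V3 = c(10944, 13170, 20065))
df2 <- data.frame(V1 = "Chr1",
                  V2 = c(7507, 10944),
                  V5 = c(1, 2))

# Left join on chromosome and start position; keep all rows of df1
merged <- merge(df1, df2, by = c("V1", "V2"), all.x = TRUE)

# Rows of df1 without a match in df2 get NA; replace those with 0
merged$V5[is.na(merged$V5)] <- 0
merged
```

This only helps when positions in df2 coincide exactly with V2 of df1; it does not test whether a position lies inside a range.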

1 Answer

As @beginneR already mentioned in the comments, all V2 and V3 values of df2 have an exact match with V2 of df1. If I interpret your question correctly, this is probably not what you wanted. The following example is what I think you are looking for.

Reading the two dataframes:

df1 <- read.table(header=TRUE, text="rn    V1    V2    V3
1  Chr1  7507 10944
2  Chr1 10944 13170
3  Chr1 13170 20065
4  Chr1 20065 28273
5  Chr1 28273 29960
6  Chr1 29960 36599
7  Chr1 36599 37513
8  Chr1 37513 40360
9  Chr1 40360 48796
10 Chr1 48796 50661")

df2 <- read.table(header=TRUE, text="rn     V1    V2    V3 V4  V5
1  Chr1  7507  7507  1   1
2  Chr1 10944 10944  1   2
3  Chr1 13170 13170  1  22
4  Chr1 20065 20065  1   3
5  Chr1 28273 28273  1 161
6  Chr1 29960 29960  1  10
7  Chr1 36599 36599  1 604
8  Chr1 37513 37513  1 117
9  Chr1 40360 40360  1   8
10 Chr1 48796 48796  1   3")

Getting rid of V3 in df2 as it is exactly the same as V2:

df2 <- df2[,-4]

Making the values in V2 of df2 higher, so that they no longer coincide exactly with the V2 values of df1:

df2$V2 <- df2$V2 + 2000

With the ifelse function you can assign the values of V5 to a new variable in df1 when they meet the requirements:

df1$V4 <- ifelse(df2$V2 >= df1$V2 & df2$V2 <= df1$V3, df2$V5, 0)
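
Note that this ifelse call compares the two data frames row by row: row i of df2 is tested only against the range in row i of df1. That works here because both data frames have ten rows in the same order. A minimal self-contained sketch of the same idea, on three made-up rows (the values are illustrative and not taken from the data above):

```r
# Three illustrative ranges and three positions, aligned row by row
df1 <- data.frame(V1 = "Chr1",
                  V2 = c(100, 200, 300),
                  V3 = c(200, 300, 400))
df2 <- data.frame(V1 = "Chr1",
                  V2 = c(150, 350, 390),
                  V5 = c(5, 7, 9))

# Row i of df2 is checked only against the range in row i of df1
df1$V4 <- ifelse(df2$V2 >= df1$V2 & df2$V2 <= df1$V3, df2$V5, 0)
df1$V4
# [1] 5 0 9
```

The position 350 lies inside the third range, but because it sits in row 2 it is compared only with the second range and contributes nothing.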

answered May 28 '14 at 19:05

Thank you very much for your answer, Jaap and @beginneR! Based on it I had to modify some elements, and it works now. There were a few more conditions to take care of, so the code now looks like this:

upto <- dim(df1)[1]
for (i in 1:upto) {
  # rows of df2 on the same chromosome whose position lies in [V2, V3) of row i
  result <- df2$V1 == df1$V1[i] & df1$V2[i] <= df2$V2 & df1$V3[i] > df2$V2
  # add up V5 over all matching rows (0 if there are none)
  sum <- sum(df2$V5[result])
  df1$V4[i] <- sum
}

- user3683485
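
The loop in this comment can also be written without an explicit index variable. The following is a sketch under the same rules as the comment (same chromosome, start inclusive, end exclusive, matching V5 values summed). `sum_hits` is a hypothetical helper name, and the small data frames are illustrative rather than taken from the question.

```r
# Sum df2$V5 over all df2 rows on chromosome `chr` whose position
# falls in [start, end). `sum_hits` is a hypothetical helper.
sum_hits <- function(chr, start, end, df2) {
  hit <- df2$V1 == chr & df2$V2 >= start & df2$V2 < end
  sum(df2$V5[hit])
}

df1 <- data.frame(V1 = "Chr1",
                  V2 = c(100, 200, 300),
                  V3 = c(200, 300, 400),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("Chr1", "Chr1", "Chr1", "Chr2"),
                  V2 = c(150, 350, 390, 150),
                  V5 = c(5, 7, 9, 100),
                  stringsAsFactors = FALSE)

# One value per row of df1; every row of df2 is considered each time
df1$V4 <- vapply(seq_len(nrow(df1)), function(i) {
  sum_hits(df1$V1[i], df1$V2[i], df1$V3[i], df2)
}, numeric(1))
df1$V4
# [1]  5  0 16
```

Using `<` for the end of the range keeps a position that sits exactly on a shared boundary (such as 10944 in the sample data) from being counted in two adjacent ranges.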
