r - Duplicate columns in dyadic-year data


As the title indicates, I have dyadic-year data. The problem is that I have (for some reason...) duplicated dyadic column names: for example, as shown below, the a a and b b observations make no sense. The real data has 70,000 observations.

What I want is to generate a dummy variable that indicates these same-same dyadic observations.

person1  person2  year
a        a        1990
a        a        1991
a        a        1992
a        b        1990
a        b        1991
a        b        1992
a        c        1990
a        c        1991
a        c        1992
b        b        1990
b        b        1991
b        b        1992
...

The function duplicated() doesn't help, and neither do other basic R commands, since this is dyadic data.

Here's a reproducible example:

df1 <- structure(list(
    person1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
                          2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
                          3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
                        .Label = c("a", "b", "g"), class = "factor"),
    person2 = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
                          1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
                          1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
                        .Label = c("a", "b", "c"), class = "factor"),
    year = c(1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L,
             1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L,
             1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L)),
    .Names = c("person1", "person2", "year"),
    class = "data.frame", row.names = c(NA, -27L))

The desired output (with the duplicate dummy):

person1  person2  year  duplicate
a        a        1990  1
a        a        1991  1
a        a        1992  1
a        b        1990  0
a        b        1991  0
a        b        1992  0
a        c        1990  0
a        c        1991  0
a        c        1992  0
b        a        1990  0
b        a        1991  0
b        a        1992  0
b        b        1990  1
b        b        1991  1
b        b        1992  1

We can create the dummy by comparing 'person1' and 'person2' as character vectors, here with data.table:

library(data.table)

# Convert df1 to a data.table by reference and add the dummy in place
setDT(df1)[, duplicate := as.integer(as.character(person1) == as.character(person2))]

head(df1, 15)
#    person1 person2 year duplicate
# 1:       a       a 1990         1
# 2:       a       a 1991         1
# 3:       a       a 1992         1
# 4:       a       b 1990         0
# 5:       a       b 1991         0
# 6:       a       b 1992         0
# 7:       a       c 1990         0
# 8:       a       c 1991         0
# 9:       a       c 1992         0
#10:       b       a 1990         0
#11:       b       a 1991         0
#12:       b       a 1992         0
#13:       b       b 1990         1
#14:       b       b 1991         1
#15:       b       b 1992         1

Or using base R:

transform(df1, duplicate = as.integer(as.character(person1) == as.character(person2)))
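As for why both columns are wrapped in as.character(): in the example the two factors have different level sets ("g" in person1, "c" in person2), and base R refuses to compare such factors with ==. Below is a minimal sketch of that behaviour and of a quick sanity check on the result. The data frame name `dyads` is made up for illustration and simply rebuilds the example data with rep().

```r
# Hypothetical toy data mirroring the example: 3 x 3 dyads over 3 years,
# with mismatched factor levels ("g" vs "c") as in the dput() output.
dyads <- data.frame(
  person1 = factor(rep(c("a", "b", "g"), each = 9)),
  person2 = factor(rep(rep(c("a", "b", "c"), each = 3), times = 3)),
  year    = rep(1990:1992, times = 9)
)

# Comparing the factors directly raises an error because the level sets differ;
# tryCatch() captures the error message instead of stopping the script.
direct <- tryCatch(
  dyads$person1 == dyads$person2,
  error = function(e) conditionMessage(e)
)
print(direct)

# Comparing the labels as character strings works regardless of the levels.
dyads$duplicate <- as.integer(
  as.character(dyads$person1) == as.character(dyads$person2)
)

# Sanity check: only the a-a and b-b rows (3 years each) should be flagged.
sum(dyads$duplicate)    # 6
table(dyads$duplicate)
```

If the columns in the real data are already character rather than factor, the as.character() calls are harmless, so the same line should work either way.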
