R - Duplicate columns in dyadic-year data
As the title indicates, I have dyadic-year data. The problem I have is that (for some reason...) there are duplicated dyadic column values. For example, as shown below, the a-a and b-b observations make no sense. The real data has about 70,000 observations.
What I want is to generate a dummy variable that indicates these same-same dyadic observations.
person1 person2 year
a       a       1990
a       a       1991
a       a       1992
a       b       1990
a       b       1991
a       b       1992
a       c       1990
a       c       1991
a       c       1992
b       a       1990
b       a       1991
b       a       1992
b       b       1990
b       b       1991
b       b       1992
...
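The function duplicated() doesn't help, nor do other basic R commands, since it's dyadic data: the rows are not duplicates of each other, the two person columns within a row are identical.
Here's a reproducible example: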
df1 <- structure(list(
  person1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
                      .Label = c("a", "b", "g"), class = "factor"),
  person2 = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
                      .Label = c("a", "b", "c"), class = "factor"),
  year = c(1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L,
           1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L)),
  .Names = c("person1", "person2", "year"), class = "data.frame", row.names = c(NA, -27L))
The desired output (with the duplicate dummy):
person1 person2 year duplicate
a       a       1990 1
a       a       1991 1
a       a       1992 1
a       b       1990 0
a       b       1991 0
a       b       1992 0
a       c       1990 0
a       c       1991 0
a       c       1992 0
b       a       1990 0
b       a       1991 0
b       a       1992 0
b       b       1990 1
b       b       1991 1
b       b       1992 1
We can create the dummy by comparing 'person1' with 'person2' row by row, converting both factors to character first:
library(data.table)
setDT(df1)[, duplicate := as.integer(as.character(person1) == as.character(person2))]
head(df1, 15)
#     person1 person2 year duplicate
#  1:       a       a 1990         1
#  2:       a       a 1991         1
#  3:       a       a 1992         1
#  4:       a       b 1990         0
#  5:       a       b 1991         0
#  6:       a       b 1992         0
#  7:       a       c 1990         0
#  8:       a       c 1991         0
#  9:       a       c 1992         0
# 10:       b       a 1990         0
# 11:       b       a 1991         0
# 12:       b       a 1992         0
# 13:       b       b 1990         1
# 14:       b       b 1991         1
# 15:       b       b 1992         1
Or, using base R:
df1 <- transform(df1, duplicate = as.integer(as.character(person1) == as.character(person2)))
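The as.character() step matters because, in the reproducible example, person1 and person2 are factors with different level sets (a, b, g versus a, b, c), and R refuses to compare such factors with ==. The sketch below uses a hypothetical four-row data frame (the name dyads is illustrative, not from the original question) to show that the direct factor comparison raises an error while comparing the character labels produces the dummy.

```r
# Hypothetical toy dyad-year data: the two factor columns deliberately have
# different level sets, mirroring the reproducible example in the question.
dyads <- data.frame(
  person1 = factor(c("a", "a", "b", "b"), levels = c("a", "b", "g")),
  person2 = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c")),
  year    = c(1990L, 1990L, 1990L, 1990L)
)

# Comparing the factors directly fails because their level sets differ;
# capture the error message instead of stopping.
direct <- tryCatch(dyads$person1 == dyads$person2,
                   error = function(e) conditionMessage(e))
print(direct)

# Comparing the character labels works and gives 1 for same-same dyads.
dyads$duplicate <- as.integer(as.character(dyads$person1) ==
                              as.character(dyads$person2))
print(dyads)
```

If both columns shared identical levels, the direct factor comparison would work, but converting to character is the safer choice when the levels may differ.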