Twitter - Splitting hashtags in a data.frame object with R
I am collecting Twitter hashtags. Each tweet can include hashtags.
tests <- c("xxxxxx #savethedate xxxxxx #histoire] xxxxxx #femmes xxxxxxx #ports",
           "xxxxxxxxxxxx",
           "xxxx #rock xxxxxx #nantes",
           "xxxxxx #lvan xxxxxxx #nantes xxxxx #ilsepassetoujoursuntruc")
library(stringr)
# "#\\S+" matches '#' followed by one or more non-whitespace characters
hashtags <- str_extract_all(tests, "#\\S+")
str(hashtags)
My results:
str(hashtags)
List of 4
 $ : chr [1:4] "#savethedate" "#histoire]" "#femmes" "#ports"
 $ : chr(0)
 $ : chr [1:2] "#rock" "#nantes"
 $ : chr [1:3] "#lvan" "#nantes" "#ilsepassetoujoursuntruc"
What I expect: a data.frame with one hashtag per row
"#savethedate" "#histoire" "#femmes" "#ports" NA ....
What I tried:
hashtags_df <- as.data.frame(hashtags)
hashtags[!lengths(hashtags)] <- NA
This replaces the length-0 list elements with NAs. (Better solution via Dirty Sock Sniffer.)
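To see why that indexing works: lengths() returns the number of elements in each list entry, and ! turns a count of 0 into TRUE, so only the empty entries are selected. A minimal sketch with a made-up list x (hypothetical, not from the question):

```r
# Hypothetical toy list, for illustration only
x <- list(c("#a", "#b"), character(0), "#c")

lengths(x)     # 2 0 1
!lengths(x)    # FALSE TRUE FALSE

# Assigning NA through the logical index fills only the empty element
x[!lengths(x)] <- NA
unlist(x)      # "#a" "#b" NA "#c"
```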
hashtags <- unlist(hashtags)
This will give you a column vector of values. If you'd like a data frame, you can use as.data.frame now.
hashtags_df <- as.data.frame(hashtags)
I don't know whether this is the best way to extract hashtags, etc., but it should answer the question as asked.
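For reference, the steps can be put together in one sketch using the tests vector from the question. Two things here are my own assumptions. First, the pattern "#\\w+" replaces the question's pattern: it stops at the first non-word character, so the stray "]" after "#histoire" is dropped, as in the expected output. Second, the tweet column is a hypothetical addition that records which tweet each hashtag came from. The sketch uses base R's gregexpr() and regmatches() so that it runs without extra packages; str_extract_all(tests, "#\\w+") should give the same list here.

```r
tests <- c("xxxxxx #savethedate xxxxxx #histoire] xxxxxx #femmes xxxxxxx #ports",
           "xxxxxxxxxxxx",
           "xxxx #rock xxxxxx #nantes",
           "xxxxxx #lvan xxxxxxx #nantes xxxxx #ilsepassetoujoursuntruc")

# Assumed pattern: '#' followed by word characters only (drops the trailing ']')
hashtags <- regmatches(tests, gregexpr("#\\w+", tests))

# Tweets without hashtags become a single NA so they keep a row
hashtags[!lengths(hashtags)] <- NA

# One hashtag per row; 'tweet' (hypothetical column) is the index into 'tests'
hashtags_df <- data.frame(
  tweet   = rep(seq_along(hashtags), lengths(hashtags)),
  hashtag = unlist(hashtags),
  stringsAsFactors = FALSE
)
```

With the sample data this gives 10 rows, and the second tweet, which has no hashtag, appears as a single row with NA.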