Generate bigram network to analyze given data with filters for all most common bigrams excluding stop words.
data | the sentiment140 train or test data containing variables |
---|---|
user_list | a vector of users for which to filter the dataset |
start_date_time | input start_date_time in POSIXct format on which to filter the dataset |
end_date_time | input end_date_time in POSIXct format on which to filter the dataset |
keyword_list | a list of string keywords on which to filter the dataset |
counts_quantile | the quantile beyond which to visualize the bigrams |
a list object with raw
filtered dataframe, bigram_counts
aggregated dataframe that holds the frequency counts of bigrams and a plot
representing the network.
bigram_network()#> $raw #> # A tibble: 354 x 14 #> polarity id date query user text nouns adjectives #> <chr> <int> <dttm> <chr> <chr> <chr> <int> <int> #> 1 Positive 217 2009-05-25 17:29:39 mcdo… Mami… mgg … 7 2 #> 2 Positive 2140 2009-05-20 02:38:17 nike Chet… ew n… 4 2 #> 3 Negative 224 2009-05-25 17:34:51 chen… QCWo… ife?… 9 3 #> 4 Positive 569 2009-06-07 21:38:16 kind… rach… @lon… 8 2 #> 5 Positive 2546 2009-06-08 00:13:48 kind… k8tb… " lo… 7 1 #> 6 Positive 1019 2009-05-11 05:21:25 lebr… unde… atch… 3 1 #> 7 Negative 2110 2009-05-18 01:14:35 Malc… blin… @por… 7 3 #> 8 Positive 256 2009-05-27 23:59:18 goog… maex… " am… 3 1 #> 9 Negative 413 2009-06-02 03:17:04 time… Jaso… " ha… 11 4 #> 10 Positive 1003 2009-05-11 03:18:59 kind… Happ… y Ki… 1 0 #> # … with 344 more rows, and 6 more variables: prepositions <int>, #> # articles <int>, pronouns <int>, verbs <int>, adverbs <int>, #> # interjections <int> #> #> $bigram_counts #> # A tibble: 982 x 3 #> word1 word2 counts #> <chr> <chr> <int> #> 1 time warner 25 #> 2 http bit.ly 24 #> 3 ime warner 7 #> 4 malcolm gladwell 7 #> 5 bobby flay 6 #> 6 http tinyurl.com 6 #> 7 twitter api 6 #> 8 warner cable 6 #> 9 love love 5 #> 10 museum 2 5 #> # … with 972 more rows #> #> $plot#>