Generate word correlation network to analyze given data with filters for all highly correlated words excluding stop words.
data | the sentiment140 train or test data containing variables |
---|---|
user_list | a vector of users for which to filter the dataset |
start_date_time | input start_date_time in POSIXct format on which to filter the dataset |
end_date_time | input end_date_time in POSIXct format on which to filter the dataset |
keyword_list | a list of string keywords on which to filter the dataset |
correlation_threshold | threshold beyond which to plot the network |
a list object with raw
filtered dataframe, word_cors
aggregated dataframe that holds the correlated words and a plot
representing the network.
#> $raw #> # A tibble: 354 x 14 #> polarity id date query user text nouns adjectives #> <chr> <int> <dttm> <chr> <chr> <chr> <int> <int> #> 1 Positive 217 2009-05-25 17:29:39 mcdo… Mami… mgg … 7 2 #> 2 Positive 2140 2009-05-20 02:38:17 nike Chet… ew n… 4 2 #> 3 Negative 224 2009-05-25 17:34:51 chen… QCWo… ife?… 9 3 #> 4 Positive 569 2009-06-07 21:38:16 kind… rach… @lon… 8 2 #> 5 Positive 2546 2009-06-08 00:13:48 kind… k8tb… " lo… 7 1 #> 6 Positive 1019 2009-05-11 05:21:25 lebr… unde… atch… 3 1 #> 7 Negative 2110 2009-05-18 01:14:35 Malc… blin… @por… 7 3 #> 8 Positive 256 2009-05-27 23:59:18 goog… maex… " am… 3 1 #> 9 Negative 413 2009-06-02 03:17:04 time… Jaso… " ha… 11 4 #> 10 Positive 1003 2009-05-11 03:18:59 kind… Happ… y Ki… 1 0 #> # … with 344 more rows, and 6 more variables: prepositions <int>, #> # articles <int>, pronouns <int>, verbs <int>, adverbs <int>, #> # interjections <int> #> #> $word_cors #> # A tibble: 5,256 x 3 #> item1 item2 correlation #> <chr> <chr> <dbl> #> 1 flay bobby 1 #> 2 bobby flay 1 #> 3 korea north 0.911 #> 4 north korea 0.911 #> 5 star trek 0.893 #> 6 trek star 0.893 #> 7 api twitter 0.879 #> 8 twitter api 0.879 #> 9 gladwell malcolm 0.820 #> 10 malcolm gladwell 0.820 #> # … with 5,246 more rows #> #> $plot#>