Geographical participation in the #30DayMapChallenge
I am curious about the geographical participation in the #30DayMapChallenge, so I thought it would be a good opportunity to learn the R package rtweet and see whether I can map where people are tweeting from.
First, I installed rtweet:
if (!require("rtweet")) install.packages("rtweet")
library(rtweet)
I also needed ggmap for geocoding users (I registered with Google and set up the API key; in the following link you can see how to do that: link):
if (!require("ggmap")) install.packages("ggmap")
library(ggmap)
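As a minimal sketch of the key setup, the snippet below registers a Google API key for the current session. It assumes the key is stored in an environment variable that I call GOOGLE_MAPS_API_KEY here; that name is my own choice for illustration, not something ggmap requires.

```r
library(ggmap)

# Assumption: the key lives in an environment variable named
# GOOGLE_MAPS_API_KEY (a hypothetical name; set it e.g. in ~/.Renviron).
api_key <- Sys.getenv("GOOGLE_MAPS_API_KEY")

if (nzchar(api_key)) {
  # Register the key for the current R session only
  register_google(key = api_key)
} else {
  message("No Google API key found; geocode() calls to Google will fail.")
}

# TRUE once a key has been registered
has_google_key()
```

Keeping the key in an environment variable rather than in the script avoids publishing it by accident together with the post.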
Finally, I used the following libraries:
library(tidyverse)
library(sf)
library(glue)
library(rnaturalearth)
library(emo)
library(knitr)
Once I had loaded the required libraries, I went through the following steps:
- Get all tweets with the hashtag #30DayMapChallenge:
Note that rtweet only returns tweets from the past 6-9 days (I didn’t know that beforehand, so I won’t be able to analyse all tweets 🙍). Therefore, I saved the downloaded data and load it here to get reproducible results; you would need to uncomment the lines below to download new data.
# NOT RUN
# rt_download <- search_tweets("#30DayMapChallenge",
# n = 18000,
# include_rts = FALSE)
#
# saveRDS(rt_download, "rt_download_2021_11_12.rds")
rt_download <- readRDS("rt_download_2021_11_12.rds")
# Use a four-digit year so the dates are unambiguous in the plot subtitle
date_min <- format(min(rt_download$created_at), "%Y-%m-%d")
date_max <- format(max(rt_download$created_at), "%Y-%m-%d")
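To check how much of the challenge a download actually covers, one option is to look at the span between the earliest and latest timestamps. The sketch below uses made-up timestamps in place of `rt_download$created_at`; the values are purely illustrative.

```r
# Toy timestamps standing in for rt_download$created_at (made-up values)
created_at <- as.POSIXct(
  c("2021-11-03 08:15:00", "2021-11-07 12:00:00", "2021-11-12 21:30:00"),
  tz = "UTC"
)

# Earliest and latest tweet in the download
date_span <- range(created_at)

# Length of the covered window in days
days_covered <- as.numeric(difftime(date_span[2], date_span[1], units = "days"))

format(date_span, "%Y-%m-%d", tz = "UTC")
round(days_covered, 1)
```

If the window is much shorter than the challenge itself, the map can only describe the most recent days.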
- Get users:
(Unfortunately, my only tweet so far was on Day 2 [lines], so I am not in this dataset; let’s see if I can identify myself next time 😄.)
# All users who tweeted with the hashtag
user_info <- lookup_users(unique(rt_download$user_id))
# Users with a known location (reuse user_info instead of querying the API again)
user_loc <- user_info %>%
  filter(location != "")
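To make the effect of the location filter concrete, here is a small sketch on a toy table. The users and locations are invented for illustration and are not taken from the downloaded data.

```r
library(dplyr)

# Toy stand-in for user_info (made-up users, not real data)
toy_users <- tibble(
  user_id  = c("1", "2", "3", "4"),
  location = c("Valencia, Spain", "", NA, "Planet Earth")
)

# filter() keeps only rows where the condition is TRUE,
# so both empty strings and NA locations are dropped
toy_loc <- toy_users %>%
  filter(location != "")

nrow(toy_loc)                    # users with a non-empty location
nrow(toy_loc) / nrow(toy_users)  # share of users that can be sent to the geocoder
```

Note that the profile location is free text: an entry such as "Planet Earth" passes the filter even though it does not point to a real place, so a non-empty location does not guarantee a meaningful coordinate.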
- Geocode users:
I did that with the function ggmap::geocode.
# NOT RUN
# # Get coordinates
# coded <- user_loc$location %>%
# ggmap::geocode()
#
# # Add coordinates to users
# user_geocode <- user_loc %>%
# mutate(lon = coded$lon,
# lat = coded$lat) %>%
# select(user_id, screen_name, name, location, lon, lat)
#
# saveRDS(user_geocode, "user_geocode_2021-11-12.rds")
user_geocode <- readRDS("user_geocode_2021-11-12.rds")
# Remove users whose location could not be geocoded
user_geocode_na <- user_geocode %>%
  drop_na(lon, lat)
- Convert the resulting dataset to an sf object:
user_sf <- user_geocode_na %>%
st_as_sf(coords = c("lon", "lat")) %>%
st_set_crs(4326)
# saveRDS(user_sf, "user_sf_2021_11_12.rds") #Just if I need it later
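The conversion can be tried out on a toy table first. The sketch below uses invented users and coordinates, and shows why rows without coordinates have to be removed before calling `st_as_sf()`.

```r
library(sf)

# Toy stand-in for user_geocode (made-up users and coordinates)
toy_geocode <- data.frame(
  screen_name = c("user_a", "user_b", "user_c"),
  lon = c(-0.38, 13.40, NA),
  lat = c(39.47, 52.52, NA)
)

# st_as_sf() stops with an error on missing coordinates by default,
# so keep only the rows where both lon and lat are present
has_coords <- complete.cases(toy_geocode[, c("lon", "lat")])
toy_complete <- toy_geocode[has_coords, ]

# Build point geometries and declare them as WGS84 longitude/latitude
toy_sf <- st_as_sf(toy_complete, coords = c("lon", "lat"), crs = 4326)

nrow(toy_sf)          # number of users that became points
st_crs(toy_sf)$epsg   # EPSG code of the assigned CRS
```

Passing `crs = 4326` directly to `st_as_sf()` is equivalent to calling `st_set_crs(4326)` afterwards.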
- Load the world map (with the rnaturalearth package):
world <- ne_countries(scale = "medium", returnclass = "sf") %>%
st_transform(crs = 4326 )
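With both the users and the countries available as sf objects, a spatial join could also count users per country. This is only a sketch on toy data: the two rectangular "countries" (Westland and Eastland) and the four points are invented, and `make_box()` is a hypothetical helper defined here for illustration.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a rectangular polygon from its corner coordinates
make_box <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Two made-up "countries" standing in for the world map
toy_world <- st_sf(
  name = c("Westland", "Eastland"),
  geometry = st_sfc(make_box(0, 0, 10, 10), make_box(10, 0, 20, 10), crs = 4326)
)

# Four made-up user locations; the last one falls outside both polygons
toy_users <- st_as_sf(
  data.frame(lon = c(2, 5, 15, 30), lat = c(3, 8, 5, 5)),
  coords = c("lon", "lat"), crs = 4326
)

# Attach the country name to each point, then count points per country
users_per_country <- st_join(toy_users, toy_world) %>%
  st_drop_geometry() %>%
  count(name, sort = TRUE)

users_per_country
```

Points that do not fall inside any polygon get `NA` as their country name, which makes it easy to see how many users could not be assigned.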
- Plot users on the world map:
p <- ggplot() +
geom_sf(data = world, fill= "antiquewhite", color = "grey", size = 0.05) +
geom_sf(data = user_sf, size = 0.2, col = "red") +
labs(title = "Where are we tweeting #30DayMapChallenge from?",
subtitle = glue("{nrow(user_info)} users have posted {nrow(rt_download)} tweets from {date_min} to {date_max}.
However, I was only able to identify {nrow(user_sf)} users (red points)")) +
theme_bw() +
theme(panel.grid.major = element_line(color = gray(.5),
linetype = "dashed",
size = 0.5),
panel.background = element_rect(fill = "aliceblue"))
ggsave(plot = p, filename = "tweets_30DayMapChallenge_03_12_Nov.png")
include_graphics("tweets_30DayMapChallenge_03_12_Nov.png")