Part 2 (final)- How to create hexbin choropleth map to visualize data?
Static and animated map to visualize COVID-19 data for Kenya
In the first part of this blog we read Kenya’s shapefiles from GADM, and county name abbreviations which will be shown on the the chart instead of full names to avoid clutter on the map. Then, we used “geogrid” package of R to convert the map polygons i.e. the polygons of Kenya’s counties into hexagons.
Calculating the centroid of each hexagon
After creating the polygon hexagons, we calculate the centroid of each hexagon. We will use the centroid values to position the county abbreviation names on the chart. To calculate the centroid of hexagons we will use rgeos
library which is used for topology operations on geometries.rgeos
has a handy gCentroid()
function which will provide us the centroid of each hexagon of each county polygon for Kenya.
#calculating the centroid of hexgons
library(rgeos)
centers <- data.frame(gCentroid(resultreg, byid = TRUE), id=resultreg@data$ABV)
#Following table shows the X, Y coordinates of centroid for each polygon
head(centers)
## x y id
## ID1 38.17338 -3.548623 TT
## ID2 39.21422 -3.548623 KW
## ID3 37.65295 -2.647223 KJ
## ID4 38.69380 -2.647223 MK
## ID5 39.73465 -2.647223 KF
## ID6 36.09168 -1.745822 NR
Tidying the shapefile
You might recall that we used resultreg
dataframe which had original shapefile data and was assigned to new position based on the position of hexagons that we created. However, the data is still not in a tidy format i.e. this data cannot be used in ggplot to draw charts. Therefore, we use tidy
function from broom
package to create a new tidied data frame.
Notice that we use an attribute region
in the function. This attribute makes sure that while tidying the shapefile data, the region i.e. county names are not removed.
library(broom)
#tidy() function of broom package returns the statistical findings of the model (such as coefficients)
#by specifying the region attribute we keep the region name
# tidy() function turns the data frame into a format which can be used in ggplot.
resultreg_fort <- tidy(resultreg, region = "NAME_1")
Below are the top five rows of the tidied shapefile data.
head(resultreg_fort)
## # A tibble: 6 x 7
## long lat order hole piece group id
## <dbl> <dbl> <int> <lgl> <fct> <fct> <chr>
## 1 38.2 3.06 1 FALSE 1 Baringo.1 Baringo
## 2 38.7 3.36 2 FALSE 1 Baringo.1 Baringo
## 3 39.2 3.06 3 FALSE 1 Baringo.1 Baringo
## 4 39.2 2.46 4 FALSE 1 Baringo.1 Baringo
## 5 38.7 2.16 5 FALSE 1 Baringo.1 Baringo
## 6 38.2 2.46 6 FALSE 1 Baringo.1 Baringo
Reading COVID-19 confirmed daily cumulative cases of Kenya and adding it to our tidy shapefile data
Note: This is not the latest data of confirmed daily cases. But, one can easily update this chart with latest data by using the methodology outlined in this post. I will also like to thank my colleagues Jiaqi Zhang and Kaci Kennedy McDade for helping me obtain the COVID-19 confirmed daily cases data for Kenya.
Now, we read the COIVID-19 confirmed cumulative cases and name it as df
.
##Reading the COVID-19 daily cases dataset
df <- read_excel(here("./content/post/2021-03-07-hexbin/Kenya.xlsx"))
#converting into `Date` column in the dataframe into date format
df$Date <- as.Date((df$Date))
#printing the top five rows
head(df)
## # A tibble: 6 x 3
## Date Region Confirmed
## <date> <chr> <dbl>
## 1 2020-03-12 Nairobi 1
## 2 2020-03-13 Nairobi 1
## 3 2020-03-14 Nairobi 1
## 4 2020-03-15 Nairobi 3
## 5 2020-03-16 Nairobi 3
## 6 2020-03-17 Nairobi 4
We will be using the names of counties to join the daily cases dataframe with the shapefile dataframe. However, to accomplish this task we need the names of counties in both the dataframe to be same. Therefore, first we check whether there is any mismatch in the county names between the two dataframes.
#compare the region name column in `resultreg_fort` dataframe to region column of `df` to see which rows match.
resultreg_fort$id %in% df$Region
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [25] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [37] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [49] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [73] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [85] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [97] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [109] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [121] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [133] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [145] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [157] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [169] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [181] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [193] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [205] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [217] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [229] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [241] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [253] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [265] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [277] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [289] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [301] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [313] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [325] TRUE TRUE TRUE TRUE TRUE
From the above result we see some FALSE
observations. These are the county names in the confirmed daily cases dataframe (df
) which do not match with the county names in the shapefile dataframe. However, manually checking the names that do not match will be really cumbersome. Therefore, we use the following code to obtain the names of counties that do not match in the two dataframes.
#find which names do not match
resultreg_fort$id[!resultreg_fort$id %in% df$Region]
## [1] "Elgeyo-Marakwet" "Elgeyo-Marakwet" "Elgeyo-Marakwet" "Elgeyo-Marakwet"
## [5] "Elgeyo-Marakwet" "Elgeyo-Marakwet" "Elgeyo-Marakwet" "Tharaka-Nithi"
## [9] "Tharaka-Nithi" "Tharaka-Nithi" "Tharaka-Nithi" "Tharaka-Nithi"
## [13] "Tharaka-Nithi" "Tharaka-Nithi"
We see here that two counties–“Elgeyo-Marakwet” and “Tharaka-Nithi” from df
do not match with the names of counties from resultreg_fort
. We will change the names of these counties in the df
dataframe to match with the names in resultreg_fort
dataframe.
#Rename the states in the df to match the names in the shape file
df$Region<- gsub("Elgeyo Marakwet","Elgeyo-Marakwet", df$Region)
df$Region<- gsub("Tharaka Nithi","Tharaka-Nithi", df$Region)
Finally, we attach the daily COVID-19 confirmed cases data with the shapefile data.
#Joining the COVID-19 data with the shapefile data
resultreg_covid<- left_join(resultreg_fort, df, by=c("id"="Region"))
#looking at the top two rows of the new shapefile dataframe
head(resultreg_covid,2)
## # A tibble: 2 x 9
## long lat order hole piece group id Date Confirmed
## <dbl> <dbl> <int> <lgl> <fct> <fct> <chr> <date> <dbl>
## 1 38.2 3.06 1 FALSE 1 Baringo.1 Baringo 2020-07-28 0
## 2 38.2 3.06 1 FALSE 1 Baringo.1 Baringo 2020-07-29 0
Creating the hexbin choropleth static map
One last step before we create the hexbin choropleth map is to add the bins to the shapefile data which will store different values of COVID-19 cases. These bins will be used to create the legend for choropleth. A user can decide the number of bins they want to create based on the data range. Here we create a total of eleven bins.
#Create bins for the legends
resultreg_covid <- resultreg_covid %>%
mutate(bin=case_when(
Confirmed<=50 ~ "<50",
Confirmed>50 & Confirmed<=500 ~ "50-500",
Confirmed>500 & Confirmed<=1000 ~ "500-1000",
Confirmed>1000 & Confirmed<=2000 ~ "1000-1500",
Confirmed>2000 & Confirmed<=3000 ~ "1500-2000",
Confirmed>3000 & Confirmed<=4000 ~ "2000-2500",
Confirmed>4000 & Confirmed<=5000 ~ "2500-3000",
Confirmed>5000 & Confirmed<=6000 ~ "3000-8000",
Confirmed>6000 & Confirmed<=7000 ~ "8000-12000",
Confirmed>7000 & Confirmed<=8000 ~ "12000-17000",
Confirmed>8000 ~ "17000+"
))
#Getting the levels for each bin
resultreg_covid$bin <- factor(resultreg_covid$bin, levels = c("<50","50-500", "500-1000", "1000-1500","1500-2000",
"2000-2500","2500-3000", "3000-8000",
"8000-12000","12000-17000", "17000+"))
#checking the number of unique levels for each bin
unique(resultreg_covid$bin)
## [1] <50 50-500 500-1000 1000-1500 1500-2000 2000-2500
## [7] 2500-3000 3000-8000 8000-12000 12000-17000 17000+
## 11 Levels: <50 50-500 500-1000 1000-1500 1500-2000 2000-2500 ... 17000+
Finaly, we are done with all the manipulating the shapefile data, data wrangling, and adding data to the shapefile dataframe. Now, comes the moment of truth where we will create the hexbin choropleth map using the shapefile data that we created.
We use ggplot to develop the hexbin choropleth map. We will use geom_polygon
function of ggplot which can read resultreg_covid
shapefile data. Also, create the choropleth for the latest data which is provided in the Date
attribute of geom_polygon
. We provide the longitude and latitude values in the aesthetic attribute of geom_polygon
. We then use geom_text
function of ggplot to add the county abbreviation names at the centroid of each hexagon polygons. The remaining ggplot arguments are simply adding themes, and working on beautifying the chart by manipulating caption, title, legend and colors. Note that the bins which do not have any data on the latest date are not shown in the legend on the map.
library(ggplot2)
#Creating static hexagon choropleth map with COVID-19 data
ggplot() +
geom_polygon(data = filter(resultreg_covid, Date==max(Date)), aes(fill = bin, x = long, y = lat, group = id) , size=0, alpha=0.9) +
geom_text(data=centers, aes(x=x, y=y, label=id, fontface="bold"), color="white", size=6, alpha=0.6) +
theme_void() +
theme(
legend.position = "right",
text = element_text(color = "#22211d"),
plot.background = element_rect(fill = "#f5f5f2", color = NA),
panel.background = element_rect(fill = "#f5f5f2", color = NA),
legend.background = element_rect(fill = "#f5f5f2", color = NA),
legend.text = element_text(size = 15),
plot.title = element_text(size= 22, hjust=0.7, face = "bold", color = "#4e4d47", margin = margin(b = -0.1, t = 0.4, l = 2, unit = "cm")),
plot.subtitle = element_text(size = 15, face = "bold"),
plot.caption = element_text(size = 12)
) +
scale_fill_viridis_d()+
labs(title = "Kenya's map of COVID-19 cases, county by county",
subtitle = paste0("Date:", max(resultreg_covid$Date)),
caption = "Data Source: Ministry of Health
Created by: Siddharth Dixit, Jiaqi Zhang
& Kaci Kennedy McDade
Baringo (BA), Bomet (BO), Bungoma (BN), Busia (BS), Elgeyo-Marakwet (EM),
Embu (EB), Garissa (GA), Homa Bay (HB), Isiolo (IS), Kajiado (KJ), Kakamega (KK),
Kericho (KR), Kiambu (KB), Kilifi (KF), Kirinyaga (KY), Kisii (KI), Kisumu (KU),
Kitui (KT), Kwale (KW), Laikipia (LK), Lamu (LM), Machakos (MC), Makueni (MK),
Mandera (MD), Marsabit (MB), Meru (ME), Migori (MG), Mombasa (MM), Murang'a (MU),
Nairobi (NB), Nakuru (NK), Nandi (ND), Narok (NR), Nyamira (NM), Nyandarua (NN),
Nyeri (NI), Samburu (SA), Siaya (SI), Taita Taveta (TT), Tana River (TR),
Tharaka-Nithi (NT), Trans Nzoia (TN), Turkana (TU), Uasin Gishu (UG), Vihiga (VI),
Wajir (WJ), West Pokot (WP)",
fill="")
Creating the hexbin choropleth animation map
Now, we will create the animation map of COVID-19 confirmed daily cases in Kenya. We will be using gganimate
package to create the animation map. We simply add three new arguments from gganimate
package to out static ggplot code. The transition_time() argument defines how the data relates to itself across time, enter_fade() and exit_fade() defines how new data should appear and how old data should disappear during the course of the animation. This will create a gif which can then be saved on your hard drive using anim_save() function.
library(gganimate)
#Creating animated hexagon choropleth map with COVID-19 data
p <- ggplot() +
geom_polygon(data = resultreg_covid, aes(fill = bin, x = long, y = lat, group = id) , size=0, alpha=0.9) +
geom_text(data=centers, aes(x=x, y=y, label=id), color="white", size=6, alpha=0.6) +
theme_void() +
theme(
legend.position = "right",
text = element_text(color = "#22211d"),
plot.background = element_rect(fill = "#f5f5f2", color = NA),
panel.background = element_rect(fill = "#f5f5f2", color = NA),
legend.background = element_rect(fill = "#f5f5f2", color = NA),
legend.text = element_text(size = 15),
plot.title = element_text(size= 22, hjust=0.8,face = "bold", color = "#4e4d47", margin = margin(b = -0.1, t = 0.4, l = 2, unit = "cm")),
plot.caption = element_text(hjust = 1, size = 12),
plot.subtitle = element_text(size = 15, face = "bold"),
) +
scale_fill_viridis_d()+
labs(title = "Kenya's map of COVID-19 cases, county by county",
subtitle = paste0("Date:{frame_time}"),
caption = "Data Source: Ministry of Health
Created by: Siddharth Dixit, Jiaqi Zhang
& Kaci Kennedy McDade
Baringo (BA), Bomet (BO), Bungoma (BN), Busia (BS), Elgeyo-Marakwet (EM),
Embu (EB), Garissa (GA), Homa Bay (HB), Isiolo (IS), Kajiado (KJ), Kakamega (KK),
Kericho (KR), Kiambu (KB), Kilifi (KF), Kirinyaga (KY), Kisii (KI), Kisumu (KU),
Kitui (KT), Kwale (KW), Laikipia (LK), Lamu (LM), Machakos (MC), Makueni (MK),
Mandera (MD), Marsabit (MB), Meru (ME), Migori (MG), Mombasa (MM), Murang'a (MU),
Nairobi (NB), Nakuru (NK), Nandi (ND), Narok (NR), Nyamira (NM), Nyandarua (NN),
Nyeri (NI), Samburu (SA), Siaya (SI), Taita Taveta (TT), Tana River (TR),
Tharaka-Nithi (NT), Trans Nzoia (TN), Turkana (TU), Uasin Gishu (UG), Vihiga (VI),
Wajir (WJ), West Pokot (WP)",
fill="")+
transition_time(Date)+
enter_fade()+
exit_fade()
animate(plot = p, fps=4, width = 700, height = 800, end_pause = 24)
We have now created a static and an animated hexbin chropleth map of COVID-19 cases for Kenya. One can use the same approach to create any hexbin map using a different dataset. I hope you liked the post. Let me know if you have any advice or comment on my approach to creating the hexbin choropleth map.