Take-Home-Exercise_04

Published

February 26, 2023

In this data exploration, we will look into the import and export data sets. The data set has been sourced by Department of Statistics, Singapore (DOS) under Merchandise Trade by Region/Market.

Import Data

To start, necessary libraries will be loaded, as well as data sets. Import and export data sets will be separately loaded to the work space.

Show the code

#Unpivot the data
pacman::p_load(plotly, DT, patchwork, ggstatsplot, tidyverse)
library(tidyr)
library(dplyr)
library(readr)

Show the code

imports <- read_csv("data/imports.csv")

Rows: 119 Columns: 566
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (457): Data Series, 2013 Dec, 2013 Nov, 2013 Oct, 2013 Sep, 2013 Aug, 20...
num (109): 2023 Jan, 2022 Dec, 2022 Nov, 2022 Oct, 2022 Sep, 2022 Aug, 2022 ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Show the code

exports <- read_csv("data/exports.csv")

Rows: 91 Columns: 566
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (409): Data Series, 2009 Dec, 2009 Nov, 2009 Oct, 2009 Sep, 2009 Aug, 20...
num (157): 2023 Jan, 2022 Dec, 2022 Nov, 2022 Oct, 2022 Sep, 2022 Aug, 2022 ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data Wrangling

The data is summarized in table format, however this format is not easy to run data analysis libraries. Therefore, the data wrangling will start from transposing the dataset. Here, I used gather() to transpose data by Data Series column, with Month and Amount.

Show the code

imports_trans <- gather(imports, "Month", "Import_Amt", -`Data Series`)
exports_trans <- gather(exports, "Month", "Export_Amt", -`Data Series`)

head(imports_trans)

# A tibble: 6 × 3
  `Data Series`                                Month    Import_Amt
  <chr>                                        <chr>    <chr>     
1 Total Merchandise Imports (Thousand Dollars) 2023 Jan 44393664  
2 America (Million Dollars)                    2023 Jan 6267.4    
3 Asia (Million Dollars)                       2023 Jan 30174     
4 Europe (Million Dollars)                     2023 Jan 6427.8    
5 Oceania (Million Dollars)                    2023 Jan 983.3     
6 Africa (Million Dollars)                     2023 Jan 541.1

Show the code

head(exports_trans)

# A tibble: 6 × 3
  `Data Series`                                Month    Export_Amt
  <chr>                                        <chr>    <chr>     
1 Total Merchandise Exports (Thousand Dollars) 2023 Jan 49575205  
2 America (Million Dollars)                    2023 Jan 5823.5    
3 Asia (Million Dollars)                       2023 Jan 34646.8   
4 Europe (Million Dollars)                     2023 Jan 5384      
5 Oceania (Million Dollars)                    2023 Jan 2844.5    
6 Africa (Million Dollars)                     2023 Jan 876.3

For this exploration, data analysis will only focus on the years from 2020 to 2022. In our data set, the import and export columns are in string. Therefore, the column values need to be converted to numeric. Second, for easier time series plotting, Month columns will be converted to Date variables using as.Date().

When you look into the Data Series, you can find that Total Merchandise Exports row is unnecessary as they are summary of the rest of countries. Also, each Data Series has “Thousand Dollars” or “Million Dollars” repetitive. For easier data analysis, we will remove them from Data Series value and leave only continent or country names.

Last but not least, as we are going to focus on recent three years, use filter() to remove unnecessary year rows.

Show the code

# Convert from string to date value
imports_mth <- imports_trans %>%
  mutate(Month = as.Date(paste0(Month, " 01"), format = "%Y %b %d"), Import_Amt = as.numeric(Import_Amt))

Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

Show the code

exports_mth <- exports_trans %>%
  mutate(Month = as.Date(paste0(Month, " 01"), format = "%Y %b %d"), Export_Amt = as.numeric(Export_Amt))

Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

Show the code

# Remove unnecessary rows
imports_recent <- imports_mth %>%
  filter(Month > as.Date("2019-12-01") & Month < as.Date("2023-01-01")) %>% 
  mutate(`Data Series` = str_replace_all(`Data Series`, "\\(|Thousand Dollars|\\)|Million Dollars", "")) %>%
  filter(`Data Series` != "Total Merchandise Imports ") # to remove total summary row

exports_recent <- exports_mth %>%
  filter(Month > as.Date("2019-12-01") & Month < as.Date("2023-01-01")) %>% 
  mutate(`Data Series` = str_replace_all(`Data Series`, "\\(|Thousand Dollars|\\)|Million Dollars", "")) %>%
  filter(`Data Series` != "Total Merchandise Exports ") # to remove total summary row

Two data sets, imports_recent and exports_recent, will again segmented by two data sets - by continents and by countries.

Show the code

# Import dataset
continents_imp <- imports_recent %>%
  filter(`Data Series` %in% c("America ", "Asia ", "Europe ", "Oceania ", "Africa ", "European Union "))

countries_imp <- imports_recent %>%
  filter(!`Data Series` %in% c("America ", "Asia ", "Europe ", "Oceania ", "Africa ", "European Union "))

# Export dataset
continents_exp <- exports_recent %>%
  filter(`Data Series` %in% c("America ", "Asia ", "Europe ", "Oceania ", "Africa ", "European Union "))

countries_exp <- exports_recent %>%
  filter(!`Data Series` %in% c("America ", "Asia ", "Europe ", "Oceania ", "Africa ", "European Union "))

Exploring time trend with line chart

Line chart is one of the most commonly used plot types to show the time trend. First, load the data using ggplot, then use geom_line() to start plotting line charts.

For x-axis, labels will be converted back to “2022 Dec” format from 2022-12-01. As each data set is segregated by month, x-axis scale will be broken by 1 month. Next, to put all x-axis labels without overlapping, labels will be rotated by 45-degree.

Finally, you can check the details of each time series by hovering to tool tips.

Show the code

library(ggplot2)

# Create the plot
plt_continent_imp <- ggplot(continents_imp, aes(x = Month, y = Import_Amt, color = `Data Series`)) +
  geom_line() +
  labs(title = "Imports by Continent", x = "Month", y = "Import Amount") +
  scale_x_date(date_breaks = "1 month", date_labels = "%Y %b",
               limits = as.Date(c("2020-01-01", "2022-12-01"), "%Y-%m-%d")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplotly(plt_continent_imp, tooltip = c("color", "x", "y", "fill"))

Show the code

plt_continent_exp <- ggplot(continents_exp, aes(x = Month, y = Export_Amt, color = `Data Series`)) +
  geom_line() +
  labs(title = "Exports by Continent", x = "Month", y = "Export Amount") +
  scale_x_date(date_breaks = "1 month", date_labels = "%Y %b",
               limits = as.Date(c("2020-01-01", "2022-12-01"), "%Y-%m-%d")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplotly(plt_continent_exp, tooltip = c("color", "x", "y", "fill"))

Key findings:

In terms of volume, Asia has the most amount across all continents, in both import and export amount.
In 2022 Feb, there was a sudden dip in both import and export amount in Asia.
The sudden dip of import and export in Asia in 2022 Feb may have impacted to be impacted by other continents as well, since other continents especially America, European Union, and Europe also showed similar sudden dip in this period.

Exploring time trend with heat map

Another way to look into time trend could be using heat map. To see the contrast in more vivid way, selecting colors in extreme end could be an option. However, it is important to choose an appropriate color scheme for people with color vision deficiency.

To plot a heat map, geom_tile() was used. Similarly, x-axis labels were formatted in “2022 Dec” style and rotated 45 degree for better readability.

Show the code

heat_continent_imp <- ggplot(continents_imp, aes(x = Month, y = `Data Series`, fill = Import_Amt)) +
  geom_tile() +
  labs(title = "Imports by Continent", x = "Month", y = "Import Amount", fill = "Import Amount") +
  scale_x_date(date_breaks = "1 month", date_labels = "%Y %b",
               limits = as.Date(c("2020-01-01", "2022-12-01"), "%Y-%m-%d")) +
  scale_fill_gradient(low = "lightgrey", high = "blue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(heat_continent_imp, tooltip = c("x", "y", "fill"))

Show the code

heat_continent_exp <- ggplot(continents_exp, aes(x = Month, y = `Data Series`, fill = Export_Amt)) +
  geom_tile() +
  labs(title = "Exports by Continent", x = "Month", y = "Export Amount") +
  scale_x_date(date_breaks = "1 month", date_labels = "%Y %b",
               limits = as.Date(c("2020-01-01", "2022-12-01"), "%Y-%m-%d")) +
  scale_fill_gradient(low = "lightgrey", high = "blue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(heat_continent_exp, tooltip = c("x", "y", "fill"))