Show the code
#Unpivot the data
pacman::p_load(plotly, DT, patchwork, ggstatsplot, tidyverse)
library(tidyr)
library(dplyr)
library(readr)February 26, 2023
In this data exploration, we will look into the import and export data sets. The data set has been sourced by Department of Statistics, Singapore (DOS) under Merchandise Trade by Region/Market.
To start, necessary libraries will be loaded, as well as data sets. Import and export data sets will be separately loaded to the work space.
Rows: 119 Columns: 566
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (457): Data Series, 2013 Dec, 2013 Nov, 2013 Oct, 2013 Sep, 2013 Aug, 20...
num (109): 2023 Jan, 2022 Dec, 2022 Nov, 2022 Oct, 2022 Sep, 2022 Aug, 2022 ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 91 Columns: 566
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (409): Data Series, 2009 Dec, 2009 Nov, 2009 Oct, 2009 Sep, 2009 Aug, 20...
num (157): 2023 Jan, 2022 Dec, 2022 Nov, 2022 Oct, 2022 Sep, 2022 Aug, 2022 ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The data is summarized in table format, however this format is not easy to run data analysis libraries. Therefore, the data wrangling will start from transposing the dataset. Here, I used gather() to transpose data by Data Series column, with Month and Amount.
# A tibble: 6 × 3
`Data Series` Month Import_Amt
<chr> <chr> <chr>
1 Total Merchandise Imports (Thousand Dollars) 2023 Jan 44393664
2 America (Million Dollars) 2023 Jan 6267.4
3 Asia (Million Dollars) 2023 Jan 30174
4 Europe (Million Dollars) 2023 Jan 6427.8
5 Oceania (Million Dollars) 2023 Jan 983.3
6 Africa (Million Dollars) 2023 Jan 541.1
# A tibble: 6 × 3
`Data Series` Month Export_Amt
<chr> <chr> <chr>
1 Total Merchandise Exports (Thousand Dollars) 2023 Jan 49575205
2 America (Million Dollars) 2023 Jan 5823.5
3 Asia (Million Dollars) 2023 Jan 34646.8
4 Europe (Million Dollars) 2023 Jan 5384
5 Oceania (Million Dollars) 2023 Jan 2844.5
6 Africa (Million Dollars) 2023 Jan 876.3
For this exploration, data analysis will only focus on the years from 2020 to 2022. In our data set, the import and export columns are in string. Therefore, the column values need to be converted to numeric. Second, for easier time series plotting, Month columns will be converted to Date variables using as.Date().
When you look into the Data Series, you can find that Total Merchandise Exports row is unnecessary as they are summary of the rest of countries. Also, each Data Series has “Thousand Dollars” or “Million Dollars” repetitive. For easier data analysis, we will remove them from Data Series value and leave only continent or country names.
Last but not least, as we are going to focus on recent three years, use filter() to remove unnecessary year rows.
Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
# Remove unnecessary rows
imports_recent <- imports_mth %>%
filter(Month > as.Date("2019-12-01") & Month < as.Date("2023-01-01")) %>%
mutate(`Data Series` = str_replace_all(`Data Series`, "\\(|Thousand Dollars|\\)|Million Dollars", "")) %>%
filter(`Data Series` != "Total Merchandise Imports ") # to remove total summary row
exports_recent <- exports_mth %>%
filter(Month > as.Date("2019-12-01") & Month < as.Date("2023-01-01")) %>%
mutate(`Data Series` = str_replace_all(`Data Series`, "\\(|Thousand Dollars|\\)|Million Dollars", "")) %>%
filter(`Data Series` != "Total Merchandise Exports ") # to remove total summary rowTwo data sets, imports_recent and exports_recent, will again segmented by two data sets - by continents and by countries.
# Import dataset
continents_imp <- imports_recent %>%
filter(`Data Series` %in% c("America ", "Asia ", "Europe ", "Oceania ", "Africa ", "European Union "))
countries_imp <- imports_recent %>%
filter(!`Data Series` %in% c("America ", "Asia ", "Europe ", "Oceania ", "Africa ", "European Union "))
# Export dataset
continents_exp <- exports_recent %>%
filter(`Data Series` %in% c("America ", "Asia ", "Europe ", "Oceania ", "Africa ", "European Union "))
countries_exp <- exports_recent %>%
filter(!`Data Series` %in% c("America ", "Asia ", "Europe ", "Oceania ", "Africa ", "European Union "))Line chart is one of the most commonly used plot types to show the time trend. First, load the data using ggplot, then use geom_line() to start plotting line charts.
For x-axis, labels will be converted back to “2022 Dec” format from 2022-12-01. As each data set is segregated by month, x-axis scale will be broken by 1 month. Next, to put all x-axis labels without overlapping, labels will be rotated by 45-degree.
Finally, you can check the details of each time series by hovering to tool tips.
library(ggplot2)
# Create the plot
plt_continent_imp <- ggplot(continents_imp, aes(x = Month, y = Import_Amt, color = `Data Series`)) +
geom_line() +
labs(title = "Imports by Continent", x = "Month", y = "Import Amount") +
scale_x_date(date_breaks = "1 month", date_labels = "%Y %b",
limits = as.Date(c("2020-01-01", "2022-12-01"), "%Y-%m-%d")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplotly(plt_continent_imp, tooltip = c("color", "x", "y", "fill"))plt_continent_exp <- ggplot(continents_exp, aes(x = Month, y = Export_Amt, color = `Data Series`)) +
geom_line() +
labs(title = "Exports by Continent", x = "Month", y = "Export Amount") +
scale_x_date(date_breaks = "1 month", date_labels = "%Y %b",
limits = as.Date(c("2020-01-01", "2022-12-01"), "%Y-%m-%d")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplotly(plt_continent_exp, tooltip = c("color", "x", "y", "fill"))Key findings:
In terms of volume, Asia has the most amount across all continents, in both import and export amount.
In 2022 Feb, there was a sudden dip in both import and export amount in Asia.
The sudden dip of import and export in Asia in 2022 Feb may have impacted to be impacted by other continents as well, since other continents especially America, European Union, and Europe also showed similar sudden dip in this period.
Another way to look into time trend could be using heat map. To see the contrast in more vivid way, selecting colors in extreme end could be an option. However, it is important to choose an appropriate color scheme for people with color vision deficiency.
To plot a heat map, geom_tile() was used. Similarly, x-axis labels were formatted in “2022 Dec” style and rotated 45 degree for better readability.
heat_continent_imp <- ggplot(continents_imp, aes(x = Month, y = `Data Series`, fill = Import_Amt)) +
geom_tile() +
labs(title = "Imports by Continent", x = "Month", y = "Import Amount", fill = "Import Amount") +
scale_x_date(date_breaks = "1 month", date_labels = "%Y %b",
limits = as.Date(c("2020-01-01", "2022-12-01"), "%Y-%m-%d")) +
scale_fill_gradient(low = "lightgrey", high = "blue") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplotly(heat_continent_imp, tooltip = c("x", "y", "fill"))heat_continent_exp <- ggplot(continents_exp, aes(x = Month, y = `Data Series`, fill = Export_Amt)) +
geom_tile() +
labs(title = "Exports by Continent", x = "Month", y = "Export Amount") +
scale_x_date(date_breaks = "1 month", date_labels = "%Y %b",
limits = as.Date(c("2020-01-01", "2022-12-01"), "%Y-%m-%d")) +
scale_fill_gradient(low = "lightgrey", high = "blue") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplotly(heat_continent_exp, tooltip = c("x", "y", "fill"))