Interactive time series data visualization

PART 1

Introduction

Dygraph is a data visualization library originally written in Javascript and handles large datasets such as time series data really well. Among its many advantages that has attracted a large community of users and developers alike is interactivity.

Interactive by default: With default interactivity enabled for mouse hovers to shows values, you don’t need to go through a hell of time to produce interactive charts. And yet if this is not enough, you are still covered by additional functions for the extra. Like many important visualization librabries, dygraphs has an R package which serve as an interface to the dygraph library. This means that, instead of worrying about learning javascript in order to start using dygraphs, we can just use our R programming knowledge. As usual, all we need is to install the package which in this case is Dygraphs. The dygraphs package is accessible from the CRAN repository. Afterwards, we have to load the dygraph library and we are good to go!

This post is a simple beginner-friendly tutorial to using dygraphs package in R and Rstudio for time series visualization. You will also learn the following along side; * Making an api call to retrieve data * Preparing the dataset for visualization in dygraphs

Dataset to be used

Nitrous Oxide data

Input data format

dygraph accepts time series data that are ts or xts objects.

Tools used for this tutorial

R 4.1 and Rstudio version 1.4.1

At this point, introduction is enough, now its time to jump into our Rstudio and start coding. First, we install the packages needed if they are not already installed and load them as libraries for the work.

## I have commented-out installation of packages because I have are already installed them locally
# install.packages("magrittr")
# install.packages("rjson")
# install.packages("jsonlite")
# install.packages("dygraphs")


library(magrittr)
library(rjson)
library(jsonlite)
library(dygraphs)  ## dygraphs

Retrieving data for analysis

We will be using Nitrous Oxide data from global-warning.org. They have an API with an endpoint that returns json as response.

In order to download the data, we need to get API endpoint and assign it to a variable. Then, we will use fromJSON() function from the jsonlite package to download the data and convert it to dataframe. This can be achieve with the code below;

## retrieve NO2 data from API
no2_url <- "https://global-warming.org/api/nitrous-oxide-api"
no2_json <- fromJSON(no2_url)
no2_dataframe <- as.data.frame(no2_json)

Convert data to time series

Now that we have our data ready, we need to convert it to a time series object using the ts() function. Various arguments need to be passed to ts() in order to correctly define the period of occurrence of the series. The frequency argument indicates the number of observations made for the time period. For a monthly data recording, this will be 12 indicating 12 months in a year. The start argument is specified as a vector c(2001, 3) indicating that the first recording of the data was made on March, 2001.

After converting the data to a time series object, its ready for plotting with dygraph( ). We want to plot the average monthly Nitrous Oxide emission which is the second column; so we will subset our data and pass it to dygraph( ) function. Because we want to demonstrate the use of various dygraph() elements by adding them incrementally, we will create a dygraph object by assigning the plot to a variable. The code below does just that.

ts_no2 <- ts(data = no2_dataframe, frequency = 12, start = c(2001, 3))
dy_ts_no2 <- dygraph(ts_no2[, 2], main = "Nitrous dioxide emissions", xlab = "Years", 
                     ylab = "Emissions in part per billion (ppb)")
dy_ts_no2

Focusing on data highlights

Sometimes, we are not only interested in visualizing your data but also focusing on highlighting certain insights that are of interest to your audience. This can be achieved using a number of functions.

dyAnnotation()

The dyAnnotation () function enables us to annonate our chart and create a tooltip that details the message or highlight we want to convey for that data point. The arguments of dyAnnonation () allows us to specify where to place the annotation on the chart (x argument), the label or annotation on the chart (text argument) and the message to convey when we hover over it (tooltip argument). When we hover over the point, it displays the message

dyEvent()

The dyEvent( ) function as the name suggests is used to specified the time that an event occurred on the time series. This is usually done when we believe that event probably influenced a noticeable change in the time series and should noted. For instance, if we are visualizing sales data we notice a spike in sales on a particular day, it may be useful to use dyEvent( ) to indicate on the time series chart that it was Christmas day, a festival or a related event that cause it.

dyLimit()

The dyLimit() function is similar to dyEvent() function but different in its usage and orientation. The dyLimit( ) can be used as a reference point on the time series based on some descriptive statistics such as the median value of the entire time series. For example, we can use the dyLimit() to position a horizon line at the mean value of Nitrous Oxide emissions and be able to discern which periods were above the mean emission.

dyShading()

The dyShading() function is used to draw attention to a period or several periods on the time series data. For instance, if there is a period in time where an anomaly or change in trend seems to be captured in the time series and we want to highlight that; dyShading() will be of good use. dyShading() achieves this with the arguments from, to, and color.

The code for demonstrating this is below;

dy_ts_no2 %>%
  dyAnnotation(x = "2019-12-1", text = "C2", tooltip = "COVID-19 emission period start") %>%
  dyAnnotation(x = "2002-7-1", text = "C1", tooltip = "COVID-02 emission start") %>%
  dyShading(from = "2005-1-1", to = "2010-1-1", color = "#FFECBB") %>%
  dyShading(from = "2015-1-1", to = "2020-1-1", color = "#FCDBEE") %>%
  dyEvent(x = "2008-1-1", label = "Global economic recession", labelLoc = "top", 
          color = "red") %>%
  dyLimit(limit = mean(ts_no2), label = "Mean", labelLoc = "left", color = "blue")

Our time series plot above is seemingly crowded with the combination of various dygraph() functions for drawing focus to an aspect of the time series. Nonetheless, it enables us to communicate important insights using their combination. For example, we can deduce that during the global economic recession and period prior to 2010, Nitrous Oxide emissions were below the mean for the time series period. We can have indicators better than mean to make reference to. The message that remains is that a combination of dyLimit(), dyEvent() and dyShading() and possiblly dyAnnotation() functions enables better insights, that are not readily available visually, to be gained.

Focusing on sections of time series interactively

For a large time series data covering a long time period, some trends and seasonality may be hidden visually or we may just be interested in allowing our audience the option to focus on time periods of interest to them. This can be achieved with the dyRangeSelector() function.

dyRangeSelector( )

The dyRangeSelector() function accepts a dygraph or time series object as an argument and allows for various customization with other arguments. We can zoom-in to a time period by default using the dateWindow argument while allowing users to interactively choose other time periods. Among others, we can customize colors to reflect our desired aesthetics using the fillColor and strokeColor arguments.

The code below shows how to achieve that;

dy_ts_no2 %>%
  dyRangeSelector(dateWindow = c("2010-01-01", "2021-12-31"), fillColor = "#febacd", strokeColor = "#bffdea")

Customizing time series charts

While the various functions have a flair for customizing the outlook of charts, a full set of customization that meets the mock-up of a client will require a dedicated tutorial with discussions on css styling among others. To have digestible chunks of information, this tutorial will draw down the curtains with using dyOptions for customizing graphs and leave the use of css for another day (sooner than later). That being said, lets look at using dyOptions( ) function for dygraph customization.

DyOptions() function accepts numerous arguments such as line colours, widths, labels among several others to be customized. In fact, the range of customization possible implies that they have to be used with care to prevent introducing noise into the data visualization.

We can use the code below to just demonstrate a limited selection of the options possible. In fact, the graph is not the best in terms of colour and width used. It is only for demonstration purposes and should even be seen to buttress the point that not all arguments can be used together for a visually appealing graph.

dy_ts_no2 %>%
  dyOptions(fillGraph = TRUE, stepPlot = T, drawPoints = T, colors = "#abecbd", axisLineColor = "red", axisLineWidth = 3, gridLineColor = "gray", axisLabelColor = "#fbbdcc")

Conclusion

In this tutorial we practiced how to use dygraphs package in R to visualize a time series data. The focus was only on a univariate dataset (a single variable being Nitrous Oxide emissions) so the type of plots produced were limited to that. Nonetheless, we used various functions that are relevant for other types of data and highlighted situations where they will be appropriate. The default interactivity that is available to us without stress and options to customize their looks proves worthwhile to invest our efforts in dygraphs as a time series visualization library.

This is just the part 1 of using dygraphs for time series visualization. In our next tutorial of these series, we will visualize multivariate time series datasets and also look at customization with css.

Interactive time series data visualization – Dygraphs in R