Documentation - Imputation Visualizer

Overview

Many phenomena in the physical world behave cyclically. Measuring and visualizing such behaviour well is not easy, and the best way to visualize the data depends on the intended purpose. To that end, I implemented the approach of a paper that visualizes time-series data in both a linear and a cyclical fashion. The cyclical view lets the user compare data points that lie at the same position in the cycle. For example, given a yearly cycle with monthly data, one could compare all Januaries side by side.

Data often has missing points, which need to be imputed somehow. The paper combines the visualization of the data with the option of visualizing the results of different imputation methods in different ways. The paper mainly used Multiple Imputation methods. Since it does not specify the exact methods, I implemented a selection of the most popular ones. In addition to Multiple Imputation methods, I implemented various Single Imputation methods. The difference between the two is that Single Imputation fills each gap with one value, whereas Multiple Imputation creates multiple imputed data sets and pools the results from those.
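
The difference between the two families can be sketched in a few lines of base R. This is a simplified illustration, not the code used in the App: the toy series, the number of data sets `m`, the random draws from the observed values, and the pooling by averaging are all assumptions made for the example.

```r
# Toy series with three missing values (synthetic, for illustration only).
x <- c(12, 15, NA, 20, 18, NA, 25, 22, NA, 30)
missing <- which(is.na(x))
observed <- x[!is.na(x)]

# Single imputation: every gap gets exactly one value, here the observed mean.
single <- x
single[missing] <- mean(observed)

# Multiple imputation (simplified): build m completed data sets, each
# filling the gaps with random draws from the observed values.
set.seed(1)
m <- 5
draws <- replicate(m, sample(observed, length(missing), replace = TRUE))
# draws is a matrix: one row per missing point, one column per data set.

# Pool the m candidates for each gap, here by averaging them (assumption).
pooled <- x
pooled[missing] <- rowMeans(draws)
```

Each row of `draws` holds the competing candidates for one gap. That spread is what a Single Imputation method cannot provide.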

To visualize the uncertainty this introduces, the paper uses Boxplots and/or Error Boundaries, which show the range in which the imputed points lie. Since I implemented Single Imputation methods as well, one can choose between viewing Boxplots/Error Boundaries (Error Boundaries are for Multiple Imputation methods only) and viewing single points.
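
As a rough sketch of where such ranges could come from, the following base R snippet derives an error boundary and a five-number summary for each gap from a set of candidate values. The toy series and the random draws standing in for a Multiple Imputation method are assumptions for illustration. The App's own calculation may differ.

```r
# Toy series with three gaps (synthetic, for illustration only).
x <- c(12, 15, NA, 20, 18, NA, 25, 22, NA, 30)
missing <- which(is.na(x))
observed <- x[!is.na(x)]

# Stand-in for a Multiple Imputation method: m candidate values per gap.
set.seed(1)
m <- 20
draws <- replicate(m, sample(observed, length(missing), replace = TRUE))

# Error boundary: lowest and highest candidate for each gap.
bounds <- data.frame(
  index = missing,
  lower = apply(draws, 1, min),
  upper = apply(draws, 1, max)
)

# Tukey's five-number summary per gap (minimum, lower hinge, median,
# upper hinge, maximum), the numbers a boxplot is built from.
box_stats <- apply(draws, 1, fivenum)
```

A Single Imputation method yields only one candidate per gap, so `lower` and `upper` would coincide, which is why Error Boundaries only make sense for Multiple Imputation.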

The data set used is the popular sunspot data set, which records sunspot numbers (a measure of solar activity) for the years 1749 to 2013. Originally, it contains 12 measurements per year. To avoid clutter, I reduced it to 1 measurement per year. The data has a main cycle of 10.95 years, which I rounded to 11. I therefore split the data set into 11 sub-sets to view in the cyclical chart.
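
The preparation described above could look roughly like this in base R. The snippet uses a synthetic monthly series rather than the real data file, and it makes two assumptions that the text does not confirm: that the yearly value is the mean of the 12 monthly values, and that the 11 sub-sets group years by their position in the cycle.

```r
# Synthetic monthly series standing in for the sunspot data (1749 to 2013).
set.seed(42)
years <- rep(1749:2013, each = 12)
monthly <- data.frame(
  year = years,
  value = 50 + 40 * sin(2 * pi * years / 11) + rnorm(length(years), sd = 5)
)

# Reduce 12 measurements per year to one (assumption: the yearly mean).
yearly <- aggregate(value ~ year, data = monthly, FUN = mean)

# Position of each year inside the rounded 11-year cycle: 1, 2, ..., 11.
cycle_length <- 11
yearly$phase <- (yearly$year - min(yearly$year)) %% cycle_length + 1

# One sub-set per cycle position; each holds the years that share a phase.
subsets <- split(yearly, yearly$phase)
```

With this grouping, `subsets[["1"]]` holds the years 1749, 1760, 1771 and so on, which are the points that sit at the same spot in the cyclical chart.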

How to use the App

The App is written in R and makes heavy use of the Shiny and ggplot2 packages. To use the app, one needs a current version of R (preferably in combination with RStudio). The file "app.R" is all that is needed, with the data file in the same directory; therefore, there is no "bin" folder. Once "app.R" is opened and executed, a browser window should open with the App running.
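
For those who prefer the R console over RStudio's run button, a start-up sketch might look like the following. It assumes that the working directory contains "app.R" and the data file, and that "ggplot" refers to the ggplot2 package. Any further packages that "app.R" loads (for example for imputation) are not covered here.

```r
# Packages named in this document; other dependencies of app.R are not
# listed here and would have to be installed as well.
required <- c("shiny", "ggplot2")
missing_pkgs <- required[!vapply(required, requireNamespace,
                                 logical(1), quietly = TRUE)]

if (length(missing_pkgs) > 0) {
  # Tell the user what to install instead of failing later.
  message("Install first: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else if (interactive() && file.exists("app.R")) {
  # Start the app from the file and open it in the default browser.
  shiny::runApp("app.R", launch.browser = TRUE)
}
```

The `interactive()` check only keeps the snippet from starting a server when it is sourced from a script.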

The upper chart shows the data set in regular linear time, the bottom one in cyclical time. To see where a point lies in the other chart, one can mark it by clicking on it. The point then appears red in both charts.

The data set has no missing values to begin with. To randomly generate 5 missing values, one has to click the "Delete random Values" button. The imputed values are then shown in red on both charts. The default method is "mean", which simply imputes the mean value of the data set. A feature that the original paper doesn't have, but that I found useful, is the "toggle" function: the button "Toggle actual/imputed" switches between the original data set and the imputed one. The selection box lets one choose between viewing Error Boundaries and viewing Single Values in combination with Boxplots (Single Values for Single Imputation methods, Boxplots for Multiple Imputation methods). Note: Error Boundaries are only meant to be used with Multiple Imputation methods. If no Multiple Imputation method is chosen, the visualization is wrong.
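
The sequence of deleting, imputing and toggling can be sketched in base R as follows. The values and variable names are made up for the example, and the mean is assumed to be taken over the values that remain after deletion. The App's reactive Shiny code is not reproduced here.

```r
set.seed(7)
# Complete toy series (synthetic values, for illustration only).
actual <- c(5, 11, 16, 23, 36, 58, 29, 20, 10, 8, 3, 0, 0, 2, 11, 27)

# "Delete random Values": hide 5 randomly chosen observations.
n_missing <- 5
gaps <- sample(seq_along(actual), n_missing)
with_gaps <- actual
with_gaps[gaps] <- NA

# Default method "mean": fill every gap with the mean of what is left.
imputed <- with_gaps
imputed[gaps] <- mean(with_gaps, na.rm = TRUE)

# "Toggle actual/imputed": a flag decides which series gets drawn.
show_imputed <- TRUE
displayed <- if (show_imputed) imputed else actual

# Logical mask of the points to colour red in both charts.
highlight <- seq_along(actual) %in% gaps
```

Because `actual` is kept untouched, toggling back only flips `show_imputed` and nothing has to be recomputed.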

Once random values have been deleted, one can choose among the 16 imputation methods and press "Impute Values" to impute and compare the values. Since the chart gets cluttered quickly when more than one method is used, one can zoom in by drawing a rectangle across the desired area and double-clicking on the rectangle. To zoom out, one double-clicks on the chart again. To generate new random values, one can click "Delete random Values" repeatedly. Note: The button "Impute Values" can only be clicked when the chart is in imputed mode (toggling from actual to imputed solves the problem).
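
The zoom behaviour boils down to a small decision on every double click, sketched here as a plain R function. The function name is hypothetical. The `xmin`, `xmax`, `ymin` and `ymax` fields are the ones Shiny reports for a plot brush, and the returned limits would typically be passed to ggplot2's `coord_cartesian()`, where `NULL` means the full range.

```r
# brush: NULL, or a list with xmin, xmax, ymin, ymax describing the
# rectangle drawn on the chart. Returns the axis limits to draw next.
zoom_on_double_click <- function(brush) {
  if (is.null(brush)) {
    # No rectangle drawn: reset to the full view.
    list(x = NULL, y = NULL)
  } else {
    # Rectangle present: zoom to its extent.
    list(x = c(brush$xmin, brush$xmax), y = c(brush$ymin, brush$ymax))
  }
}

# Double click on a rectangle spanning 1800 to 1850 on the time axis.
zoomed <- zoom_on_double_click(
  list(xmin = 1800, xmax = 1850, ymin = 0, ymax = 120)
)

# Double click without a rectangle zooms back out.
reset <- zoom_on_double_click(NULL)
```

Keeping this logic in one function makes it easy to apply the same rule to both the linear and the cyclical chart.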

General note: There is no good code documentation, since my code consists mainly of ggplot commands and manipulations of reactive Shiny objects. I attempted to generate documentation via Roxygen (Doxygen for R), but I wasn't successful. (The only explicitly named function in the code is the server function.)

Download Source