imMens

with Biye Jiang and Jeffrey Heer


Data analysts must make sense of increasingly large data sets, sometimes with a billion or more records. imMens is a system designed to support interactive visual exploration of such data sets. Its scalable visual representations are based on binned aggregation and support a variety of data types: ordinal, numeric, temporal and geographic.
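To make binned aggregation concrete, here is a minimal sketch for two numeric dimensions. It is not the imMens implementation; the function names (`bin_index`, `binned_counts_2d`) and the equal-width binning scheme are assumptions chosen for illustration.

```python
def bin_index(value, lo, hi, bins):
    """Map a value in [lo, hi] to an equal-width bin index in 0..bins-1."""
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside [{lo}, {hi}]")
    idx = int((value - lo) / (hi - lo) * bins)
    # The upper bound itself falls into the last bin.
    return min(idx, bins - 1)


def binned_counts_2d(points, x_range, y_range, x_bins, y_bins):
    """Count (x, y) points per cell of a y_bins-by-x_bins grid (hypothetical helper)."""
    counts = [[0] * x_bins for _ in range(y_bins)]
    for x, y in points:
        row = bin_index(y, y_range[0], y_range[1], y_bins)
        col = bin_index(x, x_range[0], x_range[1], x_bins)
        counts[row][col] += 1
    return counts


# Illustrative data only: four points on the unit square, binned into a 2x2 grid.
points = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.9), (1.0, 1.0)]
grid = binned_counts_2d(points, (0.0, 1.0), (0.0, 1.0), 2, 2)
```

The grid holds one count per cell, so the amount of data to draw depends on the number of bins rather than the number of records, which is what lets a heatmap show density without over-plotting.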

Figure 1. Using Google Fusion Tables (left) and imMens (right) to visualize a dataset of 4M Brightkite user checkins. The Fusion Tables symbol map visualizes a sample of the data, while the imMens heatmap shows the density of checkins through aggregation. Compared to the heatmap, the sampled view misses important structures such as interstate highway travel and Hurricane Ike, while its dense regions still suffer from over-plotting. Moreover, imMens supports real-time brushing and linking among various dimensions of the dataset.

To achieve interactive querying (e.g., brushing and linking) between the visualizations, imMens precomputes multivariate data projections and stores them as data tiles. The browser-based front-end dynamically loads the appropriate data tiles and uses WebGL for data processing and rendering. In benchmarks, imMens sustains 50 frames-per-second brushing and linking among dozens of visualizations, with invariant performance on data sizes ranging from thousands to billions of records.
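The idea of answering a brush from a precomputed tile can be sketched as follows. This is a simplified CPU illustration under stated assumptions, not the WebGL code imMens uses: the tile here covers only two already-binned dimensions, and `build_tile` and `brush` are hypothetical names.

```python
def build_tile(records, bins_a, bins_b):
    """Precompute counts over (A-bin, B-bin) pairs; records are pre-binned indices."""
    tile = [[0] * bins_b for _ in range(bins_a)]
    for a, b in records:
        tile[a][b] += 1
    return tile


def brush(tile, a_start, a_end):
    """Histogram over B for records whose A-bin lies in [a_start, a_end], inclusive."""
    histogram = [0] * len(tile[0])
    for row in tile[a_start:a_end + 1]:
        for b, count in enumerate(row):
            histogram[b] += count
    return histogram


# Illustrative data only: five records over 3 bins of A and 2 bins of B.
records = [(0, 0), (0, 1), (1, 1), (2, 0), (2, 0)]
tile = build_tile(records, bins_a=3, bins_b=2)
linked = brush(tile, 0, 1)  # brush the first two A-bins, read off the B histogram
```

Once the tile is built, each brush touches only the tile cells, so its cost is bounded by the bin counts and not by the number of underlying records. That is consistent with the size-invariant performance described above.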

Figure 2. Multiple coordinated views of Brightkite user checkins in North America. Cyan lines in the heatmap indicate data tile boundaries. Each visualization region is annotated by its backing data dimensions and indices.

Papers

imMens: Real-time Visual Querying of Big Data
PDF (EuroVis 2013)

The Effects of Interactive Latency on Exploratory Visual Analysis
PDF (InfoVis 2014)

Demo

– 4.5 million user checkins on Brightkite
– 35.6 million flight delays in the U.S. from 1989 to 2008
– 10K to 1B synthetic data points visualized as scatterplot matrices (SPLOM)

ISTC Blog Posts

About the imMens system
Study on the effects of latency

Software

Source code available on GitHub

Study Materials

Log Data | Verbal Data