with Bernard Kerr, Mira Dontcheva, Justin Grover, Matthew Hoffman, Alan Wilson
Event sequence datasets with high event cardinality and long sequences are difficult to visualize and analyze. In particular, it is hard to generate a high-level visual summary of paths and volume of flow. Existing approaches to mining and visualizing frequent sequential patterns look promising, but have limitations in terms of scalability, interpretability, and utility. We propose CoreFlow, a technique that automatically extracts and visualizes branching patterns in event sequences. CoreFlow constructs a tree by recursively applying a three-step procedure: rank events, divide sequences into groups, and trim sequences by the chosen event. In the resulting tree, nodes are key events and links represent aggregated flows between key events. Based on CoreFlow, we have developed an interactive system for event sequence analysis. Our approach can compute branching patterns for millions of events in a few seconds, with improved interpretability of extracted patterns compared to previous work. We also present case studies of using the system in three different domains and discuss success and failure cases of applying CoreFlow to real-world analytic problems. These case studies call for future research on metrics and models to evaluate the quality of visual summaries of event sequences.
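The rank, divide, and trim procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the ranking metric or the stopping rule, so the sketch assumes events are ranked by the number of sequences that contain them, that trimming keeps only what follows the first occurrence of the chosen event, and that recursion stops below a hypothetical `min_support` threshold. The names `Node`, `rank_events`, and `core_flow` are introduced here for illustration only.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    event: str                                    # key event at this node
    count: int                                    # sequences flowing into it
    children: list = field(default_factory=list)  # sub-branches (Node objects)


def rank_events(sequences):
    """Rank events by how many sequences contain them (assumed metric)."""
    support = Counter()
    for seq in sequences:
        support.update(set(seq))  # count each event once per sequence
    # Descending support, then event name so that ties are deterministic.
    return sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))


def core_flow(sequences, min_support=2):
    """Return the sibling branches extracted from `sequences`."""
    branches = []
    remaining = [s for s in sequences if s]  # ignore exhausted sequences
    while remaining:
        # Step 1: rank events and pick the top one.
        event, support = rank_events(remaining)[0]
        if support < min_support:  # hypothetical stopping rule
            break
        # Step 2: divide sequences by whether they contain the event.
        with_event = [s for s in remaining if event in s]
        remaining = [s for s in remaining if event not in s]
        # Step 3: trim each matching sequence to what follows the event.
        trimmed = [s[s.index(event) + 1:] for s in with_event]
        # Recurse on the trimmed group; loop again on the rest.
        children = core_flow(trimmed, min_support)
        branches.append(Node(event, len(with_event), children))
    return branches


# Toy input: three sessions start with "a", two of those continue to "b".
tree = core_flow([["a", "b", "c"], ["a", "b"], ["a", "c"], ["d"]])
# tree holds one root branch "a" (count 3) with a single child "b" (count 2);
# "c" and "d" fall below min_support and are left out of the summary.
```

Each node's `count` corresponds to the aggregated flow into that key event, which is what the links in the tree visualization would encode under these assumptions.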