Visual Analytics of Online Streaming Text
Final Report Abstract
With the rise of online news outlets and social media platforms, the amount of unstructured content such as text has increased tremendously in recent years. Journalists, business traders, sociologists, computer scientists, first responders, and many other stakeholders could benefit greatly from analyzing this rich source of human-generated information. At the same time, we are facing unprecedented threats posed by the fast and uncontrolled global spread of misinformation and rumors on these channels. The volume and frequency of published content call for computational approaches to tackle these challenges. However, dealing with unstructured data is particularly challenging for machines. While a wide variety of operations can be applied directly to numerical data, text is more than a sequence of characters: finding and deciding on a suitable representation is inherently part of the algorithm. Most past research has focused either on the analysis of individual documents or on rather broad aggregations of corpora in an offline setting. Yet to understand the evolution of content patterns, detect anomalous information, and discover large-scale coordinated activities, we have to cope with the inherent challenges of real-time streaming text. To close this gap, the goal of this project was to develop novel visual analytics approaches that not only facilitate the analysis of large text corpora but also enable the real-time analysis and monitoring of high-volume streams of posts. We first acquired a suitable benchmark repository of data from news outlets, social networks, and microblogs to enable realistic evaluation scenarios with simulated replays. We then proposed efficient text aggregation methods that work dynamically in a streaming setting and extract important words, phrases, and connected chunks of phrases.
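To make the idea of aggregating a text stream more concrete, the following Python sketch keeps exponentially time-decayed term counts so that the currently most prominent words can be queried at any point in the stream. It is a generic illustration under stated assumptions, not the project's method: the class name `DecayedTermCounter`, the `half_life` parameter, and the naive lowercase tokenizer are hypothetical, and the sketch handles single words only, not phrases or connected chunks of phrases.

```python
import math
import re
from collections import defaultdict


class DecayedTermCounter:
    """Hypothetical helper: exponentially time-decayed term counts for a text stream."""

    def __init__(self, half_life):
        # decay rate chosen so that a count halves after `half_life` time units
        self.rate = math.log(2) / half_life
        self.counts = defaultdict(float)       # decayed count as of last_update
        self.last_update = defaultdict(float)  # timestamp of the last update per term

    def _decayed(self, term, now):
        """Return the count of `term` decayed to time `now`."""
        elapsed = now - self.last_update[term]
        return self.counts[term] * math.exp(-self.rate * elapsed)

    def add(self, text, now):
        """Tokenize `text` naively and add one occurrence per token at time `now`."""
        for term in re.findall(r"[a-z]+", text.lower()):
            # decay the stored count lazily to `now`, then increment it
            self.counts[term] = self._decayed(term, now) + 1.0
            self.last_update[term] = now

    def top(self, k, now):
        """Return the k terms with the highest decayed counts at time `now`."""
        scored = [(t, self._decayed(t, now)) for t in self.counts]
        return sorted(scored, key=lambda kv: kv[1], reverse=True)[:k]
```

Decaying lazily means each incoming post only touches its own terms instead of rescaling every stored count at each time step, which keeps updates cheap in a high-volume stream.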
Based on these aggregations, we developed a new interactive visual layout that conveys the most important concepts over time in large corpora such as collections of news articles. We also proposed a new dynamic clustering algorithm that is particularly efficient for clustering streaming text. We integrated our text mining and clustering techniques into a novel visual analytics system for the real-time monitoring and analysis of high-volume streaming posts. The system provides a dynamic visual overview of the topical landscape, and analysts can choose to monitor specific topics in greater detail at different levels of granularity. Our visual summaries aggregate and filter the content to reduce the cognitive load while still presenting key developments. The visual analytics approaches developed in this project enable analysts from various disciplines to analyze larger amounts of data faster and in more detail than before, especially in a streaming scenario.
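As a rough illustration of clustering streaming text, the following Python sketch performs single-pass clustering on unit-length sparse term vectors and borrows the centroid update of spherical k-means (a centroid is the normalized sum of its member vectors). It is not the project's algorithm: the class `StreamingSphericalClusterer`, the similarity `threshold` that decides when a new cluster is opened, and the raw term-count weighting are assumptions chosen for brevity.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm > 0 else {}


def dot(a, b):
    """Sparse dot product; for unit-length inputs this equals cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b.get(k, 0.0) for k, v in a.items())


class StreamingSphericalClusterer:
    """Hypothetical single-pass clusterer for illustration only."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed minimum cosine similarity to join a cluster
        self.sums = []       # unnormalized sum of member vectors per cluster
        self.centroids = []  # unit-length mean direction per cluster

    def add(self, text):
        """Assign one incoming document and return its cluster id (-1 if empty)."""
        vec = normalize(Counter(text.lower().split()))
        if not vec:
            return -1
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = dot(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best_sim < self.threshold:
            # no sufficiently similar cluster: open a new one seeded by this document
            self.sums.append(dict(vec))
            self.centroids.append(vec)
            return len(self.centroids) - 1
        # spherical k-means style update: centroid = normalized sum of members
        total = self.sums[best]
        for k, v in vec.items():
            total[k] = total.get(k, 0.0) + v
        self.centroids[best] = normalize(total)
        return best
```

Because all vectors have unit length, cosine similarity reduces to a sparse dot product that only iterates over the nonzero entries of the smaller vector, which keeps this kind of approach inexpensive for short, sparse texts such as posts.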
Publications
-
Highlighting Text Regions of Interest with Character-Based LSTM Recurrent Networks. In: Poster Session at the IEEE Conference on Visualization, 2018. - Peer Reviewed Poster.
Knittel, Johannes; Koch, Steffen & Ertl, Thomas
-
Interactive Hierarchical Quote Extraction for Content Insights. In: EuroVis 2019 - Posters : The Eurographics Association, 2019. - Peer Reviewed Poster. ISBN 978-3-03868-088-8
Knittel, Johannes; Koch, Steffen; Ertl, Thomas; Madeiras Pereira, João & Raidou, Renata Georgia
-
Pattern-Based Semantic and Temporal Exploration of Social Media Messages. In: 2019 IEEE Conference on Visual Analytics Science and Technology (VAST), 134-135. IEEE.
Knittel, Johannes; Koch, Steffen & Ertl, Thomas
-
Efficient sparse spherical k-means for document clustering. In: Proceedings of the 21st ACM Symposium on Document Engineering, 1-4. ACM.
Knittel, Johannes; Koch, Steffen & Ertl, Thomas
-
ELSKE. In: Proceedings of the 21st ACM Symposium on Document Engineering, 1-4. ACM.
Knittel, Johannes; Koch, Steffen & Ertl, Thomas
-
Online Study of Word-Sized Visualizations in Social Media. In: EuroVis 2021 - Posters : The Eurographics Association, 2021. - Peer Reviewed Poster. ISBN 978-3-03868-144-1
Huth, Franziska; Awad-Mohammed, Miriam; Knittel, Johannes; Blascheck, Tanja; Isenberg, Petra; Byška, Jan; Jänicke, Stefan & Schmidt, Johanna
-
PlotThread: Creating Expressive Storyline Visualizations using Reinforcement Learning. IEEE Transactions on Visualization and Computer Graphics, 27(2), 294-303.
Tang, Tan; Li, Renzhong; Wu, Xinke; Liu, Shuhan; Knittel, Johannes; Koch, Steffen; Ertl, Thomas; Yu, Lingyun; Ren, Peiran & Wu, Yingcai
-
PyramidTags: Context-, Time- and Word Order-Aware Tag Maps to Explore Large Document Collections. IEEE Transactions on Visualization and Computer Graphics, 27(12), 4455-4468.
Knittel, Johannes; Koch, Steffen & Ertl, Thomas
-
Visual Neural Decomposition to Explain Multivariate Data Sets. IEEE Transactions on Visualization and Computer Graphics, 27(2), 1374-1384.
Knittel, Johannes; Lalama, Andres; Koch, Steffen & Ertl, Thomas
-
Large-scale analysis of textual and multivariate data combining machine learning and visualization. Dissertation, Universität Stuttgart, 2022.
Knittel, Johannes
-
Real-Time Visual Analysis of High-Volume Social Media Posts. IEEE Transactions on Visualization and Computer Graphics, 28(1), 879-889.
Knittel, Johannes; Koch, Steffen; Tang, Tan; Chen, Wei; Wu, Yingcai; Liu, Shixia & Ertl, Thomas
