Project Details

Visual Analytics of Online Streaming Text

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2018 to 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 392087235
 
Final Report Year 2023

Final Report Abstract

With the rise of online news outlets and social media platforms, the amount of unstructured content such as text has increased tremendously in recent years. Journalists, business traders, sociologists, computer scientists, first responders, and many other stakeholders could benefit greatly from analyzing this rich source of human-generated information. In addition, we are facing unprecedented threats posed by the fast and uncontrolled global spread of misinformation and rumors on these channels. Given the volume and frequency of published content, we need computational approaches to tackle these challenges. However, dealing with unstructured data is particularly challenging for machines. We can perform a variety of operations on numerical data, but text is more than a sequence of characters, so finding and deciding on a suitable representation is inherently part of the algorithm. Most past research has focused either on the analysis of individual documents or on rather broad aggregations of corpora in an offline setting. Yet in order to understand the evolution of content patterns, detect anomalous information, and discover large-scale coordinated activities, we have to cope with the inherent challenges of real-time streaming text. To close this gap, the goal of this project was to develop novel visual analytics approaches that not only facilitate the analysis of large text corpora but also enable the real-time analysis and monitoring of high-volume streaming posts. We first acquired a suitable benchmark repository from news outlets, social networks, and microblogs to enable realistic evaluation scenarios with simulated replays. We proposed efficient text aggregation methods that work dynamically in a streaming setting and extract important words, phrases, and connected chunks of phrases.
Based on these aggregations, we developed a new interactive visual layout that conveys the most important concepts over time in large corpora such as news articles. We proposed a new dynamic clustering algorithm that is particularly efficient for clustering streaming text. We integrated our text mining and clustering techniques into a novel visual analytics system for the real-time monitoring and analysis of high-volume streaming posts. The system provides a dynamic, visual overview of the topical landscape, and analysts can choose to monitor specific topics in greater detail at different levels of granularity. Our visual summaries help to aggregate and filter the content, reducing cognitive load while still presenting key developments. The visual analytics approaches developed in this project enable analysts from various disciplines to analyze larger amounts of data faster and in more detail than before, especially in a streaming scenario.
