Combining content and link information for describing, classifying, clustering and visualizing networked information spaces.
The group applies machine learning, graph theory and natural language processing to problems in networked information spaces, i.e. large document collections that form a graph in which nodes are documents and links represent relations between documents (hyperlinks or citations). Specific research problems include similarity and clustering based on both content and link information, low-dimensional representations of special text corpora based on lexical ontologies and automatically extracted terms, summarization of Web document collections, and information extraction. Networked information spaces of particular interest include the scientific and medical research literature, the Web and corporate Web spaces. To meet the computational requirements of processing large data sets, the group is focusing on coarse-grained parallelism on clusters of Linux workstations.
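As a rough illustration of similarity based on both content and link information, the sketch below blends a content score with a link score. It is not the group's method: the choice of cosine similarity over term counts, Jaccard overlap of link targets, the linear weight `alpha`, and all function and variable names are assumptions made for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard(s, t):
    """Jaccard overlap between two sets of link targets."""
    if not s and not t:
        return 0.0
    return len(s & t) / len(s | t)


def combined_similarity(doc_a, doc_b, texts, links, alpha=0.5):
    """Weighted blend of content and link similarity (alpha is an assumed knob).

    texts maps a document id to its raw text;
    links maps a document id to the set of ids it links to or cites.
    """
    terms_a = Counter(texts[doc_a].lower().split())
    terms_b = Counter(texts[doc_b].lower().split())
    content = cosine(terms_a, terms_b)
    link = jaccard(links.get(doc_a, set()), links.get(doc_b, set()))
    return alpha * content + (1 - alpha) * link


# Hypothetical toy collection: three documents and their outgoing links.
texts = {
    "a": "graph clustering of web documents",
    "b": "clustering web documents by links",
    "c": "protein folding simulation",
}
links = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"w"}}

print(combined_similarity("a", "b", texts, links))
print(combined_similarity("a", "c", texts, links))
```

In this toy collection, documents "a" and "b" share both terms and a link target, so they score higher than "a" and "c", which share neither. Setting `alpha` to 1 or 0 reduces the measure to content-only or link-only similarity.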