CHOROLOGOS: Semantic Spatio-textual Data Analysis and Processing

Background

CHOROLOGOS is a research project funded by the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under grant agreement No [HFRI-FM17-81]. The funding instrument supports research by Academic Staff and Researchers. The principal investigator of CHOROLOGOS is Christos Doulkeridis, and the project is hosted at the Department of Digital Systems of the University of Piraeus.

Objectives


Formulation of expressive query types that enable selection of underlying spatio-temporal-textual data based on diverse information needs, going beyond exact or syntactical matching and towards semantic retrieval. Examples of such queries include similarity matching, pattern-based matching, as well as semantic similarity matching.
Theoretical contributions in terms of properties and search bounds for the proposed query types, thus laying the foundations for efficient processing and search.
Design of access methods that jointly index space, time, and text, so that data irrelevant to the query at hand can be filtered out early.
Efficient query processing algorithms following well-established methodologies, including filter-and-refine and branch-and-bound, aiming at fast delivery of accurate query results.
Parallel processing of the proposed query types, towards scalable algorithms that make the analysis of very large data sets feasible in practice.
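
To make the first objective more concrete, the following minimal sketch shows one possible similarity query over spatio-temporal-textual objects. It is an illustration only and not the project's query definition: the `STTObject` type, the equal default weights, the Jaccard text similarity, and the `max_dist` and `max_gap` normalisation constants are all assumptions introduced here.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class STTObject:
    """Hypothetical object with a location, a timestamp, and a keyword set."""
    oid: str
    x: float
    y: float
    t: float
    keywords: frozenset


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def score(q, o, max_dist, max_gap, w_s=1 / 3, w_t=1 / 3, w_x=1 / 3):
    """Weighted sum of spatial, temporal, and textual similarity, each in [0, 1].

    max_dist and max_gap are assumed normalisation constants: anything
    farther away in space or time contributes zero similarity.
    """
    spatial = 1.0 - min(math.hypot(q.x - o.x, q.y - o.y) / max_dist, 1.0)
    temporal = 1.0 - min(abs(q.t - o.t) / max_gap, 1.0)
    textual = jaccard(q.keywords, o.keywords)
    return w_s * spatial + w_t * temporal + w_x * textual


def top_k(q, objects, k, max_dist, max_gap):
    """Return the k objects most similar to q by exhaustive scoring."""
    return heapq.nlargest(k, objects, key=lambda o: score(q, o, max_dist, max_gap))
```

This version scores every object, which is exactly the cost that indexing and search bounds are meant to avoid. It serves only as a reference definition against which a faster algorithm could be checked.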

Dissemination

Newsletter Issue 4 (April 2022)

Newsletter Issue 3 (March 2022)

Newsletter Issue 2 (January 2022)

Newsletter Issue 1 (November 2021)

Brochure

Description of Work

The combination of spatio-textual data with spatio-temporal data at scale opens up new research directions, while at the same time challenging existing data processing solutions. As a result, the project needs to address the following main research and technological challenges:

Formulation of novel query types: The acquisition of massive complex data, described by spatial, temporal and textual dimensions, has motivated research on novel query types, in order to retrieve data in flexible, expressive, and meaningful ways. Consequently, a variety of interesting query types have emerged, which raise challenges for query processing algorithms. These query types include reverse query operators (reverse top-k, reverse k-NN, etc.), why-not operators, queries that retrieve groups of objects (instead of single objects), complex joins, pattern queries, optimal location queries, and so on. The resulting challenge for CHOROLOGOS is to formulate meaningful and useful query types, along with the theoretical properties that will enable pruning the search space effectively.
Indexing structures for data combining space, time and text: In the past, several indexing structures have been proposed for mobility data (spatio-temporal or trajectory indexes), while more recently spatio-textual indexes have emerged too. Both types of indexes face significant research challenges, related to the dynamic nature of the temporal dimension in the former case, and to the high dimensionality of text in the latter case. Designing efficient index structures for the combination of the three types of dimensions (space, time and text) is far more difficult. CHOROLOGOS targets exactly this pressing need for efficient access methods for spatio-temporal-textual data, in order to increase the performance of data processing and analysis.
Efficient query processing algorithms: The combination of multiple dimensions with complex query types (e.g., joins) typically has a severe effect on the performance of query processing, since the size of the search space increases significantly. To address this challenge, efficient algorithms are sought that can eagerly prune the search space and retrieve the query result as fast as possible. CHOROLOGOS is going to exploit algorithms belonging to the filter-and-refine paradigm, where effective filtering of candidate results drastically reduces the combinations of objects that need to be evaluated in the refinement phase, leading to performance gains. To design such algorithms, we are going to derive appropriate search bounds that provide guarantees about the correctness of filtering. Moreover, branch-and-bound algorithms will be proposed that capitalize on the derived search bounds and prune the search space.
Parallel and scalable framework for processing massive data: Last, but not least, as part of innovation, CHOROLOGOS will deliver a parallel data processing version of the algorithmic framework, in order to meet the scalability challenges posed by today’s massive data sets (social networks, surveillance networks, IoT and sensor networks). For the development, we will use a state-of-the-art parallel data processing framework, such as Apache Spark or Apache Flink, which offers salient features including fault tolerance and resource management (e.g., when coupled with YARN). This follows the common research practice of extending such frameworks to produce application-targeted prototypes (e.g., SpatialHadoop, ST-Hadoop, LocationSpark). The major underlying challenge in this context is to achieve efficient data partitioning, fair work allocation, and efficient indexing at the local as well as the global level.
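
The filter-and-refine paradigm and the role of search bounds described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the algorithms or index structures developed in the project: the uniform grid, the per-cell keyword union, the Jaccard threshold query, and all function names are hypothetical. The query asks for objects within distance `radius` of a point whose keyword set has Jaccard similarity at least `tau` with the query keywords.

```python
import math
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def build_grid(objects, cell_size):
    """Group (oid, x, y, keywords) tuples by grid cell and keep, per cell,
    the union of all keywords as a compact textual summary."""
    cells = defaultdict(lambda: {"objects": [], "vocab": set()})
    for obj in objects:
        _, x, y, kw = obj
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key]["objects"].append(obj)
        cells[key]["vocab"] |= kw
    return cells


def min_dist_to_cell(qx, qy, key, cell_size):
    """Lower bound on the distance from (qx, qy) to any object in the cell."""
    x0, y0 = key[0] * cell_size, key[1] * cell_size
    dx = max(x0 - qx, 0.0, qx - (x0 + cell_size))
    dy = max(y0 - qy, 0.0, qy - (y0 + cell_size))
    return math.hypot(dx, dy)


def filter_and_refine(cells, cell_size, qx, qy, q_kw, radius, tau):
    """Return (sorted matching ids, number of objects checked exactly).

    Assumes q_kw is non-empty.
    """
    results, refined = [], 0
    for key, cell in cells.items():
        # Filter, spatial bound: no object in the cell can be closer than this.
        if min_dist_to_cell(qx, qy, key, cell_size) > radius:
            continue
        # Filter, textual bound: for any object o in the cell,
        # |q & o| <= |q & vocab| and |q | o| >= |q|, hence
        # jaccard(q, o) <= |q & vocab| / |q|.
        if len(q_kw & cell["vocab"]) / len(q_kw) < tau:
            continue
        # Refine: exact evaluation of the surviving candidates only.
        for oid, x, y, kw in cell["objects"]:
            refined += 1
            if math.hypot(x - qx, y - qy) <= radius and jaccard(q_kw, kw) >= tau:
                results.append(oid)
    return sorted(results), refined
```

Both filters only discard cells that provably contain no answer, so the result matches an exhaustive scan while fewer objects reach the exact check. A branch-and-bound variant would apply the same kind of bounds recursively over a hierarchical index instead of a flat grid.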

Expected Results

CHOROLOGOS aims at advancing the state-of-the-art in spatio-temporal-textual query processing, by introducing a novel framework that tightly combines spatio-textual and spatio-temporal querying with semantic retrieval, focusing on expressive query formulation beyond syntactical matching, efficient indexing and query processing, and scalable analysis of massive spatio-textual data.

Impact

CHOROLOGOS promises to move the research frontier a step forward in the area of semantic spatio-textual data management. Effective and efficient retrieval of spatio-temporal-textual data is a challenging topic, which has recently attracted considerable attention not only from academia but also from industry. Search engines (such as Google, Yahoo and Bing) and social network providers (Twitter, Foursquare, etc.) either collect or own very large spatio-textual data sets, and conduct research on new methods and technologies for advanced analytics, in order to provide personalized recommendations, targeted marketing, etc. CHOROLOGOS is expected to significantly facilitate the analysis of massive spatio-textual data sets, typically encountered in the aforementioned domains and especially in social networks.

For more information, please contact Christos Doulkeridis, email: cdoulk at unipi dot gr

For more information about the research group and the department, please visit the respective home pages:
Department of Digital Systems
