
FAIR Workshop on Sequence & Streaming Data Analysis

The interdisciplinary research area FAIR organizes a two-day workshop on Sequence and Streaming Data Analysis.

The goal of this workshop is to obtain a basic understanding of similarity measures, classification and clustering algorithms for sequence data as well as streaming data analysis.

Our invited speakers are:

  • André Nusser (University of Copenhagen)
  • Chris Schwiegelshohn (Aarhus University)

When and where

Date and time: November 22 and 23, 2022, 9:00-13:00 CET
Location: Otto-Hahn-Str. 14, Room E04 (Computer Science Building)
Virtual: Online via Zoom (the link will be sent by email to registered participants only)

How to register

The number of on-site participants is strictly limited, while online participation is possible for a potentially large number of people. Registration is required in both cases.

To register for the workshop, please send an informal email to Amer Krivošija (amer.krivosija@tu-dortmund.de) by November 16, stating

  1. your name, title, and status (e.g. PostDoc),
  2. how you would prefer to participate (on-site or online),
  3. your institution (e.g. TU Dortmund) and faculty (e.g. Faculty of Statistics),
  4. whether you would like to join the workshop dinner (Nov. 22, 7pm),
  5. (optional) a brief statement of motivation.

Invited Speakers

© FAIR/TU Dortmund

André Nusser

Postdoc at Basic Algorithms Research Copenhagen (BARC), Department of Computer Science (DIKU), University of Copenhagen

André obtained his PhD at the Max Planck Institute for Informatics in Saarbrücken. He is now a postdoc at Basic Algorithms Research Copenhagen (BARC) at the University of Copenhagen. He is interested in algorithm design, fine-grained lower bounds, and algorithm engineering in computational geometry, in particular sequence and point set similarity measures.

Website: https://people.mpi-inf.mpg.de/~anusser/

An Overview of Geometric Sequence Similarity Measures

Abstract: Sequence data is ubiquitous in any area where quantitative measurements are taken in a specific order. To understand and analyze such data, we need a way to measure the similarity between sequences. As there are several natural measures for this task, this series of talks focuses on different sequence similarity measures, especially those based on a geometric view of sequences.
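The abstract does not say which measures are covered. Purely as an illustration, the sketch below assumes two widely used measures that treat a sequence as a list of points in Euclidean space: the discrete Fréchet distance, which minimizes the largest point-to-point distance over all monotone couplings of the two sequences, and dynamic time warping (DTW), which minimizes the sum of those distances. Both can be computed by the same O(nm) dynamic program; the helper `_coupling_dp` is a hypothetical name introduced here.

```python
import math
import operator
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def _coupling_dp(a: Sequence[Point], b: Sequence[Point],
                 combine: Callable[[float, float], float]) -> float:
    """Dynamic program over monotone couplings of a and b.
    dp[i][j] holds the best value for the prefixes a[:i] and b[:j]."""
    n, m = len(a), len(b)
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # A coupling reaches (i, j) by advancing in a, in b, or in both.
            best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
            dp[i][j] = combine(best_prev, math.dist(a[i - 1], b[j - 1]))
    return dp[n][m]


def discrete_frechet(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Smallest possible maximum point distance over all monotone couplings."""
    return _coupling_dp(a, b, max)


def dtw(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Smallest possible sum of point distances over all monotone couplings."""
    return _coupling_dp(a, b, operator.add)


a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(discrete_frechet(a, b), dtw(a, b))  # 1.0 3.0
```

The only difference between the two measures is how local distances are combined along a coupling: the maximum makes the Fréchet distance a bottleneck measure that reacts to a single far-off point, while the sum in DTW accumulates all local deviations.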

We analyze the advantages and disadvantages of the introduced sequence similarity measures and discuss settings in which each measure would be the preferred choice. One main use of similarity measures is the classification and clustering of sequence data. To that end, we discuss different general clustering techniques and how they apply to sequence data.
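To illustrate how a general clustering technique carries over to sequences, here is a sketch of the greedy farthest-first traversal (Gonzalez's algorithm) for k-center clustering, run on top of the discrete Fréchet distance. Both the algorithm and the distance are assumptions chosen for illustration, not necessarily what the talks cover, and the function names are hypothetical. Because the discrete Fréchet distance satisfies the triangle inequality, the radius of the greedy solution is at most twice the optimal k-center radius.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Curve = Sequence[Point]


def discrete_frechet(a: Curve, b: Curve) -> float:
    """Discrete Fréchet distance via the standard O(nm) dynamic program."""
    n, m = len(a), len(b)
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
            dp[i][j] = max(prev, math.dist(a[i - 1], b[j - 1]))
    return dp[n][m]


def gonzalez_k_center(curves: List[Curve], k: int,
                      dist: Callable[[Curve, Curve], float] = discrete_frechet):
    """Farthest-first traversal: repeatedly add the curve farthest from all chosen centers.
    Returns the center indices and, for each curve, the index of its nearest center."""
    centers = [0]  # arbitrary first center: the first curve
    nearest = [dist(c, curves[0]) for c in curves]  # distance to the closest center so far
    while len(centers) < min(k, len(curves)):
        far = max(range(len(curves)), key=lambda i: nearest[i])
        if nearest[far] == 0.0:
            break  # every curve already coincides with a center
        centers.append(far)
        for i, c in enumerate(curves):
            nearest[i] = min(nearest[i], dist(c, curves[far]))
    labels = [min(centers, key=lambda j: dist(c, curves[j])) for c in curves]
    return centers, labels


curves = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)],
    [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)],
    [(0.0, 5.2), (1.0, 5.1), (2.0, 5.2)],
]
centers, labels = gonzalez_k_center(curves, k=2)
print(centers, labels)  # [0, 3] [0, 0, 3, 3]
```

The algorithm only ever asks for the distance between two sequences, so another measure can be passed via `dist`; the factor-2 guarantee, however, relies on the triangle inequality, which DTW, for example, does not satisfy.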

The talks are aimed at people who have very basic mathematical background knowledge and are relatively new to sequence similarity measures.


Chris Schwiegelshohn

Assistant professor for computer science and algorithm design, MADALGO, Department of Computer Science, Aarhus University

Chris is a home-grown researcher, having completed his PhD under the supervision of Christian Sohler at TU Dortmund. He then joined Sapienza University of Rome, first as a postdoc hosted by Stefano Leonardi and then as a faculty member. In 2020, he joined Aarhus University as a tenure-track assistant professor. Chris's research focuses on algorithm design in general, with an emphasis on sketching, streaming, and learning algorithms, as well as approximation and online algorithms.

Website: https://cs.au.dk/~schwiegelshohn/

A Painless Introduction to Coresets

Abstract: Coresets are arguably the most important paradigm in the design and analysis of algorithms for big data and data streams. Succinctly, a coreset compresses the input such that, for any candidate query, evaluating the query on the coreset gives approximately the same result as evaluating it on the original data. For clustering, this means that a coreset is a small weighted sample of the points such that, for any set of centers, the cost on the original point set and the cost on the coreset are equal up to a small multiplicative distortion. In this talk, we will give an in-depth yet simple and basic introduction to coreset algorithms and their analysis.
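In the clustering setting described above, a weighted set S is a coreset of P if, for every set of centers C, (1 - ε) cost(P, C) ≤ cost(S, C) ≤ (1 + ε) cost(P, C). The sketch below assumes the k-means cost (squared Euclidean distances in the plane) and builds a weighted sample by sensitivity (importance) sampling, one standard construction that is not necessarily the one presented in the talk. A rough reference solution from k-means++ seeding gives each point an upper bound on how much it can contribute to the cost, points are sampled proportionally to that bound, and each sample is reweighted so the weighted cost is an unbiased estimate of the true cost. The sample size `m` and all function names are hypothetical; a provable guarantee for all center sets needs a sample size that depends on k and ε, whereas the usage below only compares costs for one query.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance in the plane."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def cost(points: Sequence[Point], centers: Sequence[Point],
         weights: Optional[Sequence[float]] = None) -> float:
    """(Weighted) k-means cost: each point pays its squared distance to the nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(sq_dist(p, c) for c in centers) for p, w in zip(points, weights))


def kmeans_pp_seeding(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding, used here only to obtain a rough reference solution."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def sensitivity_coreset(points: Sequence[Point], k: int, m: int, rng: random.Random):
    """Sample m points with probability proportional to a sensitivity upper bound
    and weight each sampled point by 1 / (m * probability)."""
    ref = kmeans_pp_seeding(points, k, rng)
    assign = [min(range(k), key=lambda j: sq_dist(p, ref[j])) for p in points]
    d2 = [sq_dist(p, ref[a]) for p, a in zip(points, assign)]
    total = sum(d2)
    cluster_size = [0] * k
    for a in assign:
        cluster_size[a] += 1
    # Sensitivity bound (up to a constant factor): the point's share of the
    # reference cost plus a uniform share of its reference cluster.
    s = [(d / total if total > 0 else 0.0) + 1.0 / cluster_size[a]
         for d, a in zip(d2, assign)]
    s_sum = sum(s)
    probs = [x / s_sum for x in s]
    idx = rng.choices(range(len(points)), weights=probs, k=m)
    return [points[i] for i in idx], [1.0 / (m * probs[i]) for i in idx]


rng = random.Random(0)
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
          for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)] for _ in range(300)]
core, w = sensitivity_coreset(points, k=3, m=300, rng=rng)
C = [(1.0, 1.0), (9.0, 0.5), (0.5, 9.0)]
print(f"original: {cost(points, C):.1f}, coreset: {cost(core, C, w):.1f}")
```

The reweighting by 1 / (m · probability) is what keeps the estimate unbiased for any fixed C; sampling by sensitivity rather than uniformly keeps its variance small even when a few far-away points dominate the cost.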