All events are in Central time unless specified.

Vinod Variyam, University of Nebraska-Lincoln

Distinct Elements in Streams: An Algorithm for the (Text) Book

4:00 pm – 5:00 pm
Avery Hall Room: 115
1144 T St
Lincoln NE 68508
Additional Info: AVH
Petronela Radu
The basic computational task of estimating the number of unique elements in datasets, known as the Distinct Elements Problem, is critical in numerous domains including network traffic analysis, database management, marketing analysis, bioinformatics, textual data analysis, and fraud detection. This problem becomes particularly challenging when the dataset is too large to be stored or processed in its entirety.

This talk will highlight recent algorithmic progress on the distinct elements problem within the data streaming model. The data streaming model is popular for handling large amounts of streaming data. In this framework, data items arrive sequentially from a finite set, with the model’s parameters imposing strict limitations on storage and processing capabilities. Designing algorithms with limited resources for the distinct elements problem has been a significant focus for researchers, given its broad appeal. While earlier approaches relied on hashing techniques, a pure sampling-based algorithm remained elusive for many years. We have recently discovered a simple sampling-based algorithm (the CVM algorithm) for the distinct elements problem. This talk will present the CVM algorithm and its generalization in a broader data streaming scenario.

The talk is based on joint research with Sourav Chakraborty of the Indian Statistical Institute and Kuldeep S. Meel of the University of Toronto.

Download this event to my calendar

This event originated in Math.