2021 Whova User Guide Videos: Youtube | Bilibili
We hold two similar runs each day for the convenience of attendees in different time zones. Note that all times have been adjusted to your local time zone.


Beijing Time



TUESDAY (JUNE 22, 2021, Beijing Time)


22 JUN

07:30 - 09:00


SIGMOD Opening and Keynote:
Utilizing (and Designing) Modern Hardware for Data-Intensive Computations: The Role of Abstraction


Session Chair: Stratos Idreos

Third International Conference Hall (国三)

Zoom Link
Youtube Live
Bilibili Live
Youtube Video
Bilibili Video

Speaker: Kenneth A. Ross (Columbia University)
Abstract: Modern information-intensive systems, including data management systems, operate on data that is mostly resident in RAM. As a result, the data management community has shifted focus from I/O optimization to addressing performance issues higher in the memory hierarchy.
In this keynote, I will give a personal perspective of these developments, illustrated by work from my group at Columbia University. I will use the concept of abstraction as a lens through which various kinds of optimizations for modern hardware platforms can be understood and evaluated. Through this lens, some “cute implementation tricks” can be seen as much more than mere implementation details.
I will discuss abstractions at various granularities, from single lines of code to whole programming/query languages. I will touch on software and hardware design for data-intensive computations. I will also discuss data processing in a conventional programming language, and how the data management community might contribute to the design of compilers.


22 JUN

09:00 - 09:30

Sponsor Talk of Huawei

Third International Conference Hall (国三)

Zoom Link
Youtube Live
Bilibili Live


22 JUN

09:30 - 10:30

SIGMOD Curated Session:
Data Management for ML


Session Chair:
Umar Farooq Minhas
Arun Kumar

Multimedia II Hall 1 (多二1厅)

Zoom Link
Youtube Live
Bilibili Live

Keynote 1: Challenges in building instance optimized systems
Umar Farooq Minhas (Microsoft)

Slot 1: Benchmarking and In-Database Inference

Towards Demystifying Serverless Machine Learning Training

Jiawei Jiang (ETH Zurich)*; Shaoduo Gan (ETH Zurich); Yue Liu (ETH Zurich); Fanlin Wang (ETHZ); Gustavo Alonso (ETHZ); Ana Klimovic (ETH Zurich); Ankit Singla (ETH Zurich); Wentao Wu (Microsoft Research); Ce Zhang (ETH)

Towards Benchmarking Feature Type Inference for AutoML Platforms

Vraj Shah (University of California, San Diego)*; Jonathan Lacanlale (California State University, Northridge); Premanand Kumar (University of California, San Diego); Kevin Yang (University of California, San Diego); Arun Kumar (University of California, San Diego)

Transforming ML Predictive Pipelines into SQL with MASQ

Francesco Del Buono (University of Modena e Reggio Emilia); Matteo Paganelli (Università di Modena e Reggio Emilia); Paolo Sottovia (Huawei); Matteo Interlandi (Microsoft); Francesco Guerra (University of Modena e Reggio Emilia)*

Slot 2: Privacy & New Algorithms

HedgeCut: Maintaining Randomized Trees for Low-Latency Machine Unlearning

Sebastian Schelter (University of Amsterdam)*; Stefan Grafberger (TU Munich); Ted Dunning (MapR Technologies)

VF^2Boost: Very Fast Vertical Federated Gradient Boosting for Cross-Enterprise Learning

Fangcheng Fu (Peking University)*; Yingxia Shao (BUPT); Lele Yu (Peking University); Jiawei Jiang (ETH Zurich); Huanran Xue (Tencent Inc.); Yangyu Tao (Tencent); Bin Cui (Peking University)

New Algorithms for Monotone Classification

Yufei Tao and Yu Wang

Slot 3: Distributed Training and Graph Networks

Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce

Xupeng Miao (Peking University)*; Xiaonan Nie (Peking University); Yingxia Shao (BUPT); Zhi Yang (Peking University); Jiawei Jiang (ETH Zurich); Lingxiao Ma (Peking University); Bin Cui (Peking University)

Agile and Accurate CTR Prediction Model Training for Massive-Scale Online Advertising Systems

Zhiqiang Xu (Baidu Research); Dong Li (Baidu); Weijie Zhao (Baidu Research)*; Xing Shen (Baidu); Tianbo Huang (Baidu); Xiaoyun Li (Rutgers University); Ping Li (Baidu Research)

ALG: Fast and Accurate Active Learning Framework for Graph Convolutional Networks

Wentao Zhang (Peking University)*; Yu Shen (Peking University); Yang Li (Peking University); Lei Chen (Hong Kong University of Science and Technology); Zhi Yang (Peking University); Bin Cui (Peking University)

Vertex-Centric Visual Programming for Graph Neural Networks

Yidi Wu (The Chinese University of Hong Kong)*; Yuntao Gui (The Chinese University of Hong Kong); Tatiana Jin (CUHK); James Cheng (CUHK); Xiao Yan (Southern University of Science and Technology); Peiqi Yin (Southern University of Science and Technology); Yufei Cai (Southern University of Science and Technology); Bo Tang (Southern University of Science and Technology); Fan Yu (Huawei Technologies Co. Ltd)


SIGMOD Curated Session:
Data Structures


Session Chair:
Manos Athanassoulis

Multimedia II Hall 3 (多二3厅)

Zoom Link
Youtube Live
Bilibili Live

Slot 1: Filters, Trees, Compression

Conditional Cuckoo Filters

Daniel Ting (Tableau Software)*; Rick Cole (Tableau)

Vector Quotient Filters: Overcoming the Time/Space Trade-Off in Filter Design

Prashant Pandey (LBNL & UC Berkeley)*; Alex Conway (VMware Research); Joe Durie (Rutgers University); Michael A Bender (Stony Brook); Martin Farach-Colton (Rutgers University); Rob Johnson (VMware Research)

Building Fast and Compact Sketches for Approximately Multi-Set Multi-Membership Querying

Rundong Li (Xi'an Jiaotong University)*; Pinghui Wang (Xi'an Jiaotong University); Jiongli Zhu (Xi'an Jiaotong University); Junzhou Zhao (Xi'an Jiaotong University); Jia Di (Xi'an Jiaotong University); Xiaofei Yang (Xi'an Jiaotong University); Kai Ye (Xi'an Jiaotong University)

Fast Processing and Querying of 170TB of Genomics Data via a Repeated And Merged BloOm Filter (RAMBO)

Gaurav Gupta (Rice University)*; Minghao Yan (Rice University); Benjamin Coleman (Ric); Bryce Kille (Rice University); R. A. Leo Elworth (Rice University); Tharun Medini (Rice University); Todd Treangen (Rice University); Anshumali Shrivastava (Rice University)

A-Tree: A Dynamic Data Structure to Efficiently Index Arbitrary Boolean Expressions

Shuping Ji (Institute of Software, Chinese Academy of Sciences)*; Hans-Arno Jacobsen (University of Toronto)

Adaptive Compression for Fast Scans on String Columns

Yannis E Foufoulas (University of Athens)*; Lefteris Sidirourgos (National and Kapodistrian University of Athens); Eleftherios Stamatogiannakis (University of Athens); Yannis Ioannidis (University of Athens)

Slot 2: Sketches and (their) Applications

COMPASS: Online Sketch-based Query Optimization for In-Memory Databases

Yesdaulet Izenov (University of California, Merced); Asoke Datta (University of California, Merced); Florin Rusu (UC Merced)*; Jun Hyung Shin (University of California, Merced)

Correlation Sketches for Approximate Join-Correlation Queries

Aécio Santos (New York University)*; Aline Bessa (New York University); Fernando Chirigati (Springer Nature); Christopher Musco (New York University); Juliana Freire (New York University)

At-the-time and Back-in-time Persistent Sketches

Benwei Shi (University of Utah)*; Zhuoyue Zhao (University of Utah); Yanqing Peng (University of Utah); Feifei Li (University of Utah); Jeff Phillips (University of Utah)

Bidirectionally Densifying LSH Sketches with Empty Bins

Peng Jia (Xi'an Jiaotong University)*; Pinghui Wang (Xi'an Jiaotong University); Junzhou Zhao (Xi'an Jiaotong University); Shuo Zhang (Xi'an Jiaotong University); Yiyan Qi (Xi'an Jiaotong University); Min Hu (China Mobile Research Institute); Chao Deng (China Mobile Research Institute); Xiaohong Guan (Xi'an Jiaotong University)

Active Sampling Count Sketch (ASCS) for Online Sparse Estimation of a Trillion Scale Covariance Matrix

Zhenwei Dai (Rice University)*; Aditya Desai (Rice University); Anshumali Shrivastava (Rice University); Reinhard Heckel (Rice University)

A Learned Sketch for Subgraph Counting

Kangfei Zhao (The Chinese University of Hong Kong)*; Jeffrey Xu Yu (Chinese University of Hong Kong); Hao Zhang (Chinese University of Hong Kong); Qiyan Li (Wuhan University); Yu Rong (Tencent AI Lab)


SIGMOD Curated Session:
Streams


Session Chair: Jorge Quiané

Multimedia II Hall 5 (多二5厅)

Zoom Link
Youtube Live
Bilibili Live

Slot 1:

EIRES: Efficient Integration of Remote Data in Event Stream Processing

Bo Zhao (Humboldt University of Berlin)*; Han van der Aa (Universität Mannheim); Thanh Tam Nguyen (Leibniz Universitat Hannover); Quoc Viet Hung Nguyen (Griffith University); Matthias Weidlich (Humboldt-Universität zu Berlin)

Index-Accelerated Pattern Matching in Event Stores

Michael Körber (University of Marburg)*; Nikolaus Glombiewski (University of Marburg); Bernhard Seeger (University of Marburg)

Parallelizing Intra-Window Join on Multicores: An Experimental Study

Shuhao Zhang (Singapore University of Technology and Design)*; Yancan Mao (National University of Singapore); Jiong He (A*Star); Philipp Marian Grulich (Technische Universität Berlin); Steffen Zeuch (Humboldt Universität zu Berlin); Bingsheng He (National University of Singapore); Richard T.B. Ma (National University of Singapore); Volker Markl (Technische Universität Berlin)

To Share, or not to Share Online Event Trend Aggregation Over Bursty Event Streams

Olga Poppe (Microsoft)*; Chuan Lei (IBM Research - Almaden); Lei Ma (WPI); Allison M Rozet (MathWorks); Elke A Rundensteiner (WPI)

MuSE Graphs for Flexible Distribution of Event Stream Processing in Networks

Samira Akili (HU Berlin)*; Matthias Weidlich (Humboldt-Universität zu Berlin)

Imminence Monitoring of Critical Events: A Representation Learning Approach

Yan Li (University of Massachusetts, Lowell); Tingjian Ge (University of Massachusetts, Lowell)*

Slot 2:

BurstSketch: Finding Bursts in Data Streams

Zheng Zhong (Peking University)*; Shen Yan (Peking University); Zikun Li (Peking University); Decheng Tan (Peking University); Tong Yang (Peking University); Bin Cui (Peking University)

Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs

Prashant Pandey (LBNL & UC Berkeley)*; Brian Wheatman (Johns Hopkins University); Helen Xu (MIT); Aydin Buluc (Lawrence Berkeley National Laboratory)

Out of Many We are One: Measuring Item Batch with Clock-Sketch

Peiqing Chen (Peking University); Dong Chen (Peking University); Lingxiao Zheng (Peking University); Jizhou Li (Peking University); Tong Yang (Peking University)*

Sliding Window-based Approximate Triangle Counting over Streaming Graphs with Duplicate Edges

Xiangyang Gou (Peking University); Lei Zou (Peking University)*

RisGraph: A Real-Time Streaming System for Evolving Graphs to Support Sub-millisecond Per-update Analysis at Millions ops/s

Guanyu Feng (Tsinghua University)*; Zixuan Ma (Tsinghua University); Daixuan Li (Tsinghua University); Shengqi Chen (Tsinghua University); Xiaowei Zhu (Tsinghua University); Wentao Han (Tsinghua University); Wenguang Chen (Tsinghua University)

Distributed Stream kNN Join

Amirhesam Shahvarani (Technical University of Munich)*; Hans-Arno Jacobsen (TUM)


SIGMOD Tutorial:
Practical Security and Privacy for Database Systems


Third International Conference Hall (国三)

Zoom Link
Youtube Live
Bilibili Live
Youtube Video
Bilibili Video

Presenters: Xi He (University of Waterloo); Jennie Rogers (Northwestern University); Johes Bater (Duke University); Ashwin Machanavajjhala (Duke University); Chenghong Wang (Duke University); Xiao Wang (Northwestern University)
Abstract: Computing technology has enabled massive digital traces of our personal lives to be collected and stored. These datasets play an important role in numerous real-life applications and research analysis, such as contact tracing for COVID-19, but they contain sensitive information about individuals. When managing these datasets, privacy is usually addressed as an afterthought, engineered on top of a database system optimized for performance and usability. This has led to a plethora of unexpected privacy attacks in the news. Specialized privacy-preserving solutions usually require a group of privacy experts and they are not directly transferable to other domains. There is an urgent need for a general trustworthy database system that offers end-to-end security and privacy guarantees. In this tutorial, we will first describe the security and privacy requirements for database systems in different settings and cover the state-of-the-art tools that achieve these requirements. We will also show challenges in integrating these techniques together and demonstrate the design principles and optimization opportunities for these security- and privacy-aware database systems. This is designed to be a three-hour tutorial.


PODS Session:
Counting and Enumeration


Session Chair: Phokion Kolaitis

Administration Convention Room (行政会议室)

Zoom Link
Youtube Live
Bilibili Live

Model Counting meets F0 Estimation

A. Pavan, N. V. Vinodchandran, Arnab Bhattacharya and Kuldeep S. Meel

A Dichotomy for the Generalized Model Counting Problem for Unions of Conjunctive Queries

Batya Kenig and Dan Suciu

Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries

Nofar Carmeli, Nikolaos Tziavelis, Wolfgang Gatterbauer, Benny Kimelfeld and Mirek Riedewald



22 JUN

10:30 - 11:30

PODS Invited Tutorial 2:
Approximation Algorithms for Large Scale Data Analysis


Session Chair:
Mahmoud Abo Khamis

Administration Convention Room (行政会议室)

Zoom Link
Youtube Live
Bilibili Live
Youtube Video
Bilibili Video

Speaker: Barna Saha (University of California, Berkeley)
Abstract: One of the greatest successes of computational complexity theory is the classification of countless fundamental computational problems into polynomial-time and NP-hard ones, two classes that are often referred to as tractable and intractable, respectively. However, this crude distinction of algorithmic efficiency is clearly insufficient when handling today’s large scale of data. We need a finer-grained design and analysis of algorithms that pinpoints the exact exponent of polynomial running time, and a better understanding of when a speed-up is not possible. Based on stronger complexity assumptions than P vs NP, like the Strong Exponential Time Hypothesis, conditional lower bounds for a variety of fundamental problems in P have recently been proposed. Unfortunately, these conditional lower bounds often break down when one may settle for a near-optimal solution. Indeed, approximation algorithms can play a significant role when designing fast algorithms not just for traditional NP-hard problems, but also for polynomial-time problems.
For some applications arising in machine learning, the time complexity of the underlying algorithms is not sufficient to ensure a fast solution. It is often necessary to collect side information about the data to ensure high accuracy. This requires low query complexity.
In this presentation, we will cover new facets of fast algorithm design for large scale data analysis that emphasize the role of developing approximation algorithms for better polynomial time/query complexity.

Paper Link


22 JUN

11:30 - 12:30


22 JUN

12:30 - 13:30

Sponsor Talk of SequoiaDB

Third International Conference Hall (国三)

Zoom Link
Youtube Live
Bilibili Live


22 JUN

13:30 - 15:00

SIGMOD Demo Plenary (1)

Session Chair:
Spyros Blanas
Katja Hose

Zoom Link

mlinspect: a Data Distribution Debugger for Machine Learning Pipelines

Stefan Grafberger (TU Munich); Shubha Guha (University of Amsterdam); Julia Stoyanovich (New York University); Sebastian Schelter (University of Amsterdam)*

Dendrite: Bolt-on Adaptivity for Data Systems

Brad Glasbergen (University of Waterloo)*; Fangyu Wu (University of Waterloo); Khuzaima Daudjee (University of Waterloo)

Crosstown Foundry: A Scalable Data-driven Journalism Platform for Hyper-local News

Authors Online
Luciano Nocera (University of Southern California)*; Giorgos Constantinou (University of Southern California); Luan V Tran (University of Southern California); Seon Ho Kim (University of Southern California); Gabriel Kahn (University of Southern California); Cyrus Shahabi (Computer Science Department, University of Southern California)

Vertex-Centric Visual Programming for Graph Neural Networks

Authors Online
Yidi Wu (The Chinese University of Hong Kong)*; Yuntao Gui (The Chinese University of Hong Kong); Tatiana Jin (CUHK); James Cheng (CUHK); Xiao Yan (Southern University of Science and Technology); Peiqi Yin (Southern University of Science and Technology); Yufei Cai (Southern University of Science and Technology); Bo Tang (Southern University of Science and Technology); Fan Yu (Huawei Technologies Co. Ltd)

Transforming ML Predictive Pipelines into SQL with MASQ

Francesco Del Buono (University of Modena e Reggio Emilia); Matteo Paganelli (Università di Modena e Reggio Emilia); Paolo Sottovia (Huawei); Matteo Interlandi (Microsoft); Francesco Guerra (University of Modena e Reggio Emilia)*

IndoorViz: A Demonstration System for Indoor Spatial Data Management

Authors Online
Yue Li (East China Normal University); Shiyu Yang (Guangzhou University)*; Muhammad Aamir Cheema (Monash University); Zhou Shao (Monash University); Xuemin Lin (University of New South Wales)

FeatTS: Feature-based Time Series Clustering

Authors Online
Donato Tiano (Université Lyon 1)*; Angela Bonifati (Univ. of Lyon); Raymond Ng (UBC)

Demonstrating UDO: A Unified Approach for Optimizing Transaction Code, Physical Design, and System Parameters via Reinforcement Learning

Junxiong Wang (Cornell University)*; Immanuel Trummer (Cornell); Debabrota Basu (Inria)




SIGMOD Demo Plenary (2)

Session Chair:
Spyros Blanas
Katja Hose

Zoom Link

TSExplain: Surfacing Evolving Explanations for Time Series

Yiru Chen (Columbia University)*; Silu Huang (Microsoft)

CoCo: Interactive Exploration of Conformance Constraints for Data Understanding and Data Cleaning

Authors Online
Anna Fariha (University of Massachusetts Amherst)*; Ashish Tiwari (Microsoft); Alexandra Meliou (University of Massachusetts Amherst); Arjun Radhakrishna (Microsoft); Sumit Gulwani (Microsoft Research)

SOFOS: Demonstrating the Challenges of Materialized View Selection on Knowledge Graphs

Georgia Troullinou (FORTH-ICS); Haridimos Kondylakis (FORTH-ICS); Matteo Lissandrini (Aalborg University); Davide Mottin (Aarhus University)*

Boomerang: Proactive Insight-Based Recommendations for Guiding Conversational Data Analysis

Doris Lee (UC Berkeley); Abdul H Quamar (IBM Research Almaden)*; Eser Kandogan (Megagon Labs); Fatma Ozcan (Google)

Demonstrating Robust Voice Querying with MUVE: Optimally Visualizing Results of Phonetically Similar Queries

Ziyun Wei (Cornell University)*; Immanuel Trummer (Cornell); Connor Anderson (Cornell University)

QuTE: Answering Quantity Queries from Web Tables

Authors Online
Vinh Thinh Ho (Max Planck Institute for Informatics)*; Koninika Pal (Max Planck Institute for Informatics); Gerhard Weikum (Max-Planck-Institut fur Informatik)

PyExplore: Query Recommendations for Data Exploration without Query Logs

Apostolos Glenis (UNIPI)*; Georgia Koutrika (ATHENA Research Center)

INCA: Inconsistency-Aware Data Profiling and Querying

Ousmane Issa (UCA, LIMOS)*; Angela Bonifati (Univ. of Lyon); Farouk Toumani (UCA, LIMOS)

CAvSAT: Answering Aggregation Queries over Inconsistent Databases via SAT Solving

Authors Online
Akhil A Dixit (University of California, Santa Cruz)*; Phokion Kolaitis (UCSC & IBM Research - Almaden)

RawVis: A System for Efficient In-situ Visual Analytics

Stavros Maroulis (Research Center ATHENA)*; Nikos Bikakis (Athena); George Papastefanatos (ATHENA Research Center); Panos Vassiliadis (University of Ioannina); Yannis Vassiliou (NTUA)




SIGMOD Demo Plenary (3)

Session Chair:
Spyros Blanas
Katja Hose

Zoom Link

SRA: Smart Recovery Advisor for Cyber Attacks

Ka-Ho Chow (Georgia Institute of Technology)*; Umesh Deshpande (IBM Research - Almaden); Sangeetha Seshadri (IBM Research - Almaden); Ling Liu (Georgia Institute of Technology)

TardisDB: Extending SQL to Support Versioning

Maximilian E Schüle (Technical University of Munich)*; Josef Schmeißer (Technical University of Munich); Thomas Blum (TUM); Alfons Kemper (TUM); Thomas Neumann (TUM)

A System for Automated Open-Source Threat Intelligence Gathering and Management

Authors Online
Peng Gao (University of California, Berkeley)*; Xiaoyuan Liu (University of California, Berkeley); Edward Choi (University of California, Berkeley); Bhavna Soman (Microsoft); Chinmaya Mishra (Microsoft); Kate Farris (Microsoft); Dawn Song (UC Berkeley)

DataMingler: A Novel Approach to Data Virtualization

Damianos Chatziantoniou (Athens University of Economics and Business)*; Verena Kantere (National Technical University of Athens)

GRIP: Constraint-based Explanation of Missing Answers for Graph Queries

Qi Song (Amazon.com)*; Hanchao Ma (Case Western Reserve University); Peng Lin (Washington State University); Yinghui Wu (Case Western Reserve University)

A Byzantine Fault Tolerant Storage for Permissioned Blockchain

Authors Online
Xiaodong Qi (East China Normal University)*; Zhihao Chen (East China Normal University); Zhao Zhang (East China Normal University); Cheqing Jin (East China Normal University); Aoying Zhou (East China Normal University); Haizhen Zhuo (Ant Group); Quangqing Xu (Ant Group)

Attaining Workload Scalability and Strong Consistency for Replicated Databases with Hihooi

Michael Georgiou (Cyprus University of Technology); Michael Panayiotou (Cyprus University of Technology); Lambros Odysseos (Cyprus University of Technology); Aristodemos Paphitis (Cyprus University of Technology); Michael Sirivianos (Cyprus University of Technology); Herodotos Herodotou (Cyprus University of Technology)*

DPGraph: A Benchmark Platform for Differentially Private Graph Analysis

Siyuan Xia (University of Waterloo); Beizhen Chang (University of Waterloo); Karl Knopf (University of Waterloo); Yihan He (New York University); Yuchao Tao (Duke University); Xi He (University of Waterloo)*

BEER: Blocking for Effective Entity Resolution

Sainyam Galhotra (University of Massachusetts Amherst)*; Donatella Firmani (Roma Tre University); Barna Saha (University of California, Berkeley); Divesh Srivastava (AT&T Labs Research)





22 JUN

15:00 - 15:30

Sponsor Talk of Alibaba

Third International Conference Hall (国三)

Zoom Link
Youtube Live
Bilibili Live


22 JUN

15:30 - 16:30

SIGMOD Curated Session:
High Performance Systems


Session Chair:
Pinar Tozun
Tianzheng Wang

Multimedia II Hall 1 (多二1厅)

Zoom Link
Youtube Live
Bilibili Live

Slot 1: Invited Talk

The Case for In-Process Analytics
Hannes Mühleisen (CWI)

Storage system design for machine learning
Ana Klimovic (ETH)

Slot 2: Modern Networks & Storage

DFI - The Data Flow Interface for High-Speed Networks

Lasse Thostrup (TU Darmstadt)*; Jan Skrzypczak (Zuse Institute, Berlin); Matthias Jasny (TU Darmstadt); Tobias Ziegler (TU Darmstadt); Carsten Binnig (TU Darmstadt)

CoRM: Compactable Remote Memory over RDMA

Konstantin Taranov (ETH Zurich)*; Salvatore Di Girolamo (ETH Zurich); Torsten Hoefler (ETH Zurich)

Nova-LSM: A Distributed, Component-based LSM-tree Key-value Store

Haoyu Huang (University of Southern California)*; Shahram Ghandeharizadeh (USC)

Chucky: A Succinct Cuckoo Filter for LSM-Tree

Niv Dayan (Pliops)*; Moshe Twitto (Pliops)

Spitfire: A Three-Tier Buffer Manager for Volatile and Non-Volatile Memory

Xinjing Zhou (Tencent Inc.)*; Joy Arulraj (Georgia Tech); Andrew Pavlo (Carnegie Mellon University); David E Cohen (Intel)

Maximizing Persistent Memory Bandwidth Utilization for OLAP Workloads

Björn Daase (Hasso Plattner Institute, University of Potsdam)*; Lars Jonas Bollmeier (Hasso Plattner Institute, University of Potsdam); Lawrence Benson (Hasso Plattner Institute, University of Potsdam); Tilmann Rabl (HPI, University of Potsdam)

Slot 3: Intra-query/Transaction Optimizations

MxTasks: How to Make Efficient Synchronization and Prefetching Easy

Jan Mühlig (TU Dortmund University)*; Jens Teubner (TU Dortmund University)

Building Advanced SQL Analytics From Low-Level Plan Operators

André Kohn (Technical University of Munich)*; Viktor Leis (Friedrich-Alexander-Universitat Erlangen-Nürnberg); Thomas Neumann (TU Munich)

To partition, or not to partition, that is the join question in a real system.

Maximilian Bandle (TUM)*; Jana Giceva (TU Munich); Thomas Neumann (TUM)

Jigsaw: A Data Storage and Query Processing Engine for Irregular Table Partitioning

Donghe Kang (The Ohio State University)*; Ruochen Jiang (The Ohio State University); Spyros Blanas (The Ohio State University)

Self-Tuning Query Scheduling for Analytical Workloads

Benjamin Wagner (Technical University of Munich)*; André Kohn (Technical University of Munich); Thomas Neumann (TU Munich)

Klink: Progress-Aware Scheduling for Streaming Data Systems

Omar Farhat (University of Waterloo)*; Khuzaima Daudjee (University of Waterloo); Leonardo Querzoni (Sapienza University of Rome)


SIGMOD Curated Session:
Query Processing and Optimization


Session Chair:
S. Sudarshan
Renata Borovica-Gajic
Oliver Kennedy

Multimedia II Hall 3 (多二3厅)

Zoom Link
Youtube Live
Bilibili Live

Slot 1: Query Processing

Worst-Case Optimal Graph Joins in Almost No Space

Diego Arroyuelo (UTFSM, Chile); Aidan Hogan (University of Chile); Gonzalo Navarro (University of Chile); Juan Reutter (PUC)*; Javiel Rojas (University of Chile); Adrian Soto Suárez (FIC, UAI Chile)

One WITH RECURSIVE is Worth Many GOTOs

Denis Hirn (Universität Tübingen); Torsten Grust (Universität Tübingen)*

Resource-efficient Shared Query Execution via Exploiting Time Slackness

Dixin Tang (University of California, Berkeley)*; Zechao Shang (University of Chicago); William W Ma (University of Chicago); Aaron J Elmore (University of Chicago); Sanjay Krishnan (U Chicago)

TreeToaster: Towards an IVM-Optimized Compiler

Darshana Balakrishnan (State University of New York at Buffalo)*; Carl Nuessle (University of Buffalo, SUNY); Oliver A Kennedy (University at Buffalo, SUNY); Lukasz Ziarek (University at Buffalo, SUNY)

SOFOS: Demonstrating the Challenges of Materialized View Selection on Knowledge Graphs

Georgia Troullinou (FORTH-ICS); Haridimos Kondylakis (FORTH-ICS); Matteo Lissandrini (Aalborg University); Davide Mottin (Aarhus University)*

HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries

Rana Alotaibi (University of California, San Diego)*; Bogdan Cautis (University of Paris-Saclay); Alin Deutsch (UCSD); Ioana Manolescu (INRIA and Institut Polytechnique de Paris)

Slot 2: Sampling and Uncertain Data

Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries

Yuan Qiu (Hong Kong Univ. of Science and Technology); Yilei Wang (HKUST); Ke Yi (Hong Kong Univ. of Science and Technology)*; Feifei Li (Alibaba Group); Bin Wu (Alibaba); Chaoqun Zhan (Alibaba Inc.)

PGMJoins: Random Join Sampling with Graphical Models

Ali Mohammadi Shanghooshabad (University of Warwick); Meghdad Kurmanji (University of Warwick); Qingzhi Ma (University of Warwick); Michael Shekelyan (University of Warwick); Mehrdad Almasi (University of Warwick); Peter Triantafillou (University of Warwick)*

Small Selectivities Matter: Lifting the Burden of Empty Samples

Axel Hertzschuch (Technische Universität Dresden)*; Guido Moerkotte (University of Mannheim); Wolfgang Lehner (TU Dresden); Norman May (SAP SE); Florian Wolf (SAP SE); Lars Fricke (SAP SE)

Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing

Xi Liang (University of Chicago)*; Stavros Sintos (University of Chicago); Zechao Shang (University of Chicago); Sanjay Krishnan (UChicago)

Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds

Su Feng (Illinois Institute of Technology)*; Aaron Huber (SUNY Buffalo); Oliver A Kennedy (University at Buffalo, SUNY); Boris Glavic (Illinois Institute of Technology)

CAvSAT: Answering Aggregation Queries over Inconsistent Databases via SAT Solving

Akhil A Dixit (University of California, Santa Cruz)*; Phokion Kolaitis (UCSC & IBM Research - Almaden)

Slot 3: Query Processing Systems

The Power of Nested Parallelism in Big Data Processing -- Hitting Three Flies with One Slap

Gábor E. Gévay (Technische Universität Berlin)*; Jorge Arnulfo Quiane Ruiz (TU Berlin); Volker Markl (Technische Universität Berlin)

Vertex-centric Parallel Computation of SQL Queries

Ainur AS Smagulova (UC San Diego)*; Alin Deutsch (UCSD)

Good to the last bit: Data-Driven Encoding with CodecDB

Hao Jiang (University of Chicago)*; Chunwei Liu (University of Chicago); John Paparrizos (University of Chicago); Andrew A Chien (University of Chicago); Jihong Ma (Alibaba Group); Aaron J Elmore (University of Chicago)

Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities

Doris Xin (UC Berkeley)*; Hui Miao (Google); Aditya Parameswaran (University of California, Berkeley); Neoklis Polyzotis (Google)

Logical Schema Design that Quantifies Update Inefficiency and Join Efficiency

Sebastian Link (University of Auckland)*; Ziheng Wei (University of Auckland)

Shedding Light on Opaque Application Queries

Kapil Khurana (Indian Institute of Science); Jayant Haritsa (Indian Institute of Science)*


SIGMOD Curated Session:
Security, Fairness and Privacy


Session Chair:
Graham Cormode
Xiaokui Xiao

Multimedia II Hall 5 (多二5厅)

Zoom Link
Youtube Live
Bilibili Live

Slot 1: Privacy

On Optimizing the Trade-off between Privacy and Utility in Data Provenance

Daniel Deutch (Tel Aviv University); Ariel Frankenthal (Tel Aviv University); Amir Gilad (Duke University)*; Yuval Moskovitch (University of Michigan)

DP-Sync: Hiding Update Patterns in Secure Outsourced Databases with Differential Privacy

Chenghong Wang (Duke University)*; Johes Bater (Duke University); Kartik Nayak (DUKE UNIVERSITY); Ashwin Machanavajjhala (Duke)

Residual Sensitivity for Differentially Private Multi-Way Joins

Wei DONG (Hong Kong University of Science and Technology, Hong Kong); Ke Yi (Hong Kong University of Science and Technology, Hong Kong)*

PCOR: Private Contextual Outlier Release via Differentially Private Search

Masoumeh Shafieinejad (University of Waterloo)*; Florian Kerschbaum (University of Waterloo); Ihab F Ilyas (U. of Waterloo)

DPGraph: A Benchmark Platform for Differentially Private Graph Analysis

Siyuan Xia (University of Waterloo); Beizhen Chang (University of Waterloo); Karl Knopf (University of Waterloo); Yihan He (New York University); Yuchao Tao (Duke University); Xi He (University of Waterloo)*

PRISM: Private Verifiable Set Computation over Multi-Owner Outsourced Databases

Yin Li (Xinyang Normal University); Dhrubajyoti Ghosh (UC Irvine); Peeyush Gupta (UC Irvine); Sharad Mehrotra (U.C. Irvine); Nisha Panwar (UC Irvine); Shantanu Sharma (UC Irvine)*

Slot 2: Security

Secure Yannakakis: Join-Aggregate Queries over Private Data

Yilei Wang (HKUST); Ke Yi (Hong Kong University of Science and Technology, Hong Kong)*

SQL Ledger: Cryptographically Verifiable Data in Azure SQL Database

Panagiotis Antonopoulos (Microsoft)*; Raghav Kaushik (Microsoft); Hanuma Kodavalla (Microsoft); Sergio Rosales Aceves (Microsoft); Reilly Wong (Microsoft); Jason Anderson (Microsoft); Jakub Szymaszek (Microsoft)

When the Recursive Diversity Anonymity Meets the Ring Signature

Wangze Ni (Hong Kong University of Science and Technology); Peng CHENG (East China Normal University)*; Lei Chen (Hong Kong University of Science and Technology); Xuemin Lin (University of New South Wales)

SRA: Smart Recovery Advisor for Cyber Attacks

Ka-Ho Chow (Georgia Institute of Technology)*; Umesh Deshpande (IBM Research - Almaden); Sangeetha Seshadri (IBM Research - Almaden); Ling Liu (Georgia Institute of Technology)

A System for Automated Open-Source Threat Intelligence Gathering and Management

Peng Gao (University of California, Berkeley)*; Xiaoyuan Liu (University of California, Berkeley); Edward Choi (University of California, Berkeley); Bhavna Soman (Microsoft); Chinmaya Mishra (Microsoft); Kate Farris (Microsoft); Dawn Song (UC Berkeley)

De-anonymization Attacks on Neuroimaging Datasets

Vikram Ravindra (Purdue University)*; Ananth Grama (Purdue University)

Slot 3: Fairness

EquiTensors: Learning Fair Integrations of Heterogeneous Urban Data

An Yan (University of Washington)*; Bill G Howe (University of Washington)

OmniFair: A Declarative System for Model-Agnostic Group Fairness in Machine Learning

Hantian Zhang (Georgia Tech)*; Xu Chu (GATECH); Abolfazl Asudeh (University of Illinois at Chicago); Shamkant Navathe (GaTech)

Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals

Sainyam Galhotra (University of Massachusetts Amherst)*; Romila Pradhan (University of California San Diego); Babak Salimi (University of California at San Diego)

Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models

KIHYUN TAE (KAIST); Steven Whang (KAIST)*

Enforcing Constraints for Machine Learning Systems via Declarative Feature Selection: An Experimental Study

Felix Neutatz (TU Berlin)*; Felix Biessmann (Einstein Center Digital Future); Ziawasch Abedjan (Leibniz Universität Hannover)

mlinspect: a Data Distribution Debugger for Machine Learning Pipelines

Stefan Grafberger (TU Munich); Shubha Guha (University of Amsterdam); Julia Stoyanovich (New York University); Sebastian Schelter (University of Amsterdam)*

... ...

SIGMOD Tutorial:
AI Meets Database: AI4DB and DB4AI


Third International Conference Hall (国三)

Zoom Link
Youtube Live
Bilibili Live
Youtube Video
Bilibili Video

Presenters: Guoliang Li (Tsinghua University, China); Xuanhe Zhou (Tsinghua University, China); Lei Cao (MIT, USA)
Abstract: Database and Artificial Intelligence (AI) can benefit from each other. On one hand, AI can make databases more intelligent (AI4DB). For example, traditional empirical database optimization techniques (e.g., cost estimation, join order selection, knob tuning, index and view advisor) cannot meet the high-performance requirement for large-scale database instances, various applications and diversified users, especially on the cloud. Fortunately, learning-based techniques can alleviate this problem. On the other hand, database techniques can optimize AI models (DB4AI). For example, AI is hard to deploy, because it requires developers to write complex code and train complicated models. Database techniques can be used to reduce the complexity of using AI models, accelerate AI algorithms and provide AI capability inside databases. DB4AI and AI4DB have been extensively studied recently. In this tutorial, we review existing studies on AI4DB and DB4AI. For AI4DB, we review the techniques on learning-based database configuration, optimization, design, monitoring, and security. For DB4AI, we review AI-oriented declarative language, data governance, training acceleration, and inference acceleration. Finally, we provide research challenges and future directions in AI4DB and DB4AI.

... ...

PODS Session:
Test-of-Time Award and Data Streams


Session Chair: Angela Bonifati

Administration Convention Room (行政会议室)

Zoom Link
Youtube Live
Bilibili Live

Tight bounds for Lp samplers, finding duplicates in streams, and related problems (Test-of-Time Award)

Hossein Jowhari, Mert Saglam, Gábor Tardos

Frequent Elements with Witnesses in Data Streams

Christian Konrad

... ...


22 JUN

16:30 - 17:30

PODS Session:
Best Paper Award and Data Streams


Session Chair: Reinhard Pichler

Administration Convention Room (行政会议室)

Zoom Link
Youtube Live
Bilibili Live

Relative Error Streaming Quantiles (Best Paper)

Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler and Pavel Vesely

Stackless Processing of Streamed Trees

Corentin Barloy, Filip Murlak, Charles Paperman

Estimating the Size of Unions of Sets in Streaming Models

Kuldeep S. Meel, N.V. Vinodchandran, Sourav Chakraborty

... ...


22 JUN

17:30 - 18:30

PODS Session:
Multidimensional Data


Session Chair: Batya Kenig

Administration Convention Room (行政会议室)

Zoom Link
Youtube Live
Bilibili Live

New Algorithms for Monotone Classification

Yufei Tao and Yu Wang

Subspace Exploration: Bounds on Projected Frequency Estimation

Graham Cormode, Charlie Dickens and David P. Woodruff

Data-Independent Space Partitionings for Summaries

Graham Cormode, Minos Garofalakis, Michael Shekelyan

... ...


22 JUN

18:30 - 20:00

Reception with Chinese Culture Event

Zoom Link
Youtube Live
Bilibili Live

Youtube Video
Bilibili Video

Location: First International Conference Hall, Qujiang Hotel (曲江宾馆第一国际会议厅)


Second Run


22 JUN

20:00 - 21:00


SIGMOD Keynote:
Utilizing (and Designing) Modern Hardware for Data-Intensive Computations: The Role of Abstraction


Session Chair: Stratos Idreos

Zoom Link
Youtube Live
Bilibili Live
Youtube Video
Bilibili Video

Speaker: Kenneth A. Ross (Columbia University)
Abstract: Modern information-intensive systems, including data management systems, operate on data that is mostly resident in RAM. As a result, the data management community has shifted focus from I/O optimization to addressing performance issues higher in the memory hierarchy.
In this keynote, I will give a personal perspective of these developments, illustrated by work from my group at Columbia University. I will use the concept of abstraction as a lens through which various kinds of optimizations for modern hardware platforms can be understood and evaluated. Through this lens, some “cute implementation tricks” can be seen as much more than mere implementation details.
I will discuss abstractions at various granularities, from single lines of code to whole programming/query languages. I will touch on software and hardware design for data-intensive computations. I will also discuss data processing in a conventional programming language, and how the data management community might contribute to the design of compilers.


22 JUN

21:00 - 21:30

Sponsor Talk of Oracle

Zoom Link
Youtube Live


22 JUN

21:30 - 22:30

SIGMOD Curated Session:
Data Management for ML


Session Chair:
Umar Farooq Minhas
Arun Kumar

Zoom Link
Youtube Live

Keynote 2: The New DBfication of ML/AI
Arun Kumar (UCSD)

Slot 1: Benchmarking and In-Database Inference

Towards Demystifying Serverless Machine Learning Training

Jiawei Jiang (ETH Zurich)*; Shaoduo Gan (ETH Zurich); Yue Liu (ETH Zurich); Fanlin Wang (ETHZ); Gustavo Alonso (ETHZ); Ana Klimovic (ETH Zurich); Ankit Singla (ETH Zurich); Wentao Wu (Microsoft Research); Ce Zhang (ETH)

Towards Benchmarking Feature Type Inference for AutoML Platforms

Vraj Shah (University of California, San Diego)*; Jonathan Lacanlale (California State University, Northridge); Premanand Kumar (University of California, San Diego); Kevin Yang (University of California, San Diego); Arun Kumar (University of California, San Diego)

Transforming ML Predictive Pipelines into SQL with MASQ

Francesco Del Buono (University of Modena e Reggio Emilia); Matteo Paganelli (Università di Modena e Reggio Emilia); Paolo Sottovia (Huawei); Matteo Interlandi (Microsoft); Francesco Guerra (University of Modena e Reggio Emilia)*

Slot 2: Privacy & New Algorithms

HedgeCut: Maintaining Randomized Trees for Low-Latency Machine Unlearning

Sebastian Schelter (University of Amsterdam)*; Stefan Grafberger (TU Munich); Ted Dunning (MapR Technologies)

VF^2Boost: Very Fast Vertical Federated Gradient Boosting for Cross-Enterprise Learning

Fangcheng Fu (Peking University)*; Yingxia Shao (BUPT); Lele Yu (Peking University); Jiawei Jiang (ETH Zurich); Huanran Xue (Tencent Inc.); Yangyu Tao (Tencent); Bin Cui (Peking University)

New Algorithms for Monotone Classification

Yufei Tao and Yu Wang

Slot 3: Distributed Training and Graph Networks

Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce

Xupeng Miao (Peking University)*; Xiaonan Nie (Peking University); Yingxia Shao (BUPT); Zhi Yang (Peking University); Jiawei Jiang (ETH Zurich); Lingxiao Ma (Peking University); Bin Cui (Peking University)

Agile and Accurate CTR Prediction Model Training for Massive-Scale Online Advertising Systems

Zhiqiang Xu (Baidu Research); Dong Li (Baidu); Weijie Zhao (Baidu Research)*; Xing Shen (Baidu); Tianbo Huang (Baidu); Xiaoyun Li (Rutgers University); Ping Li (Baidu Research)

ALG: Fast and Accurate Active Learning Framework for Graph Convolutional Networks

Wentao Zhang (Peking University)*; Yu Shen (Peking University); Yang Li (Peking University); Lei Chen (Hong Kong University of Science and Technology); Zhi Yang (Peking University); Bin Cui (Peking University)

Vertex-Centric Visual Programming for Graph Neural Networks

Yidi Wu (The Chinese University of Hong Kong)*; Yuntao Gui (The Chinese University of Hong Kong); Tatiana Jin (CUHK); James Cheng (CUHK); Xiao Yan (Southern University of Science and Technology); Peiqi Yin (Southern University of Science and Technology); Yufei Cai (Southern University of Science and Technology); Bo Tang (Southern University of Science and Technology); Fan Yu (Huawei Technologies Co. Ltd)

... ...

SIGMOD Curated Session:
Data Structures


Session Chair:
Manos Athanassoulis

Zoom Link
Youtube Live

Slot 1: Filters, Trees, Compression

Conditional Cuckoo Filters

Daniel Ting (Tableau Software)*; Rick Cole (Tableau)

Vector Quotient Filters: Overcoming the Time/Space Trade-Off in Filter Design

Prashant Pandey (LBNL & UC Berkeley)*; Alex Conway (VMware Research); Joe Durie (Rutgers University); Michael A Bender (Stony Brook); Martin Farach-Colton (Rutgers University); Rob Johnson (VMware Research)

Building Fast and Compact Sketches for Approximately Multi-Set Multi-Membership Querying

Rundong Li (Xi'an Jiaotong University)*; Pinghui Wang (Xi'an Jiaotong University); Jiongli Zhu (Xi'an Jiaotong University); Junzhou Zhao (Xi'an Jiaotong University); Jia Di (Xi'an Jiaotong University); Xiaofei Yang (Xi'an Jiaotong University); Kai Ye (Xi'an Jiaotong University)

Fast Processing and Querying of 170TB of Genomics Data via a Repeated And Merged BloOm Filter (RAMBO)

Gaurav Gupta (Rice University)*; Minghao Yan (Rice University); Benjamin Coleman (Ric); Bryce Kille (Rice University); R. A. Leo Elworth (Rice University); Tharun Medini (Rice University); Todd Treangen (Rice University); Anshumali Shrivastava (Rice University)

A-Tree: A Dynamic Data Structure to Efficiently Index Arbitrary Boolean Expressions

Shuping Ji (Institute of Software, Chinese Academy of Sciences)*; Hans-Arno Jacobsen (University of Toronto)

Adaptive Compression for Fast Scans on String Columns

Yannis E Foufoulas (University of Athens)*; Lefteris Sidirourgos (National and Kapodistrian University of Athens); Eleftherios Stamatogiannakis (University of Athens); Yannis Ioannidis (University of Athens)

Slot 2: Sketches and (their) Applications

COMPASS: Online Sketch-based Query Optimization for In-Memory Databases

Yesdaulet Izenov (University of California, Merced); Asoke Datta (University of California, Merced); Florin Rusu (UC Merced)*; Jun Hyung Shin (University of California, Merced)

Correlation Sketches for Approximate Join-Correlation Queries

Aécio Santos (New York University)*; Aline Bessa (New York University); Fernando Chirigati (Springer Nature); Christopher Musco (New York University); Juliana Freire (New York University)

At-the-time and Back-in-time Persistent Sketches

Benwei Shi (University of Utah)*; Zhuoyue Zhao (University of Utah); Yanqing Peng (University of Utah); Feifei Li (University of Utah); Jeff Phillips (University of Utah)

Bidirectionally Densifying LSH Sketches with Empty Bins

Peng Jia (Xi'an Jiaotong University)*; Pinghui Wang (Xi'an Jiaotong University); Junzhou Zhao (Xi'an Jiaotong University); Shuo Zhang (Xi'an Jiaotong University); Yiyan Qi (Xi'an Jiaotong University); Min Hu (China Mobile Research Institute); Chao Deng (China Mobile Research Institute); Xiaohong Guan (Xi'an Jiaotong University)

Active Sampling Count Sketch (ASCS) for Online Sparse Estimation of a Trillion Scale Covariance Matrix

Zhenwei Dai (Rice University)*; Aditya Desai (Rice University); Anshumali Shrivastava (Rice University); Reinhard Heckel (Rice University)

A Learned Sketch for Subgraph Counting

Kangfei Zhao (The Chinese University of Hong Kong)*; Jeffrey Xu Yu (Chinese University of Hong Kong); Hao Zhang (Chinese University of Hong Kong); Qiyan Li (Wuhan University); Yu Rong (Tencent AI Lab)

... ...

SIGMOD Curated Session:
Streams


Session Chair: Jorge Quiané

Zoom Link
Youtube Live

Slot 1:

EIRES: Efficient Integration of Remote Data in Event Stream Processing

Bo Zhao (Humboldt University of Berlin)*; Han van der Aa (Universität Mannheim); Thanh Tam Nguyen (Leibniz Universitat Hannover); Quoc Viet Hung Nguyen (Griffith University); Matthias Weidlich (Humboldt-Universität zu Berlin)

Index-Accelerated Pattern Matching in Event Stores

Michael Körber (University of Marburg)*; Nikolaus Glombiewski (University of Marburg); Bernhard Seeger (University of Marburg)

Parallelizing Intra-Window Join on Multicores: An Experimental Study

Shuhao Zhang (Singapore University of Technology and Design)*; Yancan Mao (National University of Singapore); Jiong He (A*Star); Philipp Marian Grulich (Technische Universität Berlin); Steffen Zeuch (Humboldt Universität zu Berlin); Bingsheng He (National University of Singapore); Richard T.B. Ma (National University of Singapore); Volker Markl (Technische Universität Berlin)

To Share, or not to Share Online Event Trend Aggregation Over Bursty Event Streams

Olga Poppe (Microsoft)*; Chuan Lei (IBM Research - Almaden); Lei Ma (WPI); Allison M Rozet (MathWorks); Elke A Rundensteiner (WPI)

MuSE Graphs for Flexible Distribution of Event Stream Processing in Networks

Samira Akili (HU Berlin)*; Matthias Weidlich (Humboldt-Universität zu Berlin)

Imminence Monitoring of Critical Events: A Representation Learning Approach

Yan Li (University of Massachusetts, Lowell); Tingjian Ge (University of Massachusetts, Lowell)*

Slot 2:

BurstSketch: Finding Bursts in Data Streams

Zheng Zhong (Peking University)*; Shen Yan (Peking University); Zikun Li (Peking University); Decheng Tan (Peking University); Tong Yang (Peking University); Bin Cui (Peking University)

Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs

Prashant Pandey (LBNL & UC Berkeley)*; Brian Wheatman (Johns Hopkins University); Helen Xu (MIT); Aydin Buluc (Lawrence Berkeley National Laboratory)

Out of Many We are One: Measuring Item Batch with Clock-Sketch

Peiqing Chen (Peking University); Dong Chen (Peking University); Lingxiao Zheng (Peking University); Jizhou Li (Peking University); Tong Yang (Peking University)*

Sliding Window-based Approximate Triangle Counting over Streaming Graphs with Duplicate Edges

Xiangyang Gou (Peking University); Lei Zou (Peking University)*

RisGraph: A Real-Time Streaming System for Evolving Graphs to Support Sub-millisecond Per-update Analysis at Millions ops/s

Guanyu Feng (Tsinghua University)*; Zixuan Ma (Tsinghua University); Daixuan Li (Tsinghua University); Shengqi Chen (Tsinghua University); Xiaowei Zhu (Tsinghua University); Wentao Han (Tsinghua University); Wenguang Chen (Tsinghua University)

Distributed Stream kNN Join

Amirhesam Shahvarani (Technical University of Munich)*; Hans-Arno Jacobsen (TUM)

... ...

SIGMOD Tutorial:
Practical Security and Privacy for Database Systems


Zoom Link
Youtube Live
Youtube Video
Bilibili Video

Presenters: Xi He (University of Waterloo); Jennie Rogers (Northwestern University); Johes Bater (Duke University); Ashwin Machanavajjhala (Duke University); Chenghong Wang (Duke University); Xiao Wang (Northwestern University)
Abstract: Computing technology has enabled massive digital traces of our personal lives to be collected and stored. These datasets play an important role in numerous real-life applications and research analysis, such as contact tracing for COVID-19, but they contain sensitive information about individuals. When managing these datasets, privacy is usually addressed as an afterthought, engineered on top of a database system optimized for performance and usability. This has led to a plethora of unexpected privacy attacks in the news. Specialized privacy-preserving solutions usually require a group of privacy experts and they are not directly transferable to other domains. There is an urgent need for a general trustworthy database system that offers end-to-end security and privacy guarantees. In this tutorial, we will first describe the security and privacy requirements for database systems in different settings and cover the state-of-the-art tools that achieve these requirements. We will also show challenges in integrating these techniques together and demonstrate the design principles and optimization opportunities for these security and privacy-aware database systems. This is designed to be a three-hour tutorial.

... ...

Student Research Competition (Round 2)


Detail Info

Zoom Link

PODS Session:
Counting and Enumeration


Session Chair: Arnaud Durand

Zoom Link
Youtube Live

Model Counting meets F0 Estimation

A. Pavan, N. V. Vinodchandran, Arnab Bhattacharya and Kuldeep S. Meel

A Dichotomy for the Generalized Model Counting Problem for Unions of Conjunctive Queries

Batya Kenig and Dan Suciu

Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries

Nofar Carmeli, Nikolaos Tziavelis, Wolfgang Gatterbauer, Benny Kimelfeld and Mirek Riedewald

... ...


22 JUN

22:30 - 23:30

PODS Invited Tutorial 2:
Approximation Algorithms for Large Scale Data Analysis


Session Chair: Juan Reutter

Zoom Link
Youtube Live
Youtube Video
Bilibili Video

Speaker: Barna Saha (University of California, Berkeley)
Abstract: One of the greatest successes of computational complexity theory is the classification of countless fundamental computational problems into polynomial-time and NP-hard ones, two classes that are often referred to as tractable and intractable, respectively. However, this crude distinction of algorithmic efficiency is clearly insufficient when handling today’s large scale of data. We need a finer-grained design and analysis of algorithms that pinpoints the exact exponent of polynomial running time, and a better understanding of when a speed-up is not possible. Based on stronger complexity assumptions than P vs NP, like the Strong Exponential Time Hypothesis, conditional lower bounds for a variety of fundamental problems in P have recently been proposed. Unfortunately, these conditional lower bounds often break down when one may settle for a near-optimal solution. Indeed, approximation algorithms can play a significant role when designing fast algorithms not just for traditional NP-hard problems, but also for polynomial-time problems.
For some applications arising in machine learning, the time complexity of the underlying algorithms is not sufficient to ensure a fast solution. It is often necessary to collect side information about the data to ensure high accuracy. This requires low query complexity.
In this presentation, we will cover new facets of fast algorithm design for large scale data analysis that emphasize the role of developing approximation algorithms for better polynomial time/query complexity.

Paper Link ... ...


22 JUN

23:30 - 00:30 (+1 day)

SIGMOD Panel:
Automation of Data Prep, ML, and Data Science: New Cure or Snake Oil?


Session Chair: Arun Kumar

Zoom Link
Youtube Live


23 JUN

00:30 - 01:30


Break


23 JUN

01:30 - 03:00

SIGMOD Demo Plenary (1)

Session Chair:
Spyros Blanas
Katja Hose

Zoom Link

mlinspect: a Data Distribution Debugger for Machine Learning Pipelines

Authors Online
Stefan Grafberger (TU Munich); Shubha Guha (University of Amsterdam); Julia Stoyanovich (New York University); Sebastian Schelter (University of Amsterdam)*

Dendrite: Bolt-on Adaptivity for Data Systems

Authors Online
Brad Glasbergen (University of Waterloo)*; Fangyu Wu (University of Waterloo); Khuzaima Daudjee (University of Waterloo)

Crosstown Foundry: A Scalable Data-driven Journalism Platform for Hyper-local News

Authors Online
Luciano Nocera (University of Southern California)*; Giorgos Constantinou (University of Southern California); Luan V Tran (University of Southern California); Seon Ho Kim (University of Southern California); Gabriel Kahn (University of Southern California); Cyrus Shahabi (Computer Science Department, University of Southern California)

Vertex-Centric Visual Programming for Graph Neural Networks

Yidi Wu (The Chinese University of Hong Kong)*; Yuntao Gui (The Chinese University of Hong Kong); Tatiana Jin (CUHK); James Cheng (CUHK); Xiao Yan (Southern University of Science and Technology); Peiqi Yin (Southern University of Science and Technology); Yufei Cai (Southern University of Science and Technology); Bo Tang (Southern University of Science and Technology); Fan Yu (Huawei Technologies Co. Ltd)

Transforming ML Predictive Pipelines into SQL with MASQ

Authors Online
Francesco Del Buono (University of Modena e Reggio Emilia); Matteo Paganelli (Università di Modena e Reggio Emilia); Paolo Sottovia (Huawei); Matteo Interlandi (Microsoft); Francesco Guerra (University of Modena e Reggio Emilia)*

IndoorViz: A Demonstration System for Indoor Spatial Data Management

Yue Li (East China Normal University); Shiyu Yang (Guangzhou University)*; Muhammad Aamir Cheema (Monash University); Zhou Shao (Monash University); Xuemin Lin (University of New South Wales)

FeatTS: Feature-based Time Series Clustering

Authors Online
Donato Tiano (Université Lyon 1)*; Angela Bonifati (Univ. of Lyon); Raymond Ng (UBC)

Demonstrating UDO: A Unified Approach for Optimizing Transaction Code, Physical Design, and System Parameters via Reinforcement Learning

Authors Online
Junxiong Wang (Cornell University)*; Immanuel Trummer (Cornell); Debabrota Basu (Inria)



SIGMOD Demo Plenary (2)

Session Chair:
Spyros Blanas
Katja Hose

Zoom Link

TSExplain: Surfacing Evolving Explanations for Time Series

Authors Online
Yiru Chen (Columbia University)*; Silu Huang (Microsoft)

CoCo: Interactive Exploration of Conformance Constraints for Data Understanding and Data Cleaning

Anna Fariha (University of Massachusetts Amherst)*; Ashish Tiwari (Microsoft); Alexandra Meliou (University of Massachusetts Amherst); Arjun Radhakrishna (Microsoft); Sumit Gulwani (Microsoft Research)

SOFOS: Demonstrating the Challenges of Materialized View Selection on Knowledge Graphs

Authors Online
Georgia Troullinou (FORTH-ICS); Haridimos Kondylakis (FORTH-ICS); Matteo Lissandrini (Aalborg University); Davide Mottin (Aarhus University)*

Boomerang: Proactive Insight-Based Recommendations for Guiding Conversational Data Analysis

Authors Online
Doris Lee (UC Berkeley); Abdul H Quamar (IBM Research Almaden)*; Eser Kandogan (Megagon Labs); Fatma Ozcan (Google)

Demonstrating Robust Voice Querying with MUVE: Optimally Visualizing Results of Phonetically Similar Queries

Authors Online
Ziyun Wei (Cornell University)*; Immanuel Trummer (Cornell); Connor Anderson (Cornell University)

QuTE: Answering Quantity Queries from Web Tables

Vinh Thinh Ho (Max Planck Institute for Informatics)*; Koninika Pal (Max Planck Institute for Informatics); Gerhard Weikum (Max-Planck-Institut fur Informatik)

PyExplore: Query Recommendations for Data Exploration without Query Logs

Authors Online
Apostolos Glenis (UNIPI)*; Georgia Koutrika (ATHENA Research Center)

INCA: Inconsistency-Aware Data Profiling and Querying

Authors Online
Ousmane Issa (UCA, LIMOS)*; Angela Bonifati (Univ. of Lyon); Farouk Toumani (UCA, LIMOS)

CAvSAT: Answering Aggregation Queries over Inconsistent Databases via SAT Solving

Authors Online
Akhil A Dixit (University of California, Santa Cruz)*; Phokion Kolaitis (UCSC & IBM Research - Almaden)

RawVis: A System for Efficient In-situ Visual Analytics

Authors Online
Stavros Maroulis (Research Center ATHENA)*; Nikos Bikakis (Athena); George Papastefanatos (ATHENA Research Center); Panos Vassiliadis (University of Ioannina); Yannis Vassiliou (NTUA)



SIGMOD Demo Plenary (3)

Session Chair:
Spyros Blanas
Katja Hose

Zoom Link

SRA: Smart Recovery Advisor for Cyber Attacks

Authors Online
Ka-Ho Chow (Georgia Institute of Technology)*; Umesh Deshpande (IBM Research - Almaden); Sangeetha Seshadri (IBM Research - Almaden); Ling Liu (Georgia Institute of Technology)

TardisDB: Extending SQL to Support Versioning

Authors Online
Maximilian E Schüle (Technical University of Munich)*; Josef Schmeißer (Technical University of Munich); Thomas Blum (TUM); Alfons Kemper (TUM); Thomas Neumann (TUM)

A System for Automated Open-Source Threat Intelligence Gathering and Management

Peng Gao (University of California, Berkeley)*; Xiaoyuan Liu (University of California, Berkeley); Edward Choi (University of California, Berkeley); Bhavna Soman (Microsoft); Chinmaya Mishra (Microsoft); Kate Farris (Microsoft); Dawn Song (UC Berkeley)

DataMingler: A Novel Approach to Data Virtualization

Authors Online
Damianos Chatziantoniou (Athens University of Economics and Business)*; Verena Kantere (National Technical University of Athens)

GRIP: Constraint-based Explanation of Missing Answers for Graph Queries

Authors Online
Qi Song (Amazon.com)*; Hanchao Ma (Case Western Reserve University); Peng Lin (Washington State University); Yinghui Wu (Case Western Reserve University)

A Byzantine Fault Tolerant Storage for Permissioned Blockchain

Xiaodong Qi (East China Normal University)*; Zhihao Chen (East China Normal University); Zhao Zhang (East China Normal University); Cheqing Jin (East China Normal University); Aoying Zhou (East China Normal University); Haizhen Zhuo (Ant Group); Quangqing Xu (Ant Group)

Attaining Workload Scalability and Strong Consistency for Replicated Databases with Hihooi

Authors Online
Michael Georgiou (Cyprus University of Technology); Michael Panayiotou (Cyprus University of Technology); Lambros Odysseos (Cyprus University of Technology); Aristodemos Paphitis (Cyprus University of Technology); Michael Sirivianos (Cyprus University of Technology); Herodotos Herodotou (Cyprus University of Technology)*

DPGraph: A Benchmark Platform for Differentially Private Graph Analysis

Authors Online
Siyuan Xia (University of Waterloo); Beizhen Chang (University of Waterloo); Karl Knopf (University of Waterloo); Yihan He (New York University); Yuchao Tao (Duke University); Xi He (University of Waterloo)*

BEER: Blocking for Effective Entity Resolution

Authors Online
Sainyam Galhotra (University of Massachusetts Amherst)*; Donatella Firmani (Roma Tre University); Barna Saha (University of California, Berkeley); Divesh Srivastava (AT&T Labs Research)



PODS Founders Event in Honor of Turing Award Winners Alfred Aho and Jeffrey Ullman


Speakers: Alfred Aho, Jeffrey Ullman, Catriel Beeri, Phil Bernstein, Ronald Fagin, Moshe Vardi

Session Chair: Leonid Libkin

Zoom Link
Youtube Live


23 JUN

03:00 - 03:15

Sponsor Talk of Amazon

Zoom Link
Youtube Live


23 JUN

03:15 - 03:30

Sponsor Talk of Intel

Zoom Link
Youtube Live


23 JUN

03:30 - 04:30

SIGMOD Curated Session:
High Performance Systems


Session Chair:
Pinar Tozun
Tianzheng Wang

Zoom Link
Youtube Live

Slot 1: Invited Talk

The Case for In-Process Analytics
Hannes Mühleisen (CWI)

Storage system design for machine learning
Ana Klimovic (ETH)

Slot 2: Modern Networks & Storage

DFI - The Data Flow Interface for High-Speed Networks

Lasse Thostrup (TU Darmstadt)*; Jan Skrzypczak (Zuse Institute, Berlin); Matthias Jasny (TU Darmstadt); Tobias Ziegler (TU Darmstadt); Carsten Binnig (TU Darmstadt)

CoRM: Compactable Remote Memory over RDMA

Konstantin Taranov (ETH Zurich)*; Salvatore Di Girolamo (ETH Zurich); Torsten Hoefler (ETH Zurich)

Nova-LSM: A Distributed, Component-based LSM-tree Key-value Store

Haoyu Huang (University of Southern California)*; Shahram Ghandeharizadeh (USC)

Chucky: A Succinct Cuckoo Filter for LSM-Tree

Niv Dayan (Pliops)*; Moshe Twitto (Pliops)

Spitfire: A Three-Tier Buffer Manager for Volatile and Non-Volatile Memory

Xinjing Zhou (Tencent Inc.)*; Joy Arulraj (Georgia Tech); Andrew Pavlo (Carnegie Mellon University); David E Cohen (Intel)

Maximizing Persistent Memory Bandwidth Utilization for OLAP Workloads

Björn Daase (Hasso Plattner Institute, University of Potsdam)*; Lars Jonas Bollmeier (Hasso Plattner Institute, University of Potsdam); Lawrence Benson (Hasso Plattner Institute, University of Potsdam); Tilmann Rabl (HPI, University of Potsdam)

Slot 3: Intra-query/Transaction Optimizations

MxTasks: How to Make Efficient Synchronization and Prefetching Easy

Jan Mühlig (TU Dortmund University)*; Jens Teubner (TU Dortmund University)

Building Advanced SQL Analytics From Low-Level Plan Operators

André Kohn (Technical University of Munich)*; Viktor Leis (Friedrich-Alexander-Universität Erlangen-Nürnberg); Thomas Neumann (TU Munich)

To partition, or not to partition, that is the join question in a real system.

Maximilian Bandle (TUM)*; Jana Giceva (TU Munich); Thomas Neumann (TUM)

Jigsaw: A Data Storage and Query Processing Engine for Irregular Table Partitioning

Donghe Kang (The Ohio State University)*; Ruochen Jiang (The Ohio State University); Spyros Blanas (The Ohio State University)

Self-Tuning Query Scheduling for Analytical Workloads

Benjamin Wagner (Technical University of Munich)*; André Kohn (Technical University of Munich); Thomas Neumann (TU Munich)

Klink: Progress-Aware Scheduling for Streaming Data Systems

Omar Farhat (University of Waterloo)*; Khuzaima Daudjee (University of Waterloo); Leonardo Querzoni (Sapienza University of Rome)

... ...

SIGMOD Curated Session:
Query Processing and Optimization


Session Chair:
S. Sudarshan
Renata Borovica-Gajic
Oliver Kennedy

Zoom Link
Youtube Live

Slot 1: Query Processing

Worst-Case Optimal Graph Joins in Almost No Space

Diego Arroyuelo (UTFSM, Chile); Aidan Hogan (University of Chile); Gonzalo Navarro (University of Chile); Juan Reutter (PUC)*; Javiel Rojas (University of Chile); Adrian Soto Suárez (FIC, UAI Chile)

One WITH RECURSIVE is Worth Many GOTOs

Denis Hirn (Universität Tübingen); Torsten Grust (Universität Tübingen)*

Resource-efficient Shared Query Execution via Exploiting Time Slackness

Dixin Tang (University of California, Berkeley)*; Zechao Shang (University of Chicago); William W Ma (University of Chicago); Aaron J Elmore (University of Chicago); Sanjay Krishnan (U Chicago)

TreeToaster: Towards an IVM-Optimized Compiler

Darshana Balakrishnan (State University of New York at Buffalo)*; Carl Nuessle (University of Buffalo, SUNY); Oliver A Kennedy (University at Buffalo, SUNY); Lukasz Ziarek (University at Buffalo, SUNY)

SOFOS: Demonstrating the Challenges of Materialized View Selection on Knowledge Graphs

Georgia Troullinou (FORTH-ICS); Haridimos Kondylakis (FORTH-ICS); Matteo Lissandrini (Aalborg University); Davide Mottin (Aarhus University)*

HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries

Rana Alotaibi (University of California, San Diego)*; Bogdan Cautis (University of Paris-Saclay); Alin Deutsch (UCSD); Ioana Manolescu (INRIA and Institut Polytechnique de Paris)

Slot 2: Sampling and Uncertain Data

Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries

Yuan Qiu (Hong Kong Univ. of Science and Technology); Yilei Wang (HKUST); Ke Yi (Hong Kong Univ. of Science and Technology)*; Feifei Li (Alibaba Group); Bin Wu (Alibaba); Chaoqun Zhan (Alibaba Inc.)

PGMJoins: Random Join Sampling with Graphical Models

Ali Mohammadi Shanghooshabad (University of Warwick); Meghdad Kurmanji (University of Warwick); Qingzhi Ma (University of Warwick); Michael Shekelyan (University of Warwick); Mehrdad Almasi (University of Warwick); Peter Triantafillou (University of Warwick)*

Small Selectivities Matter: Lifting the Burden of Empty Samples

Axel Hertzschuch (Technische Universität Dresden)*; Guido Moerkotte (University of Mannheim); Wolfgang Lehner (TU Dresden); Norman May (SAP SE); Florian Wolf (SAP SE); Lars Fricke (SAP SE)

Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing

Xi Liang (University of Chicago)*; Stavros Sintos (University of Chicago); Zechao Shang (University of Chicago); Sanjay Krishnan (UChicago)

Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds

Su Feng (Illinois Institute of Technology)*; Aaron Huber (SUNY Buffalo); Oliver A Kennedy (University at Buffalo, SUNY); Boris Glavic (Illinois Institute of Technology)

CAvSAT: Answering Aggregation Queries over Inconsistent Databases via SAT Solving

Akhil A Dixit (University of California, Santa Cruz)*; Phokion Kolaitis (UCSC & IBM Research - Almaden)

Slot 3: Query Processing Systems

The Power of Nested Parallelism in Big Data Processing -- Hitting Three Flies with One Slap

Gábor E. Gévay (Technische Universität Berlin)*; Jorge Arnulfo Quiane Ruiz (TU Berlin); Volker Markl (Technische Universität Berlin)

Vertex-centric Parallel Computation of SQL Queries

Ainur AS Smagulova (UC San Diego)*; Alin Deutsch (UCSD)

Good to the last bit: Data-Driven Encoding with CodecDB

Hao Jiang (University of Chicago)*; Chunwei Liu (University of Chicago); John Paparrizos (University of Chicago); Andrew A Chien (University of Chicago); Jihong Ma (Alibaba Group); Aaron J Elmore (University of Chicago)

Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities

Doris Xin (UC Berkeley)*; Hui Miao (Google); Aditya Parameswaran (University of California, Berkeley); Neoklis Polyzotis (Google)

Logical Schema Design that Quantifies Update Inefficiency and Join Efficiency

Sebastian Link (University of Auckland)*; Ziheng Wei (University of Auckland)

Shedding Light on Opaque Application Queries

Kapil Khurana (Indian Institute of Science); Jayant Haritsa (Indian Institute of Science)*

... ...

SIGMOD Curated Session:
Security, Fairness and Privacy


Session Chair:
Graham Cormode
Xiaokui Xiao

Zoom Link
Youtube Live

Slot 1: Privacy

On Optimizing the Trade-off between Privacy and Utility in Data Provenance

Daniel Deutch (Tel Aviv University); Ariel Frankenthal (Tel Aviv University); Amir Gilad (Duke University)*; Yuval Moskovitch (University of Michigan)

DP-Sync: Hiding Update Patterns in Secure Outsourced Databases with Differential Privacy

Chenghong Wang (Duke University)*; Johes Bater (Duke University); Kartik Nayak (Duke University); Ashwin Machanavajjhala (Duke)

Residual Sensitivity for Differentially Private Multi-Way Joins

Wei DONG (Hong Kong University of Science and Technology, Hong Kong); Ke Yi (Hong Kong University of Science and Technology, Hong Kong)*

PCOR: Private Contextual Outlier Release via Differentially Private Search

Masoumeh Shafieinejad (University of Waterloo)*; Florian Kerschbaum (University of Waterloo); Ihab F Ilyas (U. of Waterloo)

DPGraph: A Benchmark Platform for Differentially Private Graph Analysis

Siyuan Xia (University of Waterloo); Beizhen Chang (University of Waterloo); Karl Knopf (University of Waterloo); Yihan He (New York University); Yuchao Tao (Duke University); Xi He (University of Waterloo)*

PRISM: Private Verifiable Set Computation over Multi-Owner Outsourced Databases

Yin Li (Xinyang Normal University); Dhrubajyoti Ghosh (UC Irvine); Peeyush Gupta (UC Irvine); Sharad Mehrotra (U.C. Irvine); Nisha Panwar (UC Irvine); Shantanu Sharma (UC Irvine)*

Slot 2: Security

Secure Yannakakis: Join-Aggregate Queries over Private Data

Yilei Wang (HKUST); Ke Yi (Hong Kong University of Science and Technology, Hong Kong)*

SQL Ledger: Cryptographically Verifiable Data in Azure SQL Database

Panagiotis Antonopoulos (Microsoft)*; Raghav Kaushik (Microsoft); Hanuma Kodavalla (Microsoft); Sergio Rosales Aceves (Microsoft); Reilly Wong (Microsoft); Jason Anderson (Microsoft); Jakub Szymaszek (Microsoft)

When the Recursive Diversity Anonymity Meets the Ring Signature

Wangze Ni (Hong Kong University of Science and Technology); Peng CHENG (East China Normal University)*; Lei Chen (Hong Kong University of Science and Technology); Xuemin Lin (University of New South Wales)

SRA: Smart Recovery Advisor for Cyber Attacks

Ka-Ho Chow (Georgia Institute of Technology)*; Umesh Deshpande (IBM Research - Almaden); Sangeetha Seshadri (IBM Research - Almaden); Ling Liu (Georgia Institute of Technology)

A System for Automated Open-Source Threat Intelligence Gathering and Management

Peng Gao (University of California, Berkeley)*; Xiaoyuan Liu (University of California, Berkeley); Edward Choi (University of California, Berkeley); Bhavna Soman (Microsoft); Chinmaya Mishra (Microsoft); Kate Farris (Microsoft); Dawn Song (UC Berkeley)

De-anonymization Attacks on Neuroimaging Datasets

Vikram Ravindra (Purdue University)*; Ananth Grama (Purdue University)

Slot 3: Fairness

EquiTensors: Learning Fair Integrations of Heterogeneous Urban Data

An Yan (University of Washington)*; Bill G Howe (University of Washington)

OmniFair: A Declarative System for Model-Agnostic Group Fairness in Machine Learning

Hantian Zhang (Georgia Tech)*; Xu Chu (GATECH); Abolfazl Asudeh (University of Illinois at Chicago); Shamkant Navathe (GaTech)

Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals

Sainyam Galhotra (University of Massachusetts Amherst)*; Romila Pradhan (University of California San Diego); Babak Salimi (University of California at San Diego)

Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models

Kihyun Tae (KAIST); Steven Whang (KAIST)*

Enforcing Constraints for Machine Learning Systems via Declarative Feature Selection: An Experimental Study

Felix Neutatz (TU Berlin)*; Felix Biessmann (Einstein Center Digital Future); Ziawasch Abedjan (Leibniz Universität Hannover)

mlinspect: a Data Distribution Debugger for Machine Learning Pipelines

Stefan Grafberger (TU Munich); Shubha Guha (University of Amsterdam); Julia Stoyanovich (New York University); Sebastian Schelter (University of Amsterdam)*

... ...

SIGMOD Tutorial:
AI Meets Database: AI4DB and DB4AI


Zoom Link
Youtube Live
Youtube Video
Bilibili Video

Presenters: Guoliang Li (Tsinghua University, China); Xuanhe Zhou (Tsinghua University, China); Lei Cao (MIT, USA)
Abstract: Databases and Artificial Intelligence (AI) can benefit from each other. On one hand, AI can make databases more intelligent (AI4DB). For example, traditional empirical database optimization techniques (e.g., cost estimation, join order selection, knob tuning, index and view advisors) cannot meet the high-performance requirements of large-scale database instances, various applications and diversified users, especially on the cloud. Fortunately, learning-based techniques can alleviate this problem. On the other hand, database techniques can optimize AI models (DB4AI). For example, AI is hard to deploy, because it requires developers to write complex code and train complicated models. Database techniques can be used to reduce the complexity of using AI models, accelerate AI algorithms and provide AI capability inside databases. DB4AI and AI4DB have been extensively studied recently. In this tutorial, we review existing studies on AI4DB and DB4AI. For AI4DB, we review the techniques for learning-based database configuration, optimization, design, monitoring, and security. For DB4AI, we review AI-oriented declarative languages, data governance, training acceleration, and inference acceleration. Finally, we provide research challenges and future directions in AI4DB and DB4AI.

... ...

PODS Session:
Test-of-Time Award and Data Streams


Session Chair: Filip Murlak

Zoom Link
Youtube Live

Tight bounds for Lp samplers, finding duplicates in streams, and related problems (Test-of-Time Award)

Hossein Jowhari, Mert Saglam, Gábor Tardos

Frequent Elements with Witnesses in Data Streams

Christian Konrad

... ...


23 JUN

04:30 - 05:30

PODS Session:
Best Paper Award and Data Streams


Session Chair: Liat Peterfreund

Zoom Link
Youtube Live

Relative Error Streaming Quantiles (Best Paper)

Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler and Pavel Vesely

Stackless Processing of Streamed Trees

Corentin Barloy, Filip Murlak, Charles Paperman

Estimating the Size of Unions of Sets in Streaming Models

Kuldeep S. Meel, N.V. Vinodchandran, Sourav Chakraborty

... ...


23 JUN

05:30 - 06:30

PODS Session:
Multidimensional Data


Session Chair:
Srikanta Tirthapura

Zoom Link
Youtube Live

New Algorithms for Monotone Classification

Yufei Tao and Yu Wang

Subspace Exploration: Bounds on Projected Frequency Estimation

Graham Cormode, Charlie Dickens and David P. Woodruff

Data-Independent Space Partitionings for Summaries

Graham Cormode, Minos Garofalakis, Michael Shekelyan

... ...


23 JUN

06:30 - 07:30

Sponsor Talk of Microsoft

Zoom Link
Youtube Live