We have prepared two similar runs on each day for the convenience of attendees in different time zones. Note that all times have been adjusted to your local time zone.
Speaker: Kenneth A. Ross (Columbia University) Abstract: Modern information-intensive systems, including data management systems, operate on data that is mostly resident in RAM. As a result, the data management community has shifted focus from I/O optimization to addressing performance issues higher in the memory hierarchy.
In this keynote, I will give a personal perspective of these developments, illustrated by work from my group at Columbia University. I will use the concept of abstraction as a lens through which various kinds of optimizations for modern hardware platforms can be understood and evaluated. Through this lens, some “cute implementation tricks” can be seen as much more than mere implementation details.
I will discuss abstractions at various granularities, from single lines of code to whole programming/query languages. I will touch on software and hardware design for data-intensive computations. I will also discuss data processing in a conventional programming language, and how the data management community might contribute to the design of compilers.
Keynote 1: Challenges in building instance optimized systems Umar Farooq Minhas (Microsoft)
Slot 1: Benchmarking and In-Database Inference
Towards Demystifying Serverless Machine Learning Training Jiawei Jiang (ETH Zurich)*; Shaoduo Gan (ETH Zurich); Yue Liu (ETH Zurich); Fanlin Wang (ETHZ); Gustavo Alonso (ETHZ); Ana Klimovic (ETH Zurich); Ankit Singla (ETH Zurich); Wentao Wu (Microsoft Research); Ce Zhang (ETH)
Towards Benchmarking Feature Type Inference for AutoML Platforms Vraj Shah (University of California, San Diego)*; Jonathan Lacanlale (California State University, Northridge); Premanand Kumar (University of California, San Diego); Kevin Yang (University of California, San Diego); Arun Kumar (University of California, San Diego)
Transforming ML Predictive Pipelines into SQL with MASQ Francesco Del Buono (University of Modena e Reggio Emilia); Matteo Paganelli (Università di Modena e Reggio Emilia); Paolo Sottovia (Huawei); Matteo Interlandi (Microsoft); Francesco Guerra (University of Modena e Reggio Emilia)*
Vertex-Centric Visual Programming for Graph Neural Networks Yidi Wu (The Chinese University of Hong Kong)*; Yuntao Gui (The Chinese University of Hong Kong); Tatiana Jin (CUHK); James Cheng (CUHK); Xiao Yan (Southern University of Science and Technology); Peiqi Yin (Southern University of Science and Technology); Yufei Cai (Southern University of Science and Technology); Bo Tang (Southern University of Science and Technology); Fan Yu (Huawei Technologies Co. Ltd)
Building Fast and Compact Sketches for Approximately Multi-Set Multi-Membership Querying Rundong Li (Xi'an Jiaotong University)*; Pinghui Wang (Xi'an Jiaotong University); Jiongli Zhu (Xi'an Jiaotong University); Junzhou Zhao (Xi'an Jiaotong University); Jia Di (Xi'an Jiaotong University); Xiaofei Yang (Xi'an Jiaotong University); Kai Ye (Xi'an Jiaotong University)
Adaptive Compression for Fast Scans on String Columns Yannis E Foufoulas (University of Athens)*; Lefteris Sidirourgos (National and Kapodistrian University of Athens); Eleftherios Stamatogiannakis (University of Athens); Yannis Ioannidis (University of Athens)
Correlation Sketches for Approximate Join-Correlation Queries Aécio Santos (New York University)*; Aline Bessa (New York University); Fernando Chirigati (Springer Nature); Christopher Musco (New York University); Juliana Freire (New York University)
At-the-time and Back-in-time Persistent Sketches Benwei Shi (University of Utah)*; Zhuoyue Zhao (University of Utah); Yanqing Peng (University of Utah); Feifei Li (University of Utah); Jeff Phillips (University of Utah)
Bidirectionally Densifying LSH Sketches with Empty Bins Peng Jia (Xi'an Jiaotong University)*; Pinghui Wang (Xi'an Jiaotong University); Junzhou Zhao (Xi'an Jiaotong University); Shuo Zhang (Xi'an Jiaotong University); Yiyan Qi (Xi'an Jiaotong University); Min Hu (China Mobile Research Institute); Chao Deng (China Mobile Research Institute); Xiaohong Guan (Xi'an Jiaotong University)
A Learned Sketch for Subgraph Counting Kangfei Zhao (The Chinese University of Hong Kong)*; Jeffrey Xu Yu (Chinese University of Hong Kong); Hao Zhang (Chinese University of Hong Kong); Qiyan Li (Wuhan University); Yu Rong (Tencent AI Lab)
EIRES: Efficient Integration of Remote Data in Event Stream Processing Bo Zhao (Humboldt University of Berlin)*; Han van der Aa (Universität Mannheim); Thanh Tam Nguyen (Leibniz Universität Hannover); Quoc Viet Hung Nguyen (Griffith University); Matthias Weidlich (Humboldt-Universität zu Berlin)
Parallelizing Intra-Window Join on Multicores: An Experimental Study Shuhao Zhang (Singapore University of Technology and Design)*; Yancan Mao (National University of Singapore); Jiong He (A*Star); Philipp Marian Grulich (Technische Universität Berlin); Steffen Zeuch (Humboldt Universität zu Berlin); Bingsheng He (National University of Singapore); Richard T.B. Ma (National University of Singapore); Volker Markl (Technische Universität Berlin)
BurstSketch: Finding Bursts in Data Streams Zheng Zhong (Peking University)*; Shen Yan (Peking University); Zikun Li (Peking University); Decheng Tan (Peking University); Tong Yang (Peking University); Bin Cui (Peking University)
Presenters: Xi He (University of Waterloo); Jennie Rogers (Northwestern University); Johes Bater (Duke University); Ashwin Machanavajjhala (Duke University); Chenghong Wang (Duke University); Xiao Wang (Northwestern University) Abstract: Computing technology has enabled massive digital traces of our personal lives to be collected and stored. These datasets play an important role in numerous real-life applications and research analysis, such as contact tracing for COVID-19, but they contain sensitive information about individuals. When managing these datasets, privacy is usually addressed as an afterthought, engineered on top of a database system optimized for performance and usability. This has led to a plethora of unexpected privacy attacks in the news. Specialized privacy-preserving solutions usually require a group of privacy experts and they are not directly transferable to other domains. There is an urgent need for a general trustworthy database system that offers end-to-end security and privacy guarantees. In this tutorial, we will first describe the security and privacy requirements for database systems in different settings and cover the state-of-the-art tools that achieve these requirements. We will also show challenges in integrating these techniques together and demonstrate the design principles and optimization opportunities for these security- and privacy-aware database systems. This is designed to be a three-hour tutorial.
Speaker: Barna Saha (University of California, Berkeley) Abstract: One of the greatest successes of computational complexity theory is the classification of countless fundamental computational problems into polynomial-time and NP-hard ones, two classes that are often referred to as tractable and intractable, respectively. However, this crude distinction of algorithmic efficiency is clearly insufficient when handling today’s large scale of data. We need a finer-grained design and analysis of algorithms that pinpoints the exact exponent of polynomial running time, and a better understanding of when a speed-up is not possible. Based on stronger complexity assumptions than P vs NP, like the Strong Exponential Time Hypothesis, conditional lower bounds for a variety of fundamental problems in P have recently been proposed. Unfortunately, these conditional lower bounds often break down when one may settle for a near-optimal solution. Indeed, approximation algorithms can play a significant role when designing fast algorithms not just for traditional NP-hard problems, but also for polynomial-time problems.
For some applications arising in machine learning, the time complexity of the underlying algorithms is not sufficient to ensure a fast solution. It is often necessary to collect side information about the data to ensure high accuracy. This requires low query complexity.
In this presentation, we will cover new facets of fast algorithm design for large-scale data analysis that emphasize the role of developing approximation algorithms for better polynomial time/query complexity.
Crosstown Foundry: A Scalable Data-driven Journalism Platform for Hyper-local News Authors Online Luciano Nocera (University of Southern California)*; Giorgos Constantinou (University of Southern California); Luan V Tran (University of Southern California); Seon Ho Kim (University of Southern California); Gabriel Kahn (University of Southern California); Cyrus Shahabi (Computer Science Department, University of Southern California)
Vertex-Centric Visual Programming for Graph Neural Networks Authors Online Yidi Wu (The Chinese University of Hong Kong)*; Yuntao Gui (The Chinese University of Hong Kong); Tatiana Jin (CUHK); James Cheng (CUHK); Xiao Yan (Southern University of Science and Technology); Peiqi Yin (Southern University of Science and Technology); Yufei Cai (Southern University of Science and Technology); Bo Tang (Southern University of Science and Technology); Fan Yu (Huawei Technologies Co. Ltd)
Transforming ML Predictive Pipelines into SQL with MASQ Francesco Del Buono (University of Modena e Reggio Emilia); Matteo Paganelli (Università di Modena e Reggio Emilia); Paolo Sottovia (Huawei); Matteo Interlandi (Microsoft); Francesco Guerra (University of Modena e Reggio Emilia)*
IndoorViz: A Demonstration System for Indoor Spatial Data Management Authors Online Yue Li (East China Normal University); Shiyu Yang (Guangzhou University)*; Muhammad Aamir Cheema (Monash University); Zhou Shao (Monash University); Xuemin Lin (University of New South Wales)
QuTE: Answering Quantity Queries from Web Tables Authors Online Vinh Thinh Ho (Max Planck Institute for Informatics)*; Koninika Pal (Max Planck Institute for Informatics); Gerhard Weikum (Max-Planck-Institut für Informatik)
RawVis: A System for Efficient In-situ Visual Analytics Stavros Maroulis (Research Center ATHENA)*; Nikos Bikakis (Athena); George Papastefanatos (ATHENA Research Center); Panos Vassiliadis (University of Ioannina); Yannis Vassiliou (NTUA)
SRA: Smart Recovery Advisor for Cyber Attacks Ka-Ho Chow (Georgia Institute of Technology)*; Umesh Deshpande (IBM Research - Almaden); Sangeetha Seshadri (IBM Research - Almaden); Ling Liu (Georgia Institute of Technology)
TardisDB: Extending SQL to Support Versioning Maximilian E Schüle (Technical University of Munich)*; Josef Schmeißer (Technical University of Munich); Thomas Blum (TUM); Alfons Kemper (TUM); Thomas Neumann (TUM)
A System for Automated Open-Source Threat Intelligence Gathering and Management Authors Online Peng Gao (University of California, Berkeley)*; Xiaoyuan Liu (University of California, Berkeley); Edward Choi (University of California, Berkeley); Bhavna Soman (Microsoft); Chinmaya Mishra (Microsoft); Kate Farris (Microsoft); Dawn Song (UC Berkeley)
A Byzantine Fault Tolerant Storage for Permissioned Blockchain Authors Online Xiaodong Qi (East China Normal University)*; Zhihao Chen (East China Normal University); Zhao Zhang (East China Normal University); Cheqing Jin (East China Normal University); Aoying Zhou (East China Normal University); Haizhen Zhuo (Ant Group); Quangqing Xu (Ant Group)
Attaining Workload Scalability and Strong Consistency for Replicated Databases with Hihooi Michael Georgiou (Cyprus University of Technology); Michael Panayiotou (Cyprus University of Technology); Lambros Odysseos (Cyprus University of Technology); Aristodemos Paphitis (Cyprus University of Technology); Michael Sirivianos (Cyprus University of Technology); Herodotos Herodotou (Cyprus University of Technology)*
BEER: Blocking for Effective Entity Resolution Sainyam Galhotra (University of Massachusetts Amherst)*; Donatella Firmani (Roma Tre University); Barna Saha (University of California, Berkeley); Divesh Srivastava (AT&T Labs Research)
Maximizing Persistent Memory Bandwidth Utilization for OLAP Workloads Björn Daase (Hasso Plattner Institute, University of Potsdam)*; Lars Jonas Bollmeier (Hasso Plattner Institute, University of Potsdam); Lawrence Benson (Hasso Plattner Institute, University of Potsdam); Tilmann Rabl (HPI, University of Potsdam)
Worst-Case Optimal Graph Joins in Almost No Space Diego Arroyuelo (UTFSM, Chile); Aidan Hogan (University of Chile); Gonzalo Navarro (University of Chile); Juan Reutter (PUC)*; Javiel Rojas (University of Chile); Adrian Soto Suárez (FIC, UAI Chile)
TreeToaster: Towards an IVM-Optimized Compiler Darshana Balakrishnan (State University of New York at Buffalo)*; Carl Nuessle (University of Buffalo, SUNY); Oliver A Kennedy (University at Buffalo, SUNY); Lukasz Ziarek (University at Buffalo, SUNY)
Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries Yuan Qiu (Hong Kong Univ. of Science and Technology); Yilei Wang (HKUST); Ke Yi (Hong Kong Univ. of Science and Technology)*; Feifei Li (Alibaba Group); Bin Wu (Alibaba); Chaoqun Zhan (Alibaba Inc.)
PGMJoins: Random Join Sampling with Graphical Models Ali Mohammadi Shanghooshabad (University of Warwick); Meghdad Kurmanji (University of Warwick); Qingzhi Ma (University of Warwick); Michael Shekelyan (University of Warwick); Mehrdad Almasi (University of Warwick); Peter Triantafillou (University of Warwick)*
Small Selectivities Matter: Lifting the Burden of Empty Samples Axel Hertzschuch (Technische Universität Dresden)*; Guido Moerkotte (University of Mannheim); Wolfgang Lehner (TU Dresden); Norman May (SAP SE); Florian Wolf (SAP SE); Lars Fricke (SAP SE)
Good to the last bit: Data-Driven Encoding with CodecDB Hao Jiang (University of Chicago)*; Chunwei Liu (University of Chicago); John Paparrizos (University of Chicago); Andrew A Chien (University of Chicago); Jihong Ma (Alibaba Group); Aaron J Elmore (University of Chicago)
SQL Ledger: Cryptographically Verifiable Data in Azure SQL Database Panagiotis Antonopoulos (Microsoft)*; Raghav Kaushik (Microsoft); Hanuma Kodavalla (Microsoft); Sergio Rosales Aceves (Microsoft); Reilly Wong (Microsoft); Jason Anderson (Microsoft); Jakub Szymaszek (Microsoft)
When the Recursive Diversity Anonymity Meets the Ring Signature Wangze Ni (Hong Kong University of Science and Technology); Peng CHENG (East China Normal University)*; Lei Chen (Hong Kong University of Science and Technology); Xuemin Lin (University of New South Wales)
SRA: Smart Recovery Advisor for Cyber Attacks Ka-Ho Chow (Georgia Institute of Technology)*; Umesh Deshpande (IBM Research - Almaden); Sangeetha Seshadri (IBM Research - Almaden); Ling Liu (Georgia Institute of Technology)
A System for Automated Open-Source Threat Intelligence Gathering and Management Peng Gao (University of California, Berkeley)*; Xiaoyuan Liu (University of California, Berkeley); Edward Choi (University of California, Berkeley); Bhavna Soman (Microsoft); Chinmaya Mishra (Microsoft); Kate Farris (Microsoft); Dawn Song (UC Berkeley)
Presenters: Guoliang Li (Tsinghua University, China); Xuanhe Zhou (Tsinghua University, China); Lei Cao (MIT, USA) Abstract: Databases and artificial intelligence (AI) can benefit from each other. On one hand, AI can make databases more intelligent (AI4DB). For example, traditional empirical database optimization techniques (e.g., cost estimation, join order selection, knob tuning, index and view advisors) cannot meet the high-performance requirements of large-scale database instances, various applications, and diversified users, especially on the cloud. Fortunately, learning-based techniques can alleviate this problem. On the other hand, database techniques can optimize AI models (DB4AI). For example, AI is hard to deploy because it requires developers to write complex code and train complicated models. Database techniques can be used to reduce the complexity of using AI models, accelerate AI algorithms, and provide AI capability inside databases. DB4AI and AI4DB have been extensively studied recently. In this tutorial, we review existing studies on AI4DB and DB4AI. For AI4DB, we review the techniques on learning-based database configuration, optimization, design, monitoring, and security. For DB4AI, we review AI-oriented declarative languages, data governance, training acceleration, and inference acceleration. Finally, we provide research challenges and future directions in AI4DB and DB4AI.
Speaker: Kenneth A. Ross (Columbia University) Abstract: Modern information-intensive systems, including data management systems, operate on data that is mostly resident in RAM. As a result, the data management community has shifted focus from I/O optimization to addressing performance issues higher in the memory hierarchy.
In this keynote, I will give a personal perspective of these developments, illustrated by work from my group at Columbia University. I will use the concept of abstraction as a lens through which various kinds of optimizations for modern hardware platforms can be understood and evaluated. Through this lens, some “cute implementation tricks” can be seen as much more than mere implementation details.
I will discuss abstractions at various granularities, from single lines of code to whole programming/query languages. I will touch on software and hardware design for data-intensive computations. I will also discuss data processing in a conventional programming language, and how the data management community might contribute to the design of compilers.
Keynote 2: The New DBfication of ML/AI Arun Kumar (UCSD)
Slot 1: Benchmarking and In-Database Inference
Towards Demystifying Serverless Machine Learning Training Jiawei Jiang (ETH Zurich)*; Shaoduo Gan (ETH Zurich); Yue Liu (ETH Zurich); Fanlin Wang (ETHZ); Gustavo Alonso (ETHZ); Ana Klimovic (ETH Zurich); Ankit Singla (ETH Zurich); Wentao Wu (Microsoft Research); Ce Zhang (ETH)
Towards Benchmarking Feature Type Inference for AutoML Platforms Vraj Shah (University of California, San Diego)*; Jonathan Lacanlale (California State University, Northridge); Premanand Kumar (University of California, San Diego); Kevin Yang (University of California, San Diego); Arun Kumar (University of California, San Diego)
Transforming ML Predictive Pipelines into SQL with MASQ Francesco Del Buono (University of Modena e Reggio Emilia); Matteo Paganelli (Università di Modena e Reggio Emilia); Paolo Sottovia (Huawei); Matteo Interlandi (Microsoft); Francesco Guerra (University of Modena e Reggio Emilia)*
Vertex-Centric Visual Programming for Graph Neural Networks Yidi Wu (The Chinese University of Hong Kong)*; Yuntao Gui (The Chinese University of Hong Kong); Tatiana Jin (CUHK); James Cheng (CUHK); Xiao Yan (Southern University of Science and Technology); Peiqi Yin (Southern University of Science and Technology); Yufei Cai (Southern University of Science and Technology); Bo Tang (Southern University of Science and Technology); Fan Yu (Huawei Technologies Co. Ltd)
Building Fast and Compact Sketches for Approximately Multi-Set Multi-Membership Querying Rundong Li (Xi'an Jiaotong University)*; Pinghui Wang (Xi'an Jiaotong University); Jiongli Zhu (Xi'an Jiaotong University); Junzhou Zhao (Xi'an Jiaotong University); Jia Di (Xi'an Jiaotong University); Xiaofei Yang (Xi'an Jiaotong University); Kai Ye (Xi'an Jiaotong University)
Adaptive Compression for Fast Scans on String Columns Yannis E Foufoulas (University of Athens)*; Lefteris Sidirourgos (National and Kapodistrian University of Athens); Eleftherios Stamatogiannakis (University of Athens); Yannis Ioannidis (University of Athens)
Correlation Sketches for Approximate Join-Correlation Queries Aécio Santos (New York University)*; Aline Bessa (New York University); Fernando Chirigati (Springer Nature); Christopher Musco (New York University); Juliana Freire (New York University)
At-the-time and Back-in-time Persistent Sketches Benwei Shi (University of Utah)*; Zhuoyue Zhao (University of Utah); Yanqing Peng (University of Utah); Feifei Li (University of Utah); Jeff Phillips (University of Utah)
Bidirectionally Densifying LSH Sketches with Empty Bins Peng Jia (Xi'an Jiaotong University)*; Pinghui Wang (Xi'an Jiaotong University); Junzhou Zhao (Xi'an Jiaotong University); Shuo Zhang (Xi'an Jiaotong University); Yiyan Qi (Xi'an Jiaotong University); Min Hu (China Mobile Research Institute); Chao Deng (China Mobile Research Institute); Xiaohong Guan (Xi'an Jiaotong University)
A Learned Sketch for Subgraph Counting Kangfei Zhao (The Chinese University of Hong Kong)*; Jeffrey Xu Yu (Chinese University of Hong Kong); Hao Zhang (Chinese University of Hong Kong); Qiyan Li (Wuhan University); Yu Rong (Tencent AI Lab)
EIRES: Efficient Integration of Remote Data in Event Stream Processing Bo Zhao (Humboldt University of Berlin)*; Han van der Aa (Universität Mannheim); Thanh Tam Nguyen (Leibniz Universität Hannover); Quoc Viet Hung Nguyen (Griffith University); Matthias Weidlich (Humboldt-Universität zu Berlin)
Parallelizing Intra-Window Join on Multicores: An Experimental Study Shuhao Zhang (Singapore University of Technology and Design)*; Yancan Mao (National University of Singapore); Jiong He (A*Star); Philipp Marian Grulich (Technische Universität Berlin); Steffen Zeuch (Humboldt Universität zu Berlin); Bingsheng He (National University of Singapore); Richard T.B. Ma (National University of Singapore); Volker Markl (Technische Universität Berlin)
BurstSketch: Finding Bursts in Data Streams Zheng Zhong (Peking University)*; Shen Yan (Peking University); Zikun Li (Peking University); Decheng Tan (Peking University); Tong Yang (Peking University); Bin Cui (Peking University)
Presenters: Xi He (University of Waterloo); Jennie Rogers (Northwestern University); Johes Bater (Duke University); Ashwin Machanavajjhala (Duke University); Chenghong Wang (Duke University); Xiao Wang (Northwestern University) Abstract: Computing technology has enabled massive digital traces of our personal lives to be collected and stored. These datasets play an important role in numerous real-life applications and research analysis, such as contact tracing for COVID-19, but they contain sensitive information about individuals. When managing these datasets, privacy is usually addressed as an afterthought, engineered on top of a database system optimized for performance and usability. This has led to a plethora of unexpected privacy attacks in the news. Specialized privacy-preserving solutions usually require a group of privacy experts and they are not directly transferable to other domains. There is an urgent need for a general trustworthy database system that offers end-to-end security and privacy guarantees. In this tutorial, we will first describe the security and privacy requirements for database systems in different settings and cover the state-of-the-art tools that achieve these requirements. We will also show challenges in integrating these techniques together and demonstrate the design principles and optimization opportunities for these security- and privacy-aware database systems. This is designed to be a three-hour tutorial.
Speaker: Barna Saha (University of California, Berkeley) Abstract: One of the greatest successes of computational complexity theory is the classification of countless fundamental computational problems into polynomial-time and NP-hard ones, two classes that are often referred to as tractable and intractable, respectively. However, this crude distinction of algorithmic efficiency is clearly insufficient when handling today’s large scale of data. We need a finer-grained design and analysis of algorithms that pinpoints the exact exponent of polynomial running time, and a better understanding of when a speed-up is not possible. Based on stronger complexity assumptions than P vs NP, like the Strong Exponential Time Hypothesis, conditional lower bounds for a variety of fundamental problems in P have recently been proposed. Unfortunately, these conditional lower bounds often break down when one may settle for a near-optimal solution. Indeed, approximation algorithms can play a significant role when designing fast algorithms not just for traditional NP-hard problems, but also for polynomial-time problems.
For some applications arising in machine learning, the time complexity of the underlying algorithms is not sufficient to ensure a fast solution. It is often necessary to collect side information about the data to ensure high accuracy. This requires low query complexity.
In this presentation, we will cover new facets of fast algorithm design for large-scale data analysis that emphasize the role of developing approximation algorithms for better polynomial time/query complexity.
Dendrite: Bolt-on Adaptivity for Data Systems Authors Online Brad Glasbergen (University of Waterloo)*; Fangyu Wu (University of Waterloo); Khuzaima Daudjee (University of Waterloo)
Crosstown Foundry: A Scalable Data-driven Journalism Platform for Hyper-local News Authors Online Luciano Nocera (University of Southern California)*; Giorgos Constantinou (University of Southern California); Luan V Tran (University of Southern California); Seon Ho Kim (University of Southern California); Gabriel Kahn (University of Southern California); Cyrus Shahabi (Computer Science Department, University of Southern California)
Vertex-Centric Visual Programming for Graph Neural Networks Yidi Wu (The Chinese University of Hong Kong)*; Yuntao Gui (The Chinese University of Hong Kong); Tatiana Jin (CUHK); James Cheng (CUHK); Xiao Yan (Southern University of Science and Technology); Peiqi Yin (Southern University of Science and Technology); Yufei Cai (Southern University of Science and Technology); Bo Tang (Southern University of Science and Technology); Fan Yu (Huawei Technologies Co. Ltd)
Transforming ML Predictive Pipelines into SQL with MASQ Authors Online Francesco Del Buono (University of Modena e Reggio Emilia); Matteo Paganelli (Università di Modena e Reggio Emilia); Paolo Sottovia (Huawei); Matteo Interlandi (Microsoft); Francesco Guerra (University of Modena e Reggio Emilia)*
QuTE: Answering Quantity Queries from Web Tables Vinh Thinh Ho (Max Planck Institute for Informatics)*; Koninika Pal (Max Planck Institute for Informatics); Gerhard Weikum (Max-Planck-Institut für Informatik)
RawVis: A System for Efficient In-situ Visual Analytics Authors Online Stavros Maroulis (Research Center ATHENA)*; Nikos Bikakis (Athena); George Papastefanatos (ATHENA Research Center); Panos Vassiliadis (University of Ioannina); Yannis Vassiliou (NTUA)
SRA: Smart Recovery Advisor for Cyber Attacks Authors Online Ka-Ho Chow (Georgia Institute of Technology)*; Umesh Deshpande (IBM Research - Almaden); Sangeetha Seshadri (IBM Research - Almaden); Ling Liu (Georgia Institute of Technology)
TardisDB: Extending SQL to Support Versioning Authors Online Maximilian E Schüle (Technical University of Munich)*; Josef Schmeißer (Technical University of Munich); Thomas Blum (TUM); Alfons Kemper (TUM); Thomas Neumann (TUM)
A System for Automated Open-Source Threat Intelligence Gathering and Management Peng Gao (University of California, Berkeley)*; Xiaoyuan Liu (University of California, Berkeley); Edward Choi (University of California, Berkeley); Bhavna Soman (Microsoft); Chinmaya Mishra (Microsoft); Kate Farris (Microsoft); Dawn Song (UC Berkeley)
A Byzantine Fault Tolerant Storage for Permissioned Blockchain Xiaodong Qi (East China Normal University)*; Zhihao Chen (East China Normal University); Zhao Zhang (East China Normal University); Cheqing Jin (East China Normal University); Aoying Zhou (East China Normal University); Haizhen Zhuo (Ant Group); Quangqing Xu (Ant Group)
Attaining Workload Scalability and Strong Consistency for Replicated Databases with Hihooi Authors Online Michael Georgiou (Cyprus University of Technology); Michael Panayiotou (Cyprus University of Technology); Lambros Odysseos (Cyprus University of Technology); Aristodemos Paphitis (Cyprus University of Technology); Michael Sirivianos (Cyprus University of Technology); Herodotos Herodotou (Cyprus University of Technology)*
DPGraph: A Benchmark Platform for Differentially Private Graph Analysis Authors Online Siyuan Xia (University of Waterloo); Beizhen Chang (University of Waterloo); Karl Knopf (University of Waterloo); Yihan He (New York University); Yuchao Tao (Duke University); Xi He (University of Waterloo)*
BEER: Blocking for Effective Entity Resolution Authors Online Sainyam Galhotra (University of Massachusetts Amherst)*; Donatella Firmani (Roma Tre University); Barna Saha (University of California, Berkeley); Divesh Srivastava (AT&T Labs Research)
Maximizing Persistent Memory Bandwidth Utilization for OLAP Workloads Björn Daase (Hasso Plattner Institute, University of Potsdam)*; Lars Jonas Bollmeier (Hasso Plattner Institute, University of Potsdam); Lawrence Benson (Hasso Plattner Institute, University of Potsdam); Tilmann Rabl (HPI, University of Potsdam)
Worst-Case Optimal Graph Joins in Almost No Space Diego Arroyuelo (UTFSM, Chile); Aidan Hogan (University of Chile); Gonzalo Navarro (University of Chile); Juan Reutter (PUC)*; Javiel Rojas (University of Chile); Adrian Soto Suárez (FIC, UAI Chile)
TreeToaster: Towards an IVM-Optimized Compiler Darshana Balakrishnan (State University of New York at Buffalo)*; Carl Nuessle (University of Buffalo, SUNY); Oliver A Kennedy (University at Buffalo, SUNY); Lukasz Ziarek (University at Buffalo, SUNY)
Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries Yuan Qiu (Hong Kong Univ. of Science and Technology); Yilei Wang (HKUST); Ke Yi (Hong Kong Univ. of Science and Technology)*; Feifei Li (Alibaba Group); Bin Wu (Alibaba); Chaoqun Zhan (Alibaba Inc.)
PGMJoins: Random Join Sampling with Graphical Models Ali Mohammadi Shanghooshabad (University of Warwick); Meghdad Kurmanji (University of Warwick); Qingzhi Ma (University of Warwick); Michael Shekelyan (University of Warwick); Mehrdad Almasi (University of Warwick); Peter Triantafillou (University of Warwick)*
Small Selectivities Matter: Lifting the Burden of Empty Samples Axel Hertzschuch (Technische Universität Dresden)*; Guido Moerkotte (University of Mannheim); Wolfgang Lehner (TU Dresden); Norman May (SAP SE); Florian Wolf (SAP SE); Lars Fricke (SAP SE)
Good to the last bit: Data-Driven Encoding with CodecDB Hao Jiang (University of Chicago)*; Chunwei Liu (University of Chicago); John Paparrizos (University of Chicago); Andrew A Chien (University of Chicago); Jihong Ma (Alibaba Group); Aaron J Elmore (University of Chicago)
SQL Ledger: Cryptographically Verifiable Data in Azure SQL Database Panagiotis Antonopoulos (Microsoft)*; Raghav Kaushik (Microsoft); Hanuma Kodavalla (Microsoft); Sergio Rosales Aceves (Microsoft); Reilly Wong (Microsoft); Jason Anderson (Microsoft); Jakub Szymaszek (Microsoft)
When the Recursive Diversity Anonymity Meets the Ring Signature Wangze Ni (Hong Kong University of Science and Technology); Peng CHENG (East China Normal University)*; Lei Chen (Hong Kong University of Science and Technology); Xuemin Lin (University of New South Wales)
SRA: Smart Recovery Advisor for Cyber Attacks Ka-Ho Chow (Georgia Institute of Technology)*; Umesh Deshpande (IBM Research - Almaden); Sangeetha Seshadri (IBM Research - Almaden); Ling Liu (Georgia Institute of Technology)
A System for Automated Open-Source Threat Intelligence Gathering and Management Peng Gao (University of California, Berkeley)*; Xiaoyuan Liu (University of California, Berkeley); Edward Choi (University of California, Berkeley); Bhavna Soman (Microsoft); Chinmaya Mishra (Microsoft); Kate Farris (Microsoft); Dawn Song (UC Berkeley)
Presenters: Guoliang Li (Tsinghua University, China); Xuanhe Zhou (Tsinghua University, China); Lei Cao (MIT, USA) Abstract: Databases and artificial intelligence (AI) can benefit from each other. On one hand, AI can make databases more intelligent (AI4DB). For example, traditional empirical database optimization techniques (e.g., cost estimation, join order selection, knob tuning, index and view advisors) cannot meet the high-performance requirements of large-scale database instances, various applications, and diversified users, especially on the cloud. Fortunately, learning-based techniques can alleviate this problem. On the other hand, database techniques can optimize AI models (DB4AI). For example, AI is hard to deploy because it requires developers to write complex code and train complicated models. Database techniques can be used to reduce the complexity of using AI models, accelerate AI algorithms, and provide AI capability inside databases. DB4AI and AI4DB have been extensively studied recently. In this tutorial, we review existing studies on AI4DB and DB4AI. For AI4DB, we review the techniques on learning-based database configuration, optimization, design, monitoring, and security. For DB4AI, we review AI-oriented declarative languages, data governance, training acceleration, and inference acceleration. Finally, we provide research challenges and future directions in AI4DB and DB4AI.