SIGMOD 2021: Panel Discussions
Automation of Data Prep, ML, and Data Science: New Cure or Snake Oil?
Organizer
Arun Kumar (University of California, San Diego)
Abstract
As machine learning (ML), artificial intelligence (AI), and Data Science grow in practical importance, a large part of the ML/AI software industry claims to have built tools and platforms to automate the entire workflow of ML. That includes vexing problems of data preparation (prep), studied intensively by the database (DB) community for decades, with basically no resolution so far. Such claims by the ML/AI industry face a stunning lack of scientific scrutiny from the DB and ML research worlds, largely due to the lack of meaningful, large, and objective benchmarks. As such tools rapidly gain adoption among enterprises and other customers, this panel will debate whether the new ML/AI industry is basically selling "snake oil" to such users, how to evolve away from the status quo by instituting meaningful new benchmarks, creating new partnerships between industry and academia for this, and other pressing questions in this important arena. We aim to spur vigorous conversations that will hopefully lead to genuine new cures for an age-old affliction in Data Science.
Confirmed Panelists
- Felix Naumann: He works on prep for file ingestion and has recently surveyed commercial data prep tools. Perspectives from his research and community organization; as PC co-chair for VLDB'21, along with Luna, he introduced the Benchmarks dimension to VLDB’s scope and started the Scalable Data Science Research category.
- Ihab Ilyas: He works on logic-based methods and ML methods for data prep and has product experience in enterprise data software. Perspectives from his research and his companies' customers.
- Joseph Hellerstein: He works on human-in-the-loop and program synthesis methods for data prep, has worked on platforms for enterprise ML (Apache MADlib), and has product experience in enterprise data software. Perspectives from his research and his company's customers.
- Sarah Catanzaro: She invests in and advises high-potential startups in machine intelligence, data management, and distributed systems. She has also defined data strategy and led data science teams at startups and in the defense/intelligence sector. Perspectives bridging the worlds of research and industry, including through investments in data/AI software startups and interactions with their customers.
- Xin Luna Dong: She works on ML/deep learning methods for data prep and has product experience in knowledge extraction, integration, cleaning, and mining across both Google and Amazon. Perspectives from her research and from major Web companies; as PC co-chair for VLDB'21, along with Felix, she introduced the Benchmarks dimension to VLDB’s scope and started the Scalable Data Science Research category.
Data Management to Social Science and Back in the Future of Work
Organizers
Sihem Amer-Yahia (CNRS, Univ of Grenoble Alpes)
Senjuti Basu Roy (New Jersey Institute of Technology)
Abstract
How will we work, live, and thrive in the post-pandemic future? The rapid mushrooming of online job markets has been transforming the definition of work and workplaces. After the pandemic, as we "cope with the new normal", the future world of work may change forever and become predominantly virtual. This puts an unprecedented pool of talent at our beck and call to work on "gigs" that disband when the job is over; it is also a time of destabilization and of change in the nature of job security. As scientists, we have a big responsibility and a tremendous opportunity in shaping the Future of Work (FoW) post-pandemic, by designing effective platforms that support productive employment, mitigate social costs, and provide an effective and safe learning environment.
A research agenda for FoW must mobilize the participation of various scientific, regulatory and miscellaneous stakeholders [10]. We will ask the questions: what is the role of Data Management (DM) in shaping research on FoW? Is now a ripe time to get Economics, Labor Theory, Psychology of Work and AI to help put DM research and technology at the center of research on FoW? Are we at all interested? The panelists will debate two complementary views: a pessimistic view, in which FoW will tend to see humans as machines, robots, or low-level agents and use them in the service of broader AI goals, vs. a more optimistic view, in which AI and Social Science will help DM develop technologies that empower humans in the future workforce and workplaces.
Confirmed Panelists
- Lei Chen is a Chair Professor and the Director of HKUST Big Data Institute. Lei is a world expert in crowdsourcing-based data processing.
- Krishna Gummadi is a scientific director at the Max Planck Institute for Software Systems (MPI-SWS) in Germany. Krishna's research interests are in the measurement, analysis, design, and evaluation of complex Internet-scale systems. His work focuses on enhancing fairness and transparency of data-driven decision making in social computing systems.
- Saiph Savage is Director of the HCI Lab at WVU. She studies AI systems to help workers develop their digital skills to access better jobs and fight disinformation. Saiph was named one of the 35 Innovators under 35 by the MIT Technology Review for her civic tech research.
- Jaime Teevan is Chief Scientist for Microsoft's Experiences and Devices, and this past year coordinated the company’s research efforts to understand how people’s work practices have changed since the start of the pandemic: http://aka.ms/newfutureofwork.
- Koichiro Yoshida is the CEO of CrowdWorks (the largest crowdsourcing company in Japan, which went public in 2014). He will bring his expertise as a practitioner and a proponent of workers' rights in online marketplaces.