Abstract: Database management systems (DBMSs) have traditionally relied on empirical approaches and handcrafted rules that encode human intuitions or heuristics to manage data storage and query processing. While these approaches perform well in common scenarios, they are rarely optimal for any actual application, since they are not tailored to the application's specific properties (e.g., user workload patterns) or to its complex interactions with the running environment (e.g., hardware and operating system). This has led to the development of machine learning-driven DBMS components (e.g., indexing, query optimization, query scheduling) that are more adaptable. However, a primary concern about using these learned components in practice is their robustness, particularly the fluctuation in query execution times as the underlying data and query distributions change. In this talk, I will present our recent efforts to address this robustness issue, using the learned query optimizer, a DBMS component that translates declarative user queries into efficient execution plans, as a case study. Specifically, I will talk about (1) how we can build a lifelong learned query optimizer that is robust against frequent dynamic changes in data and query workloads, and (2) how we verify, with formal guarantees, the decisions of the learned query optimizer against user-specified performance constraints.
Bio: Ibrahim Sabek is an Assistant Professor of Computer Science and Spatial Sciences at the University of Southern California. Before that, he was a Postdoctoral Associate at the MIT CSAIL Data Systems Group and an NSF/CRA Computing Innovation Fellow from 2020 to 2023. He completed his PhD in Computer Science at the University of Minnesota, Twin Cities, in January 2020. Ibrahim is interested in building the next generation of data management, processing, and analysis systems using machine learning and quantum computing. He also broadly researches data management for machine learning systems, scalable knowledge base construction, big spatial data management and analysis, video analytics, and causal analysis. Ibrahim was awarded the University-wide Best Doctoral Dissertation Honorable Mention Award for his PhD work, Best Paper Runner-Up at SIGSPATIAL 2018, Best Paper at GUIDE-AI@SIGMOD 2024, and 1st place in the SIGSPATIAL 2019 Student Research Competition, among others. For more info, check his website: http://viterbi-web.usc.edu/~sabek/.