Abstract: Machine learning models often operate in settings such as financial risk assessment, medical diagnostics, and predictive policing, where training data is incomplete, noisy, or biased, which leads to unreliable and inconsistent predictions. In many cases, simply cleaning the data is not feasible because the ground truth is inherently unrecoverable. Moreover, heuristic preprocessing can produce models that appear accurate and fair during training but become inaccurate and unfair at inference time. Addressing these challenges requires tools that account for data uncertainty by propagating its impact through the entire decision-making pipeline. Such tools trace how errors, missing values, or biases in the data influence model predictions and decisions, which makes robust and reliable predictions possible.

In this talk, we introduce a novel approach rooted in possible worlds semantics from database theory for reasoning systematically about data uncertainty. The framework learns a set of possible models from possible variations of the data, such as imputations for missing values, corrections for biases, or alternative scenarios for noisy inputs. In doing so, it propagates uncertainty from the training data to the model predictions, ensuring both robustness and interpretability. These methods offer a principled way to address uncertainty and provide actionable, reliable insights in complex and dynamic machine learning environments.
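The abstract does not specify the model class or how the set of possible models is represented. As a hypothetical illustration only, the brute-force sketch below encodes a few assumptions. Missing feature values are marked `None`. Each one has a small, assumed list of candidate imputations. Every combination of choices is one possible world. One least-squares line is fitted per world, and the spread of test predictions across worlds shows the propagated uncertainty. The names `possible_worlds`, `fit_line`, `prediction_range`, and `is_certain` are invented for this example. Exhaustive enumeration grows exponentially with the number of missing cells, so this is not the method presented in the talk.

```python
from itertools import product

# Hypothetical toy setup: each row is (x, y); x may be None (missing).
# `candidates` maps a row index to the assumed plausible values for its missing x.

def possible_worlds(rows, candidates):
    """Yield every complete dataset obtained by imputing each missing x with one candidate."""
    missing = [i for i, (x, _) in enumerate(rows) if x is None]
    for choice in product(*(candidates[i] for i in missing)):
        world = list(rows)
        for i, v in zip(missing, choice):
            world[i] = (v, rows[i][1])
        yield world

def fit_line(world):
    """Ordinary least squares for y = a*x + b on one possible world (assumes x values are not all equal)."""
    n = len(world)
    mx = sum(x for x, _ in world) / n
    my = sum(y for _, y in world) / n
    sxx = sum((x - mx) ** 2 for x, _ in world)
    sxy = sum((x - mx) * (y - my) for x, y in world)
    a = sxy / sxx
    return a, my - a * mx

def prediction_range(rows, candidates, x_test):
    """Propagate data uncertainty: min and max prediction over all possible models."""
    preds = [a * x_test + b for a, b in map(fit_line, possible_worlds(rows, candidates))]
    return min(preds), max(preds)

def is_certain(rows, candidates, x_test, threshold):
    """A thresholded decision is 'certain' if every possible model makes the same decision."""
    lo, hi = prediction_range(rows, candidates, x_test)
    return (lo >= threshold) == (hi >= threshold)

if __name__ == "__main__":
    rows = [(0.0, 0.0), (1.0, 1.0), (None, 2.0)]
    candidates = {2: [2.0, 4.0]}  # two plausible imputations -> two possible worlds
    print(prediction_range(rows, candidates, 5.0))
    print(is_certain(rows, candidates, 5.0, threshold=2.0))
```

In this toy example, the two imputations yield predictions at `x = 5` of about 2.54 and exactly 5.0. A decision threshold of 2.0 is therefore robust to the missing value, and a threshold of 3.0 is not.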
Bio: Babak Salimi is an Assistant Professor at the Halıcıoğlu Data Science Institute (HDSI) at UC San Diego. His research bridges data management and machine learning, with a focus on responsible data management and trustworthy data analysis. His work emphasizes transparency, fairness, reliability, and robustness in algorithmic decision-making, and he develops tools that help decision-makers make informed, confident choices. His contributions have been recognized with awards including the Postdoc Research Award at the University of Washington, the Best Demonstration Paper Award at VLDB 2018, the Best Paper Award at SIGMOD 2019, the Research Highlight Award at SIGMOD 2020, and the NSF CAREER Award in 2024.