Time: December 5, 2024, 14:00
Venue: B7-303
Host: Chen Jian (陈健)
Speaker: Xiaohui Yu
Title: Optimizing Data Acquisition for Machine Learning
Abstract:
High-quality training data is essential for improving machine learning (ML) model performance, but acquiring such data effectively remains a challenging task. This talk explores advanced strategies for optimizing data acquisition to enhance both model accuracy and confidence. To improve accuracy, we introduce two innovative approaches: Estimation and Allocation (EA), which balances exploration and exploitation by estimating data utility, and Sequential Predicate Selection (SPS), which adaptively focuses on data regions that are most promising for improving model outcomes. For improving model confidence, we propose Bulk Acquisition (BA) and Sequential Acquisition (SA) methods, supported by efficient approximations such as kNN-BA and kNN-SA that limit acquisitions to promising subsets. Additionally, a Distribution-based Acquisition framework is presented to generalize these techniques across diverse datasets and settings. Extensive experiments across various ML models and data pools demonstrate the effectiveness of these methods in practical applications, highlighting their ability to address real-world constraints while achieving significant performance gains.
Introduction:
Xiaohui Yu is a Professor and the Graduate Program Director in the School of Information Technology, York University, Canada. He obtained his PhD degree from the University of Toronto. His research interests lie in the broad area of data science, with a particular focus on the intersection of data management and machine learning (ML). The results of his research have been published in top data science journals and conferences, such as SIGMOD, VLDB, ICDE, and TKDE. He regularly serves on the program committees of leading conferences and is an Associate/Area Editor for the IEEE Transactions on Knowledge and Data Engineering (TKDE), the ACM Transactions on Knowledge Discovery in Data (TKDD), and Information Systems. He is a General Co-Chair for the KDD 2025 conference. He has collaborated regularly with industry partners, and some research results have been incorporated into large-scale production systems.
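Professor Yu will be recruiting PhD students from among graduating undergraduate and master's students. Interested students are welcome to attend and inquire.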