Abstract: Multi-GPU systems have gained significant popularity in modern computing. While employing multiple GPUs intuitively offers aggregated memory capacity and combined computational parallelism, the delivered performance rarely keeps up with the increase in GPU count. Scalability is severely limited by several factors, such as inefficient address translation, non-uniform memory accesses, and inter-GPU communication overheads. Consequently, critical questions remain unaddressed: how to design multi-GPU computing architectures, and how to harness multi-GPU advantages in emerging applications? In this talk, I will share my research on maximizing the potential of multi-GPU computing. First, I will discuss our work on short-circuiting page table walks to mitigate the address translation wall. Next, I will introduce our lightweight invalidation approach to reduce page migration overhead. Finally, I will present our work on efficient multi-tenancy in multi-instance GPUs through shared-aware sub-entry TLBs. Looking ahead, I will outline my vision for next-generation computing systems, including harnessing GPU advantages for LLM inference and efficient GPU virtualization.
Bio: Bingyao Li joined UC Riverside as an assistant professor in Computer Science and Engineering in July 2025. She completed her Ph.D. in the Computer Science Department at the University of Pittsburgh. Her research lies broadly in advanced computer architecture, high-performance computing, and emerging parallel applications, with a focus on the GPU ecosystem from architecture to applications.