Daniel Wong --- Graphics processing units (GPUs) are the main provider of computational power for many emerging workloads, such as data mining, machine learning, cryptocurrency mining, and high-performance computing. The mainstream adoption of GPUs over the past decade has mainly been driven by their ability to provide order-of-magnitude improvements in energy efficiency and throughput over traditional multi-core processors. However, GPUs face many challenges as their adoption scales in data center environments. GPUs consume more raw power than traditional multi-core processors, making energy efficiency a first-order design constraint. Furthermore, as applications scale to take advantage of multiple GPUs, existing multi-GPU frameworks are limited in functionality. In this talk, I will highlight performance and power challenges for GPUs in data center environments. Specifically, I will identify limitations of existing GPU dynamic power management policies and of topology awareness in multi-GPU server systems. We found that GPUs are designed for maximum performance, with little consideration for energy efficiency. To that end, we identify opportunities to save power in GPUs through frequency scaling and thread-block scaling. Furthermore, we found that existing multi-GPU frameworks do not effectively utilize the underlying GPU interconnects. To take full advantage of built-in high-performance interconnects, we introduce new multi-GPU programming frameworks and GPU-assisted routing.