The goal of this project is to provide architectural innovations and mechanisms for increasing the performance of GPGPU applications. The initial focus has been on improving tolerance to (or reducing) memory access latency and on studying the effect of instruction fetch and DRAM scheduling on the performance of GPGPU applications. To improve tolerance to memory access latency, we have proposed both software- and hardware-based prefetching mechanisms for GPUs with adaptive behavior. We are also working on optimized memory controllers to further improve memory latency tolerance.
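
To make the software side concrete, the sketch below shows one common software prefetching pattern inside a CUDA kernel: the global load for the next loop iteration is issued before the current iteration's computation, so the long memory latency overlaps with arithmetic. The kernel and variable names are hypothetical illustrations, not the project's actual mechanism.

// Minimal sketch of register-based software prefetching in a grid-stride loop.
__global__ void scale_prefetch(const float *in, float *out, int n, float alpha)
{
    int stride = blockDim.x * gridDim.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float cur = in[i];                        // demand load for this iteration
    for (; i < n; i += stride) {
        int next = i + stride;
        // Prefetch the next element into a register so its long-latency
        // global load overlaps with the computation below.
        float nxt = (next < n) ? in[next] : 0.0f;
        out[i] = alpha * cur;                 // compute on the current value
        cur = nxt;                            // rotate the prefetched value in
    }
}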

Students: Jaekyu Lee, Nagesh B Lakshminarayana, Dilan Manatunga
Sponsors: Intel, Sandia National Lab
The push toward heterogeneous architectures to increase performance while reducing energy consumption creates considerable challenges for software development. For example, programmers must make non-trivial decisions about when to use specialized accelerators versus powerful CPU cores, and they must become steeped in complex architectural details to tune effectively. The goal of this research project is to alleviate these challenges with a novel framework that enables a wide range of computations to be expressed at a high level and subsequently tuned automatically for the underlying heterogeneous platform.

Qameleon is a new programming environment that cooperatively tunes the program and the hardware configuration automatically and continuously using statistical machine learning techniques. Qameleon has two major components: Qilin+ (pronounced ``chill-in plus'') and an autotuner module. Qilin+ is a dynamic compilation system that generates code for different architectures at run-time. Programmers write their code in a common programming environment, and Qilin+ distributes the work among the system's processors (CPUs and GPUs). Qilin+ also includes a profiling infrastructure for monitoring execution and collecting data for later feedback into the autotuner.
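
To illustrate the idea of run-time work distribution, the sketch below splits a single SAXPY between the CPU and the GPU according to a ratio that a profiler could refresh from measured throughput on each device. The function names and the splitting rule are illustrative assumptions, not Qilin+'s actual interface.

#include <cuda_runtime.h>

__global__ void saxpy_gpu(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

static void saxpy_cpu(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

// Split one array operation between GPU and CPU; gpu_share would be updated
// by profiling (e.g., measured elements/second on each device).
void saxpy_split(int n, float a, const float *x, float *y,
                 float *d_x, float *d_y, double gpu_share)
{
    int n_gpu = static_cast<int>(n * gpu_share);
    int n_cpu = n - n_gpu;

    if (n_gpu > 0) {
        cudaMemcpy(d_x, x, n_gpu * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(d_y, y, n_gpu * sizeof(float), cudaMemcpyHostToDevice);
        saxpy_gpu<<<(n_gpu + 255) / 256, 256>>>(n_gpu, a, d_x, d_y);
    }
    if (n_cpu > 0)
        saxpy_cpu(n_cpu, a, x + n_gpu, y + n_gpu);  // CPU chunk overlaps the
                                                    // asynchronous GPU launch
    if (n_gpu > 0) {
        cudaDeviceSynchronize();
        cudaMemcpy(y, d_y, n_gpu * sizeof(float), cudaMemcpyDeviceToHost);
    }
}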

Students: Sunpyo Hong, Puyan Lotfi, Aniruddha Dasgupta
Collaborators: Richard Vuduc (GT), Chi-Keung Luk (Intel)
Sponsors: Intel, NSF, SRC
Prospector is a software system that helps with parallel programming. It identifies which code regions to parallelize and suggests how to parallelize them. It also predicts the performance benefit of parallelization.
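
The example below illustrates the kind of distinction such a tool draws and the kind of benefit prediction it could report; the loops and the Amdahl's-law-style estimate are a hedged sketch, not Prospector's actual analysis or output.

#include <cstdio>

int main()
{
    const int N = 1 << 20;
    static float a[1 << 20], b[1 << 20];

    // Loop 1: iterations are independent, so it is a candidate for
    // parallelization.
    for (int i = 0; i < N; ++i)
        b[i] = 2.0f * a[i];

    // Loop 2: loop-carried dependence on b[i-1], so it is not directly
    // parallelizable.
    for (int i = 1; i < N; ++i)
        b[i] = b[i - 1] + a[i];

    // Amdahl's-law-style estimate of the benefit of parallelizing Loop 1,
    // assuming it accounts for fraction p of the run time on c cores.
    double p = 0.5, c = 8.0;
    double speedup = 1.0 / ((1.0 - p) + p / c);
    std::printf("predicted speedup: %.2fx\n", speedup);
    return 0;
}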

Students: Minjang Kim, Puyan Lotfi
Collaborators: Chi-Keung Luk (Intel)
Sponsors: Intel

NVIDIA CUDA is one of the most popular and effective tools for general-purpose programming on GPUs today. Although the results obtained with CUDA can be very encouraging, it takes a significant amount of tuning and experimentation on the part of the programmer to achieve them. The optimization principles for CUDA are many, but they usually involve substantial tradeoffs. In this project we aim to provide programmers with tools that predict the performance impact of an optimization and give insight into the bottlenecks involved. We are creating an enhanced analytical GPU model that provides metrics pointing to key performance factors.
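
As a toy illustration of the kind of reasoning an analytical GPU model performs, the sketch below estimates whether a kernel is compute- or memory-bound on one SM and how well multithreading hides memory latency. Every parameter name, the numbers, and the formula are illustrative assumptions, not the project's model.

#include <algorithm>
#include <cstdio>

int main()
{
    double warps        = 24.0;    // warps resident on one SM
    double comp_cycles  = 2000.0;  // arithmetic cycles per warp
    double mem_requests = 40.0;    // global memory requests per warp
    double mem_latency  = 400.0;   // cycles of latency per request
    double max_overlap  = 8.0;     // requests the memory system can overlap

    // Total compute work and total exposed memory latency per SM, where
    // overlapping requests from different warps divide the memory time.
    double compute_time = warps * comp_cycles;
    double memory_time  = warps * mem_requests * mem_latency
                          / std::min(warps, max_overlap);

    double est_cycles = std::max(compute_time, memory_time);
    std::printf("estimated cycles per SM: %.0f (%s-bound)\n",
                est_cycles, compute_time > memory_time ? "compute" : "memory");
    return 0;
}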

Students: Sunpyo Hong
Collaborators: Prof. Richard Vuduc
Sponsors: NSF