The goal of this project is to provide architectural innovations and mechanisms for increasing the performance of GPGPU applications. The initial focus has been on reducing memory access latency, or improving tolerance to it, and on studying the effect of instruction fetch and DRAM scheduling on the performance of GPGPU applications. To improve tolerance to memory access latency, we have proposed adaptive software-based and hardware-based prefetching mechanisms for GPUs. We are also working on optimized memory controllers to further improve tolerance to memory access latency.
: Hyojong Kim, Dilan Manatunga
: Intel, Sandia National Lab
Recognition ability and, more broadly, machine learning techniques enable robots and IoT devices to perform complex tasks and to function in diverse situations. These devices have ready access to an abundance of sensor data recorded in real time, such as speech, images, and video. Since such data are time sensitive, processing them in real time is a necessity. Furthermore, the recorded data are usually highly sensitive and require careful privacy constraints. At the same time, machine learning techniques are known to be computationally intensive and resource hungry. As a result, an individual robot or IoT device, constrained in both computation power and energy supply, is often unable to handle such heavy real-time computations alone. To overcome this obstacle, in this project we are working on techniques and frameworks that harvest the aggregated computational power of several devices to perform real-time perception.
: Ramyad Hadidi, Jiashen Cao
: Intel, NSF
Three-dimensional (3D) stacking technology, which enables the integration of DRAM and logic dies, offers high bandwidth and low energy consumption. This technology also enables new memory designs that execute tasks not traditionally associated with memories. One practical 3D-stacked memory is the Hybrid Memory Cube (HMC), which provides significant access bandwidth and low power consumption in a small area. In this project, we characterize and analyze the behavior of 3D-stacked memories such as HMC, and provide novel techniques and methods to execute current applications on these devices more efficiently.
: Hyojong Kim, Ramyad Hadidi