GPU Improvements for Reverse Automatic Differentiation

Researchers at Purdue University have developed two novel methods for reducing the GPU memory requirements of reverse automatic differentiation. Neural networks are growing both deeper (more layers) and wider (more nodes), but GPU memory is limited, which in turn limits how long differentiable programs can run. Even as GPU capacity increases, some workloads still require multiple GPUs. The methods developed by the Purdue researchers make it easier to run extremely deep neural networks and extremely long-running differentiable programs. The first method, termed divide-and-conquer checkpointing, reduces the memory needed to store intermediate values and results. This is particularly useful for reverse automatic differentiation, which must save intermediate results of the forward sweep in order to perform the reverse sweep. The second method, termed tensor streaming, performs just-in-time migration of data back and forth between the CPU and GPU. This takes advantage of the larger memory available to CPUs: the highest-performing CPUs (8 TB) have 100 times more memory in a single node than the highest-performing GPUs (80 GB).
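To illustrate the checkpointing idea, here is a minimal sketch of the classic binary divide-and-conquer scheme applied to a toy scalar computation. It is not the researchers' implementation: the step function, its derivative, and every name (`step`, `step_vjp`, `advance`, `grad_store_all`, `grad_checkpointed`) are assumptions chosen for illustration. The baseline saves every intermediate state of the forward sweep, while the checkpointed version keeps only the state at each split point and recomputes the rest.

```python
import math


def step(x):
    """One forward step of a toy iterated computation (hypothetical)."""
    return math.sin(x) + 0.5 * x


def step_vjp(x, g):
    """Pull the adjoint g back through one step evaluated at input x."""
    return g * (math.cos(x) + 0.5)


def advance(x, n):
    """Run n forward steps, keeping only the final state."""
    for _ in range(n):
        x = step(x)
    return x


def grad_store_all(x0, n):
    """Baseline reverse mode: save all n step inputs, then sweep backward."""
    tape = [x0]
    for _ in range(n):
        tape.append(step(tape[-1]))
    g = 1.0
    for x in reversed(tape[:-1]):  # inputs x_{n-1}, ..., x_0
        g = step_vjp(x, g)
    return g


def grad_checkpointed(x, n, g=1.0):
    """Backpropagate g through n steps that start at state x.

    The interval is split in half, the midpoint state is recomputed and
    held as a checkpoint, and the second half is reversed before the
    first. Roughly log2(n) states are live at once, at the cost of
    repeating forward steps.
    """
    if n == 0:
        return g
    if n == 1:
        return step_vjp(x, g)
    half = n // 2
    mid = advance(x, half)                    # checkpoint at the split
    g = grad_checkpointed(mid, n - half, g)   # reverse the second half
    return grad_checkpointed(x, half, g)      # then the first half


if __name__ == "__main__":
    print(grad_store_all(0.3, 1000))
    print(grad_checkpointed(0.3, 1000))
```

Both functions compute the same derivative of the final state with respect to the initial one. The sketch trades extra forward computation for a much smaller set of saved states, which is the general trade-off behind checkpointing.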

Technology Validation: Evaluation of the researchers’ system in various real-world examples showed highly efficient use of CPU and GPU resources.

- Reduces GPU memory requirements
- Training large deep-learning models

Dec 23, 2021: PCT-Gov. Funding
Sep 10, 2021: Provisional-Gov. Funding, United States

Purdue Office of Technology Commercialization
The Convergence Center
101 Foundry Drive, Suite 2500
West Lafayette, IN 47906

Phone: (765) 588-3475
Fax: (765) 463-3486