2D Matrix Multiplication on GPUs with Limited Memories
There was a time when I imagined feeding my big big data to the graphics cards smoothly, but Mom told me that is impossible because my data stream is much too big and the memory of the graphics cards is limited. The memory is to a graphics card what capacity to a human brain.
But we have hard drives and CPU memories which are a lot larger than GPU memories. Using CUDA one can perform such operations like slicing matrices into different parts then calculating sub-matrices multiplications once at a time on GPU and composing the little matrices into the final result, so as to ease the burden when the matrices are too big to load on the GPU memory at a time, which is just what this article introduced.