

CUDA 5 added a powerful new tool to the CUDA Toolkit: nvprof. nvprof is a command-line profiler available for Linux, Windows, and OS X. At first glance, nvprof seems to be just a GUI-less version of the graphical profiling features available in the NVIDIA Visual Profiler and Nsight Eclipse Edition. But nvprof is much more than that; to me, it is the lightweight profiler that reaches where other tools can’t.

I often find myself wondering whether my CUDA application is running as I expect it to. Sometimes this is just a sanity check: is the app running kernels on the GPU at all? Is it performing excessive memory copies? By running my application with nvprof ./myApp, I can quickly see a summary of all the kernels and memory copies that it used, as shown in the following sample output.
