

Next, we can enable our optimization to see the performance impact: r. 1

After taking another performance capture with RDP and going back to the Event Timings view in RGP:

This section covers the built-in UE4 profiling tools. These can serve as a supplement to profiling with RGP.

A list of all stat commands is officially documented here:

The most important commands pruned from the above list:

: Unobtrusive view of frames per second (FPS) and ms per frame.
Frame: Total time to finish each frame, similar to ms per frame.
Game: C++ or Blueprint gameplay operation.

Most GPUs have a default power management system that switches to a lower clock frequency when idle to save power. This trades performance for lower power consumption and can introduce noise in our benchmarks, as the clocks may not scale the same way between runs of our application. You may fix the clocks on your GPU to reduce this variance. Many third-party tools exist, but the Radeon Developer Panel that comes with the Radeon GPU Profiler has a Device Clocks tab under Applications which can be used to set a stable clock on AMD RDNA™ GPUs.

We can see that the DrawIndexedInstanced() call takes 211us to complete. To inspect the details of the pixel shader running on the GPU, right-click on the draw call, select “View in Pipeline State” and click on PS in the pipeline. The Information tab shows that our pixel shader is only running 1 wavefront and only taking up 32 threads of that wavefront. On GCN GPUs and above, this kind of GPU workload will execute in ‘partial waves’, which means the GPU is being underutilized.

The ISA tab will give us the exact shader instructions that are executed on GPU hardware as well as VGPR/SGPR occupancy. The ISA view is also useful for other optimizations like scalarization, which are not covered here.

) for this shader shows that there is a lengthy loop that we need to parallelize if we want to make full use of the GPU hardware and eliminate any partial waves. We did this by switching to a compute shader and leveraging LDS (local data store/groupshared memory), a hardware feature available on modern GPUs which support Shader Model 5.
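The partial-wave observation can be made concrete with a little arithmetic. The sketch below is only an illustration: `WavesNeeded` and `LaneUtilization` are hypothetical helper names, not part of RGP or UE4, and the 64-lane wavefront size is an assumption that matches GCN hardware (RDNA GPUs can also run 32-lane waves, in which case 32 threads would fill a wave).

```cpp
#include <cassert>

// Hypothetical helpers for illustration; these are not RGP or UE4 APIs.

// Wavefronts needed to cover `threads` work items (rounded up).
constexpr int WavesNeeded(int threads, int waveSize) {
    assert(threads >= 0 && waveSize > 0);
    return (threads + waveSize - 1) / waveSize;
}

// Fraction of launched lanes that actually do useful work.
constexpr double LaneUtilization(int threads, int waveSize) {
    assert(threads > 0 && waveSize > 0);
    return static_cast<double>(threads) /
           static_cast<double>(WavesNeeded(threads, waveSize) * waveSize);
}

// Assumption: 64-lane wavefronts, as on GCN hardware.
static_assert(WavesNeeded(32, 64) == 1, "32 threads fit in a single wave");
```

Under that assumption, `LaneUtilization(32, 64)` evaluates to 0.5: the single wavefront reported in the Information tab leaves half of its lanes idle, which is the underutilization that the move to a compute shader is meant to address.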

In the above example, benchmarkseconds is not wall-clock seconds (unless every frame of the demo runs at exactly 60 fps). Rather, it runs 211 × 60 = 12,660 frames using a fixed timestep of 1/60 s ≈ 16.67 milliseconds. This means that, if you have your project set up to run a camera flythrough on startup, it will advance through the flythrough using fixed timesteps and a fixed random seed. It will then shut down automatically after a fixed number of frames. This can be useful in gathering repeatable average frame time data for your level.

Another technique for helping reduce noise in profile results is to run with fixed clocks.
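The relationship between benchmarkseconds, the frame count, and wall-clock time can be sketched in a few lines. This is a minimal illustration, not engine code: the function names are hypothetical, and the 60 steps per benchmark second is taken from the example in the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; these are not UE4 APIs.

// Number of frames simulated when a benchmark length given in "seconds"
// is stepped at a fixed rate (assumed 60 steps per benchmark second).
constexpr std::int64_t BenchmarkFrames(std::int64_t benchmarkSeconds,
                                       std::int64_t fixedFps = 60) {
    assert(benchmarkSeconds >= 0 && fixedFps > 0);
    return benchmarkSeconds * fixedFps;
}

// Length of one fixed timestep in milliseconds.
constexpr double FixedTimestepMs(std::int64_t fixedFps = 60) {
    assert(fixedFps > 0);
    return 1000.0 / static_cast<double>(fixedFps);
}

// Wall-clock duration of the run, given the average frame time actually measured.
constexpr double WallClockSeconds(std::int64_t frames, double measuredAvgFrameMs) {
    return static_cast<double>(frames) * measuredAvgFrameMs / 1000.0;
}

static_assert(BenchmarkFrames(211) == 12660, "211 benchmark seconds at 60 steps/s");
```

For instance, if the frames average 10 ms in practice, `WallClockSeconds(BenchmarkFrames(211), 10.0)` gives about 126.6 seconds: the run covers the same 12,660 simulated frames but finishes well before 211 wall-clock seconds have passed.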
