CPU Topics

There is a great deal of flexibility in the SDK for tuning CPU and GPU performance. This page outlines some approaches. For information on optimizing memory usage, please see the Keeping Memory Usage Low topic.


CPU Performance

  • Tree Cell Sizes: Pick a tree cull cell size that works best with your forest configuration. The cell size determines how many cells the SDK has to process in the rough and fine culling stages: the larger the cells, the fewer there are and the more quickly the visible ones can be determined. Once the visible cells are known, the SDK must loop through the 3D instances in every cell that intersects the frustum and isn't at billboard distance to determine their individual visibility. Smaller cells reduce this time. The reference application, whose world units are feet, uses a default cell size of 1200.0 feet (set in the SFC file, world::3dtree_cell_size parameter), which we believe strikes a good balance for the example forest's population density. Since every forest is different and uses different units, you should experiment with different sizes.
  • Grass Cell Sizes: You can pick a different cell size per grass type, and the choice can greatly affect both CPU and GPU performance. Each grass model is culled in a separate call, so separate sizes are easily accommodated. In contrast to tree instances, grass instances are not individually culled: if a cell is in the frustum, every instance in it is rendered. This cuts down on CPU usage greatly, but can be hard on the GPU at high densities. To strike a balance, smaller cell sizes can be used. In fact, we recommend smaller cell sizes for high density grass, and larger sizes for sparser populations like rocks or boulders.

    By way of example, the reference application's Plantation forest (modeled in feet) uses cell sizes of 20 for the high density grass, and up to 100 for the rocks and butterfly models.
  • Impact of 3D Trees: The more 3D trees that appear on screen (as opposed to billboards), the more LOD computations the CPU will have to do. This includes updating the instance vertex buffers as well. Keeping the billboard transition line as close to the camera as possible helps both GPU and CPU costs. With larger billboard textures, a good texture filter, and global billboard wind, billboards can get pretty close without losing too much quality.
  • Draw Distance: It seems obvious, but bears discussion: a large draw distance greatly impacts performance. For the most part, the 3D tree rendering system is unaffected, but the CPU load needed to stream billboards in and out of a large frustum will take its toll. Considering that the volume of a frustum grows with the cube of its far distance, and the ground area it covers with the square, the number of visible billboards can get out of hand very quickly. We've routinely tested with one- and two-mile visibilities, but they do come at a price.
  • Grass Densities: Nothing can kill performance faster than high grass densities. Keep populations as sparse as possible, keep the pixel effects as low as possible, and keep the LOD range reasonable. Remember that the grass models have only a single LOD level, so whichever effect you choose will be used for every instance of that type.
  • Keep App Data SDK-Friendly: The entire streaming and culling system is based on organizing trees into evenly-spaced cells. It is important for performance reasons that the application can quickly populate the cells provided by the SDK. This mostly means not wasting cycles during a render loop determining which instances go into which cells. We provide the example class CMyInstancesContainer, defined in the reference application in MyPopulate.h/cpp. It shows how to quickly and easily organize an existing population of base trees and instances into cells so that they can be quickly passed into the SDK. Even without an example, it's not difficult to organize instances into evenly-spaced cells.
  • Populating Grass: Grass-populating code is also critical. Try to precompute as much about the grass instances as possible. Profiling your population function is recommended. During the development of our reference application, we found that while we were using a random generator that provides float values very quickly, it was slow to reseed. As a result, we avoid reseeding it per cell.
  • Parallelize: The culling and streaming functions are thread safe. Opportunities for parallel culling/streaming include:

    * Trees and grass in separate threads

    * Each grass layer in a separate thread

    * A thread per light view (e.g. for use with a cascaded shadow map)

    Note that at the RenderInterface library level, these updates involve instance vertex buffer updates. Graphics APIs like OpenGL don't necessarily deal well with simultaneous GPU memory writes.
  • Stalls: The instance vertex buffers are double buffered (or better). The number of buffers is defined as c_nNumInstBuffers in ForestRI.h and can be adjusted easily. However, even with double buffering, the buffer updates can sometimes cause a wait-on-GPU condition that the SDK counts as CPU time. Spikes in the SDK's reported cull/stream time are almost always due to this.
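
As an illustration of the "Keep App Data SDK-Friendly" point above, the following is a minimal sketch of sorting a population into evenly-spaced cells once, ahead of the render loop. It is not the reference application's CMyInstancesContainer; the Instance struct and the CellKey, CellIndex, BucketInstances, and FindCell names are hypothetical and exist only for this example.

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical instance record; the SDK's own instance type is not used here.
struct Instance {
    float x, y;
};

// Pack a signed (col, row) cell coordinate into a single 64-bit map key.
inline std::uint64_t CellKey(std::int32_t col, std::int32_t row) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(col)) << 32) |
           static_cast<std::uint32_t>(row);
}

// Cell index along one axis; floor keeps negative coordinates in the right cell.
inline std::int32_t CellIndex(float coord, float cellSize) {
    return static_cast<std::int32_t>(std::floor(coord / cellSize));
}

using CellMap = std::unordered_map<std::uint64_t, std::vector<Instance>>;

// One-time pass (e.g. at load): group every instance by the cell it falls in.
CellMap BucketInstances(const std::vector<Instance>& instances, float cellSize) {
    CellMap cells;
    for (const Instance& inst : instances) {
        const std::int32_t col = CellIndex(inst.x, cellSize);
        const std::int32_t row = CellIndex(inst.y, cellSize);
        cells[CellKey(col, row)].push_back(inst);
    }
    return cells;
}

// Per-frame lookup: returns the instances for a cell, or nullptr if it is empty.
const std::vector<Instance>* FindCell(const CellMap& cells,
                                      std::int32_t col, std::int32_t row) {
    const auto it = cells.find(CellKey(col, row));
    return it == cells.end() ? nullptr : &it->second;
}
```

With the population bucketed up front, filling a cell requested during the render loop becomes a single hash lookup rather than a search over all instances.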
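
For the "Populating Grass" point above, one possible way to avoid reseeding a generator per cell is to seed it once, precompute a table of random floats, and let each cell start reading the table at an offset derived from its coordinates. The sketch below assumes that approach; it is not necessarily what the reference application does, and GrassInstance, GrassPopulator, and the hash constants are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical grass instance: a 2D position only.
struct GrassInstance {
    float x, y;
};

class GrassPopulator {
public:
    // The generator is seeded exactly once, here. tableSize must be > 0.
    GrassPopulator(std::size_t tableSize, std::uint32_t seed) : m_table(tableSize) {
        std::mt19937 rng(seed);
        std::uniform_real_distribution<float> dist(0.0f, 1.0f);
        for (float& value : m_table)
            value = dist(rng);
    }

    // Fill one cell. The same (col, row) always yields the same instances,
    // so a cell that streams out and back in looks unchanged.
    std::vector<GrassInstance> PopulateCell(std::int32_t col, std::int32_t row,
                                            float cellSize, std::size_t count) const {
        std::vector<GrassInstance> out;
        out.reserve(count);
        std::size_t cursor = HashCell(col, row) % m_table.size();
        for (std::size_t i = 0; i < count; ++i) {
            const float u = Next(cursor);
            const float v = Next(cursor);
            out.push_back({(col + u) * cellSize, (row + v) * cellSize});
        }
        return out;
    }

private:
    // Simple spatial hash used to pick a per-cell starting offset in the table.
    static std::size_t HashCell(std::int32_t col, std::int32_t row) {
        const std::uint32_t h = (static_cast<std::uint32_t>(col) * 73856093u) ^
                                (static_cast<std::uint32_t>(row) * 19349663u);
        return h;
    }

    // Read the next table entry, wrapping around at the end.
    float Next(std::size_t& cursor) const {
        const float value = m_table[cursor];
        cursor = (cursor + 1) % m_table.size();
        return value;
    }

    std::vector<float> m_table;
};
```

The trade-off is that a small table makes distant cells repeat patterns, so the table size is worth tuning alongside profiling of the population function.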
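
The "Parallelize" point above can be sketched with standard C++ futures. This example assumes each cull/stream pass (trees, one grass layer, one light view) can be wrapped in a callable that returns its result; CullResult, CullJob, and RunCullJobs are hypothetical names, and no actual SDK calls are shown.

```cpp
#include <functional>
#include <future>
#include <vector>

// Hypothetical output of one cull/stream pass.
struct CullResult {
    int jobId;                 // which pass produced this result
    std::vector<int> visible;  // indices of the instances found visible
};

using CullJob = std::function<CullResult()>;

// Launch every job on its own thread, then gather results in submission order.
std::vector<CullResult> RunCullJobs(const std::vector<CullJob>& jobs) {
    std::vector<std::future<CullResult>> pending;
    pending.reserve(jobs.size());
    for (const CullJob& job : jobs)
        pending.push_back(std::async(std::launch::async, job));

    std::vector<CullResult> results;
    results.reserve(pending.size());
    for (std::future<CullResult>& f : pending)
        results.push_back(f.get());  // blocks until that job has finished
    return results;
}
```

Gathering the results on the calling thread leaves room to perform the instance vertex buffer updates serially afterwards, which may be preferable on graphics APIs that don't handle simultaneous GPU memory writes well.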