GPU Topics

There is a great deal of flexibility in the SDK for tuning CPU and GPU performance. This topic will outline some approaches. For information on optimizing memory usage, please see the Keeping Memory Usage Low topic.


GPU Performance

  • LOD is King: GPU performance is perhaps most impacted by the overall LOD settings of a given forest. In the reference application, there is a global LOD scalar, world::tree_lod_scalar in the SFC file. The closer to the camera the LOD transitions occur, the more obvious the dynamic LOD adjustments become, so a balance must be struck. Remember that no other single adjustment will impact the frame rate as much as scaling the LOD parameters down.
  • Effects LOD Dialog: The “Effects LOD” dialog in the Compiler app gives the user a great deal of control over how many shader effects are active per-LOD and per-material. The worst case is full effects across all LODs. The best case is minimum effects (per-vertex lighting) across all LODs. The “Standard” preset on the dialog is helpful, but does default a bit high, including per-pixel lighting at the highest LOD (this allows normal maps on the trunk, etc). For platforms with limited performance, careful tuning and heavy use of per-vertex lighting where possible makes the biggest impact.

    Your branch geometry probably never needs transmission enabled. If there's a detail layer, is it easy to spot after the first LOD? Is per-pixel lighting really necessary for the leaves? Does your branch/trunk geometry really need a specular effect? An often-used combination is per-pixel/non-specular for branch geometry, and per-vertex lighting for leaves. SpeedTree's different lighting models blend really well and transition seamlessly in most cases. Take advantage where you can.
  • Wind: SpeedTree's wind system can scale up to some very expensive effects. The “Wind LOD” dialog in the Compiler gives some control, allowing the user to choose among Full, Branch, and Global wind for the tree's LODs, but finer control is also available in the Modeler.

    For wind LOD, go to global or no wind as soon as possible. We have found that for trees with full wind, going from full in the highest LOD to global in the next works very well in terms of performance and perception. For smaller models like shrubs and bushes, going to no wind as soon as possible is also recommended since when rendered very small, wind effects are difficult to perceive.
  • Texture Sizes: The SpeedTree model library comes with very high resolution textures. Be sure to use the texture controls on the Compiler to keep the output textures at a reasonable size. Controls are available for the overall outputs, for the 3D and billboard atlases, and for individual textures within the atlases.

    Cap textures (especially those in our library) are often very large relative to the size of the geometry they're applied to. More often than not, these can and should be scaled down considerably.
  • Billboard Cutouts: Using the Compiler, it's possible to trade off more vertex processing for less pixel processing in the billboard geometry. In our experience, using four triangles per billboard (e.g. “slice” cutouts with three slices) almost always yields a performance benefit, more so when billboard global wind is disabled, which makes the per-vertex cost less expensive. Watch the billboard preview window when making adjustments to ensure that enough of the map is being cut out.
  • Unique Model Count: It's tempting to put 50 or more unique SRT files in a given scene. The SDK will certainly handle this well, but it's more efficient to use fewer models. For a given forest of one species, for example, three unique models randomly scaled and rotated with hue variation go a long way.
  • Favor Individual Atlases: For most platforms, it's better to use unique 3D and billboard atlases, as long as the space is used judiciously. It makes it far easier to mix and match compiled trees, too, as trees compiled into an atlas with others aren't nearly as relocatable. Plus, there's less wasted space when using multiple smaller atlases as opposed to a few large ones.
  • Reduce Shader Count: With a diverse forest, the Compiler can create quite a few shader combinations. Using the Compiler's “Shared shader path” will make all of the shaders compile to the same location (as opposed to a subfolder at the SRT's destination). This will avoid any duplication among the tree models.
  • Overdraw: For dense forests, overdraw can get out of hand quickly. Be sure to have the artists trim as many interior leaf triangles as possible. Keeping shader effects on the leaf geometry to a minimum will also help. And of course, when possible, render front to back.
  • Depth-Only Prepass: When selected, the Compiler will generate optimized depth-only shaders. It can make a big difference in the overall render speed in forward or deferred mode, especially when cascaded shadow maps are used.
  • Draw Distance: Keep the draw distance as short as possible. Draw distance mostly impacts the number of billboards rendered and will dramatically increase the amount of vertex buffer space used for the instance lists. Remember that for an evenly populated forest, the number of billboards rendered grows roughly with the square of the draw distance, since the visible area does.
  • Use the Grass System: Take advantage of the grass system. While grass models have the restriction of allowing only a single LOD and single material, the system is very convenient to use for any sort of ground clutter and cover, be it grass, weeds, rocks, broken twigs, litter, fallen leaves, etc. While the reference application places these instances randomly, you are free to place them according to any rules you wish, which may circumvent having to store specific locations for every model.
  • Fog: Fog computations can be expensive, especially when the fog color isn't constant. Try turning fog off for the 3D geometry but keeping it on for the billboards. Make sure to push the fog back so that it doesn't start taking effect until after the billboard line.
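
To put rough numbers on the draw distance advice, the sketch below estimates how many instances fall inside the draw range. It is only an illustration: it assumes trees are scattered at a uniform density over flat ground and ignores frustum culling, and the function names and parameters are hypothetical, not part of the SpeedTree SDK.

```python
import math


def estimated_instances(draw_distance, trees_per_square_unit):
    """Rough instance count inside a circular draw range.

    Assumes a uniform density of trees; real forests and frustum
    culling will change the absolute numbers, but not the trend.
    """
    return math.pi * draw_distance ** 2 * trees_per_square_unit


def estimated_billboards(draw_distance, billboard_start, trees_per_square_unit):
    """Rough count of instances drawn as billboards.

    These are the instances in the ring between the distance where
    trees switch to billboards and the draw distance itself.
    """
    if draw_distance <= billboard_start:
        return 0.0
    total = estimated_instances(draw_distance, trees_per_square_unit)
    near = estimated_instances(billboard_start, trees_per_square_unit)
    return total - near
```

Under these assumptions, doubling the draw distance roughly quadruples the total instance count, and nearly all of the added instances are billboards.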
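
The front-to-back suggestion in the overdraw item amounts to ordering visible instances by their distance from the camera before submitting them, so that nearer geometry fills the depth buffer first. A minimal sketch follows, assuming positions are plain (x, y, z) tuples; the function name is hypothetical and this is not how the SDK necessarily orders its own draw calls.

```python
def sort_front_to_back(camera_pos, instance_positions):
    """Return instance positions ordered nearest-first from the camera.

    Squared distance is used as the sort key because it preserves the
    ordering and avoids a square root per instance.
    """
    cx, cy, cz = camera_pos

    def distance_squared(position):
        x, y, z = position
        return (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2

    return sorted(instance_positions, key=distance_squared)
```

In practice a coarse ordering (for example, per cell or per group of instances rather than per instance) is often enough to get most of the early depth rejection.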