Overview

The goal of this project is to address the perception that Aqsis is too slow to be useful. Many previous attempts have been made to address this, all failing to provide any worthwhile improvement. This project is an umbrella project for all work to improve speed and memory use of Aqsis.

This project will be run in a different way to most development branches. Instead of a single identifiable target with one or two developers working on it, this is intended to be a much larger undertaking. The advantage of bringing all optimisation work under a single containing project is that there is less chance of concurrent optimisation efforts cancelling each other out.

Project Goals

The main goal of the project is to improve the speed and memory use of Aqsis during production-style renders. While we will no doubt compare with other RenderMan renderers during this work, the primary concern is not to compare well against other renderers, but rather to provide a solution that performs well in production. In addition, any such cross-renderer comparisons must be made in the context of quality: our goal is speed, but not at the cost of quality.

Progress & Results

Sub Project Status Overview

Project | Status | Notes
Bucket Overlap Caching | Implementation | 2008/01/30: Implemented replacement occlusion code.
ShaderVM Inner Loop Redesign | Design | Need to demonstrate techniques for reducing shaderVM code duplication.
Removal of IqBound | Testing | Implemented at #1841
CqMatrix Optimizations | Testing | Implemented at #1854
Reducing shader loads for triangle rendering | Design |
Geometric optimizations for quadrics | Design | Flesh out what to do about DicePoint().

Results Submission

Whenever significant changes have been made, a new tag should be created in the Git repository in the format Beaker-N. The process to produce a full results set is as follows.

  1. Ensure that you have the code at the specified Git tag, and make a clean build.
  2. Ensure that you have the latest/specified revision of the test suite.
  3. Ensure that the RTS (regression test suite) will find the correct Beaker executable.
  4. Ensure that the pdiff tool can be found.
  5. From within the testing/performance folder, execute the RTS tool: ../regression/rendertester.py -x -c <config_name> -T Beaker-<tag_num> beaker
  6. Results should be submitted using the web interface.
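
As a rough illustration of steps 3 to 5, the sketch below checks that the required tools can be found and assembles the rendertester.py command line. It uses only the flags quoted in step 5. The executable names "aqsis" and "pdiff" and the configuration name "my_config" are assumptions made for the example, not names confirmed by this page, so adjust them to your own setup.

```python
import shutil


def beaker_tag(tag_num):
    """Return the Git tag name in the Beaker-N format."""
    return f"Beaker-{int(tag_num)}"


def build_rts_command(config_name, tag_num):
    """Assemble the step 5 command line as an argument list.

    Only the flags quoted in the procedure above are used.
    """
    return [
        "../regression/rendertester.py",
        "-x",
        "-c", config_name,
        "-T", beaker_tag(tag_num),
        "beaker",
    ]


def missing_tools(tool_names):
    """Return the names from tool_names that cannot be found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


if __name__ == "__main__":
    # "aqsis" and "pdiff" are assumed executable names (hypothetical).
    print("missing tools:", missing_tools(["aqsis", "pdiff"]))
    # "my_config" is a placeholder configuration name (hypothetical).
    print(" ".join(build_rts_command("my_config", 3)))
```

The argument list can be passed to subprocess.run with cwd set to the testing/performance folder once every tool is reported as found.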

Results

Results are automatically managed by the beaker web interface - see the results table.

Base Point

Before any meaningful optimisation can take place, we need a base point for comparison. To establish one, we need two things: a suite of test content and a set of test machine configurations.

The test content suite should cover a wide range of features and test performance specifically, so a cube in an empty world isn't going to cut it. We need a small but broad set of test content; initially I'd suggest around 5 or 6 pieces. Once defined, the content can be used within the context of the regression test suite framework. The speed test content should be stored in a separate folder, and not included as part of the standard regression test configuration. A new configuration file should be provided to run the speed test easily.

The test machine configurations should cover at least Linux, Windows and MacOSX. Again, we don't need a huge number of machines, just a handful on which to test the content. The setup of the chosen machines must not change dramatically during the lifetime of the project.

Once these two have been defined, a matrix will be created here in which we can keep track of the performance figures as we progress.

Test Content

ID | Title | Stressed Features | Notes
1 | capsules | Occlusion, RIB Parsing, Scene Setup | This file comes from the jrMan distribution.
2 | makehuman | Texturing | This file comes from the MakeHuman project.
3 | 250mm | Depth of Field Sampling |
4 | 25mm | Depth of Field Sampling |
5 | 0mm | Depth of Field Sampling |
6 | killeroo | Subdivision Surfaces | Model, textures and shaders courtesy of headus
7 | smoky | Shading |
8 | petunia | Motion Blur | Model courtesy of Macouno http://www.alienhelpdesk.com, motion capture courtesy of http://www.mocapdata.com, exported from Blender using Mosaic.

Test Configurations

To gather this information, use the following methods.

  • Windows - Run systeminfo from the command prompt.
  • Linux - Run cat /proc/cpuinfo, cat /proc/meminfo and cat /proc/version from the console.
  • MacOSX - Run system_profiler | grep CPU, system_profiler | grep Memory and system_profiler | grep Version from the console.

Hardware columns: Model, Processor, No. Processors, Memory. Software columns: OS (version), Compiler (version).

ID | Name | Owner | Model | Processor | No. Processors | Memory | OS (version) | Compiler (version)
1 | Khepri | Paul Gregory | Dell Precision Workstation 670 | x86 Family 15 Model 4 Stepping 1, GenuineIntel ~3192Mhz | 2 (HT) | 1,022MB | Microsoft Windows XP Professional (5.1.2600 Service Pack 2 Build 2600) | Microsoft Visual Studio 2005 [Express Edition] (Version 8.0.50727.762 (SP.050727-7600))
2 | trinity | Leon Tony Atkinson | Mac mini | PowerPC G4 (1.2), 1.42GHz | 1 | 256MB | Mac OS X 10.4.11 (8S165) | GCC Version 4.0.0 (Apple Computer, Inc. build 5026)
3 | osiris | Paul Gregory | Dell Precision M50 | Mobile Intel(R) Pentium(R) 4 - M CPU 2.00GHz (512KB cache) | 1 | 768MB | Linux version 2.6.23.1-49.fc8 | gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
4 | niobe | Leon Tony Atkinson | Mac mini | Intel Core 2 Duo (1.8), 1.83GHz | 1 (Dual Core) | 2GB | Mac OS X 10.5.6 (9G55) | GCC Version 4.0.1
5 | cuboctahedron | Chris Foster | Custom | AMD Athlon™ 64 Processor 3000+ (2000MHz) | 1 | 1GB | 64 bit linux (currently running kernel 2.6.25-gentoo-r5) | g++ (GCC) 4.1.2 (Gentoo 4.1.2 p1.1)
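
The per-platform commands listed above can be looked up programmatically, for instance when collecting configuration details from several machines. The following is a minimal sketch, assuming the platform names reported by Python's platform.system() ("Windows", "Linux", and "Darwin" for MacOSX). The helper name commands_for is hypothetical.

```python
import platform

# Commands copied from the list above, keyed by platform.system() names.
SYSTEM_INFO_COMMANDS = {
    "Windows": ["systeminfo"],
    "Linux": [
        "cat /proc/cpuinfo",
        "cat /proc/meminfo",
        "cat /proc/version",
    ],
    "Darwin": [
        "system_profiler | grep CPU",
        "system_profiler | grep Memory",
        "system_profiler | grep Version",
    ],
}


def commands_for(system_name=None):
    """Return the information-gathering commands for a platform.

    Defaults to the current platform; unknown platforms give an empty list.
    """
    if system_name is None:
        system_name = platform.system()
    return SYSTEM_INFO_COMMANDS.get(system_name, [])


if __name__ == "__main__":
    for command in commands_for():
        print(command)
```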

Approved subprojects

Completed

In progress

Proposed subprojects

Extended occlusion culling

The Aqsis occlusion tree is ineffective in the presence of heavy motion blur and depth of field. This means that all moving or blurred surfaces have to be diced and shaded, even when they may eventually be occluded in all the relevant samples.

Proposal

The occlusion tree is currently split only along the two spatial directions, not along the time direction. With this setup it's impossible to detect that the back side of a rapidly moving object will remain unseen. Instead, the occlusion tree should split along three directions - two spatial and one time. The spacetime bounding box of an object can then be checked against this enhanced tree in an analogous way.
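
To make the idea concrete, here is a small sketch of a tree that splits along x, y and time in turn. It is not the Aqsis implementation: the sample layout, the Node class and the function names are assumptions chosen for illustration. Each sample records the depth of the nearest opaque surface seen so far, each node stores the largest such depth beneath it, and a spacetime bound is culled when its nearest depth lies behind every sample it overlaps.

```python
from dataclasses import dataclass
from typing import Optional

# A sample is (x, y, time, depth of the nearest opaque hit so far).
# A bound is ((x_min, x_max), (y_min, y_max), (t_min, t_max)).


@dataclass
class Node:
    lo: tuple          # minimum (x, y, t) over the samples in this subtree
    hi: tuple          # maximum (x, y, t) over the samples in this subtree
    max_depth: float   # farthest recorded depth in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(samples, axis=0):
    """Build a tree that cycles its splits through x, y and time."""
    lo = tuple(min(s[a] for s in samples) for a in range(3))
    hi = tuple(max(s[a] for s in samples) for a in range(3))
    node = Node(lo, hi, max(s[3] for s in samples))
    if len(samples) > 1:
        ordered = sorted(samples, key=lambda s: s[axis])
        mid = len(ordered) // 2
        next_axis = (axis + 1) % 3
        node.left = build(ordered[:mid], next_axis)
        node.right = build(ordered[mid:], next_axis)
    return node


def is_occluded(node, bound, z_near):
    """True if the bound is hidden at every sample it overlaps."""
    for a in range(3):
        if bound[a][1] < node.lo[a] or bound[a][0] > node.hi[a]:
            return True   # no sample in this subtree is touched
    if z_near > node.max_depth:
        return True       # every sample here already has a nearer hit
    if node.left is None:
        return False      # an overlapping sample could still see the object
    return (is_occluded(node.left, bound, z_near)
            and is_occluded(node.right, bound, z_near))


if __name__ == "__main__":
    # An occluder at depth 1 covers every pixel sample only while t < 0.5.
    samples = [(x, y, t, 1.0 if t < 0.5 else float("inf"))
               for x in (0.25, 0.75) for y in (0.25, 0.75)
               for t in (0.1, 0.3, 0.6, 0.9)]
    tree = build(samples)
    print(is_occluded(tree, ((0, 1), (0, 1), (0.0, 0.4)), 5.0))
    print(is_occluded(tree, ((0, 1), (0, 1), (0.0, 1.0)), 5.0))
```

In the usage example, a tree split only in x and y would hold samples from all times in every node, so the maximum depth would be infinite everywhere and the first query could not be culled. Splitting on time separates the covered and uncovered samples.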

Depth of field may present a slightly different challenge, since depth of field is not a property of the object bounding box. This requires some thought, but can probably be made to work since most occluded objects will remain occluded even in the presence of focal blur.

Reducing shader loads for triangle rendering

Overview

Triangle rendering in Aqsis works by turning the triangle into a quadrilateral by creating a ghost vertex. Ghost vertices propagate during splitting, such that the resulting grids may contain some “ghost micropolygons” which are eventually culled.

Currently, shaders run over the whole of each grid that results from this procedure, including the ghost micropolygons. This inefficiency can be avoided.

Proposal

Since some of the micropolygons are ghosts and will eventually be culled, shading never needs to be performed over this part of the grid. This proposal involves setting the running state of the newly created grid to false over the area containing ghost micropolygons. This would effectively turn off the shader in those regions.

Shading performance gains of up to 50% are expected with this approach, for dense meshes in which a single triangle corresponds to a single grid.
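
The size of the saving can be estimated with a small sketch. It is a simplified model, not the Aqsis grid code: it assumes a single triangle diced into one grid with the ghost vertex at the (u, v) = (1, 1) corner, so that a micropolygon whose lowest corner satisfies u + v >= 1 lies wholly in the ghost half. The function names are hypothetical.

```python
def running_state(u_res, v_res):
    """Return a (v_res + 1) x (u_res + 1) grid of per-vertex flags.

    A vertex is flagged True if it belongs to at least one micropolygon
    that is not a ghost, so the shader still needs to run there.
    Assumption: the ghost vertex is at (u, v) = (1, 1).
    """
    def is_real(i, j):
        # Lowest corner of micropolygon (i, j) is (i / u_res, j / v_res);
        # the test i / u_res + j / v_res < 1 is done in integer arithmetic.
        return i * v_res + j * u_res < u_res * v_res

    state = [[False] * (u_res + 1) for _ in range(v_res + 1)]
    for j in range(v_res):
        for i in range(u_res):
            if is_real(i, j):
                # Switch on the four corner vertices of a real micropolygon.
                for dj in (0, 1):
                    for di in (0, 1):
                        state[j + dj][i + di] = True
    return state


def running_fraction(state):
    """Fraction of grid vertices on which the shader would run."""
    flags = [flag for row in state for flag in row]
    return sum(flags) / len(flags)


if __name__ == "__main__":
    for res in (4, 16, 64):
        print(res, running_fraction(running_state(res, res)))
```

Under this model the running fraction stays above one half at low resolutions, because micropolygons straddling the diagonal must still be shaded, and approaches one half as the grid resolution grows. That is consistent with the "up to 50%" estimate above.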

Status

Current Status: Design

