Trace: » soc_geometry_cache
Geometry Cache
Project Details
Aqsis is a REYES-based renderer adhering to the RenderMan® Interface and it is subject to the architecture’s limitations, which motivates the implementation of a hybrid solution that can take advantage of both REYES and Ray Tracing best features. Ray Tracing can compute accurate shadows, and supports complex lighting, realistic surface reflection models and global illumination. However, Ray Tracing algorithms require random access to the entire scene since rays can pass through many different regions of a scene in a very short period of time. So it is necessary that all parts of the scene geometry be available at any point during rendering.
In order to make this viable, an on-demand caching technique that stores a subset of the geometry of the scene is suggested for implementation.
The development of a geometry caching system will be composed of the following:
1. Data structure – a ray intersection acceleration structure will be used, but there are a myriad of possibilities to choose from, with very different advantages and disadvantages, so this too shall be considered a research topic for the project. We’ll likely aim at fewer ray-object intersections for acceleration, with approaches such as space subdivision (KD-tree) and bounding volume hierarchies (Kay-Kajiya). So the spatial acceleration structure and the tesselated grid will be developed in this stage.
2. Filtering – implementation of the ray component with derivative information. We'll probably choose between Ray Differentials or Beam Tracing. I'll conduct the study of each of these approaches and prototype a solution. [Christensen et al., 2003] clearly prefers ray differentials over beam tracing, since, according to his work, intersection, reflection and refraction calculations are harder with beam tracing, with the added problem of having the light beam split after a bounce in the scene. Still in this stage, I'll develop the code to transverse the spatial acceleration structure -or, if during development, it seems to be a better approach, I'll write this part of the code briefly before the intersection code.
3. Memory management and the Geometry Caches – since we are concerned with holding geometry at different resolutions, it is expected that we have more than one cache, but the exact number of caches is a topic of research for this project prior to implementation. [Christensen et al., 2003] suggests 3 separate caches for coarse, medium and fine tessellations, but encourages experimentations with a different number of caches; [Pharr 97], on the other hand, uses 2 grids, a geometry grid with thousands of triangles and an acceleration grid with a few hundred triangles for ray intersection acceleration. Given the possibilities, research and experimentation is suggested throughout the project. Deciding on the size of the geometry caches, if all separate caches will be of the same or different sizes in memory, controlling them and choosing a replacement strategy when the caches are full are all part of this research topic for the project. Least Recently Used replacement strategy is by far the most commonly used, but investigation is ideal.
4. Intersections, Packaging and integrating components - I'll finally develop the intersection with bounding hierarchies and with tesselated grids, which will be the only missing parts of the project, and then package the components together and perform system-wide testing, with image output.
Throughout the project, discussions with the mentors and the Aqsis community, unit testing and documentation writing will be carried out as needed.
Reading List of Technical Papers
These are the papers I've been working with so far.
- Ray Differentials and Multiresolution Geometry Caching for Distribution Ray Tracing in Complex Scenes (Christensen et al.)
- Ray Tracing for the Movie 'Cars' (Christensen et al.)
- Geometry Caching for Ray Tracing Displacement Maps (Pharr/Hanrahan)
- Tracing Ray Differentials (Igehy et al.)
Notes and Comments on the Papers
This information was sent to the development mailing-list, but in order to keep a journal or a sort of documentation that might be helpful when tracking the decisions made, all pertinent comments and notes will be added here as well for easy reference.
A Survey of Ray Tracing Acceleration Techniques (Arvo/Kirk)
- what's new
Not much, really. In this paper - as the title says - a survey of the state of the art of acceleration techniques as of 1989 is presented, but I think we're safe to assume the taxonomy of strategies is still very current. Basically, the most promising acceleration techniques are those of (1) reducing the average cost of ray-object intersection and (2) reducing the total number of ray-object intersections. There is also (3) replacing individual rays with a more general entity, such as beams or cones, but I won't get into this now because I haven't gone over its more specific details. Examples of: (1):: Bounding Volume Hierarchies (BVH), 3D spatial subdivision (uniform or non-uniform); (2):: Adaptive tree-depth control, which terminates a ray tree taking into consideration the maximum contribution of a ray to the pixel color;
BVH::
- theoretical logarithmic time complexity in the number of objects, and not linear. More specifically, it's O( I log N ) on average, where I is number of pixels of the resulting image and N is the number of objects.
- the data structure is a tree;
- the only costs associated with the algorithm are those of intersecting the bounding volume and the object. Intersections are indeed more costly, but we're counting on the fact that most objects will be discarded very soon when their bounding boxes are discarded, compensating the costlier intersection test.
- I used the word 'boxes', but many bounding volumes are possible, and further investigation is needed to evaluate the trade-offs between tightness of fit and cost of intersection - the tighter the fit, the costlier the intersection.
- Bounding volumes might intersect - this is not desirable, but can be dealt with quite simply.
3D Spatial Subdivision::
- subdivides the environment instead of enveloping the objects in bounding volumes; space is divided in non-overlapping voxels (cuboid);
- the idea is paying attention only to the objects close to the path of the ray, so the objects tested for intersection are the ones contained by a voxel pierced by the ray;
- voxels are traversed in order if the ray doesn't intersect with any objects; if it does, it's already the closest object, so further tests are not needed (same idea as BVH);
- Data structures used include octrees and BSP trees (and there is so much to them they deserved an email like this just for them!);
- depends on the ability to efficiently access voxels in the order defined by the path of the ray - this might be a problem, storage-wise;
- Uniform Spatial Subdivision
- the subdivision is independent of the structure of the scene;
- voxels are easily accessed by incremental calculation;
- Non-uniform Spatial Subdivision
- adapts to the scene, having larger voxels in low density or void areas and smaller voxels in high density portions of the scene;
- there might be a need for hashing in order to find the next voxel, increasing storage requirements;
General comments:: These three strategies can be (and already were) implemented in isolation if we really prefer one over the others, however, the use of more than one acceleration technique can (and usually does) perform better than any of the isolated ones. Glassner, for example, used nonuniform space subdivision to guide the construction of a bounding volume hierarchy, selecting volumes that are both tight fitting and disjoint - and this is very desirable.
- what I learned
Everything that was presented, but particularly what are octrees, BSP trees and kd-trees (not mentioned here, but went on by curiosity).
- what is of interest to Aqsis
We do have to choose an acceleration technique, and this should provide us with enough food for thought for now.
Ray Tracing Complex Scenes (Kay/Kajiya)
In this paper, three interesting aspects of the whole project are described, and they are: - A new type of bounding volume - A more specific data structure - An algorithm to traverse the hierarchy
- The bounding volume
The basic idea behind the bounding volume is to use slabs to fit convex hulls as tightly as desired. The more tightly a primitive object is fit, the less ray-object misses there will be, at the expense of having costlier tests, because the more complex a bounding volume is, the costlier it is to test it for intersection.
The bounding volume uses plane-sets (families of parallel planes). Each plane-set is defined by a single unitary vector, called the plane-set normal. Each plane of one given plane-set is uniquely determined by the distance from the plane to the origin. So the idea is to get two planes that best bracket the object (thinking in 2D, that is), and the region between these two planes is called a slab, represented by a min-max interval, where min is the smallest distance and max, well, the largest. In 3D, three slabs are necessary to bound an object (two are enough in 2D), and it is a requirement that the plane-set normals be linearly independent. We can use more slabs if we want to better tighten the object, but the added complexity comes with the price of longer computation times, so we're looking for the best trade-off.
It is generally suggested that the same plane-set normals be used for all objects in a scene so we can perform many calculations only once, greatly accelerating the task of intersecting rays with bounding volumes.
- The data structure
This is the Kay-Kajiya tree Christensen mentions in his paper (meaning: this is the data structure Pixar uses). It's a binary tree, constructed top-down. At each level, all the objects are sorted by their x coordinate, and the objects are partitioned at their median, defining which objects go on each side of the tree. The process is recursive, except for the fact that at each level, the objects are sorted and partitioned according to a different axis.
Even though this takes more time to construct, near objects will be grouped together, giving us all the benefits of coherency.
- The traversal
The traversal algorithm searches the tree in an order derived while the traversal is in progress. A candidate node is one that has been found to be in the path of the ray, but whose children have yet to be tested, and a heap keeps track of the candidate nodes. The candidate with the smallest positive distance estimate is selected, where the distance estimate is the nearer number of the two yielded by the intersection of a ray with a bounding volume. These numbers impose an order in which to search and resolve the ray's intersection, exploiting the distance interval optimization Arvo and Kirk mention in their article.
- General comments
The techniques presented by Kay and Kajiya still seem to be very competitive performance-wise, so I think these should be considered for implementation. Many, many authors mention this paper as a reference implementation of ray tracing acceleration techniques. I really would like to hear your opinions on what other aspects we should take into consideration when choosing the best techniques to implement. Still this week, I want to cover a few other possibilities so we can compare -at least theoretically- which techniques are better than others, so as to speed up designing and, if desired, prototyping this RT module to be.
Ray Differentials and Multiresolution Geometry Caching for Distribution Ray Tracing in Complex Scenes (Christensen et al.)
Instead of the summary approach, I decided to keep to a shorter, quicker, bullet-point note-taking from this paper onward. After all, in case of any doubts or if in need of further clarification, we can always refer back to the original paper.
GOAL: Render Ray Tracing and Global Illumination effects in scenes so complex that a finely tesselated representation of all the objects cannot fit in memory, requiring a geometry cache.
The article focuses on distributed Ray Tracing in complex scenes, and it builds on geometry coherence and simplification, ray differentials and texture caching.
Ray Tracing tesselated geometry is faster than ray tracing high level geometry representations. However, for that to hold true, tesselation must fit in memory. So an idea is to simplify geometry, specially in parts of the image where a detailed tesselation will be of no benefit.
Ray tracing is considerably faster when the rays cast are coherent. And only directly visible objects and the reflected, refracted and shadow-casting objects need to be kept in tesselated form. Surfaces with high curvatures or high frequency bumps or displacements lead to reflected and refracted incoherent rays, causing geometry access to be costlier. Therefore, coherent rays can be dealt with. It's the incoherent rays that aggravate processing. So we have to address the storage cost of tesselation, with special care when considering the incoherent rays and the geometry they might intersect.
- The need for a geometry cache is somewhat clear, however there is the problem caused by incoherent rays. Incoherent geometry access can cause a cache to thrash. The solution found for this is having a demand-driven caching technique minimizing the storage cost of tesselation.
The key insight is that there are 2 types of rays:
- Coherent rays - Specular reflection and refraction rays from surfaces with low curvatures. It is characterized by having a small differential, high accuracy and requires geometry to be finely tesselated (higher detail). A geometry cache for finely tesselated geometry can hold few entries, since access is coherent.
- Incoherent rays - Specular reflection and refraction rays from wide distribution ray tracing. These rays are characterized by having large differentials, no need for high accuracy and a coarse tesselation should be fine since details are unnecessary. A geometry cache needs to hold many entries of geometry that is fast to recompute, since there will be many cache misses.
Given this observation, separate caches are used for coarsely, medium and finely tesselated surfaces. Geometry is tesselated on demand and the tesselation rate is based on ray size, given by its cross-section, which depends on the ray differentials.
Ray differential correspond to a fraction of the sampling hemisphere for distribution ray tracing of diffuse reflection and translucency.
Implementation
Rays are only shot to compute reflection shadows on the vertices shaded by Reyes, so there are no camera rays. At the time this paper was published, different representations of the geometry for Reyes and Ray Tracing were used - this was later reconsidered.
Three geometry caches are stored, but 5 different tesselation resolutions are used by merging sets of quads from the 2 adjacent caches. The cache replacement strategy is Least Recently Used, and it works well if there are many entries for the coarse tesselation cache.
Of special interest to Aqsis:: We have to define a hybrid architecture. The one presented here seems not difficult to implement and the solution would provide the benefits we want from ray tracing. I'm currently unaware of any other hybrid reyes/rt renderers at this time!
Ray Tracing for the Movie 'Cars' (Christensen et al.)
This paper builds on the last one, so I'll add only the considerations that add upon the previous paper.
Tesselation
Reyes chooses tesselation rates for a surface patch depending on viewing distance, surface curvature and possibly, viewing angle. The highest tesselation rate used for Ray Tracing is the same as the Reyes tesselation rate for that same patch.
Subsets of the vertices are used for coarser tesselations, ensuring bounding volume consistency, despite the tesselation. The coarsest tesselation of a given patch is the four corners of the patch.
Now, the same tesselation rates from Reyes are used, for 2 reasons:
- There are fewer quads to test for intersection this way; and
- There are fewer self-intersection problems.
Geometry cache
Three caches are used: the coarse cache holds 4 vertices patches (1 quad), the medium holds patches with at most 25 vertices, and for the finest tesselation, the cache holds patches of at most 289 vertices.
Accelerator Structure
A bounding volume hierarchy is used - a Kay-Kajiya tree built dynamically. For the finest tesselation level, there is one additional bounding volume for groups of 4×4 quads, which are stored in the fine tesselated geometry cache with the tesselated geometry.
Shading
For each ray hit, 3 shading points are created - one is the ray hit point, and the other 2 are the differentials. Secondary rays, however, are only traced from the ray hit point, not from the differentials.
Hybrid Rendering
Shading of directly visible objects by Reyes cause rays to be traced. All first level rays originate from Reyes shading points.
General comments
This only adds to the last paper, adding details about their implementation at the time. The hybrid structure is still interesting!
System Architecture
Ignoring, for a moment, that the implementation will be a hybrid one, I came up with this rough design to have a better idea of what modules will be part of this project. I'm considering that we'll work with distributed ray tracing. No other assumptions about algorithms are made.
The Scene information will be readily available from Reyes. The camera will probably be non-existent in a hybrid implementation.
The shaded rectangles are the modules that will be implemented as part of SoC. The remaining modules will be coded in the future. The geometry cache will come in the “Bounding Volume” in the illustration.

