Trace: » roadmap » guide » metric » soc_dicing_optimisation » todo » bug_reporting » doc » malco » playground » soc_dsm
SOC Project: Deep Shadow Maps
Rendering a deep shadow map
The following is an overview of the DSM theory.
My current understanding of the DSM algorithm is based on the paper by Lokovic and Veach, and a website posted by Benjamin F. Gregorski.
Psuedonym has a wiki page on DSM here.
Notation and terminology
- A light source has initial outgoing radiant energy Einitial = 1.0
- Eoutgoing is the light energy at any given dpeth: initialize Eoutgoing = Einitial
- Eincoming is the light energy immediately before intersecting a given surface.
- A surface has opacity Osurface between 0.0 and 1.0, where 1.0 is completely opaque and allows no light through, and 0.0 is invisible.
- Transmittance: T
- surface transmittance: Ts
- volume transmittance: Tv
- Visibility, V at pixel (i,j) and depth z in the DSM: V(i,j,z)
Transmittance functions for subpixel samples
Surface transmittance, Ts
From the light's perspective, we calculate surface transmittance over a depth range for each subpixel sample in the map (or bucket).
In a raytracer, this would involve casting a ray through the subpixel center, initializing the lights energy to 1.0, then detecting, in order of depth, every micropolygon surface the ray intersects.
Then Reference each micropolygon's material properties and fetch its opacity. Calculate outgoing light energy as follows:
Eoutgoing = (1.0 - Osurface)*Eincoming
Note: Eoutgoing = transmittance at this depth.
Store in a buffer the depth value as well as Eincoming and Eoutgoing (2 Energy entries: one before, and one after surface intersection, and 1 depth value (step discontinuity)).
While light energy > 0, continue until every intersecting micropolygon's surface transmittance is stored. Stop if light energy falls to 0.
Since we use a scanline renderer, not raytracing, get a depth sorted list of the micropolygons covering each subpixel and traverse in closest-to-farthest order, much like the raycasting approach.
We render the scene, from the camera's perspective, as a series of buckets. Aqsis stores a depth-sorted list of micropolygon samples attached to each subpixel sample in the bucket.
Traverse this list to construct the surface transmittance functions for each subpixel sample.
Volume transmittance, Tv
If the view frustum containes volumetric objects, calculate volume transmittance over a depth range for every subpixel.
Record the volume transmittances in a buffer much like we did for surface transmittance, until either the light energy falls to 0, or there is no more volume along a ray.
Unlike surface transmittance, where transmittance was recorded at critical depths (intersection points), volume transmittance is recorded at regular intervals along a ray (user-specified delta?). Most likely we will have to sample everywhere along the ray, whether or not the sample point is inside a volume.
Total transmittance, T = Ts*Tv
For each point in the surface transmittance data, multiply the surface transmittance at that point with the volume transmittance at the same depth.
Since the volume data need not have a data point at the same depth, linearly interpolate (or quadratic interpolate…) between the two volume data points encompassing it and store the result in a third buffer. Do this in such a way that the number of points in the final set of transmittance data is (Tv union Ts) - (Tv intersection Ts): |T| = |(Ts U Tv)| - |(Ts ^ Tv)|
To put it more succintly:
- start at the beginning of both sets of data (Ts and Tv)
- find the data point from the two sets with the next smallest depth value
- From the other set, use interpolation to find what the transmittance would be at the same depth
- multiply and store product to T (final transmittance data)
- goto (2)
This is all done on a per-subpixel basis, resulting in transmittance functions for each subpixel.
Calculating Visibility functions
Uncompressed visibility functions
V(i,j,z) = Summation(wk*Tk(z)) where k is a sample, and w is a filter weight.
The visibility V(i,j,z) at pixel (i,j) and depth z is the piece of information that is stored in the DSM. Not just one element is stored per pixel: each pixel has as many visibility elements as its largest subpixel, and each pixel may have a different number of visibility values.
Calculate V(i,j,z) as a weigthed sum of all the subpixel transmittance values at each depth z (every unique depth in every subpixel). If there are n subpixel sample points, then there are about n times as many visibility values as there are transmittance vertices for an average subpixel.
First, consider that a pixel is subdivided into n subpixels, each with an associated transmittance function. The visibility for the pixel is a weighted average of these. If the subpixel transmittance functions have m points on average, then the visibility function will have about m*n points. We need to process these is z-depth order.
Here is the process:
- start with z-depth = 0 and V = 1.0
- Find the next smallest z-dpeth between the n transmittance functions we are working with.
- Compute what the transmittance would be on the other (n-1) functions at the same depth. Take the weighted average, according to our sample filter. Store this as the uncompressed visibility for the current depth. Goto (2).
The Pixar paper points out that taking the weighted average of all n transmittance functions directly at each depth is a rather slow way to compute the uncompressed visibility function. This is because there are m*n depths, and taking the average at each depth incurrs O(n) work. Thus the total work is O(m*n^2) which is non-optimal.
A more efficient algorithm is the following:
- Put all “nodes” of the n transmittance functions together into a list, keeping track of the filter weights on each node. By “nodes” we mean the (z,T) pairs which define the piecewise linear representation.
- Sort this list of nodes by depth - this takes O(m*n log(m*n)) time. (In fact, assuming that the n lists are already sorted, it should be possible to do this merge of the lists in O(m*n) time. std::list has a merge() function which is almost what we want here.)
- Scan through the list of nodes from z=0 to z=infinity. At each node the output visibility function may be updated in O(1) time, according to the prescription suggested in the paper (TODO: write this down). The scan therefore takes O(m*n) time.
The algorithm above runs in O(m*n log(m*n)) time, so should be much more efficient. The pixar paper mentions a heap, but this seems to be only one way of implementing the sorting of the node list. Presumably we could just use quicksort which would be simpler and probably more efficient.
Visibility function compression
Vuncompressed → Compression → Vcompressed with fewer data points
The idea is to get rid of some of the V values for a given pixel. Imagine three points lie on a line, or almost a line, then discard the middle point. You can still calculate the visibility at that depth by linearly interpolating between the other points. There is little or no error. Extend this idea to remove even more points. We try to discard as many points as possible while keeping the error below a threshold.
The process is:
- Initialize acceptable slope range to [-inf, inf]
- From an origin point (zi', Vi') at the beginning of the input visibility function step forward one point to (zj', Vj')
- Calculate the slope of the line segment between these points: Slope = (Vj' - Vi') / (zj' - zi')
- If the slope is within acceptable range, recalculate the new acceptable range based on the point just added (zj', Vj +/- epsilon), then goto (1), otherwise continue
- New acceptable slope range is: []
- The line segment must successively pass through all the target ranges for all the points between (zi', Vi') and (zj', Vj')
- Notice the range tends to narrow as we add more points, so it is less and less likely that the next point will fall in range
- Back up one point to z(j-1)' and discard any data points between depths zi' and z(j-1)'.
- Add z(j-1)' to the list of compressed points, with assicated visiblity value such that the slope is the midpoint between the acceptable slope range.
- set zi' to zj' and goto (1)
Stop when zj' is the last point in the function. Store the visibility to a DSM file.
One restriction is to only use original input z values in the output z values. Therefore the output may contain a subset of the input, but no new points.
The following diagram tries to capture the algorithm.
Deep shadow map lookup
Like ordinary shadow maps, render from the scene camera, transform each surface point into light space, projecting the point onto the light's image plane and fetch the visibility at that pixel, and the depth of the point. Use that value to shadow the point. Multiply the visibility with the surface color and get semitransparent shadows if 0 < V < 1. If the visibility is 1.0 then the point is not in shadow. If it is 0, the point is completely in shadow.
DSM lookups work as follows:
The visibility V(x,y,z) for a point in the scene with coordinates (x,y,z) in light space, which projects to pixel (i,j) on the shadow image plane, is a weighted average of all the visibility values at that depth, within a filter region on the shadow map (so the surrounding pixels influence the current pixel).
If you want to access the visibility at (i,j,z) in the shadow map, pixel (i,j) has a list of z-values in increasing order. Use a binary search to find the one you are looking for.
Since shadow lookups are often spacially local, keep a pointer with each pixel to the last referenced z-value. This is used as the starting point for subsequent lookups; search forward or backward from this point depending on if the next lookup dpeth is greater than or less than this depth.
DTEX File Format Specification
We need a binary file to store deep shadow maps to disk. Standard file types such as TIFF cannot be used because of the nature of the data: most importantly, the amount of data per-pixel is non-uniform. These files will potentially be very large, perhaps as big as 50 MB for a typical case. In order to access this information in an efficient way, the file format will be a tiled image format, such that the whole image is treated as a grid of sub-regions, or tiles, which may be individually accessed and loaded, in any order.
The DSM file should have a file header, which contains a table indicating the start position (file offset) of each tile. This table should indicate the sub-region 1) this tile covers so that the user may access the desired tile base on the pixels they need.
The file header should contain:
- Identification of the file type as a DSM tiled image.
- The total size of the file, the size of the deep data by itself (in case the user will want to load the entire image into memory if the data is small enough).
- The format of a visibility node: number of color channels, size/composition of a node
- The size of the tile table in number of tiles
- The tile table containing file positions of the beginning of each tile, and the region the tile covers.
In addition to the file header, the beginning of each tile will contain a tile header, indicating the relative start position of each pixel, or visibility function in the tile. Also contain a field indicating the size of the entire tile (total number of nodes in the tile). This is important so that the user may pre-determine how much memory is required to buffer the tile
The beginning of the visibility function for each pixel will be prepended with a field indicating the length, in units of number of nodes, of the function (0 for empty).
Tile width and height will be uniform, except in the case of boundary tiles, when the image dimensions are not divisible into a whole number of uniform tiles. This will work with padding as described in the TIFF standard (section 15) here.
Empty tiles should have minimal storage: perhaps an indication in the tile table in the file header that the tile is empty. Make the fileOffset 0.
We can start with the header proposed by Andrew in his Aqsis File Format specification from here. The file header should have any bits of information that are file-wide, while the tile headers contain the information that is specific to the tile.
File Header
magic 8*Uint8 fileSize Uint64 fileVersion Uint32 headerSize Uint32 dataSize Uint64 headerChunkOffset FileOffset(?) tileWidth Uint32 tileHeight Uint32 numberOfTiles Uint32 tileTable Variable size
The magic field contains the following bytes: “\0x89AqF\0x0b\0x0a\0x16\0x0a”
Perhaps the dtex/dsm file magic number will be different, maybe AqDTEX.
The tile table contains 1 entry per tile. An entry has 3 unsigned ints: (tileCoordX, tileCoordY, fileOffset). The tile table is stored into a vector of std::vectors or something similar so that in order to load a tile containing the desired image pixel, you can use something like:
loadTileForPixel(int x, int y)
{
int tileX = x / tileWidth;
int tileY = y / tileHeight;
int fileOffset = offsets[tileX][tileY];
loadTileAtOffset(fileOffset);
}
Implementation considerations
We will implement two procedures for dealing with DTEX files: one for reading and one for writing. The class for writing DTEX files will be used in the dsm display device, d_dsm.cpp. It should have a single public function to receive data from the display. It will be further responsible to create tiles from the data, create the DTEX file, create and update the headers/tables, and write it all disk. This class might live in aqsis/displays/d_dsm.
The DTEX reading procedure will consist of two abstractions:
- An interface allowing retreival of on-disk tiles from tiled images.
- A way to index those tiles at the per-pixel level, used by the texture sampling code.
So (1) is responsible for dealing with the DTEX file, but (2) is only responsible for accessing deep data from tiles, and does not worry about the DTEX file. These classes will live in aqsis/renderer/render.
