<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Leif Node</title>
	<atom:link href="https://leifnode.com/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>https://leifnode.com</link>
	<description>Leif Erkenbrach&#039;s programming blog</description>
	<lastBuildDate>Wed, 22 Jul 2015 23:00:42 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=3.9.40</generator>
	<item>
		<title>Voxel Cone Traced Global Illumination</title>
		<link>https://leifnode.com/2015/05/voxel-cone-traced-global-illumination/</link>
		<comments>https://leifnode.com/2015/05/voxel-cone-traced-global-illumination/#comments</comments>
		<pubDate>Mon, 04 May 2015 07:06:11 +0000</pubDate>
		<dc:creator><![CDATA[Leif Erkenbrach]]></dc:creator>
				<category><![CDATA[DirectX]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Shading]]></category>

		<guid isPermaLink="false">http://leifnode.com/?p=500</guid>
	<description><![CDATA[I&#8217;ve been looking at voxel cone traced global illumination for a while as something that I want to implement since it gives a decent approximation of global illumination in real time for dynamic scenes. In the past month I&#8217;ve finally given myself a chance to look at the algorithm more in-depth and try implementing it. Voxel Cone Traced Global …<p> <a class="continue-reading-link" href="https://leifnode.com/2015/05/voxel-cone-traced-global-illumination/">Continue reading<i class="icon-right-dir"></i></a></p>]]></description>
				<content:encoded><![CDATA[<p><a href="http://leifnode.com/wp-content/uploads/2015/05/VXGI1.png"><img class="alignnone wp-image-501 size-large" src="http://leifnode.com/wp-content/uploads/2015/05/VXGI1-1024x576.png" alt="VXGI1" width="920" height="517" /></a></p>
<p>I&#8217;ve been looking at voxel cone traced global illumination for a while as something that I want to implement since it gives a decent approximation of global illumination in real time for dynamic scenes. In the past month I&#8217;ve finally given myself a chance to look at the algorithm more in-depth and try implementing it.</p>
<h3>Voxel Cone Traced Global Illumination</h3>
<p>Voxel cone traced global illumination allows real-time evaluation of indirect lighting. It works by voxelizing a scene into a structure on the GPU that stores outgoing radiance and occlusion. Then the scene is rendered as normal, but cones are cast through the volume from each fragment to approximate indirect diffuse and specular lighting.</p>
<h3>Voxelization</h3>
<p><a href="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-05-03-14-10-27-55.png"><img class="alignnone size-large wp-image-510" src="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-05-03-14-10-27-55-1024x576.png" alt="Novus-VXGITest" width="920" height="517" /></a></p>
<p>The first step of the algorithm is to voxelize the scene. The original implementation builds a sparse octree structure on the GPU. The octree implementation helps reduce memory usage on the GPU significantly so it is possible to voxelize the scene at higher resolution, but traversing the structure is not incredibly fast.</p>
<p>Instead of using a sparse voxel octree I just used a 3D texture in my implementation to simplify the cone tracing and mip mapping steps.</p>
<p>The <a href="http://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf">OpenGL Insights chapter</a> about voxelization using the hardware rasterizer was helpful for getting an idea of how to do voxelization. The method for averaging colors on voxels using interlocked operations was useful, but I ran into some problems when using it with DirectX.</p>
<p>When voxelizing, geometry is passed through a geometry shader that computes each triangle&#8217;s face normal and uses it to find the dominant axis to project the triangle onto. Once the triangle is projected onto its dominant axis, it gets passed through a pixel shader that writes to the target 3D texture.</p>
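<p>As a rough illustration of the dominant-axis step (a Python sketch, not the actual HLSL; the function names are mine), the axis is simply the largest absolute component of the face normal, and the projection drops that coordinate:</p>

```python
def dominant_axis(normal):
    """Return 0, 1, or 2 for the axis with the largest |normal| component."""
    magnitudes = [abs(c) for c in normal]
    return magnitudes.index(max(magnitudes))

def project_to_axis(point, axis):
    """Drop the dominant coordinate so the triangle rasterizes on that plane."""
    return tuple(c for i, c in enumerate(point) if i != axis)

# A triangle facing mostly along +Z gets projected onto the XY plane.
axis = dominant_axis((0.1, 0.2, 0.97))
flat = project_to_axis((1.0, 2.0, 3.0), axis)
```

Projecting along the dominant axis maximizes the triangle's rasterized footprint, so fewer voxels are missed by thin triangles.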
<p>I store the diffuse albedo, normal, and emissive color in three 3D textures. All of the textures have the format RGBA8. In total, along with the radiance texture, these end up taking about 350MB of memory on the GPU with 256x256x256 volumes; since they are not sparse volumes there is a lot of unused, wasted space. A 512x512x512 volume takes about 2.5 GB of GPU memory, so I normally stick with the 256 volume.</p>
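<p>A quick back-of-the-envelope check of those figures (assuming 4 bytes per RGBA8 voxel and four dense volumes: albedo, normal, emissive, and radiance; the anisotropic mip chain adds more on top, and MB here means MiB):</p>

```python
def volume_bytes(res, bytes_per_voxel=4):
    """Size of one dense res^3 volume texture."""
    return res ** 3 * bytes_per_voxel

# Four dense 256^3 RGBA8 volumes: 256 MiB before the anisotropic mips,
# which is in the ballpark of the ~350MB quoted above.
mib_256 = 4 * volume_bytes(256) / 2**20
# At 512^3 the same four volumes already reach 2 GiB (~2.5GB with mips).
mib_512 = 4 * volume_bytes(512) / 2**20
```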
<h3>Injecting Radiance</h3>
<p><a href="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-05-03-14-12-02-52.png"><img class="alignnone wp-image-511 size-large" src="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-05-03-14-12-02-52-1024x576.png" alt="RadianceInjection" width="920" height="517" /></a></p>
<p>Next I render a shadow map from the light&#8217;s perspective. I then run a compute shader on the resulting depth map that unprojects each pixel back into world space and then into voxel volume coordinates. It then gets the diffuse color and normal at the position of the pixel, calculates the diffuse lighting on the voxel, and stores the result in a radiance texture.</p>
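<p>The mapping from an unprojected world-space position into voxel volume coordinates might look like this (a Python sketch only; the parameter names and volume bounds are illustrative, not code from the implementation):</p>

```python
def world_to_voxel(position, volume_min, volume_size, resolution):
    """Map a world-space point into integer voxel coordinates.
    Returns None when the point falls outside the volume."""
    voxel = []
    for coord, lo in zip(position, volume_min):
        i = int((coord - lo) / volume_size * resolution)
        if not 0 <= i < resolution:
            return None
        voxel.append(i)
    return tuple(voxel)

# A 20-unit volume centered at the origin, voxelized at 256^3:
center = world_to_voxel((0.0, 0.0, 0.0), (-10.0, -10.0, -10.0), 20.0, 256)
```

The compute shader would then fetch albedo and normal at that voxel, evaluate the diffuse term, and write the result into the radiance texture.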
<h3>Mip Mapping</h3>
<div id="attachment_514" style="width: 930px" class="wp-caption alignnone"><a href="http://leifnode.com/wp-content/uploads/2015/05/VoxelMipMap.png"><img class="size-large wp-image-514" src="http://leifnode.com/wp-content/uploads/2015/05/VoxelMipMap-1024x576.png" alt="Anisotropic voxels mip for the -Z direction" width="920" height="517" /></a><p class="wp-caption-text">Anisotropic voxels mip for the -Z direction</p></div>
<p>In order to do cone tracing the volume needs to be mip mapped. I mip map the volume into anisotropic voxels in the same way that the original implementation does. Anisotropic voxels store a color that varies based on the direction that the voxel gets sampled from.</p>
<p>I store a color for positive and negative X, Y, and Z. These values are calculated by taking the 8 child values that feed into the upper mip and doing volume integration over those voxels along the direction that the anisotropic voxel represents.</p>
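<p>Sketched in Python (my reading of the technique, using premultiplied (color, alpha) pairs; scalar colors keep the sketch short), the per-direction integration composites each front/back child pair and averages the four results:</p>

```python
def composite(front, back):
    """Front-to-back 'over' operator on premultiplied (color, alpha) pairs."""
    fc, fa = front
    bc, ba = back
    return (fc + (1.0 - fa) * bc, fa + (1.0 - fa) * ba)

def anisotropic_parent(pairs):
    """pairs: the four (front, back) child pairs along one axis direction.
    Composite each pair along the direction, then average the results."""
    composited = [composite(f, b) for f, b in pairs]
    color = sum(c for c, _ in composited) / 4.0
    alpha = sum(a for _, a in composited) / 4.0
    return color, alpha
```

This is what makes the voxel's appearance directional: an opaque front child fully hides its back child along that axis, but not along the other five.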
<p>Since DirectX has no 3D texture array structure, I just store the colors for the different axes of the anisotropic voxels in one larger 3D texture whose X dimension is extended to 6x the mip&#8217;s X size. This is not that terrible for memory usage since it only applies to the upper mips, not the base volume. Since I normally use a 256x256x256 volume, this means the first mip level ends up being 768x128x128.</p>
<p>Because of this, the anisotropic voxel volume is stored as a different texture than the base radiance texture.</p>
<h3>Cone Tracing</h3>
<p>Cone tracing is a way to approximate the result of casting many rays into a scene with a distribution on a lobe. This is done by taking samples along a ray, but as the samples get further from the ray origin the sampled mip map level increases.</p>
<div id="attachment_528" style="width: 930px" class="wp-caption alignnone"><a href="http://leifnode.com/wp-content/uploads/2015/05/ConeSamples.jpg"><img class="size-large wp-image-528" src="http://leifnode.com/wp-content/uploads/2015/05/ConeSamples-1024x576.jpg" alt="Each circle is a sample along a line, the size of the circles corresponds to the mip map sampling level" width="920" height="517" /></a><p class="wp-caption-text">Each circle is a sample along a ray; the size of the circles corresponds to the mip map sampling level</p></div>
<p>It&#8217;s not possible to get perfect spheres during sampling, but quadrilinearly interpolating the sampled colors works well enough to approximate a sphere.</p>
<p>Each sample is accumulated based on the sample&#8217;s occlusion value and color as the cone traces outwards. Once the accumulated occlusion is close to 1.0 the cone tracing stops.</p>
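<p>Put together, the marching loop can be sketched in Python (hypothetical parameters; <code>sample_volume</code> stands in for the quadrilinear texture fetch):</p>

```python
import math

def trace_cone(sample_volume, origin, direction, aperture, voxel_size, max_dist):
    """Accumulate premultiplied (color, occlusion) along a cone.
    aperture is the full cone angle in radians."""
    color, occlusion = 0.0, 0.0
    dist = voxel_size  # start one voxel out to avoid self-sampling
    k = 2.0 * math.tan(aperture / 2.0)  # footprint diameter per unit distance
    while dist < max_dist and occlusion < 0.999:
        diameter = max(voxel_size, k * dist)
        mip = math.log2(diameter / voxel_size)  # larger footprint -> higher mip
        c, a = sample_volume(origin, direction, dist, mip)
        color += (1.0 - occlusion) * c      # front-to-back accumulation
        occlusion += (1.0 - occlusion) * a
        dist += diameter * 0.5              # step proportional to footprint
    return color, occlusion
```

The step size and mip selection here are one common choice, not the only one; the key property is that the sampled footprint tracks the cone's growing radius.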
<p>This can be seen clearly when cone tracing for the specular reflection cone. As the samples get further from the ray origin their sample radius increases, so reflections of objects close to a surface appear sharp, while objects further from the surface appear blurrier.</p>
<h3>Specular Cone Tracing</h3>
<div id="attachment_519" style="width: 930px" class="wp-caption alignnone"><a href="http://leifnode.com/wp-content/uploads/2015/05/Novus-TestApp-2015-04-15-08-56-35-32.png"><img class="size-large wp-image-519" src="http://leifnode.com/wp-content/uploads/2015/05/Novus-TestApp-2015-04-15-08-56-35-32-1024x576.png" alt="Tracing cones with wide apertures along the view vector reflected along surface normals to approximate specular reflections" width="920" height="517" /></a><p class="wp-caption-text">Tracing cones with wide apertures along the view vector reflected along surface normals to approximate specular reflections</p></div>
<p>I ended up working out a function that maps the roughness value from my physically based shading implementation to a cone aperture, based on the GGX importance sampling function used for image based lighting. This makes it simple to take a material&#8217;s roughness and get a cone that gives a similar apparent roughness to the image based lighting.</p>
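<p>As a purely illustrative stand-in (an assumption for this sketch, not the mapping worked out above), one monotonic roughness-to-aperture mapping treats GGX alpha as roughness squared and widens the cone with it:</p>

```python
import math

def specular_cone_aperture(roughness, min_aperture=math.radians(1.0)):
    """Illustrative only: rougher surfaces trace wider cones.
    Uses alpha = roughness^2 and tan(half angle) = alpha."""
    alpha = roughness * roughness
    return max(min_aperture, 2.0 * math.atan(alpha))
```

Any mapping derived from the GGX lobe would share this shape: near-zero aperture for mirror-like surfaces, approaching a wide diffuse-like cone as roughness goes to 1.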
<div id="attachment_532" style="width: 930px" class="wp-caption alignnone"><a href="http://leifnode.com/wp-content/uploads/2015/05/Novus-TestApp-2015-04-16-02-34-40-48.png"><img class="wp-image-532 size-large" src="http://leifnode.com/wp-content/uploads/2015/05/Novus-TestApp-2015-04-16-02-34-40-48-1024x576.png" alt="RoughnessVXGI" width="920" height="517" /></a><p class="wp-caption-text">Only the indirect specular component with roughness to vary cone aperture</p></div>
<h3>Diffuse Cone Tracing</h3>
<p><a href="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-04-18-20-11-01-63.png"><img class="alignnone wp-image-535 size-large" src="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-04-18-20-11-01-63-1024x576.png" alt="DiffuseVXGI" width="920" height="517" /></a>Using cone tracing to figure out indirect diffuse lighting works the same way, but you trace multiple cones in different directions to try to approximate a hemisphere around the surface&#8217;s normal. I use 6 cones with 60 degree apertures: one points in the direction of the surface normal and 5 others circle around it.</p>
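<p>Generating that cone arrangement in tangent space (normal along +Z) is straightforward; the ring elevation below is an assumed value, chosen so the six 60-degree cones roughly cover the hemisphere:</p>

```python
import math

def diffuse_cone_directions(ring_angle_deg=60.0):
    """Tangent-space directions (normal = +Z): one cone along the normal and
    five in a ring around it. ring_angle_deg is the assumed elevation of the
    ring, measured from the normal."""
    dirs = [(0.0, 0.0, 1.0)]
    s = math.sin(math.radians(ring_angle_deg))
    c = math.cos(math.radians(ring_angle_deg))
    for i in range(5):
        phi = 2.0 * math.pi * i / 5.0
        dirs.append((s * math.cos(phi), s * math.sin(phi), c))
    return dirs
```

Each direction would then be rotated into the surface's tangent frame before tracing; the cone results are typically weighted by the cosine of the ring angle.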
<p>In this stage it&#8217;s also possible to approximate ambient occlusion by using the average occlusion of all of the diffuse cones.</p>
<h3>Final Composition</h3>
<p>Now it&#8217;s pretty simple to add the diffuse and specular components into the direct lighting. I have yet to put in the rest of the physically based rendering work though, so more specular surfaces can appear metallic. <a href="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-04-18-20-10-57-58.png"><img class="alignnone size-large wp-image-538" src="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-04-18-20-10-57-58-1024x576.png" alt="VXGIFinal" width="920" height="517" /></a></p>
<h3>Emissive Materials</h3>
<p>Voxel cone tracing also makes it pretty simple to add direct lighting from emissive materials. This is done simply by adding the emissive color of a material to the radiance volume.</p>
<p>This is useful because it supports arbitrarily shaped lights with emissive colors that can vary across the surfaces of objects. This makes it possible to approximate the illumination from area lights easily.</p>
<p><a href="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-05-03-18-57-18-28.png"><img class="alignnone size-large wp-image-540" src="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-05-03-18-57-18-28-1024x576.png" alt="EmissiveCubes" width="920" height="517" /></a></p>
<p><a href="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-05-03-18-57-04-63.png"><img class="alignnone wp-image-541 size-large" src="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-05-03-18-57-04-63-1024x576.png" alt="EmissiveBlocks2" width="920" height="517" /></a></p>
<p><a href="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-05-03-18-57-34-81.png"><img class="alignnone size-large wp-image-543" src="http://leifnode.com/wp-content/uploads/2015/05/Novus-VXGITest-2015-05-03-18-57-34-81-1024x576.png" alt="EmissiveBlocks3" width="920" height="517" /></a></p>
<h3>Performance</h3>
<p>There&#8217;s still a lot that I want to optimize. At the moment everything runs with decent frame rates, but there&#8217;s not much room left for other GPU work. I&#8217;m running these tests on my GTX 980.</p>
<p>When the only thing that needs to be done is cone tracing, each frame only takes about 4 ms @ 720p and 10 ms @ 1080p.</p>
<p>The whole injection and mip mapping process takes about 4 ms on top of that so if the directional light is moving then that will be added to the total.</p>
<p>Re-voxelizing the entire Sponza scene also takes about 4 ms and is necessary at the moment if there are dynamic objects since I&#8217;m not flagging static geometry or anything.</p>
<h3>Drawbacks</h3>
<p>The most significant drawback of my implementation is that it takes a significant chunk of GPU memory to store the scene. My implementation takes ~350MB with a volume resolution of 256x256x256, and that&#8217;s for a comparatively small scene. Because of this it&#8217;s not practical to make the volume any higher resolution than it currently is; a volume with 512 resolution takes ~2.5GB. This is mostly an issue because it becomes hard to scale to larger scenes while maintaining decent quality. Even on a small scene like Sponza with a 256x256x256 volume, each voxel is about 10 cm wide.</p>
<p>Performance is also a major concern. While my implementation is not yet optimized that well, the cone tracing step performs better than the sparse octree tracing since there&#8217;s a lot less cache thrashing.</p>
<h3>Problems at the Moment</h3>
<p>At the moment I&#8217;m not mip mapping the radiance volume using a gaussian kernel. I started out doing this, but when I switched over to anisotropic voxels I did not get around to implementing it again. This causes a lot of banding when sampling for specular and diffuse. As a workaround I continue each cone until its occlusion reaches 0.999, but this ruins the occlusion on diffuse and specular, so color bleeds through occluding geometry. Proper filtering would probably also alleviate the apparent flickering in indirect diffuse and specular illumination from dynamic objects.</p>
<p>I converted the interlocked average used during voxelization to HLSL syntax using InterlockedCompareExchange, but when I try to use it to average values across multiple output textures the shader appears to deadlock because of thread scheduling.</p>
<h3>Other Storage Methods</h3>
<p>I did the most basic implementation by just using a plain 3D texture. There are several other ways that the voxel structure can be stored.</p>
<h5>Sparse Voxel Octrees</h5>
<p>The first alternative is to store voxels in a sparse octree structure. This is what the original implementation uses and allows much more effective use of memory since it does not store anything for empty areas of the volume. Though there is a performance tradeoff due to needing to traverse the tree.</p>
<h5>Sparse Voxel DAGs</h5>
<p>There&#8217;s an extension to the SVO technique called <a href="http://www.cse.chalmers.se/~uffe/HighResolutionSparseVoxelDAGs.pdf">Sparse Voxel Directed Acyclic Graphs</a>. These work by taking the basic sparse voxel octree and merging identical nodes together and redirecting the pointers of parent nodes to single nodes. This is capable of decreasing the memory footprint further. However, it seems like it would not work well unless you just need to store occlusion values like in the paper&#8217;s implementation. If you store more data such as diffuse albedo, then it would become much less likely to find identical child nodes to the point where it would probably not be worth the extra time to build the tree.</p>
<h5>Cascaded Voxel Volumes</h5>
<p>Another method is to extend the concept of cascaded shadow maps to VXGI. This uses multiple volumes with identical resolution but different scales, centered around the player, that voxelize the scene at varying levels of coverage and spatial resolution. The Tomorrow Children by Q-Games <a href="http://fumufumu.q-games.com/archives/Cascaded_Voxel_Cone_Tracing_final.pdf">does this</a> to get efficient coverage of large scenes on the PS4. They also stagger the updates of each cascade across frames and prioritize the one closest to the player. It seems like NVIDIA&#8217;s <a href="https://developer.nvidia.com/vxgi">recent implementation</a> of VXGI in Unreal Engine 4 also does this based on the observation that specular reflections lose quality at greater distances from the player.</p>
<h5>Tiled Volume Textures</h5>
<p>Finally, DirectX 11.3 and 12 bring <a href="https://msdn.microsoft.com/en-us/library/windows/desktop/dn914605%28v=vs.85%29.aspx">volume tiled resources</a> as a feature that primarily targets the memory issues while maintaining high performance. Tiled resources were first added to DirectX 11.2, which shipped with Windows 8.1, but without support for 3D textures. Tiled resources expose some of the virtual addressing capability of graphics cards, letting you use high resolution textures that would otherwise not fit on the GPU by keeping only part of the texture loaded at any given time. id Tech 5 used virtual texturing for Rage, which is close to the same thing as tiled resources, but it had to spend a significant amount of time streaming and managing texture pages since the engine handled all of the paging itself.</p>
<p>Having tiled resources as a hardware feature makes virtual texturing both easier to implement and more efficient. It allows the performance of a plain 3D texture while letting you mark blocks of the texture as unused, maintaining some of the sparseness of SVOs and DAGs. This will probably be a good enough compromise for memory, though the mapping of tiles in the volume texture needs to be done by the CPU. This means dynamic parts of the scene would need to flag which bricks should be active, write the list of bricks back to the CPU, and have the CPU mark the bricks as active or inactive. Reading back to the CPU takes a few frames, so dynamic objects may not get fully voxelized across brick boundaries until the CPU can mark the bricks as used.</p>
<h3>Where to Go From Here</h3>
<p>First I am going to implement the correct filtering on the radiance volume so that I can get more correct-looking occlusion.</p>
<p>I also want to implement the sparse octree structure and tracing just to use it for comparing to other implementations on performance and memory usage. I am inclined to try a DAG with this, but I don&#8217;t think it will be that worthwhile for this application so I&#8217;m not sure if I&#8217;ll get around to that.</p>
<p>I really want to implement the sparse 3D texture when I can, but at the moment Microsoft has not released the public SDK for DirectX 11.3 and 12. I&#8217;m waiting for a response to my application to the early access program, but have not gotten a response in the past month so I&#8217;m not confident on that. Along with this I want to do more to manage voxelization of scene geometry so that I can mark static and dynamic geometry.</p>
<h3>Resources</h3>
<p>-<a href="https://research.nvidia.com/sites/default/files/publications/GIVoxels-pg2011-authors.pdf">Interactive Indirect Illumination Using Voxel Cone Tracing</a></p>
<p>-<a href="http://simonstechblog.blogspot.com/2013/01/implementing-voxel-cone-tracing.html">Implementing Voxel Cone Tracing</a></p>
<p>-<a href="http://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf">Octree-Based Sparse Voxelization Using the GPU Hardware Rasterizer</a></p>
<p>-<a href="http://m.gpucomputing.net/sites/default/files/papers/1923/CNLE09.pdf">GigaVoxels: Ray-Guided Streaming for Efficient and Detailed Voxel Rendering</a></p>
<p>-<a href="http://fumufumu.q-games.com/archives/Cascaded_Voxel_Cone_Tracing_final.pdf">Cascaded Voxel Cone Tracing in The Tomorrow Children</a></p>
<p>-<a href="http://www.cse.chalmers.se/~kampe/highResolutionSparseVoxelDAGs.pdf">High Resolution Sparse Voxel DAGs</a></p>
]]></content:encoded>
			<wfw:commentRss>https://leifnode.com/2015/05/voxel-cone-traced-global-illumination/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Tiled Deferred Shading</title>
		<link>https://leifnode.com/2015/05/tiled-deferred-shading/</link>
		<comments>https://leifnode.com/2015/05/tiled-deferred-shading/#comments</comments>
		<pubDate>Sat, 02 May 2015 04:55:28 +0000</pubDate>
		<dc:creator><![CDATA[Leif Erkenbrach]]></dc:creator>
				<category><![CDATA[DirectX]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Shading]]></category>

		<guid isPermaLink="false">http://leifnode.com/?p=463</guid>
		<description><![CDATA[I&#8217;ve been looking at a lot of resources on current rendering algorithms to get nice looking real-time graphics and thought that it&#8217;s time that I actually go and implement some of them. This is the first project that I worked on in a series of three that I used to improve my understanding of some graphics algorithms. I&#8217;m also using …<p> <a class="continue-reading-link" href="https://leifnode.com/2015/05/tiled-deferred-shading/">Continue reading<i class="icon-right-dir"></i></a></p>]]></description>
				<content:encoded><![CDATA[<h3><a href="http://leifnode.com/wp-content/uploads/2015/05/TiledDeferredLights.png"><img class="alignnone wp-image-477 size-large" src="http://leifnode.com/wp-content/uploads/2015/05/TiledDeferredLights-1024x576.png" alt="TiledDeferredLights" width="920" height="517" /></a></h3>
<p>I&#8217;ve been looking at a lot of resources on current rendering algorithms to get nice looking real-time graphics and thought that it&#8217;s time that I actually go and implement some of them. This is the first project that I worked on in a series of three that I used to improve my understanding of some graphics algorithms. I&#8217;m also using physically based shading in some of these screen shots.</p>
<h3>Deferred Shading</h3>
<p>Deferred shading is an alternative method of doing lighting calculations for a 3D scene. The traditional method, forward shading, renders each object to the back buffer and does lighting calculations for that object at the same time. Forward shading has long been the primary method used by rasterizer-based renderers. Standard forward rendering has the drawback that it quickly becomes hard to manage when you want more than a few dynamic lights affecting an object. The most common solutions are to either pass an array of lights into each shader and let the shader evaluate the shading for all lights in the list on each object, or to render the same object multiple times with additive blending for each light that affects the object.</p>
<p>Deferred shading does as its name says: it defers the lighting calculations until all objects have been rendered, and then shades the whole scene in one pass. This is done by rendering information about each object&#8217;s surface to a set of render targets; this set of render targets is normally called the G-buffer.</p>
<p>For instance, one of the render targets stores the normals of each object encoded into the 0 to 1 range:</p>
<p><a href="http://leifnode.com/wp-content/uploads/2015/05/DeferredNormalRT.png"><img class="alignnone wp-image-464 size-large" src="http://leifnode.com/wp-content/uploads/2015/05/DeferredNormalRT-1024x576.png" alt="DeferredNormalRT" width="920" height="517" /></a></p>
<p>This is also done with diffuse albedo, specular color and power, depth, and emissive. Each gets packed into its own texture.</p>
<p>At the moment I&#8217;m using textures packed with the following formats:</p>
<ul>
<li>Diffuse: RGBA8 texture</li>
<li>Specular Color &amp; Roughness: RGBA8 texture where the alpha is the roughness</li>
<li>World Space Normal: RGBA16 texture where alpha is currently unused</li>
<li>Emissive Color: RGBA16 texture</li>
<li>Depth: R32 texture</li>
</ul>
<p>A while back I made a very basic implementation of deferred shading that would just render a scene to a G buffer and then draw a quad on the screen that evaluated lighting from hundreds of point lights at each fragment. This ended up running pretty poorly since it was just shading with brute force. I ended up just going with forward shading for a while and using a single directional light for many of my projects afterwards.</p>
<p>I looked around some and found a number of culling techniques that could significantly improve deferred rendering performance. A few involved drawing proxy geometry that approximated the bounds of each type of light and evaluating lighting by sampling from the G buffer for each fragment that the geometry touched. This can be implemented with varying complexity of proxy geometry. Some implementations just used billboarded quads with enough width and height in world space to approximate the bounds of the area that the light influences. For instance, a point light would just have a quad with a width and height matching the light&#8217;s radius of influence. Other implementations actually draw 3D proxy geometry like spheres for point lights and cones for spotlights.</p>
<p>These implementations have the issue that they require many additional samples of the G buffer. Each light still needs to sample the G buffer for each texture that it has; in my case 5 textures. So each fragment of the G buffer gets sampled 5 * the number of lights affecting that fragment. Additionally these techniques incur a lot of overdraw since many of the proxy geometry objects will overlap and cannot be culled most of the time.</p>
<h3>Tiled Deferred Shading</h3>
<p>Tiled Deferred Shading allows you to avoid the overdraw and only needs to sample each G buffer texture once so it&#8217;s generally capable of performing much better than using proxy geometry. The main resource I found on tiled deferred shading was this <a href="https://software.intel.com/en-us/articles/deferred-rendering-for-current-and-future-rendering-pipelines">presentation and implementation</a>.</p>
<p>Tiled deferred shading takes a different approach and does all of the culling of lights in the same pass as the shading calculations as opposed to using proxy geometry to do all the shading and executing multiple times per fragment.</p>
<p>Tiled deferred shading splits the viewport frustum into many smaller frustums in a grid of tiles that are extended along z in view space and does the culling on each of those frustums.</p>
<p>This is what a grid of tiles looks like when colored by the number of lights intersecting each frustum:</p>
<p><a href="http://leifnode.com/wp-content/uploads/2015/05/TiledDeferredTiles.png"><img class="alignnone wp-image-473 size-large" src="http://leifnode.com/wp-content/uploads/2015/05/TiledDeferredTiles-1024x576.png" alt="TiledDeferredTiles" width="920" height="517" /></a></p>
<p>In my implementation I split the screen into 16&#215;16 tiles using a compute shader with 16x16x1 sized thread groups. Each tile contains a list of indices into a global array of lights. The tiled deferred shader starts by constructing a frustum for each tile, capped by the minimum and maximum depth within that 16&#215;16 tile of the depth texture. This is why edges are highlighted in the previous picture: the frustums on those tiles have a much larger range between minimum and maximum depth, so they have the potential to intersect more point lights.</p>
<p>Once the frustums are constructed, each tile checks its frustum against the lights. If a light overlaps the frustum, the light&#8217;s index in the global light list is appended to the tile&#8217;s index list, which is stored in group shared memory.</p>
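<p>The overlap check itself is a conservative sphere-frustum test. Sketched in Python (the plane representation is an assumption for this sketch):</p>

```python
def sphere_intersects_frustum(center, radius, planes):
    """planes: (nx, ny, nz, d) tuples with normals pointing into the frustum,
    so the signed distance n . center + d is positive inside."""
    for nx, ny, nz, d in planes:
        dist = nx * center[0] + ny * center[1] + nz * center[2] + d
        if dist < -radius:  # sphere entirely behind one plane: cull it
            return False
    return True  # conservative: may keep some spheres that are outside

# Axis-aligned box as a degenerate "frustum" covering -1..1 on each axis:
planes = [(1, 0, 0, 1), (-1, 0, 0, 1), (0, 1, 0, 1),
          (0, -1, 0, 1), (0, 0, 1, 1), (0, 0, -1, 1)]
```

A light only needs to be rejected by one plane to be culled, which is what makes the test cheap per thread.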
<p>Finally, once the index list for the tile is constructed, each thread loops through the index list and accumulates the lighting from all lights in the list.</p>
<p>Here&#8217;s what the Sponza scene looks like with 512 lights (I&#8217;m also using a physically based BRDF but I won&#8217;t talk about that in this post):</p>
<p><a href="http://leifnode.com/wp-content/uploads/2015/05/TiledDeferred512Lights.png"><img class="alignnone size-large wp-image-480" src="http://leifnode.com/wp-content/uploads/2015/05/TiledDeferred512Lights-1024x576.png" alt="TiledDeferred512Lights" width="920" height="517" /></a></p>
<h3>Performance</h3>
<p>Performance really depends on several factors so it&#8217;s difficult to gauge for all scenarios. All measurements were made on my GTX 980. With my current settings I can render the Sponza scene at 1080p with 512 lights in 16 ms a frame. 256 lights generally takes 4.5 ms and 1024 takes 30 ms.</p>
<p>In larger scenes the performance would be much better since there will generally be far fewer lights intersecting any given tile.</p>
<h3>Drawbacks</h3>
<p>Tiled deferred shading is not a magic bullet, and has some major problems associated with it.</p>
<p>The first is that it can take a lot of memory, especially with higher resolution outputs. With my current buffer layout I&#8217;m being pretty greedy since I want extra precision on normals and a large range on emissive without encoding an intensity. At 1080p my G buffer takes up 55 MB on the GPU. This is less of a problem on newer GPUs and consoles with 2, 4, and 8GB of GPU memory. However, it still grows with resolution and can eat away at the texture budget. If you&#8217;re rendering at 4K then the G buffer will suddenly be 236MB.</p>
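<p>Those numbers follow directly from the layout listed earlier, at 28 bytes per pixel (a quick Python check; MB here means MiB):</p>

```python
# Bytes per pixel for the G-buffer layout described above.
BYTES_PER_PIXEL = {
    "diffuse_rgba8": 4,
    "specular_roughness_rgba8": 4,
    "normal_rgba16": 8,
    "emissive_rgba16": 8,
    "depth_r32": 4,
}  # 28 bytes per pixel total

def gbuffer_mib(width, height):
    return width * height * sum(BYTES_PER_PIXEL.values()) / 2**20

at_1080p = gbuffer_mib(1920, 1080)  # ~55 MiB
at_4k = gbuffer_mib(3840, 2160)     # exactly 4x the 1080p size
```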
<p>The second and largest issue is that there&#8217;s no way to do transparency unless you store a deep G buffer with data for multiple fragments per pixel which makes the memory situation far worse. Most engines just do a forward pass after the deferred pass to render all of the transparent geometry, but it&#8217;s harder to shade the transparent geometry efficiently.</p>
<p>Another issue is that it&#8217;s difficult to change the shading model per piece of geometry. If you want to shade most of the level using a standard microfacet BRDF, then render some characters with a subsurface scattering effect on their skin, then render their eyes, then render some cars with clearcoat surfaces, it&#8217;s not going to work using just basic deferred shading. In these situations I&#8217;ve seen engines do one of two things: some build the shading models into the tiled deferred shader and dynamically branch between them based on a value from the G buffer; others do a forward rendering pass with the alternate shading model.</p>
<p>The final significant issue is that it&#8217;s difficult to do good quality anti-aliasing with decent performance. Most engines that use deferred shading pipelines use FXAA or some other screen space anti-aliasing solution. The problem with FXAA is that it tends to blur details that shouldn&#8217;t be blurred. It is possible to do MSAA with deferred shading by rendering to a buffer with storage for multiple samples, then finding the edges of objects in the G buffer and executing shading for all samples on the edges. However, this has a more significant performance hit with deferred shading than it does with forward shading because of the additional texture sampling.</p>
<h3>Improvements to Tiled Deferred Shading</h3>
<p>It&#8217;s possible to improve the culling of lights in various ways so that as few lights as possible are falsely detected as overlapping a tile. I have yet to try any of these in my implementation.</p>
<p>The first way is to change how the intersection tests are performed. The standard algorithm uses simple sphere-frustum intersection tests, but alternative tests can be more effective. Constructing a bounding box around the frustum and additionally checking against it removes overlaps that the frustum-sphere test wrongly detects, and the bounding box test also does better on its own. Iñigo Quilez has a good article on false positives with frustum tests <a href="http://www.iquilezles.org/www/articles/frustumcorrect/frustumcorrect.htm">here</a>.</p>
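<p>The extra bounding-box check is just a standard sphere-vs-AABB test. A minimal sketch (my own types, not code from my engine):</p>

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };
struct Sphere { Vec3 center; float radius; };
struct AABB { Vec3 min, max; };

// Sphere vs axis-aligned box: clamp the sphere center to the box and
// compare the squared distance to the squared radius. Run alongside the
// plane-based sphere-frustum test, this rejects lights sitting just past
// a frustum corner, where the plane test alone reports a false overlap.
bool sphereIntersectsAABB(const Sphere& s, const AABB& b)
{
    float dx = std::max(b.min.x, std::min(s.center.x, b.max.x)) - s.center.x;
    float dy = std::max(b.min.y, std::min(s.center.y, b.max.y)) - s.center.y;
    float dz = std::max(b.min.z, std::min(s.center.z, b.max.z)) - s.center.z;
    return dx * dx + dy * dy + dz * dz <= s.radius * s.radius;
}
```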
<p>The second way is to partition the frustum more effectively, and there are several ways people have done this. One is to split each tile frustum in two along its depth, halfway between the tile&#8217;s minimum and maximum depth, then find the minimum and maximum depths of the two halves and make two separate frustums to check against. This is what Unreal Engine 4 does, and it resolves depth discontinuities decently. It&#8217;s also possible to take a simpler approach and just split the lights into two lists, one for the near half and one for the far half. The correct list then gets chosen in the shading stage depending on the depth of the fragment being shaded.</p>
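<p>The simpler two-list variant can be sketched like this (names and the light's depth representation are my own simplification): each light's depth extent is tested against the two halves of the tile's depth range, and a light overlapping both halves lands in both lists.</p>

```cpp
#include <vector>

// A light's depth extent inside the tile, plus its index in the light array.
struct LightZ { int id; float zMin, zMax; };

// Split the tile's depth range at its midpoint and bin each light into
// whichever half (or halves) it overlaps. At shading time the fragment
// picks nearList or farList based on its own depth.
void splitTileLights(const std::vector<LightZ>& lights,
                     float tileZMin, float tileZMax,
                     std::vector<int>& nearList,
                     std::vector<int>& farList)
{
    float mid = 0.5f * (tileZMin + tileZMax);
    for (const LightZ& l : lights) {
        if (l.zMin <= mid && l.zMax >= tileZMin) nearList.push_back(l.id);
        if (l.zMax >= mid && l.zMin <= tileZMax) farList.push_back(l.id);
    }
}
```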
<p>Determining the minimum and maximum depth values for each tile can also be improved fairly simply. Most implementations of tiled deferred shading just brute force it with InterlockedMin and InterlockedMax. It is possible to get a small performance boost by doing a parallel reduction on each tile, though it requires a separate shader.</p>
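<p>For reference, here is a CPU sketch of the tree-style reduction the compute shader would run per tile: each step halves the active range, with element i combining with element i + stride, just as one thread per i would between group barriers. It assumes the tile's pixel count is a power of two (tile sizes like 16&#215;16 are).</p>

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// CPU simulation of a per-tile parallel min/max reduction. The inner loop
// body is what each GPU thread would execute; the end of each stride pass
// is where a GroupMemoryBarrierWithGroupSync() would go in HLSL.
std::pair<float, float> reduceMinMaxDepth(std::vector<float> depths)
{
    std::vector<float> mins = depths, maxs = depths;
    for (size_t stride = depths.size() / 2; stride > 0; stride /= 2) {
        for (size_t i = 0; i < stride; ++i) {  // one thread per i on the GPU
            mins[i] = std::min(mins[i], mins[i + stride]);
            maxs[i] = std::max(maxs[i], maxs[i + stride]);
        }
        // barrier between passes on the GPU
    }
    return { mins[0], maxs[0] };  // lane 0 holds the tile's min and max
}
```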
<h3>Similar Algorithms</h3>
<p>Tiled deferred shading can actually be fairly easily extended to resolve many of the issues with the original implementation; in exchange it requires a pre-pass over the scene to render depth. This technique is normally referred to as Forward+.</p>
<p>Forward+ rendering renders the scene from the viewport&#8217;s perspective to a depth buffer, then runs the same tiled culling algorithm that tiled deferred shading uses on the depth texture and builds a set of index lists in a global buffer. Then all geometry in the scene is rendered a second time and shaded. During the forward shading of the geometry, the shader determines what index list to loop through based on the screen space position of the fragment.</p>
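<p>The lookup in that last step is cheap: the screen is divided into fixed-size tiles and the fragment's position flattens directly into a tile index. A sketch, assuming 16&#215;16 tiles:</p>

```cpp
#include <cstdint>

constexpr uint32_t kTileSize = 16;  // pixels per tile edge (illustrative)

// Map a fragment's screen position to the flattened tile index that
// selects its per-tile light index list in the global buffer built by
// the culling pass.
uint32_t tileIndexForFragment(uint32_t fragX, uint32_t fragY,
                              uint32_t screenWidth)
{
    uint32_t tilesPerRow = (screenWidth + kTileSize - 1) / kTileSize;
    return (fragY / kTileSize) * tilesPerRow + (fragX / kTileSize);
}
```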
<p>This algorithm generally performs better than tiled deferred shading with fewer than 2048 lights, and it sees performance gains across the board when MSAA performance is compared between the two. It also solves the problems with transparent geometry and varying shading models.</p>
<p>Another more recent extension to tiled deferred shading is an algorithm called Clustered Deferred Shading. Clustered deferred shading does frustum culling in three dimensions instead of two by additionally splitting the frustums along the Z axis with exponential spacing. It can also be extended similarly to work for forward shading.</p>
<p>Instead of doing a brute force check of every light against every cell of the frustums, it does hierarchical culling by building an octree out of the cells. It then merges cells with identical index lists and stores them in a page table, which allows for much faster and more effective culling. In the <a href="http://www.cse.chalmers.se/~uffe/clustered_shading_preprint.pdf">paper&#8217;s</a> implementation they are able to cull 1 million lights in under 6 ms.</p>
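<p>The exponential Z spacing mentioned above is usually computed so that every slice covers the same <em>ratio</em> of depth rather than the same distance. A sketch of the slice lookup (the constants are illustrative, not from the paper):</p>

```cpp
#include <algorithm>
#include <cmath>

// Exponential Z partitioning: slice k spans
// [near * (far/near)^(k/N), near * (far/near)^((k+1)/N)),
// so taking the log of the view-space depth recovers the slice index.
int clusterSlice(float viewZ, float nearZ, float farZ, int numSlices)
{
    float t = std::log(viewZ / nearZ) / std::log(farZ / nearZ);
    int k = static_cast<int>(t * numSlices);
    return std::clamp(k, 0, numSlices - 1);
}
```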
<h3>Where to Go From Here</h3>
<p>I want to go and improve the culling at some point to be tighter about what it can reject, probably using a combination of frustum culling and bounding box culling. Next would be implementing Forward+, but I like the abstraction that deferred shading provides, so I need to figure out a good way to author shaders for Forward+ that&#8217;s not crazy complicated.</p>
<p>I also want to try implementing clustered forward shading, since it seems to allow crazy large numbers of lights in <a href="https://youtu.be/6DyTk7917ZI">real</a> <a href="https://youtu.be/q3V-vHltuAY">time</a>. However, the original implementation uses CUDA for the culling, and I&#8217;m not completely sure whether that means it&#8217;s impossible or just inefficient to implement in DirectX compute shaders. Some other implementations have done the culling on the CPU. I need to look more at the implementation to see if it&#8217;s plausible to do in compute shaders; otherwise I&#8217;ll probably just do the CUDA implementation and interop with DirectX, since I&#8217;ve looked at CUDA before and it does not seem that complex.</p>
<h3>Useful Resources</h3>
<p><em>Andrew Lauritzen </em>- <a href="https://software.intel.com/en-us/articles/deferred-rendering-for-current-and-future-rendering-pipelines">Deferred Rendering for Current and Future Rendering Pipelines</a></p>
<p><em>Gareth Thomas - </em><a href="http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Advancements-In-Tiled-Rendering.ppsx">Advancements In Tiled Rendering</a></p>
<p><em>Ola Olsson, Markus Billeter, and Ulf Assarsson - <a href="http://www.cse.chalmers.se/~uffe/clustered_shading_preprint.pdf">Clustered Deferred and Forward Shading</a></em></p>
]]></content:encoded>
			<wfw:commentRss>https://leifnode.com/2015/05/tiled-deferred-shading/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Video Reel</title>
		<link>https://leifnode.com/2015/04/video-reel/</link>
		<comments>https://leifnode.com/2015/04/video-reel/#comments</comments>
		<pubDate>Tue, 28 Apr 2015 00:17:15 +0000</pubDate>
		<dc:creator><![CDATA[Leif Erkenbrach]]></dc:creator>
				<category><![CDATA[DirectX]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[VR]]></category>

		<guid isPermaLink="false">http://leifnode.com/?p=447</guid>
		<description><![CDATA[It&#8217;s been a while since I&#8217;ve done any updates, and I need to write some updates on specifics of stuff I&#8217;ve been doing. I put together a reel of some of my older projects along with some of my more recent projects that I have not had a chance to update about for a show my college was doing so …<p> <a class="continue-reading-link" href="https://leifnode.com/2015/04/video-reel/">Continue reading<i class="icon-right-dir"></i></a></p>]]></description>
				<content:encoded><![CDATA[<p><a href="http://leifnode.com/wp-content/uploads/2015/04/Novus-VXGITest-2015-04-18-20-11-08-03.png"><img class="alignnone size-large wp-image-448" src="http://leifnode.com/wp-content/uploads/2015/04/Novus-VXGITest-2015-04-18-20-11-08-03-1024x576.png" alt="VXGI" width="920" height="517" /></a></p>
<p>It&#8217;s been a while since I&#8217;ve done any updates, and I still need to write about the specifics of what I&#8217;ve been doing. I put together a reel of some of my older projects, along with some of my more recent projects that I have not had a chance to post about, for a show my college was doing, so I&#8217;ll share that here.</p>
<p>Most of my work over the past four months has been on learning and implementing some more advanced shading techniques so I did some work on tiled deferred shading, physically based shading, and voxel cone traced global illumination.<br />
<iframe src="https://www.youtube.com/embed/Z0sUNbnZmjk?rel=0" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>https://leifnode.com/2015/04/video-reel/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LEAP Motion + Oculus Rift Particle Interactions</title>
		<link>https://leifnode.com/2014/10/leap-motion-oculus-rift-particle-interactions/</link>
		<comments>https://leifnode.com/2014/10/leap-motion-oculus-rift-particle-interactions/#comments</comments>
		<pubDate>Wed, 15 Oct 2014 09:34:17 +0000</pubDate>
		<dc:creator><![CDATA[Leif Erkenbrach]]></dc:creator>
				<category><![CDATA[DirectX]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[VR]]></category>

		<guid isPermaLink="false">http://leifnode.com/?p=280</guid>
		<description><![CDATA[I recently put together a demo for the art hop in Burlington. This demo renders 2.5 million particles and colors them based upon their velocity. This demo was written in C++ using Direct3D 11. The particles are simulated on a compute shader that I pass finger positions reported by the LEAP motion in world space. The LEAP is velcroed to …<p> <a class="continue-reading-link" href="https://leifnode.com/2014/10/leap-motion-oculus-rift-particle-interactions/">Continue reading<i class="icon-right-dir"></i></a></p>]]></description>
				<content:encoded><![CDATA[<p><a href="http://leifnode.com/wp-content/uploads/2014/10/LEAP_Particles.png"><img class="alignnone size-large wp-image-281" src="http://leifnode.com/wp-content/uploads/2014/10/LEAP_Particles-1024x576.png" alt="LEAP Particles" width="920" height="517" /></a></p>
<p>I recently put together a demo for the art hop in Burlington. This demo renders 2.5 million particles and colors them based upon their velocity.</p>
<p>This demo was written in C++ using Direct3D 11. The particles are simulated in a compute shader, to which I pass the world-space finger positions reported by the LEAP Motion. The LEAP is velcroed to the front of my Oculus Rift DK1 (while I wait on my DK2). The finger positions are used as attractors for the particles, and I ended up using no distance falloff on the force that the fingers exert. Instead of using just the fingertips as attractors, I use all of the finger joints reported by the LEAP v2 SDK and give more weight to the attractors the further up the fingers they sit.</p>
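<p>The per-particle update amounts to something like the following. This is a CPU sketch with made-up names and constants, not the actual compute shader: each joint pulls with a constant-magnitude (direction-only, no distance falloff) force scaled by its per-joint weight.</p>

```cpp
#include <cmath>

struct Float3 { float x, y, z; };

// Pull a particle toward every hand joint with no distance falloff:
// the pull direction is normalized, so only the joint weight and the
// global strength control the force magnitude.
void attractParticle(Float3& position, Float3& velocity,
                     const Float3* joints, const float* jointWeights,
                     int jointCount, float strength, float dt)
{
    for (int i = 0; i < jointCount; ++i) {
        Float3 d = { joints[i].x - position.x,
                     joints[i].y - position.y,
                     joints[i].z - position.z };
        float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
        if (len < 1e-5f) continue;             // skip degenerate direction
        float s = strength * jointWeights[i] * dt / len;
        velocity.x += d.x * s;
        velocity.y += d.y * s;
        velocity.z += d.z * s;
    }
    position.x += velocity.x * dt;             // simple Euler integration
    position.y += velocity.y * dt;
    position.z += velocity.z * dt;
}
```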
<p>The hands in the picture are my own hands captured using the infrared cameras on the LEAP motion which give a black and white video feed. The hands are rendered by just putting a textured quad that is offset on the Z axis in camera space using the palm position reported by the LEAP. The black and white color is then shifted to be closer to my skin color and any pixels with a brightness below 0.9 are clipped. This is a quick hack that works decently. I want to try rendering screen aligned quads spanning between each of the joints on my fingers using screen space texture coordinates. This will probably provide somewhat higher occlusion quality.</p>
<p>Lining up the images of the hands to match the positions reported by the LEAP was somewhat difficult, especially since most of the sample code LEAP provides for infrared image integration is obscured by a lot of complex code that manually tries to account for the Oculus Rift&#8217;s chromatic aberration shaders, distortion shaders, and other stuff like color grading. It&#8217;s still not quite accurate. Because of how I compensate for the roughly 30mm difference in IPD between the stereo infrared cameras in the LEAP and my eyes, if the video feed is shown without clipping, everything apart from the hands appears to distort when you move your hands close to your face. It&#8217;s actually a pretty interesting effect, since visually the hands seem to be the only things unaffected by the warping.</p>
<p>Something I forgot to do in this demo was twirl my fingers which gives a pretty cool effect.</p>
<p><iframe src="//www.youtube.com/embed/_SJh_3231mk?rel=0" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>https://leifnode.com/2014/10/leap-motion-oculus-rift-particle-interactions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Game Engine Development</title>
		<link>https://leifnode.com/2014/08/game-engine-development/</link>
		<comments>https://leifnode.com/2014/08/game-engine-development/#comments</comments>
		<pubDate>Mon, 25 Aug 2014 22:10:36 +0000</pubDate>
		<dc:creator><![CDATA[Leif Erkenbrach]]></dc:creator>
				<category><![CDATA[DirectX]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://leifnode.com/?p=247</guid>
		<description><![CDATA[I&#8217;ve been spending some of my free time researching and implementing rendering techniques (mostly researching as of this moment). I am currently focusing on developing a game engine and game that utilize the Oculus Rift and the Sixense STEM System. I have posted some videos of what I&#8217;m doing in my YouTube channel: &#160;]]></description>
				<content:encoded><![CDATA[<p><a href="http://leifnode.com/wp-content/uploads/2014/08/Leap_hands.jpg"><img class="alignnone size-large wp-image-249" src="http://leifnode.com/wp-content/uploads/2014/08/Leap_hands-1024x576.jpg" alt="Leap Hands" width="920" height="517" /></a></p>
<p>I&#8217;ve been spending some of my free time researching and implementing rendering techniques (mostly researching as of this moment). I am currently focusing on developing a game engine and game that utilize the Oculus Rift and the Sixense STEM System. I have posted some videos of what I&#8217;m doing on my YouTube channel:</p>
<p>&nbsp;</p>
<p><iframe src="//www.youtube.com/embed/XmcckaopEpo" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe><br />
<iframe src="//www.youtube.com/embed/26o0-8zqFjg" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe><br />
<iframe src="//www.youtube.com/embed/pvpCnoDAnpQ" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe><br />
<iframe src="//www.youtube.com/embed/vcJv266edVA" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>https://leifnode.com/2014/08/game-engine-development/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Compute Texturing</title>
		<link>https://leifnode.com/2014/04/compute-texturing/</link>
		<comments>https://leifnode.com/2014/04/compute-texturing/#comments</comments>
		<pubDate>Tue, 22 Apr 2014 14:25:16 +0000</pubDate>
		<dc:creator><![CDATA[Leif Erkenbrach]]></dc:creator>
				<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://leifnode.com/?p=210</guid>
		<description><![CDATA[Originally I was computing the height of terrain per frame which was pretty wasteful since if I wanted more complex noise functions I would not be able to maintain a runnable frame rate. I moved the computation of terrain height to a compute shader which stores the height in a texture. I was also able to compute and store normals in …<p> <a class="continue-reading-link" href="https://leifnode.com/2014/04/compute-texturing/">Continue reading<i class="icon-right-dir"></i></a></p>]]></description>
				<content:encoded><![CDATA[<p><a href="http://leifnode.com/wp-content/uploads/2014/04/Planet_Diffuse-map.jpg"><img class="alignnone size-large wp-image-214" src="http://leifnode.com/wp-content/uploads/2014/04/Planet_Diffuse-map-1024x576.jpg" alt="Planet Diffuse Map" width="920" height="517" /></a></p>
<p>Originally I was computing the height of the terrain every frame, which was pretty wasteful: with more complex noise functions I would not have been able to maintain a runnable frame rate. I moved the terrain height computation to a compute shader which stores the height in a texture. I was also able to compute and store normals in the rgb channels of the texture.</p>
<p>In order to create a unique texture for each node, I keep a cache that holds all of the textures a node may use in a map, using an <em>unsigned long long</em> to give each texture a unique ID.</p>
<p>As the tree is traversed downwards, the <em>nextId</em> is passed to the next recursive call and used as the <em>currentId</em>. The QUADRANT_ID is an integer from 0 to 3 that identifies the quadrant being split. The ID is built up with bit shifting, using 3 bits to describe each LOD level: the first two bits describe the quadrant, and the last bit gives each node a unique ID even if it has child nodes.</p>
<p>At some point I will probably switch to just holding a tree structure and splitting/joining nodes every frame in order to remove the need for unique IDs, since at the moment the maximum number of levels of detail that can be stored is 21 because a <em>long long</em> only has 64 bits to work with.</p>
<p><a href="http://leifnode.com/wp-content/uploads/2014/04/Planet_Height-map.jpg"><img class="alignnone wp-image-215 size-large" src="http://leifnode.com/wp-content/uploads/2014/04/Planet_Height-map-1024x576.jpg" alt="Planet_Height map" width="920" height="517" /></a></p>
<p>The terrain height map is generated using the same fractional Brownian motion function that I described in my earlier <a title="Procedural Fractal Terrain" href="http://leifnode.com/2014/04/procedural-fractal-terrain/">post</a>. The calculated height is stored in a 2D shared float array in the compute shader which is the size of each group. Prior to the normal calculation, the height value is stored in the shared float array named <em>workGroupHeight</em>.</p>
<pre class="brush: cpp; title: ; notranslate">
workGroupHeight[gl_LocalInvocationID.x][gl_LocalInvocationID.y] = heightValue;
</pre>
<p>&nbsp;</p>
<p><a href="http://leifnode.com/wp-content/uploads/2014/04/Planet_Normal-map.jpg"><img class="alignnone size-large wp-image-216" src="http://leifnode.com/wp-content/uploads/2014/04/Planet_Normal-map-1024x576.jpg" alt="Planet_Normal map" width="920" height="517" /></a></p>
<p>The normal calculation in this step is done in a second pass of the compute shader that generates the height map. Once the height values are calculated and saved to a temporary 2D array of floats, I call the GLSL function <em>barrier()</em>. Since GPUs are highly parallel, each pixel of the height map is processed in its own thread, so some threads can finish sooner than others. This is a problem because the normal computation needs to know the neighboring pixels&#8217; heights: if a thread reads a neighbor&#8217;s value before the thread responsible for it has written it, you get sections of pixels with incorrect normals. Calling <em>barrier()</em> makes every thread wait once it reaches the barrier until all of the other threads have finished, after which the program continues in parallel. Early on I was not calling the correct barrier function and instead used one of the memory barriers, which resulted in a bar code pattern alternating between normals facing the center of the planet and normals that were correctly computed.</p>
<p>I calculate the normal at each point by taking the cross product of the two vectors running from the current point to its neighbors at (x+1, y) and (x, y+1) on the texture. Once the normals have been calculated, I save the final normal and height values to an <em>rgba32f</em> texture using the GLSL function <em>imageStore()</em>.</p>
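<p>The cross-product step can be sketched on the CPU like this, assuming a flat grid with unit texel spacing (the real shader works with points projected onto the sphere):</p>

```cpp
#include <cmath>

struct V3 { float x, y, z; };

// Build the two edge vectors from the current texel toward its (x+1, y)
// and (x, y+1) neighbours, take their cross product, and normalize.
// For dx = (1, 0, a) and dy = (0, 1, b) this yields (-a, -b, 1) before
// normalization, which points "up" for flat terrain.
V3 heightmapNormal(float h, float hRight, float hUp)
{
    V3 dx = { 1.0f, 0.0f, hRight - h };  // toward (x+1, y)
    V3 dy = { 0.0f, 1.0f, hUp - h };     // toward (x, y+1)
    V3 n = { dx.y * dy.z - dx.z * dy.y,  // cross(dx, dy)
             dx.z * dy.x - dx.x * dy.z,
             dx.x * dy.y - dx.y * dy.x };
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return { n.x / len, n.y / len, n.z / len };
}
```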
<p>The normals that I generate in this step are stored in object space, since this texture will only ever be used on this planet in this orientation. This has the advantage that I don&#8217;t need to work with tangent space: I can read the normal directly from the texture&#8217;s rgb channels, remap it from the stored 0.0&#8211;1.0 range back to -1.0&#8211;1.0, and use the resulting vector.</p>
]]></content:encoded>
			<wfw:commentRss>https://leifnode.com/2014/04/compute-texturing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Planetary Scale LOD Terrain Generation</title>
		<link>https://leifnode.com/2014/04/planetary-scale-lod-terrain-generation/</link>
		<comments>https://leifnode.com/2014/04/planetary-scale-lod-terrain-generation/#comments</comments>
		<pubDate>Tue, 22 Apr 2014 12:51:07 +0000</pubDate>
		<dc:creator><![CDATA[Leif Erkenbrach]]></dc:creator>
				<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://leifnode.com/?p=197</guid>
		<description><![CDATA[In the week following my procedural terrain I added a dynamic quadtree-based LOD system in order to be capable of rendering large scale terrain at distances ranging from several kilometers to a few meters. At this point I ran into the issue that floating point is not accurate enough to accomplish centimeter precision at distances of several hundred thousand meters …<p> <a class="continue-reading-link" href="https://leifnode.com/2014/04/planetary-scale-lod-terrain-generation/">Continue reading<i class="icon-right-dir"></i></a></p>]]></description>
				<content:encoded><![CDATA[<p><a href="http://leifnode.com/wp-content/uploads/2014/04/Planet-LOD.jpg"><img class="alignnone size-large wp-image-236" src="http://leifnode.com/wp-content/uploads/2014/04/Planet-LOD-1024x576.jpg" alt="Planet-LOD" width="920" height="517" /></a></p>
<p>In the week following my procedural terrain I added a dynamic quadtree-based LOD system in order to be capable of rendering large scale terrain at distances ranging from several kilometers to a few meters.</p>
<p>At this point I ran into the issue that floating point is not accurate enough to give centimeter precision at distances of several hundred thousand meters from the origin. I still need to move the height calculations into view space so that they do not lose accuracy.</p>
<p>In order to determine what level of detail to use for a given section of terrain, I precalculate a list of distances within which each LOD level applies. I start with the distance at which the most detailed tiles should appear, normally around 50 meters, and for every level of detail up to the maximum depth I double that value: 100, 200, 400, and so on. From the root node I check whether the center of the node is within 50*2^(max level of detail) of the camera. If it is, I split the node into four subsections and check whether each child&#8217;s center is within 50*2^(max level of detail - 1), then 50*2^(max level of detail - 2), &#8230;, 50*2^(max level of detail - n), until the entirety of the tree has been checked. This is done every frame, and every tile in range gets stored in a vector that records the tile&#8217;s level of detail, its scale (which is equivalent to 0.5^(node depth)), and its offset, which is summed together as the tree is traversed downwards.</p>
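<p>The split test above boils down to one comparison per node. A minimal sketch of it, with the 50 m base distance as a parameter:</p>

```cpp
#include <cmath>

// A node at tree depth d (root = 0, maxDepth = most detailed) splits when
// the camera is within baseDistance * 2^(maxDepth - d) of its center, so
// the thresholds double with every level away from the finest one.
bool shouldSplit(float distanceToNodeCenter, int nodeDepth, int maxDepth,
                 float baseDistance = 50.0f)
{
    float threshold = baseDistance * std::ldexp(1.0f, maxDepth - nodeDepth);
    return distanceToNodeCenter < threshold;
}
```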
<p>I generate a single plane which I draw multiple times with different parameters in order to cover the entire quad tree. This plane has four separate index buffers that specify the indices for four subsections of the mesh that constitute its four quadrants. If a quadrant is within range of the player then it gets split into another node and the indices that constitute that area of the parent node are skipped during drawing.<br />
<img class="alignnone wp-image-230 " src="http://leifnode.com/wp-content/uploads/2014/04/QuadTreeLOD.jpg" alt="QuadTreeLOD" width="634" height="443" /></p>
<p>Once the tree has been traversed I draw the mesh for each tile stored in the list of visible tiles. The shaders that draw this take in the offset and scale to transform the plane to fit into its section. After the plane is scaled and offset I project each point onto a sphere using the equation <a href="http://mathproofs.blogspot.com/2005/07/mapping-cube-to-sphere.html">here</a>. Each point is then used as input in a fractional Brownian motion function that uses 3D simplex noise in order to determine the height of the terrain at that point on the unit sphere.</p>
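<p>For reference, the cube-to-sphere projection from the linked article looks like this: a point on the surface of the [-1, 1] cube lands exactly on the unit sphere, with far less distortion than simply normalizing the cube point.</p>

```cpp
#include <cmath>

struct P3 { float x, y, z; };

// Map a point on the surface of the [-1, 1] cube onto the unit sphere
// (the equation from mathproofs.blogspot.com linked above).
P3 cubeToSphere(P3 p)
{
    float x2 = p.x * p.x, y2 = p.y * p.y, z2 = p.z * p.z;
    return {
        p.x * std::sqrt(1.0f - y2 / 2.0f - z2 / 2.0f + y2 * z2 / 3.0f),
        p.y * std::sqrt(1.0f - z2 / 2.0f - x2 / 2.0f + z2 * x2 / 3.0f),
        p.z * std::sqrt(1.0f - x2 / 2.0f - y2 / 2.0f + x2 * y2 / 3.0f)
    };
}
```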
<p>All of the meshes are generated per frame and the program runs at ~200 fps on my GTX 480.<br />
<iframe src="//www.youtube.com/embed/24QWBuIInOk?rel=0" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>https://leifnode.com/2014/04/planetary-scale-lod-terrain-generation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Procedural Fractal Terrain</title>
		<link>https://leifnode.com/2014/04/procedural-fractal-terrain/</link>
		<comments>https://leifnode.com/2014/04/procedural-fractal-terrain/#comments</comments>
		<pubDate>Tue, 22 Apr 2014 10:34:14 +0000</pubDate>
		<dc:creator><![CDATA[Leif Erkenbrach]]></dc:creator>
				<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://leifnode.com/?p=189</guid>
		<description><![CDATA[After doing atmospheric scattering I though that it would be an interesting challenge to make a full scale planet renderer. This is what I started with when looking to generate the terrain. The terrain height and color is determined through a height map that I generate in the vertex and fragment shaders. I generated the height map using the 2D …<p> <a class="continue-reading-link" href="https://leifnode.com/2014/04/procedural-fractal-terrain/">Continue reading<i class="icon-right-dir"></i></a></p>]]></description>
				<content:encoded><![CDATA[<p><a href="http://leifnode.com/wp-content/uploads/2014/04/Procedural-Terrain.jpg"><img class="alignnone size-large wp-image-191" src="http://leifnode.com/wp-content/uploads/2014/04/Procedural-Terrain-1024x548.jpg" alt="Procedural Terrain" width="920" height="492" /></a></p>
<p>After doing atmospheric scattering I thought that it would be an interesting challenge to make a full scale planet renderer. This is where I started when looking to generate the terrain. The terrain height and color are determined through a height map that I generate in the vertex and fragment shaders.</p>
<div id="attachment_193" style="width: 310px" class="wp-caption alignnone"><a href="http://leifnode.com/wp-content/uploads/2014/04/Procedural-Heightmap.jpg"><img class="size-medium wp-image-193" src="http://leifnode.com/wp-content/uploads/2014/04/Procedural-Heightmap-300x160.jpg" alt="Procedural Heightmap" width="300" height="160" /></a><p class="wp-caption-text">The height map for another piece of terrain</p></div>
<p>I generated the height map using the 2D simplex noise implementation from <a href="https://github.com/ashima/webgl-noise">webgl-noise</a> implemented in the vertex and fragment shaders. Simplex noise looks similar to Perlin noise and gives a slight performance increase for 2D and 3D noise functions. Alone, simplex noise would make for pretty boring terrain; however, if you add two or more layers of simplex noise together you start to get more detail. This is called fractional Brownian motion, and it is what I have used to create these fairly detailed height maps.</p>
<p>This is what the fractional Brownian motion function that I use looks like:</p>
<pre class="brush: cpp; title: ; notranslate">
float fbm(vec3 x, float initialFrequency, float lacunarity, float gain, int octaves)
{
	float total = 0.0f;
	float frequency = initialFrequency;
	float amplitude = gain;

	for (int i = 0; i &lt; octaves; ++i)
	{
		total += simplexNoise(x * frequency) * amplitude;
		frequency *= lacunarity;
		amplitude *= gain;
	}

	return total;
}
</pre>
]]></content:encoded>
			<wfw:commentRss>https://leifnode.com/2014/04/procedural-fractal-terrain/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Atmospheric Scattering, Skybox, and Logarithmic Depth Buffering</title>
		<link>https://leifnode.com/2014/04/atmospheric-scattering-skybox-and-logarithmic-depth-buffering/</link>
		<comments>https://leifnode.com/2014/04/atmospheric-scattering-skybox-and-logarithmic-depth-buffering/#comments</comments>
		<pubDate>Tue, 22 Apr 2014 10:06:44 +0000</pubDate>
		<dc:creator><![CDATA[Leif Erkenbrach]]></dc:creator>
				<category><![CDATA[OpenGL]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://leifnode.com/?p=185</guid>
		<description><![CDATA[A few months ago I decided to implement atmospheric scattering in my OpenGL renderer in order to create a day/night cycle. This turns out to be a fairly cheap effect to calculate if you do not need it to be completely accurate of implement multiscattering. One of the early GPU gems books had an implementation of atmospheric scattering that creates …<p> <a class="continue-reading-link" href="https://leifnode.com/2014/04/atmospheric-scattering-skybox-and-logarithmic-depth-buffering/">Continue reading<i class="icon-right-dir"></i></a></p>]]></description>
				<content:encoded><![CDATA[<p><a href="http://leifnode.com/wp-content/uploads/2014/04/Atmospheric-Scattering.jpg"><img class="alignnone size-large wp-image-186" src="http://leifnode.com/wp-content/uploads/2014/04/Atmospheric-Scattering-1024x576.jpg" alt="Atmospheric Scattering" width="920" height="517" /></a></p>
<p>A few months ago I decided to implement atmospheric scattering in my OpenGL renderer in order to create a day/night cycle.</p>
<p>This turns out to be a fairly cheap effect to calculate if you do not need it to be completely accurate or implement multiple scattering. One of the early GPU Gems books had an implementation of atmospheric scattering that creates an accurate enough representation. The book is now online and the article can be found <a href="http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter16.html">here</a>.</p>
<p>This article comes with sample shader code, but using it raised some larger issues with the depth buffer. In order to keep the atmosphere to scale I wanted to render a planetary scale sphere, so I increased the distance of the far clipping plane. With the near plane of the projection frustum at 1 meter from the camera and the far plane 300,000 km away, pretty much everything in the frustum flickered in and out of view. Instead of the linear depth buffer I had been using, I switched to a logarithmic depth buffer, which allows a far greater range by spreading the depth buffer&#8217;s precision evenly across ratios of distance instead of piling almost all of it up next to the near clipping plane. I used the vertex shader implementation demonstrated on the <a href="http://outerra.blogspot.com/2012/11/maximizing-depth-buffer-range-and.html">Outerra Blog</a>, and this fixed the flickering issues I was having.</p>
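<p>The core of the logarithmic depth idea can be sketched like this (a simplified CPU version of the concept from the linked article, not the exact shader code): instead of the usual hyperbolic z/w term, output a logarithm of the clip-space w, rescaled so the far plane maps to 1.</p>

```cpp
#include <cmath>

// Map a clip-space w (roughly the view-space distance) to [0, 1]
// logarithmically, so precision is distributed by distance ratio rather
// than concentrated hyperbolically at the near plane. The fmax guards
// against taking the log of a non-positive value.
float logDepth(float clipW, float farPlane)
{
    return std::log2(std::fmax(1e-6f, 1.0f + clipW))
         / std::log2(farPlane + 1.0f);
}
```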
<p>Finally, in order to make it seem like a planet in space, I needed to implement a space sky box. The most trouble I had with this was locating tools to create the cube map for the sky box texture. I ended up finding a great tool named <a href="http://sourceforge.net/projects/spacescape/">Spacescape</a> that I could use to design space sky boxes (although I ended up using one of the included samples). To get the generated .png files into a cube map I used <a href="http://developer.amd.com/tools-and-sdks/archive/legacy-cpu-gpu-tools/cubemapgen/">ATI CubeMapGen</a>, which let me conveniently load each .png, place it in its correct position, and export the cube map as a .dds file. I did not have much trouble from there since I had already loaded .dds files in previous work. I just needed to render a cube with inverse winding order without any translation based on the view.</p>
<p><iframe src="//www.youtube.com/embed/jEDb8CCR3aE?rel=0" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>https://leifnode.com/2014/04/atmospheric-scattering-skybox-and-logarithmic-depth-buffering/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Networked Map Viewer</title>
		<link>https://leifnode.com/2014/03/networked-map-viewer/</link>
		<comments>https://leifnode.com/2014/03/networked-map-viewer/#comments</comments>
		<pubDate>Sat, 15 Mar 2014 12:59:30 +0000</pubDate>
		<dc:creator><![CDATA[Leif Erkenbrach]]></dc:creator>
				<category><![CDATA[Networking]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[XNA]]></category>

		<guid isPermaLink="false">http://leifnode.com/?p=150</guid>
		<description><![CDATA[This application was made using XNA embedded in a Winforms project. The original tutorial for embedding XNA in Windows Forms can be found here. I needed to make some modifications to the ContentBuilder.cs included with the demo project to allow it to compile shaders and load fonts. I retrieved the map textures from the Google maps API. The height values …<p> <a class="continue-reading-link" href="https://leifnode.com/2014/03/networked-map-viewer/">Continue reading<i class="icon-right-dir"></i></a></p>]]></description>
				<content:encoded><![CDATA[<p><a href="http://leifnode.com/wp-content/uploads/2014/03/mapViewer.jpg"><img class="alignnone size-large wp-image-151" src="http://leifnode.com/wp-content/uploads/2014/03/mapViewer-1024x560.jpg" alt="Map Viewer" width="920" height="503" /></a></p>
<p>This application was made using XNA embedded in a Winforms project. The original tutorial for embedding XNA in Windows Forms can be found <a href="http://xbox.create.msdn.com/en-US/education/catalog/sample/winforms_series_1">here</a>. I needed to make some modifications to the ContentBuilder.cs included with the demo project to allow it to compile shaders and load fonts.</p>
<p>I retrieved the map textures from the Google maps API. The height values came from the Bing maps API, since its daily limit on height queries is much higher than Google's and most other alternatives'. Beyond the generous query limit, the Bing maps API also lets me request height values in a grid by defining the four corners of a patch with latitude and longitude, instead of querying based on a large list that contains each vertex&#8217;s lat/lon.</p>
<p>The main concern when building a mapping application like this was getting the height values retrieved from Bing maps to line up with the textures from Google maps. Google maps serves image requests from a single latitude, longitude, and zoom level, where each zoom level halves the linear extent of the map's coverage: if zoom level 1 covers a 1000x1000 km square, then zoom level 2 covers 500x500 km and zoom level 3 covers 250x250 km. Bing maps differs in that it takes a list of four latitudes and longitudes along with two integers specifying the dimensions of the grid to query. Unfortunately it is not quite as simple as querying the center point plus half the coverage size of the Google maps tile, because we must account for the curvature of the Earth, as illustrated in the figure below. This is done by projecting the latitude and longitude from cylindrical space to spherical space using the Mercator projection, which is explained in further detail at <a href="http://mathworld.wolfram.com/CylindricalProjection.html">Wolfram</a>.</p>
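<p>The corner calculation can be sketched as follows. This is an illustrative Python version under standard Web Mercator assumptions (256-pixel base tiles, the world spanning [-&#960;, &#960;] in projected space), not the exact math of either API; the function names and the 640-pixel default are my own:</p>

```python
import math

TILE = 256  # pixels in a base-level map tile

def tile_corners(lat, lon, zoom, size_px=640):
    """Given the center lat/lon and zoom of a map image, return the
    (lat, lon) of its NW and SE corners via the Mercator projection."""
    # Project the center into Mercator space
    mx = math.radians(lon)
    my = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    # Each zoom level halves the angular coverage of a pixel
    units_per_px = (2 * math.pi / TILE) / (2 ** zoom)
    half = units_per_px * size_px / 2

    def unproject(x, y):
        # Inverse Mercator: projected coords back to (lat, lon) degrees
        return (math.degrees(2 * math.atan(math.exp(y)) - math.pi / 2),
                math.degrees(x))

    return unproject(mx - half, my + half), unproject(mx + half, my - half)
```

<p>The key point is that the corners come from adding the half-extent in <em>projected</em> space and then unprojecting, rather than adding a fixed latitude offset, which is exactly where the curvature correction comes in.</p>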
<p>Once I have the latitudes and longitudes of the points I need to query, I create an HttpWebRequest with the latitude and longitude appended to the REST API URL, then send the request off on a separate thread so that rendering does not hang while tiles are loading.</p>
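<p>The threading pattern is straightforward. A minimal Python sketch of the same idea (the original uses C#'s HttpWebRequest; the URL and callback here are illustrative):</p>

```python
import threading
import urllib.request

def fetch_heights_async(url, on_done):
    """Fire a REST query on a worker thread so the render loop keeps running,
    then hand the raw response bytes to on_done when it arrives."""
    def worker():
        with urllib.request.urlopen(url) as resp:
            data = resp.read()
        on_done(data)  # e.g. parse heights and queue a terrain-patch rebuild
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

<p>The callback should only hand data back to the render thread, since graphics resources generally cannot be touched from a worker thread.</p>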
<p>Here&#8217;s a demo of the application:<br />
<iframe src="//www.youtube.com/embed/RxGg0oAS2ps?rel=0" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
<p>I have attached the project files below. The included binaries work correctly; however, I have omitted my Bing maps API key from the source, so if you wish to compile the project and retrieve the height data you will need to sign up for a key. There are instructions for doing so <a href="http://msdn.microsoft.com/en-us/library/ff428642.aspx">here</a>.</p>
<p><a href="http://leifnode.com/wp-content/uploads/2014/03/NetworkREST.zip">REST Map Viewer.zip</a></p>
]]></content:encoded>
			<wfw:commentRss>https://leifnode.com/2014/03/networked-map-viewer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
