Screen-space Water Rendering

The latest in the series of tech art / rendering problems I’ve been working on is finding a good solution to rendering water. Specifically, rendering narrow, fast-moving particle-based streams of water. Over the past week, I think I’ve gotten some good results, so I’m going to write it up here.

I don’t really like voxelized / marching cubes approaches to water rendering (see, for example, Blender’s fluid simulation rendering). When the water volume is on a similar scale to the underlying grid used to render, the motion is noticeably discrete. This can be addressed by increasing the grid resolution, but for a narrow stream over relatively long distances this simply isn’t practical in realtime without serious compromises in runtime and memory usage. (There is some precedent using sparse octree structures in voxel rendering, which improves this. I’m not sure how well that works for dynamic systems, and in any case it’s more complexity than I want to deal with.)

The first alternative I looked at was Müller’s “Screen Space Meshes”. This involves rendering the water particles to a depth buffer, smoothing it, identifying connected patches of similar depth, and then building a mesh from the result using marching squares. This is probably *more* feasible now than it was in 2007 (as you can reasonably build the mesh in a compute shader), but still more complexity and cost than I’d like.

Finally, I stumbled upon Simon Green’s 2010 GDC presentation, “Screen Space Fluid Rendering For Games”. This begins the same way Screen Space Meshes does: render particles to a depth buffer and smooth it. But rather than build a mesh, use the resulting buffer to shade and composite the liquid over the main scene (by explicitly writing depth.) This was what I decided to implement.


The past few projects in Unity have taught me not to fight its rendering conventions. So, the fluid buffers are rendered by a second camera, with a lower camera depth so that it renders before the main scene. Any fluid system lives on a separate render layer; the primary camera excludes the fluid layer, while the second camera renders only the fluid. Both cameras are then parented to an empty object to keep them aligned.

This setup means I can render pretty much anything to the fluid layer, and it should work as expected. In the context of my demo scene, this means that multiple streams and splashes from subemitters can merge together. This should also allow for mixing in other water systems, like heightfield-based volumes, which will then be rendered consistently. (I have not tested this yet.)

The water source in my test scene is a standard particle system. No actual fluid simulation is being performed. This in turn means that particles overlap in a not-quite-physical way, but the final shading looks acceptable in practice.

Fluid Buffer Rendering

The first step in this technique is rendering the base fluid buffer. This is an offscreen buffer that contains (at this point in my implementation): fluid thickness, screen-space motion vector, and a noise value. We also render a depth buffer, explicitly writing depth from the fragment shader to turn each particle quad into a spherical (well, ellipsoidal) blob.

The depth and thickness calculations are straightforward:

frag_out o;

float3 N;
N.xy = i.uv*2.0 - 1.0;
float r2 = dot(N.xy, N.xy);
if (r2 > 1.0) discard;

N.z = sqrt(1.0 - r2);

float4 pixel_pos = float4(i.view_pos + N * i.size, 1.0);
float4 clip_pos = mul(UNITY_MATRIX_P, pixel_pos);
float depth = clip_pos.z / clip_pos.w;

o.depth = depth;

float thick = N.z * i.size * 2;

(The depth calculation can be simplified, of course; we only need z and w from the clip position.)
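The blob math itself is easy to check outside the shader. Here's a standalone Python sketch of the UV-to-sphere mapping (the function name is mine, not from the shader above):

```python
import math

def sphere_normal(u, v):
    """Map a quad UV in [0,1]^2 to a unit-sphere normal, or None where the
    fragment would be discarded. Mirrors the fragment shader: remap UV to
    [-1,1], reject r^2 > 1, recover z from x^2 + y^2 + z^2 = 1."""
    nx = u * 2.0 - 1.0
    ny = v * 2.0 - 1.0
    r2 = nx * nx + ny * ny
    if r2 > 1.0:
        return None  # outside the inscribed circle: discard
    return (nx, ny, math.sqrt(1.0 - r2))
```

The quad's center maps to the front of the sphere, the inscribed circle's edge to the rim, and the corners are discarded; both the written depth and the thickness fall out of the z component.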

We’ll come back to the fragment shader for motion vectors and noise in a bit.

The vertex shader is where the fun starts, and where I diverge from Green’s technique. The goal of this project is rendering high-speed streams of water; this is possible with spherical particles, but takes an inordinate number of them to produce a continuous stream. Instead, I stretch the particle quads based on velocity, which in turn stretches the depth blobs to be elliptical, not spherical. (Since the depth calculation is based on UV, which is unchanged, this Just Works.)

Experienced Unity developers may be wondering why I’m not using the built-in Stretched Billboard mode provided by Unity’s particle system. Stretched Billboard stretches unconditionally along the velocity vector in world space. In general this is fine; however, it causes a very noticeable problem when the velocity vector is aligned to (or very close to) the camera’s forward vector: the billboard stretches *into* the screen, very clearly highlighting its 2D nature.

Instead, I use a camera-facing billboard and project the velocity vector onto the plane of the particle, using that to stretch the quad. If the velocity vector is perpendicular to the plane (into or out of the screen), the particle stays unstretched and spherical, exactly as it should, while velocity across the screen causes the particle to stretch in that direction, as expected.
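The projection step is the standard "remove the component along the normal" identity. A minimal Python sketch, assuming a unit-length plane normal:

```python
def project_onto_plane(v, n):
    """Project vector v onto the plane with unit normal n, by subtracting
    v's component along n. For a camera-facing billboard, n is the
    camera's forward vector."""
    d = sum(a * b for a, b in zip(v, n))
    return tuple(a - d * b for a, b in zip(v, n))
```

Velocity straight into the screen projects to zero (no stretch), while velocity across the screen passes through unchanged.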

Long explanation aside, it’s quite a simple function:

float3 ComputeStretchedVertex(float3 p_world, float3 c_world, float3 vdir_world, float stretch_amount)
{
    float3 center_offset = p_world - c_world;
    float3 stretch_offset = dot(center_offset, vdir_world) * vdir_world;

    return p_world + stretch_offset * lerp(0.25f, 3.0f, stretch_amount);
}

In order to calculate the screenspace motion vector, we actually compute two sets of vertex positions:

float3 vp1 = ComputeStretchedVertex(
    vertex_wp, center_wp,
    velocity_dir_w, rand);
float3 vp0 = ComputeStretchedVertex(
    vertex_wp - velocity_w * unity_DeltaTime.x,
    center_wp - velocity_w * unity_DeltaTime.x,
    velocity_dir_w, rand);

o.motion_0 = mul(_LastVP, float4(vp0, 1.0));
o.motion_1 = mul(_CurrVP, float4(vp1, 1.0));

Note that, since we are calculating motion vectors in the main pass, rather than the motion vector pass, Unity will not provide the previous or non-jittered current view-projection for you. I attached a simple script to the relevant particle systems to fix this:

public class ScreenspaceLiquidRenderer : MonoBehaviour 
{
    public Camera LiquidCamera;

    private ParticleSystemRenderer m_ParticleRenderer;
    private bool m_First;
    private Matrix4x4 m_PreviousVP;

    void Start()
    {
        m_ParticleRenderer = GetComponent<ParticleSystemRenderer>();
        m_First = true;
    }

    void OnWillRenderObject()
    {
        Matrix4x4 current_vp = LiquidCamera.nonJitteredProjectionMatrix * LiquidCamera.worldToCameraMatrix;
        if (m_First)
        {
            m_PreviousVP = current_vp;
            m_First = false;
        }
        m_ParticleRenderer.material.SetMatrix("_LastVP", GL.GetGPUProjectionMatrix(m_PreviousVP, true));
        m_ParticleRenderer.material.SetMatrix("_CurrVP", GL.GetGPUProjectionMatrix(current_vp, true));
        m_PreviousVP = current_vp;
    }
}

I cache the previous matrix manually because Camera.previousViewProjectionMatrix gives incorrect results.


(This method also breaks render batching; it may make more sense in practice to set global matrix constants instead of per-material.)

Back in the fragment shader, we use these projected positions to calculate screenspace motion vectors:

float3 hp0 = i.motion_0.xyz / i.motion_0.w;
float3 hp1 = i.motion_1.xyz / i.motion_1.w;

float2 vp0 = (hp0.xy + 1) / 2;
float2 vp1 = (hp1.xy + 1) / 2;

vp0.y = 1.0 - vp0.y;
vp1.y = 1.0 - vp1.y;

float2 vel = vp1 - vp0;

(The motion vector calculations are taken almost verbatim from an existing implementation.)

Finally, the last value in the fluid buffer is noise. I use a per-particle stable random value to select one of four noises (packed into a single texture). This is then scaled by the speed and by one minus the particle size (so faster and smaller particles are noisier). This noise value is used in the shading pass to distort normals and add a layer of foam. Green’s presentation uses 3-channel white noise, while a followup paper (“Screen Space Fluid Rendering with Curvature Flow”) suggests octaves of Perlin noise. I use Voronoi/cellular noise at various scales:

Blending Problems (And Workarounds)

Here is where the first problems with my implementation emerge. In order to calculate thickness correctly, particles are blended additively. Because blending affects all the outputs, the noise and motion vectors are also additively blended. Additive noise is fine; additive motion vectors are *not*, and left as-is they produce ridiculous results for TAA and motion blur. To counteract this, I multiply the motion vectors by thickness when rendering the fluid buffer, and divide by total thickness in the shading pass. This produces a thickness-weighted average motion vector for all overlapping particles; not exactly what we want (it produces some weirdness where streams cross each other), but acceptable.
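The weighting trick is just a weighted average computed in two passes. A small Python sketch of the idea (scalar "motion" for brevity):

```python
def blended_motion(particles):
    """particles: list of (motion, thickness) pairs.
    Fluid pass: additively accumulate motion * thickness, and thickness.
    Shading pass: divide the totals, recovering a thickness-weighted average."""
    total_thickness = sum(t for _, t in particles)
    weighted_sum = sum(m * t for m, t in particles)
    return weighted_sum / total_thickness
```

A thick, slow particle overlapping a thin, fast one leans toward the thick particle's motion, rather than the nonsensical additive sum.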

A more difficult problem is depth; in order to render the depth buffer correctly, we need both depth writes and depth tests active. This can cause problems if our particles are unsorted (as differences in rendering order can cause particles behind others to have their output culled). So, we tell Unity’s particle system to keep particles sorted by depth, then cross our fingers and hope systems render in depth order as well. There *will* be cases when systems overlap (i.e., two streams of particles intersecting) that will not be handled correctly, resulting in a lower thickness than expected. This doesn’t seem to happen very often in practice, and doesn’t make a huge difference in appearance when it does.

The proper approach would likely be rendering the depth and color buffers completely separately — at the cost of now having to render two passes. Something to investigate when tuning this.

Depth Smoothing

Finally, the meat of Green’s technique. We’ve rendered a bunch of spherical blobs to the depth buffer, but water is not “blobby” in reality. So now, we take this approximation and blur it to more closely match how we expect the surface of a liquid to look.

The naive approach is to apply a Gaussian blur to the entire depth buffer. This produces weird results — it smooths distant points more than close ones, and blurs over silhouette edges. Instead, we can vary the radius of the blur by depth, and use a bilateral blur to preserve edges.

There’s just one problem: these adjustments make the blur non-separable. A separable blur can be performed in two passes, blurring horizontally and then vertically. A non-separable blur is performed in a single pass, blurring both horizontally and vertically at the same time. The distinction is important because separable blurs scale linearly (O(w) + O(h)), while non-separable blurs scale quadratically (O(w*h)). Large non-separable blurs quickly become impractical.
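To see why the separable version is such a win, note that a 2D Gaussian factors into the outer product of two 1D kernels, so two 1D passes reproduce it exactly; it's the depth-dependent bilateral weights that break this factorization. A Python sketch with clamp-to-edge sampling (names are mine):

```python
import math

def gaussian_kernel(radius, sigma):
    """Normalized 1D Gaussian weights for taps in [-radius, radius]."""
    k = [math.exp(-(x * x) / (2.0 * sigma * sigma))
         for x in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def blur_1d(samples, kernel):
    """Convolve one row/column with the kernel, clamping at the edges."""
    r = len(kernel) // 2
    n = len(samples)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            acc += samples[min(max(i + j - r, 0), n - 1)] * w
        out.append(acc)
    return out

def blur_separable(img, kernel):
    """Horizontal pass, then vertical pass: O(k) taps per pixel per pass,
    instead of O(k^2) taps for the equivalent single-pass 2D Gaussian."""
    h, w = len(img), len(img[0])
    tmp = [blur_1d(row, kernel) for row in img]
    cols = [blur_1d([tmp[y][x] for y in range(h)], kernel) for x in range(w)]
    return [[cols[x][y] for x in range(w)] for y in range(h)]
```

Blurring an impulse image yields exactly the outer-product 2D kernel, which is the separability property in action.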

As mature, responsible engineers, we do the obvious thing: plug our ears, pretend the bilateral blur *is* separable, and implement it with separate horizontal and vertical passes anyway.

Green demonstrates in his presentation that, while this *does* generate artifacts in the result (particularly when reconstructing normals), the final shading hides them well. With the narrower streams of water I’m generating, these artifacts seem to be even less common, and don’t have much effect on the result.


At last, the fluid buffer is done. Now for the second half of the effect: shading and compositing with the main image.

Here we run into a number of Unity’s rendering limitations. I choose to light the water with the sun light and skybox only; supporting additional lights requires either multiple passes (expensive!) or building a GPU-side light lookup structure (expensive and somewhat complicated). In addition, since Unity does not expose access to shadow maps, and directional lights use screenspace shadows (based on the depth buffer rendered by opaque geometry), we don’t actually have access to shadow information for the sun light. It’s possible to attach a command buffer to the sun light to build us a screenspace shadow map specifically for the water, but I haven’t yet.

The final shading pass is driven via script, and uses a command buffer to dispatch the actual draw calls. This is *required*, as the motion vector texture (used for TAA and motion blur) cannot actually be bound for direct rendering using Graphics.SetRenderTarget(). In a script attached to the main camera:

void Start() 
{
    m_QuadMesh = new Mesh();
    m_QuadMesh.subMeshCount = 1;
    m_QuadMesh.vertices = new Vector3[] { 
        new Vector3(0, 0, 0.1f),
        new Vector3(1, 0, 0.1f),
        new Vector3(1, 1, 0.1f),
        new Vector3(0, 1, 0.1f),
    };
    m_QuadMesh.uv = new Vector2[] {
        new Vector2(0, 0),
        new Vector2(1, 0),
        new Vector2(1, 1),
        new Vector2(0, 1),
    };
    m_QuadMesh.triangles = new int[] {
        0, 1, 2, 0, 2, 3,
    };

    m_CommandBuffer = new CommandBuffer();
    m_CommandBuffer.SetViewProjectionMatrices(Matrix4x4.identity,
        GL.GetGPUProjectionMatrix(Matrix4x4.Ortho(0, 1, 0, 1, -1, 100), false));
    // Bind the motion vector texture + depth, then draw the motion vector pass.
    m_CommandBuffer.DrawMesh(
        m_QuadMesh, Matrix4x4.identity, m_Mat, 0, 1);
    // Bind the camera target + depth, then draw the shading pass.
    m_CommandBuffer.DrawMesh(
        m_QuadMesh, Matrix4x4.identity, m_Mat, 0, 0);
}

The color and motion vector buffers cannot be rendered simultaneously with MRT, for reasons I have not been able to determine. They also require different depth buffer bindings. We write depth to *both* of these depth buffers, incidentally, so that TAA reprojection works properly.

Ah, the joys of a black-box engine.

Each frame, we kick off the composite render from OnPostRender():

RenderTexture GenerateRefractionTexture()
{
    RenderTexture result = RenderTexture.GetTemporary(m_MainCamera.activeTexture.descriptor);
    Graphics.Blit(m_MainCamera.activeTexture, result);
    return result;
}

void OnPostRender()
{
    if (ScreenspaceLiquidCamera && ScreenspaceLiquidCamera.IsReady())
    {
        RenderTexture refraction_texture = GenerateRefractionTexture();

        m_Mat.SetTexture("_MainTex", ScreenspaceLiquidCamera.GetColorBuffer());
        m_Mat.SetVector("_MainTex_TexelSize", ScreenspaceLiquidCamera.GetTexelSize());
        m_Mat.SetTexture("_LiquidRefractTexture", refraction_texture);
        m_Mat.SetTexture("_MainDepth", ScreenspaceLiquidCamera.GetDepthBuffer());
        m_Mat.SetMatrix("_DepthViewFromClip", ScreenspaceLiquidCamera.GetProjection().inverse);
        if (SunLight)
        {
            m_Mat.SetVector("_SunDir", transform.InverseTransformVector(-SunLight.transform.forward));
            m_Mat.SetColor("_SunColor", SunLight.color * SunLight.intensity);
        }
        else
        {
            m_Mat.SetVector("_SunDir", transform.InverseTransformVector(new Vector3(0, 1, 0)));
            m_Mat.SetColor("_SunColor", Color.white);
        }
        m_Mat.SetTexture("_ReflectionProbe", ReflectionProbe.defaultTexture);
        m_Mat.SetVector("_ReflectionProbe_HDR", ReflectionProbe.defaultTextureHDRDecodeValues);

        Graphics.ExecuteCommandBuffer(m_CommandBuffer);

        RenderTexture.ReleaseTemporary(refraction_texture);
    }
}

That’s the end of the CPU’s involvement; it’s all shaders from here on.

Let’s start with the motion vector pass. Here’s the entire shader:

#include "UnityCG.cginc"

sampler2D _MainDepth;
sampler2D _MainTex;

struct appdata
{
    float4 vertex : POSITION;
    float2 uv : TEXCOORD0;
};

struct v2f
{
    float2 uv : TEXCOORD0;
    float4 vertex : SV_POSITION;
};

v2f vert(appdata v)
{
    v2f o;
    o.vertex = mul(UNITY_MATRIX_P, v.vertex);
    o.uv = v.uv;
    return o;
}

struct frag_out
{
    float4 color : SV_Target;
    float depth : SV_Depth;
};

frag_out frag(v2f i)
{
    frag_out o;

    float4 fluid = tex2D(_MainTex, i.uv);
    if (fluid.a == 0) discard;
    o.depth = tex2D(_MainDepth, i.uv).r;

    float2 vel = fluid.gb / fluid.a;

    o.color = float4(vel, 0, 1);
    return o;
}

Screenspace velocity is stored in the green and blue channels of the fluid buffer. Since we scaled velocity by thickness when rendering the buffer, we divide the total thickness (in the alpha channel) back out, to produce the weighted average velocity.

It’s worth noting that, for larger volumes of water, we may want a different approach to dealing with the velocity buffer. Since we render without blending, motion vectors for anything *behind* the water are lost, breaking TAA and motion blur for those objects. This isn’t an issue for thin streams of water, but may be for something like a pool or lake, where we want TAA or motion blur for objects clearly visible through the surface.

The main shading pass is more interesting. Our first order of business, after masking with the fluid thickness, is reconstructing the view-space position and normal.

float3 ViewPosition(float2 uv)
{
    float clip_z = tex2D(_MainDepth, uv).r;
    float clip_x = uv.x * 2.0 - 1.0;
    float clip_y = 1.0 - uv.y * 2.0;

    float4 clip_p = float4(clip_x, clip_y, clip_z, 1.0);
    float4 view_p = mul(_DepthViewFromClip, clip_p);
    return (view_p.xyz / view_p.w);
}

float3 ReconstructNormal(float2 uv, float3 vp11)
{
    float3 vp12 = ViewPosition(uv + _MainTex_TexelSize.xy * float2(0, 1));
    float3 vp10 = ViewPosition(uv + _MainTex_TexelSize.xy * float2(0, -1));
    float3 vp21 = ViewPosition(uv + _MainTex_TexelSize.xy * float2(1, 0));
    float3 vp01 = ViewPosition(uv + _MainTex_TexelSize.xy * float2(-1, 0));

    float3 dvpdx0 = vp11 - vp12;
    float3 dvpdx1 = vp10 - vp11;

    float3 dvpdy0 = vp11 - vp21;
    float3 dvpdy1 = vp01 - vp11;

    // Pick the closest
    float3 dvpdx = dot(dvpdx0, dvpdx0) > dot(dvpdx1, dvpdx1) ? dvpdx1 : dvpdx0;
    float3 dvpdy = dot(dvpdy0, dvpdy0) > dot(dvpdy1, dvpdy1) ? dvpdy1 : dvpdy0;

    return normalize(cross(dvpdy, dvpdx));
}

This is the expensive way to do view position reconstruction: take the clip-space position, and unproject it.

Once we have a way to reconstruct positions, normals are easy: calculate position for the adjacent points in the depth buffer, and build a tangent basis from there. In order to deal with silhouette edges, we sample in both directions, and pick the closer point in view space to use for normal reconstruction. This works *surprisingly well*, only failing with very thin objects.

It does mean we do five separate unprojections per pixel (the current point and its four neighbors). There is a cheaper way; this post is getting quite long as it is so I’ll leave that for another day.

The resulting normals:

I distort this calculated normal with the derivatives of the fluid buffer’s noise value, scaled by a strength parameter and normalized by dividing out thickness (for the same reason as velocity):

N.xy += NoiseDerivatives(i.uv, fluid.r) * (_NoiseStrength / fluid.a);
N = normalize(N);

Finally, it’s time for some actual shading. The water shading has three main parts: specular reflection, specular refraction, and foam.

The reflection term is standard GGX, lifted wholesale from Unity’s standard shader. (With one correction: using the proper F0 for water of 2%.)
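That 2% comes straight from the Fresnel equations at normal incidence. A quick Python check, taking water's index of refraction as 1.33:

```python
def f0_from_ior(n1, n2):
    """Fresnel reflectance at normal incidence between two media,
    F0 = ((n1 - n2) / (n1 + n2))^2."""
    r = (n1 - n2) / (n1 + n2)
    return r * r

# Air (1.0) into water (1.33): about 0.02, i.e. the 2% F0 used for the
# GGX term, versus the ~4% default many engines assume for dielectrics.
f0_water = f0_from_ior(1.0, 1.33)
```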

Refraction is more interesting. Correct refraction requires raytracing (or raymarching, for a close estimate). Fortunately, refraction is less intuitive than reflection, and being incorrect is not as noticeable. So, we offset the UV sample for the refraction texture by the x and y of the normal, scaled by thickness and a strength parameter:

float aspect = _MainTex_TexelSize.y * _MainTex_TexelSize.z;
float2 refract_uv = (i.grab_pos.xy + N.xy * float2(1, -aspect) * fluid.a * _RefractionMultiplier) / i.grab_pos.w;
float4 refract_color = tex2D(_LiquidRefractTexture, refract_uv);

(Note the aspect correction; not necessarily *needed* — again, this is just an approximation — but simple enough to add.)

This refracted light passes through the liquid, and so some of it gets absorbed:

float3 water_color = _AbsorptionColor.rgb * _AbsorptionIntensity;
refract_color.rgb *= exp(-water_color * fluid.a);

Note that _AbsorptionColor is defined exactly backwards from what you may expect: each channel’s value indicates how much of that channel is *absorbed*, not how much is let through. So an _AbsorptionColor of (1, 0, 0) results in teal, not red.
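This is Beer-Lambert absorption; a Python sketch of the same per-channel math (names are mine):

```python
import math

def absorb(refract_rgb, absorption_rgb, intensity, thickness):
    """Attenuate the refracted color per channel, matching the shader:
    refract_color *= exp(-absorption * intensity * thickness)."""
    return tuple(c * math.exp(-a * intensity * thickness)
                 for c, a in zip(refract_rgb, absorption_rgb))
```

With an absorption of (1, 0, 0), red decays with thickness while green and blue pass through untouched, so thick water trends toward teal.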

The reflection and refraction are blended with Fresnel:

float spec_blend = lerp(0.02, 1.0, pow(1.0 - ldoth, 5));
float4 clear_color = lerp(refract_color, spec, spec_blend);

Up to this point I’ve (mostly) played by the rules and used physically-based shading.

While it’s decent, there’s a problem with this water. It’s a bit hard to see:

Let’s add some foam to fix that.

Foam appears when water is turbulent and air mixes into the water, forming bubbles. These bubbles cause a variety of reflections and refractions, which on the whole create a diffuse appearance. I’m going to model this with a wrapped diffuse term:

float3 foam_color = _SunColor * saturate((dot(N, L)*0.25f + 0.25f));

This gets added to the final color using an ad-hoc term based on the fluid noise and a softer Fresnel term:

float foam_blend = saturate(fluid.r * _NoiseStrength) * lerp(0.05f, 0.5f, pow(1.0f - ndotv, 3));
clear_color.rgb += foam_color * saturate(foam_blend);

The wrapped diffuse has been normalized to be energy conserving, so that’s acceptable as an approximation of scattering. Blending the foam color additively is… less so. It’s a pretty blatant violation of energy conservation.
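The "normalized" claim is easy to verify numerically: integrated over all incoming directions, the wrapped term produces the same total (π) as the standard clamped Lambert term. A Python check by 1D quadrature in θ:

```python
import math

def sphere_integral(f, steps=100000):
    """Integrate f(cos_theta) over all incoming directions (solid-angle
    measure) by midpoint quadrature in theta; azimuth contributes 2*pi."""
    total = 0.0
    dt = math.pi / steps
    for i in range(steps):
        theta = (i + 0.5) * dt
        total += f(math.cos(theta)) * math.sin(theta) * dt
    return 2.0 * math.pi * total

# Standard Lambert: nonzero only on the upper hemisphere.
lambert = sphere_integral(lambda c: max(c, 0.0))
# The wrapped term from the foam shading: dot(N, L) * 0.25 + 0.25.
wrapped = sphere_integral(lambda c: max(0.25 * c + 0.25, 0.0))
```

Both integrals come out to π: the wrap trades peak brightness for coverage of the back hemisphere without adding energy. The additive foam blend has no such excuse.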

But it looks good, and it makes the stream more visible:

Further Work and Improvements

There are a number of things that can be improved upon here.

– Multiple colors. Currently, the absorption is calculated only in the final shading pass, and uses a constant color and intensity for all liquid on the screen. Supporting different colors is possible, but requires a second color buffer and solving the absorption integral piecewise per-particle as we render the base fluid buffer. This is potentially expensive.

– Complete lighting. With access to a GPU-side light lookup structure (either built manually or by tying into Unity’s new HD render pipeline), we could properly light the water with arbitrary numbers of lights, and proper ambient.

– Better refraction. By using blurred mipmaps of the background texture, we can better simulate the wider refraction lobe for rough surfaces. In practice this isn’t very helpful for small streams of water, but may be for larger volumes.

Given the chance I will continue tweaking this into oblivion, however for now I am calling it complete.

Texture-space Decals

I wanted to write up an interesting method of projected decal rendering I’ve been working on recently.


The basic idea is (relatively) straightforward. To place a decal onto a mesh:
– Render the decal into a texture, projected onto the mesh’s UVs.
– Render the mesh using the decal texture as an overlay, mask, etc.

Naturally, the details get a bit more complex.

(I implemented this in Unity; unlike the rest of this blog, code samples will be using C# and Cg.)

Texture-space rendering

Rendering into texture space is not a new technique, but it is rather uncommon. (The canonical runtime usage is subsurface scattering: render diffuse lighting into a texture, blur it, and sample the result in the main shading pass.)

It is simpler than it may seem: we base the vertex shader output on the mesh UV coordinates, rather than positions:

struct appdata
{
    float4 vertex : POSITION;
    float2 uv : TEXCOORD0;
};

struct v2f
{
    float4 vertex : SV_POSITION;
};

v2f vert (appdata v)
{
    v2f o;
    // Remap UV from [0,1] to [-1,1] clip space (inverting Y along the way.)
    o.vertex = float4(v.uv.x * 2.0 - 1.0, 1.0 - v.uv.y * 2.0, 0.0, 1.0);
    return o;
}

(The Y coordinate of UVs is inverted here; this may vary depending on your renderer.)

In order to project the decal into the texture, we add a matrix uniform for the projection and a texture sampler for the decal texture itself. The vertex shader then projects the vertex position using this matrix, which gives us a UV coordinate for sampling the decal texture.

(Yes, input UVs become output positions and input positions become output UVs here. It’s a bit confusing.)

The fragment shader takes this projected UV coordinate and performs a range check. If every component is within the range [0, 1], we sample the decal texture. Otherwise, the fragment is outside the projection and the shader discards it.

struct appdata
{
    float4 vertex : POSITION;
    float2 uv : TEXCOORD0;
};

struct v2f
{
    float3 uvw0 : TEXCOORD0;
    float4 vertex : SV_POSITION;
};

sampler2D _MainTex;
float4 _MainTex_ST;
float4 _Color;

float4x4 _DecalProjection0;

v2f vert (appdata v)
{
    v2f o;
    // Remap UV from [0,1] to [-1,1] clip space (inverting Y along the way.)
    o.vertex = float4(v.uv.x * 2.0 - 1.0, 1.0 - v.uv.y * 2.0, 0.0, 1.0);

    float4 world_vert = mul(unity_ObjectToWorld, v.vertex);
    // We assume the decal projection is orthographic, and omit the divide by w here.
    o.uvw0 = mul(_DecalProjection0, world_vert).xyz * 0.5 + 0.5;
    return o;
}

bool CheckBox(float3 uvw)
{
    return !(uvw.x < 0.0 || uvw.y < 0.0 || uvw.z < 0.0 ||
             uvw.x > 1.0 || uvw.y > 1.0 || uvw.z > 1.0);
}

fixed4 frag (v2f i) : SV_Target
{
    bool in0 = CheckBox(i.uvw0);

    // Don't touch this pixel if we're outside the decal bounds.
    if (!in0) discard;

    // In this instance, _MainTex contains an alpha mask for the decal.
    return tex2D(_MainTex, i.uvw0.xy).a * _Color;
}

(While this shader uses the world-space position, using object-space positions will give better results in practice, as it will maintain precision for objects far away from the world origin.)

On the CPU side, we drive the decal rendering from script:

public void SplatDecal(Material decal_material, Matrix4x4 decal_projection)
{
    int pass_idx = decal_material.FindPass("DECALSPLATPASS");
    if (pass_idx == -1)
    {
        Debug.LogFormat("Shader for decal splat material '{0}' does not have a pass for DecalSplatPass.", decal_material.name);
        return;
    }

    // Save the current render target and render into the decal texture.
    RenderTexture active_rt = RenderTexture.active;
    Graphics.SetRenderTarget(m_DecalTexture);
    decal_material.SetMatrix("_DecalProjection0", decal_projection);
    if (!decal_material.SetPass(pass_idx))
         Debug.Log("Decal splat material SetPass failed.");
    Graphics.DrawMeshNow(m_Mesh, transform.position, transform.rotation);
    Graphics.SetRenderTarget(active_rt);
}

Producing the decal projection matrix is an interesting problem in and of itself, but I’m not going to get into detail here. In general, we want an orthographic projection matrix corresponding to the oriented bounding box of the decal in world space. (Again, in a production setting, applying decals in object space is likely to produce better results.)
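For the orthographic case, the projection amounts to three dot products against the decal box's axes. Here's a Python sketch of the mapping the matrix encodes (all names hypothetical):

```python
def decal_uvw(p, center, axes, half_extents):
    """Map a world-space point into the decal's [0,1]^3 volume.

    axes: the decal box's orthonormal right/up/forward vectors.
    This is the orthographic decal projection written out as dot
    products instead of a 4x4 matrix."""
    rel = tuple(a - b for a, b in zip(p, center))
    uvw = []
    for axis, extent in zip(axes, half_extents):
        d = sum(a * b for a, b in zip(rel, axis))  # signed distance along axis
        uvw.append(d / extent * 0.5 + 0.5)         # [-extent, extent] -> [0, 1]
    return tuple(uvw)
```

Points inside the oriented box land in [0,1]^3 and pass the shader's range check; everything else falls outside and is discarded.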

Visual Artifacts

The primary visual artifact resulting from this method is visible UV seams:

The reason this occurs is that our decal rendering and final mesh rendering disagree slightly on what texels are affected by the UVs of a given triangle. This is due to the GPU’s rasterization fill rules: fragments along the edge of a triangle are deemed in or out of the triangle based on a consistent rule so that when two triangles share an edge the fragment is only shaded once. While absolutely the correct thing to do for general rendering, it bites us here.

Conservative rasterization extensions should fix this issue. However, Unity does not yet expose conservative rasterization to users, and in any case it is not widely available.

The fix I implemented is an image pass that grows the boundaries of UV islands by one texel.

First, we render a single-channel mask texture, using the same texture-space projection as above, and output a constant 1. The mask now contains 1 on every texel touched by UVs, and 0 everywhere else. This mask is rendered only once, at initialization. (Really, we could even bake this offline.)

Next, after rendering a decal, we run a pass over the entire texture, creating a second decal texture. This pass samples the mask texture at each texel: if the sample is 1, the texel is passed through unchanged into the output. However, if the sample is 0, we sample the neighboring texels. If any of those are 1, the corresponding color samples are averaged together and output.

The resulting fragment shader looks like this:

sampler2D _MainTex;
float4 _MainTex_TexelSize;
sampler2D _MaskTex;

fixed4 frag (v2f i) : SV_Target
{
    float mask_p11 = tex2D(_MaskTex, i.uv).r;
    if (mask_p11 > 0.5)
    {
        // This pixel is inside the UV borders, pass through the original color.
        return tex2D(_MainTex, i.uv);
    }
    else
    {
        // This pixel is outside the UV border.  Check for neighbors inside the border and output their average color.
        //   0 1 2
        // 0   -
        // 1 - X -
        // 2   -
        float2 uv_p01 = i.uv + float2(-1, 0) * _MainTex_TexelSize.xy;
        float2 uv_p21 = i.uv + float2(1, 0) * _MainTex_TexelSize.xy;

        float2 uv_p10 = i.uv + float2(0, -1) * _MainTex_TexelSize.xy;
        float2 uv_p12 = i.uv + float2(0, 1) * _MainTex_TexelSize.xy;

        float mask_p01 = tex2D(_MaskTex, uv_p01).r;
        float mask_p21 = tex2D(_MaskTex, uv_p21).r;
        float mask_p10 = tex2D(_MaskTex, uv_p10).r;
        float mask_p12 = tex2D(_MaskTex, uv_p12).r;

        fixed4 col = fixed4(0, 0, 0, 0);
        float total_weight = 0.0;
        if (mask_p01 > 0.5) {
            col += tex2D(_MainTex, uv_p01);
            total_weight += 1.0;
        }
        if (mask_p21 > 0.5) {
            col += tex2D(_MainTex, uv_p21);
            total_weight += 1.0;
        }
        if (mask_p10 > 0.5) {
            col += tex2D(_MainTex, uv_p10);
            total_weight += 1.0;
        }
        if (mask_p12 > 0.5) {
            col += tex2D(_MainTex, uv_p12);
            total_weight += 1.0;
        }

        if (total_weight > 0.0) {
            return col / total_weight;
        }
        else {
            return col;
        }
    }
}
Note the mask comparisons are “greater than 0.5”, not “equal to 1”. Depending on the layout and packing of UVs, you may be able to get away with storing the mask texture at half resolution and using bilinear filtering to reconstruct the edge. This can produce artifacts where UV islands are within a few texels of each other (as they merge at the lower resolution).

Seams are filled:

On the CPU side, this is a straightforward image pass:

public void DoImageSpacePasses()
{
    int pass_idx = DecalImageEffectMaterial.FindPass("DECALUVBORDEREXPAND");
    if (pass_idx == -1)
    {
        Debug.Log("Could not find decal UV border expansion shader pass.");
        return;
    }

    // We replace the existing decal texture after the blit.  Copying the result back into
    // the original texture takes time and causes flickering where the texture is used.
    RenderTexture new_rt = new RenderTexture(m_DecalTexture);

    DecalImageEffectMaterial.SetTexture("_MaskTex", m_UVMaskTexture);
    Graphics.Blit(m_DecalTexture, new_rt, DecalImageEffectMaterial, pass_idx);

    m_DecalTexture = new_rt;
    m_Material.SetTexture("_DecalTex", m_DecalTexture);
}

Of course, in doing this, we create a new render target each time we run this pass. Therefore, it’s not a great idea to run the image pass after every decal is rendered. Instead, render a batch of decals and run the image pass afterwards.


Okay, so what does all of this *get* us, in comparison to existing decal techniques?

– The computation cost of a decal is low. There is a fixed per-frame overhead in using the decal texture in the main shader. Each decal rendered is a one-time cost as it renders to the texture. (The UV fixup can be amortized across a batch of decals, making it even cheaper.)

Contrast deferred decals, which require extra draw calls every frame. Permanent decals are therefore very expensive over time. (Forward+ decals encounter similar issues.)

The price paid, of course, is memory: the decal texture takes memory, the mask texture takes memory. The full decal texture must be kept around, even for distant objects, because we must render into the top mip at all times. (Otherwise, we’re stuck needing to upsample the rendered decal when the object gets closer.)

– Support for skinned meshes.

Deferred decals do not work for skinned meshes. While you can in theory attach them to bones and move the decal around with animation, the decal will not deform with the mesh, and so will appear to slide across the surface.

This is not a problem for the texture-space approach; the decal texture will stretch and squash around joints just like any other texture.

– Avoids UV distortion compared to simply “stamping” decals into the texture. By projecting the decal using the mesh vertex positions, we maintain size and shape, regardless of the topology of the UVs underneath. (As well as wrapping around UV seams.) This means texture-space decals still work well on organic shapes, where UVs may be stretched, skewed, etc. Do note this comes at the cost of inconsistent texel density over a decal in these cases; we can’t have everything.


This technique does come with some important caveats.

– No UV overlap.

Overlapping UVs will cause a decal in one area to show up elsewhere. (And since the overlap is almost always along a UV island boundary, one of these will have a sharp edge.)

– Sensitive to UV distortion.

While the texture projection can compensate for minor distortion in UVs, severe stretching will result in wildly varying texel density across the decal.

Both of these combine to make texture-space decals generally inadvisable for environments. (As environment art leans heavily on both UV overlap and stretching across low-detail regions.)

– Unique texture per object.

Each object using this decal method needs its own texture to store the decal layer. These can be combined into a texture atlas to reduce the impact on render batching, but the memory cost is still significant.


This is not a generally-applicable decal method. Deferred / Forward+ decal rendering still solves the general case much better.

However, this *is* a useful tool for decals on a relatively small number of meshes, in cases where general methods aren’t optimal: skinned meshes, permanent decals, large numbers of decals. As an example, a recent trend in FPS games is decal overlays for first-person weapon meshes (e.g., blood splatter on a recently-used melee weapon). Rather than pre-author them, this technique would allow those overlays to be generated dynamically.

hikari – New Material System

The next step in building hikari’s terrain system is proper material handling. In order to do so, I need to completely revamp the material system. This is going to be extensive enough to earn its own post.

Current Material System:

Here is the current state of hikari’s material “system”, a single struct:

struct Material {
    GLuint normal_map;
    GLuint albedo_map;
    GLuint metallic_map;
    GLuint roughness_map;
    GLuint ao_map;

    bool has_alpha;
    bool is_leaf;
    bool is_terrain;
};

And its usage, from the per-mesh main render loop:

main_pass.SetRenderTarget(GL_FRAMEBUFFER, &hdr_buffer_rt_ms);

RenderMesh * rmesh = meshes + i;
IndexedMesh * mesh = rmesh->mesh;
Material * material = rmesh->material;

Assert(mesh && material, "Cannot render without both mesh and material!");

// (Alpha-tested meshes take a separate path based on material->has_alpha, elided.)

GLuint shader_for_mat = default_shader;
if (material->is_leaf) {
    shader_for_mat = leaf_shader;
}
if (material->is_terrain) {
    shader_for_mat = terrain_shader;
}

main_pass.BindTexture(GL_TEXTURE_2D, "normal_map", material->normal_map);
main_pass.BindTexture(GL_TEXTURE_2D, "albedo_map", material->albedo_map);
main_pass.BindTexture(GL_TEXTURE_2D, "metallic_map", material->metallic_map);
main_pass.BindTexture(GL_TEXTURE_2D, "roughness_map", material->roughness_map);
main_pass.BindTexture(GL_TEXTURE_2D, "ao_map", material->ao_map);

// Material independent shader setup (lights list, shadow map binding,
// environment maps, etc.), elided for space.

// Draw the mesh.


The current notion of material is a set of textures and some flags used to determine the correct shader. Needless to say, this is neither flexible nor particularly efficient. Instead, what I’d like is a Material that consists of two bags of data: render states (shader program, blend mode, depth test) and shader uniforms (texture maps).

New Material System, First Pass:

struct Material {
    GLuint shader_program;
    GLenum depth_mask;
    GLenum depth_func;

    struct Uniform {
        const char * name;
        union {
            float value_f[4];
            Texture value_texture;
        };

        enum UniformType {
            Float1,
            Float2,
            Float3,
            Float4,
            TexHandle,
        } type;
    };

    std::vector<Uniform> uniforms;

    // Uniform setup methods elided.

    inline void
    BindForRenderPass(RenderPass * pass) {
        // (Bind shader_program, depth_mask, depth_func, elided.)

        for (u32 i = 0; i < uniforms.size(); ++i) {
            Uniform u = uniforms[i];
            switch (u.type) {
                case Uniform::UniformType::Float1: {
                    pass->SetUniform1f(, u.value_f[0]);
                } break;
                case Uniform::UniformType::Float2: {
                    pass->SetUniform2f(, u.value_f[0], u.value_f[1]);
                } break;
                case Uniform::UniformType::Float3: {
                    pass->SetUniform3f(, u.value_f[0], u.value_f[1], u.value_f[2]);
                } break;
                case Uniform::UniformType::Float4: {
                    pass->SetUniform4f(, u.value_f[0], u.value_f[1], u.value_f[2], u.value_f[3]);
                } break;
                case Uniform::UniformType::TexHandle: {
                    pass->BindTexture(, u.value_texture);
                } break;
            }
        }
    }
};

A good first step. No more branching to select shaders inside the render loop, no hardcoded set of textures. Still doing some silly things, like re-binding the shader for every mesh. Also still looking up uniform locations every time, though there is enough information here to cache those at load time now.

Let’s look at the entire main pass for a bit:

for (u32 i = 0; i < mesh_count; ++i) {
    main_pass.Begin();
    main_pass.SetRenderTarget(GL_FRAMEBUFFER, &hdr_buffer_rt_ms);

    if (!visible_meshes[i]) {
        continue;
    }

    RenderMesh * rmesh = meshes + i;
    IndexedMesh * mesh = rmesh->mesh;
    Material * material = rmesh->material;

    Assert(mesh && material, "Cannot render without both mesh and material!");


    main_pass.SetUniformMatrix44("clip_from_world", clip_from_world);
    main_pass.SetUniformMatrix44("world_from_local", rmesh->world_from_local);
    main_pass.SetUniform3f("view_pos", cam->position);
    main_pass.SetUniform1f("time", current_time_sec);

    main_pass.BindUBO("PointLightDataBlock", point_light_data_ubo);
    main_pass.BindUBO("SpotLightDataBlock", spot_light_data_ubo);
    main_pass.BindUBO("LightList", light_list_ubo);

    main_pass.BindTexture(GL_TEXTURE_CUBE_MAP, "irradiance_map", skybox_cubemaps.irradiance_map);
    main_pass.BindTexture(GL_TEXTURE_CUBE_MAP, "prefilter_map", skybox_cubemaps.prefilter_map);
    main_pass.BindTexture(GL_TEXTURE_2D, "brdf_lut", brdf_lut_texture);

    main_pass.BindTexture("ssao_map", ssao_blur_texture);
    main_pass.SetUniform2f("viewport_dim", hdr_buffer_rt_ms.viewport.z, hdr_buffer_rt_ms.viewport.w);

    main_pass.SetUniform3f("sun_direction", world->sun_direction);
    main_pass.SetUniform3f("sun_color", world->sun_color);
    main_pass.BindTexture("sun_shadow_map", sun_light.cascade.shadow_map);
    for (u32 split = 0; split < NUM_SPLITS; ++split) {
        char buffer[256];
        _snprintf_s(buffer, sizeof(buffer) - 1, "sun_clip_from_world[%d]", split);
        main_pass.SetUniformMatrix44(buffer, sun_light.cascade.sun_clip_from_world[split]);
    }

    // (Material bind and draw, elided.)
}

There is still a *lot* of uniform setup going on per-mesh, and almost all of it is unnecessary. But, since Material binds the shader each time, all of the other uniforms need to be rebound (because BindForRenderPass() may have bound a different shader).

Ideally, here’s the inner loop we’re aiming for:

for (u32 i = 0; i < mesh_count; ++i) {
    RenderMesh * rmesh = meshes + i;
    IndexedMesh * mesh = rmesh->mesh;
    Material * material = rmesh->material;
    Assert(mesh && material, "Cannot render without both mesh and material!");

    main_pass.SetUniformMatrix44("world_from_local", rmesh->world_from_local);

    // (Draw, elided.)
}

Material Instances:

When rendering the Sponza test scene, there are a few dozen materials loaded. However, there are only 3 different sets of render states: leaves (a shader doing alpha test and subsurface scattering), alpha tested, and default PBR. Within each class of material the only difference is the texture set.

If we were to separate the set of render states and the set of uniforms into different entities, we’d be able to minimize modifications to the render state in this loop. So that’s what I’ve decided to do.

A Material is a bag of render states, while a MaterialInstance is a bag of uniforms associated with a particular Material. For example, the vases, walls, columns, etc. in Sponza would all be instances of the same Material. If we sort and bucket the mesh list according to Materials, we only need to bind the full render state for each material once. (This is also a convenient point to eliminate culled meshes, removing the visibility check in the main loop.)
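The sort-and-bucket step itself is simple. Here's a minimal sketch of the idea; the types (`SketchMesh`, `Bucket`) are illustrative stand-ins, not hikari's actual structs:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative stand-ins for the engine types; only the material
// identity matters for bucketing.
struct SketchMesh { int material_id; int mesh_id; };
struct Bucket { int material_id; uint32_t start; uint32_t end; };

// Sort the visible meshes by material, then emit one bucket per
// contiguous run sharing the same material.
std::vector<Bucket> BucketByMaterial(std::vector<SketchMesh> & meshes) {
    std::sort(meshes.begin(), meshes.end(),
              [](const SketchMesh & a, const SketchMesh & b) {
                  return a.material_id < b.material_id;
              });

    std::vector<Bucket> buckets;
    for (uint32_t i = 0; i < meshes.size(); ++i) {
        if (buckets.empty() || buckets.back().material_id != meshes[i].material_id) {
            buckets.push_back(Bucket{meshes[i].material_id, i, i + 1});
        } else {
            buckets.back().end = i + 1;
        }
    }
    return buckets;
}
```

Walking the result bucket-by-bucket lets us bind each Material's render state exactly once per bucket.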

At least for now, I’ve done this in the most naive way possible; the list of uniforms is removed from Material and becomes MaterialInstance. Each instance also contains a pointer to its parent material. Done!

This is not a great solution; there are plenty of ways to shoot yourself in the foot. For example, a MaterialInstance that doesn’t contain the full set of expected uniforms will render with stale data, while one containing extras will assert when the uniform bind fails. The Material should probably have a “base instance” that defines the set of required uniforms, along with defaults for them; each instance would then validate calls against this base. I have not implemented this yet.

Here’s where we end up with MaterialInstance:

BucketedRenderList render_list = MakeBucketedRenderList(meshes, mesh_count, visible_meshes);

for (RenderBucket bucket : render_list.buckets) {
    main_pass.SetRenderTarget(GL_FRAMEBUFFER, &hdr_buffer_rt_ms);


    // Additional global setup, as above.

    for (u32 i = bucket.start; i < bucket.end; ++i) {
        assert(i < render_list.mesh_count);
        RenderMesh * rmesh = &render_list.mesh_list[i];
        IndexedMesh * mesh = rmesh->mesh;
        MaterialInstance * material_instance = rmesh->material_instance;

        RenderPass main_subpass = main_pass.BeginSubPass();

        main_subpass.SetUniformMatrix44("world_from_local", rmesh->world_from_local);

        // (Bind material_instance uniforms and draw, elided.)
    }
}



(Material binding has been moved into RenderPass, which is in hindsight a more natural place for it to live.)

By carving the mesh list up into buckets by Material, we only need to do the global setup once for each bucket, and then only set the specific instance materials per-mesh. Even better, the check for culled objects disappears. And this list is reusable between the depth prepass and main render pass.

Still a lot of room for performance improvement: the RenderMesh struct causes a lot of unnecessary pointer chasing, meshes using the same instance of a material could be batched as well, there are *sprintf* calls in the outer loop. It’s pretty clear I need to spend a lot more time here.

However, this is progress! More importantly, Materials are now general enough I can implement the terrain materials. So that’s next.

hikari – Designing a Terrain System

The next big addition to hikari is going to be a system for rendering terrain. I’ve done a few weeks of research and wanted to write up my plans before implementing. Then, once it’s complete, I can review what did and did not work.

That’s the idea, anyway.

Source Format:

I decided to use simple heightmap-based terrain. Starting with a regular grid of points, offset each vertically based on a grayscale value stored in an image. This is not the only (or best) choice, but it is the simplest to implement as a programmer with no art team at hand. Other options include marching-cubes voxels, constructive solid geometry, and hand-authored meshes.

However, I’m consciously trying to avoid depending on the specific source of the mesh in the rest of the pipeline. This would allow any of (or even a mix of) these sources to work interchangeably. We’ll see how realistic that expectation is in practice.


Terrain meshes can be quite large, especially in the case of an open-world game where the map may cover tens of kilometers in any direction, while still maintaining detail up close. Working with such a large mesh is unwieldy, so let’s not.

Instead, we cut the terrain up into chunks. For example, if our world map is 4km x 4km, we could cut it up into 1600 separate 100m x 100m chunks. Now we don’t need to keep the entire map in memory, only the chunks that we might need to render in the near future (e.g., chunks around the camera.)

There are additional benefits to chunking the map; for example, the ability to frustum cull the chunks. Chunks provide a convenient granularity for applying level-of-detail optimizations as well.
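The chunk bookkeeping is just integer math. A minimal sketch, using the 4km / 100m figures from the example above (names are illustrative, not engine code):

```cpp
#include <cassert>

// Illustrative chunk math using the example figures above:
// a 4km x 4km map cut into 100m x 100m chunks.
const float kMapSize       = 4000.0f; // meters per side
const float kChunkSize     = 100.0f;  // meters per chunk
const int   kChunksPerSide = (int)(kMapSize / kChunkSize); // 40

struct ChunkCoord { int x; int z; };

// Which chunk contains a given world-space XZ position.
ChunkCoord ChunkFromWorld(float world_x, float world_z) {
    ChunkCoord c;
    c.x = (int)(world_x / kChunkSize);
    c.z = (int)(world_z / kChunkSize);
    return c;
}

// Linear index into a chunk table (40 * 40 = 1600 chunks total).
int ChunkIndex(ChunkCoord c) {
    return c.z * kChunksPerSide + c.x;
}
```

Streaming then reduces to loading the chunks whose coordinates fall within some radius of the camera's chunk.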


The mesh is only part of our terrain, of course. It needs texture and color from materials: grass, rock, snow, etc. Importantly, we don’t necessarily want to texture every part of the terrain with the same material. One solution would be to assign a single material to each chunk, and have an artist paint a unique texture.

This seems like a non-starter to me — potentially thousands of high-resolution textures would need to be created and stored, not to mention streamed into and out of memory alongside their mesh chunks. In addition, producing a distortion-free UV mapping for terrain is difficult. Finally, while we don’t want to texture everything the same, there is only a limited set of materials we want to use, and it’d be nice if we could reuse the same textures.

Enter splat maps. The basic idea is straightforward: assign a color to the terrain. The components of that color determine the blend weights of a set of materials. If the red channel weights a rock texture and the green channel weights a grass texture, ‘yellow’ terrain is a blend between rock and grass. This allows us to smoothly blend between 5 different materials.

Wait, 5? How?

A color has four components: red, green, blue, and alpha. However, if we require that the material weights sum to 1.0, we can construct a new weight, 1.0 - (r + g + b + a). This becomes the weight of our 5th material. (Of course, we need to actually enforce these constraints in art production, but that is a separate issue.)
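The weight expansion is trivial; here it is as a small C++ sketch (illustrative, not engine code — in practice this lives in the terrain shader):

```cpp
#include <cassert>

// Expand a splat color (r, g, b, a) into five material blend weights.
// Assumes the authored weights sum to at most 1.0; the fifth weight
// is whatever remains.
void SplatWeights(float r, float g, float b, float a, float out[5]) {
    out[0] = r;
    out[1] = g;
    out[2] = b;
    out[3] = a;
    out[4] = 1.0f - (r + g + b + a);
}
```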

Furthermore, since chunks of the terrain get rendered in their own draw call, there’s no reason we can’t swap the materials used in each chunk. So while the number of materials in a specific chunk is limited, the total number of materials is unlimited (well, close enough).

The name “splat map” may imply using a texture to assign this color (and, for a heightmap terrain, that’s the easiest way to store it on disk), but I actually want to use vertex colors. This puts heightmaps and authored meshes on the same footing, and keeps decisions about mesh source in the mesh loading where it belongs.

In order to do texturing, we need texture coordinates. The obvious way to do this is assign a UV grid to our heightmap mesh. This will look fine for smooth, subtle terrain features, but the more drastic the slope the more distorted the texturing will be.

Instead, we can use so-called triplanar mapping to generate texture coordinates. Project the world position to each of the XY, YZ, and ZX planes, and use that value as the texture coordinate for three different lookups. Then, blend between them based on the vertex normal.

As an added bonus, there’s no requirement that the various axes in triplanar mapping use the same material. By mixing materials, we can create effects such as smooth grass that fades to patches of dirt on slopes.
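The blend-weight half of triplanar mapping can be sketched as follows (written as C++ for testing convenience; in practice this runs in the fragment shader):

```cpp
#include <cassert>
#include <cmath>

// Triplanar blend weights: the YZ, ZX, and XY plane projections are
// weighted by how closely the surface normal aligns with the X, Y,
// and Z axes respectively, normalized so the weights sum to 1.
void TriplanarWeights(float nx, float ny, float nz, float out[3]) {
    float wx = std::fabs(nx);
    float wy = std::fabs(ny);
    float wz = std::fabs(nz);
    float sum = wx + wy + wz;
    out[0] = wx / sum;
    out[1] = wy / sum;
    out[2] = wz / sum;
}
```

Each weight then scales the corresponding planar texture lookup. A common refinement is to raise the weights to a power before normalizing, which tightens the transition regions between projections.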

Of course, there is a cost to all of this. Triplanar mapping requires doing 3x the material samples. Splat mapping requires doing up to 5x the samples. Together, it’s up to 15x the material samples, and we haven’t even begun to talk about the number of individual textures in each material. Needless to say, rendering terrain in this fashion is going to read a *lot* of texture samples per-pixel.

That said, hikari already uses a depth prepass to avoid wasted shading on opaque geometry, so I suspect that would mitigate the worst of the cost. In addition, it’s easy enough to write another shader for more distant chunks that only uses the 2-3 highest-weighted materials from the splat, where the difference is unlikely to be noticeable.


There are a bunch of assumptions that hikari makes regarding meshes and materials that are a bit at odds with what I’ve mentioned above. The renderer is simple enough that I can work around these assumptions for special cases. However, I will need to be on the lookout for opportunities to make some of the rendering systems (or, at least, the exposed API) more general.

Some examples: at the moment, meshes are assumed to use a specific common vertex format (position, UV, normal, tangent) — terrain meshes will not (no need for UV and tangent, but we need the splat color.) Meshes are expected to use only one material, and materials do not pack any texture maps together (so, for example, baked AO and roughness are separate maps). Neither of these will play nicely with terrain, but should be simple enough to fix. “Materials” are currently just a bag of textures and a handful of uniform parameters, since at the moment there are only two mesh shaders in the engine. A more generic material model that ties directly into its associated shader would be a useful thing to fix.

So, there’s the plan of attack. We’ll see how well it stacks up. I’ll be documenting the implementation of each part of the system in a series of blog posts, capped off by a final report on what worked and where I got it totally wrong.

hikari – Implementing Single-Scattering

Over the past few days, I’ve been implementing a single-scattering volumetric fog effect in hikari, my OpenGL PBR renderer.

I started by more-or-less copying Alexandre Pestana’s implementation, porting it to GLSL and fitting it into my existing render pipeline.

However, there’s an issue: by not taking into account the transmittance along the ray, instead taking an average of all scattered samples, the effect is far more severe than it should be. In addition, it blends really poorly with the scene (using additive blending after the lighting pass.)

Bart Wronski describes a more physically-correct model in his presentation on AC4’s volumetric fog. I did not go the full route of using a voxel grid (although I may in the future; this renderer uses clustered shading so that’s a natural extension). However, Wronski presents the correct integration to account for transmittance, and we can apply that to the post-effect approach as well.

Background out of the way, I wanted an implementation that was:
– Minimally invasive. If possible, as simple as adding a new render pass and blend. No compute shaders or large data structures.
– Tweakable. Light dust to thick fog.
– Plausible. I don’t need absolute physical accuracy, but something that fits well in the scene and looks “right”.

The core loop is (relatively) simple:

// Accumulated in-scatter.
vec3 inscatter_color = vec3(0.0);
// Accumulated density.
float total_density = 0.0;
for (int i = 0; i < STEP_COUNT; ++i) {
    // Sample sun shadow map.
    vec3 sun_clip_position = GetClipSpacePosition(current_position, sun_clip_from_view);
    float shadow_factor = SampleShadowPoint(sun_shadow_map, sun_clip_position.xy, sun_clip_position.z);

    // Calculate total density over step.
    float step_density = density * step_distance;

    // Calculate transmittance of this sample (based on total density accumulated so far).
    float transmittance = min(exp(-total_density), 1.0);

    // Sun light scatter.
    inscatter_color += sun_color * shadow_factor * sun_scatter_amount * step_density * transmittance;

    // Ambient scatter.
    inscatter_color += ambient_light_color * ambient_scatter_amount * step_density * transmittance;

    // Accumulate density.
    total_density += step_density;

    // Step forward.
    current_position += step_vector;
}

The raymarch operates in view space. This was mostly a matter of convenience, but also means we maintain better precision close to the camera, even if we happen to be far away from the world origin.

Currently, I only compute scattering for sunlight and a constant ambient term. (Coming from directly up.) If you have ambient probes stored as spherical harmonics, you can compute a much better ambient term — see Wronski’s slides for details. Other light types require recomputing the scatter amount per-sample, as it depends on the angle between the light and view.

Note the scaling by transmittance. The light scattered by a sample has to travel through all of the particles we’ve seen to this point, and therefore it is attenuated by the scattering and absorption.

Density is (very roughly) a measure of particle density per unit length. (I don’t 100% understand the physical basis here, but it is dependent on both particle size and number.) The implementation currently uses a completely uniform value for density, but this could easily be sampled from a texture map (artist painted, noise, particle system, etc.) In practice, good-looking density values tend to be very low:




(g = -0.85)

You can smooth the results of higher densities by using more samples (or blurring the result.)

The other control parameter for the scattering is g. This is used in the phase function to compute the amount of light scattered towards the viewer. The valid range for g is [-1, 1]. In general, most atmospheric particles will have a negative g value, whose magnitude increases as the particle size increases. Some good values to start with are between -0.75 and -0.99.


(density = 0.00025)
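The post doesn't spell out which phase function is in use; Henyey-Greenstein is the common choice in this context, so here it is as a sketch (illustrative, not necessarily the exact function used above):

```cpp
#include <cassert>
#include <cmath>

// Henyey-Greenstein phase function: how much light is scattered
// toward the viewer, given the cosine of the angle between the light
// and the view ray. g in (-1, 1); the sign convention depends on how
// cos_theta is measured, which is why the values quoted above are
// negative.
float HenyeyGreenstein(float cos_theta, float g) {
    const float kPi = 3.14159265358979f;
    float g2 = g * g;
    float denom = 1.0f + g2 - 2.0f * g * cos_theta;
    return (1.0f - g2) / (4.0f * kPi * std::pow(denom, 1.5f));
}
```

For a directional light like the sun, cos_theta is constant across the frame, so this can be evaluated once per frame rather than per sample.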

At low sample counts, the result is highly susceptible to banding artifacts. To combat this, we jitter the starting position by moving some fraction of a step in the direction of the ray. (This is the same approach used by Pestana.) In order to keep the noise unobjectionable, we need to use either an even pattern like a Bayer matrix, or blue noise. I’ve also experimented with dithering the step distance as well, such that neighboring pixels use different step sizes. However, this does not seem to produce much benefit over just jittering the start position; the resulting noise is more noticeable.

[no jitter]
(density = 0.0025, g = -0.85)
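One way to generate the per-pixel jitter fraction is a 4x4 Bayer matrix; a sketch (illustrative — any ordered-dither pattern or blue-noise texture works here):

```cpp
#include <cassert>

// 4x4 Bayer matrix, remapped to [0, 1). Each pixel offsets its
// raymarch start position by this fraction of one step, trading
// banding for an even, high-frequency noise pattern.
float BayerJitter(int px, int py) {
    static const int kBayer4[4][4] = {
        { 0,  8,  2, 10},
        {12,  4, 14,  6},
        { 3, 11,  1,  9},
        {15,  7, 13,  5},
    };
    return kBayer4[py & 3][px & 3] / 16.0f;
}
```

In the raymarch, this would look something like `current_position += step_vector * BayerJitter(pixel_x, pixel_y);` before the loop.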

The outputs of this shader are the accumulated scattered light (in RGB) and the final transmittance amount (in alpha). I then perform a bilateral upscale (directly ported from Pestana’s HLSL version) and blend it with the lighting buffer using glBlendFunc(GL_ONE, GL_SRC_ALPHA). The transmittance is the amount of light that reaches the viewer from the far end of the ray, so this is accurate.

For higher densities or more uniform scattering (g closer to 0), we may want to perform a depth-aware blur on the scatter buffer before upscaling. Since I didn’t have an immediate need for really thick scatter, I have not implemented this yet.

The scattering should be added before doing the luminance calculation and bloom. This helps to (somewhat) mitigate the darkening effect of unlit areas, as well as further smoothing the edges of light shafts.

There’s still a lot to be done here: supporting additional lights, variable density, etc. But even the basic implementation adds a lot when used subtly.

Orthographic LODs – Part II

Last time, we got the basic renderer up and running. Now we have more complex issues to deal with. For example, how to light the LOD models.


One approach to lighting is to simply light the detailed model, and bake that lighting into the LOD texture. This works, but requires that all lights be static. We can do better.

The standard shading equation has three parts: ambient light, diffuse light, and specular light:
K_a=AT
K_d=DT(N \cdot L)
K_s=S(N \cdot H)^m (N \cdot L > 0)
(For details of how these equations work, see Wikipedia.)

The key thing to notice is that these equations rely only on the texture (which we already have), a unit direction towards the light source, a unit direction towards the camera, and the unit surface normal. By rendering our detailed model down to a texture, we’ve destroyed the normal data. But that’s easy enough to fix: we write it to a texture as well.

(Setup for this texture is more-or-less the same as the color texture, we just bind it to a different output.)

// Detail vertex shader
in vec3 position;
in vec3 normal;

out vec3 Normal;

void main() {
    gl_Position = vec4(position, 1.0);
    Normal = normal;
}

// Detail fragment shader

in vec3 Normal;

out vec4 frag_color;
out vec4 normal_map_color;

void main() {
    // Remap the unit normal from [-1, 1] to [0, 1] for storage.
    vec3 normal = (normalize(Normal) + vec3(1.0)) / 2.0;

    frag_color = vec4(1.0);
    normal_map_color = vec4(normal, 1.0);
}

A unit normal is simply a vector of 3 floats between -1 and 1. We can map this to a RGB color (vec3 clamped to [0, 1]) by adding one and dividing by two. This is, again, nothing new. You may be wondering why we store an alpha channel — we’ll get to that later.
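The encode/decode remap round-trips exactly in floating point, but an RGBA8 target quantizes each channel to 8 bits. A small sketch of both (illustrative, not engine code):

```cpp
#include <cassert>
#include <cmath>

// The [-1, 1] <-> [0, 1] remap used to store normals in a color
// texture, plus a round trip through an 8-bit channel to show the
// precision actually available in an RGBA8 target.
float EncodeComponent(float n) { return (n + 1.0f) / 2.0f; }
float DecodeComponent(float c) { return (c * 2.0f) - 1.0f; }

float RoundTrip8Bit(float n) {
    int byte = (int)(EncodeComponent(n) * 255.0f + 0.5f);
    return DecodeComponent(byte / 255.0f);
}
```

The worst-case error after quantization is about 1/255 per component, which is acceptable for lighting but worth keeping in mind.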

We can now reconstruct the normal from the LOD texture. This is all we need for directional lights:

// Fragment shader
uniform sampler2D color_texture;
uniform sampler2D normal_map_texture;

uniform vec3 sun_direction;
uniform vec3 camera_direction;

in vec2 uv;

out vec4 frag_color;

void main() {
    vec4 normal_sample = texture(normal_map_texture, uv);
    // Remap the stored normal from [0, 1] back to [-1, 1].
    vec3 normal = (normal_sample.rgb * 2.0) - vec3(1.0);
    vec3 view = -1.0 * camera_direction;
    // Blinn-Phong half vector, between the light and view directions.
    vec3 view_half = normalize(sun_direction + view);

    float n_dot_l = dot(normal, sun_direction);
    float n_dot_h = dot(normal, view_half);

    float ambient_intensity = 0.25;
    float diffuse_intensity = 0.5 * max(n_dot_l, 0.0);
    float specular_intensity = 0.0;
    if (n_dot_l > 0.0) {
        specular_intensity = 0.25 * pow(max(n_dot_h, 0.0), 10.0);
    }

    vec4 texture_sample = texture(color_texture, uv);

    vec3 lit_color = ambient_intensity * texture_sample.rgb +
                     diffuse_intensity * texture_sample.rgb +
                     specular_intensity * vec3(1.0);

    frag_color = vec4(lit_color, texture_sample.a);
}


Point / Spot Lighting

So this solves directional lights. However, point and spot lights have attenuation — they fall off over distance. Additionally, they have an actual spatial position, which means the light vector will change.

After rendering the LODs, we no longer have the vertex positions of our detail model. Calculating lighting based on the LOD model’s vertices will, in many cases, be obviously wrong. How can we reconstruct the original points?

By cheating.

Let’s look at what we have to work with:

The camera is orthographic, which means the depth of a pixel is independent of where it is on screen. We have the camera’s forward vector (camera_direction in the above shader).

Finally, and most obviously, we don’t care about shading the points that weren’t rendered to our LOD texture.

This turns out to be the important factor. If we knew how far each pixel was from the camera when rendered, we could reconstruct the original location for lighting.

To put it another way, we need a depth buffer. Remember that normal map alpha value we didn’t use?

We can get the depth value like so:

// Map depth from [near, far] to [-1, 1]
float depth = (2.0 * gl_FragCoord.z - gl_DepthRange.near - gl_DepthRange.far) / (gl_DepthRange.far - gl_DepthRange.near);

// Remap normal and depth from [-1, 1] to [0, 1]
normal_depth_color = (vec4(normalize(normal_vec), depth) + vec4(1.0)) / 2.0;

However, we now run into a problem. This depth value is in the LOD render’s window space. We want the position in the final render’s camera space — this way we can transform the light position to the same space, which allows us to do the attenuation calculations.

There are a few ways to handle this. One is to record the depth range for our LOD model after rendering it, and then use that to scale the depth properly. This is complex (asymmetrical models can have different depth sizes for each rotation) and quite likely inconsistent, due to precision errors.

A simpler solution, then, is to pick an arbitrary depth range, and use it consistently for both the main render and LODs. This is not ideal — one getting out of sync with the other may lead to subtle errors. In a production design, it may be a good idea to record the camera matrix used in the model data, so that it can be confirmed on load.

We need two further pieces of data to proceed: the viewport coordinates (the size of the window) and the inverse projection matrix (to “unproject” the position back into camera space). Both of these are fairly trivial to provide. To reconstruct the proper position, then:

vec3 window_to_camera_space(vec3 window_space) {
    // viewport = vec4(x, y, width, height)

    // Because projection is orthographic, NDC == clip space.
    vec2 clip_xy = ((2.0 * window_space.xy) - (2.0 * viewport.xy)) / - 1.0;
    // Already mapped to [-1, 1] by the same transform that extracts our normal.
    float clip_z = window_space.z;

    vec4 clip_space = vec4(clip_xy, clip_z, 1.0);

    vec4 camera_space = camera_from_clip * clip_space;

    return;
}
(Note that this is rather inefficient. We *know* what a projection matrix looks like, so that last multiply can be made a lot faster.)

Calculating the light color is the same as for directional lights. For point lights, we then divide by an attenuation factor:

float attenuation = point_light_attenuation.x +
                    point_light_attenuation.y * light_distance +
                    point_light_attenuation.z * light_distance * light_distance;

point_light_color = point_light_color / attenuation;

This uses 3 factors: constant, linear, and quadratic. The constant factor determines the base intensity of the light (a fractional constant brightens the light, a constant greater than 1.0 darkens it.) The linear and quadratic factors control the fall-off curve.

A spot light has the same attenuation factors, but also has another term:

float spot_intensity = pow(max(dot(-1.0 * spot_direction, light_direction), 0.0), spot_exponent);

The dot product here limits the light to a cone around the spotlight’s facing direction. The exponential factor controls the “tightness” of the light cone — as the exponent increases, the fall-off becomes much sharper (remember that the dot product of unit vectors produces a value in [0, 1]).

So that covers lighting. Next time: exploring options for shadows.



I made some errors with terminology in my last post. After the projection transform is applied, vertices are in *clip* space, not eye space. The series of transforms is as follows:

Model -> World -> Camera -> Clip -> NDC
(With an orthographic projection, Clip == NDC.)

Orthographic LODs – Part I

NOTE: I’m going to try something new here: rather than write about something I have already solved, this series will be my notes as I work through this problem. As such, this is *not* a guide. The code is a pile of hacks and I will make silly mistakes.

The Problem:

I am working on a city-building game. I want to be able to use detailed models with large numbers of polygons and textures, but without the runtime rendering cost that entails. To do so, I want to pre-render detailed models as textures for lower level-of-detail (LOD) models. This is not a particularly original solution (it is the system SimCity 4 uses). There are a few restrictions implied by this choice:

  • Camera projection must be orthographic.
    • A perspective projection will appear to skew the model the further it gets from the center of the view. Orthographic projection will not.
  • Camera angle must be fixed (or a finite set of fixed positions).
    • While camera *position* is irrelevant (because an orthographic projection doesn’t affect depth), its rotation is not. Each camera angle needs its own LOD image. Therefore, we need a small number of them.

The process is simple enough: render the detail model for each view. Projecting the vertices of the LOD model with the same view produces the proper texture coordinates for the LOD. Then, pack all of the rendered textures into an atlas.

Orthographic Rendering

Rendering an orthographic projection is, conceptually, very straightforward. After transforming the world to camera space (moving the camera to the origin, and aligning the z-axis with the camera’s forward view), we select a box of space and map it onto a unit cube (that is, a cube with a minimum point of (-1, -1, -1) and a maximum point of (1, 1, 1).) The z-component of each point is then discarded.

I will not get into the math here. Wikipedia has a nicely concise explanation.

What this *means* is that we can fit this box tightly to the model we want to render. With clever application of glViewport, we could even render all of our views in one go. (At the moment, I have not yet implemented this.) This makes building the texture atlas much simpler.

Calculating Camera-space LOD

To get this tight fit, we need to know the bounds of the rendered model in camera space. Since we are projecting onto the LOD model, *its* bounds are what we’re concerned with. (This implies, by the way, that the LOD model must completely enclose the detail model.)

struct AABB {
    vec3 min;
    vec3 max;
};

AABB GetCameraSpaceBounds(Model * model, mat4x4 camera_from_model) {
    AABB result;

    vec4 * model_verts = model->vertices;
    vec4 camera_vert = camera_from_model * model_verts[0];

    // We don't initialize to zero vectors, because we don't
    // guarantee the model is centered on the origin.
    result.min = vec3(camera_vert);
    result.max = vec3(camera_vert);

    for (u32 i = 1; i < model->vertex_count; ++i) {
        camera_vert = camera_from_model * model_verts[i];

        result.min.x = min(result.min.x, camera_vert.x);
        result.min.y = min(result.min.y, camera_vert.y);
        result.min.z = min(result.min.z, camera_vert.z);

        result.max.x = max(result.max.x, camera_vert.x);
        result.max.y = max(result.max.y, camera_vert.y);
        result.max.z = max(result.max.z, camera_vert.z);
    }

    return result;
}

This bounding box gives us the clip volume to render.

AABB lod_bounds = GetCameraSpaceBounds(lod_model, camera_from_model);

mat4x4 eye_from_camera = OrthoProjectionMatrix(
    lod_bounds.max.x, lod_bounds.min.x,  // right, left
    lod_bounds.max.y, lod_bounds.min.y,  // top, bottom
    lod_bounds.min.z, lod_bounds.max.z); // near, far

mat4x4 eye_from_model = eye_from_camera * camera_from_model;
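The post never shows the body of OrthoProjectionMatrix, so here is one plausible sketch matching the call site's (right, left, top, bottom, near, far) argument order. I'm assuming near/far are taken as raw camera-space z values and mapped directly to [-1, 1]; since the z component is discarded for the LOD render anyway, the z sign convention only matters if depth testing is needed:

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[16]; };  // row-major, acting on column vectors

Vec4 Transform(const Mat4 &a, Vec4 v) {
    const float *m = a.m;
    return { m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w,
             m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w,
             m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w,
             m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w };
}

// Maps [l,r] -> [-1,1], [b,t] -> [-1,1], [n,f] -> [-1,1] linearly:
// the standard orthographic scale-and-offset on each axis.
Mat4 OrthoProjectionMatrix(float r, float l, float t, float b,
                           float n, float f) {
    return {{
        2/(r-l), 0,       0,       -(r+l)/(r-l),
        0,       2/(t-b), 0,       -(t+b)/(t-b),
        0,       0,       2/(f-n), -(f+n)/(f-n),
        0,       0,       0,       1,
    }};
}
```

With the bounds fit tightly to the model, the min corner of the AABB lands on (-1, -1, -1) and the max corner on (1, 1, 1), which is exactly the tight clip volume we wanted.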

Rendering LOD Texture

Rendering to a texture requires a framebuffer. (In theory, we could also simply render to the screen and read the result back with glReadPixels. However, that limits us to a single color output and makes it more difficult to do any GPU postprocessing.)

After binding a new framebuffer, we need to create the texture to render onto:

glGenTextures(1, &result->texture);
glBindTexture(GL_TEXTURE_2D, result->texture);

// render_width and render_height are the width and height of the
// camera space lod bounding box.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, render_width, render_height,
             0, GL_RGBA, GL_UNSIGNED_BYTE, 0);

glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, result->texture, 0);

glViewport(0, 0, render_width, render_height);

This gives us a texture large enough to fit our rendered image, attaches it to the framebuffer, and sets the viewport to the entire framebuffer. To render multiple views to one texture, we’d allocate a larger texture, and use the viewport call to select the target region.
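That multi-view idea can be sketched as a simple grid layout: each view gets one cell of the atlas, and glViewport is pointed at that cell before rendering. The grid scheme and names here are my own illustration, not the project's code:

```cpp
#include <cassert>

struct ViewportRect { int x, y, w, h; };

// Compute the atlas cell for a given view, laying views out left-to-right,
// bottom-to-top in a fixed grid of equally sized cells.
ViewportRect AtlasCell(int view_index, int columns,
                       int cell_width, int cell_height) {
    int col = view_index % columns;
    int row = view_index / columns;
    return { col * cell_width, row * cell_height, cell_width, cell_height };
}

// Usage (GL calls left as comments, since this sketch has no GL context):
// for (int i = 0; i < view_count; ++i) {
//     ViewportRect r = AtlasCell(i, columns, cell_w, cell_h);
//     glViewport(r.x, r.y, r.w, r.h);
//     // ... render view i ...
// }
```

Uniform cells waste some space when views have different aspect ratios, but they keep the UV offset math trivial; a tighter packing could come later.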

Now we simply render the detail model with the eye_from_model transform calculated earlier. The result:

(The detail shader used simply maps position to color.)

That’s all for now. Next time: how do we light this model?