hikari – Terrain Materials

Now that hikari’s material system can express ideas more complicated than “specific set of textures”, it’s time to implement the materials and shading for terrain.

Splat Maps:

The texture selection is going to be based on splat maps, which means we need vertex colors: something I conveniently ignored when doing mesh generation earlier. It’s a simple enough fix. I decided to go with layers of noise for testing again, both because it was easy (no need for tooling) and because it produces the worst-case scenario for performance (all pixels use all or most of the splats). If the splat / triplanar mapping is going to fall off the performance cliff, I want to find out now, not later.

Splats are normalized (to sum to one) and packed into a 32 bit color per vertex:

inline u32
NormalizeSplat(float w0, float w1, float w2, float w3, float w4) {
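    // Only four of the five weights are packed; the fifth is implicit and
    // reconstructed in the shader as 1 - (r + g + b + a).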
    float sum = w0 + w1 + w2 + w3 + w4;
    Assert(sum != 0.0f, "Cannot have total weight of zero on terrain splat.");

    float r = w0 / sum;
    float g = w1 / sum;
    float b = w2 / sum;
    float a = w3 / sum;

    u32 ri = (((u32)(r * 255.0f) & 0xff) <<  0);
    u32 gi = (((u32)(g * 255.0f) & 0xff) <<  8);
    u32 bi = (((u32)(b * 255.0f) & 0xff) << 16);
    u32 ai = (((u32)(a * 255.0f) & 0xff) << 24);

    return ai | bi | gi | ri;
}

I wrote a basic shader to visualize the splat map (although squeezing 5 values into 3 color channels is not particularly effective):
splat_color.png

Textures:

By packing textures tightly, we can reduce the number of texture reads for each splat material to two:
– albedo color
– normal + AO + roughness

This requires encoding the normal map in two channels (which is completely fine for tangent space, just discard the Z component and reconstruct in the shader). It also assumes that none of the splat materials have a metal channel. This is not necessarily true, but the metal mask could be packed into the albedo’s alpha channel. For the textures I’m working with, the metal mask is a constant 0.
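
For reference, reconstructing the normal from two channels looks roughly like this in GLSL (a sketch; the texture name and channel layout are assumptions, not hikari’s actual code):

vec2 n_xy = texture(splat_props_array, vec3(uv, splat_id0)).rg * 2.0 - 1.0;
// A tangent-space normal always points out of the surface, so Z is the
// positive root of whatever is left after X and Y.
float n_z = sqrt(max(1.0 - dot(n_xy, n_xy), 0.0));
vec3 ts_normal = vec3(n_xy, n_z);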

So, as a rough calculation:
2 textures per splat material
x 3 splat materials per splat (so that we can have different textures on each planar face)
x 5 splats
= 30 textures.

This immediately rules out the naive approach of binding each texture separately. GL_MAX_TEXTURE_IMAGE_UNITS, the maximum number of bound textures per draw, on my midrange GPU is 32. Since the lighting code uses several textures as well (shadow maps, environment probes, a lookup texture for the BRDF), we run out of texture units.

Fortunately, we don’t have to do things that way.

All of the terrain textures should be the same size, so that the resolution and variation are consistent between them. This in turn means we can pack all of the terrain textures into a set of texture arrays. Instead of binding different textures, each splat material provides an index into the texture arrays.
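
As a rough sketch, building one of these arrays might look like the following (the names, sizes, and formats here are assumptions for illustration, not hikari’s loader):

GLuint splat_albedo_array;
glGenTextures(1, &splat_albedo_array);
glBindTexture(GL_TEXTURE_2D_ARRAY, splat_albedo_array);

// One allocation covering every layer; all terrain textures share this size.
glTexStorage3D(GL_TEXTURE_2D_ARRAY, mip_count, GL_SRGB8_ALPHA8, 1024, 1024, layer_count);

// Upload each terrain albedo into its own layer.
for (u32 layer = 0; layer < layer_count; ++layer) {
    glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0,
                    0, 0, layer, 1024, 1024, 1,
                    GL_RGBA, GL_UNSIGNED_BYTE, albedo_pixels[layer]);
}
glGenerateMipmap(GL_TEXTURE_2D_ARRAY);

// In the shader, a splat material is just a layer index:
//     texture(splat_albedo_array, vec3(uv, splat_id0));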

This does make some things, like streaming textures in and out, more complicated. I’ve ignored this for now, as hikari simply loads everything at startup anyway. (hikari’s handling of assets is not very good, and will need to be addressed sometime soon.)

Shader:

All of the shaders in hikari are plain GLSL, with some minor support for preprocessing (#includes). Material shaders share the same lighting code, via a function called AccumulateLighting() that takes a struct containing all of the surface parameters and returns an RGB color. The benefit of this is writing, debugging, and optimizing the lighting calculation and lookup *once*.

Writing a shader, then, is just a matter of filling out that structure.
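
Concretely, the tail end of a material fragment shader ends up looking something like this (a sketch; aside from AccumulateLighting, the struct and field names here are guesses):

SurfaceParams surf;
surf.albedo    = albedo;      // Blended splat albedo.
surf.N         = normal;      // Blended world-space normal.
surf.roughness = roughness;
surf.ao        = ao;
surf.metallic  = 0.0;         // Constant for the terrain textures.

frag_color = vec4(AccumulateLighting(surf), 1.0);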

For this terrain shader, we need to do two sets of blends: first, blending between splats; second, blending between the three planar projections.

The blending for splats is pretty much exactly what you’d expect:

vec3 SampleAlbedoSplat(vec2 uv, float spw0, float spw1, float spw2, float spw3, float spw4) {
    vec3 splat0 = texture(splat_albedo_array, vec3(uv, splat_id0)).rgb;
    vec3 splat1 = texture(splat_albedo_array, vec3(uv, splat_id1)).rgb;
    vec3 splat2 = texture(splat_albedo_array, vec3(uv, splat_id2)).rgb;
    vec3 splat3 = texture(splat_albedo_array, vec3(uv, splat_id3)).rgb;
    vec3 splat4 = texture(splat_albedo_array, vec3(uv, splat_id4)).rgb;

    vec3 splat_albedo = splat0 * spw0 +
                        splat1 * spw1 +
                        splat2 * spw2 +
                        splat3 * spw3 +
                        splat4 * spw4;

    return splat_albedo;
}

There is another, similar function for sampling and blending the surface properties (normal, roughness, AO).

The triplanar blending is more interesting. See these two blog posts for the fundamentals of what triplanar blending is and how it works:
Ben Golus – Normal Mapping for a Triplanar Shader
Martin Palko – Triplanar Mapping

The first step of triplanar blending is calculating the blend weights for each plane. I take the additional step of clamping weights below a threshold to zero. While this can create some visible artifacts if the threshold is too high, any plane we can avoid looking at is 10 fewer texture samples.

const float blend_sharpness = 4.0;
const float blend_threshold = 0.05;

// fs_in.normal is the interpolated vertex normal.
vec3 blend = pow(abs(fs_in.normal), vec3(blend_sharpness));

blend /= dot(blend, vec3(1.0));

if (blend.x < blend_threshold) blend.x = 0.0;
if (blend.y < blend_threshold) blend.y = 0.0;
if (blend.z < blend_threshold) blend.z = 0.0;

// Need to renormalize the blend
blend /= dot(blend, vec3(1.0));

By checking for a positive blend weight before sampling one of the projections, we can skip those that aren’t going to contribute much. This *does* seem to be a win (about a 0.25ms drop in render time in my test scene), but it’s pretty close to the noise level so I’m not certain.
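
The skip itself is just a branch on each weight; for the albedo it might look something like this (a sketch, and the UV swizzles assume a particular projection convention):

vec3 albedo = vec3(0.0);
if (blend.x > 0.0) {
    // X-facing projection samples in the ZY plane.
    albedo += blend.x * SampleAlbedoSplat(world_pos.zy, spw0, spw1, spw2, spw3, spw4);
}
if (blend.y > 0.0) {
    albedo += blend.y * SampleAlbedoSplat(world_pos.xz, spw0, spw1, spw2, spw3, spw4);
}
if (blend.z > 0.0) {
    albedo += blend.z * SampleAlbedoSplat(world_pos.xy, spw0, spw1, spw2, spw3, spw4);
}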

Thresholding the splat weights may also be worthwhile; in the worst-case test I have set up it definitely isn’t, but actual artist-authored terrain is unlikely to use more than 2 or 3 channels per-pixel.

The actual blending is, mostly, exactly what you’d expect (multiply each planar projection by the corresponding weight and add them together). The normal blending is slightly different, as described in Ben Golus’ blog post above:

// props_#.ts_normal is the tangent space normal for the plane facing that axis.
vec3 normal_x = vec3(0, props_x.ts_normal.yx);
vec3 normal_y = vec3(props_y.ts_normal.x, 0.0, props_y.ts_normal.y);
vec3 normal_z = vec3(props_z.ts_normal.xy, 0.0);
surf.N = normalize(fs_in.normal +
                   blend.x * normal_x +
                   blend.y * normal_y +
                   blend.z * normal_z);

The normal map for each plane is a linear blend of the splats. This is wrong, but looks okay in practice. I think the correct approach would be to swizzle *every* normal map sample out to world space and *then* blend? Not sure.

The result:
splat_texture.png

Per-plane Textures:

Stepping back a bit, let’s look at this render:
per_plane_color

The textures are placeholders, but each pixel is colorized based on the contribution by each planar mapping. For terrain rendering, this is really nice: it’s identified the slopes for us. In addition, we read a texture for each plane anyway — but there’s no need for it to be the *same* texture!

By exploiting this, we can use a different texture for slopes and level areas, “for free”:
grass_cliff

There’s more we can do here, too, like using different variants of a texture for X and Z planes to increase the amount of variation.

Performance:

I don’t have a very good feel for what makes shaders fast or slow. I was expecting 30+ texture reads to be a problem, but it doesn’t really appear to be. The depth prepass is possibly saving me a lot of pain here, as it means very little of the expensive shading goes to waste. I did notice some issues after first implementing the terrain shader, dropping to 30FPS on occasion, but after adding some GPU profiling code it turns out *shadow rendering* is slow, and just tipped over the threshold to cause problems. (Why does rendering the sun’s shadow cascade take upwards of 5-7ms? I dunno! Something to investigate.)

That said, the method I am using to time GPU operations (glQueryCounter) appears to be less than perfectly accurate (i.e., the UI pass seems to get billed the cost of waiting for vblank.) The GL specification, meanwhile, manages to be either extremely unclear or contradictory on the exact meaning of the timestamp recorded.

For now, I’m going to say this is fine and investigate more later. (ノ`Д´)ノ彡┻━┻

Continuing Work:

At this point, I have most of what I set out to implement, but there are a few things that still need to be done before I’d consider this terrain system complete:

– LOD mesh generation. In addition to simplifying the mesh itself, probably clamp low-weight splats to zero to make shading cheaper.
– Revisit hex tiling. I suspect it really is a more natural fit.
– Fix seams with normals and splats. These are still noticeable with real terrain textures.
– More detailed shading. The terrain materials I’m using all come with height maps; adding occlusion mapping would be interesting and would help sell the look up close.

Until next time.

hikari – New Material System

The next step in building hikari’s terrain system is proper material handling. In order to do so, I need to completely revamp the material system. This is going to be extensive enough to earn its own post.

Current Material System:

Here is the current state of hikari’s material “system”, a single struct:

struct Material {
    GLuint normal_map;
    GLuint albedo_map;
    GLuint metallic_map;
    GLuint roughness_map;
    GLuint ao_map;

    bool has_alpha;
    bool is_leaf;
    bool is_terrain;
};

And its usage, from the per-mesh main render loop:

main_pass.Begin();
main_pass.SetRenderTarget(GL_FRAMEBUFFER, &hdr_buffer_rt_ms);

RenderMesh * rmesh = meshes + i;
IndexedMesh * mesh = rmesh->mesh;
Material * material = rmesh->material;
if (material->has_alpha) {
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
}
else {
    glDepthMask(GL_FALSE);
    glDepthFunc(GL_EQUAL);
}
Assert(mesh && material, "Cannot render without both mesh and material!");

GLuint shader_for_mat = default_shader;
if (material->is_leaf) {
    shader_for_mat = leaf_shader;
}
if (material->is_terrain) {
    shader_for_mat = terrain_shader;
}
main_pass.SetShader(shader_for_mat);

main_pass.BindTexture(GL_TEXTURE_2D, "normal_map", material->normal_map);
main_pass.BindTexture(GL_TEXTURE_2D, "albedo_map", material->albedo_map);
main_pass.BindTexture(GL_TEXTURE_2D, "metallic_map", material->metallic_map);
main_pass.BindTexture(GL_TEXTURE_2D, "roughness_map", material->roughness_map);
main_pass.BindTexture(GL_TEXTURE_2D, "ao_map", material->ao_map);

// Material independent shader setup (lights list, shadow map binding,
// environment maps, etc.), elided for space.

// Draw the mesh.
mesh->SetAttribPointersForShader(main_pass.active_shader);
mesh->Draw();

main_pass.End();

The current notion of material is a set of textures and some flags used to determine the correct shader. Needless to say, this is neither flexible nor particularly efficient. Instead, what I’d like is a Material that consists of two bags of data: render states (shader program, blend mode, depth test) and shader uniforms (texture maps).

New Material System, First Pass:

struct Material {
    GLuint shader_program;
    GLenum depth_mask;
    GLenum depth_func;

    struct Uniform {
        const char * name;
        union {
            float value_f[4];
            Texture value_texture;
        };

        enum UniformType {
            Float1,
            Float2,
            Float3,
            Float4,

            TexHandle,
        } type;
    };

    std::vector<Uniform> uniforms;

    // Uniform setup methods elided

    inline void
    BindForRenderPass(RenderPass * pass) {
        pass->SetShader(shader_program);
        glDepthMask(depth_mask);
        glDepthFunc(depth_func);

        for (u32 i = 0; i < uniforms.size(); ++i) {
            Uniform u = uniforms[i];
            switch (u.type) {
                case Uniform::UniformType::Float1: {
                    pass->SetUniform1f(u.name, u.value_f[0]);
                } break;
                case Uniform::UniformType::Float2: {
                    pass->SetUniform2f(u.name, u.value_f[0], u.value_f[1]);
                } break;
                case Uniform::UniformType::Float3: {
                    pass->SetUniform3f(u.name, u.value_f[0], u.value_f[1], u.value_f[2]);
                } break;
                case Uniform::UniformType::Float4: {
                    pass->SetUniform4f(u.name, u.value_f[0], u.value_f[1], u.value_f[2], u.value_f[3]);
                } break;
                case Uniform::UniformType::TexHandle: {
                    pass->BindTexture(u.name, u.value_texture);
                } break;
            }
        }
    }
};

A good first step. No more branching to select shaders inside the render loop, no hardcoded set of textures. Still doing some silly things, like re-binding the shader for every mesh. Also still looking up uniform locations every time, though there is enough information here to cache those at load time now.
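
Caching those locations could be as simple as resolving them once after the shader program is linked (a sketch; the location field on Uniform is hypothetical):

// Resolve each uniform's location once, instead of looking it up by name
// on every bind. Assumes a GLint location field added to Uniform.
inline void
ResolveUniformLocations() {
    for (u32 i = 0; i < uniforms.size(); ++i) {
        uniforms[i].location = glGetUniformLocation(shader_program, uniforms[i].name);
    }
}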

Let’s look at the entire main pass for a bit:

for (u32 i = 0; i < mesh_count; ++i) {
    main_pass.Begin();
    main_pass.SetRenderTarget(GL_FRAMEBUFFER, &hdr_buffer_rt_ms);

    if (!visible_meshes[i]) {
        continue;
    }

    RenderMesh * rmesh = meshes + i;
    IndexedMesh * mesh = rmesh->mesh;
    Material * material = rmesh->material;

    Assert(mesh && material, "Cannot render without both mesh and material!");

    material->BindForRenderPass(&main_pass);

    main_pass.SetUniformMatrix44("clip_from_world", clip_from_world);
    main_pass.SetUniformMatrix44("world_from_local", rmesh->world_from_local);
    main_pass.SetUniform3f("view_pos", cam->position);
    main_pass.SetUniform1f("time", current_time_sec);

    main_pass.BindUBO("PointLightDataBlock", point_light_data_ubo);
    main_pass.BindUBO("SpotLightDataBlock", spot_light_data_ubo);
    main_pass.BindUBO("LightList", light_list_ubo);

    main_pass.BindTexture(GL_TEXTURE_CUBE_MAP, "irradiance_map", skybox_cubemaps.irradiance_map);
    main_pass.BindTexture(GL_TEXTURE_CUBE_MAP, "prefilter_map", skybox_cubemaps.prefilter_map);
    main_pass.BindTexture(GL_TEXTURE_2D, "brdf_lut", brdf_lut_texture);

    main_pass.BindTexture("ssao_map", ssao_blur_texture);
    main_pass.SetUniform2f("viewport_dim", hdr_buffer_rt_ms.viewport.z, hdr_buffer_rt_ms.viewport.w);

    main_pass.SetUniform3f("sun_direction", world->sun_direction);
    main_pass.SetUniform3f("sun_color", world->sun_color);
    main_pass.BindTexture("sun_shadow_map", sun_light.cascade.shadow_map);
    for (u32 split = 0; split < NUM_SPLITS; ++split) {
        char buffer[256];
        _snprintf_s(buffer, sizeof(buffer) - 1, "sun_clip_from_world[%d]", split);
        main_pass.SetUniformMatrix44(buffer, sun_light.cascade.sun_clip_from_world[split]);
    }
    mesh->SetAttribPointersForShader(main_pass.active_shader);
    mesh->Draw();
    main_pass.End();
}

There is still a *lot* of uniform setup going on per-mesh, and almost all of it is unnecessary. But, since Material binds the shader each time, all of the other uniforms need to be rebound (because BindForRenderPass() may have bound a different shader).

Ideally, here’s the inner loop we’re aiming for:

for (u32 i = 0; i < mesh_count; ++i) {
    RenderMesh * rmesh = meshes + i;
    IndexedMesh * mesh = rmesh->mesh;
    Material * material = rmesh->material;
    Assert(mesh && material, "Cannot render without both mesh and material!");

    material->BindForRenderPass(&main_pass);
    main_pass.SetUniformMatrix44("world_from_local", rmesh->world_from_local);
    mesh->SetAttribPointersForShader(main_pass.active_shader);
    mesh->Draw();
}

Material Instances:

When rendering the Sponza test scene, there are a few dozen materials loaded. However, there are only 3 different sets of render states: leaves (a shader doing alpha test and subsurface scattering), alpha tested, and default PBR. Within each class of material the only difference is the texture set.

If we were to separate the set of render states and the set of uniforms into different entities, we’d be able to minimize modifications to the render state in this loop. So that’s what I’ve decided to do.

A Material is a bag of render states, while a MaterialInstance is a bag of uniforms associated with a particular Material. For example, the vases, walls, columns, etc. in Sponza would all be instances of the same Material. If we sort and bucket the mesh list according to Materials, we only need to bind the full render state for each material once. (This is also a convenient point to eliminate culled meshes, removing the visibility check in the main loop.)

At least for now, I’ve done this in the most naive way possible; the list of uniforms is removed from Material and becomes MaterialInstance. Each instance also contains a pointer to its parent material. Done!
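
Sketched out, the split looks something like this (not hikari’s exact definitions; the uniform setup and binding methods are elided as before):

struct Material {
    GLuint shader_program;
    GLenum depth_mask;
    GLenum depth_func;

    // Render state binding methods elided.
};

struct MaterialInstance {
    Material * material;            // Parent material (render states).
    std::vector<Uniform> uniforms;  // Per-instance textures and parameters.
};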

This is not a great solution; there are a lot of ways to shoot yourself in the foot. For example, a MaterialInstance that doesn’t contain the full set of expected uniforms (will render with stale data), or one that contains extras (will assert when the uniform bind fails). The Material should probably have a “base instance” that defines the set of required uniforms, with defaults for them; each instance would validate calls against this base. I have not implemented this yet.

Here’s where we end up with MaterialInstance:

BucketedRenderList render_list = MakeBucketedRenderList(meshes, mesh_count, visible_meshes);

for (RenderBucket bucket : render_list.buckets) {
    main_pass.Begin();
    main_pass.SetRenderTarget(GL_FRAMEBUFFER, &hdr_buffer_rt_ms);

    main_pass.BindMaterial(bucket.material);

    // Additional global setup, as above.

    for (u32 i = bucket.start; i < bucket.end; ++i) {
        assert(i < render_list.mesh_count);
        RenderMesh * rmesh = &render_list.mesh_list[i];
        IndexedMesh * mesh = rmesh->mesh;
        MaterialInstance * material_instance = rmesh->material_instance;

        RenderPass main_subpass = main_pass.BeginSubPass();

        main_subpass.BindMaterialInstance(material_instance);
        main_subpass.SetUniformMatrix44("world_from_local", rmesh->world_from_local);
        mesh->SetAttribPointersForShader(main_subpass.active_shader);
        mesh->Draw();

        main_pass.EndSubPass(main_subpass);
    }

    main_pass.End();
}

(Material binding has been moved into RenderPass, which is in hindsight a more natural place for it to live.)

By carving the mesh list up into buckets by Material, we only need to do the global setup once for each bucket, and then only set the specific instance materials per-mesh. Even better, the check for culled objects disappears. And this list is reusable between the depth prepass and main render pass.

Still a lot of room for performance improvement: the RenderMesh struct causes a lot of unnecessary pointer chasing, meshes using the same instance of a material could be batched as well, there are *sprintf* calls in the outer loop. It’s pretty clear I need to spend a lot more time here.

However, this is progress! More importantly, Materials are now general enough I can implement the terrain materials. So that’s next.

hikari – Terrain Mesh Generation

Continuing on from the last post, I have implemented the basic mesh generation for heightmapped terrain. This was probably the easiest part, but did require some cleanup of existing code first.

Mesh Handling Cleanup:

Up to this point, meshes have not been dealt with in a systematic manner in hikari. The Mesh struct contained buffer handles and an index count and that was it. All meshes were assumed to be triangle lists. This was workable, because all object meshes had a common vertex layout, and rendered with variants of a single shader with fixed attribute indices. Rendering a mesh was easy enough:

glBindVertexArray(mesh->vao);
glBindBuffer(GL_ARRAY_BUFFER, mesh->vertex_buffer);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, mesh->index_buffer);

glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);
glEnableVertexAttribArray(2);
glEnableVertexAttribArray(3);

glDrawElements(GL_TRIANGLES, mesh->index_count, GL_UNSIGNED_INT, NULL);

glDisableVertexAttribArray(3);
glDisableVertexAttribArray(2);
glDisableVertexAttribArray(1);
glDisableVertexAttribArray(0);

There were already other meshes with different layouts, but they were special cases: a quad used for full-screen render passes (with only position and uv), a cube used to render the skybox (position only), etc.

Rather than hack around the limitations here (such as using a separate render pass for terrain), I decided a mesh should know its vertex layout and be able to do the attribute setup itself.

Enter VertexBufferLayout:

struct VertexBufferLayout {
    struct Property {
        char * name;
        u32 size;
        GLenum type;
        bool normalized;
        u32 offset;
    };
    
    u32 stride = 0;
    std::vector<Property> props;

    u32 GetPropertySizeBytes(Property p) {
        static const u32 TypeToByteTable[] {
            GL_BYTE, 1,
            GL_UNSIGNED_BYTE, 1,
            GL_SHORT, 2,
            GL_UNSIGNED_SHORT, 2,
            GL_INT, 4,
            GL_UNSIGNED_INT, 4,
            GL_HALF_FLOAT, 2,
            GL_FLOAT, 4,
            GL_DOUBLE, 8,
        };

        for (u32 i = 0; i < array_count(TypeToByteTable); i += 2) {
            if (TypeToByteTable[i] == p.type) {
                u32 byte_size = TypeToByteTable[i + 1];
                return byte_size * p.size;
            }
        }
        return 0;
    }
    
    void AddProperty(char * name, u32 size, GLenum type, bool normalized) {
        Property p;
        p.name = name;
        p.size = size;
        p.type = type;
        p.normalized = normalized;
        p.offset = stride;
        u32 byte_size = GetPropertySizeBytes(p);
        if (byte_size > 0) {
            stride += byte_size;
            props.push_back(p);
        }
        else {
            Assert(false, "Invalid vertex buffer property type.");
        }
    }
};

No real surprises here, just stuffing everything into an array, with a bit of extra bookkeeping to prevent errors in calculating stride and offset pointers. This isn’t a general solution, it makes the assumption that vertex attributes are always stored interleaved (not necessarily true), and does not permit a base offset (for storing multiple meshes in one vertex buffer). But it works, and there’s little point buying more complexity than needed.

Meshes each store a pointer to a layout. Prior to rendering, the renderer passes the shader handle to the mesh, which then uses the handle plus its layout data to determine attribute indices in the shader and bind the pointer to the buffer:

void
SetAttribPointersForShader(GLuint shader_program) {
    Assert(layout != NULL && attrib_indices.size() == layout->props.size(),
            "Invalid vertex buffer layout.");

    glBindBuffer(GL_ARRAY_BUFFER, vertex_buffer);

    for (u32 i = 0; i < attrib_indices.size(); ++i) {
        VertexBufferLayout::Property p = layout->props[i];
        attrib_indices[i] = glGetAttribLocation(shader_program, p.name);
        if (attrib_indices[i] != -1) {
            glEnableVertexAttribArray(attrib_indices[i]);
            glVertexAttribPointer(attrib_indices[i], p.size, p.type, p.normalized, layout->stride, (void *)p.offset);
            glDisableVertexAttribArray(attrib_indices[i]);
        }
    }
    
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}

(Ideally we’d like to cache this — the index lookup should only need to be run once per layout/shader combination, not per-mesh.)
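
One possible shape for that cache is a table keyed on the layout/shader pair, filled on first use (a sketch only; none of these names exist in hikari, and it assumes std::map is acceptable here):

// Hypothetical cache of attribute indices per (layout, shader) pair.
std::map<std::pair<VertexBufferLayout *, GLuint>, std::vector<GLint>> attrib_index_cache;

std::vector<GLint> &
GetAttribIndices(VertexBufferLayout * layout, GLuint shader_program) {
    auto key = std::make_pair(layout, shader_program);
    auto it = attrib_index_cache.find(key);
    if (it == attrib_index_cache.end()) {
        std::vector<GLint> indices(layout->props.size());
        for (u32 i = 0; i < layout->props.size(); ++i) {
            indices[i] = glGetAttribLocation(shader_program, layout->props[i].name);
        }
        it = attrib_index_cache.emplace(key, indices).first;
    }
    return it->second;
}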

Now we have the list of attribute indices to enable, so we can render very similarly to how we did before.

This still leaves a lot to be desired, but it is enough to unblock the terrain work, so let’s get back on track.

Terrain Mesh Generation:

After all that, generating the terrain mesh itself seems almost trivial. Here is the entire loop for building the vertex positions:

u32 vertex_count = x_count*y_count;
TerrainVertex * verts = (TerrainVertex *)calloc(vertex_count, sizeof(TerrainVertex));

float xo = x_count * 0.5f;
float yo = y_count * 0.5f;
Vector3 bounds_min(FLT_MAX, FLT_MAX, FLT_MAX);
Vector3 bounds_max = -bounds_min;

for (u32 y = 0; y < y_count; ++y) {
    for (u32 x = 0; x < x_count; ++x) {
        float map_val = map->ReadPixel(x_off + x, y_off + y);

        Vector3 pos = Vector3(
            x - xo,
            map_val, 
            y - yo);

        // Scale up from pixel to world space.
        pos *= chunk_meters_per_pixel;

        verts[y * x_count + x].position = pos;

        bounds_min = MinV(bounds_min, pos);
        bounds_max = MaxV(bounds_max, pos);
    }
}

This is a regular grid of squares in the XZ plane, with the Y coordinate taken directly from the heightmap. Next we build an index buffer (in the most naive manner), and calculate surface normals from the resulting triangles.
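
For completeness, the naive index buffer for a uniformly split grid looks something like this (a sketch; hikari’s actual loop may differ in winding and names):

u32 quad_count = (x_count - 1) * (y_count - 1);
u32 index_count = quad_count * 6;
u32 * indices = (u32 *)calloc(index_count, sizeof(u32));

u32 n = 0;
for (u32 y = 0; y < y_count - 1; ++y) {
    for (u32 x = 0; x < x_count - 1; ++x) {
        u32 i0 = (y    ) * x_count + (x    );
        u32 i1 = (y    ) * x_count + (x + 1);
        u32 i2 = (y + 1) * x_count + (x    );
        u32 i3 = (y + 1) * x_count + (x + 1);

        // Every quad split along the same diagonal (i1 - i2).
        indices[n++] = i0; indices[n++] = i2; indices[n++] = i1;
        indices[n++] = i1; indices[n++] = i2; indices[n++] = i3;
    }
}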

All of this is pretty straightforward, but there are a few subtle issues that come up:

– Triangulation. Unlike triangles, the vertices of a quad do not necessarily all fall in one plane. This is in fact *usually* the case with terrain. So how we decide to split that quad into triangles has an effect on the final shape:

quad_triangulation_illust

Currently, I split all quads uniformly. Another option I explored is switching the split direction in a checkerboard fashion. With the noise-based heightmap I’m testing with, the difference is negligible. This may prove more of an issue when producing sparser, low-LOD meshes for distant terrain. It also would *absolutely* matter if the terrain was flat-shaded (i.e., constant normal for each face), but that isn’t the case here.

– Calculating vertex normals. I calculate normals for this terrain mesh the same as any other mesh: loop through the faces and add face normals to each vertex, then normalize in another pass:

for (u32 i = 0; i < index_buffer_count; i += 3) {
    u32 idx_a = indices[i + 0];
    u32 idx_b = indices[i + 1];
    u32 idx_c = indices[i + 2];

    Vector3 a = verts[idx_a].position;
    Vector3 b = verts[idx_b].position;
    Vector3 c = verts[idx_c].position;

    Vector3 normal = Cross(a - b, a - c);

    verts[idx_a].normal += normal;
    verts[idx_b].normal += normal;
    verts[idx_c].normal += normal;
}

// We sum the unnormalized normals for each triangle, and then normalize
// the sum.  Since the cross product is scaled by the area of the triangle,
// this means that larger triangles contribute more heavily to the resulting
// vertex normal.
for (u32 i = 0; i < vertex_count; ++i) {
    verts[i].normal = Normalize(verts[i].normal);
}

So far so good, but there is a crucial problem here.

We iterate the faces of the mesh. But some of the points in question actually belong to several meshes! When we’re at the edge of the terrain chunk, we only account for the face normals of the triangles inside the current chunk. Which means the same point, on two different chunks, may produce a different normal.

And so, we get seams:

normal_seams

This is not necessarily easy to fix. The simplest option would be to iterate over neighboring chunks, find the shared vertices, and average the normals — not 100% correct, but it eliminates the seam. It’s unclear, though, how this would interact with streaming: a new chunk being loaded might change the normals on a mesh that’s been loaded for a while; should we read that mesh data back from the GPU? Keep a CPU copy and update? In addition, one of the design goals is for the terrain system to be agnostic about the source of meshes. For a well-structured grid finding the adjacent vertex is simple, but how does that work with an arbitrary hand-produced mesh?

Ultimately, there’s also the question of whether this seam in the normals actually matters. Obviously it *is* a bug, but it is only so glaringly visible because of the shiny plastic material I’m using to debug. I’m not sure how many of these artifacts will remain visible when applying actual terrain materials.

I’m counting this as a bug but I’m going to leave it alone for now.

– Alternate tiling strategies. I started with laying a square grid over the world. Why squares? There are other good tiling arrangements. In fact, we can tile the plane with *just* triangles, and avoid the question of how to triangulate our tiles altogether.

It turns out this is *really* easy to do:

pos.x += (y & 1) * 0.5f;  // Offset odd rows by half a unit
pos.z *= sqrtf(3.0f) / 2.0f; // Scale vertically by sin(pi/3)

Make sure the triangles cut across the short diagonal of that skewed grid, and now the plane is tiled with equilateral triangles.

There isn’t a huge amount of difference with my noise terrain (just a scaling on the Z axis):

However, tiling square chunks with equilateral triangles has a sticking point when it comes to downsampling the terrain for distant LOD meshes. The sawtooth edge can’t be merged into larger triangles without either breaking the tiling pattern or leaving gaps or overlap between chunks.

With that said, I feel like the triangle tiling has some really nice properties. No triangulation needed, and no ambiguity on the final mesh’s shape. It can still be represented easily in an image map (although, we’re pushing the limits on that). Grids make it easy to create 90 degree angles, which stick out, while a triangle tiling makes 60 degree angles which are less obviously objectionable. (This one is decidedly a matter of taste, but: my engine, my taste.)

Things get *really* interesting when we decide chunks don’t need to be rectangular. A hexagonal chunk tiles perfectly with equilateral triangles, which means the LOD reduction works fine as well. Using hexes for chunks also fits into the half-offset tiling described by Alan Wolfe here. (In addition to the memory savings he mentions in the post, the hex tiling also means a constant streaming cost as you move about the world, which is a very nice property to have.)

I haven’t implemented hex tiling, and this post is getting quite long as it is. I do think it is an avenue worth exploring, and will probably return to this after fixing up the material system.

Until next time!

hikari – Designing a Terrain System

The next big addition to hikari is going to be a system for rendering terrain. I’ve done a few weeks of research and wanted to write up my plans before implementing. Then, once it’s complete, I can review what did and did not work.

That’s the idea, anyway.

Source Format:

I decided to use simple heightmap-based terrain. Starting with a regular grid of points, offset each vertically based on a grayscale value stored in an image. This is not the only (or best) choice, but it is the simplest to implement as a programmer with no art team at hand. Other options include marching-cubes voxels, constructive solid geometry, and hand-authored meshes.

However, I’m consciously trying to avoid depending on the specific source of the mesh in the rest of the pipeline. This would allow any of (or even a mix of) these sources to work interchangeably. We’ll see how realistic that expectation is in practice.

Chunking:

Terrain meshes can be quite large, especially in the case of an open-world game where the map may cover tens of kilometers in any direction, while still maintaining detail up close. Working with such a large mesh is unwieldy, so let’s not.

Instead, we cut the terrain up into chunks. For example, if our world map is 4km x 4km, we could cut it up into 1600 separate 100m x 100m chunks. Now we don’t need to keep the entire map in memory, only the chunks that we might need to render in the near future (e.g, chunks around the camera.)

There are additional benefits to chunking the map; for example, the ability to frustum cull the chunks. Chunks provide a convenient granularity for applying level-of-detail optimizations as well.

Materials:

The mesh is only part of our terrain, of course. It needs texture and color from materials: grass, rock, snow, etc. Importantly, we don’t necessarily want to texture every part of the terrain with the same material. One solution would be to assign a single material to each chunk, and have an artist paint a unique texture.

This seems to be a non-starter to me — potentially thousands of textures (of high resolution) would need to be created and stored, not to mention streamed into and out of memory alongside their mesh chunks. In addition, producing a distortion-free UV-mapping for terrain is difficult. Finally, while we don’t want to texture everything the same, there are only a limited set of materials we want to use, and it’d be nice if we could reuse the same textures.

Enter splat maps. The basic idea is straightforward: assign a color to the terrain. The components of that color determine the blend weights of a set of materials. If the red channel weights a rock texture and the green channel weights a grass texture, ‘yellow’ terrain is a blend between rock and grass. This allows us to smoothly blend between 5 different materials.

Wait, 5? How?

A color has four components, red, green, blue, and alpha. However, if we assume the material weights must add to equal 1.0, we can construct a new weight that is (1.0 - (r + g + b + a)). This becomes the weight of our 5th material. (Of course, we need to actually enforce these constraints in art production, but that is a separate issue.)
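
In the shader, that fifth weight is one extra line (a sketch; the name of the vertex color input is an assumption):

vec4 splat = fs_in.splat_color;
float splat4 = 1.0 - (splat.r + splat.g + splat.b + splat.a);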

Furthermore, since chunks of the terrain get rendered in their own draw call, there’s no reason we can’t swap the materials used in each chunk. So while the number of materials in a specific chunk is limited, the total number of materials is unlimited (well, close enough).

The name “splat map” may imply using a texture to assign this color (and, for a heightmap terrain, that’s the easiest way to store it on disk), but I actually want to use vertex colors. This puts heightmaps and authored meshes on the same footing, and keeps decisions about mesh source in the mesh loading where it belongs.

In order to do texturing, we need texture coordinates. The obvious way to do this is assign a UV grid to our heightmap mesh. This will look fine for smooth, subtle terrain features, but the more drastic the slope the more distorted the texturing will be.

Instead, we can use so-called triplanar mapping to generate texture coordinates. Project the world position to each of the XY, YZ, and ZX planes, and use that value as the texture coordinate for three different lookups. Then, blend between them based on the vertex normal.

As an added bonus, there’s no requirement that the various axes in triplanar mapping use the same material. By mixing materials, we can create effects such as smooth grass that fades to patches of dirt on slopes.

Of course, there is a cost to all of this. Triplanar mapping requires doing 3x the material samples. Splat mapping requires doing up to 5x the samples. Together, it’s up to 15x the material samples, and we haven’t even begun to talk about the number of individual textures in each material. Needless to say, rendering terrain in this fashion is going to read a *lot* of texture samples per-pixel.

That said, hikari already uses a depth prepass to avoid wasted shading on opaque geometry, so I suspect that would mitigate the worst of the cost. In addition, it’s easy enough to write another shader for more distant chunks that only uses the 2-3 highest weights out of the splat where that’s likely to be an unnoticeable difference.

Plumbing:

There are a bunch of assumptions that hikari makes regarding meshes and materials that are a bit at odds with what I’ve mentioned above. The renderer is simple enough that I can work around these assumptions for special cases. However, I will need to be on the lookout for opportunities to make some of the rendering systems (or, at least, the exposed API) more general.

Some examples: at the moment, meshes are assumed to use a specific common vertex format (position, UV, normal, tangent) — terrain meshes will not (no need for UV and tangent, but we need the splat color.) Meshes are expected to use only one material, and materials do not pack any texture maps together (so, for example, baked AO and roughness are separate maps). Neither of these will play nicely with terrain, but should be simple enough to fix. “Materials” are currently just a bag of textures and a handful of uniform parameters, since at the moment there are only two mesh shaders in the engine. A more generic material model that ties directly into its associated shader would be a useful thing to fix.

So, there’s the plan of attack. We’ll see how well it stacks up. I’ll be documenting the implementation of each part of the system in a series of blog posts, capped off by a final report on what worked and where I got it totally wrong.

hikari – Implementing Single-Scattering

Over the past few days, I’ve been implementing a single-scattering volumetric fog effect in hikari, my OpenGL PBR renderer.

I started by more-or-less copying Alexandre Pestana’s implementation (http://www.alexandre-pestana.com/volumetric-lights/), porting it to GLSL and fitting it into my existing render pipeline.

However, there’s an issue: by not taking the transmittance along the ray into account, and instead averaging all of the scattered samples, the effect is far more severe than it should be. In addition, it blends really poorly with the scene (using additive blending after the lighting pass).

Bart Wronski describes a more physically-correct model in his presentation on AC4’s volumetric fog (https://bartwronski.com/publications/). I did not go the full route of using a voxel grid (although I may in the future; this renderer uses clustered shading so that’s a natural extension). However, Wronski presents the correct integration to account for transmittance, and we can apply that to the post-effect approach as well.

Background out of the way, I wanted an implementation that was:
– Minimally invasive. If possible, as simple as adding a new render pass and blend. No compute shaders or large data structures.
– Tweakable. Light dust to thick fog.
– Plausible. I don’t need absolute physical accuracy, but something that fits well in the scene and looks “right”.

The core loop is (relatively) simple:

// Accumulated in-scatter.
vec3 inscatter_color = vec3(0.0);
// Accumulated density.
float total_density = 0.0;
for (int i = 0; i < STEP_COUNT; ++i) {
    // Sample sun shadow map.
    vec3 sun_clip_position = GetClipSpacePosition(current_position, sun_clip_from_view);
    float shadow_factor = SampleShadowPoint(sun_shadow_map, sun_clip_position.xy, sun_clip_position.z);

    // Calculate total density over step.
    float step_density = density * step_distance;

    // Calculate transmittance of this sample (based on total density accumulated so far).
    float transmittance = min(exp(-total_density), 1.0);

    // Sun light scatter.
    inscatter_color += sun_color * shadow_factor * sun_scatter_amount * step_density * transmittance;

    // Ambient scatter.
    inscatter_color += ambient_light_color * ambient_scatter_amount * step_density * transmittance;

    // Accumulate density.
    total_density += step_density;

    // Step forward.
    current_position += step_vector;
}

The raymarch operates in view space. This was mostly a matter of convenience, but also means we maintain better precision close to the camera, even if we happen to be far away from the world origin.

Currently, I only compute scattering for sunlight and a constant ambient term. (Coming from directly up.) If you have ambient probes stored as spherical harmonics, you can compute a much better ambient term — see Wronski’s slides for details. Other light types require recomputing the scatter amount per-sample, as it depends on the angle between the light and view.

Note the scaling by transmittance. The light scattered by a sample has to travel through all of the particles we’ve seen to this point, and therefore it is attenuated by the scattering and absorption.

Density is (very roughly) a measure of particle density per unit length. (I don’t 100% understand the physical basis here, but it is dependent on both particle size and number.) The implementation currently uses a completely uniform value for density, but this could easily be sampled from a texture map (artist painted, noise, particle system, etc.) In practice, good-looking density values tend to be very low:

[0.00015]
density00015

[0.00025]
density00025

[0.0025]
density0025

[0.005]
density005
(g = -0.85)

You can smooth the results of higher densities by using more samples (or blurring the result.)

The other control parameter for the scattering is g. This is used in the phase function to compute the amount of light scattered towards the viewer. The valid range for g is [-1, 1]. In general, most atmospheric particles will have a negative g value, that increases in magnitude as the particle size increases. Some good values to start with are between -0.75 and -0.99.

[-0.75]
g-75

[-0.95]
g-95
(density = 0.00025)

At low sample counts, the result is highly susceptible to banding artifacts. To combat this, we jitter the starting position by moving some fraction of a step in the direction of the ray. (This is the same approach used by Pestana.) In order to keep the noise unobjectionable, we need to use either an even pattern like a Bayer matrix, or blue noise. I’ve also experimented with dithering the step distance as well, such that neighboring pixels use different step sizes. However, this does not seem to produce much benefit over just jittering the start position; the resulting noise is more noticeable.
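
For reference, a Bayer-matrix jitter is only a few lines (a sketch; applied before the raymarch loop, using the same step_vector as above):

const float bayer[16] = float[16]( 0.0,  8.0,  2.0, 10.0,
                                  12.0,  4.0, 14.0,  6.0,
                                   3.0, 11.0,  1.0,  9.0,
                                  15.0,  7.0, 13.0,  5.0);

ivec2 pixel = ivec2(gl_FragCoord.xy) % 4;            // Wrap to a 4x4 tile.
float jitter = bayer[pixel.y * 4 + pixel.x] / 16.0;  // Fraction of one step, in [0, 1).
current_position += step_vector * jitter;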

[no jitter]
density0025nojitter
[jitter]
density0025
(density = 0.0025, g = -0.85)

The outputs of this shader are the accumulated scattered light (in RGB) and the final transmittance amount (in alpha). I then perform a bilateral upscale (directly ported from Pestana’s HLSL version) and blend it with the lighting buffer using glBlendFunc(GL_ONE, GL_SRC_ALPHA). The transmittance is the amount of light that reaches the viewer from the far end of the ray, so this is accurate.

For higher densities or more uniform scattering (g closer to 0), we may want to perform a depth-aware blur on the scatter buffer before upscaling. Since I didn’t have an immediate need for really thick scatter, I have not implemented this yet.

The scattering should be added before doing the luminance calculation and bloom. This helps to (somewhat) mitigate the darkening effect of unlit areas, as well as further smoothing the edges of light shafts.

There’s still a lot to be done here: supporting additional lights, variable density, etc. But even the basic implementation adds a lot when used subtly.

2D Rotations, the right way.

This is simple trigonometry, and won’t be a surprise to most of you. But it was a surprise to me. So, in case it helps anyone else…

Let’s say we have some sprite, that we want to rotate to face some other point. We can get the facing vector rather trivially: just subtract the two point vectors.

But we don’t want the facing vector, we want to rotate towards it. Well, we can get the angle with atan2(), and…

This, right here, is the error.

Let’s think about what we actually need to rotate a vector by an angle theta:

x' = x * cos(theta) - y * sin(theta);
y' = x * sin(theta) + y * cos(theta);

or, in matrix form:

[  cos(theta) -sin(theta)  ]
[  sin(theta)  cos(theta)  ]

Note that we only ever use the angle to find its sine and cosine. We don’t really *care* what theta is, it’s not relevant to the rotation. But how can we find sine and cosine without the angle?

Well, we have a vector. Now, if we think back to the very beginning of trigonometry, you’ll probably remember a mnemonic: SOH CAH TOA.

Our vector forms a right triangle by casting a vertical line to the x-axis. Therefore:

sine   = opposite / hypotenuse
cosine = adjacent / hypotenuse

or:

sine   = y / length
cosine = x / length

If our facing vector is normalized, the length terms fall out and we can just use the x and y components as sine and cosine directly. So:

x' = x * facing_x - y * facing_y;
y' = x * facing_y + y * facing_x;

As long as we know (or can easily find) the facing vector we want, no transcendentals are involved in calculating the rotation matrix. At worst, we need a square root to normalize the facing vector. In the (very rare, now) case that we *do* want to set facing with an angle, we can simply call sin/cos there. (We lose the ability to do sin/cos with 4-wide or 8-wide SIMD when batch-calculating transforms, but since we’ll be calling it a lot less that is a net win.)
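
Putting it together (a sketch with placeholder names; Vector2 and the points involved are hypothetical):

// Face 'sprite' towards 'target', then rotate a local-space point into that frame.
Vector2 to_target = { target.x - sprite.x, target.y - sprite.y };
float len = sqrtf(to_target.x * to_target.x + to_target.y * to_target.y);
Vector2 facing = { to_target.x / len, to_target.y / len }; // (cos, sin) of the rotation

Vector2 rotated;
rotated.x = local.x * facing.x - local.y * facing.y;
rotated.y = local.x * facing.y + local.y * facing.x;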

So there we go. It’s easy to settle for angles in 2D, where they almost, kind of, work. But without them, there are fewer transcendentals, less concern about range, and everything is simpler. And no kittens murdered.

Orthographic LODs – Part II

Last time, we got the basic renderer up and running. Now we have more complex issues to deal with. For example, how to light the LOD models.

Lighting

One approach to lighting is to simply light the detailed model, and bake that lighting into the LOD texture. This works, but requires that all lights be static. We can do better.

The standard shading equation has three parts: ambient light, diffuse light, and specular light:
K_a = A * T
K_d = D * T * (N · L)
K_s = S * (N · H)^m,   when N · L > 0
(For details of how these equations work, see Wikipedia.)

The key thing to notice is that these equations rely only on the texture (which we already have), a unit direction towards the light source, a unit direction towards the camera, and the unit surface normal. By rendering our detailed model down to a texture, we’ve destroyed the normal data. But that’s easy enough to fix: we write it to a texture as well.

(Setup for this texture is more-or-less the same as the color texture, we just bind it to a different output.)

// Detail vertex shader
in vec3 position;
in vec3 normal;

out vec3 Normal;

void main() {
    gl_Position = vec4(position, 1.0);
    Normal = normal;
}

// Detail fragment shader

in vec3 Normal;

out vec4 frag_color;
out vec4 normal_map_color;

void main() {
    vec3 normal = (normalize(Normal) + vec3(1.0)) / 2.0;

    frag_color = vec4(1.0);
    normal_map_color = vec4(normal, 1.0);
}

A unit normal is simply a vector of 3 floats between -1 and 1. We can map this to a RGB color (vec3 clamped to [0, 1]) by adding one and dividing by two. This is, again, nothing new. You may be wondering why we store an alpha channel — we’ll get to that later.

We can now reconstruct the normal from the LOD texture. This is all we need for directional lights:

// Fragment shader
uniform sampler2D color_texture;
uniform sampler2D normal_map_texture;

uniform vec3 sun_direction;
uniform vec3 camera_direction;

in vec2 uv;

out vec4 frag_color;

void main() {
    vec4 normal_sample = texture(normal_map_texture, uv);
    vec3 normal = (normal_sample.xyz * 2.0) - vec3(1.0);
    vec3 view = -1.0 * camera_direction;
    // Half vector between the light direction and the view direction.
    vec3 view_half = normalize(sun_direction + view);

    float n_dot_l = dot(normal, sun_direction);
    float n_dot_h = dot(normal, view_half);

    float ambient_intensity = 0.25;
    float diffuse_intensity = 0.5;
    float specular_intensity = 0.0;
    if (n_dot_l > 0) {
        specular_intensity = 0.25 * pow(n_dot_h, 10);
    }

    vec4 texture_sample = texture(color_texture, uv);

    vec3 lit_color = ambient_intensity * texture_sample.rgb +
                     diffuse_intensity * texture_sample.rgb * max(n_dot_l, 0.0) +
                     specular_intensity * vec3(1.0);

    frag_color = vec4(lit_color, texture_sample.a);
}

Result:
lod_test_06

Point / Spot Lighting

So this solves directional lights. However, point and spot lights have attenuation — they fall off over distance. Additionally, they have an actual spatial position, which means the light vector will change.

After rendering the LODs, we no longer have the vertex positions of our detail model. Calculating lighting based on the LOD model’s vertices will, in many cases, be obviously wrong. How can we reconstruct the original points?

By cheating.

Let’s look at what we have to work with:

The camera is orthographic, which means the depth of a pixel is independent of where it is on screen. We also have the camera’s forward vector (camera_direction in the above shader).

Finally, and most obviously, we don’t care about shading the points that weren’t rendered to our LOD texture.

This turns out to be the important factor. If we knew how far each pixel was from the camera when rendered, we could reconstruct the original location for lighting.

To put it another way, we need a depth buffer. Remember that normal map alpha value we didn’t use?

We can get the depth value like so:

// Map depth from [near, far] to [-1, 1]
float depth = (2.0 * gl_FragCoord.z - gl_DepthRange.near - gl_DepthRange.far) / (gl_DepthRange.far - gl_DepthRange.near);

// Remap normal and depth from [-1, 1] to [0, 1]
normal_depth_color = (vec4(normalize(normal_vec), depth) + vec4(1.0)) / 2.0;

However, we now run into a problem. This depth value is in the LOD render’s window space. We want the position in the final render’s camera space — this way we can transform the light position to the same space, which allows us to do the attenuation calculations.

There are a few ways to handle this. One is to record the depth range for our LOD model after rendering it, and then use that to scale the depth properly. This is complex (asymmetrical models can have different depth sizes for each rotation) and quite likely inconsistent, due to precision errors.

A simpler solution, then, is to pick an arbitrary depth range, and use it consistently for both the main render and LODs. This is not ideal — one getting out of sync with the other may lead to subtle errors. In a production design, it may be a good idea to record the camera matrix used in the model data, so that it can be confirmed on load.

We need two further pieces of data to proceed: the viewport coordinates (the size of the window) and the inverse projection matrix (to “unproject” the position back into camera space). Both of these are fairly trivial to provide. To reconstruct the proper position, then:

vec3 window_to_camera_space(vec3 window_space) {
    // viewport = vec4(x, y, width, height)

    // Because projection is orthographic, NDC == clip space
    vec2 clip_xy = ((2.0 * window_space.xy) - (2.0 * viewport.xy)) / (viewport.zw) - 1.0;
    // Already mapped to [-1, 1] by the same transform that extracts our normal.
    float clip_z = window_space.z;

    vec4 clip_space = vec4(clip_xy, clip_z, 1.0);

    vec4 camera_space = camera_from_clip * clip_space;
    return camera_space.xyz;
}

(Note that this is rather inefficient. We *know* what a projection matrix looks like, so that last multiply can be made a lot faster.)

Calculating the light color is the same as for directional lights. For point lights, we then divide by an attenuation factor:

float attenuation = point_light_attenuation.x +
                    point_light_attenuation.y * light_distance +
                    point_light_attenuation.z * light_distance * light_distance;

point_light_color = point_light_color / attenuation;

This uses 3 factors: constant, linear, and quadratic. The constant factor determines the base intensity of the light (a fractional constant brightens the light, a constant greater than 1.0 darkens it.) The linear and quadratic factors control the fall-off curve.

A spot light has the same attenuation factors, but also has another term:

float spot_intensity = pow(max(dot(-1.0 * spot_direction, light_direction), 0.0), spot_exponent);

The dot product here limits the light to a cone around the spotlight’s facing direction. The exponential factor controls the “tightness” of the light cone — as the exponent increases, the fall-off becomes much sharper (remember that the dot product of unit vectors produces a value in [0, 1]).

So that covers lighting. Next time: exploring options for shadows.

lod_test_09

NOTE:

I made some errors with terminology in my last post. After the projection transform is applied, vertices are in *clip* space, not eye space. The series of transforms is as follows:

Model -> World -> Camera -> Clip -> NDC
(With an orthographic projection, Clip == NDC.)