hikari – Implementing Single-Scattering

Over the past few days, I’ve been implementing a single-scattering volumetric fog effect in hikari, my OpenGL PBR renderer.

I started by more-or-less copying Alexandre Pestana’s implementation (http://www.alexandre-pestana.com/volumetric-lights/), porting it to GLSL and fitting it into my existing render pipeline.

However, there’s an issue: by not taking into account the transmittance along the ray, instead taking an average of all scattered samples, the effect is far more severe than it should be. In addition, it blends really poorly with the scene (using additive blending after the lighting pass.)

Bart Wronski describes a more physically-correct model in his presentation on AC4’s volumetric fog (https://bartwronski.com/publications/). I did not go the full route of using a voxel grid (although I may in the future; this renderer uses clustered shading so that’s a natural extension). However, Wronski presents the correct integration to account for transmittance, and we can apply that to the post-effect approach as well.

Background out of the way, I wanted an implementation that was:
– Minimally invasive. If possible, as simple as adding a new render pass and blend. No compute shaders or large data structures.
– Tweakable. Light dust to thick fog.
– Plausible. I don’t need absolute physical accuracy, but something that fits well in the scene and looks “right”.

The core loop is (relatively) simple:

// Accumulated in-scatter.
vec3 inscatter_color = vec3(0.0);
// Accumulated density.
float total_density = 0.0;
for (int i = 0; i < STEP_COUNT; ++i) {
    // Sample sun shadow map.
    vec3 sun_clip_position = GetClipSpacePosition(current_position, sun_clip_from_view);
    float shadow_factor = SampleShadowPoint(sun_shadow_map, sun_clip_position.xy, sun_clip_position.z);

    // Calculate total density over step.
    float step_density = density * step_distance;

    // Calculate transmittance of this sample (based on total density accumulated so far).
    float transmittance = min(exp(-total_density), 1.0);

    // Sun light scatter.
    inscatter_color += sun_color * shadow_factor * sun_scatter_amount * step_density * transmittance;

    // Ambient scatter.
    inscatter_color += ambient_light_color * ambient_scatter_amount * step_density * transmittance;

    // Accumulate density.
    total_density += step_density;

    // Step forward.
    current_position += step_vector;

The raymarch operates in view space. This was mostly a matter of convenience, but also means we maintain better precision close to the camera, even if we happen to be far away from the world origin.

Currently, I only compute scattering for sunlight and a constant ambient term. (Coming from directly up.) If you have ambient probes stored as spherical harmonics, you can compute a much better ambient term — see Wronski’s slides for details. Other light types require recomputing the scatter amount per-sample, as it depends on the angle between the light and view.

Note the scaling by transmittance. The light scattered by a sample has to travel through all of the particles we’ve seen to this point, and therefore it is attenuated by the scattering and absorption.

Density is (very roughly) a measure of particle density per unit length. (I don’t 100% understand the physical basis here, but it is dependent on both particle size and number.) The implementation currently uses a completely uniform value for density, but this could easily be sampled from a texture map (artist painted, noise, particle system, etc.) In practice, good-looking density values tend to be very low:




(g = -0.85)

You can smooth the results of higher densities by using more samples (or blurring the result.)

The other control parameter for the scattering is g. This is used in the phase function to compute the amount of light scattered towards the viewer. The valid range for g is [-1, 1]. In general, most atmospheric particles will have a negative g value, that increases in magnitude as the particle size increases. Some good values to start with are between -0.75 and -0.99.


(density = 0.00025)

At low sample counts, the result is highly susceptible to banding artifacts. To combat this, we jitter the starting position by moving some fraction of a step in the direction of the ray. (This is the same approach used by Pestana.) In order to keep the noise unobjectionable, we need to use either an even pattern like a Bayer matrix, or blue noise. I’ve also experimented with dithering the step distance as well, such that neighboring pixels use different step sizes. However, this does not seem to produce much benefit over just jittering the start position; the resulting noise is more noticeable.

[no jitter]
(density = 0.0025, g = -0.85)

The outputs of this shader are the accumulated scattered light (in RGB) and the final transmittance amount (in alpha). I then perform a bilateral upscale (directly ported from Pestana’s HLSL version) and blend it with the lighting buffer using glBlendFunc(GL_ONE, GL_SRC_ALPHA). The transmittance is the amount of light that reaches the viewer from the far end of the ray, so this is accurate.

For higher densities or more uniform scattering (g closer to 0), we may want to perform a depth-aware blur on the scatter buffer before upscaling. Since I didn’t have an immediate need for really thick scatter, I have not implemented this yet.

The scattering should be added before doing the luminance calculation and bloom. This helps to (somewhat) mitigate the darkening effect of unlit areas, as well as further smoothing the edges of light shafts.

There’s still a lot to be done here: supporting additional lights, variable density, etc. But even the basic implementation adds a lot when used subtly.

2D Rotations, the right way.

This is simple trigonometry, and won’t be a surprise to most of you. But it was a surprise to me. So, in case it helps anyone else…

Let’s say we have some sprite, that we want to rotate to face some other point. We can get the facing vector rather trivially: just subtract the two point vectors.

But we don’t want the facing vector, we want to rotate towards it. Well, we can get the angle with atan2(), and…

This, right here, is the error.

Let’s think about what we actually need to rotate a vector by an angle theta:

x' = x * cos(theta) - y * sin(theta);
y' = x * sin(theta) + y * cos(theta);

or, in matrix form:

[  cos(theta) -sin(theta)  ]
[  sin(theta)  cos(theta)  ]

Note that we only ever use the angle to find its sine and cosine. We don’t really *care* what theta is, it’s not relevant to the rotation. But how can we find sine and cosine without the angle?

Well, we have a vector. Now, if we think back to the very beginning of trigonometry, you’ll probably remember a mnemonic: SOH CAH TOA.

Our vector forms a right triangle by casting a vertical line to the x-axis. Therefore:

sine   = opposite / hypotenuse
cosine = adjacent / hypotenuse


sine   = y / length
cosine = x / length

If our facing vector is normalized, the length terms fall out and we can just use the x and y components as sine and cosine directly. So:

x' = x * facing_x - y * facing_y;
y' = x * facing_y + y * facing_x;

As long as we know (or can easily find) the facing vector we want, no transcendentals are involved in calculating the rotation matrix. At worst, we need a square root to normalize the facing vector. In the (very rare, now) case that we *do* want to set facing with an angle, we can simply call sin/cos there. (We lose the ability to do sin/cos with 4-wide or 8-wide SIMD when batch-calculating transforms, but since we’ll be calling it a lot less that is a net win.)

So there we go. It’s easy to settle for angles in 2D, where they almost, kind of, work. But without, there are fewer transcendentals, less concern about range, everything is simpler. And no kittens murdered.

Orthographic LODs – Part II

Last time, we got the basic renderer up and running. Now we have more complex issues to deal with. For example, how to light the LOD models.


One approach to lighting is to simply light the detailed model, and bake that lighting into the LOD texture. This works, but requires that all lights be static. We can do better.

The standard shading equation has three parts: ambient light, diffuse light, and specular light:
K_d=DT(N \cdot L)
K_s=S(N \cdot H)^m (N \cdot L > 0)
(For details of how these equations work, see Wikipedia.)

The key thing to notice is that these equations rely only on the texture (which we already have), a unit direction towards the light source, a unit direction towards the camera, and the unit surface normal. By rendering our detailed model down to a texture, we’ve destroyed the normal data. But that’s easy enough to fix: we write it to a texture as well.

(Setup for this texture is more-or-less the same as the color texture, we just bind it to a different output.)

// Detail vertex shader
in vec3 position;
in vec3 normal;

out vec3 Normal;

void main() {
    gl_Position = position;
    Normal = normal;

// Detail fragment shader

in vec3 Normal;

out vec4 frag_color;
out vec4 normal_map_color;

void main() {
    vec3 normal = (normalize(Normal) + vec3(1.0)) / 2.0;

    frag_color = vec4(1.0);
    normal_map_color = vec4(normal, 1.0);

A unit normal is simply a vector of 3 floats between -1 and 1. We can map this to a RGB color (vec3 clamped to [0, 1]) by adding one and dividing by two. This is, again, nothing new. You may be wondering why we store an alpha channel — we’ll get to that later.

We can now reconstruct the normal from the LOD texture. This is all we need for directional lights:

// Fragment shader
uniform sampler2D color_texture;
uniform sampler2D normal_map_texture;

uniform vec3 sun_direction;
uniform vec3 camera_direction;

in vec2 uv;

out vec4 frag_color;

void main() {
    vec4 normal_sample = texture(normal_map_texture, uv);
    vec3 normal = (normal.xyz * 2.0) - vec3(1.0);
    vec3 view = -1.0 * camera_direction;
    vec3 view_half = (normal + view) / length(normal + view);

    float n_dot_l = dot(normal, sun_direction);
    float n_dot_h = dot(normal, view_half);

    float ambient_intensity = 0.25;
    float diffuse_intensity = 0.5;
    float specular_intensity = 0.0;
    if (n_dot_l > 0) {
        specular_intensity = 0.25 * pow(n_dot_h, 10);

    vec4 texture_sample = texture(color_texture, uv);

    vec3 lit_color = ambient_intensity * texture_sample.rgb +
                     diffuse_intensity * texture_sample.rgb +
                     specular_intensity * vec3(1.0);

    frag_color = vec4(lit_color, texture_sample.a);


Point / Spot Lighting

So this solves directional lights. However, point and spot lights have attenuation — they fall off over distance. Additionally, they have an actual spatial position, which means the light vector will change.

After rendering the LODs, we no longer have the vertex positions of our detail model. Calculating lighting based on the LOD model’s vertices will, in many cases, be obviously wrong. How can we reconstruct the original points?

By cheating.

Let’s look at what we have to work with:

The camera is orthographic, which means that depth of a pixel is independent of where it is on screen. We have the camera’s forward vector (camera_direction in the above shader).

Finally, and most obviously, we don’t care about shading the points that weren’t rendered to our LOD texture.

This turns out to be the important factor. If we knew how far each pixel was from the camera when rendered, we could reconstruct the original location for lighting.

To put it another way, we need a depth buffer. Remember that normal map alpha value we didn’t use?

We can get the depth value like so:

// Map depth from [near, far] to [-1, 1]
float depth = (2.0 * gl_FragCoord.z - gl_DepthRange.near - gl_DepthRange.far) / (gl_DepthRange.far - gl_DepthRange.near);

// Remap normal and depth from [-1, 1] to [0, 1]
normal_depth_color = (vec4(normalize(normal_vec), depth) + vec4(1.0)) / 2.0;

However, we now run into a problem. This depth value is in the LOD render’s window space. We want the position in the final render’s camera space — this way we can transform the light position to the same space, which allows us to do the attenuation calculations.

There are a few ways to handle this. One is to record the depth range for our LOD model after rendering it, and then use that to scale the depth properly. This is complex (asymmetrical models can have different depth sizes for each rotation) and quite likely inconsistent, due to precision errors.

A simpler solution, then, is to pick an arbitrary depth range, and use it consistently for both the main render and LODs. This is not ideal — one getting out of sync with the other may lead to subtle errors. In a production design, it may be a good idea to record the camera matrix used in the model data, so that it can be confirmed on load.

We need two further pieces of data to proceed: the viewport coordinates (the size of the window) and the inverse projection matrix (to “unproject” the position back into camera space). Both of these are fairly trivial to provide. To reconstruct the proper position, then:

vec3 window_to_camera_space(vec3 window_space) {
    // viewport = vec4(x, y, width, height)

    // Because projection is orthographic, NDC == clip space
    vec2 clip_xy = ((2.0 * window_space.xy) - (2.0 * viewport.xy)) / (viewport.zw) - 1.0;
    // Already mapped to [-1, 1] by the same transform that extracts our normal.
    float clip_z = window_space.z

    vec4 clip_space = vec4(clip_xy, clip_z, 1.0);

    vec4 camera_space = camera_from_clip * clip_space;
    return camera_space.xyz;

(Note that this is rather inefficient. We *know* what a projection matrix looks like, so that last multiply can be made a lot faster.)

Calculating the light color is the same as for directional lights. For point lights, we then divide by an attenuation factor:

float attenuation = point_light_attenuation.x +
                    point_light_attenuation.y * light_distance +
                    point_light_attenuation.z * light_distance * light_distance;

point_light_color = point_light_color / attenuation;

This uses 3 factors: constant, linear, and quadratic. The constant factor determines the base intensity of the light (a fractional constant brightens the light, a constant greater than 1.0 darkens it.) The linear and quadratic factors control the fall-off curve.

A spot light has the same attenuation factors, but also has another term:

float spot_intensity = pow(max(dot(-1.0 * spot_direction, light_direction), 0.0), spot_exponent);

The dot product here limits the light to a cone around the spotlight’s facing direction. The exponential factor controls the “tightness” of the light cone — as the exponent increases, the fall-off becomes much sharper (remember that the dot product of unit vectors produces a value in [0, 1]).

So that covers lighting. Next time: exploring options for shadows.



I made some errors with terminology in my last post. After the projection transform is applied, vertices are in *clip* space, not eye space. The series of transforms is as follows:

Model -> World -> Camera -> Clip -> NDC
(With an orthographic projection, Clip == NDC.)

Orthographic LODs – Part I

NOTE: I’m going to try something new here: rather than write about something I have already solved, this series will be my notes as I work through this problem. As such, this is *not* a guide. The code is a pile of hacks and I will make silly mistakes.

The Problem:

I am working on a city-building game. I want to be able to use detailed models with large numbers of polygons and textures, but without the runtime rendering cost that entails. To do so, I want to pre-render detailed models as textures for lower level-of-detail (LOD) models. This is not a particularly original solution (it is the system SimCity 4 uses). There are a few restrictions implied by this choice:

  • Camera projection must be orthographic.
    • A perspective projection will appear to skew the model the further it gets from the center of the view. Orthographic projection will not.
  • Camera angle must be fixed (or a finite set of fixed positions).
    • While camera *position* is irrelevant (because an orthographic projection doesn’t affect depth), its rotation is not. Each camera angle needs its own LOD image. Therefore, we need a small number of them.

The process is simple enough: render the detail model for each view. Projecting the vertices of the LOD model with the same view produces the proper texture coordinates for the LOD. Then, pack all of the rendered textures into an atlas.

Orthographic Rendering

Rendering an orthographic projection is, conceptually, very straightforward. After transforming the world to camera space (moving the camera to the origin, and aligning the z-axis with the camera’s forward view), we select a box of space and map it onto a unit cube (that is, a cube with a minimum point of (-1, -1, -1) and a maximum point of (1, 1, 1).) The z-component of each point is then discarded.

I will not get into the math here. Wikipedia has a nicely concise explanation.

What this *means* is that we can fit this box tightly to the model we want to render. With clever application of glViewport, we could even render all of our views in one go. (At the moment, I have not yet implemented this.) This makes building the texture atlas much simpler.

Calculating Camera-space LOD

To get this tight fit, we need to know the bounds of the rendered model in camera space. Since we are projecting onto the LOD model, *its* bounds are what we’re concerned with. (This implies, by the way, that the LOD model must completely enclose the detail model.)

struct AABB {
    vec3 min;
    vec3 max;

GetCameraSpaceBounds(Model * model, mat4x4 camera_from_model) {
    AABB result;

    vec4 * model_verts = model->vertices;
    vec4 camera_vert = camera_from_model * model_verts[0];

    // We don't initialize to 0 vectors, because we don't
    // guarentee the model is centered on the origin.
    result.min = vec3(camera_vert);
    result.max = vec3(camera_vert);

    for (u32 i = 1; i < model->vertex_count; ++i) {
        camera_vert = camera_from_model * model_verts[i];

        result.min.x = min(result.min.x, camera_vert.x);
        result.min.y = min(result.min.y, camera_vert.y);
        result.min.z = min(result.min.z, camera_vert.z);

        result.max.x = max(result.max.x, camera_vert.x);
        result.max.y = max(result.max.y, camera_vert.y);
        result.max.z = max(result.max.z, camera_vert.z);

    return result;

This bounding box gives us the clip volume to render.

AABB lod_bounds = GetCameraSpaceBounds(lod_model, camera_from_model);

mat4x4 eye_from_camera = OrthoProjectionMatrix(
    lod_bounds.max.x, lod_bounds.min.x,  // right, left
    lod_bounds.max.y, lod_bounds.min.y,  // top, bottom
    lod_bounds.min.z, lod_bounds.max.z); // near, far

mat4x4 eye_from_model = eye_from_camera * camera_from_model;

Rendering LOD Texture

Rendering to a texture requires a framebuffer. (In theory, we could also simply render to the screen and extract the result with glGetPixels. However, that limits us to a single a single color output and makes it more difficult to do any GPU postprocessing.)

After binding a new framebuffer, we need to create the texture to render onto:

glGenTextures(1, &result->texture);
glBindTexture(GL_TEXTURE_2D, result->texture);

// render_width and render_height are the width and height of the 
// camera space lod bounding box.
    GL_TEXTURE_2D, 0, GL_RGBA, render_width, render_height, 0, GL_RGBA, GL_UNSIGNED_BYTE, 0


glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, result->texture, 0);

glViewport(0, 0, render_width, render_height);

This gives us a texture large enough to fit our rendered image, attaches it to the framebuffer, and sets the viewport to the entire framebuffer. To render multiple views to one texture, we’d allocate a larger texture, and use the viewport call to selected the target region.

Now we simply render the detail model with the eye_from_model transform calculated earlier. The result:

(The detail shader used simply maps position to color.)

That’s all for now. Next time: how do we light this model?

How bad is rand() % range?

“Pick a random number between 0 and 10.”

It’s a fairly basic task, often used as a project in introductory programming classes and extended language examples. The most common and straightforward solution is modulus. (Indeed, most CS classes use this exercise purely to introduce the concept of modulus.)

int number = rand() % range;

However, there is a problem with this method — it skews the distribution of numbers produced.

C’s rand() returns an integer in the range [0, RAND_MAX].
For illustration, let’s:
Assume RAND_MAX = 8
Assume rand is a perfect random distribution. (Equal chance of returning any integer in range.)

Let range = 5.
The possible values for number are then:

[0 % 5,
 1 % 5,
 2 % 5,
 3 % 5,
 4 % 5,
 5 % 5,
 6 % 5,
 7 % 5,
 8 % 5]
 [0, 1, 2, 3, 4, 0, 1, 2, 3]

You’ll notice that the [0, 3] range appears *twice*, but 4 only once. Therefore, there is only a 1/9 chance that rand() % 5 will generate a 4, but all other values have a probability of 2/9. That said, there *are* combinations of RAND_MAX and range that will work: for example, RAND_MAX = 7, range = 4:

[0, 1, 2, 3, 4, 5, 6, 7] % 4 => [0, 1, 2, 3, 0, 1, 2, 3]

Of course, on a real system, RAND_MAX is never so low. To analyse the actual ranges, we’ll need to rely on more advanced statistics.

The Goodness of Fit (GOF) test is a method of testing whether the distribution of a given sample agrees with the theoretical distribution of the entire population. This test relies on the \chi^2 test statistic (read ‘chi-squared’). We derive our \chi^2 by:

\chi^2=\sum\frac{(expected - observed)^2}{expected}

That is to say, we take a sample of data — in our case, iterations of rand(). This data is (must be) categorical, separable into discrete ‘buckets’ of values. In this case, the buckets are the integer values mod range. We then compare the expected and observed counts in each bucket, and sum the result for all buckets. (Important caveat: for this to provide a valid result, the expected count *must* be at least 5 for all buckets.)

With \chi^2 in hand, we can calculate the p-value of our data. The p-value is the probability that the theoretical distribution is correct, based on our sample. To calculate the p-value, we use the \chi^2cdf function. The cdf, or Cumulative Distribution Function, is the area underneath the distribution function for a given distribution. The value we are interested in is the area to the right of our \chi^2 statistic, so our calculation looks like:

1-\chi^2cdf(\chi^2, df)

Because this is a probability distribution, the total area under the curve is 1. Subtracting the area up to our \chi^2, we’re left with the right tail, which is our p-value. The higher our p-value, the closer we are to the theoretical distribution.

You’ll note one other variable in that equation: df. This is the ‘degree of freedom’ in the distribution. The \chi^2 distribution function changes with sample size — the degree of freedom is how this is represented. For a Goodness of Fit test, the degree of freedom is equal to the number of buckets, minus one.

Now for the actual test.

Since we expect rand() to approximate a perfect random distribution (it doesn’t, on most systems, but that is a separate issue), our theoretical, expected count in each bucket is equal to the number of trials divided by the number of buckets (our range.) For the Goodness of Fit test, we need a minimum of five expected counts in each bucket. Therefore, to test a given range we need to generate at least range*5 trials.

I wrote a short program in C++ to calculate \chi^2 for all ranges of rand(). Note that this does not directly tell us our p-values. Instead, it generates a Scilab script to calculate and plot them. (Actually evaluating \chi^2cdf requires Calculus beyond what I’m familiar with.)

The resulting plot is interesting:

Below ~4000, many ranges produce a reasonable distribution. However, above that point, the vast majority of ranges lead to *highly* skewed distributions (p = 0).

So, what’s the takeaway? How bad *is* rand() % range?

Pretty bad! The more important question, however, is whether you care.

Cryptography has very strict standards on the distribution of its random numbers. Not only do you not want to use modulus to remap ranges in crypto, you don’t want to be using rand() in the first place! Other applications are far less demanding. The slight skew is likely irrevelant in a game, for example. (Doom rather famously uses a single static table for its random number generator, foregoing rand() altogether.)

Finally: This post is *almost certainly* littered with errors. Most glaringly, assuming rand() to have a proper random distribution is incorrect. This is intended more as an exploration of the concept than a rigorous proof. However, corrections are welcome!

FTJ2014 Post-mortem I – Rendering

Fuck This Jam 2014 has wrapped up, so I can finally take the time to pop stack and talk about what I’ve learned in the process. Together with a former classmate, I built “Majo no Mahouki” [link], a side-scrolling shoot ’em up. The game itself leaves much to be desired (as to be expected from a 7-day jam), but I think some of the underlying framework code is worth discussing.

The game is built in C++ on top of the SFML library. SFML provides (among other things) a 2D graphics abstraction over OpenGL. The core interface to this abstraction is sf::RenderTarget::draw, which accepts a buffer of vertices and a RenderState (pointing to textures, shaders, etc.) Every call to draw results in a GL draw call. As such, these calls are *not* cheap.

However, the overhead is more or less the same between 4 vertices and 4000. Therefore, the solution is obvious: batching!

Majo handles this by creating a single VertexArray in the Level object (which handles the actual rendering for all in-game sprites). Each Entity, then, has a method with the signature virtual void render(sf::Vertex * quad) const. The level hands each entity a pointer to a quad in its vertex array, and the entity renders itself to it.

An example (default static sprite rendering):

void Entity::render(sf::Vertex * quad) const
    const double radius = m_collider.getRadius();
    const sf::Vector2f position = m_collider.getPosition();
    const sf::Vector2f halfsize(radius, radius);

    quad[0].position = sf::Vector2f(position.x - halfsize.x, position.y - halfsize.y);
    quad[1].position = sf::Vector2f(position.x + halfsize.x, position.y - halfsize.y);
    quad[2].position = sf::Vector2f(position.x + halfsize.x, position.y + halfsize.y);
    quad[3].position = sf::Vector2f(position.x - halfsize.x, position.y + halfsize.y);

    quad[0].texCoords = sf::Vector2f(m_spriteBounds.left, m_spriteBounds.top);
    quad[1].texCoords = sf::Vector2f(m_spriteBounds.left + m_spriteBounds.width, m_spriteBounds.top);
    quad[2].texCoords = sf::Vector2f(m_spriteBounds.left + m_spriteBounds.width, m_spriteBounds.top + m_spriteBounds.height);
    quad[3].texCoords = sf::Vector2f(m_spriteBounds.left, m_spriteBounds.top + m_spriteBounds.height);

This convention has issues, obviously: Level assumes that each entity only renders 4 vertices, and Entity assumes it is passed 4 vertices to render to. In hindsight, passing a reference to the array and an index into it may be a better calling convention.

In any case, we now have our vertices set up. However, this is not enough. While the vertices define position and texture coordinates, they do *not* define what texture is actually used (nor shaders, nor transforms, though Majo uses neither.) For proper batching, we must sort the list of entities by texture before rendering them to the vertex array:

void Level::render()
    m_projectileCount = 0;
    m_enemyCount = 0;

    m_verts.resize(m_entities.size() * 4);

    // Sort by texture type -- we want to batch drawing by texture.
    std::sort(m_entities.begin(), m_entities.end(), 
        [](const Entity * a, const Entity * b) {
        return a->getTextureId() < b->getTextureId();

    // Scan the list, rendering to vertex buffer and 
    // recording counts of each texture type.
    for (int i = 0; i < m_entities.size(); ++i)
        const TextureId texId = m_entities[i]->getTextureId();
        if (texId == TEXTURE_PROJECTILES) m_projectileCount++;
        else if (texId == TEXTURE_ENEMIES) m_enemyCount++;

        m_entities[i]->render(&m_verts[i * 4]);

(A TextureId is Majo’s internal handle to a cached texture.)

Now Level has an array of vertices, and offsets within the array for each texture switch. (This particular method of recording the offset will not scale well, but Majo uses a total of 3 textures, so it works here.) The draw call, then, is straightforward:

void Level::draw(sf::RenderTarget& target, sf::RenderStates states) const
    if (m_entities.size() == 0) return;

    const sf::Vertex * projectileVerts = &m_verts[0];
    const sf::Vertex * enemyVerts = &m_verts[m_projectileCount * 4];
    const sf::Vertex * playerVerts = &m_verts[(m_projectileCount + m_enemyCount) * 4];

    states.texture = Resources::getTexture(TEXTURE_PROJECTILES);
    target.draw(projectileVerts, m_projectileCount * 4, sf::Quads, states);

    states.texture = Resources::getTexture(TEXTURE_ENEMIES);
    target.draw(enemyVerts, m_enemyCount * 4, sf::Quads, states);

    states.texture = Resources::getTexture(TEXTURE_PLAYER);
    target.draw(playerVerts, 4, sf::Quads, states);

    // Hacky UI drawing elided.

Thus, we can draw all Entities with only 3 calls to RenderTarget::draw. This is a fairly extreme example — while individually rendering entities is prohibitively expensive, it’s not necessarily worthwhile to batch *every* draw. However, it can be done.

Next time, we’ll explore how Majo handles sprite animation.

The Tale of trimZeros, or UI: Detail Counts

Continuing in a similar vein to my last post (Dragging Roads), another post on UI code. My CS professor has a few labs that we come back to and build on over the course of the year — one of these is an RPN calculator, which we built last semester (and which has become a coding kata for me in several languages), and which we are now developing GUIs for: first with Swing, and now on Android. (Spoilers: no, nothing in today’s post is actually related to Android UI in particular.)

Although the calculator code itself works on doubles, as you would expect, the UI stores the currently entered number as a String. This seems like a dirty hack (it is), but in order to support floating point entry and a backspace key, it’s the simplest means. The alternative, using exponentiation to place each digit in the right place on a double, has more nasty corner cases than are worth dealing with in the timeframe of a lab exercise. This String-backed UI has some consequences, namely that the user can input values that are either (a) invalid or (b) look wrong. Today’s post examines (b) — more specifically, removing unnecessary leading zeros.

Trimming zeros is easy, right?

private void trimZeros() {
    if (currentEntry.length() < 1) return;
    int i;

    for (i = 0; currentEntry.length() > i 
             && currentEntry.charAt(i) == '0'; ++i);

    currentEntry = currentEntry.substring(i);

(currentEntry is the String used to store the current entered number.)

This is somewhat more “clever” than it probably should be, but it’s still straightforward code — find the index of the first non-‘0’ character in the currentEntry string, then lop off the zeros all at once with a substring call.

However, it doesn’t quite preform how we want:

|  Input  |  Output  |
|  "0123" |   "123"  |
| "0.123" |  ".123"  |
|   "000" |      ""  |
|     "0" |      ""  |

Leading zeros on integers are trimmed properly, but zero values are reduced to empty strings, and decimal values lose the leading zero. With the exception of the empty strings, these are perfectly valid — feed them into #parseDouble and we’ll get the right double out, but we can do better.

private void trimZeros() {
    if (currentEntry.length() < 1) return;
    int i;
    int decimalPoint = currentEntry.indexOf('.');

    for (i = 0; currentEntry.length() - 1 > i 
             && currentEntry.charAt(i) == '0'; ++i) {

        if (i + 1 == decimalPoint) break;

    currentEntry = currentEntry.substring(i);

Let’s check our table again:

|  Input  |  Output  |
|  "0123" |   "123"  |
| "0.123" | "0.123"  |
|   "000" |     "0"  |
|     "0" |     "0"  |

We’ve made two changes here: first, we never try to consume the last character in the string. Therefore, if we have an input of “00”, we only chop off the first ‘0’, leaving “0” — exactly what we want. Second, we look up the location of the decimal point (if there is one), and ensure we stop one character short of it, preserving a leading zero before the decimal point.

(An aside for those not well versed in Java: #indexOf returns -1 for a character not in the string. Because i starts at 0 and is only incremented, we can guarantee that i + 1 != -1 (overflow excepted — it’s not ever a problem here).)

Minor UI polish like this is not particularly difficult — the total diff between these two methods is 3 lines of code. However, issues like this exist in every system and piece of data that comes close to a user. It’s worth investing the time to hunt them down and fix them, though — users may not notice a well-behaved UI, but they will notice an awkward one.