hikari – Entity Component System II

I’d intended to use this second post to dive into turning some existing systems in hikari into components. However, after a few attempts at writing it up, I think converting an existing system makes for a really terrible way to explain component systems. So instead I’m going to go over a few common decisions that were showing up a lot and discuss them directly. We’ll get back to the actual implementation after this.

Also, before I go any further, I wanted to clarify a point from last time. ECS is an abstract design pattern. There are many ways of implementing it, just like there are a variety of ways to implement abstract data structures. (i.e, a list could be an array, a linked list, a linked list of array chunks, etc.) There are tradeoffs for every implementation decision, so I don’t claim I’m implementing things in The Right Way. That said, this is a record of my implementation, not a broad study of ECS in general, so I may not mention all of the tradeoffs.

Components and Systems

Continuing the theme of “things are not what they seem”, components don’t necessarily exist in ECS. Which is to say, if you have a Foo component, you may or may not actually have a FooComponent struct anywhere.

The reason for this is simple: it is exceedingly rare to have *only one* of a thing. The structure we store things in determine its memory layout. Memory layout decisions for a group of things should be made with respect to the entire group and how it is used. Therefore, these decisions are made when implementing the system for a particular component.

For example, here’s a few ways we could layout a TransformSystem that holds parent- and world-relative spacial transforms as well as a scene graph structure describing the relationship between instances.

struct TransformSystem {
// Option:  Array-of-structs
struct TransformComponent {
Vector3    local_position;
Vector3    local_scale;
Quaternion local_rotation;
Vector3    world_position;
Vector3    world_scale;
Quaternion world_rotation;

u32 parent;
u32 next_sibling;
u32 prev_sibling;
u32 first_child;
};
u32 count;
TransformComponent * transforms;

// Option:  Struct-of-arrays
u32 count;
Vector3 *    local_position;
Vector3 *    local_scale;
Quaternion * local_rotation;
Vector3 *    world_position;
Vector3 *    world_scale;
Quaternion * world_rotation;
u32 * parent;
u32 * next_sibling;
u32 * prev_sibling;
u32 * first_child;

// Option:  Array-of-structs, use-case split
struct Transform {
Vector3    position;
Vector3    scale;
Quaternion rotation;
};

u32 count;
Transform * parent_from_local;
Transform * world_from_local;
u32 * parent;
u32 * next_sibling;
u32 * prev_sibling;
u32 * first_child;
};


A lot of material about memory layout pushes “struct of arrays for everything!”, and that’s a good default strategy, but it isn’t necessarily the best way to go every time. What gets used together? What doesn’t? When you optimize for cache performance in the most common code path, how many cache misses do you cause in the *second* most common code path? In the example, struct-of-arrays makes a lot of sense when we’re burning through the data linearly (as we do when updating the world transforms), but means three separate cache misses to look at a specific transform on its own. Which is more important is going to depend on factors that aren’t obvious and may change over time (how many other systems poke at individual transforms, say.) Hence why it’s nice to have this detail localized to the system itself — we can change this at will as long as no one else depends on our exact data layout.

The freedom to choose our data layout per system goes beyond struct-of-arrays, however. Need to keep the data sorted? Go ahead. Need to store it in a spatial lookup structure? Sure. Want to keep components in their own heap-allocated structs? That works too.

The listing above is missing one important field that all components will need: the entity it is attached to. Systems interact directly with each other. If we know our host entity, we can use that to query other systems for components attached to that same entity. (We also need that reference to check if our host has been destroyed. Remember: we’re not storing a list of attached components anywhere!)

Component Lookup

One of the key things we need to be able to do is, given an entity, find the component instance(s) attached to it. Once again, this is something we solve separately per-system. For many systems, we’re going to want to move instances around in our internal data structures, to keep them tightly packed or enforce ordering requirements. Pointers and indices alone won’t be sufficient in this case, because the instance moves around in memory.

We need a lookup structure between Entity handles and instances. How we choose to do this is a tradeoff between extra memory overhead and runtime.

On the low-memory side, we could just search the array of Entity handles. Linear search is O(n) in the number of active instances.

Option<Instance>
GetInstance(Entity e) {
Instance result = Instance::Null();
for (u32 i = 0; i < this->count; ++i) {
if (e == this->entities[i]) {
result = Instance(i);
break;
}
}
// NOTE: Option<T> stores an instance of T and whether that value is
// valid.  Accessing an invalid value from Option<T> will assert.
// I use this to force handling of the "not found" case by the caller.
return Option<Instance>(result, !result.IsNull());
}


We can improve this further by letting the caller pass in a previous instance handle — before searching, check if that instance and entity combination is still valid:

Option<Instance>
GetInstance(Entity e, Instance inst=Instance::Null()) {
if (!inst.IsNull() && inst.index < count) {
if (e == this->entities[inst.index]) {
// Past instance is valid
return Option<Instance>(inst, true);
}
}
// No longer valid, need to search again:
...
}


If the system doesn’t store a huge number of instances, and instances don’t move around very often, this is a perfectly reasonable approach. No memory overhead (we need that entity list anyway), and constant time in the common case.

Of course, we’re not going to be so lucky with some systems. For example, we can reasonably expect that the *majority* of entities have a Transform — anything that exists spatially in the world needs one. Transforms are also used by a lot of different systems, so we need to perform the lookup often. Finally, for the sake of its own updates, Transforms have ordering requirements that mean they move around in the internal structures pretty often.

We can solve this one with a linear lookup table indexed by the entity index:

Option<Instance>
GetInstance(Entity e) {
Instance result = Instance::Null();
if (e.index < this->lookup_table.size()) {
result = this->lookup_table[e.index];
// Check for stale entity:
if (this->entities[result.index] != e) {
result = Instance::Null();
}
}
return Option<Instance>(result, !result.IsNull());
}


Unlike the lookup table in the entity manager I discussed last time, there’s no strict requirement on this lookup table not moving around. Therefore, we can use std::vector (or equivalent), sized to hold the highest entity index we’ve ever been attached to. GetInstance can immediately reject entities beyond that point. This gives us O(n) memory overhead in the number of entities that exist in the world.

Somewhere between these two on the memory/runtime tradeoff is a hashtable mapping Entity handles to Instance handles. This requires less memory for component types that aren’t very common (no need to store null Instances for every entity), while having better asymptotic runtime than searching the Entity array. (Search may still win in practice for small enough systems.)

Multiple Instances

The above lookup functions only really work if each Entity can have only one of a particular component. This makes sense in some cases (what would it even *mean* to have multiple Transforms?) In many other cases, though, we’d really like to be able to use multiple instances per entity. For example, an explosion effect might have several different particle systems attached (smoke, rubble, sparks, etc.)

However, in this case, how do we reference a particular instance of a component from some other system? The host entity’s handle is no longer a unique identifier for the instance.

I don’t really have an answer I’m confident in, but my working theory is the system that allows multiple instances should handle this internally. Specifically, by expanding its data structures.

First, we add a unique id field to the Instance struct. This isn’t another index, just a monotonically increasing counter of instances created. (I am very explicitly *not* using the Nth instance attached to an entity as it interacts badly with adding / removing instances.) Next, we add two arrays to our instance data, one that points to the next instance assigned to the same entity, the next containing the instance id. This gives us a linked list of instances for a given entity — by putting this in its own array, we keep it out of the way during most operations, but increase the chance of hitting cache when we *are* iterating the list.

GetInstance() gets a bit more complex:

Option<Instance>
GetInstance(Entity e, Instance inst) {
if (inst.IsNull()) {
// Caller is expected to have a non-null instance already.  (As returned
// by AddInstance(), etc.)  Otherwise we don't know the correct id.
return Option<Instance>(inst, false);
}

if (inst.index < count) {
if (this->entities[inst.index] == e && this->instance_ids[inst.index] == inst.id) {
// Given instance reference is still valid.
return Option<Instance>(inst, true);
}
}

// Find the entity's first instance, using one of the methods above:
Instance current = FindFirstInstance(e);
while (current.id != inst.id && !current.IsNull()) {
u32 next_index = this->next_instance_index[current.index];
u32 next_id = this->instance_ids[next_index];
current.id = next_id;
current.index = next_index;
}

// Result is valid if the id matches the target.
return Option<Instance>(current, current.id == inst.id);
}


I’m not a huge fan of this. More state to keep in sync, and potentially a lot of effectively useless overhead (i.e., if *most* entities with a FooComponent only have one, the next_instance_index array is going to be full of nulls.) On the other hand, we only pay a performance cost when we actually *use* multiple instances per entity.

¯\_(ツ)_/¯

Generalization

So, we have a variety of approaches to solve instance storage and lookup on a per-system basis, but do we really have to do this every time? Many systems are going to want to use the same approach.

I don’t really know. However, I suspect we can get away with sharing a lot of the lookup code, by storing the lookup data in its own struct, templated on the system’s Instance type. Actual component data management is unlikely to be easy to share, but we can write helper code for things like “allocate a bunch of arrays from a single data block”.

Okay, onward to actual code.  Next time.

Orthographic LODs – Part II

Last time, we got the basic renderer up and running. Now we have more complex issues to deal with. For example, how to light the LOD models.

Lighting

One approach to lighting is to simply light the detailed model, and bake that lighting into the LOD texture. This works, but requires that all lights be static. We can do better.

The standard shading equation has three parts: ambient light, diffuse light, and specular light:
$K_a=AT$
$K_d=DT(N \cdot L)$
$K_s=S(N \cdot H)^m (N \cdot L > 0)$
(For details of how these equations work, see Wikipedia.)

The key thing to notice is that these equations rely only on the texture (which we already have), a unit direction towards the light source, a unit direction towards the camera, and the unit surface normal. By rendering our detailed model down to a texture, we’ve destroyed the normal data. But that’s easy enough to fix: we write it to a texture as well.

(Setup for this texture is more-or-less the same as the color texture, we just bind it to a different output.)

// Detail vertex shader
in vec3 position;
in vec3 normal;

out vec3 Normal;

void main() {
gl_Position = position;
Normal = normal;
}

in vec3 Normal;

out vec4 frag_color;
out vec4 normal_map_color;

void main() {
vec3 normal = (normalize(Normal) + vec3(1.0)) / 2.0;

frag_color = vec4(1.0);
normal_map_color = vec4(normal, 1.0);
}


A unit normal is simply a vector of 3 floats between -1 and 1. We can map this to a RGB color (vec3 clamped to [0, 1]) by adding one and dividing by two. This is, again, nothing new. You may be wondering why we store an alpha channel — we’ll get to that later.

We can now reconstruct the normal from the LOD texture. This is all we need for directional lights:

// Fragment shader
uniform sampler2D color_texture;
uniform sampler2D normal_map_texture;

uniform vec3 sun_direction;
uniform vec3 camera_direction;

in vec2 uv;

out vec4 frag_color;

void main() {
vec4 normal_sample = texture(normal_map_texture, uv);
vec3 normal = (normal.xyz * 2.0) - vec3(1.0);
vec3 view = -1.0 * camera_direction;
vec3 view_half = (normal + view) / length(normal + view);

float n_dot_l = dot(normal, sun_direction);
float n_dot_h = dot(normal, view_half);

float ambient_intensity = 0.25;
float diffuse_intensity = 0.5;
float specular_intensity = 0.0;
if (n_dot_l > 0) {
specular_intensity = 0.25 * pow(n_dot_h, 10);
}

vec4 texture_sample = texture(color_texture, uv);

vec3 lit_color = ambient_intensity * texture_sample.rgb +
diffuse_intensity * texture_sample.rgb +
specular_intensity * vec3(1.0);

frag_color = vec4(lit_color, texture_sample.a);
}


Result:

Point / Spot Lighting

So this solves directional lights. However, point and spot lights have attenuation — they fall off over distance. Additionally, they have an actual spatial position, which means the light vector will change.

After rendering the LODs, we no longer have the vertex positions of our detail model. Calculating lighting based on the LOD model’s vertices will, in many cases, be obviously wrong. How can we reconstruct the original points?

By cheating.

Let’s look at what we have to work with:

The camera is orthographic, which means that depth of a pixel is independent of where it is on screen. We have the camera’s forward vector (camera_direction in the above shader).

Finally, and most obviously, we don’t care about shading the points that weren’t rendered to our LOD texture.

This turns out to be the important factor. If we knew how far each pixel was from the camera when rendered, we could reconstruct the original location for lighting.

To put it another way, we need a depth buffer. Remember that normal map alpha value we didn’t use?

We can get the depth value like so:

// Map depth from [near, far] to [-1, 1]
float depth = (2.0 * gl_FragCoord.z - gl_DepthRange.near - gl_DepthRange.far) / (gl_DepthRange.far - gl_DepthRange.near);

// Remap normal and depth from [-1, 1] to [0, 1]
normal_depth_color = (vec4(normalize(normal_vec), depth) + vec4(1.0)) / 2.0;


However, we now run into a problem. This depth value is in the LOD render’s window space. We want the position in the final render’s camera space — this way we can transform the light position to the same space, which allows us to do the attenuation calculations.

There are a few ways to handle this. One is to record the depth range for our LOD model after rendering it, and then use that to scale the depth properly. This is complex (asymmetrical models can have different depth sizes for each rotation) and quite likely inconsistent, due to precision errors.

A simpler solution, then, is to pick an arbitrary depth range, and use it consistently for both the main render and LODs. This is not ideal — one getting out of sync with the other may lead to subtle errors. In a production design, it may be a good idea to record the camera matrix used in the model data, so that it can be confirmed on load.

We need two further pieces of data to proceed: the viewport coordinates (the size of the window) and the inverse projection matrix (to “unproject” the position back into camera space). Both of these are fairly trivial to provide. To reconstruct the proper position, then:

vec3 window_to_camera_space(vec3 window_space) {
// viewport = vec4(x, y, width, height)

// Because projection is orthographic, NDC == clip space
vec2 clip_xy = ((2.0 * window_space.xy) - (2.0 * viewport.xy)) / (viewport.zw) - 1.0;
// Already mapped to [-1, 1] by the same transform that extracts our normal.
float clip_z = window_space.z

vec4 clip_space = vec4(clip_xy, clip_z, 1.0);

vec4 camera_space = camera_from_clip * clip_space;
return camera_space.xyz;
}


(Note that this is rather inefficient. We *know* what a projection matrix looks like, so that last multiply can be made a lot faster.)

Calculating the light color is the same as for directional lights. For point lights, we then divide by an attenuation factor:

float attenuation = point_light_attenuation.x +
point_light_attenuation.y * light_distance +
point_light_attenuation.z * light_distance * light_distance;

point_light_color = point_light_color / attenuation;


This uses 3 factors: constant, linear, and quadratic. The constant factor determines the base intensity of the light (a fractional constant brightens the light, a constant greater than 1.0 darkens it.) The linear and quadratic factors control the fall-off curve.

A spot light has the same attenuation factors, but also has another term:

float spot_intensity = pow(max(dot(-1.0 * spot_direction, light_direction), 0.0), spot_exponent);


The dot product here limits the light to a cone around the spotlight’s facing direction. The exponential factor controls the “tightness” of the light cone — as the exponent increases, the fall-off becomes much sharper (remember that the dot product of unit vectors produces a value in [0, 1]).

So that covers lighting. Next time: exploring options for shadows.

NOTE:

I made some errors with terminology in my last post. After the projection transform is applied, vertices are in *clip* space, not eye space. The series of transforms is as follows:

Model -> World -> Camera -> Clip -> NDC
(With an orthographic projection, Clip == NDC.)

Orthographic LODs – Part I

NOTE: I’m going to try something new here: rather than write about something I have already solved, this series will be my notes as I work through this problem. As such, this is *not* a guide. The code is a pile of hacks and I will make silly mistakes.

The Problem:

I am working on a city-building game. I want to be able to use detailed models with large numbers of polygons and textures, but without the runtime rendering cost that entails. To do so, I want to pre-render detailed models as textures for lower level-of-detail (LOD) models. This is not a particularly original solution (it is the system SimCity 4 uses). There are a few restrictions implied by this choice:

• Camera projection must be orthographic.
• A perspective projection will appear to skew the model the further it gets from the center of the view. Orthographic projection will not.
• Camera angle must be fixed (or a finite set of fixed positions).
• While camera *position* is irrelevant (because an orthographic projection doesn’t affect depth), its rotation is not. Each camera angle needs its own LOD image. Therefore, we need a small number of them.

The process is simple enough: render the detail model for each view. Projecting the vertices of the LOD model with the same view produces the proper texture coordinates for the LOD. Then, pack all of the rendered textures into an atlas.

Orthographic Rendering

Rendering an orthographic projection is, conceptually, very straightforward. After transforming the world to camera space (moving the camera to the origin, and aligning the z-axis with the camera’s forward view), we select a box of space and map it onto a unit cube (that is, a cube with a minimum point of (-1, -1, -1) and a maximum point of (1, 1, 1).) The z-component of each point is then discarded.

I will not get into the math here. Wikipedia has a nicely concise explanation.

What this *means* is that we can fit this box tightly to the model we want to render. With clever application of glViewport, we could even render all of our views in one go. (At the moment, I have not yet implemented this.) This makes building the texture atlas much simpler.

Calculating Camera-space LOD

To get this tight fit, we need to know the bounds of the rendered model in camera space. Since we are projecting onto the LOD model, *its* bounds are what we’re concerned with. (This implies, by the way, that the LOD model must completely enclose the detail model.)

struct AABB {
vec3 min;
vec3 max;
};

AABB
GetCameraSpaceBounds(Model * model, mat4x4 camera_from_model) {
AABB result;

vec4 * model_verts = model->vertices;
vec4 camera_vert = camera_from_model * model_verts[0];

// We don't initialize to 0 vectors, because we don't
// guarentee the model is centered on the origin.
result.min = vec3(camera_vert);
result.max = vec3(camera_vert);

for (u32 i = 1; i < model->vertex_count; ++i) {
camera_vert = camera_from_model * model_verts[i];

result.min.x = min(result.min.x, camera_vert.x);
result.min.y = min(result.min.y, camera_vert.y);
result.min.z = min(result.min.z, camera_vert.z);

result.max.x = max(result.max.x, camera_vert.x);
result.max.y = max(result.max.y, camera_vert.y);
result.max.z = max(result.max.z, camera_vert.z);
}

return result;
}


This bounding box gives us the clip volume to render.

AABB lod_bounds = GetCameraSpaceBounds(lod_model, camera_from_model);

mat4x4 eye_from_camera = OrthoProjectionMatrix(
lod_bounds.max.x, lod_bounds.min.x,  // right, left
lod_bounds.max.y, lod_bounds.min.y,  // top, bottom
lod_bounds.min.z, lod_bounds.max.z); // near, far

mat4x4 eye_from_model = eye_from_camera * camera_from_model;


Rendering LOD Texture

Rendering to a texture requires a framebuffer. (In theory, we could also simply render to the screen and extract the result with glGetPixels. However, that limits us to a single a single color output and makes it more difficult to do any GPU postprocessing.)

After binding a new framebuffer, we need to create the texture to render onto:

glGenTextures(1, &result->texture);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, result->texture);

// render_width and render_height are the width and height of the
// camera space lod bounding box.
glTexImage2D(
GL_TEXTURE_2D, 0, GL_RGBA, render_width, render_height, 0, GL_RGBA, GL_UNSIGNED_BYTE, 0
);

glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, 0);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, result->texture, 0);

glViewport(0, 0, render_width, render_height);


This gives us a texture large enough to fit our rendered image, attaches it to the framebuffer, and sets the viewport to the entire framebuffer. To render multiple views to one texture, we’d allocate a larger texture, and use the viewport call to selected the target region.

Now we simply render the detail model with the eye_from_model transform calculated earlier. The result:

(The detail shader used simply maps position to color.)

That’s all for now. Next time: how do we light this model?

How bad is rand() % range?

“Pick a random number between 0 and 10.”

It’s a fairly basic task, often used as a project in introductory programming classes and extended language examples. The most common and straightforward solution is modulus. (Indeed, most CS classes use this exercise purely to introduce the concept of modulus.)

int number = rand() % range;


However, there is a problem with this method — it skews the distribution of numbers produced.

C’s rand() returns an integer in the range [0, RAND_MAX].
For illustration, let’s:
Assume RAND_MAX = 8
Assume rand is a perfect random distribution. (Equal chance of returning any integer in range.)

Let range = 5.
The possible values for number are then:

[0 % 5,
1 % 5,
2 % 5,
3 % 5,
4 % 5,
5 % 5,
6 % 5,
7 % 5,
8 % 5]
=>
[0, 1, 2, 3, 4, 0, 1, 2, 3]


You’ll notice that the [0, 3] range appears *twice*, but 4 only once. Therefore, there is only a 1/9 chance that rand() % 5 will generate a 4, but all other values have a probability of 2/9. That said, there *are* combinations of RAND_MAX and range that will work: for example, RAND_MAX = 7, range = 4:

[0, 1, 2, 3, 4, 5, 6, 7] % 4 => [0, 1, 2, 3, 0, 1, 2, 3]


Of course, on a real system, RAND_MAX is never so low. To analyse the actual ranges, we’ll need to rely on more advanced statistics.

The Goodness of Fit (GOF) test is a method of testing whether the distribution of a given sample agrees with the theoretical distribution of the entire population. This test relies on the $\chi^2$ test statistic (read ‘chi-squared’). We derive our $\chi^2$ by:

$\chi^2=\sum\frac{(expected - observed)^2}{expected}$

That is to say, we take a sample of data — in our case, iterations of rand(). This data is (must be) categorical, separable into discrete ‘buckets’ of values. In this case, the buckets are the integer values mod range. We then compare the expected and observed counts in each bucket, and sum the result for all buckets. (Important caveat: for this to provide a valid result, the expected count *must* be at least 5 for all buckets.)

With $\chi^2$ in hand, we can calculate the p-value of our data. The p-value is the probability that the theoretical distribution is correct, based on our sample. To calculate the p-value, we use the $\chi^2cdf$ function. The cdf, or Cumulative Distribution Function, is the area underneath the distribution function for a given distribution. The value we are interested in is the area to the right of our $\chi^2$ statistic, so our calculation looks like:

$1-\chi^2cdf(\chi^2, df)$

Because this is a probability distribution, the total area under the curve is 1. Subtracting the area up to our $\chi^2$, we’re left with the right tail, which is our p-value. The higher our p-value, the closer we are to the theoretical distribution.

You’ll note one other variable in that equation: $df$. This is the ‘degree of freedom’ in the distribution. The $\chi^2$ distribution function changes with sample size — the degree of freedom is how this is represented. For a Goodness of Fit test, the degree of freedom is equal to the number of buckets, minus one.

Now for the actual test.

Since we expect rand() to approximate a perfect random distribution (it doesn’t, on most systems, but that is a separate issue), our theoretical, expected count in each bucket is equal to the number of trials divided by the number of buckets (our range.) For the Goodness of Fit test, we need a minimum of five expected counts in each bucket. Therefore, to test a given range we need to generate at least $range*5$ trials.

I wrote a short program in C++ to calculate $\chi^2$ for all ranges of rand(). Note that this does not directly tell us our p-values. Instead, it generates a Scilab script to calculate and plot them. (Actually evaluating $\chi^2cdf$ requires Calculus beyond what I’m familiar with.)

The resulting plot is interesting:

Below ~4000, many ranges produce a reasonable distribution. However, above that point, the vast majority of ranges lead to *highly* skewed distributions (p = 0).

So, what’s the takeaway? How bad *is* rand() % range?

Pretty bad! The more important question, however, is whether you care.

Cryptography has very strict standards on the distribution of its random numbers. Not only do you not want to use modulus to remap ranges in crypto, you don’t want to be using rand() in the first place! Other applications are far less demanding. The slight skew is likely irrevelant in a game, for example. (Doom rather famously uses a single static table for its random number generator, foregoing rand() altogether.)

Finally: This post is *almost certainly* littered with errors. Most glaringly, assuming rand() to have a proper random distribution is incorrect. This is intended more as an exploration of the concept than a rigorous proof. However, corrections are welcome!

FTJ2014 Post-mortem Part II – Animation

Unlike rendering, I went into this jam with no clue how to do animation. Fortunately, it turns out to be fairly simple (in part thanks to my per-entity rendering setup.)

First, a brief bit of background, for anyone unfamiliar with rendering:

Each vertex drawn has three properties: position, color, and texture coordinates. The rest of the pixels draw are determined by linear interpolation between connected vertices. Which vertices are connected depends on the primitive type defined when we send the vertex buffer to the GPU. In Majo, everything is rendered as Quads (4 consecutive vertices create a single face.) Finally, by convention, the elements of the position vector are referred to by the familiar (x, y, z), but texture coordinates are labeled (u, v, w).

So, the following code renders a 16×16 square starting at the origin (top-left of screen), textured with a 32×32 block of the current texture, starting at the top-left of the texture image:

void Foo::render(sf::Vertex * quad) const
{

}


(There’s no requirement for texture size and actual rendered size to be related, but direct multiples prevent squashing and stretching.)

What this means is we can place multiple sprites on a single texture image (a spritesheet), and slice out the portion we want to render. Which means that *animating* these sprites is just a matter of changing which slice of the spritesheet we render.

A single frame of animation, then, is simply a rectangle indicating what part of the spritesheet to render:

struct AnimationFrame
{
sf::Vector2f uv;  // Offset into the texture.
sf::Vector2f spriteSize;  // Size of the sprite on the texture.
};


And an animation consists of a list of frames, with some bookkeeping regarding when to switch:

class Animation
{
private:
/* Container for a list of animation frames. */
std::vector<const AnimationFrame> m_frames;
int m_currentFrame;

int m_ticksPerFrame;
int m_ticksSinceLastFrame;

public:
// Constructors / Accessors elided.

/* Start the animation. */
void start();
/* Update call.  Advances the animation. */
void update();
/*
* Render the texture coords to quad.
* Does not touch position -- that is to be set by the entity.
*/
};


The render method allows the owning entity to pass a quad to render the appropriate uv coordinates (while still rendering position and color itself.)

We’re not done yet!

The problem is that this class is hefty: a four-frame animation weighs in at 76 bytes, plus overhead incurred by the vector class. Copying this class as a member of every Entity that uses an animation adds up to a lot of memory moving around. (The issue is less the memory usage, and more that allocating, copying, and deallocating that memory is not free.)

The obvious solution is to cache the Animation, and let each entity store a pointer to it. However, we now have a different problem: the timing information and current frame are shared between all entities using a given animation. (Since each entity calls Animation::update, this leads to undesired behavior.)

This leads to the core lesson I learned during this jam: data and state are different things. Separate them.

Our animation frames are data: we load them into memory once, and only read from them. On the other hand, the timing information is *state*, modified on each update. These are two separate concepts, so let’s carve out the stateful bits into their own structure:

struct AnimationState
{
int frameCount;
int currentFrame;
int ticksPerFrame;
int ticksSinceLast;

// Methods elided
};


With a bit of trivial setup, each animated Entity now owns an AnimationState, as well as a pointer to an Animation (20 + 4 bytes per Entity on x86). Now we can cache and share Animations all we want, as they are constant data. Animation::render has its signature changed to accept an AnimationState passed in: void render(sf::Vertex * quad, const AnimationState& state) const;

Over 1000 entities, this system saves ~54.6kB of memory. That’s not a lot. However, by enabling caching of Animations, we reduce the amount of initialization code involved, as well as the number of runtime copies/moves neccessary. Decreased memory usage is just a nice bonus.

Next time: design, scope, and some miscellaneous issues encountered over the week.

FTJ2014 Post-mortem I – Rendering

Fuck This Jam 2014 has wrapped up, so I can finally take the time to pop stack and talk about what I’ve learned in the process. Together with a former classmate, I built “Majo no Mahouki” [link], a side-scrolling shoot ’em up. The game itself leaves much to be desired (as to be expected from a 7-day jam), but I think some of the underlying framework code is worth discussing.

The game is built in C++ on top of the SFML library. SFML provides (among other things) a 2D graphics abstraction over OpenGL. The core interface to this abstraction is sf::RenderTarget::draw, which accepts a buffer of vertices and a RenderState (pointing to textures, shaders, etc.) Every call to draw results in a GL draw call. As such, these calls are *not* cheap.

However, the overhead is more or less the same between 4 vertices and 4000. Therefore, the solution is obvious: batching!

Majo handles this by creating a single VertexArray in the Level object (which handles the actual rendering for all in-game sprites). Each Entity, then, has a method with the signature virtual void render(sf::Vertex * quad) const. The level hands each entity a pointer to a quad in its vertex array, and the entity renders itself to it.

An example (default static sprite rendering):

void Entity::render(sf::Vertex * quad) const
{
const sf::Vector2f position = m_collider.getPosition();

quad[0].position = sf::Vector2f(position.x - halfsize.x, position.y - halfsize.y);
quad[1].position = sf::Vector2f(position.x + halfsize.x, position.y - halfsize.y);
quad[2].position = sf::Vector2f(position.x + halfsize.x, position.y + halfsize.y);
quad[3].position = sf::Vector2f(position.x - halfsize.x, position.y + halfsize.y);

quad[1].texCoords = sf::Vector2f(m_spriteBounds.left + m_spriteBounds.width, m_spriteBounds.top);
quad[2].texCoords = sf::Vector2f(m_spriteBounds.left + m_spriteBounds.width, m_spriteBounds.top + m_spriteBounds.height);
quad[3].texCoords = sf::Vector2f(m_spriteBounds.left, m_spriteBounds.top + m_spriteBounds.height);
}


This convention has issues, obviously: Level assumes that each entity only renders 4 vertices, and Entity assumes it is passed 4 vertices to render to. In hindsight, passing a reference to the array and an index into it may be a better calling convention.

In any case, we now have our vertices set up. However, this is not enough. While the vertices define position and texture coordinates, they do *not* define what texture is actually used (nor shaders, nor transforms, though Majo uses neither.) For proper batching, we must sort the list of entities by texture before rendering them to the vertex array:

void Level::render()
{
m_projectileCount = 0;
m_enemyCount = 0;

m_verts.clear();
m_verts.resize(m_entities.size() * 4);

// Sort by texture type -- we want to batch drawing by texture.
std::sort(m_entities.begin(), m_entities.end(),
[](const Entity * a, const Entity * b) {
return a->getTextureId() < b->getTextureId();
});

// Scan the list, rendering to vertex buffer and
// recording counts of each texture type.
for (int i = 0; i < m_entities.size(); ++i)
{
const TextureId texId = m_entities[i]->getTextureId();
if (texId == TEXTURE_PROJECTILES) m_projectileCount++;
else if (texId == TEXTURE_ENEMIES) m_enemyCount++;

m_entities[i]->render(&m_verts[i * 4]);
}
}


(A TextureId is Majo’s internal handle to a cached texture.)

Now Level has an array of vertices, and offsets within the array for each texture switch. (This particular method of recording the offset will not scale well, but Majo uses a total of 3 textures, so it works here.) The draw call, then, is straightforward:

void Level::draw(sf::RenderTarget& target, sf::RenderStates states) const
{
if (m_entities.size() == 0) return;

const sf::Vertex * projectileVerts = &m_verts[0];
const sf::Vertex * enemyVerts = &m_verts[m_projectileCount * 4];
const sf::Vertex * playerVerts = &m_verts[(m_projectileCount + m_enemyCount) * 4];

states.texture = Resources::getTexture(TEXTURE_PROJECTILES);
target.draw(projectileVerts, m_projectileCount * 4, sf::Quads, states);

states.texture = Resources::getTexture(TEXTURE_ENEMIES);
target.draw(enemyVerts, m_enemyCount * 4, sf::Quads, states);

states.texture = Resources::getTexture(TEXTURE_PLAYER);

// Hacky UI drawing elided.
}


Thus, we can draw all Entities with only 3 calls to RenderTarget::draw. This is a fairly extreme example — while individually rendering entities is prohibitively expensive, it’s not necessarily worthwhile to batch *every* draw. However, it can be done.

Next time, we’ll explore how Majo handles sprite animation.

I’m currently working on a game prototype (using C++ and SFML) that involves, among other things, dragging roads on a tilemap. I had a very specific behavior in mind for dragging, which took some work to get right. I’m going to document my results here.

# Plumbing

Before we can even start to worry about how the dragging works, we have to implement a dragging action. SFML exposes events for the mouse being clicked, released, and moved — these form the basis of our action.

//... Inside main loop

sf::Event event;
while (window.pollEvent(event))
{
switch(event.type)
{
//... Other event handling
case sf::Event::MouseButtonPressed:
// Map global pixel location to a location relative to the window.
sf::Vector2f clickPos = window.mapPixelToCoords(
sf::Vector2i(event.mouseButton.x, event.mouseButton.y)
m_map->getView());

// Map this relative pixel location to a specfic tile.
sf::Vector2i tilePos = m_map->mapPixelToTile(clickPos);

// Tell the current tool where to start a drag.
m_currentTool->startDrag(tilePos);
break;
case sf::Event::MouseButtonReleased:
// End the active drag.
m_currentTool->endDrag();
break;
case sf::Event::MouseMoved:
sf::Vector2f mousePos = window.mapPixelToCoords(
sf::Vector2i(event.mouseMove.x, event.mouseMove.y)
m_map->getView());

sf::Vector2i tilePos = m_map->mapPixelToTile(mousePos);

// Update the current position of the drag.
m_currentTool->updateDrag(tilePos);
break;
}
}

//...


This is all fairly straightforward: map the mouse coordinates to a particular tile and pass off to the Tool class. We now have the groundwork in place:

class Tool
{
private:
bool isDragging;
sf::Vector2i m_dragStart;
sf::Vector2i m_dragCurrent;
std::vector<sf::Vector2i> m_tiles;

void updatePath();  // <-- The magic happens here.

public:
void startDrag(const sf::Vector2i& pos)
{
if (m_isDragging) return;

m_isDragging = true;
m_tiles.push_back(pos);
m_dragStart = pos;
m_dragCurrent = pos;
}
void updateDrag(const sf::Vector2i& pos)
{
if (!m_isDragging) return;

m_dragCurrent = pos;
updatePath();
}
void endDrag()
{
if (!m_isDragging) return;
m_tiles.clear();
}
};


# The Algorithm

Our goal is to produce a list of tile coordinates that connect m_dragStart and m_dragCurrent. We want a linear path, with an optional dogleg, i.e.:

* * * * * *
* *
* *
* *

Finally, if there is a diagonal segment, we want the tip to “rotate” as we extend the drag:

* * *
* *
*
|
v
* * *
* *
* *
|
v
* * *
* *
* *
*

The first draft of the algorithm succeeds at everything except rotating:

void Tool::updatePath()
{
// Clear output vector.
m_tiles.clear();

sf::Vector2i curr = m_dragStart;

while (curr != m_dragCurrent) {
m_tiles.push_back(curr);

// Signed deltas
const int dx = curr.x - m_dragCurr.x;
const int dy = curr.y - m_dragCurr.y;

if (abs(dx) > abs(dy)) {
curr.x -= 1 * sign(dx);
}
else {
curr.y -= 1 * sign(dy);
}

}

m_tiles.push_back(curr);
}


(The sign function simply maps negative numbers to -1, positive numbers to 1, and 0 to 0.)

This is a very simple algorithm, but it doesn’t handle the diagonals exactly the way we’d like. The “rotation” of the diagonal end is determined by what branch is run when abs(dx) == abs(dy) — here, it’s the vertical shift, and as such the diagonal always ends with a vertical shift. This causes a “wiggling” motion when dragging diagonally, as the entire diagonal segment shifts back and forth to compensate. Not ideal.

The key is that as we extend either the vertical or horizontal delta for the entire path, we want the diagonal end to switch rotations. This leads to the following:

void Tool::updatePath()
{
// Clear output vector.
m_tiles.clear();

sf::Vector2i curr = m_dragStart;

// Manhattan distance from m_dragStart to m_dragCurrent.
const int dist = abs(m_dragStart.y - m_dragCurrent.y) +
abs(m_dragStart.x - m_dragCurrent.x);
const bool odd = dist % 2 != 0;

while (curr != m_dragCurrent) {
m_tiles.push_back(curr);

// Signed deltas
const int dx = curr.x - m_dragCurr.x;
const int dy = curr.y - m_dragCurr.y;

if (abs(dx) > abs(dy)) {
curr.x -= sign(dx);
}
else if (abs(dx) < abs(dy)) {
curr.y -= sign(dy);
}
else {
if (odd) {
curr.x -= sign(dx);
}
else {
curr.y -= sign(dy);
}
}
}

m_tiles.push_back(curr);
}


In this modified version, we take the Manhattan distance between start and current tiles. Then, if this number is odd, we shift horizontally when the deltas are equivalent, and if it’s even we shift vertically. Which rotation is used for odd/even is arbitrary, it only matters that they are different. Now, as we drag along the diagonal, the rotation of the end tile changes as the mouse crosses the tile boundaries, preserving our previous path and eliminating the “wiggle”.