Fishy Magic


last update:

Posted in: forge

The single biggest cost of any piece of code is usually maintenence. Porting to another platform is a close second. Commonly to port code you’ll see a lot of things that look like

#ifdef WIN
#elif PSX

in the simplest case, anyway. If something’s more complex you might have whole functions replaced, or if something is sufficiently complex, an interface with platform-specific objects behind it. At worst many codebases end up with a melange of code for that system or not, and code flow becomes both a mess and a mystery.

To me the ideal is that the code for a platform should be readable as is. Code not for that platform should be compiled in, in its own files. Code for other platforms can be kept out of the build simply by not supplying it to the compiler. Other than platform-specific optimisations (which always end up polluting code down the line), keeping the platform-specific code as confined as possible.

Interfaces, not implementations

Done perfectly, isolated correctly, the contents of the platform-specific code don’t matter. In fact, when designing the architecture of an engine, only the interfaces of components matter. So while many diagrams might look like this


really it’s the lines that matter. In this ideal state, you can replace that platform-specific code with different code, specific to a different platform. Or perhaps you’ve isolated your renderer, and you can hence replace it with a different renderer. Or your input stack, or your network stack.


Which thing you want to be able to replace fundamentally affects that interface. If you’re replacing a renderer, you probably want an interface in the traditional sense - a fully pure-virtual base class, in C++ terms. If you don’t want runtime swapability, then there are static equivalents, where you might have a header defining a class, and multiple implementations, compiled in as required. At the far end you might want it to be so dynamic that the user can swap one of the runtime files to change the behaviour. This is where interface design really matters.

Reusability and Rewritability

If I used the above diagram as the highest-level view of an engine design (and I intend to) the things I’d need to be the most careful about are the Platform Interface and Engine Interface. Both in the traditional sense of what data crosses them and in what format, and in how those bridges are built. If G lives in a dynamic library, then given the write design it should be possible to edit and rebuild the code while the game is running.

This presents some problems when it comes to memory allocation. On Windows, every dll potentially can have its own C runtime, and its own memory allocator. Any memory allocated must be deallocated by the same allocator instance, so if a dll is unloaded, and was the ownly user of its allocator, it must deallocate all its memory, or nothing else will be able to do so correctly.

Obviously this is not ideal if you’re trying to reload the main portion of a game’s code. However, if dlls and their host program were built with the same runtime, the allocator should be shared, so at least on windows this problem can just be ignored. But if this isn’t the case on other platforms the way to fix it is to have the platform interface include access to the appropriate allocator

Platform Interface

So, what goes in here? What does the platform need to provide at its most basic? What can’t be inferred or created in a higher layer? Memory management is one potential answer, and definitely needed if we want to be as safe as possible. All the other parts of the answer I can currently think of are forms of input and output. We need a video outout buffer, the screen, or a window pane. We need an audio output buffer, likely the most simple part. And we need information about how the user is trying to send commands, usually via the mouse, keyboard, and gamepads.

Additionally, we’ll need some control over these - we might want to request a particular screen resolution or mode, or there may be some output in the input devices we want to use (e.g., vibration). We might also want to request particular behaviour of the audio system - e.g., sample rate. But creating an interface which should cover all eventualities for all platforms shouldn’t be too hard given the above.

Oh but it is

Of course, this all depends on what’s available on what platforms, and what their APIs make available to you. On windows the above is all certainly possible, and with a bit of work you could use hardware accellerated audio and video, all controlled from the engine layer. However… on modern console platforms you rarely get access so low level, and using this architecture, you might find that some methods you’ve put in the Engine layer (e.g., playAudioSample) have to make direct API calls.

This just leads to the previous principles needing to be implemented everywhere. Any coherant system might need to be swapped out on a particular platform to deal with quirks of that platform, and your atchitecture, partivularly its interfaces needs to be designed to allow that


Posted in: electronics

After starting my new job at Boss Alien I’ve found myself less willing to program at home again. As a result I’ve turned back to other hobbies for the evenings, particularly electronics. I never had such a great understanding of the subject before now, but someone showed me the nand2tetris course, and I couldn’t help but give it a go.

My work on it is being archived here, although it’s likely not of any interest to people - if you find the course interesting you’d try it out for yourself, and (assuming you get the knack for it) you’ll be railroaded into largely the same answers.

Backing up briefly, the course takes you through the steps of designing a computer. You start with NAND gates, and use them to build more complex logical gates, and use them to build more complex ones, until you have a full CPU, with RAM and memory mapped input. Truthfully, the CPU the course tricks you into designing is much simpler than most modern general purpose CPUs, but it’s quite capable of running reasonable software.

Further, the course has you write an assembler for the CPU, and then a virtual machine, and a compiler for a simple programming language. The final product is a very confined but functional design for a computer, and the tools that one would need to work with it. Sadly physically building the CPU would be quite the endeavour of its own.

While reading about some part of the course I also came across a recommendation of the book Code which covers many of the same topics, while going down to a much lower level description of how to build the hardware, all the way down to wires, reeds, and electromagnets.

But could you build one by hand?


Ok, but should you?

Probably not, no. Ultimately, while the NAND gate is a convenient starting point for CPU construction in the modern day, you can do it with any number of other bases. The relay (a reed and electromagnet construction) can be used on its own, or in pairs or threes to make most simple gates (and some less simple ones like multiplexors). While you can make a relay by hand, it’ll likely be large and prone to breaking. Making enough of them for a CPU would be a very large time investment. If you purchased them instead, you’d find that compared to silicon gates, they’re slow and noisy and consume lots of power.

You might not be able to build a general purpose stored-program computer in a reasonable amount of time, but you might be able to build a specific-purpose computer.

I’m still deciding if I should give it a go.


Posted in: code

I’d been wanting to try euclid on a Raspberry Pi recently, as I’d heard that while its CPU is 700MHz, its performance is somewhat like a Pentium 2 at 2-300Mhz. This means that the performance should be only 3x that of a machine which could run Duke Nukem 3D (its manual says it requires a 486 DX2/66 processor - 66Mhz, although with a much less ‘smart’ design overall). Now, I’ve no doubt that Ken Silverman is a much better programmer than I am, and of course, lua slows things down, but I thought that this sounded like plenty of overhead for the engine to run.

How wrong I was

Porting the game to work on linux again turned out to be relatively straight forward, although I had to rely on the linux packages for SDL2 and SDL2_image. That’s probably for the best though. After a few other tweaks I got it running, but not responding to input. However… the framerate was not so good. About 1-2 frames per second. Worse still, my lua profiler told me that the Render function was taking 0.05 seconds, meaning at best it would go up to 20 frames per second, or so I thought.

At the urging of several acquaintences, I figured I’d give luaJIT a go. It advertises 3-15x performance improvements, so it couldn’t hurt. luaJIT is based on lua 5.1. The actual language isn’t hugely different from 5.3, and most of the API changes are quite easy to deal with. Once integrated, luaJIT improved the framerate about two-fold. Not bad, but not good enough.

At this point I was ready to let the idea of running on Raspberry Pi go, but I decided to profile it anyway to see if there were any useful patterns. Just about everything was running almost exactly twice as fast. So much so that an important detail almost didn’t notice - the Render() function was also twice as fast.

Wait, what?

Nearly everything in the Render() function is native code. So it shouldn’t be affected by the change to luaJIT. At first I thought it might be to do with my C++/lua integration being less efficient than I thought, but I decided to check on the ‘nearly’ part of that statement.

It turns out inside the camera code there was a line of lua like this:

local sector, pos =
	entity.position + self:getViewOffset()

Where getViewOffset() is to give the camera viewbob. Why LineTrace ? because moving horizontally can put the position in a new sector, and working out which is non-trivial, and very important for the renderer. I removed this LineTrace, and the time for the Render call goes from 0.028 seconds (the luaJIT time) to 0.006 seconds. So that one line trace was taking around 22 milliseconds to run. This needs to be called every time any entity moves. So if there’s ten entities moving, plus the player, that’s 11 * 22ms or .242 seconds, just for Line Traces, per frame.

Of course (ok not of course), LineTrace was written in lua, when it should have been in native C++, so there’s a clear improvement to be had here.


A lot of this lua code was written because, with the live editing of lua, the iteration times were fast. Hence fixing bugs in lua code is fast compared to C++. However, code which is clearly expensive, and requires high performance needs to be moved to C++, even if, on my development PC it seems ‘fast enough’

The luaJIT documentation also indicates that it has some extensions which could be used to speed the code up even further, specifically tying in C-style structs and functions in a way that works nicely with the JIT compiler. Using this system might also allow for further optimisations.

Further Possibilities

If I had a way to hot-reload C++ code, it might have meant this code never entered lua, as I would have been able to iterate just as quickly in C++. Getting such a system working isn’t simple, but is definitely possible using dll files or shared libraries (on linux).

A change like this could be made superficially, partitioning the code into that which can hot-reload, and everything else. This wouldn’t be too difficult, but it also would only benefit those parts of the code that fell the right side of the line. To make substantial use of this feature, it would need to be worked into the engine in a more fundamental way.

Raspberry Pi

It’s possible that with all of these changes the engine might work on the Raspberry Pi, with some strong restrictions on the number of entities and the complexity of maps. While it should be possible to do a lot more on the platform than this, I think that might have to be restricted to some later engine. By entirely targetting an engine development effort at the platform, and in all likely hood restricting it to mostly C or C++, more impressive work should be possible.


Posted in: code

There is probably no concept more fundamental to computer science than the Finite State Machine (FSM). Variants of this concept came early enough in computer science that they actually predate electronic computers. The state machine has, in the modern day, become a very common software pattern.

The FSM is a highly practical pattern which can be used to simplify code, and accurately represent a number of fairly complex systems in a maintainable and idiomatic way. I’m assuming the reader will know what a FSM is, at least approximately, although I’ve written an overview of the concept below. I am also assuming that the reader is sufficiently capable of implementing one for themselves, or at least can find out about how to do that elsewhere.

What this piece is concerned with, is why you should use them, not how you can.

Finite State Machines

A FSM is defined by a list of its States, and the list of transitions. The current state of the machine starts in the initial state (of which there is usually exactly one), and can change when the conditions for the transitions are met.

They can be used to categorise data - Regular Expressions can be implemented as a state machine - or used to simulate any number of simple(ish!) systems - like animation trees, behaviour states, and so on.

For some systems, a FSM is an obviously good fit, for example fish behaviour, which has states for Swimming, Beached, Startled, Curious, Hooked, and Dead. These states are all mutually exclusive, and in the confines of the behaviour, the conditions for the transitions between them are exact and simple.

But there are other articles about how best to use them for this, and how best to implement them. The major benefit of FSM which applies much more widely is bug reduction, and code simplification (and hence, improved maintainability).

Bug Reduction

Frequently in code you’ll see complex functions with forms something like:

class someClass {
	void someFunc() {
		if(someBool) {
			/* do stuff */
		} else {
			/* other stuff */

		/* common code */

		if(someBool) {
			/* more stuff */
			if(SomeConditionIsMet()) {
				/* do some setup */
				someBool = true;
		} else {
			/* even more stuff */

This isn’t too bad as code goes, but it’s not resilient to changes. Say when someBool is set, the code in «common code» needs to do something slightly different. The maintainer has to remember to wrap that code in if(someBool) { } or it will affect both of the states. Worse, if another boolean or similar becomes part of the code you get further issues, as you have to be sure that the function does the right thing for every combination of states it could be in, and in the event that the state becomes an error state by a set of flags or enums being in a combination not expected or designed for.

For example, you might have someOtherBool which when it is set should supercede someBool being set. Now you have to update every conditional which checks someBool to make sure it respects both pieces of data. The more booleans or other dividing conditions you add, the more complex the relationship between the states of those data becomes. The more complex the conditions for that code gets, the easier it is to write bugs, and the harder it is to successfully maintain the codebase.

A state machine can fix this. Not in every case, and when it can it might not be the perfect solution, it’s not a magic bullet. But lets say instead of someBool and ifs we used states. The above code could look like this:

class someClass {
	struct someClassState : public State {
		virtual void someFunc() = 0;

	struct StateA : public someClassState {
		void someFunc() override {
			/* stateA code */

			if(SomeConditionIsMet()) {

	struct StateB : public someClassState {
		void Enter() override {
			/* StateB setup */
		void someFunc() override {
			/* stateB code */

	void someFunc() {

	using initialState = StateA;
	StateMachine<someClassState, initialState> curState;

This code is logically equivalent to the previous sample. The benefits: States are distinct from each other. Each state’s functions are clearly separated from one another, both logically by the object structure, and physically in the source code. Adding States or changing the states’ interactions has more obvious results.

The Enter() function here is called when a state machine enters that state, no matter what. This means in the first code sample, any of the setup code that precedes someBool = true; which needed to be set alongside someBool is handled, and a maintainer doesn’t have to remember or check what that might be. With a little work it’s possible to have EnterState accept arguments for the new State’s constructor as well, to handle any data that needs to be passed around (or just have the interface like EnterState(new StateX(args, go here)) or similar)


There are a few downsides I can think of, but some of them may be mitigated, somewhat. They are:


This one is the hardest to contest. Compared to a badly written ad-hoc state machine, the verbosity isn’t particularly significant. In a large complex system, it’s probably dwarfed by the actual code. In a small system, however, it can be a problem. I can only say that terse code is rarely as clear. If tersity is an absolute requirement, then don’t use this method. But I’ve never worked in a codebase where tersity was valued over clarity and maintainability.


The state machine seems like a more complex system. It’s an extra layer of abstraction, and requires your library to have a decent implementation of a state machine for you to work with. That said, just because you aren’t using an explicit FSM system, doesn’t mean you haven’t written an FSM accidentally. Ad-hoc state machines appear in code all the time, and they’re just as complex as the explicit state machine which is their equivalent, however that complexity is hidden, and not in a good way.

Using an FSM says to a maintainer ‘this system is a little complex’, but it also says ‘this complexity was planned and necessary’. It gives the maintainer a clearer idea of where the dividing lines in the states are, allowing them to navigate the complexity in a reasonable way.

Insufficient Complexity

State machines can only model certain specific classes of systems. Sufficiently complex systems might need multiple state machines to represent them accurately, or they might need a more complex type of machine entirely. Luckily there are Hierarchical State Machines, which give states base states and add all kinds of possibilities. There are also Push Down Automaton, which allow the state machine to retain a history of its states in a stack. They are very practical for menu systems, and many other things.

Virtual Calls

This complaint is a complex one. Compared to a badly written ad-hoc FSM, the number of branches, and switches and other crap that might make up the code being replaced which just disappear when you write an explicit FSM, the virtual jump cost disappears. Compared to the exact same code written as something like:

void someFunc() {
	switch(stateEnum) {
	case State1:
	case State2:
	case State3:

and an accompanying SetStateTo(newVal) to replicate Enter() and such… then yes, you’re paying for a virtual call you might not have had to. Of course, in the above version you’re paying for a lookup and jump which is really just an adhoc virtual call, and while I haven’t profiled even a toy version of this, I suspect the cost is similar.

Obviously given the whole point of this post is ‘Ad-hoc systems are not as good as explicit ones’ I’m going to say that an ad-hoc virtual table is not as good as an actual virtual table. Explicit FSMs and explicit virtual table systems are good idiomatic code. They tell the maintainer exactly what you’re up to, and they save time and effort down the road.

Use them.


Posted in: SFA

While working on actual fishing mechanics for Super Fishing Adventure I briefly wondered how much memory the game was using, and found that at least according to the task manager, the amount was steadily increasing.

actual fishing mechanics

Memory Leaks can be a massive pain, or they can be incredibly simple to fix. Sometimes you’ll have a leak that gets worse over time, and sometimes you’ll have one that gets harder to fix the longer you leave it. I was worried that the latter might be the case, if it was a result of how I was using lua in euclid, so I decided it would be best to figure out what was going on.


At the moment, the game doesn’t run on Linux, so I couldn’t just run valgrind, which would usually be my preference. Instead I thought if I could find when the bug had appeared, I could narrow down what part of the source code could be the cause. Git bisect told me that the bug was older than the SFA repository, so I took it over to euclid, and what I found there was that the bug was both very old, and the older the source code was, the less pronounced the bug was. To the point that I couldn’t be sure exactly when it appeared.

Luckily, I’m using Visual Studio 2015 for engine development, and it has a snapshot-based heap debugger. It lets you click a button to capture the heap state, and gives you stats about how it compares to the previous snapshot:

heap debugger

It also lets you explore the allocations, and see the callstack that created them by clicking the numbers next to the snapshot. It can show raw numbers, or allocations that differ from previous snapshots. The heap explorer has one big flaw though: if it can’t work out the type of an allocation, it gets filtered, and these allocations can only be shown by turning the filter off, behind this (in retrospect) obvious button:

insufficiently obvious button

It turned out my memory leak was hidden behind that filter. When I finally noticed it, it turned out to be inside this line:

auto file = FindFirstFile(filename, &findinfo)

Which was used for reloading script files as they were modified. The problem is that file here is a HANDLE which needs to be closed. And it turns out that this check was happening once per frame, per script file. This explained why the leak got harder to measure in older changelists - there were less and less script files.

Lessons Learned

This bug appeared as a result of my unfamiliarity with windows libraries. It wasn’t the first approach to reloading files I found, and it wasn’t even the first that worked, but I imagine I was sufficiently frustrated by the time I got it to actually do the thing I wanted that I neglected to read the documentation for the function thoroughly. Of course this makes it entirely my own fault, but I know I’d have found this very irritating if it was someone else who’d written it. Perhaps in future I’ll be more forgiving of mistakes like this in other people’s programs.


It turns out I’ve got some other problems - not with memory, but with CPU. At some point I’m going to have to make horizontal planes get batched, as the current system results in far too many costly calls to the code that draws them. That can wait though!


Posted in: SFA

I now have outdoor lighting! It’s not super fancy, but it allows sectors to have their lighting be 100% controlled by the current daylight level, or to go between some base level and some max level, proportionally to daylight. Super simple, but it looks quite nice.


I’ve also been blocking out the area I want for the intro to the game, as getting it right will require me to finish off a bunch of the systems I’ve got planned to go in, and it’ll require me to draw some new and better textures. And that would mean finally getting rid of the awful stone wall texture which I’m pretty sick of looking at.

I’ve started working on a more complete design document for the game too, but almost everything I write down I find myself thinking ‘I need to make that to see if it’s fun or not’, so perhaps it’s a pointless exersise. On the other hand, it gives me something to do when I can’t use my laptop.

Still got a lot to do though!


I’ve modified my website deploy system so that I can write posts on any device which supports git. My Web server will then do all the hard work to make sure the post is viewable online.

I modified the scripts from my phone and wrote this post from it too. It’s something of a pain on the arse but it works.


Finally repaired my blog with this new blogging platform, called Hugo. It’s pretty nifty, and allows you to write posts offline in markdown, and publish them as fully static pages. Commenting (if desired) is added by linking to disqus. Overall it was kind of a pain to set up, but having my entire blog stashed in a git repo is pretty nice overall.

I guess I’ll see how long it lasts!