Programming Advice From A Master: Sebastian Aaltonen
A lot of people head to programming school (i.e., computer science departments) with the goal of getting the education necessary to become software engineers at Amazon, Google, Facebook, etc. They want to learn the skills needed to contribute to software projects. However, there often appears to be a disconnect between what is taught and what is useful for delivering software products on time.
Experience is the best teacher for an individual desiring to work in the software field. With careful observation, however, one starts to see that there are many impediments to delivering software. For those who are starting out, or even those who have some experience, learning from those who’ve fought the same software battles provides a great return on the time invested. Traditionally, many professions used a master-apprentice relationship to share knowledge across generations. In the software field, finding those masters is a bit harder, but it’s still possible.
This post is meant to share the knowledge of one such master, Sebastian Aaltonen, principal engineer at Unity. The following is a collection of advice that Sebastian shared across social media; this is not exactly the same as if he were right next to us, teaching us the right way to think about software. But his posts are the best we’ve got.
In his social media posts, there are some deep kernels of knowledge, some of which even contradict what schools teach. For example, programming schools like to hammer home the DRY (don’t repeat yourself) rule early on. However, following this advice blindly leads to a whole host of software mistakes: over-engineering, higher code complexity, tighter coupling. Sebastian’s advice contains great insights into how to better develop software.
The Master’s Advice
Now that people have already said highly controversial stuff like “debugger is useless for C++ development”, I think I can share my own controversial thoughts about unit testing, DRY, copy-paste coding and function length, etc… with 20 years of C++ programming experience.
I want to start this rant by saying that I have made (or allowed my team to make) countless programming mistakes during my career: Textbook OOP (bird is animal), template monsters, too much codegen, over-engineered solutions, overly generic code, overuse of raw sync prims…
Testing
Each unit test is an additional dependency. Another call site that uses your function/class/data. Adding a dependency to code/data that has zero dependencies is not free. It adds inertia. Further changes to that code slow down and some refactorings/optimizations become infeasible.
Adding an extra dependency (test case) for common library code (math, containers, etc) doesn’t add inertia, since code like this already has lots of dependencies and a well defined API that doesn’t change. Test cases bring high benefit and very few downsides for code like this.
DRY and Dependencies
A good practice is to copy-paste code three times, and then refactor (extract) if all three instances are still doing the same thing. Before this, you don’t want to add unit tests, because your code has no dependencies. Code without dependencies is the best code. Safe to modify.
#1 problem in big code bases is entangled code dependencies. If you make code too generic or extract code too soon, you end up with more dependencies. Code used by 2+ call sites tends to eventually bloat with complex control flow. Bad performance, hard to understand and modify.
There are two types of long functions. Entangled and linear. If you avoid dependencies (see above), you can often write long functions which progress perfectly linearly. Trivial to read, modify and understand. No jumping around the code base. Minimal brain load.
It might seem like a good idea at first to split several such long functions to small functions and reuse some pieces of shared code. But this adds dependency between these functions and reading experience is no longer linear. Modifications affect many functions -> higher risk.
Programs transform data. Data dependencies are the actual dependencies and define which data you need to process in which order, how you can parallelize the processing and which invariants need to be maintained. Code dependencies are often false dependencies.
As said above, large functions aren’t usually a problem. The same is not true for large classes/structs containing lots of data unrelated to each other (Baseobject syndrome). You should never use real world object abstractions when you decide where you put each piece of data…
Instead, you should split your data according to the transforms you perform on that data. Data commonly accessed together go together. Fields not accessed at most call sites get split to separate structures. This avoids false dependencies and improves data cache utilization.
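As a rough illustration of splitting data by transform, here is a minimal C++ sketch. The `Motion`, `Description`, `Entities`, `add_entity` and `integrate` names are hypothetical and not from Sebastian’s posts; the only point is that the per-frame loop touches a tightly packed array containing nothing but the fields it needs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical example data: all names here are illustrative assumptions.
// Hot data: touched every frame by the movement transform.
struct Motion {
    float x, y;    // position
    float vx, vy;  // velocity
};

// Cold data: only needed by UI/debug code, so it lives in its own array.
struct Description {
    std::string name;
    int spawn_frame;
};

struct Entities {
    std::vector<Motion> motion;            // index i belongs to entity i
    std::vector<Description> description;  // same indexing, separate storage
};

// Adds one entity to both arrays and returns its index.
std::size_t add_entity(Entities& e, Motion m, Description d) {
    assert(e.motion.size() == e.description.size());  // arrays stay in step
    e.motion.push_back(m);
    e.description.push_back(std::move(d));
    return e.motion.size() - 1;
}

// The transform reads and writes only the Motion array; names and other
// cold fields never enter the cache while this loop runs.
void integrate(std::vector<Motion>& motion, float dt) {
    for (Motion& m : motion) {
        m.x += m.vx * dt;
        m.y += m.vy * dt;
    }
}
```

Calling `integrate(entities.motion, dt)` once per frame is the intended usage; code that needs a name looks it up by the same index in `description`.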
Synchronization
Critical section (mutex, etc) teaches a wrong way to think about synchronization. You don’t want to synchronize code. Code is immutable. Race conditions (RAW, WAR, WAW) are all pure data hazards. Shared sync primitive guarding one function/class is often a code smell.
Instead of fine grained sync primitives, you want to split your data so that you remove false dependencies. This often allows trivial parallel work and at the same time makes your code base much easier to modify and maintain. And improves your performance too (cache utilization).
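One way to picture this is a byte histogram. Instead of one shared histogram guarded by a mutex, each worker gets its own private histogram and the results are merged afterwards. This is only a sketch under that assumption; `count_bytes` and `Histogram` are hypothetical names, and plain `std::thread` stands in for whatever job system a real code base would use.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

using Histogram = std::array<std::uint64_t, 256>;

// Hypothetical helper: counts byte values using `num_threads` workers.
// Each worker owns one private Histogram, so no mutex or atomic is needed;
// the only data shared between threads is the read-only input.
Histogram count_bytes(const std::vector<std::uint8_t>& input, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<Histogram> partial(num_threads, Histogram{});  // zero-filled
    std::vector<std::thread> workers;
    const std::size_t chunk = (input.size() + num_threads - 1) / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, input.size());
        const std::size_t end = std::min(begin + chunk, input.size());
        workers.emplace_back([&input, &partial, t, begin, end] {
            Histogram& mine = partial[t];  // written by this thread only
            for (std::size_t i = begin; i < end; ++i) ++mine[input[i]];
        });
    }
    for (std::thread& w : workers) w.join();

    // Merge on the calling thread after every worker has finished.
    Histogram total{};
    for (const Histogram& h : partial)
        for (std::size_t b = 0; b < total.size(); ++b) total[b] += h[b];
    return total;
}
```

The merge step is the price of splitting the data, but it runs once per worker rather than once per input element.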
You can’t multithread your code if you can’t ensure no data races. Processing of objects filled with pointers and references to other objects is hard to parallelize. Virtual functions are even worse. How can you know which data is accessed, if the call target is not known?
Most of your perf bottlenecks are in loops. No function is slow if you don’t call it many times. If you fully understand all data used inside a loop, you can often use parallel loop to scale it to all cores. Abstractions, virtual funcs and data dependencies make this hard.
Parallel for loop is one of the safest multithreaded optimizations you can make if you use simple data structures with no hidden data. Parallel for loop can be self contained inside a function. No need to modify other code. Works well with task/job schedulers too.
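A minimal sketch of a self-contained parallel for loop might look like the following. `parallel_for` and `scale_all` are hypothetical helpers written for this post, the fixed worker count of 4 is an arbitrary assumption, and raw `std::thread` is used only to keep the example standalone; a task/job scheduler would normally take its place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: runs body(i) for every i in [0, count), split into
// contiguous chunks with one worker thread per chunk. It is safe only when
// body(i) touches data that no other index touches.
template <typename Body>
void parallel_for(std::size_t count, std::size_t num_threads, Body body) {
    assert(num_threads > 0);
    const std::size_t chunk = (count + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < count; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, count);
        workers.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (std::thread& w : workers) w.join();
}

// The parallelism stays inside this function; callers see a plain call.
void scale_all(std::vector<float>& values, float factor) {
    parallel_for(values.size(), 4, [&values, factor](std::size_t i) {
        values[i] *= factor;  // each index writes only its own element
    });
}
```

Because `values` is a flat array of floats with no hidden pointers, it is easy to see that iterations are independent, which is exactly the property the advice relies on.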
Performance
“Premature optimization is root of all evil” is the most misunderstood sentence in CS. Planning your data structures is part of the architecture design, not an optimization. Good data layout both improves performance dramatically, and also improves code maintainability.
Better version: “Do not micro-optimize unless profiler shows a bottleneck in that code”. Use profiling tools from the beginning of the project to react to problems before it is too late. Profile often and automate profiling for QA.
Maintainability
Best way to make future proof code: Make simplest possible code meeting current requirements. Minimize code/data dependencies. Overly generic code is not future proof. It handles some extra cases, but is overly complex. More refactoring to make it do what you actually need next.
Predicting future is hard. Try delaying decisions and writing code as late as possible. This way you make decisions and write code that better suits the actual needs. Coding a feature that nobody uses is waste of time. Coding a feature too early often leads to big refactoring.
Callbacks/delegates/listeners/events are dangerous in multithreaded environment. Destructor unregisters from all objs = modify all those objs (see next tweet). Firing an event = virtual call to N unknown targets. Impossible to make safe, unless you forbid parallelism during it.
Unknown object life time or owner is a code smell (ref count, shared_ptr). In most cases you should be able to define a clear owner for each object. In multithreaded environment custom destructors are very dangerous with ref count schemes. Hard to know when destructor is called.
I don’t personally hate unique_ptr or other non-refcounted RAII. But beware of memory allocation cost, mem fragmentation (on consoles and mobile) and pointer indirection cost (cache miss). Prefer value members instead of separate alloc for a member and use custom allocators…
The best way to allocate objects of certain type is to allocate a big block of storage for all of them and put them next to each other. Linear memory access pattern is much more cache friendly than allocating separate memory for each object. Big performance difference.
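As a small sketch of the idea, the pool below keeps every object of one type in a single contiguous block and hands out indices instead of pointers. `Particle` and `ParticlePool` are hypothetical names, and backing the pool with `std::vector` is an assumption made for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle {  // hypothetical object type
    float x, y;
    float life;
};

// All particles live next to each other in one block of storage;
// there is no separate heap allocation per object.
class ParticlePool {
public:
    explicit ParticlePool(std::size_t capacity) { particles_.reserve(capacity); }

    // Returns the index of the new particle. Indices stay valid if the
    // block grows, whereas raw pointers into it would not.
    std::size_t spawn(float x, float y, float life) {
        particles_.push_back(Particle{x, y, life});
        return particles_.size() - 1;
    }

    // Walks memory front to back: a linear access pattern.
    void age_all(float dt) {
        for (Particle& p : particles_) p.life -= dt;
    }

    const Particle& at(std::size_t i) const {
        assert(i < particles_.size());
        return particles_[i];
    }

    std::size_t size() const { return particles_.size(); }

private:
    std::vector<Particle> particles_;
};
```

The contrast is with something like a `std::vector<std::unique_ptr<Particle>>`, where each element is a separate allocation and every loop iteration follows a pointer to a different place in memory.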
Most important custom allocator (in gamedev) is the frame temp allocator. It’s a fast (per thread) bump allocator that gets reset between each frame. Use it for all temporary allocs that have life time <= end of current frame. Big reduction in fragmentation and mem alloc cost.
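A frame temp allocator can be sketched in a few lines. The `FrameAllocator` class below is a hypothetical, single-threaded illustration (an engine would typically keep one instance per thread), and it assumes the objects placed in it do not need their destructors run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bump allocator: allocation only advances an offset, and
// reset() releases everything at once between frames.
class FrameAllocator {
public:
    explicit FrameAllocator(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Returns nullptr when the frame budget is exhausted.
    // `alignment` must be a power of two.
    void* allocate(std::size_t size, std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (base + offset_ + mask) & ~mask;
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > buffer_.size() || size > buffer_.size() - start) return nullptr;
        offset_ = start + size;
        return buffer_.data() + start;
    }

    // Called once between frames; no per-allocation free is ever needed.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};
```

Typical usage would be to call `allocate` for scratch buffers during a frame and `reset()` at the frame boundary. Anything that must outlive the frame has to come from a different allocator.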
If you use manual new/delete, use a memory leak detection tool. It’s also simple to wrap global new/delete to create your own leak tracker. Keep leak tracker active in daily dev builds. Once you have a custom mem tracker, you can use it to dump memory stats too. Very useful.
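To show roughly what wrapping global new/delete involves, here is a deliberately minimal sketch that only counts live allocations. The names `g_live_allocations`, `live_allocations` and `check_no_leaks_since` are hypothetical; a real tracker would also record sizes and call sites, and this version does not cover the over-aligned overloads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical tracker state: number of allocations not yet freed.
std::atomic<long> g_live_allocations{0};

// Replacement for the global allocation function. The array and nothrow
// forms forward here by default.
void* operator new(std::size_t size) {
    if (size == 0) size = 1;  // malloc(0) may legally return nullptr
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc{};
    g_live_allocations.fetch_add(1, std::memory_order_relaxed);
    return p;
}

void operator delete(void* p) noexcept {
    if (!p) return;  // deleting nullptr is a no-op and must not be counted
    g_live_allocations.fetch_sub(1, std::memory_order_relaxed);
    std::free(p);
}

// Sized delete forwards to the tracked version above.
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }

long live_allocations() {
    return g_live_allocations.load(std::memory_order_relaxed);
}

// Dev-build check: fails if allocations made since `baseline` are still alive.
void check_no_leaks_since(long baseline) {
    assert(live_allocations() == baseline);
}
```

One possible use is to record `live_allocations()` before loading a level and call `check_no_leaks_since` after unloading it. The same counter is also a natural place to start collecting memory statistics.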
Not C++ related, but worth noting: Not all game features are equal. Features such as online multiplayer and deterministic simulation need consideration in data and processing model design. Can’t add features like these late in development. Agile doesn’t mean no planning at all.
When refactoring code, make a local branch (git) or shelve (P4). This makes it easier to “throw away” your changes if you don’t like the result. Code will still be available for later use if you want to revisit. Don’t push refactorings to dev main unless you are 100% happy.
Every refactoring attempt gives you information. Never consider a failed attempt as useless work. However, focus your refactoring efforts to code that is actively in development. Code that works fine and doesn’t need any near future change doesn’t need to be touched.
Write well performing code by default, but don’t micro-optimize without profiling. When optimizing code, benchmark result on all target platforms and don’t push to dev main unless result is 100% clear (no regressions) and you are happy about code quality. Branch/shelve if unclear.
Store profile trace/capture files for all platforms to maintain history. Name files after latest optimization/change. Do before/after compare after every optimization on all platforms. Write email + brief info to commit about gains. Automate regression tests for QA.
Pure ALU instruction count is not the most common performance bottleneck. Use platform specific low level profiling tools to find out the actual bottleneck instead of wasting time doing wrong optimizations. In CPU code, memory latency and cache are the most common bottlenecks.
Memory load->addr->load dependency chains (such as linked lists) are #1 poison for modern out-of-order CPUs. Prefer breadth first over depth first tree traversal to expose more parallelism for the CPU. CPU starts loading memory of all siblings concurrently, amortizing latency.
Before adopting a new C++ standard ensure that all target platform compilers have robust support. Investigate codegen and compile times for every feature you want to use on all compilers. It’s OK to ban a certain feature if you feel it’s too risky to adopt in its current state.
C++ has constructs that can make it a weakly-typed language. Avoid features such as implicit user defined conversion operators and implicit constructors. Use explicit keyword and concrete types whenever possible. Lean on compiler to catch type errors early and consistently.
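A short sketch of what the `explicit` keyword buys you, using hypothetical `Meters` and `Seconds` wrapper types invented for this example:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical unit types used only for illustration.
struct Meters {
    explicit Meters(float v) : value(v) {}  // no silent float -> Meters
    float value;
};

struct Seconds {
    explicit Seconds(float v) : value(v) {}  // no silent float -> Seconds
    float value;
};

float speed(Meters distance, Seconds time) {
    assert(time.value != 0.0f);
    return distance.value / time.value;
}

// The compiler now rejects accidental conversions between these types.
static_assert(!std::is_convertible<float, Meters>::value,
              "float must not convert to Meters implicitly");
static_assert(!std::is_convertible<Meters, Seconds>::value,
              "units must not convert into each other");
```

With these definitions, `speed(Meters{10.0f}, Seconds{2.0f})` compiles, while `speed(10.0f, 2.0f)` and a call with the two arguments swapped are both rejected at compile time. Without `explicit`, the first of those mistakes would compile silently.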
Enable “warnings as errors”. Prefer higher warning level and manually disable over-excessive warnings on platform/compiler basis. Review warning disable list after updating each compiler. Catching bugs at compile time is always better than runtime.
Use both static_assert (compile time) and assert (runtime) to validate assumptions. You should create separate assert_slow macro that is only enabled in debug config. This way you can reduce cost of slow asserts in hot code. Shipping config of course has all asserts disabled.
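The three kinds of checks could be combined roughly as follows. The `assert_slow` name comes from the text above, but the `ENABLE_SLOW_ASSERTS` switch, the `PackedVertex` struct and the `find_sorted` function are assumptions made up for this sketch. The standard `assert` is disabled by defining `NDEBUG`, which is what a shipping config would do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: ENABLE_SLOW_ASSERTS is defined only in the debug config.
#if defined(ENABLE_SLOW_ASSERTS)
#define assert_slow(expr) assert(expr)
#else
#define assert_slow(expr) ((void)0)
#endif

struct PackedVertex {
    std::uint16_t x, y, z, w;
};
// Compile time: the layout assumption is verified on every build.
static_assert(sizeof(PackedVertex) == 8, "PackedVertex must stay 8 bytes");

// O(n) validation, too expensive to run in hot code outside debug builds.
bool is_sorted_ascending(const std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i)
        if (v[i - 1] > v[i]) return false;
    return true;
}

// Binary search. Returns the index of `key` in sorted `v`, or -1 if absent.
int find_sorted(const std::vector<int>& v, int key) {
    assert(!v.empty());                   // cheap runtime check
    assert_slow(is_sorted_ascending(v));  // expensive check, debug config only
    std::size_t lo = 0, hi = v.size();
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (v[mid] < key) lo = mid + 1; else hi = mid;
    }
    return (lo < v.size() && v[lo] == key) ? static_cast<int>(lo) : -1;
}
```

The split keeps the O(log n) search from being dominated by an O(n) sanity check in everyday development builds, while still allowing the full check when hunting a bug.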
It’s a good idea to process things one “type” (functionality) at a time, instead of one (aggregate) “object” at a time. Loops become much simpler to understand as you don’t need lots of branches and don’t need indirect calls. Parallelizing your code becomes much easier.
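Here is a minimal sketch of processing one type at a time. The `Bullet`, `Door` and `World` types are hypothetical; the alternative being avoided is a single list of polymorphic objects, each with a virtual `update()`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical game data, stored per type instead of as one list of
// base-class pointers.
struct Bullet { float x; float speed; };
struct Door   { float open_amount; float open_rate; };  // open_amount in [0, 1]

struct World {
    std::vector<Bullet> bullets;
    std::vector<Door> doors;
};

// One tight loop per type: no type checks and no indirect calls.
void update_bullets(std::vector<Bullet>& bullets, float dt) {
    for (Bullet& b : bullets) b.x += b.speed * dt;
}

void update_doors(std::vector<Door>& doors, float dt) {
    for (Door& d : doors) {
        d.open_amount += d.open_rate * dt;
        if (d.open_amount > 1.0f) d.open_amount = 1.0f;  // clamp to fully open
    }
}

void update_world(World& world, float dt) {
    assert(dt >= 0.0f);
    update_bullets(world.bullets, dt);
    update_doors(world.doors, dt);
}
```

Each loop reads and writes exactly one array, so it is clear which data it touches, and either loop could be handed to a parallel for without inspecting the other.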