CoalNova

On "Optimization"

April 30, 2023 | 8 Minute Read

  Optimization is a funny word in software development. It seems to mean so many things which don’t overlap. The announcement hit that the newest annual modern warshooter, Black Ops Cold War, will take 255GB on some systems. As a result, I’ll describe some approaches to optimizing diskspace that I’ve taken as a self-educated, uninformed, and frankly insane individual.

  Many, many years ago I made a pet program in the Unity Game Engine titled “Cubequest”. The name was actually already taken by an older game from the 80’s, but it was never meant to be a completed project, just a toy. The game is available to download in the links section. I fiddled with several attempts at a sequel that I could release; the most recent one included an editor. That’s where my fun began.

  I had realized that, unlike the first version, I needed to handle level loading myself. I wasn’t entirely sure how to safely have Unity onboard external levels, so it fell upon me to devise some sort of solution. This meant that I had to handle all serialization of every object: its purpose, meaning, state, everything. That led to an interesting land of opportunity.

  I started fiddling to see how small I could get the files in Cubequest. By taking a cube and giving it a pre-determined starting position, starting rotation, starting scale, a set series of states, and a set series of colors, the number of possible states is greatly reduced. In fact, this is the table used for determining everything about a cube’s state:

//0     (1)   start enabled
//1-4   (16)  cube type assignment [empty, player, enemy, ground, wall, glass, coin, endgate, obelisk, trigger, spotlight, NOTASSIGNED, NOTASSIGNED, NOTASSIGNED, NOTASSIGNED, NOTASSIGNED]
//5-7   (8)   cube-specific behavior traits
//8-15  (256) posx (-128 then div 2)
//16-23 (256) posy (-128 then div 2)
//24-31 (256) posz (-128 then div 2)
//32-36 (32)  scalex 1 default then add 1 mult value by 2 (1 + 1 * 64)
//37-41 (32)  scaley 1 default then add 1 mult value by 2 (1 + 1 * 64)
//42-46 (32)  scalez 1 default then add 1 mult value by 2 (1 + 1 * 64)
//47-51 (32)  quai / 4 then sqrt to x, w is identity and is derived by subbing square powers of i, j, and k
//52-56 (32)  quaj / 4 then sqrt to y
//57-61 (32)  quak / 4 then sqrt to z
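
  To make the table concrete, here is a minimal sketch in C of packing a cube’s state into a single 8-byte value following the bit ranges above. The struct and every name in it are my own inventions for illustration; the real serializer may order or shift the bits differently.

#include <stdint.h>

/* Hypothetical cube state; field widths follow the comment table above. */
typedef struct {
    uint8_t enabled;   /* 1 bit                                     */
    uint8_t cube_type; /* 4 bits: empty, player, enemy, ground, ... */
    uint8_t traits;    /* 3 bits of cube-specific behavior          */
    uint8_t pos[3];    /* 8 bits each                               */
    uint8_t scale[3];  /* 5 bits each                               */
    uint8_t quat[3];   /* 5 bits each: i, j, k; w is reconstructed  */
} CubeState;

/* Packs low-bit-first; 1+4+3+(3*8)+(3*5)+(3*5) = 62 bits, so the whole
 * cube fits in one uint64_t with two bits to spare. */
static uint64_t packCube(const CubeState *c)
{
    uint64_t v = 0;
    int bit = 0;
    v |= (uint64_t)(c->enabled   & 0x1) << bit; bit += 1;
    v |= (uint64_t)(c->cube_type & 0xF) << bit; bit += 4;
    v |= (uint64_t)(c->traits    & 0x7) << bit; bit += 3;
    for (int i = 0; i < 3; ++i) { v |= (uint64_t)c->pos[i]            << bit; bit += 8; }
    for (int i = 0; i < 3; ++i) { v |= (uint64_t)(c->scale[i] & 0x1F) << bit; bit += 5; }
    for (int i = 0; i < 3; ++i) { v |= (uint64_t)(c->quat[i]  & 0x1F) << bit; bit += 5; }
    return v;
}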

  The name for that data layout was ‘OGD’ (Object Generation Data), the same name I have been using for object handling in the CoalStar engine. By defining predetermined start and stop points, I was able to greatly decrease the amount of data necessary to represent an object’s position. The layout went through iterations, including polar coordinates and micro-sized float arrangements. An associated array of linked cubes was used for activations and other activities; the list used an unsigned char index, but it always remained very small, as most cubes never used links.

  In the end, this was the chosen arrangement. At mostly 8 bytes per object, levels were usually 80-100 bytes in size, small enough that NTFS seemed to tuck them in alongside other files (tiny files can live resident in the file table rather than claiming their own clusters), effectively reducing disk cost to 0. It’s a very “your mileage may vary” solution, but I felt a certain amount of pride in it. I had to strip levelnames out of the files, electing to use the filename as the levelname, just to cheat out a little more space saving.

  When it comes to an entire engine, I knew I needed to extend that design philosophy outwards in all directions. I am largely unfamiliar with how modern engines operate internally, outside of the general concepts of rendering and asset management, and what I can see of their effects through games and projects built in them. Though I have created a naive solution for rendering and asset memory management, I have a hard time calling it an entire engine solution.

  One key element of the engine is the “supermassive” keyword. The two problems I have spoken of are precision over range and disk space utilization. A large worldspace requires a large quantity of objects to fill it, and that exponential growth is a killer for disk space. For space saving, the focus is largely on object placement metadata and terrain heightdata.

  Height data was initially represented as a straight u32 height for every point. For the initial world size of 128 * 128 chunks, that led to a disk size of 64.25 GB for heights and heights alone. This is more than a little unreasonable for the project’s intended system requirements.

  Terrain heightdata has been finalized as broken into 1024*1024 unit chunks. Though one terrain vertex exists per unit (in the case of the current project, per meter), height data is only stored for every other unit. On top of that, each chunk stores a single unsigned char as a height modifier, which is multiplied by 1024.0f to give the chunk’s lowest possible point. The individual heights are a (1024 * 1024)-sized array of unsigned shorts (0-65535) in memory. The calculation to get a height point is the stored height value multiplied by 0.1f, added to the heightmod multiplied by 1024.0f.
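
  As a sketch of that calculation (assuming my reading of it is right, and with invented names), reconstructing a single height point from the per-chunk heightmod and a stored u16 looks like this:

#include <stdint.h>

/* heightmod: the chunk's unsigned char height modifier
 * raw:       one of the chunk's u16 height samples
 * Returns the world height as described above: a base in 1024-unit steps,
 * plus the sample in 0.1-unit steps (0 to 6553.5 above the base). */
static float chunkHeight(uint8_t heightmod, uint16_t raw)
{
    return (float)heightmod * 1024.0f + (float)raw * 0.1f;
}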

  The heights saved to disk are collapsed using a serial delta. The heightmod is written as is. The first u16 is written as a base, and then the height array is written serially, with each step’s difference taken against the last value. If the difference is greater than or equal to -128 and less than 127, it is saved as a single byte in the range 0-254 (dually inclusive). If the difference falls outside that range, a byte of 255 is written to mark that a new base height is coming, followed by the new u16 height value.
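
  A minimal sketch of that serial-delta packing, under the assumption that deltas are biased by 128 so the usable byte range is 0-254 and 255 is reserved as the escape marker; the function name and buffer handling are my own for illustration:

#include <stddef.h>
#include <stdint.h>

/* Writes the delta-packed form of 'count' heights into 'out' and returns
 * the number of bytes written. The first height is always written raw. */
static size_t packHeights(const uint16_t *heights, size_t count, uint8_t *out)
{
    if (count == 0) return 0;
    size_t n = 0;
    out[n++] = (uint8_t)(heights[0] & 0xFF);   /* base u16, little-endian here */
    out[n++] = (uint8_t)(heights[0] >> 8);
    uint16_t last = heights[0];
    for (size_t i = 1; i < count; ++i) {
        int delta = (int)heights[i] - (int)last;
        if (delta >= -128 && delta < 127) {
            out[n++] = (uint8_t)(delta + 128); /* one byte, 0-254 */
        } else {
            out[n++] = 255;                    /* escape: a fresh u16 follows */
            out[n++] = (uint8_t)(heights[i] & 0xFF);
            out[n++] = (uint8_t)(heights[i] >> 8);
        }
        last = heights[i];
    }
    return n;
}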

  The disk size of the two demos has reflected the evolution of this process. The height data is now small enough to meet a desired disk size of 8GB. The 8GB is a ‘goal’ rather than a ‘requirement’, as ideally the result should trend towards a more widely adoptable product over a hyper-specialized one. I personally have suffered the devastating decision of which game to uninstall, and it always goes to the largest install size.

  Object Generation Data for items like setpieces has been squeezed and pressed as much as possible. No setpiecing has been placed yet to test the OGDs, but the OGD layout is this complex mess:

#Meta and Positional#
0b0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000
  ^^ <2> meta header ( 01 single 11 start 10 continues 00 ends)
    ^^_^^^^_^^ <8> xsubpos (* 0.5)
              ^^_^^^^_^^ <8> ysubpos (* 0.5)
                        ^^_^^^^_^^^^_^^^^_^^ <16> zpos (* 0.5 + height)
                                            ^^_^^^^ <6> xrot(* 5.625)
                                                   _^^^^_^^ <6> yrot(* 5.625)
                                                           ^^_^^^^ <6> zrot(* 5.625)
                                                                  _^^^^ <4> xscale(* 1.25)
                                                                       _^^^^ <4> yscale(* 1.25)
                                                                            _^^^^ <4> zscale(* 1.25)

#Generational#
0b0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000
  ^^ <2> on generation state
    ^^_^^ <4> gentype
         ^^_^^^^_^^^^_^^^^_^^^^_^^^^_^^^^_^^^^_^^ <32> gendata
                                                 ^^_^^^^_^^ <8>
                                                           ^^_^^^^_^^^^_^^^^_^^^^ <18> UID

  This is 16 bytes per object, which covers Setpieces, Actors, Forces, Triggers, Emitters, Waterplanes, and External/Extended objects. Each position is derived serially from an iterated list, associating it with the x/y coord of its chunk. This may seem a bit too extreme, but for this project it isn’t extreme enough. For natural environmental setpieces (bushes, trees, rocks, etcetera) a separate zoning system will procgen-fill the landform and apply an edited override entry. That will cut down on the OGDs per chunk by a vast margin, by removing the need to save mundane assets. At 20,000 objects per chunk (approx. one square km, or one object every ~52 square meters), that is 320KB per chunk, and roughly 4.9GB across the 128 * 128 chunk world, entirely dedicated to metadata on the placement of things and nothing else.
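
  As an illustration only, here is how the first (meta and positional) 8 bytes might be unpacked in C, reading fields from the most significant bits down in the order the diagram is drawn. The engine may pack from the other end, every name here is invented, and chunk_height simply stands in for the “+ height” term in the zpos field:

#include <stdint.h>

typedef struct {
    uint8_t meta;              /* 01 single, 11 start, 10 continues, 00 ends */
    float   xsub, ysub, z;     /* positions, in half-unit steps              */
    float   xrot, yrot, zrot;  /* rotations in 5.625-degree steps            */
    float   xscale, yscale, zscale;
} OgdPositional;

/* Pulls 'width' bits off the top of 'v', tracking the running shift. */
static uint64_t takeBits(uint64_t v, int *shift, int width)
{
    *shift -= width;
    return (v >> *shift) & ((1ull << width) - 1ull);
}

static OgdPositional unpackPositional(uint64_t v, float chunk_height)
{
    int s = 64;
    OgdPositional o;
    o.meta   = (uint8_t)takeBits(v, &s, 2);
    o.xsub   = takeBits(v, &s, 8)  * 0.5f;
    o.ysub   = takeBits(v, &s, 8)  * 0.5f;
    o.z      = takeBits(v, &s, 16) * 0.5f + chunk_height;
    o.xrot   = takeBits(v, &s, 6)  * 5.625f;
    o.yrot   = takeBits(v, &s, 6)  * 5.625f;
    o.zrot   = takeBits(v, &s, 6)  * 5.625f;
    o.xscale = takeBits(v, &s, 4)  * 1.25f;
    o.yscale = takeBits(v, &s, 4)  * 1.25f;
    o.zscale = takeBits(v, &s, 4)  * 1.25f;
    return o;   /* s ends at 0: all 64 bits consumed */
}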

  Even meshes aren’t safe from my madness, with vertex data collapsed using a min/max value per axis, and each point between existing as one of 256 divisions. This is so that each vertex’s position may be represented as a single byte, rather than a float. Texture coords, tangents, and normals will also suffer my madness.
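
  A hedged sketch of that vertex collapse, assuming each axis keeps a float min/max for the whole mesh and that 255 is the divisor (the real format may slice its 256 steps differently):

#include <stdint.h>

/* Quantize one coordinate into one of 256 steps between the axis min/max.
 * Assumes max > min. */
static uint8_t quantizeAxis(float v, float min, float max)
{
    float t = (v - min) / (max - min);   /* 0..1 across the mesh extent */
    int   q = (int)(t * 255.0f + 0.5f);  /* round to the nearest step   */
    if (q < 0)   q = 0;
    if (q > 255) q = 255;
    return (uint8_t)q;
}

/* Recover an approximate float coordinate from the stored byte. */
static float dequantizeAxis(uint8_t q, float min, float max)
{
    return min + ((float)q / 255.0f) * (max - min);
}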

  The point of this writeup was to address diskspace optimization in relation to game projects, yet nowhere have I mentioned compression tools like Zlib, GZ, ZSTD, or any other algorithm. I do intend to utilize those, quite heavily if I can get away with it. However, they are not all-in-one fixes for every problem, at least in my opinion on such things. With few exceptions, relying on automated and general-purpose solutions will never out-compete a tailor-made solution for a specific use case. How we landed at 255GB for a video game is multifold, regarding development time, desired visual features, sometimes incompetent developers (I know I am), and little care for maintaining the adoptability of a product.

  Hilariously, and against my reasoning, Cyberpunk 2077’s launch was a myriad of failures, due heavily to the president and leader of the company trying to navigate the product into releasing for two separate generations of game console. The attempt to release sooner and meet that wider available adoption is allegedly what caused so many problems. In my mind, adoptability is gained through a mix of flexibility and compatibility. Compatibility includes not only hardware-specific and environment-specific instructions, operations, and libraries, but also meeting the processing and storage capabilities of that hardware and environment (32-bit memory systems never being able to utilize over 4GB of memory, as an example of environment).