Crude Minecraft in a thousand lines of Python

By dkl9, written 2025-199, revised 2025-199 (0 revisions)


For several months have I had the recurring urge to play Minecraft. This comes a couple years after I heavily played it and its (satisfactory) clones, mostly Luanti. On 2025-162, my friend M.M.K. remarked to me the ease with which one clones Minecraft. A week later, seeking to fulfill that urge more intellectually, I began doing as he said.

Python is an easy, fun, and well-documented programming language, so I chose it, along with Pygame and OpenGL. Python is also dreadfully slow, as I would find out the hard way. Near the start, I decided on a few rules:

  1. Write at most a hundred lines per file.
  2. Stop when you have a thousand lines.
  3. Use only Python and its libraries.

Rule 1 forced me to work carefully with class inheritance and improve my Vim setup. I conclude that the ideal file-size is somewhat more than that.

Rule 2 took a lot longer to get in the way than expected. Python is very concise. Serious 3D graphics were new to me, so there was much more confused debugging than real progress.

Rule 3 made clear to me how slow Python gets. Some of this, I fixed with cleverer algorithms. The rest, I tolerated and forgot about.

§ Learning OpenGL

I tried some PyOpenGL tutorials. They didn't work. I turned to ChatGPT for debugging, and found that Alpine Linux's py3-opengl was just broken, at least on my machine. Probably, the hardware is too old. Switch to a venv. That'll fix it.

With a lot of code I didn't grasp, and some code that I did, I made a scene of two flat shapes. Some years ago, I abused Shadertoy — as everyone there did — for 3D graphics. The geometry practice made it easy enough to make the camera mobile.

§ Clicking

To handle a click, you must know what the cursor points at. In a 3D game like this, the cursor is the centre of the screen. ChatGPT, obedient as always, told me how to query OpenGL's z-buffer to find which shape ended up in front, at any point of the screen.

As often, the real insight comes not from reading what the LLM says, but skimming it to inspire your own thoughts. The check-what-got-drawn trick was doomed to fail, but it satori'd me into recalling that the camera has a position vector ro and a direction vector rd. Cast a ray ro + rd·t, and find the smallest t at which it hits a shape. After I copied the maths from Inigo Quilez, and adjusted for each shape's orientation, that worked perfectly.

blue rectangle and spinning orange triangle on a dark grey background
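The ray idea is small enough to sketch. Assuming each shape is a list of triangles given as NumPy arrays, the standard Möller-Trumbore ray-triangle test below stands in for Quilez's formulas (his may be arranged differently), and `nearest_hit` keeps the smallest positive t. Both function names are hypothetical.

```python
import numpy as np

def ray_triangle(ro, rd, a, b, c, eps=1e-9):
    """Distance t along ro + rd*t to triangle abc, or None on a miss (Möller-Trumbore)."""
    e1, e2 = b - a, c - a
    p = np.cross(rd, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:              # ray parallel to the triangle's plane
        return None
    inv = 1.0 / det
    s = ro - a
    u = np.dot(s, p) * inv          # first barycentric coordinate
    if u < 0 or u > 1:
        return None
    q = np.cross(s, e1)
    v = np.dot(rd, q) * inv         # second barycentric coordinate
    if v < 0 or u + v > 1:
        return None
    t = np.dot(e2, q) * inv
    return t if t > eps else None   # only hits in front of the camera count

def nearest_hit(ro, rd, triangles):
    """Index and distance of the closest triangle the ray hits, or (None, inf)."""
    best, best_t = None, np.inf
    for i, (a, b, c) in enumerate(triangles):
        t = ray_triangle(ro, rd, a, b, c)
        if t is not None and t < best_t:
            best, best_t = i, t
    return best, best_t
```

Taking the minimum over all shapes is what makes the nearer shape win when two overlap on screen.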

§ Cubes

A Minecraft world is made of chunks, which are made of cubes, each made of six squares, each made of two triangles. So I made a square, and then a cube, and ran into confusions about np.roll, modes for glDrawArrays, and more.
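A minimal sketch of the bottom two levels of that hierarchy, assuming unit cubes drawn with plain GL_TRIANGLES: each square is two triangles (six vertices), and a cube is six squares, 36 vertices in all. It picks each face's two in-plane axes with an index list rather than np.roll, and it does not make the winding order consistent, so back-face culling would need extra care. The names are hypothetical.

```python
import numpy as np

def square(axis, side):
    """Six vertices (two triangles) for the unit-cube face where coordinate `axis` == side."""
    # Corners in the face's own 2D plane, listed as two triangles.
    uv = np.array([[0, 0], [1, 0], [1, 1], [0, 0], [1, 1], [0, 1]], float)
    verts = np.zeros((6, 3))
    others = [a for a in range(3) if a != axis]   # the two axes lying in the face
    verts[:, others] = uv
    verts[:, axis] = side
    return verts

def cube():
    """36 vertices: 6 faces x 2 triangles x 3 vertices, for glDrawArrays(GL_TRIANGLES, 0, 36)."""
    return np.concatenate([square(axis, side) for axis in range(3) for side in (0, 1)])
```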

This was also the step at which I allowed for different colours for different parts of a shape. As I knew, such flexibility would later be essential. For now, I distinguished the faces by the colouring of debuggium.

a single flat-shaded red/green/blue cube on a dark grey background

It looks like GTK already independently discovered debuggium.

GTK logo: a partly-wireframe red/green/blue cube

§ Break and place

A chunk — called here a "region", to differ from Minecraft — is an array of materials (integers), one per block — here, voxel. With the array is a mesh of squares, six per non-zero voxel.

Left-click and you break a voxel: reset its material to 0 and drop its squares from the mesh. Right-click and you place a voxel: offset coordinates, with another np.roll, set material to 1, and add squares to match.

a messy assembly of mostly-adjacent flat-shaded red/green/blue cubes

In either case, you must update the vertex attributes after each change, and the code that handles clicks must know which square was clicked: a minor change to Camera.target.
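One way the voxel-array half of breaking and placing might look, assuming a region is a 16³ NumPy integer array and a clicked square is identified by its voxel plus a face given as (axis, sign). Here np.roll builds the face normal by shifting the 1 in [1, 0, 0] onto the right axis. The function names are hypothetical, and the mesh update is left out.

```python
import numpy as np

def face_normal(axis, sign):
    """Unit normal of a voxel face: np.roll moves the 1 in [1, 0, 0] onto `axis`."""
    return sign * np.roll(np.array([1, 0, 0]), axis)

def break_voxel(region, p):
    region[tuple(p)] = 0                       # material 0 means empty

def place_voxel(region, p, axis, sign, material=1):
    """Place against the clicked face: the new voxel is the clicked one offset by the normal."""
    q = np.asarray(p) + face_normal(axis, sign)
    if np.all(q >= 0) and np.all(q < region.shape):   # ignore out-of-bounds placements
        region[tuple(q)] = material
        return q
    return None
```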

With material-array world and index-vector p, world[p] indexes three 2D layers of world, instead of the intended single point. That caused some problems. world[*p] is what I wanted.

three big sheets of debuggium
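The pitfall is easy to reproduce in NumPy. An integer array used as an index selects along the first axis only, so a length-3 position vector picks three whole layers. A tuple indexes a single element, and `world[*p]` (Python 3.11 or later) unpacks into the same tuple.

```python
import numpy as np

world = np.zeros((16, 16, 16), dtype=np.int8)   # one material per voxel
p = np.array([2, 5, 7])                         # a voxel position as an index vector

layers = world[p]          # fancy indexing on axis 0: three whole 16x16 layers
voxel = world[tuple(p)]    # a single element, same as world[2, 5, 7]
```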

§ HUD

When first making Camera.target, I wished I had something to show the centre of the screen. Better late than never, I added a crosshair: two line segments exempt from the perspective transformation.

The whole program runs on one CPU thread, so any lag shows up as a hit to the framerate. I wanted a more precise measure than eyeballing the lag. Text would add another library, or a mess of code, and I felt it would break the visual spirit of the game. So a square in the corner turns green, yellow, or red to show the framerate.

debuggium scene with a coloured square at the upper left
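The indicator reduces to a threshold function of frame time. A sketch with hypothetical thresholds of 50 and 25 frames per second; the frame time could come from `pygame.time.Clock.tick()`, which returns the milliseconds since its previous call, divided by 1000.

```python
def fps_colour(frame_seconds, good=1 / 50, ok=1 / 25):
    """RGB colour for the framerate square; both thresholds are assumptions."""
    if frame_seconds <= good:
        return (0.0, 1.0, 0.0)   # green: at least 50 fps
    if frame_seconds <= ok:
        return (1.0, 1.0, 0.0)   # yellow: 25 to 50 fps
    return (1.0, 0.0, 0.0)       # red: below 25 fps
```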

It was about here that I quit querying ChatGPT. Shortly after did I find Learn OpenGL, from which I would've learned a lot more clearly. So it goes.

§ Perlin noise

Allegedly — a primary source eludes me — Minecraft uses Perlin noise to generate its maps. I looked up what that means.

As often, the real insight comes not from reading what maths Wikipedia says, but skimming it to struggle in confusion up to a spike of grokking. Thus did I demonstrate it at the custom-URL'd https://www.desmos.com/calculator/kenhperlin.
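For reference, a compact sketch of classic 2D Perlin noise, plus a helper that turns it into a terrain height per voxel column. Random unit gradients live in a dict keyed by lattice point and are generated on first use, so the noise extends as far as the world does; as a side effect, gradient values depend on the order in which lattice points are first visited. The class, the helper, and its base, amplitude and scale values are illustrative assumptions.

```python
import numpy as np

def fade(t):
    """Perlin's smootherstep, 6t^5 - 15t^4 + 10t^3: flat at 0 and 1."""
    return t * t * t * (t * (t * 6 - 15) + 10)

class Perlin2D:
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.grads = {}                       # (i, j) lattice point -> unit gradient

    def gradient(self, i, j):
        if (i, j) not in self.grads:
            angle = self.rng.uniform(0.0, 2 * np.pi)
            self.grads[(i, j)] = np.array([np.cos(angle), np.sin(angle)])
        return self.grads[(i, j)]

    def __call__(self, x, z):
        i, j = int(np.floor(x)), int(np.floor(z))   # floor, not int(): negatives matter
        fx, fz = x - i, z - j

        def corner(di, dj):
            # Dot product of a corner's gradient with the offset from that corner.
            return self.gradient(i + di, j + dj) @ np.array([fx - di, fz - dj])

        u, v = fade(fx), fade(fz)
        near = corner(0, 0) + u * (corner(1, 0) - corner(0, 0))   # along x at row j
        far = corner(0, 1) + u * (corner(1, 1) - corner(0, 1))    # along x at row j + 1
        return near + v * (far - near)                            # then along z

def height(noise, x, z, base=8, amplitude=6, scale=0.1):
    """Top voxel of column (x, z); all parameters are illustrative."""
    return int(np.floor(base + amplitude * noise(x * scale, z * scale)))
```

The noise is zero at every lattice point and smooth in between, which is what keeps the resulting hills free of overhangs once it only sets a height.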

Once I grokked, it was easy enough to fill part of the one-region world with a 3D noise-controlled pattern of debuggium. This led to unrealistic overhangs, so I later switched to 2D noise as a height-function.

voxel-blob cut out from a giant cube of debuggium

In either case, a 16³-voxel region meant upwards of a thousand cubes to draw, which meant great lag. Years ago had I watched how someone made a simple Minecraft clone. Without looking it up, I recalled it well enough to copy one of its tricks: draw only the exposed surface of each region. Thus my code skips the intermediate levels of mesh-building (cubes and squares), going from regions directly to triangles.

voxel-blob from the inside, the view obstructed only by the surface

With the region-mesh like that, you could still place voxels by naively adding six squares at a time. When you break a voxel, you would have to find and delete its existing faces, and insert new faces as the voxels behind it are exposed.

That could be done in-place. But I didn't feel clever enough to code it out, so I had the region fully remesh at each broken voxel. Sad and slow, but it worked — less slow after I (so says the commit log) "frantically optimise[d] remesh".
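A full remesh that keeps only exposed faces can lean on NumPy instead of Python loops over every voxel: pad the region with empty space, then, for each of the six directions, compare every voxel with its shifted neighbour. A sketch, assuming material 0 means empty and treating everything past the region's edge as empty; the function name and output format are hypothetical, and turning each face into two triangles is left out.

```python
import numpy as np

# The six face directions of a voxel.
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def exposed_faces(region):
    """(x, y, z, direction) for every face of a solid voxel whose neighbour is empty."""
    solid = region != 0
    padded = np.pad(solid, 1, constant_values=False)   # empty border around the region
    faces = []
    for d in DIRECTIONS:
        # neighbour[x, y, z] is True when the voxel at (x, y, z) + d is solid.
        neighbour = padded[tuple(slice(1 + o, padded.shape[a] - 1 + o) for a, o in enumerate(d))]
        for x, y, z in np.argwhere(solid & ~neighbour):
            faces.append((int(x), int(y), int(z), d))
    return faces
```

For a completely solid 16³ region this yields 6 × 16² = 1536 faces instead of 6 × 16³ = 24576.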

§ Faster raycast

Like in Minecraft, I added a wireframe-square to highlight the voxel you point at. But before I got that square oriented right, the program lagged terribly, far worse than before. By now, I was used to profiling; it told me that the problem was Camera.target, which now ran every frame, instead of only on clicks.

voxel-blob with one square of the surface highlighted by a dark square with one diagonal

Camera.target cast rays to WorldRegions the same as any Entity: break it into triangles and raycast to each, unconditionally. Each region has thousands of triangles, hence the lag.

The obvious step is to only check in detail when the camera's ray intersects the region's bounding-box. Quilez saved me again there. Besides that, the raycasting code should only check faces that fall in the path of the ray. Regions of voxels are highly regular, so this can be efficient, but the devil's in the details.
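The bounding-box check can be the standard slab method, sketched here in NumPy; Quilez's version may be arranged differently. It returns the entry and exit distances along ro + rd·t, or None if the ray misses the box or the box lies behind it. The function name is hypothetical.

```python
import numpy as np

def ray_box(ro, rd, lo, hi):
    """Slab test: (t_enter, t_exit) of the ray through the box [lo, hi], or None.

    A zero component of rd divides to ±inf, which min/max handle, except when the
    origin lies exactly on a slab plane (0 * inf = nan); this sketch ignores that case.
    """
    with np.errstate(divide="ignore"):
        inv = 1.0 / rd
    t1 = (lo - ro) * inv
    t2 = (hi - ro) * inv
    t_near = np.max(np.minimum(t1, t2))   # last slab the ray enters
    t_far = np.min(np.maximum(t1, t2))    # first slab the ray leaves
    if t_near > t_far or t_far < 0:
        return None
    return max(t_near, 0.0), t_far
```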

As often, the real insight comes not from writing and trying the code that comes to mind, but skimming over the problem until you shower, or lie in the dark, about to sleep, and so have the space to think carefully towards the (almost) perfect algorithm. Intersect the ray with each of the 17 evenly-spaced planes that bound the region's 16 layers of voxels along one axis. Repeat for each of the three axes, sort all the intersections by how far along the ray they lie, and at each one, check whether the voxel the ray enters is solid. The two-dimensional equivalent is shown at https://desmos.com/calculator/rayhitgrid.
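A sketch of the plane-crossing idea in 3D, under a few assumptions: a cubic region stored as a NumPy array of materials (0 is empty), voxel i spanning [i, i + 1) in region-local coordinates, and a ray that starts outside any solid voxel. Each crossing of a plane tells which voxel the ray enters next, so sorting the crossings by t visits voxels in order. The function name and return format are hypothetical, and exact edge and corner hits are not handled carefully.

```python
import numpy as np

def first_solid_voxel(region, ro, rd):
    """First solid voxel along ro + rd*t (t > 0), as (voxel, axis of face crossed, t), or None."""
    n = region.shape[0]                  # e.g. 16 voxels per side -> 17 planes per axis
    crossings = []
    for axis in range(3):
        if rd[axis] == 0:
            continue                     # parallel to these planes: never crosses them
        for k in range(n + 1):
            t = (k - ro[axis]) / rd[axis]
            if t > 0:
                crossings.append((t, axis, k))
    crossings.sort()                     # nearest crossing first
    for t, axis, k in crossings:
        voxel = np.floor(ro + rd * t).astype(int)
        # On the plane itself floor is ambiguous, so take the voxel on the far side.
        voxel[axis] = k if rd[axis] > 0 else k - 1
        if np.all(voxel >= 0) and np.all(voxel < n) and region[tuple(voxel)] != 0:
            return tuple(int(v) for v in voxel), axis, float(t)
    return None
```

At most 3 × 17 = 51 crossings per region, versus hundreds of tiny steps for a uniform march at fine resolution.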

After I finished the project, I read bits of code from two other people's Minecraft clones. Both raycast with many uniform, tiny steps along the ray. So had I tried at first. The others before me used C and C++, so they got away with such a naive method. Python is too slow for that.

As I write, I realise that even I coded too naively. Camera.target checks the bounding-box of every region, not just those in the ray's path.

§ Reflectance

I wanted better lighting than a flat colouring at constant brightness everywhere. Memories of Shadertoy helped with the intuition; Learn OpenGL gave me the details.

blob of brown "dirt" and grey "stone", each lit diffusely and with giant specular highlights

Yes, I know: dirt's not that shiny. I adjusted it later.

Two new attributes per vertex — surface normal and reflectance pattern — were enough that I switched to filling in the shader-code at runtime from a list of attributes.
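A sketch of generating the declarations from one list, in old-style GLSL (`attribute` and `varying`). The attribute names and sizes, the `mvp` uniform, and the `v_` prefix are assumptions. The same list can also give each attribute's offset within an interleaved vertex buffer, which keeps the Python and GLSL sides in sync.

```python
# Hypothetical attribute list: (name, GLSL type).
ATTRIBUTES = [("position", "vec3"), ("colour", "vec3"), ("normal", "vec3"), ("reflectance", "vec2")]
SIZES = {"float": 1, "vec2": 2, "vec3": 3, "vec4": 4}

def vertex_shader(attributes):
    """Old-style GLSL vertex shader that passes every attribute through to a varying."""
    lines = []
    for name, kind in attributes:
        lines.append(f"attribute {kind} {name};")
        lines.append(f"varying {kind} v_{name};")
    lines.append("uniform mat4 mvp;")
    lines.append("void main() {")
    for name, _ in attributes:
        lines.append(f"    v_{name} = {name};")
    lines.append("    gl_Position = mvp * vec4(position, 1.0);")
    lines.append("}")
    return "\n".join(lines)

def layout(attributes):
    """Offset (in floats) of each attribute within one interleaved vertex, and the stride."""
    offsets, total = {}, 0
    for name, kind in attributes:
        offsets[name] = total
        total += SIZES[kind]
    return offsets, total
```

Adding a third attribute later then means editing one list rather than several hand-written strings.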

§ Multiple regions

Regions are bounded grids of voxels. Store them as arrays. The world is made of many regions, of number and arrangement not known in advance. That's harder to store.

You could make a cursed sort of linked list based on the adjacency relation. I just used a dict that maps coordinate-tuples to WorldRegions. This dict is part of a WorldHolder, originally an aspect of MapGen, and so goes together with the similar dict of random gradients for Perlin noise.

The MapGen builds new regions only as needed. At first, that was when you placed voxels out-of-bounds. Then, to check for collisions, it would have to build regions wherever the player went.

several distinct regions of green/brown "grassy dirt" and grey "stone"
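A sketch of such a dict, assuming 16-voxel regions and a `build` callback standing in for MapGen; only the WorldHolder name comes from the post, and the methods are hypothetical. Python's floor division and modulo already round toward negative infinity, so voxel -1 lands in region -1 at local position 15.

```python
class Region:
    """Placeholder for a WorldRegion: it only remembers its key."""
    def __init__(self, key):
        self.key = key

class WorldHolder:
    """Regions keyed by integer coordinate tuples, built on first access."""
    SIZE = 16

    def __init__(self, build):
        self.regions = {}
        self.build = build              # hypothetical stand-in for MapGen

    def region_at(self, x, y, z):
        """The region containing voxel (x, y, z), and the voxel's position within it."""
        key = (x // self.SIZE, y // self.SIZE, z // self.SIZE)
        if key not in self.regions:     # build lazily, only where something looks
            self.regions[key] = self.build(key)
        local = (x % self.SIZE, y % self.SIZE, z % self.SIZE)
        return self.regions[key], local
```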

§ Collisions

To collide is to halt (or, realistically, alter) your movement sith another object is in your path. Player.update_motion already set the path based on pressed keys. Now it had to also "look" around the Player for solid voxels, and constrain the path as they're found. How far you "step" from the Player to check implies how big the Player is.
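A sketch of one way to constrain the path, assuming y is up, the world is a set of solid voxel coordinates, and the player is a box 0.6 voxels wide and 1.7 tall (the `radius` and `height` values are assumptions). Moving one axis at a time and cancelling only the blocked axis lets the player slide along walls. Probing the box's corners at feet, waist and head works only while the box is narrower than a voxel and the probes are less than a voxel apart.

```python
import math

def solid(world, x, y, z):
    """True if the voxel containing point (x, y, z) is in the set of solid voxels."""
    return (math.floor(x), math.floor(y), math.floor(z)) in world

def constrain(world, pos, step, radius=0.3, height=1.7):
    """Move pos by step, one axis at a time, skipping any axis that would overlap a solid voxel."""
    pos = list(pos)
    for axis in range(3):
        trial = pos.copy()
        trial[axis] += step[axis]
        x, y, z = trial
        # Corners of the player's box at feet, middle, and head height.
        probes = [(x + dx, y + dy, z + dz)
                  for dx in (-radius, radius)
                  for dz in (-radius, radius)
                  for dy in (0.0, height / 2, height)]
        if not any(solid(world, *p) for p in probes):
            pos = trial
    return tuple(pos)
```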

§ Trees

For more interesting worlds, I had the MapGen sprinkle in trees following a Poisson distribution. The Poisson distribution was overkill: there can be only one tree per voxel-column, so a random() < 0.02 kind of condition would've sufficed. But that was the least of the issues.
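The simpler per-column condition, sketched with Python's random module: each column in the region's interior gets a tree with a fixed chance, and a margin keeps trees far enough from the edges that leaves would not cross into neighbouring regions. The 0.02 chance comes from the text; the margin of 2 and the function name are assumptions.

```python
import random

def tree_columns(size=16, chance=0.02, margin=2, rng=random):
    """Columns (x, z) in one region that get a tree; at most one tree per column."""
    return [(x, z)
            for x in range(margin, size - margin)
            for z in range(margin, size - margin)
            if rng.random() < chance]
```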

Adding trees led to several issues in MapGen, and made existing ones clearer. I knew that reaching across regions by x and z to add leaves could cause problems, so I bodged trees to only show up in the middle of regions. Even with this, trees were partly — or fully — invisible, trunks often appeared floating well above the ground, and stone at the top of a region lacked dirt to cover it.

grass and trees, but it's cursed as described

The last problem arose from how int rounds toward zero rather than down, which gives the wrong voxel for negative coordinates. np.floor rounds down, but its output is a float, not an integer.
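The rounding behaviours involved, for a coordinate just below zero. The post doesn't show the exact offending call, so this only contrasts the candidates; rounding down before casting, as in the last line, gives integer voxel coordinates that are also correct for negatives.

```python
import math
import numpy as np

x = -0.5                                    # a coordinate just below zero
print(int(x))                               # 0: int() truncates toward zero
print(math.floor(x))                        # -1: rounds down, as voxel coordinates need
print(np.floor(x))                          # -1.0: rounds down, but the result is a float
print(np.array([x]).astype(int))            # [0]: casting to int also truncates toward zero
print(np.floor(np.array([x])).astype(int))  # [-1]: round down first, then cast
```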

§ Hand

By now, you could pick which material to place with number keys. I also made middle-clicks set your picked material to what you click, like the colour-pick tool in a graphics editor. Make three faces of a cube, set their material to match, rotate it to a nice angle, and fix it in a corner of screen-space — now you can see what you're placing.

voxelly trees-on-grass scene with a log-cube in the lower right

§ Gravity

A simple change to update_motion made the player fall until something was there to stop them. Jump into something right above you, and you should collide and fall back, but, as tested, you instead pass thru some of it and get stuck. So I had to painstakingly rearrange the logic of "willful" movement, collisions, and falls.

view from being anomalously stuck in tree leaves
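One possible ordering for a frame's vertical motion, sketched under assumptions: y is up, the world is a set of solid voxel coordinates, gravity is 20 voxels/s², and only the column under the player's centre is checked. This is not the post's actual logic. Gravity changes the velocity first; then landing or bumping a ceiling both snap the player to the voxel's face and zero the vertical velocity, so a jump into an overhead voxel falls back instead of sinking in. It still assumes the player moves less than one voxel per frame.

```python
import math

GRAVITY = 20.0   # voxels per second squared; an assumed value

def fall(world, x, y, z, vy, dt, height=1.7):
    """Apply one frame of gravity, then resolve vertical collisions. Returns (y, vy)."""
    vy -= GRAVITY * dt
    new_y = y + vy * dt
    cx, cz = math.floor(x), math.floor(z)
    if vy < 0 and (cx, math.floor(new_y), cz) in world:
        return float(math.floor(new_y) + 1), 0.0           # land on top of the voxel below
    head = math.floor(new_y + height)
    if vy > 0 and (cx, head, cz) in world:
        return head - height, 0.0                          # head hits the ceiling: stop rising
    return new_y, vy
```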

§ Textures

Until now, the game showed all materials by colour alone. To get the first texture took standard stuff: load an image, convert it to an OpenGL-suitable pixel array, load that to OpenGL, give texture coordinates to each vertex, give the uniform sampler2D to the fragment shader, and set gl_FragColor based on texture2D. Tutorials all told me to set texture parameters, like GL_TEXTURE_MIN_FILTER, but I figured I could skip that and take the default. It turns out that MIN_FILTER really is needed: its default, GL_NEAREST_MIPMAP_LINEAR, expects mipmaps, and a texture without them counts as incomplete.

oops, all dkl9!

For the next textures, I expanded the sampler2D to an array, adding a third "coordinate" to the texc attribute, and found out the hard way that the GLSL I was using can't index an array of samplers dynamically. After the first few textures, my "dynamic indexing" culminated in a runtime-generated list of if statements. I think this is where you're supposed to sample from sections of a grid-of-textures image, a texture atlas.
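A sketch of generating that chain from Python, assuming a float varying (here `v_layer`) carries the texture index as the third texture coordinate, and a sampler array named `textures`; all the GLSL names are hypothetical. Each branch uses a constant index, and comparing against i + 0.5 tolerates interpolation error in the float index. The output is a snippet meant for the body of the fragment shader's main().

```python
def texture_lookup(count, sampler="textures", index="v_layer", coords="v_uv"):
    """GLSL if/else chain that picks one of `count` samplers using only constant indices."""
    lines = ["vec4 texel;"]
    for i in range(count):
        keyword = "if" if i == 0 else "else if"
        lines.append(f"{keyword} ({index} < {i}.5) texel = texture2D({sampler}[{i}], {coords});")
    lines.append("else texel = vec4(1.0, 0.0, 1.0, 1.0);  // out of range: magenta")
    return "\n".join(lines)
```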

After loading a few image files for tests, I pivoted to what I intended all along, by my Rule 3: procedurally generate textures. By Rule 2, this had to be done in under thirty extra lines. Conveniently, MapGen already used Perlin noise, ready to be repurposed. MapGen samples noise per region by xz-coordinates and maps it (pun intended) to height. make_texture samples noise per texture by pixel coordinates and maps it to the interval between two colours.

grass, stone, and trees, with texture this time
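The colour-mapping half of such a `make_texture` might look like this, assuming the noise has already been sampled at each pixel's coordinates into a 2D array with values in [-1, 1]. The signature is an assumption, and the Perlin sampling itself is left out.

```python
import numpy as np

def make_texture(noise, low, high):
    """Map a 2D noise array in [-1, 1] to RGB pixels between colours `low` and `high`."""
    t = np.clip((noise + 1) / 2, 0, 1)[..., None]        # [-1, 1] -> [0, 1], one weight per pixel
    pixels = (1 - t) * np.asarray(low, float) + t * np.asarray(high, float)
    return pixels.astype(np.uint8)                        # height x width x 3 bytes
```

For example, a noise value of 0 lands exactly halfway between the two colours.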

To allow different colours on different sides of voxels, I ultimately made textures as brightness-patterns atop the colours used in preceding versions. Proper concentric rings for the cross-sections of trees would cost too much code to add now, so the inside of a tree looks like bark, just lighter.

grass, dirt, and trees, with better textures

§ Final game

Download the code, follow the README, and play.