@suihkulokki "with PHONG shading no less, metal shading, phong env mapping" (from one of the comments).
Tbh, all of the Phong shading was just the standard environment mapping trick (you precalculate the lighting into a 2D texture) that everyone was using at the time, but it's still impressive work. I know this for a fact because I've talked to some members of the group (Complex is from the town I was born in).
There's one trick that not many people knew about at the time, and it was also one of the first ways I learned to do *data* cache optimization for profit. Before the Pentium, the focus was mostly on the instruction pipeline.
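When you have a 2D texture with power-of-two dimensions, you rearrange it into 8x8 tiles with one byte per pixel (VGA was a 320x200 8-bit framebuffer), i.e. you do this as a precalculation step and then interpolate over the tiled copy instead of the original texture. On average this removes a lot of cache misses, because neighbouring texels stay close together in memory whichever direction you step through the texture, even though it adds a bit of complexity to the inner loop. An 8x8 tile is a great fit for the Pentium's cache lines: it's 64 bytes, i.e. exactly two 32-byte lines when the texture is aligned.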
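To make the tiling concrete, here's a rough C sketch of the idea, not the actual code anyone shipped back then (that would have been hand-written asm). The names `tiled_offset`, `tile_texture` and `sample_tiled` are made up for illustration, and I'm assuming tiles are stored row by row with texels row by row inside each tile, which is just one possible layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8x8 tiles, one byte per texel: each tile is 64 contiguous bytes. */
enum { TILE_SHIFT = 3, TILE_MASK = (1 << TILE_SHIFT) - 1 };

/* Byte offset of texel (u, v) in a tiled texture that is 2^w_shift texels
   wide. Assumed layout (hypothetical): tiles row-major across the texture,
   texels row-major inside each tile. Only shifts and masks are needed
   because every dimension is a power of two. */
static size_t tiled_offset(unsigned u, unsigned v, unsigned w_shift)
{
    unsigned tiles_per_row_shift = w_shift - TILE_SHIFT;
    size_t tile_index = ((size_t)(v >> TILE_SHIFT) << tiles_per_row_shift)
                      + (u >> TILE_SHIFT);
    size_t in_tile = ((v & TILE_MASK) << TILE_SHIFT) | (u & TILE_MASK);
    return (tile_index << (2 * TILE_SHIFT)) | in_tile;
}

/* One-time precalculation: copy a linear (row-major) texture into the
   tiled layout. Both dimensions must be at least one tile. */
static void tile_texture(const uint8_t *linear, uint8_t *tiled,
                         unsigned w_shift, unsigned h_shift)
{
    assert(w_shift >= TILE_SHIFT && h_shift >= TILE_SHIFT);
    unsigned w = 1u << w_shift, h = 1u << h_shift;
    for (unsigned v = 0; v < h; v++)
        for (unsigned u = 0; u < w; u++)
            tiled[tiled_offset(u, v, w_shift)] =
                linear[((size_t)v << w_shift) + u];
}

/* What the inner loop does per pixel: a few extra shifts and masks
   compared to the linear (v << w_shift) + u lookup. */
static uint8_t sample_tiled(const uint8_t *tiled, unsigned u, unsigned v,
                            unsigned w_shift)
{
    return tiled[tiled_offset(u, v, w_shift)];
}
```

With a linear 256-wide texture, stepping one texel in v jumps 256 bytes, so every step lands on a different cache line. In the tiled layout the same step moves 8 bytes as long as you stay inside the tile.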
For the instruction cache you of course similarly make your inner loop fit nicely into the cache lines, and order the instructions so that dependencies between adjacent opcodes are minimal and you get maximal throughput out of the CPU.
The Pentium sort of started the era of memory optimization. Now most, if not all, code optimization is a caching exercise...