I've been writing some Python 3 code to rip this game apart, which I'll probably end up shoving onto GitHub when I get home tomorrow.
Here's what I've learnt so far:
- It's way easier to decrypt something when you have a known plaintext.
- BPK files are used for anything that needs to be compressed. Anything that's not a BPK file is probably not compressed. If it's a BPK file, it *is* compressed, no exceptions.
- The LZW compression used here is *almost* identical to GIF. The end and clear codes are swapped.
- So far every single image is a BPK file, but not every BPK file is an image, and furthermore there are at least two different image formats.
- Anything in a TR?.BPA archive pertains to one level. Even if it's an image. Even if it's a BPK file that isn't even an image.
- From the little look I had at the EXE when trying to find strings, it appears to be packed with a 32-bit-word-based LZSS... but the PMODE/W stub appears to use a different, 16-bit-word-based LZSS.
- The CMF files are encrypted S3M and XM files. They are uncompressed. There are 5 XM files which appear to be used purely as sample libraries, which really doesn't make much sense.
- Knowing that Purple Motion did the music, I assumed that S3M was used. However, the bytes at 0x002C turned out to be one of two encrypted values: one was "SCRM" encrypted, and the other I had yet to identify. Either way, once I'd separated the "probably S3M" data from the "probably not S3M" data, the S3M data behaved as I expected for the most part.
- Also knowing that Purple Motion did the music, I checked modarchive and grabbed the unencrypted menu music from there. Turns out it's a perfect plaintext.
- I identified a few points where if the plaintext was 0x00 and the low byte of the address was 0x63, the ciphertext would also be 0x00, then if the plaintext 15 bytes afterwards was 0x00, the ciphertext would have the same top nybble and bottom nybble. Eventually I found a period of 0x700 and it was all downhill from there.
- I wrote a Python script to print the data a given number of bits at a time, for widths other than 8. I originally wrote a big-endian decoder thinking it was little-endian, but when I switched it around and got an actual little-endian decoder, I noticed a nice periodicity at 9 bits, and it was looking suspiciously like LZW.
- I used the title screen image on the Death Rally page on moddingwiki as a reference. Thankfully it was a raw 640x480x8bpp screenshot with the exact palette needed, so it was a perfect plaintext. Thanks Malvineous!
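For anyone wanting to repeat the bit-width experiment, here is a minimal sketch of that kind of dumper. It is not the actual script: the function names `read_codes_lsb` and `read_codes_msb` are made up for illustration, and it assumes codes are packed back to back with no padding. The LSB-first variant is the GIF-style packing that showed the 9-bit pattern; the MSB-first variant is the "wrong way round" reading, kept for comparison.

```python
from typing import Iterator


def read_codes_lsb(data: bytes, width: int) -> Iterator[int]:
    """Yield successive `width`-bit codes, filling from the least significant bit.

    This is the "little-endian" bit order that GIF-style LZW uses.
    Trailing bits that do not make up a full code are dropped.
    """
    mask = (1 << width) - 1
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently in acc
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        while nbits >= width:
            yield acc & mask
            acc >>= width
            nbits -= width


def read_codes_msb(data: bytes, width: int) -> Iterator[int]:
    """Yield successive `width`-bit codes, filling from the most significant bit."""
    mask = (1 << width) - 1
    acc = 0
    nbits = 0
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= width:
            nbits -= width
            yield (acc >> nbits) & mask
            acc &= (1 << nbits) - 1  # keep only the unread low bits


if __name__ == "__main__":
    sample = bytes([0xFF, 0x01, 0x02, 0x04, 0x08])  # arbitrary demo bytes
    for width in (9, 10, 11, 12):
        codes = list(read_codes_lsb(sample, width))
        print(width, " ".join(f"{c:03X}" for c in codes))
```

Dumping the same buffer at several widths and in both bit orders makes it easier to spot the width at which the values start to look structured, for example long runs of codes below 0x100 at the start of an LZW stream.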
A few questions I have...
- Was anyone looking forward to modding this game back in 2014?
- Is anyone looking forward to modding this game now?
- There's some stuff on the S3M Format page on moddingwiki which needs fixing up; namely, the tempo calculation is incorrect (a tempo of 125 is 50 ticks per second, or 125 ticks per 2.5 seconds, so a tempo of 50 is actually a lot slower). Should I go ahead and tidy that up? (If anyone's wondering, yes I'm the same guy who wrote a lot of the incoherent garbage on the respective multimediawiki page. I have a history of dealing with weird edge cases in tracker formats.)
- What are the odds of getting any of this working with Camoto, or at least the libraries? I tried compiling master to no avail (for a GTK version there's definitely a lot of stuff still depending on wxWidgets), although I did manage to get the libs working.
- Is there anything in particular you'd suggest I could look more deeply into next?
- When would be a good time to get rid of the "Unmoddable" tag? Part of me thinks I should at least have something that can reassemble the BPA archives as mentioned earlier.
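To make the S3M tempo point above concrete, here is a small sketch of the relationship as described: a tempo of T means T ticks per 2.5 seconds. The function names are just for illustration and are not taken from any existing tool or wiki page.

```python
def ticks_per_second(tempo: int) -> float:
    """Tick rate for a tracker tempo value: `tempo` ticks every 2.5 seconds."""
    if tempo <= 0:
        raise ValueError("tempo must be positive")
    return tempo * 2 / 5


def tick_duration_seconds(tempo: int) -> float:
    """Length of one tick in seconds for a given tempo value."""
    if tempo <= 0:
        raise ValueError("tempo must be positive")
    return 2.5 / tempo


if __name__ == "__main__":
    for tempo in (125, 50):
        print(tempo, ticks_per_second(tempo), tick_duration_seconds(tempo))
```

With these definitions a tempo of 125 gives 50 ticks per second and a tempo of 50 gives 20 ticks per second, which is why the lower value plays a lot slower.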