Street Fighter II decompression algorithm (QBasic source included)

Discuss game modding
Post Reply
Frenkel
5-bit member
Posts: 41
Joined: March 3rd, 2007, 10:50 am

Street Fighter II decompression algorithm (QBasic source included)

Post by Frenkel »

I reverse engineered the decompression algorithm that Street Fighter II uses. Is this a common algorithm and does it have a name?
Or is it something unique that Martin H Smith and James E Fisher made.

Here's the QBasic code. Too bad QBasic doesn't support unsigned integers and one byte integers.

Code: Select all

DECLARE FUNCTION toString$ (reg AS ANY)
DECLARE SUB rotateRight (reg AS ANY, i AS INTEGER)
DECLARE SUB shiftLeft (reg AS ANY, i AS INTEGER)
DECLARE FUNCTION toInteger% (reg AS ANY)
DECLARE SUB move (reg AS ANY, l AS LONG)
DECLARE SUB subtract (reg AS ANY, i AS INTEGER)

DEFSTR A-Z

'Use the following statement in the Immediate window for debugging
'PRINT "ax=" + toString$(ax) + ", bx=" + HEX$(bx) + ", cl=" + HEX$(cl) + ", dx=" + HEX$(dh) + " " + HEX$(dl) + ", si=" + HEX$(si) + ", di=" + HEX$(di) + ", bp=" + HEX$(bp)

OPEN "DRUGSPIC.LIN" FOR BINARY AS #1
OPEN "DRUGSPIC.LID" FOR BINARY AS #2
GET #1, 1, uncompressedSize%

TYPE register
  lo AS STRING * 1
  hi AS STRING * 1
END TYPE

DIM byte AS STRING * 1
DIM table(&H404 TO 16383) AS INTEGER 'Is 16383 large enough?

DIM si AS LONG
DIM siTemp AS LONG

DIM di AS LONG

DIM ax AS register
DIM bx AS INTEGER
DIM cl AS INTEGER '0 <= cl <= &HFF, but most of the time 0 <= cl <= 7
DIM dh AS INTEGER '1 <= dh <= &HFF
DIM dl AS INTEGER '9 <= dl <= &HFF
DIM reg AS register

DIM bp AS INTEGER



bx = &H404

cl = 1

dh = 1
dl = 9

bp = 1

'1 because QBasic is 1-based, 2 because of uncompressedSize%
GET #1, 0 + 1 + 2, byte
PUT #2, 0 + 1, byte
si = 1
di = 1

DO
  GET #1, si + 1 + 2, ax
  si = si + 2
  GET #1, si + 1 + 2, byte
  ax.lo = CHR$(ASC(ax.lo) XOR ASC(byte))
  ax.lo = CHR$(ASC(ax.lo) AND ((&HFF * 2 ^ cl) AND &HFF))
  ax.lo = CHR$(ASC(ax.lo) XOR ASC(byte))
  rotateRight ax, cl
  ax.hi = CHR$(ASC(ax.hi) AND dh)
  cl = cl + dl
  IF cl < &H10 THEN
    si = si - 1
  END IF
  cl = cl AND 7
 
  IF ASC(ax.hi) = 0 THEN
    '{ax < &H100}
    PUT #2, di + 1, ax.lo
    di = di + 1
    move ax, di
    bp = bp + 1
    subtract ax, bp
    table(bx) = toInteger%(ax)
    table(bx + 2) = bp
    bp = 1
    bx = bx + 4
  ELSEIF (ASC(ax.hi) = 1) AND (ASC(ax.lo) = 0) THEN
    '{ax = &H100}
    GET #1, si + 1 + 2, ax
    si = si + 2
    rotateRight ax, cl
    cl = cl + 2
    IF cl < 8 THEN
      si = si - 2
    ELSE
      si = si - 1
    END IF
    cl = cl AND 7
    ax.lo = CHR$(ASC(ax.lo) AND 3)
    IF ASC(ax.lo) <> 0 THEN
      '{ax.lo <> 0}
      ax.lo = CHR$(ASC(ax.lo) - 1)
      IF ASC(ax.lo) <> 0 THEN
        '{ax.lo <> 0}
        'Done
        IF HEX$(uncompressedSize%) <> HEX$(LOF(2)) THEN
          PRINT "_decMth3: Decompressed to wrong size."
        END IF
        END
      ELSE
        '{ax.lo = 0}
        GET #1, si + 1 + 2, ax
        si = si + 2
        GET #1, si + 1 + 2, byte
        ax.lo = CHR$(ASC(ax.lo) XOR ASC(byte))
        ax.lo = CHR$(ASC(ax.lo) AND ((&HFF * 2 ^ cl) AND &HFF))
        ax.lo = CHR$(ASC(ax.lo) XOR ASC(byte))
        rotateRight ax, cl
        cl = cl + 9
        IF cl < &H10 THEN
          si = si - 1
        END IF
        cl = cl AND 7

        PUT #2, di + 1, ax.lo
        di = di + 1
        bp = 1
        bx = &H404
        dh = 1
        dl = 9
      END IF
    ELSE
      '{ax.lo = 0}
      dh = dh * 2 + 1
      dl = dl + 1
    END IF
  ELSE
    '{ax > &H100}
    siTemp = si
    move reg, di
    subtract reg, bp
    '{reg = di - bp}
    bp = bp + 1
    table(bx) = toInteger%(reg)
    table(bx + 2) = bp
    shiftLeft ax, 2
    si = toInteger%(ax)
    IF si < bx THEN
      '{ax < bx}
      bp = table(si + 2)
      si = table(si)
    ELSE
      '{ax >= bx}
      si = table(bx)
      '{si = di - bp + 1}
    END IF
    
    'hack to make si an unsigned integer
    IF si < 0 THEN
      si = VAL("&H" + RIGHT$(HEX$(si), 4) + "&")
    END IF
  
    FOR i% = 1 TO bp
      GET #2, si + 1, byte
      PUT #2, di + 1, byte
      si = si + 1
      di = di + 1
    NEXT i%
 
    si = siTemp
    bx = bx + 4
  END IF
LOOP

SUB move (reg AS register, l AS LONG)

unsignedInt& = VAL("&H" + HEX$(l) + "&")
reg.hi = CHR$(unsignedInt& \ 256)
reg.lo = CHR$(unsignedInt& AND &HFF)

END SUB

SUB rotateRight (reg AS register, i AS INTEGER)

unsignedReg& = VAL("&H" + toString$(reg) + "&")

result& = unsignedReg&
FOR x% = 1 TO i
  carry% = 0
  IF result& AND 1 THEN
    carry% = 1
  END IF
  result& = result& \ 2 + carry% * &H8000&
NEXT x%

reg.hi = CHR$(result& \ 256)
reg.lo = CHR$(result& AND &HFF)

END SUB

SUB shiftLeft (reg AS register, i AS INTEGER)

unsignedReg& = VAL("&H" + toString$(reg) + "&")

result& = unsignedReg& * (2 ^ i)
reg.hi = CHR$(result& \ 256)
reg.lo = CHR$(result& AND &HFF)

END SUB

SUB subtract (reg AS register, i AS INTEGER)

unsignedReg& = VAL("&H" + toString$(reg) + "&")
unsignedInt& = VAL("&H" + HEX$(i) + "&")

result& = unsignedReg& - unsignedInt&
reg.hi = CHR$(result& \ 256)
reg.lo = CHR$(result& AND &HFF)

END SUB

FUNCTION toInteger% (reg AS register)

toInteger% = VAL("&H" + toString$(reg))

END FUNCTION

FUNCTION toString$ (reg AS register)

hi$ = HEX$(ASC(reg.hi))
IF LEN(hi$) = 1 THEN hi$ = "0" + hi$

lo$ = HEX$(ASC(reg.lo))
IF LEN(lo$) = 1 THEN lo$ = "0" + lo$

toString$ = hi$ + lo$

END FUNCTION
Frenkel
5-bit member
Posts: 41
Joined: March 3rd, 2007, 10:50 am

Re: Street Fighter II decompression algorithm (QBasic source included)

Post by Frenkel »

Here's a full extractor:

Code: Select all

DECLARE SUB decompress (offset AS LONG, size AS LONG)
DECLARE FUNCTION toString$ (reg AS ANY)
DECLARE SUB rotateRight (reg AS ANY, i AS INTEGER)
DECLARE SUB shiftLeft (reg AS ANY, i AS INTEGER)
DECLARE FUNCTION toInteger% (reg AS ANY)
DECLARE SUB move (reg AS ANY, l AS LONG)
DECLARE SUB subtract (reg AS ANY, i AS INTEGER)

DEFSTR A-Z

DIM SHARED byte AS STRING * 1

TYPE RowType
  inputFileAndCompressed AS STRING * 1
  offsetL AS STRING * 1
  offsetM AS STRING * 1
  offsetH AS STRING * 1
  sizeL AS STRING * 1
  sizeH AS STRING * 1
  extension AS STRING * 3
  filename AS STRING * 8
END TYPE

TYPE Register
  lo AS STRING * 1
  hi AS STRING * 1
END TYPE

DIM row AS RowType

DIM inputFile AS STRING * 16
DIM inputFiles(0 TO 7) AS STRING

DIM fullFilename AS STRING

DIM offset AS LONG
DIM size AS LONG



OPEN "INDEX.SF2" FOR BINARY AS #1
FOR i% = 0 TO 7
  GET #1, , inputFile
  inputFiles(i%) = LEFT$(inputFile, INSTR(inputFile, CHR$(0)) - 1)
NEXT i%

GET #1, , row
DO WHILE NOT EOF(1)
  OPEN inputFiles(ASC(row.inputFileAndCompressed) \ 16) FOR BINARY AS #2
 
  offset = ASC(row.offsetL) + ASC(row.offsetM) * 256& + ASC(row.offsetH) * 256& * 256&
  size = ASC(row.sizeL) + ASC(row.sizeH) * 256&

  fullFilename = row.filename + "." + row.extension
  OPEN fullFilename FOR BINARY AS #3
 
  PRINT fullFilename
  IF ASC(row.inputFileAndCompressed) AND 3 THEN
    'compressed
    decompress offset, size
  ELSE
    'uncompressed
    SEEK #2, offset + 1
    FOR b& = 1 TO size
      GET #2, , byte
      PUT #3, , byte
    NEXT b&
  END IF
  CLOSE #2, #3

  GET #1, , row
LOOP

END

SUB decompress (offset AS LONG, size AS LONG)

'Use the following statement in the Immediate window for debugging
'PRINT "ax=" + toString$(ax) + ", bx=" + HEX$(bx) + ", cl=" + HEX$(cl) + ", dx=" + HEX$(dh) + " " + HEX$(dl) + ", si=" + HEX$(si) + ", di=" + HEX$(di) + ", bp=" + HEX$(bp)

DIM table(&H404 TO 16383) AS INTEGER 'Is 16383 large enough?

DIM si AS LONG
DIM siTemp AS LONG

DIM di AS LONG

DIM ax AS Register
DIM bx AS INTEGER
DIM cl AS INTEGER '0 <= cl <= &HFF, but most of the time 0 <= cl <= 7
DIM dh AS INTEGER '1 <= dh <= &HFF
DIM dl AS INTEGER '9 <= dl <= &HFF
DIM reg AS Register

DIM bp AS INTEGER

DIM uncompressedSize AS INTEGER



GET #2, offset + 1, uncompressedSize



bx = &H404

cl = 1

dh = 1
dl = 9

bp = 1

'1 because QBasic is 1-based, 2 because of uncompressedSize
GET #2, offset + 0 + 1 + 2, byte
PUT #3, 0 + 1, byte
si = 1
di = 1

DO
  GET #2, offset + si + 1 + 2, ax
  si = si + 2
  GET #2, offset + si + 1 + 2, byte
  ax.lo = CHR$(ASC(ax.lo) XOR ASC(byte))
  ax.lo = CHR$(ASC(ax.lo) AND ((&HFF * 2 ^ cl) AND &HFF))
  ax.lo = CHR$(ASC(ax.lo) XOR ASC(byte))
  rotateRight ax, cl
  ax.hi = CHR$(ASC(ax.hi) AND dh)
  cl = cl + dl
  IF cl < &H10 THEN
    si = si - 1
  END IF
  cl = cl AND 7

  IF ASC(ax.hi) = 0 THEN
    '{ax < &H100}
    PUT #3, di + 1, ax.lo
    di = di + 1
    move ax, di
    bp = bp + 1
    subtract ax, bp
    table(bx) = toInteger%(ax)
    table(bx + 2) = bp
    bp = 1
    bx = bx + 4
  ELSEIF (ASC(ax.hi) = 1) AND (ASC(ax.lo) = 0) THEN
    '{ax = &H100}
    GET #2, offset + si + 1 + 2, ax
    si = si + 2
    rotateRight ax, cl
    cl = cl + 2
    IF cl < 8 THEN
      si = si - 2
    ELSE
      si = si - 1
    END IF
    cl = cl AND 7
    ax.lo = CHR$(ASC(ax.lo) AND 3)
    IF ASC(ax.lo) <> 0 THEN
      '{ax.lo <> 0}
      ax.lo = CHR$(ASC(ax.lo) - 1)
      IF ASC(ax.lo) <> 0 THEN
        '{ax.lo <> 0}
        'Done
        IF HEX$(uncompressedSize) <> HEX$(LOF(3)) THEN
          PRINT "_decMth3: Decompressed to wrong size."
        END IF
        IF si > size THEN
          PRINT "read past input"
        END IF
        EXIT SUB
      ELSE
        '{ax.lo = 0}
        GET #2, offset + si + 1 + 2, ax
        si = si + 2
        GET #2, offset + si + 1 + 2, byte
        ax.lo = CHR$(ASC(ax.lo) XOR ASC(byte))
        ax.lo = CHR$(ASC(ax.lo) AND ((&HFF * 2 ^ cl) AND &HFF))
        ax.lo = CHR$(ASC(ax.lo) XOR ASC(byte))
        rotateRight ax, cl
        cl = cl + 9
        IF cl < &H10 THEN
          si = si - 1
        END IF
        cl = cl AND 7

        PUT #3, di + 1, ax.lo
        di = di + 1
        bp = 1
        bx = &H404
        dh = 1
        dl = 9
      END IF
    ELSE
      '{ax.lo = 0}
      dh = dh * 2 + 1
      dl = dl + 1
    END IF
  ELSE
    '{ax > &H100}
    siTemp = si
    move reg, di
    subtract reg, bp
    '{reg = di - bp}
    bp = bp + 1
    table(bx) = toInteger%(reg)
    table(bx + 2) = bp
    shiftLeft ax, 2
    si = toInteger%(ax)
    IF si < bx THEN
      '{ax < bx}
      bp = table(si + 2)
      si = table(si)
    ELSE
      '{ax >= bx}
      si = table(bx)
      '{si = di - bp + 1}
    END IF
   
    'hack to make si an unsigned integer
    IF si < 0 THEN
      si = VAL("&H" + RIGHT$(HEX$(si), 4) + "&")
    END IF
 
    FOR i% = 1 TO bp
      GET #3, si + 1, byte
      PUT #3, di + 1, byte
      si = si + 1
      di = di + 1
    NEXT i%

    si = siTemp
    bx = bx + 4
  END IF
LOOP

END SUB

SUB move (reg AS Register, l AS LONG)

unsignedInt& = VAL("&H" + HEX$(l) + "&")
reg.hi = CHR$(unsignedInt& \ 256)
reg.lo = CHR$(unsignedInt& AND &HFF)

END SUB

SUB rotateRight (reg AS Register, i AS INTEGER)

unsignedReg& = VAL("&H" + toString$(reg) + "&")

result& = unsignedReg&
FOR x% = 1 TO i
  carry% = 0
  IF result& AND 1 THEN
    carry% = 1
  END IF
  result& = result& \ 2 + carry% * &H8000&
NEXT x%

reg.hi = CHR$(result& \ 256)
reg.lo = CHR$(result& AND &HFF)

END SUB

SUB shiftLeft (reg AS Register, i AS INTEGER)

unsignedReg& = VAL("&H" + toString$(reg) + "&")

result& = unsignedReg& * (2 ^ i)
reg.hi = CHR$(result& \ 256)
reg.lo = CHR$(result& AND &HFF)

END SUB

SUB subtract (reg AS Register, i AS INTEGER)

unsignedReg& = VAL("&H" + toString$(reg) + "&")
unsignedInt& = VAL("&H" + HEX$(i) + "&")

result& = unsignedReg& - unsignedInt&
reg.hi = CHR$(result& \ 256)
reg.lo = CHR$(result& AND &HFF)

END SUB

FUNCTION toInteger% (reg AS Register)

toInteger% = VAL("&H" + toString$(reg))

END FUNCTION

FUNCTION toString$ (reg AS Register)

hi$ = HEX$(ASC(reg.hi))
IF LEN(hi$) = 1 THEN hi$ = "0" + hi$

lo$ = HEX$(ASC(reg.lo))
IF LEN(lo$) = 1 THEN lo$ = "0" + lo$

toString$ = hi$ + lo$

END FUNCTION
BTW do the following images every appear in the DOS version of Street Fighter II?
You do not have the required permissions to view the files attached to this post.
Malvineous
8-bit mega nerd
Posts: 292
Joined: March 17th, 2007, 6:40 pm
Location: Brisbane, Australia
Contact:

Re: Street Fighter II decompression algorithm (QBasic source included)

Post by Malvineous »

It's a bit tricky to read, but I'm almost wondering whether it's some relation to LZW?

dl starts at 9 and gets incremented by 1 every now and then, at some point being reset back to 9. LZW starts off by reading 9-bit values, and once it has read 511 of them it switches to 10-bit values, then 11-bit, and once it has read the largest 12-bit value it typically resets the dictionary and returns to 9-bit values again.

dh starts at 1 and it is used to AND with ax.hi, effectively limiting the ax value to 9 bits (when dh=1). When dl goes from 9 to 10, dh also goes from 1 to 3, increasing the limit applied to ax to now be 10 bits.

The whole jumping around with decrementing si and the weird XOR operations almost looks like it's extracting these 9-12 bit values out of the bytes read, although I'm not entirely sure about that.

LZW works by treating values 0x00-0xFF as themselves, and values 0x100 and larger as code words, that refer to groups of bytes earlier in the stream. But often the first few values like 0x100 are reserved for various actions, such as signalling the end of the compressed data or resetting the dictionary. Here, when ax is 0x100 it looks like some special actions might happen (end of data or resetting things back to the initial state.)

I'm not confident that it is LZW, but it does appear to have some similarities.
Frenkel
5-bit member
Posts: 41
Joined: March 3rd, 2007, 10:50 am

Re: Street Fighter II decompression algorithm (QBasic source included)

Post by Frenkel »

I was hoping you would say "That's definitely LZW" :)

Anyway, I refactored the extractor a bit by giving some variables more descriptive names and by removing some subroutines.

Code: Select all

DECLARE SUB decompress (offset AS LONG, size AS LONG)
DECLARE SUB rotateRight (reg AS ANY, i AS INTEGER)
DECLARE FUNCTION toString$ (reg AS ANY)
DECLARE FUNCTION toLong& (reg AS ANY)

DEFSTR A-Z

DIM SHARED byte AS STRING * 1

TYPE RowType
  inputFileAndCompressed AS STRING * 1
  offsetL AS STRING * 1
  offsetM AS STRING * 1
  offsetH AS STRING * 1
  sizeL AS STRING * 1
  sizeH AS STRING * 1
  extension AS STRING * 3
  filename AS STRING * 8
END TYPE

TYPE TableRowType
  sourceIndex AS LONG
  count       AS INTEGER
END TYPE

TYPE Register
  lo AS STRING * 1
  hi AS STRING * 1
END TYPE

DIM row AS RowType

DIM inputFile AS STRING * 16
DIM inputFiles(0 TO 7) AS STRING

DIM fullFilename AS STRING

DIM offset AS LONG
DIM size AS LONG



OPEN "INDEX.SF2" FOR BINARY AS #1
FOR i% = 0 TO 7
  GET #1, , inputFile
  inputFiles(i%) = LEFT$(inputFile, INSTR(inputFile, CHR$(0)) - 1)
NEXT i%

GET #1, , row
DO WHILE NOT EOF(1)
  OPEN inputFiles(ASC(row.inputFileAndCompressed) \ 16) FOR BINARY AS #2
 
  offset = ASC(row.offsetL) + ASC(row.offsetM) * 256& + ASC(row.offsetH) * 256& * 256&
  size = ASC(row.sizeL) + ASC(row.sizeH) * 256&

  fullFilename = row.filename + "." + row.extension
  OPEN fullFilename FOR BINARY AS #3
 
  PRINT fullFilename
  IF ASC(row.inputFileAndCompressed) AND 3 THEN
    'compressed
    decompress offset, size
  ELSE
    'uncompressed
    SEEK #2, offset + 1
    FOR b& = 1 TO size
      GET #2, , byte
      PUT #3, , byte
    NEXT b&
  END IF
  CLOSE #2, #3

  GET #1, , row
LOOP

END

SUB decompress (offset AS LONG, size AS LONG)

'Use the following statement in the Immediate window for debugging
'PRINT "ax=" + toString$(ax) + ", bx=" + HEX$(tableIndex) + ", cl=" + HEX$(cl) + ", dx=" + HEX$(bitMask) + " " + HEX$(nrOfBits) + ", si=" + HEX$(si) + ", di=" + HEX$(di) + ", bp=" + HEX$(count)

DIM table(&H101 TO &HFFF) AS TableRowType

DIM si AS LONG
DIM siTemp AS LONG

DIM di AS LONG

DIM ax AS Register
DIM tableIndex AS INTEGER '{&H101 <= tableIndex <= &HFFF}
DIM cl AS INTEGER '{0 <= cl <= &HFF, but most of the time 0 <= cl <= 7}
DIM bitMask AS INTEGER '{bitMask in (00000001b, 00000011b, 00000111b, 00001111b)}
DIM nrOfBits AS INTEGER '{9 <= nrOfBits <= 12}

DIM count AS INTEGER '{1 <= count}

DIM uncompressedSize AS INTEGER



GET #2, offset + 1, uncompressedSize


count = 1
tableIndex = &H101

cl = 1

bitMask = 1
nrOfBits = 9

'1 because QBasic is 1-based, 2 because of uncompressedSize
GET #2, offset + 0 + 1 + 2, byte
PUT #3, 0 + 1, byte
si = 1
di = 1

DO
  GET #2, offset + si + 1 + 2, ax
  si = si + 2
  GET #2, offset + si + 1 + 2, byte
  ax.lo = CHR$(ASC(ax.lo) XOR ASC(byte))
  ax.lo = CHR$(ASC(ax.lo) AND ((&HFF * 2 ^ cl) AND &HFF))
  ax.lo = CHR$(ASC(ax.lo) XOR ASC(byte))
  rotateRight ax, cl
  ax.hi = CHR$(ASC(ax.hi) AND bitMask)
  cl = cl + nrOfBits
  IF cl < &H10 THEN
    si = si - 1
  END IF
  cl = cl AND 7

  IF ASC(ax.hi) = 0 THEN
    '{0 <= ax <= &HFF}
    PUT #3, di + 1, ax.lo
    di = di + 1

    count = count + 1
   
    table(tableIndex).sourceIndex = di - count
    table(tableIndex).count = count

    count = 1
    tableIndex = tableIndex + 1
  ELSEIF (ASC(ax.hi) = 1) AND (ASC(ax.lo) = 0) THEN
    '{ax = &H100}

    GET #2, offset + si + 1 + 2, ax
    si = si + 2
    rotateRight ax, cl
    cl = cl + 2
    IF cl < 8 THEN
      si = si - 2
    ELSE
      si = si - 1
    END IF
    cl = cl AND 7

    ax.lo = CHR$(ASC(ax.lo) AND 3)
    '{ax.lo in (0, 1, 2, 3)}
    IF ASC(ax.lo) = 0 THEN
      '{ax.lo = 0}
      bitMask = bitMask * 2 + 1
      nrOfBits = nrOfBits + 1
    ELSE
      '{ax.lo in (1, 2, 3)}
      ax.lo = CHR$(ASC(ax.lo) - 1)
      '{ax.lo in (0, 1, 2)}
      IF ASC(ax.lo) = 0 THEN
        '{ax.lo = 0}
        'Let's start over again
        GET #2, offset + si + 1 + 2, ax
        si = si + 2
        GET #2, offset + si + 1 + 2, byte
        ax.lo = CHR$(ASC(ax.lo) XOR ASC(byte))
        ax.lo = CHR$(ASC(ax.lo) AND ((&HFF * 2 ^ cl) AND &HFF))
        ax.lo = CHR$(ASC(ax.lo) XOR ASC(byte))
        rotateRight ax, cl
        cl = cl + 9
        IF cl < &H10 THEN
          si = si - 1
        END IF
        cl = cl AND 7

        PUT #3, di + 1, ax.lo
        di = di + 1

        count = 1
        tableIndex = &H101

        bitMask = 1
        nrOfBits = 9
      ELSE
        '{ax.lo in (1, 2)}
        'Done
        IF HEX$(uncompressedSize) <> HEX$(LOF(3)) THEN
          PRINT "_decMth3: Decompressed to wrong size."
        END IF

        IF si > size THEN
          PRINT "read past input"
        END IF

        EXIT SUB
      END IF
    END IF
  ELSE
    '{&H101 <= ax}
    siTemp = si

    table(tableIndex).sourceIndex = di - count
    count = count + 1
    table(tableIndex).count = count

    si = toLong&(ax)
    IF si < tableIndex THEN
      '{ax < tableIndex}
      count = table(si).count
      si = table(si).sourceIndex
    ELSE
      '{ax >= tableIndex}
      '{count = table(tableIndex).count}
      si = table(tableIndex).sourceIndex
      '{si = di - count + 1}
    END IF

    FOR i% = 1 TO count
      GET #3, si + 1, byte
      PUT #3, di + 1, byte
      si = si + 1
      di = di + 1
    NEXT i%

    tableIndex = tableIndex + 1

    si = siTemp
  END IF
LOOP

END SUB

SUB rotateRight (reg AS Register, i AS INTEGER)

result& = toLong&(reg)
FOR x% = 1 TO i
  carry% = 0
  IF result& AND 1 THEN
    carry% = 1
  END IF
  result& = result& \ 2 + carry% * &H8000&
NEXT x%

reg.hi = CHR$(result& \ 256)
reg.lo = CHR$(result& AND &HFF)

END SUB

FUNCTION toLong& (reg AS Register)

toLong& = VAL("&H" + toString$(reg) + "&")

END FUNCTION

FUNCTION toString$ (reg AS Register)

hi$ = HEX$(ASC(reg.hi))
IF LEN(hi$) = 1 THEN hi$ = "0" + hi$

lo$ = HEX$(ASC(reg.lo))
IF LEN(lo$) = 1 THEN lo$ = "0" + lo$

toString$ = hi$ + lo$

END FUNCTION
Malvineous
8-bit mega nerd
Posts: 292
Joined: March 17th, 2007, 6:40 pm
Location: Brisbane, Australia
Contact:

Re: Street Fighter II decompression algorithm (QBasic source included)

Post by Malvineous »

I'm afraid it's still really confusing, especially as I'm not used to reading BASIC code so the rotations and bitwise operators take a bit of getting used to!

Do you have a sample file you can send me, that includes both the compressed and decompressed versions? Especially if it contains text. I will run it through my own LZW code and see if it looks at all similar.

There are a few different variations (LZSS etc.) but so far the only thing it looks remotely like to me is LZW.
Frenkel
5-bit member
Posts: 41
Joined: March 3rd, 2007, 10:50 am

Re: Street Fighter II decompression algorithm (QBasic source included)

Post by Frenkel »

In the attachments is a zip with the compressed version and the decompressed version of a Street Fighter II MIDI file.

Here's some code to play the PC speaker music and sound effects:

Code: Select all

OPEN "tun\btune1.tun" FOR BINARY AS #1
GET #1, , frequency%
DO WHILE frequency% <> &HFFFF
  GET #1, , duration%
  
  IF frequency% = 0 THEN
    SOUND 0, duration% / 2.5
  ELSE
    SOUND &H1234DD / frequency%, duration% / 2.5
  END IF
  
  GET #1, , frequency%
LOOP
Here's some code to display the 256 color graphics:

Code: Select all

DIM byte AS STRING * 1

SCREEN 13
OPEN "pal\guilout3.pal" FOR BINARY AS #1
OUT &H3C8, 256 - (LOF(1) \ 3)
GET #1, , byte
DO WHILE NOT EOF(1)
  OUT &H3C9, ASC(byte)
  GET #1, , byte
LOOP
CLOSE #1

filename$ = "lin\guilout3.lin"
OPEN filename$ FOR BINARY AS #1
size& = LOF(1)
IF size& = 64000 THEN
  wid% = 320
ELSEIF size& = 48128 THEN wid% = 256
ELSEIF size& = 48000 THEN wid% = 320
ELSEIF size& = 44160 THEN wid% = 320
ELSEIF size& = 36864 THEN wid% = 288
ELSEIF size& = 32768 THEN wid% = 256
ELSEIF size& = 31360 THEN wid% = 224
ELSEIF size& = 30208 THEN wid% = 236
ELSEIF size& = 28928 THEN
  IF RIGHT$(filename$, 12) = "guilout3.lin" THEN
    wid% = 256
  ELSEIF RIGHT$(filename$, 12) = "ryuuout2.lin" THEN
    wid% = 226
  ELSE
    PRINT size&
    END
  END IF
ELSEIF size& = 27392 THEN wid% = 214
ELSEIF size& = 26880 THEN wid% = 240
ELSEIF size& = 26862 THEN wid% = 242
ELSEIF size& = 25312 THEN wid% = 226
ELSEIF size& = 25088 THEN wid% = 224
ELSEIF size& = 23296 THEN wid% = 208
ELSEIF size& = 21008 THEN wid% = 208
ELSEIF size& = 19888 THEN wid% = 176
ELSEIF size& = 11264 THEN wid% = 64
ELSEIF size& = 3584 THEN wid% = 64
ELSEIF size& = 1920 THEN wid% = 64
ELSEIF size& = 1024 THEN wid% = 32
ELSEIF size& = 576 THEN wid% = 24
ELSEIF size& = 224 THEN wid% = 16
ELSEIF size& = 48 THEN wid% = 4
ELSE
  PRINT size&
  END
END IF

GET #1, , byte
DO WHILE NOT EOF(1)
  PSET (x%, y%), ASC(byte)
  
  x% = x% + 1
  IF x% = wid% THEN
    x% = 0
    y% = y% + 1
  END IF
 
  GET #1, , byte
LOOP
CLOSE
Here's some code to display the 16 color sprites:

Code: Select all

DIM byte AS STRING * 1

SCREEN 13
OPEN "pal\carfrm00.pal" FOR BINARY AS #1
OUT &H3C8, 0
GET #1, , byte
DO WHILE NOT EOF(1)
  OUT &H3C9, ASC(byte)
  GET #1, , byte
LOOP
CLOSE #1

OPEN "spr\carfrm00.spr" FOR BINARY AS #1
GET #1, , byte
w% = ASC(byte)
GET #1, , byte
h% = ASC(byte)

x% = 0
y% = 0
GET #1, , byte
DO WHILE NOT EOF(1)
  c% = 1
  IF ASC(byte) = 0 THEN
    GET #1, , byte
    c% = ASC(byte)
    byte = CHR$(0)
  END IF

  FOR i% = 1 TO c%
    PSET (x%, y%), ASC(byte) \ 16
    x% = x% + 1
    PSET (x%, y%), ASC(byte) AND &HF
    x% = x% + 1
    IF x% = w% THEN
      x% = 0
      y% = y% + 1
    END IF
  NEXT i%

  GET #1, , byte
LOOP
CLOSE #1
END
And here's some code to display the 16 color character sprites. Both the top and the bottom part.

Code: Select all

DECLARE FUNCTION fileExists% (filename AS STRING)
DECLARE FUNCTION toString$ (i AS INTEGER)
DECLARE SUB drawCharacter (filename AS STRING, ox AS INTEGER, oy AS INTEGER)

DIM SHARED byte AS STRING * 1

CONST FALSE = 0
CONST TRUE = NOT FALSE

'CONST CHARACTER = "RYUT"
'CONST CHARACTER = "HOND"
'CONST CHARACTER = "BLAN"
'CONST CHARACTER = "GUIL"
CONST CHARACTER = "KENN"
'CONST CHARACTER = "CHLI"
'CONST CHARACTER = "ZANG"
'CONST CHARACTER = "DHAL"
'CONST CHARACTER = "BALR"
'CONST CHARACTER = "VEGA"
'CONST CHARACTER = "SAGE"
'CONST CHARACTER = "BISO"

SCREEN 13
OPEN "pal\" + CHARACTER + "t001.pal" FOR BINARY AS #1
OUT &H3C8, 0
GET #1, , byte
DO WHILE NOT EOF(1)
  OUT &H3C9, ASC(byte)
  GET #1, , byte
LOOP
CLOSE #1
'OUT &H3C8, 0
'OUT &H3C9, 63
'OUT &H3C9, 63
'OUT &H3C9, 63

t% = 0
tx% = 22
ty% = 0
b% = 0
bx% = 0
by% = 47

DO
  CLS
  LOCATE 21, 15
  drawCharacter "SPR\" + CHARACTER + "T" + toString$(t%) + ".SPR", tx%, ty%
  LOCATE 22, 15
  drawCharacter "SPR\" + CHARACTER + "B" + toString$(b%) + ".SPR", bx%, by%
  k$ = INKEY$
  DO WHILE k$ = ""
    k$ = INKEY$
  LOOP
  IF k$ = CHR$(27) THEN
    END
  ELSEIF k$ = "q" THEN t% = t% - 1: b% = b% - 1
  ELSEIF k$ = "w" THEN t% = t% + 1: b% = b% + 1
  ELSEIF k$ = "a" THEN t% = t% - 1
  ELSEIF k$ = "s" THEN t% = t% + 1
  ELSEIF k$ = "z" THEN b% = b% - 1
  ELSEIF k$ = "x" THEN b% = b% + 1
  
  ELSEIF k$ = "d" THEN tx% = tx% - 1
  ELSEIF k$ = "g" THEN tx% = tx% + 1
  ELSEIF k$ = "r" THEN ty% = ty% - 1
  ELSEIF k$ = "f" THEN ty% = ty% + 1
 
  ELSEIF k$ = "h" THEN bx% = bx% - 1
  ELSEIF k$ = "k" THEN bx% = bx% + 1
  ELSEIF k$ = "u" THEN by% = by% - 1
  ELSEIF k$ = "j" THEN by% = by% + 1
  ELSE BEEP
  END IF
LOOP

SUB drawCharacter (filename AS STRING, ox AS INTEGER, oy AS INTEGER)
IF fileExists%(filename) THEN
  PRINT filename; ox; oy

  OPEN filename FOR BINARY AS #1
  GET #1, , byte
  w% = ASC(byte)
  GET #1, , byte
  h% = ASC(byte)

  x% = 0
  y% = 0
  GET #1, , byte
  DO WHILE NOT EOF(1)
    IF ASC(byte) = 0 THEN
      GET #1, , byte
      FOR i% = 1 TO ASC(byte)
        x% = x% + 2
        IF x% = w% THEN
          x% = 0
          y% = y% + 1
        END IF
      NEXT i%
    ELSE
      PSET (x% + ox, y% + oy), ASC(byte) \ 16
      x% = x% + 1
      PSET (x% + ox, y% + oy), ASC(byte) AND &HF
      x% = x% + 1
      IF x% = w% THEN
        x% = 0
        y% = y% + 1
      END IF
    END IF

    GET #1, , byte
  LOOP
  CLOSE #1
ELSE
  PRINT filename; " unknown"
END IF

END SUB

FUNCTION fileExists% (filename AS STRING)
OPEN filename FOR BINARY AS #1
filesize& = LOF(1)
CLOSE #1
IF filesize& = 0 THEN
  KILL filename
  fileExists% = FALSE
ELSE
  fileExists% = TRUE
END IF
END FUNCTION

FUNCTION toString$ (i AS INTEGER)
IF (0 <= i) AND (i < 10) THEN
  toString$ = "00" + LTRIM$(STR$(i))
ELSEIF (10 <= i) AND (i < 100) THEN
  toString$ = "0" + LTRIM$(STR$(i))
ELSE
  toString$ = LTRIM$(STR$(i))
END IF
END FUNCTION
You do not have the required permissions to view the files attached to this post.
Malvineous
8-bit mega nerd
Posts: 292
Joined: March 17th, 2007, 6:40 pm
Location: Brisbane, Australia
Contact:

Re: Street Fighter II decompression algorithm (QBasic source included)

Post by Malvineous »

Thanks for that, that helps a lot!

It's definitely LZW and a pretty standard variant at that. If you look at the file as nine bits per byte, little-endian order, then you can clearly see the bytes:

Image

Each byte is numbered starting at 0x100 (although 0x100 itself is skipped and used for what looks like clearing the dictionary.)

So the bytes look like this:

Code: Select all

04D 054 068 064 000 105
100 101 102 103 104 105
Whenever a code is used (0x100 or larger) that byte gets replaced with the matching byte value plus the byte before the matching one. So here 0x101 refers to the second byte (0x054) and later on your can see 0x101 is used and it gets replaced with "MT" (part of "MTrk"), which is the 0x054 byte and the one before it.

According to libgamearchive, a number of games use 9-12 bit LZW with 0x100 as a reset codeword. Stellar 7 is one, and Monster Bash comes close except it uses 0x100 to signal the end of the data stream instead. (Monster Bash also adds an RLE layer on top of LZW.)

The only complication is that no game has self-referential codewords. In the above example, you can see that 0x105 refers to itself. This means 0x105 represents the byte before it (0x00) followed by 0x105, which represents the byte before it (0x00)... an infinite loop. I guess in the algorithm they zeroed the dictionary first, so 0x105 was the first 0x00 value found. This will likely require a small change to any decompression code, depending on how it is handled.

Long story short, I can say with certainty that it's LZW, little-endian, the initial state is 9-bit bytes, final state is 12-bit bytes, 0x100 is a reserved codeword, and codes referencing themselves are replaced with 0x00 (plus the preceding byte as per normal.) I can say with an educated guess that code 0x100 resets the dictionary and returns the state to 9-bit bytes, and once 12-bit bytes are reached the state is probably not reset automatically back to 9-bits (otherwise there'd be no need for the 0x100 code.)

The first two bytes are a little-endian 16-bit integer containing the decompressed file size, which needs to be removed before the LZW algorithm begins. Any standard LZW code you find online should be able to work with this algorithm with minimal modification.

I hope that answers your question!
Post Reply