NOTICE! Use of this site signifies acceptance of the Contract for site usage

CODEGEN is a disassembler that I wrote, based on the VIM instruction decoder.

While it is limited to COM programs (and other non-segmented files, such as boot records, etc.) and small-memory-model EXE programs, it is more intelligent and more flexible than any other disassembler that I've seen.

For some strange reason, I feel very comfortable with the formatting of the CODEGEN output. It's almost like code I would write. :)

At one time, CODEGEN was sold as a commercial product by "Best In Class Software". I'm trying to remember how I first got in touch with the company; I think they saw the VMDEBUG (AKA VIM) program being sold by Wendin, which had CODEGEN bundled as a freebie utility. They weren't interested in the debugger, but liked the disassembler. I did some work on it, packaged it separately, and it was sold by them. It actually sold enough copies to pay for the manuals I had printed up.

The reason I wrote CODEGEN is that, while VIM was a good debugger, its output wasn't right for a disassembler. And other disassemblers I tried (including, later, the "Gold Standard Commercial Disassembler" SOURCER) were not good enough. So, I took the instruction decoder from VIM, added disassembly flow logic, and CODEGEN was born.

Useless trivia department:
When I first tried SOURCER (see above), I found many shortcomings compared to CODEGEN. I sent back a registration/comment card with some scathing remarks. I actually got a letter in reply, which addressed my comments, and included a new version with many upgrades based on my comments. This was good.

Later, (a year at least, as I remember) I was adding 8087/80287 instructions to CODEGEN, and used SOURCER as a check of my work. The outputs didn't match. After a careful check, I found that CODEGEN was right -- SOURCER was ignoring the bit that logically swapped source and destination registers (thus making about half the disassembled 8087 opcode space wrong). When I talked to SOURCER tech support, they didn't seem too upset -- 8087 support had been out for over a year, and I was the first person to complain about it. {sigh...} This was bad.
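
To make the swapped-operand issue concrete, here is a minimal sketch in C. It is not CODEGEN's code, and I can't say exactly which bit SOURCER mishandled; it only shows the standard encoding of the register-to-register arithmetic forms, where opcode DCh differs from D8h by one bit and makes ST(i) the destination instead of ST(0). The names decode_x87_arith, x87_insn, and x87_op are made up for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative names only; none of these come from CODEGEN. */
enum x87_op { X87_FADD, X87_FMUL, X87_FSUB, X87_FSUBR, X87_FDIV, X87_FDIVR, X87_NONE };

struct x87_insn {
    enum x87_op op;   /* X87_NONE if the bytes are outside this sketch's scope */
    unsigned dest;    /* stack register index of the destination */
    unsigned src;     /* stack register index of the source */
};

const char *x87_name(enum x87_op op)
{
    static const char *const names[] = {
        "FADD", "FMUL", "FSUB", "FSUBR", "FDIV", "FDIVR"
    };
    assert(op < X87_NONE);
    return names[op];
}

/* Decode opcode D8h or DCh followed by a ModRM byte with mod == 11
   (register-to-register form). Everything else returns X87_NONE. */
struct x87_insn decode_x87_arith(unsigned char opcode, unsigned char modrm)
{
    /* Operation selected by the ModRM reg field when ST(0) is the destination.
       Fields 2 and 3 (FCOM/FCOMP) are left out of this sketch. */
    static const enum x87_op by_reg[8] = {
        X87_FADD, X87_FMUL, X87_NONE, X87_NONE,
        X87_FSUB, X87_FSUBR, X87_FDIV, X87_FDIVR
    };
    struct x87_insn r = { X87_NONE, 0, 0 };
    unsigned reg = (modrm >> 3) & 7;
    unsigned i = modrm & 7;

    if ((opcode != 0xD8 && opcode != 0xDC) || (modrm & 0xC0) != 0xC0)
        return r;
    if (by_reg[reg] == X87_NONE)
        return r;

    if (opcode & 0x04) {
        /* DCh: destination is ST(i). The subtract and divide pairs also
           trade encodings here (reg 4 <-> 5, 6 <-> 7). */
        r.op = by_reg[reg >= 4 ? (reg ^ 1) : reg];
        r.dest = i;
        r.src = 0;
    } else {
        /* D8h: destination is ST(0). */
        r.op = by_reg[reg];
        r.dest = 0;
        r.src = i;
    }
    return r;
}

/* Print in the usual "MNEMONIC dest,src" form. */
void print_x87(struct x87_insn in)
{
    printf("%s ST(%u),ST(%u)\n", x87_name(in.op), in.dest, in.src);
}
```

A decoder that ignores the bit would print the same operand order for both D8h and DCh, which is the kind of mismatch a side-by-side comparison of two disassemblers would show.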

A feature that was added later in CODEGEN's life was simulated instruction execution to do register tracking. If the contents of the registers were known, CODEGEN could meaningfully comment DOS/BIOS calls. This was a great help.
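
As a rough idea of what register tracking buys you, here is a small C sketch. It is an assumption-laden toy, not the CODEGEN implementation: it tracks only AH, understands only MOV-immediate and INT, and the names reg_state, dos_function_name, and comment_first_dos_call are invented for the example. The three INT 21h functions in the table are standard DOS calls.

```c
#include <stddef.h>

/* Illustrative only: tracks just AH, and whether its value is known. */
struct reg_state {
    unsigned char ah;
    int ah_known;
};

/* Tiny hypothetical comment table for a few INT 21h functions. */
const char *dos_function_name(unsigned char ah)
{
    switch (ah) {
    case 0x09: return "DOS: print string at DS:DX";
    case 0x3D: return "DOS: open file";
    case 0x4C: return "DOS: terminate with return code";
    default:   return NULL;
    }
}

/* Walk a straight-line fragment, simulating MOV reg,imm, and return a
   comment for the first INT 21h whose AH value is known. Returns NULL
   if AH is unknown or an instruction outside the sketch is reached. */
const char *comment_first_dos_call(const unsigned char *code, unsigned len)
{
    struct reg_state st = { 0, 0 };
    unsigned pc = 0;

    while (pc < len) {
        unsigned char op = code[pc];

        if (op == 0xB4 && pc + 1 < len) {            /* MOV AH,imm8 */
            st.ah = code[pc + 1];
            st.ah_known = 1;
            pc += 2;
        } else if (op == 0xB8 && pc + 2 < len) {     /* MOV AX,imm16 */
            st.ah = code[pc + 2];                    /* high byte, little-endian */
            st.ah_known = 1;
            pc += 3;
        } else if (op >= 0xB0 && op <= 0xB7 && pc + 1 < len) {
            pc += 2;                                 /* MOV other r8,imm8: AH untouched */
        } else if (op >= 0xB9 && op <= 0xBF && pc + 2 < len) {
            pc += 3;                                 /* MOV other r16,imm16: AH untouched */
        } else if (op == 0xCD && pc + 1 < len) {     /* INT imm8 */
            if (code[pc + 1] == 0x21)
                return st.ah_known ? dos_function_name(st.ah) : NULL;
            st.ah_known = 0;                         /* other interrupts may change AH */
            pc += 2;
        } else {
            return NULL;                             /* not modeled in this sketch */
        }
    }
    return NULL;
}
```

For example, the bytes B4 09 BA 00 02 CD 21 (MOV AH,09h / MOV DX,0200h / INT 21h) would get the "print string" comment, while a bare CD 21 with no known AH would get none.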

One principle that I scrupulously kept to in writing CODEGEN is that it MUST NOT try to outguess the user. In particular, the disassembled code must be EXACTLY the code that is present in the file. "High end" disassemblers like SOURCER have a "target assembler" option which, even when set to "none" (which should give you the raw disassembly), gives you the code that it thinks will re-assemble into the file you're disassembling. From my point of view, this is worthless -- I want to know what instructions are in the code, NOT someone's interpretation of what the code SHOULD be.

The CODEGEN documentation is an ASCII TEXT file. It should be pretty self-explanatory. Note that, while this is specific to version 1.30A (see below), it should be just fine for version 1.20A (again, see below).

The CODEGEN executable runs under DOS 2.0 and above, and wants 256K (or more) of RAM beyond DOS. This should not be a problem these days. :)

I actually keep and use CODEGEN in two "flavors" -- CODEGEN V1.20A and CODEGEN V1.30A

The difference is that version 1.30A adds 80x87 support. This can be a BAD thing, unless the program you are analyzing uses the 80x87. The reason for this is that in recognizing the additional opcodes, there is a smaller undefined opcode space. So a disassembly that has gone "off track" has a smaller chance of hitting an invalid opcode and stopping the disassembly.

I suppose I could have changed the opcode setup so that 80x87 could be toggled on and off, like it currently toggles between the 8086/80186/80286, but I didn't do that. Oh, well...
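
For what it's worth, a toggle like that might look something like the C sketch below. This is purely hypothetical and is not how either CODEGEN version is built: FEATURE_X87, opcode_is_defined, and count_undefined are invented names, and every byte outside the coprocessor escape range (D8h through DFh) is treated as defined as a placeholder where a real decoder would consult its per-CPU opcode table.

```c
/* Hypothetical feature flag; not taken from the CODEGEN source. */
#define FEATURE_X87 0x01u

/* Nonzero if the first opcode byte should be decoded as an instruction. */
int opcode_is_defined(unsigned char op, unsigned features)
{
    if (op >= 0xD8 && op <= 0xDF)      /* coprocessor escape opcodes */
        return (features & FEATURE_X87) != 0;

    /* PLACEHOLDER: a real decoder would look up the per-CPU table here. */
    return 1;
}

/* Count the first-byte values on which a runaway disassembly would stop. */
unsigned count_undefined(unsigned features)
{
    unsigned op, n = 0;

    for (op = 0; op < 256; op++)
        if (!opcode_is_defined((unsigned char)op, features))
            n++;
    return n;
}
```

With the flag off, the eight escape bytes stay in the undefined space and act as tripwires for a disassembly that has gone off track; with it on, they decode as 80x87 instructions and the tripwires are gone.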

Both executable packages also come with some additional files:

CODEGEN.DOC -- documentation for CODEGEN (ASCII text)
FINDROM.EXE -- helper program referred to in CODEGEN.DOC
ADDCOM.EXE -- helper program referred to in CODEGEN.DOC
UNPACK.EXE -- helper program referred to in CODEGEN.DOC

The CODEGEN source also comes in two "flavors" -- CODEGEN V1.20A source and CODEGEN V1.30A source

Being originally derived from the VIM source, they share some of its ugly code, but I don't think they are QUITE as bad.

Both "flavors" of CODEGEN use the DeSmet C compiler.

If you would like to discuss the disassembly flow and algorithms, please email me.


These pages last modified 1/10/2001