Assembly Optimization and Tiny EXEs

Over the last week, I’ve been brushing up on my Assembler skills and taking the time to learn the basics of x64 Assembler by creating the most basic program in all of computing, the “Hello World” program.

To say I have actually learned the basics of Assembler would be a lie; all I’ve really done over the last week is find out what I like and don’t like.

But before I get to that, I had a single goal in mind for working with Assembler.

I was trying to compile ‘inline’ assembler in C++ when I learned that Microsoft’s x64 compiler flat out does not support it.

Now this doesn’t make sense to me. When you compile C or C++, the compiler creates obj files, which are then ‘linked’ to create executable files. But as I understood it, the ‘inline’ method was being deprecated because C++ in the newer versions of Visual Studio, like C# and .NET, had become managed code.

Let me repeat that. As far as I could tell, C++ under Visual Studio is managed code. That would mean a delete doesn’t occur when you tell it to; memory is garbage collected like the rest of the managed code in the .NET world.

This surprised the hell out of me.

Now, what sent me down this path with the inline assembler was that I have been considering creating my own methods for drawing graphics – using math and direct memory access – rather than leveraging libraries such as OpenGL or DirectX.

My first goal was to draw a pixel using an interrupt, which can only be done in assembler. I know that in Windows this isn’t the proper way to do things, but it was worth a shot.

Once I got that to work, I figured I could create a set of functions to handle Matrix math, Vertex drawing and such. That would remove my dependency on OpenGL and let me draw objects in 3d space, so I could directly modify memory and optimize my functions to handle the drawing without dealing with the ‘black box’ of the OpenGL libraries.

Now in the end – two things are important to me – tight control of my code – and speed/size optimization.

But once I learned that Microsoft’s C++ was, as I understood it, now managed code, and that Microsoft had intentionally put a wall between me, the developer, and direct memory management functions, I took a step back and said…

I’m glad I’m writing this. Perhaps I should not give up on C++ altogether, and instead use another compiler which does allow inline x64 assembler…

In any case, I decided to drop to the Assembler language, just to prove out my theory that I could make an interrupt call to do direct pixel draws. The idea was that, after I got this to work, I could THEN use the C language for the higher level grunt work where optimization wasn’t a concern, and leverage Assembler for code where optimization was critical and/or direct access to memory through interrupts was required.

So, with all that said, I embarked on creating a Hello World application, and had nothing but problems finding working examples for ML64, Microsoft’s 64-bit assembler.

And then I came across NASM, a tight little assembler which produced three errors for the Assembly code I’d found on the internet.

Here’s the code I wound up using:


It’s relatively simple: First, it gets a handle to the console window, second, it writes to that handle the “Hello World” message (defined at message:), and finally it exits the process with a zero value.

Ultimately, this make file (makeit.bat) handles the compilation and linking to form the executable. As you can see, it calls NASM to assemble the ASM code above into an object file, which the Microsoft link tool then links with Kernel32.lib (because of the three Windows API calls) to form the executable.


But with all this work, and my goal of creating light and fast code, the net result surprised me.


Now why in Sam’s hell would the executable file size be LARGER than the assembly code itself?

The answer’s annoying but simple. First, the inclusion of Microsoft’s lib file, Kernel32.lib, which is nothing more than a mapping to its DLL file of the same name, adds substantial bloat to the file. Second, there’s also a lot of ‘white space’ within the executable file.

Let me demonstrate. What follows is a hexadecimal view of the generated file. The clear text “Hello World” can be seen on the right-hand side, and trailing it is a long run of repeating 0x00 bytes.


Now to rewind. I get sidetracked a LOT when I code. But this in particular I considered a fundamentally important thing to understand. What was the size of an executable created in Assembler, versus that of say – the C language?

So I created a quick little C program and compiled it using the standard C libraries.

Here’s the program:


And lo and behold, for a program with PRECISELY the same output as the Assembly program that had compiled to 2k, written in C and compiled with the compiler’s default settings, I get:


Crazy, right? A program that I created in 90 bytes (90*8=720 bits) creates an executable of 53,248 bytes!

At first, I thought, ok, clearly assembler is the way to go. But then I thought, ‘let’s optimize the C settings’.

The settings that made the most noticeable difference when compiling this in C were simple and straightforward ones, Minimize Size and Favor Small Code:


From this, I saw the executable size shrink from 53k to … well.. you can see:


5,632 bytes. This was the most compact form I could get the executable in C without spending too much time on it, but it was becoming pretty clear already. Optimization of speed and size may already be proven to be better in Assembler, but the size of the executable just didn’t make sense. I would have thought that assembly would have been as close to machine language as possible, and logically, the size of the executable generated should at the very least have been no larger than the Assembly file itself.

Now in the end, I did wind up reducing the size of the executable dramatically, something I learned from those who are participating in the 4k and 64k demo competitions – and to me – this is equivalent to cheating.

I compressed the executable file using something called Crinkler (available here).

The file size speaks for itself:


As for the output, it’s the same. The first command line execution, ‘helloworld.exe’, is the uncompressed (2048 byte) version; the second, ‘hw.exe’, is the compressed (487 byte) version:


Now to be clear, this method of size reduction is excellent for code obfuscation, with its wonderfully indecipherable binary file produced:


However, this method of size reduction introduces its own problem: it decreases performance.

Why? Quite simple: the binary has to be expanded in memory dynamically into a Windows executable in order to run. Even for this simple and tiny command line binary, there’s a HUGELY noticeable difference in performance between these executables.

So in the end, while I was able to get a clean compile of Hello World and a small executable, the simple fact of the matter is that I came away believing that both NASM and ML64 aren’t clean versions of assembler which translate directly to the machine operations they’re supposed to represent.

So while the next step is to attempt direct interrupt calls leveraging INT 10 to alter specific pixels on the screen by modifying memory directly, I’m already downloading x64 Intel Assembler books to see how the assembly compilers translate the assembler instructions to machine op codes.

As much as I shiver considering this, I very well may wind up making my own compiler.

Something I’d actually considered in the past.

But with this focus I have on accelerated graphics in software, not hardware, I’m reaching the conclusion that I may be better off not spending my time figuring out how to leverage these programs and tool sets, which seem to be better at wasting my time than at actually providing me the value they should.

We’ll see.

Next up over the next couple days: INT 10 call to draw a pixel directly to screen.

THEY say it can’t be done because of Windows Protected mode isolation.

I call bullshit






I’m wanting tight, fast executable code generation – which hopefully, when I start digging into creating my own versions of

For instance, I found out I don’t like Microsoft’s ML64 assembler which comes standard with Microsoft Development products because of the larger file size it generates for output.

This is when I became aware that, unlike old school 8086 assembler where there was only one flavor of assembler, there are now many variations of assembler depending on the tool that assembles it.

So with this, I pulled down NASM, a popular assembler which is easy to use.
