My Bare Metals

Why You Started

Watching Revolution OS will inspire you to learn Linux and understand what Linux brings to the table when it comes to development.

Von Neumann’s work (and, of course, that of the people who worked with him before his draft papers), ENIAC, and the realization that someone, at some point, wrote the very first assembly code in the hope of getting dead electronic circuitry to respond - all of this inspires me to learn Assembly. Before that, we were using punch cards, which, if I’m not mistaken, were originally used in the textile industry (the Jacquard loom).

It’s not just about writing code for me; it’s about a deep desire to truly understand how machines work at their core. I feel proud and amazed that we, as humans, have created so much from almost nothing. Sometimes, it feels like we’re magicians, pulling inventions out of thin air. All I want is to be a part of this incredible journey. I’m grateful to be in the right place: I’m aware that wars are going on elsewhere on this same planet, and it feels like fate that we have the time to sit down and understand things. I feel like I’m standing on the shoulders of giants (and no, I’m not getting into why Sir Isaac Newton said “giants” or whether he was teasing his nemesis :)).

The Fascinating World of Assembly

Learning assembly is like looking under the hood of an immensely powerful, complex machine. You gain a direct understanding of how high-level languages translate their instructions into the very steps a processor executes. It’s the language of the hardware itself, unencumbered by the abstractions you find in C, Python, or Java.

This perspective can be both empowering and awe-inspiring. When you write assembly, you’re moving values between registers, toggling flags, and manipulating memory the way the CPU does. It’s also a gateway to improving your overall coding prowess: once you understand assembly, you’ll be better placed to optimize high-level code, track down elusive bugs, and foresee performance bottlenecks before they become an issue.

Diving into Technical Aspects

Here are some core areas where assembly gives you deeper insight (my favorite being reverse engineering):

Reverse Engineering:
I keep a few scripts on my machine that are meant to run before the laptop is logged off or shut down. If someone other than me uses the laptop, the chances of it crashing are significantly higher, and it’s highly likely that the laptop will get erased. One day, however, I forgot the script’s logic and code. It had been written in C, but I had already deleted the source and kept only the executable file. I managed to reverse engineer the executable and saved my laptop from crashing.
The point is: it’s possible to spend some time and make sense of machine instructions. Of course, I’m not suggesting you use reverse engineering to rebuild Adobe Photoshop or bypass a paywall.
32-bit vs. 64-bit Nuances:
Imagine a codebase that compiles in both 32-bit and 64-bit x86 modes. The 64-bit build has additional general-purpose registers, %r8 through %r15, so it can keep more temporary values in registers, while the 32-bit build has to spill some of those values to memory because fewer registers are available.
A subtle rounding difference might appear in numeric computations. A typical cause is that a 32-bit build using the x87 FPU holds intermediates in 80-bit registers and rounds them to 32 or 64 bits whenever they are written to memory, whereas a 64-bit build normally uses SSE registers that hold every intermediate at its declared precision. The number and position of those rounding steps differ between the two builds, so the final results can diverge slightly. Examining the compiled assembly shows where each build keeps a value in a register and where it stores it to memory, and knowledge of CPU registers and architecture explains these discrepancies.
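
The effect of rounding an intermediate every time it passes through memory can be mimicked in portable C. The sketch below is only an analogy, not the code from the scenario above: the helper names are hypothetical, and `volatile` is used here as an assumption-laden way to force the running total through a float-sized memory slot on each step.

```c
#include <assert.h>

/* Hypothetical helper: add x to a running total n times. The volatile
   qualifier forces the total to be stored as a float after every step,
   so it is rounded to float precision each time. */
float sum_rounded_each_step(float x, int n) {
    assert(n >= 0);
    volatile float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        acc = acc + x;
    }
    return acc;
}

/* Hypothetical helper: the same sum, but the running total is kept in a
   double and rounded to float only once, at the end. */
float sum_rounded_once(float x, int n) {
    assert(n >= 0);
    double acc = 0.0;
    for (int i = 0; i < n; i++) {
        acc += (double)x;
    }
    return (float)acc;
}
```

Calling both helpers with `0.1f` and one million iterations gives two different floats on IEEE 754 hardware, even though the source-level arithmetic looks identical. That is the same kind of divergence you would look for when comparing the two builds.
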
Memory Management Nuances:
A small embedded system exhibited data corruption, and the high-level code gave no clear reason. Examining the linker script and the generated assembly showed that a naive recursive Fibonacci call was overflowing the stack: the recursion let the stack pointer run into a neighbouring read-only data section. The linker map revealed the collision, allowing us to adjust the memory layout and fix the issue. That insight into memory layout and stack usage was only possible through assembly-level inspection.

Below is a naive Fibonacci C example that can lead to such a problem in a memory-constrained environment:
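```
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* String constant that the linker places in read-only data (.rodata). */
static const char banner[] = "Fibonacci Demo\n";

/* Naive recursion: every call adds a stack frame, up to n frames deep. */
uint32_t fib(uint32_t n) {
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

int main(void) {
    printf("%s", banner);
    uint32_t N = 40;
    uint32_t result = fib(N);
    printf("fib(%" PRIu32 ") = %" PRIu32 "\n", N, result);
    return 0;
}
```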
In a limited-memory environment, the nested calls to fib() can exceed the small stack and overwrite whatever lies next to it, here the banner string. On small targets without memory protection, a "read-only" section that the memory map places in RAM is read-only by convention only. Observing the assembly (or the linker script) shows exactly how the stack grows and collides with .rodata. Either changing the recursion pattern (e.g., using an iterative approach) or reconfiguring the memory map prevents this corruption.
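
The iterative approach mentioned above could look something like the following. This is a minimal sketch rather than the fix that was actually applied; `fib_iter` is a hypothetical name, and the range check reflects the assumption that the result must fit in a `uint32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Iterative Fibonacci: one stack frame and two locals, however large n is. */
uint32_t fib_iter(uint32_t n) {
    assert(n <= 47); /* fib(48) no longer fits in 32 bits */
    uint32_t prev = 0; /* fib(0) */
    uint32_t curr = 1; /* fib(1) */
    if (n == 0) return prev;
    for (uint32_t i = 1; i < n; i++) {
        uint32_t next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}
```

Because the loop never calls itself, stack usage stays constant, so the stack pointer cannot creep toward neighbouring sections as n grows.
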

Instruction Set & Vector Operations:
A matrix multiplication loop for real-time image processing was too slow despite compiler optimizations. Assembly-level inspection showed that the compiler was making inefficient use of SIMD instructions. Rewriting the hot loop by hand with instructions such as movaps, mulps, and addps improved performance significantly. Below is a naive discrete Fourier transform (DFT), a different kernel whose multiply-accumulate inner loop raises the same issue:


```
#include <math.h>
#include <stdio.h>
#include <xmmintrin.h> /* SSE intrinsics; not used by the naive version below */

#ifndef M_PI
#define M_PI 3.14159265358979323846 /* not guaranteed by standard C */
#endif

#define N 8

/* Naive O(n^2) DFT of a real-valued input signal. */
void dft_naive(const float *inRe, float *outRe, float *outIm, int n) {
    for (int i = 0; i < n; i++) {
        float sumRe = 0.0f;
        float sumIm = 0.0f;
        for (int k = 0; k < n; k++) {
            float angle = 2.0f * (float)M_PI * i * k / n;
            sumRe += inRe[k] * cosf(angle);
            sumIm -= inRe[k] * sinf(angle);
        }
        outRe[i] = sumRe;
        outIm[i] = sumIm;
    }
}

int main(void) {
    float inRe[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float outRe[N] = {0};
    float outIm[N] = {0};
    dft_naive(inRe, outRe, outIm, N);
    for (int i = 0; i < N; i++) {
        printf("freq[%d] = (%f) + j(%f)\n", i, outRe[i], outIm[i]);
    }
    return 0;
}
```

The naive implementation calls cosf() and sinf() inside the inner loop, n × n times in total, which is computationally expensive. Precomputing the sine and cosine values once, and then performing the multiply-accumulate steps in batches with vector instructions such as movaps, mulps, and addps, can significantly improve performance on SIMD-capable architectures.
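
To make the "batch" idea concrete, here is a portable C sketch of a multiply-accumulate loop laid out in four independent lanes, which is the shape that mulps and addps operate on. It is an illustration under assumptions, not the hand-written assembly described above: `dot_lanes4` is a hypothetical helper, and whether a compiler actually maps it to SSE instructions depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product with four independent accumulators, one per SIMD lane. */
float dot_lanes4(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    /* Main loop: four multiplies and four adds per iteration, with no
       dependency between lanes. */
    for (; i + 4 <= n; i += 4) {
        for (int lane = 0; lane < 4; lane++) {
            acc[lane] += a[i + lane] * b[i + lane];
        }
    }

    /* Combine the lanes, then handle the leftover elements one by one. */
    float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

In the DFT above, `a` would be the input samples and `b` a row of precomputed cosine (or sine) values. Note that splitting the sum across lanes changes the order of the floating-point additions, so results can differ in the last bits from a strictly sequential loop.
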

Calling Conventions:
Integrating a fast assembly routine into a C program caused crashes around function calls. Analyzing the assembly revealed mismatched calling conventions: the two sides disagreed about which registers must be preserved across a call. Under the x86-64 System V ABI, for example, %rbx is callee-saved, so a routine that overwrites it without restoring it corrupts state that the compiled C caller still relies on. Making the assembly follow the C compiler’s calling convention resolved the crashes.
Performance & Micro-Optimizations:
In a high-frequency trading system, a critical loop caused latency spikes. Examining the assembly showed redundant instructions and memory accesses that the compiler had not optimized away. A manually unrolled loop in assembly reduced the latency significantly, demonstrating how direct control over the instruction sequence can pay off in latency-critical systems.

Taken together, these technical insights don’t just make you a “low-level coder”; they make you a well-rounded programmer who can switch between high-level design and low-level details whenever needed.


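The 32-bit/64-bit split is visible at the system-call level too. On x86 Linux, exit is system call number 1 in the 32-bit ABI and 60 in the 64-bit ABI:

```
[Ilya@m87 ~]$ cat /usr/include/asm/unistd_32.h  | grep "exit"
#define __NR_exit 1
#define __NR_exit_group 252
[Ilya@m87 ~]$ cat /usr/include/asm/unistd_64.h  | grep "exit"
#define __NR_exit 60
#define __NR_exit_group 231
```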

Jonathan Bartlett’s Book

A fantastic resource to get you started is Jonathan Bartlett’s book “Programming from the Ground Up.” It dives into x86 Linux assembly, guiding you through examples and hands-on exercises in a logical, approachable manner.

One version of the PDF was still available as of epoch 1735236236 (I’m too lazy to write dates; I use the `mtime` command for my own reasons): Programming from the Ground Up (lettersize PDF).

Bartlett’s structured approach helps demystify assembly and ties it directly to the day-to-day things programmers care about, like how function calls work or how data structures fit in memory.

My Leisure Project

If you’re looking for some informal notes and a place to see how I’ve been exploring (and explaining) some of Bartlett’s code examples, feel free to visit my GitHub repository:

github.com/1darshanpatil/Assembly

Please note, this is a leisure project, so don’t expect a polished tutorial or formal structure. However, it might offer additional perspectives on the foundational concepts, and maybe inspire you to add your own experiments and notes as you learn.
