--- date: 2024-08-17 title: AArch64 Bootstrapping 1 - Seed Groundwork tags: - bootstrapping - aarch64 - assembly --- I got a new laptop[^newlaptop] recently, and in the spirit of how open and accessible and tinkerable the whole system is, I thought it would be fun to try and do a little bootstrapping challenge. Starting from a tiny hand-made ELF executable "seed," I'm going to try to build a complete programming environment from scratch. [^newlaptop]: [MNT Pocket Reform](https://www.crowdsupply.com/mnt/pocket-reform) Ideally I'd like to have started off by `cat`ting raw hexadecimal into a file and building up from there, or something to that effect, but my new laptop uses AArch64, also known as Arm64.[^aarch64] That's the instruction set used in most smartphones and newer Macs, and as a RISC-ish ISA it's quite a bit cleaner than the convoluted mess x86/64 has grown into, but it's also significantly more difficult to write by hand. I gave it a good shot, but on top of the instruction fields being weirdly aligned and my near-complete inexperience with directly using the instruction set, I finally relented when I realized that ELF executables have to be little-endian,[^endian] meaning I'd have to write the raw, unfamiliar instructions *backwards.* [^aarch64]: [AArch64 instruction set quick reference sheet](https://courses.cs.washington.edu/courses/cse469/19wi/arm64.pdf) [^endian]: [Wikipedia: Endianness](https://en.wikipedia.org/wiki/Endianness) Fortunately, the endianness thing is a pretty straightforward problem to solve, so I thought I'd try using that as a warmup. What I'd like is a program that takes in a space-delimited stream of varying sizes of numbers in hexadecimal, and spits them out the other side "compiled" to binary in little-endian byte order. Giving it a first go in C to make sure I had my implementation concept down right, I came up with this: ```c:0 // hex.c #include int main(void) { char c = 0; // Input character int count = 0; // Hex digits parsed for this number long long v = 0; // Current value while ((c = getchar()) != EOF) { // Skip comments if (c == '\\') { while ((c = getchar()) != EOF && c != '\n') {} } // Process incoming digits and add to running value else if (('0' <= c) && (c <= '9')) { v = (v << 4) + (c - '0'); count++; } else if (('a' <= c) && (c <= 'f')) { v = (v << 4) + (c - 'a' + 10); count++; } // Send the finished number out the other end else if (c == ' ') { // Turn the number of hex digits into a number of bytes if (count & 1) count++; count >>= 1; // Emit each byte, littlest first for (; count>0; count--) { c = v & 0xFF; putchar(c); v >>= 8; } } } } ``` That worked like a charm, but C is still pretty high-level for where we're going, so I'm not going to spend too long going over it here. Instead, I rewrote it in assembly. Given that I went for a mostly-direct translation, odds are there's plenty more efficiency that could be squeezed out of it in terms of speed or instruction count, but I'm really only planning on using this on itself and the very next stage, so I figure if it works correctly, that's good enough. I used the GNU `as` assembler because I was already familiar with it, and I find the ephemeral numbered labels especially nifty. Starting off, we set up a couple "global" variables, then begin the main loop by pulling in a new character and checking if we've reached the end of the input, and exiting if so. ```armasm:0:2 // hex.s .section .data buf: .byte 0 // input/output character buffer .section .text .global _start _start: mov x19, #0 // value mov x20, #0 // count loop: mov x8, #63 // read mov x0, #0 // stdin adr x1, buf mov x2, #1 svc 0 cmp x0, #0 // count of bytes read; 0 means EOF b.ne 1f exit: mov x8, #93 // exit mov x0, #0 // error code (no error) svc 0 ``` System calls are made in AArch64 Linux by loading the system call number[^syscalls] into the register `x8`, plus the syscall arguments into some of the other registers, and then running the "supervisor call" instruction `svc 0` which handles the transition over to the kernel's control. [^syscalls]: [AArch64 Linux system call reference sheet](https://arm64.syscall.sh/) For the `read` syscall (63), the arguments are the source file descriptor, a pointer to the target character buffer(!), and the requested length. Note that the call specifically requires a destination *buffer;* as far as I can tell, there's no way to request a single char read directly into a register. So, we need to load a register with a pointer to our one-byte buffer, then after the syscall returns, we load the character from there. AArch64 has a special instruction, `adr`, for loading a register with an absolute memory address given a program-counter-relative offset. My understanding is that this is the idiomatic way to do this, even though it might prove to be a bit of a pain when we're manually calculating addresses and offsets later. Once the syscall has placed the byte into the buffer, we pull it out and check to see if it's a backslash, meaning we're at the start of a comment. If so, we enter an inner loop where we read characters repeatedly until we find the end of the line, then we start the main loop again from the top. ```armasm:26:2 1: adr x1, buf ldurb w9, [x1] cmp w9, #92 // backslash; start of comment b.ne 2f comment: mov x8, #63 // read mov x0, #0 // stdin adr x1, buf mov x2, #1 svc 0 cmp x0, #0 b.eq exit adr x1, buf ldurb w9, [x1] cmp w9, '\n' b.ne comment b loop ``` If it's not a comment, then it might be a hex digit we need to collect into the current value. I broke this part up into three sections: one that finds the value of decimal digits, one that finds the value of hex digits `a` through `f` (case sensitive!), and one that shifts the digit into the current large value we're building. Once that's done, we head back up to the top of the main loop to process the next character. ```armasm:49:2 2: cmp w9, '0' b.lt 3f cmp w9, '9' b.gt 3f sub w9, w9, '0' b 4f 3: cmp w9, 'a' b.lt 5f cmp w9, 'f' b.gt 5f sub w9, w9, ('a' - 10) b 4f 4: lsl x19, x19, #4 add x19, x19, w9, uxtb add x20, x20, #1 b loop ``` Note the use of the... flag? `uxtb` in the `add` instruction on line 69. This signals the processor to treat the value in `w9` as an unsigned (i.e. zero-extended) byte for the purposes of the addition. Technically we only need a nibble (4 bits, not 8), but since we know the range of values in `w9` is only 0--15, byte-sized extension works fine. To be honest, you could probably just use the full-width `x9` and avoid needing to specify an extension scheme at all, but I felt like it would be nice to keep using `w9` for consistency. It also showed me something about the ISA structure that I wouldn't otherwise have thought to look into. If it's not a comment and it's not a hex digit, then it's either a character we're ignoring (like a line break) or it's a space, which we use as the cue to flush the current value to the output. ```armasm:73:2 5: cmp w9, ' ' b.ne loop ``` To flush the output, first we normalize the count of nibbles into a count of bytes: we round up (i.e. increment if it's odd), then we shift right one bit (dividing the count by two). The `tst` instruction used here differs from the `cmp` instruction we've been using everywhere else in that it does the comparison using a bitwise "and" operation, rather than subtraction. ```armasm:77:2 tst x20, #1 cinc x20, x20, ne lsr x20, x20, #1 ``` Finally, we enter a loop for each byte. We store the unsigned low-order byte from our big value in `x19` (accessed through the 32-bit sized `w19`), send it out using the `write` syscall, shift our big value over by 8 to put the next byte in the low-order spot, then decrement the byte count until we're done handling this value and we jump back up to the top of the main loop to start on the next value. ```armasm:81:2 6: cmp x20, #0 b.eq loop mov x8, #64 // write mov x0, #1 // stdout adr x1, buf sturb w19, [x1] mov x2, #1 svc 0 lsr x19, x19, #8 sub x20, x20, #1 b 6b ``` That wasn't too bad! In the next post, we'll test this new "language" out with a few small executables, then rewrite this program in hex itself, then *maybe* start on the next stage, a tiny little Forth implementation. Until then! :wave: --- Files: [hex.c](/20240817-aarch64-bootstrapping-1/hex.c) [hex.s](/20240817-aarch64-bootstrapping-1/hex.s)