diff options
author | Alexis Hovorka <[email protected]> | 2024-08-17 19:32:33 -0600 |
---|---|---|
committer | Alexis Hovorka <[email protected]> | 2024-08-17 19:32:33 -0600 |
commit | 3712c0af48b9cc68fe52729cb7aa24006e35cb5d (patch) | |
tree | 14ebf05a0335624d56b171c7f17450d0baa339bb /posts | |
parent | d2a685e50d20101a42b7d9d08d1b90bb92defcd3 (diff) |
[post] AArch64 Bootstrapping 1
Diffstat (limited to 'posts')
-rw-r--r-- | posts/20240817-aarch64-bootstrapping-1.md | 265 |
1 files changed, 265 insertions, 0 deletions
diff --git a/posts/20240817-aarch64-bootstrapping-1.md b/posts/20240817-aarch64-bootstrapping-1.md new file mode 100644 index 0000000..34e7b76 --- /dev/null +++ b/posts/20240817-aarch64-bootstrapping-1.md @@ -0,0 +1,265 @@ +--- +date: 2024-08-17 +title: AArch64 Bootstrapping 1 - Seed Groundwork +tags: +- bootstrapping +- aarch64 +- assembly +--- + +I got a new laptop[^newlaptop] recently, and in the spirit of how open and +accessible and tinkerable the whole system is, I thought it would be fun to try +and do a little bootstrapping challenge. Starting from a tiny hand-made ELF +executable "seed," I'm going to try to build a complete programming environment +from scratch. + +<!--more--> +[^newlaptop]: [MNT Pocket Reform](https://www.crowdsupply.com/mnt/pocket-reform) + +Ideally I'd like to have started off by `cat`ting raw hexadecimal into a file +and building up from there, or something to that effect, but my new laptop uses +AArch64, also known as Arm64.[^aarch64] That's the instruction set used in most +smartphones and newer Macs, and as a RISC-ish ISA it's quite a bit cleaner than +the convoluted mess x86/64 has grown into, but it's also significantly more +difficult to write by hand. I gave it a good shot, but on top of the +instruction fields being weirdly aligned and my near-complete inexperience with +directly using the instruction set, I finally relented when I realized that ELF +executables have to be little-endian,[^endian] meaning I'd have to write the +raw, unfamiliar instructions *backwards.* + +[^aarch64]: [AArch64 instruction set quick reference sheet](https://courses.cs.washington.edu/courses/cse469/19wi/arm64.pdf) +[^endian]: [Wikipedia: Endianness](https://en.wikipedia.org/wiki/Endianness) + +Fortunately, the endianness thing is a pretty straightforward problem to solve, +so I thought I'd try using that as a warmup. What I'd like is a program that +takes in a space-delimited stream of varying sizes of numbers in hexadecimal, +and spits them out the other side "compiled" to binary in little-endian byte +order. Giving it a first go in C to make sure I had my implementation concept +down right, I came up with this: + +```c:0 +// hex.c +#include <stdio.h> + +int main(void) { + char c = 0; // Input character + int count = 0; // Hex digits parsed for this number + long long v = 0; // Current value + + while ((c = getchar()) != EOF) { + + // Skip comments + if (c == '\\') { while ((c = getchar()) != EOF && c != '\n') {} } + + // Process incoming digits and add to running value + else if (('0' <= c) && (c <= '9')) { v = (v << 4) + (c - '0'); count++; } + else if (('a' <= c) && (c <= 'f')) { v = (v << 4) + (c - 'a' + 10); count++; } + + // Send the finished number out the other end + else if (c == ' ') { + + // Turn the number of hex digits into a number of bytes + if (count & 1) count++; + count >>= 1; + + // Emit each byte, littlest first + for (; count>0; count--) { + c = v & 0xFF; + putchar(c); + v >>= 8; + } + } + } +} +``` + +That worked like a charm, but C is still pretty high-level for where we're +going, so I'm not going to spend too long going over it here. Instead, I +rewrote it in assembly. Given that I went for a mostly-direct translation, odds +are there's plenty more efficiency that could be squeezed out of it in terms of +speed or instruction count, but I'm really only planning on using this on +itself and the very next stage, so I figure if it works correctly, that's good +enough. I used the GNU `as` assembler because I was already familiar with it, +and I find the ephemeral numbered labels especially nifty. + +Starting off, we set up a couple "global" variables, then begin the main loop +by pulling in a new character and checking if we've reached the end of the +input, and exiting if so. + +```armasm:0:2 +// hex.s +.section .data +buf: + .byte 0 // input/output character buffer + +.section .text +.global _start +_start: + mov x19, #0 // value + mov x20, #0 // count + +loop: + mov x8, #63 // read + mov x0, #0 // stdin + adr x1, buf + mov x2, #1 + svc 0 + + cmp x0, #0 // count of bytes read; 0 means EOF + b.ne 1f + +exit: + mov x8, #93 // exit + mov x0, #0 // error code (no error) + svc 0 +``` + +System calls are made in AArch64 Linux by loading the system call +number[^syscalls] into the register `x8`, plus the syscall arguments into some +of the other registers, and then running the "supervisor call" instruction `svc +0` which handles the transition over to the kernel's control. + +[^syscalls]: [AArch64 Linux system call reference sheet](https://arm64.syscall.sh/) + +For the `read` syscall (63), the arguments are the source file descriptor, a +pointer to the target character buffer(!), and the requested length. Note that +the call specifically requires a destination *buffer;* as far as I can tell, +there's no way to request a single char read directly into a register. So, we +need to load a register with a pointer to our one-byte buffer, then after the +syscall returns, we load the character from there. + +AArch64 has a special instruction, `adr`, for loading a register with an +absolute memory address given a program-counter-relative offset. My +understanding is that this is the idiomatic way to do this, even though it +might prove to be a bit of a pain when we're manually calculating addresses and +offsets later. + +Once the syscall has placed the byte into the buffer, we pull it out and check +to see if it's a backslash, meaning we're at the start of a comment. If so, we +enter an inner loop where we read characters repeatedly until we find the end +of the line, then we start the main loop again from the top. + +```armasm:26:2 +1: + adr x1, buf + ldurb w9, [x1] + + cmp w9, #92 // backslash; start of comment + b.ne 2f + +comment: + mov x8, #63 // read + mov x0, #0 // stdin + adr x1, buf + mov x2, #1 + svc 0 + + cmp x0, #0 + b.eq exit + + adr x1, buf + ldurb w9, [x1] + cmp w9, '\n' + b.ne comment + b loop +``` + +If it's not a comment, then it might be a hex digit we need to collect into the +current value. I broke this part up into three sections: one that finds the +value of decimal digits, one that finds the value of hex digits `a` through `f` +(case sensitive!), and one that shifts the digit into the current large value +we're building. Once that's done, we head back up to the top of the main loop +to process the next character. + +```armasm:49:2 +2: + cmp w9, '0' + b.lt 3f + cmp w9, '9' + b.gt 3f + + sub w9, w9, '0' + b 4f + +3: + cmp w9, 'a' + b.lt 5f + cmp w9, 'f' + b.gt 5f + + sub w9, w9, ('a' - 10) + b 4f + +4: + lsl x19, x19, #4 + add x19, x19, w9, uxtb + add x20, x20, #1 + b loop +``` + +Note the use of the... flag? `uxtb` in the `add` instruction on line 69. This +signals the processor to treat the value in `w9` as an unsigned (i.e. +zero-extended) byte for the purposes of the addition. Technically we only need +a nibble (4 bits, not 8), but since we know the range of values in `w9` is only +0--15, byte-sized extension works fine. To be honest, you could probably just +use the full-width `x9` and avoid needing to specify an extension scheme at +all, but I felt like it would be nice to keep using `w9` for consistency. It +also showed me something about the ISA structure that I wouldn't otherwise have +thought to look into. + +If it's not a comment and it's not a hex digit, then it's either a character +we're ignoring (like a line break) or it's a space, which we use as the cue to +flush the current value to the output. + +```armasm:73:2 +5: + cmp w9, ' ' + b.ne loop +``` + +To flush the output, first we normalize the count of nibbles into a count of +bytes: we round up (i.e. increment if it's odd), then we shift right one bit +(dividing the count by two). The `tst` instruction used here differs from the +`cmp` instruction we've been using everywhere else in that it does the +comparison using a bitwise "and" operation, rather than subtraction. + +```armasm:77:2 + tst x20, #1 + cinc x20, x20, ne + lsr x20, x20, #1 +``` + +Finally, we enter a loop for each byte. We store the unsigned low-order byte +from our big value in `x19` (accessed through the 32-bit sized `w19`), send it +out using the `write` syscall, shift our big value over by 8 to put the next +byte in the low-order spot, then decrement the byte count until we're done +handling this value and we jump back up to the top of the main loop to start on +the next value. + +```armasm:81:2 +6: + cmp x20, #0 + b.eq loop + + mov x8, #64 // write + mov x0, #1 // stdout + adr x1, buf + sturb w19, [x1] + mov x2, #1 + svc 0 + + lsr x19, x19, #8 + + sub x20, x20, #1 + b 6b +``` + +That wasn't too bad! In the next post, we'll test this new "language" out with +a few small executables, then rewrite this program in hex itself, then *maybe* +start on the next stage, a tiny little Forth implementation. + +Until then! :wave: + +--- + +Files: [hex.c](/20240817-aarch64-bootstrapping-1/hex.c) [hex.s](/20240817-aarch64-bootstrapping-1/hex.s) |