Alternative title: how to implement floating point arithmetic from scratch (the hard way)
Introduction
The above picture might not look terribly impressive, but a closer inspection should reveal something strange: doesn’t the precision seem to be a bit too low for a modern Prolog system (sadly, I’m not running my interpreter in MICRO-PROLOG)?
In this entry I grab the bull by the horns and completely rewrite the arithmetic predicates of the interpreter, previously relying on Prolog’s built-in is/2, to my homebrewed, purely logical implementation. I’d be the first one to admit that this might not sound like a worthwhile endevour, but of all the things I’ve considered in this project, this is perhaps the only contribution which in principle might be of some use. Writing a high-level interpreter for a programming language is much easier than writing a full-blown emulator, and not all language features need a low-level implementation. Hence, with this entry I want to show that we even in Prolog, where we in general have little control over such matters, can simulate low-level features of imperative programming languages without too much effort. The problem is that it’s rather unlikely that there’s any BASIC-80 software still in use where inaccurate floating point arithmetic would be important, but one never knows!
Constraints
I decided to to implement these arithmetic operations without using Prolog’s built-in arithmetical support (is/2 and friends). Mainly because it’s fun to build something from scratch, but also to refrain from using is/2 unless it actually simplifies the problem to be solved. The sole exception to this rule were predicates for outputting integers and floating point numbers where it’s more helpful to print the (floating point) number represented by the byte sequence rather than just printing the internal representation, and it’s easier to display these values if one is allowed to use exponentiation and multiplication.
Representation
Before we begin I have a confession to make: I initally made a quite poor choice regarding representation and then stubbornly stuck with it. ‘Poor’ is perhaps not the right word; ‘bizzare’ is probably closer to the truth. For the moment, let’s ignore floating point numbers and simply assume that we want to represent natural numbers. The easy representation, which can be found in many textbooks and which is typically taught to students, is to represent the number ‘0’ by itself and then represent any larger number by a nested term of the form s(…(s(0))). This is sometimes called a unary representation. What’s the problem? Representing a number requires a nested term of depth
which is exponentially worse than a binary representation where a number
can be represented with only log(n) bits. Such an inefficiency which would propagate through all arithmetical predicates of interest (addition, multiplication, and so on), and render any non-trivial program more or less useless. Moreover, while (a generalisation of) this representation could be used to represent rational numbers, it would not be meaningful to use it to represent floating point numbers in a fixed format, nor to accurately simulate e.g. overflow. Thus, our representation of numbers (and later on, floating point numbers) is going to be as sequences of bits, for which we have two obvious choices.
- As a list of bits [b1, …, bn].
- As a flat term bits(b1,…,bn).
But since we want to represent both integers (2 bytes) and floating point numbers in the MSB format (4 bytes), the latter representation would result in a lot of redundant and tedious code (just trust me on this!). However, there’s a compromise: we could represent bytes as flat terms of the form byte(b1,…,b8) and then (e.g.) represent an integer as int([Byte1, Byte2]). Then we would only have to write low-level predicates operating on bytes, but not arbitrary sequences of bits, which would be very inconvenient when working with flat terms. My arguments in favour of this representation when compared to the bit list representation were as follows:
- A list of bits is the wrong representation since some operations which should be O(1) are O(n), where n is the number of bits.
- A hard coded byte representation makes it easier for a structure sharing system to do its job properly, and could also facilitate tabling (a form of memoization). E.g., tabling addition or multiplication for all floating point numbers is probably too much, but we could very easily table byte addition, and obtain a significant speed-up.
- Even if predicates working directly on bytes are a bit cumbersome to define, we only have to define a small number of such predicates, and then floating point and integer operations can be handled in a reasonably clean way.
These arguments are not wrong , but not terribly impactful either. Does it really matter if a right-shift operation takes a bit more time than it theoretically should, and is it not better to start with the more general solution and then choose a more efficient representation if the general one turns out to be to slow? Definitely, which makes it a bit embarrasing to present the code in the following sections.
Bytes
As already remarked, we’ll represent a byte by a term byte(b1, …, b8) where each is a bit. For example, byte(1,0,0,1,0,0,0,1) would represent the byte 10010001. Since bytes consists of bits, let’s begin by writing a predicate which adds two bits, before we worry about solving the larger problem, and once we know how to add two bytes, we’ll worry about integer and floating point addition. But how can we accomplish this without using any built-in predicates? I’m quite convinced that if this question was asked during a programming interview then not that many would obtain an acceptable solution, but it’s actually very simple: just hardwire the required information. Thus, if the two given bits are 0, the third bit should be 0, and if one of them is 1 but the other one is 0, the result should be 1. In other programming languages we would encode this with a handful of if-statements, and in Prolog the easiest solution is to represent each such case by a fact.
add_bits(0,0,0). add_bits(0,1,1). add_bits(1,0,1).
But what if the two given bits are 1? Then the result should be 0, but with 1 in carry. Hence, we’ll have to add an additional argument describing the carry value, which we’ll for simplicity add as the last argument.
add_bits(0,0,0,0). add_bits(0,1,1,0). add_bits(1,0,1,0). add_bits(1,1,0,1).
This is not bad, but now imagine the situation where we want to add two bytes. Then the strategy should be to add the least significant bits, remembering the carry, and then use the carry (out) as additional input when adding the two next bits (carry in), and so on. Hence, what we really want to describe is the relationship between two given input bits, a carry in bit, an output bit, and a carry out bit. It’s easy to see that this doubles the number of cases that we have to consider, and we obtain the following program (where the carry in bit is the fourth argument and the carry out bit is the fifth argument).
add_bits(0,0,0,0,0). add_bits(0,1,1,0,0). add_bits(1,0,1,0,0). add_bits(1,1,0,0,1). add_bits(0,0,1,1,0). add_bits(0,1,0,1,1). add_bits(1,0,0,1,1). add_bits(1,1,1,1,1).
Let me just quickly digress that while this solution does require us to explicitly list all cases of interest, it’s conceptually not harder, and much easier to understand, than a solution making use of is/2. It also results in a more general program: we could e.g. ask a query of the form
add_bits(X1,X2,1,0,0)
which would give us the answer that X1 and X2 are either 0 and 1, or 1 and 0. Before we turn to byte addition we should make one observation: quite a few of the predicates which we’ll define are going to make frequent use of carry in/out, but manually having to keep track of carry in/out, and remembering which argument is which, is going to be quite tedious and error prone. A better solution would be to implicitly thread this state using definite clause grammars (DCGs). This is the same solution that I used to thread the state of the interpreter so if it sounds unfamiliar it might be a good idea to re-read the previous entry. Hence, carry in and carry out will be described by state//2 and we’ll understand grammar rules (written with –>) as describing a sequence of carry in/out transitions. We thus rewrite add_bits/5 as follows.
add_bits(0,0,0) --> state(0,0). add_bits(0,1,1) --> state(0,0). add_bits(1,0,1) --> state(0,0). add_bits(1,1,0) --> state(0,1). add_bits(0,0,1) --> state(1,0). add_bits(0,1,0) --> state(1,1). add_bits(1,0,0) --> state(1,1). add_bits(1,1,1) --> state(1,1).
If you’re struggling to make sense of DCGs, or are of the opinion that DCGs should only be used for parsing, then simply imagine that we with the above programming scheme makes it possible for rules to have carry in/out without explicitly needing to be declared as arguments. This is going to be quite convienent later on so it’s certainly worth the effort.
Let’s now turn to the problem of adding two bytes. I’ve already outlined the algorithm, but I suspect that the solution that I’m going to present might be quite disturbing to some readers. How can we iterate over a byte term byte(b1, …, b8)? The problem is that such a term is not a ‘recursive’ data structure (like a tree or a list) and it’s rather offputting to attempt to do any form of general iteration. While we certainly could implement a crude variant of a for loop by hardwiring the arithmetic required to proceed to the next iteration (remember that I’m trying to solve this without using is/2), the resulting code would be far from elegant and needlessly complicated. In fact, since a byte has a fixed length, it’s much easier to just add the bits manually, starting with the least significant bits in position 8, and finishing with the most significant bits in position 1. This can be accomplished as follows.
add_byte(byte(X1,X2,X3,X4,X5,X6,X7,X8), byte(Y1,Y2,Y3,Y4,Y5,Y6,Y7,Y8), byte(Z1,Z2,Z3,Z4,Z5,Z6,Z7,Z8)) --> add_bits(X8, Y8, Z8), add_bits(X7, Y7, Z7), add_bits(X6, Y6, Z6), add_bits(X5, Y5, Z5), add_bits(X4, Y4, Z4), add_bits(X3, Y3, Z3), add_bits(X2, Y2, Z2), add_bits(X1, Y1, Z1).
In contrast, if we had represented bytes as lists of bits, then we could easily have solved the problem via standard recursive problem solving, and in the process also obtaining a much more general program applicable to arbitrary sequences of bits.
add_byte([], [], []) --> state(C,C). add_byte([X|Xs], [Y|Ys], [Z|Zs]) --> add_byte(Xs, Ys, Zs), add_bits(X, Y, Z).
This is the problem with the byte/8 representation in a nutshell: while it’s in principle possible to write general predicates it’s in practice much easier to just hardwire everything. For an additional example, assume that we want to write a right-shift operation where we shift in according to carry in. Such a rule could very easily be defined by:
shift_right(byte(B1,B2,B3,B4,B5,B6,B7,B8), byte(C0,B1,B2,B3,B4,B5,B6,B7)) --> state(C0,B8).
Which almost feels like cheating, but if all we want to do is to shift a given byte, there’s really no reason to write a general purpose program. Before we turn to integer arithmetic I’m going to present one more simplification which is going to make it simpler to define larger operations. While defining operations on bytes (addition, subtraction, complement, and so on) is not terribly difficult, it’s a bit cumbersome to chain together several byte expressions without functional notation. There’s no general support for functional notation in Prolog, but it’s possible to create a context where a term is going to be interpreted as a function, similar to how it’s possible to use arithmetical expressions in the context of is/2. Thus, we’re going to implement a rule ev//2 which takes a byte expression (written in a redable way by using standard operators) in its first argument, and evaluates the expression (in the context of a carry out/in transition) in its second argument.
%~ is defined as a unary operator. ev(~Exp, Res) --> ev(Exp, Res0), neg_byte(Res0, Res). ev(Exp1 - Exp2, Res) --> ev(Exp2, Byte2), ev(Exp1, Byte1), sub_byte(Byte1, Byte2, Res). ev(Exp1 + Exp2, Res) --> ev(Exp1, Byte1), ev(Exp2, Byte2), add_byte(Byte1, Byte2, Res). ev(Exp1 >> Exp2, Res) --> ev(Exp1, Byte1), ev(Exp2, Byte2), shift_right(Byte1, Byte2, Res).
It’s of course very easy to extend ev//2 with additional operators as necessary. I also implemented the two constants b1 (short for byte 1) and b0 as:
ev(b1, byte(0,0,0,0,0,0,0,1)) --> state(C,C). ev(b0, byte(0,0,0,0,0,0,0,0)) --> state(C,C).
But it would also be nice to define an ‘assignment’ operator which we can use in infix notation. To accomplish this we begin by defining an operator which is going to unify its left argument with the result of evaluating a byte expression (using ev//2). For no particular reason I choose ‘$=’; mainly because it’s not already in use.
:- op(990, xfy, $=). X $= Y --> ev(Y,X).
Put together this makes it possible to write expressions in a much more readable way than using add_byte and sub_byte manually. For example, given a byte Byte, we could define a rule which defines the two’s complement of a byte.
two_complement(Byte, ByteC) --> ByteC $= ~Byte + b1.
This also makes it possible to define sub_byte//3 in a nice way: simply compute the two’s complement of the second byte and then use byte addition.
Integers
Following the BASIC-80 reference manual we’ll represent integers as two bytes: int([High, Low]) where High is the byte containing the 8 most significant bits, and dually for Low. We can now quite easily define e.g. integer addition as:
add_int(int([X1,X2]), int([Y1,Y2]), int([Z1,Z2])) --> Z2 $= X2 + Y2, Z1 $= X1 + Y1.
All the hard work involving byte arithmetic has finally paid off! Naturally, we could quite easily define other integer operations (subtraction, multiplication, and so on) as necessary. It would also be straightforward to increase the number of bytes and it would even be possible to write general predicates working with an arbitrary number of bytes. Thus, while I’m still a bit ashamed of my hardwired byte predicates, at least the top-level code can be written in a reasonably clean way.
Floating Point Numbers
With the approach in the previous section we can represent integers in the range −32768 to 32767. Not terribly impressive. While it’s not possible to discretely encode more objects with 16 bits, we can circumvent this limit if we shift priorities. For example, imagine that we only care about numbers of the form and store such numbers by encoding the exponent
. Then even with two lowly bytes we could store numbers larger than the number of atoms in the observable universe. The problem, of course, is that such a representation would be incredible imprecise, and a floating point representation is a nice middle ground where we in addition to an exponent store a significand
in a fixed number of bits so that a number is represented by a term
. The point of such a representation is that if we, say, add two large numbers, then we might not have enough bits to accurately add the significands of the two numbers, but we can get reasonable close by making sure that the exponents and the most significant bits in the significands are added correctly. These days floating point arithmetic is typically taken care of with dedicated hardware, but in the heydays of BASIC-80 floating point operations were quite often implemented as a software layer in the interpreter.
Let’s now have a look at how we can add support for floating point operations. Warning: this should not be viewed as a complete implementation but rather as a proof of concept, and I’m only going to superficially describe how two add two positive floating point numbers. Otherwise this entry would be even more delayed than it already is. So let’s turn to the technical details: BASIC-80 uses Microsoft’s binary format (MSB) in either single or double precision, where a number is represented by the product of a significand and an exponent (both represented in binary). A single precision floating point number is then represented as a sequence of 4 bytes B1, B2, B3, B4 where:
- B1 is the exponent (-127,…, -1 are represented by 1…127 and 0…127 are represented in the range 128…255).
- The first bit in B2 is the sign bit (0 is positive and 1 is negative) and the remaining 7 bits are the start of the significand.
- B3 and B4 are the rest of the significand (thus, 23 bits in total).
Representing this in Prolog via our byte representation is of course very straightforward: a floating point number is simply a term float([B1,B2,B3,B4]) where B1,B2,B3,B4 are the bytes in question, and we already know everything there is to know about bytes. But how do we actually obtain a number in this representation? To make a long story short, one has to:
- Convert a decimal fraction to a binary fraction with a fixed number of bits.
- Convert the binary fraction to scientific notation by locating the radix point.
- Convert the binary scientific notation to the single precision floating point number.
All of these steps are rather straightforward, albeit tedious, and my Prolog implementation does not really stand out in any particular way. Hence, I’m simply going to assume that all numbers have been converted to the internal byte representation in a suitable way. Next, to add two floating point numbers we:
- Compare the exponents.
- If they are equal, then add the significands, possibly also increasing the exponent if necessary.
- If one exponent is larger than the other then shift the significand of the smaller number, add the significands, and then shift back.
Thus, the first step is easy, assuming a predicate compare_byte/3 which takes two bytes and returns the order between them (=, <, or >). Please don’t ask how it’s defined (I didn’t hard code it by 16 distinct cases, or did I?).
add_float(float([E1, X1, Y1, Z1]), float([E2, X2, Y2, Z2]), F3) --> {compare_byte(E1, E2, Order)}, add_float0(Order, float([E1, X1, Y1, Z1]), float([E2, X2, Y2, Z2]), F3).
Let’s consider the case where the two exponents are equal. The only complicating factor is that the first two bits in X1 and X2 are not part of the significand, but represents the sign of the number, which we for the moment assume is 0 (i.e., two positive numbers). But if we simply add the significands then the carry is going to get ‘stuck’ in the sign bit, so as a workaround we’ll set the two sign bits to 1, and then add the bytes, starting with the least significant ones.
add_float0(=, float([E1, X1, Y1, Z1]), float([E1, X2, Y2, Z2]), float([E3, X4, Y4, Z4])) --> set_bit(X1, b1, 1, NewX1), set_bit(X2, b1, 1, NewX2), Z3 $= Z1 + Z2, Y3 $= Y1 + Y2, X3 $= NewX1 + NewX2, %Also adds the carry from the previous operation. Hence, the exponent will increase. E3 $= E1 + b0, %The exponent increased so the significand has to be shifted right. X4 $= X3 >> b1, Y4 $= Y3 >> b1, Z4 $= Z3 >> b1.
The case when one of the exponents is larger than the other is quite similar and we just have to shift the significand of the smaller number appropriately. It’s then in principle not difficult to add support for more floating point operations, but it’s of course a non-trivial amount of work which is not suitable material for a blog entry.
The end?
Is there a moral to all of this? Not really: if we want to, then we can simulate low level concepts of programming languages in Prolog, and although it feels a bit weird, it’s not really that much work than in other programming languages. In the unlikely case that there’s a reader who’s interested in delving deeper in similar topics: thread carefully, and take the easy way out and represent bytes as lists instead of flat terms.
I’ve managed to cover more or less everything that I wanted in these four entries, and missing features are either very hard to implement (e.g., implementing support for assembly subroutines) or similar to existing concepts, and thus not terribly interesting. However, I don’t want to leave the safe confinement of the 80’s computer industry just yet, so I plan to write one additional entry, with a secret topic. Hint: it’s time for some alternate history: what if Bill Gates and Paul Allen felt the heat of the fifth generation computer project and in a flash of panick created a bizarre Frankenstein between Prolog and BASIC?