Archive | May 2011

Prologomenon Tackles Logical Empiricism

Back, at last! This time we’ll have a look at a movement close to our hearts: logical empiricism. The subject is not really related to logic programming at all, but I wrote this text as an assignment for a Philosophy of Science course and figured that I might as well post it here in case someone finds it enjoyable.

Disclaimer one: It’s not particularly well-written and is probably full of errors and misunderstandings.
Disclaimer two: I did an automatic LaTeX to HTML conversion that apparently couldn’t handle footnotes and references. Since I’m a lazy and amoral person I simply left them as is.

1. Introduction

Science is concerned with theories that make claims about the physical world. Hence it should not come as a great surprise that there exists a relation between observed facts and general laws. But can universal laws be logically derived from observational facts? The explication of this relation is not a problem of science, but rather a philosophical one; it has plagued philosophers for centuries and is surprisingly difficult to answer without resorting to circular or ad hoc arguments. This text will give a brief introduction to two important movements in the history of the philosophy of science: empiricism\footnote{Empiricism is sometimes grouped together with positivism and inductivism.} and logical empiricism. The focus will be on one of the most important principles of the logical empiricist movement: the distinction between analytic and synthetic sentences.

2. Empiricism

Empiricism takes a radical stance on the question of how science is related to facts, and asserts the importance and infallibility of sensory experience and observation. The idealized view is that the human mind is a blank slate, a tabula rasa, and that knowledge can only be gained from experience. Hence, a scientist should primarily conduct research by observing the world and deriving general conclusions from the observations. This methodology is called induction.\footnote{Not to be confused with the concept of mathematical induction.} The logical form is:

{property(a_0) \wedge property(a_1) \wedge \dots \wedge property(a_n) \Rightarrow \forall x:\, property(x)}

where we assume that {x} is restricted to a suitable domain. Let us say that we are interested in the question of whether all copper conducts electricity, and have successfully verified this property in {n} experiments. An application of the induction principle would then have the form:

{conducts\_electricity(copper\_sample_1)\, \wedge }

{conducts\_electricity(copper\_sample_2)\, \wedge }

{\vdots}

{\wedge \, conducts\_electricity(copper\_sample_n)}

{\Rightarrow \forall x \in Copper:\, conducts\_electricity(x)}

Both the premises and the conclusion are true, yet the reasoning is not logically valid: no finite number of affirmative observations can establish the necessity of the general hypothesis. This is known as the problem of induction and was elucidated by David Hume in the eighteenth century. At first it might appear to be a pedantic and irrelevant objection, since scientists rarely claim to work with absolute knowledge. If the induction principle allows us to derive more and more probable hypotheses, then that should be good enough. Here is the catch: for any set of observed facts there exist infinitely many mutually incompatible hypotheses that agree with all the observations. For instance, “all copper conducts electricity” and “all copper conducts electricity until the year 3000” fit the same data gathered so far. Hence we cannot claim that our particular hypothesis is probable in the usual mathematical sense of the word. This should of course not be taken as an indication that inductive testing is void of meaning, but rather that it is inherently difficult to assign a truth value to a scientific theory (that has not been falsified).

This is not the only objection to empiricism. Among other things, it can be argued that too much weight is placed on sensory experience, which can hardly be considered realistic since almost all observations depend on an underlying theory. Furthermore, why should one demand that theories be derived by induction when the principle is not even logically valid?

Further critique can be found in e.g. Chalmers \cite{Chalmers}.

3. Logical Empiricism

Logic made great strides in the late nineteenth century and the beginning of the twentieth century. The German mathematician and philosopher Gottlob Frege developed the first formal system capable of higher order logic in the outstanding, but also rather impervious, Begriffsschrift \cite{Frege}. Frege’s aim was to show that mathematics, starting with number theory, did not depend on empirical experience but was true a priori and could be inferred from pure logic alone. That is, it is analytic rather than synthetic. The sentence There exists an infinite number of primes is to Frege an analytic proposition that is true only in virtue of its constituents. At the same time the sentence It will rain tomorrow is synthetic, since there is no logical reason why it has to be true or false — it can only be confirmed or falsified by observations.

We have learned that pure, strict empiricism fell short for a number of reasons. Does this mean that there is no difference between science and e.g. religion? Intuitively one would like to claim that the most important difference is that the domains of discourse are different: in the first case claims are made about the physical world, while in the second case the claims are, at least partly, metaphysical in nature. If one adheres to this demarcation principle then it is clearly pivotal to be able to filter out the metaphysical statements in a scientific theory. This was one of the aims set forth by Alfred Jules Ayer in Language, Truth and Logic \cite{Ayer}, which later became the epitome of logical empiricism \cite{Stanford}. With the advances made in formal logic, it was believed that it would be possible to define a verification principle that separates empirical propositions from metaphysical ones. Just like Frege, Ayer made a distinction between analytic and synthetic propositions. For a verification principle to be successful, then, it needs to be capable of ascertaining whether a synthetic proposition is empirical. Any sentence that failed the verification principle would be labeled metaphysical and was asserted to be absolutely void of meaningful content. This is a harsh claim, and it is perhaps not too surprising that the definition of a verification principle is irksome and prone to errors. Ayer gave the following criterion \cite{Stanford}:

A (synthetic) sentence {A} has empirical content if either:

  • {A \Rightarrow O}, where {O} is an observational sentence.
  • There exists a {B} such that {A \wedge B \Rightarrow O}, where {O} is an observational sentence such that {B \nRightarrow O}.

The second criterion means that {A} can be used with another sentence, {B}, to derive an observational sentence that could not have been derived from {B} alone.

It can be argued that this principle is at least necessary, but is it sufficient? Not at all. Consider any sentence {A} and an observational sentence {O}. If it is always possible to find a sentence {B} such that {A \wedge B} implies {O} but {B} alone does not, then the principle classifies every sentence as empirical, and hence fails. Finding such a {B} is simple: take {B = A \Rightarrow O}. Then it can be shown by an application of modus ponens that {A \wedge B \Rightarrow O}.
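
To spell the argument out:

{A \wedge (A \Rightarrow O) \Rightarrow O}

holds by modus ponens, while {B = A \Rightarrow O} does not by itself imply {O} (except in degenerate cases, e.g. when {O} is a tautology). Hence this {B} satisfies the second criterion for any sentence {A} whatsoever, no matter how metaphysical.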

Even though this simplistic attempt failed, it is not impossible that a more sophisticated version could do the trick. Of course, this hinges on there actually being a distinction between synthetic and analytic propositions. If it can be argued that no such distinction can be made, then we would simply have to abandon the idea behind Ayer’s verification principle. This was the approach taken by Willard Van Orman Quine in his seminal paper Two Dogmas of Empiricism \cite{Quine}.\footnote{Where the second “dogma” is reductionism.} Quine argues that the received notion of analyticity is circular, in the sense that it relies on another concept, synonymy, that cannot be explained without referring back to analyticity. The argument is best illustrated with examples. One of the hallmark sentences of the analytic tradition is No unmarried man is married, which is true no matter what “man” and “married” refer to, as long as we agree that the logical syntax remains fixed. Quine then puts forth a second category of analytic sentences: those that can be transformed into logically true — analytic — sentences by replacing synonyms with synonyms. For example: No bachelor is married. We have to agree that this sentence is not logically true, at least not in the same sense as the preceding sentence, but it could be taken as analytic once bachelor is replaced with unmarried man.

But when are two words synonyms? A dictionary is of no help here, since it only records what is already known; there is no explanation behind the stated facts. The proposal that two words are synonyms if they are interchangeable in all contexts fares no better, since it fails for e.g. “bachelor” and “unmarried man” in the context “… has fewer than 10 letters”. Let us however, for the sake of the argument, ignore such syntactic differences between words. Is this kind of interchangeability good enough to describe synonymy?

Assume that “bachelor” and “unmarried man” are interchangeable in the sense just described. Then from the statement Necessarily all and only bachelors are bachelors we can conclude that Necessarily all and only bachelors are unmarried men, by substituting “unmarried man” for the second occurrence of “bachelor”. But to claim this last statement is to claim that the statement All and only bachelors are unmarried men is analytic! Why? Because if a statement is necessarily true then it is analytic. And so we are back where we started, since we need the concept of analyticity to explain the concept of synonymy.

If one agrees with Quine’s reasoning then no sharp distinction can be made between synthetic and analytic propositions, which would leave most of the original logicism program in shambles.

4. Discussion

As far as I can tell there is no logical flaw in Quine’s argument. I am, however, a bit skeptical of how quickly he dismisses definitions as an account of synonymy. Does it really matter why two words are synonyms? I see no inherent problem in simply accepting that some words have evolved to mean the same thing. This would however have the consequence that it is impossible to say that a sentence is analytic without assuming a specific, underlying theory. But is this a problem? Most mathematicians would argue that the sentence {1 + 1 = 2} is analytic, but this is only true if we assume the usual definitions of {1}, {+}, {=} and {2}; it would not be true if we interpreted {+} as some other relation. A better example can be found in geometry: is the parallel postulate analytic? Yes, if we are working in Euclidean geometry. No, if we are working in a non-Euclidean geometry.

Hence I still think that we can make a distinction between synthetic and analytic propositions, but only with respect to an underlying theory, though I suspect that this notion of analyticity is not strong enough to be of any use in the logicism program.


References

\bibitem{Chalmers} Alan Chalmers. What Is This Thing Called Science? Hackett Publishing Company, 1999.

\bibitem{Frege} Michael Potter and Tom Ricketts (eds.). The Cambridge Companion to Frege. Cambridge University Press, 2010.

\bibitem{Ayer} Alfred Jules Ayer. Language, Truth and Logic. Dover Publications, 2nd edition, 1952.

\bibitem{Stanford} Richard Creath. Logical Empiricism. The Stanford Encyclopedia of Philosophy, Summer 2011 edition.

\bibitem{Quine} Willard Van Orman Quine. Two Dogmas of Empiricism. In From a Logical Point of View, 1961.

Mullets, Moustaches and MicroPROLOG

There was more to the eighties than funny looking moustaches, mullets and screaming electrical guitars. I’m of course talking about the home computer revolution: the era when the end-user was more or less expected to have a solid understanding of how the hardware and software worked. Some time ago I thought it would be a great idea to deepen my knowledge of this period, and did what any self-respecting man or woman would do in my situation: grow a moustache – ehr, I mean: read a book about it. The book in question is named The Personal Computer Handbook (Swedish: Allt du behöver veta om persondatorer), and is precisely the kind of book that I would normally scoff at. Had it been written today that assessment would probably be correct, but it managed to surpass my wildest expectations. Just look at this image from the second page, and you’ll probably understand why:

That’s the future right there! And the book keeps delivering. Instead of explaining how one uses a word processor, it delves deep into technical details about electronics, peripherals, computer architectures, microcode, assembly code and finally high-level languages such as BASIC, LOGO, COBOL, LISP and the Prolog dialect microPROLOG. The text goes as follows (disclaimer: this is my translation from the Swedish book. The English version probably differs!):

At Imperial College in London, a team under the direction of Robert Kowalski has worked on a simplified version of PROLOG (called microPROLOG) intended to foster children’s capability for logical thinking. This capability is not restricted to computers, but can also be applied in other areas such as mathematics, French, history and geography.

While LOGO’s Turtle Graphics is based on the computer’s ability to create images and figures, microPROLOG is focused on the computer’s ability to manipulate symbolic expressions. But Turtle Graphics has proved to be such a successful system that it has been included in some versions of microPROLOG.

Simple PROLOG programs can be built from propositions that contain names of individuals and relations between them.

One first constructs atomic propositions (so called because they are the simplest possible propositions). For example: Anders is-neighbor-with Lisa. Lotta is-friend-with Nicke. Lisa is-fighting-with Anders.

One can also construct molecular (composite) propositions, for example: Maria likes Johan if Johan likes Maria and Johan is-nice-to Maria. Or: x is-larger-than y if x is-larger-than z and z is-larger-than y. One can ask the computer questions that are based on these facts and relations. Writing microPROLOG programs is like writing simplified and logically clear English, and the children become excited by the fact that the computer understands a language so close to their own.

MicroPROLOG is, just like BASIC, an interactive language, where the pupil can add, remove and modify the program and immediately see the result of the modification. Individuals in simple programs can be replaced by variables or lists, for example: Johan has-favourite-groups (Police, Roxy Music, ABBA).

PROLOG can handle such lists by first dividing them into a head and a tail. The head is the first element of the list, and the tail is the rest. The head of Johan’s list is therefore “Police”, and the tail is everything else in the parentheses. The same head-tail structure is also found in LISP and LOGO.

Recent studies show that education in BASIC programming teaches children how to use the computers of the seventies, while education in LOGO and microPROLOG prepares them for the computers of the eighties and nineties.
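
For anyone curious, the book’s examples translate almost directly into present-day standard Prolog. Here is a sketch (the predicate and constant names are my transliterations, not actual microPROLOG syntax):

% Atomic propositions (facts).
is_neighbor_with(anders, lisa).
is_friend_with(lotta, nicke).
is_fighting_with(lisa, anders).

% A molecular (composite) proposition (rule), as stated in the book.
likes(maria, johan) :-
    likes(johan, maria),
    is_nice_to(johan, maria).

% A list of individuals. Prolog decomposes it into the head, police,
% and the tail, ['Roxy Music', abba].
has_favourite_groups(johan, [police, 'Roxy Music', abba]).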

I can’t help but be a little moved by such unbridled enthusiasm! What went wrong? When I went to middle school they didn’t even teach us a BASIC dialect! Perhaps it’s time to revitalize the grand plans of microPROLOG and create a new generation of Prolog hackers?

Finally, I feel obliged to post one final picture from the book. It’s referred to as The computer controlled home, and should speak for itself.

Man, the future is such a letdown!

(For the record, the next installment regarding computer generated music will probably be delayed for a few weeks due to an upcoming examination period.)

Prolog’s Makin’ Music – Part 1

Interlude

Gather around everyone, and I’ll tell the story of how I sold my soul to the binary devil.

It all began on a dark and gloomy night. I’d had one too many to drink – coffee, that is – and found it hard to concentrate on anything other than the splashing rain. The murky light played tricks on my eyes, or so I thought. Dangling contours everywhere. The buzzing monitor didn’t help either. I stretched out my back with a loud, cracking sound and tried to suppress a yawn.

“Do you want the power to create music from thin air?”

A voice from nowhere. Surely I hadn’t had that much to drink. I held up my keyboard like a club, cursing myself for getting rid of the IBM Model M keyboard in favor of an ergonomic one, and slowly turned my head in the direction of the voice. If there was an intruder, I wouldn’t go down without a fight.

“Who’s there?”, I cried.

After a long silence the voice finally answered:

“Do you want to make a deal?”

“A deal?!” I blurted out, getting rather annoyed by his impudence.

“I shall grant your computer the gift of making music. All I ask in return is that your next blog entry contains some steamy, bit-on-bit action that somehow involves the WAV format. Also, I shall need your soul for all eternity.”

Having run out of ideas, I had no choice but to accept his offer.

“Sure! Wait, no!… Who are you?”

Manic laughter followed. He vanished in a hazy puff of smoke. All that remained was a chilly wind and a feeling that I had somehow been cheated.

Computer generated music

Now to the point: the goal of this and the following entries will be to create computer generated music in Prolog/Logtalk. That might sound (pun not intended – I can’t help it) like a tall order, but hopefully everything will become clearer once we’ve explicated some of the concepts in music theory. The outline is as follows:

  • Step 1 – Generate audio.
  • Step 2 – Generate tones from audio.
  • Step 3 – Generate melodies from tones, with a suitable formalism such as a cellular automaton or an L-system.

Sound as oscillations

In order to generate music we first need to understand what sound is. Wikipedia says:

Sound is a mechanical wave that is an oscillation of pressure transmitted through a solid, liquid, or gas, composed of frequencies within the range of hearing and of a level sufficiently strong to be heard, or the sensation stimulated in organs of hearing by such vibrations.

Or to put it a bit more pragmatically: a sound is a series of frequencies. Of course, this is a bit too simplistic to be useful in practice. Among other things, we need to decide whether we’re interested in mono or stereo sound, how fine-grained each frequency should be and how fast they should be played.

So we have an idea of how sound should be represented: first we decide how it should be interpreted by the listener, and then we give the actual frequencies. As one might suspect, there exists a myriad of different formats for this purpose. One of the simplest is the WAV format, which is the one we shall use in this project.

Writing to binary files

WAV is a binary format, and thus consists of a sequence of integers of varying sizes. Hence the first step is to learn how one writes binary data in Prolog. The bad news is that there exists only one ISO primitive for this purpose: put_byte/2, which is not sufficient on its own since it only writes single, unsigned bytes. The good news is that we can get it to do what we want with some low-level bit-fiddling. Here are the operations that we’ll need in order to produce a fully functional WAV file:

  • Write 4 byte, unsigned integers in big endian format.
  • Write 4 byte, unsigned integers in little endian format.
  • Write 2 byte, unsigned integers in little endian format.
  • Write 2 byte, signed integers in little endian format.

It would be nice if we could handle this in a uniform way, so that the underlying details of how one should use put_byte/2 can be postponed as long as possible. For this purpose we’ll introduce a data structure, word, that has the format:

word(Byte_Count, Endian, Integer)

where Byte_Count is either 2 or 4, Endian is either big or little, and Integer is a positive or negative integer. So to represent the number 135 as a 2 byte integer in little endian format we would use:

word(2, little, 135)

while the number 1350 as a 4 byte integer in big endian format would be represented as:

word(4, big, 1350)

Simple, but it might feel kind of weird to represent such a low-level concept in this way. In most imperative languages we would of course explicitly declare the data as char, short, int and so on, but this is the best we can do in Prolog (unless we create the necessary bindings to a host language and borrow its datatypes). Next, we’re going to define write_word/2, which writes a word to a stream. Let’s focus on 2 byte integers for the moment. A first attempt might look like:

write_word(word(2, Endian, I), Stream) :-
    put_byte(Stream, I).

Alas, this only works for single byte integers. If we want to write 2 bytes, we need to extract the individual bytes from the integer and call put_byte/2 twice. This can be done with shifting and the bitwise and-operation.

write_word(word(2, Endian, Bs), S) :-
    X1 is Bs >> 8,
    X2 is Bs /\ 0x00ff,
    (  Endian = big ->
       put_byte(S, X1),
       put_byte(S, X2)
    ;  put_byte(S, X2),
       put_byte(S, X1)
    ).

Note that we also check whether Endian is big, and if so output the bytes in reversed order. This works fine for positive numbers, but what about negative numbers? Since put_byte/2 only accepts non-negative byte values, we need to map a negative number to the non-negative integer that has the same 16 bit two’s-complement bit pattern. This is easy: if the number is negative, add 2^16 (0x10000) to it. The code will make this easier to understand:

    write_word(word(2, Endian, Bs), S) :-
        Bs >= 0,
        X1 is Bs >> 8,
        X2 is Bs /\ 0x00ff,
        (  Endian = big ->
           put_byte(S, X1),
           put_byte(S, X2)
        ;  put_byte(S, X2),
           put_byte(S, X1)
        ).
    write_word(word(2, Endian, Bs), S) :-
        Bs < 0,
        Bs1 is Bs + 0x10000, % 2^16: yields the 16 bit two's-complement encoding.
        write_word(word(2, Endian, Bs1), S).

(Thanks to Pierpaolo Bernardi who showed me this trick on the SWI-Prolog mailing list!)
Update: Richard O’Keefe also showed a simpler solution that doesn’t need the explicit positive/negative test. It’s left as an exercise to the reader!
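
For the curious, here is a guess at what such a solution might look like (my reconstruction, not necessarily what O’Keefe posted): masking each byte with /\ 0xff yields the two’s-complement byte values directly, for negative and non-negative integers alike, assuming arithmetic right shift on negative integers (as in e.g. SWI-Prolog):

    write_word(word(2, Endian, Bs), S) :-
        X1 is (Bs >> 8) /\ 0xff, % High byte.
        X2 is Bs /\ 0xff,        % Low byte.
        (  Endian = big ->
           put_byte(S, X1),
           put_byte(S, X2)
        ;  put_byte(S, X2),
           put_byte(S, X1)
        ).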

The code for 4 byte integers is rather similar and hence omitted.
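
Since it is short, here is one possible version anyway, using the same masking idea (again my sketch, not code from the original post):

    write_word(word(4, Endian, Bs), S) :-
        B3 is (Bs >> 24) /\ 0xff,
        B2 is (Bs >> 16) /\ 0xff,
        B1 is (Bs >> 8) /\ 0xff,
        B0 is Bs /\ 0xff,
        (  Endian = big ->
           put_byte(S, B3), put_byte(S, B2), put_byte(S, B1), put_byte(S, B0)
        ;  put_byte(S, B0), put_byte(S, B1), put_byte(S, B2), put_byte(S, B3)
        ).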

The WAV format

Now let’s focus on WAV. All my knowledge of the format stems from a single source (click for a useful, visual diagram). A WAV file consists of:

  • A header containing the string “RIFF”, the remaining chunk size and the string “WAVE”.
  • A format subchunk containing the string “fmt ” (note the trailing space), the remaining chunk size, the audio format, the number of channels, the sample rate, the byte rate, the block align and the number of bits that are used for each sample.
  • A data subchunk that contains the string “data”, the remaining size of the subchunk and finally the actual data (the samples).

Don’t worry if some of these terms are unfamiliar or confusing. It’s not necessary to understand all the details. We begin by defining the number of samples, the number of channels, the bits per sample and the sample rate as facts:

    num_samples(100000).   % This value will of course differ depending on the audio data.
    num_channels(1).       % Mono.
    bits_per_sample(16).   % Each sample is a 16 bit, signed integer.
    sample_rate(22050).    % Samples per second.

All the other values can be derived from these parameters. For simplicity we’re going to produce a list of words that is later written with the help of write_word/2. This can be done in any number of ways, but DCGs are fairly straightforward in this case. The RIFF chunk comes first. It takes the size of the data chunk as an argument, since that is needed in order to produce the size of the remaining chunk.

    riff_chunk(Data_Chunk_Size) -->
        riff_string,
        chunk_size(Data_Chunk_Size),
        wave_string.

    riff_string --> [word(4, big, 0x52494646)].
    wave_string --> [word(4, big, 0x57415645)].

    chunk_size(Data_Chunk_Size) -->
        {Size is Data_Chunk_Size + 36}, % Magic constant: 4 ("WAVE") + 24 (fmt subchunk) + 8 (data subchunk header).
        [word(4, little, Size)].

The end result will be a list of the form [word(4, big, 0x52494646), ...]. The format chunk follows the same basic structure:

fmt_chunk -->
    fmt_string,
    sub_chunk1_size,
    audio_format,
    number_of_channels,
    sample_rate,
    byte_rate,
    block_align,
    bits_per_sample.

fmt_string -->  [word(4, big, 0x666d7420)]. % "fmt " (note the trailing space).

sub_chunk1_size --> [word(4, little, 16)]. % 16 for PCM.

audio_format --> [word(2, little, 1)]. % 1 = PCM.

number_of_channels -->
    [word(2, little, N)],
    {num_channels(N)}.

.
.
. % And so on for all the remaining stuff.
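
The elided rules follow the same pattern. A plausible completion, using the standard PCM formulas for the byte rate (sample rate × channels × bytes per sample) and the block align (channels × bytes per sample); this is my sketch, not code from the original post:

sample_rate -->
    [word(4, little, SR)],
    {sample_rate(SR)}.

byte_rate -->
    [word(4, little, BR)],
    {sample_rate(SR),
     num_channels(Cs),
     bits_per_sample(BPS),
     BR is SR*Cs*BPS//8}.

block_align -->
    [word(2, little, BA)],
    {num_channels(Cs),
     bits_per_sample(BPS),
     BA is Cs*BPS//8}.

bits_per_sample -->
    [word(2, little, BPS)],
    {bits_per_sample(BPS)}.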

The remaining data chunk is even simpler:

data_chunk(Data_Chunk_Size) -->
    data_string,
    [word(4, little, Data_Chunk_Size)],
    test_data.

test_data --> ... % This should generate a list of samples.
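
One plausible way to wire in the sine wave generator defined later in this post (my glue code, not shown in the original):

test_data -->
    {num_samples(N)},
    sine_wave(N).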

And finally, we say that a WAV file consists of a riff chunk, an fmt chunk and a data chunk:

    wav_file -->
        {num_samples(N),
         bits_per_sample(BPS),
         num_channels(Cs),
         Data_Chunk_Size is N*BPS*Cs//8}, % Integer division: the chunk size must be an integer.
        riff_chunk(Data_Chunk_Size),
        fmt_chunk,
        data_chunk(Data_Chunk_Size).

It is used in the following way:

    output(File) :-
        open(File, write, S, [type(binary)]),
        % Call the DCG, get a list of words as result.
        phrase(wav_file, Data),
        % Write the list of words.
        write_data(Data, S),
        close(S).

    write_data([], _).
    write_data([B|Bs], S) :-
        write_word(B, S),
        write_data(Bs, S).
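
Producing a file is then a single query (the file name is just an example):

    ?- output('sine440.wav').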

As test data, we’re going to generate a 440 Hz sine wave.

    sine_wave(0) --> [].
    sine_wave(N) -->
        {N > 0,
        sample_rate(SR),
        N1 is N - 1,
        %% Standard concert pitch, 440 Hz.
        Freq is 440,
        ScaleFactor is 2*pi*Freq/SR,
        %% Needed since sin(X) returns a float in [-1.0, 1.0], which
        %% is barely (if at all) perceivable by the human ear. The
        %% constant 32767 is used since we're dealing with 16 bit,
        %% signed integers, i.e. the range of the samples is [-32768,
        %% 32767].
        VolumeFactor is 32767,
        X is ScaleFactor*N,
        Sample0 is sin(X),
        %% Floor the sample. Otherwise we would end up with a floating
        %% point number, which is not allowed.
        Sample is floor(Sample0*VolumeFactor)},
        [word(2, little, Sample)],
        sine_wave(N1).

It’s not necessary to understand all the details, but the end result is a list of 2 byte words that represent a 440 Hz sine wave. You can listen to it here.

Summary

We’re now able to write samples to WAV files. These samples can represent any tone or sound, so in theory we already have everything that’s needed to generate music. But representing a tune as millions and millions of samples is not very user-friendly and would make it more or less impossible to automatically generate anything interesting. For that we’re going to need further abstractions, and among other things define a sound bank that contains some common tones.

Source code

The source code is available at https://gist.github.com/955626.