Prolog’s Makin’ Music – Part 2

Scales, scales and scales

It’s time to bring on the noise! To recapitulate the story thus far, it suffices to say that we’re now able to write raw audio data in the WAV format after some mysterious bit-fiddling. As already mentioned we could in principle start to crank out tunes in this stage, but to illustrate why this isn’t a good idea I have prepared a small sample file containing 1 million randomly generated samples in the range [-32768, 32767]. It is perhaps also instructive to see a visual representation of the samples. Here’s the results:

Soundcloud link.
As expected the result is garbled noise. There’s obviously no discernible structure to speak of since the samples are literally speaking all over the place. We want something more harmonic and symmetrical that we can actually work with. Why does classical instruments (excluding drums, cymbals and other unpitched instruments) have such nice acoustic properties? To make a long story short, many instruments produce sounds with the help of vibrating strings – oscillations at different frequencies, e.g. a sine wave. Different frequencies give us different tones. In e.g. a piano the keys to the right have higher frequencies than those to the left. Hence, to construct something akin to an instrument we need a system of  frequencies and a function that can generate the corresponding waves. Obviously these problems have already been solved many times over in the history of music theory, and it would be ignorant to not take advantage of this. Let’s start with the first problem of finding a system of frequencies, a scale. This is actually harder than expected. We know that the scale should go from lower to higher frequencies and that there at the very least should exist some relationship between them. A first attempt might be to start the scale at an arbitrary frequency, e.g. 500, and for every new frequency add 50. This would result in a scale where the difference between any two adjacent frequencies is constant, or in other words linear. With 12 frequencies we would obtain:

0 500
1 550
2 600
3 650
4 700
5 750
6 800
7 850
8 900
9 950
10 1000
11 1050

Our first scale! The first number in the column, the identification number, is called a note (to be more precise a note also needs a duration). Hence the purpose of the scale is to give frequencies to notes. The traditional notation (pun not intended) for a series of 12 notes is A, A\sharp, B, C, C\sharp, D, D\sharp, E, F, F\sharp, G, G\sharp, where the notes with funny looking sharp signs correspond to the small, black keys on a piano (so-called “accidentals”). For simplicity we’ll use the numeric notation though. The next question is how this scale sounds when it is played in succession.

Linear scale

Perhaps surprisingly, it sounds pretty terrible even though it’s not that simple to say why. Wrong starting frequency? Wrong increment? Well, maybe, but that’s not the real problem, namely that as the frequencies increase the perceived difference, the distance, gets smaller and smaller which results in an unsymmetrical scale. Hence we want a scale where the distance between any two adjacent frequencies is a constant. This is known as equal temperament. To be more precise the distance doesn’t have to be a constant, but it have to be a multiple of the smallest possible step in the scale. For example we could have a scale where the distance between the first and second frequency is 1.5, but where the distance between the second and third frequency is 1.5 \times 2 = 3.

With this in mind it’s not to hard to create a new scale. The frequency of the N:th note is then Start * Ratio^{N-1}, where Start and Ratio are constants. For Start = 500 and Ratio = 1.06978 we get the scale:

0 500.0
1 534.89
2 572.2
3 612.1
4 654.9
5 700.6
6 749.4
7 801.7
8 857.7
9 917.5
10 981.6
11 1050.0

Equal temperament scale

To be honest it still doesn’t sound very good, but at least it’s a step in the right direction. Somehow it doesn’t provide enough closure, and if we were to extend it even further the new notes wouldn’t really relate to the old notes in a natural way (what is “natural” is of course biased by experience and tradition). Here’s an idea: what if the extended frequencies were (approximately) multiples of the first 12 frequencies? That is F_{12} \approx 2 \times F_0, F_{13} \approx 2 \times N_1 and so on. It’s not too hard to derive such a constant. Let x be the constant. Then F_{12} = 2\times F_0 = F_0 \times x ^{12} \Leftrightarrow \frac{2\times F_0}{F_0} = x^{12} \Rightarrow 2^{1/12} = x. Hence the general formula is F_n = F_0 \times x^{n} = F_0 \times (2^{1/12})^n = F_0 \times 2^{n/12}.

If the starting frequency is 500 we then get the scale:

0 500.0
1 529.7
2 561.2
3 594.6
4 629.0
5 667.4
6 707.1
7 749.2
8 793.7
9 840.9
10 890.9
11 943.9

Equal temperament scale – second attempt

It’s hard to notice the difference in the first few notes, but in the end of the scale the difference gets more and more pronounced. Now we have something quite close to what’s actually used in the real world. The only difference is the starting frequency, which is usually 440 Hz, the so-called standard concert pitch. This value is somewhat arbitrary, but just for reference here’s what we get:

0 440.0
1 466.2
2 493.9
3 523.3
4 554.4
5 587.3
6 622.3
7 659.3
8 698.5
9 739.9
10 783.9
11 830.6

Chromatic scale

Fortunately it’s rather easy to implement scales once we have the theory behind us. There are two basic choices for the representation: either  we work with the raw frequencies in the scale, or we work with the notes and extract the frequencies when needed. I shall go with the second option since it’s often easier to work with notes. Interestingly enough, the chromatic 12 tone scale that we just used is an example of an abelian (commutative) group with 0 as the unit element, which means that it’s quite pleasant to work with. The basic operations that we want to perform are:

  • raise/2 – get the next note in the scale.
  • lower/2 – get the preceding note in the scale.
  • add/3 – add two notes in the scale.
  • length/1 – get the number of notes in the scale.
  • nth/2 – get the n:th note in the scale, starting from 0.
  • frequency/2 – get the associated frequency of the note.

Which is easily expressible in terms of a protocol:

:- protocol(scalep).

   :- public(raise/2).
   :- public(lower/2).
   :- public(add/3).
   :- public(nth/2).
   :- public(length/1).
   :- public(frequency/2).

:- end_protocol.

And to implement the chromatic scale is straightforward:

:- object(chromatic_scale,
        implements(scalep)).

    %A, A#, ..., G, G#.
    length(12).

    raise(N, N1) :-
        N1 is (N + 1) mod 12.

    lower(N, N1) :-
        N1 is (N - 1) mod 12.

    add(N1, N2, N3) :-
        N3 is (N1 + N2) mod 12.

    nth(I, I) :-
        % Used so that we can call nth/2 with uninstantiated
        % arguments.
        between(1, 12, I).

    %A4 to G#5.
    frequency(N, F) :-
        F is 440 * 2 ** (N/12).

:- end_object.

Extending this scale to use more than 12 elements would of course not be hard either. Just to show something different we’re also going to implement the C major scale. It contains the frequencies:

0 523.3
1 587.3
2 659.3
3 698.5
4 783.9
5 880
6 987.8

The C major scale

It’s slightly harder to implement than the chromatic scale since the distances between adjacent notes is not constant. The distance between any two adjacent notes is either a half step (the distance between two adjacent notes in the chromatic scale) or two half steps. If we then represent each note with its distance from the first note we get:

0 0
1 2
2 4
3 5
4 7
5 9
6 11

Don’t worry if these specific distances doesn’t make any sense to you. But they are not completely arbitrary; each note in C major corresponds to a white key on the piano, and is actually the only major scale that only makes use the white keys. Since we are now counting half-steps we can more or less use the same formula as in the chromatic scale for calculating frequencies.

:- object(c_major,
        implements(scalep)).

    nth(0, 0).
    nth(1, 2).
    nth(2, 4).
    nth(3, 5).
    nth(4, 7).
    nth(5, 9).
    nth(6, 11).

    raise(N, N1) :-
        nth(I1, N),
        I2 is ((I1 + 1) mod 7),
        nth(I2, N1).

    lower(N, N1) :-
        nth(I1, N),
        I2 is ((I1 - 1) mod 7),
        nth(I2, N1).

    % As far as I know, this is the only way to make sense of addition
    % in C major. Simply adding the distance from the tonic doesn't work
    % since that makes it possible to get notes outside the scale.
    add(N1, N2, N3) :-
        nth(I1, N1),
        nth(I2, N2)
        I3 is ((I1 + I2) mod 7),
        nth(I3, N3).

    % C, D, E, F, G, A, B.
    length(7).

    %C5 to B5.
    frequency(N, F) :-
        F is 440 * 2 ** ((N + 3)/12).

:- end_object.

The synthesizer

Whenever we’re going to generate music we’re going to use a specific scale in order to get a linear sequence of notes (since we don’t use chords). From the notes we get a series of frequencies. But to actually produce something that is nice to listen to we need something more. To play e.g. the standard concert pitch at 440 Hz we’re going to generate a wave with 440 oscillations per second. How we generate this wave determines how the note will be played. A sine wave will give a smooth sound while a sawtooth wave will give something reminiscent of a mechanical dentist drill. To create more complex sounds a technique known as additive synthesis can be used. We shall however not peruse this option at the moment.

Our synthesizer will take 3 input arguments: the frequency, the duration and the filter that shall be applied, and returns a list of samples in its single output argument. From the duration it’s possible to calculate how many samples that we’ll need to generate with the help of the sample rate. For example, if the duration is 0.5 seconds and the sample rate is 22050 the number of samples is 0.5 \times 22050 = 11025. The wave will be generated with a loop from 0 to Number\_Of\_Samples where the following operations are performed on each sample:

  • Divide the sample by the sample rate, so that we get the correct resolution. A high sample rate means that we’ll generate more points on the wave.
  • Calculate the angular frequency of the sample, i.e. \omega = 2\pi F, where F is the frequency.
  • Apply the filter. The filter should return a floating point number in [-1, 1].
  • Scale the sample in [-1, 1] with a volume factor so that we get samples in the full sample space.
This can actually be done in rather few lines of code. Without further ado I present to you:
:- object(synthesizer).

    :- public(samples/4).
    :- public(sample_rate/1).
    :- public(bits_per_sample/1).

    :- private(filter/3).
    :- private(volume/3).
    :- private(wave//3).

    bits_per_sample(16).
    sample_rate(22050).

    samples(Frequency, Duration, Filter, Samples) :-
        sample_rate(SR),
        N is floor(SR * Duration),
        phrase(wave(N, Frequency, Filter), Samples).

    %% We could have implemented this as higher order predicates
    %% instead, but the performance loss would not have been worth it
    %% since the filter might be applied to millions of samples.
    filter(sine, Sample0, Sample) :-
        Sample is sin(Sample0).
    filter(sawtooth, Sample0, Sample) :-
        Sample is Sample0 - floor(Sample0).
    filter(triangle, Sample0, Sample) :-
        Sample is -((acos(sin(Sample0)) / pi - 0.5)*2).

    volume(M, N, V) :-
        bits_per_sample(BPS),
        V0 is (2**BPS)/2 - 1,
        %% Decrease the volume over time.
        Percent is (M/N)/2,
        V is V0*(1 - Percent).

    wave(N, Freq, F) --> wave(0, N, Freq, F).
    wave(M, N, _, _) --> {M > N}, [].
    wave(M, N, Freq, F) -->
        {M =< N,
        sample_rate(SR),
        M1 is M + 1,
        volume(M, N, V),
        X is (2*pi*Freq)*M/SR,
        filter(F, X, Sample0),
        Sample is floor(Sample0*V)},
        [word(2, little, Sample)],
        wave(M1, N, Freq, F).

:- end_object.

Putting everything together

Somehow we’ve come this far without a suitable name for the project. I’ll name it Xenakis in honor of the Greek-French music theorist, composer and architect Iannis Xenakis. You can listen to one of his most famous pieces here (warning: it’s rather frightening).

Using the components just described is not hard. First one generates a list of frequencies in a scale, that is then used as input to the synthesizer which gives a list of samples which is written to a WAV file.

:- object(xenakis).

   :- public(init/0).

   init :-
       %% N is the number of samples.
       generate_notes(Ts, N),
       wav::prepare(output, N),
       write_samples(Ts).

   %% Generate the frequencies in the C major scale. Each note has a
   %% duration of 0.5 seconds.
   generate_notes(Ts, N) :-
       Scale = c_major,
       findall(F-0.5,
               (Scale::nth(_, Note),
                Scale::frequency(Note, F)),
               Ts),
       Scale::length(L),
       synthesizer::sample_rate(SR),
       N is L*SR/2.

   %% Write the notes to 'output'.
   write_samples([]).
   write_samples([F-D|Fs]) :-
        synthesizer::samples(F, D, sine, Samples),
        wav::write_audio(output, Samples),
        write_samples(Fs).

:- end_object.

All the scales that are available on Soundcloud were of course generated using this method. We now have a good foundation for the next installment where we at last will look at methods for automatically generating note sequences.

Source code

The source code is available at https://gist.github.com/1007820.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: