1.1 Background

This section covers the overall goals and architecture of LilyPond.


The art of music typography is called (plate) engraving. The term derives from the traditional process of music printing. Just a few decades ago, sheet music was made by cutting and stamping the music into a zinc or pewter plate in mirror image. The plate would be inked, the depressions caused by the cutting and stamping would hold ink. An image was formed by pressing paper to the plate. The stamping and cutting was completely done by hand. Making a correction was cumbersome, if possible at all, so the engraving had to be perfect in one go. Engraving was a highly specialized skill; a craftsman had to complete around five years of training before earning the title of master engraver, and another five years of experience were necessary to become truly skilled.

Nowadays, all newly printed music is produced with computers. This has obvious advantages; prints are cheaper to make, and editorial work can be delivered by email. Unfortunately, the pervasive use of computers has also decreased the graphical quality of scores. Computer printouts have a bland, mechanical look, which makes them unpleasant to play from.

The images below illustrate the difference between traditional engraving and typical computer output, and the third picture shows how LilyPond mimics the traditional look. The left picture shows a scan of a flat symbol from an edition published in 2000. The center depicts a symbol from a hand-engraved Bärenreiter edition of the same music. The left scan illustrates typical flaws of computer print: the staff lines are thin, the weight of the flat symbol matches the light lines and it has a straight layout with sharp corners. By contrast, the Bärenreiter flat has a bold, almost voluptuous rounded look. Our flat symbol is designed after, among others, this one. It is rounded, and its weight harmonizes with the thickness of our staff lines, which are also much thicker than lines in the computer edition.


Henle (2000)

Bärenreiter (1950)

LilyPond Feta font (2003)

In spacing, the distribution of space should reflect the durations between notes. However, many modern scores adhere to the durations with mathematical precision, which leads to poor results. In the next example a motive is printed twice: once using exact mathematical spacing, and once with corrections. Can you spot which fragment is which?

[image of music]

[image of music]

Each bar in the fragment only uses notes that are played in a constant rhythm. The spacing should reflect that. Unfortunately, the eye deceives us a little; not only does it notice the distance between note heads, it also takes into account the distance between consecutive stems. As a result, the notes of an up-stem/down-stem combination should be put farther apart, and the notes of a down-stem/up-stem combination should be put closer together, all depending on the combined vertical positions of the notes. The upper two measures are printed with this correction, the lower two measures without, forming down-stem/up-stem clumps of notes.

Musicians are usually more absorbed with performing than with studying the looks of a piece of music, so nitpicking about typographical details may seem academical. But it is not. In larger pieces with monotonous rhythms, spacing corrections lead to subtle variations in the layout of every line, giving each one a distinct visual signature. Without this signature all lines would look the same, and they become like a labyrinth. If a musician looks away once or has a lapse in concentration, the lines might lose their place on the page.

Similarly, the strong visual look of bold symbols on heavy staff lines stands out better when the music is far away from the reader, for example, if it is on a music stand. A careful distribution of white space allows music to be set very tightly without cluttering symbols together. The result minimizes the number of page turns, which is a great advantage.

This is a common characteristic of typography. Layout should be pretty, not only for its own sake, but especially because it helps the reader in her task. For performance material like sheet music, this is of double importance: musicians have a limited amount of attention. The less attention they need for reading, the more they can focus on playing the music. In other words, better typography translates to better performances.

These examples demonstrate that music typography is an art that is subtle and complex, and that producing it requires considerable expertise, which musicians usually do not have. LilyPond is our effort to bring the graphical excellence of hand-engraved music to the computer age, and make it available to normal musicians. We have tuned our algorithms, font-designs, and program settings to produce prints that match the quality of the old editions we love to see and love to play from.

Automated engraving

How do we go about implementing typography? If craftsmen need over ten years to become true masters, how could we simple hackers ever write a program to take over their jobs?

The answer is: we cannot. Typography relies on human judgment of appearance, so people cannot be replaced completely. However, much of the dull work can be automated. If LilyPond solves most of the common situations correctly, this will be a huge improvement over existing software. The remaining cases can be tuned by hand. Over the course of years, the software can be refined to do more and more things automatically, so manual overrides are less and less necessary.

When we started, we wrote the LilyPond program entirely in the C++ programming language; the program’s functionality was set in stone by the developers. That proved to be unsatisfactory for a number of reasons:

These problems have been addressed by integrating an interpreter for the Scheme programming language and rewriting parts of LilyPond in Scheme. The current formatting architecture is built around the notion of graphical objects, described by Scheme variables and functions. This architecture encompasses formatting rules, typographical style and individual formatting decisions. The user has direct access to most of these controls.

Scheme variables control layout decisions. For example, many graphical objects have a direction variable that encodes the choice between up and down (or left and right). Here you see two chords, with accents and arpeggios. In the first chord, the graphical objects have all directions down (or left). The second chord has all directions up (right).

[image of music]

The process of formatting a score consists of reading and writing the variables of graphical objects. Some variables have a preset value. For example, the thickness of many lines – a characteristic of typographical style – is a variable with a preset value. You are free to alter this value, giving your score a different typographical impression.

[image of music]

Formatting rules are also preset variables: each object has variables containing procedures. These procedures perform the actual formatting, and by substituting different ones, we can change the appearance of objects. In the following example, the rule which note head objects are used to produce their symbol is changed during the music fragment.

[image of music]

What symbols to engrave?

The formatting process decides where to place symbols. However, this can only be done once it is decided what symbols should be printed, in other words what notation to use.

Common music notation is a system of recording music that has evolved over the past 1000 years. The form that is now in common use dates from the early renaissance. Although the basic form (i.e., note heads on a 5-line staff) has not changed, the details still evolve to express the innovations of contemporary notation. Hence, it encompasses some 500 years of music. Its applications range from monophonic melodies to monstrous counterpoints for large orchestras.

How can we get a grip on such a many-headed beast, and force it into the confines of a computer program? Our solution is to break up the problem of notation (as opposed to engraving, i.e., typography) into digestible and programmable chunks: every type of symbol is handled by a separate module, a so-called plug-in. Each plug-in is completely modular and independent, so each can be developed and improved separately. Such plug-ins are called engravers, by analogy with craftsmen who translate musical ideas to graphic symbols.

In the following example, we see how we start out with a plug-in for note heads, the Note_heads_engraver.

[image of music]

Then a Staff_symbol_engraver adds the staff

[image of music]

the Clef_engraver defines a reference point for the staff

[image of music]

and the Stem_engraver adds stems.

[image of music]

The Stem_engraver is notified of any note head coming along. Every time one (or more, for a chord) note head is seen, a stem object is created and connected to the note head. By adding engravers for beams, slurs, accents, accidentals, bar lines, time signature, and key signature, we get a complete piece of notation.

[image of music]

This system works well for monophonic music, but what about polyphony? In polyphonic notation, many voices can share a staff.

[image of music]

In this situation, the accidentals and staff are shared, but the stems, slurs, beams, etc., are private to each voice. Hence, engravers should be grouped. The engravers for note heads, stems, slurs, etc., go into a group called ‘Voice context,’ while the engravers for key, accidental, bar, etc., go into a group called ‘Staff context.’ In the case of polyphony, a single Staff context contains more than one Voice context. Similarly, multiple Staff contexts can be put into a single Score context. The Score context is the top level notation context.

See also

Internals Reference: Contexts.

[image of music]

Music representation

Ideally, the input format for any high-level formatting system is an abstract description of the content. In this case, that would be the music itself. This poses a formidable problem: how can we define what music really is? Instead of trying to find an answer, we have reversed the question. We write a program capable of producing sheet music, and adjust the format to be as lean as possible. When the format can no longer be trimmed down, by definition we are left with content itself. Our program serves as a formal definition of a music document.

The syntax is also the user-interface for LilyPond, hence it is easy to type:

  c'4 d'8

to create a quarter note C1 (middle C) and an eighth note D1 (D above middle C).

[image of music]

On a microscopic scale, such syntax is easy to use. On a larger scale, syntax also needs structure. How else can you enter complex pieces like symphonies and operas? The structure is formed by the concept of music expressions: by combining small fragments of music into larger ones, more complex music can be expressed. For example


[image of music]

Simultaneous notes can be constructed by enclosing them with << and >>:

<<c4 d4 e4>>

[image of music]

This expression is put in sequence by enclosing it in curly braces { … }:

{ f4 <<c4 d4 e4>> }

[image of music]

The above is also an expression, and so it may be combined again with another simultaneous expression (a half note) using <<, \\, and >>:

<< g2 \\ { f4 <<c4 d4 e4>> } >>

[image of music]

Such recursive structures can be specified neatly and formally in a context-free grammar. The parsing code is also generated from this grammar. In other words, the syntax of LilyPond is clearly and unambiguously defined.

User-interfaces and syntax are what people see and deal with most. They are partly a matter of taste, and also subject of much discussion. Although discussions on taste do have their merit, they are not very productive. In the larger picture of LilyPond, the importance of input syntax is small: inventing neat syntax is easy, while writing decent formatting code is much harder. This is also illustrated by the line-counts for the respective components: parsing and representation take up less than 10% of the source code.

Example applications

We have written LilyPond as an experiment of how to condense the art of music engraving into a computer program. Thanks to all that hard work, the program can now be used to perform useful tasks. The simplest application is printing notes.

[image of music]

By adding chord names and lyrics we obtain a lead sheet.

[image of music]

Polyphonic notation and piano music can also be printed. The following example combines some more exotic constructs.

[image of music]

The fragments shown above have all been written by hand, but that is not a requirement. Since the formatting engine is mostly automatic, it can serve as an output means for other programs that manipulate music. For example, it can also be used to convert databases of musical fragments to images for use on websites and multimedia presentations.

This manual also shows an application: the input format is text, and can therefore be easily embedded in other text-based formats such as LaTeX, HTML, or in the case of this manual, Texinfo. By means of a special program, the input fragments can be replaced by music images in the resulting PDF or HTML output files. This makes it easy to mix music and text in documents.

Other languages: français, español, deutsch.

Learning Manual