The development of the synchronization of sound and image (1995)

Early experiments in synchronization

On October 6th 1889 Thomas Edison was given a demonstration of a “talking picture” by William K.L. Dickson, by projection onto a four foot screen from a camera designed by Edison himself. The Edison camera was probably the first to create true motion picture photographs, while the sound was provided by means of synchronized wax-type cylinder recordings. This is one of the first examples of synchronized sound and image, and in a letter to The Pittsburgh Press on September 20th 1896, Edison expressed his aims for the future of motion pictures – “In the year of 1887 the idea occurred to me that it was possible to devise an instrument which should do for the eye what the phonograph does for the ear and that by a combination of the two all motion and sound could be recorded and reproduced simultaneously” (“From Tinfoil To Stereo” p.278).

Dickson left the Edison Laboratories in 1894 to join one of the many competitors in the new motion picture field, though around this time projection, and especially motion pictures, were discontinued until technical advances could make commercialization possible.

In 1897 George W. Brown claimed to have invented a device for synchronizing the projector and the phonograph and in 1900 Gaumont received a patent on a sync motor method. Experiments in synchronization were few and far between at this time, until 1903, when Eugene Lauste demonstrated a method of producing sound from film by using photographed sound waves. This idea borrowed from Alexander Graham Bell’s experiments in projecting sound over a beam of light. Bell called this invention a Photophone and it was patented around 1884. The sound was created by projecting light through the film onto a selenium cell which affected the cell’s conductivity. These inventions prove that there were definite links between sound and image and the way in which the two were produced – even at this early stage. There were also links in the way both were exhibited. Films were shown by a travelling exhibitor who showed one set of films until they were worn out. This was also the way in which the first tinfoil phonograph was demonstrated to the general public.

By 1906, Dr. Lee DeForest had developed Lauste’s sound on film ideas to include a photo-electric cell and was experimenting with amplification achieved by use of his own three element vacuum tube.

Thomas Edison was probably the most influential inventor of film and cameras, in that most of the bootleg inventions that were created (due to a lack of proper patents for Edison’s own inventions) used concepts that were made standard by Edison, such as the width of the film used (35mm) and the four-notches-per-frame design of the ratchet feed device. Though Edison was a pioneer in the field of both sound and image, he never achieved an effective combination of the two. His final invention before giving up the attempted synchronization of sound and image was the Kinetophone, which he demonstrated in various cities around the USA in 1913. This device used a 5.5 inch diameter celluloid cylinder record (similar to the earlier Blue Amberol record). An amplifier was used to increase the volume, and synchronization was achieved by use of a pulley system connecting the projector at the back of the theatre with the mandrel of the phonograph behind the screen at the opposite end of the theatre. The mechanism was difficult to handle, and as a result, sound often moved out of sync with the images and breaks in the film were not always spliced to achieve an effective flow of images to accompany the sound. The main problem, practically, was a lack of precise synchronization between the speeds of the film and of the cylinder record, and, artistically, a lack of sound during the parts of the film where there would be breaks in the images.

In 1919 Theodore W. Case was granted the first of a series of patents based on experiments into the use of a tiny mirror attached to a diaphragm vibrated by the sound waves. This led to the method of registering sound simultaneously with the images in the margin of the same film – the soundtrack concept which is still in use today. This method was not received well by the general public who favoured instead the synchronized record technique. It was also a cheaper method for the theatres, rather than having to install expensive equipment to “read” the soundtrack off the film which at such an early stage would produce an unreliable quality of sound.

Synchronization in early film

By the turn of the 1920s the American movie industry had firmly based itself in Hollywood, and up until the birth of the silent movie as an art-form, synchronization of sound and image had only been of scientific interest. The first known use of music with film as an artistic concept probably occurred on December 28th 1895 when the Lumiere brothers tested the commercial value of their early films at the Grand Cafe on the Boulevard des Capucines in Paris. This screening consisted of a piano accompaniment and by the time the screenings reached Britain in February 1896 a harmonium was being used for accompaniment.

The function of a musical accompaniment to a silent film was probably twofold:

  1. practically, to drown out the noise generated by the mechanisms of the early projectors which were large and noisy.
  2. aesthetically, to give a film “auditory accentuation and profundity” (Jack London, taken from Prendergast, p.5).

The best early film producers extended this aesthetic by requiring specific music that would complement their films. The first original score was composed by Saint-Saens in Paris in 1908 and this was probably the last original score for some time because of increased production costs.

A year later, in 1909, the Edison Film Company recommended “specific suggestions for music” (Prendergast, p.6) to accompany their films. By 1913, manuscripts were being published that contained musical examples of different moods that could be used to accompany different dramatic situations. These collections included “The Sam Fox Moving Picture Music Volumes” by J. S. Zamecnik in 1913, Erno Rapee’s “Motion Picture Manual for Piano and Organ” and, one of the most popular, Giuseppe Becce’s “Kinobibliothek” (or “Kinothek”), published in Berlin in 1919. These collections were successful in that they classified their musical examples according to mood and style, and allowed accompanists to musically reflect what the audience was watching.

The more advanced filmmakers attempted to construct cue sheets, another early attempt at synchronizing image and live sound. A cue sheet would specify what was to be played (e.g. “Minuet No. 2 in G” by Beethoven), how long the piece was to be played for (90 seconds) and often included a visual cue (until title on screen, “Follow me dear”). This system was devised by Max Winkler, who was one of the first people to catalogue music for silent film in the USA. He was eventually employed by Universal Films to provide cue sheets for all their films.

As the size of the accompanying orchestra for movies shown in theatres grew, more problems arose with synchronization. It was fairly easy for a lone pianist or organist to watch the screen for visual cues while he/she played but it was more difficult for an orchestra. This meant that detailed scores had to be written, but this was both time-consuming and expensive, especially considering the incredible rate at which films were being produced by some companies in the first decade of the 20th Century. One way in which small theatres avoided the expense of a large orchestra was by buying one of the many musical accompaniment machines that were on the market at the turn of the 1910s. These machines were known as “One Man Motion Picture Orchestra”, “Film Player”, “Movieodeon” or “Pipe Organ Orchestra”. These machines could be worked by a single musician, and some, such as the American Photo Player Company’s “Fotoplayer Style 50”, could even play certain rural and urban sound effects (such as street noises, fires crackling, cattle lowing etc.).

There were still problems with synchronization, even with scores specifically composed for a film. One of the most successful devices invented to synchronize a live orchestra with moving images was the Rhythmonome, invented by Carl Robert Blum and displayed to the public for the first time in Berlin in 1926. Kurt London describes the workings of this device in his book “Film Music” – “Tapes registering the “phonorhythmical” signs run within the instrument in such a way that they pass a sight index from left to right… The sound can then be reproduced in the original rhythm, as the sight index allows it to be read off in exact timing…” (from Prendergast, pp.12-3). The instrument was placed on the conductor’s stand and could be synchronized with the projector, the speeds of both devices being adjustable by use of a “musical chronometer”. By watching the Rhythmonome, the conductor could cue the orchestra as the corresponding notes ran past the sight index.

Other, less successful devices were developed to synchronize live music with moving image. These included a picture of a conductor on a screen in front of the orchestral conductor (though not all conductors could successfully follow the guide conductor), and an abbreviated score that appeared at the bottom of the screen to guide the orchestra (though this tended to spoil the audience’s enjoyment of the film).

The advent of the sound film was facilitated by improvements in recording technology that grew out of the developments in radio and other technologies developed during the War years, as well as devices developed for commercial use by the record companies.

Around 1926, the Bell Telephone Company discovered that, with slight modifications, their sound recording techniques, which had been accepted by the phonographic industry, could be applied to the sound film. Furthermore, these techniques offered significantly improved acoustic and sound quality. This was achieved by reducing the recording speed of their discs from 78 to 33.33 rpm for extended playing time, and by the use of larger discs for the storage of a greater amount of music. Synchronization of record and projector was achieved by use of synchronous motors (as pioneered by Gaumont in 1900), the starting points of both film and record clearly marked for exact synchronization. Film and projector technology had improved sufficiently by this time to avoid many long cuts or pauses in the film. This device was called the Vitaphone and was taken up by Warner Bros. for experimentation to iron out the problems, before developing their own sound films.

The first full length sound film produced by Warner Bros. was “The Jazz Singer”, starring Al Jolson, which premiered on October 6th 1927. Before this, films had been made to accustom audiences to synchronized dialogue and music. One of these films was “Don Juan”, which was specially provided with a symphonic score on a set of Vitaphone discs, and first shown in February 1927.

There was no re-recording in the early sound film, so many soundtracks would be recorded “on set” just out of the camera shot. This made editing difficult, since the film could not be cut without spoiling the soundtrack’s continuity. With improving recording technology, new designs for coil microphones and the eventual development of multitrack recording, the recording and synchronization of soundtracks became a Hollywood standard. Without the need to worry about technological pitfalls, filmmakers and composers could now concentrate on synchronizing music and the moving image from an aesthetic point of view, writing into the score effects that would add to the visual drama, or, to recall the words of Jack London, to give “auditory accentuation”.

Contemporary techniques

Once a film has been completed by the director, the composer will view the film at least twice – once in an “initial screening”, then in more detail, scene by scene, when the composer will decide where the music for each scene will begin and end. This is known as “spotting” the film and this meeting will include the film’s producer, director, composer and music editor. From “spotting notes” the music director will write “timing breakdown notes” for the composer. This breakdown is a detailed description of each scene that requires music.

As soon as the breakdown has been compiled, the composer will “lay out” the score onto manuscript paper, mapping out meters and barlines in a beat by beat layout of the music that is to be composed. Tempos and number of beats need to be calculated if the music is to stay in exact synchronization with the images. These calculations are determined by use of a “click track”, which is a series of steady ticks that can be adjusted from 40 to 208 beats per minute. The click track is therefore a form of metronome that allows the composer to decide how fast or how slow the music for a particular scene should be. Prendergast defines the click track as “a synchronous metronome that is locked to the picture, thus enabling the music to stay in synch with the picture” (p.263).

Music is recorded on tape known as “magnetic track”, which is coated in an oxide surface (like normal audio or video cassettes), though it looks like normal 35mm film. Magnetic track has the same number of sprocket holes as normal film (four per frame) and is passed over the sound head at the same rate as the film is passed through the projector. Click tracks are usually recorded onto opaque leader tape, into which holes are punched – holes that produce a popping sound when passed over the head of an audio film machine.

To determine the click equivalent of any tempo, the composer will start a stopwatch at the beginning of the piece of music. He/she will stop the watch on the 25th beat of the music, when the reading will be the same, numerically, as the equivalent click. This works because 24 beat intervals separate the first beat from the 25th, and film runs at 24 frames per second, so the elapsed time in seconds equals the number of frames per click. The difference is that the click tracks are read in eighths, so a stopwatch reading of 10.5 seconds will have a click equivalent of 10 4/8, 14.25 seconds will be read as 14 2/8 etc.
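The stopwatch method can be sketched in a few lines of Python. This is only an illustration: the function name `stopwatch_to_click` and the choice to round to the nearest eighth are assumptions, not something taken from a click book.

```python
FILM_FPS = 24          # film frame rate, as stated in the text
BEAT_INTERVALS = 24    # intervals between beat 1 and beat 25


def stopwatch_to_click(seconds_at_25th_beat):
    """Convert a stopwatch reading taken on the 25th beat into a click
    value (frames per click), written in whole frames and eighths."""
    # Total frames elapsed, divided by the number of beat intervals.
    frames_per_click = seconds_at_25th_beat * FILM_FPS / BEAT_INTERVALS
    # Assumption: round to the nearest eighth of a frame.
    total_eighths = round(frames_per_click * 8)
    whole, eighths = divmod(total_eighths, 8)
    return f"{whole} {eighths}/8"


print(stopwatch_to_click(10.5))  # 10 4/8
print(stopwatch_to_click(12.0))  # 12 0/8
```

Because the frame rate and the number of intervals are both 24, the two cancel and the stopwatch reading is simply re-expressed in eighths.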

Click track numbers give information as to the number of frames per click, which is inversely proportional to the standard metronome markings (which give beats per minute). Since there are 1440 frames per minute (24 x 60), the metronome markings can be discovered by dividing the number of frames per minute by the number of frames per click.

e.g. 12 frames per click = 1440/12 = 120 bpm.

10 4/8 frames per click = 1440/10.5 = 137.14 bpm.
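As a rough sketch of the same arithmetic, the conversion between click values and metronome markings could be written as below. The function names `click_to_bpm` and `bpm_to_click` are hypothetical, and rounding to the nearest eighth of a frame is an assumption made for the illustration.

```python
FRAMES_PER_MINUTE = 24 * 60  # 1440 frames per minute at 24 frames per second


def click_to_bpm(whole_frames, eighths=0):
    """Metronome marking for a click given as whole frames plus eighths."""
    frames_per_click = whole_frames + eighths / 8
    return FRAMES_PER_MINUTE / frames_per_click


def bpm_to_click(bpm):
    """Nearest click, as (whole_frames, eighths), for a metronome marking."""
    total_eighths = round(FRAMES_PER_MINUTE / bpm * 8)
    return divmod(total_eighths, 8)


print(click_to_bpm(12))                 # 120.0
print(round(click_to_bpm(10, 4), 2))    # 137.14
print(bpm_to_click(120))                # (12, 0)
```

Since clicks only exist in steps of an eighth of a frame, most metronome markings map to the nearest available click rather than an exact one.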

The main advantage of a click track is that it cuts down the time taken to compose a soundtrack and helps the conductor of the orchestra pinpoint specific cues during the recording stage. The disadvantage of using a click track is that it allows little flexibility for the composer when the tempo is locked to a specific click. By using a variable click track it is possible to include musical devices such as accelerandos and ritardandos which may help make the music more expressive. This will entail more calculations on the composer’s (or music editor’s) part but computer programs have been developed to do this. To be sure of total accuracy and precision with the use of clicks, there exist “click books” which contain tables of timings in both clicks and beats per minute. The first click book was assembled by the music editor Carroll Knudson in 1965, and the click book is often known as the Knudson book.

Click tracks aren’t always used in film composition. A technique known as “free timing” makes use of “streamers” or “punches” for synchronization. Streamers are created by scratching the emulsion off a film in a diagonal line. This line usually lasts around two seconds and covers three feet of film. Slower tempos may require longer cues of up to three or four seconds. When the film is projected, the streamer will appear as a vertical line moving across the screen from left to right, and the composer’s cue point will come when the streamer reaches the right hand side of the screen. Streamers are usually notated above the score as a circle. Another conducting aid is the “punch”, notated in the score as + or – superimposed over a circle. Punches are multiple holes punched in the film with a standard paper punch (or added electronically to videotape) that produce a series of flashing light pulses on the screen. Again these are cue points for the composer and for the conductor during the recording stage, and can be seen without looking at the screen. The main advantage of using a streamer is that it allows the conductor a few seconds to anticipate beats. Streamers and punches are often used to help the conductor synch the music without the use of clicks, and are at their most useful when used in combination with clicks, e.g. in rubato or a tempo sections.

A different kind of synchronization will be required for a practice that is known as “tracking” – i.e. adding an existing piece of music to a film as a “temporary soundtrack” until the original score is ready. The main reasons why filmmakers use these temp tracks include:

i) to improve screenings to producers, studios, network executives and preview audiences during various stages of postproduction. This helps the filmmaker avoid screening the film without a score, which tends to make a film less attractive to prospective buyers.

ii) to help find a “concept” for the score. The composer, director or music editor will often find inspiration for the score by using existing pieces of music inserted into different scenes.

Using temp tracks brings about its own synchronization problems in that the music is not composed with the visuals in mind, so the pre-existing music has to be edited to suitably fit the scene in which it is being used. This is a tedious job and is often left to the music editor.

One of the final stages of recording and synchronization is the dubbing process, which blends together dialogue, sound effects and music. The process begins with predubbing, which will often involve the composer, and is the stage where the sound engineers will become acquainted with the music. The actual dubbing stage can often take two weeks or more for a big-budget movie, while the predubbing will take anything from a few days to a few weeks. The process usually takes place on a dubbing stage specially built for the purpose, and will usually involve at least three individuals, who serve as “mixers”, sitting at a mixing desk in front of a large motion picture screen. Each mixer handles a different aspect of the sound: the effects mixer and the music mixer both take orders from the dialogue mixer.


Synchronization techniques differ yet again when considering video postproduction, in that the technology used is electronic rather than mechanical, as is the case with film. Video postproduction began around 1956, following the development of the first videotape machine. Videotape benefited both television and film composers in its versatility. All television programmes were previously recorded live, so had to rely on whatever sound effects or music could be created in the studio. The advent of videotape meant that producers could both add sound effects and music as required, and edit programmes to achieve a smoother, more professional effect.

Videotape’s main advantage is that it is an electronic medium, and is therefore easily manipulated by computers. This saves on the large labour costs that are often associated with film editing and postproduction. Preparing a picture (stream and punch and calculating various click starts) for a scoring session for a one-hour television programme being done on film may take a music editor six to eight hours. The same show, done on videotape and using the “VideoScore” system (a computer program designed by The Music Design Group editing company) would probably take no more than 30 minutes.

The technology available to aid video post-production was developed by a number of different companies co-operating over several years to create a network of services that were technologically compatible. The Neiman-Tiller Associates developed the ACCESS system, a video-based digital sound effects editor, in the mid-1970s, but the “PAP” system, developed around the same time by Glen Glenn Sound, became the main system used.

The Music Design Group pursued the development of music editing techniques at a time when most companies were concerned with sound effects, dialogue replacement and mixing etc. The systems developed by the Music Design Group fall into two areas:

  1. pre-scoring and scoring
  2. assembling of tracks for the dubbing session.

The sound for videotape is recorded onto audio recording tape (either two-inch 24 track or half-inch 4 track tape depending on the number of tracks needed). Sound and image are synchronized by use of the SMPTE timecode, a standardized timecode developed by the Society of Motion Picture and Television Engineers. SMPTE timecode is a set of numbers, consisting of hours, minutes, seconds and frames (video runs at 30 frames per second as opposed to film’s 24), which can be laid down onto tape as an audio signal or an image. A timecode number may read 02:01:38:27, which, read left to right, means that our position in the duration of the film is two hours, one minute, 38 seconds and 27 frames.
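To make the timecode arithmetic concrete, here is a minimal Python sketch that converts a timecode string to an absolute frame count and back. It assumes exactly 30 frames per second with no drop-frame correction, following the figure given in the text; the function names are hypothetical and do not come from any SMPTE document or synchronizer product.

```python
VIDEO_FPS = 30  # frames per second for video, as stated in the text


def timecode_to_frames(timecode, fps=VIDEO_FPS):
    """Convert 'HH:MM:SS:FF' into a frame count from 00:00:00:00."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError(f"invalid timecode: {timecode}")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def frames_to_timecode(total_frames, fps=VIDEO_FPS):
    """Convert a frame count back into 'HH:MM:SS:FF'."""
    total_seconds, frames = divmod(total_frames, fps)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


print(timecode_to_frames("02:01:38:27"))  # 218967
print(frames_to_timecode(218967))         # 02:01:38:27
```

Working in absolute frame counts makes it straightforward to compute the offset between two machines: subtract one count from the other and convert the result back to a timecode.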

SMPTE’s audio signal is used to link up different audio equipment, the timecode being standard to them all. These different machines may play music or sound effects, and they work by following the timecode of the videotape machine so that their audio tracks move in synchronization with the image, the audio tracks being read by synchronizers such as “Q-Lock” or “LYNX” devices.

As with film, breakdown notes have to be made, so The Music Design Group developed a computer-based system known as VideoScore which did this work digitally. The main advantage of this system is that when the director extends or cuts scenes and so changes the length of the music cues, the music editor does not have to recalculate the entire set of breakdown notes.

SMPTE is used again at the dubbing stage. The soundtrack is recorded with identical timecode to the one on the picture, and then synchronized with the other audio elements, such as sound effects. The most common use of audio tape so far has been the 4-track, but the advent of stereo television and Dolby SurroundSound will probably lead to a demand for more audio tracks. This is the next stage for the synchronization of sound and image.

Further reading