Editing Documentary Scenes
There are no rules, just guidelines. What follows is an approach I have assembled from talking to and working with some very experienced editors (special thanks to Mark Atkins, who has been very generous with his time). The first thing you need to do is decide what the purpose of the scene is from a story standpoint, because that is what lets you decide what to include. Generally a good strategy is to cut sync first, unless you are dealing with actuality (which will be dealt with elsewhere). I should say straight off the bat that with experience it may not be necessary to go through all of this process, but for difficult scenes it may still be useful. A very experienced editor may just look through the rushes and edit in their head, grabbing what they know they will need.
The purpose of the scene is fundamental in deciding what to include. Once you know it, you are asking two questions. Firstly, does the sync move the story forward (in terms of the purpose of the scene)? Secondly, does it do so with an element of emotion? I am not talking about emotional breakdowns or tears here; I am just asking whether it has something of the person in it. This is why first-hand testimony from people who were ‘actually there’ works, while someone simply expressing the views of others or recounting something they saw on TV generally does not (unless it was something with an emotional element, like seeing the Twin Towers collapse on television). At this point we are just collecting sync around the purpose of the scene. You could say you are collecting sync around ‘what the scene is about’, but there must be purpose in it: it must be something that can move the story forward. I personally don’t worry too much about how the pieces will fit together at this stage; I am just collecting good sync. Cut out ALL the material where people go off topic or repeat themselves. Be brutal, then be more brutal. When people repeat themselves you generally want the most concise version of what they are saying (unless a longer version has greater emotional content). If there are a couple of times when they say the same thing concisely (and with an element of emotion), keep both; you can decide later which works better in the cut.
The second thing to do is decide the IN (and OUT) points. The expression ‘turn up late and leave early’ comes to mind here. By choosing the most concise telling you may already have done this, but you are looking for material with punch. Take ‘It was a bad day, this was the worst day of my life’: you can lose the first part. Starting with ‘this was the worst day of my life’ has more emotional punch, and removing the first part loses no meaning. If it was ‘the worst day of my life’, the fact that it was ‘a bad day’ goes without saying. What we are doing is stripping material down to its core; generally, the shorter and more concise the better. The exception is a very emotional piece of sync where you want to stay with the character. If there is enough emotion, don’t be brutal; let it run.
There may be a few possible IN points and at this point you do not need to decide. Remove the footage before the first and mark the others. Generally it is obvious and there is only one. By cutting out waffle and repetition you will probably already end up with a number of shortish clips each with an obvious IN.
You should also mark the out points. Again you are looking for something with punch.
‘What some Elvis impersonators do not realise is that the audience is affected by two things, the image of Elvis and their performance, so they get a bigger reaction than they normally would, even if they are crap, most performers realise this but you will see the ones who don’t’.
Ending with ‘even if they are crap’ is punchy. For OUTs it is best to mark them rather than chop off the end, as you may need the rest to make the scene flow.
The really important OUT is the one that ends the scene. Keep an eye out for good OUTs with enough punch to carry the audience over to the next scene: something that gets the audience thinking for a moment while we move into the next scene. A scene can be seen as a small story with a beginning, middle and end. At the end there is often a bit of a pause in the sync, with something visual to join the scenes. So ‘turn up late’ means cut straight to the chase, starting with a punch rather than the waffle before it, and ‘leave early’ means that once you have made your point, you get out of there. A scene should have one purpose and one point.
Putting all the sync into a single sequence can work well (or, if it is a difficult scene with a lot of material, one sequence for each character). Although I said not to worry about the order of the sync or making it flow, that does not mean you can’t place things in what seems like a logical order when building the sequence. Just don’t spend too much time stressing over it.
Now for the fun bit: making all the sync into a coherent whole. A very good approach can be to say what needs to be said using a number of characters bouncing off each other (although a scene with a single character can work). You then get a number of voices telling the same story. If they agree, this gives what they are saying more power, because several people are saying the same thing. But you don’t want people simply repeating each other, unless it is a very powerful point where having several people say it is necessary.
This way you can be as brutal as you like and start only with what is great. ‘If in doubt, throw it out’ is a phrase often used, but the reality is harsher. If you have even the slightest doubt about something, whether because it is slightly off topic, a bit rambling or repetitive, or simply lacks enough emotional content, it must go. This is what being brutal means. Being brutal is great, but it may remove short pieces of sync that are needed to make the scene coherent and flow. One strategy here is to keep a sequence with the raw sync and bring back the minimum needed to make things work.
Weaving sync together can be illustrated with an example.
C1 protester, C2 local person, C3 journalist
C1 “police removed our placards. After that things got a little bit heated and before we had time to calm down we had half a dozen riot vans turn up”
C2 “on Thursday I was in a local cafe prying and went outside and saw loads of vehicles, about six ambulances and then loads of police vans turned up and we had no idea what was going on”
C3 “the police had information that they considered to be reliable that someone had a petrol bomb…”.
Intercutting has a number of advantages. Firstly, it works visually: you can’t just chop a bit out of an interview and carry on, because the jump jars. Secondly, it helps hold the audience’s attention; every time you switch, it provides a little jolt of energy. Thirdly, it adds credibility; in this case three different types of people are all telling the same story. It can also be done where people disagree, which creates an on-screen argument (possibly between people who have never met), or it can be cut for humour, for example with a number of people who hold wildly different and idiotic views on something. Another thing that works well is having people complete each other’s sentences; cutting on ‘and’, ‘but’ or even ‘or’ works well here.
The main thing you need to ensure is that the rhythm of speech is right. Close your eyes so you are not distracted by the visuals. Does it sound natural? Are the pauses correct? This takes practice and experience, but it is a skill you need to master.
As well as intercutting, there are other ways of dealing with a jarring cut within the same contributor’s sync. The first is reframing: if the framing on either side of the cut is significantly different (medium shot to close-up), the cut can work. You can also exaggerate a difference in framing, or, if your footage has a high enough resolution, create one from scratch, by pushing in (magnifying the clip enough to create a different shot size).
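How far you can push in before the picture softens depends on how much larger the source frame is than the timeline frame. The Python sketch below works that out, assuming the clip is first scaled to fill the timeline frame and that you don’t want to enlarge it beyond its native pixels. The function names are illustrative only and are not part of any editing application.

```python
def push_in_headroom(src_w, src_h, tl_w, tl_h):
    """How far (as a multiplier) a clip can be pushed in, relative to simply
    filling the timeline frame, before it is enlarged past its native pixels."""
    # Filling the frame scales the source by the larger of the two ratios;
    # native 1:1 pixels is a scale of 1.0, so the headroom is 1.0 / fill_scale.
    fill_scale = max(tl_w / src_w, tl_h / src_h)
    return 1.0 / fill_scale


def visible_source_area(src_w, src_h, tl_w, tl_h, push):
    """Width and height (in source pixels) still in frame at a given push-in,
    where push=1.0 means the clip just fills the timeline frame."""
    scale = max(tl_w / src_w, tl_h / src_h) * push
    return tl_w / scale, tl_h / scale


if __name__ == "__main__":
    # Hypothetical example: UHD (3840x2160) rushes in an HD (1920x1080) timeline.
    print(push_in_headroom(3840, 2160, 1920, 1080))           # 2.0
    print(visible_source_area(3840, 2160, 1920, 1080, 2.0))   # (1920.0, 1080.0)
```

So UHD rushes in an HD timeline allow roughly a 2× push-in, halving the visible width and height of the shot, while HD rushes in an HD timeline leave no headroom at all.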
The second is cutting away to show something else visual. I do not like the use of what are often referred to as cutaways, a single shot used to cover an edit. Really bad examples of this (for me) are a shot of hands or of the interviewer nodding (a nod shot). What I prefer is a short visual montage (of at least two shots, preferably more) that illustrates what is being said and tells the audience more, rather than just showing a shot of what is being talked about. For the example above, a sequence of shots of riot vans arriving and people reacting would work.
You also need to make sure the scene is very simple and easy to follow. I once interviewed an editor and, as he left, he said, ‘The one thing I have learned is it can never be too simple’, and this is very true. Remember that people are seeing what you edit for the first time, in real time. The challenge is to put yourself in the audience’s position. Trust your instincts: if anything seems less than extremely simple, it is too complex. This is not dumbing down; it is simply not confusing the audience. Ask yourself, ‘Would my Gran understand this?’ (no offence meant to your Gran; the question is whether everybody who might be watching will understand). This is very difficult, and it involves concentrating hard on what is said and actively assessing its complexity.
Documentary, feature and even factual TV are all about portraying emotion, not delivering information. We are trying to get people excited about the world, emotionally engaged and entertained, not to give them information; books and radio are much better at that. This is true even of campaign or activist video. There you are trying to get people excited, or angry, enough to want to find out more, and they can then turn to the internet, print or radio for more information.
Another way of covering a jarring cut is a sting, for example a shot of a boxer landing a punch, with its audio, but this must be sufficiently stylised. White flashes can also work, generally when the contributor is getting excited and animated (two frames of white works well). With this type of method the only limit is your imagination, but make sure it is appropriate to the look and feel of the piece. Lastly, don’t forget the power of jump cuts. For example, if a character is in conflict with himself and having an on-screen debate with himself, jump cuts can add a nice edginess.
Once you have the first cut, refinement is an iterative process. Go through the sync and remove anything that can be removed without destroying its cohesiveness. If you can remove something and the scene still makes sense, it generally should go (again, the exception is emotional power; you don’t want to remove that). As you remove the unnecessary material, things generally become easier to understand. This is in fact the same technique to use when editing the written word: remove everything that is unnecessary. You can also tighten up what is being said, making people sound more articulate, and remove pauses, ums and so on. This is all a matter of personal judgement and choice, and it requires the contributor to be out of vision at the cuts (so you will need more montages to lay over them).
This brings us to scene transitions. Again, there are no rules, just things that tend to work. From a technical point of view, overlapping audio will smooth the transition; overhanging the incoming and outgoing audio with a fade in and fade out can work well. We are talking about overlapping room tone or wildtrack here; you don’t generally overlap sync from one scene to another, although it can work. Overlapping music can also work well, either continuing the music from the end of one scene into the next or bringing the next scene’s music in a little early. You often want to show something visual (without sync) between scenes, maybe a short montage or the director walking to a contributor’s house: something to create a breathing space between scenes.
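In an editing application this overhang is simply a fade out on the outgoing clip and a fade in on the incoming clip where the two overlap. Purely to illustrate what happens to the sound, the Python sketch below mixes two mono clips, represented as hypothetical lists of sample values, with a linear fade across the overlap. This is an assumption for illustration; real crossfades often use other curves, such as equal-power.

```python
def overlap_with_fades(outgoing, incoming, overlap):
    """Join two mono clips (lists of sample values) so that the last `overlap`
    samples of `outgoing` fade out while the first `overlap` samples of
    `incoming` fade in on top of them."""
    if overlap < 0 or overlap > min(len(outgoing), len(incoming)):
        raise ValueError("overlap must fit inside both clips")
    start = len(outgoing) - overlap  # where the overhang begins
    mixed = []
    for i in range(overlap):
        # Linear ramp strictly between 0 and 1: the outgoing gain falls as the
        # incoming gain rises, and the two gains always sum to 1.
        t = (i + 1) / (overlap + 1)
        mixed.append(outgoing[start + i] * (1 - t) + incoming[i] * t)
    return outgoing[:start] + mixed + incoming[overlap:]


if __name__ == "__main__":
    scene_a_tone = [1.0, 1.0, 1.0, 1.0]  # toy values standing in for scene A's room tone
    scene_b_tone = [0.0, 0.0, 0.0, 0.0]  # toy values standing in for scene B's room tone
    print(overlap_with_fades(scene_a_tone, scene_b_tone, 3))
    # [1.0, 0.75, 0.5, 0.25, 0.0]
```

The joined clip is shorter than the two clips laid end to end by exactly the length of the overlap, which is why the transition feels smoother rather than longer.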
Breathing space (to get the pace right) is also important within a scene. You don’t want a wall of dialogue. Cutting in visual elements, performance, action or even actuality all help. It is also not necessary to see the contributor in vision all the time. You could, for example, start with them in vision, bring in a performer (with their audio very low), then, when the contributor has made a point, bring up the performer’s audio for a bit of performance, then go back to the contributor in vision for the next point. Look at the rhythm of what is being said and find good points to stop and start the sync. You should be weaving the elements together while varying your cutting. If the sync you are using comes just after the contributor was asked a question, you could show them in vision, listening, for a short while as the previous contributor finishes talking. This can tie things together nicely, giving the feeling that the contributors are actually listening to each other.
Lastly, you have to decide whether you want the interviewer’s questions in. This can be very useful but is ultimately a stylistic choice. One trick is to edit the questions to make them more concise: because the contributor, not the interviewer, is in vision, you can edit the interviewer’s audio as much as you like to produce a very clear question. (You can also re-record the questions, but this is tricky to get right, so avoid it if possible.)