To understand virtual studios in their current form it is helpful to understand how the needs of content makers helped advance the technology that preceded them. By understanding how we got to where we are today, we will also be able to make some informed predictions of where things are likely to go in the future.
We will look at the types of applications virtual studios are best suited for, and the alternatives.
As each reader is likely to have different levels of knowledge, this document has been structured chronologically so that you can jump forward to a section based on your needs.
Virtual sets are often judged by comparison with their physical counterparts, so it makes sense to start by looking at those first.
Since the early days of cinema, filmmakers have found ways to use artificial backdrops to help make their stories easier to film. It was not always practical to travel to locations depicted in the stories they were telling – especially in harsh or dangerous environments.
As of writing, the majority of television entertainment studio shows continue to use physically built sets. They help to create a unique identity for a show and a visually exciting setting in which the production can take place.
Not every background of every shot needs to be built to scale. Filmmakers found they could create convincing sets, locations, and objects by using scale miniatures to create the illusion of large objects further away.
Whilst the use of miniatures was common for drama and films, their cost meant they were much rarer in television entertainment, which instead formed its own language with sets that did not need to be so ‘literal’.
Filmmakers used many illusions to make the audience believe what they were seeing.
‘Glass’ shots allowed for the blending of real sets with elements painted on a sheet of glass. The scene was shot through the glass, obscuring unwanted areas of the set and enhancing the shot with the painted element. Popular in the days of Charlie Chaplin, it was used right through to the 1980s on films such as Star Wars and Indiana Jones.
In the example, Charlie Chaplin can be seen roller skating precariously close to a big drop. You can see how the drop is just a painting.
The elements of design and craft, as well as masks, sometimes known as ‘mattes’, will recur later when we talk about virtual studios.
A further development, which allowed for movement in the background, was projection.
Projection allows for a composite (combination) of shots filmed at different times. It is most familiar to audiences from countless unconvincing driving scenes.
Projection techniques were used as recently as the 1990s on films like Terminator 2.
The combining of different moving picture sources, or ‘compositing’ as it is usually referred to, is something we'll see later in virtual studios.
What green screen is, and how it works, is a major part of understanding one of the core elements of a virtual studio.
Green screen overcomes some of the drawbacks of rear projection. It is an electronic process in which a machine ‘looks’ for a particular colour in a picture (in this case green) and replaces it with a different picture. The term ‘green screen’ is a bit of a misnomer, as you could just as easily use a blue screen – or any other pure colour. The colour you use must simply be different from the parts of the picture that you want to keep. More often than not, we want to keep the actor in the foreground and change the background. One of the main reasons we choose blue or green is that human skin doesn't tend to have any blue or green elements; however, men often wear blue suits, so green tends to be used more often. There are other technical reasons to do with green providing a stronger signal, but we don't need to worry about those – suffice to say any colour can be chosen, provided it is different from your foreground.
In a traditional film, the green colour is replaced with the desired content in post-production, whereas in television it can be achieved live in the studio using what is known as a keyer – a generic term for a piece of hardware or software that enables us to superimpose one TV picture on top of another. The vision mixing desks we have in our studios all have built-in keyers.
Chromakey – where ‘chroma’ means colour and ‘key’ refers to the process of keying/superimposing one image on top of another.
CSO or Colour Separation Overlay – using the difference (separation) between colours so the keyer can overlay another image.
These are all different descriptions of the same thing.
by Michael Lodmore, dock10
A keyer uses three input signals:
Background signal – this is the image that appears behind the fill image.
Fill signal – this is the image that appears on top of the background image.
Key signal – a black-and-white image used to control which of the background or fill images appears on the final output. Where the key is black, the background appears on the final output; where the key is white, the fill appears.
Putting the three images above through our keyer, we get this final output signal.
A chroma keyer works in the same way, except that the key signal is generated automatically from the fill image: anything in the fill that is green is converted to black, and everything else is converted to white.
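The keyer logic described above can be sketched in a few lines of code. This is a minimal illustration rather than broadcast keyer code: the function name, the RGB array layout, and the `threshold` value are all assumptions, and a real chroma keyer works on video signals with soft edges, spill suppression, and colour-space maths well beyond this.

```python
import numpy as np

def chroma_key(fill, background, threshold=60):
    """Composite `fill` over `background`, treating green pixels in the
    fill as transparent. Images are H x W x 3 uint8 arrays (R, G, B)."""
    fill = fill.astype(np.int16)  # avoid uint8 overflow in the arithmetic
    r, g, b = fill[..., 0], fill[..., 1], fill[..., 2]

    # Key signal: 0 (black) where the fill is green, 1 (white) elsewhere.
    # A pixel counts as 'green' when green exceeds red and blue by `threshold`.
    key = ((g - np.maximum(r, b)) < threshold).astype(np.int16)

    # Final output: fill where the key is white, background where it is black.
    key = key[..., np.newaxis]  # broadcast across the colour channels
    out = fill * key + background.astype(np.int16) * (1 - key)
    return out.astype(np.uint8)
```

A hard black-or-white key like this produces the fringed edges early chroma key was notorious for; real keyers generate a greyscale key so that edges blend smoothly between fill and background.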
To understand the next development of the green screen and take a step closer to virtual studios we need to take a diversion into the development of computer graphics!
Computers allow for the creation of 3D digital environments, like those in Minecraft or Fortnite. You will no doubt have noticed how the quality of computer game graphics has evolved in leaps and bounds over the years.
This has been brought about by improved hardware capabilities as well as new and better software that can maximise the processing power available to make better games.
‘Game engines’ are the software used to build and create video games; they are responsible for drawing (‘rendering’) graphics, memory management, and much more. ‘Unity’ and ‘Unreal’ are game engines. dock10 uses the ‘Unreal Engine’, which is made by the same company that developed the game ‘Fortnite’. It is also used in Hollywood for work on feature films and series such as the latest Star Wars productions.
The latest console games are so powerful that they can ‘draw’ or ‘render’ photorealistic graphics so quickly that players can move through these environments in real-time.
In the TV world, this means that instead of using 2D filmed backgrounds, we can now create 3D photo-realistic backgrounds and much like a game, manipulate them in real-time.
We've come a long way from 1970s computer games like ‘Pong’. The speed and quality of graphics have continued to improve. Opposite is actual footage from the Unreal Engine 5 demo.
All the previous ‘special effects’ involved a compromise. The biggest compromise of all was not being able to move the camera (or shoot from different angles) without extensive repositioning and shooting new background footage. Programme makers wanted to shoot on a green screen set just as they would a physical one: recording moving cameras from different angles and directing artists. They wanted to see what the effect looked like there and then, not wait until it had been composited in post.
Enter the virtual studio!
Virtual Studios replace the traditional background with a computer-generated model. In the same way as a computer game works, this makes it possible to move around this model as if it were real. This enables you to change the perspective of the background as you change the camera shot.
Virtual studios have been around since the 1990s and whilst some amazing effects were produced, the graphics often looked very artificial. The computers weren't powerful enough to produce photo-realistic graphics in real-time, so a lower resolution ‘proxy’ was used with the high-quality image being drawn or ‘rendered’ by banks of powerful computers after the event. Whilst the finished quality was good, it wasn't possible to be certain of the final output until the rendering had taken place and live broadcasts were not an option.
As we've seen, computers have evolved a great deal since then with improved processing speed and power. This allows for photo-realistic graphics, which more importantly, can now be drawn (rendered) in real-time. So, how do they work?
Both old and new virtual studio systems are made up of the same basic components: tracked cameras, computers that render the virtual environment, and a keyer to combine the real and virtual images.
When the cameras move, the tracking system tells the computers to move the virtual environment to match, so that all the elements work together to create a convincing illusion of the subject in the chosen set.
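To see why tracking matters, here is a toy sketch of the underlying idea: a pinhole-camera projection of a point in the virtual set, given the tracked camera's position and pan angle. All names and numbers here are illustrative assumptions; real systems feed full position, rotation, zoom, and lens data into an engine such as Unreal rather than doing this by hand.

```python
import math

def project_point(point, cam_pos, cam_pan_deg, focal_px=1000.0):
    """Project a 3D point in the virtual set into 2D image coordinates
    for a camera at `cam_pos` panned by `cam_pan_deg` degrees. A simple
    pinhole model: real tracking also reports tilt, roll, zoom, and
    lens distortion."""
    # Translate the point into camera space.
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]

    # Rotate about the vertical axis by the pan angle.
    a = math.radians(cam_pan_deg)
    xr = x * math.cos(a) - z * math.sin(a)
    zr = x * math.sin(a) + z * math.cos(a)
    if zr <= 0:
        return None  # point is behind the camera

    # Perspective divide: distant points move less on screen.
    return (focal_px * xr / zr, focal_px * y / zr)
```

Because the virtual camera uses the tracked pose, a point in the virtual set shifts on screen exactly as a physical object would when the real camera moves, which is what sells the illusion.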
Unlike the virtual studio tech of the past, modern computers can add digital shadows and reflections to make it look as though people are actually situated in their artificial environments.
There are several things you can do with virtual studios that are worth exploring separately.
You can also electronically mask areas of the studio when using a virtual environment. This works in a similar way to the glass shots we explored earlier, effectively painting out parts of the studio or set you don't want people to see and replacing them with another picture – in this case, an extension of your computer-generated background. Again, because this is all computer-generated, it can be re-drawn in real-time and hence used on multiple cameras from different angles.
This means you can give the impression of having sets covering all four walls of the studio. If you pan the camera in a full circle, you see a continuation of the virtual environment, including items such as a scoreboard or the view from a window.
You can take this a step further with dynamic masks. These enable you to hide things that move – such as camera pedestals. This can help to maximise the usable space in the studio and provide wide shots where the crew don't need to hide behind a piece of set or ‘camera trap’.
It is easy to think of virtual sets as all-encompassing computer-generated sci-fi backgrounds requiring entire studios draped in green – sometimes that's the case, but that's only half of the story.
Virtual sets often work best when ‘real’ set elements are combined with virtual ones. So, you might have real podiums for a quiz show while the rest of the studio is virtual, offering the chance to create the illusion of a much bigger space. Green screens are only needed when the artist has to move in front of the background. Some effects are achievable using masks alone – such as using computer-generated video walls to replace LED screens, or replacing the lighting grid with a virtual chandelier.
Potentially the show's host could move from a real set to a green screen area, which would feature a virtual set. This could be changed instantly to reflect numerous different backgrounds without a time-consuming reset.
It is also possible to transition a shot from a virtual environment to real-life - for example tracking forward from a road through a window into a front room.
In reverse, it is possible to take the viewer in a single, unedited shot from a physical part of the set out to another area of the digital environment, where the camera operator can move, zoom, and focus just as before.
This is where physical screens and video walls are used to show 3D virtual elements so that presenters can see them. It can be combined with AR to dramatic effect – such as having a car appear to drive out of an LED wall in front of the presenter and into the studio.
More common in the feature film world, in this form of virtual studio the green screen is replaced with a video wall. The camera tracking and computer-controlled environment are the same as the green-screen setup. Whilst this technology has some interesting plus points, it is only usable with a single camera and thus somewhat restrictive in the TV environment.
Often abbreviated to AR, augmented reality is where computer-generated elements are superimposed on top of the real world. Combined with a virtual environment it can create something much more immersive. The computer graphics in the foreground can be anything from title cards to three-dimensional charts and images, and moving objects such as cars.
Again, using the camera tracking technology, it is possible to move around these objects as if they were there in real life. The computer redraws the object in real-time from the perspective of whichever camera you choose.
People filmed against a green screen in one studio can be made to appear in a virtual studio at a completely different location. It is possible to do a two-way interview with a football manager moments after the game and have it appear as if they are physically in the studio with the hosts – even though they might be hundreds of miles apart.
In the same way that cameras have trackers so computers know where they are in the physical environment, trackers can be placed on presenters so that computers ‘know’ where they are too. This enables presenters to walk around computer-generated objects as if they were physically there: the computer redraws the object from the camera's perspective, hiding whichever parts the presenter obscures.
Sometimes referred to as mo-cap, this is where a person wears a suit containing sensors that track their movements and translate them onto a computer model. Those movements can then be used to animate a digital character or avatar, which can be included in a virtual environment. It can be used for puppetry or to replace traditional animation, and allows the ‘character’ to interact with other people in real-time. You could even interview Wallace and Gromit on the set of their latest film – and they could respond in real-time.
Current cameras work by recording the intensity of light to create flat 2D images. Plenoptic cameras, or light field cameras, also record the direction from which the light rays are travelling, capturing the ‘light field’ of a scene – in effect recording 3D information. Depth-sensing cameras are already being used on top-end mobile phones for applications such as measuring objects using only the camera.
Because depth information is captured, it becomes possible to do ‘depth keying’ where you can superimpose subjects based on their relative position to the camera, effectively removing the need for the green screen as well as offering an alternative to talent tracking, so presenters can move around digital elements more naturally.
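As a rough sketch of depth keying, assume we have a per-pixel depth map for the studio camera (from a light field camera) and one for the virtual set (from the renderer). Compositing then becomes a per-pixel nearest-wins comparison, with no green screen required. The function name and array layout are illustrative assumptions, not any particular product's API.

```python
import numpy as np

def depth_key(img_a, depth_a, img_b, depth_b):
    """Composite two sources per pixel by depth: whichever source is
    closer to the camera at each pixel appears in the output.
    Images are H x W x 3 arrays; depth maps are H x W distances
    from the camera."""
    # True where source A is nearer the camera than source B.
    a_closer = (depth_a < depth_b)[..., np.newaxis]
    # Pick A's pixel where it is closer, otherwise B's.
    return np.where(a_closer, img_a, img_b)
```

Note that this also gives occlusion for free: a presenter can pass in front of, or behind, a virtual object purely on the strength of the depth comparison, without trackers.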
AI is already able to upscale footage to higher resolutions, add colour to black and white images, and remap real facial movements onto digital avatars. As it becomes better and quicker at identifying people, we can expect AI-enhanced keying, and motion capture without the need for trackers.