Introduction to VR Cameras
Virtual Reality (VR) camera is a fairly modern term for what has long been known as an omnidirectional camera. VR Head Mounted Displays (HMDs) such as the HTC Vive, Oculus Rift or Google Cardboard have generated interest in synthetic environments, but people now want to create live content too, so they need to be able to see in all directions in the real world. Omnidirectional cameras can capture views in all directions simultaneously. In this post we will use the terms VR camera and omnidirectional camera interchangeably. Some VR cameras don't actually cover 360 degrees in both horizontal and vertical directions: they capture 360 degrees on one axis and a limited view angle on the other. Most information about VR cameras only talks about "360 degree video" and does not mention gaps in the viewing sphere.
In the past, if you wanted an omnidirectional camera you had to buy an expensive specialist model or build your own. VR cameras have become increasingly popular recently and a number of models have been announced or released, particularly as action cameras. Some are aimed at consumers but several are aimed at professional users. YouTube and Facebook support interactive 360 degree video; otherwise you need to use an online video player service supplied by your camera manufacturer.
Single Camera versus Multi Camera Systems
Most VR cameras use two or more camera sensors to capture in all directions. The images are mapped onto the inside of a sphere and joined together using a stitching algorithm. An omnidirectional camera can also be made using one sensor and a mirror: for example, a parabolic or inverted conical mirror with the camera underneath. This gives a 360 degree view horizontally, but there is compression of the pixels closer to the tip of the cone and expansion of the pixels at the base of the cone.
Cameras that appear to be one module with two lenses have two sensors internally. Fish eye lenses pointing in opposite directions each capture more than 180 degrees for full coverage. Images from fish eye lenses are circular (see image below). If each sensor is a standard video sensor scanning the pixels in a 16:9 rectangle then a large area of the sensor is not used either side of the image.
Flatiron Fish Eye Showing Circular Border
(credit: Wikimedia user Autopilot)
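To get a feel for how much of a 16:9 sensor a circular fish eye image leaves unused, here is a minimal sketch. It assumes the image circle is centred and its diameter equals the shorter side of the sensor; real cameras may crop or scale the circle differently, and the function name is ours.

```python
import math


def circle_fill_fraction(aspect_w: float = 16.0, aspect_h: float = 9.0) -> float:
    """Fraction of a rectangular sensor covered by a centred image circle.

    Assumption: the circle diameter equals the shorter side of the sensor.
    """
    short_side = min(aspect_w, aspect_h)
    circle_area = math.pi * (short_side / 2.0) ** 2
    return circle_area / (aspect_w * aspect_h)


if __name__ == "__main__":
    print(f"16:9 sensor area used: {circle_fill_fraction(16, 9):.1%}")  # 44.2%
    print(f"1:1 sensor area used:  {circle_fill_fraction(1, 1):.1%}")   # 78.5%
```

Under this assumption well over half of a 16:9 sensor carries no image data, whereas a square sensor would waste much less.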
Multisensor cameras use stitching to join the images together. This is either done in-camera or by external software and for some models the external software is run in the cloud. The quality of the stitching may result in seams or ghosting; this is more obvious with foreground objects crossing between images from different sensors.
Professional VR cameras tend to use a larger number of sensors, often six or more, in order to reduce the amount of distortion from the lens. This type of camera requires more stitching but the stitching should be easier because there is less distortion. For fish eye images, the pixels in the stitching zone are the most distorted and most tightly packed so there is more opportunity for seam artefacts, especially if the image is compressed before stitching.
Each sensor in a multisensor system has to be correctly synchronized, otherwise temporal artefacts could occur. For full 360 degree cameras, on a clear day the sun is potentially always in view, so internal reflections and lens flares are likely, as well as exposure issues.
The best use of VR is when the viewer can interact to change viewpoint. This is a huge benefit for action sports, where you can place the camera in the middle of the action. However, fully capturing all directions means that the operator or tripod will also be captured. If the video is not for VR, you could use Ken Burns style cropping to obtain a unidirectional view in scenes where the most interesting views cannot be predicted in advance (GoPro calls this OverCapture).
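As a rough illustration of cropping a unidirectional view out of a stitched frame, the sketch below works out which pixel columns cover a given viewing direction. It assumes the stitched frame maps columns linearly to horizontal angle, with column 0 at 0 degrees, and it only selects columns: it does not correct the projection, which a real cropping tool would also do. The function name and parameters are ours.

```python
def crop_columns(frame_width: int, yaw_deg: float, fov_deg: float) -> list:
    """Column ranges [start, stop) covering fov_deg centred on yaw_deg.

    Assumption: columns map linearly to horizontal angle over 360 degrees.
    Returns one range, or two when the view wraps around the frame edge.
    """
    if not 0 < fov_deg <= 360:
        raise ValueError("fov_deg must be in (0, 360]")
    px_per_deg = frame_width / 360.0
    view_width = round(fov_deg * px_per_deg)
    # Left edge of the view, wrapped into the 0..360 degree range.
    left_deg = (yaw_deg - fov_deg / 2.0) % 360.0
    start = round(left_deg * px_per_deg) % frame_width
    stop = start + view_width
    if stop <= frame_width:
        return [(start, stop)]
    # The view crosses the right-hand edge, so continue from column 0.
    return [(start, frame_width), (0, stop - frame_width)]


if __name__ == "__main__":
    print(crop_columns(3600, 180, 90))  # [(1350, 2250)]
    print(crop_columns(3600, 0, 90))    # [(3150, 3600), (0, 450)]
```

Animating `yaw_deg` over time gives the panning effect; the wrap-around case matters because the interesting action can cross the edge of the stitched frame.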
Note that for some of the multicamera rigs it can be difficult to change batteries quickly (the GoPro Omni has an external charging rig so this is not a problem). And if there is an issue with one camera then all of your footage from that capture may be useless.
Video must be stitched first. Using viewing software for an interactive view or a head mounted display lets you see the video in presentation format. Viewing the video with a standard player will let you see the whole scene at a glance but, as mentioned before, some areas of the frame will be difficult to check.
Final Video Format
Some cameras output 1:1 square format and some 16:9. Most new cameras are 4K but some are HD. This is one case where higher resolution is vital: the pixels must be spread over the whole view, so each pixel covers a lot of space. As most cameras use a Colour Filter Array and then chroma subsampling, the colour resolution is reduced further. This could be a general quality issue or might affect chromakey.
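To illustrate how chroma subsampling alone reduces colour resolution, here is a small sketch that computes the size of the chroma planes for the common subsampling schemes. The 3840×1920 frame size is just an example of ours, and the sketch does not model the additional loss from the Colour Filter Array.

```python
# Horizontal and vertical chroma divisors for common subsampling schemes.
SUBSAMPLING_FACTORS = {
    "4:4:4": (1, 1),  # no subsampling
    "4:2:2": (2, 1),  # chroma halved horizontally
    "4:2:0": (2, 2),  # chroma halved horizontally and vertically
}


def chroma_plane_size(width: int, height: int, scheme: str = "4:2:0") -> tuple:
    """Width and height of each chroma plane (rounded up for odd sizes)."""
    h_factor, v_factor = SUBSAMPLING_FACTORS[scheme]
    return (-(-width // h_factor), -(-height // v_factor))


def chroma_samples_per_degree(width: int, scheme: str = "4:2:0",
                              fov_deg: float = 360.0) -> float:
    """Horizontal chroma samples per degree of view."""
    h_factor, _ = SUBSAMPLING_FACTORS[scheme]
    return width / h_factor / fov_deg


if __name__ == "__main__":
    print(chroma_plane_size(3840, 1920, "4:2:0"))               # (1920, 960)
    print(f"{chroma_samples_per_degree(3840, '4:2:0'):.2f}")    # 5.33
```

With 4:2:0, a 3840 pixel wide 360 degree frame carries only about 5 colour samples per degree horizontally.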
Metadata encoded in the video stream indicates when it is VR video. Properly tagged video can be uploaded directly to Facebook or YouTube. YouTube recommends 1:1 format video.
8K is considered low resolution for 360×360 degree video. Let's play with some numbers to get a feel for how resolution might affect quality.
- 8K resolution gives us 8192 pixels horizontally/360 degrees = 22.76 pixels per degree for the horizontal axis
- HD gives us 1920 pixels/360 degrees = 5.3 pixels per degree
Now suppose we have a Canon 5D mark 4 with a 35mm f1.4L lens. The native sensor resolution is 6720×4480 and the lens has a 54 degree horizontal field of view. That gives us 6720/54 = 124 pixels per degree for a still shot at 30 Megapixels.
For video, the 5D mark 4 has a 4K crop factor of 1.75. There is also a 1080P mode using pixel binning over the full frame, which gives us 1920/54 = 36 pixels per degree. Assuming the pixel binning algorithm is very good, 8K video spread over 360 degrees is less detailed than HD (for this lens). An alternative way of looking at it: 8K resolution over 360 degrees, cropped to a 54 degree view at 22.76 pixels per degree, is 1229 pixels wide.
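The arithmetic above is easy to reproduce. The short sketch below uses only the numbers quoted in this section; the function names are ours.

```python
def pixels_per_degree(width_px: int, fov_deg: float) -> float:
    """Horizontal pixels available per degree of field of view."""
    return width_px / fov_deg


def cropped_width(width_px: int, full_fov_deg: float, view_fov_deg: float) -> int:
    """Width in pixels of a view_fov_deg crop taken from a wider capture."""
    return round(pixels_per_degree(width_px, full_fov_deg) * view_fov_deg)


if __name__ == "__main__":
    cases = [
        ("8K over 360 degrees", 8192, 360),
        ("HD over 360 degrees", 1920, 360),
        ("5D still, 54 degree lens", 6720, 54),
        ("5D 1080P, 54 degree lens", 1920, 54),
    ]
    for label, width, fov in cases:
        print(f"{label}: {pixels_per_degree(width, fov):.2f} pixels per degree")
    # 8K 360 degree video cropped to the same 54 degree view as the lens
    print(cropped_width(8192, 360, 54))  # 1229
```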
We found a GoPro demo video from YouTube served at 640×360 (see image & link below). At this low resolution we get 1.78 pixels per degree: this is probably going to be too low quality for an HMD but it's fine for viewing software, such as YouTube.
Using the numbers given above, the following table summarises the "equivalent" unidirectional view resolution, given the same camera and lens combination, for stitched video.
|Stitched 360 degree video|Equivalent unidirectional view|
|---|---|
|8K|less than HD|
|4K|less than SD|
|HD|less than QVGA|
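The table can be reproduced with a few lines of code. This sketch assumes a 54 degree view (the lens used in the example above), stitched widths of 8192, 4096 and 1920 pixels, and reference widths of 1920 (HD), 720 (SD) and 320 (QVGA); those width choices are our assumptions.

```python
# Reference widths in pixels for conventional formats (assumed values).
REFERENCE_WIDTHS = [("HD", 1920), ("SD", 720), ("QVGA", 320)]
# Stitched 360 degree frame widths in pixels (assumed values).
STITCHED_WIDTHS = [("8K", 8192), ("4K", 4096), ("HD", 1920)]


def equivalent_width(stitched_width: int, view_fov_deg: float = 54.0) -> int:
    """Pixels left when a 360 degree frame is cropped to view_fov_deg."""
    return round(stitched_width * view_fov_deg / 360.0)


def smallest_format_above(width_px: int):
    """Name of the smallest reference format wider than width_px, if any."""
    wider = [(ref_width, name) for name, ref_width in REFERENCE_WIDTHS
             if ref_width > width_px]
    return min(wider)[1] if wider else None


if __name__ == "__main__":
    for name, width in STITCHED_WIDTHS:
        crop = equivalent_width(width)
        print(f"{name}: {crop} pixels wide, less than {smallest_format_above(crop)}")
    # 8K: 1229 pixels wide, less than HD
    # 4K: 614 pixels wide, less than SD
    # HD: 288 pixels wide, less than QVGA
```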
This image, apparently taken from a Samsung Gear 360, suggests that the camera packs 2 square images together when storing them:
For cameras with a larger number of sensors, each camera will be using lenses with a narrower field of view and therefore will have less lens distortion. It should be possible for source footage to be captured with better quality after compression.
Multiple RAW video streams will incur storage and transfer time costs so stitching on camera and RAW output of the stitched result would be preferable.
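To put a rough number on the storage cost, the sketch below estimates the uncompressed RAW data rate for a multisensor rig. The figures in the example (six 3840×2160 sensors, 12 bits per sample, 30 frames per second) are hypothetical and not taken from any particular camera. It assumes one sample per pixel, as produced by a Colour Filter Array sensor before demosaicing.

```python
def raw_data_rate_mb_per_s(width: int, height: int, bits_per_sample: int,
                           fps: float, sensors: int) -> float:
    """Uncompressed RAW data rate in megabytes per second (1 MB = 10**6 bytes).

    Assumption: one sample per pixel, no container or metadata overhead.
    """
    bytes_per_frame = width * height * bits_per_sample / 8
    return bytes_per_frame * fps * sensors / 1e6


if __name__ == "__main__":
    # Hypothetical rig: six 3840x2160 sensors, 12 bit samples, 30 fps.
    rate = raw_data_rate_mb_per_s(3840, 2160, 12, 30, 6)
    print(f"{rate:.0f} MB/s")                    # 2239 MB/s
    print(f"{rate * 60 / 1000:.0f} GB per minute")  # 134 GB per minute
```

Stitching on camera would not reduce the rate by itself, but it would replace several streams with one and remove the overlap between sensors.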
When views are stitched together for distribution, the codec is likely to be H.264/AVC, with a leaning towards H.265/HEVC for BD or streaming. A key problem for these codecs is that there is no profile aimed at VR video. The standards specify profiles and levels for codecs to implement in order to support standard applications. The people who make hardware codecs or write software codecs then know how much cache, bit rate, etc. they need to support. A 4K×4K picture exceeds the maximum picture size of the H.265 level 5 group and is only possible at level 6 and above (6, 6.1 and 6.2), levels which are aimed at widescreen 8K.
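As a sketch of how a level constrains picture size, the code below looks up the lowest group of H.265 levels whose maximum luma picture size can hold a given frame. It checks picture size only: frame rate and bit rate limits, which distinguish the levels within a group (for example 6, 6.1 and 6.2), are ignored. The picture size limits are quoted from the level limits in the H.265 specification as we understand them, and the function name is ours.

```python
import math

# (level group, maximum luma picture size in samples), lowest first.
HEVC_MAX_LUMA_PICTURE_SIZE = [
    ("1", 36_864),
    ("2", 122_880),
    ("2.1", 245_760),
    ("3", 552_960),
    ("3.1", 983_040),
    ("4 / 4.1", 2_228_224),
    ("5 / 5.1 / 5.2", 8_912_896),
    ("6 / 6.1 / 6.2", 35_651_584),
]


def min_level_group_for_picture(width: int, height: int):
    """Lowest level group that can hold a width x height picture, or None.

    Only picture size is checked, not sample rate or bit rate.
    """
    for level_group, max_samples in HEVC_MAX_LUMA_PICTURE_SIZE:
        # Each dimension is also limited to sqrt(8 * max picture size).
        max_dimension = math.sqrt(8 * max_samples)
        if (width * height <= max_samples
                and width <= max_dimension and height <= max_dimension):
            return level_group
    return None


if __name__ == "__main__":
    for size in [(1920, 1080), (3840, 2160), (4096, 4096), (8192, 4320)]:
        print(size, "->", min_level_group_for_picture(*size))
```

By picture size alone, a square 4096×4096 frame lands in the same level group as widescreen 8K, even though it has fewer than half as many pixels.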
Most modern video codecs use block based motion estimation with compensation for inter frame compression. Each block of pixels is matched to other frames and a 2D offset vector is calculated if the match is good. Motion vectors are smaller than a block of pixels, so that is where most of the compression comes from. Block vectors can represent general planar motion with the camera rotating in all 3 axes, if the blocks are small enough. Smaller foreground objects in the video tend towards having translational motion, so again block vectors can represent them. To speed things up and apply compression to the vectors, they are often predicted from neighbouring vectors.
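The block matching idea can be sketched in a few lines. This is a simplified illustration, not how any particular codec implements it: it does an exhaustive search over a small window using the sum of absolute differences (SAD) on a tiny synthetic frame, whereas real encoders use fast search strategies, sub-pixel accuracy and rate-distortion decisions. All names are ours.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a current and a reference block."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def best_motion_vector(ref, cur, cx, cy, size=4, search=2):
    """Exhaustive search for the offset into ref that best matches the
    block of cur whose top-left corner is (cx, cy).

    Returns (mv_x, mv_y, sad_cost).
    """
    height, width = len(ref), len(ref[0])
    best = None
    for mv_y in range(-search, search + 1):
        for mv_x in range(-search, search + 1):
            rx, ry = cx + mv_x, cy + mv_y
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(ref, cur, rx, ry, cx, cy, size)
            if best is None or cost < best[0]:
                best = (cost, mv_x, mv_y)
    return best[1], best[2], best[0]


def make_frames(size=12, shift_x=2, shift_y=1):
    """Synthetic reference frame and a copy shifted right and down."""
    ref = [[(7 * x + 13 * y) % 251 for x in range(size)] for y in range(size)]
    cur = [[ref[y - shift_y][x - shift_x] if y >= shift_y and x >= shift_x else 0
            for x in range(size)] for y in range(size)]
    return ref, cur


if __name__ == "__main__":
    ref, cur = make_frames()
    # Content moved right by 2 and down by 1, so the match lies at (-2, -1).
    print(best_motion_vector(ref, cur, 4, 4))  # (-2, -1, 0)
```

The encoder then stores the two vector components instead of the 16 pixel values of the block, plus any residual difference.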
The stitching process means that at the top and bottom edges of the frame the motion is no longer planar and vectors cannot be predicted so well from neighbours. Blocks will be too big to effectively model the actual motion of the pixels. H.264/5 levels also set a maximum number of blocks that can be used. This will likely result in a performance hit for the compression and artefacts. For the same reasons, intra prediction (a powerful feature of H.264 that was extended in H.265) should be less effective too.
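To see why motion stops looking planar towards the top and bottom of the frame, consider how much a stitched frame stretches the image horizontally at each latitude. The sketch below assumes the stitched video uses an equirectangular projection, which this article has not confirmed for any specific camera; in that projection every row spans the full 360 degrees, so rows near the poles are stretched.

```python
import math


def horizontal_stretch(latitude_deg: float) -> float:
    """Horizontal stretch relative to the equator in an equirectangular frame.

    Assumption: equirectangular projection, latitude 0 at mid-height.
    """
    if abs(latitude_deg) >= 90:
        raise ValueError("stretch is unbounded at the poles")
    return 1.0 / math.cos(math.radians(latitude_deg))


def row_to_latitude(row: int, frame_height: int) -> float:
    """Latitude in degrees of the centre of a pixel row (top row near +90)."""
    return 90.0 - (row + 0.5) * 180.0 / frame_height


if __name__ == "__main__":
    for latitude in (0, 45, 60, 80):
        print(f"latitude {latitude}: x{horizontal_stretch(latitude):.2f}")
    # latitude 0: x1.00
    # latitude 45: x1.41
    # latitude 60: x2.00
    # latitude 80: x5.76
```

An object moving at constant speed therefore produces motion vectors whose length changes from row to row near the top and bottom of the frame, which makes them harder to predict from neighbouring blocks.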
Intra and inter coding, as used in the majority of video codecs, has proven to be very flexible with a wide variety of content. Only by looking at content with varying global and local motion, level of detail and bit rates can we determine whether new codecs will be required.
The frame from a demo video for the GoPro Omni below appears to be reasonable quality even though we downloaded it direct from YouTube. Note the colour at the bottom of the frame looks a little blurred.
Note: The driver for an HMD only transfers the current view to the display. JPEG XS (lightweight) has been proposed for compressing views.
Summary of Potential Issues Affecting Video Quality
- Final output resolution: is it for HMD or is it for desktop viewing?
- Reduced colour spatial resolution from CFA
- Chroma subsampling (YCbCr coding)
- Number of sensors: more sensors mean less distortion, higher resolution, higher cost, longer transfer times, more stitching seams
- Seam visibility: more likely for fish eye based systems because of the increased distortion
- Pay close attention to foreground objects at seams/avoid action that gets too close to the camera
- Don't capture when the sun is directly visible; orient blind spots for non 360 degree cameras to avoid sun
- Potential exposure issues if one side of the camera is in sunlight and the other side is in shade
- Lens coatings may reduce lens flare
- Motion model and intra prediction may be less effective at top and bottom of the frame resulting in higher bitrates or compression artefacts in these areas
- Bandwidth and storage if capturing RAW
Recently, Google announced their development of a new format, VR180, a stepping stone from conventional video towards 360 degree video. In VR180, two fish eye cameras are used to create a stereoscopic immersive experience with 180 degrees field of view. The type of camera used is conceptually similar to the 360 degree camera made from two fish eye cameras in opposite directions but now they are arranged in the same direction with a small separation for eye distance.
Most of the comments regarding quality still apply. No stitching is required, but some alignment post processing may be necessary for comfortable viewing.
List of Camera Hardware
In this table we list some of the available cameras and their key specs. A lot of information about the specifications is not clear or missing and resolution does not necessarily imply image quality. It's interesting to see stereoscopic video at the top end.
|Camera|Number of sensors|Resolution|
|---|---|---|
|Ricoh Theta S|2|full HD|
|Kodak Pixpro SP360|1 (two cameras needed for 360 degrees)|HD|
|Nikon KeyMission 360|2|2160P|
|LG 360 Cam|2|2K|
|Xiaomi MiJia 360|2|3.5K (3456×1728)|
|Samsung Gear 360|2|4K|
|Garmin VIRB 360|2|4K stitched|
|GoPro Fusion (unreleased)|2|5.2K|
|GoPro Omni|6|4K unstitched|
|Humaneyes Vuze|8 (stereoscopic)|4K per eye, unstitched|
|Nokia Ozo|8 (stereoscopic)|4K per eye|
At Tiliam we are interested in all issues concerning video quality. In this article we have considered areas that may affect quality but only through experience of video production with a variety of gear and content types will we get a true picture of what really impacts on video quality from VR cameras.