Choosing Video Formats

The first digital video standard, ITU-R BT.601, was published in 1982. After more than three decades of working with digital video formats, people are still asking which format to use and when. Everyone wants video that can play anywhere, is high quality and has a small file size.

A video format is defined by the spatiotemporal characteristics of the video:

  • Spatial resolution
    • Pixels: width and height; aperture (scaling such as anamorphic; cropping); pixel aspect; sampling
    • Gamut and precision: color space (RGB, YCbCr, alpha), subsampling and transform coefficients; depth (number of bits used to represent each color component)
  • Temporal resolution
    • Frame rate in fps (frames per second); some cameras, especially phones, capture varying frame rates depending on light levels
    • Interlaced or progressive
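The characteristics above can be captured in a small structure. A minimal sketch in Python (the type and field names are my own, not from any standard):

```python
from dataclasses import dataclass

@dataclass
class VideoFormat:
    # Spatial resolution
    width: int            # pixels
    height: int           # pixels
    pixel_aspect: float   # pixel aspect ratio (1.0 = square pixels)
    # Gamut and precision
    color_space: str      # e.g. "YCbCr" or "RGB"
    subsampling: str      # e.g. "4:2:2"; meaningful only for YCbCr
    bit_depth: int        # bits per color component
    # Temporal resolution
    frame_rate: float     # frames per second
    interlaced: bool      # False = progressive

# An HD format as an example instance
hd = VideoFormat(1920, 1080, 1.0, "YCbCr", "4:2:2", 8, 30.0, False)
```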

ITU-R Recommendations BT.601 (SDTV), BT.709 (HDTV) and BT.2020 (UHDTV) specify parameters for digital TV systems for production and programme exchange. SMPTE specifies ST 274-2008 (HDTV) and ST 2036-1:2009 (UHDTV). They define the spatiotemporal characteristics listed above.

Alpha is not a display format parameter; it exists for compositing, but a lot of video is composited. Similarly, YCbCr is really a storage-efficiency measure: almost all video cameras capture some form of RGB but store it as YCbCr. So it makes sense to include alpha and YCbCr in a format definition.
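To make the RGB/YCbCr relationship concrete, here is the BT.709 luma and chroma-difference transform for full-range values in [0, 1]. This is a sketch that ignores the range scaling and quantization a real pipeline applies:

```python
def rgb_to_ycbcr_bt709(r, g, b):
    """Convert full-range RGB in [0, 1] to YCbCr using BT.709 coefficients."""
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b  # luma
    cb = (b - y) / 1.8556                     # blue-difference chroma
    cr = (r - y) / 1.5748                     # red-difference chroma
    return y, cb, cr

# White maps to luma of about 1 with near-zero chroma;
# pure colors produce large chroma-difference values.
y, cb, cr = rgb_to_ycbcr_bt709(1.0, 1.0, 1.0)
```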

A lot of people would say a video format is the codec. Three properties are usually muddled together when people talk about formats:

  1. Spatiotemporal characteristics as listed above
  2. Codec
  3. Container

The codec defines the compression method and the container defines how the data is laid out on disk or in the video stream. The container is important for synchronizing the timing between images and audio and controlling things like what happens if an image isn't decoded properly during reception.
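The synchronization role can be sketched as a muxer that interleaves video and audio packets in presentation-time order. This is a toy illustration, not the layout of any real container:

```python
def interleave(video_packets, audio_packets):
    """Merge two packet lists, each a (pts_seconds, payload) tuple already in
    time order, into one stream ordered by presentation timestamp -- roughly
    what a container muxer does so a player can keep audio and video in sync.
    Python's sort is stable, so video wins ties because it is listed first."""
    return sorted(video_packets + audio_packets, key=lambda p: p[0])

# Video at ~30 fps, audio frames arriving at a different cadence
video = [(0.000, "v0"), (0.033, "v1"), (0.067, "v2")]
audio = [(0.000, "a0"), (0.021, "a1"), (0.043, "a2")]
stream = interleave(video, audio)
```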

A codec can store video data internally in a completely different color space, and most codecs do not specify resolution, frame rate or interlacing (though some only work with specific parameters). You can choose almost any codec and container, and as long as the compression quality is good, the video will look the same.

Suppose you ask someone to record a clip of "someone playing tennis", they ask you what format and you answer "MPEG-4". So they give you high bit rate QCIF at 48 fps: high quality and small file size. Or maybe they give you 1080p at 15 fps with a low bit rate. In both cases the video is compressed with MPEG-4, but it is probably not what you wanted.

Suppose you have some nice video coded with an intermediate codec like ProRes. You transcode it to H.264 to send it to a friend. It's still the same resolution, the quality remains high; perceptually it is the same video. Or maybe it's ProRes in a QuickTime container but you convert it to an MXF container; the source video remains identical.

Codec and container are important details for storage and transmission but only the spatiotemporal parameters define the format because you can change codec and container without changing video quality.

What format should you use?

Depending on whether you are working in SD, HD or UHD, the answer to this question should always start with the reference specification, such as ITU-R BT.XXXX or SMPTE ST XXXX, followed by the specific spatiotemporal parameters; e.g.

1920×1080, 30 fps progressive, 8 bits per component, YCbCr 4:2:2.
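These parameters also fix the uncompressed data rate, which shows why compression matters. A quick sketch (the helper function and its bits-per-pixel table are illustrative, not from any specification):

```python
# Average bits per pixel as a multiple of bit depth for common YCbCr
# subsampling schemes: 4:4:4 keeps all chroma samples, 4:2:2 halves them
# horizontally, 4:2:0 halves them both horizontally and vertically.
BITS_PER_PIXEL_FACTOR = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}

def uncompressed_mbit_per_s(width, height, fps, bit_depth, subsampling):
    """Raw video data rate in megabits per second (decimal, 10^6)."""
    bits_per_pixel = bit_depth * BITS_PER_PIXEL_FACTOR[subsampling]
    return width * height * fps * bits_per_pixel / 1e6

# The example format above works out to roughly 995 Mbit/s uncompressed
rate = uncompressed_mbit_per_s(1920, 1080, 30, 8, "4:2:2")
```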

Which parameter settings should you use?

  • Progressive
  • Same or higher resolution than the deliverable
  • 4:4:4, highest available component precision
  • The best quality that you can capture with your camera

Unless you absolutely have no choice, there is no good reason to capture interlaced video. Online video should not be interlaced, and if you really must deliver interlaced you can convert progressive to interlaced.
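The progressive-to-interlaced conversion is conceptually just splitting each frame into two fields of alternating lines (a real conversion also offsets the two fields in time). A sketch, treating a frame as a list of scan lines:

```python
def frame_to_fields(frame_rows):
    """Split a progressive frame (a list of scan lines) into a top field
    (even-numbered lines, counting from 0) and a bottom field (odd lines)."""
    top_field = frame_rows[0::2]
    bottom_field = frame_rows[1::2]
    return top_field, bottom_field

# A tiny 4-line "frame" for illustration
top, bottom = frame_to_fields(["line0", "line1", "line2", "line3"])
```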

Downscaled video is much better quality than upscaled, so always choose a higher resolution if possible. However, there is probably little benefit in choosing 2K over HD.

Which codec should you use?

  • The best quality that you can capture with your camera
  • Transcode to an intermediate codec with more color precision for editing, such as ProRes
  • Transcode to the final deliverable at the final step of the editing process — this is very likely inter coded

You should use an intermediate codec for two reasons: (1) if the original is inter coded (e.g. MPEG or H.26x) or the resolution is very high, the resource load on your editing computer will be higher; (2) increased mathematical precision introduces less noise during editing.
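Reason (2) can be demonstrated numerically: every re-encode rounds values to the stored bit depth, and at 8 bits that rounding error accumulates much faster than at 16 bits. A toy demonstration with a single sample value:

```python
def quantize(x, bits):
    """Round a value in [0, 1] to the nearest representable level at the
    given bit depth, as happens every time a frame is stored."""
    levels = (1 << bits) - 1
    return round(x * levels) / levels

x8 = x16 = 0.12345
for _ in range(10):                 # ten editing passes, each re-quantizing
    x8 = quantize(x8 * 1.01, 8)     # small gain applied, stored at 8 bits
    x16 = quantize(x16 * 1.01, 16)  # same edit, stored at 16 bits

reference = 0.12345 * 1.01 ** 10    # exact result of the ten edits
# The 16-bit chain stays far closer to the reference than the 8-bit chain;
# at 8 bits the edit can even be swallowed entirely by the rounding step.
```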

This is the choice that has the most effect on the final file size, assuming that resolution has already been chosen. You can now decide the acceptable trade off between file size and final quality.

Note: Intra coded means a frame is coded with no reliance on other frames. Inter coded means frames are predicted from previously decoded reference frames, which may be earlier or later in display order; this gives very high compression but takes more effort to compress and decompress. MPEG and H.26x standards use a mix of intra and inter coded frames.
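A typical MPEG/H.26x group of pictures (GOP) mixes the two frame types: one intra coded I frame followed by inter coded P and B frames. A sketch that generates a common display-order pattern (the exact pattern varies by encoder settings):

```python
def gop_pattern(length, b_frames=2):
    """Build a simple GOP frame-type sequence in display order: an I frame
    followed by repeating groups of B frames and a P frame."""
    types = ["I"]
    while len(types) < length:
        for _ in range(b_frames):
            if len(types) < length:
                types.append("B")   # bidirectionally predicted
        if len(types) < length:
            types.append("P")       # predicted from earlier frames
    return "".join(types)

# A 12-frame GOP with 2 B frames between anchors: only 1 of 12 frames
# is intra coded, which is where the compression gain comes from.
pattern = gop_pattern(12)
```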

What container should you use?

  • It really depends on who will be using the content and what for
  • One that is suitable to encode the metadata you need to attach to the video

As mentioned earlier, the container does not affect quality and has little effect on file size, so this choice comes down to preference and convenience.