Last week I wrote about the new SpanishDict video lessons feature, which has been a huge success, even with the school year not yet in full swing! If you haven't yet read last week's post, please do that first, as it will provide some helpful context about how the video lessons feature works. In this post, I'll talk about how we engineered what we hope is a great video experience, regardless of a user's browser choice and connection speed.
As you'll recall from last week's post, every question in every lesson, regardless of question type, has a video. From discussions with the product team, two main goals emerged for these videos. The first was that a question's video should start playing the moment the user advances to it. The second was that the videos should be high quality, which in practice meant no noticeable blurriness. Engineering added a third goal: keeping the video files as small as possible. Smaller files would mean paying less for data transfer, faster downloads for users, and less mobile data consumed when taking lessons. In the sections to come, I'll go through each of these goals and how we achieved them. Interestingly, each goal complicated our ability to achieve the other two.


Because of the need to preload videos and the need to have enough control over the videos to beep out certain sections, we felt strongly that we needed to host the videos ourselves rather than rely on a video hosting provider like YouTube or Vimeo. As such, we needed to get up to speed on serving video on the web!
Video Formats
There are many video formats, and every browser supports a different set of them. A video's extension, for instance .mp4 or .webm, is its container format. The container can hold both video and audio, each of which could be in a variety of formats. A .mp4 file could have h264 video and aac audio, or it could have hevc video and opus audio. There are many different combinations, and any of the three components (container format, audio format, or video format) could make a video unable to play in a given browser. Because the user's operating system plays such a large part in video playback, sometimes the same browser version will be able to play a certain video file on a newer version of an operating system but not an older one.
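You can see this container-versus-codec distinction from the browser itself using the canPlayType API, which we'll return to later in this post. The MIME strings below are standard codec parameter strings, not anything specific to our setup, and the return values in the comments are typical rather than guaranteed:

const video = document.createElement('video');

// Same mp4 container, different codecs inside it.
// 'avc1.42E01E' is an h264 profile; 'mp4a.40.2' is AAC-LC audio.
video.canPlayType('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
// => 'probably' in virtually every modern browser

video.canPlayType('video/mp4; codecs="hvc1"');
// => '' in most non-Safari browsers (hevc video in mp4)

// A different container entirely: VP9 video in webm.
video.canPlayType('video/webm; codecs="vp9"');
// => 'probably' in Chrome and Firefox, '' in older Safari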
If you want full browser support, the easiest thing to do is to just encode all your videos in an mp4 container with h264 video and aac audio. This provides support all the way back to Internet Explorer 9, and all modern browsers have supported this type of video for years. However, there are many new video codecs which can produce videos of the same quality at much smaller file sizes than those produced by h264. Two of these which have been around for a while are VP9 video in a webm container and hevc video in a mp4 container.
VP9/webm video has been supported in both Firefox and Chrome for 5+ years and is also supported by the new Chromium-powered Edge. The hevc/mp4 video format has been supported since desktop Safari 11 (but only for MacOS High Sierra or later) and since iOS Safari 11. Both of these formats can encode video at the same quality as h264/mp4 but with 30-50% smaller file sizes. By using these two formats in addition to h264/mp4, we would be able to serve highly compressed video to more than 95% of our users, falling back to a larger video format for full support.


It's worth noting that both VP9/webm and hevc/mp4 have successors that boast even better compression and should see wider adoption in the coming years. The successor to VP9 is AV1, which does have some browser support now, but which we found to not be mature enough to use quite yet. Encoding videos in this format is still quite slow and expensive, and we were unable to get any AV1 videos we produced to successfully seek in Firefox or Chrome when testing on MacOS. The successor to hevc is vvc, which was finalized last month and is not yet supported in any browsers.
All right, so at this point we knew the three video formats we wanted to generate for every question: h264/mp4, hevc/mp4, and VP9/webm. However, we still needed to figure out what resolutions to use for our videos. On desktop, our videos are 480x270, and on mobile devices, our video needs to be as wide as the screen, up to a maximum of 480 pixels wide. Most mobile phones fall between 360 and 414 pixels wide, so we made the call to serve the 480px-wide videos to mobile users for simplicity, but if we really wanted to home in on bandwidth, we could go back and generate videos for several common mobile breakpoints.
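That sizing decision boils down to something like this trivial sketch, where isMobile is a hypothetical stand-in for however you detect mobile clients:

// Desktop videos are a fixed 480x270; mobile videos span the
// screen width, capped at the 480px our widest video covers.
const getVideoDisplayWidth = isMobile =>
  isMobile ? Math.min(window.innerWidth, 480) : 480;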
It's well-known that in order to avoid blurry images and serve the smallest image file possible, you should use something like srcset to serve higher-resolution images to users with high-density screens, like Apple's Retina displays. However, we couldn't find much written about this for video. We wanted to know if serving a 480x270 video into a 480x270 video tag on a 2x or a 3x device would result in the video appearing blurry, so we tested this out. Below is a comparison between a 480x270 video and a 960x540 video on a 2x device. Both videos were generated using similar quality settings.


You can see that the screenshot from the 1x video on the left is of noticeably lower quality than the 2x video on the right. We found that even if we drastically increased the quality of the 1x video compared to the 2x video, the 2x video still looked a lot better than the 1x video when viewed on a 2x device. As a result, we decided to generate three video sizes (1x, 2x, and 3x) in each format, to provide clear video for users with varying screen pixel densities. We also generated a very low-quality version of the video at a smaller resolution, 368x207, which we planned to use for users on a slow connection. We figured that immediately playing a blurry video with clear audio was a superior experience to waiting for a high-quality video to buffer. This meant that for every question in a lesson, we would generate twelve videos: four different sizes in each of three different formats.
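Concretely, the output matrix for a single question's video looks like this (the low-bandwidth name matches the screen density value you'll see in the code later in this post):

// Four sizes in each of three codec/container formats
// yields the twelve outputs per question.
const SIZES = ['1x', '2x', '3x', 'low-bandwidth'];
const FORMATS = [
  { codec: 'h264', container: 'mp4' },
  { codec: 'hevc', container: 'mp4' },
  { codec: 'vp9', container: 'webm' },
];

const outputs = SIZES.flatMap(size =>
  FORMATS.map(({ codec, container }) => ({ size, codec, container }))
);
// outputs.length === 12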
Video Conversion Pipeline
We'd had success in the past using AWS Elemental MediaConvert to process and convert videos for our pronunciation feature, so we turned our attention there again to create a pipeline for video resizing and conversion that works something like the diagram below.

Our video production team edits all the videos for a lesson in high resolution and exports the videos into the appropriate lesson folder, naming the videos for the questions they correspond to, 1.mov, 2.mov, etc. For cloze and translate questions, the question types that have a blank and require beeping, the video team also indicates the start and end locations of the blanks. The production team then runs a script which uploads the videos to an S3 bucket. Videos which need to be beeped are uploaded with metadata indicating the blank start and end times.
The names of the video files that we upload include all the information necessary to determine the lesson and question number for each video. For example, the video filename for question 3 of Spanish lesson 7 is lesson-7-lang-es-question-3-version-1.mov. The version is simply a way for us to allow a question's video to be replaced later, something that has been needed quite a lot, as problems are often found with videos after they're uploaded. Since we set the cache-control headers of the video files we generate to allow browser caching, we need to change the name of the video file when uploading a replacement, and the version allows us to do this. The video upload script defaults the version to 1, but if there is already a video for the current question, the script increments the latest version.
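Here's a minimal sketch of that versioning logic, assuming the aws-sdk v2 S3 client and the filename convention above; the bucket name is a placeholder:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Find the next version number for a question's video by listing
// any existing uploads that share its lesson/lang/question prefix.
const getNextVersion = async ({ lessonId, lang, questionNumber }) => {
  const prefix =
    `lesson-${lessonId}-lang-${lang}-question-${questionNumber}-version-`;
  const { Contents = [] } = await s3
    .listObjectsV2({ Bucket: 'lesson-video-uploads', Prefix: prefix })
    .promise();
  const versions = Contents.map(({ Key }) =>
    parseInt(Key.slice(prefix.length), 10)
  );
  // Default to version 1; otherwise increment the latest version.
  return versions.length === 0 ? 1 : Math.max(...versions) + 1;
};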
A Lambda watches our S3 bucket for new files, and when it notices one, it kicks off a MediaConvert job which takes the input video file and generates our twelve output files: 1x, 2x, 3x, and low-bandwidth resolutions in each of our three formats, h264/mp4, hevc/mp4, and VP9/webm. We name the files in such a way that it's easy to tell which of the twelve files is which, something MediaConvert makes easy with the Name modifier field.
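A sketch of that Lambda, again using aws-sdk v2. The job template name, role ARN, and account-specific MediaConvert endpoint are placeholders, and the twelve output configurations (with their Name modifiers) would live in the job template:

const AWS = require('aws-sdk');

// MediaConvert requires an account-specific endpoint, discoverable
// via describeEndpoints; hardcoded here for brevity.
const mediaConvert = new AWS.MediaConvert({
  endpoint: 'https://abcd1234.mediaconvert.us-east-1.amazonaws.com',
});

exports.handler = async event => {
  const { bucket, object } = event.Records[0].s3;
  const key = decodeURIComponent(object.key.replace(/\+/g, ' '));

  // The job template holds the twelve output configurations
  // (1x/2x/3x/low-bandwidth in h264/mp4, hevc/mp4, and VP9/webm).
  await mediaConvert
    .createJob({
      Role: 'arn:aws:iam::123456789012:role/MediaConvertRole',
      JobTemplate: 'lesson-video-outputs',
      Settings: {
        Inputs: [{ FileInput: `s3://${bucket.name}/${key}` }],
      },
    })
    .promise();
};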

Looking at the file sizes for the different videos above, a few things stand out.
First, at really low resolutions, using the more highly compressed formats hardly saves you any bytes at all, even on a percentage basis. I don't have a deep understanding of video compression, but intuitively this makes sense: there are fewer pixels to analyze across the length of the video, and thus fewer spots across frames where the encoder can find common ground for compression.
Second, despite the 2x videos having four times the pixels of the 1x videos and the 3x videos having nine times the pixels, those videos are not four and nine times larger than their 1x counterparts. This is a result of us restricting the maximum bitrate for the 2x and 3x videos using MediaConvert settings (sketched below), due to the following finding: if you play the 2x or 3x videos at their true resolutions, they appear blurry, but since we are placing them in a video tag that is 1/2 or 1/3 the width of the video file itself, they actually appear quite clear to our users at only double or triple the 1x file size.


Third and finally, the file size of the VP9/webm videos is a bit larger than that of the hevc/mp4 videos. This is due to a couple of factors. First, from our tests, hevc/mp4 does seem to achieve better compression. Second, MediaConvert allows specifying a video quality setting for h264/mp4 and hevc/mp4 videos, but it does not yet support that for VP9/webm. You can only specify the average bitrate, which can lead to blurriness in higher-complexity sections of some videos. For this reason, we had to set the bitrate a bit higher than we might have liked for this format. Shots where an actor was walking around outside were especially problematic in VP9/webm without bumping the bitrate. I won't bore you with the specific settings we used for all the different formats, but if you end up doing something similar with MediaConvert, make sure to test lots of different types of videos with different bitrates and quality settings in order to output the smallest files possible that meet your quality requirements!
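For illustration, here's roughly what that asymmetry looks like in a MediaConvert output's codec settings. These are sketches with placeholder numbers, not our production values: h264 (and analogously hevc) outputs can target a quality level with a bitrate cap, while VP9 outputs can only specify an average bitrate.

// h264: quality-targeted QVBR, with a max bitrate cap that
// keeps the 2x/3x outputs from ballooning in size.
const h264CodecSettings = {
  Codec: 'H_264',
  H264Settings: {
    RateControlMode: 'QVBR',
    QvbrSettings: { QvbrQualityLevel: 7 },
    MaxBitrate: 1500000,
  },
};

// VP9: no quality mode available, only an average bitrate,
// which we had to set a bit higher to avoid blur in busy shots.
const vp9CodecSettings = {
  Codec: 'VP9',
  Vp9Settings: {
    RateControlMode: 'VBR',
    Bitrate: 1000000,
  },
};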

Serving The Correct Video
Of the goals laid out at the start of this post, we've made progress toward two of them thus far: we have a way to generate small, yet high-quality, videos. However, at this point we still needed to figure out a couple of things. First, there was the question of how to get the browser to serve the correct video file, and second, we needed to figure out how to preload that video so that it would be ready to play once its question was loaded. After some research, there did not appear to be a built-in way to use the <video> tag to serve different videos based on screen density. Serving different video formats based on browser support can be done using <source> tags, but because of the lack of any native ability to serve different video sizes based on screen density, we would need to use JavaScript to determine which video to serve.

To figure out whether to use our 1x, 2x, or 3x videos, we turned to window.devicePixelRatio, which, according to MDN, returns the ratio of the resolution in physical pixels to the resolution in CSS pixels for the current display device. It turns out this value can be fractional, so we wrote this code to choose the correct video size:
const getScreenDensity = () => {
  const devicePixelRatio = window.devicePixelRatio;
  if (devicePixelRatio > 2) {
    return '3x';
  } else if (devicePixelRatio > 1) {
    return '2x';
  }
  return '1x';
};
Determining the correct video format to use was messier than we would have liked. We ended up using a combination of the canPlayType API and user agent detection. It would have been nice to use the canPlayType API alone, but because factors like GPU drivers and the operating system play such a large part in video playback, the browser cannot know for certain whether a video will play successfully until it actually tries. For this reason, the canPlayType API has three possible return values: 'probably', 'maybe', and ''. For instance, despite the fact that Safari 13 is always able to play hevc/mp4 video, video.canPlayType('video/mp4; codecs=hevc') returns 'maybe'. The same value is returned by Safari 11 on MacOS Sierra and earlier, where hevc/mp4 videos cannot play at all. Here's what we came up with, keeping in mind that SD_USER_AGENT is determined server-side using a user agent library:
const getCodec = () => {
  if (SD_USER_AGENT == null) {
    return 'h264';
  }
  const video = document.createElement('video');
  const browser = SD_USER_AGENT.browserName;
  const browserVersion = SD_USER_AGENT.browserVersion;
  // HEVC support is only present in Safari/Mobile Safari 11
  // and later, but MacOS versions 10.12 and earlier do not
  // support HEVC playback at all. Because of this, for extra
  // safety, we will only serve HEVC video for Safari/Mobile
  // Safari versions 13 and newer, as Safari 13 is only present
  // on MacOS 10.13 (High Sierra) and later versions of MacOS.
  if (
    // This canPlayType check keeps Chrome/Firefox iOS mobile
    // view from trying and failing to serve HEVC video in
    // development.
    video.canPlayType('video/mp4; codecs=hevc') &&
    (browser === 'Safari' || browser === 'Mobile Safari') &&
    +browserVersion >= 13
  ) {
    return 'hevc';
  } else if (
    // Some older versions of Edge report that they probably
    // can play this codec when they actually cannot. Edge is
    // a small enough % of our traffic that we will just serve
    // h264 to Edge users for now as more users transition to
    // newer versions based on Chromium.
    video.canPlayType('video/webm; codecs="vp9, vorbis"') &&
    browser !== 'Edge'
  ) {
    return 'vp9';
  } else {
    return 'h264';
  }
};
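With the screen density and codec in hand, picking the video is just string-building. Here's a hypothetical sketch of the getVideoUrl helper that appears later in this post; the CDN host is made up, and it assumes the output names mirror our source-file convention with codec and density modifiers appended, which may differ from our actual Name modifier scheme:

const EXTENSIONS = { h264: 'mp4', hevc: 'mp4', vp9: 'webm' };

const getVideoUrl = ({
  lessonId,
  lang,
  questionNumber,
  version,
  codec,
  screenDensity,
}) =>
  // videos.example.com is a placeholder for our CDN host.
  `https://videos.example.com/lesson-${lessonId}-lang-${lang}` +
  `-question-${questionNumber}-version-${version}` +
  `-${codec}-${screenDensity}.${EXTENSIONS[codec]}`;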
Preloading and Autoplaying
We now have the functions necessary to identify the correct video file, but how can we go about making sure the video preloads and then autoplays right when its question is shown to the user? The autoplay question is a tricky one, since mobile browsers, and now most desktop browsers, will not allow a video with sound to play without any user interaction. However, we found that even mobile Safari, which has the tightest autoplay restrictions, would allow us to autoplay videos if we added a start screen that requires the user to click a button to advance to the first question. We also found that as long as the same <video> tag is kept on the page, you can switch out the source of that <video> tag as much as you like, and it can continue to autoplay. This was a life-saver for us, since we were triggering video plays via a React useEffect, which reacts to changing props; we weren't triggering the play in immediate response to a user mouse or keyboard action.
Be careful though! If you remove the <video> tag at any point and then want to autoplay videos afterwards, you'll again need some sort of user action in order to permit the autoplay. We hit this snag when we introduced an auto-advancing intermediate review screen in the lessons, so we added in a Continue button which has to be clicked in order to advance.
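The unlock itself looks something like this simplified sketch, where videoRef, firstQuestionVideoSrc, and advanceToFirstQuestion are hypothetical names: because the play() call happens during a click, the browser treats the <video> element as unlocked, and later programmatic plays on that same element are allowed.

// Simplified sketch of the start screen's Continue button handler.
const onContinueClick = () => {
  videoRef.current.src = firstQuestionVideoSrc;
  // Playing inside a click handler counts as a user gesture,
  // unlocking this <video> element for future autoplays.
  videoRef.current.play();
  advanceToFirstQuestion();
};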



As mentioned above, we didn't just want the videos to autoplay when arriving at a question; we wanted them to be fully downloaded so that they could play immediately! Google Web Fundamentals has a great article on preloading videos, which mentions techniques like the preload attribute on <video> tags and preloading videos with a <link> tag. However, these techniques don't have full browser support, and Mobile Safari does everything it can to keep you from downloading videos before they need to be played. So we turned to a technique we had used for our video pronunciation feature: making an XHR request for each video before it's needed, then using URL.createObjectURL to generate the source for the video. We initiate the preload for the first question's video on the start screen, for the second question's video when the first question is shown, and so on.
new Promise((resolve, reject) => {
  const req = new XMLHttpRequest();
  req.open('GET', videoUrl);
  req.responseType = 'blob';
  req.onreadystatechange = ({ target }) => {
    if (target.readyState === req.DONE) {
      if (target.status === 200) {
        const videoBlob = target.response;
        resolve(URL.createObjectURL(videoBlob));
      } else {
        reject();
      }
    }
  };
  req.send();
  // Keep a handle on the request so that it can be aborted if
  // the preload did not finish before we need this particular
  // video and we need to fall back to streaming.
  preloadedVideos.current.requests[questionNumber] = req;
})
  .then(objectUrl => {
    preloadedVideos.current.responses[questionNumber] = objectUrl;
  })
  .catch(() => {
    // Do nothing; this just means either the preload request
    // failed, or we forced it to abort because it did not finish
    // in time and we are falling back to streaming.
  });
Because we are using React, we store both the requests for the video files and their responses in a ref to avoid causing a re-render of our video component whenever we make a preload request or receive a preload response. When it comes time to play the video for a particular question, our video component checks whether the video file has been preloaded by looking at preloadedVideos.current.responses[questionNumber]. If the video is available, we play it! If it's not, we abort the XHR request stored in preloadedVideos.current.requests[questionNumber] and set the video source to the URL of the small, low-quality video we generated in the video pipeline so that it can be buffered and streamed. Since network connection information is not exposed by all browsers, we use failure to preload the high-quality video as an indicator of a slow connection and make sure to attempt to preload the low-bandwidth video for the next question when this happens. As I mentioned above, we'd rather have a low-quality video play right away than have the user wait around for a high-quality one.
// Default the next preload to the user's normal screen density
// (from getScreenDensity above); this is downgraded below if we
// have to fall back to streaming.
let nextVideoPreloadScreenDensity = screenDensity;

const preloadedVideo =
  preloadedVideos.current.responses[questionNumber];
if (preloadedVideo != null) {
  videoRef.current.src = preloadedVideo;
} else {
  // Abort the XHR request which was trying to preload this
  // video so that it does not compete with the video streaming
  // that we initiate below.
  try {
    preloadedVideos.current.requests[questionNumber].abort();
  } catch (e) {
    // It's possible that an ajax request was never made to
    // preload the video or that we already aborted the request
    // in question. Cancelling this request is just a performance
    // optimization, so we don't need to take any action if there
    // was no request to abort.
  }
  // If we failed to preload the video, it's likely the user is
  // on a slow connection, so use the low-bandwidth video for
  // this question. Setting the source of the video like this
  // allows the video to stream; that is, the video can start
  // playing even when it is only partially downloaded.
  videoRef.current.src = getVideoUrl({
    lessonId,
    lang,
    questionNumber,
    version,
    codec,
    screenDensity: 'low-bandwidth',
  });
  nextVideoPreloadScreenDensity = 'low-bandwidth';
}
videoRef.current
  .play()
  .then(() => {
    // Do not start preloading the next video until this video
    // has begun to play.
    onPreloadNextVideo({
      screenDensity: nextVideoPreloadScreenDensity,
    });
  })
  .catch(e => {
    if (e.code === DOMException.ABORT_ERR) {
      // This means that the user advanced to the next video
      // before this one was ready to play.
      return;
    }
    Logging.error('Lesson video play error', {
      errMsg: e.message,
      errStack: e.stack,
      errStackTrace: e.stacktrace,
    });
  });
It took a while to get here, but we've now achieved our three goals! We can serve high-quality videos using as few bytes as possible to users on a variety of devices and browsers with different screen densities and codec support, and we can preload those videos and autoplay them immediately. You can see all of this in action if you take a lesson with your browser's network tab open! If you've made it this far and have any questions, or if you are a video expert with ideas about how we could improve our various approaches, please leave a comment!