WebM is an audio and video container format that has gained popularity over the past few years for its open, royalty-free licensing and native support in many popular web browsers, including Google Chrome and Mozilla Firefox. As a result, more and more web hosts and users are turning to it as a way to host and share content. While there are services and software that allow for hands-off encoding of WebM videos, it is often preferable, or even necessary, to encode by hand in order to fine-tune the result and get the best quality possible. This tutorial aims to teach you how to get the most out of encoding WebM.
Step 1: Downloading FFmpeg
FFmpeg is a free, open-source software project whose libraries can encode the VP9 video format and Vorbis audio format that WebM uses (via the libvpx and libvorbis encoders, respectively). We will be using it to create our WebM, so we will need to download it from the FFmpeg page at https://ffmpeg.org/download.html. Be sure to download the version suitable for your operating system and architecture. While FFmpeg is available on Windows, OS X, and GNU/Linux, we will be working with Windows from this point forward during installation, specifically Windows 7 Service Pack 1. However, the instructions after installation will be largely the same.
In order to use FFmpeg, we will need to extract the contents of the downloaded archive using either the built-in Windows extraction tool or one of your choice. Make sure that you remember the address of the directory you extracted it to.
Next, in order to use FFmpeg on the command line like we need, we’ll need to add the directory we’ve extracted it into to the PATH variable. If you don’t know where the option for modifying this variable is, it can be found by navigating to Control Panel\System and Security\System and selecting Advanced system settings -> Environment Variables.
If there is not currently a PATH variable listed in the user variables category, we'll need to add it by selecting New, setting the name of our new variable to PATH, and setting its value to the following:

[path/to/ffmpeg]\bin

where [path/to/ffmpeg] is the root directory that we extracted.
If PATH is already listed under the user variables category, we will need to edit it and append the following to it:

;[path/to/ffmpeg]\bin
Note: in this guide, bits of text surrounded by brackets, as above, are bits of info that are specific to you. When entering these, make sure not to actually put your part in brackets.
Step 2: Identifying the Needs of Your Video
Before we can start encoding, we need to take a look at the source we’ll be extracting from. Naturally, it’s not viable to attempt to get good quality out of a low-quality source, and we’ll need to know a few things about our source and the limitations of our finished product before we start working on it.
- Starting time and duration of our clip:
Typically, if you’re converting from a recorded source, you’ll want to cut out a portion of the video you’re using as a source. Luckily, we can accomplish this in the process of the encoding. After you find the start and end time of the clip you want to use, you’ll need to find the duration of the clip by subtracting the start time from the end time. Save this calculated duration and the starting time for later steps; we’ll be referring to these as the [duration] and [start time], respectively.
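If you’d rather not do the subtraction by hand, the calculation can be sketched in a few lines (the timestamps below are made-up example values):

```python
def to_seconds(timestamp):
    """Convert an HH:MM:SS.ms timestamp to a number of seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def duration(start, end):
    """Duration between two timestamps, in seconds."""
    return to_seconds(end) - to_seconds(start)

# Hypothetical clip: starts at 01:02:00, ends at 01:02:45.5
print(duration("01:02:00", "01:02:45.5"))  # 45.5
```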
- Max filesize:
While we want to get good quality out of our WebM, we probably don’t want to host a 100 MB video file. In order to avoid this, we’ll want to limit the bitrate of our output video to a suitable level. To calculate one suitable for you, you’ll want to take the following steps:
- Calculate the desired max size of the output in bits (# of megabytes * 8,000,000)
- Divide this number by the duration in seconds that you calculated previously; we’ll refer to the result later as the [bitrate]
- Since we're assuming you'll want audio in this WebM, subtract 192,000 from your [bitrate]; this accounts for the bitrate of the encoded audio.
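The steps above can be sketched as a short calculation (the 6 MB target and 60-second duration below are made-up example numbers):

```python
AUDIO_BITRATE = 192_000  # bits per second reserved for the audio stream

def video_bitrate(max_size_mb, duration_seconds):
    """Target video bitrate (bits/s) that keeps the file under max_size_mb."""
    total_bits = max_size_mb * 8_000_000       # megabytes -> bits
    total_bitrate = total_bits / duration_seconds
    return total_bitrate - AUDIO_BITRATE       # leave room for the audio

# Hypothetical target: a 6 MB file from a 60-second clip
print(video_bitrate(6, 60))  # 608000.0 bits/s for the video stream
```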
- Desired resolution:
To make sure that your WebM is the desired size, we'll want to take a look at the resolution of our source to determine if it is suitable for our needs. Luckily, FFmpeg comes with a handy tool called "ffprobe" we can use to quickly figure out the width and height of our source. To do this, we'll want to open up a Windows command line in the directory of our source by holding Shift and right-clicking, then selecting "Open command window here," and finally entering the following into our terminal:
ffprobe -v error -of flat=s=_ -select_streams v:0 -show_entries stream=height,width "[input.filename]"
This will give us an output of two lines, the first being the width of the frame and the second being the height. If this resolution is suitable for your needs, there's no need to take any further action, but if a smaller or larger resolution is needed, figure out the new height you'd like the video to be at, which we will later refer to as [height]. You'll have to use an additional option listed later in step 4 to take advantage of this knowledge.
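If you do decide to re-scale, you can estimate the width FFmpeg will pair with your chosen [height] like this. FFmpeg’s own -1 scaling does this for you; this sketch just predicts the result, and the 1920x1080 source below is a made-up example:

```python
def scaled_width(src_width, src_height, new_height):
    """Width that preserves the source aspect ratio, rounded to an even number."""
    width = src_width * new_height / src_height
    return round(width / 2) * 2  # video codecs generally want even dimensions

# Hypothetical 1920x1080 source scaled down to a height of 720
print(scaled_width(1920, 1080, 720))  # 1280
```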
Step 3: Equalizing the Audio
In many cases, your source will have audio that needs to be adjusted to ensure a suitable volume throughout the video. To accomplish this, we’ll need to find the peak volume of the audio, which luckily we can do using FFmpeg.
You’ll first want to open a command prompt in the directory of your source video. You can quickly do this by holding shift and right clicking in the directory, and then selecting “open command window here.”
Once that’s open, you’ll need to run the following command:
ffmpeg -i "[input.filename]" -ss [start time] -t [duration] -af "volumedetect" -f null /dev/null
Where [input.filename] is the full name of your file, extension included.
That’s a lot to take in, so let’s break it down piece by piece:
- ffmpeg
We’re just telling the prompt we want to run FFmpeg.
- -i "[input.filename]"
-i tells FFmpeg to use the following filename [input.filename] as our source.
- -ss [start time] -t [duration]
-ss tells the program where to start looking in the source file and -t tells it how long we want the video to run. Both are represented in the form hours:minutes:seconds.milliseconds. For example, 01:28:30.2 would be one hour, 28 minutes, thirty seconds, and two tenths of a second.
- -af "volumedetect"
-af tells FFmpeg to apply an audio filter, in this case volumedetect, which scans the audio and reports statistics about its volume.
- -f null /dev/null
This block tells FFmpeg to throw the output away rather than write it to a file; we only need the statistics it prints along the way, so there’s no reason to produce one.
After running the command, a series of lines will begin to appear. You can ignore these, waiting until you get the summary printed by the volumedetect filter. The only information you need from here is the number labeled max_volume.
Take this number and drop the negative sign; for example, a reading of max_volume: -5.2 dB would give us 5.2. We’ll refer to this number later as the [max volume].
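If you want to pull that number out of the output automatically, the max_volume line can be parsed like this (the sample line below is a made-up example in the shape volumedetect typically prints):

```python
import re

def max_volume(volumedetect_output):
    """Extract the max_volume reading (in dB, sign dropped) from volumedetect output."""
    match = re.search(r"max_volume:\s*(-?\d+(?:\.\d+)?)\s*dB", volumedetect_output)
    return abs(float(match.group(1)))

# Hypothetical line from the volumedetect summary
sample = "[Parsed_volumedetect_0 @ 000000000] max_volume: -5.2 dB"
print(max_volume(sample))  # 5.2
```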
With that finished, we can move on to the initial steps for actually encoding the video.
Step 4: First-Pass Encoding
For the actual encoding portion, we will be using a method called “two-pass encoding.” Basically, what this means is that we’re asking the program to take an initial analysis pass over the video so it has a better idea of how best to encode it before we actually do so.
Assuming you’ve got your command line opened in your source’s directory from the previous step, you’ll want to go ahead and enter the following in your prompt:
ffmpeg -i "[input.filename]" -ss [start time] -t [duration] -c:v libvpx-vp9 -b:v [bitrate] -g 128 -tile-columns 6 -frame-parallel 1 -an -f webm -pass 1 -y /dev/null
Once again, that’s a lot to take in, so let’s break it down:
- ffmpeg -i "[input.filename]"
We’ve seen this before; we’re just telling FFmpeg to look at the file [input.filename].
- -ss [start time] -t [duration]
We’ve seen this before, too. Here we’re simply telling the program where we want to start and end the clip.
- -c:v libvpx-vp9
Here we’re specifying what video codec to use for our encoding. WebM uses the VP9 format, so that’s what we’re telling FFmpeg to use.
- -b:v [bitrate]
Here we’re telling the program exactly how many bits of disk space it’s allowed to use per second of rendered video, which directly affects how large the file we’ll eventually create will be. [bitrate], of course, is the number we calculated earlier.
- -tile-columns 6 -frame-parallel 1
Both of these settings come into play when decoding the output, rather than encoding it. They basically tell any decoder attempting to play the output that it’s allowed to use multiple processor cores, letting it play back more smoothly. This results in a slight drop in quality, but is definitely worth it.
- -g 128
This specifies the interval between what are called “key frames” in our video, in this case at most every 128 frames. Key frames are the points a player can seek to directly, so this mainly allows us to seek in tighter intervals on the video than the default would.
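For intuition, you can translate the keyframe interval into seconds for a given frame rate (the 30 fps below is a made-up example):

```python
def keyframe_spacing_seconds(g, fps):
    """Seconds between keyframes for a given -g value and frame rate."""
    return g / fps

# Hypothetical 30 fps source with -g 128
print(keyframe_spacing_seconds(128, 30))  # roughly 4.27 seconds between keyframes
```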
- -an
This flag is telling the program we don’t want to encode audio right now. Since this is our first pass of encoding, we’re strictly worrying about analyzing the video.
- -f webm
All this does is specify that we’re ultimately creating a WebM. As you’ll see in a bit, we’re not actually creating anything this pass, so we’re just letting the program know what we’ll be making later.
- -pass 1
We’re just telling FFmpeg that this is our first pass, and we want it to collect data for a second pass.
- -y /dev/null
We’re not actually making anything right now; since we only want the program to collect data, this specifies that we don’t want to write anything to an output.
Tip: if you get an error that says something along the lines of /dev/null not being found, go ahead and replace "/dev/null" with NUL (the Windows equivalent), or with the name of the actual file you want to output to eventually.
The following flags aren’t required, but may be useful in special cases. If you want to use any of these, simply add them before the name of the output file, outside of any of the options listed above.
- -sn
If your video has subtitle streams, this flag tells FFmpeg to not use them for the encoding.
- -threads [number]
This is how many cores of your CPU you want the program to be able to use while encoding. By default VP9 uses one core while encoding, so increasing this number will make it encode faster, sacrificing a little quality.
- -vf "scale=-1:[height]"
If in step 2 you decided you wanted to re-scale your output, this is where you declare the new scale of your output. For the purpose of this guide, we will assume you don't want to change the aspect ratio (thus distorting the output). [height] will, of course, be the new height of the output's frame, and -1 tells FFmpeg to scale the width according to the aspect ratio of the source.
After you’ve entered your command and have hit enter, wait until you’re given a prompt to enter another command before moving on. Do note that this may take a long time depending on how long the video you’re creating is.
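To make the placeholders concrete, here’s a sketch that assembles the first-pass command; the file name, times, and bitrate below are all hypothetical example values, not ones you should copy as-is:

```python
start_time = "00:01:30"  # hypothetical [start time]
clip_length = "00:00:45" # hypothetical [duration]
bitrate = "608k"         # hypothetical [bitrate] from step 2

first_pass = [
    "ffmpeg",
    "-i", "source.mp4",        # hypothetical [input.filename]
    "-ss", start_time,
    "-t", clip_length,
    "-c:v", "libvpx-vp9",
    "-b:v", bitrate,
    "-g", "128",
    "-tile-columns", "6",
    "-frame-parallel", "1",
    "-an",                     # no audio on the first pass
    "-f", "webm",
    "-pass", "1",
    "-y", "/dev/null",         # use NUL here instead on Windows if this errors
]
print(" ".join(first_pass))
```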
Step 5: Second-pass Encoding
Now that we’ve had FFmpeg take an initial look at our video, it’s time to run a second pass over it and start encoding for real.
Once again, assuming you still have the command prompt open from the previous step, we’ll want to go ahead and enter the command to run the second pass.
ffmpeg -i "[input.filename]" -ss [start time] -t [duration] -c:v libvpx-vp9 -b:v [bitrate] -c:a libvorbis -b:a 192k -af "volume=[max volume]:precision=double" -g 128 -tile-columns 6 -frame-parallel 1 -pass 2 -f webm -y [output.filename]
Tip: If you used any of the optional flags in the previous section, be sure to add them again to this command as well.
You may have noticed that this command is very similar to the previous one, with the exception of a few additions and removals. Let’s take a quick look at what these new pieces mean:
- -c:a libvorbis
Now that we’ll be encoding the full video, we’ll start encoding audio as well. This piece simply tells FFmpeg to use the Ogg Vorbis codec to encode our audio.
Tip: If you're not satisfied with the results from the Ogg Vorbis codec, consider trying out the Opus codec by switching "libvorbis" to "libopus." This guide recommends Ogg Vorbis as the more common encoding for audio, but feel free to experiment!
- -b:a 192k
This is the audio equivalent of the “-b:v” option we looked at earlier; it tells the program exactly how much space we’re allowing it to use for the audio each second. The value we’re using, 192k (192,000 bits per second), is a good choice for high-quality Ogg Vorbis encoding, but if it’s a burden to the bitrate of your video, you can lower it and raise the video bitrate by the same amount.
- -af “volume=[max volume]:precision=double”
This is where the audio equalization of step 3 comes in. Because [max volume] is the number of decibels the source’s peak falls below full volume, raising the volume by that amount brings the audio in the finished video up to a consistent, suitable level.
- -y [output.filename]
Since we now want a finished product, we go ahead and tell FFmpeg the name of the file we want to output to. Be aware that because of the -y flag, FFmpeg WILL overwrite an existing file with this name without asking, so double-check that you’re not clobbering content you want to keep. Make sure to give it a .webm extension!
Additionally, we have modified the -pass option to -pass 2 for this iteration, signifying that we want to use the data gathered from the previous pass in our encoding.
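Continuing the made-up example values from the first-pass sketch, the second pass keeps the same trim and video settings, swaps in the audio options, and writes to the real output name (all file names and numbers here are hypothetical):

```python
bitrate = "608k"   # hypothetical [bitrate]
peak_db = "5.2"    # hypothetical [max volume] from step 3

second_pass = [
    "ffmpeg",
    "-i", "source.mp4",            # hypothetical [input.filename]
    "-ss", "00:01:30",             # same trim as the first pass
    "-t", "00:00:45",
    "-c:v", "libvpx-vp9",
    "-b:v", bitrate,
    "-c:a", "libvorbis",           # audio is encoded this time
    "-b:a", "192k",
    "-af", f"volume={peak_db}:precision=double",
    "-g", "128",
    "-tile-columns", "6",
    "-frame-parallel", "1",
    "-pass", "2",                  # use the data gathered by pass 1
    "-f", "webm",
    "-y", "output.webm",           # hypothetical [output.filename]
]
print(" ".join(second_pass))
```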
Once this new command is set, hit enter and let it work for the final time. Be prepared, as this encoding pass will almost certainly take longer than the previous one.
Once this is finished, you should have a finished WebM suitable to your needs, now ready to be published, hosted, shared, or what have you.