The Terrible Word “Premultiplied” Explained
Andrzej Wojdala
Most of my fellow students hated algebra. I liked it, in part because of our professor – an interesting character, different from any other mathematician I knew. No walking with his head in the clouds, very practical and down to earth – and an accomplished bridge player with numerous international titles. We called him “Julian And-The-Rest-Is-Obvious”, because whenever he presented a proof of an algebraic theorem, he would go through the intellectual hurdles only as far as he thought was necessary. Then he would take a step back, dust the chalk off his hands, look at the blackboard and say: “and the rest is obvious, isn’t it?” – leaving us scratching our heads and trying to find the obviousness in all the rest…
One of the things Julian taught us was that some things look simple at first glance, then start to look complex and confusing when you dive in, but when you really understand them they become simple again. So…
Everyone recognizes this formula?
mix = graphics * α + video * (1 – α)
Yup. The good old blending function we have all known and loved since the late ’70s. In the real-time broadcast graphics world, it allows us to overlay text and graphic elements on top of the video. Simple, isn’t it? Well…
Have a look at this picture:
These dark edges around the text don’t look right. It should look like below, shouldn’t it?
So… where did these dark edges come from?
Let’s now forget about our texts and orchids and do some math on a simple example. Imagine an edge of a uniformly light grey object (say, color = 0.7 on a scale from 0 to 1). Let’s assume that the edge is sloped. To look nice and smooth, the edge needs to be antialiased. Since raster graphics are composed of discrete pixels, antialiasing is done by calculating the alpha for each edge pixel, based on the percentage of the pixel covered by the object:
A pixel that is 25% covered by the object will have α = 0.25. Now assume that we overlay our pixel on the video, which has color = 0.4:
The formula is:
mix = graphics * α + video * (1 – α)
so:
mix = 0.7 * 0.25 + 0.4 * (1 – 0.25) = 0.475
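The arithmetic above can be checked with a few lines of Python. This is only a sketch of the formula as written; the function name `linear_key` is made up for illustration and does not refer to any real keyer API.

```python
def linear_key(graphics, alpha, video):
    """Classic blend: graphics * alpha + video * (1 - alpha)."""
    return graphics * alpha + video * (1.0 - alpha)

# Values from the example: light grey object, 25% coverage, video at 0.4
mix = linear_key(graphics=0.7, alpha=0.25, video=0.4)
print(round(mix, 6))  # 0.475
```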
We should get 0.475, but the color we actually get is significantly darker. Why is that? To find out, we need to understand the process:
Most of you have encountered the term “fill and key”, which in the video world means the same as “graphics with alpha” or “color and alpha.” The fill and key signals are generated by the graphics server and are overlaid on the video by the linear keyer (usually in the video mixer), producing the composite signal, which I earlier called the “mix”.
The graphics and its alpha look like this:
Color
Alpha
See how nicely the edges are antialiased? That’s what we expected, right? Well, wrong!
The text is antialiased because it was overlaid on the graphics’ black background using the same blending function as for blending with the video. Let’s come back to our earlier example:
graphics = object_color * α + background * (1 – α)
Our graphics pixel will have the value:
graphics = 0.7 * 0.25 + 0.0 * (1-0.25) = 0.175
Now, when we overlay that pixel on our video, instead of the expected 0.475 we get:
mix = 0.175 * 0.25 + 0.4 * (1-0.25) = 0.34375
Significantly darker color… You should now see where the dark edges came from: the color of the text was multiplied by alpha twice, because it was in fact composed twice: first when drawing it over the background of the graphics, and a second time when mixing it with the video. We say that the graphics was premultiplied because – since the background was black – it was simply multiplied by alpha prior to mixing it with the video:
graphics = object_color * α + 0.0 * (1 – α) = object_color * α
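The double multiplication is easy to reproduce. The sketch below runs the same blend twice, once inside the graphics system over a black background and once in the keyer, and compares the result with a single blend. The names `blend`, `mix_actual` and `mix_expected` are made up for illustration.

```python
def blend(foreground, alpha, background):
    """foreground * alpha + background * (1 - alpha)."""
    return foreground * alpha + background * (1.0 - alpha)

object_color, alpha, video = 0.7, 0.25, 0.4

# Stage 1: the graphics system draws the object over its black background.
graphics = blend(object_color, alpha, 0.0)        # object_color * alpha = 0.175

# Stage 2: the keyer blends the (already premultiplied) fill over the video.
mix_actual = blend(graphics, alpha, video)        # alpha applied a second time

# What we wanted: the object blended over the video just once.
mix_expected = blend(object_color, alpha, video)

print(round(graphics, 6), round(mix_actual, 6), round(mix_expected, 6))
# 0.175 0.34375 0.475
```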
One might ask: what if the graphics background were not 0? Well, that would be really asking for trouble, just don’t do it!
Anyway: the result of this pre-multiplication is that the edges which have alpha smaller than 1.0 will become darker than they should. How do we solve it? There are four solutions to this problem:
1. Compose just once: get the video into the graphics system and overlay the graphics directly over it. There are some disadvantages, such as double color-space conversion between YUV (video) and RGB (used in the graphics space), but that is the least of our problems. The most important is that in most cases our customers will simply not agree to the compositing taking place inside the graphics system. The graphics server is usually not supposed to be downstream…
2. So here is another idea: why don’t we just antialias alpha, but not the graphics? In other words, render the graphics in such a way that it is not pre-multiplied:
Color
Alpha
Unfortunately, this won’t work in the general case. I’ll let you figure it out for yourselves… Hint: remember that the graphics can have more than one object.
3. What we can do is un-pre-multiply the graphics. Terrible word, but it describes what needs to be done: before we apply the good old blend function, we need to “repair” the graphics by dividing it by alpha in order to recover the original object colors. This can easily be done by applying a fragment program (a.k.a. “shader”) to the rendered graphics. Those who are afraid that dividing by very low alpha values will result in calculation errors should not worry. In the end, what really matters is the error of:
(graphics_color / α) * α
4. But there is a more elegant solution. Back to the math… As a result of the pre-multiplication the true formula is:
mix = graphics * α + video * (1 – α)
i.e.:
mix = object_color * α * α + video * (1 – α)
while what we are really asking for is:
mix = object_color * α + video * (1 – α)
The solution is simple. Since:
graphics = object_color * α
our formula should be:
mix = graphics + video * (1 – α)
In other words, change the blending function of the linear keyer and simply don’t multiply the incoming graphics by the alpha. After all, it was already pre-multiplied by it! And indeed, some linear keyers support such a modified blending function. Simple again, isn’t it?
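Solutions 3 and 4 can be compared on the same single-pixel example. The sketch below is plain Python rather than a real shader or keyer, and all function names are made up for illustration. The `eps` guard for near-zero alpha is an assumption added here: such pixels contribute almost nothing to the mix anyway, so returning 0 for them is harmless.

```python
def linear_key(graphics, alpha, video):
    """Classic blend: graphics * alpha + video * (1 - alpha)."""
    return graphics * alpha + video * (1.0 - alpha)

def unpremultiply(graphics, alpha, eps=1e-6):
    """Solution 3: divide by alpha to recover the object color.
    eps is an assumed guard against division by (near) zero."""
    if alpha < eps:
        return 0.0
    return graphics / alpha

def premultiplied_key(graphics, alpha, video):
    """Solution 4: the fill is already multiplied by alpha, so add it as is."""
    return graphics + video * (1.0 - alpha)

object_color, alpha, video = 0.7, 0.25, 0.4
graphics = object_color * alpha   # premultiplied fill, as rendered over black

mix_solution_3 = linear_key(unpremultiply(graphics, alpha), alpha, video)
mix_solution_4 = premultiplied_key(graphics, alpha, video)

print(round(mix_solution_3, 6), round(mix_solution_4, 6))  # 0.475 0.475
```

Both routes give the expected 0.475 for this pixel. Solution 4 avoids the division altogether, which is one way to read “more elegant”.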
Avid’s graphics servers (called “HDVG”) support three out of four solutions presented above:
1 – through so-called video insertions, which can be mapped on the video background
3 – through the “shader” applied as a post-processing for the entire rendered image
4 – when the internal linear keyer of HDVG is used. I know, I know. You are going to ask how it differs from solution 1. In both cases we put the graphics machine downstream… Indeed, but in this case HDVG does the mixing directly in the video i/o board, not in the GPU. And that i/o board has a bypass. Should anything happen to the graphics subsystem, the video will pass through unharmed.
By the way, there is a small caveat. Of all four solutions, only the first one really gives proper results in all cases. Even though we normally use solutions 3 or 4, in some cases they might produce wrong colors. I’ll let you figure it out for yourselves… Hint: the same as in solution 2. By the way – the trouble described above applies not just to antialiased edges. Exactly the same mechanism is responsible for darkening and discoloring of semi-transparent objects.
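The remark about semi-transparent objects can be illustrated with a colored pixel. The RGB values below are made up for the example (an orange object at 50% opacity over bluish video), and `blend_rgb` is a hypothetical helper that applies the usual blend per channel.

```python
def blend_rgb(foreground, alpha, background):
    """Per-channel foreground * alpha + background * (1 - alpha)."""
    return tuple(f * alpha + b * (1.0 - alpha)
                 for f, b in zip(foreground, background))

orange = (1.0, 0.5, 0.0)   # assumed object color
video = (0.2, 0.2, 0.8)    # assumed video color
black = (0.0, 0.0, 0.0)
alpha = 0.5

expected = blend_rgb(orange, alpha, video)        # blended once
premultiplied = blend_rgb(orange, alpha, black)   # rendered over black first
actual = blend_rgb(premultiplied, alpha, video)   # then keyed with alpha again

print([round(c, 4) for c in expected])  # [0.6, 0.35, 0.4]
print([round(c, 4) for c in actual])    # [0.35, 0.225, 0.4]
```

The object's contribution is halved a second time while the video's share stays the same, so the pixel ends up both darker and shifted toward the video color.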
For dessert, a problem which looks unrelated, but in fact has the same root cause.
The word “calligraphy” is derived from Greek and means “beautiful writing”. In our broadcast graphics world, by “calligraphic font” we mean a font in which letters touch each other and change shape depending on their neighbors. Arabic fonts are the best-known examples. In theory, the glyphs (or their presentation forms, because shapes change depending on neighbors) should touch each other in order for the text to look continuous. In practice, letters overlap. Look what can happen then:
Do you see a subtly thicker joint? There is actually more than one in the text above… If we draw the text with transparency, it looks even worse:
Any idea why? After what we have just discussed, all you need to know is that characters are separate objects. And… the rest is obvious, isn’t it?