Thursday, December 13, 2007

Introduction to Shaders

SHADERS AND THE EXISTING GRAPHICS PIPELINE

There are currently (as of DirectX 9) two flavors of shaders: vertex and pixel shaders. Vertex shaders operate on vertices, or in more precise terms, the output is assumed to be a vertex in homogeneous clip-space coordinates. A vertex shader (usually) produces one vertex as its output, (usually) from a set of vertices provided as input. A pixel shader produces the color of the pixel as its sole output, (usually) from color and/or texture coordinates provided as inputs.

I've added the (usually)'s since this is the way that the shaders are intended to be used when replacing the functionality of the fixed-function pipeline (FFP). The point is that when you're using either a vertex shader or a pixel shader (but not both), you're limited by the way that you can provide data to the other part. However, when you're using both a vertex and a pixel shader, the vertex shader output is still interpolated according to whatever shading mode is set as a render state (flat or Gouraud), and those interpolated values are provided as input to the pixel shader. There's nothing wrong with using this pipeline as a conduit for your own data. You could generate some intermediate values in the vertex shader (along with a valid vertex position), and those values will make their way to the pixel shader. The only thing the pixel shader has to do is specify the final color for the pixel. We'll go over this in depth when we talk about using shaders with the FFP.

Vertex and pixel shaders fit into the current hardware pipeline; they were designed so that they can replace parts of the existing graphics pipeline without changing the overall flow through it. Pixel and vertex shaders can be independently implemented; there is no requirement that they both be present. In fact, the first generation of hardware sometimes only implemented pixel shaders. Vertex shaders were available only as part of the software driver—a very highly optimized part, but still one that doesn't run on the hardware. The pipeline also adds a higher order primitive section. This is going to become an ever-increasing part of the graphics pipeline in the near future since a higher order primitive means less data sent over the graphics memory bus. Why send over a few thousand perturbed vertices when you can just send over a few dozen perturbed control points and have the hardware generate the vertices for you?

VERTEX SHADERS: TECHNICAL OVERVIEW

Vertex shaders replace the TnL (transform and lighting) part of the Direct3D and OpenGL pipeline. This pipeline is still available and is now referred to (since DirectX 8) as the fixed-function pipeline since it's not programmable the way shaders are. Since vertex shaders are intended to replace the FFP TnL part of the pipeline, vertex shaders must produce essentially the same output as the FFP from the same input. Now among other things, the pipeline depends upon the current rendering state—thus it's possible to perform calculations on things like fog or textures. But if fog or texturing isn't turned on, the values that are generated will never be used. It is important to note that a vertex shader's only job is to take a set of input vertices and generate an output set of vertices. There is a one-to-one correspondence between an input vertex and an output vertex. A vertex shader cannot change the number of vertices since it's called once for each vertex in a primitive.

The input to a vertex shader is one or more vertex streams plus any other state information (like current textures). The output is a new vertex position in clip coordinates (discussed in the next section) and any other information that the shader provides, like vertex color, texture coordinates, fog values, and the like.

PIXEL SHADERS: TECHNICAL OVERVIEW

Whereas vertex shaders replace the FFP in the traditional rendering pipeline, pixel shaders replace the pixel-blending section of the multitexture section of the pipeline. To understand how pixel shaders operate, you should be familiar with the dualistic nature of the texture pipeline. Traditionally, two paths are running simultaneously in the hardware—the color pipe (also called the vector pipe) and the alpha pipe (also called the scalar pipe). These two pathways handle the color and alpha operations of the texture processing unit. Frequently, you will have set up a mode so that the color operations are performed with one set of parameters, whereas the alpha operations are performed with a different set. At the end, the results are combined into a resulting rgba value.

Although the traditional texture pipeline allows you to specify a cascade of operations on sequential textures, pixel shaders allow you to specify these with more generality. However, the dual nature of the pipeline is still there. In fact, you'll probably be spending some time fine-tuning your pixel shaders so that you can get color and alpha operations to "pair," that is, to run simultaneously in the hardware.

Another dualistic aspect of pixel shaders is that they have two separate types of operations: arithmetic and texturing. Arithmetic operations are used to generate or modify color or texture coordinate values. Texture operations fetch texture coordinates or sample a texture using those coordinates. No matter the type of operation (coloring, texturing, or a blend), the output of the pixel shader is a single rgba value.

VERTEX SHADERS, PIXEL SHADERS, AND THE FIXED FUNCTION PIPELINE

So you might be wondering, how does all this stuff fit together? Well, there are two cases to consider. If you're using a shader in conjunction with the FFP, then you'll have to consider what the FFP operations are. The FFP vertex operations are going to provide your pixel shader with two color registers: one diffuse, one specular. The FFP pixel operations are going to expect two color registers as input: one diffuse, one specular. (Note that using specular is a render state.) It gets interesting when you write your own vertex and pixel shader and ignore the FFP altogether.

Your vertex shader needs to provide a valid vertex position, so you'll need to perform the transformation step and provide a valid vertex position in the oPos register. Other than that, you've got the fog, point size, texture coordinates, and of course, the two color registers. (Note that the FFP pixel operations expect only one set of texture coordinates and the two color values.) Since your pixel shader operates on these values, you are free to stick any value into them you want (within the limits of the pixel shader precision). It's simply a way of passing data directly to the pixel shader from the vertex shader. However, you should be aware that texture coordinates will always be perspective-correct interpolated from the vertex positions. The fog value will always be interpolated as well. The two color values will be interpolated only if the shading mode is Gouraud. The color interpolation in this case will also be perspective correct. Setting the shading mode to flat and placing data in the color registers is the preferred method of getting values unchanged to the pixel shader.
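
As a minimal sketch of that last point (assuming the same m_pd3dDevice device pointer used later in this chapter), switching to flat shading so that color-register data reaches the pixel shader uninterpolated is a single render state call:

// assumed: m_pd3dDevice is the IDirect3DDevice8/9 pointer used elsewhere
// flat shading: color register values are passed through without Gouraud
// interpolation, so they arrive at the pixel shader unchanged
m_pd3dDevice->SetRenderState( D3DRS_SHADEMODE, D3DSHADE_FLAT );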

Specular color is added by the pixel shader. There is no rendering state for specular when using pixel shaders. Fog, however, is still part of the FFP, and the fog blend is performed after the pixel shader executes.

VERTEX SHADERS

To quote the Microsoft DirectX 8.0 documentation,

Vertex processing performed by vertex shaders encompasses only operations applied to single vertices. The output of the vertex processing step is defined as individual vertices, each of which consists of a clip-space position (x, y, z, and w) plus color, texture coordinate, fog intensity, and point size information. The projection and mapping of these positions to the viewport, the assembling of multiple vertices into primitives, and the clipping of primitives is done by a subsequent processing stage and is not under the control of the vertex shader.

What does this mean? Well, it means that whatever your input from the vertex streams (because you will have specified these values), the output from a vertex shader for a single pass will be

  • One single vertex in clip-space coordinates

  • Optional color values (specular and diffuse)

  • Optional texture coordinates

  • Optional fog intensity

  • Optional point sizing

So, at the very least your minimal vertex shader needs to take the object's vertex positions and transform them into clip-space coordinates. The optional parts are determined by the rendering state. Since the object has to be rendered with some properties, you'll have texture coordinates and/or color specifications as output. But the constant and absolute requirement for every vertex shader is that you provide vertex positions in clip-space coordinates. Let's start with that.

Our first vertex shader will do just that, transform the untransformed input vertex to clip space. There are two assumptions in the following code. The first is that the input vertex position shows up in shader register v0. The actual register number depends on the vertex shader input declaration, which tells the shader the format of the input stream. The second is that we've stored the world-view-projection matrix in the vertex shader constants. Given those assumptions, we can write the following shader:

// v0   -- position
// c0-3 -- world/view/proj matrix

vs.1.1

// the minimal vertex shader
// transform to clip space
dp4 oPos.x, v0, c0
dp4 oPos.y, v0, c1
dp4 oPos.z, v0, c2
dp4 oPos.w, v0, c3

This shader uses four of the dot product shader instructions to perform a matrix multiply using the rows of the world-view-projection matrix sequentially to compute the transformed x, y, z, and w values. Each line computes a value and stores the scalar result into the elements of the output position register. Note that this is usually how the first few lines of your shader will look. It's possible that you might not need to perform the matrix multiply (e.g., if you transform the vertex before the shader is run). In any case, the minimal valid vertex shader must set all four elements of the oPos register.

There are some tricky issues with performing transformations, so let's review what this section of the shader has to do and what pitfalls there are. Along the way, we'll discuss some DirectX C++ code to perform the setup the shader requires.

Transformations and Projections

Typically, you'll start out with your object in what are called "object" or "model" coordinates—that is, these are the vertex positions that the model was originally read in with. Most of the projects I've worked on have created them with the object centered about the origin. Static objects might get an additional step where their vertices are transformed to some location in world space and then never touched again—creating a group of trees from a single set of original tree vertex coordinates, for example. Each new tree would take the values for the original tree's position and then use a unique transformation matrix to move the tree to its unique position.
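
For instance, a hypothetical sketch of that load-time step (the treePos, treeAngle, treeWorld arrays and numTrees count are made-up names, not from the original project) might look like this:

// assumed data: treePos[] (D3DXVECTOR3), treeAngle[] (radians),
// treeWorld[] (D3DXMATRIX), all with numTrees entries
for ( int i = 0; i < numTrees; ++i )
{
    D3DXMATRIX matRot, matTrans;
    D3DXMatrixRotationY( &matRot, treeAngle[i] );
    D3DXMatrixTranslation( &matTrans, treePos[i].x, treePos[i].y, treePos[i].z );
    // rotate the tree about its own origin first, then move it into place
    treeWorld[i] = matRot * matTrans;
}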

So, for every vertex in our tree example, we need to transform it from its local model space into the global world space coordinate system. That's done pretty simply with a matrix multiplication. For every point, using DirectX's row-vector convention,

vworld = vmodel Mworld

So, this set of vertices in world coordinates is what we assume we are starting with. What we need to do is get from world coordinates to clip coordinates.

The Trip from World to Clip Coordinates

The trip from world space to clip space is conceptually done in three separate steps. The first step is to actually get the model into the global, world coordinate system. This is done by multiplying the object's vertices by the world transformation matrix. This is the global coordinate system that makes all objects share the same coordinate system. If you are used to OpenGL, this is called the model transformation.

The second step is to get the object's vertices into the view coordinate system. This is done by multiplying by the view transformation. The result of this step is to place the object's vertices in the same space as the viewer, with the viewpoint at the origin and the gaze direction down the z axis. (DirectX uses the positive z axis, whereas OpenGL uses the negative.) Once the vertices have been transformed, they are said to be in eye space or camera space.

It should be noted that a typical optimization is to concatenate the world and view matrices into a single matrix, since OpenGL doesn't have a separate world matrix and instead makes you premultiply the viewing parameters into its modelview matrix. The same effect can be had in DirectX by leaving the world matrix as the identity and using just the view matrix as the equivalent of OpenGL's modelview matrix. Remember, the objective is not only to get the results we want but also to do it in as few steps as possible.

Now you might remember that there were zNear and zFar values and a field-of-view parameter used in the setup of the viewing parameters. Well, here's where they get used. Those values determine the view frustum—the truncated pyramid (for a perspective projection) in which only those things that are inside get rendered. What actually gets calculated from those values is the projection matrix. This matrix takes the frustum and transforms it into a unit cube. An object's coordinates are then said to be in NDC (normalized device coordinates) or, more practically, clip space. For a perspective projection, this has the effect of making objects farther away from the viewpoint (i.e., the origin in view coordinates) look smaller, which is the effect you want. The part of this transformation that produces more problems is that it is not a linear transformation in the z direction: the actual resolution of objects in the z direction gets smaller the closer you get to the zFar value. In other words, most of the resolution of the depth value (the z value) of your objects in clip space is concentrated in the first half of the viewing frustum. Practically, this means that if you set your zFar/zNear ratio too high, you'll lose resolution in the back part of the viewing volume, and the rendering engine will start rendering pixels for different objects that overlap, sometimes switching on a frame-by-frame basis, which can lead to sparkling or z-fighting.
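
As a sketch of where those numbers come from (the field of view, window size, and clip distances here are made up for illustration), the projection matrix is typically built with a D3DX call like this:

// assumed: m_matProj is the projection matrix member used elsewhere in this chapter
// keep zNear as large as the scene allows; the zFar/zNear ratio is what
// eats up depth resolution
float aspect = 640.0f / 480.0f;             // illustrative window size
D3DXMatrixPerspectiveFovLH( &m_matProj,
                            D3DX_PI / 4.0f, // 45 degree field of view
                            aspect,
                            1.0f,           // zNear
                            1000.0f );      // zFar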

The output of the projection transformation is that everything now sits in relation to a unit cube roughly centered about the origin. The cube in DirectX has one corner located at (-1, -1, 0) and the other at (1, 1, 1). Everything inside this cube will get rendered; everything outside the cube will get clipped. The nice thing about not writing your own rendering engine is that you don't have to decide what to do about objects that cross the boundary. The rendering engine has to actually create new vertices where the object crosses the boundary and render only up to those locations. (These vertices are created from the interpolated values provided by the FFP or the vertex shader—that is, the vertex shader isn't run for these intermediate vertices.) This means that it has to also correctly interpolate vertex colors, normals, texture coordinates, etc. A job best left up to the rendering engine.

To summarize: We have three different matrix transformations to get from model coordinate space to clip space. Since, for a single object, you usually don't change the world, view, or projection matrices, we can concatenate these and get a single matrix that will take us from model space directly to clip space.

We recalculate this matrix every time one of these original matrices changes—generally, every frame for most applications where the viewpoint can move around—and pass this to the vertex shader in some of the constant vertex shader registers.

Now let's look at some actual code to generate this matrix. In the generic case, you will have a world, view, and projection matrix, though if you're used to OpenGL, you will have a concatenated world-view matrix (called the modelview matrix in OpenGL). Before you load the concatenated world-view-projection matrix (or WVP matrix), you'll have to take the transpose of the matrix. This step is necessary because to transform a vertex inside a shader, the easiest way is to use the dot product instruction to do the multiplication. In order to get the correct order for the transformation multiplication, each vertex has to be multiplied by a column of the transformation matrix. Since the dot product operates on a single register vector, we need to transpose the matrix to swap the rows and columns in order to get the correct ordering for the dot product multiplication.

We do this by creating a temporary matrix that contains the WVP matrix, taking its transpose, and then passing that to the SetVertexShaderConstant() function for DirectX 8, or the SetVertexShaderConstantF() function for DirectX 9.

// DirectX 8 !
D3DXMATRIX trans;

// create a temporary matrix holding WVP, then
// transpose it and store the result
D3DXMATRIX wvp = m_matWorld * m_matView * m_matProj;
D3DXMatrixTranspose( &trans, &wvp );

// Take the address of the matrix (which is 4
// rows of 4 floats in memory). Place it starting at
// constant register c0 for a total of 4 registers.
m_pd3dDevice->SetVertexShaderConstant(
    0,       // what constant register # to start at
    &trans,  // address of the value(s)
    4 );     // # of 4-element values to load
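
Under DirectX 9 the same setup goes through SetVertexShaderConstantF(); a minimal sketch (assuming m_pd3dDevice is now an IDirect3DDevice9 pointer) might look like this:

// DirectX 9 !
D3DXMATRIX trans;
D3DXMATRIX wvp = m_matWorld * m_matView * m_matProj;
D3DXMatrixTranspose( &trans, &wvp );
m_pd3dDevice->SetVertexShaderConstantF(
    0,                     // start at constant register c0
    (const float*)&trans,  // 16 floats = 4 rows of 4
    4 );                   // # of 4-float constants to load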

Once that is done, we're almost ready to run our first vertex shader. There are still two items we have to set up—the vertex input to the shader and the output color. Remember that there are usually two things that the vertex shader has to output—transformed vertex positions and some kind of output for the vertex—be it a color, a texture coordinate, or some combination of things. The simplest is just setting the vertex to a flat color, and we can do that by passing in a color in a constant register, which is what the next lines of code do.

// DirectX 8!
// set up a color
float teal[4] = { 0.0f, 1.0f, 0.7f, 0.0f };  // rgba ordering

// specify constant register c12
m_pd3dDevice->SetVertexShaderConstant(
    12,    // which constant register to set
    teal,  // the array of values
    1 );   // # of 4-element values

Finally, you need to specify where the input vertex stream will appear. This is done with the SetStreamSource() function, which binds a vertex buffer to a stream, together with the vertex shader declaration, which selects which vertex register(s) the stream data shows up in. There's a lot more to setting up a stream, but the part we're currently interested in is just knowing where the raw vertex (and later normal and texture coordinate) information will show up in our shader. For the following examples, we'll assume that we've set up vertex register 0 to be associated with the vertex stream. Most of the vertex shader code you'll see will have the expected constant declarations as comments at the top of the shader.
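
A bare-bones sketch of that setup under DirectX 8 (assuming dwShaderCode holds the assembled shader tokens, m_hVertexShader is a DWORD handle, and m_pVB is a position-only vertex buffer) could look like this; the declaration is what ties stream 0 to register v0:

// map stream 0 to input register v0 as a 3-float position
DWORD dwDecl[] =
{
    D3DVSD_STREAM( 0 ),
    D3DVSD_REG( 0, D3DVSDT_FLOAT3 ),   // v0 = position
    D3DVSD_END()
};

// create and select the shader, then bind the vertex buffer to stream 0
m_pd3dDevice->CreateVertexShader( dwDecl, dwShaderCode, &m_hVertexShader, 0 );
m_pd3dDevice->SetVertexShader( m_hVertexShader );
m_pd3dDevice->SetStreamSource( 0, m_pVB, 3 * sizeof( float ) );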

So with the vertex input in v0, the WVP matrix in c0 through c3, and the output color in c12, our first self-contained vertex shader looks like this.

// v0   -- position
// c0-3 -- world/view/proj matrix
// c12  -- the color value

vs.1.1

// a minimal vertex shader
// transform to clip space
dp4 oPos.x, v0, c0
dp4 oPos.y, v0, c1
dp4 oPos.z, v0, c2
dp4 oPos.w, v0, c3

// write out color
mov oD0, c12

Transforming Normal Vectors

In order to perform lighting calculations, you need the normal of the vertex or the surface. When the vertex is transformed, it seems obvious that the normals (which I always visualize as little vectors sticking out of the point) need to be transformed as well; after all, if the vertex rotates, the normal must rotate with it! And generally you'll see applications and textbooks using the same transformation matrix on normals as on vertices, and in most cases, this is OK. However, it is true only if the matrix is orthogonal—that is, made up of translations and rotations, but no scaling transformations. Let's take a shape, transform it, and see what happens so that we can get an idea of what's going on.

Suppose we apply a general transformation matrix (say, one that includes a nonuniform scale) to this shape and to its normals as well.

Although the shape may be what we desired, you can clearly see that the normals no longer represent what they are supposed to—they are no longer perpendicular to the surface and are no longer of unit length. You could recalculate the normals, but since we just applied a transformation matrix to our vertices, it seems reasonable that we should be able to perform a similar operation to our normals that correctly orients them with the surface while preserving their unit length.

If you're interested in the math, you can look it up [TURKOWSKI 1990]. But basically, it comes down to the following observations. When you transform an object, you'll be using one of these types of transformations.

  1. Orthogonal transformation (rotations and translations): This tends to be the most common case since most objects aren't scaled. In this case, the normals can be transformed by the same matrix as used for vertices. For a matrix without any scaling in it, the transpose is the same as the inverse, and the transpose is easier to calculate, so in this case, you'd generally use the transpose as a faster-to-calculate replacement for the inverse.

  2. Isotropic transformation (uniform scaling): In this case, the normals need to be transformed by the inverse scaling factor (or simply renormalized after the transformation). If you scale your objects only at load time, then an optimization is to fix up the normals once, right after that initial scaling.

  3. Affine transformation (any other you'll probably create): In this case, you'll need to transform the normals by the transpose of the inverse of the transformation matrix. You'll need to calculate the inverse matrix for your vertices, so this is just an additional step of taking the transpose of this matrix.

In fact, you can get away with computing just the transpose of the adjoint of the upper 3 × 3 matrix of the transformation matrix [RTR 2002].

So, in summary,

  • If the world/model transformations consist of only rotations and translations, then you can use the same matrix to transform the normals.

  • If there are uniform scalings in the world/model matrix, then the normals will have to be renormalized after the transformation.

  • If there are nonuniform scalings, then the normals will have to be transformed by the transpose of the inverse of the matrix used to transform the geometry.

If you know that the world matrix you apply to your positions is orthogonal, then you can use that same matrix on the normal, and you don't have to renormalize the normal.

// a vertex shader for orthogonal transformation matrices
// v0 -- position
// v3 -- normal
// c0-3 -- world/view/proj matrix
// c4-7 -- world matrix (rotations and translations only)

vs.1.1

// transform vertex to clip space
dp4 oPos.x, v0, c0
dp4 oPos.y, v0, c1
dp4 oPos.z, v0, c2
dp4 oPos.w, v0, c3

// transform normal using the same (orthogonal) world matrix
dp3 r0.x, v3, c4
dp3 r0.y, v3, c5
dp3 r0.z, v3, c6

On the other hand, if you have any other kind of matrix, you'll have to provide the inverse transpose of the world matrix in a set of constant registers in addition to the WVP matrix. After you transform the normal vector, you'll have to renormalize it.

// a vertex shader for non-orthogonal
// transformation matrices
// v0 -- position
// v3 -- normal
// c0-3 -- world/view/proj matrix
// c5-8 -- inverse/transpose world matrix

vs.1.1

// transform vertex to clip space
dp4 oPos.x, v0, c0
dp4 oPos.y, v0, c1
dp4 oPos.z, v0, c2
dp4 oPos.w, v0, c3

// transform normal
dp3 r0.x, v3, c5
dp3 r0.y, v3, c6
dp3 r0.z, v3, c7

// renormalize normal
dp3 r0.w, r0, r0
rsq r0.w, r0.w
mul r0, r0, r0.w
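
The host-side setup for those c5-c8 constants isn't shown in the shader listing; a sketch of it (DirectX 8, using the same member matrices as before) could be:

// Build the inverse of the world matrix. Loaded as-is, its rows are the
// columns of the inverse-transpose, which is what the dp3s above expect
// (the same reason we transposed the WVP matrix before loading it).
D3DXMATRIX invWorld;
D3DXMatrixInverse( &invWorld, NULL, &m_matWorld );
m_pd3dDevice->SetVertexShaderConstant( 5, &invWorld, 4 );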

There are a series of macro instructions (such as m4x4) that expand into a series of dot product calls. These macros are there to make it easy for you to perform the matrix transformation into clip space. Do not make the mistake of using the same register for source and destination (for example, m4x4 r0, r0, c0). If you do, the macro will happily expand into a series of dot product calls and modify the source register element by element for each dot product rather than preserving the original register; something like m4x4 oPos, v0, c0, where the source and destination differ, is safe.

Vertex Shader Registers and Variables

Shader registers are constructed as a vector of four IEEE 32-bit floating point numbers. While hardware manufacturers are free to implement their hardware as they see fit, there are some minimums that they have to meet. Since vertex shader results are passed back into the pipeline, you can expect the precision to match that of the input registers, namely, closely matching the IEEE 32-bit float specification, with the exception that some of the math error propagation rules (NaN, INF, etc.) are simplified. On those output registers that are clamped to a specific range, the clamping does not occur until the shader is finished. Note that you'll get very familiar behavior from vertex shader math, which can lull you into a sense of security when you start dealing with the more limited math precision of pixel shaders, so be careful!

PIXEL SHADERS

A pixel shader takes color, texture coordinate(s), and selected texture(s) as input and produces a single color rgba value as its output. You can ignore any texture states that are set. You can create your own texture coordinates out of thin air. You can even ignore any of the inputs provided and set the pixel color directly if you like. In other words, you have near total control over the final pixel color that shows up. The one render state that will change your pixel color is fog. The fog blend is performed after the pixel shader has run.

Inside a pixel shader, you can look up texture coordinates, modify them, blend them, etc. A pixel shader has two color registers as input, some constant registers, and texture coordinates and textures set prior to the execution of the shader through the render states (Figure 4.8).

Figure 4.8: Pixel shaders take color inputs and texture coordinates to generate a single output color value.

Using pixel shaders, you are free to interpret the data however you like. Since you are pretty much limited to sampling textures and blending them with colors, pixel shaders are generally shorter than vertex shaders. The variety of commands, however, is pretty large, since many commands are subtle variations of each other.

In addition to the version and constant declaration instructions, which are similar to the vertex shader instructions, pixel shader instructions have texture addressing instructions and arithmetic instructions.

Arithmetic instructions include the common mathematical instructions that you'd expect. These instructions are used to perform operations on the color or texture address information.

The texture instructions operate on a texture or on texture coordinates that have been bound to a texture stage. You assign a texture to one of the texture stages that the device currently supports using the SetTexture() function, and you control how that texture is sampled through a call to SetTextureStageState(). The simplest pixel shader we can write that samples the texture assigned to stage 0 would look like this.

// a pixel shader to use the texture of stage 0
ps.1.0

// sample the texture bound to stage 0
// using the texture coordinates from stage 0
// and place the resulting color in t0
tex t0

// now copy the color to the output register
mov r0, t0
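
For completeness, the host-side code that binds a texture to stage 0 before this shader runs might look something like the following sketch (DirectX 8 style; m_pTexture and m_hPixelShader are assumed to have been created elsewhere):

// bind a texture to stage 0 and choose how it is sampled
m_pd3dDevice->SetTexture( 0, m_pTexture );
m_pd3dDevice->SetTextureStageState( 0, D3DTSS_MINFILTER, D3DTEXF_LINEAR );
m_pd3dDevice->SetTextureStageState( 0, D3DTSS_MAGFILTER, D3DTEXF_LINEAR );

// select the pixel shader created from the code above
m_pd3dDevice->SetPixelShader( m_hPixelShader );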

There are a large variety of texture addressing and sampling operations that give you a wide variety of options for sampling, blending, and other operations on multiple textures.

Conversely, if you didn't want to sample a texture but were just interested in coloring the pixel using the iterated colors from the vertex shader output, you could ignore any active textures and just use the color input registers. Assuming that we were using either the FFP or our vertex shader to set both the diffuse and specular colors, a pixel shader to add the diffuse and specular colors would look like this.

// a pixel shader to just blend diffuse and specular
ps.1.0

// since the add instruction can only access
// one color register at a time, we need to
// move one value into a temporary register and
// perform the add with that temp register
mov r0, v1
add r0, v0, r0

As you can see, pixel shaders are straightforward to use, though understanding the intricacies of the individual instructions is sometimes a challenge.

Unfortunately, since pixel shaders are so representative of the hardware, there's a good deal of unique behavior between shader versions. For example, almost all the texture operations that were available in pixel shader 1.0 through 1.3 were replaced with fewer but more generic texture operations available in pixel shader 1.4 and 2.0. Unlike vertex shaders (for which there was a good implementation in the software driver), there was no implementation of pixel shaders in software. Thus since pixel shaders essentially expose the hardware API to the shader writer, the features of the language are directly represented by the features of the hardware. This is getting better with pixel shaders 2.0, which are starting to show more uniformity about instructions.

DirectX 8 Pixel Shader Math Precision

In pixel shader versions 2.0 or better (i.e., DirectX 9 compliant), the change was made to make the registers full precision registers. However, in DirectX 8, that minimum wasn't in place. Registers in pixel shaders before version 2.0 are not full 32-bit floating point values (Figure 4.9). In fact, they are severely limited in their range. The minimum precision is 8 bits, which usually translates to an implementation of a fixed point number with a sign bit and 7-8 bits for the fraction. Since the complexity of pixel shaders will only grow over time, and hence the ability to do lengthy operations, you can expect that you'll rarely run into precision problems unless you're trying to do something like perform multiple lookup operations into large texture spaces or perform many rendering passes. Only on older cards (those manufactured in or before 2001) or inexpensive cards will you find the 8-bit minimum. As manufacturers figure out how to reduce the size of the silicon and increase the complexity, they'll be able to squeeze more precision into the pixel shaders. DirectX 9 compliant cards should have 16 or 32 bits of precision.

As the number of bits increases in the pixel shader registers, so will the overall range. You'll need to examine the D3DCAPS8.MaxPixelShaderValue or D3DCAPS9.PixelShader1xMaxValue capability value in order to see the range that pixel shader registers are clamped to. In DirectX 6 and DirectX 7, this value was 0, indicating an absolute range of [0,1]. In later versions of DirectX, this value represents an absolute range, thus in DirectX 8 or 9, you might see a value of 1, which would indicate a range of [-1,1], or 8, which would indicate a range of [-8,8]. Note that this value typically depends not only on the hardware, but sometimes on the driver version as well!
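
A quick way to see what you've got on a given device is to read it out of the caps structure; a DirectX 8 sketch:

// query the device capabilities (DirectX 8)
D3DCAPS8 caps;
m_pd3dDevice->GetDeviceCaps( &caps );

// e.g. 1.0 means registers clamp to [-1,1], 8.0 means [-8,8]
float maxPixelValue = caps.MaxPixelShaderValue;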

No, No, It's Not a Texture, It's just Data

One of the largest problems people have with using texture operations is getting over the fact that just because something is using a texture operation, it doesn't have to be texture data. In the early days of 3D graphics, you could compute lighting effects using hardware acceleration only at vertices. Thus if you had a wall consisting of one large quadrangle and you wanted to illuminate it, you had to make sure that the light fell on at least one vertex in order to get some lighting effect. If the light was near the center, it made no difference since the light was calculated only at the vertices, and then linearly interpolated from there—thus a light at the center of a surface was only as bright as at the vertices. The brute force method of correcting this (which is what some tools like RenderMan do) is to tessellate a surface till the individual triangles are smaller than a pixel will be, in effect turning a program into a pixel accurate renderer at the expense of generating a huge number of triangles.

It turns out that there already is a hardware accelerated method of manipulating pixels—it's the texture rendering section of the API. For years, people have been doing things like using textures to create pseudolighting effects and even to simulate perturbations in the surface lighting due to a bumpy surface by using texture maps. It took a fair amount of effort to get multiple textures supported in graphics hardware, and when support finally arrived at a fairly consistent level in consumer-level graphics cards, multitexturing effects took off. Not content with waiting for the API folks to get their act together, graphics programmers and researchers thought of different ways to layer multiple textures to get the effects they wanted. It's this tradition that pixel shaders are built upon. In order to get really creative with pixel shaders, you have to forget the idea that a texture is an image. A texture is just a 1D, 2D, or 3D matrix of data. We can use it to hold an image, a bump map, or a lookup table of any sort of data. We just need to associate a vertex with some coordinate data, and out in the pixel shader will pop those coordinates, ready for us to pluck out our matrix of data.

PHYSICALLY BASED ILLUMINATION

In order to get a more realistic representation of lighting, we need to move away from the simplistic models that are found hard coded in most graphics pipelines and move to something that is based more on a physical representation of light as a wave with properties of its own that can interact with its environment. To do this, we'll need to understand how light passes through a medium and how hitting the boundary layer at the intersection of two media can affect light's properties. Consider an incident ray of light hitting a surface. At the boundary of the two media (in this case, air and glass), there are two resulting rays of light. The reflected ray is the one that we've already discussed to some extent, and the other ray is the refracted or transmitted ray.

In addition to examining the interaction of light with the surface boundary, we need a better description of real surface geometries. Until now, we've been treating our surfaces as perfectly smooth and uniform. Unfortunately, this prevents us from getting some interesting effects. We'll go over trying to model a real surface later, but first let's look at the physics of light interacting at a material boundary.

Reflection

Reflection of a light wave is the change in direction of the light ray when it bounces off the boundary between two media. The reflected light wave turns out to be a simple case since light is reflected at the same angle as the incident wave (when the surface is smooth and uniform, as we'll assume for now). Thus for a light wave reflecting off a perfectly smooth surface, the angle of reflection equals the angle of incidence.

Until now, we've treated all of our specular lighting calculations as essentially reflection off a perfect surface, a surface that doesn't interact with the light in any manner other than reflecting light in proportion to the color of the surface itself. Using a lighting model based upon the Blinn-Phong model means that we'll always get a uniform specular highlight based upon the color of the reflecting light and material, which means that all reflections based on this model will be reminiscent of plastic. In order to get a more interesting and realistic lighting model, we need to add some nonlinear elements to our calculations. First, let's examine what occurs when light is reflected off a surface. For a perfectly reflecting surface, the angle of the incoming light (the angle of incidence) is equal to that of the reflected light. Phong's equation just blurs out the highlight a bit in a symmetrical fashion. Until we start dealing with nonuniform smooth surfaces in a manner a bit more realistic than Phong's in the section on surface geometry, this will have to do.

Refraction

Refraction happens when a light wave goes from one medium into another. Because of the difference in the speed of light in the two media, light bends when it crosses the boundary. Snell's law gives the change in angles:

ni sin(φi) = nt sin(φt)

where the n's are the materials' indices of refraction. Snell's law states that when light refracts through a surface, the refracted angle is shifted by a function of the ratio of the two materials' indices of refraction. The index of refraction of vacuum is 1, and all other materials' indices of refraction are greater than 1.

What this means is that in order to realistically model refraction, we need to know the indices of refraction of the two materials that the light is traveling through. Let's take a simple case of a ray of light traveling through the air (nair ≈ 1) and intersecting a glass surface (nglass ≈ 1.5). If the light ray hits the glass surface at 45°, at what angle does the refracted ray leave the interface?

The angle of incidence is the angle between the incoming ray and the surface normal. Rearranging Snell's law, we can solve for the refracted angle:

φt = arcsin((nair/nglass) sin(45°)) ≈ 28°

which is a fairly significant change in the angle! If we change things around so that we are following a light ray emerging from the glass into the air, we can run into another phenomenon. Since the index of refraction is just a measure of the change in speed that light travels in a material, we can observe from Snell's law (and the fact that the index of refraction in a vacuum is 1) that light bends toward the normal when it slows down (i.e., when the material it's entering has a higher index of refraction). Consequently, when light enters a medium that has a lower index of refraction (e.g., going from glass to air), the angle will increase. Ah, you must be thinking, we're approaching a singularity here since we can then easily generate numbers that we can't take the inverse sine of! If we use Snell's law for light going from glass to air and plug in 90° for the refracted angle, we get 41.8° for the incident angle. This is called the critical angle, at which we observe the phenomenon of total internal reflection. At any angle greater than this, light will not pass through the boundary but will be reflected internally.
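
Here's a small sketch (my own, not from the original text) of Snell's law and the critical angle as code, just to make the arithmetic above concrete; angles are in radians, measured from the surface normal:

#include <math.h>

// Snell's law: ni * sin(angleIn) = nt * sin(angleOut)
// returns -1 when the angle is past the critical angle (total internal reflection)
float refractedAngle( float ni, float nt, float angleIn )
{
    float s = ( ni / nt ) * sinf( angleIn );
    if ( s >= 1.0f )
        return -1.0f;
    return asinf( s );
}

// e.g. glass to air: criticalAngle( 1.5f, 1.0f ) is about 41.8 degrees
float criticalAngle( float ni, float nt )
{
    return asinf( nt / ni );
}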

One place that you get interesting visual properties is in the diamond-air interface. The refractive index of a diamond is fairly high, about 2.42, which means that it's got a very low critical angle, just 24.4°. This means that a good portion of the light entering a diamond will bounce around the inside of the diamond, hitting a number of diamond-air boundaries, and as long as the angle is 24.4° or greater, it will keep reflecting internally. This is why diamonds are cut to be relatively flat on the top but with many faceted sides, so that light entering in one spot will bounce around and exit at another, giving rise to the sparkle normally associated with diamonds.

Another place where a small change in the indices of refraction occurs is on a road heated by the sun when viewed from far away (hence a glancing incident angle). The hot air at the road's surface has a slightly smaller index of refraction than the denser, cooler air above it. This is why you get the effect of a road looking as though it were covered with water and reflecting the image above it—the light waves are actually reflected off the warm air-cold air interface.

What makes this really challenging to model is that the index of refraction for most materials is a function of the wavelength of the light. This means that not only is there a shift in the angle of refraction, but that the shift is different for differing wavelengths of light. You can see the general trend that shorter wavelength light (bluish) tends to bend more than the longer (reddish) wavelengths.

This is the phenomenon that's responsible for the spectrum that can be seen when white light is passed through a prism. It's refraction that will break apart a light source into its component colors, not reflection.

This is one area where our simplistic model of light breaks down since we're not computing an entire spectrum of light waves, but are limited to three primary colors. For reference, the rgb values can be loosely assigned to ranges of wavelengths: red corresponds to wavelengths of roughly 620-700 nm, green to roughly 520-570 nm, and blue to roughly 450-490 nm.

There's a lot more to color science than just determining wavelengths, but that's beyond the scope of this book.

While the spectrum spreading effect of refraction is interesting in itself, the rgb nature of computer color representation precludes performing this spreading directly—we can't break up a color value into multiple color values. However, with some work, you can compute the shade of the color for a particular angle of refraction and then use that as the material color to influence the refracted color.

Temperature Correction for Refractive Index

Refractive index is a function of temperature, mostly due to density changes in materials with changes in temperature. A simple correction can be applied in most circumstances to allow you to use a value given at one temperature at another. For example, suppose the index of refraction value you have is given at 25°C: η25. To convert the index to another temperature, ηt, you can use the following equation:

where the actual temperature you want is t, and the 25 is the temperature (both in °C) of the actual index you have, η25.

The Fresnel Equations

The Fresnel (pronounced Freh-nel) equations are used to calculate the percentage of energy in the refracted and the reflected parts of the wave (ignoring absorption). In order to understand what the equations calculate, we'll have to take a look at what happens when a light wave (as opposed to a photon) interacts with a surface. We have to do it this way since this is the only way to describe the subtle (and realistic) visual effects we are looking for. A wave of energy has both an electric field and a magnetic field that travel in perpendicular phase.

In general, when a wave reaches a boundary between two different dielectric constants, part of the wave is reflected and part is transmitted, with the sum of the energies in these two waves equal to that of the original wave.

The Fresnel equations are a solution to Maxwell's equations for electromagnetic waves at the interface. If you are really interested in seeing how this works, see [HECHT 1987], but I'll spare you the actual derivation and get to the important part. What Fresnel did (besides proving once and for all that light can behave like a wave) was figure out that for the two extrema of the light wave—the wave with the electric field parallel to the surface and the wave with the electric field perpendicular to the surface—the energy transmitted and reflected are functions of the angle of incidence and the indices of refraction of the two media. This is for a nonconductive (dielectric) medium like plastic or glass.

For a conductive medium, there's actually some interaction between the free electrons in the conductor (or else it wouldn't be a conductor) and the electromagnetic field of the light wave. When the light wave interacts with the material, the free electrons in the material oscillate with the field of the light wave, matching its frequency. These oscillations radiate (and, in effect, reflect) the light wave. In addition, a conductor has some resistance to electron movement, so the material absorbs some of the energy that would have been reradiated. Fresnel equations for conductive interfaces usually involve a dielectric (like air) and a conductor, since conductors absorb light and are generally opaque, and you don't have a light interface between two opaque materials. The parallel and perpendicular components are sometimes referred to as the p-polarized and s-polarized components, respectively.

Fresnel Equations for Dielectrics

The simplest form of the Fresnel equations is for dielectrics:

r∥ = (nt(n • l) − ni(n • t)) / (nt(n • l) + ni(n • t))
r⊥ = (ni(n • l) − nt(n • t)) / (ni(n • l) + nt(n • t))
t∥ = 2ni(n • l) / (nt(n • l) + ni(n • t))
t⊥ = 2ni(n • l) / (ni(n • l) + nt(n • t))

where r and t are broken into parallel and perpendicular components, n is the surface normal, l is the direction to the light, t is the transmitted (refracted) direction, and nt and ni are the indices of refraction for the transmitting (refracting) and incident materials, respectively.

Now we can simplify these equations by assuming normalized vectors and multiplying out the dot products, noting that n • l = cos(φi) and n • t = cos(φt), and using Snell's law to get rid of the indices of refraction. φi is the angle of incidence and φt the transmitted (refracted) angle. These simplified equations are

r∥ = tan(φi − φt) / tan(φi + φt)
r⊥ = −sin(φi − φt) / sin(φi + φt)
t∥ = 2 sin(φt) cos(φi) / (sin(φi + φt) cos(φi − φt))
t⊥ = 2 sin(φt) cos(φi) / sin(φi + φt)

Using these equations, we can calculate the percentage of energy transmitted or reflected. Since these equations represent the maxima and minima of the interaction between the media depending upon the orientation (polarization) of the light wave's fields, we'll just take the average to compute the amount reflected. Thus

Fr = (1/2)(r∥² + r⊥²)

gives us the fraction of unpolarized light reflected. Note that this average is for unpolarized light; if your light source is polarized, you could pick the appropriate single term instead of the average. Also note that due to the conservation of energy, we can write the transmitted fraction as

Ft = 1 − Fr


Let's take a look at what this means in a practical application. Let's plot the values of the Fresnel equations for the air-glass interface. The first thing you should notice is that even when the incident angle is at 0°—that is, the light is shining along the surface normal—there is still some loss in transmittance.

Looking at the reflection and transmission curves for the parallel and perpendicular waves at the air-glass interface, if we do the math we will discover that there is no distinction between the parallel and perpendicular components when the angle is 0°, and in this case, the Fresnel equations simplify to

Fr = ((nt − ni) / (nt + ni))²

Thus at 0° for the air-glass interface, we can see that about 4% of the light is reflected. This means that if you have a glass window, you will get only 92% of the light transmitted through (you have the air-glass interface, 4% reflected, and the glass-air interface, another 4%, when the ray comes out of the glass). Glass used in optics is usually coated with a thin film of antireflective coating, which reduces the reflectivity to something around 1%. You might also note that at an angle of about 56°, the parallel reflectance drops to zero. This is called Brewster's angle or the polarization angle (φp) and is the effect on which polarized lenses work. You can use Snell's law and the observation that for polarized light φt = 90° − φp to derive the following equation for Brewster's angle:

tan(φp) = nt / ni

Now let's plot the average reflected and transmitted energy and take a look at the plot. We can see that as the incident angle approaches 90°, the reflectivity approaches 100%. This is one of the more important aspects of the Fresnel equations—at a glancing angle, all surfaces become perfect reflectors, regardless of what the surface is made of. In fact, this is one way in which x-rays are focused. The only fly in the ointment is that this is true only for perfectly flat surfaces, but this will be covered in the next section.

To be thorough, we should also take a look at the other side of the interface, looking through a medium of higher refractive index into one of lower index. In this case, we will reach an incident angle at which we get total internal reflection, and no light will be transmitted through the interface. If we reverse the indices of refraction and look at the glass-air interface, then at the critical angle of 41.8° we get total internal reflection, and no light is transmitted through the interface. Thus if you wanted to model something like a scene underwater, you'd need to treat the water surface as a mirror at any angle over the critical angle (about 48.6° for the water-air interface).

The Fresnel Term in Practice

For real materials, the Fresnel term depends upon the incoming angle of light and the refracted angle. The refracted angle is a function of the indices of refraction of both materials, which, in turn, are dependent upon the light wavelength and the density of the materials. The density, in turn, is typically a function of temperature. The index of refraction usually increases with the density of the medium and usually decreases with increasing temperature. This is all very fine if you happen to have data for the index of refraction for the wavelengths of interest over the temperatures you'll need. Since this kind of data is difficult to find, we'll do what innumerable computer graphics researchers have done before us—we'll fake it.

To get a reasonable estimate of the Fresnel term, Cook and Torrance [COOK 1982] note that values of reflectance at normal incidence are more commonly found. You can take the Fresnel value at this angle (called F0), back-substitute to get an effective index of refraction, and then plug that index of refraction back into the original Fresnel equation. This gives you the Fresnel value as a function of the angle of incidence. In order to perform this calculation, we need to rework the Fresnel equation for reflectance into a form derived by Blinn [BLINN 1977]. This is easier if we do it in steps. The parallel reflectance part of Fresnel's equation is

F∥ = tan²(φi − φt) / tan²(φi + φt)

If we get rid of the tangent using tan²(φ) + 1 = sec²(φ) = 1/cos²(φ), then we can rework the equation into terms that involve only sines and cosines. We can then use the cos(φ ± θ) = cos(φ)cos(θ) ∓ sin(φ)sin(θ) identity to break it down into terms involving φi and φt separately, and use Snell's law in the form ηλ sin(φt) = sin(φi), where ηλ = ηt/ηi, to get rid of the transmitted angle. Blinn introduced the term g in his paper to simplify the equations a bit, and we'll do the same here:

g = ηλ cos(φt), so that g² = ηλ² + cos²(φi) − 1

Multiplying through by ηλ, using the sin²(φ) + cos²(φ) = 1 identity to remove the remaining sine terms (which allows a further replacement with g), and factoring out common terms, the parallel reflectance reduces to

F∥ = [(g − c)² / (g + c)²] · [c(g + c) − 1]² / [c(g − c) + 1]²    where c = cos(φi)

Whew! What we've now got is an expression for a Fresnel term that is only a function of ηλ and φi. Let's do the same with the perpendicular reflection term,

F⊥ = sin²(φi − φt) / sin²(φi + φt)

Using the sin(φ ± θ) = sin(φ)cos(θ) ± cos(φ)sin(θ) identity breaks the equation into terms involving φi and φt separately; then using Snell's law to get rid of the sine terms and replacing ηλ cos(φt) with g gives

F⊥ = (g − c)² / (g + c)²

Finally, add the two pieces of the Fresnel reflection term and average, and we get

F = (1/2) · [(g − c)² / (g + c)²] · ( 1 + [c(g + c) − 1]² / [c(g − c) + 1]² )

Since g is itself a function of only ηλ and cos(φi), we now have a Fresnel equation that's dependent upon only three values: the two indices of refraction and the incident angle. Well, that's great, but why have we gone through all that? It's to further simplify the equation. If we look at the equation at normal incidence, when φi = 0, then c = cos(φi) = 1 and g = ηλ, most of the equation cancels out, and we're left with

F0 = (ηλ − 1)² / (ηλ + 1)²

OK, what's the advantage of this? Well, this lets us recover a value for ηλ from the reflectance measured when the light is normal to the surface. In other words, when we shine a light directly at the surface and look from the same direction (normal incidence), this equation lets us solve for ηλ. Thus we can rearrange the equation to read

ηλ = (1 + √F0) / (1 − √F0)

You can then plug the values of ηλ generated in this way back into the full equation to generate F at other angles.
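
As a quick illustration of how you might use this in practice, here is a small C++ sketch (my own, not from the text) that recovers ηλ from a measured normal-incidence reflectance F0 and then evaluates the averaged dielectric Fresnel term above at any angle of incidence:

#include <math.h>

// F0 = reflectance at normal incidence, cosPhi = cos(angle of incidence)
float fresnelFromF0( float F0, float cosPhi )
{
    // recover the effective index of refraction from F0
    float eta = ( 1.0f + sqrtf( F0 ) ) / ( 1.0f - sqrtf( F0 ) );

    float c = cosPhi;
    float g = sqrtf( eta * eta + c * c - 1.0f );

    float a = ( g - c ) / ( g + c );
    float b = ( c * ( g + c ) - 1.0f ) / ( c * ( g - c ) + 1.0f );

    // the averaged (unpolarized) Fresnel reflectance derived above
    return 0.5f * a * a * ( 1.0f + b * b );
}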

PHYSICALLY BASED SURFACE MODELS

The most widely used model of surfaces that are not perfectly smooth and uniform is the Cook-Torrance [COOK 1982] model. What they did was to assume that

  • The geometry of the roughness of the surface is larger than that of the wavelength of the light.

  • The geometry is considered to be made up of v-shaped facets.

  • The facets are randomly oriented.

  • The facets are mirrorlike.

Using such a model, there are three different ways that a light ray can interact with the reflecting surfaces of the v-shaped geometry (termed "microfacets"), depending on the angle of the v.

  • The ray can reflect with no interference.

  • The ray could be partially shadowed by other geometry.

  • The ray could get blocked by part of the geometry.

Which of these three cases applies depends on the slope of the facet relative to the light and view directions.

Roughness Distribution Function

When using this model, we need a way to specify the distribution of the slopes of the facets. This is termed the slope distribution function, D. Blinn [BLINN 1977] used a Gaussian distribution function to model the slope distribution:

D = c e^(-(α/m)²)

where c is some arbitrary constant and cos(α) = n • h. The parameter m is the RMS slope parameter, for which smaller values (e.g., 0.2) indicate a smooth surface, whereas larger values (e.g., 0.8) indicate a rougher surface.

Cook and Torrance [COOK 1982] used a Beckmann distribution function, which they state can successfully model both rough and smooth dielectrics and conductors:

D = (1 / (m² cos⁴(α))) e^(-(tan(α)/m)²)

This model has the advantage of not requiring an arbitrary constant; it relies on just the one parameter m to specify the surface roughness.

There are other models such as the Trowbridge-Reitz model [TROWBRIDGE 1975], which models the microfacets as ellipsoids. You can even consider the Phong specular term to be a distribution function with the roughness specified as the exponential power value.

Geometric Attenuation Function

In addition to describing how the geometry of a rough surface is laid out, the Cook-Torrance model can be used to calculate how much of the light actually reaches (and leaves) the microfacets. Blinn [BLINN 1977] has a very nice derivation of the geometry involved. Basically, there are three different cases to consider.

  1. There is no interference in any of the light.

  2. Some of the incoming light is blocked (shadowed).

  3. Some of the reflected light is blocked (masked).

Blinn calculated the amount of light that would get blocked in each case. These are basically functions of the light direction and the facet normal. You then calculate these three values and select the minimum as the geometric attenuation function, G.

THE BIDIRECTIONAL REFLECTANCE DISTRIBUTION FUNCTION (BRDF)

We're trying to accurately simulate reflectance from light traveling through a dielectric (something transparent, air, vacuum, etc.) and hitting the surface of a conductor (or something nontransparent) and reflecting off that surface.

The bidirectional reflectance distribution function (BRDF) takes into account the structure of the reflecting surface, the attenuation of the incident light by that structure, and the optical properties of the surface. It relates the incident light energy to the outgoing light energy. The incoming and outgoing light rays need to have not only their angles with the surface normal considered, but also the orientation of the rays with respect to the surface orientation. This allows surfaces that reflect light differently depending on their orientation around the surface normal (i.e., anisotropic surfaces) to be modeled.

The BRDF parameters relate the incoming and outgoing directions to the surface normal and the orientation of the surface, so a BRDF depends upon a total of four angles. Of course, you can simplify these by making assumptions about which terms are important to get the effect you want.

BRDFs in Practice

The problem with BRDFs is that they are tough to implement in a practical manner. It's perfectly fine to have a BRDF that's expensive to calculate if you are doing some non-real-time imagery, but to evaluate BRDFs in shaders, you frequently have to simplify the model. The Lambertian model for diffuse reflection, for example, can be considered a BRDF that's just a constant. Phong's illumination model takes the (typical) approach of breaking a BRDF into diffuse and specular parts. It uses the Lambertian model for the diffuse part and a cosⁿ(θ) term for the specular part, treating the specular BRDF as a function of a single angle only.

However, we've already tried these models and found them wanting, so we'll take a look at a model for specular reflection developed by Blinn [BLINN 1977] that is still popular. Blinn proposed that, using the Cook-Torrance model for surface geometry, the specular reflection is composed of four parts.

  • The distribution function, D.

  • The Fresnel reflection law, F.

  • The geometric attenuation factor, G.

  • The fraction of the microfacets that are visible to the light and the viewer, accounted for by a (n • v)(n • l) term in the denominator.

The BRDF specular function is then

ρs = (D F G) / ((n • l)(n • v))

(Some formulations fold an additional normalization constant into the denominator.) Now we've already gone through the Fresnel term, the distribution function, and the geometric attenuation factor. You're free to make these as complicated as you like. For example, Blinn leaves the Fresnel term at 1, whereas Cook-Torrance doesn't.

It's possible to precompute the BRDF by judiciously choosing some of the parameters, and then generating one or more textures to account for the other terms. Typically, you might generate a texture where the u, v values of the texture are mapped to the n • v and n • l terms. For more information on BRDF factorization, you can refer to [ENGEL 2002] and [LENGYEL 2002].
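
To pull the last few sections together, here is a rough CPU-side sketch (my own, not from the text) of evaluating that specular term, using the Beckmann distribution, the min-based geometric attenuation term, and the fresnelFromF0() helper from the earlier sketch; the vector type, parameter names, and roughness value m are assumptions:

#include <math.h>

struct Vec3 { float x, y, z; };

static float dot( const Vec3& a, const Vec3& b )
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// n = surface normal, l = direction to light, v = direction to viewer,
// h = normalized half vector between l and v; all vectors unit length
float specularBRDF( const Vec3& n, const Vec3& l, const Vec3& v, const Vec3& h,
                    float m,      // RMS slope (roughness)
                    float F0 )    // reflectance at normal incidence
{
    float NdotH = dot( n, h );
    float NdotL = dot( n, l );
    float NdotV = dot( n, v );
    float VdotH = dot( v, h );
    if ( NdotL <= 0.0f || NdotV <= 0.0f || VdotH <= 0.0f )
        return 0.0f;

    // Beckmann slope distribution D
    float cos2 = NdotH * NdotH;
    float tan2 = ( 1.0f - cos2 ) / cos2;
    float D = expf( -tan2 / ( m * m ) ) / ( m * m * cos2 * cos2 );

    // geometric attenuation G: the minimum of the three shadowing/masking cases
    float G = fminf( 1.0f, fminf( 2.0f * NdotH * NdotV / VdotH,
                                  2.0f * NdotH * NdotL / VdotH ) );

    // Fresnel term evaluated at the angle between the view/light and the half vector
    float F = fresnelFromF0( F0, VdotH );

    return ( D * F * G ) / ( NdotL * NdotV );
}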

Anisotropic Reflection

One interesting feature of the BRDF is that it supports anisotropic reflection, that is, reflection that varies in strength depending upon the orientation of the material's surface. Many surfaces exhibit this type of shading: hair, brushed metal, grooved surfaces (CDs, records), some fabrics, etc. Some of the more complex BRDFs can model these effects.

Poulin and Fournier [POULIN 1990] wrote one of the earliest papers on anisotropic reflection models. The Poulin-Fournier model replaced the randomly oriented v-shaped grooves of the Torrance-Sparrow model with aligned cylindrical shapes (either grooves or protrusions). He, Torrance, Sillion, and Greenberg [HE 1991] proposed a model in which they broke the specular term into two parts, a diffuse specular and a directional specular. Their full model is quite complex and can be used with polarized light. Even the unpolarized, simplified equations are quite complex and would be nearly impossible to fit inside a shader. In a later paper [HE 1992], they address issues of computational speed and, in a time-honored tradition, compute a lookup table from which they can closely reproduce the values in their original paper. A different approach was taken by Ward [WARD 1992], who proposed finding the simplest empirical model that would fit the data. This is still a very active area of research. Many papers in recent years' Siggraph Proceedings are worth looking into if you are interested in further details and research. [BRDF] lists some online BRDF databases.

NONPHOTOREALISTIC RENDERING (NPR)

On the other end of the rendering spectrum is nonphotorealistic rendering (NPR). This style of rendering throws out most of the attempt to simulate real-world reflection models in order to achieve a different artistic goal. This can be the simplification of a scene to make it easier to understand, the modification of the scene to highlight an aspect of it, or the simulation of some other method of illustration such as watercolor or pen and ink drawing. Using shaders, it's possible to create your own method of illustration along any one of these styles. Unfortunately, much of the research cited in these examples was done before hardware shaders existed (though there are quite a few RenderMan shaders out there), so ready-made hardware shader examples are few and far between for some of these techniques, but they are starting to become available. The basic technique is to use the lighting equation (basically, some n • l term) to modulate the intensity of the effect produced. Look at any of the Siggraph Proceedings since 1995, and you'll typically find a couple of papers on these techniques. You can also find some good reviews of the latest research in [GOOCH 2001] and [STROTHOTTE 2002]. Craig Reynolds maintains an excellent, up-to-date Web page with a ton of links to various papers at http://www.red3d.com/cwr/npr/. There's now an annual Nonphotorealistic Animation and Rendering conference (NPAR). You can get more information about it at http://www.npar.org.

NPR Styles in 3D Rendering

NPR in 3D rendering can be roughly broken into a few different styles.

Painterly Rendering

This style is intended to simulate the results from brush-applied media. It's characterized by having a virtual brush apply media to the object's surface. Brush attributes include stroke weight, media loading, stroke attack and media attenuation over time, etc. Application of the media is sometimes done by calculating the particle flow of the media. You can find further examples of this style in [HAEBERLI 1990] and [MEIER 1996].

Pen and Ink, Engraving, and Line Art

This is a high-contrast style where you're limited to an unvarying color intensity and can adjust only the line width. It simulates using an instrument of constant color intensity, such as a pen, to apply color. One example, a rendering of Frank Lloyd Wright's Robie House, uses the technique described in [WINKENBACH 1994].

A nice example is the creation of digital facial engraving from 2D images, which attempts to imitate traditional copperplate engraving using the techniques from [OSTROMOUKHOV 1999]. This technique uses multiple layers to lay down the different areas of the face in order to provide a sharp demarcation of the different facial regions.



Sketching and Hatching

This style is similar to the preceding one but allows the use of tone as well. It imitates the look of charcoal and pencil with strokes that are hatched images scaled to approximate stroke density. Some nice examples of this work can be found in [WEBB 2002] and [PRAUN 2001]. The use of tonal art maps (TAMs), textures that control the degree of shading on an object, gives some particularly nice results when they are applied in place of the traditional shading equations.

Halftoning, Dithering, and Artistic Screening

These techniques use digital halftoning to convey tonal range and material. Rather than simulating a brush or pen stroke, they simulate the halftoning technique typically used by printers, which achieves tone through the use of dots or lines of varying density [OSTROMOUKHOV 1999], [STREIT 1999].

‘Toon Shading, Cel Shading, Outlining, and Stylized Rendering

This is probably one of the best-known areas of NPR. These styles use a combination of simple gradient shading and edge outlining to get some visually stunning results. A particularly striking example found in [GOOCH 1998] shows how using changes in hue and saturation to indicate changes in the orientation of the model's surface clarifies its structure. They present an alternative lighting model to traditional Phong shading.

Another popular look is to use cel shading to get a cartoonlike feel. It's found a lot in computer-rendered scenes because it's fairly easy to generate automatically. A popular way of cel shading nontextured objects is to shade according to two areas (lit and unlit), also called hard shading because of the hard delineation, or three areas (brightly lit, lit, and unlit). The vertex color is then set to one of these dark or bright colors according to the amount of light that's falling on the vertex, as the sketch below illustrates.
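
A minimal HLSL sketch of that hard-shading idea; the band thresholds and tone values here are arbitrary choices, not values from any of the cited papers.

    // Hard (cel) shading: quantize the diffuse n.l term into discrete bands.
    // Thresholds and tones are arbitrary; tweak to taste.
    float3 CelShade(float3 n, float3 l, float3 baseColor)
    {
        float nl = saturate(dot(normalize(n), normalize(l)));

        float tone;
        if (nl > 0.66f)      tone = 1.0f;   // brightly lit
        else if (nl > 0.33f) tone = 0.6f;   // lit
        else                 tone = 0.25f;  // unlit

        return baseColor * tone;
    }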

Other Styles

Of course, there are many styles that don't quite fit into one of the previous types. One example is rendering fur and/or grass with procedural textures (graftals) placed near silhouette edges, as done by Kowalski [KOWALSKI 1999], which creates scenes rendered in a Dr. Seuss-like style.

The edges of objects are determined, an algorithm is used to graft on textured geometry, the edges are outlined, and everything else is rendered with flat shading. With some attention to graftal coherency, it's possible to actually move around the scene.



Mathematics of Lighting and Shading

LIGHTS AND MATERIALS

In order to understand how an object's color is determined, you'll need to understand the parts that come into play to create the final color. First, you need a source of illumination, typically in the form of a light source in your scene. A light has the properties of color (an rgb value) and intensity. Typically, these are multiplied to give scaled rgb values. Lights can also have attenuation, which means that their intensity is a function of the distance from the light to the surface. Lights can additionally be given other properties such as a shape (e.g., spotlights) and position (local or directional), but that's more in the implementation rather than the math of lighting effects.
Given a source of illumination, we'll need a surface on which the light will shine. Here's where we get interesting effects. Two types of phenomena are important in lighting calculations. The first is the interaction of light with the surface boundary, and the second is the effect of light as it gets absorbed, transmitted, and scattered by interacting with the actual material itself. Since we really only have tools for describing the surfaces of objects and not the internal material properties, light-surface boundary interactions are the most common type of calculation you'll see used, though we can do some interesting simulations of the interaction of light with material internals.
Materials are typically richer in their descriptions, usually using two to four separate colors in an effort to catch the nuances of real-world light-material surface interactions. These colors are the ambient, diffuse, specular, and emissive colors, with ambient and specular frequently grouped together, and emissive specified only for objects that generate light themselves. The reason there are different colors is to give different effects arising from different environmental causes. The most common lighting components are as follows:

  • Ambient lighting: The overall color of the object due to the global ambient light level. This is the color of the object when there's no particular light, just the general environmental illumination. That is, the ambient light is an approximation for the global illumination in the environment, and it doesn't depend on any particular light in the scene. It's usually a global value that's added to every object in a scene.

  • Diffuse lighting: The color of the object due to the effect of a particular light. The diffuse light is the light of the surface if the surface were perfectly matte. The diffuse light is reflected in all directions from the surface and depends only on the angle of the light to the surface normal.

  • Specular lighting: The color of the highlights on the surface. The specular light mimics the shininess of a surface, and its intensity is a function of the light's reflection angle off the surface.

  • Emissive lighting: When you need an object to "glow" in a scene, you can do this with an emissive light. This is just an additional color source added to the final light of the object. Don't be confused just because we're simulating an object giving off its own light; you'd still have to add a real "light" to get an effect on objects in a scene.

Before we get into exactly what these types of lighting are, let's put it in perspective for our purpose of writing shader code. Shading is simply calculating the color reflected off a surface (which is pretty much what shaders do). When a light reflects off a surface, the light colors are modulated by the surface color (typically, the diffuse or ambient surface color). Modulation means multiplication, and for colors, since we are using rgb values, this means component-by-component multiplication (which we'll write using the ⊗ symbol). So for a light source l with color (rl, gl, bl) shining on a surface s with color (rs, gs, bs), the resulting color c would be:

c = (rl, gl, bl) ⊗ (rs, gs, bs)

or, multiplying it out, we get

c = (rl rs, gl gs, bl bs)

where the rgb values of the light and surface are multiplied out, component by component, to get the final color's rgb values.

The final step after calculating all the lighting contributions is to add together all the lights to get the final color. So a shader might typically do the following:

  1. Calculate the overall ambient light on a surface.

  2. For each light in the scene, calculate its diffuse and specular contributions.

  3. Calculate any emissive light for a surface.

  4. Add all these lights together to calculate the final color value.

This is pretty much what the FFP does, and it's fairly simple to do as long as you don't let the number of lights get too large. Of course, since you're reading this, you're probably interested not only in what the traditional pipeline does, but also in ways of achieving your own unique effects. So let's take a look at how light interacts with surfaces of various types.
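
As a rough illustration of those four steps, here's a hedged HLSL sketch; the light arrays, material globals, and the use of Blinn's half-vector specular term (covered later in this chapter) are assumptions made for the example, not a prescribed layout.

    // Sum of ambient + per-light (diffuse + specular) + emissive.
    // Everything here (light count, material layout) is illustrative.
    #define NUM_LIGHTS 2

    float3 globalAmbient;
    float3 lightColor[NUM_LIGHTS];
    float3 lightDir[NUM_LIGHTS];     // unit vectors from the surface toward each light
    float3 matAmbient, matDiffuse, matSpecular, matEmissive;
    float  shininess;

    float3 ComputeLighting(float3 n, float3 v)
    {
        float3 color = matAmbient * globalAmbient;        // 1. ambient
        for (int i = 0; i < NUM_LIGHTS; ++i)              // 2. per-light terms
        {
            float3 l = lightDir[i];
            float3 h = normalize(l + v);                  // Blinn's half vector
            float  d = saturate(dot(n, l));
            float  s = pow(saturate(dot(n, h)), shininess);
            color += lightColor[i] * (matDiffuse * d + matSpecular * s);
        }
        color += matEmissive;                             // 3. emissive
        return saturate(color);                           // 4. clamp the final color
    }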

In the real world, we get some sort of interaction (reflection, etc.) when a photon interacts with a surface boundary. Thus we see the effects not only when we have a transparent-opaque boundary (like air-plastic), but also at a transparent-transparent boundary (like air-water). The key feature here is that we get some visual effect when a photon interacts with a boundary between two different materials. The conductivity of the materials directly affects how the photon is reflected. At the surface of a conductor (metals, etc.), the light is mostly reflected. For dielectrics (nonconductors), there is usually more penetration and transmittance of the light. For both kinds of materials, the dispersion of the light is a function of the roughness of the surface.

The simplest model assumes that the roughness of the surface is so fine that light is dispersed equally in all directions, though later we'll look at fixing this assumption.

A generalization is that conductors are opaque and dielectrics are transparent. This gets confusing since most of the dielectric surfaces that we are interested in modeling are mixtures and don't fall into the simple models we've described so far. Consider a thick, colored lacquer surface. The lacquer itself is transparent, but suspended in the lacquer are reflective pigment particles off which light gets reflected, bounced, split, shifted, or altered before perhaps reemerging from the surface. In this case, the light rays are not just reflected but are bounced around a bit inside the medium before getting retransmitted to the outside.

Metallic paint, brushed metal, velvet, etc. are all materials for which we need better models if we want to represent their surfaces faithfully. But with a little creativity in the modeling, it's possible to mimic the effect. For example, you can use multiple broad specular terms for multiple base colors combined with a more traditional shiny specular term, along with a high-frequency normal perturbation that simulates the sparkle from a metallic flake pigment. As you can see, you can get something that looks particularly striking with a fairly simple model.

A simple shader to simulate metallic paint: (a) shows the two-tone paint shading pass; (b) shows the specular sparkle shading pass; (c) shows the environment mapping pass; (d) shows the final composite image
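
For flavor, here is a hedged, single-pass HLSL sketch in the spirit of the paint shader described above; it collapses the two-tone and sparkle ideas into one function, omits the environment mapping pass, and all colors, scale factors, exponents, and the sparkleNoise texture are invented for illustration.

    // Illustrative two-tone "metallic paint" shading: the base color shifts
    // with viewing angle, a noise texture perturbs the normal for sparkle,
    // and a conventional Blinn-style highlight is layered on top.
    sampler2D sparkleNoise : register(s0);   // hypothetical high-frequency noise map

    float4 MetallicPaintPS(float3 n  : TEXCOORD0,
                           float3 v  : TEXCOORD1,
                           float3 l  : TEXCOORD2,
                           float2 uv : TEXCOORD3) : COLOR
    {
        n = normalize(n); v = normalize(v); l = normalize(l);

        // Two-tone base: blend between two paint colors by viewing angle.
        float3 colorA = float3(0.4f, 0.0f, 0.1f);
        float3 colorB = float3(0.9f, 0.5f, 0.2f);
        float3 base   = lerp(colorA, colorB, saturate(dot(n, v)));

        // Sparkle: perturb the normal with high-frequency noise, then take a
        // tight specular lobe off the perturbed normal.
        float3 perturbed = normalize(n + (tex2D(sparkleNoise, uv * 40.0f).xyz - 0.5f));
        float  sparkle   = pow(saturate(dot(perturbed, normalize(l + v))), 64.0f);

        // Conventional shiny highlight off the true normal.
        float  gloss = pow(saturate(dot(n, normalize(l + v))), 32.0f);

        float3 diffuse = base * saturate(dot(n, l));
        return float4(diffuse + sparkle + gloss, 1.0f);
    }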

The traditional model gives us a specular term and a diffuse term. We have been able to add in texture maps to give our scenes some uniqueness, but the lighting effects have been very simple. Shaders allow us to be much more creative with lighting effects: with just a few additional specular terms, you can bring forth a very interesting look. But before we go off writing shaders, we'll need to take a look at how it all fits together in the graphics pipeline. And a good place to start is by examining the traditional lighting model that has been around for the last two decades.

TRADITIONAL 3D HARDWARE-ACCELERATED LIGHTING MODELS

Before we get into the more esoteric uses of shaders, we'll first take a look at the traditional method of calculating lighting in hardware—a method that you'll find is sufficient for most of your needs.

The traditional approach in real-time computer graphics has been to calculate lighting at a vertex as a sum of the ambient, diffuse, and specular light. In the simplest form (used by OpenGL and Direct3D), the function is simply the sum of these lighting components (clamped to a maximum color value). Thus we have an ambient term and then a sum of all the light from the light sources.

itotal = ka ia + Σ (kd id + ks is)

where the sum is taken over all the lights in the scene, and itotal is the intensity of light (as an rgb value) from the sum of the global ambient value and the diffuse and specular components of the light from the light sources. This is called a local lighting model, since the only light on a vertex is from a light source, not from other objects. That is, lights are lights, not objects. Objects that are brightly lit don't illuminate or shadow any other objects.

I've included the reflection coefficients for each term (ka, kd, and ks) for completeness since you'll frequently see the lighting equation in this form. The reflection coefficients are in the [0,1] range and are specified as part of the material property. However, they are strictly empirical, and since they simply scale the overall intensity of the material color, the material color values are usually adjusted so the color intensity varies rather than using a separate reflection coefficient, so we'll ignore them in our actual color calculations.

This is a very simple lighting equation and gives fairly good results. However, it does fail to take into account any gross roughness or anything other than perfect isotropic reflection. That is, the surface is treated as being perfectly smooth and equally reflective in all directions. Thus this equation is really only good at modeling the illumination of objects that don't have any "interesting" surface properties. By this I mean anything other than a smooth surface (like fur or sand) or a surface that doesn't really reflect light uniformly in all directions (like brushed metal, hair, or skin). However, with liberal use of texture maps to add detail, this model has served pretty well and can still be used for a majority of the lighting processing to create a realistic environment in real time. Let's take a look at the individual parts of the traditional lighting pipeline.

Ambient Light

Ambient light is the light that comes from all directions—thus all surfaces are illuminated equally regardless of orientation. However, this is a big hack in traditional lighting calculations since "real" ambient light really comes from the light reflected from the "environment." This would take a long time to calculate and would require ray tracing or the use of radiosity methods, so traditionally, we just say that there's x amount of global ambient light and leave it at that. This makes ambient light a little different from the other lighting components since it doesn't depend on a light source. However, you typically do want ambient light in your scene because having a certain amount of ambient light makes the scene look natural. One large problem with the simplified lighting model is that there is no illumination of an object with reflected light—the calculations required are enormous for a scene of any complexity (every object can potentially reflect some light and provide some illumination for every other object in a scene) and are too time consuming to be considered for real-time graphics.

So, like most things in computer graphics, we take a look at the real world, decide it's too complicated, and fudge up something that kinda works. Thus the ambient light term is the "fudge factor" that accounts for our simple lighting model's lack of an inter-object reflectance term.

ia = ma ⊗ sa

where ia is the ambient light intensity, ma is the ambient material color, and sa is the light source ambient color. Typically, the ambient light is some amount of white (i.e., equal rgb values) light, but you can achieve some nice effects using colored ambient light. Though it's very useful in a scene, ambient light doesn't help differentiate objects, since objects rendered with the same amount of ambient tend to blend together; the resulting color is the same. You can see that it's difficult to make out details or depth information with just ambient light.

Ambient lighting is your friend. With it you can make your scene seem more realistic than it is. A world without ambient light is one filled with sharp edges, of bright objects surrounded by sharp, dark, harsh shadows. A world with too much ambient light looks washed out and dull. Since the number of actual light sources supported by the hardware FFP is limited (typically to eight simultaneous lights), you'll be better off applying the lights to add detail to the area your user is focused on and letting ambient light fill in the rest. Before you point out that talking about the hardware limit on the number of lights has no meaning in a book on shaders, where we do the lighting calculations ourselves, I'll point out that eight lights was typically the maximum that hardware engineers designed into their hardware; it was a performance consideration. There's nothing stopping you (except buffer size) from writing a shader that calculates the effects of a hundred simultaneous lights, but I think you'll find that it runs much too slowly to be used to render your entire scene. Still, the nice thing about shaders is that you can.

Diffuse Light

Diffuse light is the light that is absorbed by a surface and is reflected in all directions. In the traditional model, this is ideal diffuse reflection—good for rough surfaces where the reflected intensity is constant across the surface and is independent of viewpoint but depends only upon the direction of the light source to the surface. This means that regardless of the direction from which you view an object with a stationary diffuse light source on it, the brightness of any point on the surface will remain the same. Thus, unlike ambient light, the intensity of diffuse light is directional and is a function of the angle of the incoming light and the surface. This type of shading is called Lambertian shading after Lambert's cosine law, which states that the intensity of the light reflected from an ideal diffuse surface is proportional to the cosine of the direction of the light to the vertex normal.

Since we're dealing with vertices here and not surfaces, each vertex has a normal associated with it. You might hear talk of per-vertex normals vs. per-polygon normals. The difference is that per-polygon has one normal shared by all vertices in a polygon, whereas per-vertex has a normal for each vertex. OpenGL has the ability to specify per-polygon normals, and Direct3D does not. Since vertex shaders can't share information between vertices (unless you explicitly copy the data yourself), we'll focus on per-vertex lighting.

id = md ⊗ sd (n • l)

which is similar to the ambient light equation, except that the diffuse light term is now multiplied by the dot product of the unit normal of the vertex, n, and the unit direction vector from the vertex to the light, l (not the direction from the light). Note that the md value is a color vector, so there are rgb or rgba values that will get modulated.

Since n • l = cos(θ) for unit vectors, where θ is the angle between them, when the angle between them is zero, cos(θ) is 1 and the diffuse light is at its maximum. When the angle is 90°, cos(θ) is zero and the diffuse light is zero. One computational advantage is that when the cos(θ) value is negative, the light isn't illuminating the vertex at all. However, since you (probably!) don't want the light illuminating sides that it physically can't shine on, you want to clamp the contribution of the diffuse light so that it contributes only when cos(θ) is positive. Thus the equation in practice looks more like

id = md ⊗ sd max(n • l, 0)

where we've clamped the diffuse value to positive values only. With diffuse lighting you can tell a lot more detail about the objects and pick up distance cues from the shading.
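
A minimal HLSL sketch of this clamped diffuse term (the names are illustrative):

    // Clamped Lambertian diffuse: md (x) sd * max(n.l, 0).
    float3 DiffuseTerm(float3 n, float3 l, float3 matDiffuse, float3 lightDiffuse)
    {
        float nl = saturate(dot(normalize(n), normalize(l))); // clamp negative values to 0
        return matDiffuse * lightDiffuse * nl;
    }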

The problem with just diffuse lighting is that it's independent of the viewer's direction. That is, it's strictly a function of the surface normal and the light direction. Thus as we change the viewing angle to a vertex, the vertex's diffuse light value never changes. You have to rotate the object (change the normal direction) or move the light (change the light direction) to get a change in the diffuse lighting of the object.

However, when we combine the ambient and diffuse terms, we can see that the two types of light give a much more realistic representation than either does alone. This combination of ambient and diffuse is used for a surprisingly large number of items in rendered scenes since, when combined with texture maps to give detail to a surface, you get a very convincing shading effect.

Specular Light

Ambient light is the light that comes from the environment (i.e., it's directionless); diffuse light is the light from a light source that is reflected by a surface evenly in all directions (i.e., it's independent of the viewer's position). Specular light is the light from a light source that is reflected by a surface in such a manner that it's a function of both the light's vector and the viewer's direction. While diffuse light gives the object an illuminated matte surface, specular light is what gives the highlights to an object. These highlights are greatest when the viewer is looking directly along the reflection angle from the surface.

Most discussions of lighting (including this one) start with Phong's lighting equation (which is not the same as Phong's shading equation). In order to start discussing specular lighting, let's look at a diagram of the various vectors that are used in a lighting equation. We have a light source, some point the light is shining on, and a viewpoint. The light direction (from the point to the light) is vector l, the reflection vector of the light vector (as if the surface were a mirror) is r, the direction to the viewpoint from the point is vector v. The point's normal is n.

Phong's Specular Light Equation

Warnock [WARNOCK 1969] and Romney [ROMNEY 1969] were the first to try to simulate highlights using a cos^n(θ) term. But it wasn't until Phong Bui-Tuong [BUI 1998] reformulated this into a more general model, formalizing the power value as a measure of surface roughness, that we arrive at the terms used today for specular highlights.

Phong's specular term is proportional to (r • v)^ms: the more the view direction, v, is aligned with the reflection direction, r, the brighter the specular light will be. The big difference is the introduction of the ms term, a power term that attempts to approximate the distribution of the specular light reflection. The ms term is typically called the "shininess" value. The larger the ms value, the "tighter" (but not brighter) the specular highlights will be. Figure 3.11 shows the specular term for values of ms ranging from 1 to 128. As you can see, the specular highlights get narrower for higher values, but they don't get any brighter.

Figure 3.11: Phong's specular term for various values of the "shininess" term. Note that the values never get above 1.

Now, as you can see, this requires some calculation, since we can't know r beforehand: it's the light vector reflected about the point's normal. To calculate r we can use the following equation:[3]

r = 2(n • l)n - l

If l and n are normalized, then the resulting r is normalized and the equation can be simplified.

And just as we did for diffuse lighting, if the dot product is negative, then the term is ignored.
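
Putting the pieces together, a minimal HLSL sketch of Phong's specular term (the names are illustrative):

    // Phong specular: build the reflection vector r = 2(n.l)n - l and raise
    // the clamped r.v term to the shininess power ms. Vectors are unit length.
    float3 PhongSpecular(float3 n, float3 l, float3 v,
                         float3 matSpecular, float3 lightSpecular, float ms)
    {
        float3 r  = 2.0f * dot(n, l) * n - l;   // same as reflect(-l, n)
        float  rv = saturate(dot(r, v));        // ignore negative contributions
        return matSpecular * lightSpecular * pow(rv, ms);
    }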

Rendering the scene with just specular lighting gives the impression of a very shiny surface.

When we add the ambient, diffuse, and specular terms together, the three terms act in concert to give us a fairly good imitation of a nice smooth surface that can have a varying degree of shininess to it.

You may have noticed that computing the reflection vector took a fair amount of effort. In the early days of computer graphics, there was a concerted effort to reduce anything that took a lot of computation, and the reflection vector of Phong's equation was one such item.

Blinn's Simplification: OpenGL and DirectX Lighting

Now it's computationally expensive to calculate specular lighting using Phong's equation since computing the reflection vector is expensive. Blinn [BLINN 1977] suggested, instead of using the reflection and view vectors, that we create a "half" vector, h, that lies between the light and view vectors. Just as Phong's equation is maximized when the reflection vector is coincident with the view vector (thus the viewer is looking directly along the reflection vector), so is Blinn's. When the half vector is coincident with the normal vector, then the angle between the view vector and the normal vector is the same as the angle between the light vector and the normal vector. Blinn's version of Phong's equation replaces the (r • v)^ms term with

(n • h)^ms

where the half vector is defined as

h = (l + v) / |l + v|

The advantage is that no reflection vector is needed; instead, we can use values that are readily available, namely, the view and light vectors. Note that both OpenGL and the DirectX FFP use Blinn's equation for specular light.
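
A minimal HLSL sketch of Blinn's version; the function and parameter names are mine:

    // Blinn-Phong specular: replace the reflection vector with the half
    // vector h = normalize(l + v) and use the clamped n.h term instead.
    float3 BlinnSpecular(float3 n, float3 l, float3 v,
                         float3 matSpecular, float3 lightSpecular, float ms)
    {
        float3 h  = normalize(l + v);    // half vector between light and view
        float  nh = saturate(dot(n, h));
        return matSpecular * lightSpecular * pow(nh, ms);
    }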

Besides a speed advantage, there are some other effects to note between Phong's specular equation and Blinn's.

  • If you multiply Blinn's exponent by 4, you approximate the results of Phong's equation.

  • Thus if there's an upper limit on the value of the exponent, Phong's equation can produce sharper highlights.

  • For l • v angles greater than 45° (i.e., when the light is behind an object and you're looking at an edge), the highlights are longer along the edge direction for Phong's equation.

  • Blinn's equation produces results closer to those seen in nature.

For an in-depth discussion of the differences between the two equations, there's an excellent treatment in [FISHER 1994] comparing Phong and Blinn-Phong lighting.

The Lighting Equation

So now that we've computed the various light contributions to our final color value, we can add them up to get the final color value. Note that the final color values will have to be made to fit in the [0,1] range for the final rgb values.

Our final scene combines the ambient, diffuse, and (Blinn's) specular light contributions, with one white light above and to the left of the viewer.

It may be surprising to discover that there's more than one way to calculate the shading of an object, but that's because the model is empirical, and there's no correct way, just different ways that all have tradeoffs. Until now though, the only lighting equation you've been able to use has been the one we just formulated.

Most of the interesting work in computer graphics is tweaking that equation, or in some cases, throwing it out altogether and coming up with something new.

The next sections will discuss some refinements and alternative ways of calculating the various coefficients of the lighting equation. We hope you'll get some ideas that you'll be able to use to create your own unique shaders.

Light Attenuation

Light in the real world loses its intensity as the inverse square of the distance from the light source to the surface being illuminated. However, when put into practice, this drops off the light intensity too abruptly near the light and then varies too little once the light is far away. An empirical model was developed that seems to give satisfactory results, and it's the attenuation model used in OpenGL and DirectX. The attenuation factor is fatten, and the distance d between the light and the vertex is always positive. The attenuation factor is calculated by the following equation:

fatten = 1 / (kc + kl d + kq d^2)

where the kc, kl, and kq parameters are the constant, linear, and quadratic attenuation constants, respectively. To get the "real" inverse-square attenuation, you can set kq to one and the others to zero.

The attenuation factor is multiplied by the light's diffuse and specular values. Typically, each light will have its own set of these parameters. The lighting equation with the attenuation factor looks like this:

itotal = ka ia + Σ fatten (kd id + ks is)
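
In code, the attenuation factor is a one-liner; a hedged HLSL sketch, with placeholder constants in the usage comment:

    // Distance attenuation: 1 / (kc + kl*d + kq*d^2), multiplied into the
    // diffuse and specular contributions of a light (not the ambient term).
    float Attenuation(float d, float kc, float kl, float kq)
    {
        return 1.0f / (kc + kl * d + kq * d * d);
    }

    // Example of folding it into a light's contribution:
    // color += Attenuation(dist, 1.0f, 0.0f, 0.1f) * (diffuseTerm + specularTerm);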

Schlick's Simplification for the Specular Exponential Term

Real-time graphics programmers are always looking for simplifications. You've probably gathered that there's no such thing as the "correct" lighting equation, just a series of hacks to make things look right with as little computational effort as possible. Schlick [SCHLICK 1994] suggested a replacement for the exponential term since that's a fairly expensive operation. If we define part of our specular light term as follows:

S = t^ms

where t is the specular dot product (r • v for Phong or n • h for Blinn-Phong) and S is either the Phong or Blinn-Phong flavor of the specular lighting equation, then Schlick's simplification is to replace the preceding part of the specular equation with

S ≈ t / (ms - ms t + t)

which eliminates the need for an exponential term. At first glance, a plot of Schlick's function looks very similar to the exponential equation.
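
A hedged HLSL sketch of the substitution, where t is the clamped specular dot product and ms is the shininess value:

    // Schlick's replacement for the pow() in the specular term:
    // t^ms is approximated by t / (ms - ms*t + t), where t is the clamped
    // r.v (Phong) or n.h (Blinn-Phong) dot product.
    float SchlickSpecular(float t, float ms)
    {
        return t / (ms - ms * t + t);
    }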



If we plot both equations, we can see some differences and evaluate just how well Schlick's simplification works. The blue values are Schlick's, and the red are the exponential plot. As the view and light angles get closer (i.e., get closer to zero on the x axis), we can see that the values of the curves are quite close. (For a value of zero, they overlap.) As the angles approach a grazing angle, we can see that the approximation gets worse. This would mean that when there is little influence from a specular light, Schlick's equation would be slightly less sharp for the highlight.

You might notice the green line. Unlike the limit of a value of 128 for the exponential imposed in both OpenGL and DirectX FFP, we can easily make our values in the approximation any value we want. The green line is a value of 1024 in Schlick's equation. You may be thinking that we can make a very sharp specular highlight using Schlick's approximation with very large values—sharper than is possible using the exponential term. Unfortunately, we can't since you really need impractically large values (say, around 100 million) to boost it significantly over the exponential value for 128. But that's just the kind of thinking that's going to get your creative juices flowing when writing your own shaders! If the traditional way doesn't work, figure out something that will.

Oren-Nayar Diffuse Reflection

Though there's been a lot of research on specular reflection models, there's been less research on diffuse reflection models. One of the problems of the standard Lambertian model is that it considers the surface as a smooth diffuse surface. Surfaces that are really rough, like sandpaper, exhibit much more of a backscattering effect, particularly when the light source and the view direction are in the same direction.

The classic example of this is a full moon. If you look at a picture of the full moon, it's pretty obvious that it doesn't follow the Lambertian distribution: if it did, the edges of the moon would be in near darkness. In fact, the edges look as bright as the center. This is because the moon's surface is rough: it's a jumble of dust and rock with diffuse reflecting surfaces at all angles, so no matter the orientation of the surface to the viewer, the amount of light reflecting off it is nearly the same.

The effect we're looking at is called backscattering. Backscattering is when a rough surface bounces around a light ray and then reflects the ray in the direction the light originally came from. Note that there is a similar but different effect called retroreflection. Retroreflection is the effect of reflecting light toward the direction from which it came, no matter the orientation of the surface. This is the same effect that we see on bicycle reflectors. However, this is due to the design of the surface features (made up of v-shaped or spherical reflectors) rather than a scattering effect.

In a similar manner, when the light direction is closer to the view direction, we get the effect of forward scattering. Forward scattering is just backscattering from a different direction. In this case, instead of near uniform illumination, we get near uniform loss of diffuse lighting. You can see the same effects here on Earth with the same surfaces photographed under backscattering and forward scattering conditions: both a dirt field and a soybean field can be considered rough diffuse reflecting surfaces.

Notice how the backscattering image shows a near uniform diffuse illumination, whereas the forward scattering image shows a uniform dull diffuse illumination. Also note that you can see specular highlights and more color variation because of the shadows due to the rough surface, whereas the backscattered image washes out the detail.

In an effort to better model rough surfaces, Oren and Nayar [OREN 1992] came up with a generalized version of the Lambertian diffuse shading model that tries to account for the roughness of the surface. They applied the Torrance-Sparrow model for rough surfaces with isotropic roughness and provided parameters to account for the various surface structures found in the Torrance-Sparrow model. By comparing their model with actual data, they simplified it to the terms that had the most significant impact. The Oren-Nayar diffuse shading model looks like this:

Lr = (ρ/π) E0 cos(θi) [A + B max(0, cos(φr - φi)) sin(α) tan(β)]

where

A = 1 - 0.5 σ^2 / (σ^2 + 0.33)
B = 0.45 σ^2 / (σ^2 + 0.09)

Now this may look daunting, but it can be simplified to something we can appreciate if we replace the original notation with the notation we've already been using. ρ/π is a surface reflectivity property, which we can replace with our surface diffuse color. E0 is a light input energy term, which we can replace with our light diffuse color. And the θi term is just our familiar angle between the vertex normal and the light direction. Making these exchanges gives us

id = md ⊗ sd cos(θi) [A + B max(0, cos(φr - φi)) sin(α) tan(β)]

which looks a lot more like the equations we've used. There are still some parameters to explain.

  • σ is the surface roughness parameter. It's the standard deviation, in radians, of the distribution of microfacet angles in the surface roughness model. The larger the value, the rougher the surface.

  • θr is the angle between the vertex normal and the view direction.

  • φr - φi is the circular angle (about the vertex normal) between the light vector and the view vector.

  • α is max(θi, θr).

  • β is min(θi, θr).

Note that if the roughness value is zero, the model is the same as the Lambertian diffuse model. Oren and Nayar also note that you can replace the value 0.33 in coefficient A with 0.57 to better account for surface interreflection.
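
Putting the pieces together, here is a hedged HLSL sketch of the Oren-Nayar diffuse term as reconstructed above; projecting l and v onto the plane perpendicular to the normal is one common way to obtain the cos(φr - φi) term, and the small denominator guard is my own addition.

    // Oren-Nayar diffuse as described above. sigma is the surface roughness
    // (standard deviation of the microfacet angle distribution, in radians).
    // n, l, v are unit vectors; matDiffuse/lightDiffuse are the usual colors.
    float3 OrenNayarDiffuse(float3 n, float3 l, float3 v,
                            float3 matDiffuse, float3 lightDiffuse, float sigma)
    {
        float s2 = sigma * sigma;
        float A  = 1.0f - 0.5f * s2 / (s2 + 0.33f);
        float B  = 0.45f * s2 / (s2 + 0.09f);

        float nl = dot(n, l);
        float nv = dot(n, v);
        float thetaI = acos(nl);                 // angle between normal and light
        float thetaR = acos(nv);                 // angle between normal and view
        float alpha  = max(thetaI, thetaR);
        float beta   = min(thetaI, thetaR);

        // cos(phi_r - phi_i): angle between the light and view directions
        // projected onto the plane perpendicular to the normal.
        float3 lProj = l - n * nl;
        float3 vProj = v - n * nv;
        float  denom = max(length(lProj) * length(vProj), 0.0001f); // guard degenerate case
        float  cosPhiDiff = max(0.0f, dot(lProj, vProj) / denom);

        float factor = A + B * cosPhiDiff * sin(alpha) * tan(beta);
        return matDiffuse * lightDiffuse * saturate(nl) * factor;
    }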

[3]An excellent explanation of how to compute the reflection vector can be found in [RTR].