Baking artifact-free lightmaps on the GPU


I have been working on a GPU lightmapper called Bakery since 2015. It’s finally done, and you can even buy it on Unity’s Asset Store (it can be used outside of Unity as well, but the demand is higher there). Originally intended for my own game, it turned into a product in its own right that will hopefully help other people bake nice lighting. In an old tweet I promised to write about how it works, as there were many, MANY unexpected problems on the way, and I think such a write-up would be useful. I also thought of open-sourcing it at the time, but having now spent about a year of full-time work on it, I think it’s fair to delay that a bit.

The major focus of this project was to minimize all kinds of artifacts that lightmapping often produces, like seams and leaks, and also to make it flexible and fast. This blog post won’t cover lighting computation much; instead it focuses on what it takes to produce a high-quality lightmap. We will start with the picture on the left and make it look like the one on the right:

Contents:
UV space rasterization
Optimizing UV GBuffer: shadow leaks
Optimizing UV GBuffer: shadow terminator
Ray bias
Fixing UV seams
Final touches
Bonus: mip-mapping lightmaps


UV space rasterization

Bakery is in fact the 4th lightmapper I have designed. Somehow I’m obsessed with baking stuff. The first one simply rasterized forward lights in UV space, the 2nd generated UV surface position and normal and then rendered the scene from every texel to get GI (huge batches with instancing), and the 3rd was PlayCanvas’ runtime lightmapper, which is actually very similar to the 1st. All of them had one thing in common: something had to be rasterized in UV space.

Let’s say you have a simple lighting shader and a mesh with lightmap UVs:

bake2_realtime

You want to bake this lighting, so how do you do that? Instead of outputting the transformed vertex position

OUT.Position = mul(IN.Position, MVP);

you just output UVs straight on the screen:

OUT.Position = float4(IN.LightmapUV * 2 - 1, 0, 1);

Note that “*2-1” is necessary to transform from the typical [0,1] UV space into the typical [-1,1] clip space.

Voila:

bake_lmnaive

That was easy, now let’s try to apply this texture:

bake2_nodilate.png

Oh no, black seams everywhere!

Because of bilinear interpolation and typical non-conservative rasterization, our texels now blend into the background color. But if you have ever worked with any baking software, you know about padding and how it expands or dilates pixels around UV charts to cover the background. So let’s try to process our lightmap with a very simple dilation shader. Here is one from PlayCanvas, in GLSL:

varying vec2 vUv0;
uniform sampler2D source;
uniform vec2 pixelOffset;
void main(void) {
    vec4 c = texture2D(source, vUv0);
    c = c.a>0.0? c : texture2D(source, vUv0 - pixelOffset);
    c = c.a>0.0? c : texture2D(source, vUv0 + vec2(0, -pixelOffset.y));
    c = c.a>0.0? c : texture2D(source, vUv0 + vec2(pixelOffset.x, -pixelOffset.y));
    c = c.a>0.0? c : texture2D(source, vUv0 + vec2(-pixelOffset.x, 0));
    c = c.a>0.0? c : texture2D(source, vUv0 + vec2(pixelOffset.x, 0));
    c = c.a>0.0? c : texture2D(source, vUv0 + vec2(-pixelOffset.x, pixelOffset.y));
    c = c.a>0.0? c : texture2D(source, vUv0 + vec2(0, pixelOffset.y));
    c = c.a>0.0? c : texture2D(source, vUv0 + pixelOffset);
    gl_FragColor = c;
}

For every empty background pixel, it simply looks at the 8 neighbours around it and copies the first non-empty value it finds. And the result:

There are still many imperfections, but it’s much better.

To generate more sophisticated lighting, with area shadows, sky occlusion, colored light bounces, etc., we’ll have to trace some rays. Although you can write a completely custom ray-tracing implementation using bare DirectX/WebGL/Vulkan/whatever, there are already very efficient APIs for that, such as OptiX, RadeonRays and DXR. They have a lot of similarities, so knowing one should give you an idea of how to operate the others: you define surfaces, build an acceleration structure and intersect rays against it. Note that none of these APIs generate lighting; they only give you a very flexible way of doing fast ray-primitive intersection on the GPU, and there are potentially lots of different ways to (ab)use it. OptiX was the first of its kind, and that is why I chose it for Bakery, as there were no alternatives back in the day. Today it’s also unique for having an integrated denoiser. DXR can be used on both Nvidia/AMD (not sure about Intel), but it requires Win10. I don’t know much about RadeonRays, but it seems to be the most cross-platform one. Anyway, in this article I’m writing from an OptiX/CUDA (ray-tracing) and DX11 (tools) perspective.

To trace rays from the surface we first need to acquire a list of sample points on it along with their normals. There are multiple ways to do that, for example in a recent OptiX tutorial it is suggested to randomly distribute points over triangles and then resample the result to vertices (or possibly, lightmap texels). I went with a more direct approach, by rendering what I call a UV GBuffer.
It is exactly what it sounds like – just a bunch of textures of the same size with rasterized surface attributes, most importantly position and normal:

Example of a UV GBuffer for a sphere. Left: position, center: normal, right: albedo.

Having rasterized position and normal allows us to run a ray generation program using texture dimensions with every thread spawning a ray (or multiple rays, or zero rays) at every texel. GBuffer position can be used as a starting point and normal will affect ray orientation.
“Ray generation program” is a term used in both OptiX and DXR – think of it as a compute shader with additional ray-tracing capabilities. They can create rays, trace them by executing and waiting for intersection/hit programs (also new shader types) and then obtain the result.
Full Bakery UV GBuffer consists of 6 textures: position, “smooth position”, normal, face normal, albedo and emissive. Alpha channels also contain interesting things like world-space texel size and triangle ID. I will cover them in more detail later.
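To make this step more concrete, here is a rough sketch of what such a ray generation program could look like. Bakery itself uses OptiX/CUDA; this is written as DXR-style HLSL purely for illustration, and sampleHemisphere() is a hypothetical helper, so treat it as an assumed outline rather than the actual code:

RaytracingAccelerationStructure scene : register(t0);
Texture2D<float4> gbPos      : register(t1); // UV GBuffer: position (w = coverage)
Texture2D<float4> gbNormal   : register(t2); // UV GBuffer: normal
RWTexture2D<float4> lightmap : register(u0);

struct Payload { float3 color; };

float3 sampleHemisphere(float3 normal, uint2 texel); // hypothetical sampling helper

[shader("raygeneration")]
void LightmapRayGen()
{
    uint2 texel = DispatchRaysIndex().xy;     // one thread per lightmap texel
    float4 pos = gbPos[texel];
    if (pos.w == 0) return;                   // nothing was rasterized here

    RayDesc ray;
    ray.Origin = pos.xyz;                     // needs a small bias, discussed later
    ray.Direction = sampleHemisphere(gbNormal[texel].xyz, texel);
    ray.TMin = 0;
    ray.TMax = 1e30;

    Payload payload = (Payload)0;
    TraceRay(scene, RAY_FLAG_NONE, 0xFF, 0, 0, 0, ray, payload);
    lightmap[texel] = float4(payload.color, 1);
}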

Calculating lighting for every GBuffer texel and dilating the result looks horrible, and there are many artifacts:

What have we done? Why is it so bad? Shadows leak, tiny details look like garbage, and smooth surfaces are lit like they are flat-shaded. I’ll go over each of these problems one by one.

Simply rasterizing the UV GBuffer is not enough. Even dilating it is not enough. UV layouts are often imperfect and can contain very small triangles that will simply be skipped or rendered with noticeable aliasing. If a triangle was too small to be drawn, and you dilate nearby texels over its supposed place, you will get artifacts. This is what happens on the vertical bar of the window frame here.

Instead of post-dilation, we need to use or emulate conservative rasterization. Currently, not all GPUs support real conservative raster (hey, I’m not sponsored by Nvidia, just saying), but there are multiple ways to achieve similar results without it:

  • Repurposing MSAA samples
  • Rendering geometry multiple times with sub-pixel offset
  • Rendering lines over triangles

Repurposing MSAA samples is a fun idea. Just render the UV layout with, say, 8x MSAA, then resolve it without any blur by either using any sample or somehow averaging them. It should give you more “conservative” results, but there is a problem. Unfortunately I no longer have working code for this implementation, but I remember it was pretty bad. Recall the pattern of 8x MSAA:

d3d11_msaapatterns_8_16

Because the samples are scattered around the pixel and none of them sit at its center, while we use them to compute lightmap texel centers, there is a mismatch that produces even more shadow leaking.

Rendering lines is something I thought about too late, but it might work pretty well.

So in the end I decided to do multipass rendering with different sub-pixel offsets. To avoid the aforementioned MSAA problems, we can have a centered sample with zero offset, always rendered last on top of everything else (or you can use a depth/stencil buffer and render it first instead… but I was too lazy to do so, and it’s not performance-critical code). These are the offsets I use (to be multiplied by half-texel size):

float uvOffset[5 * 5 * 2] =
{
    -2, -2,
    2, -2,
    -2, 2,
    2, 2,

    -1, -2,
    1, -2,
    -2, -1,
    2, -1,
    -2, 1,
    2, 1,
    -1, 2,
    1, 2,

    -2, 0,
    2, 0,
    0, -2,
    0, 2,

    -1, -1,
    1, -1,
    -1, 0,
    1, 0,
    -1, 1,
    1, 1,
    0, -1,
    0, 1,

    0, 0
};

Note how larger offsets are used first, then overdrawn by smaller offsets and finally by the unmodified UV GBuffer. It dilates the buffer, reconstructs tiny features and preserves sample centers in the majority of cases. This implementation is still limited, just like MSAA, by the number of samples used, but in practice I found it handles most real-life cases pretty well.
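In practice this can be as simple as feeding the current pass offset into the UV GBuffer vertex shader. A minimal sketch (assumed structure, not Bakery’s exact code; gPassOffset and gTexelSize are made-up names):

float2 gPassOffset; // one entry pair from uvOffset[] per pass
float2 gTexelSize;  // 1.0 / lightmap resolution

float4 UVGBufferVS(float2 lightmapUV : TEXCOORD1) : SV_Position
{
    // shift the whole layout by a fraction of a texel and rasterize again
    float2 uv = lightmapUV + gPassOffset * gTexelSize * 0.5;
    return float4(uv * 2 - 1, 0, 1);
}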

Here you can see a difference it makes. Artifacts present on thin geometry disappear:

ftbake_multitap

Left: simple rasterization. Right: multi-tap rasterization.


Optimizing UV GBuffer: shadow leaks

To fix remaining artifacts, we will need to tweak GBuffer data. Let’s start with shadow leaks.

Shadow leaks occur because texels are large. We calculate lighting for a single point, but it gets interpolated over a much larger area. Once again, there are multiple ways to fix it. A popular approach is to supersample the lightmap, calculating multiple lighting values inside one texel’s area and then averaging them. It is, however, quite slow and doesn’t completely solve the problem, only lightening wrong shadows a bit.

leak2

To fix it I instead decided to push sample points out of shadowed areas where leaks can occur. Such spots can be detected and corrected using a simple algorithm:

  • Trace at least 4 tangential rays pointing in different directions from the texel center. Ray length = world-space texel size * 0.5.
  • If a ray hits a backface, this texel will leak.
  • Push the texel center outside using both the hit face normal and the ray direction.

This method will only fail when you have huge texels and thin double-sided walls, but that is rarely the case. Here is an illustration of how it works:

leakfix

Here is a loop of 4 rays. The first ray that hits a backface (red dot) decides the new sample position (blue dot) with the following formula:

newPos = oldPos + rayDir * hitDistance + hitFaceNormal * bias

Note that in this case 2 rays hit a backface, so the new sample position could in principle change depending on the order of ray hits, but in practice it doesn’t matter. The bias value is an interesting topic by itself, and I will cover it later.
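Put together, the push-out pass could look roughly like this. This is a sketch of the structure described above, not Bakery’s actual code; traceScene() stands in for whatever ray-tracing backend is used and is assumed to be provided elsewhere:

struct Hit { bool hit; bool backface; float dist; float3 faceNormal; };
Hit traceScene(float3 origin, float3 dir, float maxDist); // hypothetical backend call

float3 fixLeakingTexel(float3 pos, float3 faceNormal, float texelSize, float bias)
{
    // 4 tangential directions built from the face normal (see the note below
    // about them not being aligned to the UV direction)
    float3 up = abs(faceNormal.y) < 0.99 ? float3(0, 1, 0) : float3(1, 0, 0);
    float3 t = normalize(cross(faceNormal, up));
    float3 b = cross(faceNormal, t);
    float3 dirs[4] = { t, -t, b, -b };

    for (int i = 0; i < 4; i++)
    {
        Hit h = traceScene(pos, dirs[i], texelSize * 0.5);
        if (h.hit && h.backface)
        {
            // the first backface hit decides the new position
            return pos + dirs[i] * h.dist + h.faceNormal * bias;
        }
    }
    return pos;
}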

I calculate tangential ray directions using a simple cross product with the normal (the face normal, not the interpolated one), which is not completely correct. Ideally you’d want to use the actual surface tangent/binormal based on the lightmap UV direction, but that would require even more data to be stored in the GBuffer. Having rays not aligned to the UV direction can produce some undershoots and overshoots:

rayrotate1

In the end I simply chose to stay with a small overshoot.

Why should we even care about ray distance, and what happens if it’s unlimited? Consider this case:

raydist

Only the 2 shortest rays will properly push the sample out, giving it a color similar to its surroundings. Attempting to push it behind more distant faces will leave the texel incorrectly shadowed.

As mentioned, we need world-space texel size in the UV GBuffer to control ray distance. A handy and cheap way to obtain a sufficiently accurate value is:

float3 dUV1 = max(abs(ddx(IN.worldPos)), abs(ddy(IN.worldPos)));
float dPos = max(max(dUV1.x, dUV1.y), dUV1.z);
dPos = dPos * sqrt(2.0); // convert to diagonal (small overshoot)

Calling ddx/ddy during UV GBuffer rasterization answers the question “how much does this value change over one lightmap texel horizontally/vertically”, and plugging in world position basically gives us the texel size. I’m simply taking the maximum here, which is not entirely accurate. Ideally you may want 2 separate values for non-square texels, but those are not common for lightmaps, as all automatic unwrappers try to avoid heavy distortion.

Adjusting sample positions is a massive improvement:

And another comparison, before and after:

DZ-lMZsWAAEWEAX


Optimizing UV GBuffer: shadow terminator

image_2018-04-09_21-58-00 (2)

The next thing to address is the wrong self-shadowing of smooth low-poly surfaces. This problem is fairly common in ray-tracing in general and is often referred to as the “shadow terminator problem” (google it). Because rays are traced against the actual geometry, they perceive it as faceted, giving faceted shadows. When combined with typical smooth interpolated normals, it looks wrong. “Smooth normals” is a hack everyone has been using since the dawn of time, and to make ray-tracing practical we have to support it.

There is not much literature on that, but so far these are the methods I found being used:

  • Adding constant bias to ray start (used in many offline renderers)
  • Making shadow rays ignore adjacent faces that are almost coplanar (used in 3dsmax in “Advanced Ray-Traced Shadow” mode)
  • Tessellating/smoothing geometry before ray-tracing (mentioned somewhere)
  • Blurring the result (mentioned somewhere)
  • A mysterious getSmoothP function in Houdini (“Returns modified surface position based on a smoothing function”)

Constant bias just sucks, similarly to shadowmapping bias. It’s a balance between wrong self-shadowing and peter-panning:

GUID-68EC3CB9-1820-47B0-8451-E8E8E75965BB
Shadow ray bias demonstration from 3dsmax documentation

The coplanar idea requires adjacent face data and just doesn’t work well. Tweaking the value for one spot breaks the shadow on another:

photo_2018-04-08_22-01-01

Blurring the result will mess with the desired shadow width and proper lighting gradients, so it’s a bad idea. Tessellating real geometry is fun but slow.

What really got my brain ticking was the Houdini function. What is a smooth position? Can we place samples as if they were on a round object, not a faceted one? Turns out we can. Meet Phong Tessellation (again):

phongtess.png

It is fast, it doesn’t require knowledge of adjacent faces, and it makes up a plausible smooth position based on the smooth normal. It’s just what we need.

Instead of actually tessellating anything, we can compute the modified position at the fragment level while drawing the GBuffer. A geometry shader can be used to pass all 3 vertex positions/normals together with barycentric coordinates down to the pixel shader, where the Phong Tessellation code is executed.

samples

Note that it should only be applied to “convex” triangles, with normals pointing outwards. Triangles with inwards-pointing normals don’t exhibit the problem anyway, and we never want sample points to go inside the surface. A plane equation, with the plane constructed from the face normal and a point on the face, is a good test to determine whether you got the modified position right, and to at least flatten fragments that go the wrong way.

Simply using rounded “smooth” position gets us here:

image_2018-04-09_22-28-30 (2)

Almost nice 🙂 But what is this little seam on the left?

Sometimes there are weird triangles with 2 normals pointing out and one in (or the other way around), making some samples go under the face. It’s a shame, because they could produce an almost meaningful extruded position, but instead they go inside and we have to flatten them.

smnormal

To improve such cases I try transforming the normals into the triangle’s local space, with one axis aligned to an edge, flattening them in one direction, transforming back and seeing if the situation improves. There are probably better ways to do that. The code I wrote for it is terribly inefficient and was the result of quick experimentation, but it gets the job done, and we only need to execute it once before any lightmap rendering:

	// phong tessellation
	float3 projA = projectToTangentPlane(IN.worldPos, IN.worldPosA, IN.NormalA);
	float3 projB = projectToTangentPlane(IN.worldPos, IN.worldPosB, IN.NormalB);
	float3 projC = projectToTangentPlane(IN.worldPos, IN.worldPosC, IN.NormalC);
	float3 smoothPos = triLerp(IN.Barycentric, projA, projB, projC);

	// only push positions away, not inside
	float planeDist = pointOnPlane(smoothPos, IN.FaceNormal, IN.worldPos);
	if (planeDist < 0.0f)
	{
		// default smooth triangle is inside - try flattening normals in one dimension

		// AB
		float3 edge = normalize(IN.worldPosA - IN.worldPosB);
		float3x3 edgePlaneMatrix = float3x3(edge, IN.FaceNormal, cross(edge, IN.FaceNormal));
		float3 normalA = mul(edgePlaneMatrix, IN.NormalA);
		float3 normalB = mul(edgePlaneMatrix, IN.NormalB);
		float3 normalC = mul(edgePlaneMatrix, IN.NormalC);
		normalA.z = 0;
		normalB.z = 0;
		normalC.z = 0;
		normalA = mul(normalA, edgePlaneMatrix);
		normalB = mul(normalB, edgePlaneMatrix);
		normalC = mul(normalC, edgePlaneMatrix);
		projA = projectToTangentPlane(IN.worldPos, IN.worldPosA, normalA);
		projB = projectToTangentPlane(IN.worldPos, IN.worldPosB, normalB);
		projC = projectToTangentPlane(IN.worldPos, IN.worldPosC, normalC);
		smoothPos = triLerp(IN.Barycentric, projA, projB, projC);

		planeDist = pointOnPlane(smoothPos, IN.FaceNormal, IN.worldPos);
		if (planeDist < 0.0f)
		{
			// BC
			edge = normalize(IN.worldPosB - IN.worldPosC);
			edgePlaneMatrix = float3x3(edge, IN.FaceNormal, cross(edge, IN.FaceNormal));
			float3 normalA = mul(edgePlaneMatrix, IN.NormalA);
			float3 normalB = mul(edgePlaneMatrix, IN.NormalB);
			float3 normalC = mul(edgePlaneMatrix, IN.NormalC);
			normalA.z = 0;
			normalB.z = 0;
			normalC.z = 0;
			normalA = mul(normalA, edgePlaneMatrix);
			normalB = mul(normalB, edgePlaneMatrix);
			normalC = mul(normalC, edgePlaneMatrix);
			projA = projectToTangentPlane(IN.worldPos, IN.worldPosA, normalA);
			projB = projectToTangentPlane(IN.worldPos, IN.worldPosB, normalB);
			projC = projectToTangentPlane(IN.worldPos, IN.worldPosC, normalC);
			smoothPos = triLerp(IN.Barycentric, projA, projB, projC);

			planeDist = pointOnPlane(smoothPos, IN.FaceNormal, IN.worldPos);
			if (planeDist < 0.0f)
			{
				// CA
				edge = normalize(IN.worldPosC - IN.worldPosA);
				edgePlaneMatrix = float3x3(edge, IN.FaceNormal, cross(edge, IN.FaceNormal));
				float3 normalA = mul(edgePlaneMatrix, IN.NormalA);
				float3 normalB = mul(edgePlaneMatrix, IN.NormalB);
				float3 normalC = mul(edgePlaneMatrix, IN.NormalC);
				normalA.z = 0;
				normalB.z = 0;
				normalC.z = 0;
				normalA = mul(normalA, edgePlaneMatrix);
				normalB = mul(normalB, edgePlaneMatrix);
				normalC = mul(normalC, edgePlaneMatrix);
				projA = projectToTangentPlane(IN.worldPos, IN.worldPosA, normalA);
				projB = projectToTangentPlane(IN.worldPos, IN.worldPosB, normalB);
				projC = projectToTangentPlane(IN.worldPos, IN.worldPosC, normalC);
				smoothPos = triLerp(IN.Barycentric, projA, projB, projC);

				planeDist = pointOnPlane(smoothPos, IN.FaceNormal, IN.worldPos);
				if (planeDist < 0.0f)
				{
					// Flat
					smoothPos = IN.worldPos;
				}
			}
		}
	}
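The helper functions referenced above are not shown in the post; based on how they are used, they could look roughly like this (a sketch of assumed implementations, not the original code):

// Project point p onto the tangent plane defined by a vertex position and normal
// (the Phong Tessellation projection operator)
float3 projectToTangentPlane(float3 p, float3 vertexPos, float3 vertexNormal)
{
    return p - dot(p - vertexPos, vertexNormal) * vertexNormal;
}

// Barycentric interpolation of three values
float3 triLerp(float3 bary, float3 a, float3 b, float3 c)
{
    return a * bary.x + b * bary.y + c * bary.z;
}

// Signed distance from point p to the plane through planePoint with normal planeNormal;
// negative means p went behind the face
float pointOnPlane(float3 p, float3 planeNormal, float3 planePoint)
{
    return dot(p - planePoint, planeNormal);
}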

Most of these matrix multiplies could be replaced by something cheaper, but anyway, here’s the result:

image_2018-04-12_14-01-36 (2)

The seam is gone. The shadow still has a somewhat weird shape, but it actually looks exactly like that even with classic shadow mapping, so I call it a day.

However, there is another problem. An obvious consequence of moving sample positions too far from the surface is that they can now end up inside another surface!

sampleintersect
Meshes don’t intersect, but sample points of one object penetrate into another

Turns out, the smooth position alone is not enough; the problem can’t be entirely solved with it. So on top of that I execute the following algorithm:

  • Trace a ray from flat position to smooth position
  • If there is an obstacle, use flat, otherwise smooth

In practice this gives us weird per-texel discontinuities when the same triangle is partially smooth and partially flat. We can improve the algorithm further and also cut the number of rays traced:

  • Create an array with 1 bit per triangle (or a byte per triangle to make things easier).
  • For every texel:
    • If the triangle’s bit is 0, trace one ray from the real (flat) position to the smooth position.
      • If there is an obstacle, set the triangle’s bit to 1.
    • position = triangleBitSet ? flatPos : smoothPos

That means you also need triangle IDs in the GBuffer, so I output one into the alpha of the smooth position texture using SV_PrimitiveID.
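A sketch of how this selection could be wired up (assumed structure only; traceOcclusion() is a hypothetical visibility query against the scene):

RWStructuredBuffer<uint> triangleUsesFlat; // one flag per triangle (byte/uint for simplicity)
bool traceOcclusion(float3 from, float3 to); // hypothetical backend call

float3 selectSamplePosition(uint triangleID, float3 flatPos, float3 smoothPos)
{
    if (triangleUsesFlat[triangleID] == 0)
    {
        // only trace while the triangle hasn't been marked yet
        if (traceOcclusion(flatPos, smoothPos))
            triangleUsesFlat[triangleID] = 1;
    }
    return triangleUsesFlat[triangleID] ? flatPos : smoothPos;
}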

To test the approach, I used the Alyx Vance model, as it’s an extremely hard case for ray-tracing due to a heavy mismatch between the low-poly geometry and the interpolated normals. Here you can see the triangles marked by the algorithm above:

image_2018-04-12_00-58-06 (2).png

Note how especially many triangles are marked in the area of the belt/body intersection, where both surfaces are smooth-shaded. And the result:

image_2018-04-10_12-11-24 (2).png
Left: artifacts using smooth position. Right: fixed by using flat position in marked areas.

Final look:

alyxlm.png

I consider it a success. There are still a couple of weird-looking spots where the normals are just way too off, but I don’t believe it can be improved further without breaking self-shadowing, so this is where I stopped.

I expect more attention to this problem in the future, as real-time ray-tracing is getting bigger, and games can’t just apply real tessellation to everything like offline rendering does.


Ray bias

Previously I mentioned a “bias” value in the context of a tiny polygon offset, also referred to as epsilon. Such an offset is often needed in ray-tracing: in fact, every time you cast a ray from a surface you usually have to offset the origin a tiny bit to prevent noisy self-intersection caused by floating-point inaccuracy. Quite often (e.g. in OptiX samples or small demos) this value is hard-coded to something like 0.0001. But because floats are floats, the further an object gets from the world origin, the less accuracy we have for its coordinates. At some point a constant bias will start to jitter and break. You can fix it by simply increasing the value to 0.01 or so, but the more you increase it, the less accurate all rays get everywhere. We can’t fix it completely, but we can solve the problem with an adaptive bias that’s always “just enough”.

image_2018-06-07_21-21-07.png
At first I thought my GPU was fried

The image above was the first time the lightmapper was tested on a relatively large map. It was fine near the world origin, but the further you moved, the worse it got. After I realized why it happened, I spent a considerable amount of time researching, reading papers, testing solutions, and thinking of porting nextafterf() to CUDA.

But then my genius friend Boris comes in and says:

position += position * 0.0000002

Wait, is that it? Turns out… yes, it works surprisingly well. In fact, 0.0000002 is a rounded version of FLT_EPSILON. When doing the same thing with FLT_EPSILON, the values are sometimes exactly identical to what nextafterf() gives and sometimes slightly larger, but nevertheless it looks like a fairly good and cheap approximation. The rounded value was chosen due to better precision reported on some GPUs.

In case we need to add a small bias in the desired direction, this trick can be expanded into:

position += sign(normal) * abs(position * 0.0000002)
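Wrapped into a small helper, this might look as follows (a trivial sketch of the trick above; the function name is made up):

float3 biasRayOrigin(float3 position, float3 normal)
{
    const float eps = 0.0000002; // rounded FLT_EPSILON
    // scale the offset with the magnitude of the position itself,
    // so the bias grows as floating-point precision shrinks
    return position + sign(normal) * abs(position * eps);
}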

city3.png


Fixing UV seams

UV seams are a known problem, and a widely accepted solution was published in “The Lighting Technology of The Last of Us”. The proposed idea was to use least squares to make texels on both sides of a seam match. But it’s slow, it’s hard to run on the GPU, and also I’m bad at maths. In general it felt like making colors match is a simple enough problem to solve without such complicated methods. What I went for was:

  • [CPU] Find seams and create a line vertex buffer with them. Set line UVs to those from the other side of the seam.
  • [GPU] Draw lines on top of the lightmap. Do it multiple times with alpha blending, until both sides match.

To find seams:

  • Collect all edges. An edge is a pair of vertex indices. Sort the indices of each edge so their order is always the same (if there are AB and BA, they both become AB), as it speeds up comparisons.
  • Put the first vertex of every edge into some kind of acceleration structure to perform quick neighbour searches; brute-force search can be terribly slow. I rolled out a poor man’s Sweep and Prune by only sorting along one axis, but even that gave a significant performance boost.
  • Take an edge and test its first vertex against the first vertices of its neighbours:
    • Position difference < epsilon
    • Normal difference < epsilon
    • UV difference > epsilon
    • If so, perform the same tests on the second vertices
    • Now also check if the edge UVs share a line segment
    • If they don’t, this is clearly a seam

A naive approach would be to just compare edge vertices directly, but because I don’t trust floats, and geometry can be imperfect, difference tests are more reliable. Checking for a shared line segment is an interesting one, and it wasn’t initially obvious: it covers the case where you have 2 adjacent rectangles in the UV layout, but their vertices don’t meet.

After the seams are found and the line buffer is created, you can just ping-pong 2 render targets, leaking some of the opposite side’s color with every pass. I’m also using the same “conservative” trick as mentioned in the first chapter.
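A minimal sketch of what such a blending pass could look like (assumed shaders, not Bakery’s exact code). Each line vertex carries its own lightmap UV, used as the output position, plus the UV of the matching point on the opposite side of the seam:

struct SeamVertex { float2 uv : TEXCOORD0; float2 otherUV : TEXCOORD1; };
struct SeamV2F    { float4 pos : SV_Position; float2 otherUV : TEXCOORD0; };

SeamV2F SeamVS(SeamVertex IN)
{
    SeamV2F OUT;
    OUT.pos = float4(IN.uv * 2 - 1, 0, 1); // rasterize the line in lightmap space
    OUT.otherUV = IN.otherUV;
    return OUT;
}

Texture2D lightmap;        // the other ping-pong target
SamplerState nearestClamp; // no mip-mapping, as noted below

float4 SeamPS(SeamV2F IN) : SV_Target
{
    // leak a fraction of the opposite side's color; with alpha blending enabled,
    // repeating the pass makes both sides converge
    return float4(lightmap.Sample(nearestClamp, IN.otherUV).rgb, 0.5);
}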

Because mip-mapping can’t be used when reading the lightmap here (to avoid picking up wrong colors), a problem can arise if the 2 edges of a seam have vastly different sizes. In this case we have to rely on the nearest-neighbour texture fetch, potentially skipping some texels, but in practice I never noticed it being an issue.

Results:

Even given some imperfections, I think the quality is quite good for most cases, and it’s simple to understand/implement compared to least squares. Also, we only process pixels on the GPU, where they belong.


Final touches

  • Denoising. A good denoiser can save a load of time. Being OK with an Nvidia-only solution, I was lucky to use the OptiX AI denoiser, and it’s incredible. My own knowledge of denoising is limited to bilateral blur, but this is just next level. It makes it possible to render with a modest number of samples and fix the result up. Lightmaps are also a better candidate for machine-learned denoising than final frames, as we don’t care about messing up texture detail and unrelated effects, only lighting.

    A few notes:
  • Denoising must happen before UV seam fixing.
  • Previously the OptiX denoiser was only trained on non-HDR data (although its input/output is float3). I heard that’s not the case today, but even with this limitation it’s still very usable. The trick is to use a reversible tonemapping operator; here is a great article by Timothy Lottes (tonemap -> denoise -> inverseTonemap).
  • Bicubic interpolation. If you are not shipping on mobile, there are exactly 0 reasons not to use bicubic interpolation for lightmaps. Many UE3 games in the past did that, and it is a great trick. But some engines (Unity, I’m looking at you) still think they can get away with a single bilinear tap in 2018. Bicubic hides low resolution and makes jagged lines appear smooth. Sometimes I see people fixing jagged sharp shadows by super-sampling during the bake, but that feels like a waste of lightmapping time to me.
    bilinearVsBicubic.jpg
    Left: bilinear. Right: bicubic.
    Bakery comes with a shader patch enabling bicubic for all shaders. There are many known implementations, for example this one in CUDA (tex2DFastBicubic). You’ll need 4 taps and a pinch of maths; a sketch follows below.
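The sketch below shows the widely known 4-tap bicubic (B-spline) filtering trick, which leans on hardware bilinear filtering; it’s a generic illustration under assumed names, not the exact shader shipped with Bakery:

float4 SampleLightmapBicubic(Texture2D lm, SamplerState bilinearClamp,
                             float2 uv, float2 texSize)
{
    float2 coord = uv * texSize - 0.5;
    float2 index = floor(coord);
    float2 f = coord - index;

    // cubic B-spline weights (scaled by 6; the factor divides out below)
    float2 w0 = f * (f * (-f + 3.0) - 3.0) + 1.0;
    float2 w1 = f * f * (3.0 * f - 6.0) + 4.0;
    float2 w2 = f * (f * (-3.0 * f + 3.0) + 3.0) + 1.0;
    float2 w3 = f * f * f;

    // fold the 4x4 footprint into 2 bilinear taps per axis
    float2 s0 = w0 + w1;
    float2 s1 = w2 + w3;
    float2 t0 = (index - 0.5 + w1 / s0) / texSize;
    float2 t1 = (index + 1.5 + w3 / s1) / texSize;
    float2 g0 = s0 / (s0 + s1);
    float2 g1 = 1.0 - g0;

    return g0.y * (g0.x * lm.Sample(bilinearClamp, float2(t0.x, t0.y)) +
                   g1.x * lm.Sample(bilinearClamp, float2(t1.x, t0.y))) +
           g1.y * (g0.x * lm.Sample(bilinearClamp, float2(t0.x, t1.y)) +
                   g1.x * lm.Sample(bilinearClamp, float2(t1.x, t1.y)));
}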

Wrapping it up, the complete algorithm is:

  • Draw a UV GBuffer using (pseudo) conservative rasterization. It should at least have:
    • Flat position
    • Smooth position obtained by Phong Tessellation adapted to only produce convex surfaces
    • Normal
    • World-space texel size
    • Triangle ID
  • Select smooth or flat position per-triangle.
  • Push positions outside of closed surfaces.
  • Compute lighting for every position. Use adaptive bias.
  • Dilate
  • Denoise
  • Fix UV seams
  • Use bicubic interpolation to read the lightmap.

Final result:

ftfinalbake.png

Nice and clean


Bonus: mip-mapping lightmaps

In general, lightmaps are best left without mip-mapping. Since a well-packed UV layout contains an awful lot of detail sitting close to each other, simply downsampling the texture makes lighting leak over neighbouring UV charts.

But sometimes you may really need mips, as they are useful not only for per-pixel mip-mapping, but also for LODs to save memory. That was my case: the lightmapper itself sometimes has to load all previously rendered scene data, and to avoid running out of VRAM, distant lightmaps must be LODed.

To solve this problem we can generate a special mip-friendly UV layout, or repack an existing one, based on the knowledge of the lowest required resolution. Suppose we want to downsample the lightmap to 128×128:

  • Trace 4 rays, like in the leak-fixing pass, but with 1/128 texel size. Set a bit for every texel we’d need to push out.
  • Find all UV charts.
  • Pack them recursively as AABBs, but snap AABB vertices to texel centers of the lowest-resolution mip wanted. In this case we must snap UVs to ceil(uv*128)/128. Never let any AABB become smaller than one lowest-res texel (1/128).
  • Render using this layout, or transfer texels from the original layout to this one after rendering.
  • Generate mips.
  • Use the previously traced bitmask to clear the marked texels and instead dilate their neighbours inwards.

Because of such packing, UV charts get downsampled individually but won’t leak over other charts, so it works much better, at least for a nearest-neighbour lookup. The whole point of tracing the bitmask and dilating those spots inwards is to minimize shadow leaking in the mips while not affecting the original texture.

It’s not ideal, and it doesn’t work for bilinear/bicubic, but it was enough for my purposes. Unfortunately, to support bilinear/bicubic sampling we would need to add (1/lowestRes) of empty space around all UV charts, and that might be too wasteful. Another limitation of this approach is that your UV chart count must be less than lowestMipWidth * lowestMipHeight.


P.S.

Top ways to annoy me:

  • Don’t use gamma correct rendering.
  • Ask “will it support real-time GI?”
  • Complain about baking being slow for your 100×100 kilometers world.
  • Tell me how lightmaps are only needed for mobile, and we’re totally in the high quality real-time GI age now.
  • Say “gigarays” one more time.

 


After the Flood

“After the Flood” is a WebGL 2.0 demo I worked on for PlayCanvas and Mozilla.

It features procedural clouds, water ripple generation, transform feedback particles and simple tree motion simulation.

It’s not as polished as I wanted it to be though.

Here’s the post from Mozilla: https://hacks.mozilla.org/2017/01/webgl-2-lands-in-firefox/

And another: https://blog.mozilla.org/blog/2017/01/24/gets-better-video-gaming-non-secure-web-warning/

And from PlayCanvas: https://blog.playcanvas.com/mozilla-launches-webgl-2-with-playcanvas/

wgs0.jpg

wgyyy.jpg

wgsfl.jpg

Also, water ripple shader: https://www.shadertoy.com/view/lltXD4

The first music track is “System” by Carbon Based Lifeforms.

The 2nd track (after the phone booth) is composed by Anton Krivosejenko.

The demo was shown at GDC and 3DWebFest.

Why games

I was recently talking to a friend, listing the reasons why I’m orbiting around the game industry, and decided to make a post out of it.

While I’m not truly an accomplished game developer, meaning I haven’t shipped a finished game, I still exist in this world, making engines, playable demos, prototypes and similar things. I respect this medium and defend it, sometimes even too aggressively.

I’ve seen different stances towards games. I know a lot of people who say they’ve “grown out” of games and now have to do their Important Adult Things (like hanging around on social networks for hours and drinking). I know game artists who don’t care about the differences between games and movies; as one of my friends used to say, “both are just media content”. This is certainly not my position.

I know and have experienced things in games that no other medium can produce, and I find that quite fascinating. I still think the industry is young, and what we see today is far from what it can become. If only people would experiment more and copy successful products less…

Anyway, here’s the list. Perhaps I will update it occasionally. Also, note that not every game has these features; they just happen sometimes, in some games.

  • Here and Now. It’s hard to describe, but only in games (mostly 1st/3rd person) can I feel that things are happening right now and weren’t prerecorded. You can just stop following the plot and observe the environment, noticing tiny details, seeing smoke/trees/clouds/etc. slowly moving. More realistic games can even provoke smell/temperature associations in my brain. You can just walk around for hours, enjoying the day, without the story rapidly moving you somewhere down a narrow corridor. It sounds like something that can only happen in open-world games, but I remember feeling this even in HL2, where I could just stand and stare at the sea in some sort of trance, thinking about this world. For me it feels very different from watching prerecorded videos. There’s spatial continuity to my movement, and there’s actually me, or at least some avatar of me, that reacts immediately to my thoughts, translated through a controller which I don’t even notice after getting used to it. The great part of this feature is that even when players freely move around, not caring about plot and gameplay, they still read the story through observing the environment.
  • Consequence. Only in games can you have a choice. And if you agree with the choice you’ve made, it can feel very personal to you (on the other hand, when all the options are crap you wouldn’t choose, it’s quite annoying and breaks the experience). Then, when the game shows you a consequence of your decision, you take it more seriously compared to a static narrative. Only games can make you feel guilty, which in turn leads you to review your own decisions and what made you select that option (and this can extend to your real-life decision-making).
  • What If. The more complex the mechanics of the game, the more creative freedom there is. You can exploit stuff, experiment, try different combinations of options and see how it goes. This is simply pleasing to the brain and an important aspect of “fun”. It also makes your walkthrough much more personal, and it creates memorable moments (compared, again, to a static narrative).
  • Situation models. Sometimes in games you find yourself in a situation you could end up in, but haven’t yet. It’s an interesting exercise to play it out and see the result. One of my favorite examples is Morrowind: you have a bunch of things to do, you need to find places you’ve never been to (there are no markers you could just run straight to, unlike in later TES games), and I also had a mod that added hunger/thirst/the need to sleep. Now manage it all! The situation is quite similar to what I later experienced in life, and this past in-game experience made me more confident that I can cope with a lot of things without being overwhelmed.
  • Simply technical awe. Not all people experience it, but I simply love seeing how game tech advances, new techniques being used, new cool effects made possible. That may just be my nerdiness, but I’m amazed to realize that the beautiful things I see are being rendered right now on my GPU, faster than my eyes blink. How is this even possible?!

I’m sure there are more reasons, and I may have forgotten something, but it’s a start. Feel free to suggest something you like 🙂

Rendering painted world in JG

Here’s a little breakdown and implementation details of the real-time painted world in my last demo – JG.

Here’s the video

Here’s the demo

 

(Click for Russian version)

The “painted” effect wasn’t planned. Originally I only had an idea to render natural scenery of a certain kind, and I wasn’t ready to spend a whole lot of time on it. It became clear to me that a “realistic” approach wouldn’t work, resulting either in very mediocre visuals (due to engine limitations and the complexity of real-time vegetation modeling) or in a whole year of trying to catch up with Crysis. So that wasn’t the way.

What I really wanted was to preserve the atmosphere, the feeling, without ruining it with technical limitations.

So I have to render something very complex without killing myself or players’ computers. What do I do? Intuition said: “bake everything”. I recalled seeing outdoor 3D scans: even with bad geometry (or even as point clouds), they still looked quite convincing, thanks to the right colors being in the right places, with all the nice, filtered real-life lighting already integrated into everything. Unfortunately, the time of year was the exact opposite of the desired one, so I wasn’t able to try my mad photogrammetry skills.
But what if we “scan” a realistic offline 3D scene? Vue surfaced in my memory as something that movie/exterior visualization folks use to produce nice renderings of nature. I had no idea what to expect from it, but I tried.

I took a sample scene, rendered it from several viewpoints and put those into Agisoft Photoscan to reconstruct some approximate geometry with baked lighting. And… alas, no luck. Complex vegetation structure and anti-aliasing weren’t the best traits for shape reconstruction.
Then it hit me. What does Agisoft do? It generates depth maps, then a point cloud out of multiple depths. But I can render a depth map right in Vue, so why do I need to reconstruct it?

Being familiar with deferred rendering and depth-to-position conversion, I was able to create a point cloud out of the Vue renderings. Not quite easily, though: Vue’s depth turned out to have some non-conventional encoding. Luckily, I finally found an answer about it.

And from this:

paintprocess1

With some MaxScript magic, we get this:

paintprocess2

Which is a solid single textured mesh.

The hard part was over; now I only needed to repeat the process until I got a relatively hole-free scene. Finally, time to have some fun with shaders 🙂

Each projected pixel acts as a camera-facing quad, textured with one of those stroke textures:

daubs

Almost. There was a bug in my atlas-reading code, so some quads only had a fraction of a stroke on them. However, it actually looked better than the intended version, so I left the bug in. It’s now a feature 🙂

Quad size obviously depends on depth, becoming larger with distance. It was quite important not to mix small and large quads together, so I had to choose viewpoints carefully.
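The demo itself runs on PlayCanvas/WebGL, but the idea of the quad expansion can be sketched like this (illustrative HLSL with made-up names, not the demo’s actual shader):

float4x4 viewProj;
float3   cameraRight, cameraUp, cameraPos;
float    strokeScale;

float4 StrokeVS(float3 pointPos : POSITION,   // baked point-cloud position
                float2 corner   : TEXCOORD0)  // quad corner in (-1..1)
                : SV_Position
{
    // scale the camera-facing quad with distance to compensate for point sparsity
    float size = strokeScale * distance(pointPos, cameraPos);
    float3 worldPos = pointPos + (cameraRight * corner.x + cameraUp * corner.y) * size;
    return mul(float4(worldPos, 1.0), viewProj);
}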

The test scene looked promising, so I started working on the one I wanted:

pond_v2

I made the house, fence and terrain from scratch. Plants were taken from various existing packs. Then I assembled the final composition out of this stuff. I lost count of the number of renderings I had to do to cover the whole playable area:

sdf5

Some had to be photoshopped a little to get rid of dark spots and to add more colors:

pond_300_fill

At first I had trouble getting the lighting right, so I had a lot of these black spots to fix; later I managed to tune it better. The final scene is actually a mix of different approaches, because I didn’t have the time to re-render everything with the same settings, and because it actually looked less monotonous this way.

Some early screenshots:

By this point I also had stroke direction set up properly, which was pretty important, as uniform strokes looked very unnatural. At first I tried to generate stroke direction procedurally (similar to how you generate a normal map from a height map), but it wasn’t sufficient. It was obvious to me how some strokes should lie; for example, I really wanted vertical strokes for the grass, and fence strokes following the shape of the fence. Not being able to direct it with a purely procedural approach, I simply decided to manually paint stroke direction into additional textures. The final version uses manual direction near the camera and procedural direction for distant quads. Here are some examples of direction maps:

pond4_dir_hi

To be honest, painting vectors with colors in Photoshop wasn’t the most exciting thing to do, but still, it was the quickest way I could think of 😀

The difference was quite obvious. Here’s uniform direction on the left, changed on the right:

paintprocess3

And that’s it. The point-cloud nature of the scene also allowed me to have some fun in the ending part, making the quads behave like a surreal particle system. All motion was done in the vertex shader.

I hope it was somewhat interesting to read, at least I’ll not forget the technique myself 🙂

 

Bonus information

Recently I was asked how to fill the inevitable holes between quads. The way I did it here is simple: I just used very rough underlying geometry:

paintprocess4

Rendering painted world in JG

This is about how the painted world in my latest demo, JG, was made.

(Click for English version)

The painted effect wasn’t planned. There was an idea to show a natural scene of a certain kind, and little time. It became clear: trying to do it realistically would give either a very mediocre picture (due to Unity and the complexity of modeling vegetation for games), or a year of suffering in the hope of catching up with Crysis (and even that one hardly looks perfect to someone not used to game graphics; the cardboard criss-crossed foliage planes still make me cringe). In short, it wasn’t an option.

The main thing was to preserve the right feeling, the atmosphere, without ruining it with graphics limitations. I really wanted to avoid a synthetic, computer-y look (it’s a nature scene, after all).

So, I needed to render something very complex without killing myself or the players’ computers.
Intuition said: “bake everything”. At least with lighting that had always worked. In this case literally everything was complex, so everything had to be baked. I remembered 3D scans of outdoor locations: even with bad geometry (or even as plain point clouds) they still looked convincing enough, because all the colors are in their right places, everything fits together, and detailed realistic lighting is already filtered and baked into it all. Unfortunately, the time of year during development was exactly the opposite of the desired one, so the scanning option was off the table.
But what if we make a realistic offline scene with beautiful lighting and get a “scan” of it? Somewhere in my memory Vue surfaced, as something used for movies and all kinds of exterior visualization to render beautiful natural landscapes. Yes, that’s probably just what I need, I thought.

After fiddling with the clumsy interface, I decided, as a test, to recreate a fragment of one of the sample scenes in Unity. I rendered it from several viewpoints, fed it to Agisoft and… was disappointed. The complexity of the vegetation geometry and the anti-aliasing were not the best qualities for a good scan. Points were barely found, and nothing was in its right place.
Then it hit me. What does Agisoft do? It tries to build several depth maps from the pictures and then places points based on them. But Vue itself can render exact depth from the camera, so why do I need to reconstruct it?

Anyone who has written a deferred renderer knows how to reconstruct position from depth (although I still fumble it every time). That’s how we get a point cloud out of all pixels visible to the camera. Vue’s depth, however, turned out to be tricky. Luckily, I eventually stumbled upon a developer answer about its encoding.

From this:

paintprocess1

With some MaxScript manipulation, we get this:

paintprocess2

This is a single solid mesh textured with the render.

The hard part was behind me; it was time to assemble a scene out of these pieces and play with the shader 🙂

Each quad is turned towards the camera and textured with one of these strokes:

daubs

Almost. There is actually a bug in the shader that sometimes places only a fragment of a stroke instead of a whole one. However, the fixed version seemed more boring and synthetic to me, so I reverted it to how it was. It’s not a bug, it’s a feature 🙂

Quad size changes depending on depth, i.e. in the distance they are huge, to compensate for their sparseness. In general it was very important to pick the rendering viewpoints correctly, so that stroke detail stayed consistent and small and large strokes didn’t end up mixed together.

Next, I built the actual scene I needed in Vue. The visuals came out like this:

pond_v2

The fence, the house and the terrain were made from scratch, while for the plants it was more practical to look through existing packs and assemble a complete composition out of all this. I lost count of how many renderings it took to fill the whole playable space with points:

sdf5

Many renderings had to be processed a little more for a more “painterly” effect: pulling out more shades, removing darkness, making the shadows a bit bluer:

pond_300_fill

At first I couldn’t find good lighting for a long time, and there was a lot of this blackness that needed correcting. Later I did manage to get a better picture straight out of the render, but the final in-game scene is stitched together from renderings of different periods, which I actually liked: it made the scene more interesting, less monotonous.

Some early shots:

By this point, unlike in the first attempts, varying stroke direction had already been implemented, because strokes all lying the same way looked very unnatural, like a Photoshop filter. At first I hoped to define it purely procedurally, but that turned out not to be enough. The procedural variant found brightness differences between neighbouring pixels and built a direction vector from them, similar to how a normal map is computed from a height map. But in some places it was obvious to me how the strokes should lie, while the shader had no idea: say, I knew that the grass here was better drawn with vertical lines, and the fence along the direction of its own planks. In the end I decided to paint direction maps for every rendering, where color defined the vector, and to combine that with the procedural direction in the distance. This is how strange the direction maps looked:

pond4_dir_hi

Painting vectors with colors in Photoshop is quite the pleasure (for masochists).

The difference is quite obvious: uniform direction on the left, adjusted on the right:

paintprocess3

And so we arrive at the final picture. In the ending I decided to have some fun and take advantage of everything being made of small quads, making them spin, fly apart and reassemble in all sorts of ways. All the particle animation is done in the vertex shader.

The little house at the end had to be almost entirely painted with brushes in Photoshop, since the photographic version stood out too much from the overall style.

So that’s how it went. I wrote all this down so that at least I myself won’t forget what I did and how 🙂

 

Bonus:

Recently I was asked how to patch the inevitable holes into emptiness between the quads. For that I used very rough geometry of similar colors behind the strokes (like an underpainting):

paintprocess4

Notes on shadow bias

These are notes for myself about shadow mapping bias.
Good summary about all aspects of shadow mapping: http://mynameismjp.wordpress.com/2013/09/10/shadow-maps/

My results:
bias

I’m not sure what’s wrong with my Receiver Plane depth bias. Interestingly, it does work OK when there is no interpolation between samples.
In this presentation, there’s a comparison, but it also uses samples without interpolation: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Isidoro-ShadowMapping.pdf (page 39).
Here they also get a strange artifact with it, similar to the one I have on the sphere: http://www.digitalrune.com/Support/Blog/tabid/719/EntryId/218/Shadow-Acne.aspx
MJP also says that “When it works, it’s fantastic. However it will still run into degenerate cases where it can produce unpredictable results”.
So, maybe I implemented it wrong, or maybe I was unlucky enough to quickly get degenerate cases, but I’m not really willing to try this technique anymore.

Normal offset:
http://www.dissidentlogic.com/old/#Notes%20on%20the%20Normal%20Offset%20Materials
Also this may better explain why it works: http://c0de517e.blogspot.ru/2011/05/shadowmap-bias-notes.html

There are 2 ways to implement Normal Offset bias. One way is to inset geometry along the vertex normal when rendering the shadow map. The inset amount is also scaled by slope (aka dot(N,L)) and can additionally be scaled by a distance factor (with FOV included) for use with perspective projection.
The second way is to render the shadow map normally, but add (instead of subtract) the same scaled vertex normal to the fragment position just before multiplying it by the shadow map matrix and comparing.
The second method has less impact on shadow silhouette distortion and gives better results. It is, however, not easy to do with deferred rendering, because you need the vertex normal, not the normal-mapped one!
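A minimal sketch of the 2nd variant (my own illustration, not any particular engine’s code; the slope term here is just one common choice):

float3 shadowCoordNormalOffset(float3 worldPos, float3 vertexNormal, float3 dirToLight,
                               float normalOffset, float4x4 shadowMatrix)
{
    // offset grows at grazing angles, where acne is worst
    float slopeScale = saturate(1.0 - dot(vertexNormal, dirToLight));
    float3 offsetPos = worldPos + vertexNormal * normalOffset * slopeScale;

    float4 sc = mul(float4(offsetPos, 1.0), shadowMatrix);
    return sc.xyz / sc.w; // use xy to sample the shadow map, z to compare
}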
Unity 5 seems to use the 1st version precisely because it can’t hold the vertex normal in its G-Buffer.

Funnily enough, Infamous Second Son is OK with storing it there:
http://www.redgamingtech.com/infamous-second-son-engine-postmortem-analysis-breakdown/

And they use it exactly for normal offset (and other stuff too): https://twitter.com/adrianb3000/status/464584971483893762

You can also try to calculate the face normal from depth, BUT you’ll get unpredictable results on edges.
Even an Intel guy couldn’t solve that: https://twitter.com/AndrewLauritzen/status/539669636912914432

A hole appears after insetting the geometry:
tea

normaloffset2

The 2nd variant doesn’t suffer from this (you can still see a tiny hole there, but it also exists with just a constant bias, so it’s not a normal bias problem).
Real-time demo with the 2nd version of Normal Offset: http://geom.io/pc25d/demoShaders5.html
No acne, no peter-panning, yay!
Use RMB + WASD to fly around. Feel free to look into the source.

You can tweak both the normal and constant bias in the browser console using
light2.light.normalOffsetBias
light2.light.shadowBias

Third person camera

Many people complained about the jerky and jumpy camera behaviour in the prototype of my game, Faded. I wasn’t happy with it myself either; I just had to implement so many things, with not enough time to make each one perfect. Recently I decided to finally fix it.

Third person cameras are very different in every game, from simple orbiting + collision to attempts at making them more “cinematic”. Making a “cinematic” one was also my original diploma thesis topic; however, after a few tests I abandoned it and changed the topic to something more familiar (real-time rendering), because I was unsure whether those experiments would yield any good results, so it was just too risky.

Let’s start with basic problems.

Problem 1: occlusion
95% of the answers you’ll find by googling are “throw a ray from the character to the camera and position the camera at the picked point!”. It’s a good starting point of course, but you just can’t leave it at that; there are plenty of reasons why it’s a bad idea:
– your camera’s near plane has a size, while a ray has zero thickness, so you have a chance of seeing through walls;
– the camera will jump from point to point abruptly.
Positioning the camera at “pickedPosition + pickedNormal * radiusAroundNearPlane” is still insufficient, as can be seen here:
cameraPushByNormal1

Luckily most physics engines support “thick” rays. If you use Unity/PhysX, use SphereCast.
There are still a few problems however:
– if the spherecast already intersects a wall at its origin, it will move through it further;
– you still have abrupt jumps.

cameraSphereCast

The alternative is to just use a physical sphere and move it to the desired camera position accounting for all collisions, but the sphere can get stuck in concave level geometry.

To fix the first spherecast problem, you can do the following:
– project the sphere in the opposite direction of the character-camera ray. So the origin of the ray is still the character, but the direction is inverted;
– use a picked point that is far enough away as the new ray origin. If nothing is picked, just use origin + invDir * farEnough;
– do the SphereCast as usual, but with the new origin. This way you get rid of the sphere intersecting nearby walls.
Code for Unity: http://pastebin.com/k3ti7kV2

The remaining problem is abrupt camera teleportation. How do other games deal with it? Let’s see:

Watch Dogs seems to use the simplest method: just teleporting the camera to the thick ray’s projected position. I can also see a quick interpolation of camera distance from the close-up back to the default.

L.A. Noire has a more pronounced, smoothed distance interpolation when the occlusion is gone. The sudden appearance of an occluder still causes abrupt movement though. The most interesting thing in L.A. Noire is the way the camera follows you when you don’t move the mouse: it can move around corners very intelligently. I’m not sure how it’s implemented; perhaps it uses the AI navigation system?

Hitman Absolution tries to move the camera as smoothly as possible, sliding along obstacles before they’re in front of the camera.
I think it’s a good solution, and I decided to implement it.

So here’s the idea:

twoCapsules

Use two spherecasts: one thin (with a radius that encapsulates the near plane) and one thick. Then:
– project the thick collision point (green point) onto the ray. You’ll get the red point;
– get the direction from the thick collision point to the projected point, multiply it by the thin radius and offset the projected point back by it. This way you’ll get the thick collision point projected onto the thin capsule (cyan point);
– get the distance from the cyan point to the green point. Divide it by (thickRadius – thinRadius). You’ll get a [0-1] number representing how close the obstacle is to the thin spherecast. Use it for lerping the camera distance.
Code for Unity: http://pastebin.com/BqaJh3Vx

I think that’s quite enough for camera occlusion. You could still try to make the camera even smarter at walking around corners, as in Noire, but I think it’s overkill for now. Maybe I’ll get back to this topic later.

Problem 2: composition
Now onto some “cinematic” stuff. The first 3rd person games had characters mostly centered on the screen. As games evolved, overall image aesthetics became more important. Many photographers will agree that it’s not always the best idea to lock objects dead center; it just doesn’t look interesting. The basic rule you (and most importantly, the computer) can apply is the Rule of Thirds. Most games today use it to simply put the character a little bit to the side.

thirds

However, can we implement a more dynamic composition search that is not just locked on the character being aligned to one line? And how is it supposed to look?

The best references here, in my opinion, are steadicam shots, because these are most closely related to game third-person cameras.
Take a look at some:



As you can see, the camera changes its focus point and distance quite dynamically, and it looks very interesting. What is not so great in the context of games is that the camera lags behind the characters, so they see things earlier than the camera does.
The camera mainly focuses on the character’s points of interest. Also worth noting is the height of the camera, which is mostly static rather than orbiting around at different heights.

Here are the results of my first tests (a year ago) that implemented some of these ideas:

The middle part is boring and sucks though.
The idea was to mark important objects in the level and make the camera adapt to them, aligning everything by the rule of thirds. Here is what the debug view reveals:

Unity 2014-10-14 16-32-19-43

As you can see, the “important” objects marked as green 2D boxes. These boxes are the actual input data for the algorithm. The first box always represents main character.

The algorithm itself is not ideal though and it takes designer’s time to decide which objects should be marked as important to ensure interesting camera movement. The code is a bit dirty and still work in progress, so I’m not sure about posting it here right now. However, if you find it interesting, just tell me, and I’ll post.

Here are the results so far together with smooth occlusion avoidance:

Designing an Ubershader system

OK, so you probably know what ubershaders are? Unfortunately there is no wiki entry for the term, but by it we mostly mean very fat shaders containing all possible features, with compile-time branching that allows them to then be specialized into any kind of small shader with a limited set of tasks. It can be implemented very differently, though, so here I’ll share my experience with it.

#ifdefs

So, you can use #ifdef, #define and #include in your shaders? Or you’re going to implement it yourself? Anyway, it’s the first idea anyone has.

Why it sucks:
  • Too many #ifdefs make the code hard to read. You have to scroll through the whole ubershader to follow compile-time logic scattered all over it.
  • How do you say “compile this shader with 1 spot light and that shader with 2 directional lights”? Or 2 decals instead of 6? One PCF shadow and one hard shadow? You can't express that with #ifdefs elegantly – only by copy-pasting code, making it even less readable.

Terrible real-life examples: 1, 2
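To illustrate the second point above, here's a made-up fragment (hypothetical names) of what per-count logic tends to degenerate into with plain #ifdefs:

float3 lighting = 0;
#ifdef SPOT_LIGHT_0
    lighting += ComputeSpotLight(spotLights[0], surface);
#endif
#ifdef SPOT_LIGHT_1
    lighting += ComputeSpotLight(spotLights[1], surface);
#endif
#ifdef DIR_LIGHT_0
    lighting += ComputeDirLight(dirLights[0], surface);
#endif
// ...and so on for every light count, decal slot and shadow filter combination.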

Code generation from strings

Yet another approach I've come across in some projects: you use your host language of choice and its branching and loops to generate the shader source as a string.

Why it sucks:
  • Mixing shader code with another language looks like a total mess
  • Quotes, string concatenations, spaces inside strings and \n's are EVERYWHERE, flooding your vision
  • You still have to scroll a lot to understand the logic

Terrible real-life examples: 1, 2

Code generation from modules

So you take your string-based code generation and try to decouple the shader code from the engine code as much as possible. You definitely don't want hundreds of files with 1-2 lines each, so you start thinking about how to accomplish that.
You end up with small code chunks; some of them are interchangeable, some contain keywords to be replaced before they're appended.

Why the naive approach sucks:
  • All chunks share the same scope, which can lead to conflicts
  • You aren't sure what data is available to each chunk
  • It takes time to understand what the generated shader actually does

Code generation from modules 2.0

So you need some structure. The approach I found works best is:

struct internalData {
    // some data
};

void shaderChunk1(inout internalData data) {
    float localVar;
    // read/write data
}

float4 main() {
    internalData data;
    shaderChunk1(data);
    shaderChunk2(data);
    return colorCombinerShaderChunk(data);
}

So you just declare a read/write struct for all intermediate and non-local data, like diffuse/specular light accumulation, a global UV offset, or the surface normal used by most effects.
Each shader chunk is then a processing function working with that struct, plus a call to it placed between the other calls. Most compilers will optimize out unused struct members, so you should end up with fairly fast code, and it's easy to change parts of your shader. The shader body also reads quite descriptively and doesn't require a lot of scrolling.
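For illustration, here's a slightly more concrete (and deliberately simplified) sketch of the same pattern, with made-up names:

struct internalData {
    float3 normal;        // surface normal, possibly perturbed by earlier chunks
    float2 uvOffset;      // global UV offset (e.g. from parallax)
    float3 diffuseLight;  // accumulated diffuse lighting
    float3 specularLight; // accumulated specular lighting
};

void normalMapChunk(inout internalData data, float2 uv, sampler2D normalMap) {
    // reads the shared UV offset, writes the shared normal (tangent-space transform omitted)
    float3 n = tex2D(normalMap, uv + data.uvOffset).xyz * 2.0 - 1.0;
    data.normal = normalize(n);
}

void directionalLightChunk(inout internalData data, float3 lightDir, float3 lightColor) {
    // only touches the accumulators; unused members get stripped by the compiler
    data.diffuseLight += lightColor * saturate(dot(data.normal, -lightDir));
}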
A working example of such a system is my contribution to the PlayCanvas engine: 1, 2

Examples of generated code: vertex, pixel, pixel 2, pixel 3

So, I’m not saying this is the best approach. But for me, it’s the easiest one to use/debug/maintain so far.

geom.io [beta]

webgllll1

Meanwhile, I'm working on a new project – geom.io – a 3D model hosting service.

It lets you:

– Test various real-time materials and lighting on your models (à la Marmoset) – there's a standard material with simple settings, plus the ability to write or paste in any other shader instead;

– Upload your models with their configured rendering to the web; you get a direct link and HTML code for embedding it into any page (like YouTube).

Shadows, AO, registration with personal galleries and other goodies will be added soon.

You can already browse the uploaded models: http://geom.io/gallery.php

You'll need an up-to-date browser with HTML5 and WebGL – recent Firefox and Chrome are guaranteed to work, and models are more or less viewable in Chrome on Android too.

If you're reading this and you have models, come try it out and leave feedback/wishes/bug reports/feature requests.

More about my thesis: light attenuation and specular

Actually, apart from shadows, I didn't do research on the other topics thorough enough to deserve separate posts. So I decided to dump all the remaining interesting bits into one post. So far I've only dumped one.

First of all, what is this about? Here's a video of the final thesis scene:

http://www.youtube.com/watch?v=7IolXxg1_q8

And a couple of screenshots:

buildFinal 2013-06-20 00-22-52-35 buildFinal 2013-06-20 00-38-38-52

Honestly, this is far from what was planned. The institute made us do a lot of paperwork, so we never got to push the quality of the thesis itself. Many of the clever graphics tricks simply didn't make it into the scene, which my artist friend sent me in the last 3 (!) days before the defense.

What's interesting here, and what I plan to keep paying attention to in the future:

The specular is occluded by shadows from the light source, even where they aren't visible in the diffuse. This is almost never done in games, but it really is how light behaves – you can verify it in your favorite raytracer.

This is what we usually see in games:

scanline

There's a big mistake here: the light's falloff is artificially limited and quickly drops to zero at the edges. Because of that, the specular gets cut off even more noticeably than the diffuse. In games, lights are often clipped like this so the source can be enclosed in a sphere, sparing many fragments from the lighting computation. In deferred rendering you can then rasterize low-poly spheres, sample depth and normals inside them and shade quickly.

But all of this works against realism.

In reality, light sources attenuate with the inverse square of distance, i.e. 1/(dist^2). You may be surprised how big a difference it makes to the lighting if you try to swap the usual range-based lights for real ones.

Here's another good post on the topic: http://imdoingitwrong.wordpress.com/2011/01/31/light-attenuation/

Such attenuation has a big performance downside: it falls off over a very, very long distance, which can be completely impractical for dozens of real-time lights that would then have to cover the whole scene. In my case, though, the lighting was static and everything needed could be baked.
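For comparison, a minimal sketch of both falloffs (this isn't the thesis shader, just the idea):

// typical game-style, range-based falloff: forced to hit exactly zero at 'range',
// which is what clips the specular highlight so visibly
float AttenuationRangeBased(float dist, float range)
{
    float x = saturate(1.0 - dist / range);
    return x * x;
}

// physical inverse-square falloff: never actually reaches zero;
// a small epsilon avoids the singularity at dist = 0
float AttenuationInverseSquare(float dist)
{
    return 1.0 / max(dist * dist, 1e-4);
}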

This is how the same frame looks with inverse-square attenuation:

scanline2

Now let's look at some more interesting frames:

vray1

Here we have purely diffuse surfaces lit by a single light. This is what you usually bake into lightmaps.

And now:

vray2

The same scene, but with more reflective surfaces.

We can clearly see a long highlight from the light appear on the floor, but that highlight will never cross into the light's own shadow. Where the diffuse looks like solid darkness thanks to attenuation, the light actually still continues – it's just hard for our eyes to tell the blackness of attenuation from the blackness of shadow. And not just for our eyes – numerical precision in those dark areas is also very low.

The explanation is very simple: diffuse light has a much lower intensity than reflected light. Reflected light is focused into “denser”, narrower beams, while diffuse light is scattered in all directions.

So where the diffuse has visually faded out completely, the specular is still alive and keeps being cut off by shadows.

Regular lightmaps don't contain enough information for this, unless you're going to store them in floating point.

11IMG_8577

These crappy photos show the same effect: in the left pictures you can clearly see a shadow (of a pole, of a car), while in the right ones you can't. In the left pictures a highlight falls on the surface and outlines the shadows – in the right ones it doesn't. Baking diffuse light would only give us the right-hand pictures.

Try walking around the streets at night yourself – you'll see that with many light sources the effect gets stronger: some shadows keep disappearing while others appear.

This is how it can look in real time:

specc

(Yes, this is Unity, but with my shaders.)

From different angles it looks as if the shadows point in different directions – you can see the same thing in real life.

Implementation-wise, I couldn't come up with anything smarter than simply baking the shadows separately, without attenuation. An RGBA8 texture holds 4 shadow masks, one per channel.

Since the shadows are just black-and-white masks, even 4-bit precision works reasonably well. I tried packing 8 masks into one RGBA8 texture to keep a single fetch, but the unpacking broke the filtering.

So I ended up with a GI lightmap (indirect only) plus attenuation-free shadow masks. Direct light propagation and attenuation were computed in the shader.
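Roughly, the per-light shader logic looked something like this (a hedged sketch with made-up names, not the actual thesis code):

// 4 lights, one baked attenuation-free shadow mask per RGBA8 channel
float4 masks = tex2D(_ShadowMasks, lightmapUV);
float shadow = dot(masks, _LightChannelSelector);   // e.g. (1,0,0,0) picks light 0

float3 L = _LightPos - worldPos;
float dist = length(L);
L /= dist;

float atten = 1.0 / max(dist * dist, 1e-4);          // analytic inverse-square falloff
float ndotl = saturate(dot(normal, L));
float3 h = normalize(L + viewDir);
float spec = pow(saturate(dot(normal, h)), _SpecPower);

// both diffuse and specular are gated by the baked mask,
// so the highlight never crosses into the shadow
float3 direct = _LightColor * atten * shadow * (albedo * ndotl + spec);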

Indirect light is quite diffuse and usually doesn't have a wide dynamic range, so it can be stored as low-res DXT1. There were actually three of those lightmaps (radiosity normal mapping).

The masks react badly to DXT compression, so in the final build I stored them in a couple of RGBA4 textures.

At this point I got too lazy to keep writing, so... to be continued.

Penumbra shadows

Here I'm starting a series of posts on the topics I worked on in my thesis. Its title was unpretentious – “Realistic materials in real-time rendering” – but in practice it covered anything from realistic shadows to getting rid of aliasing in fine specular.
Overall, the goal was to render a beautiful scene in real time.

In this post I'll talk about what I did with dynamic shadows.
The shadows had to have a variable penumbra radius – sharp near the caster and blurry far away from it, with the amount of blur driven by the physical size of the light source.

I simplified the task for myself from the start: only dynamic objects cast shadows (and there would be few of them in the demo), while static geometry gets baked lightmaps.

After about a month of struggling, I gave birth to this demo:
iengine 2013-02-13 01-10-23-92

iengine 2013-02-13 01-10-36-45

iengine 2013-02-13 01-18-55-10

You can download it here: http://geom.io/iengineShadows.zip

/*
You can change the resolution in the ini file.
 If you don't have an NVIDIA card, lower the antialiasing, since the default is NVIDIA-specific CSAA.
 Mouse + WASD - fly around
 LMB - set the light direction to match the camera direction.
 Mouse wheel - change the light source size (i.e. the size of the shadow penumbra). You can't make them perfectly sharp, of course, since that's limited by shadow map resolution.
 You'll most likely need a decent video card.
*/

The first things to note are the absence of noise, which I've grown so tired of in the shadows of many modern games, and the genuinely large blur radius (very wide Gaussians are problematic in real time).

In short, here's how it works: I based it on the PCSS technique. Its essence is finding, for every fragment, an average blocker depth around it in the shadow map; this value is then converted into a blur radius that is used for PCF.
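The key step is the standard PCSS penumbra estimate (a sketch, not the exact thesis code):

// avgBlockerDepth comes from the blocker-search step: the average of the
// shadow map depths that are closer to the light than the receiver
float PenumbraRadius(float receiverDepth, float avgBlockerDepth, float lightSize)
{
    // similar triangles: the further the receiver is from the blocker, the wider
    // the penumbra; a bigger light source also widens it
    return lightSize * (receiverDepth - avgBlockerDepth) / avgBlockerDepth;
}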

The technique wasn't used very often, because it was slow. The average-depth search required a lot of samples, and PCF with a large radius needed no fewer. To make the PCF wide and smooth you have to make it really slow, and you'll still get aliasing on surfaces at steep angles (shadow maps have no mipmapping). The alternatives are few samples with terrible banding, or the aforementioned noise. To be fair, games have learned to mask this noise reasonably well by running a screen-space blur over it. But a trained eye will still catch it =).

The first thing I decided was to replace PCF with a different algorithm. The beauty of PCSS is that PCF isn't mandatory at all – even with a fairly low sample count in the blocker search stage we get decent blur factors, which we can feed into any algorithm.

I got interested in summed area tables. The idea: thanks to simple arithmetic, if you have an image where every pixel stores the sum of all pixels above and to the left of it (variations with below/right exist, but that's beside the point), you can compute the average of all pixels inside any rectangle using only its corner values. It can be hard to grasp at first, but the ATI paper is quite illustrative. So, by doing a one-time prepass and turning any texture into a SAT, we can get a blur of any radius with 4 fetches and a handful of instructions. Wow!
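In shader terms, the 4-fetch average looks roughly like this (a sketch with assumed names, ignoring the one-texel offsets for brevity):

// satTex stores, in each texel, the sum of all texels above and to the left of it
float SATAverage(sampler2D satTex, float2 uvMin, float2 uvMax, float2 texSizeInPixels)
{
    float tl = tex2D(satTex, uvMin).r;                      // top-left corner
    float tr = tex2D(satTex, float2(uvMax.x, uvMin.y)).r;   // top-right
    float bl = tex2D(satTex, float2(uvMin.x, uvMax.y)).r;   // bottom-left
    float br = tex2D(satTex, uvMax).r;                      // bottom-right

    float sum = br - tr - bl + tl;                          // inclusion-exclusion
    float2 rectPixels = (uvMax - uvMin) * texSizeInPixels;
    return sum / (rectPixels.x * rectPixels.y);             // sum / area = average
}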

Or is it? Actually, not quite.

First, the pixel sums span a huge range of values. If the source texture was RGBA8, in most cases you'll have to create an RGBA32F texture for the SAT. And even within float precision, a SAT introduces a lot of error. On a color texture you might not notice it, but it can break shadow maps. I wouldn't use SATs for large, open-world-style shadows, but since my plan was a small number of dynamic objects in a static world, it was livable.

Second, the prepass is very heavy. “Sum up all the pixels of a texture” sounds simple in words, but it's not cheap at all in practice. The best known approach, the one presented in the ATI paper, takes several passes, and the pass count grows very quickly with texture size. Generating a SAT larger than 512x512 is a lost cause. It's cheaper to do VSM with a wide blur.

Nevertheless, in the demo above I still used SATs – I hadn't yet become completely disillusioned with them.

A few additional tricks were applied:

The thing is, the PCSS technique has one notable bug: it's impossible to get several penumbras overlapping each other correctly, because the blocker search only sees the data nearest to the shadow map's camera (the light). So the “main” penumbra belongs to the closest object – and if some smaller object stands in the shadow and casts its own shadow onto the penumbra of the object behind it, that shadow won't show up. You get the main object's penumbra, and then the small object's shadow starts abruptly as soon as it appears in the shadow map.

As long as shadows don't overlap this isn't noticeable, but I wanted to fix it. To do that I turned the shadow map into an atlas with a separate slot for each object – which also saved texture space and let me run the SAT prepass separately on each atlas block. It got quite clever: with a 512x512 atlas holding four 256x256 shadow maps, I managed to generate the SAT of the whole atlas in the number of passes needed for a single 256x256 texture.

This way I had every object's shadow map data unoccluded and could avoid that artifact – you can see its absence in the second screenshot.

Meanwhile the deadline was approaching, content started pouring in on me again, and such experimental methods had to be dropped. I had no time to get the whole atlas system “production”-ready.

Things were simplified down to VSM + PCSS. The shadow map was rendered into a VSM texture, with no atlases, and a minimal blur was run over it. PCSS used the same PCF loop, except that instead of binary comparisons / hardware PCF it sampled this VSM texture. The minimal blur in it was of course wider than hardware PCF, which allowed taking few samples (with plain PCF this would have looked like horrible banding). The result was shadows with a wide blur (many samples of a not-so-wide one) far from the caster and a narrower one up close. Ideally I wanted them sharper up close, but it was more or less acceptable as it was:
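The per-sample visibility term in such a loop would be the usual VSM Chebyshev bound; a sketch for reference (not the exact thesis shader):

float VSMVisibility(float2 moments, float receiverDepth)
{
    // moments.x = mean depth, moments.y = mean squared depth from the blurred VSM texture
    if (receiverDepth <= moments.x)
        return 1.0;                                   // fully lit
    float variance = max(moments.y - moments.x * moments.x, 1e-5);
    float d = receiverDepth - moments.x;
    return variance / (variance + d * d);             // Chebyshev's inequality
}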

btest 2013-05-29 16-45-03-04 1dxc btest 2013-06-19 19-52-11-26

Of course, the effect wasn't quite the same anymore, but these shadows were simply easier to work with. You can watch a video here: http://www.youtube.com/watch?v=2jk5TmfKNZA

One flaw remained that I didn't have time to fix: hard edges on overly blurred shadows. For optimization there was an if in the shader that skipped shadow computation where no shadow should be – but it didn't work entirely correctly.

Next, I did point light shadows in a similar way – for VSM they had to be rendered as dual-paraboloid maps.

A neat property of point light shadows: since the shadow maps are captured from the light's center with perspective projection, distant objects automatically become smaller in them and their shadows get blurred more. That's an extra fake working in favor of a visually correct penumbra =).

btest 2013-06-01 01-33-28-62 btest 2013-06-01 01-34-11-40 btest 2013-06-05 03-43-43-04 btest 2013-06-03 23-13-26-26

What are my plans going forward?

I like distance fields and what can be done with them. Even very low-res DFs can be traced into something fairly close to the original geometry – in the thesis I used them for object self-reflections (but more on that another time). A lot of things can be baked into small DFs. Or you could try generating them in real time…

Art

Before talking about my thesis, let me post a modest collection of my models and pictures here.

debris6.jpg1e44cf4d-c1fb-421e-90b7-5427b8c2f090Large fence8.jpg1d41c893-537c-4b35-b019-3d4308049c34Large

Environment: I made all the level/prop geometry in Incident, and a friend textured it. Our most decent models from those days can now be bought on TurboSquid.

My friend never got around to texturing the helicopter model – even though it was meant to be the “main character” of a simulator:

mi2

mi2_3

In general, I love modeling human faces. I just love human faces.

98b1dd2e2bd515daa2bb782b523fa574 96874deebbf298c696ea502e3357ad96 99f30b8f3631ae85d476b9f48fd5bc5f ec749be4dac3f5f02d5fd505480685f7 83a36c55ff2829e2ddcdc836928109a4

I've long been planning to finally get back to that.

In the meantime, all that's left to show is my 2D work – though it's really quite poor:

someI1a copy3c_comp copyLo

fadedsk1 copyBW IMG_3780