Baking artifact-free lightmaps on the GPU


Since 2015 I had been working on a GPU lightmapper called Bakery. It’s finally done, and you can even buy it on Unity’s Asset Store (it can be used outside of Unity as well, but the demand is higher there). Originally intended for my own game, I decided to make it a product by itself and hopefully help other people bake nice lighting. In an old tweet I promised to write about how it works, as there were many, MANY unexpected problems along the way, and I think such a write-up would be useful. I also thought of open-sourcing it at the time, but having spent more than a year coding it full-time, I think it’s fair to delay that a bit.

The major focus of this project was to minimize the kinds of artifacts lightmapping often produces, like seams and leaks, and also to make it flexible and fast. This blog post won’t cover lighting computation much; it will instead focus on what it takes to produce a high-quality lightmap. We will start with the picture on the left and make it look like the one on the right:

Contents:
UV space rasterization
Optimizing UV GBuffer: shadow leaks
Optimizing UV GBuffer: shadow terminator
Ray bias
Fixing UV seams
Final touches
Bonus: mip-mapping lightmaps


UV space rasterization

Bakery is in fact the 4th lightmapper I have designed. Somehow I’m obsessed with baking stuff. The first one simply rasterized forward lights in UV space; the 2nd generated UV-space surface position and normal and then rendered the scene from every texel to get GI (huge batches with instancing); the 3rd was PlayCanvas’ runtime lightmapper, which is actually very similar to the 1st. All of them had one thing in common – something had to be rasterized in UV space.

Let’s say you have a simple lighting shader and a mesh with lightmap UVs:

bake2_realtime

You want to bake this lighting, how do you do that? Instead of outputting the transformed vertex position

OUT.Position = mul(IN.Position, MVP);

you just output UVs straight on the screen:

OUT.Position = float4(IN.LightmapUV * 2 - 1, 0, 1);

Note that the “*2-1” is necessary to transform from typical [0,1] UV space into typical [-1,1] clip space.
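
Putting those two lines together, a minimal baking vertex shader might look like the sketch below. The struct layout, semantics and matrix name are my own assumptions, not Bakery’s actual code:

// Minimal sketch of a lightmap-baking vertex shader (names are assumptions).
// Instead of projecting the vertex with the MVP matrix, we place it at its
// lightmap UV coordinate, remapped from [0,1] UV space to [-1,1] clip space.
struct VSIn
{
    float4 Position   : POSITION;
    float3 Normal     : NORMAL;
    float2 LightmapUV : TEXCOORD1;
};

struct VSOut
{
    float4 Position : SV_Position;
    float3 WorldPos : TEXCOORD0;
    float3 Normal   : TEXCOORD1;
};

float4x4 World;

VSOut BakeVS(VSIn IN)
{
    VSOut OUT;
    // Depending on the API and how the lightmap is stored, you may also need
    // to flip the V coordinate here.
    OUT.Position = float4(IN.LightmapUV * 2 - 1, 0, 1);
    // World-space attributes are still interpolated as usual and consumed
    // by the lighting pixel shader (or written to a UV GBuffer later on).
    OUT.WorldPos = mul(IN.Position, World).xyz;
    OUT.Normal   = normalize(mul(float4(IN.Normal, 0), World).xyz);
    return OUT;
}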

Voila:

bake_lmnaive

That was easy, now let’s try to apply this texture:

bake2_nodilate.png

Oh no, black seams everywhere!

Because of bilinear interpolation and typical non-conservative rasterization, our texels now blend into the background color. But if you have ever worked with any baking software, you know about padding and how it expands or dilates pixels around to cover the background. So let’s try to process our lightmap with a very simple dilation shader. Here is one from PlayCanvas, in GLSL:

varying vec2 vUv0;
uniform sampler2D source;
uniform vec2 pixelOffset;
void main(void) {
    vec4 c = texture2D(source, vUv0);
    c = c.a>0.0? c : texture2D(source, vUv0 - pixelOffset);
    c = c.a>0.0? c : texture2D(source, vUv0 + vec2(0, -pixelOffset.y));
    c = c.a>0.0? c : texture2D(source, vUv0 + vec2(pixelOffset.x, -pixelOffset.y));
    c = c.a>0.0? c : texture2D(source, vUv0 + vec2(-pixelOffset.x, 0));
    c = c.a>0.0? c : texture2D(source, vUv0 + vec2(pixelOffset.x, 0));
    c = c.a>0.0? c : texture2D(source, vUv0 + vec2(-pixelOffset.x, pixelOffset.y));
    c = c.a>0.0? c : texture2D(source, vUv0 + vec2(0, pixelOffset.y));
    c = c.a>0.0? c : texture2D(source, vUv0 + pixelOffset);
    gl_FragColor = c;
}

For every empty background pixel, it simply looks at the 8 neighbours around it and copies the first non-empty value it finds. And the result:

There are still many imperfections, but it’s much better.

To generate more sophisticated lighting, with area shadows, sky occlusion, colored light bounces, etc., we’ll have to trace some rays. Although you can write a completely custom ray-tracing implementation using bare DirectX/WebGL/Vulkan/whatever, there are already very efficient APIs to do that, such as OptiX, RadeonRays and DXR. They have a lot of similarities, so knowing one should give you an idea of how to operate the others: you define surfaces, build an acceleration structure and intersect rays against it. Note that none of these APIs generates lighting; they only give you a very flexible way of doing fast ray-primitive intersection on the GPU, and there are potentially lots of different ways to (ab)use it. OptiX was the first of its kind, and that is why I chose it for Bakery, as there were no alternatives back in the day. Today it’s also unique for having an integrated denoiser. DXR can be used on both Nvidia and AMD (not sure about Intel), but it requires Win10. I don’t know much about RadeonRays, but it seems to be the most cross-platform one. Anyway, in this article I’m writing from an OptiX/CUDA (ray-tracing) and DX11 (tools) perspective.

To trace rays from the surface we first need to acquire a list of sample points on it along with their normals. There are multiple ways to do that, for example in a recent OptiX tutorial it is suggested to randomly distribute points over triangles and then resample the result to vertices (or possibly, lightmap texels). I went with a more direct approach, by rendering what I call a UV GBuffer.
It is exactly what it sounds like – just a bunch of textures of the same size with rasterized surface attributes, most importantly position and normal:

Example of a UV GBuffer for a sphere. Left: position, center: normal, right: albedo.

Having rasterized position and normal allows us to run a ray generation program using texture dimensions with every thread spawning a ray (or multiple rays, or zero rays) at every texel. GBuffer position can be used as a starting point and normal will affect ray orientation.
“Ray generation program” is a term used in both OptiX and DXR – think of it as a compute shader with additional ray-tracing capabilities. They can create rays, trace them by executing and waiting for intersection/hit programs (also new shader types) and then obtain the result.
Full Bakery UV GBuffer consists of 6 textures: position, “smooth position”, normal, face normal, albedo and emissive. Alpha channels also contain interesting things like world-space texel size and triangle ID. I will cover them in more detail later.
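
Bakery itself does this in OptiX, but to give an idea of what such a program looks like, here is a hedged sketch of an equivalent DXR ray generation shader in HLSL. The resource bindings, payload and the single-ray-along-the-normal example are assumptions for illustration only:

// Sketch of a ray generation shader reading the UV GBuffer (DXR-style HLSL).
// One thread per lightmap texel; names and bindings are assumptions.
RaytracingAccelerationStructure gScene   : register(t0);
Texture2D<float4>               gPosTex  : register(t1); // xyz = position, w = texel size
Texture2D<float4>               gNormTex : register(t2); // xyz = normal
RWTexture2D<float4>             gOutput  : register(u0);

struct Payload
{
    float3 color;
};

[shader("raygeneration")]
void BakeRayGen()
{
    uint2 texel = DispatchRaysIndex().xy;

    float4 pos  = gPosTex[texel];
    float3 norm = gNormTex[texel].xyz;
    if (dot(norm, norm) == 0) return; // empty background texel: zero rays

    // Example: a single ray along the normal. In practice you'd spawn many
    // rays distributed over the hemisphere around the normal.
    RayDesc ray;
    ray.Origin    = pos.xyz;
    ray.Direction = norm;
    ray.TMin      = 0.0;
    ray.TMax      = 1e30;

    Payload payload = (Payload)0;
    TraceRay(gScene, RAY_FLAG_NONE, 0xFF, 0, 1, 0, ray, payload);

    gOutput[texel] = float4(payload.color, 1);
}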

Calculating lighting for every GBuffer texel and dilating the result looks horrible, and there are many artifacts:

What have we done? Why is it so bad? Shadows leak, tiny details look like garbage, and smooth surfaces are lit like they are flat-shaded. I’ll go over each of these problems one by one.

Simply rasterizing the UV GBuffer is not enough. Even dilating it is not enough. UV layouts are often imperfect and can contain very small triangles that will be simply skipped or rendered with noticeable aliasing. If a triangle was too small to be drawn, and if you dilate nearby texels over its supposed place, you will get artifacts. This is what happens on the vertical bar of the window frame here.

Instead of post-dilation, we need to use or emulate conservative rasterization. Currently, not all GPUs support real conservative raster (hey, I’m not sponsored by Nvidia, just saying), but there are multiple ways to achieve similar results without it:

  • Repurposing MSAA samples
  • Rendering geometry multiple times with sub-pixel offset
  • Rendering lines over triangles

Repurposing MSAA samples is a fun idea. Just render the UV layout with, say, 8x MSAA, then resolve it without any blur by either using any sample or somehow averaging them. It should give you more “conservative” results, but there is a problem. Unfortunately I don’t have working code of this implementation anymore, but I remember it was pretty bad. Recall the pattern of 8x MSAA:

d3d11_msaapatterns_8_16

Because samples are scattered around, and none of them are in the center, and because we use them to calculate lightmap texel centers, there is a mismatch that produces even more shadow leaking.

Rendering lines is something I thought too late about, but it might work pretty well.

So in the end I decided to do multipass rendering with different sub-pixel offsets. To avoid the aforementioned MSAA problems, we can have a centered sample with zero offset, always rendered last on top of everything else (or you could use a depth/stencil buffer and render it first instead… but I was too lazy to do so, and it’s not perf-critical code). These are the offsets I use (to be multiplied by half-texel size):

float uvOffset[5 * 5 * 2] =
{
    // outer corners
    -2, -2,
    2, -2,
    -2, 2,
    2, 2,

    // outer ring
    -1, -2,
    1, -2,
    -2, -1,
    2, -1,
    -2, 1,
    2, 1,
    -1, 2,
    1, 2,

    // outer axis-aligned offsets
    -2, 0,
    2, 0,
    0, -2,
    0, 2,

    // inner ring
    -1, -1,
    1, -1,
    -1, 0,
    1, 0,
    -1, 1,
    1, 1,
    0, -1,
    0, 1,

    // center (unmodified UV GBuffer, rendered last)
    0, 0
};

Note how larger offsets are used first, then overdrawn by smaller offsets and finally by the unmodified UV GBuffer. This dilates the buffer, reconstructs tiny features and preserves sample centers in the majority of cases. The implementation is still limited, just like MSAA, by the number of samples used, but in practice I found that it handles most real-life cases pretty well.
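
In practice this just means drawing the UV GBuffer geometry 25 times, feeding a different offset into the vertex shader each pass. A minimal sketch (the constant buffer layout is an assumption, and only the position path is shown):

// Per-pass constant: one entry of uvOffset[] multiplied by the half-texel size.
cbuffer PassParams : register(b0)
{
    float2 subPixelOffset; // e.g. uvOffset[pass] * (0.5 / lightmapResolution)
};

float4 UVGBufferVS(float2 lightmapUV : TEXCOORD1) : SV_Position
{
    // Same UV-space projection as before, nudged by the current sub-pixel offset.
    return float4((lightmapUV + subPixelOffset) * 2 - 1, 0, 1);
}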

Here you can see a difference it makes. Artifacts present on thin geometry disappear:

ftbake_multitap

Left: simple rasterization. Right: multi-tap rasterization.


Optimizing UV GBuffer: shadow leaks

To fix remaining artifacts, we will need to tweak GBuffer data. Let’s start with shadow leaks.

Shadow leaks occur because texels are large. We calculate lighting for a single point, but it gets interpolated over a much larger area. Once again, there are multiple ways to fix it. A popular approach is to supersample the lightmap, calculating multiple lighting values inside one texel area and then averaging. It is however quite slow and doesn’t completely solve the problem, only being able to lighten wrong shadows a bit.

leak2

To fix it I instead decided to push sample points out of shadowed areas where leaks can occur. Such spots can be detected and corrected using a simple algorithm:

  • Trace at least 4 tangential rays pointing in different directions from texel center. Ray length = world-space texel size * 0.5.
  • If ray hits a backface, this texel will leak.
  • Push texel center outside using both hit face normal and ray direction.

This method will only fail when you have huge texels and thin double-sided walls, but this is rarely the case. Here is an illustration of how it works:

leakfix

Here is the 4-ray loop. The first ray that hits a backface (red dot) decides the new sample position (blue dot) with the following formula:

newPos = oldPos + rayDir * hitDistance + hitFaceNormal * bias

Note that in this case 2 rays hit backfaces, so potentially the new sample position can change depending on the order of ray hits, but in practice it doesn’t matter. The bias value is an interesting topic by itself and I will cover it later.

I calculate tangential ray directions using a simple cross product with the normal (the face normal, not the interpolated one), which is not completely correct. Ideally you’d want to use the actual surface tangent/binormal based on the lightmap UV direction, but that would require even more data to be stored in the GBuffer. Having rays not aligned to the UV direction can produce some undershoots and overshoots:

rayrotate1

In the end I simply chose to stay with a small overshoot.
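
Putting the pieces together, a sketch of the push-out logic might look like this. TraceSceneBackface is a hypothetical stand-in for the actual ray cast (OptiX in Bakery’s case), and the tangent construction is the simple cross-product version described above:

// Sketch of the sample push-out pass (hypothetical helper and names).
// TraceSceneBackface() stands in for a real ray cast and is assumed to report
// whether a backface was hit, plus the hit distance and face normal.
struct BackfaceHit
{
    bool   hit;
    float  dist;
    float3 faceNormal;
};
BackfaceHit TraceSceneBackface(float3 origin, float3 dir, float maxDist); // assumed

float3 PushSampleOut(float3 pos, float3 faceNormal, float texelSize, float bias)
{
    // Build two tangential axes from the face normal, then 4 ray directions.
    float3 up = abs(faceNormal.y) < 0.99 ? float3(0, 1, 0) : float3(1, 0, 0);
    float3 t  = normalize(cross(up, faceNormal));
    float3 b  = cross(faceNormal, t);
    float3 dirs[4] = { t, -t, b, -b };

    float maxDist = texelSize * 0.5;
    for (int i = 0; i < 4; i++)
    {
        BackfaceHit h = TraceSceneBackface(pos, dirs[i], maxDist);
        if (h.hit)
        {
            // newPos = oldPos + rayDir * hitDistance + hitFaceNormal * bias
            return pos + dirs[i] * h.dist + h.faceNormal * bias;
        }
    }
    return pos; // no nearby backfaces: this texel won't leak
}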

Why should we even care about ray distance, what happens if it’s unlimited? Consider this case:

raydist

Only the 2 shortest rays will properly push the sample out, giving it a similar color to its surroundings. Attempting to push it behind more distant faces will leave the texel incorrectly shadowed.

As mentioned, we need world-space texel size in the UV GBuffer to control ray distance. A handy and cheap way to obtain a sufficiently accurate value is:

float3 dUV1 = max(abs(ddx(IN.worldPos)), abs(ddy(IN.worldPos)));
float dPos = max(max(dUV1.x, dUV1.y), dUV1.z);
dPos = dPos * sqrt(2.0); // convert to diagonal (small overshoot)

Calling ddx/ddy during UV GBuffer rasterization answers the question “how much does this value change across one lightmap texel horizontally/vertically?”, and plugging in world position basically gives us the texel size. I’m simply taking the maximum here, which is not entirely accurate. Ideally you may want 2 separate values for non-square texels, but those are not common for lightmaps, as all automatic unwrappers try to avoid heavy distortion.

Adjusting sample positions is a massive improvement:

And another comparison, before and after:

DZ-lMZsWAAEWEAX


Optimizing UV GBuffer: shadow terminator

image_2018-04-09_21-58-00 (2)

Next thing to address is the wrong self-shadowing of smooth low-poly surfaces. This problem is fairly common in ray-tracing in general and is often referred to as the “shadow terminator problem” (google it). Because rays are traced against actual geometry, they perceive it as faceted, giving faceted shadows. When combined with typical smooth interpolated normals, it looks wrong. “Smooth normals” is a hack everyone has been using since the dawn of time, and to make ray-tracing practical we have to support it.

There is not much literature on that, but so far these are the methods I found being used:

  • Adding constant bias to ray start (used in many offline renderers)
  • Making shadow rays ignore adjacent faces that are almost coplanar (used in 3dsmax in “Advanced Ray-Traced Shadow” mode)
  • Tessellating/smoothing geometry before ray-tracing (mentioned somewhere)
  • Blurring the result (mentioned somewhere)
  • A mysterious getSmoothP function in Houdini (“Returns modified surface position based on a smoothing function”)

Constant bias just sucks, similarly to shadow-mapping bias. It’s a balance between wrong self-shadowing and peter-panning:

GUID-68EC3CB9-1820-47B0-8451-E8E8E75965BB
Shadow ray bias demonstration from 3dsmax documentation

The coplanar idea requires adjacent-face data, and just doesn’t work well. Tweaking the value for one spot breaks the shadow on another:

photo_2018-04-08_22-01-01

Blurring the result will mess with desired shadow width and proper lighting gradients, so it’s a bad idea. Tessellating real geometry is fun but slow.

What really got my brain ticking was the Houdini function. What is a smooth position? Can we place samples as if they were on a round object, not a faceted one? Turns out we can. Meet Phong Tessellation (again):

phongtess.png

It is fast, it doesn’t require knowledge of adjacent faces, and it makes up a plausible smooth position based on the smooth normal. It’s just what we need.

Instead of actually tessellating anything, we can compute the modified position at the fragment level while drawing the GBuffer. A geometry shader can be used to pass the 3 vertex positions/normals together with barycentric coordinates down to the pixel shader, where the Phong Tessellation code is executed.
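
The Phong Tessellation code shown a bit further down relies on three helpers (projectToTangentPlane, triLerp, pointOnPlane) that aren’t listed in the post. Here is a minimal sketch of what they might look like, judging by their names and the standard Phong Tessellation math:

// Assumed implementations of the helpers used by the Phong Tessellation code.

// Project point p onto the plane passing through vertex v with normal n.
// This is the core operation of Phong Tessellation.
float3 projectToTangentPlane(float3 p, float3 v, float3 n)
{
    return p - dot(p - v, n) * n;
}

// Barycentric interpolation of the three projected positions.
float3 triLerp(float3 bary, float3 a, float3 b, float3 c)
{
    return a * bary.x + b * bary.y + c * bary.z;
}

// Signed distance of a point to the plane defined by a normal and a point on it.
// Negative means the point went behind the face (inside the surface).
float pointOnPlane(float3 p, float3 planeNormal, float3 planePoint)
{
    return dot(p - planePoint, planeNormal);
}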

samples

Note that it should only be applied to “convex” triangles, with normals pointing outwards. Triangles with inwards-pointing normals don’t exhibit the problem anyway, and we never want sample points to go inside the surface. A plane equation, with the plane constructed from the face normal and a point on the face, is a good test to determine whether you got the modified position right, and to at least flatten fragments that go the wrong way.

Simply using the rounded “smooth” position gets us here:

image_2018-04-09_22-28-30 (2)

Almost nice 🙂 But what is this little seam on the left?

Sometimes there are weird triangles with 2 normals pointing out and one in (or the other way around), making some samples go under the face. It’s a shame, because they could produce an almost meaningful extruded position, but instead they go inside and we have to flatten them.

smnormal

To improve such cases I try transforming the normals into the triangle’s local space, with one axis aligned to an edge, flattening them in one direction, transforming back and seeing if the situation improves. There are probably better ways to do that. The code I wrote for it is terribly inefficient and was the result of quick experimentation, but it gets the job done, and we only need to execute it once before any lightmap rendering:

	// phong tessellation
	float3 projA = projectToTangentPlane(IN.worldPos, IN.worldPosA, IN.NormalA);
	float3 projB = projectToTangentPlane(IN.worldPos, IN.worldPosB, IN.NormalB);
	float3 projC = projectToTangentPlane(IN.worldPos, IN.worldPosC, IN.NormalC);
	float3 smoothPos = triLerp(IN.Barycentric, projA, projB, projC);

	// only push positions away, not inside
	float planeDist = pointOnPlane(smoothPos, IN.FaceNormal, IN.worldPos);
	if (planeDist < 0.0f)
	{
		// default smooth triangle is inside - try flattening normals in one dimension

		// AB
		float3 edge = normalize(IN.worldPosA - IN.worldPosB);
		float3x3 edgePlaneMatrix = float3x3(edge, IN.FaceNormal, cross(edge, IN.FaceNormal));
		float3 normalA = mul(edgePlaneMatrix, IN.NormalA);
		float3 normalB = mul(edgePlaneMatrix, IN.NormalB);
		float3 normalC = mul(edgePlaneMatrix, IN.NormalC);
		normalA.z = 0;
		normalB.z = 0;
		normalC.z = 0;
		normalA = mul(normalA, edgePlaneMatrix);
		normalB = mul(normalB, edgePlaneMatrix);
		normalC = mul(normalC, edgePlaneMatrix);
		projA = projectToTangentPlane(IN.worldPos, IN.worldPosA, normalA);
		projB = projectToTangentPlane(IN.worldPos, IN.worldPosB, normalB);
		projC = projectToTangentPlane(IN.worldPos, IN.worldPosC, normalC);
		smoothPos = triLerp(IN.Barycentric, projA, projB, projC);

		planeDist = pointOnPlane(smoothPos, IN.FaceNormal, IN.worldPos);
		if (planeDist < 0.0f)
		{
			// BC
			edge = normalize(IN.worldPosB - IN.worldPosC);
			edgePlaneMatrix = float3x3(edge, IN.FaceNormal, cross(edge, IN.FaceNormal));
			float3 normalA = mul(edgePlaneMatrix, IN.NormalA);
			float3 normalB = mul(edgePlaneMatrix, IN.NormalB);
			float3 normalC = mul(edgePlaneMatrix, IN.NormalC);
			normalA.z = 0;
			normalB.z = 0;
			normalC.z = 0;
			normalA = mul(normalA, edgePlaneMatrix);
			normalB = mul(normalB, edgePlaneMatrix);
			normalC = mul(normalC, edgePlaneMatrix);
			projA = projectToTangentPlane(IN.worldPos, IN.worldPosA, normalA);
			projB = projectToTangentPlane(IN.worldPos, IN.worldPosB, normalB);
			projC = projectToTangentPlane(IN.worldPos, IN.worldPosC, normalC);
			smoothPos = triLerp(IN.Barycentric, projA, projB, projC);

			planeDist = pointOnPlane(smoothPos, IN.FaceNormal, IN.worldPos);
			if (planeDist < 0.0f)
			{
				// CA
				edge = normalize(IN.worldPosC - IN.worldPosA);
				edgePlaneMatrix = float3x3(edge, IN.FaceNormal, cross(edge, IN.FaceNormal));
				float3 normalA = mul(edgePlaneMatrix, IN.NormalA);
				float3 normalB = mul(edgePlaneMatrix, IN.NormalB);
				float3 normalC = mul(edgePlaneMatrix, IN.NormalC);
				normalA.z = 0;
				normalB.z = 0;
				normalC.z = 0;
				normalA = mul(normalA, edgePlaneMatrix);
				normalB = mul(normalB, edgePlaneMatrix);
				normalC = mul(normalC, edgePlaneMatrix);
				projA = projectToTangentPlane(IN.worldPos, IN.worldPosA, normalA);
				projB = projectToTangentPlane(IN.worldPos, IN.worldPosB, normalB);
				projC = projectToTangentPlane(IN.worldPos, IN.worldPosC, normalC);
				smoothPos = triLerp(IN.Barycentric, projA, projB, projC);

				planeDist = pointOnPlane(smoothPos, IN.FaceNormal, IN.worldPos);
				if (planeDist < 0.0f)
				{
					// Flat
					smoothPos = IN.worldPos;
				}
			}
		}
	}

Most of these matrix multiplies could be replaced by something cheaper, but anyway, here’s the result:

image_2018-04-12_14-01-36 (2)

The seam is gone. The shadow still has a somewhat weird shape, but in fact it looks exactly like that even with classic shadowmapping, so I call it a day.

However, there is another problem. An obvious consequence of moving sample positions too far from the surface is that they now can go inside another surface!

sampleintersect
Meshes don’t intersect, but sample points of one object penetrate into another

Turns out, the smooth position alone is not enough; this problem can’t be entirely solved with it. So on top of that I execute the following algorithm:

  • Trace a ray from flat position to smooth position
  • If there is an obstacle, use flat, otherwise smooth

In practice this gives us weird per-texel discontinuities when the same triangle is partially smooth and partially flat. We can improve the algorithm further and also cut the number of rays traced:

  • Create an array with 1 bit per triangle (or a byte to make things easier).
  • For every texel:
    • If triangle bit is 0, trace one ray from real position to smooth position.
      • If there is an obstacle, set triangle bit to 1.
    • position = triangleBitSet ? flatPos : smoothPos

That means you also need triangle IDs in the GBuffer, and I output one into alpha of the smooth position texture using SV_PrimitiveID.

To test the approach, I used the Alyx Vance model, as it’s an extremely hard case for ray-tracing due to a heavy mismatch between the low-poly geometry and the interpolated normals. Here you can see the triangles marked by the algorithm above:

image_2018-04-12_00-58-06 (2).png

Note how there are especially many triangles marked in the area of belt/body intersection, where both surfaces are smooth-shaded. And the result:

image_2018-04-10_12-11-24 (2).png
Left: artifacts using smooth position. Right: fixed by using flat position in marked areas.

Final look:

alyxlm.png

I consider it a success. There are still a couple of weird-looking spots where the normals are just way too off, but I don’t believe it can be improved further without breaking self-shadowing, so this is where I stopped.

I expect more attention to this problem in the future, as real-time ray-tracing is getting bigger, and games can’t just apply real tessellation to everything like in offline rendering.


Ray bias

Previously I mentioned a “bias” value in the context of a tiny position offset, also referred to as epsilon. Such an offset is often needed in ray-tracing. In fact, every time you cast a ray from a surface, you usually have to offset the origin a tiny bit to prevent noisy self-intersection due to floating-point inaccuracy. Quite often (e.g. in OptiX samples or small demos) this value is hard-coded to something like 0.0001. But because of floats being floats, the further an object gets from the world origin, the less accuracy we get for its coordinates. At some point a constant bias will start to jitter and break. You can fix it by simply increasing the value to 0.01 or something, but the more you increase it, the less accurate all rays get everywhere. We can’t fix it completely, but we can compute an adaptive bias that’s always “just enough”.

image_2018-06-07_21-21-07.png
At first I thought my GPU is fried

The image above was from the first time the lightmapper was tested on a relatively large map. It was fine near the world origin, but the further you moved, the worse it got. After I realized why it happens, I spent a considerable amount of time researching, reading papers, testing solutions, and thinking of porting nextafterf() to CUDA.

But then my genius friend Boris comes in and says:

position += position * 0.0000002

Wait, is that it? Turns out… yes, it works surprisingly well. In fact 0.0000002 is a rounded version of FLT_EPSILON. When doing the same thing with FLT_EPSILON, the values are sometimes exactly identical to what nextafterf() gives and sometimes slightly larger, but nevertheless it looks like a fairly good and cheap approximation. The rounded value was chosen due to better precision reported on some GPUs.

In case we need to add a small bias in the desired direction, this trick can be expanded into:

position += sign(normal) * abs(position * 0.0000002)
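
Wrapped into a tiny helper, the trick might look like this (a sketch; the function and parameter names are mine):

// Adaptive ray-origin offset: scales with the magnitude of the position,
// approximating nextafterf(). Direct translation of the formulas above.
float3 OffsetRayOrigin(float3 position, float3 normal)
{
    const float eps = 0.0000002; // rounded FLT_EPSILON
    return position + sign(normal) * abs(position * eps);
}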

city3.png


Fixing UV seams

UV seams are a known problem, and a widely accepted solution was published in “The Lighting Technology of The Last of Us”. The proposed idea was to use least squares to make texels from different sides of a seam match. But it’s slow, it’s hard to run on the GPU, and also I’m bad at maths. In general it felt like making colors match is a simple enough problem to solve without such complicated methods. What I went for was:

  • [CPU] Find seams and create a line vertex buffer with them. Set line UVs to those from another side of the seam.
  • [GPU] Draw lines on top of the lightmap. Do it multiple times with alpha blending, until both sides match.

To find seams:

  • Collect all edges. An edge is a pair of vertex indices. Sort edge indices so their order is always the same (if there are AB and BA, they both become AB), as it will speed up comparisons.
  • Put the first vertex of each edge into some kind of acceleration structure to perform quick neighbour searches. Brute-force search can be terribly slow. I rolled out a poor man’s Sweep and Prune by only sorting along one axis, but even that gave a significant performance boost.
  • Take an edge and test its first vertex against first vertices of its neighbours:
    • Position difference < epsilon
    • Normal difference < epsilon
    • UV difference > epsilon
    • If so, perform same tests on second vertices
    • Now also check if edge UVs share a line segment
    • If they don’t, this is clearly a seam

A naive approach would be to just compare edge vertices directly, but because I don’t trust floats, and geometry can be imperfect, difference tests are more reliable. Checking for a shared line segment is an interesting one, and it wasn’t initially obvious: it covers cases where you have 2 adjacent rectangles in the UV layout, but their vertices don’t meet.

After the seams are found, and the line buffer is created, you can just ping-pong 2 render targets, leaking some opposite side color with every pass. I’m also using the same “conservative” trick as mentioned in the first chapter.
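
A hedged sketch of what the seam-blending pixel shader might look like, assuming each seam line vertex carries the UV of the matching point on the opposite side of the seam (names, bindings and the blend factor are assumptions):

// Seam blending sketch: drawn as lines over the lightmap with alpha blending
// enabled on the pipeline; each ping-pong pass pulls both sides closer together.
Texture2D    lightmap   : register(t0);
SamplerState pointClamp : register(s0); // nearest-neighbour, no mips

struct VSOut
{
    float4 pos         : SV_Position;
    float2 otherSideUV : TEXCOORD0; // UV of the same point on the other side of the seam
};

float4 SeamBlendPS(VSOut IN) : SV_Target
{
    // Leak a fraction of the opposite side's colour into this side.
    float3 other = lightmap.Sample(pointClamp, IN.otherSideUV).rgb;
    return float4(other, 0.5); // 0.5 is an assumed blend weight
}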

Because mip-mapping can’t be used when reading the lightmap (to avoid picking up wrong colors), a problem can arise if 2 edges from the same seam have vastly different sizes. In this case we’ll have to rely on the nearest neighbour texture fetch potentially skipping some texels, but in practice I never noticed it being an issue.

Results:

Even given some imperfections, I think the quality is quite good for most cases, and the method is simple to understand and implement compared to least squares. Also, we only process pixels where they belong, on the GPU.


Final touches

  • Denoising. A good denoiser can save a load of time. Being OK with an Nvidia-only solution, I was lucky to use the OptiX AI denoiser, and it’s incredible. My own knowledge of denoising is limited to bilateral blur, but this is just next level. It makes it possible to render with a modest number of samples and fix it up. Lightmaps are also a better candidate for machine-learned denoising than final frames, as we don’t care about messing up texture detail and unrelated effects, only lighting.

    A few notes:
  • Denoising must happen before UV seam fixing.
  • Previously, the OptiX denoiser was only trained on non-HDR data (although its input/output is float3). I heard that’s not the case today, but even with this limitation it’s still very usable. The trick is to use a reversible tonemapping operator; here is a great article by Timothy Lottes (tonemap -> denoise -> inverseTonemap).
  • Bicubic interpolation. If you are not shipping on mobile, there are exactly 0 reasons not to use bicubic interpolation for lightmaps. Many UE3 games in the past did it, and it is a great trick. But some engines (Unity, I’m looking at you) still think they can get away with a single bilinear tap in 2018. Bicubic hides low resolution and makes jagged lines appear smooth. Sometimes I see people fixing jagged sharp shadows by super-sampling during the bake, but that feels like a waste of lightmapping time to me.
    bilinearVsBicubic.jpg
    Left: bilinear. Right: bicubic.
    Bakery comes with a shader patch enabling bicubic for all shaders. There are many known implementations; here is one in CUDA (tex2DFastBicubic) for example. You’ll need 4 taps and a pinch of maths (see the sketch after this list).
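
For reference, a common 4-tap cubic B-spline formulation looks roughly like this in HLSL. This is a generic sketch leaning on the bilinear hardware filter, not Bakery’s actual patch; ‘lightmap’, ‘linearClamp’ and ‘texSize’ are assumptions:

// 4-tap bicubic (cubic B-spline) lightmap sampling using 4 bilinear fetches.
float4 SampleLightmapBicubic(Texture2D lightmap, SamplerState linearClamp,
                             float2 uv, float2 texSize)
{
    float2 pos = uv * texSize - 0.5;
    float2 f   = frac(pos);
    float2 p   = floor(pos);

    // Cubic B-spline weights for the 4 involved texels per axis
    float2 f2 = f * f;
    float2 f3 = f2 * f;
    float2 w0 = (1.0 / 6.0) * (-f3 + 3.0 * f2 - 3.0 * f + 1.0);
    float2 w1 = (1.0 / 6.0) * (3.0 * f3 - 6.0 * f2 + 4.0);
    float2 w2 = (1.0 / 6.0) * (-3.0 * f3 + 3.0 * f2 + 3.0 * f + 1.0);
    float2 w3 = (1.0 / 6.0) * f3;

    // Collapse the 16 texels into 4 bilinear fetches
    float2 g0 = w0 + w1;
    float2 g1 = w2 + w3;
    float2 h0 = p - 1.0 + w1 / g0;
    float2 h1 = p + 1.0 + w3 / g1;

    float2 uv0 = (h0 + 0.5) / texSize;
    float2 uv1 = (h1 + 0.5) / texSize;

    float4 tex00 = lightmap.Sample(linearClamp, float2(uv0.x, uv0.y));
    float4 tex10 = lightmap.Sample(linearClamp, float2(uv1.x, uv0.y));
    float4 tex01 = lightmap.Sample(linearClamp, float2(uv0.x, uv1.y));
    float4 tex11 = lightmap.Sample(linearClamp, float2(uv1.x, uv1.y));

    return g0.y * (g0.x * tex00 + g1.x * tex10) +
           g1.y * (g0.x * tex01 + g1.x * tex11);
}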

Wrapping it up, the complete algorithm is:

  • Draw a UV GBuffer using (pseudo) conservative rasterization. It should at least have:
    • Flat position
    • Smooth position obtained by Phong Tessellation adapted to only produce convex surfaces
    • Normal
    • World-space texel size
    • Triangle ID
  • Select smooth or flat position per-triangle.
  • Push positions outside of closed surfaces.
  • Compute lighting for every position. Use adaptive bias.
  • Dilate
  • Denoise
  • Fix UV seams
  • Use bicubic interpolation to read the lightmap.

Final result:

ftfinalbake.png

Nice and clean


Bonus: mip-mapping lightmaps

In general, lightmaps are best left without mip-mapping. Since a well-packed UV layout contains an awful lot of detail sitting close to each other, simply downsampling the texture makes lighting leak over neighbouring UV charts.

But sometimes you may really need mips, as they are useful not only for per-pixel mip-mapping, but also for LODs to save memory. That’s the case I had: the lightmapper itself sometimes has to load all previously rendered scene data, and to not run out of VRAM, distant lightmaps must be LODed.

To solve this problem we can generate a special mip-friendly UV layout, or repack an existing one, based on the knowledge of the lowest required resolution. Suppose we want to downsample the lightmap to 128×128:

  • Trace 4 rays, like in the leak-fixing pass, but with a 1/128 texel size. Set a bit for every texel we’d need to push out.
  • Find all UV charts.
  • Pack them recursively as AABBs, but snap AABB vertices to texel centers of the lowest-resolution mip wanted. In this case we must snap UVs to ceil(uv*128)/128. Never let any AABB be smaller than one lowest-res texel (1/128).
  • Render using this layout, or transfer texels from the original layout to this one after rendering.
  • Generate mips.
  • Use previously traced bitmask to clear marked texels and instead dilate their neighbours inside.

Because of such packing, UV charts get downsampled individually, but won’t leak over other charts, so it works much better, at least for a nearest-neighbour lookup. The whole point of tracing the bitmask and dilating these spots inwards is to minimize shadow leaking in mips, while not affecting the original texture.

It’s not ideal, and it doesn’t work for bilinear/bicubic, but it was enough for my purposes. Unfortunately to support bilinear/bicubic sampling, we would need to add (1/lowestRes) empty space around all UV charts, and it might be too wasteful. Another limitation of this approach is that your UV chart count must be less than lowestMipWidth * lowestMipHeight.


P.S.

Top ways to annoy me:

  • Don’t use gamma correct rendering.
  • Ask “will it support real-time GI?”
  • Complain about baking being slow for your 100×100 kilometers world.
  • Tell me how lightmaps are only needed for mobile, and we’re totally in the high quality real-time GI age now.
  • Say “gigarays” one more time.

Post-Russia

A week ago I left Russia. Less because of sanctions and more because of the insanity that is happening within. Since the war started I have been looking at my news feed with horror and disbelief, my arms and legs shaking; I barely slept or ate.

I think it’s important to share my thoughts right now, because silence is damaging. Russian society is split. There are people who don’t want to step out of their comfort zone and who want to pretend that nothing horrible is happening; for them it’s impossible to imagine that “we” can do anything bad; surely all of these accusations are “fake”. There are aggressive people with little empathy who look at wars as if they were some kind of sport. There are also reasonable, intellectual people who are against it. And inside this latter group there are people who don’t believe they can change anything and those who believe they can. I have huge respect for people who don’t give up. But who am I? I’m a coward. I went to a protest, I saw the aggressive cops, and now I don’t know how to change anything. I’m not a fighter or a diplomat, I have different skills. I must continue working, that’s the only thing I know how to do. Maybe I can also write something to show that people like me exist.

How did we end up here?

When the people in charge of our country violently and treacherously attack our closest neighbour, Ukraine, the country where a huge proportion of us have our own relatives and friends. When the official propaganda goes full Goebbels, fueling the hate between us. When some abstract, virtual lines on the map cost real human lives.

This war doesn’t even have a clear goal, it doesn’t make sense. What do they need? Why?

Russia and Ukraine grew side by side, our culture and history is shared, we are the same people. But I respect Ukraine’s right to live their own way in their own country, like I respect my neighbour’s right to live in their own apartment. My friends from Ukraine are not “neo-nazists”, “banderas” or whatever, they are just people who want to live their life normally, without having to hide from bombs.

This event has made the words “Russia” and “Russian” carry a negative meaning, and it makes me very sad. Russia is not Putin. Nor is it Brezhnev, Stalin, Lenin, Nicholas II, Peter I… through the centuries, Russian culture evolved, often not with the help of governments, but against their will. Like plants growing through concrete, true artists always emerged above it all. My Russia is the country of Chehov, Turgenev, Bunin, Kuprin… Andrey Tarkovsky… Boris Grebenshikov, Andrey Makarevich, Yuri Shevchuk… and finally my parents and friends, people who taught me to seek beauty and truth, unlike the politicians, seeding their lies and hate.

The conflict is very old. In the late 1800s, Russian students regularly went to protests against pointless imperialistic wars and bad political decisions (as described in “Moscow and Muscovites” by V. Gilyarovsky). They were met with violence, arrested, some sent to serve in the army. In the summer of 1971, Moscow hippies went to a protest against the Vietnam war (even though that war was led by the U.S.) and were raided by the KGB, arrested, some sent to the army. My parents were hippies, and during that same time, Putin was in the KGB. We were on opposing sides from the start.

The cynicism with which our government has used our victory day, the victory against fascism, is horrible. What was once a day of sorrow, a day to remember all of those gone, the losses of people who had no choice, became a day of military pride, a day of showing off their new flashy vehicles of killing. Some car owners started buying stickers saying “1941-1945. We can repeat”, as if it were some football game victory, not a tragedy where the USSR lost millions of lives. They still sell these stickers.

The only thing that we learn from history is that we learn nothing from history.

The propaganda is working. Many people are too easy to persuade, especially when they don’t read any alternative sources. The most insane story I know: my friend’s mother lives in Moscow and her sisters live in Kiev. They were on a phone call. The Kiev sisters told her about the bombings, but she didn’t believe them. She had more trust in the TV. Surely the TV cannot lie.

But how can they not see? Young girls with “No war” banners are being beaten and captured by cops. Top propaganda mouthpieces brag about their ability to nuke all other countries, or about how they could bring back capital punishment. This is some cartoonish, over-the-top level of villainy; it’s almost like a badly directed movie with no “shades of gray” where the “baddies” are too obvious. There is nothing to justify it. Nothing previously happening inside Ukraine can justify this war.

With all our evolution, humanity is still a monkey with a grenade. Until our selfishness is gone, until we evolve beyond tribal thinking and learn some universal empathy, our future will be bleak. Religions, capitalism and socialism are all prone to corruption; they were always bent to serve someone’s selfish will. In this world, no system can save us. Systems are designed and run by people. Saying “just be a good person” sounds dumb, but it’s what we need to do. Live and respect others’ right to live. How hard can it be?

Reflections on game design after playing RDR2

Through the course of life I have tried to explain to myself what is, at least in my opinion, the most important thing about art: what the difference is between things that affect me deeply and those providing only temporary amusement or the lack thereof. Having read/watched/listened/played many things, I concluded that a (subjectively) great book/movie/song/game makes me feel like I’m seeing, through the surface, into the essence of life itself; when, through a combination of images, sounds and thoughts, I feel like I can see some universal truth; which, put into a formal, straightforward sentence, would likely look banal, but being indirectly constructed in your mind by observing, hits hard; it has a funny resemblance to what some of my friends trying drugs reported experiencing – as one of them said, “like a spotlight shining into the dusty closet of your brain”. Catharsis might be the right word.

It is understandable that even great pieces may not trigger these feelings when your mind isn’t ready for them, so this is subjective. But I believe that this potential-to-trigger still can be seen and felt, making it possible to separate the value of the piece from your own emotional state.

Another, highly subjective conclusion I made, is that, having experienced said feelings, the purpose of your own existence becomes slightly clearer. You may not think of it this way – but sometimes, the effect of being simply reminded of things dear to your heart, of the never-changing nature of human relations, situations, the world itself and even just its harmony we can relate to, no matter who, when and where we are, can be staggering; author’s creativity being the tool to pierce through the superficial and boost the feelings; not stating their position directly, but letting us observe and feel it on our own.

Playing RDR2 gave me the most mixed of possibly mixed feelings. It often looks great and believable. The characters are pretty interesting and memorable, and the differences between their psychology, motivations and values play off each other well.

The first and biggest problem to me is ludo-narrative dissonance, the term now so well known it even has a Wikipedia page! While the gameplay doesn’t directly conflict with the story, their conventions do: the story is full of well-written characters who protect their lives, and the gameplay is full of hundreds of nameless enemies whose only purpose is to get shot by the protagonist and his friends. Adding to this dissonance, the act of shooting a dozen nameless enemies is often used to evolve relations between story characters, who often refer to the process as “fun” (which is understandable on the gameplay side, given that they usually need 1-2 shots, while story characters can take a whole lot and don’t care).

But the problems go beyond that. Having immersed myself in a few days of gameplay, I started to see the game world as a whole – and it’s not encouraging. In a sense, the world of RDR2 is hell: there is nothing worthwhile to achieve; the player has very minimal choice; the essence of the RDR2 world is the process of killing and suffering. There is literally nothing in the game that feels like something worth living for – everything is on a scale from hopeless to hostile, which is a very Rockstar kind of view, shared with GTA and Max Payne 3 as well. The view that portrays everything as kind of pathetic, wrong or worth making fun of. While I have nothing against satire or tragedies, this view bothers me, being not entirely satirical, not entirely realistic and not entirely tragic, but only reinforcing the feeling of hopelessness, unbeatable wrongness and dissonance. It does not capture any truth of life beyond some interesting character dialogues and scripted events, but these, alas, only work as surface detail against the overall picture of the game experience.

When it comes to storytelling, games have drastically different strong and weak points compared to other mediums, and I tend to believe that movie-like approaches simply do not work in games. Cutscenes, long scripted dialogues and fancy scripted events don’t explore the possibilities of the medium and, on top of that, are ridiculously expensive to execute with high quality. Every time I hear “cinematic”, I cringe. The truly unique point of games is the player’s ability to create their own stories within the framework of the game. This is something no other medium can do. While it is possible for a game to tell a specific story with scripted branches and do well at it, I just don’t think it’s the best thing the medium is capable of. And RDR2 fails on both fronts: spending an insane budget to deliver a mostly linear story spoiled by dissonance, and not giving the player much control beyond weapon/horse/beard/quest order selection.

Dialogues, characters and events can support the narrative, but their ultimate purpose, in my opinion, is to support the expression of the author’s feeling of the world, to hint at what it is they live for; I don’t think that a deeply apathetic person can come up with a good story, even if they master their craft and manage to produce technically impressive things. Be it the apathy, the stress of a gigantic production machine, the risks or something else, I see modern “AAA” games step on the same rake year after year. In fact, neither dialogues voiced by Hollywood actors nor scripted events are necessary for a game to convey a story: INSIDE did an awesome job with no words at all; Minecraft does not have a “story” per se, but it has a very definite feeling of the world as an infinitely changeable playground, which is empowering and lets players try their own stories in it.

I still think that the game industry is in its infancy, like the Lumiere brothers’ era of cinema, but it feels like games will take a lot longer to evolve.

After the Flood

“After the Flood” is a WebGL 2.0 demo I worked on for PlayCanvas and Mozilla.

It features procedural clouds, water ripple generation, transform feedback particles and simple tree motion simulation.

It’s not as polished as I wanted it to be, though.

Here’s the post from Mozilla: https://hacks.mozilla.org/2017/01/webgl-2-lands-in-firefox/

And another: https://blog.mozilla.org/blog/2017/01/24/gets-better-video-gaming-non-secure-web-warning/

And from PlayCanvas: https://blog.playcanvas.com/mozilla-launches-webgl-2-with-playcanvas/

wgs0.jpg

wgyyy.jpg

wgsfl.jpg

Also, water ripple shader: https://www.shadertoy.com/view/lltXD4

The first music track is “System” by Carbon Based Lifeforms.

The 2nd track (after the phone booth) was composed by Anton Krivosejenko.

The demo was shown at GDC and 3DWebFest.

Why games

I was recently talking to a friend, listing the reasons why I’m orbiting around the game industry, and decided to make a post out of it.

While I’m not truly an accomplished game developer, meaning I haven’t shipped a finished game, I still exist in this world, making engines, playable demos, prototypes and similar things. I respect this medium and defend it, sometimes even too aggressively.

I’ve seen different stances towards games. I know a lot of people who say they “grew out” of games and now have to do their Important Adult Things (like hanging around on social networks for hours and drinking). I know game artists who don’t care about game/movie differences; as one of my friends used to say, “both are just media content”. This is certainly not my position.

I know and have experienced things in games that no other medium can produce, and I find that quite fascinating, and I still think the industry is young and what we see today is far from what it can become. If only people would experiment more and copy successful products less…

Anyway, here’s the list. Perhaps I will update it occasionally. Also, note that not every game has these features; they just happen sometimes, in some games.

  • Here and Now. It’s hard to describe, but only in games (mostly 1st/3rd person) can I feel that things are happening right now, and weren’t prerecorded. You can just stop following the plot and observe the environment, noticing tiny details, seeing smoke/trees/clouds/etc. slowly moving. More realistic games can even provoke smell/temperature associations in my brain. You can just walk around for hours, enjoying the day, without the story rapidly moving you along some narrow corridor. It sounds like it can only happen in open-world games, but I remember feeling this even in HL2, where I could just stand and stare at the sea in some sort of trance, thinking of this world. For me it feels very different from observing prerecorded videos. There’s spatial continuity to my movement, and there’s actually me, or at least some avatar of me, that reacts immediately to my thoughts, translated through a controller, which I don’t even notice after getting used to it. The great part of this feature is that even when the player freely moves around, not caring about plot and gameplay, they still read the story through environment observation.
  • Consequence. Only in games can you have a choice. And if you agree with the choice you’ve made, it can feel very personal to you (on the other hand, when all options are crap you wouldn’t choose, it’s quite annoying and breaks the experience). Then, when the game shows you a consequence of your decision, you take it more seriously compared to a static narrative. Only games can make you feel guilty, which in turn leads you to review your own decisions and what made you select this option (and this can expand to your real-life decision-making).
  • What If. The more complex the mechanics of the game, the more creative freedom there is. You can exploit stuff, experiment, try different combinations of options and see how it goes. This is simply pleasing to the brain, and an important aspect of “fun”. It also makes your walkthrough much more personal, and it creates memorable moments (compared again to a static narrative).
  • Situation models. Sometimes in games you find yourself in a situation you could end up in, but haven’t yet. It’s an interesting exercise to try playing it out and see the result. One of my favorite examples is Morrowind: you have a bunch of things to do, you need to find some places you’ve never been to (there are no markers you can just run straight to, unlike in the next TES games), and I also had a mod that added hunger/thirst/the need to sleep to it. Now manage it! The situation is quite similar to what I later experienced in life, and this past in-game experience made me more confident that I can cope with a lot of things without being overwhelmed.
  • Simply technical awe. Not all people experience it, but I simply love seeing how game tech advances, new techniques being used, new cool effects made possible. That may be just my nerdiness, but I’m amazed realizing that the beautiful things I see are rendered right now on my GPU, faster than my eyes blink – how is this even possible?!

I’m sure there are more reasons, and I could have forgotten something, but it’s a start. You can suggest something you like 🙂

Rendering painted world in JG

Here’s a little breakdown and implementation details of the real-time painted world in my last demo – JG.

Here’s the video

Here’s the demo

 


The “painted” effect wasn’t planned. Originally I only had an idea to render natural scenery of a certain kind, and I wasn’t ready to spend a whole lot of time on it. It became clear to me that a “realistic” approach wouldn’t work, resulting in either very mediocre visuals (due to engine limitations and the complexity of real-time vegetation modeling) or a whole year of trying to catch up with Crysis. So it wasn’t the way.

What I really wanted was to preserve the atmosphere, the feeling, avoiding ruining it with technical limitations.

So I have to render something very complex without killing myself and players’ computers – what do I do? Intuition said: “bake everything”. I recalled seeing outdoor 3D scans: even with bad geometry (or even as point clouds), they still looked quite convincing, thanks to the right colors being in the right places, with all the nice and filtered real-life lighting already integrated into everything. Unfortunately, the time of year was absolutely the opposite of the desired one, so I wasn’t able to try my mad photogrammetry skills.
But what if we “scan” a realistic offline 3D scene? Vue surfaced in my memory as something that movie/exterior visualization folks use to produce nice renderings of nature. I had no idea what to expect from it, but I tried.

I took a sample scene, rendered it from several viewpoints and put those into Agisoft Photoscan to reconstruct some approximate geometry with baked lighting. And… alas, no luck. The complex vegetation structure and anti-aliasing weren’t the best traits for shape reconstruction.
Then it hit me. What does Agisoft do? It generates depth maps, then a point cloud out of multiple depths. But I can render a depth map right in Vue, so why do I need to reconstruct it?

Being familiar with deferred rendering and depth-to-position conversion, I was able to create a point cloud out of the Vue renderings. Not quite easily, though: Vue’s depth appeared to have some non-conventional encoding. Luckily, I finally found an answer to it.

And from this:

paintprocess1

With some MaxScript magic, we get this:

paintprocess2

Which is a solid single textured mesh.

The hard part is over; now I only needed to repeat the process until I got a relatively hole-free scene. Finally it’s time to have some fun with shaders 🙂

Each projected pixel acts as a camera-facing quad, textured with one of these stroke textures:

daubs

Almost. There was a bug in my atlas-reading code, so some quads only had a fraction of a stroke on them. However, it actually looked better than the intended version, so I left the bug in. It’s now a feature 🙂

Quad size obviously depends on depth, becoming larger with distance. It was quite important not to mix small and large quads together, so I had to choose viewpoints carefully.
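
A sketch of how such camera-facing stroke quads can be expanded in a vertex shader. This is not the demo’s actual code; the names, the way corners are encoded and the distance-based scale are all assumptions:

// Expand each point of the baked cloud into a camera-facing quad.
// 'corner' is a per-vertex offset in {-1,1}^2, four vertices per point.
float4x4 ViewProj;
float3   CameraRight;   // world-space camera right vector
float3   CameraUp;      // world-space camera up vector
float3   CameraPos;
float    StrokeScale;   // base quad size

struct VSIn
{
    float3 center : POSITION;   // point from the baked cloud
    float2 corner : TEXCOORD0;  // (-1,-1), (1,-1), (-1,1) or (1,1)
};

struct VSOut
{
    float4 pos : SV_Position;
    float2 uv  : TEXCOORD0;     // used to fetch a stroke from the atlas
};

VSOut StrokeQuadVS(VSIn IN)
{
    VSOut OUT;
    // Larger quads in the distance to compensate for point sparsity.
    float dist = length(IN.center - CameraPos);
    float size = StrokeScale * dist;

    float3 worldPos = IN.center
                    + CameraRight * (IN.corner.x * size)
                    + CameraUp    * (IN.corner.y * size);

    OUT.pos = mul(float4(worldPos, 1), ViewProj);
    OUT.uv  = IN.corner * 0.5 + 0.5;
    return OUT;
}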

Test scene looked promising, so I started to work on the one I wanted:

pond_v2

I made the house, fence and terrain from scratch. Plants were taken from various existing packs. Then I assembled the final composition out of this stuff. I lost count of the number of renderings I had to do to cover all the playable area:

sdf5

Some had to be photoshopped a little to get rid of dark spots and to add more colors:

pond_300_fill

At first I had trouble getting the lighting right, so I had a lot of these black spots to fix; later I managed to tune it better. The final scene is actually a mix of different approaches, because I didn’t have the time to re-render everything with different settings, and because it looked less monotonous that way.

Some early screenshots:

At this moment I also had stroke direction set up properly, which was pretty important, as uniform strokes looked very unnatural. At first, I tried to generate stroke direction procedurally (similar to how you generate a normal map from a height map), but it wasn’t sufficient. It was obvious to me how some strokes must lie; for example, I really wanted vertical strokes for the grass, and fence strokes following the shape of the fence. Not being able to direct it with a purely procedural approach, I simply decided to manually paint stroke directions in additional textures. The final version uses manual direction near the camera and procedural direction for distant quads. Here are some examples of direction maps:

pond4_dir_hi

To be honest, painting vectors with colors in Photoshop wasn’t the most exciting thing to do, but still, it was the quickest way I could think of 😀

The difference was quite obvious. Here’s uniform direction on the left, changed on the right:

paintprocess3

And this is it. The point-cloud nature of the scene also allowed me to have some fun in the ending part, making the quads behave like a surreal particle system. All motion was done in the vertex shader.

I hope it was somewhat interesting to read; at least I won’t forget the technique myself 🙂

 

Bonus information

Recently I was asked how to fill the inevitable holes between quads. The way I did it here is simple – I just used very rough underlying geometry:

paintprocess4

Рендер нарисованного мира в JG

Речь пойдёт о том, как был сделан нарисованный мир в моей последней демке – JG.

(Click for English version)

Эффект нарисованности не был запланированным. Была идея показать природную сцену определённого типа и мало времени. Стало ясно: пытаться делать реалистично даст либо очень посредственную картинку (ввиду Unity и сложности моделирования растительности для игр), либо год мучений в надежде догнать Crysis (да и тот, на взгляд не привыкшего к графике игр человека, вряд ли выглядит совершенно; картонно-крестовидные плоскости листвы и меня до сих пор коробят). В общем, это был не вариант.

Главное – сохранить правильное ощущение, атмосферу, не испоганив её ограничениями графики. Очень хотелось избежать синтетичности и компьютерности (это же природная сцена всё-таки).

Итак, нужно нарисовать что-то очень сложное, не убив себя и компьютеры игроков.
Интуиция подсказывала: “надо всё запечь”. По крайней мере, с освещением это всегда прокатывало. В данном случае вообще всё сложное, так что и запечь надо всё. Вспомнились 3D-сканы местности: даже при плохой геометрии (или вообще в виде облака точек) они все равно смотрелись достаточно убедительно, из-за того, что все цвета на своих местах, всё со всем сочетается, и детальное реалистичное освещение уже отфильтровано и запечено. К сожалению, время года на момент разработки было прямо противоположно желаемому, так что вариант со сканом отпал.
Но что если мы сделаем реалистичную оффлайн сцену с красивым освещением и получим её “скан”? Где-то в моей памяти всплыл Vue, как нечто, в чём для кино и всяких экстерьерных визуализаций рендерят красивые природные ландшафты. Да, пожалуй это что надо, подумал я.

Покрутив неуклюжий интерфейс, решил для теста воссоздать в Юнити фрагмент какой-нибудь сцены из примеров. Отрендерил её с нескольких ракурсов, сунул в Агисофт и… разочаровался. Сложность геометрии растительности и сглаживание были не лучшими качествами для хорошего скана. Точки еле находились, всё было не на своих местах.
Тут меня осенило. Что делает Агисофт? Он пытается создать несколько карт глубины из картинок, а затем по ним ставит точки. Но ведь Vue сам умеет рендерить точную глубину из камеры, так что зачем мне её восстанавливать?

Каждый, кто писал деферед рендерер, знает, как восстановить позицию из глубины (правда я туплю каждый раз все равно). Таким образом мы и получаем облако точек из всех видимых камерой пикселей. Глубина в Vue, однако, оказалась непростой. К счастью, я в конце концов набрёл на ответ разработчиков о её кодировании.

Из этого:

paintprocess1

Некоторыми манипуляциями с MaxScript’ом получаем это:

paintprocess2

Это цельный меш, затекстуренный рендером.

Сложная часть позади, пришло время собрать из таких штуковин сцену и поиграться с шейдером 🙂

Каждый квад поворачивается на камеру и текстурится одним из этих мазков:

daubs

Почти. На самом деле, в шейдере баг, из-за которого местами попадает не целый мазок, а его фрагмент. Однако, исправленная версия мне показалась более скучной и синтетичной, так что я вернул, как было. Это не баг, это фича 🙂

Размер квадов меняется в зависимости от глубины, т.е. вдалеке они огромные, чтобы компенсировать их разряженность. Вообще очень важно было правильно подбирать ракурсы рендеров, чтобы детализация мазков была консистентной, и мелкие с крупными в одну кучу не мешались.

Далее я делал в Vue, собственно, нужную мне сцену. Графон выходил такого рода:

pond_v2

Заборчик, дом и ландшафт делались с нуля, растения же практичнее было поискать в готовых паках и собрать из всего этого цельную композицию. Я сбился со счёта, сколько мне потребовалось рендеров, чтобы забить всё играбельное пространство точками:

sdf5

Многие рендеры приходилось дополнительно немного обрабатывать для более “живописного” эффекта – вытягивать больше оттенков, убирать темноту, делать тени немного синее:

pond_300_fill

Сперва я долго не мог подобрать хорошее освещение, и этой черноты, требующей выправления, было много. Затем удалось всё же получать сразу на рендере картинку лучше, но итоговая сцена в игре сшита из рендеров разных времён, что мне даже понравилось, делало её более интересной, менее монотонной.

Некоторые ранние кадры:

К этому моменту, в отличие от первых попыток, уже была реализована смена направления мазков, ибо, лежащие одинаково, они смотрелись очень неестественно и похоже на фильтр из фотошопа. Сперва я понадеялся задать её абсолютно процедурно, но этого не оказалось достаточно. Процедурный вариант находил разницы яркостей соседних пикселей и на основе этого создавал вектор направления – похоже на то, как из карты высот считают карту нормалей. Но местами мне было очевидно, как должны лежать мазки, а шейдеру нет: скажем, я знал, что траву здесь лучше рисовать вертикальными линиями, а забор по направлению палочек самого забора. В итоге я решил рисовать карты направлений для каждого рендера, где цвет задавал вектор, и совмещать это с процедурным направлением вдалеке. Вот так странно выглядели карты направлений:

pond4_dir_hi

Painting vectors with color in Photoshop is quite the "pleasure" (strictly for masochists).
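The procedural part can be sketched roughly like this (an HLSL-style illustration with my own names; as described above, the real shader also blends in the hand-painted direction maps):

Texture2D sceneTex;              // the render the strokes sample from
SamplerState linearSamp;
float2 texelSize;                // 1 / render resolution

float2 ProceduralStrokeDir(float2 uv)
{
    const float3 lumW = float3(0.299, 0.587, 0.114);
    // brightness differences of neighbouring pixels, like a normal map from a height map
    float l = dot(sceneTex.Sample(linearSamp, uv - float2(texelSize.x, 0)).rgb, lumW);
    float r = dot(sceneTex.Sample(linearSamp, uv + float2(texelSize.x, 0)).rgb, lumW);
    float b = dot(sceneTex.Sample(linearSamp, uv - float2(0, texelSize.y)).rgb, lumW);
    float t = dot(sceneTex.Sample(linearSamp, uv + float2(0, texelSize.y)).rgb, lumW);
    float2 grad = float2(r - l, t - b);
    // rotate 90 degrees: strokes follow edges instead of crossing them
    float2 dir = float2(-grad.y, grad.x);
    return normalize(dir + 1e-5);    // avoid a zero vector in flat areas
}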

The difference is quite obvious: uniform direction on the left, adjusted direction on the right:

paintprocess3

And so we arrive at the final picture. For the ending I decided to have some fun and take advantage of the fact that everything consists of small quads, making them spin, fly apart and reassemble in all sorts of ways. All the particle animation is done in the shader.

The little house at the end had to be painted almost entirely with brushes in Photoshop, since the photographic version stood out too much from the overall style.

So that's how it went. I wrote all this down so that at least I myself don't forget what I did and how 🙂

 

Bonus:

I was recently asked how to patch the inevitable holes into emptiness between the quads. For that I used very rough geometry of similar colors behind the strokes (something like an underpainting):

paintprocess4

GPU cubemap filtering

Prefiltered cubemaps are widely used in games. The idea is to match the cubemap's blurriness at each mip to the appearance of the BRDF you use.

It all started with Modified Cubemapgen.
Then there's also cmft, which is quite a bit faster.
Finally, there's Lys.

All these tools are separate utilities which also have command-line support (except Lys).

Sometimes you want to filter the cubemap in-engine, without exporting stuff or running separate utilities, and you want it to be fast. Cubemapgen may take minutes to filter; cmft is much better here, but sometimes you still need more control and performance. Sometimes you also have to filter it in something like WebGL, where running separate utilities isn't really an option (yes, there is Emscripten, but the resulting code is often too fat).

At first I thought this was a very slow and complicated process (why else would cubemapgen take sooo long?) and was quite afraid of implementing it myself, but it turns out it's not THAT hard.

Here’s the result:

You may notice a blurred seam on the rightmost sphere in all versions – this is the inevitable consequence of using the seamless cubemap filtering trick on a DX9-level GAPI. DX10+ doesn't have this problem and looks smoother. For the WebGL implementation I had to support DX9, because the latest Firefox still uses it to emulate WebGL (Chrome and IE11 render through DX11).

As you can see, the Blinn-Phong importance-sampled version looks extremely close to cubemapgen's offline output. This version takes around 1.5 sec on a GTX 560 to compute from a 128×128 source cubemap. The "Simple" version takes less than 1 sec.

So, how is it done?
We don't just blur the cubemap. We imagine a sphere reflecting it and associate each mip with a different roughness value. Rougher surfaces have noisier microfacets reflecting light in many directions at each visible point. The rougher the surface, the wider the cone of incoming light:
RoughnessReflection
So what we should do is to average this cone of lighting.
Simple averaging is not enough though – there is also a weight factor depending on the deviation from the original normal, driven by many factors. This factor is what makes your BRDF have the falloff it has, instead of just being an averaged circle.

So what, do you have to cast a million rays and then multiply each by a weight? No.
Importance sampling is a technique whose idea is to generate rays with a density that depends on your BRDF's shape. Usually you end up with more rays pointing close to the surface normal and fewer deviating rays. Simply averaging the lighting from each ray then naturally gives you more intensity near the center, because there were more rays there.

Here's the difference between the "Simple" version (top), which uses a simple cosine-weighted distribution of rays, and the Blinn-Phong version (bottom):
1cosVsImportance

As you can see, getting the correct ray distribution can be important for getting nice highlight falloff instead of just circular spots.

The Blinn-Phong is, of course, not ideal and quite old. GGX is considered more realistic, but I haven’t used it yet.

Different BRDFs have different requirements for the ray count. That's the point of having a "Simple" version – despite having less correct highlights, it requires MUCH fewer rays for an artifact-free result (because it's more uniform).

So the basic algorithm is:
– generate uniformly random directions on the hemisphere;
– focus the directions depending on your BRDF intensity-direction relation at given roughness;
– render lower mip cubemap faces, while reading higher mip data;
– for each direction
–{
—transform the direction to be relative to the original lookup vector;
—light += texCUBE(higher mip, transformed direction)
–}
–light /= numDirs;

You don’t have to pre-generate the directions though – all this can be done in shader sufficiently fast.
Here is a good read on generating points on the hemisphere. It even has an interactive demo.
The Hammersley method relies on bitwise operations though, which are not always available (not in WebGL/DX9). On such old-school GAPIs you have to either precompute the directions or use some other way to generate random numbers. There are many other ways: one you can see in my source code, others on ShaderToy or elsewhere. Such random numbers will likely be less uniform.

Ideally, when writing each mip, you should probably read the highest available mip with different cones of rays, but it’s not very fast, and you’ll get some significant aliasing trying to sample a high resolution cubemap with a very wide cone of limited rays.
Instead, it is actually quite sufficient to read just a 2x larger mip. This larger mip must resemble the highest mip as much as possible, so something simple like automatic mipmap generation (bilinear downsample) will do. Note that you must NOT cone-sample a mip that was already cone-sampled, because you'll get noticeable over-blurring.
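Putting the pieces together, here's a minimal HLSL-style sketch of one output texel (my own names; it assumes the 2x larger mip is bound as srcCube, uses a Blinn-Phong lobe, and this Hammersley variant needs SM5's reversebits, so it's not the DX9/WebGL path; the actual implementation is in the PlayCanvas links below):

TextureCube srcCube;            // the 2x larger mip we read from
SamplerState linearSamp;
float specularPower;            // Blinn-Phong power matching this mip's roughness
static const uint NUM_SAMPLES = 1024;

float2 Hammersley(uint i, uint count)
{
    // radical inverse via bit reversal; this is the part that needs integer ops
    return float2(i / (float)count, reversebits(i) * 2.3283064365386963e-10);
}

float3 ImportanceSamplePhong(float2 xi, float3 n)
{
    // density ~ cos(theta)^specularPower around n
    float cosTheta = pow(xi.x, 1.0 / (specularPower + 1.0));
    float sinTheta = sqrt(1.0 - cosTheta * cosTheta);
    float phi = 6.2831853 * xi.y;
    float3 dir = float3(sinTheta * cos(phi), sinTheta * sin(phi), cosTheta);
    // transform the direction to be relative to the original lookup vector
    float3 up = abs(n.z) < 0.999 ? float3(0, 0, 1) : float3(1, 0, 0);
    float3 tangent = normalize(cross(up, n));
    float3 bitangent = cross(n, tangent);
    return tangent * dir.x + bitangent * dir.y + n * dir.z;
}

float4 PrefilterTexel(float3 lookupDir)    // direction of the output cubemap texel
{
    float3 color = 0;
    for (uint i = 0; i < NUM_SAMPLES; i++)
    {
        float3 dir = ImportanceSamplePhong(Hammersley(i, NUM_SAMPLES), lookupDir);
        color += srcCube.SampleLevel(linearSamp, dir, 0).rgb;
    }
    return float4(color / NUM_SAMPLES, 1.0);
}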

My version of the filtering is now in the PlayCanvas engine repository, and it's open source.
Sadly, there's a bug in ANGLE that prevents us from using it in Chrome/Firefox on Windows; ironically, only IE11 works correctly.

The source is here:
https://github.com/playcanvas/engine/blob/28100541996a74112b8d8cda4e0b653076e255a2/src/graphics/graphics_prefiltercubemap.js
https://github.com/playcanvas/engine/blob/28100541996a74112b8d8cda4e0b653076e255a2/src/graphics/programlib/chunks/prefilterCubemap.ps

The latest version with additional bells and whistles will be always here:
https://github.com/playcanvas/engine/tree/master/src/graphics

Notes on shadow bias

These are notes for myself about shadow mapping bias.
Good summary about all aspects of shadow mapping: http://mynameismjp.wordpress.com/2013/09/10/shadow-maps/

My results:
bias

I'm not sure what's wrong with receiver plane depth bias. Interestingly, it does work OK when there is no interpolation between samples.
In this presentation there's a comparison, but it also uses samples without interpolation: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Isidoro-ShadowMapping.pdf (page 39).
Here they also get a strange artifact with it, similar to the one I have on the sphere: http://www.digitalrune.com/Support/Blog/tabid/719/EntryId/218/Shadow-Acne.aspx
MJP also says that “When it works, it’s fantastic. However it will still run into degenerate cases where it can produce unpredictable results”.
So, maybe I implemented it wrong, or maybe I was unlucky enough to quickly get degenerate cases, but I’m not really willing to try this technique anymore.

Normal offset:
http://www.dissidentlogic.com/old/#Notes%20on%20the%20Normal%20Offset%20Materials
Also this may better explain why it works: http://c0de517e.blogspot.ru/2011/05/shadowmap-bias-notes.html

There are two ways to implement normal offset bias. One is to inset the geometry along the normal when rendering the shadow map. The inset amount is also scaled by the slope, i.e. dot(N,L), and can additionally be scaled by a distance factor (with FOV included) for use with perspective projection.
The second way is to render the shadow map normally, but add (instead of subtract) the same scaled vertex normal to the fragment position just before multiplying it by the shadow map matrix and comparing.
The second method distorts the shadow silhouette less and gives better results. It is, however, not easy to do with deferred rendering, because you need the vertex normal, not the normal-mapped one!
Unity 5 seems to use the 1st version precisely because it can't keep the vertex normal in the G-Buffer.
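A sketch of the 2nd variant in HLSL (my own names; in a real implementation the offset would also be scaled by the shadow map texel size and distance):

float4x4 shadowMatrix;        // world -> shadow map UV/depth
Texture2D shadowMap;
SamplerState pointSamp;
float normalOffsetBias;
float constantBias;

float SampleShadow(float3 worldPos, float3 vertexNormal, float3 dirToLight)
{
    // push the receiver along the geometric normal, more at grazing angles
    float slopeScale = saturate(1.0 - dot(vertexNormal, dirToLight));
    float3 offsetPos = worldPos + vertexNormal * normalOffsetBias * slopeScale;

    float4 sc = mul(float4(offsetPos, 1), shadowMatrix);
    sc.xyz /= sc.w;

    float storedDepth = shadowMap.Sample(pointSamp, sc.xy).r;
    return (sc.z - constantBias) <= storedDepth ? 1.0 : 0.0;
}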

Funnily enough, Infamous Second Son is OK with storing it there:
http://www.redgamingtech.com/infamous-second-son-engine-postmortem-analysis-breakdown/

And they use it exactly for normal offset (and other stuff too): https://twitter.com/adrianb3000/status/464584971483893762

You can also try to calculate the face normal from depth, BUT you'll get unpredictable results on edges.
Even an Intel guy couldn't solve that: https://twitter.com/AndrewLauritzen/status/539669636912914432

A hole appears after insetting the geometry:
tea

normaloffset2

The 2nd variant doesn't suffer from this (you can still see a tiny hole there though… but it also exists with just constant bias, so it's not a normal bias problem).
Real-time demo with Normal Offset 2nd version: http://geom.io/pc25d/demoShaders5.html
No acne, no peter-panning, yay!
Use RMB + WASD to fly around. Feel free to look into source.

You can tweak both normal/constant bias in browser console using
light2.light.normalOffsetBias
light2.light.shadowBias

Chamfer normals trick

chamferNormals2

A little trick for chamfering geometry edges with just one face and getting good results. It was originally invented by Arenshi, a good friend of mine; I just made a script to automate it and decided to share the knowledge.

The problem is that the default face smoothing behaviour in modeling programs (at least in 3ds Max) is simple averaging, and it gives you quite buggy-looking results if you only use one face for the chamfer, so you are forced to use many polygons, and it is still not perfect:

chamferNormals

The solution is to manually select each original face after the chamfer and straighten its normals:

chamferNormals3

This way your faces remain flat, while the chamfers interpolate between them. And you can use just a single face! This is very useful for low-poly game models.

Finally, here’s a tiny MaxScript that automatically chamfers all edges based on angle between polygons and fixes all normals this way:
http://geom.io/autoChamfer_orig.zip

I haven't had time to add any UI to it, so to use it, select an Editable Poly object and run it. The chamfer amount and threshold angle are hardcoded to 1 (the chamferAmount variable) and 0.85 (the chamferAngle variable, ranging from 0 to 1).
The script is not greatly optimized, but it usually runs quite fast.

Third person camera

Many people complained about jerky and jumpy camera behaviour in the prototype of my game, Faded. I wasn't happy with it myself either; I just had to implement so many things with not enough time to make each one perfect. Recently I decided to finally fix it.

Third-person cameras differ a lot from game to game, from simple orbiting + collision to attempts at making them more "cinematic". Making a "cinematic" one was also the original topic of my diploma thesis; however, after a few tests I abandoned it and changed the topic to something more familiar (real-time rendering), because I was unsure whether those experiments would yield any good results, so it was just risky.

Let’s start with basic problems.

Problem 1: occlusion
95% of the answers you'll find by googling are "cast a ray from the character to the camera and position the camera at the hit point!". It's a good starting point of course, but you just can't leave it at that; there are plenty of reasons why it's a bad idea:
– your camera's near plane has a size, while the ray has zero thickness, so you have a chance of seeing through walls;
– the camera will jump from point to point abruptly.
Positioning the camera at "pickedPosition + pickedNormal * radiusAroundNearPlane" is still insufficient, as can be seen here:
cameraPushByNormal1

Luckily most physics engines support "thick" rays. If you use Unity/PhysX, use SphereCast.
There are still a few problems, however:
– if the spherecast already intersects a wall at its origin, it will keep moving through it;
– you still have abrupt jumps.

cameraSphereCast

An alternative is simply to use a physical sphere and move it to the desired camera position, accounting for all collisions, but the sphere can get stuck in concave level geometry.

To fix the first spherecast problem, you can do the following:
– cast the sphere in the opposite direction of the character-camera ray. So the origin of the ray is still the character, but the direction is inverted;
– use a hit point that is far enough away as the new ray origin. If nothing is hit, just use origin + invDir * farEnough;
– do the SphereCast as usual, but with the new origin. This way you get rid of the sphere intersecting nearby walls.
Code for Unity: http://pastebin.com/k3ti7kV2

The remaining problem is abrupt camera teleportation. How do other games deal with it? Let’s see:

Watch Dogs seems to use the simplest method – just teleporting the camera to the thick ray's projected position. I can also see a quick interpolation of the camera distance from close-up back to default.

L.A. Noire has a more pronounced smoothed distance interpolation when the occlusion is gone. A sudden appearance of occlusion still causes abrupt movement though. The most interesting thing in L.A. Noire is the way the camera follows you when you don't move the mouse. It can move around corners very intelligently. Not sure how it's implemented; perhaps it uses the AI navigation system?

Hitman: Absolution tries to move the camera as smoothly as possible, sliding along obstacles before they get in front of the camera.
I think it's a good solution, and I decided to implement it.

So here’s the idea:

twoCapsules

Use two spherecasts: one thin (with a radius that encapsulates the near plane) and one thick. Then:
– project the thick collision point (green point) onto the ray; you'll get the red point;
– get the direction from the thick collision point to the projected point, multiply it by the thin radius and offset the projected point back by it. This way you get the thick collision point projected onto the thin capsule (cyan point);
– get the distance from the cyan point to the green point and divide it by (thickRadius – thinRadius). You'll get a [0-1] number representing how close the obstacle is to the thin spherecast. Use it for lerping the camera distance.
Code for Unity: http://pastebin.com/BqaJh3Vx

I think that's quite enough for camera occlusion. You can still try to make the camera even smarter at walking around corners, as in Noire, but I think it's overkill for now. I may get back to this topic later.

Problem 2: composition
Now onto some "cinematic" stuff. The first 3rd-person games had characters mostly centered on the screen. As games evolved, overall image aesthetics started to become more important. Many photographers will agree that it's not always the best idea to lock objects dead center – it just doesn't look interesting. The basic rule you (and, most importantly, the computer) can apply is the rule of thirds. Most games today use it to simply put the character a little to the side.

thirds

However, can we implement a more dynamic composition search that isn't just dead-locked on the character being aligned to one line? And how is it supposed to look?

The best references here, in my opinion, are steadicam shots, because these are most closely related to game third-person cameras.
Take a look at some:



As you can see, the camera changes its focus point and distance quite dynamically, and it looks very interesting. What's not great in the context of games is that the camera lags behind the characters, so they see things earlier than the camera does.
The camera mainly focuses on the characters' points of interest. Also worth noting is the height of the camera, which is mostly static rather than orbiting around at different heights.

Here are results of my first tests (year ago) that implemented some of the ideas:

The middle part is boring and sucks though.
The idea was to mark important objects in the level and make the camera adapt to them, aligning everything by the rule of thirds. That's what the debug view reveals:

Unity 2014-10-14 16-32-19-43

As you can see, the "important" objects are marked with green 2D boxes. These boxes are the actual input data for the algorithm. The first box always represents the main character.

The algorithm itself isn't ideal though, and it takes a designer's time to decide which objects should be marked as important to ensure interesting camera movement. The code is a bit dirty and still a work in progress, so I'm not sure about posting it here right now. However, if you find it interesting, just tell me and I'll post it.

Here are the results so far together with smooth occlusion avoidance:

Designing an Ubershader system

OK, so you probably know what ubershaders are? Unfortunately there is no wiki entry for the term, but by it we mostly mean very fat shaders containing all possible features, with compile-time branching that allows them to be specialized into any kind of small shader with a limited set of tasks. It can be implemented very differently though, so here I'll share my experience with it.

#ifdefs

So, you can use #ifdef, #define and #include in your shaders? Or you’re going to implement it yourself? Anyway, it’s the first idea anyone has.

Why it sucks:
  • Too many #ifdefs make your code hard to read. You have to scroll the whole ubershader to see some scattered compile-time logic.
  • How do you say "compile this shader with 1 spot light and that shader with 2 directional lights"? Or 2 decals instead of 6? One PCF shadow and one hard? You can't specify it with #ifdefs elegantly, only by copy-pasting code, making it even less readable.

Terrible real-life examples: 1, 2

Code generation from strings

Yet another approach I came across and have seen in some projects. Basically you use your language of choice and use branching and loops to generate a new shader string.

Why it sucks:
  • Mixing a shader language with another language looks like a total mess
  • Quotes, string concatenations, spaces inside strings and \n's are EVERYWHERE, flooding your vision
  • You still have to scroll a lot to understand the logic

Terrible real-life examples: 1, 2

Code generation from modules

So you take your string-based code generation and try to decouple the shader code from the engine code as much as possible. You definitely don't want hundreds of files with 1-2 lines each, so you start to think about how to accomplish that.
You make small code chunks like this one; some of them are interchangeable, some contain keywords to replace before inclusion.

Why the naive approach sucks:
  • All chunks share the same scope, which can lead to conflicts
  • You aren't sure what data is available to each chunk
  • It takes time to understand what the generated shader actually does

Code generation from modules 2.0

So you need some structure. The approach I found works best is:

struct internalData {
    // shared intermediate data, e.g. surface normal, global UV offset,
    // accumulated diffuse/specular light...
    float3 normal;
    float2 uvOffset;
    float3 diffuseLight;
    float3 specularLight;
};

void shaderChunk1(inout internalData data) {
    float localVar;            // locals stay inside the chunk's own scope
    // read/write the shared struct here
}

float4 main() {
    internalData data;
    shaderChunk1(data);        // each chunk is a function plus a call, placed between other calls
    shaderChunk2(data);
    return colorCombinerShaderChunk(data);
}

So you just declare a read/write struct for all intermediate and non-local data, like diffuse/specular light accumulation, a global UV offset, or the surface normal used by most effects.
Each shader chunk is then a processing function working with that struct, plus a call to it placed between other calls. Most compilers will optimize out unused struct members, so you should end up with pretty fast code, and it's easy to change parts of your shader. The shader body also reads quite descriptively and doesn't require a lot of scrolling.
The working example of such system is my contribution to PlayCanvas engine: 1, 2

Examples of generated code: vertex, pixel, pixel 2, pixel 3

So, I’m not saying this is the best approach. But for me, it’s the easiest one to use/debug/maintain so far.

On @femfreq and violence in games

I actually made this blog mostly to write about purely technical stuff, but my Twitter is so full of these hot debates that I decided to state my point of view, just in case.

If you have no idea what I’m talking about, the short story is:
There is a girl called Anita Sarkeesian who makes videos pointing out the problems with female characters in games, specifically the objectification of women, over-sexualization, violence towards them, and an often lacking character development. You can watch them here: http://www.youtube.com/user/feministfrequency
The videos got a huge response, both positive and negative, from gamers, developers and random bystanders alike.

My opinion is that the problem is not limited to female characters in games; it's actually much more global.
Where I agree with Anita is on the problem of characters. I've played quite a lot of games, but I can only remember 3 types of women in them:

1. Damsel in distress.
2. Ridiculously strong warriors, usually without much armor and visible muscles, but with big tits, of course. Think of Tomb Raider, Remember Me and so on.
3. A third-string character you'll immediately forget.
A bit boring, isn’t it?

But the violence problem is much deeper. It's not just about female characters; the problem is that most games are actually made of violence. Violence is the main way of progressing through the game in most AAA titles. It has been this way for many years, ever since the first characters appeared on screen: the enemies had to die, and the player had to win. And since female characters appeared in games, not much has changed. You always have the attack button, and you can attack anyone. And while Anita complains about violence towards over-sexualized women, I complain about violence in general; the victim's gender doesn't matter much. In most games you kill far more men (usually armed) than women, and men just can't be "sexualized" – I have no idea how that would even look – so instead I see the problem in the violence itself.

I definitely don't want to look like some Jack Thompson, and I won't blame games like he does. After all, I'm a gamer myself, and I like parts of the experience games give me. I love this industry and I hate censorship.

In the early 2000s, the game industry was making huge leaps forward. I was amazed at how fast graphics, physics and, at the same time, plots and characters were evolving. There were new revelations in this medium, new genres, and I felt like I was witnessing a revolution: a revolution of art, a revolution which would give us a new, incredibly beautiful, realistic yet interesting, interactive experience. Games like Mafia, Morrowind and Half-Life 2 made me feel that way.
Did it happen? Not really.

While the technical side of games was evolving, the core ideology stayed the same. There were rare exceptions, like the Quantic Dream games or Pathologic, games that tried to step outside the familiar bounds of expectations.

But most games are still bent on murder. On routine murder that doesn't affect the story much and isn't supposed to evoke any emotions. I feel like in many RPGs the only thing that changes between play sessions, with different characters and different stats and abilities, is the way you kill. As simple as that. Killing is your primary action, and many games seem to differ mostly in the ways you do it.

And even for an indie game developer, it's actually quite tempting to repeat this pattern. It's easier to make characters shoot and get killed than to invent something that really touches you and try to implement it in a game. When I add characters to my game, that's actually the first thing I do: shooting and killing. Because it feels like a mechanic, you can play it, it gives challenge, and you've seen it so many times that you understand its implementation quite well. Decals, particles, ragdolls, familiar stuff!

But then I feel it's not what I want to leave behind. What do I love about games? I love to live a different life in a different place, being a different person, and to experience the emotions of living that way, which then become a part of me. I want to feel my abilities and the consequences of using them. I felt something like this in the latest Deus Ex, in Fahrenheit and in Pathologic, but it wasn't perfect. I don't want it to feel like a Mario game; I don't want to kill some enemies endlessly. And at the same time I want it to be interactive. I want to do what I want, not what the designer wants. I want to have the ability to kill, because that way it becomes YOUR choice and it provokes drama and emotions, but I don't want it to be the only way. I don't want to play as a superhuman who decides everyone's fate.

I don't even fucking know what I want. And this is very sad, considering I've already made a demo of my game, which in some aspects turned out to be the same crap I don't like in other games. I know it should be different.

I think this medium is still very young and we’re capable of making something totally new.

Tiled deferred shading tricks

Last update: 26 May 2014.

This post will cover some of my humble findings from implementing tiled deferred rendering with MSAA support.
I will also update it occasionally.

Recap:
————————–
Deferred shading is a well-known rendering technique: we first render the scene to a ‘G-Buffer’, containing geometry and material data (e.g. position/depth, normals, surface glossiness etc) and then compute all the lighting and shading in screen space [1].

Pros:
– reduced shading overdraw: only one complex shader invocation per pixel (+ additional per pixel inside of each light’s influence area); you can do the same with Z-Prepass in forward, but it will cost you 2x drawcalls.
– lighting is decoupled from materials/objects.
– G-Buffer is anyway required for many advanced effects, which are difficult to substitute with something else (e.g. SSAO, SSR).

Cons:
– doesn’t handle semi-transparency, you have to draw transparent stuff forward-style.
– can be bandwidth-heavy and requires tight G-Buffer packing; Crysis 3 is a good example [2]. Stencil culling is also extremely useful for selecting only the pixels affected by a light. The less you repaint your render targets, the better.
– overly compressed G-Buffer can exhibit artifacts (e.g. Unity).

– difficult to integrate with MSAA; many deferred games just use post-AA (e.g. FXAA), however the quality is far from MSAA due to the lack of sub-pixel data. The latest approach is to perform edge detection into stencil and then do per-sample shading only on those edges and simple per-pixel shading everywhere else, also used in Crysis 3 [2][3]; however, this approach suffers from bad locality of edge pixels on the screen.

Tiled shading is an approach where we divide the screen into tiles that are bigger than pixels (e.g. 8×8, 16×16), test which lights affect each tile, and then shade pixels only with the lights that belong to their tile [4]. Simply 'divide and conquer'. However, for good culling quality, access to the depth buffer is necessary.
Tiled shading can be implemented in multiple ways and used with both forward and deferred approaches; notable examples are BF3's tiled deferred (with code!) [5], Forward+ [6] and clustered shading [7].
I'll divide all these approaches into 2 groups: tiled forward and tiled deferred.

Tiled forward pros:
– decouples lighting from materials and objects (like deferred).
– works with MSAA.
– can work with semi-transparency!
– each drawcall can implement its own lighting model (unlike deferred, where we have to fit all lighting models in one shader).

Tiled forward cons:
– requires Z-Prepass for good light culling (a lot of false positives otherwise).
– heavy shaders can be slow on small triangles [8][9].

Tiled deferred pros:
– reduces bandwidth cost by replacing old multi-pass light accumulation.
– light can be accumulated at better precision in a single register (in classic deferred you accumulate usually in 16 or 10 bit textures, because full 32-bit float is too heavy).
– can reuse the same per-tile data to shade transparency the tiled forward way.

Tiled deferred cons:
– still hard to do MSAA.
– still have to be careful with G-Buffer size.
————————-

Now back to the topic. I decided to develop a tiled deferred renderer with MSAA. I packed my G-Buffer (best-fit normals are your best friend [10]) and arrived in DirectCompute land.

The most fun thing is that you can actually perform ALL rendering in a single compute shader once you have the G-Buffer. Light culling, shading, edge detection, AA resolve and everything else can fit into one CS, which is very nice because we can reuse a lot of data without reloading it at every stage. Compute shaders are beautiful, and I really recommend looking into the BF3 paper [5] to see how you can switch from per-pixel processing to per-light processing and generally process data in all kinds of unimaginable patterns.

Another must-read paper is Andrew Lauritzen's "Deferred Rendering for Current and Future Rendering Pipelines" [8].

There is also a very helpful code: http://visual-computing.intel-research.net/art/publications/deferred_rendering/

Lauritzen proposed an interesting idea for dealing with MSAA: instead of branching on each pixel and selecting per-pixel or per-sample (if it's on an edge) shading, you find all edge pixels, collect them into an array and then distribute the processing of this array across all threads. This way it is more parallel: first all threads shade per-pixel, then they all process the remaining edge samples.

Now onto my tricks.

Trick 1: resolve in the same CS.

Lauritzen's method of redistributing per-sample shading is great; however, where do we output those sample values? Since we try to distribute samples uniformly across all threads in a thread group, each thread may now output values into seemingly arbitrary places: different samples of different pixels. In his sample code, Lauritzen addresses this with a 'Flat' framebuffer of size GBufferWidth * GBufferHeight * MSAASamples and an element size of uint2 (RG+BA, 16 bits each), which is resolved later. However, this can be quite costly.

Instead, we can allocate a small array for each thread group, like:

groupshared uint2 msaaAccumBuffer[BLOCKSIZE * BLOCKSIZE];

When you do per-pixel shading, you simply save the result there:

msaaAccumBuffer[groupIndex] = PackColor(color);

However, for each edge pixel found that requires per-sample shading, you output a scaled value:

float weight = 1.0 / numSamples;
msaaAccumBuffer[groupIndex] = PackColor(color * weight);

And when you process redistributed edge samples, you also scale them and accumulate in this array:

uint2 packed = PackColor(color * weight);
InterlockedAdd(msaaAccumBuffer[g.y].x, packed.x, tmp1);
InterlockedAdd(msaaAccumBuffer[g.y].y, packed.y, tmp2);

A CS can do InterlockedAdd only on int/uint and can't work with floats. Instead, we scale the float color channels into large uints and pack RGBA into a uint2 with 16 bits per channel. The trick is that even when packed, the addition still works correctly (the weighted, clamped samples sum to at most 65535 per channel, so nothing carries across channel boundaries), and we can directly accumulate all samples into one anti-aliased color without any further resolve – 1 add per two channels.

When all samples are shaded, you unpack:

renderTarget[id.xy] = UnpackColor(msaaAccumBuffer[groupIndex]);

Packing/Unpacking:

// Look for PackUint2/UnpackUint2 in Lauritzen's code
uint2 PackColor(float4 color)
{
    uint4 colori = color * 65535;
    return uint2(PackUint2(colori.xy), PackUint2(colori.zw));
}

float4 UnpackColor(uint2 packed)
{
    uint2 RG = UnpackUint2(packed.x);
    uint2 BA = UnpackUint2(packed.y);
    return float4(RG, BA) / 65535;
}

So, it turns 1.0f into 65535 (uint). Why not just 255? Because we accumulate these uints small and scaled, and we need better precision for small values to get a correct-looking sum.

Note that I accumulate already tonemapped, clamped colors – this is required to not break anti-aliasing [11].

 

Trick 2: Ray vs Thickened Cone for spotlight culling

Culling of non-point lights for tiled shading seems to be a poorly documented area. However, from what I've heard, most people implement light culling by checking intersections of the tile frustum (a thin one, passing through the tile's corners) with some geometric shape like an OBB or a sphere around the light, but a frustum-cone intersection is not an easy or cheap thing to do, so you have to overestimate the number of affected tiles.

However, if you have a single ray instead of a tile frustum, things become much easier and computationally cheaper.

The biggest problem with replacing a thin frustum with a ray is that the ray is infinitely thin, doesn't cover the whole tile and can easily miss the primitive, but we can solve this by 'thickening' the primitives based on distance.

Here's the code I came up with for this kind of culling. Note: it can be optimized further, e.g. by using something cheaper than matrix multiplies to transform between spaces, but you should get the idea:
http://pastebin.com/Ld7sfBbN

(Something’s very very wrong with wordpress text formatting. It makes code completely unreadable, so I had to use pastebin. Fuck you, wordpress).

The result should look like what you get from stencil light culling, but per tile:
coneculling
If you’re interested in math behind it, I actually found the useful formulas in [12]

The above code works fine when you're close to the light source, but it does not account for the aforementioned thickening and will look buggy when you move far enough away.
As I use matrices to transform into cone space and back, I actually do the thickening on the CPU by tweaking these matrices.
What I currently do is definitely not an ideal solution by any means, but it still kind of works: the idea is to find the most distant point of the cone (we can approximate it as a sphere this time) and then somehow derive the thickening amount from the distance between this point and the camera.
It's C#, Unity-specific code (yes, I'm trying to glue my new renderer to Unity), but you should be able to understand it:

// Finding scale for the unit cone from its length and angle - without thickening
// Can be done once, unless light dynamically changes its shape
float baseRadius = length * Mathf.Sin(angle * Mathf.Deg2Rad * 0.5f);
lightScale.Add(new Vector3(baseRadius*Mathf.PI, baseRadius*Mathf.PI, length));
float lightMaxScale = Mathf.Max(Mathf.Max(lightScale[i].x, lightScale[i].y), lightScale[i].z);
-----------------
// Thickening
Vector3 lightEnd = lights[i].transform.position + lights[i].transform.forward * lights[i].range;
Vector3 lightCenter = (lights[i].transform.position + lightEnd) * 0.5f;
Vector3 vecToLight = lightCenter - camera.transform.position;

// Black magic starts
float distToFarthestPoint = Mathf.Sqrt(vecToLight.magnitude + lightMaxScale * 0.5); // don't ask me about the sqrt
float posOffset = distToFarthestPoint * 0.2f;
lights[i].transform.position -= lights[i].transform.forward * posOffset;
lights[i].transform.localScale = lightScale[i] + new Vector3(posOffset, posOffset, posOffset*2);
// Black magic ends. I don't like these 0.2 and 2 and sqrt and all, and will think further about making it all more meaningful. But it kinda thickens.

 

————————–
[1]
http://en.wikipedia.org/wiki/Deferred_shading

[2]
Tiago Sousa, Rendering Technologies from Crysis 3
http://www.slideshare.net/TiagoAlexSousa/rendering-technologies-from-crysis-3-gdc-2013

[3]
Nicolas Thibieroz, Deferred Shading Optimizations
http://developer.amd.com/gpu_assets/Deferred%20Shading%20Optimizations.pps

[4]
Ola Olsson and Ulf Assarsson, Tiled Shading
http://www.cse.chalmers.se/~uffe/tiled_shading_preprint.pdf

[5]
Johan Andersson, DirectX 11 Rendering in Battlefield 3
http://dice.se/wp-content/uploads/GDC11_DX11inBF3_Public.pdf

[6]
Jay McKee, Technology Behind AMD’s “Leo Demo”
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/AMD_Demos_LeoDemoGDC2012.ppsx

[7]
Emil Persson, Practical Clustered Shading
http://www.humus.name/Articles/PracticalClusteredShading.pdf

[8]
Andrew Lauritzen, Deferred Rendering for Current and Future Rendering Pipelines
http://bps10.idav.ucdavis.edu/talks/12-lauritzen_DeferredShading_BPS_SIGGRAPH2010.pdf

[9]
Kayvon Fatahalian, Solomon Boulos, James Hegarty
Reducing Shading on GPUs using Quad-Fragment Merging
http://graphics.stanford.edu/papers/fragmerging/shade_sig10.pdf

[10]
Anton Kaplanyan, CryENGINE 3: reaching the speed of light
http://www.crytek.com/cryengine/presentations/CryENGINE3-reaching-the-speed-of-light

[11]
Emil Persson – Custom Resolve
http://www.humus.name/index.php?page=3D&ID=77

[12]
Vjeux – Javascript Ray Tracer
http://blog.vjeux.com/2012/javascript/javascript-ray-tracer.html

geom.io [beta]

webgllll1

Meanwhile I'm working on a new project – geom.io – a hosting service for 3D models.

It lets you:

– Test various real-time materials and lighting on your models (à la Marmoset): a standard material with simple settings is available, plus the ability to write/copy-paste any other shader in its place;

– Upload your models with the configured rendering to the web; you get a direct link and HTML code for embedding it into any page (like YouTube).

Shadows, AO, registration with a personal gallery and other goodies will be added soon.

You can browse the models already uploaded: http://geom.io/gallery.php

A recent browser with HTML5 and WebGL is required: up-to-date Firefox and Chrome are guaranteed to work, and Chrome on Android can also display the models more or less.

If you're reading this and you have models, come try it out and leave feedback/wishes/bug reports/feature requests.

More about my diploma: light attenuation and specular

Actually, apart from shadows, I didn't do research thorough enough to deserve a separate post on the other topics. So I decided to dump all the remaining interesting things into one post. For now I've only dumped one.

First of all, what is this actually about? Here is a video of the final diploma scene:

http://www.youtube.com/watch?v=7IolXxg1_q8

And a couple of screenshots:

buildFinal 2013-06-20 00-22-52-35 buildFinal 2013-06-20 00-38-38-52

In fact, this is far from what was planned. The institute forced us to do a lot of paperwork, and we didn't have time to raise the quality of the diploma itself. Many of the clever graphics tricks I simply didn't manage to put into the scene, which my artist friend sent me in the last 3 (!) days before the defense.

What's interesting here, and what I plan to keep paying attention to in the future:

Specular is occluded by shadows from the light source even when those shadows are not visible in the diffuse. This is almost never done in games, but it really works this way; you can check in your favorite ray tracer:

This is what we usually see in games:

scanline

There is a big error here: the light's falloff is artificially limited and quickly goes to zero at the edges. Because of that the specular gets cut off even more conspicuously than the diffuse. In games the light is often clipped like this so that the source can be enclosed in a sphere, saving a lot of fragments from computing the lighting. In a deferred renderer you can then rasterize low-poly spheres, sample depth and normals inside them, and shade quickly.

But all of this works against realism.

In reality, light sources fall off inverse-squared, i.e. 1/(dist^2). You may be surprised how big a difference in lighting you get if you try to switch the familiar range-based lights to real ones.

Here's another good post on the topic: http://imdoingitwrong.wordpress.com/2011/01/31/light-attenuation/

Such falloff has a big performance downside: it can keep falling off for a very, very long time, which can be completely impractical for dozens of real-time lights that would have to cover the whole scene. In my case, however, the light was static and everything needed could be baked.
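For comparison, here is what the two falloffs look like in code (a trivial HLSL sketch, names are mine):

// typical game-style falloff: forced to reach exactly zero at lightRange
float AttenRangeBased(float dist, float lightRange)
{
    float x = saturate(dist / lightRange);
    return saturate(1.0 - x * x);
}

// physically based falloff: never actually reaches zero, which is exactly why
// the specular highlight keeps living far beyond the visually "dead" diffuse range
float AttenInverseSquare(float dist)
{
    return 1.0 / max(dist * dist, 1e-4);
}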

This is how the same frame looks with inverse-square falloff:

scanline2

Now let's look at some more interesting frames:

vray1

Here we have purely diffuse surfaces lit by a single light. This is what you usually bake into lightmaps.

And now:

vray2

The same, but with more reflective surfaces.

We can clearly see a long highlight from the light appear on the floor, but this highlight never enters the shadow cast by it. Where the diffuse, thanks to the falloff, looks like solid darkness to us, the light actually still continues; it's just hard for our eye to tell the blackness of falloff from the blackness of shadows. And not only for our eye: numerical precision in those dark areas is also very low.

The explanation is very simple: diffuse light has a much lower intensity than reflected light. Reflected light is focused into "denser", narrower beams, while diffuse light is scattered in all directions.

So where the diffuse has visually died out completely, the specular is still alive and keeps being cut by shadows.

Ordinary lightmaps don't contain enough information for this, unless you are going to store them as floats.

11IMG_8577

In these crappy photos you can see the same effect: in the left pictures the shadow (of the pole, of the car) is clearly visible, while in the right ones it isn't. In the left pictures a highlight fell on the surface and set off the shadows; in the right ones it didn't. Baking only diffuse light would give us just the right-hand pictures.

Try walking the streets at night yourself: you'll see that with many light sources the effect gets stronger; some shadows keep disappearing while others appear.

This is how it can look in real time:

specc

(Yes, this is Unity, but with my shaders.)

From different viewpoints it looks as if the shadows point in different directions; you can see the same thing in real life.

Implementation-wise, I didn't come up with anything smarter than simply baking the shadows separately, without attenuation. An RGBA8 texture held 4 shadow masks, one per channel.

Since the shadows are just black-and-white masks, 4-bit precision also works quite well. I tried to stuff 8 masks into RGBA8 and keep a single fetch, but unpacking broke the filtering.

So I ended up with a GI lightmap (indirect only) and attenuation-free shadow masks. Light propagation and attenuation were computed in the shader.
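Roughly, the per-light shading then looked like this (a reconstruction under my own assumptions about names and inputs, not the original shader):

Texture2D shadowMasks;        // RGBA: one baked, attenuation-free shadow mask per channel
Texture2D giLightmap;         // indirect light only
SamplerState linearSamp;

float3 ShadeTexel(float2 lmUV, float3 worldPos, float3 n, float3 v,
                  float3 lightPos, float3 lightColor, float4 channelMask, float gloss)
{
    float3 toLight = lightPos - worldPos;
    float dist = length(toLight);
    float3 l = toLight / dist;

    float atten = 1.0 / max(dist * dist, 1e-4);                            // analytic inverse-square falloff
    float shadow = dot(shadowMasks.Sample(linearSamp, lmUV), channelMask); // pick this light's mask channel

    float ndl = saturate(dot(n, l));
    float spec = pow(saturate(dot(n, normalize(l + v))), gloss);

    // the same baked mask cuts both diffuse and specular, so highlights never cross shadows
    float3 direct = lightColor * atten * shadow * (ndl + spec);
    float3 indirect = giLightmap.Sample(linearSamp, lmUV).rgb;
    return direct + indirect;
}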

Indirect light is quite diffuse and usually doesn't have a wide range, so it can be stored as low-res DXT1. Actually, there were three such lightmaps (radiosity normal mapping).

The masks react badly to DXT compression, so in the end I stored them as pairs of RGBA4 textures.

I got too lazy to write further, so: to be continued.

Penumbra shadows

Here I'll start a series of posts on the topics I worked on in my diploma project. It had an unpretentious title, "Realistic materials in real-time rendering", but underneath that it meant anything from realistic shadows to getting rid of aliasing in fine specular.
Overall, the task was to render a beautiful scene in real time.

In this post I'll describe what I did with dynamic shadows.
The shadows had to have a variable penumbra radius: sharp near the caster and blurry far from it, with the amount of blur depending on the physical size of the light source.

I simplified the task for myself from the start: let only dynamic objects cast shadows (there would be few of them in the demo), and bake lightmaps for the static geometry.

After struggling for about a month, I gave birth to this demo:
!iengine 2013-02-13 01-10-23-92

!iengine 2013-02-13 01-10-36-45

iengine 2013-02-13 01-18-55-10

You can download it here: http://geom.io/iengineShadows.zip

/*
You can change the resolution in the ini file.
 If your GPU isn't NVIDIA, lower the antialiasing, since the default is NVIDIA-specific CSAA.
 Mouse + WASD - fly around
 LMB - set the light direction to the camera direction.
 Mouse wheel - change the light source size (i.e. the size of the shadow penumbra). You can't make them perfectly sharp, of course, since that's limited by the shadow map resolution.
 A decent video card is most likely required.
*/

The first things worth noting are the absence of noise, which has annoyed me so much in the shadows of many modern games, and the genuinely large blur radius (really wide Gaussians are problematic in real time).

In short, how it works: I took the PCSS technique as a basis. Its essence is to find, for each fragment, some average depth value around it in the shadow map; that value is converted into a blur radius which is then used in PCF.
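The skeleton of that, as a sketch (all names are mine; the real thing replaces the final PCF with the filters described below):

Texture2D shadowMap;
SamplerState pointSamp;
float2 texelSize;        // 1 / shadow map resolution

float PCSS(float2 uv, float receiverDepth, float lightSize)
{
    // 1) blocker search: average depth of nearby texels that are closer to the light than the receiver
    float blockerSum = 0, blockerCount = 0;
    for (int x = -2; x <= 2; x++)
    for (int y = -2; y <= 2; y++)
    {
        float d = shadowMap.SampleLevel(pointSamp, uv + float2(x, y) * texelSize * lightSize, 0).r;
        if (d < receiverDepth) { blockerSum += d; blockerCount += 1; }
    }
    if (blockerCount == 0) return 1.0;                    // no blockers: fully lit
    float avgBlocker = blockerSum / blockerCount;

    // 2) penumbra width from similar triangles: the farther the receiver is behind the blocker, the wider the blur
    float filterRadius = lightSize * (receiverDepth - avgBlocker) / avgBlocker;

    // 3) filter with that radius (plain PCF here just to show the structure)
    float lit = 0;
    for (int i = -2; i <= 2; i++)
    for (int j = -2; j <= 2; j++)
        lit += (receiverDepth <= shadowMap.SampleLevel(pointSamp, uv + float2(i, j) * texelSize * filterRadius, 0).r) ? 1.0 : 0.0;
    return lit / 25.0;
}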

The technique wasn't used often because it was slow. The average-depth search required many samples, and PCF with a large radius no fewer. To make the PCF wide and smooth you have to make it really slow, and there will still be aliasing on surfaces at grazing angles (no shadow map mipmapping). The alternatives are few samples and horrible banding, or the aforementioned noise. To be fair, games have learned to mask that noise fairly well by running a screen-space blur over it. But a trained eye will still spot it =).

First of all I decided to replace PCF with a different algorithm. The beauty of PCSS is that PCF is not mandatory at all: even with a modest number of samples in the blocker search stage, we get reasonably good blur factors that we can feed into any algorithm.

I became interested in summed area tables. The idea is that, thanks to simple arithmetic, given an image where each pixel contains the sum of all pixels above and to the left of it (variations with below/right exist, but that's beside the point), we can find the average of all pixels in any rectangle using only the corner values. It may be hard to grasp at first, but the ATI paper is quite illustrative. So, after a single prepass turning any texture into a SAT, we can get a blur of any radius with 4 samples and a handful of instructions. Wow!
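The lookup itself is the nice part; a sketch (assuming the SAT is already built and stored in a float texture):

Texture2D<float> satTex;      // summed-area table of the shadow map
SamplerState pointSamp;
float2 shadowMapSize;         // in texels

float SatAverage(float2 uvMin, float2 uvMax)
{
    float a = satTex.SampleLevel(pointSamp, uvMin, 0);                     // top-left corner
    float b = satTex.SampleLevel(pointSamp, float2(uvMax.x, uvMin.y), 0);  // top-right
    float c = satTex.SampleLevel(pointSamp, float2(uvMin.x, uvMax.y), 0);  // bottom-left
    float d = satTex.SampleLevel(pointSamp, uvMax, 0);                     // bottom-right
    float2 sizePx = (uvMax - uvMin) * shadowMapSize;                       // rectangle size in texels
    return (d - b - c + a) / max(sizePx.x * sizePx.y, 1.0);                // sum over the rectangle -> average
}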

Wow, really? Actually, not quite.

First, the pixel sums have a damn wide range of values. If the texture was RGBA8, for the SAT you'll mostly have to create RGBA32F. And even within float precision, a SAT introduces a lot of error. On a color texture it may be unnoticeable, but it can break shadow maps. I wouldn't use SAT for large open-world-style shadows, but since my plan was a small number of dynamic objects in a static world, it was livable.

Second, the prepass is very heavy. "Sum up all the pixels of a texture" sounds simple in words, but it's not at all cheap in practice. The best known method, the one presented in the ATI paper, requires several passes, and the number of passes grows very quickly with texture size. Generating a SAT larger than 512x512 is a dead end. It's cheaper to do VSM with a wide blur.

Nevertheless, in the demo above I still used SAT, not yet having become completely disillusioned with it.

Some additional tricks were applied:

The thing is, the PCSS technique has one notable bug: it's impossible to get several penumbrae intersecting each other correctly, because the blocker search only sees the nearest data in the shadow map. So the "main" penumbra will be the penumbra of the object closest to it, and if some smaller object stands in the shadow and casts its own shadow onto the penumbra of the object behind it, it won't show up. There will be the penumbra of the main object, and then the small object's shadow will start abruptly as soon as it appears in the shadow map.

As long as the shadows don't intersect, this isn't noticeable, but I wanted to fix it. To do so I turned the shadow map into an atlas with a separate slot for each object; that way I also saved texture space and could run the SAT prepass separately on each atlas block. It was actually rather tricky: with a 512x512 atlas holding four 256x256 shadow maps, I was able to generate the SAT of the whole atlas in the number of passes needed for a single 256x256 texture.

This way I had unoccluded shadow map data for every object, and the artifact could be avoided; you can notice its absence in the second screenshot.

Meanwhile the deadlines were approaching, content started pouring in on me again, and such experimental methods had to be dropped. I had no time to bring this whole atlas system to "production" quality.

Things were simplified down to VSM + PCSS. The shadow map was rendered into a VSM texture without any atlases, then a minimal blur was run over it. PCSS used the same PCF loop which, instead of binary comparisons / hardware PCF, sampled this VSM map. The minimal blur in it was of course wider than hardware PCF, which allowed taking few samples (with plain PCF that would have looked like horrible banding). The result was shadows with a wide blur (many samples of a not-so-wide one) far from the caster and a narrower blur close to it. Ideally I wanted to make them sharper up close, but it was more or less acceptable as it was:

btest 2013-05-29 16-45-03-04 1dxc btest 2013-06-19 19-52-11-26

Of course, the effect wasn't quite the same anymore, but they were simply easier to work with. You can watch a video here: http://www.youtube.com/watch?v=2jk5TmfKNZA

One flaw remained that I didn't have time to fix: hard borders on overly blurred shadows. For optimization, my shader had an if that skipped computing shadows where there shouldn't be any, but it didn't work entirely correctly.

Next I made point light shadows in a similar way; for VSM, they had to be rendered as dual paraboloids.

A neat property of point light shadows: since we capture the shadow maps from the light's center with perspective enabled, distant objects automatically become smaller and their shadows get blurred more. An extra fake that works in favor of the visual impression of a correct penumbra =).

btest 2013-06-01 01-33-28-62 btest 2013-06-01 01-34-11-40 btest 2013-06-05 03-43-43-04 btest 2013-06-03 23-13-26-26

What are my further plans?

I like distance fields and what can be done with them. Even very low-res DFs can be traced to look quite close to the original geometry; in my diploma project I used them for object self-reflections (but that's for another time). A lot can be baked into small DFs. Or you can try generating them in real time…
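Tracing one is basically sphere tracing; a toy sketch (my own example, not the diploma code):

Texture3D<float> distanceField;   // distance to the surface, in the object's local [0,1]^3 space
SamplerState linearSamp;

bool TraceDF(float3 origin, float3 dir, out float3 hitPos)
{
    float3 p = origin;
    for (int i = 0; i < 64; i++)
    {
        float d = distanceField.SampleLevel(linearSamp, p, 0);
        if (d < 0.002) { hitPos = p; return true; }   // close enough to the surface
        p += dir * d;                                  // safe step: nothing is closer than d
        if (any(p < 0.0) || any(p > 1.0)) break;       // left the volume
    }
    hitPos = p;
    return false;
}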

Art

Before talking about my diploma, let me post here a modest collection of my models and pictures.

debris6.jpg1e44cf4d-c1fb-421e-90b7-5427b8c2f090Large fence8.jpg1d41c893-537c-4b35-b019-3d4308049c34Large

Environment: I made all the level/prop geometry for Incident, and a friend textured it. Our most presentable models from those days can now be bought on TurboSquid.

My friend never got around to texturing the helicopter model, even though it was planned to be the "main character" of a simulator:

mi2

mi2_3

Actually, I love modeling people's faces. I just love people's faces in general.

98b1dd2e2bd515daa2bb782b523fa574 96874deebbf298c696ea502e3357ad96 99f30b8f3631ae85d476b9f48fd5bc5f ec749be4dac3f5f02d5fd505480685f7 83a36c55ff2829e2ddcdc836928109a4

I've long been planning to finally get back to this.

For now all that's left is to show my 2D work, though it's quite pathetic:

someI1a copy3c_comp copyLo

fadedsk1 copyBW IMG_3780

Hello world

Hi.

I am a self-proclaimed graphics programmer and indie game developer. My desire and goal is to make good and interesting video games.

Besides graphics, I've also written physics, gameplay, tools, plugins… you name it. The main thing is to have something worth writing it for.

So what have I done, and what path have I walked?

Once upon a time there were me and my friend (I will definitely post his nickname/contacts as soon as he decides what I should write about him here). And we wanted to make games. It turned out that I wrote all the code and he made the art. Though I do make art sometimes too, but more on that later. So we started:

– The never-finished (frozen until some distant future) shooter Incident. It was originally a mod for Mafia: The City of Lost Heaven. It's worth mentioning that there were no official modding tools for the engine, and fans contrived whatever they could (and, hard to believe, still do). We had to reverse-engineer formats, write home-grown half-baked editors and converters, torture files in a hex editor… The biggest contributions to making modding possible at all were made back then by people like GOLOD55, zibob32 and Akay, and probably others whom I've undeservedly forgotten. I had to get my hands dirty too: the distinctive feature of Incident back then was lightmaps, whereas before that modders had to make do with the engine's per-vertex lighting.

Those were the happy and carefree times of modding. Despite the crookedness of our half-baked editors, we could still use a ready-made, proven engine that hid many complexities from us and "just worked".

Screenshots from the Mafia version:

screen_ls3d1

screen_ls3d2

screen_ls3d3

screen_ls3d4 screen_ls3d5

Compared to the Mafia mods of that time, our visuals were the flashiest, and our egos were bursting.

But that wasn't enough for us. Progress was moving forward rapidly, Mafia's graphics were becoming dated (this was in 2007, if memory serves), and we wanted to do something cooler. Abruptly stopping development of the mod, we declared Incident a standalone game. Probably in vain: as a mod we would at least have been able to finish it.

Here I need to make a small lyrical digression to explain the reason for that decision. I'd been interested in programming since childhood. My grandmother was a programmer, her brother also knew a thing or two, and books lay around here and there, from "MS-DOS for the User" to "Introduction to the C Language" (I'm writing the titles from memory, but the gist is right). There was a PC in the family since I was 4, and it changed many times as I grew up, but until 2004 the PCs available to me were always several years behind the current ones. With longing eyes I looked at screenshots of modern games in magazines and dreamed about them. For the most part, all the games I had were very boring and hard. I wasn't one of those hardcore players who finished HL1 several times; I couldn't finish it even once. I simply had no motivation: the levels were monotonous and the atmosphere invariably oppressive. And that was HL1, to say nothing of earlier generations of games.

In general, I liked games as a form in themselves, but until the 2000s I hadn't come across specific ones that I really loved. And since I had at my disposal a computer that by some miracle had QBasic on it, a few books and the English-language help for that very Basic, I (armed with a thick dictionary) set out to make my own games, in which everything would be the way I wanted. QBasic was followed by Dark Basic; to my disappointment I turned out not to have a 3D accelerator and had to make do with 2D. That's when I made my first sprite-based racing games and half-baked strategies. Dark Basic gave way to Blitz Basic, which I fell in love with for a long time, along the way getting friendly with the Russian Blitz community. Blitz seemed less "casual", and lots of libraries and wrappers were written for it; you could even poke at PhysX.

Spending a long time on outdated hardware, gazing hungrily at modern games and judging them by magazine screenshots (there was nothing else to judge by: YouTube didn't exist yet, and you couldn't watch much on dial-up anyway), I admired their graphics and even frequently overestimated them. Who can tell from a tiny picture whether it's a real reflection or something baked into the diffuse? This motivated me to try to make something "graphical" myself, contorting the poor fixed-function pipeline however I could (remember, for instance, the sluggish per-pixel distortion on the CPU).

At that time (the late 90s and early 2000s) games were evolving very quickly, both technologically and in terms of ideas. It felt like I was standing on the threshold of some revolution, as if just a little more and humanity would finally create the Matrix. Making games took on the appearance of creating worlds, so epic and captivating that I wanted to be part of it.

The turning point for me was the appearance of the Xors3D engine, which added a DX9 renderer with shaders to Blitz; the mere mention of shaders made me tremble with admiration. SHADERS! I love how that word sounds!

So, I had an amateur DX9 engine at my disposal and, thanks to Moka's help and examples, I figured out the basics of HLSL. On top of that, thanks to modding and the accompanying reverse engineering, I more or less understood how normal people store binary data. These were the reasons it seemed that developing a quality game was quite feasible.

Surprisingly, our enthusiasm spread to others too; the artists Sergey Su and SkyGround helped us a lot back then.

What happened next? Screenshots:

screen1

screen2

screen3

Можно даже сказать, что получалось не так плохо. Однако тратили мы на это чертовски много времени – да и на что тратили – вместо того, чтобы делать, собственно игру, мы погрязли в создании графики и, в конце концов, поняли, что в наши уровни невозможно играть. Они, выстраданные, пробегались за пару минут, а продумывание геймплея, по большей части, было упрощено до расставления врагов на равномерном расстоянии по пути игрока. Это не было похоже на глубокое атмосферное приключение в пост-апокалиптическом мире – наш энтузиазм начал иссякать. Диздок был многократно переписан с нуля, и, наконец, проект был заморожен – мы просто не могли потянуть его в то время, или были не столь умны, чтобы сделать его уникальным и интересным и в то же время дешёвым в реализации.

Проект дал мне много опыта – под шквалом падающего в игру контента, необходимо было оптимизировать как пайплайн его засовывания в движок, так и сам рендер. Много пейперов было прочитано.

The resulting pipeline approach was to derive a surface’s properties from its texture – and not just the graphical properties, but the physical ones and more. For example, when the game loaded a crate model, it associated the name of its texture with a config of the same name containing the crate’s shader properties (defines, constants), physical properties (friction, mass), sounds (light/heavy impact, bullet hit), and decals (kinds of bullet holes). Overall, the approach was quite pleasant to work with, but, as it turned out, we periodically wanted to use the same texture while giving objects different rendering/physics properties, and that’s where the system started to fall apart – it got to the point where we duplicated identical textures just so they would have different names.

The renderer followed the Source tradition – lightmaps for static geometry, ambient cubes for dynamic objects; dynamic shadows from dynamic objects were blurred in screen space and multiplied over the lightmapped image. The scene consisted of many convex sectors connected by portals – an additional frustum was constructed through each portal to cull the sector visible through it. Apart from post-processing, there were only 2 shaders – one for statics and one for dynamics – but they were fat ubershaders controlled by defines.
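To give an idea of what that meant in practice, here is a minimal sketch of a define-controlled ubershader, with hypothetical define and uniform names (not the actual ones from that engine):

// One pixel shader source compiled into several permutations by toggling defines,
// e.g. LIGHTMAP for static geometry or AMBIENTCUBE for dynamic objects.
sampler2D diffuseMap;

#ifdef LIGHTMAP
sampler2D lightMap;                // statics: baked lighting
#endif

#ifdef AMBIENTCUBE
float3 ambientCube[6];             // dynamics: 6 axis-aligned ambient colors
#endif

float4 PSMain(float2 uv : TEXCOORD0,
              float2 lmUV : TEXCOORD1,
              float3 normal : TEXCOORD2) : COLOR
{
    float3 albedo = tex2D(diffuseMap, uv).rgb;
    float3 lighting = 1;

#ifdef LIGHTMAP
    lighting = tex2D(lightMap, lmUV).rgb;
#endif

#ifdef AMBIENTCUBE
    // Source-style blend of the 6 ambient colors by squared normal components
    float3 nSq = normal * normal;
    lighting = nSq.x * ambientCube[normal.x > 0 ? 0 : 1]
             + nSq.y * ambientCube[normal.y > 0 ? 2 : 3]
             + nSq.z * ambientCube[normal.z > 0 ? 4 : 5];
#endif

    return float4(albedo * lighting, 1);
}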

Still, the FPS wasn’t great – the engine wasn’t particularly optimized, and we didn’t keep a close eye on draw calls either. The last straw was the engine’s lack of background resource loading – our content weighed so much that we simply had to stream it as the player moved; it didn’t fit into the small video memory of the time (I had 256 MB).

So I went further – I absolutely had to switch to C++ and be able to choose between different engines.

I quickly skimmed some C++ tutorials, hooked up the same Xors3D to begin with, and went off to build my own game engine – making every conceivable and inconceivable mistake along the way, producing leaks and random crashes. The first pancake always comes out lumpy, as they say!

The engine loaded scenes made in my own editor, and I actually spent most of the time on that editor – now I had to care not only about the game working, but also about the editor’s functionality and usability.

The editor could do a lot beyond placing models: CSG geometry creation and boolean operations on it, automatic lightmap unwrapping, Beast integration (don’t ask where we got it back then), post-processing setup, a built-in script editor…

blitzccf3133704

blitzccf3137072

blitzccf3259104

A particular point of pride was the particle system, running entirely on the GPU:

blitccf3517200

However, all of this was quite raw and buggy. Besides, realizing we couldn’t handle a big project at that point, we decided to test it all on something simpler. So our next project was an arcade game about little boats that sail around and shoot at each other.

blitzccf3284536

The graphics in that game weren’t dazzling – we grabbed the first models that came to hand and made a few test missions. They were playable, but extremely boring. Enthusiasm dried up again.

In despair, we started making a VKontakte app – but not only was it no fun to build, the competition was plentiful and servers cost money. Too few guarantees, too little interest in developing for social networks. The project was abandoned again before it really started – though I did manage to pick up a bit of AS/PHP/mySQL/VK API along the way.

Somewhere in parallel, I kept helping with Mafia modding, releasing various tools and small mods from time to time. On mafiapub.com (sadly down at the time of writing, but the things mentioned can be found on other fan sites) you can find my model importer/exporter for Max (in MaxScript) as well as a mod that adds shader-based water (Water Shader Mod), done via a brutal proxy DLL between the game and DX8 (shader model 1.0, written in assembly).

It’s worth noting that after the C++ experience no language ever felt completely “foreign” to me again. By and large, the constructs are the same everywhere; logic rules it all. Since then I’ve lost count of how many languages I’ve written in. Once you understand the core principles of coding and of how a computer works, you see them reflected in any language or API.

Meanwhile, I was a student at an institute, majoring in film. For coursework we made short videos, and one of them deserves a mention here, since an awful lot of time was sunk into it. On the programming side I wrote some pipeline-helper scripts and shaders (HLSL for the viewport and mental ray shaders for the final render), but mostly I was responsible for all of the 3D, the editing, and part of the compositing, while a friend was the “art director” and did all the 2D.

And here is the video itself:

http://youtu.be/trHQPw_pzWk

Many shots were cut and never made it into the final version, simply because there was no fitting place to put them.

Like, for example, this shot of Johannesburg: youtu.be/V3d7PKOejgM

Meanwhile, Render, the author of the PhysX wrapper for Blitz and co-author of Xors3D, put me in touch with some guys working on flight simulators. The prospect of developing a helicopter simulator for pilot training was appealing – it promised far more experience (and, supposedly, money) than churning out dull casual games (which is what most dull gamedev startups end up doing).

So began the development of the simulator – and my biggest level-up as a coder. The simulator was built entirely on my own engine (DX9), which was rewritten from scratch a couple of times. My engine finally supported streaming. My artist friend refused to paint the terrain with tiles, so I had to implement megatextures and their streaming. At first the megatextures followed Carmack’s recipe: a special pass was rendered that output information about the needed tiles, it was analyzed on the CPU, and the data was loaded for rendering. However, pulling that texture back to system memory was too expensive, as was spending an extra pass on terrain of high geometric complexity. Since megatextures were only used for the terrain, the system was simplified to a quadtree, and the required tiles were determined quickly on the CPU without involving the GPU. As a bonus, this also removed an annoying trait of classic megatextures – the dependence on camera rotation. You could now snap the camera around in any direction without seeing tiles load in late.
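For reference, the feedback pass mentioned above boils down to rendering the scene into a small offscreen target where every pixel encodes which virtual-texture tile (and mip) it wants; the CPU then reads that target back and schedules the loads. A rough sketch of such a pixel shader, with hypothetical names and packing (the real one differed):

// Feedback pass: every pixel reports the virtual-texture tile and mip it would sample.
float2 virtualTextureSize;   // whole virtual texture size in texels
float  tileSize;             // tile size in texels, e.g. 128

float4 FeedbackPS(float2 virtualUV : TEXCOORD0) : COLOR
{
    // Estimate the required mip from UV derivatives, like hardware trilinear filtering does
    float2 texelUV = virtualUV * virtualTextureSize;
    float2 dx = ddx(texelUV);
    float2 dy = ddy(texelUV);
    float mip = floor(max(0, 0.5 * log2(max(dot(dx, dx), dot(dy, dy)))));

    // Which tile covers this pixel at that mip
    float2 tile = floor(texelUV / (tileSize * exp2(mip)));

    // Pack tile X/Y and mip into the render target (assuming each fits into 8 bits)
    return float4(tile / 255.0, mip / 255.0, 1);
}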

The engine came with a full set of tools – a model exporter and various auxiliary utilities.

The terrain geometry itself was also split into a quadtree, with some clever morphing in the vertex shader to hide the seams between chunks.
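I won’t pretend to remember the exact formulation, but quadtree terrain typically hides LOD seams with geomorphing along these lines – a sketch with hypothetical vertex inputs, not the actual shader:

// Geomorphing sketch: each vertex carries its full-detail position plus the height it
// would have on the coarser parent LOD; the vertex shader blends between the two by
// distance, so neighbouring chunks of different LODs meet without cracks.
float4x4 worldViewProj;
float3   cameraPos;
float2   morphRange;       // (start, end) of this LOD's morph zone, in world units

struct VS_IN
{
    float3 position     : POSITION;   // full-detail vertex position
    float  coarseHeight : TEXCOORD0;  // height of this vertex on the parent LOD
};

float4 TerrainVS(VS_IN IN) : POSITION
{
    float dist = distance(IN.position, cameraPos);

    // 0 near the camera (full detail), 1 at the far edge of this LOD
    float morph = saturate((dist - morphRange.x) / (morphRange.y - morphRange.x));

    float3 p = IN.position;
    p.y = lerp(p.y, IN.coarseHeight, morph);   // sink toward the parent LOD's surface

    return mul(float4(p, 1), worldViewProj);
}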

Shadows also took some fiddling. The idea of simply throwing cascaded shadow maps at everything failed – either we had no shadows in the distance, or we lost quality up close; 4 splits weren’t enough for good quality everywhere. On top of that, floating-point precision wouldn’t give us accurate shadows from small details at grazing angles (sunset) across the whole territory. In the end we went with a mixed approach: cascaded shadows remained for the buildings; for the terrain there was a mini GPU raytracer over the heightmap, which recomputed the terrain’s lightmap only when the sun moved by a few degrees; the helicopter got its own small, always-detailed VSM; and the clouds were drawn black-on-white, unsorted, into yet another texture, which was then projected down over everything multiplicatively.
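However the actual plumbing looked (I don’t remember whether everything was resolved to screen space or projected per pixel), conceptually each of these sources reduces to a [0..1] sun-visibility factor, and combining them is just a multiplication – a hypothetical sketch:

// All shadow sources as [0..1] visibility factors, multiplied together.
sampler2D cascadeShadow;      // cascaded shadows from buildings
sampler2D terrainLightmap;    // heightmap-raytraced sun visibility, recomputed rarely
sampler2D heliShadow;         // small, always-detailed VSM following the helicopter
sampler2D cloudShadowMap;     // clouds drawn black-on-white, projected top-down

float ComputeSunVisibility(float2 screenUV, float2 terrainUV, float2 cloudUV)
{
    float buildings = tex2D(cascadeShadow, screenUV).r;
    float terrain   = tex2D(terrainLightmap, terrainUV).r;
    float heli      = tex2D(heliShadow, screenUV).r;
    float clouds    = tex2D(cloudShadowMap, cloudUV).r;

    // Each factor only darkens, so they combine with a plain multiply
    return buildings * terrain * heli * clouds;
}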

Such are the realities of realtime rendering – either everything looks awful and runs slow, or you implement special cases for everything.

A ton of other things got implemented too – from spawning cars on the roads and simulating raindrops on the glass to dynamic time of day and even time of year (!).

iengine 2011-08-28 20-21-59-26

iengine 2011-08-28 20-19-52-65

iengine 2011-08-24 19-50-32-32

iengine 2011-08-23 21-33-16-51

iengine 2011-08-16 18-18-48-03

iengine 2011-07-30 20-53-57-40

iengine 2011-07-29 02-03-29-20

iengine 2011-07-10 22-57-10-43

iengine 2011-07-07 22-03-01-93

iengine 2011-03-25 01-19-38-09

iengine 2011-03-04 00-13-26-89

It’s not hard to guess that with little experience and big ambitions, development dragged on for far too long. As a result the deadlines were missed, but we were offered the chance to convert the simulator from a helicopter one into a parachute one – there was still demand for that, and it was less complex.

And so the parachute simulator began: everything unfinished and unnecessary was cut out (trains, traffic lights, sandstorms), the world scale was reduced a bit, and everything was polished.

For the parachute simulator we also built a neat procedural building generator – as a result we had thousands of varied, more or less tidy buildings on the map.

The result was quite decent:

4f0e6b7efc1be5e88ccab060c8cf9bdd 1788f398155afa46f8a70235c2f7e4ce 751a2858f00f31045c9ccc66b8cc439e 2ea48a985a2e492faddd18add590c7e2 29a49096615f6e701aad1bc22670cc31

iengine 2011-11-02 16-32-22-01 iengine 2011-11-02 16-32-19-20 iengine 2011-10-25 22-34-19-79 iengine 2011-10-20 16-56-01-06

And a random video: http://www.youtube.com/watch?v=vsahx8WcU1Y

We even managed to get paid for it. And we would have lived happily ever after, but the planned mass sales never came – competing products beat us in the buyers’ eyes, and the simulator folks lost interest in us.

After that I did various smaller things.

For example, this realtime bottle:

iengine 2012-06-28 21-47-41-96

iengine 2012-06-28 21-49-49-67

Here, in the shader, I simulated the paths of all the rays reflected, refracted, and passing through the glass. The goal was to get close to raytracer quality. It seems to have more or less worked out:

vrayVsMe
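I won’t reproduce the exact shader, but the core of this kind of trick is to follow the reflected and refracted rays analytically and look them up in an environment cubemap, refracting once more on the way out of the glass. A rough sketch with hypothetical parameters (the real shader handled more bounces than this):

// Single-object glass against an environment cubemap: reflect at the front surface,
// refract in, refract back out through an approximated back face, blend by Fresnel.
samplerCUBE envMap;
float ior;                 // index of refraction, e.g. ~1.5 for glass

float3 ShadeGlass(float3 viewDir, float3 normal)
{
    // Front-surface reflection
    float3 reflDir = reflect(viewDir, normal);
    float3 reflCol = texCUBE(envMap, reflDir).rgb;

    // Refract into the glass, then out again; the back-face normal is crudely
    // approximated by the flipped front normal
    float3 inDir  = refract(viewDir, normal, 1.0 / ior);
    float3 outDir = refract(inDir, -normal, ior);
    if (dot(outDir, outDir) == 0)            // total internal reflection fallback
        outDir = reflect(inDir, -normal);
    float3 refrCol = texCUBE(envMap, outDir).rgb;

    // Schlick's approximation for the Fresnel weight between the two
    float f0 = pow((ior - 1) / (ior + 1), 2);
    float fresnel = f0 + (1 - f0) * pow(1 - saturate(dot(-viewDir, normal)), 5);

    return lerp(refrCol, reflCol, fresnel);
}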

I also made a radiosity lightmapper on the GPU:

iengine 2012-07-09 05-13-04-37

But it came out too slow to actually use. On the other hand, it computed GI! That was fun.

Having spent one more summer that way, I suddenly realized I was about to graduate – time to do my thesis. So I decided to do it on the thing I knew best – realtime rendering.

But I’ll write about that in the next post. Time to sleep now.

Look at me – mom’s little overachiever.