Geometry shaders
Geometry shaders are a (not so new anymore) addition to GPUs. They bring amazing new possibilities to the graphics hardware: you can now operate on whole primitives, change them, replicate them, drop them, or even change their type. Using them efficiently has proven to be quite tricky, though. The problem is not so much the possibilities as the inherent limitations: geometry shaders are simply not made for generating large numbers of primitives. I'll try anyway!
Update: In the meantime, a GPU with tessellation support has arrived in my lab. I used this opportunity to directly compare primitive generation using the geometry shader to the tessellation unit. See below.
My work confronts me with 3D city models, whose geometry has two major properties:
City models are triangle soups in the truest sense. I'm also confronted with interesting output devices, namely a cylindrical projection wall. Real-time rendering on it needs to take a cylindrical projection into account. Cylindrical projections turn straight lines into curves, so on current, polygon-based GPUs there are two options: image warping or lots of triangles. The geometry shader's promise of flexibly generating primitives on the GPU sparked the idea of handing the graphics card the coarse triangle soup and letting it produce a sufficient number of triangles by itself!
Dynamic mesh refinement on GPU
Figure: A vertex-based cylindrical projection. Thick lines highlight original model edges. Fine lines show the tessellated mesh.
Computer graphics people rarely seem to be happy with their meshes. Changing meshes in a view-dependent way to fit the current frame has been around for decades. Applications range from coarsening detailed meshes, e.g., with progressive meshes, to generating triangles for smoothing surfaces. For my work, I need to refine triangle soups. Over time, more and more of the workload has been transferred to the GPU, but without geometry shaders, triangle soups couldn't be processed entirely on the GPU. Since I operate on individual triangles, geometry shaders seemed the perfect solution: get one triangle and split it into as many triangles as needed.
Until I hit the output limit. 1k vec4's isn't that much. In a view-dependent setting, most triangles split into only a few subtriangles, since they're small on the screen. Some triangles are closer and larger, though, and split up more and more. Eventually, the geometry shader can't emit enough vertices. Unfortunately, the resulting artifacts are very prominent...
Figure: Dynamic mesh refinement provides each triangle view-dependently with its optimal tessellation level. PN-triangles can then easily smooth the originally coarse mesh.
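To get a feeling for how quickly the output limit mentioned above bites, here is a small back-of-the-envelope sketch in Python. It is not taken from the paper. It assumes a uniform subdivision into rows, each emitted as one triangle strip, and it treats the output budget as a parameter counted in scalar components (query your hardware, e.g. via GL_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS, for the real value). The function names are made up for illustration.

```python
def strip_vertices(level):
    """Vertices emitted for one triangle subdivided into `level` rows.

    Assumption: every row is sent out as a single triangle strip.
    Row r (1-based, counted from the apex) holds 2*r - 1 subtriangles,
    which a strip encodes with 2*r + 1 vertices.
    """
    return sum(2 * r + 1 for r in range(1, level + 1))  # level**2 + 2*level


def max_level(component_budget, components_per_vertex):
    """Largest subdivision level that fits into one shader invocation.

    `component_budget` is the assumed per-invocation output limit in
    scalar components; `components_per_vertex` is what each emitted
    vertex carries (4 for a bare vec4 position, more with attributes).
    """
    vertex_budget = component_budget // components_per_vertex
    level = 0
    while strip_vertices(level + 1) <= vertex_budget:
        level += 1
    return level


if __name__ == "__main__":
    # Hypothetical budget of 1024 components; adjust to your GPU.
    for components in (4, 8, 16):
        print(components, "components/vertex -> level",
              max_level(1024, components))
```

With an assumed budget of 1024 components, a vertex that carries nothing but a vec4 position tops out at level 15, i.e. 225 subtriangles per input triangle, and every additional attribute lowers that ceiling further. A single large triangle close to the viewer can easily ask for more.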
The standard solution is iterative: capture the geometry shader's output in an intermediate mesh using transform feedback and repeat processing until enough triangles have been produced. This works well for uniform amplification (as seen in this NVidia example). In my case, every triangle has its own amplification factor, with only a few being very large, so iterating over the complete buffer would be rather inefficient. My solution is to switch from recreating the intermediate mesh for every frame in multiple passes to updating it with exactly one pass per frame. This enables the GPU to do all necessary steps by itself while giving each triangle its own optimal tessellation level:
There is no need for transferring buffers between CPU and GPU or for selectively rendering the mesh. My method requires exactly 3 draw calls per frame. The only quantity flowing back from GPU to CPU is the number of generated primitives (DX10 and NV_transform_feedback2 even hide this detail in DrawAuto() or glDrawTransformFeedbackNV()). For more details on the inner workings of my approach, read my paper:
Dynamic Mesh Refinement on GPU using Geometry Shaders (abstract, pdf)
In the end, I do use the geometry shader for tessellation, even though you're told not to (see slide 26...). Nevertheless, I achieve reasonable frame rates compared to other recent approaches. And this technique is the starting point for something completely different.
Piecewise Perspective Projections (OpenGL demo), (DX11 demo)
Figure: An approximated cylindrical projection. Piecewise perspective projections completely hide nonlinear aspects of a projection. As a result, any rendering style can be used instantly, even screen-space dependent styles such as hatching (left) or solid wireframe (right).
One goal of dynamic mesh refinement was vertex-based cylindrical projection. While it produces great results for watertight meshes, it fell a little short for my triangle soups. In particular, interpenetrations (e.g., the water surface vs. the terrain surface) produce ugly z-buffer artifacts. They are caused by mismatched tessellations of the penetrating triangles. If you could "coordinate" these triangles' tessellations, the artifacts would disappear. One initial idea was fixing new vertices to an on-screen grid. This removed the artifacts to some extent, but not completely. And the shaders became huge. The remaining artifacts appeared at the original vertices and edges, which didn't align to my grid. The root cause is the (perspective-correct) linear interpolation used in the GPU, which is not enough for a cylindrical projection.
The solution would be to get rid of the cylindrical projection. So why not tessellate the projection instead of the triangles, such that each projection piece uses a perspective projection and the combined projections approximate the desired cylindrical projection? There are a number of interesting properties of such an approach:
Piecewise perspective projections enable direct rendering of a warped screen. The results are better image quality and consistent rendering effects. Check out this demo (4MB, OpenGL, WinXP/Vista, NVidia GeForce 8000+) or this demo (5MB, Direct3D 11, WinVista/7, AMD Radeon HD5000+) and see for yourself.
There must be a catch, though. The approximation with a set of perspective projections works only for so-called single-center-of-projection effects, where all projection rays are straight and meet in a single point. In a nutshell, piecewise perspective projections can do the same things you could do with an environment map, only without the texture sampling.
For rendering, just call your render function for each projection piece with the correct projection matrix and clipping in effect and you're done. If you want to render fast, it's a little more complicated. In the end, you would like to render each triangle only into those projection pieces it is visible in. And you wouldn't want the CPU to decide that for each triangle in each frame. Instead, the GPU should examine each triangle and render it into a varying number of projection pieces. Sounds familiar? Dynamic mesh refinement examines each triangle and renders a varying number of subtriangles instead.
Piecewise perspective projections can use an implementation similar to dynamic mesh refinement. It's even simpler, as only replication is needed instead of tessellation: the same triangle is rendered to all relevant pieces. Each replication only needs to know the piece it will be rendered into. The resulting rendering routine also uses exactly three draw calls. First, examine all triangles. Second, create the intermediate mesh of replicated triangles. And third, render to all pieces at once! All the projection matrices are fed to the final vertex shader in a large buffer (e.g., using bindable uniforms or buffer textures) and each triangle replication fetches the right one for transformation. Again, read my paper for more details:
Real-time Piecewise Perspective Projections (abstract, pdf, video)
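The replication idea described above can be played through on the CPU. The following Python sketch is only an illustration under stated assumptions, not the implementation from the paper: it assumes a cylindrical projection split into equally wide angular pieces, a camera looking down -z in view space, and triangles that do not cross the seam behind the camera. All function names are hypothetical. On the GPU, the "examine" and "replicate" steps would live in shaders, and each replica would fetch its piece's projection matrix by index.

```python
import math


def piece_layout(total_fov_deg, num_pieces):
    """Split the horizontal field of view into equal angular pieces.

    Returns one (center_yaw, half_angle) pair in radians per piece.
    """
    total = math.radians(total_fov_deg)
    width = total / num_pieces
    return [(-total / 2 + (i + 0.5) * width, width / 2)
            for i in range(num_pieces)]


def azimuth(point):
    """Horizontal angle of a view-space point; the camera looks down -z."""
    x, _, z = point
    return math.atan2(x, -z)


def pieces_for_triangle(triangle, total_fov_deg, num_pieces):
    """Indices of all pieces whose angular range the triangle touches.

    Simplification: only the azimuth range of the three vertices is used,
    and the triangle must not cross the seam behind the camera.
    """
    total = math.radians(total_fov_deg)
    width = total / num_pieces
    angles = [azimuth(v) for v in triangle]
    first = max(math.floor((min(angles) + total / 2) / width), 0)
    last = min(math.floor((max(angles) + total / 2) / width), num_pieces - 1)
    return list(range(first, last + 1))  # empty if entirely outside the view


def replicate(triangles, total_fov_deg, num_pieces):
    """Intermediate mesh: one (triangle_index, piece_index) pair per replica."""
    return [(t, p)
            for t, tri in enumerate(triangles)
            for p in pieces_for_triangle(tri, total_fov_deg, num_pieces)]


def screen_u(point, total_fov_deg, num_pieces):
    """Horizontal screen coordinate in [-1, 1] of a point, computed with
    the perspective projection of the piece the point falls into."""
    total = math.radians(total_fov_deg)
    a = azimuth(point)
    index = math.floor((a + total / 2) / (total / num_pieces))
    index = min(max(index, 0), num_pieces - 1)
    center, half = piece_layout(total_fov_deg, num_pieces)[index]
    local = math.tan(a - center) / math.tan(half)  # perspective x in [-1, 1]
    return (center + half * local) / (total / 2)


if __name__ == "__main__":
    tris = [[(-0.1, 0.0, -1.0), (0.1, 0.0, -1.0), (0.1, 1.0, -1.0)],
            [(2.0, 0.0, -1.0), (3.0, 0.0, -1.0), (3.0, 1.0, -1.0)]]
    print(replicate(tris, 180.0, 4))
```

In this setup, the horizontal coordinate agrees with the exact cylindrical one at the center and at both borders of every piece, so neighboring pieces join without a gap. In between, the deviation shrinks quickly as the number of pieces grows.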
To get a feeling for piecewise perspective projections, there is a demo (4MB) available. It implements screen warping using my technique and others for comparison.
Update: It uses OpenGL 2.0 under WinXP/Vista with OpenGL 3 extensions. You need a GPU with support for geometry shaders and transform feedback, i.e., an NVidia GeForce 8000 series or better or, theoretically, an AMD Radeon HD4xxx or better. Only theoretically for AMD hardware, because I was unable to get my technique to run there due to some obscure INVALID_OPERATION (Catalyst driver ver. 10.3) that does not appear on NVidia hardware. A new driver version might solve this problem. The program has been tested on various hardware and software configurations, but I can't guarantee that it works.
Update: Here is a new demo (5MB) that uses the tessellation unit available on DirectX 11-class hardware. It provides a comparison between the original approach described in the paper and a new, simpler implementation that uses the tessellation unit for primitive replication. Again, the program has been tested on various hardware and software configurations, but I can't guarantee that it works. This time, NVidia hardware might cause problems, as I haven't been able to test the program on it so far.
What the future brings
With the birth of Dynamic Mesh Refinement, its end seemed to be set, too: DX11 has the tessellation unit. Problem solved. But there is at least some light: my technique can still serve as a fallback option on DX10 hardware.
Update: Surprisingly, the tessellation-based approach is not significantly faster than the original geometry shader approach, at least on AMD GPUs. I'm not exactly sure why, though. The bottleneck is not necessarily the tessellation unit (NVidia GPUs are said to be much faster in that respect), since my approach renders tiny triangles (less than 8x8 pixels in size), which are known to be less efficient than larger triangles.
Piecewise Perspective Projections will hopefully outlast geometry shaders.
The idea does not rely on them, but on producing a given number of primitives on the GPU. The tessellation unit serves this purpose equally well.
All images and content copyright © 2005-2011 by Haik Lorenz. No reproduction without permission.
