One thing Cerny was not at all shy about discussing are the system's bottlenecks -- because, in his view, he and his engineers have done a great job of devising ways to work around them.
"With graphics, the first bottleneck you’re likely to run into is memory bandwidth. Given that 10 or more textures per object will be standard in this generation, it’s very easy to run into that bottleneck," he said. "Quite a few phases of rendering become memory bound, and beyond shifting to lower bit-per-texel textures, there’s not a whole lot you can do. Our strategy has been simply to make sure that we were using GDDR5 for the system memory and therefore have a lot of bandwidth."
That's one down. "If you're not bottlenecked by memory, it's very possible -- if you have dense meshes in your objects -- to be bottlenecked on vertices. And you can try to ask your artists to use larger triangles, but as a practical matter, it's difficult to achieve that. It's quite common to be displaying graphics where much of what you see on the screen is triangles that are just a single pixel in size. In which case, yes, vertex bottlenecks can be large."
"There are a broad variety of techniques we've come up with to reduce the vertex bottlenecks, in some cases they are enhancements to the hardware," said Cerny. "The most interesting of those is that you can use compute as a frontend for your graphics."
This technique, he said, is "a mix of hardware, firmware inside of the GPU, and compiler technology. What happens is you take your vertex shader, and you compile it twice, once as a compute shader, once as a vertex shader. The compute shader does a triangle sieve -- it just does the position computations from the original vertex shader and sees if the triangle is backfaced, or the like. And it's generating, on the fly, a reduced set of triangles for the vertex shader to use. This compute shader and the vertex shader are very, very tightly linked inside of the hardware."
It's also not a hard solution to implement, Cerny suggested. "From a graphics programmer perspective, using this technique means setting some compiler flags and using a different mode of the graphics API. So this is the kind of thing where you can try it in an afternoon and see if it happens to bump up your performance."
These processes are "so tightly linked," said Cerny, that all that's required is "just a ring buffer for indices... it's the Goldilocks size. It's small enough to fit the cache, it's large enough that it won't stall out based on discrepancies between the speed of processing of the compute shaders and the vertex shaders."
He has also promised Gamasutra that the company is working on a version of its performance analysis tool, Razor, optimized for the PlayStation 4, as well as example code to be distrusted to developers. Cerny would also like to distribute real-world code: "If somebody has written something interesting and is willing to post the source for it, to make it available to the other PlayStation developers, then that has the highest value."