This removes explicit checks sprinkled all over the codebase to instead
just have the SW rasterizer expose an implementation with no-ops for
most operations.
The change in depth range seems to reflect better to what applications are expecting, and makes for cleaner code overall (hence is more likely to reflect hardware behavior).
Use a new buffer management scheme in the clipper that allows using a
bounded minimal amount of buffer space. Even though it copies more data
it is still slightly faster likely due to using less cache.
The triangle clipper was allocating its temporary input, output and work
buffers using a std::vector. Since this is a hot path, it's desirable to
use stack allocation instead.