Use VK_KHR_pipeline_executable_properties when enabled and available to
log statistics about the pipeline cache in a game.
For example, this is on Turing GPUs when generating a pipeline cache
from Super Smash Bros. Ultimate:
Average pipeline statistics
==========================================
Code size: 6433.167
Register count: 32.939
More advanced results could be presented, at the moment it's just an
average of all 3D and compute pipelines.
Move code to separate files to be able to reuse it from OpenGL. This
greatly simplifies the pipeline cache logic on Vulkan.
Transform feedback state is not yet abstracted and it's still
intrusively stored inside vk_pipeline_cache. It will be moved when
needed on OpenGL.
Mostly fixing unused *, implicit conversion, braced scalar init,
fpermissive, and some others.
Some Clang errors likely remain in video_core, and std::ranges is still
a pertinent issue in shader_recompiler
shader_recompiler: cmake: Force bracket depth to 1024 on Clang
Increases the maximum fold expression depth
thread_worker: Include condition_variable
Don't use list initializers in control flow
Co-authored-by: ReinUsesLisp <reinuseslisp@airmail.cc>
Users may want to fall back to the CPU ASTC texture decoder due to hangs
and crashes that may be caused by keeping the GPU under compute heavy
loads for extended periods of time. This is especially the case in games
such as Astral Chain which make extensive use of ASTC textures.
Reimplement the buffer cache using cached bindings and page level
granularity for modification tracking. This also drops the usage of
shared pointers and virtual functions from the cache.
- Bindings are cached, allowing to skip work when the game changes few
bits between draws.
- OpenGL Assembly shaders no longer copy when a region has been modified
from the GPU to emulate constant buffers, instead GL_EXT_memory_object
is used to alias sub-buffers within the same allocation.
- OpenGL Assembly shaders stream constant buffer data using
glProgramBufferParametersIuivNV, from NV_parameter_buffer_object. In
theory this should save one hash table resolve inside the driver
compared to glBufferSubData.
- A new OpenGL stream buffer is implemented based on fences for drivers
that are not Nvidia's proprietary, due to their low performance on
partial glBufferSubData calls synchronized with 3D rendering (that
some games use a lot).
- Most optimizations are shared between APIs now, allowing Vulkan to
cache more bindings than before, skipping unnecesarry work.
This commit adds the necessary infrastructure to use Vulkan object from
OpenGL. Overall, it improves performance and fixes some bugs present on
the old cache. There are still some edge cases hit by some games that
harm performance on some vendors, this are planned to be fixed in later
commits.
Also renames related CMake variables to match both the Find*FFmpeg* and
variables defined within the file. Fixes odd errors produced by the old
FindFFmpeg.
Citra's FindFFmpeg is slightly modified here: adds Citra's copyright at
the beginning, renames FFmpeg_INCLUDES to FFmpeg_INCLUDE_DIR, disables a
few components in _FFmpeg_ALL_COMPONENTS, and adds the missing avutil
component to the comment above.
For Linux, instructs CMake to use the FFmpeg submodule in externals.
This is HEAVILY based on our usage of the late Unicorn. Minimal change
to MSVC as it uses the yuzu-emu/ext-windows-bin. MinGW now targets the
same ext-windows-bin libraries as MSVC for FFmpeg. Adds FFMPEG_LIBRARIES
to WIN32 and simplifies video_core/CMakeLists.txt a bit.
It keeps track of the modified CPU and GPU ranges on a CPU page
granularity, notifying the given rasterizer about state changes
in the tracking behavior of the buffer.
Use a small vector optimization to store buffers smaller than 256 KiB
locally instead of using free store memory allocations.
Move surface initialization code to a separate file. It's unlikely to
use this code outside of Vulkan, but keeping platform-specific code
(Win32, Xlib, Wayland) in its own translation unit keeps things cleaner.
Initialize debug callbacks (messenger) from a separate file. This allows
sharing code with different backends.
Change our Vulkan error handling to use exceptions instead of error
codes, simplifying the initialization process.
The current texture cache has several points that hurt maintainability
and performance. It's easy to break unrelated parts of the cache
when doing minor changes. The cache can easily forget valuable
information about the cached textures by CPU writes or simply by its
normal usage.The current texture cache has several points that hurt
maintainability and performance. It's easy to break unrelated parts
of the cache when doing minor changes. The cache can easily forget
valuable information about the cached textures by CPU writes or simply
by its normal usage.
This commit aims to address those issues.
Most of the time people write code that always returns a value,
terminates execution, throws an exception, or uses an unconventional
jump primitive.
This is not always true when we build without asserts on mainline builds.
To avoid introducing undefined behavior on our most used builds, enforce
this warning signalling an error and stopping the build from shipping.
This commit aims to implement the NVDEC (Nvidia Decoder) functionality, with video frame decoding being handled by the FFmpeg library.
The process begins with Ioctl commands being sent to the NVDEC and VIC (Video Image Composer) emulated devices. These allocate the necessary GPU buffers for the frame data, along with providing information on the incoming video data. A Submit command then signals the GPU to process and decode the frame data.
To decode the frame, the respective codec's header must be manually composed from the information provided by NVDEC, then sent with the raw frame data to the ffmpeg library.
Currently, H264 and VP9 are supported, with VP9 having some minor artifacting issues related mainly to the reference frame composition in its uncompressed header.
Async GPU is not properly implemented at the moment.
Co-Authored-By: David <25727384+ogniK5377@users.noreply.github.com>
These compiler flags aren't shared with clang, so specifying these flags
unconditionally can lead to a bit of warning spam.
While we're in the area, we can also enable -Wunused-but-set-parameter
given this is almost always a bug.
This reworks how host<->device synchronization works on the Vulkan
backend. Instead of "protecting" resources with a fence and signalling
these as free when the fence is known to be signalled by the host GPU,
use timeline semaphores.
Vulkan timeline semaphores allow use to work on a subset of D3D12
fences. As far as we are concerned, timeline semaphores are a value set
by the host or the device that can be waited by either of them.
Taking advantange of this, we can have a monolithically increasing
atomic value for each submission to the graphics queue. Instead of
protecting resources with a fence, we simply store the current logical
tick (the atomic value stored in CPU memory). When we want to know if a
resource is free, it can be compared to the current GPU tick.
This greatly simplifies resource management code and the free status of
resources should have less false negatives.
To workaround bugs in validation layers, when these are attached there's
a thread waiting for timeline semaphores.
Add the necessary CMake code to copy the contents in a string source
shader (GLSL or GLASM) to a header file then consumed by video_core
files.
This allows editting GLSL in its own files without having to maintain
them in source files.
For now, only OpenGL presentation shaders are moved, but we can add
GLASM presentation shaders and static SPIR-V generation through
glslangValidator in the future.