This commit aims to implement the NVDEC (Nvidia Decoder) functionality, with video frame decoding being handled by the FFmpeg library.
The process begins with Ioctl commands being sent to the NVDEC and VIC (Video Image Composer) emulated devices. These allocate the necessary GPU buffers for the frame data, along with providing information on the incoming video data. A Submit command then signals the GPU to process and decode the frame data.
To decode the frame, the respective codec's header must be manually composed from the information provided by NVDEC, then sent with the raw frame data to the ffmpeg library.
Currently, H264 and VP9 are supported, with VP9 having some minor artifacting issues related mainly to the reference frame composition in its uncompressed header.
Async GPU is not properly implemented at the moment.
Co-Authored-By: David <25727384+ogniK5377@users.noreply.github.com>
These compiler flags aren't shared with clang, so specifying these flags
unconditionally can lead to a bit of warning spam.
While we're in the area, we can also enable -Wunused-but-set-parameter
given this is almost always a bug.
This emulates the behavior we get on GLSL with regular SSBOs with a
pointer + length pair. It aims to be consistent with the crashes we
might get.
Out of bounds stores are ignored. Atomics are ignored and return zero.
Reads return zero.
Vulkan has requirements for primitive topologies that don't play nicely
with yuzu's. Since it's only 4 bits, we can move it to fixed state
without changing the size of the pipeline key.
- Fixes a regression on recent Nvidia drivers on Fire Emblem: Three
Houses.
RDNA devices seem to crash when using VK_EXT_extended_dynamic_state in
the latest 20.9.2 proprietary Windows drivers. As a workaround, for now
we block device names corresponding to current RDNA released products.
TMML takes an array argument that has no known meaning, this one appears
as the first component in gpr8 followed by s, t and r. Skip this
component when arrays are being used. Also implement CUBE texture types.
- Used by Pikmin 3: Deluxe Demo.
The old code had a sort function that was invalid and it didn't work as
expected when the base vector had a different order (e.g. renderdoc was
attached).
This sorts devices as expected and fixes a debug assert on MSVC.
The previous fix only partially solved the issue, as only certain GPUs that needed 9 or less MiB subtracted would work (i.e. GTX 980 Ti, GT 730). This takes from DXVK's example to divide `heap_size` by 2 to determine `allocable_size`. Additionally tested on my Quadro K4200, which previously required setting it to 12 to boot.
When HEADER_GENERATOR was included in the DEPENDS section of custom
commands, msbuild assumed this was always modified. Changing this file
is not common so we can remove it from there.
Allows some implementations to avoid completely zeroing out the internal
buffer of the optional, and instead only set the validity byte within
the structure.
This also makes it consistent how we return empty optionals.
This is a hack to destroy all HostCounter instances before the base
class destructor is called. The query cache should be redesigned to have
a proper ownership model instead of using shared pointers.
For now, destroy the host counter hierarchy from the derived class
destructor.
This reworks how host<->device synchronization works on the Vulkan
backend. Instead of "protecting" resources with a fence and signalling
these as free when the fence is known to be signalled by the host GPU,
use timeline semaphores.
Vulkan timeline semaphores allow use to work on a subset of D3D12
fences. As far as we are concerned, timeline semaphores are a value set
by the host or the device that can be waited by either of them.
Taking advantange of this, we can have a monolithically increasing
atomic value for each submission to the graphics queue. Instead of
protecting resources with a fence, we simply store the current logical
tick (the atomic value stored in CPU memory). When we want to know if a
resource is free, it can be compared to the current GPU tick.
This greatly simplifies resource management code and the free status of
resources should have less false negatives.
To workaround bugs in validation layers, when these are attached there's
a thread waiting for timeline semaphores.
Now that the GPU is initialized when video backends are initialized,
it's no longer needed to query components once the game is running: it
can be done when yuzu is booting.
This allows us to pass components between constructors and in the
process remove all Core::System references in the video backend.
'driver_id' can only be known on Vulkan 1.1 after creating a logical
device. Move the driver id check to disable
VK_EXT_extended_dynamic_state after the logical device is successfully
initialized.
The Vulkan device will have the extension enabled but it will not be
used.
I made a request on the Xbyak issue tracker to allow some constructors
to be constexpr in order to avoid static constructors from needing to
execute for some of our register constants.
This request was implemented, so this updates Xbyak so that we can make
use of it.
Vertex binding's <stride> is bugged on AMD's proprietary drivers when
using VK_EXT_extended_dynamic_state. Blacklist it for now while we
investigate how to report this issue to AMD.
Add the necessary CMake code to copy the contents in a string source
shader (GLSL or GLASM) to a header file then consumed by video_core
files.
This allows editting GLSL in its own files without having to maintain
them in source files.
For now, only OpenGL presentation shaders are moved, but we can add
GLASM presentation shaders and static SPIR-V generation through
glslangValidator in the future.
State track the current primitive topology with a regular comparison
instead of using dirty flags.
This fixes a bug in dirty flags for this particular state and it also
avoids unnecessary state changes as this property is stored in a
frequently changed bit field.
Migrates a remaining common file over to the Common namespace, making it
consistent with the rest of common files.
This also allows for high-traffic FS related code to alias the
filesystem function namespace as
namespace FS = Common::FS;
for more concise typing.
This was assigning the field to itself, which is a no-op. The size
doesn't change between its initial assignment and this one, so this is a
safe change to make.
Does not allocate more threads than available in the host system for boot-time shader compilation and always allocates at least 1 thread if hardware_concurrency() returns 0.
There were two issues with block linear copies. First the swizzling was
wrong and this commit reimplements them.
The other issue was that these copies are generally used to download
render targets from the GPU and yuzu was not downloading them from
host GPU memory unless the extreme GPU accuracy setting was selected.
This commit enables cached memory reads for all accuracy levels.
- Fixes level thumbnails in Super Mario Maker 2.
The puller register array is made up of u32s however the `NUM_REGS` value is the size in bytes, so switch it to avoid making the struct unnecessary large. Also fix a small typo in a comment.
We can make use of emplace()'s return value to determine whether or not
we need to perform an increment.
emplace() performs no insertion if an element already exist, so this can
eliminate a find() call.
NV_shader_buffer_{load,store} is a 2010 extension that allows GL applications
to use what in Vulkan is known as physical pointers, this is basically C
pointers. On GLASM these is exposed through the LOAD/STORE/ATOM
instructions.
Up until now, assembly shaders were using NV_shader_storage_buffer_object.
These work fine, but have a (probably unintended) limitation that forces
us to have the limit of a single stage for all shader stages. In contrast,
with NV_shader_buffer_{load,store} we can pass GPU addresses to the
shader through local parameters (GLASM equivalent uniform constants, or
push constants on Vulkan). Local parameters have the advantage of being
per stage, allowing us to generate code without worrying about binding
overlaps.
Given the expression involves a 32-bit value, this simplifies down to
just: 0x3ffffff. This is likely a remnant from testing that was never
cleaned up.
Resolves a -Wshift-overflow warning.
The purpose of make_pair is generally to deduce the types within the
pair without explicitly specifying the types, so these usages were
generally unnecessary, particularly when the type is enforced by the
array declaration.