Move more Vulkan code to report errors with exceptions and report them
through a log before notifying it with an error boolean for backwards
compatibility. In the future we can replace the rasterizer two-step
initialization to always use exceptions.
Initialize debug callbacks (messenger) from a separate file. This allows
sharing code with different backends.
Change our Vulkan error handling to use exceptions instead of error
codes, simplifying the initialization process.
The current texture cache has several points that hurt maintainability
and performance. It's easy to break unrelated parts of the cache
when doing minor changes. The cache can easily forget valuable
information about the cached textures by CPU writes or simply by its
normal usage.The current texture cache has several points that hurt
maintainability and performance. It's easy to break unrelated parts
of the cache when doing minor changes. The cache can easily forget
valuable information about the cached textures by CPU writes or simply
by its normal usage.
This commit aims to address those issues.
Without using VK_EXT_robustness2, we can't consider the 'enabled' (not
null) vertex buffers as dynamic state, as this leads to invalid Vulkan
state. Move this to static state that is always hashed and compared in
the pipeline key.
The bits for enabled vertex buffers are moved into the attribute state
bitfield. This is not 'correct' as it's not an attribute state, but that
struct has bits to spare, and it's used in an array of 32 elements (the
exact same number of vertex buffer bindings).
Most of the time people write code that always returns a value,
terminates execution, throws an exception, or uses an unconventional
jump primitive.
This is not always true when we build without asserts on mainline builds.
To avoid introducing undefined behavior on our most used builds, enforce
this warning signalling an error and stopping the build from shipping.
fmt now automatically prints the numeric value of an enum class member
by default, so we don't need to use casts any more.
Reduces the line noise a bit.
The previous definition was:
#define NUM(field_name) (sizeof(Maxwell3D::Regs::field_name) / sizeof(u32))
In cases where `field_name` happens to refer to an array, Clang thinks
`sizeof(an array value) / sizeof(a type)` is an instance of the idiom
where `sizeof` is used to compute an array length. So it thinks the
type in the denominator ought to be the array element type, and warns if
it isn't, assuming this is a mistake.
In reality, `NUM` is not used to get array lengths at all, so there is no
mistake. Silence the warning by applying Clang's suggested workaround
of parenthesizing the denominator.
On Apple platforms, FALSE and TRUE are defined as macros by
<mach/boolean.h>, which is included by various system headers.
Note that there appear to be no actual users of the names to fix up.
Migrates the video core code closer to enabling variable shadowing
warnings as errors.
This primarily sorts out shadowing occurrences within the Vulkan code.
This was only necessary for use with the
avcodec_decode_video2/avcoded_decode_audio4 APIs which are also
deprecated.
Given we use avcodec_send_packet/avcodec_receive_frame, this isn't
necessary, this is even indicated directly within the FFmpeg API changes
document here on 2017-09-26:
https://github.com/FFmpeg/FFmpeg/blob/master/doc/APIchanges#L410
This prevents our code from breaking whenever we update to a newer
version of FFmpeg in the future if they ever decide to fully remove this
API member.
Force early fragment tests when the 3D method is enabled.
The established pipeline cache takes care of recompiling if needed.
This is implemented only on Vulkan to avoid invalidating the shader
cache on OpenGL.
- Use .at() instead of raw indexing when dealing with untrusted indices.
- For the special case of WaitFence with syncpoint id UINT32_MAX,
instead of crashing, log an error and ignore. This is what I get when
running Super Mario Maker 2.
EmuWindow::PollEvents was called from the GPU thread (or the CPU thread
in sync-GPU mode) when swapping buffers. It had three implementations:
- In GRenderWindow, it didn't actually poll events, just set a flag and
emit a signal to indicate that a frame was displayed.
- In EmuWindow_SDL2_Hide, it did nothing.
- In EmuWindow_SDL2, it did call SDL_PollEvents, but this is wrong
because SDL_PollEvents is supposed to be called on the thread that set
up video - in this case, the main thread, which was sleeping in a
busyloop (regardless of whether sync-GPU was enabled). On macOS this
causes a crash.
To fix this:
- Rename EmuWindow::PollEvents to OnFrameDisplayed, and give it a
default implementation that does nothing.
- In EmuWindow_SDL2, do not override OnFrameDisplayed, but instead have
the main thread call SDL_WaitEvent in a loop.
This reduces the overhead of bounds checking on each element.
It won't reduce the cost of allocation because usually this vector's
capacity is usually large enough to hold whatever we push to it.
It's deprecated in the language to autogenerate these if the destructor
for a type is specified, so we can explicitly specify how we want these
to be generated.
The API of VP9 exposes a WasFrameHidden() function which accesses this
member. Given the constructor previously didn't initialize this member,
it's a potential vector for an uninitialized read.
Instead, we can initialize this to a deterministic value to prevent that
from occurring.
This implements texture cube arrays with shadow comparisons but doesn't
fix the asserts related to it.
Fixes out of bounds reads on swizzle constructors and makes them use
bounds checked ::at instead of the unsafe operator[].
This commit aims to implement the NVDEC (Nvidia Decoder) functionality, with video frame decoding being handled by the FFmpeg library.
The process begins with Ioctl commands being sent to the NVDEC and VIC (Video Image Composer) emulated devices. These allocate the necessary GPU buffers for the frame data, along with providing information on the incoming video data. A Submit command then signals the GPU to process and decode the frame data.
To decode the frame, the respective codec's header must be manually composed from the information provided by NVDEC, then sent with the raw frame data to the ffmpeg library.
Currently, H264 and VP9 are supported, with VP9 having some minor artifacting issues related mainly to the reference frame composition in its uncompressed header.
Async GPU is not properly implemented at the moment.
Co-Authored-By: David <25727384+ogniK5377@users.noreply.github.com>
These compiler flags aren't shared with clang, so specifying these flags
unconditionally can lead to a bit of warning spam.
While we're in the area, we can also enable -Wunused-but-set-parameter
given this is almost always a bug.
This emulates the behavior we get on GLSL with regular SSBOs with a
pointer + length pair. It aims to be consistent with the crashes we
might get.
Out of bounds stores are ignored. Atomics are ignored and return zero.
Reads return zero.
Vulkan has requirements for primitive topologies that don't play nicely
with yuzu's. Since it's only 4 bits, we can move it to fixed state
without changing the size of the pipeline key.
- Fixes a regression on recent Nvidia drivers on Fire Emblem: Three
Houses.
RDNA devices seem to crash when using VK_EXT_extended_dynamic_state in
the latest 20.9.2 proprietary Windows drivers. As a workaround, for now
we block device names corresponding to current RDNA released products.
TMML takes an array argument that has no known meaning, this one appears
as the first component in gpr8 followed by s, t and r. Skip this
component when arrays are being used. Also implement CUBE texture types.
- Used by Pikmin 3: Deluxe Demo.
The old code had a sort function that was invalid and it didn't work as
expected when the base vector had a different order (e.g. renderdoc was
attached).
This sorts devices as expected and fixes a debug assert on MSVC.
The previous fix only partially solved the issue, as only certain GPUs that needed 9 or less MiB subtracted would work (i.e. GTX 980 Ti, GT 730). This takes from DXVK's example to divide `heap_size` by 2 to determine `allocable_size`. Additionally tested on my Quadro K4200, which previously required setting it to 12 to boot.
When HEADER_GENERATOR was included in the DEPENDS section of custom
commands, msbuild assumed this was always modified. Changing this file
is not common so we can remove it from there.
Allows some implementations to avoid completely zeroing out the internal
buffer of the optional, and instead only set the validity byte within
the structure.
This also makes it consistent how we return empty optionals.
This is a hack to destroy all HostCounter instances before the base
class destructor is called. The query cache should be redesigned to have
a proper ownership model instead of using shared pointers.
For now, destroy the host counter hierarchy from the derived class
destructor.
This reworks how host<->device synchronization works on the Vulkan
backend. Instead of "protecting" resources with a fence and signalling
these as free when the fence is known to be signalled by the host GPU,
use timeline semaphores.
Vulkan timeline semaphores allow use to work on a subset of D3D12
fences. As far as we are concerned, timeline semaphores are a value set
by the host or the device that can be waited by either of them.
Taking advantange of this, we can have a monolithically increasing
atomic value for each submission to the graphics queue. Instead of
protecting resources with a fence, we simply store the current logical
tick (the atomic value stored in CPU memory). When we want to know if a
resource is free, it can be compared to the current GPU tick.
This greatly simplifies resource management code and the free status of
resources should have less false negatives.
To workaround bugs in validation layers, when these are attached there's
a thread waiting for timeline semaphores.
Now that the GPU is initialized when video backends are initialized,
it's no longer needed to query components once the game is running: it
can be done when yuzu is booting.
This allows us to pass components between constructors and in the
process remove all Core::System references in the video backend.
'driver_id' can only be known on Vulkan 1.1 after creating a logical
device. Move the driver id check to disable
VK_EXT_extended_dynamic_state after the logical device is successfully
initialized.
The Vulkan device will have the extension enabled but it will not be
used.
I made a request on the Xbyak issue tracker to allow some constructors
to be constexpr in order to avoid static constructors from needing to
execute for some of our register constants.
This request was implemented, so this updates Xbyak so that we can make
use of it.
Vertex binding's <stride> is bugged on AMD's proprietary drivers when
using VK_EXT_extended_dynamic_state. Blacklist it for now while we
investigate how to report this issue to AMD.
Add the necessary CMake code to copy the contents in a string source
shader (GLSL or GLASM) to a header file then consumed by video_core
files.
This allows editting GLSL in its own files without having to maintain
them in source files.
For now, only OpenGL presentation shaders are moved, but we can add
GLASM presentation shaders and static SPIR-V generation through
glslangValidator in the future.
State track the current primitive topology with a regular comparison
instead of using dirty flags.
This fixes a bug in dirty flags for this particular state and it also
avoids unnecessary state changes as this property is stored in a
frequently changed bit field.
Migrates a remaining common file over to the Common namespace, making it
consistent with the rest of common files.
This also allows for high-traffic FS related code to alias the
filesystem function namespace as
namespace FS = Common::FS;
for more concise typing.
This was assigning the field to itself, which is a no-op. The size
doesn't change between its initial assignment and this one, so this is a
safe change to make.
Does not allocate more threads than available in the host system for boot-time shader compilation and always allocates at least 1 thread if hardware_concurrency() returns 0.
There were two issues with block linear copies. First the swizzling was
wrong and this commit reimplements them.
The other issue was that these copies are generally used to download
render targets from the GPU and yuzu was not downloading them from
host GPU memory unless the extreme GPU accuracy setting was selected.
This commit enables cached memory reads for all accuracy levels.
- Fixes level thumbnails in Super Mario Maker 2.
The puller register array is made up of u32s however the `NUM_REGS` value is the size in bytes, so switch it to avoid making the struct unnecessary large. Also fix a small typo in a comment.
We can make use of emplace()'s return value to determine whether or not
we need to perform an increment.
emplace() performs no insertion if an element already exist, so this can
eliminate a find() call.
NV_shader_buffer_{load,store} is a 2010 extension that allows GL applications
to use what in Vulkan is known as physical pointers, this is basically C
pointers. On GLASM these is exposed through the LOAD/STORE/ATOM
instructions.
Up until now, assembly shaders were using NV_shader_storage_buffer_object.
These work fine, but have a (probably unintended) limitation that forces
us to have the limit of a single stage for all shader stages. In contrast,
with NV_shader_buffer_{load,store} we can pass GPU addresses to the
shader through local parameters (GLASM equivalent uniform constants, or
push constants on Vulkan). Local parameters have the advantage of being
per stage, allowing us to generate code without worrying about binding
overlaps.
Given the expression involves a 32-bit value, this simplifies down to
just: 0x3ffffff. This is likely a remnant from testing that was never
cleaned up.
Resolves a -Wshift-overflow warning.
The purpose of make_pair is generally to deduce the types within the
pair without explicitly specifying the types, so these usages were
generally unnecessary, particularly when the type is enforced by the
array declaration.
Change GOB sizes from free-functions to constexpr constants.
Add SwizzleSliceToVoxel, a function that swizzles a 2D array of pixels
into a 3D texture and use it for 3D copies.
* Switch game settings to use a pointer
In order to add full per-game settings, we need to be able to tell yuzu to switch
to using either the global or game configuration. Using a pointer makes it easier
to switch.
* configuration: add new UI without changing existing funcitonality
The new UI also adds General, System, Graphics, Advanced Graphics,
and Audio tabs, but as yet they do nothing. This commit keeps yuzu
to the same functionality as originally branched.
* configuration: Rename files
These weren't included in the last commit. Now they are.
* configuration: setup global configuration checkbox
Global config checkbox now enables/disables the appropriate tabs in the game
properties dialog. The use global configuration setting is now saved to the
config, defaulting to true. This also addresses some changes requested in the PR.
* configuration: swap to per-game config memory for properties dialog
Does not set memory going in-game. Swaps to game values when opening the
properties dialog, then swaps back when closing it. Uses a `memcpy` to swap.
Also implements saving config files, limited to certain groups of configurations
so as to not risk setting unsafe configurations.
* configuration: change config interfaces to use config-specific pointers
When a game is booted, we need to be able to open the configuration dialogs
without changing the settings pointer in the game's emualtion. A new pointer
specific to just the configuration dialogs can be used to separate changes
to just those config dialogs without affecting the emulation.
* configuration: boot a game using per-game settings
Swaps values where needed to boot a game.
* configuration: user correct config during emulation
Creates a new pointer specifically for modifying the configuration while
emulation is in progress. Both the regular configuration dialog and the game
properties dialog now use the pointer Settings::config_values to focus edits to
the correct struct.
* settings: split Settings::values into two different structs
By splitting the settings into two mutually exclusive structs, it becomes easier,
as a developer, to determine how to use the Settings structs after per-game
configurations is merged. Other benefits include only duplicating the required
settings in memory.
* settings: move use_docked_mode to Controls group
`use_docked_mode` is set in the input settings and cannot be accessed from the
system settings. Grouping it with system settings causes it to be saved with
per-game settings, which may make transferring configs more difficult later on,
especially since docked mode cannot be set from within the game properties
dialog.
* configuration: Fix the other yuzu executables and a regression
In main.cpp, we have to get the title ID before the ROM is loaded, else the
renderer will reflect only the global settings and now the user's game specific
settings.
* settings: use a template to duplicate memory for each setting
Replaces the type of each variable in the Settings::Values struct with a new
class that allows basic data reading and writing. The new struct
Settings::Setting duplicates the data in memory and can manage global overrides
per each setting.
* configuration: correct add-ons config and swap settings when apropriate
Any add-ons interaction happens directly through the global values struct.
Swapping bewteen structs now also includes copying the necessary global configs
that cannot be changed nor saved in per-game settings. General and System config
menus now update based on whether it is viewing the global or per-game settings.
* settings: restore old values struct
No longer needed with the Settings::Setting class template.
* configuration: implement hierarchical game properties dialog
This sets the apropriate global or local data in each setting.
* clang format
* clang format take 2
can the docker container save this?
* address comments and style issues
* config: read and write settings with global awareness
Adds new functions to read and write settings while keeping the global state in
focus. Files now generated per-game are much smaller since often they only need
address the global state.
* settings: restore global state when necessary
Upon closing a game or the game properties dialog, we need to restore all global
settings to the original global state so that we can properly open the
configuration dialog or boot a different game.
* configuration: guard setting values incorrectly
This disables setting values while a game is running if the setting is
overwritten by a per game setting.
* config: don't write local settings in the global config
Simple guards to prevent writing the wrong settings in the wrong files.
* configuration: add comments, assume less, and clang format
No longer assumes that a disabled UI element means the global state is turned
off, instead opting to directly answer that question. Still however assumes a
game is running if it is in that state.
* configuration: fix a logic error
Should not be negated
* restore settings' global state regardless of accept/cancel
Fixes loading a properties dialog and causing the global config dialog to show
local settings.
* fix more logic errors
Fixed the frame limit would set the global setting from the game properties
dialog. Also strengthened the Settings::Setting member variables and simplified
the logic in config reading (ReadSettingGlobal).
* fix another logic error
In my efforts to guard RestoreGlobalState, I accidentally negated the IsPowered
condition.
* configure_audio: set toggle_stretched_audio to tristate
* fixed custom rtc and rng seed overwriting the global value
* clang format
* rebased
* clang format take 4
* address my own review
Basically revert unintended changes
* settings: literal instead of casting
"No need to cast, use 1U instead"
Thanks, Morph!
Co-authored-by: Morph <39850852+Morph1984@users.noreply.github.com>
* Revert "settings: literal instead of casting
"
This reverts commit 95e992a87c898f3e882ffdb415bb0ef9f80f613f.
* main: fix status buttons reporting wrong settings after stop emulation
* settings: Log UseDockedMode in the Controls group
This should have happened when use_docked_mode was moved over to the controls group
internally. This just reflects this in the log.
* main: load settings if the file has a title id
In other words, don't exit if the loader has trouble getting a title id.
* use a zero
* settings: initalize resolution factor with constructor instead of casting
* Revert "settings: initalize resolution factor with constructor instead of casting"
This reverts commit 54c35ecb46a29953842614620f9b7de1aa9d5dc8.
* configure_graphics: guard device selector when Vulkan is global
Prevents the user from editing the device selector if Vulkan is the global
renderer backend. Also resets the vulkan_device variable when the users
switches back-and-forth between global and Vulkan.
* address reviewer concerns
Changes function variables to const wherever they don't need to be changed. Sets Settings::Setting to final as it should not be inherited from. Sets ConfigurationShared::use_global_text to static.
Co-Authored-By: VolcaEM <volcaem@users.noreply.github.com>
* main: load per-game settings after LoadROM
This prevents `Restart Emulation` from restoring the global settings *after* the per-game settings were applied. Thanks to BSoDGamingYT for finding this bug.
* Revert "main: load per-game settings after LoadROM"
This reverts commit 9d0d48c52d2dcf3bfb1806cc8fa7d5a271a8a804.
* main: only restore global settings when necessary
Loading the per-game settings cannot happen after the ROM is loaded, so we have to specify when to restore the global state. Again thanks to BSoD for finding the bug.
* configuration_shared: address reviewer concerns except operator overrides
Dropping operator override usage in next commit.
Co-Authored-By: LC <lioncash@users.noreply.github.com>
* settings: Drop operator overrides from Setting template
Requires using GetValue and SetValue explicitly. Also reverts a change that broke title ID formatting in the game properties dialog.
* complete rebase
* configuration_shared: translate "Use global configuration"
Uses ConfigurePerGame to do so, since its usage, at least as of now, corresponds with ConfigurationShared.
* configure_per_game: address reviewer concern
As far as I understand, it prevents the program from unnecessarily copying strings.
Co-Authored-By: LC <lioncash@users.noreply.github.com>
Co-authored-by: Morph <39850852+Morph1984@users.noreply.github.com>
Co-authored-by: VolcaEM <volcaem@users.noreply.github.com>
Co-authored-by: LC <lioncash@users.noreply.github.com>
This fixes some cases where entries could have been removed multiple
times reading freed memory. To address this issue this commit removes
duplicates from entries marked for removal and sorts out the removal
process to fix another use-after-free situation.
Another issue fixed in this commit is orphan invalidation cache entries.
Previously only the entries that were invalidated in the current
operations had its entries removed. This led to more use-after-free
situations when these entries were actually invalidated but referenced
an object that didn't exist.
Like MirrorOnceBorder, this requires the GL_EXT_texture_mirror_clamp extension. This extension is unfortunately not available on Intel's drivers (both Windows proprietary and Linux Mesa). Use GL_MIRROR_CLAMP_TO_EDGE as a fallback if the extension is unavailable.
Macro code is just uploaded sequentially from a starting address, however that does not mean the entry point for the macro is at that address. This PR adds preliminary support for executing macros in the middle of our cached code.
This commit: Implements CPU Interrupts, Replaces Cycle Timing for Host
Timing, Reworks the Kernel's Scheduler, Introduce Idle State and
Suspended State, Recreates the bootmanager, Initializes Multicore
system.
This moves dynamic state present in VK_EXT_extended_dynamic_state to a
separate structure in FixedPipelineState. This is structure is at the
bottom allowing us to hash and memcmp only when the extension is not
supported.
Avoid illegal copies. This intercepts the last step of a copy to avoid
generating validation errors or corrupting the driver on some instances.
We can create views and emit copies accordingly in future commits and
remove this last-step validation.
Add a flat table to test if it's legal to create a texture view between
two formats or copy betweem them.
This table is based on ARB_copy_image and ARB_texture_view. Copies are
more permissive than views.
After marking buffers as resident, Nvidia's driver seems to take a
slow path. To workaround this issue, copy to a STREAM_READ buffer and
then call GetNamedBufferSubData on it.
This is a temporary solution until we have asynchronous flushing.
Making the stream buffer resident increases GPU usage significantly on
some games. This seems to be addressed invalidating the stream buffer
with InvalidateBufferData instead of using a Unmap + Map (with
invalidation flags).
Switch games are allowed to bind less data than what they use in a
vertex buffer, the expected behavior here is that these values are read
as zero. At the moment of writing this only D3D12, OpenGL and NVN through
NV_vertex_buffer_unified_memory support vertex buffer with a size limit.
In theory this could be emulated on Vulkan creating a new VkBuffer for
each (handle, offset, length) tuple and binding the expected data to it.
This is likely going to be slow and memory expensive when used on the
vertex buffer and we have to do it on all draws because we can't know
without analyzing indices when a game is going to read vertex data out
of bounds.
This is not a problem on OpenGL's BufferAddressRangeNV because it takes
a length parameter, unlike Vulkan's CmdBindVertexBuffers that only takes
buffers and offsets (the length is implicit in VkBuffer). It isn't a
problem on D3D12 either, because D3D12_VERTEX_BUFFER_VIEW on
IASetVertexBuffers takes SizeInBytes as a parameter (although I am not
familiar with robustness on D3D12).
Currently this only implements buffer ranges for vertex buffers,
although indices can also be affected. A KHR_robustness profile is not
created, but Nvidia's driver reads out of bound vertex data as zero
anyway, this might have to be changed in the future.
- Fixes SMO random triangles when capturing an enemy, getting hit, or
looking at the environment on certain maps.
Make stream buffer and cached buffers as resident and query their
address. This allows us to use GPU addresses for several proprietary
Nvidia extensions.
Expose NV_vertex_buffer_unified_memory when the driver supports it.
This commit adds a function the determine if a GL_RENDERER is a Turing
GPU. This is required because on Turing GPUs Nvidia's driver crashes
when the buffer is marked as resident or on DeleteBuffers. Without a
synchronous debug output (single threaded driver), it's likely that
the driver will crash in the first blocking call.
Add HSET2_IMM. Due to the complexity of the encoding avoid using
BitField unions and read the relevant bits from the code itself.
This is less error prone.
Update validation layer string to VK_LAYER_KHRONOS_validation.
While we are at it, properly check for available validation layers
before enabling them.
Enable GL_EXT_texture_shadow_lod if available. If this extension is not available, such as on Intel/AMD proprietary drivers, use textureGrad as a workaround.
Variables that are marked as const cannot have the move constructor
invoked when returning from a function (the move constructor requires a
non-const variable so it can "steal" the resources from it.
Check() can throw an exception if the Vulkan result isn't successful.
We remove the check so that std::terminate isn't outright called and
allows for better debugging (should it ever actually fail).
Renames some variables to prevent ones in inner scopes from shadowing
outer-scoped variables.
The Copy* functions have no shadowing, but we rename them anyways to
remain consistent with the other functions.
We can reduce the capture scope so that it's not possible for both "reg"
variables to clash with one another.
While we're at it, we can prevent unnecessary copies while we're at it.
There's no need to load contents from the CPU when a clear resets all
the contents of the underlying memory. This is already implemented on
OpenGL and the texture cache.
maxwell_to_vk: Reorder filtering modes to start with None, then Nearest, then Linear.
maxwell_to_vk: Logs filter modes under UNREACHABLE_MSG instead of UNIMPLEMENTED_MSG, since any unknown filter modes are invalid and not unimplemented.
maxwell_to_vk: Return VK_SAMPLER_MIPMAP_MODE_NEAREST instead of VK_SAMPLER_MIPMAP_MODE_LINEAR when mipmap_filter is None with the description from the VkSamplerCreateInfo(3) man page.
maxwell_to_gl: Log unimplemented features under UNIMPLEMENTED_MSG instead of LOG_ERROR to bring into parity with maxwell_to_vk
maxwell_to_gl: Deduplicate logging in VertexType(), merging them into one.
maxwell_to_gl: Return GL_NEAREST instead of GL_LINEAR if an unknown texture filter mode is encountered.
maxwell_to_gl: Log the mipmap filter mode if an unknown value is passed in.
maxwell_to_gl: Reorder filtering modes to start with None, then Nearest, then Linear.
Due to the limitation of GL_MAX_IMAGE_UNITS being low (8) on Intel's and Nvidia's proprietary drivers, we have to reserve an appropriate amount of image bindings for each of the stages. So far games have been observed to use 4 image bindings on the fragment stage (Kirby Star Allies) and 1 on the vertex stage (TWD series).
No games thus far in my limited testing used more than 4 images concurrently and across all currently active programs.
This fixes shader compilation errors on Kirby Star Allies on OpenGL (GLSL/GLASM)
All registers are now callee-save registers.
RBX and RBP selected for STATE and RESULT because these are most commonly accessed; this is to avoid the REX prefix.
RBP not used for STATE because there are some SIB restrictions, RBX emits smaller code.
Emit code compatible with NV_gpu_program5.
This should emit code compatible with Fermi, but it wasn't tested on
that architecture. Pascal has some issues not present on Turing GPUs.
Instead of using as template argument a shared pointer, use the
underlying type and manage shared pointers explicitly. This can make
removing shared pointers from the cache more easy.
While we are at it, make some misc style changes and general
improvements (like insert_or_assign instead of operator[] + operator=).
Vertex buffers bindings become invalid after the stream buffer is
invalidated. We were originally doing this, but it got lost at some
point.
- Fixes Animal Crossing: New Horizons, but it affects everything.