Several standard constructors generally check if objects can be moved in
a non-throwing manner (usually via std::move_if_noexcept) to preserve
its exception guarantees. This means that if these were used with
certain containers any reallocations internally would cause resource
churn, as copies would be necessary instead of moves.
This way, if they're every used in that manner, the right behavior is
always performed.
Avoids copying the std::function when we don't need to. Particularly
given the std::function isn't actually stored anywhere, so there's no
need to move it.
This fixes#5067 by reverting a speculative change made in a previous PR.
From this one can conclude that, for disabled textures, black (0,0,0,1) is the correct colour and clear (0,0,0,0) is not.
* video_core/renderer_opengl/gl_rasterizer_cache: Create Format Reinterpretation Framework
Adds RGBA4 -> RGB5A1 reinterpretation commonly used by virtual console
If no matching surface can be found, ValidateSurface checks for a surface in the cache which is reinterpretable to the requested format.
If that fails, the cache is checked for any surface with a matching bit-width. If one is found, the region is flushed.
If not, the region is checked against dirty_regions to see if it was created entirely on the GPU.
If not, then the surface is flushed.
Co-Authored-By: James Rowe <jroweboy@users.noreply.github.com>
Co-Authored-By: Ben <b3n30@users.noreply.github.com>
temporary change to avoid merge conflicts with video dumping
* re-add D24S8->RGBA8 res_scale hack
* adress review comments
* fix dirty region check
* check for surfaces with invalid pixel format, and break logic into separate functions
* video_core/renderer_opengl: Move SurfaceParams into its own file
Some of its enums are needed outside of the rasterizer cache
and trying to use it caused circular dependencies.
* video_core/renderer_opengl: Overhaul the texture filter framework
This should make it less intrusive.
Now texture filtering doesn't have any mutable global state.
The texture filters now always upscale to the internal rendering resolution.
This simplifies the logic in UploadGLTexture and it simply takes the role of BlitTextures at the end of the function.
This also prevent extra blitting required when uploading to a framebuffer surface with a mismatched size.
* video_core/renderer_opengl: Use generated mipmaps for filtered textures
The filtered guest mipmaps often looked terrible.
* core/settings: Remove texture filter factor
* sdl/config: Remove texture filter factor
* qt/config: Remove texture filter factor
src/video_core/renderer_opengl/texture_filters/bicubic/bicubic.cpp:51:86: error: cannot initialize a parameter of type 'GLuint' (aka 'unsigned int') with an rvalue of type 'nullptr_t'
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, NULL, 0);
^~~~
src/video_core/renderer_opengl/texture_filters/xbrz/xbrz_freescale.cpp:95:86: error: cannot initialize a parameter of type 'GLuint' (aka 'unsigned int') with an rvalue of type 'nullptr_t'
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, NULL, 0);
^~~~
/usr/include/sys/_null.h:37:14: note: expanded from macro 'NULL'
#define NULL nullptr
^~~~~~~
The default is discrete_interval which has dynamic open-ness.
We only use right_open intervals anyway. In theory this could allow some compile-time optimizations.
This uses the mailbox model to move pixel downloading to its own thread, eliminating Nvidia's warnings and (possibly) making use of GPU copy engine.
To achieve this, we created a new mailbox type that is different from the presentation mailbox in that it never discards a rendered frame.
Also, I tweaked the projection matrix thing so that it can just draw the frame upside down instead of having the CPU flip it.
* videocore/renderer_opengl/gl_rasterizer_cache: Move bits per pixel table out of function
GCC and MSVC copy the table at runtime with the old implementation, which is wasteful and prevents inlining. Unfortunately, static constexpr variables are not legal in constexpr functions, so the table has to be external.
Also replaced non-standard assert with DEBUG_ASSERT_MSG.
* fix case of table name in assert
* set table to private
This is based on what was done using additional layouts, but modified
to have a variable to control rotation and making it so Single Screen
Layout behaves like Upright Single would, and Default Layout behaves
like Upright Double would, when the new variable is used.
Large Layout and Side Layout currently ignore the new variable.
New variable still currently doesn't have a hotkey.
Previously we would first attempt to use any buffer that was free,
meaning whichever buffer has already been displayed. This has poor
interactions when the operating system throttles the update rate of the
window, so if there isn't any free buffers available, just reuse the
oldest frame instead.
While QOpenGLWidget sounds like a good idea, it has issues which are
harder to debug due to how Qt manages the context behind the scenes. We
could probably work around any of these issues over time, but its
probably easier to do it ourselves with a QWindow directly.
Plus using QWindow + createWindowContainer is the easiest to use
configuration for Qt + Vulkan so this is probably much better in the
long run.
For better tracking of performance regressions on incoming changes, this
change adds a way to dump frametime to file by changing an ini config
option. This is intentionally hidden as its only useful to a small
number of individuals, and not really applicable to the general
userbase.
Two PBOs are used to speed up pixel copying process. To avoid getting the wrong speed/FPS, a new parameter is added to DrawScreens about whether to increase the frame count.
* Add Anaglyph 3D
Change 3D slider in-game
Change shaders while game is running
Move shader loading into function
Disable 3D slider setting when stereoscopy is off
The rest of the shaders
Address review issues
Documentation and minor fixups
Forgot clang-format
Fix shader release on SDL2-software rendering
Remove unnecessary state changes
Respect 3D factor setting regardless of stereoscopic rendering
Improve shader resolution passing
Minor setting-related improvements
Add option to toggle texture filtering
Rebase fixes
* One final clang-format
* Fix OpenGL problems
Since C++17, the introduction of deduction guides for locking facilities
means that we no longer need to hardcode the mutex type into the locks
themselves, making it easier to switch mutex types, should it ever be
necessary in the future.
Previously this check is in GetSurface (if (addr == 0)). This worked fine because GetTextureSurface directly forwarded the address value to GetSurface. However, now with mipmap support, GetTextureSurface would call GetSurface several times with different address offset, resulting some >0 but still invalid address in case the input is 0. We should error out early on invalid address instead of sending it furthor down which would cause invalid memory access
video_core: shorten GetGLSLVersionString
video_core: make GLES version and extensions consistent
video_core: move some logic to LoadShader
video_core: deduplicate fragment shader precision specifier
Allows capturing screenshot at the current internal resolution (native for software renderer), but a setting is available to capture it in other resolutions. The screenshot is saved to a single PNG in the current layout.
They were missed, and Copy is very high in profile here. It doesn't block the GPU,
but it stalls the driver thread. So with our bad GL instructions, this might block quite a while.
Those implementations are quite costly, so there is no need to inline them to the caller.
Ressource deletion is often a performance bug, so in this way, we support to add breakpoints to them.
* renderer_opengl: Namespace OpenGL code
Namespaces all OpenGL code under the OpenGL namespace.
Prevents polluting the global namespace and allows clear distinction
between other renderers' code in the future.
* Also namespace TextureCubeConfig
This was very likely intended to be a logical OR based off the
conditioning and testing of inversion in one case.
Even if this was intentional, this is the kind of non-obvious thing one
should be clarifying with a comment.
This change to floor() was made in 2927c88, which was a result of doing some hwtest. It turned out that it was buggy edge cases in PICA, and for most cases round() still applies
…class
Makes the global a member of the RendererBase class. We also change this
to be a reference. Passing any form of null pointer to these functions
is incorrect entirely, especially given the code itself assumes that the
pointer would always be in a valid state.
This also makes it easier to follow the lifecycle of instances being
used, as we explicitly interact the renderer with the rasterizer, rather
than it just operating on a global pointer.
* Fixed crash when right eye isn't available
* Enabled swap screens in stereo views. Fixed window alignment in stereo
views to handle all screen aspect ratios.
* Minor code cleanup and clang fomat updates.
* Minor cleanup of swapped and aspect ratio code
The state object isn't used anywhere else, so there
is no need to revert the state.
And the comment is just wrong: It doesn't matter
which textures are bound on framebuffer binding, it
only matters at draw time. And we reset all bindings
before the draw call. So let's use gl_state as it is
designed to avoid flipping states.
Framebuffer blits depends on pixel tests:
Ownership (is fine)
Scissor (is broken on the last commit)
Masking (is broken on master for a while)
So let's be honest and start with a clean state in
those helper functions.
There is no need to disable this state after the draw call,
gl_state will handle this for us if needed. This kind of
redundant state changes are bad for the driver overhead,
as flipping bits will invalidate the driver state.
This replaces the glBufferData logic with the shared stream buffer code.
The new code doesn't need a temporary staging buffer any more, so the
performance should imrpove quite a bit.
We are going to add private memebers to ShaderSetup, which forbids the usage of offsetof. The JIT program only use the uniform part of the setup, so we can just isolate it.
The FresnelSelector was already working like a bitfield, so just make it actual bitfield to reduce redundant code. Also, it is already confirmed that this field also affects shadow on alpha. Given that the only two source that can affect alpha components are both controlled by this field, this field should be renamed to a general alpha switch
Internally these errors are handled by exceptions. Only fallbackable errors (that can be handled by CPU shader emulation) is reported. Completely ill-formed shader is still ASSERTed. Code logic related stuff is DEBUG_ASSERTed
- remove unnecessary ";"
- use std::tie for lexicographical ordering
- simplify loop condition
The offset always has step +1 on each iteration, so it would just hit one of the two boundary anyway
* Discard gas_mode renders
This discards the gas_mode / fog effect from games that use it and allows the games to display without it. Note that gas mode is still unimplemented and will LOG<CRITICAL>.
This bypasses #3287. (Doesn't fix it)
* fix clang
citra-qt: Add customizable speed limit target
* Update SDL config for the new frame_limit option
* Made max lag time a function of target speed percent.
* Added a checkbox to enable/disable frame limiter
* UI: Prevent frame_limit from under/overflowing
* UI: Hide target speed percent when frame limiter is off
* Disable frame limit spin box when framelimit isn't enabled
Optimize AttributeBuffer to OutputVertex conversion
First I unrolled the inner loop, then I pushed semantics validation
outside of the hotloop.
I also added overflow slots to avoid conditional branches.
Super Mario 3D Land's intro runs at almost full speed when compiled with
Clang, and theres a noticible speed increase in MSVC. GCC hasn't been
tested but I'm confident in its ability to optimize this code.
Several games such as Smash will cause some regions that are cached on
the gpu to be revalidated, but (seemingly) we can just ignore these
cases. If the data is already found on the gpu in dirty_regions, then we
validate those, and skip flushing that region from cpu.
Its unknown if this breaks any games, but it does speed up many games.
Additionally, it removes outlines in the pokemon games.
The previous commits added the methods where they were located
originally to try to get an easy to read diff between changes. This
commit fixes compliation since the static methods are now declared
before they are used.
Changes the public interface of the surface cache to make it easier to
use. Reintroduces the cached page count cached pages that was removed in
an earlier commit.
Breaks CachedSurface into two classes, the parameters used to create or
find a cached surface, and the actual cached surface. This also adds a
few helper methods for getting surfaces from cache
In a future commit, the count of cached pages will be reintroduced in
the actual surface cache. Also adds an Invalidate only to the cache
which marks a region as invalid in order to try to avoid a costly flush
from 3ds memory
The only functional change is the error handling of GSP_GPU::ReadHWRegs function. We previously didn't return error codes (not even for success). The new returns were found by reverse engineering the GSP module.
It is unlikely we will ever use this without first doing a Cast to a signed type.
Fixes 9 "unary minus operator applied to unsigned type, result still unsigned" warnings on MSVC2017.3
PR #1461 introduced a regression where some games would change configuration
even while in the poorly named "drawing" mode, which broke the heuristic
citra was using to determine when to draw the batch. This change adds
back in a draw call for batching, and also adds in a draw call in
immediate mode each time it adds a triangle.
This function is called in clipping, before the pespective divide, and is not used in later rasterization. Thus it doesn't need perspective correction.
The geometry pipeline manages data transfer between VS, GS and primitive assembler. It has known four modes:
- no GS mode: sends VS output directly to the primitive assembler (what citra currently does)
- GS mode 0: sends VS output to GS input registers, and sends GS output to primitive assembler
- GS mode 1: sends VS output to GS uniform registers, and sends GS output to primitive assembler. It also takes an index from the index buffer at the beginning of each primitive for determine the primitive size.
- GS mode 2: similar to mode 1, but doesn't take the index and uses a fixed primitive size.
hwtest shows that immediate mode also supports GS (at least for mode 0), so the geometry pipeline gets refactored into its own class for supporting both drawing mode.
In the immediate mode, some games don't set the pipeline registers to a valid value until the first attribute input, so a geometry pipeline reset flag is set in `pipeline.vs_default_attributes_setup.index` trigger, and the actual pipeline reconfigure is triggered in the first attribute input.
In the normal drawing mode with index buffer, the vertex cache is a little bit modified to support the geometry pipeline. Instead of OutputVertex, it now holds AttributeBuffer, which is the input to the geometry pipeline. The AttributeBuffer->OutputVertex conversion is done inside the pipeline vertex handler. The actual hardware vertex cache is believed to be implemented in a similar way (because this is the only way that makes sense).
Both geometry pipeline and GS unit rely on states preservation across drawing call, so they are put into the global state. In the future, the other three vertex shader units should be also placed in the global state, and a scheduler should be implemented on top of the four units. Note that the current gs_unit already allows running VS on it in the future.
hwtest shows that, although GS always emit a group of three vertices as one primitive, it still respects to the topology type, as if the three vertices are input into the primitive assembler independently and sequentially. It is also shown that the winding flag in SETEMIT only takes effect for Shader topology type, which is believed to be the actual difference between List and Shader (hence removed the TODO). However, only Shader topology type is observed in official games when GS is in use, so the other mode seems to be just unintended usage.
Among four shader units in pica, a special unit can be configured to run both VS and GS program. GSUnitState represents this unit, which extends UnitState (which represents the other three normal units) with extra state for primitive emitting. It uses lots of raw pointers to represent internal structure in order to keep it standard layout type for JIT to access.
This unit doesn't handle triangle winding (inverting) itself; instead, it calls a WindingSetter handler. This will be explained in the following commits