Commit graph

229 commits

Author SHA1 Message Date
GPUCode
106364e01e
video_core: Use source3 when GPU_PREVIOUS is used in first stage (#7411) 2024-02-05 09:53:54 -08:00
GPUCode
d5a1bd07f3
glsl_shader_gen: Increase z=0 epsillon (#7408) 2024-02-05 09:53:41 -08:00
GPUCode
480604ec72
glsl_shader_fs_gen: Apply shadow before ambient light (#7404) 2024-01-31 23:29:39 +02:00
merry
63feac6bb3
externals: Update dynarmic to 6.6.1, Update oaknut to 2.0.1 (#7398) 2024-01-30 19:50:39 -08:00
GPUCode
8d82adb3d3
glsl_shader_gen: Remove invariant qualifier (#7376)
* glsl_shader_gen: Remove invariant qualifier

* Causes visual regressions in Pokemon with RADV

* rasterizer_cache: Clear null surface to transparent
2024-01-21 13:39:35 +02:00
GPUCode
8e87bd606c
glsl_shader_gen: Use epsilon for both ends of NDC range (#7355) 2024-01-20 22:13:16 +02:00
Vitor K
c8c2beaeff
misc: fix issues pointed out by msvc (#7316)
* do not move constant variables

* applet_manager: avoid possible use after move

* use constant references where pointed out by msvc

* extra_hid: initialize response

* ValidateSaveState: passing slot separately is not necessary

* common: mark HashCombine as nodiscard

* cityhash: remove use of using namespace std

* Prefix all size_t with std::

done automatically by executing regex replace `([^:0-9a-zA-Z_])size_t([^0-9a-zA-Z_])` -> `$1std::size_t$2`
based on 7d8f115

* shared_memory.cpp: fix log error format

* fix compiling with pch off
2024-01-07 12:37:42 -08:00
GPUCode
2bb7f89c30
video_core: Refactor GPU interface (#7272)
* video_core: Refactor GPU interface

* citra_qt: Better debug widget lifetime
2023-12-28 11:46:57 +01:00
Wunk
ea9f522c0c
shader_jit_a64: Use LDP/STP for address registers (#7225)
Move `address_registers` to be earlier in the `UnitState` structure to allow LDP/STP's 7-bit offset to reach these members.

Follow-up of https://github.com/citra-emu/citra/pull/7002#discussion_r1367270804
2023-12-03 05:07:21 -08:00
Wunk
83b329f6e1
video_core/shader: Refactor JIT-Engines into JitEngine type (#7210) 2023-11-26 15:15:36 -08:00
GPUCode
26d5727b19
video_core: Merge tex0 and tex_cube (#7173) 2023-11-17 03:14:10 -08:00
GPUCode
d5b50a9fc0
spv_fs_shader_gen: Remove OpTypeSampledImage from texture buffers (#7153) 2023-11-12 22:40:30 -08:00
GPUCode
168f168c33
spv_fs_shader_gen: Implement quaternion correction with barycentric extension (#7152) 2023-11-12 22:40:21 -08:00
merry
271218b733
shader_jit_a64_compiler: Improve MAX, MIN (#7137) 2023-11-11 18:27:01 +05:30
merry
80213bf88f
shader_jit_a64_compiler: Improve Compile_SwizzleSrc (#7136) 2023-11-11 18:26:48 +05:30
Steveice10
d4f31bc617
video_core: Fix fragment shader interlock usage on OpenGL. (#7144) 2023-11-10 13:14:52 -08:00
Steveice10
84f9e9a10f
video_core: Perform quaternion correction and interpolation in fragment shader using barycentric extension. (#7126) 2023-11-09 15:23:56 -08:00
Wunk
1d4d421097
shader_jit_a64: Optimize MOVA dest-enable (#7122)
Rather than branching the 3 cases of dest-enablement, just emit a single
move-and-sign-extend instruction for each case.

From this review:
https://github.com/citra-emu/citra/pull/7002#discussion_r1381560584
2023-11-07 11:46:40 -08:00
GPUCode
1f6393e7d5
video_core: Refactor GLSL fragment emitter (#7093)
* video_core: Refactor GLSL fragment emitter

* shader: Add back custom normal maps
2023-11-06 12:26:28 -08:00
Wunk
e13735b624
video_core: Implement an arm64 shader-jit backend (#7002)
* externals: Add oaksim submodule

Used for emitting ARM64 assembly

* common: Implement aarch64 ABI

Utilize oaknut to implement a stack frame.

* tests: Allow shader-jit tests for x64 and a64

Run the shader-jit tests for both x86_64 and arm64 targets

* video_core: Initialize arm64 shader-jit backend

Passes all current unit tests!

* shader_jit_a64: protect/unprotect memory when jit-ing

Required on MacOS. Memory needs to be fully unprotected and then
re-protected when writing or there will be memory access errors on
MacOS.

* shader_jit_a64: Fix ARM64-Imm overflow

These conditionals were throwing exceptions since the immediate values
were overflowing the available space in the `EOR` instructions. Instead
they are generated from `MOV` and then `EOR`-ed after.

* shader_jit_a64: Fix Geometry shader conditional

* shader_jit_a64: Replace `ADRL` with `MOVP2R`

Fixes some immediate-generation exceptions.

* common/aarch64: Fix CallFarFunction

* shader_jit_a64: Optimize `SantitizedMul`

Co-authored-by: merryhime <merryhime@users.noreply.github.com>

* shader_jit_a64: Fix address register offset behavior

Based on https://github.com/citra-emu/citra/pull/6942
Passes unit tests.

* shader_jit_a64: Fix `RET` address offset

A64 stack is 16-byte aligned rather than 8. So a direct port of the x64
code won't work. Fixes weird branches into invalid memory for any
shaders with subroutines.

* shader_jit_a64: Increase max program size

Tuned for A64 program size.

* shader_jit_a64: Use `UBFX` for extracting loop-state

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Optimize `SUB+CMP` to `SUBS`

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Optimize `CMP+B` to `CBNZ`

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Use `FMOV` for `ONE` vector

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Remove x86-specific documentation

* shader_jit_a64: Use `UBFX` to extract exponent

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Remove redundant MIN/MAX `SRC2`-NaN check

Special handling only needs to check SRC1 for NaN, not SRC2.
It would work as follows in the four possible cases:

No NaN: No special handling needed.
Only SRC1 is NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
Only SRC2 is NaN: FMAX automatically picks SRC2 because it always picks the NaN if there is one.
Both SRC1 and SRC2 are NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit/tests:: Add catch-stringifier for vec2f/vec3f

* shader_jit/tests: Add Dest Mask unit test

* shader_jit_a64: Fix Dest-Mask `BSL` operand order

Passes the dest-mask unit tests now.

* shader_jit_a64: Use `MOVI` for DestEnable mask

Accelerate certain cases of masking with MOVI as well

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit/tests: Add source-swizzle unit test

This is not expansive. Generating all `4^4` cases seems to make Catch2
crash. So I've added some component-masking(non-reordering) tests based
on the Dest-Mask unit-test and some additional ones to test
broadcasts/splats and component re-ordering.

* shader_jit_a64: Fix swizzle index generation

This was still generating `SHUFPS` indices and not the ones that we wanted for the `TBL` instruction. Passes all unit tests now.

* shader_jit/tests: Add `ShaderSetup` constructor to `ShaderTest`

Rather than using the direct output of `CompileShaderSetup` allow a
`ShaderSetup` object to be passed in directly.  This enabled the ability
emit assembly that is not directly supported by nihstro.

* shader_jit/tests: Add `CALL` unit-test

Tests nested `CALL` instructions to eventually reach an `EX2`
instruction.

EX2 is picked in particular since it is implemented as an even deeper
dispatch and ensures subroutines are properly implemented between `CALL`
instructions and implementation-calls.

* shader_jit_a64: Fix nested `BL` subroutines

`lr` was getting writen over by nested calls to `BL`, causing undefined
behavior with mixtures of `CALL`, `EX2`, and `LG2` instructions.

Each usage of `BL` is now protected with a stach push/pop to preserve
and restore teh `lr` register to allow nested subroutines to work
properly.

* shader_jit/tests: Allocate generated tests on heap

Each of these generated shader-test objects were causing the stack to
overflow.  Allocate each of the generated tests on the heap and use
unique_ptr so they only exist within the life-time of the `REQUIRE`
statement.

* shader_jit_a64: Preserve `lr` register from external function calls

`EMIT` makes an external function call, and should be preserving `lr`

* shader_jit/tests: Add `MAD` unit-test

The Inline Asm version requires an upstream fix:
https://github.com/neobrain/nihstro/issues/68

Instead, the program code is manually configured and added.

* shader_jit/tests: Fix uninitialized instructions

These `union`-type instruction-types were uninitialized, causing tests
to indeterminantly fail at times.

* shader_jit_a64: Remove unneeded `MOV`

Residue from the direct-port of x64 code.

* shader_jit_a64: Use `std::array` for `instr_table`

Add some type-safety and const-correctness around this type as well.

* shader_jit_a64: Avoid c-style offset casting

Add some more const-correctness to this function as well.

* video_core: Add arch preprocessor comments

* common/aarch64: Use X16 as the veneer register

https://developer.arm.com/documentation/102374/0101/Procedure-Call-Standard

* shader_jit/tests: Add uniform reading unit-test

Particularly to ensure that addresses are being properly truncated

* common/aarch64: Use `X0` as `ABI_RETURN`

`X8` is used as the indirect return result value in the case that the
result is bigger than 128-bits. Principally `X0` is the general-case
return register though.

* common/aarch64: Add veneer register note

`LR` is generally overwritten by `BLR` anyways, and would also be a safe
veneer to utilize for far-calls.

* shader_jit_a64: Remove unneeded scratch register from `SanitizedMul`

* shader_jit_a64: Fix CALLU condition

Should be `EQ` not `NE`. Fixes the regression on Kid Icarus.
No known regressions anymore!

---------

Co-authored-by: merryhime <merryhime@users.noreply.github.com>
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
2023-11-05 21:40:31 +01:00
GPUCode
ef43776c7b
shader: Fix address register offset behavior in x64 Jit (#6942)
* shader: Fix address register offset behavior in x64 Jit

* shader: Remove redundant jump

* tests: Add address register tests

* shader: Remove additional pre-multiplications by 16

* tests: Add catch-stringifier for vec4f

* tests: Format
2023-10-18 19:41:36 +03:00
Steveice10
4c59443ed2
common: Add more robust ZSTD handling. (#7071) 2023-10-15 14:08:29 -07:00
SachinVin
053b3618dc
Android: gl_shader_decompiler.cpp: get rid of the not-so-accurate multiplication (#7011) 2023-10-05 10:28:26 -07:00
Steveice10
50f22d1f59
video_core: Abstract shader generators. (#6990)
* video_core: Abstract shader generators.

* shader: Extract common generator structures and move generators to specific namespaces.

* shader: Minor fixes and clean-up.
2023-09-30 02:06:06 -07:00
Vitor K
6cfb8e02a8
clang format (#7017) 2023-09-27 13:42:19 -03:00
GPUCode
542209c993
video_core: Initialize unrefered attributes to 1.f (#6966) 2023-09-16 14:46:44 -07:00
GPUCode
1159e4d928
video_core: Take factors into account with min/max blending functions (#6925)
* sw_framebuffer: Take factors into account for min/max blending

* renderer_gl: Take factors into account for min/max blending

* Address review comments

* gl_shader_gen: Fix frambuffer fetch on qcom and mali

* renderer_opengl: Add fallback path for mesa

* gl_shader_gen: Avoid emitting blend emulation if minmax_factor is present
2023-08-30 21:26:28 +03:00
Steveice10
df57012c50
video_core: Correct shader interpeter address register handling. (#6910) 2023-08-27 01:49:31 +03:00
GPUCode
a94297922b
shader_interpreter: Fix control flow edge cases and implement break/breakc (#6844) 2023-08-21 18:12:46 -07:00
Steveice10
6ddf4b241f
renderer/vulkan: Emulate custom border colors in shaders when unavailable. (#6878) 2023-08-17 22:22:25 +02:00
Wunk
a94af8ea62
shader_jit: Add optimizations up to x86-64-v4 (#6668) 2023-07-11 09:21:37 -07:00
Wunk
71aea7e571
shader/jit: Use xbyak::util::Cpu for Host capabilities (#6643)
Xbyak has a complete utility-class for determining the host-processor's
ISA-features such as SSE4.1, AVX, AVX2, AVX512{F,VL,DQ,VBMI,etc}, and so
on for further potential optimizations.
2023-07-02 08:05:07 +05:30
GPUCode
9b82de6b24
Refactor software renderer (#6621) 2023-06-24 00:59:18 +02:00
Steveice10
03dbdfc12f
video_core: Reduce nihstro includes in headers. (#6626) 2023-06-19 15:49:22 -07:00
SachinVin
796e8a9f24 video_core:fix warnings 2023-06-17 21:23:58 +05:30
Vitor K
34de77d429
Various miscelaneous changes (#6496) 2023-05-03 17:24:10 +02:00
SachinVin
41f13456c0
Chore: Enable warnings as errors on MSVC (#6456)
* tests: add Sanity test for SplitFilename83

fix test

fix test

* disable `C4715:not all control paths return a value` for nihstro includes

nihstro: no warn

* Chore: Enable warnings as errors on msvc + fix warnings

fixes

some more warnings

clang-format

* more fixes

* Externals: Add target_compile_options `/W0` nihstro-headers and ...

Revert "disable `C4715:not all control paths return a value` for nihstro includes"
This reverts commit 606d79b55d3044b744fb835025b8eb0f4ea5b757.

* src\citra\config.cpp: ReadSetting: simplify type casting

* settings.cpp: Get*Name: remove superflous logs
2023-05-01 22:38:58 +03:00
GPUCode
b5d6f645bd
Prepare frontend for multiple graphics APIs (#6347)
* externals: Update dynarmic

* settings: Introduce GraphicsAPI enum

* For now it's OpenGL only but will be expanded upon later

* citra_qt: Introduce backend agnostic context management

* Mostly a direct port from yuzu

* core: Simplify context acquire

* settings: Add option to create debug contexts

* renderer_opengl: Abstract initialization to Driver

* This commit also updates glad and adds some useful extensions which we will use in part 2

* Rasterizer construction is moved to the specific renderer instead of RendererBase.
  Software rendering has been disable to achieve this but will be brought back in the next commit.

* video_core: Remove Init/Shutdown methods from renderer

* The constructor and destructor can do the same job

* In addition move opengl function loading to Qt since SDL already does this. Also remove ErrorVideoCore which is never reached

* citra_qt: Decouple software renderer from opengl part 1

* citra: Decouple software renderer from opengl part 2

* android: Decouple software renderer from opengl part 3

* swrasterizer: Decouple software renderer from opengl part 4

* This commit simply enforces the renderer naming conventions in the software renderer

* video_core: Move RendererBase to VideoCore

* video_core: De-globalize screenshot state

* video_core: Pass system to the renderers

* video_core: Commonize shader uniform data

* video_core: Abstract backend agnostic rasterizer operations

* bootmanager: Remove references to OpenGL for macOS

OpenGL macOS headers definitions clash heavily with each other

* citra_qt: Proper title for api settings

* video_core: Reduce boost usage

* bootmanager: Fix hide mouse option

Remove event handlers from RenderWidget for events that are
already handled by the parent GRenderWindow.
Also enable mouse tracking on the RenderWidget.

* android: Remove software from graphics api list

* code: Address review comments

* citra: Port per-game settings read

* Having to update the default value for all backends is a pain so lets centralize it

* android: Rename to OpenGLES

---------

Co-authored-by: MerryMage <MerryMage@users.noreply.github.com>
Co-authored-by: Vitor Kiguchi <vitor-kiguchi@hotmail.com>
2023-03-27 14:29:17 +03:00
Steveice10
a8848cce43 build: Update to support multi-arch builds. 2023-01-07 01:09:32 -08:00
Tobias
bb05d8c12a
src/CMakeLists: Enforce multiple warnings on MSVC (#5692) 2022-11-09 23:14:28 +01:00
SachinVin
4a590d1fcb xbyak: Correct xbyak include directory
xbyak is intended to be installed in /usr/local/include/xbyak.
Since we desire not to install xbyak before using it, we copy the headers
to the appropriate directory structure and use that instead
Co-authored-by: merry <git@mary.rs>
2022-10-23 13:19:32 +05:30
liushuyu
c6153bb32e
CI: fix recent build issues (#6148)
Co-authored-by: Vitor K <vitor-kiguchi@hotmail.com>
2022-10-05 18:43:07 +02:00
GPUCode
cbd5d1c15c
Upgrade codebase to C++ 20 + fix warnings + update submodules (#6115) 2022-09-21 18:36:12 +02:00
SachinVin
65611e5b51 Shader jit: Save and restore LOOPCOUNT_REG for nested loops,
also add the assert back for nested loops
update test
2022-05-21 11:24:32 +05:30
SachinVin
047e238d09 shader_jit: Compile nested loops
and use `T_NEAR` instead of the default in Compile_BREAKC
2022-04-24 23:12:53 +05:30
Lioncash
9c08409e0e externals: Update Xbyak to 5.96
I made a request on the Xbyak issue tracker to allow some constructors
to be constexpr in order to avoid static constructors from needing to
execute for some of our register constants.

This request was implemented, so this updates Xbyak so that we can make
use of it.
2021-01-04 05:11:18 +01:00
Hamish Milne
26f9364062
Apply suggestions from code review
Co-Authored-By: Ben <bene_thomas@web.de>
2020-03-28 16:28:07 +00:00
Hamish Milne
558e710e17 Finished archives; remove pod.h 2020-02-13 17:41:26 +08:00
Hamish Milne
7b846ffa98 clang-format fixes 2020-02-13 17:39:15 +08:00
Hamish Milne
050c3bdee5 Serialize primitive_assembly 2020-02-13 17:38:18 +08:00