ARM64 (aarch64) DRC backend #13162

Draft
wants to merge 10 commits into
base: master

Conversation

987123879113
Contributor

  • Updated asmjit to latest master because it has some fixes for ARM64
  • Removed the vector-related code from UML and backends. It's not used anywhere that I could find so it's better to just cut it out.
  • Added ARM64 (aarch64) DRC backend

It's a big PR. I've done my best to make it work but I'm sure there are still bugs or questionable implementation choices. It passes all of the same tests I was doing for the previous DRC changes so I think the outputs should be the same as the other DRC backends.

Additional testing and/or misc feedback like benchmarks on various devices would also be appreciated. For hardware I've tested with an M1 Pro CPU and a Raspberry Pi 4 Model B. For software I've been testing using a few games on the Naomi (SH4), Firebeat (PPC), NWK-TR (PPC), and some very small testing of dgpix (Hyperstone E1-32XT).

@cuavas cuavas self-assigned this Jan 2, 2025
@cuavas
Member

cuavas commented Jan 2, 2025

I’ll take a look at this when I get a chance.

@987123879113
Contributor Author

A small note about my experience cross compiling for Raspberry Pi: I had to change the configuration { "x64" } lines to configuration { } in the build scripts to get it to build with the DRC in my Docker mame_raspberrypi_cross_compile environment. I don't know if it's the environment's fault or if it's a difference between MacOS and Linux and I just didn't implement that part correctly.

> uname -mps
Darwin arm64 arm

crosstoolng@ed182c661828:/$ uname -mps
Linux aarch64 aarch64

@belegdol
Contributor

belegdol commented Jan 2, 2025

What happens on 32-bit arm? Looking at the makefile and genie scripts changes, it looks like arm64 backend would get built. Is this what we want?

@987123879113
Contributor Author

It shouldn't do anything on 32-bit. asmjit only supports aarch64.

@belegdol
Contributor

belegdol commented Jan 2, 2025

> asmjit only supports aarch64.

I know. But looking at makefile changes, c backend is no longer forced if arm is found in uname -mps output. This could be a problem. On RPi 1B the output looks as follows:

$ uname -mps
Linux armv6l unknown

On an odroid HC1 NAS as follows:

$ uname -mps
Linux armv7l unknown
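
To make the concern concrete, here is a minimal sketch of how a substring test behaves on the `uname -mps` outputs quoted in this thread. It only models the substring semantics of GNU make's `$(findstring ...)`; the helper name `find_string` is made up for illustration and is not part of the MAME build scripts.

```cpp
#include <string_view>

// Models $(findstring needle,text): true when needle occurs anywhere in text.
// (Hypothetical helper, for illustration only.)
constexpr bool find_string(std::string_view needle, std::string_view text)
{
    return text.find(needle) != std::string_view::npos;
}

// uname -mps outputs quoted in this conversation
constexpr std::string_view rpi_1b      = "Linux armv6l unknown";
constexpr std::string_view odroid_hc1  = "Linux armv7l unknown";
constexpr std::string_view macos_m1    = "Darwin arm64 arm";
constexpr std::string_view linux_arm64 = "Linux aarch64 aarch64";

// 32-bit ARM hosts and macOS on Apple silicon all contain "arm"...
static_assert(find_string("arm", rpi_1b), "armv6l contains arm");
static_assert(find_string("arm", odroid_hc1), "armv7l contains arm");
static_assert(find_string("arm", macos_m1), "arm64 contains arm");
// ...while 64-bit Linux reports aarch64, which needs its own check.
static_assert(!find_string("arm", linux_arm64), "aarch64 does not contain arm");
static_assert(find_string("aarch64", linux_arm64), "aarch64 check matches");
```

So a bare "arm" substring check cannot by itself tell a 32-bit ARM host from a 64-bit one.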

@987123879113
Contributor Author

That doesn't mean it won't use the C backend though. FORCE_DRC_C_BACKEND just makes it skip the check to see if the platform configuration has a DRC available. For it to not use the C backend it would need to match the platform ("arm"/"arm64") and also have the x64 configuration available.

if not _OPTIONS["FORCE_DRC_C_BACKEND"] then
	if _OPTIONS["BIGENDIAN"]~="1" then
		if (_OPTIONS["PLATFORM"]=="arm" or _OPTIONS["PLATFORM"]=="arm64") then
			configuration { "x64" }
				defines {
					"NATIVE_DRC=drcbe_arm64",
				}
			configuration {  }
...

On MacOS it matches using the ifeq ($(findstring arm,$(UNAME)),arm) check, not the aarch64 one.
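
The nesting in the genie.lua excerpt above can be read as a single predicate. The following is only a sketch of that reading: the struct and function names are hypothetical, and it covers just the ARM branch shown in the excerpt, not the elided `else` branch.

```cpp
#include <string_view>

// Hypothetical mirror of the options the excerpt inspects.
struct build_options
{
    bool force_drc_c_backend;   // _OPTIONS["FORCE_DRC_C_BACKEND"] is set
    bool big_endian;            // _OPTIONS["BIGENDIAN"] == "1"
    std::string_view platform;  // _OPTIONS["PLATFORM"]
    bool x64_configuration;     // the "x64" configuration block applies to this build
};

// True when the excerpt would emit NATIVE_DRC=drcbe_arm64.
constexpr bool defines_arm64_native_drc(const build_options &o)
{
    if (o.force_drc_c_backend)
        return false;           // whole block is skipped
    if (o.big_endian)
        return false;           // only little-endian hosts are considered
    if (o.platform != "arm" && o.platform != "arm64")
        return false;           // other platforms take the elided else branch
    return o.x64_configuration; // the define sits under configuration { "x64" }
}

// Platform matches but the x64 configuration is not in effect: no define is emitted.
static_assert(!defines_arm64_native_drc({false, false, "arm", false}), "no x64 config");
static_assert(defines_arm64_native_drc({false, false, "arm64", true}), "arm64 with x64 config");
```

Under this reading, dropping the forced C backend for ARM hosts does not by itself enable the ARM64 backend; the platform and the configuration both have to match.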

@rb6502
Contributor

rb6502 commented Jan 3, 2025

MIPS: gauntleg shows an out of memory error with the AArch64 BE but -drc_use_c boots fine. Also sfrush, calsped, and mwskins - same error. (The error is shown by the game, not MAME).

PPC: scud doesn't boot without -drc_use_c.

@rb6502
Contributor

rb6502 commented Jan 3, 2025

ppctest results:

Mismatch: instr=MULLW., src1=0x80000000, src2=0x80000000
expected: dest=0x0, XER=0x0, CR=0x20000000
got: dest=0x0, XER=0x0, CR=0x40000000

Mismatch: instr=MULLWO., src1=0x80000000, src2=0x80000000
expected: dest=0x0, XER=0xc0000000, CR=0x30000000
got: dest=0x0, XER=0xc0000000, CR=0x50000000
Test file line #: 488
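
For readers decoding these values: CR0 is the top nibble of CR, with LT, GT, EQ and SO from the most significant bit down. The sketch below reproduces both the expected and the mismatching values for this operand pair. It assumes, without confirmation from the thread, that the wrong value came from deriving the flags from the full 64-bit product rather than from the low 32 bits; all function names are illustrative.

```cpp
#include <cstdint>

constexpr std::uint32_t CR0_LT = 0x80000000u;
constexpr std::uint32_t CR0_GT = 0x40000000u;
constexpr std::uint32_t CR0_EQ = 0x20000000u;
constexpr std::uint32_t CR0_SO = 0x10000000u;

// Signed 32x32 -> 64-bit product of two register values.
constexpr std::int64_t signed_product(std::uint32_t a, std::uint32_t b)
{
    return static_cast<std::int64_t>(static_cast<std::int32_t>(a))
         * static_cast<std::int64_t>(static_cast<std::int32_t>(b));
}

// Low 32 bits of the product, i.e. what MULLW writes to the destination.
constexpr std::int32_t low_word(std::int64_t product)
{
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(static_cast<std::uint64_t>(product)));
}

// Record-form CR0: compare a signed value with zero, then copy in XER[SO].
constexpr std::uint32_t cr0_from(std::int64_t value, bool summary_overflow)
{
    const std::uint32_t cmp = (value < 0) ? CR0_LT : (value > 0) ? CR0_GT : CR0_EQ;
    return summary_overflow ? (cmp | CR0_SO) : cmp;
}

constexpr std::int64_t example_product = signed_product(0x80000000u, 0x80000000u);

// dest is 0, so flags taken from the 32-bit result give EQ (the expected value).
static_assert(low_word(example_product) == 0, "low word is zero");
static_assert(cr0_from(low_word(example_product), false) == 0x20000000u, "MULLW. expected");
// Flags taken from the positive 64-bit product give GT (the value ppctest got).
static_assert(cr0_from(example_product, false) == 0x40000000u, "MULLW. mismatch");
```

With SO set, as in the MULLWO. case, the same two computations give 0x30000000 and 0x50000000 respectively.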

@987123879113
Contributor Author

987123879113 commented Jan 3, 2025

Fixed some more multiplication errors in addition to the ones AJR commented. That fixes the Midway/MIPS-based games RB mentioned. Also tried to emulate the call stack similar to how the C backend does it so that mapvars can be recovered properly which fixes scud. Also rebased against latest master to get rid of the build error.

@cuavas
Member

cuavas commented Jan 3, 2025

@seleuco you might want to take a look at this, as ARM CPUs are the predominant targets for Android builds.

@seleuco

seleuco commented Jan 3, 2025

> @seleuco you might want to take a look at this, as ARM CPUs are the predominant targets for Android builds.

Yes. I was already aware of this pull request. I will start testing and will report back to you.

Member

@cuavas cuavas left a comment


Just a few relatively minor things in the declarations. I haven’t looked at the code generation yet.

@rb6502
Contributor

rb6502 commented Jan 4, 2025

Next up: pmac6100. -drc_use_c will show the mouse pointer and flashing disk, aarch64 loses track of the return address during an exception and crashes to PC=0x00000004. (And to clarify, you don't need any of my WIP changes to the driver to get that).

@987123879113
Contributor Author

987123879113 commented Jan 4, 2025

@rb6502 More mapvar/recover issues. The callstack depth wasn't actually storing the new depth so it would overwrite the hashstacksave pointer sooner than it should've.

I tested the previously mentioned scud and MIPS games and those are still working + pmac6100 is booting as far as it does with the C backend now.

@rb6502
Contributor

rb6502 commented Jan 4, 2025

Confirmed full functionality in pmac6100 with that change. I think this is good to go in terms of my regression testing now.

@seleuco

seleuco commented Jan 4, 2025

I've successfully cross-compiled the pull request and tested performance on these Android devices without any problem at first sight:

Qualcomm Snapdragon 685 (low-midrange; Geekbench 6 single core: 473)

| Game | No DRC | DRC (C backend) | DRC (aarch64) |
|---|---|---|---|
| SF III 3D Strike (cps3) (attract mode) | 90% | 60% | 230% |
| K. Instinct (attract mode) | 35% | 30% | 150% |
| Die Hard (ST-V) (attract mode) | 34% | 19% | 44% |

Qualcomm Snapdragon 8 Gen 2 (high-end; Geekbench 6 single core: 1595)

| Game | No DRC | DRC (C backend) | DRC (aarch64) |
|---|---|---|---|
| SF III 3D Strike (cps3) (attract mode) | 250-280% | 140-160% | 600-800% |
| K. Instinct (attract mode) | 90-110% | 55-90% | 370-560% |
| Die Hard (ST-V) (attract mode) | 65-95% | 40-55% | 80-150% |
| C. Taxi (Naomi) (in game) | 30% | 20% | 50% (60% on performance mode) |

A build with these changes is available for testing:

https://drive.google.com/file/d/1aJcKK-ugzOzx_f5ridu-nPR3qnbFDYaC/view?usp=sharing

I don't plan to publish anything until the next version of MAME that includes these changes, obviously.
If you want me to remove this file from my Google Drive, just let me know.

@987123879113
Contributor Author

@seleuco Thank you for testing and providing benchmarks. Those are some really significant improvements for Android.

@invertego
Contributor

I am happy to report this PR is working fine on a Windows ARM64 device with a Snapdragon X Elite CPU.

> A small note about my experience cross compiling for Raspberry Pi: I had to change the configuration { "x64" } lines to configuration { } in the build scripts to get it to build with the DRC in my Docker mame_raspberrypi_cross_compile environment. I don't know if it's the environment's fault or if it's a difference between MacOS and Linux and I just didn't implement that part correctly.

I was just looking at the makefile with the intent of streamlining Win/ARM64 builds, and I think I can shed some light on this.

macOS ARM64 builds use this target:

.PHONY: macosx_arm64_clang
macosx_arm64_clang: generate $(PROJECTDIR)/$(MAKETYPE)-osx-clang/Makefile
	$(SILENT) $(MAKE) $(MAKEPARAMS) -C $(PROJECTDIR)/$(MAKETYPE)-osx-clang config=$(CONFIG)64 precompile
	$(SILENT) $(MAKE) $(MAKEPARAMS) -C $(PROJECTDIR)/$(MAKETYPE)-osx-clang config=$(CONFIG)64

...whereas Linux ARM64 builds use this generic target:

.PHONY: linux
linux: generate $(PROJECTDIR)/$(MAKETYPE)-linux/Makefile
	$(SILENT) $(MAKE) $(MAKEPARAMS) -C $(PROJECTDIR)/$(MAKETYPE)-linux config=$(CONFIG) precompile
	$(SILENT) $(MAKE) $(MAKEPARAMS) -C $(PROJECTDIR)/$(MAKETYPE)-linux config=$(CONFIG)

The macOS recipe passes config=$(CONFIG)64 and the Linux one passes config=$(CONFIG). The macOS target is selected by this late ad-hoc fixup to the ARCHITECTURE variable:

ifneq ($(filter arm64%,$(UNAME_M)),)
ARCHITECTURE := _arm64_clang
else

Interestingly, the macosx_arm64_clang recipe is identical to the macosx_x64_clang recipe, so there's little added value in maintaining it as a separate target.
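
As a rough model of the two mechanisms just described, the sketch below treats `$(filter arm64%,$(UNAME_M))` as a prefix match on a single word and the `config=$(CONFIG)64` difference as a test for a trailing `64`. Both helper names are hypothetical, and `release` is only a placeholder value for `CONFIG`, which this thread never spells out.

```cpp
#include <string_view>

// Models $(filter arm64%,word) for one word: % matches any (possibly empty) suffix.
constexpr bool matches_arm64_pattern(std::string_view uname_m)
{
    return uname_m.substr(0, 5) == "arm64";
}

// True when a config string carries the 64 suffix that the macOS recipe appends.
constexpr bool has_64_suffix(std::string_view config)
{
    return config.size() >= 2 && config.substr(config.size() - 2) == "64";
}

// macOS reports arm64 and takes the fixup; Linux reports aarch64 and does not.
static_assert(matches_arm64_pattern("arm64"), "macOS machine name");
static_assert(!matches_arm64_pattern("aarch64"), "Linux machine name");

// Placeholder config names: only the macOS-style string ends in 64.
static_assert(has_64_suffix("release64"), "config=$(CONFIG)64");
static_assert(!has_64_suffix("release"), "config=$(CONFIG)");
```

Read this way, the two hosts diverge twice: once on the machine name and once on the config string handed to the generated makefile.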

This leaves me with a couple of questions:

  1. Should Linux/Windows ARM64 builds use a *64 configuration (and should macOS continue to use one)?
  2. Should there be separate _arm64 targets for Linux/Windows?

The answers should clarify the right thing to do in this PR.

@danmons

danmons commented Jan 6, 2025

I'm unable to compile this via crosstool-ng on a Linux x86_64 machine for a Linux aarch64 target. This is the same environment I use to package up MAME for Raspberry Pi users, which to date I've been doing with NOASM=1 set at compile time. I suspect at this stage it's a problem with crosstool-ng and/or my setup, but I'm not completely sure.

My build environment is here:

If I remove NOASM=1 from my build scripts, MAME builds, however benchmarking with -drc produces the same result as benchmarking with -drc_use_c (and -nodrc is different, and in line with my old NOASM=1 results). So it seems to not use the arm64 asmjit.

If I try the trick above of replacing configuration { "x64" } with configuration { } in scripts/genie.lua, that seems to correctly set NATIVE_DRC=drcbe_arm64, however I get errors that look like the following during Linking mame when linking with GNU ld:

undefined reference to `asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&)'

GNU gold produces similar output. The asmjit documentation suggests that ASMJIT_STATIC is not set, but I can see that it is in various generated makefiles and when compiling with VERBOSE=1.

Interestingly I see that libasmjit.a has only !<arch> inside it and nothing else, and likewise build/projects/sdl/mame/gmake-linux/obj/Release/3rdparty/asmjit/src/asmjit only has directories and no object files. It seems like asmjit isn't being built perhaps?

I'm out of my depth here a bit, but it almost feels like asmjit isn't picking up the cross-compile environment and arm64 target instructions, and things are getting confused at compile/link time. Other than the linker errors, there are no compile time errors relating to asmjit or drc or anything similar.

I'll keep poking at it, but I'm running out of ideas on what to try next.

@seleuco

seleuco commented Jan 6, 2025

> Linux aarch64 target

You're probably getting the linkage error because you haven't modified the script in 3rdparty.lua and you haven't removed "configuration { "x64" }" there as well.

But that's not the way to go. The real problem you're having is that you're not running a rule that calls config=$(CONFIG)64 so the x64 configuration isn't being selected.

I personally haven't checked the Linux aarch64 target so I don't know where the problem is right now.

You definitely need to avoid NOASM=1 because it forces the use of FORCE_DRC_C_BACKEND no matter what you do and asmjit won't be included.

@987123879113
Contributor Author

987123879113 commented Jan 6, 2025

@danmons seleuco is right, it's due to the same issue with the config string stuff. That'll eventually need to be addressed for this PR, I think, but I'm not very comfortable with makefiles and genie and such, so it'll have to be discussed and decided on how exactly to approach that.

Having said that, I was using your build environment (inside Docker) and with some changes I did get it to work with the DRC after a few failed attempts. Here's the .patch file I made for testing with your Docker build environment:

diff --git a/scripts/genie.lua b/scripts/genie.lua
index bcdbc82236f..f6a5abc5486 100644
--- a/scripts/genie.lua
+++ b/scripts/genie.lua
@@ -711,11 +711,10 @@ end
 if not _OPTIONS["FORCE_DRC_C_BACKEND"] then
 	if _OPTIONS["BIGENDIAN"]~="1" then
 		if (_OPTIONS["PLATFORM"]=="arm" or _OPTIONS["PLATFORM"]=="arm64") then
-			configuration { "x64" }
+			configuration {  }
 				defines {
 					"NATIVE_DRC=drcbe_arm64",
 				}
-			configuration {  }
 		else
 			configuration { "x64" }
 				defines {
diff --git a/scripts/src/3rdparty.lua b/scripts/src/3rdparty.lua
index e56665ef121..6466637d760 100755
--- a/scripts/src/3rdparty.lua
+++ b/scripts/src/3rdparty.lua
@@ -1949,7 +1949,7 @@ project "asmjit"
 		}

 	if (_OPTIONS["PLATFORM"]=="arm" or _OPTIONS["PLATFORM"]=="arm64") then
-		configuration { "x64" }
+		configuration { }
 			defines {
 				"ASMJIT_NO_X86",
 			}
diff --git a/src/devices/cpu/drcbearm64.cpp b/src/devices/cpu/drcbearm64.cpp
index 5a895392852..6e0b989b8b8 100644
--- a/src/devices/cpu/drcbearm64.cpp
+++ b/src/devices/cpu/drcbearm64.cpp
@@ -826,6 +826,8 @@ drcbe_arm64::drcbe_arm64(drcuml_state &drcuml, device_t &device, drc_cache &cach
 	, m_baseptr(cache.near() + 0x80)
 	, m_near(*(near_state *)cache.alloc_near(sizeof(m_near)))
 {
+	printf("Using ARM DRC\n");
+
 	// get pointers to C functions we need to call
 	using debugger_hook_func = void (*)(device_debug *, offs_t);
 	static const debugger_hook_func debugger_inst_hook = [] (device_debug *dbg, offs_t pc) { dbg->instruction_hook(pc); };

And also removed NOASM=1 from functions/compile since that forces FORCE_DRC_C_BACKEND as seleuco said.

The printf there in drcbearm64.cpp was to be sure it was actually using the ARM DRC.

My guess is you need to also edit scripts/src/3rdparty.lua to get it working for you.

@danmons

danmons commented Jan 6, 2025

> You definitely need to avoid NOASM=1 because it forces the use of FORCE_DRC_C_BACKEND no matter what you do and asmjit won't be included.

I don't think I was clear on that - yup, NOASM=1 was removed for all of my testing/output above. Only mentioned it because it's still there in my repo (as prior to this, DRC was at best missing, and at worst caused compile time errors which caused me delays in getting builds out).

Thank you both for the notes. I'll try them out shortly.

[edit]

Ooh yeah that sorted it. Running all of these with -bench 90:

Raspberry Pi 4B, 1.5GHz ARM Cortex-A72 (older model, newer RPi4s clock at 1.8GHz):
--
sfiii3n -nodrc     : 63.70%
sfiii3n -drc_use_c : 30.35%
sfiii3n -drc       : 226.03%
--
kinst -nodrc       : 27.17%
kinst -drc_use_c   : 16.76%
kinst -drc         : 125.12% 
--
diehard -nodrc     : 22.57%
diehard -drc_use_c : 8.30% 
diehard -drc       : 32.55%
--

Orange Pi 5B, 2.3GHz ARM Cortex-A76 (roughly equivalent to an RPi5 which is 2.4GHz by spec):
--
sfiii3n -nodrc     : 160.42%
sfiii3n -drc_use_c : 87.72%
sfiii3n -drc       : 526.73%
--
kinst -nodrc       : 62.81%
kinst -drc_use_c   : 48.77%
kinst -drc         : 273.50%
--
diehard -nodrc     : 53.59%
diehard -drc_use_c : 27.00% 
diehard -drc       : 71.13%
--
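
To read these figures as ratios, here is a small sketch that computes the speedup of the native backend over the interpreter and over the C backend for the Raspberry Pi 4B rows above. The struct and function names are illustrative; the numbers are copied from the results as posted.

```cpp
// One row of -bench 90 results, as percentages of full speed.
struct bench_result
{
    const char *game;
    double nodrc;   // -nodrc
    double drc_c;   // -drc_use_c
    double drc;     // -drc (aarch64 backend)
};

// Raspberry Pi 4B figures from the comment above.
constexpr bench_result rpi4[] = {
    { "sfiii3n", 63.70, 30.35, 226.03 },
    { "kinst",   27.17, 16.76, 125.12 },
    { "diehard", 22.57,  8.30,  32.55 },
};

constexpr double speedup_vs_nodrc(const bench_result &r) { return r.drc / r.nodrc; }
constexpr double speedup_vs_c(const bench_result &r)     { return r.drc / r.drc_c; }

// sfiii3n runs roughly 3.5x faster than with -nodrc on this board.
static_assert(speedup_vs_nodrc(rpi4[0]) > 3.5 && speedup_vs_nodrc(rpi4[0]) < 3.6, "sfiii3n");
// In every row the C backend is slower than the interpreter.
static_assert(rpi4[0].drc_c < rpi4[0].nodrc && rpi4[1].drc_c < rpi4[1].nodrc
           && rpi4[2].drc_c < rpi4[2].nodrc, "C backend slower than -nodrc");
```

The same helpers can be applied to the Orange Pi 5B rows by adding a second array.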

@sonicboy904

I have tested some games on my AYN Odin 2 Pro and have seen performance improvements, some small and some big: games that ran poorly on MAME4droid improved a lot, and others got a small jump in speed.

Here are links to my tests for the ARM64 beta for MAME4droid.

Panic Park - https://youtu.be/025FNrFFo6M?si=XLUwAjCc8GycoIl5
California Speed - https://youtu.be/K7_z6FM7Rx0?si=YYTQOzKES5bKre-N
NFL Blitz - https://youtu.be/NzdQh7bFCWM?si=rlLqVQNfJh1xDsIR
Final Furlong 2 - https://youtu.be/NzdQh7bFCWM?si=rlLqVQNfJh1xDsIR

AYN Odin 2 Pro Specs

RAM: 12GB
GPU: Adreno 740
Processor: Snapdragon 8 Gen 2
Storage: 256GB + 512GB (micro SD)
OS: Android 13

@cuavas
Member

cuavas commented Jan 7, 2025

> This leaves me with a couple of questions:
>
> 1. Should Linux/Windows ARM64 builds use a *64 configuration (and should macOS continue to use one)?
> 2. Should there be separate _arm64 targets for Linux/Windows?
>
> The answers should clarify the right thing to do in this PR.

Philosophically, I’d say that Linux and Windows systems, where 32-bit ARM binaries are supported, should probably be using a *64 configuration for AArch64 builds. However, macOS no longer supports 32-bit applications at all, so there may not be any point keeping the *64 there.

@invertego
Contributor

FYI, the latest Windows 11 feature update (24H2) dropped support for 32-bit ARM applications.

@987123879113
Contributor Author

987123879113 commented Jan 8, 2025

I pushed a change that I think should probably clear up the build issues. I tested macOS (which reports Darwin arm64 arm) and with the Raspberry Pi build environment (which reports Linux aarch64 aarch64).

5683a37

@cuavas
Member

cuavas commented Jan 9, 2025

> FYI, the latest Windows 11 feature update (24H2) dropped support for 32-bit ARM applications.

What happened to Microsoft’s famous backwards compatibility?

Member

@cuavas cuavas left a comment


I’ve started going through the actual code generation part. It’s going to take several cycles for me to get through all of it. Please bear with me with stupid or obvious questions.

@k2-git
Contributor

k2-git commented Jan 10, 2025

> > FYI, the latest Windows 11 feature update (24H2) dropped support for 32-bit ARM applications.
>
> What happened to Microsoft’s famous backwards compatibility?

I don't think this was Microsoft's decision.
ARM announced that it would support only 64-bit.

https://newsroom.arm.com/news/pushing-the-boundaries-of-performance-and-security-to-unleash-the-power-of-64-bit-computing

Edit:
ARM recommended that all applications transition to 64-bit only.
https://newsroom.arm.com/blog/64-bit
