
Scripting: use double precision when on hardware with double support #24001

Open
wants to merge 6 commits into
base: master

Conversation

@tridge (Contributor) commented Jun 7, 2023

This makes Lua numbers double precision and integers int64_t. It means that code such as this works with high precision:

-- get time in seconds since boot
function get_time()
   return millis():tofloat() * 0.001
end

It can now be replaced with seconds_since_boot(), which returns a 64-bit float on systems with hardware double support.
The memory overhead of this seems surprisingly low. In SITL testing with aerobatics scripting there appears to be close to zero cost, possibly because it mostly uses Vector3f and quaternions, which remain single precision.

Tested on CubeOrange with SIM-on-hardware and aerobatics.

The extra flash cost of 17k is from the double precision exp() and pow() functions.

@tridge tridge requested a review from IamPete1 June 7, 2023 23:33
@tridge tridge force-pushed the pr-lua-double branch 3 times, most recently from a3184d7 to 6c7abec Compare June 9, 2023 03:22
@@ -460,7 +460,7 @@ end
 local knife_edge_ms = 0
 function do_knife_edge(arg1,arg2)
    -- arg1 is angle +/-180, duration is arg2
-   local now = millis():tofloat() * 0.001
+   local now = seconds_since_boot()
Contributor:

Why not just fix this to handle the uint32 time wrap correctly? It's a simple replacement, and more accurate.

Contributor Author:

Using uint32 is both slower and less accurate. seconds_since_boot() is microsecond accurate, and cheaper.

Contributor:

Don't you lose all the accuracy immediately when you don't have double precision support available though?

Contributor:

A 23-bit mantissa does not leave a lot of space in a single precision float; I'd actually expect this to be significantly worse than before in accuracy on any 32-bit board.

Contributor Author:

The point is to get a 64-bit float on any board running a non-trivial script.

Contributor:

As discussed on the dev call: isn't it really time to recognize that it's not worthwhile supporting scripting on these low-end boards?

Contributor:

The point is to get a 64-bit float on any board running a non-trivial script.

I kind of get it, but at the same time I think there's a lot of value, say on more lightly loaded Periphs (on M4s), in not forcing everything down this path. And it seems odd to chase precision here when the core controls aren't at this resolution.

I do support a time wrapper utility if needed, but I'd almost prefer to see that just be us providing a library with get-delta functions and the like. My big objection is that I just don't want to force M4s to use double precision in scripting, as it hamstrings some of the AP_Periph style uses, and if we encourage the default path to be "rely upon high precision" that makes things harder to do. (I think at that point we also need a binding for "does this board have double support".)

@IamPete1 (Member) left a comment:

I would like to do some more investigation into the memory costs; sorry, I have not had a chance to test this yet.

I'm not really a fan of the new seconds_since_boot binding. Mostly the callers are after a time difference, so they would get the same result using the current micros call and converting to seconds after subtracting to get the difference. 0.001 * millis():tofloat() is a bad pattern that has snuck into our examples; it fails on wrap. All of those could be fixed without the need to go to double.

// if this assert fails, you will need to add an upper bounds
// check that ensures the value isn't greater than UINT32_MAX
static_assert(sizeof(lua_Number) == sizeof(uint32_t), "only 32 bit integers are supported");
static_assert(sizeof(lua_Number) == sizeof(lua_Integer), "sizes of lua_Number and lua_Integer must match");
Member:

What does lua_Number and lua_Integer being the same size have to do with this uint32 binding?

@@ -43,6 +43,13 @@ int lua_micros(lua_State *L) {
     return 1;
 }
 
+// seconds_since_boot
+int lua_seconds_since_boot(lua_State *L) {
Member:

I'm not really a fan, this hides the underlying precision.

@tridge (Contributor Author) commented Jun 13, 2023

I'm not really a fan of the new seconds_since_boot binding. Mostly the callers are after a time difference, so they would get the same result using the current micros call and converting to seconds after subtracting to get the difference. 0.001 * millis():tofloat() is a bad pattern that has snuck into our examples; it fails on wrap. All of those could be fixed without the need to go to double.

That is why seconds_since_boot() is useful: it makes for an easy and safe pattern when we have a 64-bit float. The pattern of having to keep two uint32 values around and subtract them is more complex for users, and so more error prone.
Also, the wrap isn't really the issue with the millis()*0.001 pattern: the wrap happens after 49 days, and the precision problem happens long before that.

@tridge (Contributor Author) commented Jun 13, 2023

Another possibility is to go 64-bit on F4 as well. That solves the problem of a different API on different boards, and you can really only do trivial scripts on F4 anyway; in those trivial scripts you are less likely to be CPU limited.
We shouldn't hold back the usefulness of scripts on F7/H7, where users are really using them, in order to keep F4 limping along (where users really don't use scripts in any serious manner).

@timtuxworth (Contributor):

another possibility is to go 64 bit on F4 as well.

Best answer so far!

@IamPete1 (Member):

I have done a little testing on real hardware; I used #23647 to get a better picture of the memory usage.

First, math.lua: there are four different usage readings before it settles to a steady value.

Single:

Runtime, Total mem, script mem, allocated mem
1134,    81448,     62103,      163271
75190,   29885,     10540,      83444
323,     29721,     10376,      816
227,     29721,     10376,      852

Double

Runtime, Total mem, script mem, allocated mem
1117,    99324,     72991,      192975
90144,   39630,     13297,      115721
274,     39594,     13261,      1632
260,     39594,     13261,      1668

Difference (%)

Runtime, Total mem, script mem, allocated mem
-1.49,   21.94,     17.53,      18.19
19.88,   32.60,     26.15,      38.68
-15.17,  33.21,     27.80,      100
14.53,   33.21,     27.80,      95.77

Runtime is about the same: 20% slower on the very long run, but that is an unusual script; mostly it seems it will be insignificant relative to the other overheads.

Memory usage is higher in every case: a 20 to 40% increase in total and script memory, and up to double the memory allocation per run.

Interestingly, some asserts in math.lua were failing; I had to comment out some lines to get it to pass with double. The script is tested as part of CI, so it must be a real hardware-specific issue. These are the asserts that I had to remove:

assert(eq(math.tan(math.pi/4), 1))
assert(eq(math.sin(math.pi/2), 1) and eq(math.cos(math.pi/2), 0))
assert(eq(math.atan(1), math.pi/4) and eq(math.acos(0), math.pi/2) and
       eq(math.asin(1), math.pi/2))
assert(eq(math.deg(math.pi/2), 90) and eq(math.rad(90), math.pi/2))

assert(eq(math.atan(1,0), math.pi/2))

assert(eq(math.sin(10), math.sin(10%(2*math.pi))))

Trig stuff; I wonder if that is still being done in single precision.

For a more typical script I also ran simple_loop.lua. It settled into:

Single:
Total mem: 27961, Script mem: 8616, allocated mem: 852

Double:
Total mem: 37853, Script mem: 11520, allocated mem: 1668

Difference:
Total mem: 35.37%, Script mem: 33.7%, allocated mem: 95.77%

So, more or less the same pattern: roughly 30% more memory at the end of the run and double the allocation during the run.
