Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linuxcncrsh segmentation faults on get/set spindle commands. #3180

Open
1 of 3 tasks
JTrantow opened this issue Nov 14, 2024 · 18 comments
Open
1 of 3 tasks

Linuxcncrsh segmentation faults on get/set spindle commands. #3180

JTrantow opened this issue Nov 14, 2024 · 18 comments

Comments

@JTrantow
Copy link
Contributor

I have been working on using LightBurn serial/TCP to communicate with the linuxcncrsh server and one of my first steps was extending tests/linuxcncrsh/test.sh to test all the commands to help me understand linuxcnc. Basically, I took the output from "help", "help get", and "help set" and added all the commands to test.sh. Doing this I discovered the get/set commands involving the spindle (brake, spindle, spindle_override) crash with a segmentation fault.

Here are the steps I follow to reproduce the issue:

  1. In tests/linuxcnc/test.sh add a line "echo get brake" after the hello EMC line.
  2. scripts/runtest tests/linuxcncrsh

This is what I expected to happen:

Script should execute and display spindle brake status.

This is what happened instead:

Segmentation fault when it hits the "get brake" command.

It worked properly before this:

Initially, I thought this issue was caused by the simulation not initializing a spindle.
halcmd getp spindle.0 brake verifies that a spindle is not initialized with the current test config.

Next, I added num_spindles=[TRAJ]SPINDLES to lib/hallib/core_sim.hal loadrt motmot line along with SPINDLES=1 in linuxcncrsh-test.ini and verified with halcmd that spindle.0 does exist, but "get spindle" still crashes.

Now, I suspect that emcrsh.cc was written prior to the multiple spindle implementation and these commands are no longer functional???

Information about my hardware and software:

  • I am using this Linux distribution and version : Debian GNU/Linux 12 (bookworm)

  • I am using this kernel version : Linux debian 6.1.0-17-rt-amd64 use Tcl_SetResult and Tcl_GetStringResult #1 SMP PREEMPT_RT Debian 6.1.69-1 (2023-12-30) x86_64 GNU/Linux

  • I am running ...

    • A binary version from linuxcnc.org (including buildbot.linuxcnc.org)
    • A binary I built myself
    • A binary version from some other source besides linuxcnc.org
  • I am using this LinuxCNC version : 2.9.2

  • I am using this user interface (GUI) : linuxcncrsh (no other GUI)

  • I am using this interface hardware vendor and chipset : simulation

@Sigma1912
Copy link
Contributor

Not sure what's broken but looking at some code in 'emcrsh.cc' that handles spindle it seems that it should handle multiple spindles:

static cmdResponseType getSpindle(char *s, connectionRecType *context)
{
  const char *pSpindleStr = "SPINDLE %d %s";
  int spindle = -1;
  int n;
  s = strtok(NULL, delims);
  if (sscanf(s, "%d", &spindle) < 0) spindle = -1; // no spindle number given return all
  for (n = 0; n < emcStatus->motion.traj.spindles; n++){
	  if (n == spindle || spindle == -1){
		  if (emcStatus->motion.spindle[n].increasing > 0)
			snprintf(context->outBuf, sizeof(context->outBuf), pSpindleStr, n, "INCREASE");
		  else
			if (emcStatus->motion.spindle[n].increasing < 0)
			  snprintf(context->outBuf, sizeof(context->outBuf), pSpindleStr, n, "DECREASE");
			else
			  if (emcStatus->motion.spindle[n].direction > 0)
				snprintf(context->outBuf, sizeof(context->outBuf), pSpindleStr, n, "FORWARD");
			  else
				if (emcStatus->motion.spindle[n].direction < 0)
				  snprintf(context->outBuf, sizeof(context->outBuf), pSpindleStr, n, "REVERSE");
			else snprintf(context->outBuf, sizeof(context->outBuf), pSpindleStr, n, "OFF");
	  }
  }
  return rtNoError; 
}

static cmdResponseType getBrake(char *s, connectionRecType *context)
{
  const char *pBrakeStr = "BRAKE %s";
  int spindle;
  int n;
  s = strtok(NULL, delims);
  if (sscanf(s, "%d", &spindle) < 0) spindle = -1; // no spindle number return all
  for (n = 0; n < emcStatus->motion.traj.spindles; n++){
	  if (n == spindle || spindle == -1){
		  if (emcStatus->motion.spindle[spindle].brake == 1)
			snprintf(context->outBuf, sizeof(context->outBuf),pBrakeStr, "ON");
		  else snprintf(context->outBuf, sizeof(context->outBuf),pBrakeStr, "OFF");
	  }
  }
  return rtNoError; 
}

@rmu75
Copy link
Contributor

rmu75 commented Nov 15, 2024

a bunch of bugs in linuxcncrsh was fixed by daniel about a year ago. see 8b91f27, so using master should work (I just checked).

we should really get rid of all C-style string handling ;-)

@rmu75
Copy link
Contributor

rmu75 commented Nov 15, 2024

@andypugh
Copy link
Collaborator

Actual bugfixes should have gone into 2.9
But looking at the change, I can probably see why it didn't.

@JTrantow
Copy link
Contributor Author

I pulled the master and verified that RIP linuxcnc and linuxcncrsh run the following commands in the tests/linuxcncrsh-tcp/test.sh loop. (get/set spindle/brake/spindle-override segmentation fault does not occur with the RIP master when it did occur with the RIP 2.9.2 linuxcncrsh.

echo hello EMC mt 1.0
echo set enable EMCTOO
echo set set_wait done

echo set mode manual
echo set estop off
echo set machine on

echo get mode manual
echo get estop off
echo get machine on

echo get spindle
echo get brake
echo get spindle_override

echo set spindle forward
echo set brake
echo set spindle_override 50

echo get spindle
echo get brake
echo get spindle_override

Test.sh could be improved by saving the nc output from the commands to a file. At the end of the echo loop, I used:
nc -vv -q2 localhost 5007 > $OUT_FILE 2>&1

This would catch the results from all the commands as well as the M100 output including depreciated commands. I am not sure why "set spindle" complains about invalid state?

localhost [127.0.0.1] 5007 (?) open
HELLO ACK EMCNETSVR 1.1
WARNING: "set_wait" command is depreciated and will be removed in the future. Please use "wai
t_mode" instead.
MODE MANUAL
ESTOP OFF
MACHINE ON
SPINDLE 0 OFF
BRAKE ON
SPINDLE_OVERRIDE 0 50.000000
BRAKE ON
error: invalid state: "(null)" (valid: on, off)
SPINDLE 0 FORWARD
BRAKE OFF
SPINDLE_OVERRIDE 0 50.000000
BRAKE OFF
...

@Sigma1912
Copy link
Contributor

Sigma1912 commented Nov 16, 2024

Just tested and I get:

get mode
MODE MANUAL
get estop
ESTOP OFF
get machie
GET MACHIE NAK
get machine
MACHINE ON
get spindle
SPINDLE 0 OFF
get brake
BRAKE ON
get spindle_override
SPINDLE_OVERRIDE 0 51.000000
BRAKE ON
set spindle forward
set brake
error: invalid state: "(null)" (valid: on, off)
set spindle off
set brake
error: invalid state: "(null)" (valid: on, off)
set spindle_override 50
get spindle
SPINDLE 0 OFF
get spindle_override
SPINDLE_OVERRIDE 0 50.000000
SPINDLE 0 OFF
get spindle_override
SPINDLE_OVERRIDE 0 50.000000
SPINDLE 0 OFF
get brake
BRAKE ON
get spindle_override
SPINDLE_OVERRIDE 0 50.000000
BRAKE ON
get machine
MACHINE ON
get spindle_override
SPINDLE_OVERRIDE 0 50.000000
MACHINE ON

it seems that 'get spindle_override' always returns the last reply along the expected 'SPINDLE_OVERRIDE 0 '

@Sigma1912
Copy link
Contributor

Sigma1912 commented Nov 16, 2024

This change seems to fix the bug with the extra reply to 'get spindle_override' but I'm not sure if there is a particular reason for using 'dprintf' :

diff --git a/src/emc/usr_intf/emcrsh.cc b/src/emc/usr_intf/emcrsh.cc
index 46fe5a51a5..b34f1acaee 100644
--- a/src/emc/usr_intf/emcrsh.cc
+++ b/src/emc/usr_intf/emcrsh.cc
@@ -2554,7 +2554,7 @@ static cmdResponseType getSpindleOverride(connectionRecType *context)
             continue;
 
                  double percent = emcStatus->motion.spindle[n].spindle_scale * 100.0;
-                 dprintf(context->cliSock, "SPINDLE_OVERRIDE %d %f\r\n", n, percent);
+                 OUT("SPINDLE_OVERRIDE %d %f", n, percent);
   }
 
   return rtNoError;

I have filed a PR for this
#3182

@Sigma1912
Copy link
Contributor

Just realized:
'error: invalid state: "(null)" (valid: on, off)' is due to the 'set brake' command missing 'on'/'off'. So that is to be expected.

@JTrantow
Copy link
Contributor Author

I just tested RIP 2.9.3 and the issue does not occur. The mistake I made was reporting the issue when I was running RIP 2.9.2 followed up by missing the on in "set brake on".

My apologies for not checking the latest code which had this issue fixed.

I did learn quite a bit trying to figure this out and would like to use that to improve testing of the linuxcncrsh. The current test only covers "set mdi m100" commands. I have a linuxcncrsh-cmds directory that performs a minimal test of all linuxcncrsh commands, I would like to push this addition and close this issue. Does this sound appropriate?

@JTrantow
Copy link
Contributor Author

I spoke too soon. RIP 2.9.3 does NOT work correctly with get/set spindle/spindle_override/brake commands. (RIP Master is ok.)

@JTrantow
Copy link
Contributor Author

When I reported earlier that 2.9.3 worked, I was still running the Master.
I have RIP 2.9.3 and Master installed and built. Using rip-environment to switch between then, I see 2.9.3 fail and Master succeeds.

@JTrantow
Copy link
Contributor Author

I was originally running RIP tag v2.9.2 and noticed the seg fault on any get/set spindle/spindle_override/brake command.
I also tried the RIP v2.9.3 tag which has the same issue. (even though it looks like the fix was available before the tag)
The master and branch 2.9 currently have this issue fixed.
I will close this issue as my understanding is the next 2.9.4 tag from 2.9 branch should have the fix in place.

@rmu75
Copy link
Contributor

rmu75 commented Nov 21, 2024

are you really sure that is fixed? it looks like the out-of-bounds array access is still there. https://github.com/LinuxCNC/linuxcnc/blame/2.9/src/emc/usr_intf/emcrsh.cc#L1690

it seems linuxcncrsh is not used that much so potential for breakage seems low. Maybe it makes sense to backport the changes from 2.10.

@JTrantow
Copy link
Contributor Author

I opened the issue when I was testing basic commands on tag v2.9.2. tag v2.9.3 has the same issue, but branch 2.9 does not show the issue (for simple tests).

It appears that because some probing tests were occasionally failing, the tests and fix did not get into the v2.9.3 tag??? (threw out the baby with the bathwater?)

From #1690 there are probably more improvements needed. I agree that backporting linuxcncrsh improvements from 2.10 branch is probably the way to go.

@JTrantow JTrantow reopened this Nov 22, 2024
@andypugh
Copy link
Collaborator

I still see a segfault with the latest 2.9, following your instructions:

+ echo hello EMC mt 1.0
+ echo get brake
+ echo set enable EMCTOO
+ echo set set_wait done
+ echo set mode manual
+ echo set estop off
+ echo set machine on
+ echo set mode mdi
+ echo set mdi m100 p-1 q-2
+ sleep 1
/home/andypugh/linuxcnc-dev/scripts/linuxcnc: line 977: 3136835 Segmentation fault      $EMCDISPLAY $EMCDISPLAYARGS $EXTRA_ARGS -- -ini "$INIFILE"

I was planning to do a git bisect to see when it git fixed, and by what, but that's not so easy without a known-good tag and known-bad to work with.

@JTrantow
Copy link
Contributor Author

JTrantow commented Nov 26, 2024

I just went through building and running test/linuxcncrsh/test.sh for tag v2.9.3, branch 2.9, and master.

tag v2.9.3 and branch 2.9 will both seg fault if I add a "echo get brake" or "echo get spindle" line (missing spindle number) after the machine on. "echo get spindle 0" works fine with both.

master (2.10.0~pre) has MUCH better testing (disabled by the skip file) The test fails on different PROGRAM_CODES but does not seg fault even when I add the "testGet spindle" equivalent to "echo get spindle" which crashes 2.9 and 2.9.3.

# test spindle command
    testSet mode manual
    **testGet spindle                        # Missing spindle number.**
    testSet spindle forward                # turn on all w/o param
    testSet spindle off -1                 # turn off all w/param
    testGet spindle -1                     # check all w/param
    testSet spindle forward 99             # turn on illegal spindle
    **testGet spindle-override               # Missing spindle number.**
    testSet spindle-override 99            # Missing spindle number.

    # test brake command
    **testGet brake                          # Missing spindle number.**
    testSet brake on                       # turn on all w/o param
    testSet brake off -1                   # turn off all w/param
    testGet brake -1                       # check all w/param
    testSet brake forward 99               # turn on illegal spindle

I am still confused by the way LCNC is branched and merged. When I look at the log for emcrsh.cc, I don't see why master works and 2.9 doesn't. It looks to me like Daniel Hiepler tested and fixed a bunch of the spindle/brake issues. Not sure why these are in the master and not 2.9. I do understand that some of the tests will return different values and flunk the test. In my case I see different PROGRAM_CODES than expected. It would be much better to disable specific tests that don't always return the same value (or refine the test) than to disable all the available testing in master.

@rmu75
Copy link
Contributor

rmu75 commented Nov 26, 2024

situation in 2.9: if you invoke "get spindle" without a number this line

if (sscanf(s, "%d", &spindle) < 0) spindle = -1; // no spindle number given return all

will set n to -1 and then this expression emcStatus->motion.spindle[spindle] will go out of bounds of the array. which may segfault.

I suggest we at least fix the segfaults, substituting emcStatus->motion.spindle[n] in getBrake should to it, rest of the code needs to be checked ofc, and the result may be unexpected in case of an actual multi-spindle-machine, but I would refer those users to future 2.10.

@andypugh
Copy link
Collaborator

We could easily default to spindle 0 if no spindle is given.
Better would be to iterate through all spindles and OR, though that might not be especially useful info (in a multi-spindle system with spindle brakes, I would expect one spindle in use and the others to be braked, in general)
This might not be easy if the code in the applicable module does not know how many spindles there are.
Or would could flag an error if no spindle number is given.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants