NMEA0183 serial input problem: 20231105 #60

Closed
norbert-walter opened this issue Nov 11, 2023 · 13 comments

@norbert-walter

If I introduce a serial data stream via NMEA0183 into an M5Stack Atom via an RS485 unit, problems arise when the data stream is interrupted (cable unplugged, reset on the transmitting device). After the data transfer is interrupted and continued, the telegram counter for the serial port freezes. However, telegrams forwarded via USB are still displayed. The website also no longer builds properly when you refresh the page.

The same problem also occurs when incomplete, garbled or unknown telegrams are transmitted. Then the M5Stack Atom freezes completely. To do this, I intentionally short-circuited the transmission lines. The new GPS receiver sends, for example, the following telegrams immediately after it has been started:

$GPTXT,01,01,02,HW=ATGM336H,00030107486321C
$GPTXT,01,01,02,IC=AT6558-5N-31-0C510800,BMLLCKJ-D2-037416
5A
$GPTXT,01,01,02,SW=URANUS5,V5.3.0.01D
$GPTXT,01,01,02,TB=2020-04-28,13:43:10
40
$GPTXT,01,01,02,MO=GB77
$GPTXT,01,01,02,BS=SOC_BootLoader,V6.2.0.2
34
$GPTXT,01,01,02,FI=00856014*71

[image attachment]

@wellenvogel
Owner

A bit hard to understand what exactly you did and what happened.
Let's discuss this directly...

@tkoning

tkoning commented Jan 28, 2024

hi
I have noticed something similar:
the NMEA2000 connection freezes after connecting a serial input (AIS signal).
Did you find a reason for the problem described in the question?

Dick Koning

@wellenvogel
Owner

No, not really.
Maybe you can fetch the log at the USB port.
Optionally increasing the log level.
Did you test this with the newest version?

@tkoning

tkoning commented Jan 29, 2024 via email

@wellenvogel
Owner

Any chance for some logs?

@tkoning

tkoning commented Mar 21, 2024 via email

@norbert-walter
Author

Hi Andreas,

I had exactly the same problem when NMEA sentences were transmitted incorrectly, meaning that the checksum did not match the content. In my case it was a faulty ground connection that garbled the telegrams. You can take an NMEA0183 log file and add errors to the recorded telegrams afterwards, such as inserting special characters or spaces, or deleting parts of the telegram or the checksum. My examples in the first post show such telegrams. For example, they have no checksum or are unknown telegram types. In my opinion, the checksum of a telegram is not checked to verify that it was transmitted completely and correctly. Furthermore, it is not checked whether a telegram is a known type that the interpreter can translate correctly. This is exactly what causes the software to freeze. Error handling is not working properly. Occasional errors in telegrams are not a problem, but many consecutive faulty telegrams are.
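
For reference, the NMEA 0183 checksum is the XOR of all characters between the leading `$` (or `!`) and the `*`, written as two hex digits after the `*`. A minimal Python sketch of such a check could look like the following. It is not the project's code, and the names `nmea_checksum` and `is_valid_sentence` are made up for illustration.

```python
import string


def nmea_checksum(payload: str) -> int:
    """XOR of all characters between the start delimiter and '*'."""
    checksum = 0
    for ch in payload:
        checksum ^= ord(ch)
    return checksum


def is_valid_sentence(line: str) -> bool:
    """Return True only if the line is framed correctly and the checksum matches."""
    line = line.strip()
    if not line or line[0] not in "$!":
        return False
    # Split at the last '*'; sep is empty if no '*' is present.
    body, sep, given = line[1:].rpartition("*")
    if not sep or len(given) != 2:
        return False
    if any(ch not in string.hexdigits for ch in given):
        return False
    return nmea_checksum(body) == int(given, 16)


# A line from the first post that carries no '*' is rejected.
print(is_valid_sentence("$GPTXT,01,01,02,MO=GB77"))

# A sentence built with a matching checksum is accepted.
body = "GPTXT,01,01,02,MO=GB"
print(is_valid_sentence(f"${body}*{nmea_checksum(body):02X}"))
```

Rejecting a line early like this would keep garbled input away from the sentence interpreter. Whether that matches the firmware's actual parsing path is an assumption.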

Norbert

wellenvogel pushed a commit that referenced this issue Mar 22, 2024
…ounter ids), add an error log for serial errors
@wellenvogel
Owner

With some additional analysis I found basically 2 problems:
(1) When the serial RX buffers overrun, the NMEA messages can be garbled. This can lead to serial counter names that are invalid (e.g. containing a " sign). If this occurs the Web UI looks broken (as the status update internally creates a JS error). The device itself still runs without issues.
So one correction will be to prevent this error in the Web UI.
(2) What causes the RX fifos/buffers to overflow? This typically will happen if you enable debug output. As every invalid line will produce at least 2 lines of output, this can easily cause the system to really slow down. The main loop flushes the log to the USB device. If running with the default of 115200 baud, a continuous input at 38400 baud with the example data from this issue will already cause the USB channel to be at its limit. You can see this in the Main loop line.
If you count the log bytes between 2 Main loop lines and compare this to the time diff you can easily see that this fills up 115200 baud completely.
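
The comparison described above is simple arithmetic. Here is a rough Python sketch, assuming 8N1 framing (10 bits on the wire per byte), so that 115200 baud carries at most 11520 bytes per second. The function name `line_utilization` and the sample numbers are illustrative and not taken from a real log.

```python
def line_utilization(log_bytes: int, interval_s: float,
                     baud: int = 115200, bits_per_byte: int = 10) -> float:
    """Fraction of the serial link capacity used by log_bytes within interval_s.

    bits_per_byte=10 assumes 8N1 framing (start bit + 8 data bits + stop bit).
    """
    capacity_bytes = baud / bits_per_byte * interval_s
    return log_bytes / capacity_bytes


# Hypothetical measurement: 11000 log bytes counted between two
# "Main loop" lines that are one second apart.
print(f"{line_utilization(11000, 1.0):.0%} of 115200 baud used")
```

A value close to 1.0 means the log output alone saturates the link.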
good (idle):
Main loop 2886.00/s334.82[1542us]#1:30.35[69],2:13.14[21],3:3.44[979],4:3.52[223],5:145.97[299],6:53.15[121],7:32.12[238],8:10.72[24],9:26.04[58],10:12.99[28],11:4.00[16],
bad (38400 baud input with invalid messages):
Main loop 60.76/s14706.50[38544us]#1:13634.41[37054],2:29.25[57],3:3.46[1243],4:3.26[11],5:373.14[717],6:154.02[381],7:118.10[446],8:52.56[100],9:1064.08[1 868],10:22.97[83],11:16.09[4783],
The first line (when idle) shows 2886 main loops/second. The second one only ~60! And you can see that phase 1 (flushing the logs) already takes ~13.6ms (average) of the 14.7ms main loop run time.
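
To compare such lines without reading them by eye, a small Python sketch can pull out the numbers. The field layout used here (loops per second, average loop time in µs, maximum in brackets, then `phase:average[max]` pairs) is inferred from the two lines and the explanation above, not from the firmware source, and the helper names are hypothetical.

```python
import re

# Header: "Main loop <loops>/s<avg_us>[<max>us]#<phase list>"
HEAD = re.compile(r"Main loop\s+([\d.]+)/s([\d.]+)\[(\d+)us\]#(.*)")
# Per phase only the average is captured: "<phase>:<avg>["
PHASE_AVG = re.compile(r"(\d+):([\d.]+)\[")


def parse_main_loop(line: str) -> dict:
    """Extract loop rate, average/max loop time and per-phase averages."""
    m = HEAD.search(line)
    if m is None:
        raise ValueError("not a Main loop line")
    rate, avg_us, max_us, rest = m.groups()
    phases = {int(n): float(avg) for n, avg in PHASE_AVG.findall(rest)}
    return {"loops_per_s": float(rate), "avg_us": float(avg_us),
            "max_us": int(max_us), "phases": phases}


def phase_share(line: str, phase: int = 1) -> float:
    """Share of the average loop time spent in the given phase."""
    info = parse_main_loop(line)
    return info["phases"][phase] / info["avg_us"]


GOOD = ("Main loop 2886.00/s334.82[1542us]#1:30.35[69],2:13.14[21],3:3.44[979],"
        "4:3.52[223],5:145.97[299],6:53.15[121],7:32.12[238],8:10.72[24],"
        "9:26.04[58],10:12.99[28],11:4.00[16],")
BAD = ("Main loop 60.76/s14706.50[38544us]#1:13634.41[37054],2:29.25[57],"
       "3:3.46[1243],4:3.26[11],5:373.14[717],6:154.02[381],7:118.10[446],"
       "8:52.56[100],9:1064.08[1 868],10:22.97[83],11:16.09[4783],")

for name, line in (("good", GOOD), ("bad", BAD)):
    print(name, f"{phase_share(line):.2f}")
```

For the two lines quoted above this gives a phase 1 share of roughly 0.09 when idle and 0.93 under load.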
If you switch the log level to "log" the problem should go away.
In the correction I will add an error log if the RX fifo or the RX buffer are overflowing.

@norbert-walter
Author

Ahh... nice that you found a problem.

@tkoning

tkoning commented Mar 23, 2024 via email

@wellenvogel
Owner

Any log level lower than debug should solve the issue.

@wellenvogel
Owner

@norbert-walter
Author

Thanks for your work!
