Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UART IRQ Priority + Error Handling for Overrun, Framing and Parity #19301

Conversation

rhapsodyv
Copy link
Member

@rhapsodyv rhapsodyv commented Sep 7, 2020

Description

There's this issue, with a patch to handle uart overrun: #18358 . With the Emergency Parser development for STM32F1, we have the code to better handle UART directly in Marlin. We don't edit the maple framework anymore.

So, I was taking a look to apply the patch in #18358. But, talking with @sjasonsmith , he told that we can try first a better approach: increase the UART IRQ priority.

This PR does it: increase UART IRQ priority. It's hardcoded now, but we can tune it.

I will ask for the author of #18358 test.

Benefits

Try fix UART overrun.

Related Issues

#18358

@rhapsodyv rhapsodyv marked this pull request as draft September 8, 2020 02:55
@rhapsodyv rhapsodyv changed the title Allow to configure the UART IRQ Priority, to avoid overrun UART IRQ Priority + Error Handling for Overrun, Framing and Parity Sep 8, 2020
@sjasonsmith
Copy link
Contributor

I think this looks promising.

The ISR changes should prevent the hang when an overruns is encountered without data (as described in the STM32 manual).

The priority change should hopefully prevent overruns. At 250kbaud we will have to service the interrupt every 40 microseconds to keep up with a constant stream of data. We're only competing with SysTick and the Servo ISR, so that is probably ok as long as interrupts are never disabled too long elsewhere.

@sjasonsmith
Copy link
Contributor

I did some testing with this on my SKR Mini E3 1.0.

Prior to this change, I can reproduce a hang in about 5 seconds using the following technique:

  1. Configure TFT and USB serial ports at 250000 baud.
  2. Open TeraTerm sessions to both ports (using a USB-TTL adapter)
  3. Send a large binary file to the TFT port. This is just garbage (actually an SKR Pro firmware dump!)
  4. Repeat the following through the USB port:
G0 X200
ok
G0 X0
ok
  1. Repeat until you stop receiving ok responses. You are hung. You can stop sending the large file, and you will never get control back. The LCD is also completely hung.

After this change, I can repeat the same process as long as I want, with no problems. While it would previously fail in a few seconds, I can now send the entire file (1MB each time) three times and never saw a failure. That is about 2 minutes with no issues.

@rhapsodyv rhapsodyv marked this pull request as ready for review September 10, 2020 02:50
@sjasonsmith
Copy link
Contributor

@rhapsodyv pointed out to me that a lot of STM32F1 boards are using external serial-USB chips, which makes this change even more important. I originally thought this mainly impacted users with serial-connected displays, but on those boards all serial communication was vulnerable to this hang.

@rhapsodyv
Copy link
Member Author

Do you think worth adding some error logging when one of those errors happens?

@thinkyhead
Copy link
Member

It will be good to have another interrupt-priority-related issue fixed. These are some of the most tricky we've seen. I can see why developers prefer an RTOS for complex projects instead of getting intimate with the hardware.

I'll do another point release soon so this patch gets out to the world quickly.

@thinkyhead thinkyhead merged commit b98946b into MarlinFirmware:bugfix-2.0.x Sep 10, 2020
@sjasonsmith
Copy link
Contributor

@thinkyhead an RTOS just replaces interrupt priority with task priority.

@rhapsodyv rhapsodyv deleted the uart-irq-priority-to-fix-overrun branch September 10, 2020 04:27
@rhapsodyv
Copy link
Member Author

I think it's more than confirmed: #19204 (comment)

This user never was able to do long prints using serial!! 🚀 🥳

The first test he did:

I merged the pull request that you recommended.
I did a test with the GCODE that I attached to you.
You have to take into account that I never managed to terminal that GCODE (with my firmware), it always stopped, either at 20%, or at 50%, sometimes at 80%.

This time for the first time, I managed to carry it to the end.
It seems miraculous.

The last test:

I did further tests, another 20 hours of printing.
There are no more problems, I saw that the changes have also been introduced in the BUGFIX, there are no more problems now with the Serial communication.
Octoprint printing works very well and the printer is no longer stuck.

It seems really strange to me that before Marlin did not take into account communication errors, but it is good that he does now, it is a relief that this problem has been solved, I had tried in every way to ask for help, even with POST on FB on the Marlin's official page, but the fact that no one else had this problem made me feel crazy.

@Swap-File
Copy link

I was having this exact problem on 2.0.6.1. I was printing from octoprint and my 36 hour print kept failing 1-3 hours in, serial communication would just die, the printer would stop moving, and the heaters would be left on.

I switched to bugfix (and lowered my baud rate from 250,000 to 115200), and the problem went away.

Now I am 25 hours in with zero problems. I'll try my next print at 250,000.

Attached is my config, just in case someone thinks it might help them.

Ender 5 Plus, Mini SKR E3 V2, TFT35, Smart Filament Sensor:
config.zip

vgadreau pushed a commit to vgadreau/Marlin that referenced this pull request Dec 9, 2020
kageurufu pushed a commit to CR30-Users/Marlin-CR30 that referenced this pull request Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants