Left here for archival purposes.
By cal
#15721 Moin,

I am able to reproduce the problem with the unmodified dev branch within a few minutes.

After one simple change I have not been able to reproduce it so far.

1. Add the following file:

nodemcu-firmware$ head app/platform/dummywrapper.c
extern void* __real__xtos_set_exception_handler(int exno, void (*exhandler)());

void* __wrap__xtos_set_exception_handler(int exno, void (*exhandler)()) {
    return __real__xtos_set_exception_handler(exno, exhandler);
}


2. Add the following to the global Makefile:

diff --git a/Makefile b/Makefile
index 85bc27a..c43aa3e 100644
--- a/Makefile
+++ b/Makefile
@@ -119,6 +119,7 @@ CCFLAGS +=                  \
 CFLAGS = $(CCFLAGS) $(DEFINES) $(EXTRA_CCFLAGS) $(INCLUDES)
 DFLAGS = $(CCFLAGS) $(DDEFINES) $(EXTRA_CCFLAGS) $(INCLUDES)
 
+LDFLAGS += -Wl,--undefined=_xtos_set_exception_handler -Wl,--wrap=_xtos_set_exception_handler
 
 #############################################################


If the LDFLAGS line is commented out, I see crashes quickly. If the LDFLAGS line is active, I don't see them.

The original SDK code (0x40120454) and the wrapper code (0x4023fb78) are in different sections.

iram1_0_seg : org = 0x40100000, len = 0x8000
irom0_0_seg : org = 0x40210000, len = 0x60000

Do these sections have different configurations that influence setup timing or flushing of code/data?
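
To check which segment a given address belongs to, a small lookup over the two ranges quoted above is enough. This is only a sketch: the names segment_t, segments and segment_index_of are made up for illustration, and the table contains just the two lines from the linker script shown in this post, not the full memory map.

```c
#include <stddef.h>
#include <stdint.h>

/* One memory segment as it appears in the linker script excerpt. */
typedef struct {
    const char *name;
    uint32_t org;   /* start address */
    uint32_t len;   /* length in bytes */
} segment_t;

/* Only the two segments quoted in the post; the real script has more. */
static const segment_t segments[] = {
    { "iram1_0_seg", 0x40100000u, 0x8000u  },
    { "irom0_0_seg", 0x40210000u, 0x60000u },
};

/* Returns the index into segments[] that contains addr, or -1 if none does. */
int segment_index_of(uint32_t addr)
{
    for (size_t i = 0; i < sizeof segments / sizeof segments[0]; i++) {
        if (addr >= segments[i].org && addr - segments[i].org < segments[i].len) {
            return (int)i;
        }
    }
    return -1;
}
```

With only these two ranges, the wrapper address 0x4023fb78 resolves to irom0_0_seg. The address 0x40120454 lies beyond the 0x8000 bytes of iram1_0_seg as quoted, so this lookup reports no match for it; the full linker script would be needed to place it.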

Carsten
By Eyal
#15757 Good call Carsten. Works for me too with master HEAD.

I also would like to know why this hack works, and the timing (if it really is the issue) should be made more robust anyway.

But the real question is how did you discover this? Can I guess serendipity - you wrapped the exception handler to debug this problem then it stopped happening. Like the common "my program has a problem so I added a print for debug, but then it does not do it".

cheers
By cal
#15764
Eyal wrote: But the real question is how did you discover this? Can I guess serendipity - you wrapped the exception handler to debug this problem then it stopped happening. Like the common "my program has a problem so I added a print for debug, but then it does not do it".

cheers


Moin,

when I installed my cal_dex module (which I wrote about above and which was ignored (Sigh ;-)) I noticed
that I wasn't able to reproduce the problem.
Then you wrote about timing issues and it made me wonder whether that might be the same reason.
My feeling was "no", but then I used the same technique you used: find the minimal change that changes the behavior.

New info:
When I move the dummy wrapper code to section ".text" the bug is repeatable again.
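
The post does not show how the wrapper was moved. One common way with GCC is a section attribute on the function, sketched below. Everything here is an assumption for illustration: the macro WRAPPER_IN_TEXT, the stand-in real_set_exception_handler (a host-side replacement so the sketch links without the SDK and without the --wrap linker flag), the table size of 64, and sample_handler. Where ".text" finally ends up in memory depends on the linker script of the build.

```c
#include <stddef.h>

/* Hypothetical macro: ask GCC to emit the function into section ".text". */
#define WRAPPER_IN_TEXT __attribute__((section(".text")))

typedef void (*exc_handler_t)(void);

/* Stand-in for the SDK routine: stores the handler, returns the previous one. */
static exc_handler_t handler_table[64];

exc_handler_t real_set_exception_handler(int exno, exc_handler_t handler)
{
    if (exno < 0 || exno >= 64) {
        return NULL;
    }
    exc_handler_t previous = handler_table[exno];
    handler_table[exno] = handler;
    return previous;
}

/* The pass-through wrapper, explicitly placed in ".text". */
WRAPPER_IN_TEXT exc_handler_t wrapped_set_exception_handler(int exno,
                                                            exc_handler_t handler)
{
    return real_set_exception_handler(exno, handler);
}

/* Dummy handler used only to exercise the wrapper. */
void sample_handler(void)
{
}
```

In the real build the wrapper would keep the __wrap_/__real_ names from step 1 and only gain the attribute; the stand-in exists so the sketch compiles on a host.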

Then I modified the cal_dex module to install itself in section ".text/.data".
I was able to get register and stack dumps for the problem.

There is a strange difference between the fatal exception 29 we see and a fatal exception 29 that I provoke by
deliberately writing to some location in the first memory block.

My provoked exception gives the exact location and reason of the error (epc1, excvaddr) and the register contents.
The fatal exception 29 we get points to the wakexxx method, but the code and the register contents don't match.

It looks as if the code is sometimes corrupt, which may indicate cache or flash problems or memory corruption.
I don't have a clear understanding yet of how flash/rom/cache interact and what that means for visibility,
modifiability, etc.

Cal