Left here for archival purposes.

By Eyal
#14807 We should really stick to the original subject: dsleep wakeup is unreliable. As mentioned earlier, this does not happen with the 20150118 fw but does with the 20150126. I rechecked today.

With 20150426 I note two visible differences:

1) At boot time it often (every time?) says "MEM CHECK FAIL!!!".
- What is this?
- Is it related?
- Some of the boards have slow flash, and I mean really slow.
- Some have 4MB (e.g. my nodeMCU) and others have 512KB (e.g. my esp-01).
I could not see a relationship between these attributes and the failure.

2) After a (short) while it fails with this constant message:
Code:
MEM CHECK FAIL!!!
Fatal exception (29):
epc1=0x40222788, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000044, depc=0x00000000

It was already said that this epc1 is inside pm_wait4wakeup. This is probably informative to someone who is familiar with this part of the code. Is it nodemcu code? SDK code?
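For readers decoding the dump above, here is a minimal host-side C sketch (not code from this thread) that names the exception cause and classifies the two addresses. The cause numbers are the standard Xtensa EXCCAUSE values; the region bounds in `addr_region` follow the commonly published ESP8266 memory map and are an assumption here. Both function names are made up for illustration.

```c
#include <stdint.h>

/* Name for an Xtensa EXCCAUSE value; only a few common causes are listed. */
const char *exccause_name(unsigned cause)
{
    switch (cause) {
    case 0:  return "IllegalInstruction";
    case 3:  return "LoadStoreError";
    case 9:  return "LoadStoreAlignment";
    case 20: return "InstFetchProhibited";
    case 28: return "LoadProhibited";
    case 29: return "StoreProhibited";
    default: return "other";
    }
}

/* Rough ESP8266 address classification.
 * ASSUMPTION: bounds taken from the commonly published memory map,
 * not from anything stated in this thread. */
const char *addr_region(uint32_t addr)
{
    if (addr >= 0x40200000u && addr < 0x40300000u)
        return "flash-mapped code (irom0)";
    if (addr >= 0x40100000u && addr < 0x40108000u)
        return "IRAM";
    if (addr >= 0x40000000u && addr < 0x40010000u)
        return "boot ROM";
    if (addr >= 0x3FFE8000u && addr < 0x40000000u)
        return "data RAM";
    return "unmapped or peripheral";
}
```

Calling `exccause_name(29)` gives "StoreProhibited", `addr_region(0x40222788)` lands in the flash-mapped code range, and `addr_region(0x44)` is not in any RAM range. Under these assumptions the crash looks like a write through a near-NULL pointer from code running out of flash. The address range alone does not say whether that code belongs to nodemcu or to the SDK; the linker map of the exact build would be needed for that.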

BTW, the above messages are at 15200 baud.

To avoid distractions I am running a trivial init.lua that is just a dsleep loop
Code:
magic_pin = 1    -- gpio5, LOW on this pin will stop the program

gpio.mode (magic_pin, gpio.INPUT, gpio.PULLUP);
if 0 == gpio.read (magic_pin) then
   print ("aborting by magic")
else
   print ("will wake up in 2s\n")
   node.dsleep(2*1000000)
end

Wireless was intentionally not set up on the chip so I assume it is not involved (but what do I know?).

So, is anyone looking into this? This issue makes it impossible to use this software for low power (is there any other kind?) IoT.

I should say, though, that I have two chips (one an esp-01 and the other a nodeMCU board) that are mostly reliable. So I suspect the issue is related to some sensitivity in the later code, which was handled better before. A timing issue (wakeup too slow)? OK, I am just dreaming up ideas now...

cheers
By cal
#14842 Moin,

Eyal wrote:
2) After a (short) while it fails with this constant message:
Code:
MEM CHECK FAIL!!!
Fatal exception (29):
epc1=0x40222788, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000044, depc=0x00000000

It was already said that this epc1 is inside pm_wait4wakeup. This is probably informative to someone who is familiar with this part of the code. Is it nodemcu code? SDK code?


This is SDK code from libphy.a, no source available to my knowledge.

Carsten
By Eyal
#14876 Good Day,

I expected as much. Surely there are nodemcu developers with some knowledge of the SDK, or otherwise with access to such information (espressif forums)? Did anyone disassemble that section to make sense of it?

Also, if we know that the problem arrived with the 20150126 fw then it should be possible to bisect, though anyone involved with the patches (and there were many around that time) could identify likely points to test first.
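The bisection idea can be sketched on the host as a plain binary search over an ordered list of builds. This is only an illustration in C: `first_bad_build`, `is_bad_fn` and the demo predicate `fails_from` are hypothetical names, and the real "is this build bad?" check would be a soak test of the flashed image. Because the crash is intermittent, a build should only be called good after a long run.

```c
#include <stddef.h>

/* Hypothetical predicate: nonzero if the build at `index` crashes
 * during a soak test. `ctx` carries whatever the test needs. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/* Precondition: build `good` passes, build `bad` fails, good < bad.
 * Returns the index of the first failing build in between. */
size_t first_bad_build(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;      /* failure is at mid or earlier */
        else
            good = mid;     /* failure was introduced after mid */
    }
    return bad;
}

/* Demo predicate for trying the search: every build from *ctx on fails. */
int fails_from(size_t index, void *ctx)
{
    return index >= *(const size_t *)ctx;
}
```

With N candidate builds this needs about log2(N) soak tests instead of N, which matters when each "good" verdict takes an hour or more.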

I will build a few binaries to see if I can narrow down the point in time when this issue showed up.

BTW, I actually did the same dsleep loop test in the sdk directly and it works fine (I did not see a crash yet).
Code:
void ICACHE_FLASH_ATTR
user_init()
{
   os_delay_us (3*1000000);

   system_deep_sleep_set_option (4);   // Disable RF
   system_deep_sleep (3*1000000);
}


I did not expect to see a failure here, it probably comes about from a more complex interplay of other conditions.

cheers
By Eyal
#14885 There is Good news and there is Bad news...

The Good News first.

I did a few builds from the period 20150118-24, going back in time.

I am running the dsleep loop I presented yesterday as a test. It has proved to be a good stress test so far, crashing within a few minutes.

The image from the end of the 24th crashes. We knew this.
The image from the end of the 23rd crashes.
The image from the end of the 22nd works. (A few hours later it was shown as dated the 23rd...)

I then tested the other patches from the 23rd. The last patch that works reliably is
VIP6(xingyuewang,http://517513.cn) commit si7021 module.
https://github.com/nodemcu/nodemcu-firm ... e81d8c0b8b


The first one that fails is
merge mqtt branch to master and build pre_build bin
https://github.com/nodemcu/nodemcu-firm ... dd54cadedd


The Bad News:

I cannot see how this patch causes the failure. However, maybe the addition of a large module was enough to breach some memory map rule?

I am now testing the failing patch but with mqtt deselected in user_config.h. So far so good... 15 minutes... 20m...30m...[later]...1h... [even later] stopped after 2 hours (about 3000 loops).
Recall: this test does 25 dsleep/wakeup cycles per minute.

Closing Remarks

My app does not use mqtt (and still crashes), so the issue is probably not related specifically to this module. Furthermore, we do not know if the failure in January is the same as today's; we only know that the same exception occurs at the same location intermittently.

Next step: is there anyone reading this who has the skills to take this issue further? Now is the time to chime in...

Thanks