This will be a long problem description for a sporadic, not easily reproducible error, so: thanks for reading it!
Hardware:
I have a custom-designed PCB around an ESP-12F (see picture attached). The schematics are largely derived from the NodeMCU, but not 100%. There is an additional LiPo/charging unit and a real-time clock on the I²C bus. I attached a PDF with the schematics.
The device is equipped with a LiPo battery, but is also connected via USB to a PC, so there is enough power present. The USB cable is also used for flashing and for reading the serial output. However, the problem sometimes appears whether or not USB is connected.
Software:
I run an Arduino-based firmware flashed with PlatformIO.
PLATFORM: Espressif 8266 (3.2.0) > Espressif ESP8266 ESP-12E
HARDWARE: ESP8266 80MHz, 80KB RAM, 4MB Flash
PACKAGES:
- framework-arduinoespressif8266 3.30002.0 (3.0.2)
- tool-esptool 1.413.0 (4.13)
- tool-esptoolpy 1.30000.201119 (3.0.0)
- toolchain-xtensa 2.100300.210717 (10.3.0)
Problem:
The firmware is running on 16 similar devices. Most of them do not show this error at all. So far I have seen the issue on 4 devices. It can happen after a few hours, but some of those 4 problematic devices have been running for 2-3 weeks without the error.
Occasionally, a device appears to run into a hardware watchdog reset. On the serial output I see:
$ pio device monitor -b 115200 -p /dev/ttyUSB0
data {"unixtime":1643153310,"battery":996, …}
...
data {"unixtime":1643156454,"battery":996, ...}
data {"unixtime":1643156455,"battery":996, ...}
data {"unixtime":1643156456,"battery":996, ...}
data {"unixtime":1643156457,"battery":996, ...}
ets Jan 8 2013,rst cause:4, boot mode:(3,6)
wdt reset
I am not concerned about the hardware watchdog right now. This may be something to debug another time. The problem is that, in those sporadic cases, the ESP does not start up again. It seems to be stuck in this state.
If I press the reset button, I see this:
$ pio device monitor -b 115200 -p /dev/ttyUSB0
data {"unixtime":1643153310,"battery":996, …}
...
data {"unixtime":1643156454,"battery":996, ...}
data {"unixtime":1643156455,"battery":996, ...}
data {"unixtime":1643156456,"battery":996, ...}
data {"unixtime":1643156457,"battery":996, ...}
ets Jan 8 2013,rst cause:4, boot mode:(3,6)
wdt reset
;$␀$��<␀�d�<␃␄␄␄�␄d�␌c|��␃�␛�{�c�␌c��''�loo���␄#␜p��$;${d
So only some gibberish output here (with my application baud rate).
If I run the serial interface with a baud rate of 74880 and press the reset button four times, I get this:
$ pio device monitor -b 74880 -p /dev/ttyUSB0
--- Available filters and text transformations: colorize, debug, default, direct, hexlify, log2file, nocontrol, printable, send_on_enter, time
--- More details at https://bit.ly/pio-monitor-filters
--- Miniterm on /dev/ttyUSB0 74880,8,N,1 ---
--- Quit: Ctrl+C | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
ets Jan 8 2013,rst cause:2, boot mode:(3,7)
ets Jan 8 2013,rst cause:2, boot mode:(3,6)
ets Jan 8 2013,rst cause:2, boot mode:(3,6)
ets Jan 8 2013,rst cause:2, boot mode:(3,6)
But no application output can be seen. The only way to recover from this state is to remove the battery and USB connection to completely remove all power from the ESP8266.
Analysis (so far):
From my experience, output like the following is missing:
load 0x4010f000, len 3460, room 16
tail 4
chksum 0xcc
load 0x3fff20b8, len 40, room 4
tail 4
chksum 0xc9
csum 0xc9
v0004ecf0
~ld
after the boot mode information. So I am very confused. According to boot mode 3, the device starts up in “normal” mode and should load the sketch, but it does not.
I measured some pins:
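To make the boot lines easier to compare across resets, here is a minimal host-side sketch that decodes them. It rests on two assumptions that are not stated in this post and should be checked against the ESP8266 documentation: that the reset causes 1, 2 and 4 mean power-on, external reset and hardware watchdog reset, and that the first boot-mode number encodes the strapping pins in the bit order GPIO15, GPIO0, GPIO2. The function name `decode_boot_line` is made up for illustration, and the second boot-mode number is passed through undecoded.

```python
import re

# Reset causes as printed by the boot ROM. ASSUMPTION: only the values
# believed to be well known are listed; anything else is reported as unknown.
RST_CAUSES = {
    1: "power-on",
    2: "external reset (RST pin) or deep-sleep wake",
    4: "hardware watchdog reset",
}

BOOT_LINE = re.compile(r"rst cause:(\d+), boot mode:\((\d+),(\d+)\)")


def decode_boot_line(line):
    """Return a dict describing one 'rst cause / boot mode' line, or None."""
    match = BOOT_LINE.search(line)
    if match is None:
        return None
    cause, mode, second = (int(group) for group in match.groups())
    return {
        "rst_cause": RST_CAUSES.get(cause, "unknown (%d)" % cause),
        # ASSUMPTION: bit 2 = GPIO15, bit 1 = GPIO0, bit 0 = GPIO2.
        "gpio15": (mode >> 2) & 1,
        "gpio0": (mode >> 1) & 1,
        "gpio2": mode & 1,
        "second_number": second,  # left undecoded on purpose
    }


for line in [
    "ets Jan 8 2013,rst cause:4, boot mode:(3,6)",
    "ets Jan 8 2013,rst cause:2, boot mode:(3,7)",
]:
    print(decode_boot_line(line))
```

Under these assumptions, boot mode 3 corresponds to GPIO15 low, GPIO0 high and GPIO2 high at the moment the ROM samples the pins, which is the combination expected for booting the sketch from flash.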
GPIO 15 is low (by pulldown) → Good.
GPIO 02 is high (by pullup) → Good.
GPIO 00 is on ~1.8V → NOT GOOD.
GPIO 00 is connected to 3.3V by a 12k pullup (I also tested 2.2k). On the oscilloscope I see the pin switching between low and high at 26MHz (see attached image).
GPIO 00 is also connected to a logic-level MOSFET (like the one used in the NodeMCU) for flashing. I was not sure whether this connection may cause trouble, so I measured all the logic for the flashing.
Here is the logic table for the MOSFET flashing/resetting logic:
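The ~1.8V reading and the 26MHz switching may describe the same signal. The sketch below is only arithmetic under stated assumptions: that the ~1.8V was read with a DC meter that averages the waveform, and that the signal is an ideal square wave swinging between 0V and 3.3V. Neither assumption is confirmed by the measurements in this post, and the helper names are hypothetical.

```python
def mean_voltage(v_high, duty, v_low=0.0):
    """Average voltage of an ideal square wave with the given duty cycle."""
    return v_low + duty * (v_high - v_low)


def implied_duty(v_mean, v_high, v_low=0.0):
    """Duty cycle that would produce the given average voltage."""
    return (v_mean - v_low) / (v_high - v_low)


# ASSUMPTION: 0 V to 3.3 V swing, averaged by the meter.
print(round(mean_voltage(3.3, 0.5), 2))   # average for a 50 % duty cycle
print(round(implied_duty(1.8, 3.3), 3))   # duty cycle implied by a 1.8 V reading
```

A 50% duty cycle would average to about 1.65V, so a reading of ~1.8V is in the range one would expect from a roughly symmetric square wave rather than from a steady intermediate level.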
|RTS (input) | 1 | 0 | 0 | 1 |
|DTR (input) | 1 | 0 | 1 | 0 |
|RST (on esp) | 1 | 1 | 0 | 1 |
|GPIO00 (on esp) | 1 | 1 | 1 | 0 |
The lines RTS and DTR are either 1 and 1 or 0 and 0, depending on whether USB is plugged in or not. I measured this to be sure. Either way, the MOSFET should leave GPIO 00 pulled up.
In order to eliminate a possible source of error here, I even cut the line between the MOSFET and GPIO 00. This results in a board that can no longer be flashed automatically. This would be OK for me. Unfortunately, GPIO 00 is still at 1.8 V and shows the oscillating behavior.
This leads me to the assumption that the oscillating voltage on GPIO 00 can only be caused by the ESP8266 itself pulling the pin down internally. But I do not understand why.
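The measured table can be written down as a small model, which makes it easy to check that the two idle combinations leave both lines high. This is a sketch that only reproduces the four measured columns; the function name `auto_reset_outputs` is made up for illustration and the code is not derived from the schematic itself.

```python
def auto_reset_outputs(rts, dtr):
    """Return (RST, GPIO00) for the given RTS/DTR levels, per the measured table."""
    # RST goes low only for RTS=0, DTR=1.
    rst = 0 if (rts == 0 and dtr == 1) else 1
    # GPIO00 goes low only for RTS=1, DTR=0.
    gpio0 = 0 if (rts == 1 and dtr == 0) else 1
    return rst, gpio0


for rts, dtr in [(1, 1), (0, 0), (0, 1), (1, 0)]:
    rst, gpio0 = auto_reset_outputs(rts, dtr)
    print(f"RTS={rts} DTR={dtr} -> RST={rst} GPIO00={gpio0}")
```

In this model, RTS and DTR being equal (1/1 or 0/0) never drives RST or GPIO 00 low, which matches the expectation that the flashing circuit is idle in both cases.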
I am looking forward to suggestions and help. Thanks in advance.