By bastian.h.jaeger
#93482 Hello,

this will be a long problem description for a sporadic, not easily reproducible error, so: Thanks for reading it!

Hardware:
I have a custom-designed PCB around an ESP-12F (see attached picture). The schematic is largely derived from the NodeMCU, but not identical: there is an additional LiPo charging unit and a real-time clock on the I²C bus. I attached a PDF with the schematics.
The device is equipped with a LiPo battery, but is also connected via USB to a PC, so there is enough power available. The USB cable is also used for flashing and for reading the serial output. The problem appears both with and without USB connected.

Software:
I run an Arduino-based firmware flashed with PlatformIO.

Code:
PLATFORM: Espressif 8266 (3.2.0) > Espressif ESP8266 ESP-12E
HARDWARE: ESP8266 80MHz, 80KB RAM, 4MB Flash
PACKAGES:
 - framework-arduinoespressif8266 3.30002.0 (3.0.2)
 - tool-esptool 1.413.0 (4.13)
 - tool-esptoolpy 1.30000.201119 (3.0.0)
 - toolchain-xtensa 2.100300.210717 (10.3.0)

Problem:
The firmware is running on 16 similar devices. Most of them do not show this error at all; so far I have seen the issue on 4 devices. It can happen after a few hours, but some of those 4 problematic devices have been running for 2-3 weeks without the error.
Occasionally, a device appears to run into the hardware watchdog. On the serial port I see:

Code:
$ pio device monitor -b 115200 -p /dev/ttyUSB0
data {"unixtime":1643153310,"battery":996, …}
...
data {"unixtime":1643156454,"battery":996, ...}
data {"unixtime":1643156455,"battery":996, ...}
data {"unixtime":1643156456,"battery":996, ...}
data {"unixtime":1643156457,"battery":996, ...}
 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset

I am not concerned about the hardware watchdog reset itself right now; that may be something to debug another time. The problem is that, in those sporadic cases, the ESP does not start up again. It seems to be stuck in this state.
If I press the reset button, I see this:

Code:
$ pio device monitor -b 115200 -p /dev/ttyUSB0
data {"unixtime":1643153310,"battery":996, …}
...
data {"unixtime":1643156454,"battery":996, ...}
data {"unixtime":1643156455,"battery":996, ...}
data {"unixtime":1643156456,"battery":996, ...}
data {"unixtime":1643156457,"battery":996, ...}

 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset

;$␀$��<␀�d�<␃␄␄␄�␄d�␌c|��␃�␛�{�c�␌c��''�loo���␄#␜p��$;${d

So there is only gibberish output here (at my application baud rate of 115200).

If I run the serial interface with a baud rate of 74880 and press the reset button four times, I get this:

Code:
$ pio device monitor -b 74880 -p /dev/ttyUSB0
--- Available filters and text transformations: colorize, debug, default, direct, hexlify, log2file, nocontrol, printable, send_on_enter, time
--- More details at https://bit.ly/pio-monitor-filters
--- Miniterm on /dev/ttyUSB0  74880,8,N,1 ---
--- Quit: Ctrl+C | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---

 ets Jan  8 2013,rst cause:2, boot mode:(3,7)


 ets Jan  8 2013,rst cause:2, boot mode:(3,6)


 ets Jan  8 2013,rst cause:2, boot mode:(3,6)


 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

But no application output can be seen. The only way to recover from this state is to remove the battery and USB connection to completely remove all power from the ESP8266.
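
A note on the two baud rates: the explanation usually given is that the boot ROM programs its UART divider for 115200 baud assuming a 40 MHz crystal, while ESP-12 modules typically carry a 26 MHz crystal, so the ROM messages come out proportionally slower. The following is only a sketch of that arithmetic; the function name is made up for illustration, and the 40 MHz assumption is the commonly cited one, not something visible in the logs above.

```cpp
#include <cstdint>

// Baud rate seen on the wire when a UART divider that was computed for a
// crystal of `assumed_hz` is actually clocked by a crystal of `actual_hz`.
constexpr std::uint32_t effective_baud(std::uint32_t nominal_baud,
                                       std::uint32_t assumed_hz,
                                       std::uint32_t actual_hz) {
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(nominal_baud) * actual_hz / assumed_hz);
}

// 115200 * 26 / 40 = 74880, the rate at which the ROM lines are readable.
static_assert(effective_baud(115200, 40000000, 26000000) == 74880,
              "ROM output with a 26 MHz crystal");
```

This would also be consistent with the application output at 115200 being unreadable at 74880 and vice versa.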

Analysis (so far):
From my experience, I would expect to see something like this:

Code:
load 0x4010f000, len 3460, room 16
tail 4
chksum 0xcc
load 0x3fff20b8, len 40, room 4
tail 4
chksum 0xc9
csum 0xc9
v0004ecf0
~ld

after the boot mode information. So I am very confused. According to boot mode 3, the device starts in “normal” mode and should load the sketch from flash, but it does not.
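
For reference, the numbers in the boot ROM line can be decoded roughly as follows. This is a minimal sketch based on the boot message table as it is commonly documented for the ESP8266 Arduino core; the type and function names are hypothetical, and the second boot mode number is deliberately left undecoded.

```cpp
// Strapping pin levels sampled by the boot ROM at reset.
struct Straps {
    bool gpio15;
    bool gpio0;
    bool gpio2;
};

// First number of "boot mode:(x,y)": x = GPIO15<<2 | GPIO0<<1 | GPIO2.
constexpr Straps decode_boot_straps(unsigned x) {
    return Straps{(x & 4u) != 0, (x & 2u) != 0, (x & 1u) != 0};
}

// True when the straps select a normal boot from SPI flash (x == 3).
constexpr bool boots_from_flash(unsigned x) { return (x & 7u) == 3u; }

// Meaning of "rst cause:N"; values outside 1..4 are reported as unknown.
constexpr const char* reset_cause_name(unsigned cause) {
    return cause == 1 ? "power-on"
         : cause == 2 ? "external reset pin"
         : cause == 3 ? "software reset"
         : cause == 4 ? "hardware watchdog reset"
                      : "unknown";
}

static_assert(boots_from_flash(3), "mode 3 is the normal flash boot");
static_assert(decode_boot_straps(3).gpio0, "mode 3 means GPIO0 was sampled high");
static_assert(!decode_boot_straps(1).gpio0, "mode 1 is UART download (GPIO0 low)");
```

Read this way, rst cause:4 in the first log is the watchdog reset and rst cause:2 is the reset button, and mode 3 suggests GPIO 00 was read as high at the moment of reset, whatever the pin does afterwards.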

I measured some pins:
GPIO 15 is low (by pulldown) → Good.
GPIO 02 is high (by pullup) → Good.
GPIO 00 is at ~1.8 V → NOT GOOD.

GPIO 00 is connected to 3.3 V through a 12k pullup (I also tested 2.2k). On the oscilloscope I see the pin switching between low and high at 26 MHz (see attached image).

GPIO 00 is also connected to a logic-level MOSFET (like the one used in the NodeMCU) for flashing. I was not sure whether this connection might cause trouble, so I measured all the logic for the flashing.

Here is the logic table for the MOSFET flashing/reset logic:
Code:
|RTS (input)     | 1 | 0 | 0 | 1 |
|DTR (input)     | 1 | 0 | 1 | 0 |
|RST (on esp)    | 1 | 1 | 0 | 1 |
|GPIO00 (on esp) | 1 | 1 | 1 | 0 |
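
The table above can also be written as two boolean expressions. This is just a sketch that reproduces the four measured columns; the struct and function names are made up, and it models only the logic levels, not the actual MOSFET circuit.

```cpp
// Levels the auto-reset logic presents to the ESP8266 (true = high).
struct EspPins {
    bool rst;
    bool gpio0;
};

// RST is pulled low only for DTR=1, RTS=0;
// GPIO0 is pulled low only for RTS=1, DTR=0.
constexpr EspPins auto_reset(bool rts, bool dtr) {
    return EspPins{!(dtr && !rts), !(rts && !dtr)};
}

// The four columns of the table, left to right.
static_assert(auto_reset(true, true).rst && auto_reset(true, true).gpio0, "idle, both 1");
static_assert(auto_reset(false, false).rst && auto_reset(false, false).gpio0, "idle, both 0");
static_assert(!auto_reset(false, true).rst && auto_reset(false, true).gpio0, "reset asserted");
static_assert(auto_reset(true, false).rst && !auto_reset(true, false).gpio0, "GPIO0 low for flashing");
```

Whenever RTS and DTR are equal, both outputs are high, so in that state this logic does not drive GPIO 00 low.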

The RTS and DTR lines are either both 1 or both 0, depending on whether USB is plugged in or not. I measured this to be sure. Either way, GPIO 00 should be high as far as the MOSFET logic is concerned.
To eliminate a possible source of error here, I even cut the trace between the MOSFET and GPIO 00. The board can then no longer be flashed automatically, which would be OK for me. Unfortunately, GPIO 00 is still at 1.8 V and shows the same oscillating behavior.
This leads me to the assumption that the oscillating voltage on GPIO 00 can only be caused by the ESP8266 itself pulling the pin down internally. But I do not understand why.

I am looking forward to suggestions and help. Thanks in advance.
By Inq720
#94233 I've been doing a lot of work with the ESP8266 flash memory. I see you are far more hardware savvy than I... so I wanted to ask if you have your own hardware EEPROM on the PCB or are you talking about the EEPROM library on the ESP that is using flash memory?

The reason I ask... I did some abusive, destructive tests https://www.esp8266.com/viewtopic.php?f=6&t=23141&hilit=EEPROM and found that the flash does not return errors... the erase step takes longer and longer and eventually the whole ESP becomes inoperative. No warning, no nothing... it works one time, and the next time it resets and won't restart.

EDIT - Those are the same symptoms I had... it finally fails with a WDT reset and then the entire ESP is totally inoperative - won't boot and won't accept a new upload.

I seem to recall reading somewhere (I couldn't find the reference) that the EEPROM library does not do any flash wear leveling... meaning every time you do a commit, it does an erase and overwrites the same region at the beginning of the sector. Don't know if that is still the case.

Do you know (or can estimate) how many cycles you're talking about? My tests went 400,000 cycles or better before finally going belly-up.
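
To make the cycle question concrete, here is a rough back-of-the-envelope sketch. The 400,000 figure is the one from the test mentioned above; the two commit rates are purely hypothetical examples, since the thread does not say whether or how often the firmware writes to flash.

```cpp
// Days until a given number of erase cycles is reached at a steady commit rate.
constexpr double days_to_reach(double erase_cycles, double commits_per_day) {
    return erase_cycles / commits_per_day;
}

// Figure from the destructive test mentioned above.
constexpr double kCyclesObserved = 400000.0;

// HYPOTHETICAL commit rates, for illustration only.
constexpr double kOncePerSecond = 24.0 * 60.0 * 60.0;  // 86400 commits per day
constexpr double kOncePerMinute = 24.0 * 60.0;         // 1440 commits per day

// Roughly 4.6 days at one commit per second, roughly 278 days at one per minute.
static_assert(days_to_reach(kCyclesObserved, kOncePerSecond) > 4.6 &&
              days_to_reach(kCyclesObserved, kOncePerSecond) < 4.7,
              "one commit per second");
static_assert(days_to_reach(kCyclesObserved, kOncePerMinute) > 277.0 &&
              days_to_reach(kCyclesObserved, kOncePerMinute) < 278.0,
              "one commit per minute");
```

So whether flash wear is even a plausible suspect depends heavily on the commit rate, which is why the cycle estimate matters.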