Moderator: Mmiscool

By Electroguard
#45274 Some feedback to help others make sense of some strange symptoms...
In 4Mb Alpha 9 in AP mode I can cause an esp reset every time by using ip() or len(variable).
But I've noticed that some (many) syntax errors can also cause the esp to reset. I suspect this might then leave some of the flash memory corrupted.

But when an esp reset happens - for whatever cause - it's quite easy to get sucked deeper into quicksand unless you realise the situation.

You may not be able to save when you try, because the esp has not yet rebooted back into (AP) mode to allow reconnection to the device. So wait until it does, or reboot it yourself, then once you can reconnect, save your modified code with the problem statement commented out so it is no longer live.

Unless corrected code is saved back to the esp, it reboots with the faulty code still 'live', which can make the esp reset again, and keep on rebooting endlessly about once every minute.

I suspect (but it is only conjecture) that the corrupted esp code may be sending/expecting some sort of 'keep-alive' about every 50 seconds, which perhaps causes an autorun of the faulty script, inducing resets ad infinitum.

So to break the loop, resave a 'safe' script as soon as the esp briefly reconnects again, or else do a flash format to start from scratch (but you will lose your unsaved script).
By cicciocb
#45282 Hi, I have already faced this kind of problem and, as you say, the only way to "escape" is to save a new program before the autorun. I have been thinking about a way to bypass this situation and I think we could use GPIO0 for that.
I need to check if this works but this could be a good solution without losing any I/O pin.

CiccioCB
By Electroguard
#45290 Hmm, not sure how it could be done.
Obviously can't be used at reset or will put esp into flash mode which doesn't help much (can do that manually ourselves for reformatting anyway).

If used after the esp boots but before it reboots again, then it could perhaps somehow be used to load a 'safe' program from flash, but we can also do that manually ourselves if we're aware. And the same problem still persists: it would overwrite the user's program, which would be lost if they hadn't yet had a chance to save it because of the disconnections and rebooting.

I don't know what lies under the hood in the flash, but it seems to me as if certain error situations are having no chance to tidy up any variables or program flow records because of the reset, and when the program is caused to autorun somehow (at 1 minute intervals) it keeps stepping into the same mess, but the mess can get cleaned up by running a safe working script which tidies everything up behind it when doing a clean run through without errors.

What causes the actual reset and/or autorun anyway - maybe the error somehow bleeds its untidied corruption into the OS program flow control memory area (I don't know all the proper terminology), which suggests the errors could either be screwing up memory addressing or perhaps splatting BIG garbage far and wide. But the area that gets splattered would seem to be consistent, since it keeps causing the same reboot symptoms, else you'd expect all sorts of other weird symptoms.
So is any memory being shared for multiple tasks? cos it only takes one task to mess it up for the others.

Personally speaking, I think reverting back to previous versions is a backward step which I'd prefer to avoid if at all possible - so I'll stick with the latest. I think the same problems were present in the previous Alpha 7 anyway, but not quite so pronounced.
Even when the esp keeps rebooting it's still possible to edit and save the script by copying and pasting to file, so it's not the end of the world - it just means ensuring a save to flash does actually save the edited safe script, and if not, being a bit patient for reconnection again until it can.

It certainly is a poser matey, but if it has a solution, I have confidence that you and the captain will teach it better manners.
By Electroguard
#45303 The esp reset seems to cause some other less obvious problems, presumably from memory corruption of some sort.

Looking at the enclosed code, it basically just prints out a test IP address then waits for an incoming udp msg to handle.
A few errors later though, and the same code bombs out straight away to done... without getting far enough to print the ip address - it just says done... without showing any errors.

The esp can remain connected, and acknowledges saves ok, but either the saved script is not being saved correctly, or the 'system' itself is corrupted, because the newly saved script still bombs before getting to the print instruction.

If the script is copied and pasted to file, the esp reformatted and reflashed, then the script pasted back into the edit window and saved, it runs as it should and prints out the ip address.
So the problem is not in the script, it's in the bowels of the wee beastie.

The real 'sticky' problem is that once there's been any errors, any subsequent edits may not even be recognised by the beastie, so there's no way of telling if any edits might fix things, cos the wee beastie may just keep following its own error-corrupted path from then on.

From a layman's viewpoint, it seems like the error-handling routines may be bleeding somewhere they shouldn't, perhaps somewhere 'systemy' rather than the user script or variables areas.

Another clue I've just noticed is that since erroring without resetting a while ago, when I now do a save, the count clocks up in the edit page lower window, and the saved ok window pops up after, but there is no esp blue led activity during any of this, suggesting that the actual 'save' code on the beastie that receives the saved script is not actually doing anything because maybe it too is corrupted. That could explain why saving an edited script after an error doesn't make any difference.


Code:
memclear
'let localname = "default" ' Default node name must be changed to something unique
let localname = "MP3"      ' Unique node name (currently not case sensitive) eg: "MP3 blink"
let globalname = "ALL"     ' Allows addressing different system clusters of nodes separately if wished, eg: "All blink"
let groupname = "Group"      ' Allows addressing different groups of nodes if required. eg: "PIR blink"
' let localIP = ip()       ' Does not work in AP mode, so using the following manually assigned test alternative
let localIP = "192.168.4.123" 
print "Local IP=" & localIP
let ledpin = 1  ' Onboard blue GPIO01 led
po ledpin 1 ' Ensure led is off
udpbegin 5001
udpbranch [udpmain]
'print "Need to try to do a network time sync if possible"
wait


[udpmain] 
msg = udpread()
udpreply "MAIN UDP msg received: " & msg  ' For test purposes only
if left(upper(msg), len(localname)) == upper(localname) then goto [local_commands]    ' handle localname commands
if instr(upper(msg), upper(localname)) <> 0 then goto [partial_name_commands] ' handle partial localname commands
if (len(groupname) <> 0) and left(upper(msg), len(groupname)) == upper(groupname) then goto [group_commands]
if (len(globalname) <> 0) and left(upper(msg), len(globalname)) == upper(globalname) then goto [global_commands]
udpreply "Main dropped through Error. " & localname & " ERROR: unrecognised name in msg: " & msg
wait
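
Since the beastie may ignore edits after an error, it can help to check the name-matching logic away from the esp, so that a script bug can be ruled out separately from any corruption. What follows is only a rough Python model of the [udpmain] tests above, run on a PC. It is not ESP8266 BASIC, the function name route is made up for illustration, and it assumes left() and instr() behave like a plain prefix test and substring search.

```python
# Off-device model of the [udpmain] name matching. This is NOT ESP8266 BASIC;
# route() is a hypothetical helper that only mirrors the order of the tests.
LOCALNAME = "MP3"
GROUPNAME = "Group"
GLOBALNAME = "ALL"


def route(msg, localname=LOCALNAME, groupname=GROUPNAME, globalname=GLOBALNAME):
    """Return the name of the label the script would branch to, or "error"."""
    m = msg.upper()  # the script compares upper-cased text, so matching is case-insensitive
    if m.startswith(localname.upper()):
        return "local_commands"          # msg begins with the local node name
    if localname.upper() in m:
        return "partial_name_commands"   # local name appears somewhere in msg
    if groupname and m.startswith(groupname.upper()):
        return "group_commands"          # msg begins with the group name
    if globalname and m.startswith(globalname.upper()):
        return "global_commands"         # msg begins with the global name
    return "error"                       # dropped through: unrecognised name


if __name__ == "__main__":
    for test_msg in ("MP3 blink", "play mp3 now", "Group blink", "All blink", "PIR blink"):
        print(test_msg, "->", route(test_msg))
```

One thing the model makes obvious: because the partial-name test comes before the group and global tests, a msg such as "ALL mp3" goes to [partial_name_commands] rather than [global_commands].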