By pstehlik
#92994 I've got an Arduino ESP8266 project that does a bunch of things (OneWire, MQTT), including running an ESP8266WebServer. I also have three or four other ESP8266 nodes that use ESP8266HTTPClient to fetch some data from that web server every five seconds. The HTTP clients enforce HTTP/1.0 (useHTTP10) and use a shortened timeout of 1500 ms in case something goes wrong with the web server. This setup worked fine for years when compiled on core v2.6.x or v2.7.x.

After compiling with core v3.0.x this network setup eventually breaks - the three or four HTTP clients start getting error -1 (CONNECTION FAILED) or -11 (TIMEOUT). Sometimes it works for a few minutes or even hours, but in the end it always falls into the error state. I have tried switching from lwIP "low memory" to lwIP "high bandwidth" but it didn't change anything. I have tried simplifying the clients' code as much as possible - I got rid of the JSON stream deserializing and used just http.getString() to fetch the body as soon as possible and block the web server for as short a time as possible. It didn't help. I also tried downgrading the clients back to core v2.7.4, but that didn't help either. What does help is downgrading the web server's Arduino core from 3.0.2 to 2.7.4.

I have tried profiling the web server's handler - it takes about 2 ms to prepare the answer and another 2 ms to send it to the client. I doubt this is the problem. When profiling the HTTPClient, it usually takes somewhere between 70 and 400 ms to fetch and deserialize the data from the web server (sometimes it apparently needs to wait until the server finishes its OneWire scan or something similar). When the server goes hairy, the HTTPClient of course takes about 1511 ms to time out and return error -11 (as I have set the timeout to 1500 ms).
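For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of a timing helper. It is written as portable C++ with std::chrono so it can be tried on a desktop; on the ESP8266 itself one would more likely take two millis() or micros() readings around the call. The names elapsedMs and fakeHandler are hypothetical and not taken from the original project.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in whole milliseconds.
template <typename F>
long long elapsedMs(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Hypothetical stand-in for the handler being profiled: busy-waits for 5 ms.
void fakeHandler() {
    const auto until = std::chrono::steady_clock::now() + std::chrono::milliseconds(5);
    while (std::chrono::steady_clock::now() < until) {
        // spin until the deadline passes
    }
}
```

Calling elapsedMs(fakeHandler) returns how long the stand-in took; wrapping the real handler body the same way gives figures comparable to the 2 ms and 70-400 ms numbers quoted above.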

When the web server became this unresponsive I tried running curl against it from my desktop. It turned out that curl got a reply in 7, 9 or even 40 seconds (instead of the usual 0.011 seconds). It looks as if there was a massive queue of client requests that the web server was trying to work through. Most of the requests had of course already been timed out by the clients, but I don't know what the server did with them, if they were indeed queued. Perhaps curl kept re-sending the request? I'm not sure; I forgot to check it with Wireshark.

Do you have any idea what has changed between Arduino core v2.7.x and v3.0.x that could have this adverse effect on the web server? Can it be fixed or worked around in my own code somehow? Is there something I could do better to handle three or four (or a few more) ESP8266 clients that fetch data from an ESP8266? Is there a web server timeout somewhere that needs to be set so that it quickly throws away "old requests" instead of building a long queue of them (if that is indeed the case - I'm just guessing from the long time it takes to reply to curl)? Any other ideas, please?

I'd like to stick with core v3.0.x because I like the possibility of getting 16 kB of extra memory, but if I can't fix this web server hang I'll have to go back to v2.7.4 :-/

Thanks! Petr
By dimecho
#93065 I experienced a similar thing with v3.0.2 and found the fix by chance. It was a mismatch inside an external library that I had (written in C++) and had included in my Arduino project. A function in the .cpp file and its declaration in the .h file had mismatched return types. This bug went unnoticed and worked fine with v2.7.4. I guess v3.0.2 uses a stricter compiler.

For example, the .cpp file had (note "void"):

void ARMDebug::theFunction(uint32_t addr, unsigned count)

But the .h file had (note "bool"):

bool theFunction(uint32_t addr, unsigned count);
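A minimal sketch of what a consistent version might look like, assuming the function is meant to report success as a bool. The class and function names come from the snippets above; the body and the failure condition are hypothetical, for illustration only. The point is that the declaration and the definition use the same return type and that every path returns a value, since flowing off the end of a non-void function is undefined behavior in C++.

```cpp
#include <stdint.h>

class ARMDebug {
public:
    // Declaration (.h): the return type must match the definition below.
    bool theFunction(uint32_t addr, unsigned count);
};

// Definition (.cpp): same return type, and every path returns a value.
bool ARMDebug::theFunction(uint32_t addr, unsigned count) {
    if (count == 0) {
        return false;  // hypothetical failure condition
    }
    (void)addr;  // unused in this sketch
    return true;
}
```

Building with warnings enabled (for example -Wall with GCC, which includes -Wreturn-type) is one way to catch non-void functions that are missing a return statement.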