
By Phoenard-Team
#31017 We are currently in the middle of developing continuous serial communication over WiFi using the ESP8266. We have succeeded in this so far, but we have run into a suspected problem with the module.

After several seconds of continuous operation, communication slows to a crawl. Response times start out at 30-150 ms but quickly shoot up to 5000+ ms, followed by what looks like a complete crash of the ESP8266 module. This is what we logged after some time of communication over TCP: http://pastebin.com/CsBQpEgD
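For anyone who wants to reproduce the slowdown, a minimal sketch of how the response times could be logged on the computer side. It assumes the device answers every request with a reply of the same length, which may not match the real protocol; `measure_round_trips` is a name made up for this example.

```python
import socket
import time


def measure_round_trips(sock, payload, count, timeout=10.0):
    """Send `payload` `count` times; return each round-trip time in ms."""
    sock.settimeout(timeout)
    times_ms = []
    for _ in range(count):
        start = time.perf_counter()
        sock.sendall(payload)
        received = 0
        # Assumption: the reply has exactly as many bytes as the request.
        while received < len(payload):
            chunk = sock.recv(len(payload) - received)
            if not chunk:
                raise ConnectionError("peer closed the connection")
            received += len(chunk)
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms
```

It could be called with a socket from `socket.create_connection((host, port))`, where `host` and `port` are the module's address on the local network. A `socket.timeout` raised by `recv` would mark the point where the module stops answering.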

Using TCPView I could not find an excessive number of connections, so why does it eventually crash with hundreds of CONNECT FAIL messages? It very much looks like the module runs out of memory before crashing, so a memory leak cannot be ruled out. Is there something we should do to prevent such problems? Flush something? Restart the module every now and then? Once a connection is established on the computer side, it is never closed, so there is a continuous back-and-forth of TCP packets.

If we run this same software over UDP, it does not suffer from this problem. Apart from the occasional lost packet, it does not slow down at all.

The module is connected to a WiFi router and then starts a TCP server while also listening for UDP on the same port. This way a computer can connect to it using either UDP or TCP. The bug does not go away if I only open a TCP server (same results, it connects over ID 0 instead). I have completely ruled out the microcontroller it runs on as the cause; the crash happens while the microcontroller is waiting for data to come from the chip. This is when the CONNECT FAIL is eventually received.

In case it matters, this is our current routine on the microcontroller that sends the response data back to the computer using CIPSEND. Command echo is turned off. http://pastebin.com/rnUTA6wB
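For readers who cannot open the pastebin, here is a rough sketch of the usual CIPSEND exchange in multi-connection mode: send `AT+CIPSEND=<link ID>,<length>`, wait for the `>` prompt, write the payload, then wait for `SEND OK`. This is not the routine from the link above. The `port` object is an assumption (anything with `write(bytes)` and a `read(n)` that returns empty bytes on timeout, as pyserial's `Serial` does), and `read_until` and `cipsend` are names made up for this example.

```python
def read_until(port, token, max_bytes=512):
    """Read one byte at a time until `token` shows up; return all bytes read."""
    buf = bytearray()
    while token not in buf:
        byte = port.read(1)
        if not byte:  # read timed out or the stream ended
            raise TimeoutError(f"no {token!r} from module, got {bytes(buf)!r}")
        buf += byte
        if len(buf) > max_bytes:
            raise RuntimeError("unexpected amount of data from module")
    return bytes(buf)


def cipsend(port, link_id, payload):
    """Send `payload` on connection `link_id` (assumes AT+CIPMUX=1, echo off)."""
    port.write(f"AT+CIPSEND={link_id},{len(payload)}\r\n".encode("ascii"))
    read_until(port, b">")        # module is ready to accept the payload
    port.write(payload)
    read_until(port, b"SEND OK")  # module confirms the data was handed off
```

The sketch ignores replies such as `ERROR` or `busy`, and any `+IPD` data arriving in between; a real routine would have to handle those instead of simply timing out.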

That said, in the beginning it works great and communication is established at ~9kbps at 115200 baud. There is no data loss. After a while, it just...dies. :|

Really would like to know how to resolve this, since so far I am very happy about the module's performance.
By Phoenard-Team
#31913 Further testing showed that this only happens with my home WiFi. The public WiFi elsewhere does not suffer from this problem. Either it is caused by having multiple network interfaces on the computer side, or the WiFi network itself is having problems.

I ran Wireshark to see what the problem could be:
[Image: Wireshark capture]

A few very obvious things show up:
    Transmission operates fine for a short while. No retransmissions.
    After about 30 seconds, a whole lot of retransmissions happen.
    I also notice a lot of DNS lookups (ARP) for a server in Shenzhen (?) to resolve a local IP.
    This all piles up until the device crashes (presumably, too many connections)
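A side note on the third point: ARP and DNS are separate protocols. ARP maps an IP address on the local network to a MAC address and is carried directly in Ethernet frames with EtherType 0x0806, while DNS normally runs over UDP or TCP port 53. A minimal sketch for telling them apart in raw captured frames, assuming untagged Ethernet II with IPv4 and no IP fragmentation; `classify_frame` is a name made up for this example.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
PROTO_TCP = 6
PROTO_UDP = 17
DNS_PORT = 53


def classify_frame(frame):
    """Return 'arp', 'dns' or 'other' for a raw Ethernet II frame."""
    if len(frame) < 14:
        return "other"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == ETHERTYPE_ARP:
        return "arp"
    if ethertype != ETHERTYPE_IPV4 or len(frame) < 14 + 20:
        return "other"
    ip = frame[14:]
    header_len = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    protocol = ip[9]
    if protocol not in (PROTO_TCP, PROTO_UDP) or len(ip) < header_len + 4:
        return "other"
    # TCP and UDP both start with source port, then destination port.
    src_port, dst_port = struct.unpack("!HH", ip[header_len:header_len + 4])
    return "dns" if DNS_PORT in (src_port, dst_port) else "other"
```

Counting how many frames fall into each class over time would show whether the flood consists of ARP requests or of real DNS queries.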

Does anyone know what might be causing all this? It seems weird to perform a DNS lookup to resolve an IP address that the module is already connected to. It is also weird that it keeps re-attempting these lookups so often, and that TCP retransmissions can crash the device...