easyMesh

Re: easyMesh #57072

By rudy - Mon Oct 24, 2016 11:30 pm

OK, my two cents. I have been watching the RSSI values tonight. I have had nodes connect to an AP with a weak signal because it was the first one up, or the last one standing when I altered the firmware on the rest. As a result, most of the communication went through it.
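
One possible way to avoid latching onto whichever AP happens to be up first is to rank the scan results by RSSI and pick the strongest. The following is only a sketch under that assumption: `ApCandidate` and `pickStrongestAp` are hypothetical names, not part of easyMesh, and on real hardware the candidate list would have to be filled from the SDK's scan results.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical scan-result record; not an easyMesh type.
struct ApCandidate {
    std::string ssid;
    int8_t rssi;  // dBm; closer to 0 is stronger (-42 beats -85)
};

// Returns the index of the candidate with the strongest signal,
// or -1 when the scan found nothing.
int pickStrongestAp(const std::vector<ApCandidate>& aps) {
    int best = -1;
    for (size_t i = 0; i < aps.size(); ++i) {
        if (best < 0 || aps[i].rssi > aps[best].rssi) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With candidates at -85, -42 and -70 dBm, the function selects the -42 dBm entry regardless of the order in which the APs came up.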

There are significant problems with the existing approach. Maybe it can be fixed. But I think it is worth discussing alternatives.

One thing that bothers me is that a message needs to follow a connection path when there is plenty of signal to have two nodes talk directly with each other. The mesh idea is great but maybe a hybrid would be better. Of course this will add some complexity.

If possible I think the scanning and rebuilding of the network should not be as disruptive to the network. Or just make it work and not need to go through the process so often.

The other thing that bothers me is the loss of messages without any error handling.

Re: easyMesh #57102

By sfranzyshen - Tue Oct 25, 2016 1:19 pm

There is a bug hidden in the manageConnections() function in the easyMeshConnection.cpp file. After stripping all of the timesync code and running only the connection and nodesync code, I still get STA disconnects (dropped by the AP) for NO apparent reason, even when the timeout is set as high as 10 sec. Here is one of the debug messages from the AP log:

Code: Select all

manageConnections(): dropping 10417291 NODE_TIMEOUT last=4289636273 node=4289641320

And here is the code snippet that generated this line:

Code: Select all

void ICACHE_FLASH_ATTR easyMesh::manageConnections( void ) {
    debugMsg( GENERAL, "manageConnections():\n");
    SimpleList<meshConnectionType>::iterator connection = _connections.begin();
    while ( connection != _connections.end() ) {
        if ( connection->lastRecieved + NODE_TIMEOUT < getNodeTime() ) {
            debugMsg( CONNECTION, "manageConnections(): dropping %d NODE_TIMEOUT last=%u node=%u\n", 
                      connection->chipId, connection->lastRecieved, getNodeTime() );
 
            connection = closeConnection( connection );
            continue;
        }

As you can see, the node timeout value isn't being applied to the comparison as expected:
lastRecieved (4289636273) + timeout (10000000) = 4299636273, but that doesn't match the result,
because 4299636273 is NOT less than 4289641320 (though 4289636273 is). So the only way this would trigger is if the NODE_TIMEOUT value were not being added to lastRecieved. If it is being evaluated
as 0, that would explain why it triggered here. BUT why isn't it always triggering, since
lastRecieved will always be less than now? So I tried this to see what is going on:

Code: Select all

void ICACHE_FLASH_ATTR easyMesh::manageConnections( void ) {
    debugMsg( GENERAL, "manageConnections():\n");
    uint32_t nowNodeTime;

    SimpleList<meshConnectionType>::iterator connection = _connections.begin();
    while ( connection != _connections.end() ) {
        nowNodeTime = getNodeTime();
        if ( (connection->lastRecieved + NODE_TIMEOUT ) < nowNodeTime ) {
            debugMsg( CONNECTION, "manageConnections(): dropping %d timeout=%u + last=%u (%u) < now=%u\n",
                      connection->chipId, NODE_TIMEOUT, connection->lastRecieved,
                      connection->lastRecieved+NODE_TIMEOUT, nowNodeTime );
 
            connection = closeConnection( connection );
            continue;
        }

As you can see, this comparison is not working as expected:

Code: Select all

manageConnections(): dropping 10417291 timeout=10000000 + last=4287984446 (4297984446) < now=4287989636 
manageConnections(): dropping 0 timeout=10000000 + last=4293502101 (4303502101) < now=4293513943
manageConnections(): dropping 10417291 timeout=10000000 + last=4288418750 (4298418750) < now=4288424734
manageConnections(): dropping 10417291 timeout=10000000 + last=4288672786 (4298672786) < now=4288678269

So now I'm trying this. I'll let you know how it goes:

Code: Select all

void ICACHE_FLASH_ATTR easyMesh::manageConnections( void ) {
    debugMsg( GENERAL, "manageConnections():\n");

    uint32_t nowNodeTime;
    uint32_t nodeTimeOut = NODE_TIMEOUT;
    uint32_t connLastRecieved;
    uint32_t totalTimeOut;

    SimpleList<meshConnectionType>::iterator connection = _connections.begin();
    while ( connection != _connections.end() ) {

        nowNodeTime = getNodeTime();
        connLastRecieved = connection->lastRecieved;
        totalTimeOut = connLastRecieved + nodeTimeOut;

        if ( totalTimeOut < nowNodeTime ) {
            debugMsg( CONNECTION, "manageConnections(): dropping %d timeout=%u + last=%u (%u) < now=%u\n",
                      connection->chipId, nodeTimeOut, connLastRecieved,
                      totalTimeOut, nowNodeTime );
 
            connection = closeConnection( connection );
            continue;
        }

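UPDATE: this code is doomed to fail at the clock rollover: totalTimeOut is a uint32_t, so once connLastRecieved + nodeTimeOut exceeds 4294967295 the sum wraps around to a small value and compares as less than nowNodeTime. I am now running this to see if the drops continue: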

Code: Select all

void ICACHE_FLASH_ATTR easyMesh::manageConnections( void ) {
    debugMsg( GENERAL, "manageConnections():\n");

    uint32_t nowNodeTime;
    uint32_t nodeTimeOut = NODE_TIMEOUT;
    uint32_t connLastRecieved;

    SimpleList<meshConnectionType>::iterator connection = _connections.begin();
    while ( connection != _connections.end() ) {

        nowNodeTime = getNodeTime();
        connLastRecieved = connection->lastRecieved;
        // The trick is to always calculate the time difference, and not compare the two time values.
        if ( nowNodeTime - connLastRecieved > nodeTimeOut ) {
            debugMsg( CONNECTION, "manageConnections(): dropping %d now= %u - last= %u ( %u ) > timeout= %u \n", connection->chipId, nowNodeTime, connLastRecieved, nowNodeTime - connLastRecieved, nodeTimeOut );
            connection = closeConnection( connection ); 
            continue;
        }
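
To make the difference between the two comparisons concrete, here is a small standalone sketch that feeds both forms the values from the first log line (last=4289636273, now=4289641320, timeout=10000000). The helper names `timedOutAdditive` and `timedOutElapsed` are hypothetical and only for illustration; they assume the times are plain `uint32_t` counters, as in the snippets above. The sum 4289636273 + 10000000 = 4299636273 does not fit in 32 bits, so it wraps to 4299636273 - 4294967296 = 4668977, which is less than now and drops the connection. The real elapsed time is only 4289641320 - 4289636273 = 5047 ticks, about 5 ms if 10000000 corresponds to the 10 sec timeout.

```cpp
#include <cassert>
#include <cstdint>

// Additive form (as in the original check): the uint32_t sum can wrap
// past 4294967295, turning the deadline into a small number.
bool timedOutAdditive(uint32_t last, uint32_t now, uint32_t timeout) {
    uint32_t deadline = last + timeout;  // wraps modulo 2^32
    return deadline < now;
}

// Subtractive form: unsigned subtraction yields the elapsed ticks
// modulo 2^32, so it stays correct when the clock itself rolls over.
bool timedOutElapsed(uint32_t last, uint32_t now, uint32_t timeout) {
    uint32_t elapsed = static_cast<uint32_t>(now - last);
    return elapsed > timeout;
}
```

The subtractive form is only valid while the true elapsed time is shorter than one full counter period (2^32 ticks), which holds comfortably for a 10 sec timeout.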

If this makes a difference tonight, I'll push the changes to the GitHub devel branch.

UPDATE: If anyone wants to run this across multiple nodes, I pushed the code here. Just a reminder: it's ONLY the connection and nodesync code, with no timesync, and it is just for testing.
https://github.com/sfranzyshen/easyMesh/tree/no-timing

Last edited by sfranzyshen on Tue Oct 25, 2016 7:30 pm, edited 3 times in total.

Re: easyMesh #57103

By sfranzyshen - Tue Oct 25, 2016 1:57 pm

picstart wrote:OK here goes.
A possible design for scanning
All devices are equal but one needs to be the AP and upfront no device can be considered as the designated AP.

1) a device scans to see if there is already an AP
if there is an AP it meshes with that AP
If there is no AP it backs off a random amount of time and re-scans; if at that time, it again finds no AP it establishes itself as the AP
2) the random amount of time allows the device that drew the shortest amount of random wait time to win the AP role;
but only in the situation where there is no existing AP and more than one device is competing for it.
3) if the existing AP drops out then the same method to establish a new AP is used.

I'm not sold that we need a synchronizing time base. TCP/IP, if I have it right, will accommodate the resend of broken transmissions, so the issue is in getting a design that establishes and, if needed, re-establishes the mesh AP.
There is the special case where a device finds itself alone. We need to consider what to do while it waits for the company of another device.

Randomness is established via the analog pin voltage being used as the seed value.
Since we are considering the bssid (MAC) as establishing a unique mesh ID its uniqueness is good to use for back off time ( frees up the analog pin)...the bssid value modulo some large number to create unique back off time for establishing the device that wins the AP role. Almost the same as a designated device for the AP but not really since the device with the lowest bssid will only win if it is simultaneously competing for the AP role.
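
The scan / back-off / re-scan procedure quoted above can be sketched roughly as follows. This is an illustration under stated assumptions, not easyMesh code: `backoffMsFromMac`, `electRole`, `Role`, and the `scanForAp` / `delayMs` callables are all hypothetical names, and the back-off window is an arbitrary parameter. The back-off is derived from the MAC (bssid) modulo the window, as the quote suggests, instead of from an analog-pin seed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: derive a per-device back-off from the 6-byte MAC.
uint32_t backoffMsFromMac(const uint8_t mac[6], uint32_t maxBackoffMs) {
    if (maxBackoffMs == 0) return 0;  // avoid division by zero
    uint64_t value = 0;
    for (int i = 0; i < 6; ++i) {
        value = (value << 8) | mac[i];
    }
    return static_cast<uint32_t>(value % maxBackoffMs);
}

enum class Role { Station, AccessPoint };

// scanForAp() returns true when an AP is visible; delayMs(ms) waits.
// Both are supplied by the caller, so the election logic stays testable.
template <typename ScanFn, typename DelayFn>
Role electRole(ScanFn scanForAp, DelayFn delayMs, uint32_t backoffMs) {
    if (scanForAp()) return Role::Station;  // an AP exists: join it
    delayMs(backoffMs);                     // nobody there: back off
    if (scanForAp()) return Role::Station;  // someone else won the race
    return Role::AccessPoint;               // still alone: become the AP
}
```

If the current AP drops out, the same function could be called again by every remaining device, which matches step 3 of the proposal.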

I maintain that every node needs to be AP & STA at all times. I also maintain that we need a synchronizing time base. But I do agree that we need to handle scanning a different way. I too am leaning toward a major rewrite, but right now I'm focused on making this code work as it is first ... then ... BOOM!

Last edited by sfranzyshen on Tue Oct 25, 2016 5:30 pm, edited 1 time in total.

Re: easyMesh #57104

By sfranzyshen - Tue Oct 25, 2016 1:59 pm

rudy wrote:OK, my two cents. I have been watching the RSSI values tonight. I have had nodes connect to a weak signal AP because it was the first one up. Or the last one standing when I altered the firmware on the rest. So most of the communications went through it.

There are significant problems with the existing approach. Maybe it can be fixed. But I think it is worth discussing alternatives.

One thing that bothers me is that a message needs to follow a connection path when there is plenty of signal to have two nodes talk directly with each other. The mesh idea is great but maybe a hybrid would be better. Of course this will add some complexity.

If possible I think the scanning and rebuilding of the network should not be as disruptive to the network. Or just make it work and not need to go through the process so often.

The other thing that bothers me is the loss of messages without any error handling.

Having nodes talk directly with each other can only be done in ad-hoc mode, but the idea here is to build a decentralized WiFi mesh using standard WiFi protocols and hardware in infrastructure mode, not ad-hoc. I completely agree that we need some additional layers of checks and balances for the messaging system.
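
As one possible shape for such a check, here is a minimal sketch of acknowledgement tracking with bounded retries. Everything in it is an assumption for illustration: `AckTracker`, `PendingMsg`, the sequence numbers, and the retry parameters are hypothetical and not part of easyMesh, and actually sending or resending a message is left to the caller. Timing uses the elapsed-time subtraction so it survives a clock rollover.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for a message that has not been acknowledged yet.
struct PendingMsg {
    std::string payload;
    uint32_t sentAt;   // node time of the most recent transmission
    uint8_t attempts;  // transmissions so far
};

class AckTracker {
public:
    AckTracker(uint32_t retryAfter, uint8_t maxAttempts)
        : retryAfter_(retryAfter), maxAttempts_(maxAttempts) {}

    // Register a freshly sent message; returns its sequence number.
    uint16_t track(const std::string& payload, uint32_t now) {
        uint16_t seq = nextSeq_++;
        pending_[seq] = PendingMsg{payload, now, 1};
        return seq;
    }

    // Call when an ack arrives; returns false for unknown sequence numbers.
    bool acknowledge(uint16_t seq) { return pending_.erase(seq) > 0; }

    // Returns sequence numbers due for a resend. Messages that have used
    // all their attempts are removed and appended to `failed`.
    std::vector<uint16_t> due(uint32_t now, std::vector<uint16_t>& failed) {
        std::vector<uint16_t> resend;
        for (auto it = pending_.begin(); it != pending_.end();) {
            uint32_t elapsed = static_cast<uint32_t>(now - it->second.sentAt);
            if (elapsed <= retryAfter_) { ++it; continue; }
            if (it->second.attempts >= maxAttempts_) {
                failed.push_back(it->first);
                it = pending_.erase(it);
            } else {
                it->second.attempts++;
                it->second.sentAt = now;
                resend.push_back(it->first);
                ++it;
            }
        }
        return resend;
    }

    size_t pendingCount() const { return pending_.size(); }

private:
    uint32_t retryAfter_;
    uint8_t maxAttempts_;
    uint16_t nextSeq_ = 0;
    std::map<uint16_t, PendingMsg> pending_;
};
```

The `failed` list is where an application could finally report an error instead of losing the message silently.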

ESP8266 Community Forum