
By rudy
#57072 OK, my two cents. I have been watching the RSSI values tonight. I have had nodes connect to a weak signal AP because it was the first one up. Or the last one standing when I altered the firmware on the rest. So most of the communications went through it.

There are significant problems with the existing approach. Maybe it can be fixed. But I think it is worth discussing alternatives.

One thing that bothers me is that a message needs to follow a connection path when there is plenty of signal to have two nodes talk directly with each other. The mesh idea is great but maybe a hybrid would be better. Of course this will add some complexity.

If possible I think the scanning and rebuilding of the network should not be as disruptive to the network. Or just make it work and not need to go through the process so often.

The other thing that bothers me is the loss of messages without any error handling.
By sfranzyshen
#57102 There is a bug hidden in the manageConnections() function in the easyMeshConnection.cpp file. After stripping all of the timesync code and running just the connection and nodesync code, I still get STA disconnects (dropped by the AP) for NO apparent reason, even when the timeout is set as high as 10 sec. Here is one of the debug messages from the AP log:
Code:
manageConnections(): dropping 10417291 NODE_TIMEOUT last=4289636273 node=4289641320

And here is the code snippet that generated this line:
Code:
void ICACHE_FLASH_ATTR easyMesh::manageConnections( void ) {
    debugMsg( GENERAL, "manageConnections():\n");
    SimpleList<meshConnectionType>::iterator connection = _connections.begin();
    while ( connection != _connections.end() ) {
        if ( connection->lastRecieved + NODE_TIMEOUT < getNodeTime() ) {
            debugMsg( CONNECTION, "manageConnections(): dropping %d NODE_TIMEOUT last=%u node=%u\n",
                      connection->chipId, connection->lastRecieved, getNodeTime() );
 
            connection = closeConnection( connection );
            continue;
        }

As you can see, the node timeout value isn't being applied to the equation as expected:
lastRecieved (4289636273) + timeout (10000000) = 4299636273, but that doesn't match the result,
because 4299636273 is NOT less than 4289641320 (but 4289636273 is). So the only way this would kick off is if the NODE_TIMEOUT value was not being added to lastRecieved. If it is being evaluated
as 0, that would explain why it kicked off here. BUT why isn't it always kicking off, since
lastRecieved will always be less than now? So I tried this to see what is going on:
Code:
void ICACHE_FLASH_ATTR easyMesh::manageConnections( void ) {
    debugMsg( GENERAL, "manageConnections():\n");
    uint32_t nowNodeTime;

    SimpleList<meshConnectionType>::iterator connection = _connections.begin();
    while ( connection != _connections.end() ) {
        nowNodeTime = getNodeTime();
        if ( (connection->lastRecieved + NODE_TIMEOUT ) < nowNodeTime ) {
            debugMsg( CONNECTION, "manageConnections(): dropping %d timeout=%u + last=%u (%u) < now=%u\n",
                      connection->chipId, NODE_TIMEOUT, connection->lastRecieved,
                      connection->lastRecieved+NODE_TIMEOUT, nowNodeTime );
 
            connection = closeConnection( connection );
            continue;
        }

as you can see ... this equation is not working as expected ...
Code:
manageConnections(): dropping 10417291 timeout=10000000 + last=4287984446 (4297984446) < now=4287989636
manageConnections(): dropping 0 timeout=10000000 + last=4293502101 (4303502101) < now=4293513943
manageConnections(): dropping 10417291 timeout=10000000 + last=4288418750 (4298418750) < now=4288424734
manageConnections(): dropping 10417291 timeout=10000000 + last=4288672786 (4298672786) < now=4288678269

So now I'm trying this ... I'll let you know how it goes ...
Code:
void ICACHE_FLASH_ATTR easyMesh::manageConnections( void ) {
    debugMsg( GENERAL, "manageConnections():\n");

    uint32_t nowNodeTime;
    uint32_t nodeTimeOut = NODE_TIMEOUT;
    uint32_t connLastRecieved;
    uint32_t totalTimeOut;

    SimpleList<meshConnectionType>::iterator connection = _connections.begin();
    while ( connection != _connections.end() ) {

        nowNodeTime = getNodeTime();
        connLastRecieved = connection->lastRecieved;
        totalTimeOut = connLastRecieved + nodeTimeOut;

        if ( totalTimeOut < nowNodeTime ) {
            debugMsg( CONNECTION, "manageConnections(): dropping %d timeout=%u + last=%u (%u) < now=%u\n",
                      connection->chipId, nodeTimeOut, connLastRecieved,
                      totalTimeOut, nowNodeTime );
 
            connection = closeConnection( connection );
            continue;
        }


UPDATE: this code is doomed to fail at the clock rollover, because the uint32_t sum of the last-received time and the timeout wraps around ... I am now running this to see if the drops continue ...
Code:
void ICACHE_FLASH_ATTR easyMesh::manageConnections( void ) {
    debugMsg( GENERAL, "manageConnections():\n");

    uint32_t nowNodeTime;
    uint32_t nodeTimeOut = NODE_TIMEOUT;
    uint32_t connLastRecieved;

    SimpleList<meshConnectionType>::iterator connection = _connections.begin();
    while ( connection != _connections.end() ) {

        nowNodeTime = getNodeTime();
        connLastRecieved = connection->lastRecieved;
        // The trick is to always calculate the time difference, and not compare the two time values.
        if ( nowNodeTime - connLastRecieved > nodeTimeOut ) {
            debugMsg( CONNECTION, "manageConnections(): dropping %d now= %u - last= %u ( %u ) > timeout= %u \n",
                      connection->chipId, nowNodeTime, connLastRecieved,
                      nowNodeTime - connLastRecieved, nodeTimeOut );
            connection = closeConnection( connection );
            continue;
        }
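
For anyone following along, here is a minimal standalone sketch of why the subtraction form survives the rollover while the addition form does not. It assumes the timestamps are 32-bit unsigned counters, as the uint32_t declarations above suggest. The helper names timedOut and timedOutBySum are made up for this illustration and are not part of easyMesh.

```cpp
#include <cstdint>

// Hypothetical helper (not an easyMesh function): true when more than
// `timeout` ticks have elapsed since `last`. Unsigned subtraction is done
// modulo 2^32, so `now - last` is the elapsed time even if the counter
// wrapped between the two readings, as long as the real gap is shorter
// than one full counter period.
bool timedOut(uint32_t now, uint32_t last, uint32_t timeout) {
    return static_cast<uint32_t>(now - last) > timeout;
}

// The addition form, kept only for comparison. When `last` is within
// `timeout` of 2^32, the sum wraps to a small number and compares as
// "less than now" even though almost no time has passed.
bool timedOutBySum(uint32_t now, uint32_t last, uint32_t timeout) {
    return static_cast<uint32_t>(last + timeout) < now;
}
```

With the values from the first log line (last=4289636273, now=4289641320, timeout=10000000), the 32-bit sum wraps to 4668977, so the addition form reports a timeout after only 5047 ticks, while the subtraction form does not.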


If this makes a change tonight ... I'll push changes to GitHub devel ...

UPDATE: If anyone wants to run this across multiple nodes ... I pushed the code here ... just a reminder ... it's ONLY the connection and nodesync stuff ... no timesync ... just for testing
https://github.com/sfranzyshen/easyMesh/tree/no-timing
Last edited by sfranzyshen on Tue Oct 25, 2016 7:30 pm, edited 3 times in total.
By sfranzyshen
#57103
picstart wrote: OK here goes.
A possible design for scanning
All devices are equal but one needs to be the AP and upfront no device can be considered as the designated AP.

1) a device scans to see if there is already an AP
if there is an AP it meshes with that AP
If there is no AP it backs off a random amount of time and re-scans; if at that time, it again finds no AP it establishes itself as the AP
2) the random amount of time allows the device that drew the shortest amount of random wait time to win the AP role;
but only in the situation where there is no existing AP and more than one device is competing for it.
3) if the existing AP drops out then the same method to establish a new AP is used.

I'm not sold that we need a synchronizing time base. TCP/IP, if I have it right, will accommodate the resend of broken transmissions, so the issue is in getting a design that
establishes and, if needed, re-establishes the mesh AP.
There is the special case where a device finds itself alone..we need to consider what to do while it waits for the company of another device.

Randomness is established via the analog pin voltage being used as the seed value.
Since we are considering the bssid (MAC) as establishing a unique mesh ID, its uniqueness is good to use for the back-off time (this frees up the analog pin): take the bssid value modulo some large number to create a back-off time that decides which device wins the AP role. This is almost the same as a designated device for the AP, but not really, since the device with the lowest bssid will only win if it is simultaneously competing for the AP role.
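
The back-off idea in the quoted proposal could look roughly like the sketch below. It is only an illustration under stated assumptions: backoffFromMac and shouldBecomeAp are hypothetical names, not easyMesh functions, and the modulus passed in is an arbitrary choice. Note that a modulo can map two different MACs to the same delay, so ties remain possible.

```cpp
#include <cstdint>

// Hypothetical helper: fold a 6-byte MAC/BSSID into one integer and reduce
// it to a back-off delay in the range [0, maxBackoffMs) milliseconds.
uint32_t backoffFromMac(const uint8_t mac[6], uint32_t maxBackoffMs) {
    if (maxBackoffMs == 0) {
        return 0;  // avoid division by zero; no back-off requested
    }
    uint64_t value = 0;
    for (int i = 0; i < 6; ++i) {
        value = (value << 8) | mac[i];  // big-endian fold of the 48-bit MAC
    }
    return static_cast<uint32_t>(value % maxBackoffMs);
}

// Hypothetical decision step from the proposal: a device takes the AP role
// only if neither the first scan nor the re-scan after the back-off found
// an existing AP.
bool shouldBecomeAp(bool apFoundOnFirstScan, bool apFoundAfterBackoff) {
    return !apFoundOnFirstScan && !apFoundAfterBackoff;
}
```

A device would scan, wait backoffFromMac(mac, N) milliseconds if it saw no AP, scan again, and call shouldBecomeAp with both results.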


I maintain that every node needs to be AP & STA at all times ... I also maintain that we need a synchronizing time base ... but I do agree we need to handle scanning a different way ... I too am leaning toward a major rewrite ... but right now I'm focused on making this code work as it is first ... then ... BOOM! :D
Last edited by sfranzyshen on Tue Oct 25, 2016 5:30 pm, edited 1 time in total.
By sfranzyshen
#57104
rudy wrote: OK, my two cents. I have been watching the RSSI values tonight. I have had nodes connect to a weak signal AP because it was the first one up. Or the last one standing when I altered the firmware on the rest. So most of the communications went through it.

There are significant problems with the existing approach. Maybe it can be fixed. But I think it is worth discussing alternatives.

One thing that bothers me is that a message needs to follow a connection path when there is plenty of signal to have two nodes talk directly with each other. The mesh idea is great but maybe a hybrid would be better. Of course this will add some complexity.

If possible I think the scanning and rebuilding of the network should not be as disruptive to the network. Or just make it work and not need to go through the process so often.

The other thing that bothers me is the loss of messages without any error handling.


Having nodes talk directly with each other can only be done in ad-hoc mode ... but the idea here is to build a decentralized wifi mesh using standard wifi protocols/hardware in infrastructure mode ... not ad-hoc ... I completely agree that we need some additional layers of checks and balances for the messaging system ... :D