So you're a Noob? Post your questions here until you graduate! Don't be shy.

User avatar
By lucasromeiro
#79975
McChubby007 wrote: OK.

Is the variable you are changing in the ISR a simple data type (int etc.), and is it marked as 'volatile' in its global declaration? It also has to be 'atomic', which means reads/writes of a variable whose bit size is smaller than or equal to the architecture's word size, which in this case is 32 bit. Also, is it aligned on the correct memory boundary? And if you comment out the access to the variable in just the main code (or just the ISR), does the WDT reset go away?

Yes, a simple example replicating the problem is always a good start; it is something we 'professionals' do when other methods of fault detection fail.


Hello McChubby007!
Thanks for the answers.
Something is wrong with my account and I do not receive email notifications of replies, so I have only just seen this.
I thought about writing a short program to demonstrate the problem, but I will not be able to reproduce it here, since I am traveling (away from the company).
But what I am doing is really simple, and I do not understand why the problem happens.

Inside the interrupt handler, a variable is incremented.
I tested it with 'volatile' but it did not change anything.
I did not understand this part; can you explain it further? "It also has to be 'atomic', which means a read/write to a variable with a bit size smaller than or the same as the architecture, which in this case is 32 bit."

When you say "Also, is it aligned on the correct memory boundary, and if you comment out the access to the variable in just the main code (or the ISR), does the WDT reset go away?", do you mean I should leave the variable incrementing but not use it anywhere, to test whether the problem still appears? I have not done that test! Good idea; I'll write the code and send it to my colleagues to test.

Would you prefer me to open a new topic?
I posted here because I thought about solving it by reading the interrupt flag and incrementing the variable myself, without an ISR.


Below are the code excerpts where I use the variable; maybe they will make the problem easier to understand.

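// setup: falling-edge interrupt on the digital input pin
pinMode(pin, INPUT_PULLUP);
attachInterrupt(digitalPinToInterrupt(pin), interrupcaoED1, FALLING);

// in the .h file:
extern volatile uint32_t pulsosEDs[8];      // pulse count per digital input
extern volatile bool entradasDigitais[8];   // "pulse seen" flag per digital input

// in the .cpp file:
volatile uint32_t pulsosEDs[8];
volatile bool entradasDigitais[8];

// ISR for digital input 1 (there is one such ISR per input)
void interrupcaoED1()
{
  pulsosEDs[0]++;
  entradasDigitais[0] = 1;
}

// writing the variable to the file (inside a switch case, hence the break):
uint32_t *uint32_tBuffer;
// room for numRegs 32-bit values plus one terminating 0 byte
uint32_tBuffer = (uint32_t *)malloc((readings[readingIndex].numRegs * sizeof(uint32_t)) + 1);
if (uint32_tBuffer == NULL) break;                          // out of memory: skip this record
uint32_tBuffer[0] = pulsosEDs[readings[readingIndex].reg];  // read the counter (only element 0 is filled here)...
pulsosEDs[readings[readingIndex].reg] = 0;                  // ...then reset it
auxPtr = (char *)&uint32_tBuffer[0];
auxPtr[sizeof(uint32_t) * readings[readingIndex].numRegs] = 0;  // terminating byte
carregaArquivos.write((const uint8_t *)auxPtr, (sizeof(uint32_t) * readings[readingIndex].numRegs) + 1);
free(uint32_tBuffer);
break;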
User avatar
By McChubby007
#79979
It's possible that the problem is due to the behaviour reported in this topic, but it does not feel like it to me; I could be wrong though. I will leave it to you to decide whether to open a new topic.

I will try to explain each thing in turn without making it too complicated, but I'm afraid the subject can become complicated.

Volatile: tells the compiler not to make assumptions when optimizing the generated instructions for reads/writes of this variable. Without it, the compiler assumes the variable cannot be modified 'simultaneously' by other code, and so may not reload it from memory when it is accessed several times in a sequence of instructions. This optimization goes wrong when an interrupt routine interrupts code that is currently using the variable and modifies it: when the interrupted code resumes (after the ISR has finished), it still has the old value loaded in a register. So, in summary, always use volatile when an ISR accesses a global variable; otherwise the main code may see the wrong value from time to time.
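A minimal host-side sketch (plain C++ that runs on a PC, not ESP8266 code) of the polling case described above. `pulseSeen`, `fakeIsr()` and `waitForPulse()` are hypothetical names, and the "ISR" is just an ordinary function call here. Because `pulseSeen` is volatile, the loop must re-read it from memory on every pass; without volatile the compiler would be allowed to read it once and spin on a stale register copy.

```cpp
#include <cstdint>

// Flag set by an ISR and polled by the main code (hypothetical name).
volatile bool pulseSeen = false;

// Hypothetical stand-in for an ISR such as interrupcaoED1().
void fakeIsr() {
  pulseSeen = true;
}

// Spin until the flag is set or maxSpins reads have been done.
// Returns true if the flag was observed set.
bool waitForPulse(uint32_t maxSpins) {
  for (uint32_t i = 0; i < maxSpins; ++i) {
    if (pulseSeen) {  // re-read from memory each iteration because it is volatile
      return true;
    }
  }
  return false;
}
```

Note that volatile only stops the compiler caching the value; it does not make a read-modify-write sequence safe, which is the next point.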

Atomic: this is about keeping the value stored in the variable consistent while it is being changed, for example
C++:
var++
which in machine code is:
reg = var
reg++
var = reg
If the non-ISR code is interrupted partway through this sequence and the ISR modifies the variable, then when the non-ISR code resumes it overwrites the value the ISR just wrote. Your example code has a variant of this problem: the main code reads the counter and then writes 0 to it, so if the ISR runs between the read and the write, its ++ is wiped out by the reset and that pulse is lost from the array. Such a load/modify/store sequence is commonly known as read-modify-write (see https://en.wikipedia.org/wiki/Read-modify-write), and some architectures (or languages, libraries etc.) provide special atomic instructions that perform it without it being interruptible in the middle.
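To make the lost update concrete, here is a deterministic host-side simulation (plain C++, not ESP8266 code). The "ISR" is a hypothetical function called by hand at the worst possible moment, between the load and the store of the main code's `var++`, so two increments happen but only one survives.

```cpp
#include <cstdint>

volatile uint32_t var = 0;

// Hypothetical stand-in for the ISR's own increment.
void fakeIsr() {
  var++;
}

// The main code's var++ split into its three machine steps,
// with the interrupt injected between the load and the store.
uint32_t interruptedIncrement() {
  var = 0;
  uint32_t reg = var;  // reg = var  (load)
  fakeIsr();           // interrupt arrives here: var becomes 1
  reg++;               // reg++      (modify the stale copy)
  var = reg;           // var = reg  (store: overwrites the ISR's update)
  return var;          // two increments happened, but var == 1
}
```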
There are various ways to prevent this, such as turning interrupts off, using atomic instructions, or designing the code so that only the main code OR the ISR writes the variable and the other side only reads it.
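A minimal sketch of the "turn interrupts off" option applied to the read-then-reset in the recording code. Assumption: `noInterrupts()` and `interrupts()` are stubbed out here so the code runs on a PC; on the ESP8266 the Arduino core's real functions of the same names would be used instead. `readAndResetPulses()` is a hypothetical helper name.

```cpp
#include <cstdint>

// Host-side stand-ins for the Arduino core's noInterrupts()/interrupts().
// They only record whether we are inside the critical section.
static bool irqEnabled = true;
void noInterrupts() { irqEnabled = false; }
void interrupts() { irqEnabled = true; }

volatile uint32_t pulsosEDs[8];

// Read a counter and reset it to 0 with interrupts off, so an ISR increment
// cannot land between the read and the reset and be lost.
// Precondition: reg < 8. Interrupts are off for only a few instructions.
uint32_t readAndResetPulses(uint8_t reg) {
  noInterrupts();
  uint32_t count = pulsosEDs[reg];
  pulsosEDs[reg] = 0;
  interrupts();
  return count;
}
```

Keeping the critical section this short means a pending interrupt is serviced almost immediately after `interrupts()`, rather than being dropped.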

'Natural machine word size' (32 bit, 8 bit etc.): this is also about keeping the stored value consistent, but in a different way from 'atomic' above. If it takes more than one memory access to read or write the variable, then at some moments part of it holds the old value and part holds the new one, i.e. atomicity across variables that span more than one memory word. For example:

C++ main code & globals:
volatile uint64_t var1;
var1++;

isr:
var1=0;

Because this is a 64-bit variable, which is not the natural machine word size for the ESP8266, the compiler splits each access into two 32-bit halves: one half is loaded into a register, modified and written back to memory, and then the same is done for the other half. Because the ISR can run between the first 32-bit write and the second, the resulting contents can be very, very wrong indeed. You don't have this problem in your example code.
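A host-side simulation of that torn write, with the 64-bit `var1` modelled as two hypothetical 32-bit words. Assumption for illustration: the main code stores the low word first and then the high word; the real order is up to the compiler. Starting from 0x0000000100000005, the correct result would be 0x0000000100000006 (no interrupt) or 0/1 (ISR before/after), but the interleaved result is 0x0000000100000000.

```cpp
#include <cstdint>

// var1 split into the two 32-bit words a 32-bit CPU actually stores.
volatile uint32_t var1Lo = 0;
volatile uint32_t var1Hi = 0;

// Hypothetical stand-in for the ISR's var1 = 0.
void fakeIsrClear() {
  var1Lo = 0;
  var1Hi = 0;
}

uint64_t tornIncrement() {
  var1Hi = 1;  // var1 = 0x0000000100000005 before var1++
  var1Lo = 5;
  uint64_t value = (uint64_t(var1Hi) << 32) | var1Lo;  // load both halves
  value++;                                             // 0x0000000100000006
  var1Lo = static_cast<uint32_t>(value);               // 1st store: low word = 6
  fakeIsrClear();                                      // ISR runs between the stores
  var1Hi = static_cast<uint32_t>(value >> 32);         // 2nd store: high word = 1
  return (uint64_t(var1Hi) << 32) | var1Lo;            // 0x0000000100000000
}
```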

If any of the above is not considered in your design, it can lead to various classes of error, ranging from incorrect behaviour to a system crash; it just depends on the particular code, the timing and how the variable is used.

Unaligned memory: I don't think you have this problem, but it is where you try to read/write across an illegal memory boundary, such as writing a 32-bit value to a memory address that is not divisible by 4 (on the ESP8266). I won't explain further as I don't want to make this even more complicated, but I can if you ask.
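Since your recording code handles 32-bit values through a byte pointer, here is a small sketch of the usual safe way to move a 32-bit value to or from an arbitrary byte offset: `memcpy` makes no alignment assumption, whereas something like `*(uint32_t *)(buf + 1)` is an unaligned access (undefined behaviour in C++, and typically an alignment exception on the ESP8266). `writeU32()`/`readU32()` are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// Store a 32-bit value at any byte offset, whatever its alignment.
void writeU32(uint8_t *dst, uint32_t value) {
  std::memcpy(dst, &value, sizeof value);
}

// Load a 32-bit value from any byte offset, whatever its alignment.
uint32_t readU32(const uint8_t *src) {
  uint32_t value;
  std::memcpy(&value, src, sizeof value);
  return value;
}
```

The compiler turns these into byte accesses (or a plain 32-bit access when it can prove alignment), so there is no speed penalty worth worrying about for this use.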

So, having said all that, there is an issue in your code as I mentioned above, but my gut feeling is that it is not the cause of the crash. That issue is more likely to cause lost pulse counts or wrong totals than crashes. That said, as I note further down, I think your example is incomplete.

More likely, the index into one of the arrays (either pulsosEDs or entradasDigitais) is out of bounds, i.e. negative or >= 8.

Because you have only provided an example, there may well be other problems, but I would concentrate on out-of-range array indexes and add checks to protect against them, either temporarily to find the problem or permanently for future assurance.
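One way such a check could look, as a sketch under assumptions: `takePulseCount()` and `kNumInputs` are hypothetical names, and the read-then-reset inside it still needs the interrupt protection discussed above. Using an unsigned index means a "negative" value wraps to a large number and is rejected by the same `>=` test.

```cpp
#include <cstdint>

constexpr uint32_t kNumInputs = 8;  // size of pulsosEDs / entradasDigitais
volatile uint32_t pulsosEDs[kNumInputs];

// Refuses indexes outside 0..7 instead of writing past the end of the array.
// Returns false (and leaves 'out' untouched) for a bad index so the caller
// can log it while hunting the bug.
bool takePulseCount(uint32_t reg, uint32_t &out) {
  if (reg >= kNumInputs) {  // also catches -1 etc., which wrap to huge values
    return false;
  }
  out = pulsosEDs[reg];
  pulsosEDs[reg] = 0;  // still needs interrupts off (or similar) around it
  return true;
}
```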

I would also add that I think your example is incomplete, as you do not check that entradasDigitais[0] is 1 in your main code before doing the new work, nor do you reset it there. Also, the ISR only ever increments pulsosEDs[0]; perhaps you meant pulsosEDs[some_index_or_other]++, because the main code does pulsosEDs[readings[readingIndex].reg] = 0.
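A sketch of the missing main-code side for the flag: only do the work when the ISR has flagged the input, then clear the flag. `consumeInputEvent()` is a hypothetical name. A bool is a single byte, so its read and write cannot be torn; pulses that arrive while the flag is still set are simply merged into one event, which is why the separate pulse counter is still needed for counting.

```cpp
#include <cstdint>

volatile bool entradasDigitais[8];  // set to true by each input's ISR

// Returns true (and clears the flag) if the ISR flagged this input
// since the last call; rejects out-of-range inputs.
bool consumeInputEvent(uint8_t input) {
  if (input >= 8 || !entradasDigitais[input]) {
    return false;
  }
  entradasDigitais[input] = false;
  return true;
}
```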

You could google these topics for more accurate or definitive explanations, but I think I have covered the main points of computer theory here, although it's late and I'm tired :-(
User avatar
By lucasromeiro
#79992
McChubby007 wrote:It's possible that the problem is due to the behaviour reported in this topic but it does not feel like it to me; I could be wrong though. I will leave it to you to decide to open a new topic or not.

I will try to explain each thing in turn, and not make it too complicated, but the subject can become complicated I'm afraid.

Volatile : tells the compiler not to assume anything about optimizing the generated instructions for the reads/writes of this variable. Without this the compiler will assume that the variable cannot be modified by code 'simultaneously' by other code and thus not reload it when accessing it multiple times in a series of instructions. This optimization is bad when interrupt routines interrupt other code currently accessing the variable and then modify this variable because when the interrupted code is allowed to execute (after isr finished) it will have the wrong value still loaded in registers. So, in summary, always use volatile when an isr accesses a global variable, otherwise the main code may see the wrong value from time to time.

Atomic : This is about ensuring consistency of the value stored in the variable when the variable is being changed, such as
C++:
var++
which in machine code is:
reg=var,
reg++,
var=reg
If the non-isr code is interrupted part way through this series of instructions then the isr will modify the value, and when the non isr code runs again it will overwrite the variable already set by the isr. In your example code you have this problem but slightly different in that the main code writes a 0 : if the isr runs during this time it will ++ the variable which the main code will reset to 0 meaning you get wrong values in the array. Some architectures (or languages or functions etc) will provide a different command/instruction that you use to ensure it cannot be interrupted in the middle of such an operation, and is commonly known as read-modify-write see https://en.wikipedia.org/wiki/Read-modify-write
There are various ways to prevent this, such as turning interrupts off, using atomic instructions or designing it so only the main OR the isr writes and the other code just reads.

'natural machine word size' (32 bit or 8 bit etc etc) : This is about ensuring consistency of the value stored in the variable but in a different way to atomic above. If it takes more than one memory access to read or write then parts of the variable will have old values and other new values, I.e. Atomic across multi-memory variables. So for example:

C++ main code & globals:
volatile uint64_t var1;
var1++;

isr:
var1=0;

Because this is a 64 bit variable, and not a natural machine size for esp8266 the high 32bits is loaded into a register and modofied and written back to memory, and then this is repeated for the low 32bits. Because this can be interrupted between the 1st 32bit write and the 2nd it is possible for the contents to be very very wrong indeed. You don't have this problem in your example code.

Each of the above if not considered in your design can lead to various class of error, ranging from incorrect behaviour to system crash, it just depends on the particular code, the timing and how the variable is used for the intended function.

Unaligned memory : I don't think you have this problem, but it is where you try to read/write across an illegal memory boundary, such as writing a 32bit to a memory address not divisible by 4 (for the esp8266). I won't bother to explain further as I don't want to make this even more complicated. I can explain if you ask again.

So having said all that, there is an issue in your code as I mentioned above, but my gut feeling is that that is not the cause of the crash. The problem in your code is more likely to cause some missed interrupts or poor calculations, not any crashes. Having said that I do note further down in my reply that I think your example is incomplete.

More likely is that the indexing into one of the arrays (either pulsosEDs or entradasDigitais) is out of bounds. Which would mean is -ve or >= 8.

Because you have only provided me with an example it may well be other problems but I would concentrate on out of range array indexes and put some checks to protect against this happening, which may just be temporary to find the problem or keep it in for future assurance.

I would also add that your example is, I think incomplete, as you do not check that entradasDigitais[0] is 1 in your main code before doing the new work, nor do you reset it there. Also the isr only ever increments pulsosEDs[0] and perhaps you mean pulsosEDs[some_index_or_other]++ because the main code does pulsosEDs[readings[readingIndex].reg] = 0.

You could google these topics for more accurate or definitive explanations but I think I have covered the main points of comuter theory here, although it's late and I'm tired :-(


Wow!!
A lot of information!
I had never seen guidance on these issues.
I only knew about volatile, because the Arduino documentation tells you to use it. I knew it prevented optimizations, but I did not know the other details.
Reading a little more about atomic operations, I understood what you meant!
I think this may be the root of the problem!
I don't know if I mentioned it earlier, but the more often I read the variable, the more easily the error appears.
I also noticed that when the ESP starts (with the interrupt attached, but before the variable is touched), it resets endlessly as long as the pulse frequency is above 10 Hz. (Strange behavior.) Is it because I enable interrupts before connecting to WiFi? As soon as the ESP boots, the first thing I do is enable the interrupts.
I'm puzzled...

Going back to the atomic instructions...
What strategy do you recommend?
I'm afraid that disabling interrupts while I manipulate the variable will make me lose pulses, or cause some other unwanted effect.
Maybe this will solve it:
https://forum.arduino.cc/index.php?topic=73838.0

I'm going to do the test I told you about now.
I will enable the external interrupt, put only the increment of a volatile variable inside it, and not use that variable anywhere.
Then I will know whether the problem is in the manipulation of the variable.

You're right when you say the code is incomplete.
What is missing is the handling of entradasDigitais[0]; I did not include it because it is similar to what I sent for pulsosEDs[0].
As for the indexing, pulsosEDs[0] in the ISR is correct, because I have a separate interrupt for each element of pulsosEDs. I use the index only when recording.
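If there is one ISR per input, a function template is one way to avoid writing eight near-identical ISRs by hand. This is a sketch, not the poster's code: `interrupcaoED<N>` is a hypothetical name. Because N is fixed at compile time, each ISR can only touch its own slot, and `static_assert` rejects an out-of-range index before the code ever runs.

```cpp
#include <cstdint>

volatile uint32_t pulsosEDs[8];
volatile bool entradasDigitais[8];

// One ISR per digital input, generated from a template.
template <uint8_t N>
void interrupcaoED() {
  static_assert(N < 8, "input index out of range");
  pulsosEDs[N]++;              // count the pulse for input N
  entradasDigitais[N] = true;  // flag that input N saw a pulse
}
```

Each instantiation is an ordinary `void()` function, so it should be usable wherever a plain function pointer is accepted, e.g. `attachInterrupt(digitalPinToInterrupt(pin), interrupcaoED<0>, FALLING);`.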

Thanks a lot for the help!
I appreciate your patience and knowledge!
User avatar
By McChubby007
#79993
There are three strategies you can use:
1. Use a read/write operation that cannot be interrupted. This is the best option, but it is not available on the ESP8266.
2. Turn off interrupts in the main code while modifying the shared variable. This is fine because interrupts are not lost; they are just not 'serviced' by the system until interrupts are re-enabled. It is not my preferred option, but I think you could do this, and a great deal of other code does. It would be a bad option if interrupts were off for long periods, but that is not the case in your example code. EDIT 1: However, if interrupts are disabled for longer than the interval between two interrupts, then interrupts will be lost, as you get only one ISR call even though several interrupts have occurred. CPUs such as ARM Cortex-M have an NVIC (Nested Vectored Interrupt Controller), which manages interrupt priorities and nesting in hardware and is a fantastic piece of hardware!
3. Only write in one place, and read from the other place into a temporary/other variable. This is what I tend to use for a simple data type like an int, which is what you use. The ISR does the 'var++'. The main code reads this value and compares it with a local copy to work out how many interrupts have occurred, e.g.
// ===== globals =====
volatile uint32_t isrCount = 0;  // volatile because the ISR changes it; only the ISR writes it

// ===== ISR =====
isrCount++;  // the ISR is the only writer

// ===== non-ISR: code body in a regular periodic loop etc. =====
static uint32_t copyOfCount = 0;  // local copy of the ISR counter

uint32_t newCount = isrCount;  // atomic read of the ISR counter (single aligned 32-bit load)

// note: using unsigned ints means the count still works after isrCount rolls over to 0; it would not work for signed values
uint32_t numInterrupts = newCount - copyOfCount;  // number of interrupts since last checked

copyOfCount = newCount;  // update local copy of the ISR counter

if (numInterrupts > 0) {
   // process ...
}
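The roll-over claim in the comment above can be checked on a PC by pulling the subtraction out into a function (`interruptsSince()` is a hypothetical name). Unsigned 32-bit arithmetic is modulo 2^32, so the difference stays correct when `isrCount` wraps from 0xFFFFFFFF back to 0, as long as fewer than 2^32 interrupts happen between two checks.

```cpp
#include <cstdint>

// Number of ISR calls between two reads of the counter, wrap-safe
// because uint32_t subtraction is performed modulo 2^32.
uint32_t interruptsSince(uint32_t newCount, uint32_t copyOfCount) {
  return newCount - copyOfCount;
}
```

For example, if the last copy was 0xFFFFFFFE and the counter now reads 2, the function returns 4: two increments up to 0xFFFFFFFF and 0, then two more.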
Last edited by McChubby007 on Thu Jan 10, 2019 3:15 pm, edited 1 time in total.