Moderator: igrr

By Me-no-dev
#51225 There is no code to handle SPI0 and I2S interrupts. Maybe SPI0 needs clearing because of something going on behind the scenes in the Xtensa core. I know that the registers we write to are just interfaces to some hardwired logic. I guess reading SPIFFS while receiving SPI1 slave traffic breaks something. The question now is whether the very basic ReadFlash routines get interrupted and break, or whether it is the SPIFFS logic itself. Maybe disabling interrupts at some point in SPIFFS is not a bad idea.
By igrr
#51235 I have commented about this on Gitter already, but will post my thoughts here as well, for the folks who are not on Gitter.

As a general rule, when you see an issue, try to diagnose it from "first principles".
The exception output says that the CPU raised exception 0 at PC 0x402095f8. You have correctly checked the exception list for the cause of exception 0: it means that the instruction was invalid. But don't stop there :)
First run xtensa-lx106-elf-objdump -d file.elf, and verify that the instruction at this address is in fact valid and that objdump decodes it (I'm pretty sure it is valid).

So, if the instruction is valid, why does the CPU complain about an invalid instruction? Well, this instruction is in flash, and the CPU accesses instructions in flash memory through a cache. This cache is a piece of hardware which knows how to read data from flash memory and return it to the CPU as if it were in RAM. Because it is a cache, it also uses a piece of RAM (32k of it) to store instructions which are frequently requested by the CPU.
Because this piece of hardware talks to the SPI flash chip on its own, every time we want to read or write flash from our program we need to disable the cache. Otherwise flash commands issued by code can get intermingled with commands issued by the cache hardware, and this will cause read and write errors. So, each time SPIFFS needs to read or write something, the flash cache is disabled. While it is disabled, it will always return 0x000000 as an instruction word, which is an illegal instruction for the CPU. All this means that if you try to run code located in flash while a flash read or write is in progress, this will cause exception 0.
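
The mechanism described above can be illustrated with a toy model that runs on a PC. This is only a sketch: FlashCacheModel, fetch and is_illegal are made-up names, the stored words are arbitrary non-zero placeholders rather than real Xtensa opcodes, and nothing here touches actual ESP8266 hardware.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy host-side model of the flash cache; all names are hypothetical.
struct FlashCacheModel {
    // Arbitrary non-zero placeholders standing in for code stored in flash.
    std::array<uint32_t, 4> flash{{0x11u, 0x22u, 0x33u, 0x44u}};
    bool enabled = true;

    // Instruction fetch as the CPU sees it: the real contents while the
    // cache is enabled, all zeros while it is disabled.
    uint32_t fetch(std::size_t index) const {
        return enabled ? flash.at(index) : 0u;
    }
};

// In this model an all-zero word stands for the illegal instruction
// that leads to exception 0.
bool is_illegal(uint32_t word) {
    return word == 0u;
}
```

With the cache enabled, a fetch returns the stored word. Once enabled is set to false, which stands for a flash read or write in progress, the same fetch returns zero and the model treats it as illegal.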

Now, back to the original issue. When reading or writing flash from SPIFFS, I mask all interrupts except for the SPI interrupt. See the spiffs_hal.cpp file. I have tried masking the SPI interrupt as well, but this seems to break the actual function which reads or writes flash. So extra care is needed in SPISlave to check whether the ISR arrived while the flash cache was disabled. If it did, SPISlave may choose to bail out of the ISR handler.
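
A minimal sketch of such a bail-out, under loudly stated assumptions: g_flash_cache_disabled is a hypothetical flag that the flash read/write wrapper would have to set and clear, and spi_slave_isr_sketch is an illustrative handler, not the actual SPISlave code. The handler itself must sit in RAM (here via ICACHE_RAM_ATTR), since otherwise even the check would be fetched from flash.

```cpp
#include <cstdint>

// On the ESP8266 Arduino core this macro comes from the core headers;
// the empty fallback only lets the sketch compile on a PC.
#ifndef ICACHE_RAM_ATTR
#define ICACHE_RAM_ATTR
#endif

// Hypothetical flag: assumed to be set by the flash read/write wrapper
// for as long as the flash cache is disabled.
volatile bool g_flash_cache_disabled = false;

// Counters used only to make the behaviour observable in this sketch.
volatile uint32_t g_deferred_events = 0;
volatile uint32_t g_handled_events = 0;

// Hypothetical ISR: placed in RAM so it can run while the cache is off.
void ICACHE_RAM_ATTR spi_slave_isr_sketch() {
    if (g_flash_cache_disabled) {
        // Bail out: anything located in flash must not be called now.
        g_deferred_events = g_deferred_events + 1;
        return;
    }
    // Normal path: code that may live in flash is safe to call here.
    g_handled_events = g_handled_events + 1;
}
```

Whether a deferred event can be recovered later depends on the SPI slave hardware state, which this sketch does not model.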

It would also be nice to dig into why masking the SPI interrupt breaks the flash read/write routines. I suppose there may be a bug in the SPIRead and/or SPIWrite ROM functions.
By forlotto
#51305 Hmm... Is there a public disassembly of the ROM, or possibly an emulator that shows the execution of everything as each command/response is made? That would be pretty cool. I guess I never really looked. If I recall correctly, a ROM typically allows patches to be made by hooking one of the interrupts, which I am sure you have already done if you are turning off these interrupts, basically by subverting them with some code of your own in flash. There is likely a standard patch table that gives the address where each patch is located, etc.

Hmm, this gets me thinking. I wonder whether the SPI routine is required to look at this patch table, so that any time you subvert it the ROM cannot look for these patches by default, which likely causes the issue.

Sometimes the ROM can be reprogrammed with special tools, commands, etc. on certain devices. If I recall from my reading, normally the factory is the only one with access to this type of internal information, and it never reaches the public for obvious reasons. There are also often protections, like fuses that get blown after the ROM is written, to keep you from doing this, and getting around them would take some high-dollar equipment.

There is likely a way to hook something in the boot routine or similar. Knowing what gets invoked first is important for stopping the exception, I would believe.

These are all just guesses from a guy that knows very little. So take this all with a grain of sand.
By Vicne
#51336 Hi,

First, let me thank everyone involved. My code is now working as expected and the ESP is really stable :-).
To summarize, igrr's diagnosis was correct (who doubted it ;-)), and as discussed on Gitter, forcing the callback to be stored in RAM instead of flash using the ICACHE_RAM_ATTR directive in the callback declaration solved the issue, as there's no longer any need to fetch code from flash through the cache when the interrupt occurs.
As the code in the handler is very limited in terms of processing, it doesn't take too much RAM.
I can now spy on that SPI protocol and handle it fine :-)
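
For readers who were not on Gitter, the fix looks roughly like the sketch below. The callback name and the two globals are hypothetical placeholders, not the actual code from this project, and the empty fallback for ICACHE_RAM_ATTR is only there so the snippet also compiles on a PC.

```cpp
#include <cstdint>

// Provided by the ESP8266 Arduino core on the target; empty fallback
// so the sketch builds off-target as well.
#ifndef ICACHE_RAM_ATTR
#define ICACHE_RAM_ATTR
#endif

// Hypothetical state shared between the callback and the main loop.
volatile uint8_t g_last_byte = 0;
volatile bool g_byte_ready = false;

// Hypothetical callback invoked from the SPI interrupt. The attribute
// asks the linker to place the function in RAM instead of flash, so it
// can run even while the flash cache is disabled.
void ICACHE_RAM_ATTR on_spi_byte(uint8_t value) {
    // Keep the handler tiny: store the byte and let loop() do the rest.
    g_last_byte = value;
    g_byte_ready = true;
}
```

Any function called from such a callback needs the same treatment, which is one more reason to keep the handler short.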

So thanks again very much ;-)

Vicne