ESPUSB - Chat about the software components

By cnlohr
#52709 So, this made a TON of sense. Thank you SO MUCH! I don't know how people get as good at explaining things as you did.

I probably won't have time until the weekend to rifle around. I think I have an idea of how to do it, but it doesn't sound like the way you would recommend...

... this seems kind of hacky to me. I would recommend using the --wrap option of the linker to wrap _UserExceptionVector or _UserExceptionVector_1 at link time.


I was thinking of writing my own ASM function to replace _UserExceptionVector. It would preserve any registers it used (as well as a0). It would first check whether bit 4 (INUM_GPIO) is set. If so, jump directly into my USB handler. If not, call0 to _UserExceptionVector_1. Then, whenever I want to do USB, I would install my own _UserExceptionVector at 0x40000050 by memcpy.

This brings up a few questions, though.

(1) Why is _UserExceptionVector using call0 rather than j? Because of the added jump range? I don't see what would be within +/- 512kB but outside +/- 128kB. ***EDIT: Looks like your code does a j***
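For reference, the two ranges come from the instruction encodings. As I read the Xtensa ISA, J adds a signed 18-bit byte offset to PC + 4, while CALL0 adds a signed 18-bit word offset to the word-aligned PC, so it reaches four times as far. Below is a host-side C sketch of that arithmetic; the function names are made up for illustration and are not from any SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range of a signed 18-bit offset field, shared by J and CALL0. */
#define OFF18_MIN (-131072LL) /* -2^17 */
#define OFF18_MAX   131071LL  /*  2^17 - 1 */

/* J: target = PC + 4 + offset, offset counted in bytes (about +/-128 kB). */
bool j_reachable(uint32_t pc, uint32_t target)
{
    int64_t offset = (int64_t)target - (int64_t)pc - 4;
    return offset >= OFF18_MIN && offset <= OFF18_MAX;
}

/* CALL0: target = ((PC >> 2) + offset + 1) << 2, offset counted in 32-bit
 * words (about +/-512 kB). Call targets must be 4-byte aligned. */
bool call0_reachable(uint32_t pc, uint32_t target)
{
    if (target & 3u)
        return false;
    int64_t offset = (int64_t)(target >> 2) - (int64_t)(pc >> 2) - 1;
    return offset >= OFF18_MIN && offset <= OFF18_MAX;
}
```

For example, a target 0x30000 bytes past the PC is out of reach for j but fine for call0.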

(2) What mechanism should I use to preserve all non-register processor state? I'm manually backing up SAR and PS using the following code... Is this right?
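#define ENTER_INTERRUPT \
   rsil a0, 15;        /* a0 = old PS, then raise interrupt level to 15 */ \
   s32i a0, a1, 60;    /* save old PS in the frame at a1+60 */ \
   rsr a0, SAR;        /* read the shift-amount register */ \
   s32i a0, a1, 64;    /* save SAR at a1+64 */

#define EXIT_INTERRUPT \
   l32i a0, a1, 64;    /* restore SAR */ \
   wsr a0, SAR; \
   isync; \
   l32i a0, a1, 60;    /* restore old PS, including its interrupt level */ \
   wsr a0, ps; \
   isync;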


(3) How is a0 restored from "wsr.excsave1 a0"? **EDIT** Even in your code, it is a1... I don't really get it... Still not sure how it's restored.

(4) It looks like the existing system uses "rsr.exccause a2" instead of INTENABLE & INTERRUPT. What is the difference there? The Xtensa manual says the "main cause" is EXCCAUSE, and "other information" is INTERRUPT, EXCVADDR. Looks like your code is using exccause?
By cnlohr
#52740 Ok, I understand why you are suggesting --wrap. The call0 target in the copied code is just wrong, because the code has been relocated to a different place. Apparently PC-relative code only works if it's executing from the right location, d'uh.
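To make the relocation problem concrete, here is a host-side C sketch (hypothetical helper names; 0x40100000 and 0x40100100 are arbitrary stand-ins for where the code and _UserExceptionVector_1 were linked). It encodes a CALL0 offset for code linked at one address, then decodes the same offset as if the instruction bytes had been memcpy'd to 0x40000050. The target moves along with the code, so it no longer points at the real function.

```c
#include <stdint.h>

/* Word offset a CALL0 located at pc must encode to reach target
 * (target = ((PC >> 2) + offset + 1) << 2, per the Xtensa ISA). */
int32_t call0_encode(uint32_t pc, uint32_t target)
{
    return (int32_t)((int64_t)(target >> 2) - (int64_t)(pc >> 2) - 1);
}

/* Where that same encoded offset actually lands when the instruction
 * bytes execute from pc (unsigned math wraps modulo 2^32 like the CPU). */
uint32_t call0_decode(uint32_t pc, int32_t offset)
{
    return ((pc >> 2) + (uint32_t)offset + 1u) << 2;
}
```

Encoded at 0x40100000 for a target of 0x40100100, the offset decodes correctly in place, but the copy at 0x40000050 jumps to 0x40000150 instead.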

I spent about an hour trying to get --wrap to work, calling it with every convention I could think of in multiple combinations, but nothing seems to work. GCC just won't wrap it. No warnings. No errors. Just not wrapping.

Now, the best I can come up with is figuring out how to make GCC produce the right code, if I can trick it into believing the code will be located at the right place. I can't seem to figure that out, so I opened a Stack Overflow question... http://stackoverflow.com/questions/3890 ... r-gcc-code
By projectgus
#52744
cnlohr wrote: So, this made a TON of sense. Thank you SO MUCH! I don't know how people get as good at explaining things as you did.


You're welcome. I'm curious to see if you can pull this off. :)

cnlohr wrote: I was thinking of writing my own ASM function to replace _UserExceptionVector. It would preserve any registers it used (as well as a0). It would first check to see if bit 4 in INUM_GPIO is set. If so, jump directly into my USB handler. If not, call0 to _UserExceptionVector_1


This sounds fine. The only restriction is that you have just 0x20 bytes of _UserExceptionVector to work in. If you can fit it in 0x20 bytes, it's a great approach.

(The reason I recommended wrapping _UserExceptionVector_1 instead is that then you don't have that limit - and your "fast path" for USB would still have the same number of branches, provided you put the USB stuff inline in _UserExceptionVector_1 instead of branching a second time. If that makes sense.)

(1) Why is _UserExceptionVector using call0 rather than j? Because of the added jump range? I don't see what would be within +/- 512kB but outside +/- 128kB. ***EDIT: Looks like your code does a j***


_UserExceptionVector and _UserExceptionVector_1 are actually compiled into different sections, so the compiler doesn't know the address of _UserExceptionVector_1 at compile time.

The way the non-RTOS SDK does vectors means that the linker script is responsible for assembling the actual exception vectors (ie _UserExceptionVector and friends) at the right offsets. You can see it in the linker script here:

  .text : ALIGN(4)
  {
    _stext = .;
    _text_start = ABSOLUTE(.);
    *(.UserEnter.text)
    . = ALIGN(16);
    *(.DebugExceptionVector.text)
    . = ALIGN(16);
    *(.NMIExceptionVector.text)
    . = ALIGN(16);
    *(.KernelExceptionVector.text)
    LONG(0)
    LONG(0)
    LONG(0)
    LONG(0)
    . = ALIGN(16);
    *(.UserExceptionVector.text)
    LONG(0)
    LONG(0)
    LONG(0)
    LONG(0)
    . = ALIGN(16);
    *(.DoubleExceptionVector.text)
    LONG(0)
    LONG(0)
    LONG(0)
    LONG(0)
    . = ALIGN (16);


The way exception vectors work on Xtensa is that there's one special register, VecBase, which holds a 16-byte-aligned address in memory. All of the vectors are then at known offsets relative to VecBase.
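As a sketch, here are the vector offsets for the lx106 core as esp-open-rtos lays them out (DebugExceptionVector at VecBase + 0x10 through DoubleExceptionVector at VecBase + 0x70). Treat these numbers as an assumption and check them against your core configuration. The arithmetic shows where the 0x20-byte limit on _UserExceptionVector comes from, and that 0x40000050 is what you get for a VecBase of 0x40000000.

```c
#include <stdint.h>

/* Offsets from VecBase for the lx106 core, as esp-open-rtos lays out its
 * vectors (assumption: verify against your core configuration). */
enum vector_offset {
    DEBUG_VECTOR_OFS  = 0x10,
    NMI_VECTOR_OFS    = 0x20,
    KERNEL_VECTOR_OFS = 0x30,
    USER_VECTOR_OFS   = 0x50,
    DOUBLE_VECTOR_OFS = 0x70,
};

/* Address the CPU jumps to for a given vector. */
uint32_t vector_address(uint32_t vecbase, enum vector_offset ofs)
{
    return vecbase + (uint32_t)ofs;
}

/* Room for the user exception vector before the double exception vector. */
uint32_t user_vector_budget(void)
{
    return (uint32_t)(DOUBLE_VECTOR_OFS - USER_VECTOR_OFS);
}
```

Moving VecBase moves every vector together, which is why a second, complete set of vectors can be swapped in by changing one register.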

For esp-open-rtos, we're assembling the vectors at compile-time instead of link-time which is why they're all defined in one file and go into one section. We can put other code in the same section after the vectors, so the relative offset of that code is known at compile time (because it's all going in the same section, ie the same contiguous chunk of memory). This is why we get to have PC-relative jump & branch instructions there.

Actually, writing this out makes me think that you might be better off creating your own set of vectors as well - just like the esp-open-rtos approach. The vector implementations in the SDK all follow the same pattern as _UserExceptionVector I think (save basic state, call a different function), so re-writing the same few lines of assembler from each vector into a duplicate set of vectors is not too complex. _but_ once you do that, you'll have your own set of vectors and you can implement your USB code in the same section - and make use of relative jump instructions & conditional branch instructions that you wouldn't otherwise get to use without a call0.

Also, this lets you enable/disable USB interrupts by just changing VecBase. Pointed to SDK vecbase = no USB interrupts, pointed to your vecbase = USB interrupts.

This also means you don't have to --wrap anything at all, or memcpy random bits of memory.

Does that make sense as an idea?

Going a bit all-out here, but if you really needed to you could "save" some state by having multiple vecbases and switching them around depending on what part of the USB transfer state you were in (to save on having to read that state from RAM). That might be a bit crazy though, idk.
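Purely as an illustration of that idea, here is a host-side C model in which the phase of a transfer is encoded by which vector table is active, so the handler never has to read the state from RAM. Everything in it is hypothetical: the phase names, the table struct, and the vecbase variable, which stands in for the VecBase register (on hardware, switching would be a write to that special register, not a C assignment).

```c
/* Hypothetical USB transfer phases; each one gets its own "vector table". */
enum usb_phase { PHASE_IDLE, PHASE_TOKEN, PHASE_DATA };

struct vector_table;

/* The GPIO handler reached through a given table already "knows" the phase,
 * because it can only be reached through that table. It returns the table
 * that should be active for the next interrupt. */
typedef const struct vector_table *(*gpio_handler_fn)(void);

struct vector_table {
    enum usb_phase phase;    /* for inspection only; handlers never read it */
    gpio_handler_fn on_gpio; /* stands in for the USB path in the user vector */
};

/* Tentative definitions so the handlers can refer to each table. */
static const struct vector_table idle_table, token_table, data_table;

static const struct vector_table *on_gpio_idle(void)  { return &token_table; }
static const struct vector_table *on_gpio_token(void) { return &data_table; }
static const struct vector_table *on_gpio_data(void)  { return &idle_table; }

static const struct vector_table idle_table  = { PHASE_IDLE,  on_gpio_idle  };
static const struct vector_table token_table = { PHASE_TOKEN, on_gpio_token };
static const struct vector_table data_table  = { PHASE_DATA,  on_gpio_data  };

/* Stands in for the VecBase special register. */
static const struct vector_table *vecbase = &idle_table;

/* One GPIO interrupt: dispatch through the active table, then switch tables. */
void simulate_gpio_interrupt(void)
{
    vecbase = vecbase->on_gpio();
}

enum usb_phase current_phase(void)
{
    return vecbase->phase;
}
```

The cost of this design is one full vector set per state, so it only pays off if the per-interrupt RAM read really matters.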

(2) What mechanism should I use to preserve all non-register processor state? I'm manually backing up SAR and PS using the following code... Is this right?
#define ENTER_INTERRUPT \
   rsil a0, 15; \
   s32i a0, a1, 60; \
   rsr a0, SAR; \
   s32i a0, a1, 64;

#define EXIT_INTERRUPT \
   l32i a0, a1, 64; \
   wsr a0, SAR; \
   isync; \
   l32i a0, a1, 60; \
   wsr a0, ps; \
   isync;



Looks right. Remember you only have to save registers to the stack if you know your code will be touching them.

(3) How is a0 restored from "wsr.excsave1 a0"? **EDIT** Even in your code, it is a1... I don't really get it... Still not sure how it's restored.


In esp-open-rtos's UserExceptionHandler, a0 is read back from excsave1 and pushed onto the stack along with the address of a function called _xt_user_exit:

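        rsr     a0, excsave1        /* a0 = interrupted code's a0, parked in EXCSAVE1 earlier */
        s32i    a0, sp, 0x0c        /* ...saved into the stack frame */
        movi    a0, _xt_user_exit
        s32i    a0, sp, 0x0         /* fake "return address" for when the ISR finishes */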


And then, when the ISR handler is done, it pops the _xt_user_exit address off the stack and "returns" to it (even though it was never there to begin with). The body of _xt_user_exit completes the restoration by popping the remaining values off the stack:

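/* _xt_user_exit is pushed onto the stack as part of the user exception handler;
   it restores the same set of registers that were saved there and returns from the exception */
_xt_user_exit:
        .global _xt_user_exit
        .type _xt_user_exit, @function
        l32i    a0, sp, 0x8         /* saved PS */
        wsr     a0, ps
        l32i    a0, sp, 0x4         /* saved EPC1 (where the exception was taken) */
        wsr     a0, epc1
        l32i    a0, sp, 0xc         /* original a0 */
        l32i    sp, sp, 0x10        /* original stack pointer */
        rsync                       /* let the PS/EPC1 writes take effect */
        rfe                         /* return to EPC1 and clear PS.EXCM */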


So a0's exciting journey goes something like a0 -> excsave1 -> a0 -> stack -> a0.

(How I think of it is this: the EXCSAVEx registers aren't so much about long-term saving of a register value. They're about giving the exception vectors access to a working register early on, so they can load a literal and/or make a call0. If that makes sense...)
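Here is a simplified host-side C model of that journey, using the frame offsets from the two esp-open-rtos snippets above (0x0 fake return address, 0x4 EPC1, 0x8 PS, 0xc a0, 0x10 stack pointer). The struct and function names are made up, the vector step follows the SDK's "wsr.excsave1 a0" convention, and the model deliberately ignores how sp itself is adjusted; it only tracks where each value lives at each step.

```c
#include <stddef.h>
#include <stdint.h>

/* Stack frame layout implied by the esp-open-rtos snippets above. */
struct user_exc_frame {
    uint32_t ret;   /* 0x00: address of _xt_user_exit (fake return address) */
    uint32_t epc1;  /* 0x04: PC where the exception was taken */
    uint32_t ps;    /* 0x08: processor state */
    uint32_t a0;    /* 0x0c: interrupted code's a0 */
    uint32_t sp;    /* 0x10: interrupted code's stack pointer */
};

/* Just the registers this model cares about. */
struct cpu_state {
    uint32_t a0, sp, ps, epc1, excsave1;
};

/* Vector: park a0 in EXCSAVE1 so a0 is free as a scratch register. */
void vector_entry(struct cpu_state *cpu)
{
    cpu->excsave1 = cpu->a0;   /* wsr a0, excsave1 */
    cpu->a0 = 0;               /* a0 now clobbered (literal load, call0 return) */
}

/* Handler entry: EXCSAVE1 -> a0 -> stack, then push the exit address. */
void handler_save(struct cpu_state *cpu, struct user_exc_frame *f, uint32_t exit_addr)
{
    f->sp   = cpu->sp;
    f->ps   = cpu->ps;
    f->epc1 = cpu->epc1;
    cpu->a0 = cpu->excsave1;   /* rsr  a0, excsave1 */
    f->a0   = cpu->a0;         /* s32i a0, sp, 0x0c */
    cpu->a0 = exit_addr;       /* movi a0, _xt_user_exit */
    f->ret  = cpu->a0;         /* s32i a0, sp, 0x0 */
}

/* _xt_user_exit: stack -> PS, EPC1, a0, sp (real code goes via a0 for PS/EPC1). */
void user_exit(struct cpu_state *cpu, const struct user_exc_frame *f)
{
    cpu->ps   = f->ps;         /* l32i a0, sp, 0x8; wsr a0, ps */
    cpu->epc1 = f->epc1;       /* l32i a0, sp, 0x4; wsr a0, epc1 */
    cpu->a0   = f->a0;         /* l32i a0, sp, 0xc */
    cpu->sp   = f->sp;         /* l32i sp, sp, 0x10 */
}
```

Calling vector_entry, handler_save and user_exit in order takes a0 through EXCSAVE1, back into a0, onto the stack, and finally back into a0.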

I don't actually know how the non-RTOS SDK does this part. I think it's in ROM; you might have to do some digging. It'll be similar.

(4) It looks like the existing system uses "rsr.exccause a2" instead of INTENABLE & INTERRUPT. What is the difference there? The Xtensa manual says the "main cause" is EXCCAUSE, and "other information" is INTERRUPT, EXCVADDR. Looks like your code is using exccause?


There's a real gotcha with understanding Xtensa: exceptions and interrupts are two different things.

If your Xtensa CPU has the "interrupt option" (as the lx106/esp8266 does), then interrupts are implemented using the exception mechanism. This means that when an interrupt occurs, it causes an exception where EXCCAUSE = Level1InterruptCause (which happens to be 4) and INTERRUPT holds the actual interrupt line(s) that asserted to trigger that exception.

The Xtensa ISA RM table 4-64 has a full list of exception causes & Section 4.4.4 "Interrupt Option" has a description of the interrupt functionality.

I used inaccurate language the first time I described this, and I think the confusion is compounded because an interrupt means EXCCAUSE has value 4 (for Level1InterruptCause) and a GPIO interrupt sets the INTERRUPT register bit 4 (INUM_GPIO). This is a coincidence! If a different peripheral causes the interrupt, EXCCAUSE is still 4 (Level1InterruptCause) but the INTERRUPT register will have different bit(s) set.
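A small C sketch of the distinction (the function names are made up; the constants are the two 4s from the paragraph above). It also masks INTERRUPT with INTENABLE, which cnlohr mentioned: since INTERRUPT can report lines that are pending but not enabled, ANDing the two is the conservative check.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXCCAUSE_LEVEL1_INTERRUPT 4u  /* Level1InterruptCause: an EXCCAUSE value */
#define INUM_GPIO                 4u  /* a bit number in INTERRUPT/INTENABLE */

/* Lines that are both pending and enabled. */
uint32_t active_interrupts(uint32_t interrupt_reg, uint32_t intenable_reg)
{
    return interrupt_reg & intenable_reg;
}

/* True only for a level-1 interrupt whose asserted lines include GPIO.
 * The two 4s are unrelated: one is a cause code, the other a bit index. */
bool is_gpio_interrupt(uint32_t exccause, uint32_t interrupt_reg, uint32_t intenable_reg)
{
    if (exccause != EXCCAUSE_LEVEL1_INTERRUPT)
        return false;  /* a genuine exception, not an interrupt */
    return (active_interrupts(interrupt_reg, intenable_reg) & (1u << INUM_GPIO)) != 0;
}
```

With EXCCAUSE = 4 and only bit 9 set in INTERRUPT, for instance, this correctly reports "interrupt, but not GPIO".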

I spent about an hour trying to get --wrap to work, calling it with every convention I could think of in multiple combinations, but nothing seems to work. GCC just won't wrap it. No warnings. No errors. Just not wrapping.


I think the confusion here is between which things happen at compile time, and which happen at link time. Maybe the explanation above about _UserExceptionVector being in a different section will be illuminating.

If you push the non-working wrap code & makefile you have to a branch somewhere, I can probably take a look. But like I said above, maybe you're better off just writing your own set of vectors and leaving the SDK ones as-is.
Last edited by projectgus on Thu Aug 11, 2016 9:34 pm, edited 3 times in total.
By RichardS
#52745 Wish I could help, not a big GCC person here...

projectgus is doing an awesome job!

RichardS