ESPUSB - Chat about the software components

By cnlohr
#52823 I know it's a bummer, but, there is one operation I neglected to test, and I should have.

Code: Select all
_l8ui a0, a11, GPIO_OFFSET_INPUT


Loads from 0x60000318. I played with reading from RAM, flash, etc... And writing to RAM, peripheral I/O... But, not reading peripheral I/O. I've tried pipelining it, and hoping that I can keep issuing instructions after it, but no dice. This command seems to take a whopping 17 cycles!!! What in the world?

So, unless anyone's found another way of polling I/O that might be cached?

Though it is possible to use the I2S to clock in data at 32 MHz, so I could probably pull out the USB signal... But, I don't think I could respond fast enough (it's 6.5 bit times, right?), not to mention there is no mechanism for finding an SE0 signal... I guess I could leave that brainstorming to another thread...

For now... Any ideas? any trick I might be able to use to get data from the peripherals faster? Or at least in a way that wouldn't hang up the CPU?

P.S. I found one cool thing!!! I could respond to interrupt requests in .28 us! (And actually faster, but I was doing extra setup)

P.P.S. This whole thing actually works a LOT better than I expected, as there is no need to resynchronize to the USB signal. When the ESP receives a pin-change interrupt, it is cycle-precise all the way down to 160MHz! So, that would be a HUGE BOON if I could get this problem figured out.
By rudi
#52980
cnlohr wrote:I know it's a bummer, but, there is one operation I neglected to test, and I should have.

Code: Select all
_l8ui a0, a11, GPIO_OFFSET_INPUT


Loads from 0x60000318. I played with reading from RAM, flash, etc... And writing to RAM, peripheral I/O...


Hi Charles,
it's me, rudi :)

Are you using a circular buffer for it?

cnlohr wrote:

But, not reading peripheral I/O. I've tried pipelining it, and hoping that I can keep issuing instructions after it, but no dice. This command seems to take a whopping 17 cycles!!! What in the world?



You are right, I am seeing the same thing.
I am not sure at the moment, but the reason could be the "optimize" option in gcc.

I got different tick counts in my tests too..

i am sure you know about this:
http://bbs.espressif.com/viewtopic.php?t=200


cnlohr wrote:

So, unless anyone's found another way of polling I/O that might be cached?



For now I have paused work on this. Last year in spring I wrote about it here:
http://www.mikrocontroller.net/topic/358654

At 80 MHz one cycle is 12.5 ns.
At 160 MHz one cycle is 6.25 ns ( I measured about 6.2 ns, see picture ).

Code: Select all
static inline unsigned get_ccount(void)
{
        unsigned r;
        asm volatile ("rsr %0, ccount" : "=r"(r));
        return r;
}
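
As a rough sketch of the arithmetic behind these cycle times: one cycle lasts 1,000,000 / f_MHz picoseconds, so a CCOUNT difference can be converted into time. The helper names `cycle_ps` and `ticks_to_ns` are made up for illustration and are not part of the ESP8266 SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Duration of one CPU clock cycle in picoseconds, for a clock given in MHz.
   Hypothetical helper, not an SDK function. */
static inline uint32_t cycle_ps(uint32_t cpu_mhz)
{
    assert(cpu_mhz > 0);
    return 1000000u / cpu_mhz;   /* 1 us = 1,000,000 ps */
}

/* Convert a CCOUNT difference into nanoseconds (rounded down). */
static inline uint32_t ticks_to_ns(uint32_t ticks, uint32_t cpu_mhz)
{
    return (uint32_t)(((uint64_t)ticks * cycle_ps(cpu_mhz)) / 1000u);
}
```

With these helpers, `cycle_ps(160)` is 6250 ps (6.25 ns), and the 17-cycle peripheral read from the first post works out to roughly 106 ns at 160 MHz.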





cnlohr wrote:

Though it is possible to use the I2S to clock in data at 32 MHz, so I could probably pull out the USB signal... But, I don't think I could respond fast enough (it's 6.5 bit times, right?), not to mention there is no mechanism for finding an SE0 signal... I guess I could leave that brainstorming to another thread...



80 MHz: 12.5 ns per cycle
160 MHz: 6.25 ns per cycle

cnlohr wrote:
For now... Any ideas? any trick I might be able to use to get data from the peripherals faster? Or at least in a way that wouldn't hang up the CPU?



In the past, working with an ISR at 160 MHz gave me fast timing.

cnlohr wrote:

P.S. I found one cool thing!!! I could respond to interrupt requests in .28 us! (And actually faster, but I was doing extra setup)




cnlohr wrote:

P.P.S. This whole thing actually works a LOT better than I expected, as there is no need to resynchronize to the USB signal. When the ESP receives a pin-change interrupt, it is cycle-precise all the way down to 160MHz! So, that would be a HUGE BOON if I could get this problem figured out.



A pin-change interrupt on any edge gives you nice timing at 160 MHz,
but better timing must be possible, because all the steps before the ISR fires cost ticks too..
So the way forward would be for Espressif to expose DMA to the ISR; then we could overlap ( snap on to ) the time slot before the work in the ISR starts. Theoretically that could be:

Fastest:
instructions: 4

80 MHz : 12.5 ns * 4 = 50 ns
160 MHz : 6.25 ns * 4 = 25 ns

Optimized:
instructions: 8

80 MHz : 12.5 ns * 8 = 100 ns
160 MHz : 6.25 ns * 8 = 50 ns

Either one is a bit better than your cool thing with .28 us ( 280 ns ).

If you want, I can pick this up again after the holidays; there are many open questions on this topic.
I had paused because there was no community interest in it last year.

this will be 1 tick
Code: Select all
..
t1 = system_get_time();     // API get time
// start
tick1 = get_ccount();       // read special register CCOUNT and store it
while (mtick < 100 )
{
// empty
mtick++;
}

tick2 = get_ccount();       // read CCOUNT again (assumed placement)
t2 = system_get_time();     // API get time
tickdiff = tick2 - tick1;   // how many ticks were used
..
..










that will be 3 ticks

Code: Select all
..
t1 = system_get_time();     // API get time
// start
tick1 = get_ccount();       // read special register CCOUNT and store it
while (mtick < 100 )
{
// empty
mtick++;
}

asm ("nop");                // "initial setup = 1 tick, execution = 1 tick"

tick2 = get_ccount();       // read CCOUNT again (assumed placement)
t2 = system_get_time();     // API get time
tickdiff = tick2 - tick1;   // how many ticks were used
..





that will be 4 ticks

Code: Select all
..
t1 = system_get_time();     // API get time
// start
tick1 = get_ccount();       // read special register CCOUNT and store it
while (mtick < 100 )
{
// empty
mtick++;
}

asm ("nop");
asm ("nop");

tick2 = get_ccount();       // read CCOUNT again (assumed placement)
t2 = system_get_time();     // API get time
tickdiff = tick2 - tick1;   // how many ticks were used
..
..


That is 3 ticks too:

Code: Select all
..
t1 = system_get_time();     // API get time
// start
tick1 = get_ccount();       // read special register CCOUNT and store it
while (mtick < 1 )
{
asm ("nop");
mtick++;
}

tick2 = get_ccount();       // read CCOUNT again (assumed placement)
t2 = system_get_time();     // API get time
tickdiff = tick2 - tick1;   // how many ticks were used
..
..




That will be only 5 ticks:

Code: Select all
..
t1 = system_get_time();     // API get time
// start
tick1 = get_ccount();       // read special register CCOUNT and store it
while (mtick < 100 )
{
// empty
mtick++;
}
                            // 1 tick so far
asm ("nop");                // 2 ticks
asm ("nop");                // + 1 tick
asm ("nop");                // + 1 tick
                            // makes 5 ticks!

tick2 = get_ccount();       // read CCOUNT again (assumed placement)
t2 = system_get_time();     // API get time
tickdiff = tick2 - tick1;   // how many ticks were used
..
..



that will be 14 ticks

Code: Select all
..
t1 = system_get_time();     // API get time
// start
tick1 = get_ccount();       // read special register CCOUNT and store it
while (mtick < 3 )
{
asm ("nop");
mtick++;
}

tick2 = get_ccount();       // read CCOUNT again (assumed placement)
t2 = system_get_time();     // API get time
tickdiff = tick2 - tick1;   // how many ticks were used
..
..


3 x 2 x 2 = 12 + while 3 - 1 = 14

OK?!...


But now see for yourself:

1 round
1 x 2 x 2 = 4 + while 1 - 1 = 4

Code: Select all
..
t1 = system_get_time();     // API get time
// start
tick1 = get_ccount();       // read special register CCOUNT and store it
while (mtick < 1 )
{
asm ("nop");
mtick++;
}

tick2 = get_ccount();       // read CCOUNT again (assumed placement)
t2 = system_get_time();     // API get time
tickdiff = tick2 - tick1;   // how many ticks were used
..
..


only 3 ticks




2 rounds
2 x 2 x 2 = 8 + while 2 - 1 = 9

Code: Select all
..
t1 = system_get_time();     // API get time
// start
tick1 = get_ccount();       // read special register CCOUNT and store it
while (mtick < 2 )
{
asm ("nop");
mtick++;
}

tick2 = get_ccount();       // read CCOUNT again (assumed placement)
t2 = system_get_time();     // API get time
tickdiff = tick2 - tick1;   // how many ticks were used
..
..


Yes, 9 ticks. We have the formula now.
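
Written out, the formula from these measurements is ticks = rounds x 2 x 2 + (rounds - 1). A minimal sketch in C follows; the function name `predicted_ticks` is hypothetical, and the formula is only an empirical fit to the tick counts reported in this post, not a documented property of the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Predicted CCOUNT difference for a loop of `rounds` iterations, each
   containing one nop and mtick++ (empirical formula from the post). */
static inline uint32_t predicted_ticks(uint32_t rounds)
{
    assert(rounds > 0);
    return rounds * 2u * 2u + (rounds - 1u);
}
```

Note that for a single round the formula gives 4, while the measurement in the post shows 3 ticks, so the fit only holds from 2 rounds on.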



..


5 rounds
5 x 2 x 2 = 20 + while 5 - 1 = 24

Code: Select all
..
t1 = system_get_time();     // API get time
// start
tick1 = get_ccount();       // read special register CCOUNT and store it
while (mtick < 5 )
{
asm ("nop");
mtick++;
}

tick2 = get_ccount();       // read CCOUNT again (assumed placement)
t2 = system_get_time();     // API get time
tickdiff = tick2 - tick1;   // how many ticks were used
..
..


That is 24 ticks, OK.



Now here comes the reason why I paused:

100 rounds.. how many ticks?
100 x 2 x 2 + while 100 - 1 = 499 ??

Code: Select all
..
t1 = system_get_time();     // API get time
// start
tick1 = get_ccount();       // read special register CCOUNT and store it
while (mtick < 100 )
{
asm ("nop");
mtick++;
}

tick2 = get_ccount();       // read CCOUNT again (assumed placement)
t2 = system_get_time();     // API get time
tickdiff = tick2 - tick1;   // how many ticks were used
..
..



[attached image: _1_praezisestimingccount_while_mtick_lees_100.JPG]


Oops,
we get 598 ticks.

Where do the extra 99 ticks come from?

..bug? ..
Only Espressif can answer this.
Because there was no interest in the community in this topic, and
because I cannot measure things this fast, I paused the work.

Translated from my German notes:

10 rounds: 10 x 2 x 2 = 40 + while 10 - 1 = 49

Cross-check:

30 rounds: 30 x 2 x 2 = 120 + while 30 - 1 = 149. And? Bingo!
40 rounds: 40 x 2 x 2 = 160 + while 40 - 1 = 199. And? Bingo!
50 rounds: 50 x 2 x 2 = 200 + while 50 - 1 = 249. And? Bingo!
60 rounds: 60 x 2 x 2 = 240 + while 60 - 1 = 299. And? Bingo!
70 rounds: 70 x 2 x 2 = 280 + while 70 - 1 = 349. And? Bingo!
80 rounds: 80 x 2 x 2 = 320 + while 80 - 1 = 399. And? Bingo!
90 rounds: 90 x 2 x 2 = 360 + while 90 - 1 = 449. And? Bingo!
100 rounds: 100 x 2 x 2 = 400 + while 100 - 1 = 499. And? No, 598.

So what is going on with 100 rounds? The formula gives 499, but we measure 598.

So, step by step.
The value in parentheses is the calculated ( expected ) value.

91 rounds = 454 ticks (91*2*2+while 91-1=454)
92 rounds = 459 ticks (92*2*2+while 92-1=459)
93 rounds = 464 ticks (93*2*2+while 93-1=464)
94 rounds = 469 ticks (94*2*2+while 94-1=469)
95 rounds = 474 ticks (95*2*2+while 95-1=474)

~~~~~~~~~~~~~~~ everything OK up to here ~~~~~~~~~~~~~

But now.. hmm?

96 rounds = 574 ticks (96*2*2+while 96-1=479)
97 rounds = 580 ticks (97*2*2+while 97-1=484)
98 rounds = 586 ticks (98*2*2+while 98-1=489)
99 rounds = 592 ticks (99*2*2+while 99-1=494)
100 rounds = 598 ticks (100*2*2+while 100-1=499)

In this section every additional round costs 6 ticks;
before, it was 5 ticks per round.

Between 95 and 96 rounds it tips over:


96*2*2+while 96-1 + 96-1 = 574

That is a jump of 100 ticks ( 474 to 574 ), and after that each round adds 6 ticks instead of 5.


Does it tip over again at about 2*95, i.e. at 96 + 95 = 191 rounds?

Test:

190 rounds = 1138 ticks (190*2*2+while 190-1 + 190 - 1 = 1138 )
195 rounds = 1168 ticks (195*2*2+while 195-1 + 195 - 1 = 1168 )
200 rounds = 1198 ticks (200*2*2+while 200-1 + 200 - 1 = 1198 )
..
1200 rounds = 7198 ticks ( = 7198 )

No, it does not tip over again..

Why does it tip over between 95 and 96 rounds?

Why is 1 round different from 2 or more rounds?
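
The two regimes can be summarised as a piecewise fit: 5 ticks per round up to 95 rounds, 6 ticks per round from 96 on. The sketch below only restates the numbers measured in this post. The names `fitted_ticks`, `measured` and `count_mismatches` are hypothetical, and the threshold of 96 is what was observed here, not an explained hardware limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Empirical fit to the tick counts reported in this post (not a hardware spec). */
static inline uint32_t fitted_ticks(uint32_t rounds)
{
    assert(rounds >= 2);   /* 1 round measured 3 ticks and fits neither branch */
    if (rounds <= 95)
        return rounds * 2u * 2u + (rounds - 1u);               /* 5N - 1 */
    return rounds * 2u * 2u + (rounds - 1u) + (rounds - 1u);   /* 6N - 2 */
}

struct sample { uint32_t rounds, ticks; };

/* Measured values copied from the post. */
static const struct sample measured[] = {
    {2, 9}, {3, 14}, {5, 24}, {95, 474}, {96, 574}, {100, 598},
    {190, 1138}, {195, 1168}, {200, 1198}, {1200, 7198},
};

/* Number of measured samples that the fit fails to reproduce. */
static inline size_t count_mismatches(void)
{
    size_t bad = 0;
    for (size_t i = 0; i < sizeof measured / sizeof measured[0]; i++)
        if (fitted_ticks(measured[i].rounds) != measured[i].ticks)
            bad++;
    return bad;
}
```

The fit reproduces every listed sample, including the 100-tick jump between 95 and 96 rounds, but it does not say why the per-round cost changes.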


Changing GPIO registers directly:
Without -mno-serialize-volatile, gcc emits a memw instruction before each volatile memory write.
Overall, gcc may not be very good at optimizing register access, so you should look at the generated code to see what it does.
You can try the following snippet, which makes two GPIO12 pulses:


Code: Select all
asm volatile (
"s32i %0, %1, 4\n\t"
"s32i %0, %1, 8\n\t"
"s32i %0, %1, 4\n\t"
"s32i %0, %1, 8"
: : "r"(0x00001000), "r"(0x60000300) : "memory");
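
To see why those four stores produce two pulses, here is a host-side model that can be run on a PC. It assumes that offsets 4 and 8 from the base 0x60000300 are the write-1-to-set and write-1-to-clear output registers (GPIO_OUT_W1TS and GPIO_OUT_W1TC in the ESP8266 headers). The type `sim_gpio` and the functions `sim_store` and `replay_snippet` are invented for this sketch and touch no real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets used by the snippet, relative to the base 0x60000300. */
enum { OUT_W1TS = 4, OUT_W1TC = 8 };

/* Host-side stand-in for the GPIO output latch; not real hardware access. */
typedef struct {
    uint32_t out;           /* current output levels, one bit per pin */
    unsigned rising_edges;  /* stores that drove at least one pin from 0 to 1 */
} sim_gpio;

/* Model one 32-bit store to the simulated register block. */
static inline void sim_store(sim_gpio *g, uint32_t offset, uint32_t value)
{
    uint32_t before = g->out;
    assert(offset == OUT_W1TS || offset == OUT_W1TC);
    if (offset == OUT_W1TS)
        g->out |= value;     /* write-1-to-set */
    else
        g->out &= ~value;    /* write-1-to-clear */
    if (~before & g->out)
        g->rising_edges++;
}

/* Replay the four stores from the asm snippet against the model. */
static inline unsigned replay_snippet(sim_gpio *g)
{
    const uint32_t pin12 = 0x00001000u;   /* bit 12 = GPIO12 */
    sim_store(g, OUT_W1TS, pin12);
    sim_store(g, OUT_W1TC, pin12);
    sim_store(g, OUT_W1TS, pin12);
    sim_store(g, OUT_W1TC, pin12);
    return g->rising_edges;
}
```

Starting from an all-zero latch, `replay_snippet` reports two rising edges and leaves the pin low again, matching the "two GPIO12 pulses" description.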


Hope this helps you, Charles.
I think you already know that last part too, right?
;-)


best wishes
rudi ;-)
By rudi
#53002 Perhaps this explains it better than my bad English:

You said .28 us.
Did you mean 280 ns?

Here you can go faster without any tricks, special configs and so on:


0.047 us
( I think this is really 41.7 ns )

Stable at 83.3 ns ( .083 us )

asm_toggle.png



With small tricks ( 80 MHz: 12.5 ns, 160 MHz: 6.25 ns ) we can get a stable 25 ns and 12.5 ns toggle,
for GPIO and instructions alike, but then we must code the USB polling and timing in asm, in one block.

You can see the small difference between single asm lines and a block of asm lines in the picture.

The asm code for each case is on the right of the picture.

Give it a try.

Do not forget to set up and init the right GPIO.


best wishes
rudi ;-)
By rudi
#53019 Hi Charles,

this is what I measure in a quick experiment with GPIO asm blocks:
a stable 6 MBit/s

6_MBit_stable.png


And this is my calculation of what should be possible in real time if the polling is done entirely in asm blocks
and we work with the normal 6.25 ns cycle ( "DMA" ):

Code: Select all
without DMA and Core 160MHz
without drops - tricks
and with asm blocks

custom Instructions: 6.25ns
bit flow for 1 bit data read:
---------------------------

ISR Fired:
   fired   6.25
   nop   6.25
   clear   6.25
   nop   6.25
   attach    6.25
   nop   6.25


Clock:
    low      6.25
    nop      6.25
    high      6.25
    nop      6.25



Data read:
    get pin   6.25
    nop      6.25
    shift pin   6.25
    nop      6.25

-----------------------------

   14 * 6.25     87.5 ns ( full clock )

87.5 ns * 8 = 700 ns

8 Bit = 0.700 us

1 ms  = 1000 * 8 / 0.700 = 11428 Bit / ms

1 s     11428 * 1000 = 11 428 000 Bit/s

11 428 000 Bit/s = 11.428 MBit/s
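
The same estimate can be redone in integer arithmetic. This is only a sketch of the calculation above; it assumes, as the post does, that each of the 14 steps takes exactly one CPU cycle. The function names `bit_time_ps` and `bits_per_second` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Time for one bit in picoseconds: `steps` single-cycle steps at cpu_mhz. */
static inline uint64_t bit_time_ps(uint32_t steps, uint32_t cpu_mhz)
{
    assert(cpu_mhz > 0);
    return (uint64_t)steps * (1000000u / cpu_mhz);
}

/* Theoretical bit rate in bit/s (rounded down). */
static inline uint64_t bits_per_second(uint32_t steps, uint32_t cpu_mhz)
{
    uint64_t t = bit_time_ps(steps, cpu_mhz);
    assert(t > 0);
    return 1000000000000ull / t;   /* 1 s = 10^12 ps */
}
```

`bit_time_ps(14, 160)` is 87500 ps (87.5 ns), and `bits_per_second(14, 160)` gives 11,428,571 bit/s. The 11 428 000 figure above differs only because it was truncated to whole bits per millisecond first.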
      



After vacation I will perhaps investigate a little.
Not sure, because we have many orders.
I am working on "usb otg / CDC" :mrgreen:

best wishes
rudi ;-)