DSP/ML computing libraries for IoT

Laurent Le Faucheur - Duration: 28:01

CMSIS-NN and CMSIS-DSP provide developers with collections of efficient neural network kernels and DSP functions aimed at maximizing performance and minimizing memory footprint on Cortex-M processors, for applications that require machine learning and DSP capabilities. Join Laurent Le Faucheur, Principal IoT Software Engineer at Arm, as he shares the latest developments for these computing libraries and how they can be used efficiently with future processing technology, including the Arm Cortex-M55 processor.

laurentlefaucheur (Speaker)
Score: 0 | 2 months ago | no reply

Yes, the renormalization ends with an arithmetic right shift.
Cortex-M processors have a short pipeline with internal bypass, and the compiler takes care of instruction reordering when needed.

Darkphibre
Score: 0 | 2 months ago | no reply

I also have a question about many sequential shifts and MAC. The slide on the processors calculates the most optimal times, yes? Back when I worked with processors that could reorder instructions, it seemed that we had to be careful using the raw instruction times, because the processor could skip a load (if I remember correctly) by chaining operations, and in the other direction cache misses were critical to minimize.
A potential reference, that I've only just skimmed through:
https://blog.stuffedcow.net/2013/05/measuring-rob-capacity/
(Aside: It'd be nice to have slide numbers to reference questions to :)

Darkphibre
Score: 0 | 2 months ago | no reply

On the CSD slide, the final line would read A x 76 = -(A<<2) + (A<<4) + (A<<6), correct? I assume the final line would be A x 0.594 = (-(A<<2) + (A<<4) + (A<<6)) >> 7?
CSD looks fascinating.
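
For readers who want to check the arithmetic in this thread, here is a minimal Python sketch of the shift-and-add decomposition discussed above: 76 = 64 + 16 - 4, followed by a right shift by 7 to approximate a multiplication by 0.594 (76/128 = 0.59375). The function names `csd_mul_76` and `csd_scale_0_594` are made up for this illustration and are not part of CMSIS-DSP or CMSIS-NN. Note that each shift needs its own parentheses in code, because `+` binds tighter than `<<` in both C and Python. Python's `>>` on integers behaves as an arithmetic right shift (it rounds toward negative infinity), which is assumed here to match the renormalization step mentioned in the speaker's reply.

```python
def csd_mul_76(a: int) -> int:
    """Multiply by 76 using only shifts and adds: 76 = 64 + 16 - 4."""
    return (a << 6) + (a << 4) - (a << 2)


def csd_scale_0_594(a: int) -> int:
    """Approximate a * 0.594 as (a * 76) >> 7, i.e. a * 76 / 128."""
    # Python's >> on int is an arithmetic shift: the result is floored,
    # including for negative inputs.
    return csd_mul_76(a) >> 7


if __name__ == "__main__":
    for a in (1000, -1000):
        print(a, csd_mul_76(a), csd_scale_0_594(a))
```

For an input of 1000 the product is 76000 and the shifted result is 593 (the exact value would be 593.75), while for -1000 the shifted result is -594, since the arithmetic shift rounds down rather than toward zero.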