1) Programming multicore TMS320C66x DSP processors with OpenMP (article in Russian) https://habr.com/ru/articles/318762/
pdf:
1) Sparse Matrix-Vector Multiply on the Texas Instruments C6678 Digital Signal Processor https://pdfs.semanticscholar.org/6617/964cd7ead75d18a7b25dcc04c222abdce1f9.pdf
2) Optimizing Loops on the C66x DSP https://www.ti.com/lit/pdf/sprabg7
3) TMS320C66x DSP CPU and Instruction Set Reference Guide https://www.ti.com/lit/ug/sprugh7/sprugh7.pdf
Text description of C66x functions:
1) Description: the "TMS320C6600 C/C++ Compiler Intrinsics" section of the TMS320C6000 Optimizing Compiler v8.2.x User's Guide https://www.ti.com/lit/pdf/spru187
2) dotpsu4h
long long _ddotpsu4h (__x128_t src1, __x128_t src2); - Performs two dot-products between four sets of packed 16-bit values. (Two-way _dotpsu4h)
int _dotpsu4h (long long src1, long long src2); - Multiply four signed 16-bit values by four unsigned 16-bit values and return the 32-bit sum.
long long _dotpsu4hll (long long src1, long long src2); - Multiply four signed 16-bit values by four unsigned 16-bit values and return the 64-bit sum.
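To make the semantics of the _dotpsu4h family concrete, here is a minimal host-side sketch in portable C. It is not TI's implementation and does not use the real intrinsics: the names pack4x16, lane_u16, lane_s16, ref_dotpsu4hll and fits_int32 are hypothetical, and the lane order (lane 0 in the least significant 16 bits) is an assumption made only for illustration. It models the 64-bit variant and hints at why that variant exists: the sum of four signed-by-unsigned 16-bit products can exceed the 32-bit range.

```c
#include <stdint.h>

/* Pack four 16-bit lanes into one 64-bit word. Lane 0 is the least
 * significant one (an assumed lane order, for illustration only). */
static inline uint64_t pack4x16(uint16_t l0, uint16_t l1,
                                uint16_t l2, uint16_t l3)
{
    return (uint64_t)l0 | ((uint64_t)l1 << 16) |
           ((uint64_t)l2 << 32) | ((uint64_t)l3 << 48);
}

/* Lane i read as an unsigned 16-bit value. */
static inline int64_t lane_u16(uint64_t packed, int i)
{
    return (int64_t)((packed >> (16 * i)) & 0xFFFFu);
}

/* Lane i read as a signed (two's complement) 16-bit value. */
static inline int64_t lane_s16(uint64_t packed, int i)
{
    int64_t v = lane_u16(packed, i);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Hypothetical model of _dotpsu4hll: signed lanes of src1 times
 * unsigned lanes of src2, accumulated into a 64-bit sum. */
static inline int64_t ref_dotpsu4hll(uint64_t src1, uint64_t src2)
{
    int64_t sum = 0;
    for (int i = 0; i < 4; ++i)
        sum += lane_s16(src1, i) * lane_u16(src2, i);
    return sum;
}

/* Nonzero if the value would also fit the 32-bit result of _dotpsu4h. */
static inline int fits_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

With signed lanes {1, -2, 3, -4} and unsigned lanes {10, 20, 30, 65535} the model returns -262080. With all signed lanes at 32767 and all unsigned lanes at 65535 the sum is 8589541380, which no longer fits in 32 bits.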
3) dmpy2
__x128_t _dmpy2 (long long src1, long long src2); - Four-way SIMD multiply of signed 16-bit values producing four signed 32-bit results. (Two-way _mpy2)
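The lane-wise behaviour of _dmpy2 can be sketched on a host machine in plain C. This is only an illustrative model under stated assumptions, not the TI intrinsic: ref_x128 is a hypothetical stand-in for __x128_t, the helper names are invented, and lane 0 is assumed to sit in the least significant 16 bits. Each pair of signed 16-bit lanes yields its own 32-bit product, which is why the result needs 128 bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for __x128_t: four signed 32-bit products. */
typedef struct {
    int32_t p[4];
} ref_x128;

/* Pack four 16-bit lanes into one 64-bit word. Lane 0 is the least
 * significant one (an assumed lane order, for illustration only). */
static inline uint64_t pack4x16(uint16_t l0, uint16_t l1,
                                uint16_t l2, uint16_t l3)
{
    return (uint64_t)l0 | ((uint64_t)l1 << 16) |
           ((uint64_t)l2 << 32) | ((uint64_t)l3 << 48);
}

/* Lane i read as a signed (two's complement) 16-bit value. */
static inline int32_t lane_s16(uint64_t packed, int i)
{
    int32_t v = (int32_t)((packed >> (16 * i)) & 0xFFFFu);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Hypothetical model of _dmpy2: four independent signed 16x16
 * multiplies. The largest product, (-32768) * (-32768) = 2^30,
 * still fits in a signed 32-bit lane. */
static inline ref_x128 ref_dmpy2(uint64_t src1, uint64_t src2)
{
    ref_x128 r;
    for (int i = 0; i < 4; ++i)
        r.p[i] = lane_s16(src1, i) * lane_s16(src2, i);
    return r;
}
```

For lanes {3, -4, 100, -32768} multiplied by {5, 6, -7, -32768} the model gives {15, -24, -700, 1073741824}.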
4) dadd2
long long _dadd2 (long long src1, long long src2); - Four-way SIMD addition of signed 16-bit values producing four signed 16-bit results. (Two-way _add2)
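A small host-side model in C shows what a four-way 16-bit addition means in practice. Since the result is a 64-bit long long holding four lanes, each lane is 16 bits wide. The sketch below is not the TI intrinsic: the names pack4x16, lane_u16 and ref_dadd2 are hypothetical, lane 0 is assumed to be the least significant 16 bits, and wraparound on overflow (no saturation) is assumed from the description "two-way _add2". The point is that no carry crosses a lane boundary, unlike an ordinary 64-bit add.

```c
#include <stdint.h>

/* Pack four 16-bit lanes into one 64-bit word. Lane 0 is the least
 * significant one (an assumed lane order, for illustration only). */
static inline uint64_t pack4x16(uint16_t l0, uint16_t l1,
                                uint16_t l2, uint16_t l3)
{
    return (uint64_t)l0 | ((uint64_t)l1 << 16) |
           ((uint64_t)l2 << 32) | ((uint64_t)l3 << 48);
}

/* Lane i as raw 16 bits. */
static inline uint32_t lane_u16(uint64_t packed, int i)
{
    return (uint32_t)((packed >> (16 * i)) & 0xFFFFu);
}

/* Hypothetical model of _dadd2: add lane by lane and keep only the
 * low 16 bits of each sum, so a carry never leaks into the next lane.
 * Wraparound instead of saturation is an assumption. */
static inline uint64_t ref_dadd2(uint64_t src1, uint64_t src2)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t s = (lane_u16(src1, i) + lane_u16(src2, i)) & 0xFFFFu;
        r |= s << (16 * i);
    }
    return r;
}
```

Adding lanes {1, 0xFFFF, 0x7FFF, 100} to {2, 1, 1, -300} gives {3, 0, 0x8000, -200} in this model: the second lane wraps to 0 without disturbing the third, whereas a plain 64-bit addition would carry into it.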
C66x vector functions: