#blas

Even now, Thrust as a dependency is one of the main reasons why we have a #CUDA backend, a #HIP / #ROCm backend and a pure #CPU backend in #GPUSPH, but not a #SYCL or #OneAPI backend (which would allow us to extend hardware support to #Intel GPUs). <doi.org/10.1002/cpe.8313>

This is also one of the reasons why we implemented our own #BLAS routines when we introduced the semi-implicit integrator. A side effect of this choice is that it allowed us to develop the improved #BiCGSTAB that I've had the opportunity to mention before <doi.org/10.1016/j.jcp.2022.111>. Sometimes I do wonder if it would be appropriate to “excorporate” it into its own library for general use, since it's something that would benefit others. OTOH, this one was developed specifically for GPUSPH and it's tightly integrated with the rest of it (including its support for multi-GPU), and refactoring to turn it into a library like cuBLAS is

a. too much effort
b. probably not worth it.

Again, following @eniko's original thread, it's really not that hard to roll your own, and probably less time-consuming than trying to wrangle your way through an API that may or may not fit your needs.

6/
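
To give a sense of scale for "rolling your own", here is a minimal sketch of three BLAS-style routines in plain C++ on the CPU. This is not GPUSPH's code: the names `axpy`, `dot` and `gemv` only follow the usual BLAS naming, and the signatures and row-major dense layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: hypothetical signatures, not taken from GPUSPH or any BLAS library.

// y <- alpha * x + y  (level 1, "axpy")
void axpy(double alpha, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += alpha * x[i];
    }
}

// Returns the inner product of x and y  (level 1, "dot")
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// Returns A * x for a dense, row-major rows-by-cols matrix  (level 2, simplified "gemv")
std::vector<double> gemv(const std::vector<double>& a, std::size_t rows,
                         std::size_t cols, const std::vector<double>& x) {
    assert(a.size() == rows * cols);
    assert(x.size() == cols);
    std::vector<double> y(rows, 0.0);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            y[r] += a[r * cols + c] * x[c];
        }
    }
    return y;
}
```

An iterative solver such as BiCGSTAB is mostly built out of these three kinds of operation (vector updates, inner products, matrix-vector products), which is part of why a small in-house set can be enough. A production version would typically replace the dense `gemv` with whatever matrix storage the application already uses.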

Hi Friends! Little Life update!

I’m really, really excited to share I’m joining #Tenstorrent in September as a Field Application Engineer on the Customer Engineering Team!
Will be working on a few things, amongst them building a wicked fast #BLAS package for HPC & AI users!
#HPC #AI

What does this mean? It means that we now have a dedicated Matrix ASIC that can be used via standard opcodes/compilers, available to anyone with a relevant toolchain and compiler.

For the most part, expect all of your #BLAS kernels to gain support over time!

For #HPC, in contrast with most matrix tile implementations, we have spec-mandated single- and double-precision support.

That's unlike the x86 AMX extensions, most consumer dGPU implementations, etc., which are limited to 19 bits and below.
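
As a rough illustration of what a 19-bit format costs, the sketch below cuts a 32-bit float down to 19 bits. The post does not name a specific format, so the layout here (1 sign bit, 8 exponent bits, 10 mantissa bits, as in TF32-style formats) is an assumption, the function name `truncate_to_19_bits` is hypothetical, and real hardware may round rather than truncate.

```cpp
#include <cstdint>
#include <cstring>

// Assumed layout: 1 sign + 8 exponent + 10 mantissa bits = 19 bits.
// Keeps the top 19 bits of an IEEE 754 single and zeroes the 13 low mantissa bits.
// Truncation is a simplification; actual matrix units may round instead.
float truncate_to_19_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);   // reinterpret the float as raw bits
    bits &= 0xFFFFE000u;                       // clear mantissa bits 0..12
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```

Under this assumption, a value such as `1.0f + 1.0f / 2048.0f` collapses to `1.0f`, while `1.0f + 1.0f / 1024.0f` survives unchanged: roughly three decimal digits of mantissa are kept, against about seven for single precision and about sixteen for double.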