> In a previous post, I’ve shown how to use the rayon framework in Rust to automatically parallelize a loop computation across multiple CPU cores. > In this post, I’ll first explain which profiling tools I used to chase optimizations, before diving into how I built a faster replacement of Rayon for my use case. In the next post, I’ll describe the other optimizations that made my code much faster. Spoiler alert: copying some data sped up my code!