C | Python | Program Optimization | Unrolling, SIMD Instructions, OpenMP
In this project, I first implemented a subset of numpy called numc using a naive solution for matrix operations. Then, I experimented with a setup file in Python to install the numc module. After that, I constructed the Python-C interface by overloading some operators and defining some instance methods for numc.Matrix objects. Finally, I sped up my naive solution, thus making numc.Matrix operations faster.
To speed up the naive solution, I:
- used conventional code optimizations to speed up the matrix operation algorithms themselves
- SIMD instructions, which perform operations on 256 bit values (4 64 bit operations can be performed in parallel)
- parallel computation using OpenMP, being careful to keep the threads from overwriting one another