Added API to run a function in multiple CPUs, and wait for completion