Based on the title I would have assumed that the programming model would be inverted, but it wasn't. What is needed is something akin to the Haskell model, where the evaluation order is unspecified while mutation is still allowed. The way to do this would be a Rust-style linear type system in which you may acquire exclusive write access to a region of memory but may not perform any other side effect, and all modifications are returned as if the function were referentially transparent. This is parallel by default, because you have to actively opt into a sequential execution order if you want to perform side effects.
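Haskell already has a small-scale version of this in the ST monad: `runST` hands you exclusive write access to freshly allocated state, the type system guarantees the mutation can't escape, and the enclosing function stays referentially transparent, so a strategy like `parMap` is free to evaluate calls in any order. A rough sketch of that shape (needs the `parallel` package; compile with `-threaded`, run with `+RTS -N`):

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Internally this mutates an accumulator in place, but runST guarantees
-- the mutation cannot escape: from the outside the function is
-- referentially transparent.
sumSquares :: [Int] -> Int
sumSquares xs = runST $ do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef' acc (+ x * x)) xs
  readSTRef acc

-- Because each call is pure, the runtime is free to evaluate the calls
-- in any order, including all at once on different cores.
main :: IO ()
main = print (parMap rdeepseq sumSquares chunks)
  where
    chunks = [[i .. i + 999] | i <- [1, 1001 .. 9001]]
```

A linear type system would generalize this from freshly allocated local state to any region of memory you hold exclusively.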
The barriers to this approach are the same old problems with automatic parallelization.
Current hardware assumes a sequential instruction stream per hardware thread or core, and offers no primitive in the microsecond range for rapidly scheduling code to be executed on another core. This means you must split your program into separate instruction streams (OS threads) that are then managed by the operating system, which kills performance through excessive synchronization overhead.
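You can get a feel for that overhead with a toy ping-pong benchmark: hand a one-instruction unit of work to another thread and wait for the answer. Everything here (the request/response MVar pair, the iteration count) is invented for illustration, and GHC's lightweight threads make this a lower bound on the cost of raw OS threads:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import Data.Time.Clock (getCurrentTime, diffUTCTime)

main :: IO ()
main = do
  let n = 100000 :: Int
  request  <- newEmptyMVar
  response <- newEmptyMVar
  -- Worker thread: the "work" per request is a single addition, so the
  -- measured time is almost entirely scheduling and synchronization.
  _ <- forkIO $ replicateM_ n $ do
    x <- takeMVar request
    putMVar response (x + 1 :: Int)
  start <- getCurrentTime
  forM_ [1 .. n] $ \i -> do
    putMVar request i
    _ <- takeMVar response
    pure ()
  end <- getCurrentTime
  putStrLn $ "per round trip: "
          ++ show (diffUTCTime end start / fromIntegral n)
```

On typical hardware each round trip lands in the microsecond range, orders of magnitude more than the addition it transports.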
The other problem is that even with low-latency scheduling, you still need to gather a sufficient amount of work for each thread. Too fine-grained and you run into synchronization overhead (no matter how good your hardware is); too coarse-grained and you won't be able to spread the load across all the processors.
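GHC's evaluation strategies make this trade-off an explicit tuning knob. In the sketch below, `expensive` and the chunk size of 64 are arbitrary placeholders, not recommendations; the point is that pushing the chunk size toward 1 buys load balance at the cost of per-spark synchronization, while pushing it toward the list length does the opposite:

```haskell
import Control.Parallel.Strategies (parListChunk, rdeepseq, withStrategy)

-- Placeholder work item: cheap enough that granularity matters.
expensive :: Int -> Int
expensive x = sum [x * y | y <- [1 .. 1000]]

-- chunk = 1          : one spark per element, synchronization dominates
-- chunk = length xs  : one spark in total, no parallelism at all
-- somewhere between  : load spread across all cores at tolerable overhead
process :: Int -> [Int] -> [Int]
process chunk = withStrategy (parListChunk chunk rdeepseq) . map expensive

main :: IO ()
main = print (sum (process 64 [1 .. 100000]))
```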
There is also a third problem lurking in the dark that many developers, with the exception of the Haskell community, underestimate: running a program in a suboptimal order can massively increase its instantaneous memory usage, to the point where the program can no longer run. Think of a program that allocates memory for each request, processes it, deallocates it, and then allocates for the next one. What if it accepts all requests in parallel? It will first allocate everything, then process everything, and only then deallocate everything.
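A sketch of the failure mode (run with `+RTS -s` to compare peak residency; `buffer` and the sizes are invented for illustration): both functions compute the same result, but forcing every buffer before processing any of them, which is what an over-eager parallel schedule amounts to, multiplies peak memory by the number of requests:

```haskell
import Control.DeepSeq (force)
import Data.List (foldl')

-- Invented request handler: each request allocates a sizeable buffer,
-- reduces it to a small result, and the buffer becomes garbage.
buffer :: Int -> [Int]
buffer req = [req + i | i <- [1 .. 100000]]

-- Good order: allocate, process, deallocate, one request at a time.
-- Peak residency stays at roughly one buffer.
oneAtATime :: [Int] -> Int
oneAtATime = foldl' (\acc r -> acc + sum (buffer r)) 0

-- Bad order: materialize every buffer before processing any of them.
-- Peak residency becomes (number of requests) x (buffer size).
allAtOnce :: [Int] -> Int
allAtOnce reqs = sum (map sum (force (map buffer reqs)))

main :: IO ()
main = do
  print (oneAtATime [1 .. 100])
  print (allAtOnce  [1 .. 100])
```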