First, note that it's not a given that entanglement really does produce non-local causal action at a distance. The Copenhagen interpretation of QM can be argued to affirm that, depending on who is describing the interpretation, but other interpretations don't necessarily share that view of entanglement.
But let's assume for a minute that it really is non-locally causal. Why can no information be communicated instantaneously? Why do we have the no-communication theorem, and how does it make sense? After all, if there's non-local causality, we should be able to make use of it, right?
Well, no. The problem is that the sort of non-local causality in QM exists only in a statistical sense, and there is no way to see the statistically causal change in a distant region of space without comparing notes.
The variable people usually talk about is spin. We send an entangled pair, one particle east and one west. Not just one pair: imagine a stream of entangled pairs, so that we might later use the stream for communication. At the East station, we're measuring spin up/down at 0deg, and say the particle is up. At the West station, we're measuring spin up/down at 0deg, and the particle is down. When both stations measure in the same orientation, each station gets the opposite result of the other station every time, for each particle pair. So, how could we manipulate this situation to communicate?
The only option, I think, is to change how we measure. If we change the relative angle between the two measurements, we change how strongly the results at the two ends are correlated: at 0deg the results are perfectly negatively correlated, and at 20deg, for example, the correlation changes to a different number. So if the West station wants to send FTL information to the East station, then maybe they rotate their spin-measuring device from 0deg to 20deg, or something, and then that causes something different to happen at the East station, right?
Wrong. When the West station changes from 0deg to 20deg, everything looks exactly the same at the East station. They're still getting a stream of particles and still measuring up/down at 0deg, and 50% of the results are down and 50% are up. There's no way for the East station to know that the West station changed its measuring orientation until the two stations compare notes and find that the correlations changed.
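If it helps to see this with numbers, here is a rough Python sketch of the thought experiment. It is only an illustration under stated assumptions: I'm assuming the pairs are in the spin singlet state, for which the standard textbook prediction is that the two results are opposite with probability cos^2(theta/2), where theta is the angle between the two measurement directions. The function names (simulate_pairs, fraction_up, fraction_opposite) are made up for this example, and the code just samples from that joint distribution; it doesn't simulate any underlying physics.

```python
import math
import random


def simulate_pairs(west_angle_deg, n_pairs=100_000, seed=0):
    """Sample (east, west) results; East always measures at 0 degrees.

    Assumption: singlet-state statistics, i.e. the two results are
    opposite with probability cos^2(theta / 2).
    """
    rng = random.Random(seed)
    theta = math.radians(west_angle_deg)
    p_opposite = math.cos(theta / 2) ** 2
    p_same = math.sin(theta / 2) ** 2
    # Joint distribution over (east, west), with +1 = up and -1 = down.
    outcomes = [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]
    weights = [p_same / 2, p_opposite / 2, p_opposite / 2, p_same / 2]
    pairs = rng.choices(outcomes, weights=weights, k=n_pairs)
    east = [e for e, _ in pairs]
    west = [w for _, w in pairs]
    return east, west


def fraction_up(results):
    """What one station can compute locally, from its own results only."""
    return sum(1 for r in results if r == +1) / len(results)


def fraction_opposite(east, west):
    """Needs BOTH lists, so it is only available after comparing notes."""
    return sum(1 for e, w in zip(east, west) if e != w) / len(east)


for angle in (0, 20):
    east, west = simulate_pairs(angle)
    print(
        f"West at {angle}deg:",
        f"East fraction up = {fraction_up(east):.3f},",
        f"fraction opposite = {fraction_opposite(east, west):.3f}",
    )
```

The East column should stay close to 0.5 whether West measures at 0deg or 20deg, while the fraction of opposite results drops from 1 to roughly 0.97. The second number is the only thing that changed, and nobody can compute it without having both lists in hand.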
It's this sort of thought experiment that underlies the no-communication theorem. Even if quantum causality is non-local and immediate, there's no way for the East station to learn anything about what's happening at the West station faster than light.