There's No Such Thing As "Implicitly Atomic"
If I have an aligned, machine-word-sized variable (an Int, say) and I store to it from Thread A, then I know Thread B might see the old value instead of the new one (because of per-processor caching, or because the compiler “hoisted” a load to earlier in the function). But there’s no way, on a modern processor, that Thread B sees a mix of the old and new values, right? That can only happen with wider or unaligned values, which the code may update non-atomically, right?
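To make the scenario concrete, here is a minimal sketch in C++ (an assumption, since the text names no language). It spells out the one guarantee the question takes for granted, namely that a load returns some whole stored value, possibly a stale one, by requesting it explicitly with std::atomic and relaxed ordering rather than relying on alignment and width. The function name count_torn_reads and the two bit patterns are hypothetical, chosen only for illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical helper: counts loads that observed a value that was neither
// of the two patterns the writer stores, i.e. a "mix" of old and new.
int count_torn_reads(int iterations) {
    constexpr std::uintptr_t kOld = 0;                   // all bits clear
    constexpr std::uintptr_t kNew = ~std::uintptr_t{0};  // all bits set
    std::atomic<std::uintptr_t> shared{kOld};
    std::atomic<bool> done{false};

    // Thread A: keeps storing one of the two whole-word patterns.
    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i) {
            shared.store((i % 2 == 0) ? kNew : kOld,
                         std::memory_order_relaxed);
        }
        done.store(true, std::memory_order_release);
    });

    // Thread B (here, the calling thread): may see a stale value, but an
    // atomic load always returns a value that some store wrote in full.
    int torn = 0;
    while (!done.load(std::memory_order_acquire)) {
        std::uintptr_t seen = shared.load(std::memory_order_relaxed);
        if (seen != kOld && seen != kNew) {
            ++torn;
        }
    }
    writer.join();
    return torn;
}
```

Relaxed ordering asks for atomicity and nothing more, so the reader can still observe the old value after the writer has stored the new one. What changes is that the no-mixing property is now stated in the code instead of being assumed from the hardware.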