Isn’t this just a function of the parallax when rendering both screens?
Usually your brain learns a strong correspondence between focus (accommodation) and convergence, but this coupling can be unlearned quite easily, and indeed must be in order to view e.g. VR, 3D films, Magic Eye pictures, and the like, all of which encode depth through convergence while requiring your eyes to focus on a fixed plane. The mismatch is known as the vergence-accommodation conflict.
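To make the mismatch concrete, here's a rough sketch of the geometry: the vergence angle between the eyes changes with the virtual object's distance, while the headset's optics keep the focal distance constant. The IPD and focal-plane values below are illustrative assumptions, not specs of any particular headset.

```python
import math

def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Angle between the two eyes' lines of sight when fixating a point
    straight ahead at distance_m (symmetric case). 0.063 m is a typical
    adult interpupillary distance, used here as an assumption."""
    return math.degrees(2 * math.atan(ipd_m / (2 * distance_m)))

# Hypothetical fixed focal plane set by the headset's lenses: no matter
# where the virtual object appears, accommodation stays at this distance.
FOCAL_PLANE_M = 1.3

for d in (0.5, 1.3, 4.0):
    print(f"object at {d} m: vergence {vergence_angle_deg(d):.2f} deg, "
          f"focus fixed at {FOCAL_PLANE_M} m")
```

A near object demands a large vergence angle and a far object a small one, yet focus never moves; that disagreement between the two depth cues is exactly what the brain has to learn to tolerate.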
Solutions are being developed, but none has yet been miniaturised and cost-reduced enough to productise, and it's unclear how far away that is. Reality Labs has built several generations of varifocal prototypes that physically change the distance between the lenses and the displays, and alternative approaches such as light fields, which can display content at multiple focal planes simultaneously, are also being investigated.