We want to prove:

∫∫pxs(x,s)logpxs(x,s)ps(s)px(x)dxds+∫∫pys(y,s)logpys(y,s)ps(s)py(y)dyds≤∫∫∫pxys(x,y,s)logpxys(x,y,s)ps(s)pxy(x,y)dxdyds

This can be rewritten as:

∫∫∫pxys(x,y,s)logpsx(s,x)ps(s)px(x)dxdyds+∫∫∫pxys(x,y,s)logpsy(s,y)ps(s)py(y)dxdyds≤∫∫∫pxys(x,y,s)logpxys(x,y,s)ps(s)px(x)py(y)dxdyds

After moving everything to the right hand side and simplifying, we get:

∫∫∫pxys(x,y,s)logpxys(x,y,s)ps(s)pxs(x,s)pys(y,s)dxdyds≥0

Now if we just prove that q(x,y,s)=pxs(x,s)pys(y,s)ps(s) is a probability distribution, then the left hand side is KL(pxys(x,y,s),q) , and Kullback-Leibler divergence is always nonnegative.

Ok, q is obviously nonnegative, and its integral equals 1:

∫∫∫pxs(x,s)pys(y,s)ps(s)dxdyds=∫ps(s)ps(s)ps(s)ds=∫ps(s)ds=1

Q.e.d.

Just for amusement, I think this theorem can fail when s, x, y represent subsystems of an entangled quantum state. (The most natural generalization of mutual information to this domain is sometimes negative.)