If you have ever looked at an article on graphical models, you have probably come across these terms: conditional independence, or D-separation. But what are those?
Put simply, two variables X and Y are independent conditional on another variable Z if knowledge about X gives you no extra information about Y once you have knowledge of Z.
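In probability notation, this is written as P(X, Y | Z) = P(X | Z) P(Y | Z), or equivalently P(X | Y, Z) = P(X | Z): once Z is known, also observing Y does not change the distribution of X.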
Let's use a cooking metaphor here. I love cooking, and occasionally I use the good old method of pan frying, which also produces a nice sizzling sound (hmm... not a healthy cooking method, though!). The problem is that if the oil gets too hot, it will produce smoke, and this will in turn set off the fire alarm. Let's see how we can apply our common-sense reasoning in a probabilistic manner using this metaphor.
I will show conditional independence in three special cases, using three variables X, Y, and Z. Note that the same letter might refer to different things in different cases (e.g., X refers to the sizzling sound in the first case, but to pan frying in the second). In the figures, though, the orange ellipse always corresponds to Z.
Case 1: Let's assume that you already know that I am pan frying (Z) in the kitchen. If you now also hear the sizzling sound (X), it gives you no additional information about the alarm going off (Y), because you can already infer the alarm (Y) from the pan frying (Z). The same is true the other way around: if you have already seen me pan frying in the kitchen (Z) and you then hear the alarm going off (Y), the alarm (Y) gives you no additional information about the sizzling sound (X). We call this a tail-to-tail setting, as you have two arrow tails pointing out from Z.
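To make this concrete, here is a minimal Python simulation of the tail-to-tail setting. All the probability numbers are made up for illustration; the point is only the structure: X and Y are each noisy consequences of Z, so once we condition on Z, also observing X leaves our estimate of Y essentially unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Tail-to-tail: X <- Z -> Y.
# Z: pan frying, X: sizzling sound, Y: alarm.
z = rng.random(n) < 0.3                       # P(pan frying), made-up number
x = np.where(z, rng.random(n) < 0.90,         # sizzling is likely when frying
                rng.random(n) < 0.05)         # ...and rare otherwise
y = np.where(z, rng.random(n) < 0.40,         # alarm sometimes goes off when frying
                rng.random(n) < 0.01)         # ...and almost never otherwise

# Given Z, learning X should not change our belief about Y.
print(f"P(Y=1 | Z=1)      = {y[z].mean():.3f}")
print(f"P(Y=1 | Z=1, X=1) = {y[z & x].mean():.3f}")  # ~ same value
```

Running this prints two estimates that agree up to sampling noise (both around 0.4), which is exactly what conditional independence claims.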
Case 2: Let's see another example. Here, we will condition the two variables X (pan frying) and Y (alarm) on Z (smoke). In this example, if you already know about the smoke (Z), then the knowledge that somebody is pan frying (X) gives you no extra information about the alarm (Y). Likewise, if you have already seen lots of smoke (Z) in the kitchen, then hearing the alarm (Y) does not give you any extra information about the fact that I am pan frying in the kitchen (X). We call this setting head-to-tail, where the arrows form a chain.
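Here is the same kind of sketch for the head-to-tail chain X -> Z -> Y, again with made-up probabilities. Unconditionally X and Y are clearly dependent, but once we condition on the smoke Z, the pan frying X tells us nothing more about the alarm Y.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Chain X -> Z -> Y: pan frying causes smoke, smoke causes the alarm.
x = rng.random(n) < 0.3                       # P(pan frying), made-up number
z = np.where(x, rng.random(n) < 0.50,         # frying often produces smoke
                rng.random(n) < 0.02)
y = np.where(z, rng.random(n) < 0.80,         # smoke usually trips the alarm;
                rng.random(n) < 0.01)         # Y depends on X only through Z

# Unconditionally, X and Y are dependent...
print(f"P(Y=1)       = {y.mean():.3f}")
print(f"P(Y=1 | X=1) = {y[x].mean():.3f}")    # clearly different

# ...but once we condition on Z, X adds nothing about Y.
print(f"P(Y=1 | Z=1)      = {y[z].mean():.3f}")
print(f"P(Y=1 | Z=1, X=1) = {y[z & x].mean():.3f}")  # ~ same value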
Case 3: This case is a bit different. Consider the smoke (X), which triggers the fire alarm (Z). However, the fire alarm might sometimes go off because of an earthquake (Y; you know what I am talking about if you live in California). In this case, we can clearly say that X and Y are independent of each other (note that I did not mention Z at all). This makes sense intuitively, too: there is no reason an earthquake should be related to the smoke. However, if we take the alarm (Z) into account, then X and Y are no longer independent given Z! This again makes sense. Assume that you suddenly fall off your chair, the earth is shaking, and you know an earthquake is going on. At the same time, you hear the alarm. You will most probably assume that the alarm has gone off due to the earthquake, not the smoke. In other words, your knowledge of the earthquake, given the alarm, has affected the probability of the smoke: it has "explained it away". We call this setting head-to-head, as the two arrowheads meet at Z. This shows how observations can render otherwise independent causes statistically dependent!
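A final sketch shows explaining away numerically for the head-to-head setting X -> Z <- Y (probabilities again made up). Smoke and earthquakes are marginally independent, but given that the alarm is ringing, learning about the earthquake sharply lowers our belief in smoke.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Collider X -> Z <- Y: smoke and earthquakes both set off the alarm.
x = rng.random(n) < 0.10                      # P(smoke), made-up number
y = rng.random(n) < 0.01                      # P(earthquake), independent of smoke
z = np.where(x | y, rng.random(n) < 0.900,    # alarm likely given either cause
                    rng.random(n) < 0.001)

# Marginally, smoke and earthquakes are independent...
print(f"P(X=1)       = {x.mean():.3f}")
print(f"P(X=1 | Y=1) = {x[y].mean():.3f}")    # ~ same value

# ...but given the alarm, the earthquake "explains away" the smoke.
print(f"P(X=1 | Z=1)      = {x[z].mean():.3f}")
print(f"P(X=1 | Z=1, Y=1) = {x[z & y].mean():.3f}")  # much lower
```

With these numbers, P(X=1 | Z=1) comes out around 0.9, while P(X=1 | Z=1, Y=1) drops back to roughly the prior 0.1: observing the common effect Z has made the two independent causes dependent.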
Let's reiterate what we have seen so far: two variables X and Y are independent conditional on another variable Z if, once you know Z, learning X adds nothing to what you know about Y.
I will talk about D-separation in another post.