## Probability clustering: fuzzy clustering

In traditional clustering, each data point belongs either to cluster X or to cluster Y (assuming that you have two clusters). In a way, you get a binary result for each data point (1 if it belongs to cluster X, 0 if it does not). In many research scenarios, however, a probabilistic clustering can be more informative than the traditional one. To obtain it, fuzzy clustering is used instead of k-means. In this blog, I am going to explain the mathematics behind it (rather than doing it in Python), because I think we should understand what we do instead of doing blind data analysis. I used the example from this website; I just hope that I explained it in more detail, so that beginners like us can comprehend it better (but they did a great job for sure!).

**Randomly initialize the membership values for the planned number of clusters**: Each data point has a membership value in both clusters (let us name our clusters cluster X and cluster Y). For instance, if one of the data points is (1, 3), it can have an 80% membership value for cluster X and a 20% membership value for cluster Y. In the traditional methods, we would say that this data point belongs to cluster X; instead, we keep the probabilities. At the beginning of the process, the membership values can be anything. I will use the same data points as the example website, which are (1, 3), (2, 5), (4, 8), and (7, 9). The membership value of the first data point (1, 3) for cluster X is 0.8, and for cluster Y it is 0.2. All the data points and their membership values:

| Data point | Cluster X | Cluster Y |
|------------|-----------|-----------|
| (1, 3)     | 0.8       | 0.2       |
| (2, 5)     | 0.7       | 0.3       |
| (4, 8)     | 0.2       | 0.8       |
| (7, 9)     | 0.1       | 0.9       |
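This initialization step can be sketched in a few lines of NumPy (the array names here are my own, not from the original example):

```python
import numpy as np

# The four example data points from the post.
points = np.array([[1.0, 3.0], [2.0, 5.0], [4.0, 8.0], [7.0, 9.0]])

# Initial membership values: rows are data points, columns are the
# clusters X and Y. Each row sums to 1, as the post describes.
memberships = np.array([
    [0.8, 0.2],   # (1, 3)
    [0.7, 0.3],   # (2, 5)
    [0.2, 0.8],   # (4, 8)
    [0.1, 0.9],   # (7, 9)
])

print(memberships.sum(axis=1))  # each row sums to 1
```

In practice these values would be drawn at random and then normalized so each row sums to 1; the fixed numbers above just reproduce the worked example.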

**The second step is to find the centroids (V).** The formula for this is:

Vij = Σk (γik)^m · xkj / Σk (γik)^m

where γik is the membership value of data point k in cluster i, xkj is the j-th coordinate of data point k, and Vij is the j-th coordinate of the centroid of cluster i.

m stands for the fuzziness parameter (it is taken as 2 here; taking the fuzziness parameter as 2 is the general tendency, and if you want to learn more about it, here is one example paper), and xk symbolizes the data point. We need to multiply the squared membership value (0.8²) with the x coordinate of the data point (1). We do the same for all the data points (e.g., 0.7² · 2) and add up the values (0.8² · 1 + 0.7² · 2 + …). Then, this sum is divided by the sum of the squared membership values (the gamma values) for cluster X (0.8² + 0.7² + …). The resulting number is the x coordinate of the centroid. The same calculation is done with the y coordinates of the data points (e.g., 0.8² · 3), and the resulting number becomes the y coordinate of the centroid for cluster X. See the demonstration:

V11 = (0.8² · 1 + 0.7² · 2 + 0.2² · 4 + 0.1² · 7) / (0.8² + 0.7² + 0.2² + 0.1²) = 1.568

V12 = (0.8² · 3 + 0.7² · 5 + 0.2² · 8 + 0.1² · 9) / (0.8² + 0.7² + 0.2² + 0.1²) = 4.051

So, the first centroid (the centroid for cluster X) is (1.568, 4.051).

V21 = (0.2² · 1 + 0.3² · 2 + 0.8² · 4 + 0.9² · 7) / (0.2² + 0.3² + 0.8² + 0.9²) = 5.35

V22 = (0.2² · 3 + 0.3² · 5 + 0.8² · 8 + 0.9² · 9) / (0.2² + 0.3² + 0.8² + 0.9²) = 8.215

The second centroid (the centroid for cluster Y) is (5.35, 8.215).
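The centroid calculation above can be written compactly with NumPy (a minimal sketch; the function name and array layout are my own):

```python
import numpy as np

points = np.array([[1.0, 3.0], [2.0, 5.0], [4.0, 8.0], [7.0, 9.0]])
memberships = np.array([[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]])

def centroids(points, memberships, m=2):
    # Raise each membership value to the power m, then take the
    # weighted average of the data points for each cluster.
    w = memberships ** m                       # shape (n_points, n_clusters)
    return (w.T @ points) / w.sum(axis=0)[:, None]

V = centroids(points, memberships, m=2)
print(np.round(V, 2))  # [[1.57 4.05] [5.35 8.22]]
```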

**The third step is finding out the distance of each data point from each centroid.** The first data point was (1, 3) and the first centroid is (1.568, 4.051); the distance between these points is calculated. Then, the distance between (1, 3) and the second centroid (5.35, 8.215) is calculated. In the same fashion, the distance of each data point from each centroid is calculated. See below:

D11 = ((1 − 1.568)² + (3 − 4.051)²)^0.5 ≈ 1.2

D12 = ((1 − 5.35)² + (3 − 8.215)²)^0.5 ≈ 6.79
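These Euclidean distances can be computed for all points and centroids at once with broadcasting (again just a sketch using the numbers from the example):

```python
import numpy as np

points = np.array([[1.0, 3.0], [2.0, 5.0], [4.0, 8.0], [7.0, 9.0]])
V = np.array([[1.568, 4.051], [5.350, 8.215]])  # centroids from the previous step

# distances[k, i] is the Euclidean distance of data point k
# from the centroid of cluster i.
distances = np.linalg.norm(points[:, None, :] - V[None, :, :], axis=2)
print(np.round(distances[0], 2))  # distances of (1, 3): [1.19 6.79]
```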

**The fourth step is updating the membership values based on these distances from the centroids.** The formula (taken from the website cited in this blog) is:

γik = 1 / Σj (dik / djk)^(2 / (m − 1))

where dik is the distance of data point k from the centroid of cluster i. With m = 2, the exponent 2/(m − 1) is simply 2.

For the first data point (1, 3), the calculated distances are 1.2 and 6.79, and the fuzziness parameter was taken as 2.

γ11 = [ (1.2 / 1.2)² + (1.2 / 6.79)² ]⁻¹ = [1 + 0.031]⁻¹ ≈ 0.97

γ21 = [ (6.79 / 6.79)² + (6.79 / 1.2)² ]⁻¹ = [1 + 32.02]⁻¹ ≈ 0.03
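The membership update can be vectorized over all points and clusters (a sketch under the same m = 2 assumption; the helper name is my own):

```python
import numpy as np

# Distances of the first data point (1, 3) from the two centroids.
distances = np.array([[1.195, 6.790]])

def update_memberships(distances, m=2):
    # gamma[k, i] = 1 / sum_j (d[k, i] / d[k, j]) ** (2 / (m - 1))
    ratio = distances[:, :, None] / distances[:, None, :]
    return 1.0 / (ratio ** (2.0 / (m - 1))).sum(axis=2)

new_memberships = update_memberships(distances)
print(np.round(new_memberships, 2))  # [[0.97 0.03]]
```

The two new values always sum to 1, so each row of the membership matrix stays a valid probability distribution.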

**The fifth step is to repeat the second step (finding the centroids) and the fourth step (updating the membership values based on the distances of the data points from each centroid) until the membership values stabilize, i.e., until the change between iterations is lower than the tolerance value.**
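Putting the steps together, the whole loop can be sketched as follows (a minimal fuzzy c-means implementation under the assumptions above; the tolerance and iteration cap are illustrative choices, not from the original example):

```python
import numpy as np

def fuzzy_c_means(points, memberships, m=2, tol=1e-4, max_iter=100):
    """Repeat steps 2-4 until the membership values stabilize."""
    for _ in range(max_iter):
        w = memberships ** m
        V = (w.T @ points) / w.sum(axis=0)[:, None]        # step 2: centroids
        d = np.linalg.norm(points[:, None, :] - V[None, :, :], axis=2)  # step 3
        d = np.fmax(d, 1e-12)  # avoid division by zero if a point sits on a centroid
        new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        if np.abs(new - memberships).max() < tol:          # step 5: stabilized?
            return new, V
        memberships = new                                  # step 4: update
    return memberships, V

points = np.array([[1.0, 3.0], [2.0, 5.0], [4.0, 8.0], [7.0, 9.0]])
init = np.array([[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]])
gamma, V = fuzzy_c_means(points, init)
print(np.round(gamma, 2))
```

With this initialization, the first two points end up with high membership in cluster X and the last two in cluster Y.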

**The sixth step is to defuzzify the obtained membership values**, for example by assigning each data point to the cluster in which it has the highest membership value.
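The maximum-membership rule for defuzzification is a one-liner (the gamma values below are illustrative numbers of my own, not the converged output of the example):

```python
import numpy as np

# Hypothetical stabilized membership values for the four data points.
gamma = np.array([[0.97, 0.03], [0.95, 0.05], [0.10, 0.90], [0.02, 0.98]])

# Defuzzification: each point goes to the cluster where its
# membership value is highest (0 = cluster X, 1 = cluster Y).
labels = np.argmax(gamma, axis=1)
print(labels)  # [0 0 1 1]
```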