Gibbs Sampling

The corpus is now randomly initialized.
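
Random initialization can be sketched as follows. This is a minimal illustration, not the demo's own code. It assumes each document is a list of integer word ids, and it uses hypothetical names for the count tables: `n_dk` (words per document and topic), `n_kw` (occurrences per topic and word) and `n_k` (total words per topic).

```python
import random

def random_init(docs, num_topics, vocab_size, seed=0):
    """Hypothetical helper: give every word a uniformly random topic and build the counts.

    docs: list of documents, each a list of word ids in range(vocab_size).
    Returns (z, n_dk, n_kw, n_k):
      z[d][i]    topic assigned to the i-th word of document d
      n_dk[d][k] number of words in document d assigned to topic k
      n_kw[k][w] number of times word w is assigned to topic k
      n_k[k]     total number of words assigned to topic k
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(num_topics)]
    n_k = [0] * num_topics
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)  # uniform random topic for this word
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    return z, n_dk, n_kw, n_k
```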

In "corpus mode", each click of the iterate button runs one full pass over the corpus: the model resamples the topic of every word in every document once. This mode is designed to let users observe convergence more quickly.
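
One corpus-mode iteration corresponds to a single Gibbs sweep. The sketch below is an illustration, not the demo's implementation. It assumes integer word ids and count tables kept as nested Python lists (hypothetical names `n_dk`, `n_kw`, `n_k`), and it uses the standard collapsed Gibbs update for LDA. It lets `random.Random.choices` normalize the weights and draw the new topic.

```python
import random

def build_counts(docs, z, K, V):
    """Hypothetical helper: rebuild the count tables from the topic assignments z."""
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    return n_dk, n_kw, n_k

def gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, rng):
    """One 'corpus mode' iteration: resample every word of every document once."""
    K, V = len(n_k), len(n_kw[0])
    for d, doc in enumerate(docs):
        n_d = len(doc) - 1  # document length with the current word excluded
        for i, w in enumerate(doc):
            k_old = z[d][i]
            # Unlabel: remove the current assignment from the counts
            n_dk[d][k_old] -= 1
            n_kw[k_old][w] -= 1
            n_k[k_old] -= 1
            # Unnormalized weight of each topic (document term * word term)
            weights = [
                (n_dk[d][k] + alpha) / (n_d + K * alpha)
                * (n_kw[k][w] + beta) / (n_k[k] + V * beta)
                for k in range(K)
            ]
            # choices() normalizes the weights internally before drawing
            k_new = rng.choices(range(K), weights=weights)[0]
            # Relabel the word and restore the counts
            z[d][i] = k_new
            n_dk[d][k_new] += 1
            n_kw[k_new][w] += 1
            n_k[k_new] += 1
```

Calling `gibbs_sweep` repeatedly corresponds to clicking the iterate button repeatedly. As the sampler converges, the count tables typically settle into a stable pattern.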


In "word mode", each click of the iterate button resamples the topic of the current word only. This mode is designed to help users learn the details of model training step by step.

Step 1: Unlabel the current word, i.e. remove its current topic assignment from the counts, and continue.
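
Unlabeling amounts to decrementing three counts. In this sketch, `unlabel` and the count-table names are hypothetical. It assumes the same list-based tables as a typical LDA sampler: words per document and topic, occurrences per topic and word, and total words per topic.

```python
def unlabel(d, i, w, z, n_dk, n_kw, n_k):
    """Hypothetical helper for Step 1: remove word i of document d (word id w) from the counts."""
    k = z[d][i]
    n_dk[d][k] -= 1   # one fewer word of topic k in document d
    n_kw[k][w] -= 1   # one fewer occurrence of word w under topic k
    n_k[k] -= 1       # one fewer word assigned to topic k overall
    z[d][i] = None    # the word is temporarily unlabeled
    return k          # the old topic, in case the caller wants it
```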

Step 2: Iterate through the topics, taking topic 1 as an example.

Step 3: For the current document, calculate the proportion of its words assigned to topic 1, with α added for smoothing, using the formula:

(occurrences of current topic in current document + α) / (length of current document + α * number of topics)
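
Step 3 can be sketched as a one-line function. The name `doc_topic_term` is hypothetical. Both counts are assumed to exclude the word unlabeled in Step 1. The denominator uses the length of the current document, as in the standard collapsed Gibbs sampler for LDA; with that denominator the values sum to 1 across topics.

```python
def doc_topic_term(n_dk, n_d, alpha, num_topics):
    """Hypothetical helper for Step 3.

    n_dk: words in the current document assigned to the topic (current word excluded)
    n_d:  length of the current document (current word excluded)
    """
    return (n_dk + alpha) / (n_d + alpha * num_topics)

# Example: 9 remaining words, 3 of them on topic 1, alpha = 0.1, 3 topics
# (3 + 0.1) / (9 + 0.1 * 3) = 3.1 / 9.3 = 1/3
print(doc_topic_term(3, 9, 0.1, 3))
```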

Step 4: For the current word, calculate how strongly topic 1 is associated with it, i.e. the fraction of all words assigned to topic 1 that are this word, with β added for smoothing, using the formula:

(occurrences of current word assigned to current topic + β) / (total number of words assigned to current topic + β * size of vocabulary)
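
Step 4 can be sketched the same way. The name `word_topic_term` is hypothetical. The denominator here is the total number of words assigned to the topic (current word excluded), as in the standard collapsed Gibbs update. This is what pairs with β times the vocabulary size: it makes the values sum to 1 over the vocabulary.

```python
def word_topic_term(n_kw, n_k, beta, vocab_size):
    """Hypothetical helper for Step 4.

    n_kw: occurrences of the current word assigned to the topic (current word excluded)
    n_k:  total number of words assigned to the topic (current word excluded)
    """
    return (n_kw + beta) / (n_k + beta * vocab_size)

# Example: topic 1 holds 20 words, 4 of them are the current word,
# beta = 0.01, vocabulary of 5 words: 4.01 / 20.05 = 0.2
print(word_topic_term(4, 20, 0.01, 5))
```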

Step 5: Multiply the two values above and append the product to the distribution list.
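
The product in Step 5 can be written in the usual notation for collapsed Gibbs sampling in LDA. It is proportional to the probability that word i takes topic k, given all other topic assignments z_{-i} and the words w. The symbols are our own choice: n_{d,k} counts words of document d on topic k, n_d is the document length, n_{k,w_i} counts occurrences of word w_i on topic k, and n_k is the total number of words on topic k. The superscript -i means the current word is excluded. K is the number of topics and V the vocabulary size.

```latex
p(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \propto
\frac{n_{d,k}^{-i} + \alpha}{n_{d}^{-i} + K\alpha}
\cdot
\frac{n_{k,w_i}^{-i} + \beta}{n_{k}^{-i} + V\beta}
```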

Step 6: Repeat Steps 2-5 for each topic, which yields a raw (unnormalized) distribution over topics.
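
Steps 2-6 together can be sketched as a loop over topics. The function name `raw_distribution` and the list-based count tables are hypothetical. The counts are assumed to already exclude the unlabeled word, and `n_d` is the length of its document minus that word.

```python
def raw_distribution(d, w, n_dk, n_d, n_kw, n_k, alpha, beta):
    """Hypothetical helper: unnormalized topic weights for word id w in document d."""
    K, V = len(n_k), len(n_kw[0])
    dist = []
    for k in range(K):                                   # Step 2: loop over topics
        doc_part = (n_dk[d][k] + alpha) / (n_d + K * alpha)    # Step 3
        word_part = (n_kw[k][w] + beta) / (n_k[k] + V * beta)  # Step 4
        dist.append(doc_part * word_part)                # Step 5: multiply and append
    return dist                                          # Step 6: raw distribution

# Example with 2 topics, 3 vocabulary words, alpha = beta = 1:
# topic 0: (3/5) * (3/5) = 0.36, topic 1: (2/5) * (1/4) = 0.1
dist = raw_distribution(0, 0, [[2, 1]], 3, [[2, 0, 0], [0, 1, 0]], [2, 1], 1.0, 1.0)
```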

Step 7: Normalize the distribution so that it sums to 1.
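
Normalization divides each raw value by their sum. The helper name `normalize` is hypothetical. It assumes the raw values are positive, which holds whenever α and β are positive.

```python
def normalize(raw):
    """Hypothetical helper for Step 7: scale positive weights so they sum to 1."""
    total = sum(raw)
    return [p / total for p in raw]

# normalize([0.36, 0.1]) is roughly [0.783, 0.217]
probs = normalize([0.36, 0.1])
```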

Step 8: Randomly draw a topic from the normalized distribution and label the current word with it.
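
Python's `random.choices` could perform the draw directly. To make the step explicit, this sketch instead uses inverse-CDF sampling: it walks the cumulative probabilities until they exceed a uniform random number. It then relabels the word and restores the counts. The names `draw_topic` and `relabel`, and the list-based count tables, are hypothetical.

```python
import random

def draw_topic(probs, rng):
    """Hypothetical helper for Step 8: inverse-CDF draw from a normalized distribution."""
    u = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against floating-point round-off in the sum

def relabel(d, i, w, k, z, n_dk, n_kw, n_k):
    """Hypothetical helper: label word i of document d (word id w) with topic k."""
    z[d][i] = k
    n_dk[d][k] += 1
    n_kw[k][w] += 1
    n_k[k] += 1
```

A topic with probability 0.75 should be drawn roughly three times out of four over many draws.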