This script attempts to compare two Pali texts for differences. The steps it takes are roughly as follows:
- remove non-Pali characters from each input
- break each input into words
- compare the two inputs word by word until a difference is found
- at each difference, attempt three different resolutions concurrently:
  - method one: assume the two words are the same word, spelled differently
  - method two: assume the word from the first input is extraneous, and skip it
  - method three: assume the word from the second input is extraneous, and skip it
- after each of the three resolution methods is applied, the same three methods are applied again at the next pair of parallel words, and so on until both inputs are exhausted
- a comparison score (0-100) is assigned to each word in each input for each method chosen, and the scores are then averaged over the total number of words
- the best-scoring result is used to color each word in each input, and the colored text is written to the output divs
- Note: the strict checkbox applies to methods two and three above. When it is checked, the word after the skipped word is used only if it exactly matches the word in the other input; otherwise it is skipped as well. When it is unchecked, the next word only has to score higher than method one would. In either case, words keep being skipped until one that meets the criterion is found.
- Note: this script is resource intensive. It attempts to use multiple threads, but in practice the work seems to stay on a single thread, for reasons not yet identified. Try shorter strings first.
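
The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the script's actual code, and it rests on several assumptions: the letter set in `PALI_LETTERS`, the use of `difflib.SequenceMatcher` as the 0-100 word score, a score of 0 for skipped words, and the names `tokenize`, `word_score` and `compare` are all hypothetical. It also runs the three branches one after another with memoization instead of concurrently, and it uses recursion, so it is only suitable for short inputs.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from functools import lru_cache

# Assumption: romanized Pali letters; the real script may accept a different set.
PALI_LETTERS = "aāiīuūeokgṅcjñṭḍṇtdnpbmyrlvsḷṃṁh"


def tokenize(text):
    """Lowercase, drop characters outside PALI_LETTERS, split into words."""
    text = unicodedata.normalize("NFC", text.lower())
    return re.sub(f"[^{PALI_LETTERS}\\s]", "", text).split()


def word_score(a, b):
    """Similarity of two words on a 0-100 scale (assumed metric)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def compare(first, second, strict=True):
    """Return (average score, steps); each step is (word_a, word_b, score)."""
    a, b = tuple(tokenize(first)), tuple(tokenize(second))

    @lru_cache(maxsize=None)
    def best(i, j):
        # One input exhausted: every remaining word is unmatched and scores 0.
        if i == len(a) or j == len(b):
            rest = tuple((w, None, 0.0) for w in a[i:])
            rest += tuple((None, w, 0.0) for w in b[j:])
            return 0.0, rest

        def pair(i2, j2, skipped):
            # Pair a[i2] with b[j2]; both words receive the same score.
            s = word_score(a[i2], b[j2])
            total, steps = best(i2 + 1, j2 + 1)
            return 2 * s + total, skipped + ((a[i2], b[j2], s),) + steps

        if a[i] == b[j]:
            return pair(i, j, ())  # no difference here, no branching

        same = word_score(a[i], b[j])

        def ok(x, y):
            # strict: exact match; otherwise: must beat method one's score
            return x == y if strict else word_score(x, y) > same

        options = [pair(i, j, ())]  # method one: same word, spelled differently

        # method two: skip words of the first input until one fits b[j]
        k = next((k for k in range(i + 1, len(a)) if ok(a[k], b[j])), None)
        if k is not None:
            options.append(pair(k, j, tuple((w, None, 0.0) for w in a[i:k])))

        # method three: skip words of the second input until one fits a[i]
        k = next((k for k in range(j + 1, len(b)) if ok(a[i], b[k])), None)
        if k is not None:
            options.append(pair(i, k, tuple((None, w, 0.0) for w in b[j:k])))

        return max(options, key=lambda option: option[0])

    total, steps = best(0, 0)
    count = len(a) + len(b)
    # Assumption: two empty inputs count as identical.
    return (total / count if count else 100.0), list(steps)


average, steps = compare("namo tassa bhagavato", "namo bhagavato")
for word_a, word_b, score in steps:
    print(word_a, word_b, round(score))
print("average:", average)
```

In the usage example, `tassa` has no counterpart in the second input, so the best-scoring path skips it under method two and pairs the remaining words exactly. The per-step scores are what a front end could map to colors. Memoizing on the pair of positions keeps the number of distinct subproblems bounded by the product of the two word counts, which may be one way to reduce the cost mentioned in the last note.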