String Tools applies various algorithms to strings. In particular, it can evaluate the similarity of two strings, find a substring in a string or in a resource identified by a URL using POSIX regular expressions, and encode strings.
* The String Similarity screen supports the Levenshtein String Edit Distance, the Bigram Distance, Jaro-Winkler, Smith-Waterman and the Dice Coefficient, and returns a normalized similarity in percent for each of them. Such distance metrics are often applied in record linkage and in protein sequence similarity computation. The String Edit Distance is the minimal number of edit operations (insertions, deletions and substitutions of single characters) required to transform the first string into the second. The Bigram Distance counts the 2-grams that both input strings share, treating the word start and word end as symbols of their own. The Jaro Score evaluates whether the strings have common characters in close vicinity. Jaro-Winkler builds on the Jaro Score and assigns a higher value to strings that share a common prefix. Smith-Waterman weights edit operations differently from Levenshtein and returns the highest value of the generated result matrix as the unnormalized similarity. The Dice Coefficient is based on the number of characters the two strings have in common.
* The Regular Expression screen evaluates extended POSIX regular expressions. As input, you can provide either a string or a URL that references a source. The application returns the first match found in the input. This is useful for testing whether a particular regular expression works on a given string as intended, as well as for finding particular words in a web source.
* The String Encoding screen encodes strings using URL encoding (with and without reserved characters) and Base64 encoding.
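To make the String Similarity metrics more concrete, here is a minimal Python sketch of the edit distance and the shared-bigram count. It is not the application's code. The function names, the `#` padding symbol, the treatment of the whole input as one word, and the normalization by the longer string's length are all assumptions made for illustration; the application may normalize differently.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimal number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> float:
    """Assumed normalization: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def shared_bigrams(a: str, b: str, pad: str = "#") -> int:
    """Count 2-grams common to both strings; `pad` marks start and end (assumed symbol)."""
    def grams(s: str) -> Counter:
        padded = pad + s + pad
        return Counter(padded[i:i + 2] for i in range(len(padded) - 1))
    # Counter intersection keeps the minimum count of each shared 2-gram.
    return sum((grams(a) & grams(b)).values())


print(levenshtein("kitten", "sitting"))
print(round(similarity_percent("kitten", "sitting"), 1))
print(shared_bigrams("night", "nacht"))
```

For "kitten" and "sitting" the edit distance is 3 (two substitutions and one insertion). "night" and "nacht" share the padded 2-grams `#n`, `ht` and `t#`.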
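The behaviour of the Regular Expression screen, returning the first match from a string or from a URL source, could be approximated as follows. This is only a sketch: Python's `re` module is not a POSIX extended regular expression engine (for example, it does not apply POSIX leftmost-longest matching), and the helper names `first_match` and `first_match_in_url`, the timeout, and the UTF-8 default are assumptions.

```python
import re
from typing import Optional
from urllib.request import urlopen


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first (leftmost) match of `pattern` in `text`, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


def first_match_in_url(pattern: str, url: str, encoding: str = "utf-8") -> Optional[str]:
    """Fetch the resource behind `url` and search its text (assumed to decode as `encoding`)."""
    with urlopen(url, timeout=10) as response:
        text = response.read().decode(encoding, errors="replace")
    return first_match(pattern, text)


print(first_match(r"[0-9]+", "order 42, item 7"))
```

Only the first of the two numbers is returned, which mirrors the "first match" behaviour described for the screen.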
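The encodings offered by the String Encoding screen correspond to standard library functions in most languages. The sketch below uses Python's `urllib.parse.quote` and `base64.b64encode`. It assumes that "with and without reserved characters" means either percent-encoding the reserved characters of RFC 3986 or leaving them untouched; the application's exact character set is not documented here, and the function names are illustrative.

```python
import base64
from urllib.parse import quote

# Reserved characters as listed in RFC 3986 (gen-delims and sub-delims).
RESERVED = ":/?#[]@!$&'()*+,;="


def url_encode(text: str, encode_reserved: bool = True) -> str:
    """Percent-encode `text`; optionally leave reserved characters as they are."""
    safe = "" if encode_reserved else RESERVED
    return quote(text, safe=safe)


def base64_encode(text: str) -> str:
    """Base64-encode the UTF-8 bytes of `text`."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


print(url_encode("a b&c=d/e"))                         # a%20b%26c%3Dd%2Fe
print(url_encode("a b&c=d/e", encode_reserved=False))  # a%20b&c=d/e
print(base64_encode("hello"))                          # aGVsbG8=
```

Encoding the reserved characters is the appropriate choice for a single query parameter value, while leaving them intact suits a complete URL whose structure should be preserved.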