FAQs
What are the differences between the Tanimoto and Dice similarity coefficients?
Solution
Note: The following information applies only to WebCSD version 1 (legacy WebCSD).
The Tanimoto coefficient is determined by comparing the number of chemical features that are common to both molecules (the intersection of the data strings) with the number of chemical features that are present in either molecule (the union of the data strings). The Dice coefficient compares the same quantities but weights them slightly differently.
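The Tanimoto coefficient is the ratio of the number of features common to both molecules to the total number of distinct features present in either molecule. Where A and B are the numbers of features in the two molecules, and ( A intersect B ) is the number of features they share:
( A intersect B ) / ( A + B - ( A intersect B ) )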
The range is 0 to 1 inclusive.
The Dice coefficient is the number of features common to both molecules relative to the average number of features in the two molecules, i.e.
( A intersect B ) / ( 0.5 ( A + B ) )
The weighting factor comes from the 0.5 in the denominator. The range is again 0 to 1 inclusive.
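As an illustration only, the two formulas can be sketched in Python. This is not WebCSD's implementation: representing each molecule as a set of feature labels, the feature names, the function names and the choice to return 0.0 when both sets are empty are all assumptions made for this example.

```python
def tanimoto(features_a: set, features_b: set) -> float:
    """( A intersect B ) / ( A + B - ( A intersect B ) )"""
    common = len(features_a & features_b)
    union = len(features_a) + len(features_b) - common
    # Assumption: two empty feature sets score 0.0 rather than raising an error.
    return common / union if union else 0.0


def dice(features_a: set, features_b: set) -> float:
    """( A intersect B ) / ( 0.5 ( A + B ) )"""
    common = len(features_a & features_b)
    total = len(features_a) + len(features_b)
    # Assumption: same convention as above for two empty feature sets.
    return common / (0.5 * total) if total else 0.0


# Hypothetical feature sets: 4 features, 3 features, 2 in common.
mol_a = {"C=O", "N-H", "O-H", "aromatic ring"}
mol_b = {"C=O", "N-H", "C-Cl"}

print(f"Tanimoto: {tanimoto(mol_a, mol_b):.3f}")  # 2 / (4 + 3 - 2) = 0.400
print(f"Dice:     {dice(mol_a, mol_b):.3f}")      # 2 / (0.5 * 7)   = 0.571
```

For the same pair of molecules the Dice value is never lower than the Tanimoto value, since the two are related by Dice = 2 x Tanimoto / ( 1 + Tanimoto ). Both give 0 when no features are shared and 1 when the feature sets are identical.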