The Cambridge Crystallographic Data Centre (CCDC).

What are the differences between the Tanimoto and Dice similarity coefficients?

Solution

Note: The following information applies only to WebCSD version 1 (legacy WebCSD).

The Tanimoto coefficient is determined by comparing the number of chemical features common to both molecules (the intersection of the data strings) with the number of chemical features present in either molecule (the union of the data strings). The Dice coefficient compares the same quantities but weights them slightly differently.

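The Tanimoto coefficient is the ratio of the number of features common to both molecules to the total number of distinct features in either molecule. If A and B are the numbers of features in each molecule and ( A intersect B ) is the number they share, this is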

( A intersect B ) / ( A + B - ( A intersect B ) )

The range is 0 to 1 inclusive.
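As a rough illustration of the formula above, here is a minimal Python sketch that treats each molecule as a set of feature labels. The function name `tanimoto`, the feature labels, and the choice to return 0.0 when both sets are empty are assumptions made for this example; it is not meant to describe how WebCSD itself computes the value.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient for two collections of feature identifiers."""
    a, b = set(features_a), set(features_b)
    common = len(a & b)                  # ( A intersect B )
    total = len(a) + len(b) - common     # A + B - ( A intersect B )
    # Assumption: two empty feature sets score 0.0 to avoid dividing by zero.
    return common / total if total else 0.0


# Hypothetical feature labels, for illustration only.
mol_1 = {"C=O", "N-H", "aromatic ring", "O-H"}
mol_2 = {"aromatic ring", "O-H", "C-Cl", "C#N"}

# 2 features in common out of 6 distinct features overall.
print(tanimoto(mol_1, mol_2))
```

Identical feature sets give 1 and sets with nothing in common give 0, which matches the stated range.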

The Dice coefficient is the number of features common to both molecules relative to the average number of features in the two molecules, i.e.

( A intersect B ) / ( 0.5 ( A + B ) )

The weighting factor comes from the 0.5 in the denominator. The range is 0 to 1 inclusive.
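To make the difference in weighting concrete, the sketch below computes both coefficients for the same pair of feature sets. The function names, the integer feature identifiers, and the handling of empty sets are assumptions for illustration only, not the WebCSD implementation. It also checks the algebraic identity Dice = 2T / (1 + T), where T is the Tanimoto value, which follows directly from the two formulas.

```python
def dice(features_a, features_b):
    """Dice coefficient: shared features over the mean feature count."""
    a, b = set(features_a), set(features_b)
    common = len(a & b)                       # ( A intersect B )
    mean_size = 0.5 * (len(a) + len(b))       # 0.5 ( A + B )
    # Assumption: two empty feature sets score 0.0 to avoid dividing by zero.
    return common / mean_size if mean_size else 0.0


def tanimoto(features_a, features_b):
    """Tanimoto coefficient: shared features over all distinct features."""
    a, b = set(features_a), set(features_b)
    common = len(a & b)
    total = len(a) + len(b) - common          # A + B - ( A intersect B )
    return common / total if total else 0.0


# Hypothetical feature identifiers: 4 features each, 2 shared.
mol_1 = {1, 2, 3, 4}
mol_2 = {3, 4, 5, 6}

t = tanimoto(mol_1, mol_2)   # 2 / 6
d = dice(mol_1, mol_2)       # 2 / 4
print(f"Tanimoto = {t:.3f}, Dice = {d:.3f}")

# The two measures are related by Dice = 2T / (1 + T).
print(abs(d - 2 * t / (1 + t)) < 1e-12)
```

For any given pair the Dice value is never smaller than the Tanimoto value, and the two agree only at 0 and 1, so both coefficients rank hits in the same order even though the numerical scores differ.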