The Hellinger distance is a metric used to measure the similarity between two probability distributions. It is closely related to the Euclidean distance, applied to the square roots of the probability densities rather than to the densities themselves. The Hellinger distance ranges between 0 and 1, where 0 indicates that the two distributions are identical and 1 indicates that they do not overlap at all.
Let P and Q denote two probability measures on a measure space X with corresponding probability density functions p(x) and q(x). The Hellinger distance H(P, Q) between these two distributions is defined as follows:
For continuous probability distributions:
H(P, Q) = (1 / sqrt(2)) * [∫ (sqrt(p(x)) - sqrt(q(x)))^2 dx]^(1/2)
For discrete probability distributions:
H(P, Q) = (1 / sqrt(2)) * sqrt(∑ (sqrt(p_i) - sqrt(q_i))^2)
Alternatively, the discrete version can be written more compactly as:
H(P, Q) = (1 / sqrt(2)) * ||sqrt(P) - sqrt(Q)||_2
where sqrt(P) denotes the element-wise square root of the probability vector and || · ||_2 denotes the Euclidean norm.
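To make these formulas concrete, here is a minimal sketch of the discrete version in Python, assuming NumPy is available; the example distributions p and q are illustrative placeholders.

```python
import numpy as np

def hellinger(p, q):
    """Discrete Hellinger distance: (1/sqrt(2)) * ||sqrt(p) - sqrt(q)||_2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

p = [0.1, 0.4, 0.5]
q = [0.2, 0.3, 0.5]
print(hellinger(p, q))            # small positive value: similar distributions
print(hellinger(p, p))            # 0.0: identical distributions
print(hellinger([1, 0], [0, 1]))  # 1.0: distributions with no overlap
```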
Properties of the Hellinger Distance
• Bounded Metric: The Hellinger distance satisfies the property 0 ≤ H(P, Q) ≤ 1.
• Symmetry: It is symmetric, meaning H(P, Q) = H(Q, P).
• Metric Properties: It satisfies the triangle inequality, making it a true metric.
• Maximum Distance: The maximum value of 1 occurs when the two distributions have no overlap, i.e. they assign positive probability to disjoint sets (a short numerical check of these properties is shown below).
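A quick numerical check of these properties on randomly generated discrete distributions, assuming the hellinger() sketch above; the Dirichlet sampler is simply a convenient way to draw valid probability vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    # Three random 5-bin probability distributions.
    p, q, r = rng.dirichlet(np.ones(5), size=3)
    assert 0.0 <= hellinger(p, q) <= 1.0 + 1e-12                         # bounded
    assert np.isclose(hellinger(p, q), hellinger(q, p))                  # symmetric
    assert hellinger(p, r) <= hellinger(p, q) + hellinger(q, r) + 1e-12  # triangle inequality
```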
Relationship with Other Measures
The Hellinger distance is closely related to the Bhattacharyya coefficient BC(P, Q): for discrete distributions, H(P, Q) = sqrt(1 - BC(P, Q)), where BC(P, Q) = ∑ sqrt(p_i q_i). Unlike the Bhattacharyya distance, however, the Hellinger distance satisfies the triangle inequality and is therefore a proper metric, which makes it easier to work with in certain mathematical contexts (see the sketch below).
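The sketch below illustrates this identity numerically, reusing the hellinger() function and the example distributions from the first sketch (both are assumptions of this illustration, not part of the original text).

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    """Bhattacharyya coefficient: sum_i sqrt(p_i * q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(np.sqrt(p * q))

p = [0.1, 0.4, 0.5]
q = [0.2, 0.3, 0.5]
bc = bhattacharyya_coefficient(p, q)
print(np.sqrt(1.0 - bc))  # matches hellinger(p, q) from the earlier sketch
```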
Application
The Hellinger distance is used in various fields, including statistical inference, machine learning, and information theory, to quantify the difference between probability distributions. It is especially useful in scenarios where one needs to measure the divergence between two distributions that might represent different scenarios, populations, or experimental conditions.
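As a hypothetical illustration, the sketch below compares the empirical distributions of a quantity observed under two conditions (a baseline sample and a shifted sample) by binning both into a shared histogram and applying the hellinger() function from the first sketch; the samples and bin edges are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(42)
sample_a = rng.normal(loc=0.0, scale=1.0, size=10_000)  # baseline condition
sample_b = rng.normal(loc=0.5, scale=1.2, size=10_000)  # shifted condition

bins = np.linspace(-5, 5, 41)      # shared binning for both samples
p, _ = np.histogram(sample_a, bins=bins)
q, _ = np.histogram(sample_b, bins=bins)
p = p / p.sum()                    # normalise counts to probabilities
q = q / q.sum()

print(hellinger(p, q))  # larger values indicate greater divergence between conditions
```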