package tdigest
Install
Dune Dependency
Authors
Maintainers
Sources
md5=d17532085ce53b390d1b9446a176337c
sha512=ec193b47c9167aceaec593de9fd4c98baf2f5831acec4ba851e59c05738a6f24d921aa79488a08f68a714db6e1e03f590f9f9f1be56970ba84336e3cada06ff6
Description
The T-Digest is a data structure and algorithm for constructing an approximate distribution for a collection of real numbers presented as a stream.
The T-Digest can estimate percentiles or quantiles extremely accurately even at the tails, while using a fraction of the space.
Additionally, the T-Digest is concatenable, making it a good fit for distributed systems. The internal state of a T-Digest can be exported as a binary string, and the concatenation of any number of those strings can then be imported to form a new T-Digest.
Published: 28 Aug 2022
README
Tdigest
OCaml implementation of the T-Digest algorithm.
The T-Digest is a data structure and algorithm for constructing an approximate distribution for a collection of real numbers presented as a stream.
The median of a list of medians is not necessarily equal to the median of the whole dataset. The median (p50), p95, and p99 are critical measures that are expensive to compute due to their requirement of having the entire sorted dataset present in one place.
The T-Digest can estimate percentiles or quantiles extremely accurately even at the tails, while using a fraction of the space.
Additionally, the T-Digest is concatenable, making it a good fit for distributed systems. The internal state of a T-Digest can be exported as a binary string, and the concatenation of any number of those strings can then be imported to form a new T-Digest.
Links:
This library started off as a port of Will Welch's JavaScript implementation, down to the unit tests. However some modifications have been made to adapt it to OCaml, the most important one being immutability. As such, almost every function in the Tdigest
module return a new Tdigest.t
, including "reading" ones since they may trigger expensive intermediate computations worth caching.
Usage
The API is well documented here.
opam install tdigest
Performance
On a 2018 MacBook Pro, it can incorporate 1,000,000 random floating points in just 770ms.
Exporting and importing state (to_string
/of_string
) is essentially free.
Dependencies (3)
-
core_kernel
>= "v0.14.0" & < "v0.15.0"
-
dune
>= "1.9.0"
-
ocaml
>= "4.10.0"
Used by
None
Conflicts
None