package duff
Install
Dune Dependency
Authors
Maintainers
Sources
sha256=4795e8344a2c2562e0ef6c44ab742334b5cd807637354715889741b20a461da4
sha512=a2771c2d045059eef991afcad5a6c8513c317bb928caf8c9c33d18c0df9a4b284da8f2b278d7e952605e74c3211a54ea683f107b0205719f5d19b0120b32b7e9
Description
This library provides a pure implementation of Rabin's fingerprint and diff algorithm in OCaml.
It follows libXdiff C library. It is used by ocaml-git project.
Published: 15 Mar 2021
README
Duff – libXdiff implementation in OCaml
Duff is a little library to implement libXdiff in OCaml. This library is a part of the ocaml-git project. This code is a translation of diff-delta.c
available on the git project in OCaml. So, it respects some git's constraints unlike libXdiff.
Examples
This library let the user to calculate an index
from a source (a hash-table) which can be computed with a blob. Then, from index
(which represents your source) and a blob, we generate a list of Copy
and Insert
elements.
Copy (off, len)
means to take a slice oflen
bytes from your source atoff
(absolute offset) and copy it.Insert (off, len)
means to store a slice oflen
bytes from your blob atoff
(absolute offset) and copy it.
From this information, we can have a tiny representation of your blob which can be reconstruct with your source. The goal is to store Copy
opcode with off
and len
, and Insert
opcode which contains a slice of your blob.
Finally, to produce a PACK file in git or ocaml-git, we use this algorithm and this representation to optimize storage of your blobs (cf. git gc
).
Binary
You can see an example of duff
in bin
directory. It's an executable to represent a thin representation of your file. Then, you can reconstruct it with patch
sub-command.
This is an example to use duff
:
$ ./duff.exe diff source target > target.xduff
$ ./duff.exe patch source < target.xduff > target.new
$ diff target target.new
$ echo $?
0
The internal format used is close to what git
does internally (without zlib
layer). However, it does not correspond to an official format. The binary is not optimized to be used in a production environment but feedback and improvement on it are welcome.
Limitations
Because this project is used by ocaml-git, we have some limitations:
We compute at most
0xFFFFFFFE
bytes from sourceAn
insert
block can not be bigger than0x10000
bytes
For example, libXdiff computes a bigger source than this implementation. Then, limitation about insert
block depends on the PACK (git) file format. So, don't ask me to compute bigger source or merge and produce bigger insert
block - these constraints is outside the scope of this library.
From this limitation, Copy
opcode have an offset between 0x0 and 0xFFFFFFE and off + len
is lower than 0xFFFFFFFE.
Fuzzer
We provide a fuzzer to randomly test this library. Currently (4/9/2018), afl-fuzz
did not find any bugs and it computed 67.7k cycles (117 paths).
Dependencies (5)
- stdlib-shims
- bigarray-compat
- fmt
-
dune
>= "2.0.0"
-
ocaml
>= "4.03.0" & < "5.1"
Dev Dependencies (4)
-
crowbar
with-test
-
hxd
with-test & >= "0.3.1"
-
bigstringaf
with-test
-
alcotest
with-test
Used by (1)
-
carton
< "0.4.4"
Conflicts
None