package asak
Sources
md5=542e0d58dca1edefbc75e58c141b9cd2
sha512=aba0cd40520c69ebfc9004af4462ceb97db9aa0f403c87137df688e695b537a76e9a34926172db2d409e934716d58d58b2e77dc7b02b91670f49caf8d619d666
Description
Asak provides functions to parse, type-check and identify similar pieces of OCaml code. These functions are then used to partition code fragments implementing the same function and to help detect code duplication.
Published: 31 Jul 2024
README
asak
Asak is an OCaml library for identifying similar pieces of OCaml code.
Why?
For teaching
The module Asak.Partition offers a function create that produces a partition of code fragments implementing the same function, where two fragments are in the same class if they are syntactically "close".
For redundancy detection
The binary anzad can detect redundant definitions in an OCaml project built with dune and compare them with a database of previously analyzed projects.
To use it on a project with sources in src/, run:
dune build @check
anzad src/
Documentation
The documentation of the API is available here: https://nobrakal.github.io/asak/asak/.
A man page is available for the binary anzad.
How?
The idea is to compare the ASTs (Abstract Syntax Trees) of programs. However, the OCaml AST is too rich for our purpose (for example, match x with ... and function ... generate two different ASTs). We decided instead to use the Lambda language, an intermediate language in the OCaml compilation pipeline, where such syntactic sugar has been eliminated.
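To make the point about syntactic sugar concrete, here is a small illustration. The two definitions below are hypothetical examples (their names are not part of asak): they are written differently at the source level, so their OCaml ASTs differ, yet they compute the same thing.

```ocaml
(* Written with an explicit argument and `match ... with`. *)
let is_zero_match x =
  match x with
  | 0 -> true
  | _ -> false

(* Written with the `function` keyword: same behaviour, different source AST. *)
let is_zero_function = function
  | 0 -> true
  | _ -> false
```

A comparison done directly on the source AST would have to treat these two forms specially, which is the motivation given above for working on the Lambda language instead.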
To compare Lambda trees efficiently, we use the methodology of Chilowicz et al., which consists in recursively hashing trees.
We then compare hashes and provide a clustering of the closest functions.
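The recursive hashing idea can be sketched on a toy tree. This is a minimal illustration, not asak's implementation: the tree type, the function hash_tree, the choice of MD5 through the standard Digest module, and the decision to ignore leaf names are all assumptions made for this example.

```ocaml
(* Toy expression tree: a stand-in for Lambda.lambda, not the real type. *)
type tree =
  | Leaf of string
  | Node of string * tree list

(* Returns (hash of the tree, weight of the tree, hashes of the strict
   sub-trees whose weight is at least [threshold]). *)
let rec hash_tree threshold t =
  match t with
  | Leaf _ ->
    (* Assumption of this sketch: leaves are hashed by kind only,
       so renaming a variable does not change the hash. *)
    (Digest.string "leaf", 1, [])
  | Node (label, children) ->
    let results = List.map (hash_tree threshold) children in
    let child_hashes = List.map (fun (h, _, _) -> h) results in
    (* Weight = one for this node plus the weights of the children. *)
    let weight = List.fold_left (fun acc (_, w, _) -> acc + w) 1 results in
    (* The hash of a node depends on its label and its children's hashes. *)
    let h = Digest.string (String.concat "" (label :: child_hashes)) in
    let subs =
      List.concat_map
        (fun (h, w, s) -> if w >= threshold then h :: s else s)
        results
    in
    (h, weight, subs)

let root_hash t = let (h, _, _) = hash_tree 2 t in h

(* [a] and [b] differ only by variable names; [c] uses another operator. *)
let a = Node ("apply", [Leaf "x"; Node ("add", [Leaf "y"; Leaf "z"])])
let b = Node ("apply", [Leaf "u"; Node ("add", [Leaf "v"; Leaf "w"])])
let c = Node ("apply", [Leaf "x"; Node ("mul", [Leaf "y"; Leaf "z"])])

let same_shape t1 t2 = Digest.equal (root_hash t1) (root_hash t2)
```

In this sketch, same_shape a b holds because the two trees have the same shape, while a and c get different hashes. Keeping the hashes of sufficiently large sub-trees is what makes it possible to detect that two trees share a sub-tree even when their roots differ.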
But really, how?
There are two core modules:
- Asak.Lambda_hash, which defines a function hash_lambda : config -> threshold -> Lambda.lambda -> hash that hashes a Lambda expression, capturing the shape of the AST with respect to the given configuration. We recommend first applying the functions from Asak.Lambda_normalization to normalize the Lambda expression.
- Asak.Clustering, which defines a function cluster : ('a * Lambda_hash.hash) list -> ('a list) wtree list that performs a kind of complete-linkage clustering of a list of hashes. The output is a dendrogram where leaves are close in a tree if they are similar. It is guaranteed that two code fragments in the same tree share at least one sub-AST.
More details
A paper (in French) about asak was published in the proceedings of the JFLA (Journées Francophones des Langages Applicatifs) 2020. It can be found here: https://github.com/nobrakal/asak-paper/
The name
This tool is about making partitions. "Partition" is the French word for "sheet music". Consequently, its name is about music: asak is the name of traditional Tuareg songs accompanied by a monochord violin.
This monochord violin is called an anzad, which is also the name of asak's binary client.
Resources
Inzad is an experiment to detect OCaml code similarities, based on Asak's ideas.
License and copyright
Asak is released under the MIT license. The copyright is held by IRIF / OCaml Software Foundation.
Authors
Asak is developed by Alexandre Moine.
Used by (2)
- learn-ocaml (>= "0.12" & != "0.16.0")
- learn-ocaml-client (!= "0.16.0")
Conflicts
None