Value converters.
A value converter describes how to encode and decode OCaml values to and from a binary representation and a textual, human-specifiable, s-expression based representation.
Notation. Given a value v and a converter c, we write [v]_c for the textual encoding of v according to c.
The exception for conversion errors. This exception is raised by both encoders and decoders with raise_notrace. On decoding errors the integers indicate a byte index range in the input; on encoding errors they are meaningless.
Note. This exception is used for defining converters. High-level converting functions do not raise but use result values to report errors.
module Bin : sig ... end
Binary codecs.
module Txt : sig ... end
Textual codecs.
val v :
kind:string ->
docvar:string ->
'a Bin.enc ->
'a Bin.dec ->
'a Txt.enc ->
'a Txt.dec ->
'a t
v ~kind ~docvar bin_enc bin_dec txt_enc txt_dec is a value converter using bin_enc, bin_dec, txt_enc and txt_dec for binary and textual conversions. kind documents the kind of converted values and docvar is a meta-variable used in documentation to stand for these values (use uppercase, e.g. INT for integers).
val kind : 'a t -> string
kind c is the documented kind of value converted by c.
val docvar : 'a t -> string
docvar c is the documentation meta-variable for values converted by c.
with_kind ~docvar k c is c with kind k and documentation meta-variable docvar (defaults to docvar c).
with_docvar docvar c is c with documentation meta-variable docvar.
with_conv ~kind ~docvar to_t of_t t_conv is a converter for type 'b given a converter t_conv for type 'a and conversion functions from and to type 'b. The conversion functions should raise Error if they are not total.
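A minimal, self-contained sketch can make the roles of v, with_conv, with_docvar, the Error exception and the non-raising high-level functions concrete. It is a hypothetical model, not the library's API: it keeps only the textual side, represents a converter as a record of plain functions, and names the exception Conv_error to avoid a clash with the Error constructor of result. That to_t maps 'b to 'a and of_t maps 'a to 'b is an assumption read off the description above.

```ocaml
(* Hypothetical, simplified model of a converter: textual side only.
   This is NOT the library's representation. *)
exception Conv_error of int * int * string  (* byte range, message *)

type 'a conv = {
  kind : string;
  docvar : string;
  txt_enc : 'a -> string;  (* may raise Conv_error *)
  txt_dec : string -> 'a;  (* may raise Conv_error *)
}

let int_conv = {
  kind = "integer";
  docvar = "INT";
  txt_enc = string_of_int;
  txt_dec =
    (fun s ->
      match int_of_string_opt s with
      | Some i -> i
      | None -> raise_notrace (Conv_error (0, String.length s, "not an integer")));
}

(* Sketch of with_conv: [to_t] maps 'b values to 'a values before encoding,
   [of_t] maps decoded 'a values to 'b; either raises Conv_error if partial. *)
let with_conv ~kind ~docvar (to_t : 'b -> 'a) (of_t : 'a -> 'b) (c : 'a conv) :
    'b conv =
  {
    kind;
    docvar;
    txt_enc = (fun v -> c.txt_enc (to_t v));
    txt_dec = (fun s -> of_t (c.txt_dec s));
  }

(* Sketch of with_docvar as a record update. *)
let with_docvar docvar c = { c with docvar = docvar }

(* High-level decoding does not raise: errors become result values. *)
let of_txt c s = try Ok (c.txt_dec s) with Conv_error (_, _, msg) -> Error msg

(* Example: TCP port numbers as a restriction of integers. *)
type port = Port of int

let port_conv =
  with_conv ~kind:"port" ~docvar:"PORT"
    (fun (Port p) -> p)
    (fun p ->
      if p < 0 || p > 65535 then
        raise_notrace (Conv_error (0, 0, "port out of range"))
      else Port p)
    int_conv
```

Because port_conv only wraps int_conv, it inherits the int_of_string syntax, so an input such as 0x1BB also decodes to Port 443.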
to_bin c v binary encodes v using c. buf is used as the internal buffer if specified (it is cleared with Buffer.clear before use).
to_txt c v textually encodes v using c. buf is used as the internal buffer if specified (it is cleared with Buffer.clear before use).
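The optional buf argument of to_bin and to_txt follows a common OCaml pattern: clear and reuse a caller-supplied Buffer.t, or allocate a fresh one. A minimal sketch, assuming a hypothetical encoder type that appends to a buffer; the big-endian 32-bit layout below only serves as an illustration and is not the library's binary format.

```ocaml
(* Hypothetical encoder type: append the encoding of a value to a buffer. *)
type 'a enc = Buffer.t -> 'a -> unit

(* Sketch of the ?buf convention: clear and reuse [buf] if given. *)
let to_bin ?buf (enc : 'a enc) (v : 'a) : string =
  let buf =
    match buf with
    | Some b -> Buffer.clear b; b
    | None -> Buffer.create 64
  in
  enc buf v;
  Buffer.contents buf

(* Illustrative encoder: a 32-bit integer as 4 big-endian bytes. *)
let int32_enc : int32 enc = Buffer.add_int32_be
```

Reusing one buffer across many calls avoids an allocation per encoding; clearing it first guarantees that stale content never leaks into the result.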
of_txt c s textually decodes a value from s using c.
to_pp c is a formatter using to_txt to format values. Any error that might occur is printed in the output using the s-expression (conv-error [c]_{kind} [e]), with [c]_{kind} the atom for the value kind of c and [e] the atom for the error message.
val bool : bool t
bool converts booleans. Textual conversions represent booleans with the atoms true and false.
val byte : int t
byte converts a byte. Textual decoding parses an atom according to the syntax of int_of_string. Conversions fail if the integer is not in the range [0;255].
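A minimal sketch of the textual decoding described for byte, using only the standard library; byte_of_txt is a hypothetical name. Since decoding follows int_of_string, prefixed forms such as 0xff, 0o17 or 0b101 are accepted as well as decimal ones.

```ocaml
(* Hypothetical byte decoder: int_of_string syntax, then a range check. *)
let byte_of_txt (s : string) : (int, string) result =
  match int_of_string_opt s with
  | None -> Error (Printf.sprintf "%S: not an integer" s)
  | Some b when b < 0 || b > 255 ->
      Error (Printf.sprintf "%d: not in range [0;255]" b)
  | Some b -> Ok b
```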
val int : int t
int converts signed OCaml integers. Textual decoding parses an atom according to the syntax of int_of_string. Conversions fail if the integer is not in the range [-2^{Sys.int_size-1}; 2^{Sys.int_size-1}-1].
Warning. A large integer encoded on a 64-bit platform may fail to decode on a 32-bit platform; use int31 or int64 if this is a problem.
val int31 : int t
int31 converts signed 31-bit integers. Textual decoding parses an atom according to the syntax of int_of_string. Conversions fail if the integer is not in the range [-2^{30}; 2^{30}-1].
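The int31 bounds can be written with shifts so that the same code is correct whatever Sys.int_size is. A minimal sketch with the hypothetical name int31_of_txt:

```ocaml
(* Bounds of the int31 range [-2^30; 2^30-1]. On a platform where int has
   31 bits these wrap around to min_int and max_int, so the check below
   then accepts every int. *)
let int31_min = -(1 lsl 30)
let int31_max = (1 lsl 30) - 1

(* Hypothetical textual decoder: int_of_string syntax, then a range check. *)
let int31_of_txt (s : string) : (int, string) result =
  match int_of_string_opt s with
  | None -> Error (Printf.sprintf "%S: not an integer" s)
  | Some i when i < int31_min || i > int31_max ->
      Error (Printf.sprintf "%d: not in the int31 range" i)
  | Some i -> Ok i
```

On a 64-bit platform such a check rejects exactly the values that a platform with 31-bit ints could not represent, which is why int31 avoids the portability problem noted for int.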
val int32 : int32 t
int32 converts signed 32-bit integers. Textual decoding parses an atom according to the syntax of Int32.of_string. Conversions fail if the integer is not in the range [-2^{31}; 2^{31}-1].
val int64 : int64 t
int64 converts signed 64-bit integers. Textual decoding parses an atom according to the syntax of Int64.of_string. Conversions fail if the integer is not in the range [-2^{63}; 2^{63}-1].
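For int32 and int64 the standard library performs the range check itself: Int32.of_string_opt and Int64.of_string_opt return None for decimal input outside the signed range. A minimal sketch; int32_of_txt and int64_of_txt are hypothetical names.

```ocaml
(* Hypothetical decoders; out-of-range decimal input yields None from the
   standard library parsers, so no explicit bounds check is needed here. *)
let int32_of_txt (s : string) : (int32, string) result =
  match Int32.of_string_opt s with
  | Some i -> Ok i
  | None -> Error (Printf.sprintf "%S: not a 32-bit integer" s)

let int64_of_txt (s : string) : (int64, string) result =
  match Int64.of_string_opt s with
  | Some i -> Ok i
  | None -> Error (Printf.sprintf "%S: not a 64-bit integer" s)
```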
val float : float t
float converts floating point numbers. Textual decoding parses an atom using float_of_string.
val string_bytes : string t
string_bytes converts OCaml strings as byte sequences. Textual conversion represents the bytes of s with the s-expression (hex [s]_{hex}), with [s]_{hex} the atom resulting from String.Ascii.to_hex s. See also atom and string_only.
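A sketch of the (hex ...) representation. String.Ascii.to_hex belongs to the library's own string extensions and is not in the OCaml standard library, so to_hex below is a hypothetical stand-in, and whether the library emits lower or upper case digits is an assumption. Note that the empty string yields an empty hex atom, which, per the atom rules, must be written in quoted form.

```ocaml
(* Hypothetical stand-in for String.Ascii.to_hex (lowercase digits assumed). *)
let to_hex s =
  let b = Buffer.create (2 * String.length s) in
  String.iter (fun c -> Buffer.add_string b (Printf.sprintf "%02x" (Char.code c))) s;
  Buffer.contents b

(* Textual form of string_bytes; the empty hex atom must be quoted. *)
let string_bytes_to_txt s =
  if s = "" then "(hex \"\")" else Printf.sprintf "(hex %s)" (to_hex s)

let hex_value = function
  | '0' .. '9' as c -> Some (Char.code c - Char.code '0')
  | 'a' .. 'f' as c -> Some (Char.code c - Char.code 'a' + 10)
  | 'A' .. 'F' as c -> Some (Char.code c - Char.code 'A' + 10)
  | _ -> None

(* Inverse of to_hex: None on odd length or non-hexadecimal characters. *)
let of_hex h =
  let n = String.length h in
  if n mod 2 <> 0 then None
  else
    let b = Bytes.create (n / 2) in
    let rec loop i =
      if i >= n / 2 then Some (Bytes.to_string b)
      else
        match (hex_value h.[2 * i], hex_value h.[(2 * i) + 1]) with
        | Some hi, Some lo -> Bytes.set b i (Char.chr ((hi * 16) + lo)); loop (i + 1)
        | _ -> None
    in
    loop 0
```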
Warning. A large string encoded on a 64-bit platform may fail to decode on a 32-bit platform.
val atom : string t
atom converts strings assumed to represent UTF-8 encoded Unicode text; the encoding is not checked. Textual conversions represent strings as atoms. See also string_bytes and string_only.
Warning. A large atom encoded on a 64-bit platform may fail to decode on a 32-bit platform.
option c converts optional values converted with c. Textual conversions represent None with the atom none and Some v with the s-expression (some [v]_c).
some c wraps values decoded by c with Option.some. Warning. None can't be converted in either direction; use option for this.
result ok error converts result values with ok and error. Textual conversions represent Ok v with the s-expression (ok [v]_{ok}) and Error e with (error [e]_{error}).
list c converts a list of values converted with c. Textual conversions represent a list [v0; ... vn] by the s-expression ([v0]_c ... [vn]_c).
Warning. A large list encoded on a 64-bit platform may fail to decode on a 32-bit platform.
array c is like list but converts arrays.
Warning. A large array encoded on a 64-bit platform may fail to decode on a 32-bit platform.
pair c0 c1 converts pairs of values converted with c0 and c1. Textual conversion represents a pair (v0, v1) by the s-expression ([v0]_{c0} [v1]_{c1}).
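The textual representations of option, result, list and pair compose structurally. A minimal sketch models textual encoders as hypothetical functions from values to a small sexp tree (not the library's types) and reproduces the documented shapes; to keep it short, atoms are printed as is, which is only valid for atoms that need no quoting.

```ocaml
(* Hypothetical tree for textual encodings; not the library's type. *)
type sexp = Atom of string | List of sexp list

(* Printing assumes every atom is valid in unquoted form. *)
let rec to_string = function
  | Atom a -> a
  | List l -> "(" ^ String.concat " " (List.map to_string l) ^ ")"

(* Encoders for base values, following the bool and int descriptions. *)
let bool b = Atom (string_of_bool b)
let int i = Atom (string_of_int i)

(* Combinators mirroring option, result, list and pair. *)
let option c = function
  | None -> Atom "none"
  | Some v -> List [ Atom "some"; c v ]

let result ok error = function
  | Ok v -> List [ Atom "ok"; ok v ]
  | Error e -> List [ Atom "error"; error e ]

let list c vs = List (List.map c vs)
let pair c0 c1 (v0, v1) = List [ c0 v0; c1 v1 ]
```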
val enum :
kind:string ->
docvar:string ->
?eq:('a -> 'a -> bool) ->
(string * 'a) list ->
'a t
enum ~kind ~docvar ~eq vs converts values present in vs. eq is used to test equality among values (defaults to (=)). The list length should not exceed 256. Textual conversions use the strings of the pairs in vs as atoms to encode the corresponding value.
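A minimal sketch of how enum's textual conversions can work with an association list. color, colors and the function names are hypothetical, and encoding a value as its one-byte position in vs is only an assumption suggested by the 256 element limit, not a documented binary format.

```ocaml
type color = Red | Green | Blue

let colors = [ ("red", Red); ("green", Green); ("blue", Blue) ]

(* Textual encoding: the atom paired with the value, found with [eq]. *)
let enum_to_txt ?(eq = ( = )) vs v =
  match List.find_opt (fun (_, v') -> eq v v') vs with
  | Some (s, _) -> Ok s
  | None -> Error "value not in enumeration"

(* Textual decoding: look the atom up. *)
let enum_of_txt vs s =
  match List.assoc_opt s vs with
  | Some v -> Ok v
  | None -> Error (Printf.sprintf "%S: unknown enumerant" s)

(* Assumed binary encoding: the position of the value in [vs] as one byte. *)
let enum_index ?(eq = ( = )) vs v =
  let rec find i = function
    | [] -> None
    | (_, v') :: rest -> if eq v v' then Some i else find (i + 1) rest
  in
  match find 0 vs with
  | Some i when i < 256 -> Some (Char.chr i)
  | _ -> None
```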
Textual conversions performed by the following converters cannot be composed; they do not respect the syntax of s-expression atoms. They can be used for direct conversions when one does not want to be subject to the syntactic constraints of s-expressions, for example when parsing command line interface arguments or environment variables.
val string_only : string t
string_only converts OCaml strings. Textual conversion is not composable; use string_bytes or atom instead. Textual encoding passes the string as is and decoding ignores the initial starting point and returns the whole input string.
Warning. A large string encoded on a 64-bit platform may fail to decode on a 32-bit platform.
S-expressions are a general way of describing data via atoms (sequences of characters) and lists delimited by parentheses. Here are a few examples of s-expressions and their syntax:
this-is-an-atom

(this is a list of seven atoms)

(this list contains (a nested) list)

; This is a comment
; Anything that follows a semi-colon is ignored until the next line

(this list ; has three atoms and an embedded ()
 comment)

"this is a quoted atom, it can contain spaces ; and ()"

"quoted atoms can be split ^
 across lines or contain Unicode esc^u{0061}pes"
We define the syntax of s-expressions over a sequence of Unicode characters in which all US-ASCII control characters except whitespace are forbidden in unescaped form.
Note. This module assumes the sequence of Unicode characters is encoded as UTF-8 although it doesn't check this for now.
An s-expression is either an atom or a list of s-expressions interspersed with whitespace and comments. A sequence of s-expressions is a succession of s-expressions interspersed with whitespace and comments.
These elements are informally described below and finally made precise via an ABNF grammar.
Whitespace is a sequence of whitespace characters, namely, space ' ' (U+0020), tab '\t' (U+0009), line feed '\n' (U+000A), vertical tab '\x0B' (U+000B), form feed '\x0C' (U+000C) and carriage return '\r' (U+000D).
Unless it occurs inside an atom in quoted form (see below), anything that follows a semicolon ';' (U+003B) is ignored until the next end of line, that is either a line feed '\n' (U+000A), a carriage return '\r' (U+000D) or a carriage return and a line feed "\r\n" (<U+000D,U+000A>).
(this is not a comment) ; This is a comment
(this is not a comment)
An atom represents ground data as a string of Unicode characters. It can, via escapes, represent any sequence of Unicode characters, including control characters and U+0000. It cannot represent an arbitrary byte sequence except via a client-defined encoding convention (e.g. Base64 or hex encoding).
Atoms can be specified either in unquoted or in quoted form. In unquoted form the atom is written without delimiters. In quoted form the atom is delimited by double quote '\"' (U+0022) characters; this form is mandatory for atoms that contain whitespace, parentheses '(' and ')', semicolons ';', quotes '\"', carets '^' or characters that need to be escaped.
abc        ; a token for the atom "abc"
"abc"      ; a quoted token for the atom "abc"
"abc; (d"  ; a quoted token for the atom "abc; (d"
""         ; the quoted token for the atom ""
For atoms that do not need to be quoted, both their unquoted and quoted forms represent the same string; e.g. the string "true" can be represented by both the atoms true and "true". The empty string can only be represented in quoted form by "".
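An atom encoder follows from these rules together with the escape sequences listed below: use the unquoted form when possible, otherwise quote and escape. A minimal sketch over OCaml strings assumed to hold UTF-8 (bytes from 0x80 up pass through untouched); forces_quotes, quote and atom_to_txt are hypothetical names, and escaping line feeds and carriage returns as ^n and ^r, rather than emitting them raw, is a design choice that keeps quoted atoms on one line.

```ocaml
(* Hypothetical helper: true for characters that force the quoted form. *)
let forces_quotes c =
  match c with
  | ' ' | '\t' | '\n' | '\x0B' | '\x0C' | '\r' -> true (* whitespace *)
  | '(' | ')' | ';' | '"' | '^' -> true
  | c -> Char.code c < 0x20 || Char.code c = 0x7F (* control characters *)

(* Escape the atom's characters and wrap them in double quotes. *)
let quote s =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (fun c ->
      match c with
      | '"' -> Buffer.add_string b "^\""
      | '^' -> Buffer.add_string b "^^"
      | '\n' -> Buffer.add_string b "^n"
      | '\r' -> Buffer.add_string b "^r"
      | '\t' | '\x0B' | '\x0C' -> Buffer.add_char b c (* allowed as is *)
      | c when Char.code c < 0x20 || Char.code c = 0x7F ->
          Buffer.add_string b (Printf.sprintf "^u{%X}" (Char.code c))
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

(* Unquoted form when possible; the empty atom must be quoted. *)
let atom_to_txt s =
  if s = "" || String.exists forces_quotes s then quote s else s
```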
In quoted form, escapes are introduced by a caret '^'. Double quotes '\"' and carets '^' must always be escaped.
"^^"              ; atom for ^
"^n"              ; atom for line feed U+000A
"^u{0000}"        ; atom for U+0000
"^"^u{1F42B}^""   ; atom with a quote, U+1F42B and a quote
The following escape sequences are recognized:
- "^ " (<U+005E,U+0020>) for space ' ' (U+0020)
- "^\"" (<U+005E,U+0022>) for double quote '\"' (U+0022), mandatory
- "^^" (<U+005E,U+005E>) for caret '^' (U+005E), mandatory
- "^n" (<U+005E,U+006E>) for line feed '\n' (U+000A)
- "^r" (<U+005E,U+0072>) for carriage return '\r' (U+000D)
- "^u{X}" with X from 1 to at most 6 upper or lower case hexadecimal digits, standing for the corresponding Unicode character U+X
- Any other character except line feed '\n' (U+000A) or carriage return '\r' (U+000D) following a caret is an illegal sequence of characters. In the two former cases the atom continues on the next line and whitespace is ignored.
An atom in quoted form can be split across lines by using a caret '^' (U+005E) followed by a line feed '\n' (U+000A) or a carriage return '\r' (U+000D); any subsequent whitespace is ignored.
"^
   a^
   ^ " ; the atom "a "
The character '^' (U+005E) is used as an escape character rather than the usual '\\' (U+005C) in order to make quoted Windows® file paths decently readable and, not least, to utterly please DKM.
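Decoding a quoted atom reverses the escape rules. A minimal sketch, assuming a tokenizer has already delimited the complete token, both double quotes included; unquote, Parse_error and the error messages are hypothetical. ^u{X} escapes are checked to be Unicode scalar values and re-encoded as UTF-8 with the standard library's Buffer.add_utf_8_uchar.

```ocaml
exception Parse_error of int * string (* byte index, message *)

let is_ws = function
  | ' ' | '\t' | '\n' | '\x0B' | '\x0C' | '\r' -> true
  | _ -> false

let is_hex = function '0' .. '9' | 'a' .. 'f' | 'A' .. 'F' -> true | _ -> false

(* [unquote s] decodes the complete quoted atom token [s], quotes included. *)
let unquote (s : string) : string =
  let n = String.length s in
  let err i msg = raise (Parse_error (i, msg)) in
  if n < 2 || s.[0] <> '"' || s.[n - 1] <> '"' then err 0 "not a quoted atom";
  let last = n - 1 in (* index of the closing double quote *)
  let b = Buffer.create n in
  let rec skip_ws i = if i < last && is_ws s.[i] then skip_ws (i + 1) else i in
  let rec loop i =
    if i >= last then Buffer.contents b
    else
      match s.[i] with
      | '"' -> err i "unescaped double quote"
      | '^' when i + 1 >= last -> err i "truncated escape"
      | '^' -> escape i
      | c when (Char.code c < 0x20 && not (is_ws c)) || c = '\x7F' ->
          err i "unescaped control character"
      | c -> Buffer.add_char b c; loop (i + 1)
  and escape i =
    match s.[i + 1] with
    | ' ' -> Buffer.add_char b ' '; loop (i + 2)
    | '"' -> Buffer.add_char b '"'; loop (i + 2)
    | '^' -> Buffer.add_char b '^'; loop (i + 2)
    | 'n' -> Buffer.add_char b '\n'; loop (i + 2)
    | 'r' -> Buffer.add_char b '\r'; loop (i + 2)
    | '\n' | '\r' -> loop (skip_ws (i + 2)) (* line continuation *)
    | 'u' when i + 2 < last && s.[i + 2] = '{' -> uchar i
    | _ -> err i "illegal escape"
  and uchar i =
    (* Here s.[i] = '^', s.[i + 1] = 'u' and s.[i + 2] = '{'. *)
    match String.index_from_opt s (i + 3) '}' with
    | None -> err i "unterminated ^u{X} escape"
    | Some j ->
        let hex = String.sub s (i + 3) (j - i - 3) in
        let len = String.length hex in
        if len < 1 || len > 6 || not (String.for_all is_hex hex) then
          err i "^u{X} needs 1 to 6 hexadecimal digits";
        let u = int_of_string ("0x" ^ hex) in
        if not (Uchar.is_valid u) then err i "not a Unicode scalar value";
        Buffer.add_utf_8_uchar b (Uchar.of_int u);
        loop (j + 1)
  in
  loop 1
```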
Lists are delimited by left '(' (U+0028) and right ')' (U+0029) parentheses. Their elements are s-expressions separated by optional whitespace and comments. For example:
(a list (of four) expressions)
(a list(of four)expressions)
("a"list("of"four)expressions)
(a list (of ; This is a comment
 four) expressions)
() ; the empty list
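The list syntax, together with the whitespace, comment and atom rules, is enough for a small recursive-descent parser. A minimal sketch; the sexp type, parse_seq and Parse_error are hypothetical, and to stay short it supports only the single-character escapes in quoted atoms (no ^u{X} and no line continuations) and does not reject every control character the grammar forbids.

```ocaml
type sexp = Atom of string | List of sexp list

exception Parse_error of int * string (* byte index, message *)

let is_ws = function
  | ' ' | '\t' | '\n' | '\x0B' | '\x0C' | '\r' -> true
  | _ -> false

(* Characters that end an unquoted token. *)
let ends_token c = is_ws c || c = '(' || c = ')' || c = ';' || c = '"'

let parse_seq (s : string) : sexp list =
  let n = String.length s in
  let err i msg = raise (Parse_error (i, msg)) in
  (* Skip whitespace and comments; return the next significant index. *)
  let rec skip i =
    if i >= n then i
    else if is_ws s.[i] then skip (i + 1)
    else if s.[i] = ';' then skip_comment (i + 1)
    else i
  and skip_comment i =
    if i >= n then i
    else if s.[i] = '\n' || s.[i] = '\r' then skip (i + 1)
    else skip_comment (i + 1)
  in
  (* Unquoted atom starting at [i]; a caret is not allowed there. *)
  let token i =
    let j = ref i in
    while !j < n && not (ends_token s.[!j]) do
      if s.[!j] = '^' then err !j "caret in unquoted atom";
      incr j
    done;
    (Atom (String.sub s i (!j - i)), !j)
  in
  (* Quoted atom with s.[i] = '"'; only single-character escapes. *)
  let qtoken i =
    let b = Buffer.create 16 in
    let rec loop k =
      if k >= n then err i "unclosed quoted atom"
      else
        match s.[k] with
        | '"' -> (Atom (Buffer.contents b), k + 1)
        | '^' when k + 1 < n -> (
            match s.[k + 1] with
            | (' ' | '"' | '^') as c -> Buffer.add_char b c; loop (k + 2)
            | 'n' -> Buffer.add_char b '\n'; loop (k + 2)
            | 'r' -> Buffer.add_char b '\r'; loop (k + 2)
            | _ -> err k "escape not supported by this sketch")
        | '^' -> err k "truncated escape"
        | c -> Buffer.add_char b c; loop (k + 1)
    in
    loop (i + 1)
  in
  let rec sexp i =
    match s.[i] with
    | '(' -> parse_list (i + 1) []
    | ')' -> err i "unexpected )"
    | '"' -> qtoken i
    | _ -> token i
  and parse_list i acc =
    let i = skip i in
    if i >= n then err i "unclosed list"
    else if s.[i] = ')' then (List (List.rev acc), i + 1)
    else
      let e, i = sexp i in
      parse_list i (e :: acc)
  in
  let rec seq i acc =
    let i = skip i in
    if i >= n then List.rev acc
    else
      let e, i = sexp i in
      seq i (e :: acc)
  in
  seq 0 []
```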
The following RFC 5234 ABNF grammar is defined on a sequence of Unicode characters.
sexp-seq = *(ws / comment / sexp)
sexp     = atom / list
list     = %x0028 sexp-seq %x0029
atom     = token / qtoken
token    = t-char *(t-char)
qtoken   = %x0022 *(q-char / escape / cont) %x0022
escape   = %x005E (%x0020 / %x0022 / %x005E / %x006E / %x0072 /
                   %x0075 %x007B unum %x007D)
unum     = 1*6(HEXDIG)
cont     = %x005E nl ws
ws       = *(ws-char)
comment  = %x003B *(c-char) nl
nl       = %x000A / %x000D / %x000D %x000A
t-char   = %x0021 / %x0023-0027 / %x002A-003A / %x003C-005D /
           %x005F-007E / %x0080-D7FF / %xE000-10FFFF
q-char   = t-char / ws-char / %x0028 / %x0029 / %x003B
ws-char  = %x0020 / %x0009 / %x000A / %x000B / %x000C / %x000D
c-char   = %x0009 / %x000B / %x000C / %x0020-D7FF / %xE000-10FFFF
A few additional constraints not expressed by the grammar:
- unum, once interpreted as a hexadecimal number, must be a Unicode scalar value.
- nl.