This is an aggregation of the rules used to parse an e-mail address. The goal of this documentation is to show the relations between the RFCs, their updates, and the final description of each part needed to parse an e-mail address.
Obviously, this part is mostly a copy-paste from the RFCs to explain what we implement. For a client, it is boring and indigestible (but necessary) work. We provide these implementations only for people who know what they really need, and to avoid duplicated code.
But the best advice about this module is simply to ignore it and move on, which is what I really wanted to do when I wrote this documentation.
From RFC5322.
obs-NO-WS-CTL = %d1-8 / ; US-ASCII control
%d11 / ; characters that do not
%d12 / ; include the carriage
%d14-31 / ; return, line feed, and
%d127 ; white space characters
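The rule above maps directly onto a one-byte predicate. The following is only a sketch written from the ABNF; the name is_obs_no_ws_ctl is an assumption and is not taken from the library.

```ocaml
(* Hypothetical helper: true for the bytes listed by obs-NO-WS-CTL.
   OCaml's '\ddd' escapes are decimal, so they match the %d values. *)
let is_obs_no_ws_ctl = function
  | '\001' .. '\008' (* %d1-8 *)
  | '\011' (* %d11 *)
  | '\012' (* %d12 *)
  | '\014' .. '\031' (* %d14-31 *)
  | '\127' (* %d127 *) -> true
  | _ -> false
```

Tab (%d9), line feed (%d10) and carriage return (%d13) are deliberately left out, as the ABNF comments say.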
From RFC822.
ctext = <any CHAR excluding "(", ; => may be folded
")", BACKSLASH & CR, & including
linear-white-space>
From RFC1522 (occurrences).
From RFC2047 § Appendix.
From RFC2822.
ctext = NO-WS-CTL / ; Non white space controls
%d33-39 / ; The rest of the US-ASCII
%d42-91 / ; characters not including "(",
%d93-126 ; ")", or BACKSLASH
From RFC5322.
ctext = %d33-39 / ; Printable US-ASCII
%d42-91 / ; characters not including
%d93-126 / ; "(", ")", or BACKSLASH
obs-ctext
obs-ctext = obs-NO-WS-CTL
Update from RFC 2822
+ Removed NO-WS-CTL from ctext
From RFC5335.
ctext =/ UTF8-xtra-char
UTF8-xtra-char = UTF8-2 / UTF8-3 / UTF8-4
UTF8-2 = %xC2-DF UTF8-tail
UTF8-3 = %xE0 %xA0-BF UTF8-tail /
%xE1-EC 2(UTF8-tail) /
%xED %x80-9F UTF8-tail /
%xEE-EF 2(UTF8-tail)
UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) /
%xF1-F3 3( UTF8-tail ) /
%xF4 %x80-8F 2( UTF8-tail )
UTF8-tail = %x80-BF
From RFC6532.
ctext =/ UTF8-non-ascii
@note about UTF-8: decoding is out of scope here, since we check only one byte.
@note about compliance with RFC1522: it is out of scope here, since we check only one byte.
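As an illustration of the RFC5322 form of ctext (printable ranges plus obs-ctext), here is a minimal one-byte sketch. It is written from the ABNF only, independently of the library's own is_ctext, and it ignores the UTF-8 extensions of RFC5335 and RFC6532. The helper obs_no_ws_ctl is an assumed name.

```ocaml
(* Assumed helper: the obs-NO-WS-CTL bytes (decimal escapes). *)
let obs_no_ws_ctl = function
  | '\001' .. '\008' | '\011' | '\012' | '\014' .. '\031' | '\127' -> true
  | _ -> false

(* Sketch of ctext per RFC5322: %d33-39 / %d42-91 / %d93-126 / obs-ctext.
   The excluded bytes are "(" (40), ")" (41) and BACKSLASH (92). *)
let is_ctext = function
  | '\033' .. '\039' | '\042' .. '\091' | '\093' .. '\126' -> true
  | c -> obs_no_ws_ctl c
```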
From RFC822.
qtext = <any CHAR excepting DQUOTE, ; => may be folded
BACKSLASH & CR, and including
linear-white-space>
From RFC2822.
qtext = NO-WS-CTL / ; Non white space controls
%d33 / ; The rest of the US-ASCII
%d35-91 / ; characters not including BACKSLASH
%d93-126 ; or the quote character
From RFC5322.
qtext = %d33 / ; Printable US-ASCII
%d35-91 / ; characters not including
%d93-126 / ; BACKSLASH or the quote character
obs-qtext
obs-qtext = obs-NO-WS-CTL
From RFC5335 (see is_ctext about UTF8-xtra-char).
utf8-qtext = qtext / UTF8-xtra-char
From RFC6532.
qtext =/ UTF8-non-ascii
@note about UTF-8: decoding is out of scope here, since we check only one byte.
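The RFC5322 definition of qtext can be read as a test on the byte value. The sketch below works on Char.code to stay close to the %d notation; the name is_qtext and its shape are assumptions, not the library's code, and UTF-8 is not handled.

```ocaml
(* Sketch of qtext per RFC5322: %d33 / %d35-91 / %d93-126 / obs-qtext.
   DQUOTE (34) and BACKSLASH (92) are excluded. *)
let is_qtext c =
  match Char.code c with
  | 33 -> true
  | n when n >= 35 && n <= 91 -> true
  | n when n >= 93 && n <= 126 -> true
  | n ->
      (* obs-qtext = obs-NO-WS-CTL *)
      (n >= 1 && n <= 8) || n = 11 || n = 12 || (n >= 14 && n <= 31) || n = 127
```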
The ABNF of atext is not explicit in RFC822, but a relic of it can be found here.
atom = 1*<any CHAR except specials, SPACE and CTLs>
From RFC2822.
atext = ALPHA / DIGIT / ; Any character except controls,
"!" / "#" / ; SP, and specials.
"$" / "%" / ; Used for atoms
"&" / "'" /
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"
From RFC5322.
atext = ALPHA / DIGIT / ; Printable US-ASCII
"!" / "#" / ; characters not including
"$" / "%" / ; specials. Used for atoms.
"&" / "'" /
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"
From RFC5335 (see is_ctext about UTF8-xtra-char).
utf8-atext = ALPHA / DIGIT /
"!" / "#" / ; Any character except
"$" / "%" / ; controls, SP, and specials.
"&" / "'" / ; Used for atoms.
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~" /
UTF8-xtra-char
From RFC6532.
atext =/ UTF8-non-ascii
@note about UTF-8: decoding is out of scope here, since we check only one byte.
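The RFC5322 list of atext characters translates into a pattern match. This is a sketch from the ABNF only (ASCII, no UTF8-non-ascii); the name is_atext is an assumption.

```ocaml
(* Sketch of atext per RFC5322: ALPHA / DIGIT / the listed punctuation. *)
let is_atext = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' -> true
  | '!' | '#' | '$' | '%' | '&' | '\'' | '*' | '+' | '-' | '/' | '=' | '?'
  | '^' | '_' | '`' | '{' | '|' | '}' | '~' -> true
  | _ -> false
```

Note that "." and "@" are specials and are therefore rejected.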
From RFC822.
quoted-pair = BACKSLASH CHAR ; may quote any char
CHAR is case-sensitive
From RFC2822.
quoted-pair = (BACKSLASH text) / obs-qp
text = %d1-9 / ; Characters excluding CR and LF
%d11 /
%d12 /
%d14-127 /
obs-text
obs-text = *LF *CR *(obs-char *LF *CR)
obs-char = %d0-9 / %d11 / ; %d0-127 except CR and
%d12 / %d14-127 ; LF
obs-qp = BACKSLASH (%d0-127)
From RFC5322.
quoted-pair = (BACKSLASH (VCHAR / WSP)) / obs-qp
obs-qp = BACKSLASH (%d0 / obs-NO-WS-CTL / LF / CR)
From RFC5335 (see is_ctext about UTF8-xtra-char).
utf8-text = %d1-9 / ; all UTF-8 characters except
%d11-12 / ; US-ASCII NUL, CR, and LF
%d14-127 /
UTF8-xtra-char
utf8-quoted-pair = (BACKSLASH utf8-text) / obs-qp
@note this function is fun _chr -> true.
@note RFC5322 (the last version of the e-mail format) does not mention an update from RFC2822. RFC6532 does not mention an update of quoted-pair. This implementation follows RFC5322 without Unicode support.
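To make the effect of quoted-pair concrete, here is a small stdlib-only sketch that removes the BACKSLASH of each pair in a string and keeps the quoted byte, accepting any byte after the BACKSLASH as obs-qp allows. The function name unescape_quoted_pairs is hypothetical and this is not how the library's Angstrom parser is written.

```ocaml
(* Hypothetical helper: replace every quoted-pair (BACKSLASH byte) by the
   quoted byte. Returns None when the input ends with a lone BACKSLASH. *)
let unescape_quoted_pairs s =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i >= len then Some (Buffer.contents buf)
    else
      match s.[i] with
      | '\\' when i + 1 < len ->
          (* Keep the quoted byte, drop the BACKSLASH. *)
          Buffer.add_char buf s.[i + 1];
          go (i + 2)
      | '\\' -> None
      | c ->
          Buffer.add_char buf c;
          go (i + 1)
  in
  go 0
```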
From RFC822.
dtext = <any CHAR excluding "[", ; => may be folded
"]", BACKSLASH & CR, & including
linear-white-space>
From RFC2822.
dtext = NO-WS-CTL / ; Non white space controls
%d33-90 / ; The rest of the US-ASCII
%d94-126 ; characters not including "[",
; "]", or BACKSLASH
From RFC5322.
+ Removed NO-WS-CTL from dtext
dtext = %d33-90 / ; Printable US-ASCII
%d94-126 / ; characters not including
obs-dtext ; "[", "]", or BACKSLASH
obs-dtext = obs-NO-WS-CTL / quoted-pair
@note quoted-pair cannot be processed here, since we handle only one byte.
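A one-byte sketch of dtext following the RFC5322 ranges, with obs-NO-WS-CTL accepted and quoted-pair left out for the reason given in the note. The name is_dtext is an assumption.

```ocaml
(* Sketch of dtext per RFC5322: %d33-90 / %d94-126 / obs-NO-WS-CTL.
   "[" (91), BACKSLASH (92) and "]" (93) are excluded. *)
let is_dtext c =
  match Char.code c with
  | n when (n >= 33 && n <= 90) || (n >= 94 && n <= 126) -> true
  | n ->
      (n >= 1 && n <= 8) || n = 11 || n = 12 || (n >= 14 && n <= 31) || n = 127
```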
val quoted_pair : char Angstrom.t
See is_quoted_pair.
val fws : (bool * bool * bool) Angstrom.t
From RFC822.
From RFC2822 § 3.2.3 & RFC2822 § 4.2.
White space characters, including white space used in folding
(described in section 2.2.3), may appear between many elements in
header field bodies. Also, strings of characters that are treated as
comments may be included in structured field bodies as characters
enclosed in parentheses. The following defines the folding white
space (FWS) and comment constructs.
Strings of characters enclosed in parentheses are considered comments
so long as they do not appear within a "quoted-string", as defined in
section 3.2.5. Comments may nest.
There are several places in this standard where comments and FWS may
be freely inserted. To accommodate that syntax, an additional token
for "CFWS" is defined for places where comments and/or FWS can occur.
However, where CFWS occurs in this standard, it MUST NOT be inserted
in such a way that any line of a folded header field is made up
entirely of WSP characters and nothing else.
FWS = ([*WSP CRLF] 1*WSP) / ; Folding white space
obs-FWS
In the obsolete syntax, any amount of folding white space MAY be
inserted where the obs-FWS rule is allowed. This creates the
possibility of having two consecutive "folds" in a line, and
therefore the possibility that a line which makes up a folded header
field could be composed entirely of white space.
obs-FWS = 1*WSP *(CRLF 1*WSP)
From RFC5322 § 3.2.2 & RFC5322 § 4.2.
White space characters, including white space used in folding
(described in section 2.2.3), may appear between many elements in
header field bodies. Also, strings of characters that are treated as
comments may be included in structured field bodies as characters
enclosed in parentheses. The following defines the folding white
space (FWS) and comment constructs.
Strings of characters enclosed in parentheses are considered comments
so long as they do not appear within a "quoted-string", as defined in
section 3.2.4. Comments may nest.
There are several places in this specification where comments and FWS
may be freely inserted. To accommodate that syntax, an additional
token for "CFWS" is defined for places where comments and/or FWS can
occur. However, where CFWS occurs in this specification, it MUST NOT
be inserted in such a way that any line of a folded header field is
made up entirely of WSP characters and nothing else.
FWS = ([*WSP CRLF] 1*WSP) / obs-FWS ; Folding white space
In the obsolete syntax, any amount of folding white space MAY be
inserted where the obs-FWS rule is allowed. This creates the
possibility of having two consecutive "folds" in a line, and
therefore the possibility that a line which makes up a folded header
field could be composed entirely of white space.
obs-FWS = 1*WSP *(CRLF 1*WSP)
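Folding can be illustrated by its inverse: unfolding removes every CRLF that is immediately followed by WSP and keeps the WSP itself. The sketch below does this on a plain string with the standard library only. The names is_wsp and unfold are assumptions, and the code says nothing about what the fws parser of this module returns.

```ocaml
(* WSP = SP / HTAB *)
let is_wsp c = c = ' ' || c = '\t'

(* Hypothetical helper: drop each CRLF that is directly followed by WSP. *)
let unfold s =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i >= len then Buffer.contents buf
    else if i + 2 < len && s.[i] = '\r' && s.[i + 1] = '\n' && is_wsp s.[i + 2]
    then go (i + 2) (* skip the CRLF, the WSP is copied on the next step *)
    else begin
      Buffer.add_char buf s.[i];
      go (i + 1)
    end
  in
  go 0
```

A CRLF that is not followed by WSP is not a fold, so it is copied unchanged.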
val obs_fws : (bool * bool * bool) Angstrom.t
See fws.
val comment : unit Angstrom.t
val cfws : unit Angstrom.t
val qcontent : string Angstrom.t
val quoted_string : string Angstrom.t
From RFC822.
quoted-string = DQUOTE *(qtext/quoted-pair) DQUOTE ; Regular qtext or
; quoted chars.
From RFC2047.
+ An 'encoded-word' MUST NOT appear within a 'quoted-string'
From RFC2822.
quoted-string = [CFWS]
DQUOTE *([FWS] qcontent) [FWS] DQUOTE
[CFWS]
A quoted-string is treated as a unit. That is, quoted-string is
identical to atom, semantically. Since a quoted-string is allowed to
contain FWS, folding is permitted. Also note that since quoted-pair
is allowed in a quoted-string, the quote and backslash characters may
appear in a quoted-string so long as they appear as a quoted-pair.
Semantically, neither the optional CFWS outside of the quote
characters nor the quote characters themselves are part of the
quoted-string; the quoted-string is what is contained between the two
quote characters. As stated earlier, the BACKSLASH in any quoted-pair
and the CRLF in any FWS/CFWS that appears within the quoted-string are
semantically "invisible" and therefore not part of the quoted-string
either.
@note in other words, space(s) in FWS are "visible" between DQUOTE.
From RFC5322.
quoted-string = [CFWS]
DQUOTE *([FWS] qcontent) [FWS] DQUOTE
[CFWS]
@note currently, this implementation has a bug with multiple spaces in quoted-string. We need to update fws to count how many space(s) we skip.
val atom : string Angstrom.t
val word : word Angstrom.t
val dot_atom_text : string list Angstrom.t
val dot_atom : string list Angstrom.t
val local_part : local Angstrom.t
From RFC822.
From RFC2822 § 3.4.1 & RFC2822 § 4.4.
From RFC5322 § 3.4.1 & RFC5322 § 4.4.
local-part = dot-atom / quoted-string / obs-local-part
obs-local-part = word *("." word)
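For the dot-atom branch of local-part, a rough stdlib-only sketch is to split on "." and require every piece to be a non-empty run of atext. It ignores CFWS, quoted-string and obs-local-part entirely. The names is_atext, all_atext and dot_atom_text are assumptions that only mirror the string list result type listed for dot_atom_text above.

```ocaml
(* atext per RFC5322 (ASCII only). *)
let is_atext = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' -> true
  | c -> String.contains "!#$%&'*+-/=?^_`{|}~" c

(* True when [s] is a non-empty run of atext. *)
let all_atext s =
  let ok = ref (s <> "") in
  String.iter (fun c -> if not (is_atext c) then ok := false) s;
  !ok

(* Hypothetical helper: 1*atext *("." 1*atext), returned as its pieces. *)
let dot_atom_text s =
  let parts = String.split_on_char '.' s in
  if List.for_all all_atext parts then Some parts else None
```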
val obs_local_part : local Angstrom.t
See local_part.
val domain_literal : string Angstrom.t
val obs_domain : string list Angstrom.t
val domain : domain Angstrom.t
From RFC822 § 6.1, RFC822 § 6.2.1, RFC822 § 6.2.2 & RFC822 § 6.2.3.
From RFC2822 § 3.4.1 & RFC2822 § 4.4.
domain = dot-atom / domain-literal / obs-domain
obs-domain = atom *("." atom)
From RFC5322 § 3.4.1 & RFC5322 § 4.4.
domain = dot-atom / domain-literal / obs-domain
obs-domain = atom *("." atom)
@note from RFC5322, we should accept any domain as `Literal and let the user resolve it. Currently, we fail when we catch a `Literal and make a best effort by following RFC5321. But maybe it is inconvenient (or not?) to fail.
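The alternative discussed in the note, accepting any bracketed domain as a literal and leaving its interpretation to the user, could look like the sketch below. It only separates a domain-literal from a dotted domain and validates nothing (no dtext or atext check, no CFWS). The function name classify_domain and the `Domain constructor are hypothetical; only `Literal is mentioned in this documentation.

```ocaml
(* Hypothetical helper: "[...]" becomes `Literal with the brackets removed,
   anything else is split on "." into its labels. No validation is done. *)
let classify_domain s =
  let len = String.length s in
  if len >= 2 && s.[0] = '[' && s.[len - 1] = ']'
  then `Literal (String.sub s 1 (len - 2))
  else `Domain (String.split_on_char '.' s)
```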
val id_left : local Angstrom.t
From RFC2822 § 3.6.4 & RFC2822 § 4.5.4.
obs-id-left = local-part
no-fold-quote = DQUOTE *(qtext / quoted-pair) DQUOTE
id-left = dot-atom-text / no-fold-quote / obs-id-left
From RFC5322 § 3.6.4 & RFC5322 § 4.5.4.
id-left = dot-atom-text / obs-id-left
obs-id-left = local-part
val no_fold_literal : string Angstrom.t
val id_right : domain Angstrom.t
From RFC2822 § 3.6.4 & RFC2822 § 4.5.4.
id-right = dot-atom-text / no-fold-literal / obs-id-right
obs-id-right = domain
From RFC5322 § 3.6.4 & RFC5322 § 4.5.4.
id-right = dot-atom-text / no-fold-literal / obs-id-right
obs-id-right = domain
val msg_id : (local * domain) Angstrom.t
From RFC822 § 4.1 & RFC822 § 6.1.
addr-spec = local-part "@" domain ; global address
msg-id = "<" addr-spec ">" ; Unique message id
From RFC2822.
From RFC5322.
val addr_spec : mailbox Angstrom.t
From RFC822.
addr-spec = local-part "@" domain ; global address
From RFC2822.
An addr-spec is a specific Internet identifier that contains a
locally interpreted string followed by the at-sign character ("@",
ASCII value 64) followed by an Internet domain. The locally
interpreted string is either a quoted-string or a dot-atom. If the
string can be represented as a dot-atom (that is, it contains no
characters other than atext characters or "." surrounded by atext
characters), then the dot-atom form SHOULD be used and the
quoted-string form SHOULD NOT be used. Comments and folding white
space SHOULD NOT be used around the "@" in the addr-spec.
addr-spec = local-part "@" domain
From RFC5322.
val angle_addr : mailbox Angstrom.t
From RFC822.
The ABNF of angle-addr is not explicit in RFC 822, but a relic of it can be found here, as part of mailbox:
mailbox = addr-spec ; simple address
/ phrase route-addr ; name & addr-spec
From RFC2822 § 3.4 & RFC2822 § 4.4.
From RFC5322 § 3.4 & RFC5322 § 4.4.
val obs_domain_list : domain list Angstrom.t
See angle_addr.
val obs_route : domain list Angstrom.t
See angle_addr.
val obs_angle_addr : mailbox Angstrom.t
See angle_addr.
val phrase : phrase Angstrom.t
From RFC822.
phrase = 1*word ; Sequence of words
From RFC2047 § 2 & RFC2047 § 5.
From RFC2822 § 3.2.6 & RFC2822 § 4.1.
From RFC5322 § 3.2.5 & RFC5322 § 4.1.
val obs_phrase : phrase Angstrom.t
See phrase.
val display_name : phrase Angstrom.t
val mailbox : mailbox Angstrom.t