CCUtf8_string
Unicode String, in UTF8
A Unicode string represented by a UTF8 bytestring. This representation is convenient for manipulating normal OCaml strings that are encoded in UTF8.
We perform only basic decoding and encoding between codepoints and bytestrings. For more elaborate operations, please use the excellent Uutf.
status: experimental
val hash : t -> int
val pp : Stdlib.Format.formatter -> t -> unit
val to_string : t -> string
Identity.
Iter of Unicode codepoints. Renamed from to_std_seq since 3.0.
val n_chars : t -> int
Number of characters.
val n_bytes : t -> int
Number of bytes.
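The two counts differ as soon as the string contains a non-ASCII codepoint. A minimal sketch, assuming the containers library is installed so that CCUtf8_string is available; the sample text and the names s, chars and bytes are illustrative only:

```ocaml
(* Requires the containers library, which provides CCUtf8_string. *)

(* "héllo", with 'é' (U+00E9) spelled out as its two UTF8 bytes. *)
let s = CCUtf8_string.of_string_exn "h\xc3\xa9llo"

(* Five codepoints, but six bytes because 'é' takes two. *)
let chars = CCUtf8_string.n_chars s
let bytes = CCUtf8_string.n_bytes s

let () = Printf.printf "chars=%d bytes=%d\n" chars bytes
(* prints "chars=5 bytes=6" *)
```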
val empty : t
Empty string.
concat sep l concatenates each string in l, inserting sep in between each string. Similar to String.concat.
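A short usage sketch follows. The signature of concat is not shown in this listing; the code assumes it is t -> t list -> t, by analogy with String.concat, and that the containers library is installed. The names sep, parts and joined are illustrative.

```ocaml
(* Requires the containers library, which provides CCUtf8_string. *)
(* Assumption: concat has type t -> t list -> t. *)

let sep = CCUtf8_string.of_string_exn ", "

(* The middle element is 'é' (U+00E9) as two UTF8 bytes. *)
let parts = List.map CCUtf8_string.of_string_exn [ "a"; "\xc3\xa9"; "c" ]

(* Join the validated strings, then view the result as a plain string. *)
let joined = CCUtf8_string.to_string (CCUtf8_string.concat sep parts)
```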
Build a string from Unicode codepoints. Renamed from of_std_seq since 3.0.
Translate the Unicode codepoint to a list of UTF8 bytes. This can be used, for example, in combination with Buffer.add_char on a pre-allocated buffer to add the bytes one by one (despite its name, Buffer.add_char takes individual bytes, not Unicode codepoints).
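To illustrate the byte-by-byte pattern without relying on a signature that is not shown in this listing, here is a standalone sketch using only the OCaml standard library. The helpers utf8_bytes_of_uchar and add_uchar are hypothetical names written for this example; they are not the module's implementation.

```ocaml
(* Hypothetical helper: encode one codepoint as its UTF8 bytes. *)
let utf8_bytes_of_uchar (u : Uchar.t) : char list =
  let c = Uchar.to_int u in
  let byte n = Char.chr (n land 0xff) in
  if c < 0x80 then [ byte c ]
  else if c < 0x800 then
    [ byte (0xc0 lor (c lsr 6)); byte (0x80 lor (c land 0x3f)) ]
  else if c < 0x10000 then
    [ byte (0xe0 lor (c lsr 12));
      byte (0x80 lor ((c lsr 6) land 0x3f));
      byte (0x80 lor (c land 0x3f)) ]
  else
    [ byte (0xf0 lor (c lsr 18));
      byte (0x80 lor ((c lsr 12) land 0x3f));
      byte (0x80 lor ((c lsr 6) land 0x3f));
      byte (0x80 lor (c land 0x3f)) ]

(* Buffer.add_char appends one byte at a time, so a multi-byte
   codepoint needs one call per encoded byte. *)
let add_uchar (buf : Buffer.t) (u : Uchar.t) : unit =
  List.iter (Buffer.add_char buf) (utf8_bytes_of_uchar u)

let encoded =
  let buf = Buffer.create 16 in
  add_uchar buf (Uchar.of_int 0x00e9);  (* é: two bytes *)
  add_uchar buf (Uchar.of_int 0x20ac);  (* euro sign: three bytes *)
  Buffer.contents buf
```

With the module itself, the bytes it produces would be passed to Buffer.add_char in the same way.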
val of_string_exn : string -> t
Validate string by checking it is valid UTF8.
val of_string : string -> t option
Safe version of of_string_exn.
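A sketch of validating untrusted input with of_string, assuming the containers library is installed. The function describe and the sample inputs are illustrative only.

```ocaml
(* Requires the containers library, which provides CCUtf8_string. *)

(* Hypothetical helper: report whether a raw string is valid UTF8. *)
let describe (raw : string) : string =
  match CCUtf8_string.of_string raw with
  | Some s ->
    Printf.sprintf "valid, %d codepoint(s)" (CCUtf8_string.n_chars s)
  | None -> "invalid UTF8"

(* "café", with 'é' written as two bytes. *)
let ok = describe "caf\xc3\xa9"

(* The byte 0xff never appears in well-formed UTF8. *)
let bad = describe "\xff"
```

Handling the option result keeps the failure case explicit at the call site, which is usually preferable to of_string_exn when the input comes from outside the program.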
val unsafe_of_string : string -> t
Conversion from a string without validating. CAUTION: this is unsafe and can break all the other functions in this module. Use only if you're sure the string is valid UTF8. Upon iteration, if an invalid substring is met, Malformed will be raised.