<cahute/text.h> – Text encoding related utilities for Cahute¶
Macro definitions
The CAHUTE_TEXT_ENCODING_* constants represent how a given
piece of text is encoded.
-
CAHUTE_TEXT_ENCODING_LEGACY_8
Constant representing the Variable width encoding with the legacy character table.
-
CAHUTE_TEXT_ENCODING_LEGACY_16_HOST
Constant representing the Fixed-width encoding with the legacy character table, and host endianness.
-
CAHUTE_TEXT_ENCODING_LEGACY_16_BE
Constant representing the Fixed-width encoding with the legacy character table, and big endian.
-
CAHUTE_TEXT_ENCODING_LEGACY_16_LE
Constant representing the Fixed-width encoding with the legacy character table, and little endian.
-
CAHUTE_TEXT_ENCODING_9860_8
Constant representing the Variable width encoding with the fx-9860G character table.
-
CAHUTE_TEXT_ENCODING_9860_16_HOST
Constant representing the Fixed-width encoding with the fx-9860G character table, and host endianness.
-
CAHUTE_TEXT_ENCODING_9860_16_BE
Constant representing the Fixed-width encoding with the fx-9860G character table, and big endian.
-
CAHUTE_TEXT_ENCODING_9860_16_LE
Constant representing the Fixed-width encoding with the fx-9860G character table, and little endian.
-
CAHUTE_TEXT_ENCODING_CAT
Constant representing the CAT data encoding.
-
CAHUTE_TEXT_ENCODING_CTF
Constant representing the CTF data encoding.
-
CAHUTE_TEXT_ENCODING_UTF32_HOST
Constant representing the UTF-32 encoding, with host endianness.
-
CAHUTE_TEXT_ENCODING_UTF32_BE
Constant representing the UTF-32 encoding, with big endian.
-
CAHUTE_TEXT_ENCODING_UTF32_LE
Constant representing the UTF-32 encoding, with little endian.
-
CAHUTE_TEXT_ENCODING_UTF8
Constant representing the UTF-8 encoding.
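The difference between the _HOST, _BE and _LE variants can be illustrated with a small, library-independent sketch. It assumes that the fixed-width encodings store each character as one 16-bit unit, as the _16 suffix suggests. The helper names (host_is_little_endian, read_u16_be, read_u16_le, read_u16_host) are hypothetical and are not part of Cahute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if this machine stores the least significant byte first
 * (little endian), 0 otherwise. Hypothetical helper, not a Cahute API. */
static int host_is_little_endian(void)
{
    uint16_t probe = 1;

    return *(unsigned char const *)&probe == 1;
}

/* Read one 16-bit unit stored most significant byte first (big endian). */
static uint16_t read_u16_be(unsigned char const *p)
{
    assert(p != NULL);
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Read one 16-bit unit stored least significant byte first (little endian). */
static uint16_t read_u16_le(unsigned char const *p)
{
    assert(p != NULL);
    return (uint16_t)(((unsigned)p[1] << 8) | p[0]);
}

/* Read one 16-bit unit using the byte order of the current machine, which
 * is what a "host endianness" variant amounts to under this assumption. */
static uint16_t read_u16_host(unsigned char const *p)
{
    return host_is_little_endian() ? read_u16_le(p) : read_u16_be(p);
}
```

In this sketch, the same two bytes yield different values depending on the byte order chosen, which is why data exchanged between machines is usually better described with an explicit _BE or _LE constant than with a _HOST one.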
Function declarations
-
int cahute_convert_text(cahute_context *context, void **bufp, size_t *buf_sizep, void const **datap, size_t *data_sizep, int dest_encoding, int source_encoding)
Convert text from one encoding to another.
Note
When CAHUTE_TEXT_ENCODING_UTF32_HOST, CAHUTE_TEXT_ENCODING_UTF32_BE, CAHUTE_TEXT_ENCODING_UTF32_LE or CAHUTE_TEXT_ENCODING_UTF8 is used as the destination encoding, Normalization Form C (NFC) is employed; see Unicode Normalization Forms for more information.
Errors you can expect from this function are the following:
CAHUTE_OK – The conversion has finished successfully, and there are no more bytes in the input buffer to read.
CAHUTE_ERROR_TERMINATED – A sentinel has been found, and the conversion has been interrupted.
Note
If this error is raised, *datap is set to point just after the sentinel, and *data_sizep is set accordingly. This is useful in case you have multiple text blobs placed back-to-back.
CAHUTE_ERROR_SIZE – The destination buffer had insufficient space, and the procedure was interrupted.
CAHUTE_ERROR_TRUNC – The source data had an incomplete sequence, and the procedure was interrupted.
CAHUTE_ERROR_INVALID – The source data contained an unknown or invalid sequence, and the procedure was interrupted.
CAHUTE_ERROR_INCOMPAT – The source data contained a sequence that could not be translated to the destination encoding.
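The back-to-back blob case described for CAHUTE_ERROR_TERMINATED can be sketched without the library. The function toy_convert below is a hypothetical stand-in that only mimics the in/out parameter behaviour described above: it copies bytes, treats a 0x00 byte as the sentinel, and leaves the source pointer just after it. The TOY_* codes and blob_lengths are likewise assumptions made for illustration; this is not how Cahute itself is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes, loosely mirroring the ones listed above. */
enum { TOY_OK = 0, TOY_ERROR_TERMINATED = 1, TOY_ERROR_SIZE = 2 };

/* Stand-in converter: copies bytes as-is and stops at a 0x00 sentinel.
 * All four in/out parameters are updated, even when stopping early. */
static int toy_convert(
    char **bufp, size_t *buf_sizep,
    char const **datap, size_t *data_sizep
)
{
    assert(bufp && buf_sizep && datap && data_sizep);

    while (*data_sizep > 0) {
        char c = **datap;

        if (c == '\0') {
            /* Step over the sentinel so the caller resumes after it. */
            (*datap)++;
            (*data_sizep)--;
            return TOY_ERROR_TERMINATED;
        }
        if (*buf_sizep == 0)
            return TOY_ERROR_SIZE;

        *(*bufp)++ = c;
        (*buf_sizep)--;
        (*datap)++;
        (*data_sizep)--;
    }
    return TOY_OK;
}

/* Walk every sentinel-separated blob in `data`, storing the converted
 * length of each one in `lengths` (at most `max` entries).
 * Returns the number of blobs processed. */
static size_t blob_lengths(
    char const *data, size_t data_size, size_t *lengths, size_t max
)
{
    size_t count = 0;

    while (data_size > 0 && count < max) {
        char scratch[64];
        char *buf = scratch;
        size_t buf_size = sizeof(scratch);
        int err = toy_convert(&buf, &buf_size, &data, &data_size);

        if (err != TOY_OK && err != TOY_ERROR_TERMINATED)
            break; /* Scratch buffer too small: give up in this sketch. */

        /* `data` already points after the sentinel, so the next loop
         * iteration starts directly on the following blob. */
        lengths[count++] = sizeof(scratch) - buf_size;
    }
    return count;
}
```

With the real function, the same loop shape should apply: a "terminated" result is treated as the end of one blob rather than as a failure, and the updated source pointer and size are passed unchanged to the next call.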
At the end of its process, this function updates *bufp, *buf_sizep, *datap and *data_sizep to the final state of the function, even in case of error, so that:
- You can determine how much of the destination buffer was filled, by subtracting the final buffer size from the original buffer size.
- In case of CAHUTE_ERROR_SIZE, you can get the place at which to get the leftover bytes in the source data.
- In case of CAHUTE_ERROR_TRUNC, you can get the place at which to get the leftover bytes in the source data to complete with additional data for the next conversion.
- In case of CAHUTE_ERROR_INVALID or CAHUTE_ERROR_INCOMPAT, you can get the place of the problematic input sequence.
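The bookkeeping described above (bytes written equals original size minus final size, then resuming from the leftover source bytes when the destination is full) can be sketched as follows. The converter toy_upper_convert is a hypothetical stand-in that uppercases ASCII letters and follows the same pointer and size update convention. Neither it, TOY_ERROR_SIZE nor convert_in_chunks belongs to Cahute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes for the stand-in converter. */
enum { TOY_OK = 0, TOY_ERROR_SIZE = 1 };

/* Stand-in converter: uppercases ASCII letters, and reports
 * TOY_ERROR_SIZE when the destination is full but input remains. */
static int toy_upper_convert(
    char **bufp, size_t *buf_sizep,
    char const **datap, size_t *data_sizep
)
{
    assert(bufp && buf_sizep && datap && data_sizep);

    while (*data_sizep > 0) {
        char c = **datap;

        if (*buf_sizep == 0)
            return TOY_ERROR_SIZE;
        if (c >= 'a' && c <= 'z')
            c = (char)(c - 'a' + 'A');

        *(*bufp)++ = c;
        (*buf_sizep)--;
        (*datap)++;
        (*data_sizep)--;
    }
    return TOY_OK;
}

/* Convert all of `data` through a deliberately tiny 4-byte chunk buffer,
 * appending each chunk to `out`. Returns the total number of bytes
 * written, or (size_t)-1 if `out` is too small. */
static size_t convert_in_chunks(
    char const *data, size_t data_size, char *out, size_t out_size
)
{
    size_t total = 0;
    int err;

    do {
        char chunk[4];
        char *buf = chunk;
        size_t buf_size = sizeof(chunk);
        size_t filled;

        err = toy_upper_convert(&buf, &buf_size, &data, &data_size);

        /* Original size minus remaining size = bytes written this round. */
        filled = sizeof(chunk) - buf_size;
        if (filled > out_size - total)
            return (size_t)-1;

        memcpy(out + total, chunk, filled);
        total += filled;

        /* On TOY_ERROR_SIZE, `data` and `data_size` already describe the
         * leftover input, so the loop simply calls the converter again. */
    } while (err == TOY_ERROR_SIZE);

    return total;
}
```

The key design point is that the caller never recomputes offsets by hand: the updated pointers and sizes are the only state carried from one call to the next.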
Currently supported conversions are the following:
Src. ⯈▼ Dst.LEGACY_*9860_*CATCTFUTF*LEGACY_*x
x
9860_*x
x
CATCTFUTF*x
x
x
For specific guides on how to use this function, see Converting text from an encoding to another.
- Parameters:
context – Context in which to run the function.
bufp – Pointer to the destination buffer pointer.
buf_sizep – Pointer to the destination buffer size.
datap – Pointer to the source data pointer.
data_sizep – Pointer to the source data size.
dest_encoding – Destination encoding.
source_encoding – Source encoding.
- Returns:
Error, or 0 if the operation was successful.
-
int cahute_convert_to_utf8(cahute_context *context, char *buf, size_t buf_size, void const *data, size_t data_size, int encoding)
Convert the provided data to UTF-8, and place a terminating NUL character.
This is a utility that calls cahute_convert_text(), for simple scripts using the Cahute library.
- Parameters:
context – Context in which to run the function.
buf – Destination buffer.
buf_size – Destination buffer size.
data – Source data.
data_size – Size of the source data.
encoding – Encoding of the source data.
- Returns:
Error, or 0 if the operation was successful.