File internals¶

This document describes the internals behind files; see Files for more information about the user-facing abstractions.

A file only requires one memory allocation (except for system resources that are allocated / opened using different functions), and the medium is initialized with the file opening functions.

Mediums¶

Mediums define a common set of interfaces that can be used by the rest of the library to read or write on the opened file, thereby exploiting the system interfaces.

A medium is represented by the following type:

struct cahute_file_medium¶

File medium representation.

This structure is usually directly allocated with the file, i.e. cahute_file instance, and is accessed through file->medium.

int type¶: Type of medium, among the CAHUTE_FILE_MEDIUM_* constants documented in Available medium types.

unsigned long flags¶

Flags, which represent the kind of operations the underlying medium can do, influencing how the generic parts of the medium interactions will behave, among:

CAHUTE_FILE_MEDIUM_FLAG_WRITE¶: Whether the medium is writable, i.e. writing to the medium type is implemented and the underlying resources have been opened in a way that allows writing.

CAHUTE_FILE_MEDIUM_FLAG_READ¶: Whether the medium is readable, i.e. reading from the medium type is implemented and the underlying resources have been opened in a way that allows reading.

CAHUTE_FILE_MEDIUM_FLAG_SEEK¶: Whether the medium is seekable, i.e. seeking on the medium type is implemented and the underlying resources have been opened in a way that allows seeking.

CAHUTE_FILE_MEDIUM_FLAG_SIZE¶: Whether the size has been computed on opening the medium, i.e. cahute_file_medium.file_size is exploitable.

unsigned long offset¶: Current offset of the underlying resource, if relevant. This is used on writing to a stream, whether seekable or not, and on refreshing the read buffer if need be.

unsigned long file_size¶: File size, computed when the file was being opened.

unsigned long read_offset¶: Offset of the current read buffer.

size_t read_size¶: Size of the data contained within the read buffer.

Warning

Note that this only represents the number of bytes that are actually set to something exploitable in read_buffer, not the read buffer capacity.

cahute_u8 *read_buffer¶

Read buffer, for which the main purpose is to serve as a cache.

In normal circumstances, this buffer is CAHUTE_FILE_MEDIUM_READ_BUFFER_SIZE bytes long.

union cahute_file_medium_state state¶: State of the file medium, which contains the underlying resources depending on the file medium type.

Medium interface¶

Mediums support a generic memory read/write interface with the following functions:

int cahute_read_from_file_medium(cahute_file_medium *medium, unsigned long off, cahute_u8 *buf, size_t size)¶

Read from the file medium, starting at a provided offset.

Errors to be expected from this function are the following:

CAHUTE_ERROR_TRUNC: The parameters would lead to moving out-of-bounds, or reading at least one byte out-of-bounds.

Parameters:

medium – Medium from which to read.
off – Offset at which to start reading.
buf – Buffer in which to store the read data.
size – Size of the data to read.

Returns:

Error, or CAHUTE_OK.

int cahute_write_to_file_medium(cahute_file_medium *medium, unsigned long off, void const *data, size_t size)¶

Write to the file medium, starting at a provided offset.

Errors to be expected from this function are the following:

CAHUTE_ERROR_SIZE: The parameters would lead to moving out-of-bounds, or writing at least one byte out-of-bounds.

Parameters:

medium – Medium into which to write.
off – Offset at which to start writing.
data – Data to write.
size – Size of the data to write.

Returns:

Error, or CAHUTE_OK.

Internal medium logic¶

The internal logic for file mediums is implemented in lib/filemedium.c. While the medium interface presents a memory-like interface, most internal mediums actually work using streams with a current offset that is updated when making a read, write or seek operation.

Note

This implementation is optimized for reading with increasing file offsets, since the rationale behind most file formats allows us to do this.

This section documents the internal logic behind the interface functions.

cahute_read_from_file_medium()

First, we check if there is an intersection between our current read buffer and the requested data on the left boundary. If there is, we copy the intersection into the user-provided buffer.

If there is still some data to read, this means we need to refresh the read buffer at least once, i.e. we need to move the underlying cursor to match the first byte we want to read. If the cursor is not already at the correct position, this is done by one of these methods:

If seeking is supported, we seek to that offset.
Otherwise, if the targeted offset is after the current offset, we read and ignore bytes from the underlying stream.
Otherwise, we fail, since we can’t seek backwards in the stream.

Once this is done, we perform reads of CAHUTE_FILE_MEDIUM_READ_BUFFER_SIZE bytes until the user-provided buffer has been completely filled.
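The three steps above can be sketched as follows. This is only an illustration under stated assumptions, not the code from lib/filemedium.c: the toy_* names, the memory-backed non-seekable stream and the 8-byte buffer size are all hypothetical, and only the "read and ignore bytes" branch of the cursor move is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_READ_BUFFER_SIZE 8 /* Stand-in for CAHUTE_FILE_MEDIUM_READ_BUFFER_SIZE. */

/* Hypothetical non-seekable stream, backed by memory for the demonstration. */
struct toy_stream {
    unsigned char const *data;
    size_t size;
    size_t cursor;
};

/* Hypothetical medium, reduced to the fields used by the read path. */
struct toy_medium {
    struct toy_stream stream;
    unsigned long read_offset; /* Offset of the first cached byte. */
    size_t read_size;          /* Number of valid bytes in read_buffer. */
    unsigned char read_buffer[TOY_READ_BUFFER_SIZE];
};

/* Read up to `size` bytes at the cursor; return the number of bytes read. */
static size_t toy_stream_read(struct toy_stream *s, unsigned char *buf, size_t size) {
    size_t left = s->size - s->cursor;

    if (size > left)
        size = left;
    memcpy(buf, s->data + s->cursor, size);
    s->cursor += size;
    return size;
}

/* Return 0 on success, -1 on failure. */
static int toy_read(struct toy_medium *m, unsigned long off, unsigned char *buf, size_t size) {
    assert(m != NULL && buf != NULL);

    /* Step 1: if the request starts inside the cached range, copy from it. */
    if (off >= m->read_offset && off - m->read_offset < m->read_size) {
        size_t skip = (size_t)(off - m->read_offset);
        size_t avail = m->read_size - skip;

        if (avail > size)
            avail = size;
        memcpy(buf, m->read_buffer + skip, avail);
        buf += avail;
        off += avail;
        size -= avail;
    }
    if (!size)
        return 0;

    /* Step 2: move the cursor. Without seeking, we can only skip forward. */
    if (off < m->stream.cursor)
        return -1;
    while (m->stream.cursor < off) {
        unsigned char scratch[TOY_READ_BUFFER_SIZE];
        size_t to_skip = (size_t)(off - m->stream.cursor);

        if (to_skip > sizeof scratch)
            to_skip = sizeof scratch;
        if (toy_stream_read(&m->stream, scratch, to_skip) != to_skip)
            return -1;
    }

    /* Step 3: refresh the read buffer until the request is satisfied. */
    while (size) {
        size_t chunk;

        m->read_offset = (unsigned long)m->stream.cursor;
        m->read_size = toy_stream_read(&m->stream, m->read_buffer, TOY_READ_BUFFER_SIZE);
        if (!m->read_size)
            return -1; /* End of stream reached before the request was filled. */
        chunk = m->read_size < size ? m->read_size : size;
        memcpy(buf, m->read_buffer, chunk);
        buf += chunk;
        size -= chunk;
    }
    return 0;
}
```

In this sketch, a second read that starts inside the range cached by the first one is served without touching the stream, while a read that starts before the cached range fails, since the toy stream cannot go backwards.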

cahute_write_to_file_medium()

There is no write buffering, so we first check that the underlying cursor is at the requested offset. If it is not, we move it using one of these methods:

If seeking is supported, we seek to that offset.
Otherwise, if the targeted offset is after the current offset, we write zeroes to the underlying stream until the targeted offset is reached.
Otherwise, we fail, since we can’t seek backwards in the stream.

Once this is done, we write the data in chunks of CAHUTE_FILE_MEDIUM_WRITE_CHUNK_SIZE bytes until the user-provided buffer has been completely written.

We also check if there is an intersection between the user-provided boundaries and the read buffer boundaries, and if it’s the case, write the user-provided data to the correct offset in the read buffer to ensure reads from the same offsets will return the updated data, and not the data before the write.
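The mirroring step can be illustrated with the following sketch, which computes the overlap between the written range and the cached range. The toy_cache structure and toy_mirror_write() are hypothetical names used for the example only, and the 16-byte buffer is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache state, named after the read_* fields documented above. */
struct toy_cache {
    unsigned long read_offset; /* Offset of the first cached byte. */
    size_t read_size;          /* Number of valid bytes in read_buffer. */
    unsigned char read_buffer[16];
};

/* Copy the part of [off, off + size) that overlaps the cached range into
 * the cache, and return the number of bytes that were mirrored. */
static size_t toy_mirror_write(struct toy_cache *c, unsigned long off, void const *data, size_t size) {
    unsigned long cache_end, write_end, start, end;

    assert(c != NULL && data != NULL);
    cache_end = c->read_offset + c->read_size;
    write_end = off + size;

    /* The intersection is [max of the starts, min of the ends). */
    start = off > c->read_offset ? off : c->read_offset;
    end = write_end < cache_end ? write_end : cache_end;
    if (start >= end)
        return 0; /* No overlap: the cache is left untouched. */

    memcpy(
        c->read_buffer + (start - c->read_offset),
        (unsigned char const *)data + (start - off),
        (size_t)(end - start)
    );
    return (size_t)(end - start);
}
```

For instance, with a cache covering offsets 10 to 17, a 4-byte write at offset 8 only mirrors its last two bytes, at the beginning of the cache.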

Available medium types¶

File medium types are represented as CAHUTE_FILE_MEDIUM_* constants internally.

Warning

The file medium constants are only defined if they are available in the current configuration. This is a simple way for medium-specific implementations to be included or not, using #ifdef.

Available mediums are the following:

CAHUTE_FILE_MEDIUM_NONE¶: Internal in-memory file medium; see Internal in-memory file medium for more information.

CAHUTE_FILE_MEDIUM_POSIX¶

POSIX file API medium, with a file descriptor (fd):

Closing uses close(2);
Reading uses read(2);
Writing uses write(2);
Seeking uses lseek(2).

Only available on platforms considered POSIX, explicitly including Apple’s OS X (since it does not define the __unix__ constant like Linux does).

CAHUTE_FILE_MEDIUM_WIN32¶

File medium using the Windows API, with a file HANDLE:

Closing uses CloseHandle;
Reading uses ReadFile;
Writing uses WriteFile;
Seeking uses SetFilePointer.

Only available with Windows.

Internal in-memory file medium¶

In order to work for both files and protocols, some functions such as cahute_casiolink_decode_data() or cahute_mcs_decode_data() take a cahute_file instance. For abstracting memory buffers coming from protocols as in-memory files, the following internal function is available within the library:

void cahute_populate_file_from_memory(cahute_file *file, cahute_u8 *buf, size_t size)¶

Populate a file handle from a buffer and a size.

Parameters:

file – File object to populate.
buf – Buffer to abstract as a file.
size – Size of the buffer to abstract.

In order to avoid having too many memory allocations and since cahute_file is not opaque within the library, this utility can be used to populate a statically defined file object which can then be transmitted to other functions using files to decode data. It is also not necessary to call cahute_close_file() in such a case.

For example, with cahute_casiolink_decode_data():

cahute_file file;
unsigned long offset = 0;
int err;

/* Wrap my_buf as an in-memory file; no allocation takes place. */
cahute_populate_file_from_memory(&file, my_buf, my_buf_size);
err = cahute_casiolink_decode_data(datap, &file, &offset, my_variant, 1);
...

To make this work, the read buffer is not CAHUTE_FILE_MEDIUM_READ_BUFFER_SIZE bytes long here: with the CAHUTE_FILE_MEDIUM_NONE medium type, the read buffer is the provided buffer itself, and is sized to the whole “file”.

This reuses the existing read buffer logic: reads are served directly from the read buffer, and the mirroring of written data into the read buffer amounts to writing into the provided buffer at the requested offset, without any other side effect. The operations become the following:

cahute_read_from_file_medium()

There is always an intersection between our current read buffer and the requested data on the left boundary, so we read from the “read” buffer to copy data to the user-provided buffer.

cahute_write_to_file_medium()

We do not move any underlying cursor or have any side-effect.

There is always an intersection between the user-provided boundaries and the read buffer boundaries, so we write the user-provided data to the correct offset in the read buffer to ensure reads from the same offsets will return the updated data, and not the data before the write.
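As a rough illustration of the two operations above, here is a minimal sketch in which the read buffer is the caller's buffer itself. The toy_* names are hypothetical, and the bounds check returning -1 is an assumption made for the example rather than the library's actual error handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reduction of the medium to its read buffer fields. */
struct toy_mem_medium {
    unsigned long read_offset;  /* Always 0 for an in-memory file. */
    size_t read_size;           /* Size of the whole "file". */
    unsigned char *read_buffer; /* The caller's buffer, not a copy. */
};

/* Make the provided buffer act as the read buffer; nothing is allocated. */
static void toy_populate_from_memory(struct toy_mem_medium *m, unsigned char *buf, size_t size) {
    assert(m != NULL && buf != NULL);
    m->read_offset = 0;
    m->read_size = size;
    m->read_buffer = buf;
}

/* Reads are plain copies out of the read buffer. */
static int toy_mem_read(struct toy_mem_medium const *m, unsigned long off, unsigned char *out, size_t size) {
    if (off > m->read_size || size > m->read_size - off)
        return -1; /* Out of bounds. */
    memcpy(out, m->read_buffer + off, size);
    return 0;
}

/* Writes only update the read buffer, i.e. the caller's buffer. */
static int toy_mem_write(struct toy_mem_medium *m, unsigned long off, void const *data, size_t size) {
    if (off > m->read_size || size > m->read_size - off)
        return -1; /* Out of bounds. */
    memcpy(m->read_buffer + off, data, size);
    return 0;
}
```

Since the medium owns no resource in this sketch, there is nothing to release afterwards, which matches the remark above about not needing to close such a file.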

File opening behaviours¶

In this section, we will describe the behaviour of file opening functions.

cahute_open_file()

Depending on the platform:

On POSIX and compatible, it attempts to open the file using open(2). If this succeeds, it calls lseek(2) to seek 0 bytes from SEEK_END, which returns the current file size, then uses the same function to seek 0 bytes from SEEK_SET.

The created file handle will have the CAHUTE_FILE_MEDIUM_POSIX medium type.
On Win32, it attempts to open the file using CreateFile. If this succeeds, it calls SetFilePointer to seek 0 bytes from FILE_END, which returns the current file size, then uses the same function to seek 0 bytes from FILE_BEGIN.

The created file handle will have the CAHUTE_FILE_MEDIUM_WIN32 medium type.
Otherwise, it will return CAHUTE_ERROR_IMPL.

If the obtained file size is too big, i.e. more than CAHUTE_MAX_FILE_OFFSET, the function will fail with error CAHUTE_ERROR_SIZE.
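The POSIX size computation described above can be sketched as follows. toy_get_fd_size() is a hypothetical helper written for this example, and the comparison against CAHUTE_MAX_FILE_OFFSET is left out since its value is not given here.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Store the size of the file behind `fd` in `*sizep` and rewind the cursor.
 * Return 0 on success, -1 if the descriptor is not seekable or on error. */
static int toy_get_fd_size(int fd, off_t *sizep) {
    off_t size;

    assert(sizep != NULL);

    /* Seeking 0 bytes from the end yields the end offset, i.e. the size. */
    size = lseek(fd, 0, SEEK_END);
    if (size == (off_t)-1)
        return -1;

    /* Go back to the beginning so that later reads start at offset 0. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;

    *sizep = size;
    return 0;
}
```

On a descriptor that does not support seeking, such as a pipe, lseek(2) fails and the sketch reports an error instead of a size.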

cahute_create_file()

Depending on the platform:

On POSIX and compatible, it attempts to create and open the file using open(2). If this succeeds, it calls ftruncate(2) to set the file size explicitly to the provided size.

The created file handle will have the CAHUTE_FILE_MEDIUM_POSIX medium type.
On Win32, it attempts to create and open the file using CreateFile. If this succeeds, it calls SetFilePointer to seek to the provided file size from FILE_BEGIN, calls SetEndOfFile to set the file size explicitly, then uses SetFilePointer again to seek back to FILE_BEGIN.

The created file handle will have the CAHUTE_FILE_MEDIUM_WIN32 medium type.
Otherwise, it will return CAHUTE_ERROR_IMPL.
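For the POSIX branch, the creation sequence could look like the sketch below. toy_create_sized_file() is a hypothetical helper, and the open(2) flags and the 0644 mode are assumptions made for the example; the flags actually used by the library are not documented here.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) the file at `path` and give it exactly `size` bytes.
 * Return the open file descriptor on success, -1 on failure. */
static int toy_create_sized_file(char const *path, off_t size) {
    int fd;

    assert(path != NULL);

    /* Assumed flags: read/write access, create if missing, empty if present. */
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Extend the empty file to the requested size; new bytes read as zero. */
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}
```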

cahute_open_stdout()

Depending on the platform:

On POSIX and compatible, it creates a file handle with medium type CAHUTE_FILE_MEDIUM_POSIX and fd set to 1.
On Win32, it calls GetStdHandle with STD_OUTPUT_HANDLE.

The created file handle will have the CAHUTE_FILE_MEDIUM_WIN32 medium type.
Otherwise, it will return CAHUTE_ERROR_IMPL.

File metadata retrieval¶

The cahute_file structure caches the following file metadata:

The retrieved type, using one of the CAHUTE_FILE_TYPE_* macros defined in <cahute/file.h> – File related utilities for Cahute;
The extension extracted from the provided path.

For any file reading function that requires the file type and metadata, if the CAHUTE_FILE_FLAG_EXAMINED flag is not yet present in the file flags, the cahute_examine_file() function is called to determine them and set the flag.
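The "examine once, then reuse" pattern described above can be sketched as follows. Everything in this block is hypothetical: the toy_file structure, the flag value, the placeholder type and the call counter only exist to show that the examination runs a single time.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FILE_FLAG_EXAMINED 1UL /* Hypothetical value for the flag. */

/* Hypothetical file object, reduced to what the caching needs. */
struct toy_file {
    unsigned long flags;
    int type;          /* Cached type, valid once the flag is set. */
    int examine_calls; /* Demonstration only: counts the examinations. */
};

/* Stand-in for the examination: compute the metadata and set the flag. */
static void toy_examine_file(struct toy_file *file) {
    file->examine_calls++;
    file->type = 42; /* Placeholder for a detected file type. */
    file->flags |= TOY_FILE_FLAG_EXAMINED;
}

/* Return the file type, examining the file only on the first call. */
static int toy_get_file_type(struct toy_file *file) {
    assert(file != NULL);
    if (!(file->flags & TOY_FILE_FLAG_EXAMINED))
        toy_examine_file(file);
    return file->type;
}
```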