Direct-BT C++ Implementation Details (Part 1)

This is the first article covering Direct-BT's implementation details including jaulib.

See Direct-BT, Bluetooth Server and Client Programming in C++ and Java (Part 1) for a little introduction to Direct-BT.

Standard and Proprietary Communication Channels

As described, we were required to utilize the host Bluetooth implementation in the GNU/Linux kernel, i.e. BlueZ/Kernel directly without D-Bus to achieve best performance and access the native HCI, L2CAP/GATT and SMP communication channels directly.

By mostly using the standard protocols, the implementation is almost platform agnostic. However, the Bluetooth adapter management as well as the SMP security level setup is done via the proprietary BlueZ/Kernel Manager control channel.
Porting Direct-BT to a different platform would only require creating alternatives for these relatively small portions.

In detail, we have the following 3 channels connected via sockets:

- the standard HCI channel,
- the standard L2CAP channel carrying GATT, and
- the proprietary BlueZ/Kernel Manager control channel.

Each connection utilizes a lock-free ringbuffer<T> for asynchronous access, each filled by its own packet-reader thread. The socket connection also allows sending commands to the underlying Bluetooth host implementation, which shall forward them to the Bluetooth adapter – except for the proprietary Manager control channel.

One HCI socket connection per BTAdapter is used and one L2CAP/GATT socket per BTDevice connection.

Since the BlueZ/Kernel sadly does not allow a direct SMP socket connection, we filter the SMP packets from the HCI channel and direct them to the associated BTAdapter.

Dirty down to the metal details…

Endian Order and Byte Alignment

Direct-BT is endian order agnostic, performing all required transformations from Bluetooth little-endian to host-endian order and back. To avoid memcpy() to compensate for address alignment, we utilize the zero-cost, high-performance compiler construct `struct __attribute__((packed))` as provided via jau::packed_t<T>'s alignment cast.
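The packed-struct trick can be sketched as follows. This is a minimal illustration of the technique, not jau::packed_t's actual API: `packed_value` and `get_uint16_le` are hypothetical names, and the byte-order check assumes a GCC/Clang toolchain as used on GNU/Linux.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: a one-member struct marked __attribute__((packed))
// lets the compiler emit a safe unaligned load, so no memcpy() is needed
// at the call site to compensate for address alignment.
template <typename T>
struct __attribute__((packed)) packed_value {
    T value;
};

// Read a little-endian uint16_t at an arbitrary (possibly unaligned) offset
// of a packet buffer and convert it to host endian order.
inline uint16_t get_uint16_le(const uint8_t* buffer, size_t offset) {
    const auto* p = reinterpret_cast<const packed_value<uint16_t>*>(buffer + offset);
    uint16_t le = p->value;  // unaligned load handled by the compiler
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    le = static_cast<uint16_t>((le << 8) | (le >> 8));  // byte-swap on big-endian hosts
#endif
    return le;
}
```

The cast to the packed one-member struct is exactly where the compiler "magic" happens: the attribute tells it the pointee may be misaligned, so it generates the appropriate load instructions for the target CPU.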

Concurrency, or how to be not just multi-threaded

Multi-threading is often sabotaged when CPU cores block each other, squashing parallel processing and hence concurrency.

To achieve efficient parallelism while avoiding any deadlock situation, we use lock-free data structures.

The ringbuffer<T> storage essentially decouples the producer from the consumer thread without locking, while avoiding moving its content around: storage is handled as a ring without a start or end, unlike a stack.
Locking is only performed when the consumer requests to block until data is available or the producer awaits free space. Otherwise only atomic cache-reloading iterators are utilized, allowing non-overlapping parallel read and write access.
The ringbuffer<T> is used in Direct-BT for all socket connection receiver-threads to store asynchronous replies.
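The decoupling described above can be sketched as a minimal single-producer/single-consumer ring. jau::ringbuffer<T> is considerably richer (blocking waits, capacity changes); the class below is an illustrative assumption, showing only how acquire/release atomics let reader and writer proceed in parallel without locks or content relocation.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Minimal single-producer/single-consumer lock-free ring buffer sketch.
// Each side owns one index; acquire/release ordering publishes the data.
template <typename T, size_t Capacity>
class spsc_ringbuffer {
    std::array<T, Capacity + 1> store_{};  // one slot kept free: full != empty
    std::atomic<size_t> read_{0}, write_{0};

    static size_t next(size_t i) { return (i + 1) % (Capacity + 1); }
public:
    bool put(const T& v) {  // producer thread only
        const size_t w = write_.load(std::memory_order_relaxed);
        const size_t n = next(w);
        if (n == read_.load(std::memory_order_acquire)) return false;  // full
        store_[w] = v;
        write_.store(n, std::memory_order_release);  // publish to consumer
        return true;
    }
    std::optional<T> get() {  // consumer thread only
        const size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T v = store_[r];
        read_.store(next(r), std::memory_order_release);  // free the slot
        return v;
    }
};
```

Note how data travels in place: only the two indices move around the ring, which is the property the article refers to when saying the content is never shifted as a stack would require.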

The copy-on-write (CoW) cow_darray<T> container also decouples the reader from the writer thread, while enjoying the random-access properties of a container. Here we simply replace the underlying shared atomic darray<T> reference when the write operation is completed. This allows lock-free read access while another thread is mutating the content in parallel, accepting the compromise that the reader thread may deal with a slightly older dataset.
To implement cow_darray<T> we also had to implement and use darray<T> to gain fine control over the underlying data including its iterator types – instead of simply using std::vector<T>.
The CoW container is used in Direct-BT for lock-free listener and callback maintenance.
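The CoW idea can be sketched with a plain std::vector<T> standing in for darray<T>. This is an assumption-laden miniature, not cow_darray<T>'s real interface: readers atomically grab an immutable snapshot, while a writer copies, mutates the copy, and atomically swaps it in.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <vector>

// Copy-on-write container sketch: lock-free reads via an atomically
// swapped shared snapshot. Writers copy-mutate-publish; as in the real
// cow_darray<T>, concurrent writers would need external serialization.
template <typename T>
class cow_vector {
    std::shared_ptr<const std::vector<T>> snap_ =
        std::make_shared<const std::vector<T>>();
public:
    // Lock-free read access: an immutable snapshot that stays valid
    // even while a writer publishes a newer dataset in parallel.
    std::shared_ptr<const std::vector<T>> snapshot() const {
        return std::atomic_load(&snap_);
    }
    // Write: copy the current dataset, mutate the copy, publish it.
    void push_back(const T& v) {
        auto cur = std::atomic_load(&snap_);
        auto next = std::make_shared<std::vector<T>>(*cur);  // the "copy" in CoW
        next->push_back(v);
        std::atomic_store(&snap_,
            std::shared_ptr<const std::vector<T>>(std::move(next)));  // the swap
    }
};
```

A reader holding an older snapshot simply sees the dataset as it was at read time – exactly the "slightly older dataset" compromise described above, which is harmless for listener and callback lists.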

ordered_atomic<T,..> is used for consistent atomic type std::memory_order usage, i.e. not changing the memory model after selecting the type, and applying the set order to all operations.
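The following sketch shows the idea: fixing the memory order as a template parameter so no call site can accidentally deviate from it. The class and alias names are illustrative, not jaulib's actual declarations.

```cpp
#include <atomic>
#include <cassert>

// Sketch: the std::memory_order is baked into the type, so every
// operation uses the same order and the memory model cannot be
// changed per call site after the type has been selected.
template <typename T, std::memory_order Order>
class ordered_atomic_sketch {
    std::atomic<T> value_;
public:
    explicit ordered_atomic_sketch(T v = T()) : value_(v) {}
    T load() const noexcept { return value_.load(Order); }
    void store(T v) noexcept { value_.store(v, Order); }
    T fetch_add(T v) noexcept { return value_.fetch_add(v, Order); }
};

// The memory model is chosen once, at the type alias:
using relaxed_int = ordered_atomic_sketch<int, std::memory_order_relaxed>;
using sc_int      = ordered_atomic_sketch<int, std::memory_order_seq_cst>;
```

Compared with raw std::atomic<T>, where each load/store takes the order as an easily-mistyped argument, this makes the chosen model part of the type's contract.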

sc_atomic_critical provides a resource acquisition is initialization (RAII) style Sequentially Consistent (SC) data race free (DRF) critical block. It is used e.g. in cow_darray<T>, and its mechanism in ringbuffer<T>, to ensure proper cache reloading when reading and mutating cached memory.
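A rough sketch of such an RAII critical block follows; it is an assumption about the mechanism, not sc_atomic_critical's actual implementation: a sequentially consistent read on scope entry and a sequentially consistent write on scope exit bracket the non-atomic accesses inside.

```cpp
#include <atomic>
#include <cassert>

// RAII sketch of a Sequentially Consistent critical block: the seq_cst
// load in the constructor acts as an entry fence (forcing a cache reload
// before reading shared memory), and the seq_cst store in the destructor
// acts as an exit fence (publishing the mutations made inside the scope).
class sc_critical_sketch {
    std::atomic<bool>& sync_;
public:
    explicit sc_critical_sketch(std::atomic<bool>& sync) : sync_(sync) {
        (void) sync_.load(std::memory_order_seq_cst);  // entry fence
    }
    ~sc_critical_sketch() {
        sync_.store(true, std::memory_order_seq_cst);  // exit fence
    }
};

// Usage pattern: bracket non-atomic reads/writes of shared state.
inline int bump(std::atomic<bool>& sync, int& shared_counter) {
    sc_critical_sketch crit(sync);
    return ++shared_counter;  // non-atomic work inside the SC bracket
}
```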

The service_runner provides means to hard stop its service thread via signal SIGALRM, as required for responsive behavior when blocked in an I/O task like waiting for data. Such threads must be stopped before their resource holders reach their end of life, otherwise we would see a use-after-free and a potential SIGSEGV crash.
service_runner is used for Direct-BT’s socket connection receiver-threads as well as for certain GATT Server test applications.

Last but not least, a simple latch has been implemented, extended to also allow count_up(). It is mostly used in our (trial) unit tests to wait for dynamic threads to complete.
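Such a latch might look like the sketch below; class and method names besides count_up() are illustrative. Unlike std::latch, the expected count may grow after construction, which suits tests spawning a dynamic number of threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Simple latch sketch, extended with count_up(): waiters block until the
// count reaches zero, while the count may still be raised dynamically.
class up_down_latch {
    mutable std::mutex m_;
    std::condition_variable cv_;
    size_t count_;
public:
    explicit up_down_latch(size_t count = 0) : count_(count) {}
    void count_up(size_t n = 1) {
        std::lock_guard<std::mutex> lk(m_);
        count_ += n;
    }
    void count_down(size_t n = 1) {
        std::lock_guard<std::mutex> lk(m_);
        count_ = (n < count_) ? count_ - n : 0;
        if (count_ == 0) cv_.notify_all();  // release all waiters
    }
    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ == 0; });
    }
    size_t value() const {
        std::lock_guard<std::mutex> lk(m_);
        return count_;
    }
};
```

A test would count_up() once per spawned thread, have each thread count_down() on completion, and wait() in the main thread until all are done.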

Next …

More details to follow up in the next article …
