Improvement of the Bfloat16 Floating-point for the Laplacian Source

The paper analyzes the performance of the Bfloat16 floating-point format for the Laplacian source, using the analogy between the floating-point format and a piecewise uniform quantizer. Furthermore, by suggesting a slightly different allocation of bits, the paper proposes a new 16-bit floating-point format that can achieve a 6 dB higher SQNR (signal-to-quantization-noise ratio) than the Bfloat16 format while also having lower complexity. The proposed 16-bit format is suitable for implementation on devices with limited hardware resources, such as smart sensor nodes and edge devices.
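The floating-point/quantizer analogy can be illustrated with a short simulation. Bfloat16 has 1 sign bit, 8 exponent bits and 7 mantissa bits. Each exponent value e selects a segment [2^e, 2^(e+1)), and within that segment the 2^7 mantissa codes act as a uniform quantizer with step 2^(e-7). The format as a whole therefore behaves as a piecewise uniform quantizer. The Python sketch below is an illustration, not the authors' method. It rounds unit-variance Laplacian samples to Bfloat16 and estimates the SQNR by Monte Carlo simulation instead of the paper's analytical expressions. The helper names `to_bfloat16` and `sqnr_db`, the unit variance, the round-to-nearest-even conversion from float32, and the sample size are all assumptions made here. The paper's proposed bit allocation is not reproduced.

```python
import numpy as np


def to_bfloat16(x: np.ndarray) -> np.ndarray:
    """Round values to the nearest Bfloat16 (ties to even); the result is stored as float32."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    # Bfloat16 keeps the upper 16 bits of a float32: 1 sign, 8 exponent and 7 mantissa bits.
    lsb = (bits >> 16) & np.uint32(1)
    # Adding 0x7FFF plus the kept LSB makes exact ties round to an even mantissa;
    # masking then drops the low 16 bits, i.e. applies the per-segment uniform quantizer.
    rounded = (bits + np.uint32(0x7FFF) + lsb) & np.uint32(0xFFFF0000)
    return rounded.view(np.float32)


def sqnr_db(x: np.ndarray, xq: np.ndarray) -> float:
    """Signal-to-quantization-noise ratio in dB, computed in float64."""
    x = x.astype(np.float64)
    err = x - xq.astype(np.float64)
    return float(10.0 * np.log10(np.sum(x**2) / np.sum(err**2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A Laplacian with scale b has variance 2*b**2, so b = 1/sqrt(2) gives unit variance.
    x = rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2.0), size=1_000_000).astype(np.float32)
    xq = to_bfloat16(x)
    print(f"Bfloat16 SQNR for a unit-variance Laplacian source: {sqnr_db(x, xq):.2f} dB")
```

Because the step in each segment is proportional to the segment's lower bound, the relative error stays bounded by 2^-8. As a result, the SQNR of such a format is nearly independent of the input variance over a wide range, which is the property the piecewise uniform quantizer analogy captures.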

Zoran Perić
University of Niš, Faculty of Electronic Engineering
Serbia

Bojan Denić
University of Niš, Faculty of Electronic Engineering
Serbia

Milan Dinčić
University of Niš, Faculty of Electronic Engineering
Serbia

Nikola Vučić
University of Niš, Faculty of Electronic Engineering
Serbia