Not questioning the veracity of this - but is this a thing? Not gonna lie, some of that is French to me - but it looks believable.
Yes, I think it's reasonable, though I'm not suggesting it's happening; I'm just mulling over possible ways of doing it.
Co-ordination between populating the circular buffer and then processing the contents via a parallel stream/filter idiom to do the lossless compression should be easy-ish to code, and not too intrusive or demanding on resources for speech alone.
Except for processing the buffer with FLAC, which I understand is quite lightweight, the principle is little different from multi-track software recording techniques that have been around for 20 or so years, and since it's only using one 'track' it should be simple. (It's only when you introduce heavyweight plugins like reverb and echo to process multiple audio tracks that things slow down and eat up CPU cycles. That's why, for music production, there are several plugin cards with SHARC DSPs available from the likes of UA to offload the plugin-generated load from the CPU cores, ie pretty much along the lines of what I was suggesting earlier with using the consumer device GPU cores.)
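For anyone who wants to see the buffer-plus-compressor idea in code, here's a rough sketch, not a claim about how any real product does it. One thread fills a fixed-size ring buffer with audio chunks while a second thread drains it and compresses losslessly. Big assumption: I've used zlib from the Python standard library as a stand-in for FLAC (there's no FLAC encoder in the stdlib), and the names RingBuffer, compress_stream and run_demo are made up for illustration.

```python
import collections
import threading
import zlib


class RingBuffer:
    """Fixed-capacity buffer of audio chunks; the oldest chunk is dropped when full."""

    def __init__(self, capacity):
        self._chunks = collections.deque(maxlen=capacity)
        self._cond = threading.Condition()
        self._closed = False

    def put(self, chunk):
        with self._cond:
            self._chunks.append(chunk)
            self._cond.notify()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get(self):
        """Block until a chunk is available; return None once closed and drained."""
        with self._cond:
            while not self._chunks and not self._closed:
                self._cond.wait()
            if self._chunks:
                return self._chunks.popleft()
            return None


def compress_stream(ring, out):
    """Consumer: drain the ring buffer and compress losslessly (zlib stands in for FLAC)."""
    compressor = zlib.compressobj()
    while True:
        chunk = ring.get()
        if chunk is None:
            break
        out.append(compressor.compress(chunk))
    out.append(compressor.flush())


def run_demo():
    # Capacity exceeds the number of chunks produced, so nothing is overwritten here.
    ring = RingBuffer(capacity=64)
    compressed = []
    worker = threading.Thread(target=compress_stream, args=(ring, compressed))
    worker.start()
    # 50 fake chunks of 160 bytes each = 20 ms per chunk at 8000 samples/s, 8 bits/sample.
    chunks = [bytes([i % 256]) * 160 for i in range(50)]
    for chunk in chunks:
        ring.put(chunk)
    ring.close()
    worker.join()
    original = b"".join(chunks)
    restored = zlib.decompress(b"".join(compressed))
    return original, restored


if __name__ == "__main__":
    original, restored = run_demo()
    print("lossless round trip:", original == restored)
```

In a real capture loop the producer would be the audio driver callback, and a ring that is too small would silently drop the oldest audio, which is the trade-off you accept with a circular buffer.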
Incidentally, the 64 kbps requirement for voice originated way back, around 40 years ago, when the proposals to replace the old analogue telephone network started to take shape. Digitising an analogue signal depends on what audio bandwidth you wish to encode. Nyquist, Shannon and Hartley pretty much laid down the rules for it all even earlier: you sample at a minimum of twice the analogue bandwidth. So for a 4 kHz bandwidth (fine for spoken voice intelligibility; the old analogue phone system was actually only 3 kHz bandwidth, which was why the old Dial-a-Disc service was so grim sounding) that would be 4000 x 2 = 8000 samples per second (that's the sampling rate, which is neither the baud rate nor the bit rate), and with an 8-bit sample depth that gives you 64000 bps coming out.
Historically, most of the telecoms backbones were based on multiples of this number, and various muxing/demuxing/bi-plexing techniques are employed behind the scenes to allocate this on the fly according to current demand, ie allocate the silences to other consumers - rob Peter to pay Paul. You in effect have a virtual circuit rather than a physical one. It generally works so fast that we humans don't know it's going on, unless the provider decides to re-allocate your bandwidth to a higher priority customer (Virgin Business used to be notorious for doing that) and everything slows down for you.
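If it helps, the arithmetic above fits in a few lines. This is just a worked example of the sums, with a made-up function name (pcm_bit_rate). The E1 figure at the end is my own illustration of a "multiple of 64k" trunk (32 timeslots of 64 kbps each), not something taken from the discussion above.

```python
def pcm_bit_rate(bandwidth_hz, bits_per_sample):
    """Bit rate of uncompressed PCM sampled at twice the audio bandwidth."""
    sample_rate = 2 * bandwidth_hz          # minimum sampling rate for that bandwidth
    return sample_rate * bits_per_sample    # bits per second


voice_channel_bps = pcm_bit_rate(4000, 8)
print(voice_channel_bps)        # 64000

# Illustration only: an E1 trunk carries 32 timeslots of 64 kbps each.
e1_bps = 32 * voice_channel_bps
print(e1_bps)                   # 2048000
```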
(Sorry, it's difficult to explain or discuss in detail without descending into levels of possibly uncomfortable spoddery.)
Just a quick edit... the more I think about this, the more I believe it's certainly very feasible at a technology level. There are CUDA libraries for Kaldi, which look very interesting.