# Float/String Conversion in Picolibc: Enter “Ryū”

I recently wrote about this topic having concluded that the best route for now was to use the malloc-free, but imprecise, conversion routines in the tinystdio alternative.

A few days later, Sreepathi Pai pointed me at some very recent work in this area:

This is amazing! Thirty years after the papers referenced in the
previous post, Ulf Adams came up with some really cool ideas and
managed to reduce the math required for 64-bit conversion to 128 bit
integers. This is a huge leap forward; we were doing long
multi-precision computations before, and now it's all short enough to
fit in registers (ok, a *lot* of registers, but still).

## Getting the Ryū Code

The code is available on github: https://github.com/ulfjack/ryu. Reading through it, it's very clear that the author focuses on performance with lots of tuning for common cases. Still, it's quite readable, especially compared with the newlib multi-precision based code.

## Picolibc String/Float conversion interface

Picolibc has some pretty basic needs for the float/string conversion code, it wants four functions:

__dtoa_engine

`int __dtoa_engine(double x, struct dtoa *dtoa, uint8_t max_digits, uint8_t max_decimals);`

This converts the double

`x`

to a string of decimal digits and a decimal exponent stored inside the 'dtoa' struct. It limits the total number of digits to`max_digits`

and, optionally (when max_decimals is non-zero), limits the number of fractional digits to`max_decimals - 1`

. This latter supports 'f' formats. Returns the number of digits stored, which is <= max_digits. Less if the number can be accurately represented in fewer digits.__ftoa_engine

`int __ftoa_engine(float x, struct ftoa *ftoa, uint8_t max_digits, uint8_t max_decimals);`

The same as

`__dtoa_engine`

, except for floats.__atod_engine

`double __atod_engine(uint64_t m10, int e10);`

To avoid needing to handle stdio inside the conversion function,

`__atod_engine`

receives fully parsed values, the base-10 significand (`m10`

) and exponent (`e10`

). The value to convert is`m10 * pow(10, e10)`

.__atof_engine

`float __atof_engine(uint32_t m10, int e10);`

The same as

`__atod_engine`

, except for floats.

With these, it can do printf, scanf, ecvt, fcvt, gcvt, strtod, strtof and atof.

## Porting Ryū to Picolibc

The existing Ryū float-to-string code always generates the number of
digits necessary for accurate output. I had to hack it up to generate
correctly rounded shorter output when `max_digits`

or `max_decimals`

were smaller. I'm not sure I managed to do that correctly, but at
least it appears to be passing all of the test cases I have. In normal
operation, Ryū iteratively removes digits from the answer that aren't
necessary to disambiguate with neighboring values.

What I changed was to keep removing digits using that method until the answer had few enough digits to fit in the desired length. There's some tricky rounding code that adjusts the final result and I had to bypass that if I'd removed extra digits.

That was about the only change necessary to the core algorithm. I also trimmed the code to only include the general case and not the performance improvements, then wrapped it with code to provide the _engine interface.

On the string-to-float side, most of what I needed to do was remove the string parsing bits at the start of the function and switch from performance-optimized to space-optimized versions of a couple of internal routines.

## Correctness Results

Because these new functions are now 'exact', I was able to adjust the picolibc tests to compare all of the bits for string/float conversion instead of having to permit a bit of slop in the answers. With those changes, the picolibc test suite passes, which offers some assurance that things aren't completely broken.

## Size Results

Snek uses the 32-bit float versions of the conversion routines, and for that, the size difference is:

```
text data bss dec hex filename
59068 44 37968 97080 17b38 snek-qemu-riscv-orig.elf
59430 44 37968 97442 17ca2 snek-qemu-riscv-ryu.elf
362
```

362 bytes added to gain accurate printf/strtof results seems like a good trade-off in this case.

## Performance

I haven't measured performance at all, but I suspect that it won't be nearly as problematic on most platforms as the source code makes it appear. And that's because Ryū is entirely integer arithmetic with no floating point at all. This avoids using the soft fp code for platforms without hardware float support.

## Pointers to the Code

I haven't merged this to picolibc master yet, it's on the ryu branch:

Review, especially of the hack above to return short results, would be greatly appreciated!

Thanks again to Ulf Adams for creating this code and to Sreepathi Pai for sending me a note about it!