printf format validation in rust

https://www.5snb.club/posts/2023/printf-format-validation-in-rust/
Tags: #hack

I was watching a talk about Idris 2, which mentioned (around 10 minutes in) that you can implement a type-safe printf using dependent types.

And I was wondering whether you could do something like that in Rust. And you can, ish!

error[E0308]: mismatched types
   --> src/main.rs:145:13
    |
145 |     let x = printf::<"that's a %s %s, aged %u!">("cute", "dog");
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `"%s%s"`, found `"%s%s%u"`
    |
    = note: expected constant `"%s%s"`
               found constant `"%s%s%u"`

That’s done with no macros, just a lot of const code of dubious quality.

The core technique I use here is that you can assert equality of two constant values as a where bound.

For example, let’s write a function that asserts that the string you pass in has a length (in bytes) equal to the size of the value you pass in.

#![allow(incomplete_features)]
#![feature(generic_const_exprs)]
#![feature(associated_const_equality)]
#![feature(adt_const_params)]

trait Size {
    const SIZE: usize;
}

impl<T> Size for T {
    const SIZE: usize = std::mem::size_of::<T>();
}

pub const fn length(s: &str) -> usize {
    s.len()
}

fn correct_size<T, const S: &'static str>(item: T)
    where T: Size<SIZE = { length(S) }> {
}

fn main() {
    correct_size::<_, "hewwo">(42_u32);
}
error[E0308]: mismatched types
  --> src/main.rs:23:32
   |
23 |     correct_size::<_, "hewwo">(42_u32);
   |                                ^^^^^^ expected `4`, found `5`
   |
   = note: expected constant `4`
              found constant `5`
note: required by a bound in `correct_size`
  --> src/main.rs:19:19
   |
18 | fn correct_size<T, const S: &'static str>(item: T)
   |    ------------ required by a bound in this function
19 |     where T: Size<SIZE = { length(S) }> {
   |                   ^^^^^^^^^^^^^^^^^^^^ required by this bound in `correct_size`

Worried about the 3 features and incomplete_features in this small example? Don’t worry, it gets worse.
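If all you need is this particular size check, stable Rust can get close with a different technique: an assert! inside an associated const, which fails const evaluation when the function is instantiated with the wrong size. A minimal sketch, assuming the expected size is passed as a const usize instead of being derived from a string (string const parameters are exactly what adt_const_params unlocks); AssertSize and the N parameter are made up for illustration:

```rust
use std::marker::PhantomData;

// Hypothetical helper: the const generic N stands in for length(S),
// because &'static str const parameters need adt_const_params.
struct AssertSize<T, const N: usize>(PhantomData<T>);

impl<T, const N: usize> AssertSize<T, N> {
    // Evaluated per (T, N) pair when referenced; a failing assert! here
    // aborts compilation instead of panicking at runtime.
    const OK: () = assert!(std::mem::size_of::<T>() == N, "size_of::<T>() != N");
}

fn correct_size<T, const N: usize>(item: T) -> T {
    // Naming the associated const forces the compiler to evaluate it.
    let () = AssertSize::<T, N>::OK;
    item
}
```

With this, correct_size::<_, 4>(42_u32) compiles and correct_size::<_, 5>(42_u32) is rejected. The difference is that the error is a const-evaluation failure raised after monomorphization rather than a type mismatch on a where bound, so the requirement is not visible in the function signature.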

Anyways, the core premise here is that you have two sides: the string passed as the const parameter, and the value passed in. You apply a “skeleton” function to each side to map it to a shared key, and if both sides map to the same key, the code is allowed to compile.

The skeleton function should produce a key that is reasonably human-readable, since that key is what the compiler prints when the two sides differ.

That’s all we really need to know in order to start with the real printf code. First, let’s look at the skeleton function for the format string. The key will be the format string's specifiers, so "Hello %d world %s!" ends up with a key of "%d%s", which is reasonably human-readable.

I’m using konst to make parsing the string a bit easier. You absolutely can do this without that crate, it’s just a bit more painful.

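const fn parse_skeleton<const F: &'static str>() -> &'static str {
    // The key being built, e.g. "%d%s" for "Hello %d world %s!".
    let mut s = "";

    let mut chars = konst::string::chars(F);
    // True when the previous character was an unescaped '%'.
    let mut saw_percent = false;

    // konst's const iterator hands back the item and the advanced iterator.
    while let Some((ch, chars_)) = chars.next() {
        chars = chars_;

        if saw_percent {
            // "%%" is an escaped percent sign; anything else is a specifier.
            if ch != '%' {
                let encoded = konst::chr::encode_utf8(ch);
                s = append_strs(append_strs(s, "%"), encoded.as_str());
            }
            saw_percent = false;
        } else if ch == '%' {
            saw_percent = true;
        }
    }

    s
}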

Fairly simple code, if a bit weird because it needs to be a const fn. A for loop desugars into Iterator::next calls, which aren't const, so no for loops for you :)
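For comparison, here is the same logic as an ordinary runtime function, where a for loop and a growable String are allowed. This is a sketch rather than part of the original code; the name skeleton is made up:

```rust
/// Runtime counterpart of parse_skeleton: keep only the conversion
/// specifiers, so "Hello %d world %s!" becomes "%d%s".
/// "%%" is an escaped percent sign and contributes nothing to the key.
fn skeleton(format: &str) -> String {
    let mut key = String::new();
    let mut saw_percent = false;
    for ch in format.chars() {
        if saw_percent {
            if ch != '%' {
                key.push('%');
                key.push(ch);
            }
            saw_percent = false;
        } else if ch == '%' {
            saw_percent = true;
        }
    }
    key
}
```

A trailing lone '%' is silently dropped here, exactly as in the const version.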

Except for append_strs. What the fuck is that? Well, I need some way to dynamically build a &'static str. So I wrote a function with a rather funny signature, fn(&str, &str) -> &'static str, which does exactly what you think it does.

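const fn append_strs(a: &str, b: &str) -> &'static str {
    unsafe {
        // Allocate a.len() + b.len() bytes (alignment 1) on the compile-time heap.
        let buf = core::intrinsics::const_allocate(a.len() + b.len(), 1);
        // const_allocate returns a null pointer when called at runtime.
        assert!(!buf.is_null(), "append_strs can only be called at comptime");
        std::ptr::copy(a.as_ptr(), buf, a.len());
        std::ptr::copy(b.as_ptr(), buf.add(a.len()), b.len());
        // Both inputs were valid UTF-8, so their concatenation is too.
        std::str::from_utf8_unchecked(std::slice::from_raw_parts(buf, a.len() + b.len()))
    }
}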

Turns out const does have allocation. It’s just very, very magic. const_allocate returns a null pointer if you call it at runtime, and the assert catches that, so this is actually safe, I think. Cursed shit like this at compile time is Fine because the intermediate strings that nothing references will just get removed. Probably.

Oh, I almost forgot, you also earn these:

#![feature(adt_const_params)]
#![feature(core_intrinsics)]
#![feature(const_ptr_is_null)]
#![feature(const_heap)]
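If you'd rather stay on stable Rust, the usual workaround is to concatenate into a fixed-size byte array whose length the caller supplies as a const generic. A minimal sketch; concat_bytes, A, B, LEN, JOINED_BYTES and JOINED are hypothetical names, not part of the original code:

```rust
/// Concatenate two strings at compile time into a [u8; N].
/// The caller has to know N up front.
const fn concat_bytes<const N: usize>(a: &str, b: &str) -> [u8; N] {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    // Fails const evaluation if the caller picked the wrong N.
    assert!(a.len() + b.len() == N, "N must equal a.len() + b.len()");
    let mut buf = [0u8; N];
    let mut i = 0;
    while i < a.len() {
        buf[i] = a[i];
        i += 1;
    }
    let mut j = 0;
    while j < b.len() {
        buf[a.len() + j] = b[j];
        j += 1;
    }
    buf
}

const A: &str = "%s";
const B: &str = "%u";
// The length is computed outside the function and passed in as N.
const LEN: usize = A.len() + B.len();
const JOINED_BYTES: [u8; LEN] = concat_bytes::<LEN>(A, B);
// Two valid UTF-8 strings concatenate to valid UTF-8, so this never panics.
const JOINED: &str = match std::str::from_utf8(&JOINED_BYTES) {
    Ok(s) => s,
    Err(_) => panic!("invalid UTF-8"),
};
```

The catch is N: inside the generic tuple impl it would have to be A::KIND.len() + B::KIND.len(), and array lengths that depend on generic parameters need generic_const_exprs. const_allocate sidesteps that by never putting the length in a type.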

Anyways, now let’s do the skeleton for the value. The code for that is downright normal, and doesn’t need any more unstable features.

First, we need to define the specifier for each type you want to use in the formatting.

use std::fmt::Display;

trait InnerFormatString: Display {
    const KIND: &'static str;
}

And then the implementations

impl InnerFormatString for u32 {
    const KIND: &'static str = "%u";
}

impl InnerFormatString for i32 {
    const KIND: &'static str = "%d";
}

impl<'a> InnerFormatString for &'a str {
    const KIND: &'static str = "%s";
}

Fairly standard stuff.

I used tuples to pass the arguments, so I defined a trait for the tuples themselves:

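trait FormatString {
    /// The specifiers of every element, concatenated, e.g. "%s%s%u".
    const KIND: &'static str;
    /// Returns the x-th element of the tuple as a Display trait object.
    fn display(&self, x: usize) -> &dyn Display;
}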

And then the implementation code. I’ll only show the one for a 3-tuple. You need a separate impl for each tuple size you want to support, but you get the gist: they all follow the same pattern.

impl<A: InnerFormatString, B: InnerFormatString, C: InnerFormatString> FormatString for (A, B, C) {
    const KIND: &'static str = append_strs(A::KIND, append_strs(B::KIND, C::KIND));
    fn display(&self, x: usize) -> &dyn Display {
        match x {
            0 => &self.0,
            1 => &self.1,
            2 => &self.2,
            _ => panic!(),
        }
    }
}

KIND is what computes the key; display is only there so the printf body can actually format the arguments.
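Writing one impl per tuple size by hand gets old quickly, and a macro_rules! macro can stamp them out. The sketch below is a stable-Rust variant, not the original code: because joining associated consts into a new &'static str needs append_strs, it builds the tuple's key at runtime in a hypothetical kind() method instead of an associated const. impl_format_string is a made-up name:

```rust
use std::fmt::Display;

trait InnerFormatString: Display {
    const KIND: &'static str;
}

impl InnerFormatString for u32 {
    const KIND: &'static str = "%u";
}
impl InnerFormatString for i32 {
    const KIND: &'static str = "%d";
}
impl<'a> InnerFormatString for &'a str {
    const KIND: &'static str = "%s";
}

trait FormatString {
    // Runtime key, replacing the original's associated const KIND.
    fn kind() -> String;
    fn display(&self, x: usize) -> &dyn Display;
}

// Generates one FormatString impl per invocation; `$idx` is the tuple field.
macro_rules! impl_format_string {
    ($($name:ident => $idx:tt),+) => {
        impl<$($name: InnerFormatString),+> FormatString for ($($name,)+) {
            fn kind() -> String {
                // Concatenate each element's specifier, in order.
                let mut key = String::new();
                $(key.push_str($name::KIND);)+
                key
            }

            fn display(&self, x: usize) -> &dyn Display {
                match x {
                    $($idx => &self.$idx,)+
                    _ => panic!("argument index {} out of range", x),
                }
            }
        }
    };
}

impl_format_string!(A => 0);
impl_format_string!(A => 0, B => 1);
impl_format_string!(A => 0, B => 1, C => 2);
```

Adding a 4-tuple is then one more line of macro invocation.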

Finally, let’s write the printf code itself.

#![feature(generic_const_exprs)]
#![feature(associated_const_equality)]

use std::fmt::Write; // write! into a String needs this trait in scope

fn printf<A, const F: &'static str>(arg: A) -> String
where
    A: FormatString<KIND = { parse_skeleton::<F>() }>,
{
    let mut saw_percent = false;
    let mut idx = 0;
    let mut ret = String::new();

    for ch in F.chars() {
        if saw_percent {
            if ch == '%' {
                ret.push('%');
            } else {
                // The where bound guarantees that ch matches the type of the
                // idx-th argument (so we could unsafely rely on that).
                // For now we just format it via Display and ignore which specifier ch is.
                write!(ret, "{}", arg.display(idx)).unwrap();
                idx += 1;
            }
            saw_percent = false;
        } else {
            if ch == '%' {
                saw_percent = true;
            } else {
                ret.push(ch);
            }
        }
    }

    ret
}
error[E0308]: mismatched types
   --> src/main.rs:157:53
    |
157 |     let x = printf::<_, "that's a %s %s, aged %u!">(("cute", "dog"));
    |                                                     ^^^^^^^^^^^^^^^ expected `"%s%s"`, found `"%s%s%u"`
    |
    = note: expected constant `"%s%s"`
               found constant `"%s%s%u"`
note: required by a bound in `printf`
   --> src/main.rs:163:21
    |
161 | fn printf<A, const F: &'static str>(arg: A) -> String
    |    ------ required by a bound in this function
162 | where
163 |     A: FormatString<KIND = { parse_skeleton::<F>() }>,
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `printf`
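To see what the where bound is buying us, here is a sketch of the same check moved to runtime: compute both keys, compare them, and return an error instead of failing to compile. Arg and printf_checked are hypothetical names, and an enum stands in for the tuple trait machinery to keep it short:

```rust
use std::fmt::Write;

// Hypothetical runtime stand-in for the tuple of arguments.
enum Arg<'a> {
    U(u32),
    D(i32),
    S(&'a str),
}

impl Arg<'_> {
    // Same per-type keys as the InnerFormatString impls.
    fn kind(&self) -> &'static str {
        match self {
            Arg::U(_) => "%u",
            Arg::D(_) => "%d",
            Arg::S(_) => "%s",
        }
    }
}

fn printf_checked(format: &str, args: &[Arg<'_>]) -> Result<String, String> {
    // Key from the format string (same walk as parse_skeleton).
    let mut expected = String::new();
    let mut saw_percent = false;
    for ch in format.chars() {
        if saw_percent {
            if ch != '%' {
                expected.push('%');
                expected.push(ch);
            }
            saw_percent = false;
        } else if ch == '%' {
            saw_percent = true;
        }
    }

    // Key from the arguments; comparing them is what the where bound does at compile time.
    let found: String = args.iter().map(|arg| arg.kind()).collect();
    if expected != found {
        return Err(format!("format string expects `{}`, arguments are `{}`", expected, found));
    }

    // Substitution loop, mirroring the printf body.
    let mut ret = String::new();
    let mut idx = 0;
    saw_percent = false;
    for ch in format.chars() {
        if saw_percent {
            if ch == '%' {
                ret.push('%');
            } else {
                match &args[idx] {
                    Arg::U(v) => write!(ret, "{}", v).unwrap(),
                    Arg::D(v) => write!(ret, "{}", v).unwrap(),
                    Arg::S(v) => ret.push_str(v),
                }
                idx += 1;
            }
            saw_percent = false;
        } else if ch == '%' {
            saw_percent = true;
        } else {
            ret.push(ch);
        }
    }
    Ok(ret)
}
```

Passing only ("cute", "dog") for "that's a %s %s, aged %u!" returns an Err comparing "%s%s%u" with "%s%s", the same two keys the compiler reported, except you only find out when the code runs.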

And we’re done!

Right?

Okay, fine. Time for one more hack. It would be nice if we didn’t need the double parentheses, and printf just acted like a normal function. Thankfully, we have a tool for that! FnOnce (and friends) take their arguments as a tuple! It’s quite unstable, but we can do it.

#![feature(generic_const_items)]
#![feature(unboxed_closures)]
#![feature(tuple_trait)]
#![feature(fn_traits)]

// Lowercase on purpose, so call sites read like a function call.
#[allow(non_camel_case_types)]
struct printf<const F: &'static str>;

impl<A: std::marker::Tuple, const F: &'static str> std::ops::FnOnce<A> for printf<F>
where
    A: FormatString<KIND = { parse_skeleton::<F>() }>,
{
    type Output = String;

    extern "rust-call" fn call_once(self, arg: A) -> String {
        // you've already seen this.
    }
}

Finally, we’ve reached the API shown at the top. The full .rs file can be downloaded from the original post (be sure to add konst 0.3.6 as a dependency).

Is printf specifically a useful API to do this for? No, not really: we already have format_args!, which checks format strings at compile time. But it sure was funny. Finding an actual productive use for this is an exercise for the reader.