# SOS Butterworth filters in Rust
2025-11-30
I recently contributed a second-order sections (SOS) filter implementation to flaw, a digital filtering library for Rust.
It targets both embedded platforms (f32, no-std, no-panic) and general-purpose platforms (f64, FMA hardware acceleration).
My new flaw::sos module provides:
- Implementation of cascaded second-order sections: `sos::SisoSosFilter`
  - Generic over float type (`f32`, `f64`) and number of sections (filter order is up to 2x sections)
  - Designed for efficiency and low numerical error:
    - Update uses 9 float operations per section
    - Storage is 7 floats per section (2 delays + 5 coefficients)
    - Each stage uses Transposed Direct Form II
  - FMA and non-FMA implementations, configured by the crate's existing `fma` feature
- Functions to generate SOS Butterworth low-pass filters for a given cutoff frequency: `sos::{butter2, butter4, butter6}`
  - Provides orders 2, 4, and 6 for `f32` and `f64`
  - SOS coefficients are interpolated vs cutoff frequency from a lookup table
  - Cutoff frequency region of validity is enforced and tested for each combination of filter order and float type
- Python script to auto-generate the SOS Butterworth lookup tables from SciPy: `scripts/generate_butter_sos_tables.py`
Example usage:

```rust
// Create a low-pass filter with (cutoff frequency) / (sample frequency) = 0.01
// Butterworth order 4 filter implemented with second-order sections
let mut filter = flaw::sos::butter4::<f32>(0.01)?;
loop {
    let u = 1.0; // get the next raw measurement
    let y = filter.update(u); // update the filter and get its next output
}
```
Below are my reflections on the project. Thanks to flaw's maintainer, James Logan, for helping me with this!
## Digital filters with f32 are numerically challenging
One target of flaw is embedded platforms where 64-bit float operations are slow or unavailable.
Thus, flaw::sos needs to support filters using f32.
The resolution of f32 is only ~7 decimal digits.
An f32-based filter will have rounding errors if it adds together numbers more than a few orders of magnitude apart.
Unfortunately, infinite impulse response (IIR) digital filters can have a large range of coefficient magnitudes, particularly at low cutoff ratio (cutoff frequency / sample frequency) for low-pass filters.
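As a minimal illustration (not flaw code), adding a small value to a large f32 can be rounded away entirely:

```rust
// f32 carries ~7 significant decimal digits, so 0.4 vanishes next to 1e7
let big: f32 = 1.0e7;
let small: f32 = 0.4;
assert_eq!(big + small, big); // the addition has no effect at all

// f64 keeps the small contribution
assert_ne!(1.0e7_f64 + 0.4, 1.0e7_f64);
```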
An SOS form was chosen to mitigate these numerical issues. The same digital filter can be realized in different forms, and these have different robustness to numerical issues. Common forms include:

- The four direct forms: Direct I, Direct II, Transposed Direct I, and Transposed Direct II (math reference)
  - Transposed Direct II is the most numerically robust
  - All are susceptible to numerical instabilities, particularly for higher-order filters
- State space (math reference)
  - A modified state space form was used in the existing `flaw::SisoIirFilter`
- Cascaded second-order sections (SOS) (math reference)
The SOS form is more numerically robust for filters of order > 2.
SOS divides a higher-order filter into a cascade of second-order sections.
Within each section, it tends to add values of similar magnitude, reducing rounding errors.
In flaw::sos::SisoSosFilter, each section is realized in the numerically robust Transposed Direct II form.
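For reference, the Transposed Direct Form II update equations for one section, matching the code shown later in this post, are:

$$ y[n] = b_0 u[n] + z_0[n-1] $$

$$ z_0[n] = b_1 u[n] - a_1 y[n] + z_1[n-1] $$

$$ z_1[n] = b_2 u[n] - a_2 y[n] $$

Here u is the input, y is the output, and z_0, z_1 are the section's two delay states.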
Even with SOS, low-pass Butterworth filters have numerical problems for cutoff ratios that are too low.
flaw::sos's butter methods protect users from creating bad filters by returning Err if cutoff_ratio is below a minimum value.
The minimum values are tabulated for each combination of filter order and float precision.
Tests ensure that each filter has a DC gain of 1.0 ± 1e-4 at the minimum cutoff ratio (numerical issues manifest as the DC gain departing from 1).
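As an illustration, a test along these lines might look like the sketch below (not flaw's actual test code; the minimum cutoff ratio value is a stand-in):

```rust
// At the tabulated minimum cutoff ratio, drive the filter with a constant
// input and check that the output settles to the input (DC gain ≈ 1).
let min_cutoff_ratio = 1e-3; // stand-in value for illustration
let mut filter = flaw::sos::butter6::<f32>(min_cutoff_ratio).unwrap();
let mut y = 0.0_f32;
for _ in 0..1_000_000 {
    y = filter.update(1.0);
}
assert!((y - 1.0).abs() < 1e-4);
```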
## Interpolating SOS coefficients for fast filter creation
In one use case, users need to quickly create a new filter with a different cutoff frequency on an embedded platform.
The filter design algorithms are complicated and would not run in the allotted time.
Instead, flaw stores lookup tables of the filter coefficients, and interpolates them versus cutoff frequency when creating a new filter.
The flaw maintainer (James) developed this approach, and I extended it to SOS filters.
To generate the lookup tables, a Python script calls scipy.signal.butter to calculate the SOS coefficients, and then writes them to a .rs file as a static array. This happens once, before the crate is published.
At compile time, the lookup tables are compiled into the Rust binary.
If link-time optimization is enabled, the user's binary will only contain the lookup table(s) their code actually uses.
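To make this concrete, here is a rough sketch of interpolating one section's coefficients from such a table; the table layout, the use of linear interpolation, and the names are illustrative, not flaw's actual internals.

```rust
/// Each row: (cutoff_ratio, [b0, b1, b2, a1, a2]) for one section.
/// A real table has one row per tabulated cutoff ratio.
fn interpolate(table: &[(f64, [f64; 5])], cutoff_ratio: f64) -> [f64; 5] {
    // Index of the first row at or above the requested cutoff ratio,
    // clamped so we always have a bracketing pair of rows
    let i = table
        .partition_point(|(c, _)| *c < cutoff_ratio)
        .clamp(1, table.len() - 1);
    let (c0, lo) = table[i - 1];
    let (c1, hi) = table[i];
    let t = (cutoff_ratio - c0) / (c1 - c0); // blend factor in [0, 1]
    core::array::from_fn(|k| lo[k] + t * (hi[k] - lo[k]))
}
```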
## Steady state (DC) gain formula
For a second-order section with z-transform

$$ H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}} $$

the section's DC gain is

$$ H(1) = \frac{b_0 + b_1 + b_2}{1 + a_1 + a_2} $$
For a cascaded SOS filter, the overall gain is the product of the section gains.
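Written out for a cascade of N sections:

$$ G_{\text{DC}} = \prod_{k=1}^{N} H_k(1) = \prod_{k=1}^{N} \frac{b_{0,k} + b_{1,k} + b_{2,k}}{1 + a_{1,k} + a_{2,k}} $$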
### Uses: Gain correction after coefficient interpolation
In SisoSosFilter::new_interpolated, this formula is used to correct the DC gain to 1.
A DC gain of 1 is vital:
e.g., in a measurement application, if the input is constantly and exactly 1.000 volt, the filter output should converge to 1.000 V, not 0.999 V or 1.001 V.
Each set of SOS coefficients in the lookup tables was designed for a DC gain of 1, but interpolated coefficients may give a DC gain slightly different from 1.
After interpolation, new_interpolated calculates the DC gain of the overall filter, then calculates a correction factor that, applied to all N sections, brings the product of section gains back to 1:

$$ \text{correction} = \left( \frac{1}{\text{DC gain}} \right)^{1/N} $$

and scales the numerator coefficients of each section by correction (sketched below).
This ensures the overall DC gain is 1 to within 1e-6.
Note that, although the overall DC gain should be 1, the DC gain of each section is often not 1.
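Here is a sketch of this correction step, using the section coefficient names from the DC gain formula above; the Section struct and helper function are illustrative, not flaw's actual internals.

```rust
struct Section { b0: f64, b1: f64, b2: f64, a1: f64, a2: f64 }

fn normalize_dc_gain(sections: &mut [Section]) {
    // Overall DC gain = product of per-section DC gains H_k(1)
    let dc_gain: f64 = sections
        .iter()
        .map(|s| (s.b0 + s.b1 + s.b2) / (1.0 + s.a1 + s.a2))
        .product();
    // Scale every section's numerator by the N-th root of 1/gain so the
    // product of section gains becomes 1
    let correction = (1.0 / dc_gain).powf(1.0 / sections.len() as f64);
    for s in sections.iter_mut() {
        s.b0 *= correction;
        s.b1 *= correction;
        s.b2 *= correction;
    }
}
```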
### Uses: Set the filter to steady state
In SisoSosFilter::set_steady_state, the DC gain formula is used to set the internal state of the filter to its steady-state value for a given input.
This is useful for initializing the filter.
First, set_steady_state uses each section's steady state gain to calculate what the section output would be in steady state: output = input * ss_gain.
With both the input and output known, the section's state can be set using the usual update equations:
```rust
// Set the internal states based on the steady state input and output of this section
self.z[s][1] = b2 * input - a2 * output;
self.z[s][0] = b1 * input - a1 * output + self.z[s][1];
```
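For example, a filter tracking a sensor can be initialized to the first reading so its output starts settled instead of ramping up from zero (a usage sketch, assuming set_steady_state takes the steady-state input value):

```rust
let mut filter = flaw::sos::butter4::<f32>(0.01)?;
filter.set_steady_state(25.0); // as if the input had been 25.0 forever
let y = filter.update(25.0);   // output is already ≈ 25.0; no startup transient
```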
### Derivation of DC gain
The filter's DC gain is its z-transform evaluated at z = 1. The DC gain is the final value of the filter's step response. Let H(z) be the z-transform of a second-order section. The z-transform of the section's step response is:

$$ Y(z) = H(z) \, \frac{z}{z - 1} $$

The final value theorem for z-transforms states:

$$ \lim_{n \rightarrow \infty} y[n] = \lim_{z \rightarrow 1} (z - 1) Y(z) $$

Applying this to the step response gives:

$$ \lim_{n \rightarrow \infty} y[n] = \lim_{z \rightarrow 1} (z - 1) \, H(z) \, \frac{z}{z - 1} = \lim_{z \rightarrow 1} z \, H(z) $$

The limit of the factor z is 1, so we have:

$$ \lim_{n \rightarrow \infty} y[n] = H(1) $$

assuming that H(z) is stable so the limit exists. See the list of z-transform properties for more information.
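As a quick numerical check of this result, we can simulate a step input through one section with illustrative (stable) coefficients and compare the settled output to H(1):

```rust
// Transposed Direct Form II section with made-up stable coefficients
let (b0, b1, b2, a1, a2) = (0.25_f64, 0.5, 0.25, -0.1, 0.2);
let (mut z0, mut z1, mut y) = (0.0, 0.0, 0.0);
for _ in 0..10_000 {
    y = b0 * 1.0 + z0; // step input u[n] = 1
    z0 = b1 * 1.0 - a1 * y + z1;
    z1 = b2 * 1.0 - a2 * y;
}
let h1 = (b0 + b1 + b2) / (1.0 + a1 + a2);
assert!((y - h1).abs() < 1e-12); // settled step response equals the DC gain
```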
## Benchmarking and performance tricks
I also wanted to make filter updates fast on a general-purpose CPU with f64.
Before attempting performance optimization, I set up benchmarking to measure if my changes actually helped.
This was easy with criterion.
Criterion tracks benchmark statistics and reports if a code change had a statistically significant impact on runtime.
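For reference, a minimal criterion benchmark of the filter update might look like this sketch (not the crate's actual bench file):

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn bench_update(c: &mut Criterion) {
    let mut filter = flaw::sos::butter6::<f64>(0.01).unwrap();
    c.bench_function("sos update", |b| {
        // black_box keeps the input from being constant-folded away
        b.iter(|| filter.update(black_box(1.0)))
    });
}

criterion_group!(benches, bench_update);
criterion_main!(benches);
```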
The performance tricks I tried and their effects:

- Using fused-multiply-add (FMA) instructions: ~40% speedup
  - Requires setting the `target-cpu=x86-64-v3` rustc flag to actually get the hardware acceleration
- Unrolling the loop over sections: ~5% speedup
- Memory-aligning the arrays for coefficient and state storage: no measurable effect
  - Despite no effect on my i7 CPU, I kept this in, as it may help on embedded platforms with less sophisticated memory controllers
To support a range of platforms, the FMA implementation is gated by a feature flag. If the fma feature is not set, the non-FMA implementation is used (code snippet below, and on GitHub).
The FMA-or-not choice happens at compile time, so there is no performance hit due to added branches. (On targets without a hardware FMA instruction, mul_add can fall back to a much slower software routine, which is why the plain implementation is kept.)
```rust
#[cfg(not(feature = "fma"))]
{
    // Direct Form II Transposed implementation
    output = b0 * input + self.z[s][0];
    // Update the filter delays, they will be used the next time this function is called
    self.z[s][0] = b1 * input - a1 * output + self.z[s][1];
    self.z[s][1] = b2 * input - a2 * output;
}

// The FMA implementation is ~40% faster, with target-cpu=x86-64-v3 on an i7-8550U CPU.
#[cfg(feature = "fma")]
{
    // Direct Form II Transposed implementation
    output = b0.mul_add(input, self.z[s][0]); // b0 * input + self.z[s][0]
    // Update the filter delays, they will be used the next time this function is called
    self.z[s][0] = b1.mul_add(input, a1.mul_add(-output, self.z[s][1])); // b1 * input - a1 * output + self.z[s][1]
    self.z[s][1] = b2.mul_add(input, -a2 * output); // b2 * input - a2 * output
}
```
## Using AI in a new domain: coach, not substitute
Effective and ineffective strategies I've seen for using AI assistants in a new-to-me domain:
- Don't: ask AI to write everything for me
  - Removes my learning, and I don't yet know enough to tell if the results are bad
- Do: design and write the code myself, ask AI for explanations to get unstuck faster
  - I spend more time learning and less time looking for information
On this project, I used Codex a lot, but almost exclusively in "chat" mode, not "agent" mode. It was my first time extensively using generics in Rust and I often ran into compiler errors I didn't yet understand. I found it helpful to give Codex prompts like:
At line {X} in src/{Y}.rs, I'm trying to do {A}. I'm getting rustc error {Z}. Explain why this is happening and some options to resolve it.
This usually got me unstuck in a few minutes.
I felt I could trust its answers because it was easy to check their correctness with rustc.
When learning C++ early in my career, each error like this was a long and frustrating quest -- but no more! The combination of Rust's helpful error messages, Rust's strong compile-time checks, and AI explanations is really helpful for learning.
When I was writing the math logic, Copilot eagerly suggested formula completions. Some of these were correct; some were not. AI is not a substitute for learning the domain, nor is it a substitute for good tests.