Why Rust for Spatial ETL

When I started building FlowForge, people asked why I didn't just use Python. It's a fair question. Python dominates the spatial world—GDAL bindings, Shapely, GeoPandas, the whole ecosystem. If you're doing GIS work, you're probably writing Python.

But I wasn't building a script. I was building a tool that needed to move millions of geometries without flinching. And for that, Rust was the only choice that made sense from day one.

The Problem with "Good Enough"

I've been a software developer for over 30 years, with the last 15 in oil and gas spatial data engineering. I've watched the industry go from command-line utilities to FME to Python notebooks, and somewhere along the way we accepted that spatial processing just takes a while. Run the job, go get coffee, hope it finishes before your meeting.

FME is a fantastic product—I'm an FME Certified Professional, and I genuinely respect what Safe Software built. But I wanted to explore a different approach. When you're processing millions of well locations, running spatial joins against continental-scale polygon datasets, and transforming coordinate systems across dozens of basins, every bit of overhead matters. I wanted to see how fast spatial ETL could be if you started from scratch with performance as the foundation.

Not incrementally faster. Fundamentally faster.

Why Rust?

If you've heard of Rust, you probably think of it as a systems programming language—the kind of thing people use to build browsers or operating systems. And that's true. But here's what matters for spatial work: Rust gives you C-level performance with memory safety guarantees, and the spatial ecosystem is more mature than most people realize.

Three crates made this viable:

GDAL bindings — The gdal crate wraps the same GDAL library you're already using, just without the Python overhead. Every format you care about, same battle-tested code underneath.

PROJ — The proj crate gives you coordinate transformations. Same PROJ library that powers QGIS and PostGIS, direct access from Rust.

DuckDB — This one's the secret weapon. The duckdb crate lets you embed an analytical database directly in your application. Spatial joins on millions of records without standing up a database server.

The ecosystem is ready. It's not experimental. These are production-quality bindings to the same foundational libraries the entire industry relies on.

What the Performance Actually Looks Like

Let me give you a concrete example. Say you're loading a few hundred thousand polygons from a shapefile and running a spatial predicate against them.

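In Python with GeoPandas, the vectorized operations run in compiled code, but anything that falls back to a per-row loop or an apply call pays interpreter overhead on every geometry. The data also crosses the Python-C boundary at every step of the pipeline. It works, but it's slow in ways that are hard to optimize away.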

In Rust, the geometry stays in contiguous memory. The operations compile down to tight loops. There's no interpreter, no garbage collector pausing at inconvenient moments, no serialization between your code and the spatial libraries.
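
To make the contiguous-memory point concrete, here is a minimal sketch using only the standard library. It is not FlowForge's code; the Envelope type and the envelope function are hypothetical names chosen for illustration. It computes the bounding box of a coordinate buffer stored as one flat allocation of [x, y] pairs, which is the kind of loop the compiler can turn into tight machine code:

```rust
/// Axis-aligned bounding box of a set of coordinates (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Envelope {
    min_x: f64,
    min_y: f64,
    max_x: f64,
    max_y: f64,
}

/// Compute the envelope of a coordinate buffer.
/// `coords` is a single contiguous slice of [x, y] pairs, so the loop
/// walks memory linearly with no pointer chasing and no allocation.
fn envelope(coords: &[[f64; 2]]) -> Option<Envelope> {
    // An empty buffer has no envelope
    let first = coords.first()?;
    let mut env = Envelope {
        min_x: first[0],
        min_y: first[1],
        max_x: first[0],
        max_y: first[1],
    };
    for &[x, y] in &coords[1..] {
        env.min_x = env.min_x.min(x);
        env.min_y = env.min_y.min(y);
        env.max_x = env.max_x.max(x);
        env.max_y = env.max_y.max(y);
    }
    Some(env)
}
```

Calling envelope(&ring) on a polygon ring returns Some(Envelope) for any non-empty buffer and None for an empty one. A real pipeline would likely keep coordinates in whatever layout the geometry library uses; the flat array here is an assumption made to keep the sketch short.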

Here's a simplified version of how FlowForge loads and processes spatial data:

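use duckdb::{Connection, Result};

/// Load a shapefile into an in-memory DuckDB table named `parcels`.
fn load_and_query(path: &str) -> Result<()> {
    let conn = Connection::open_in_memory()?;

    // Install (downloaded on first use) and load the spatial extension
    conn.execute_batch("INSTALL spatial; LOAD spatial;")?;

    // Escape single quotes so the path stays a valid SQL string literal
    let escaped_path = path.replace('\'', "''");

    // Read directly from the shapefile into a DuckDB table
    let query = format!(
        "CREATE TABLE parcels AS SELECT * FROM ST_Read('{}')",
        escaped_path
    );
    conn.execute_batch(&query)?;

    // The table is now queryable with spatial SQL:
    // joins, filters, and transforms all run inside DuckDB

    Ok(())
}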

That's not pseudocode. That's the actual pattern. Load a shapefile directly into DuckDB, and you have SQL spatial operations at analytical database speeds. No intermediate files, no database server to stand up, no round trips through an interpreter.
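
As a rough illustration of the kind of query that becomes available once a table is loaded, here is a small sketch that builds a point-in-polygon join as a SQL string. The wells table is hypothetical, the helper names are mine, and the sketch assumes both tables keep their geometry in a column named geom, which you should check against your own schema. ST_Intersects is a function provided by DuckDB's spatial extension.

```rust
/// Quote a table name as a SQL identifier, doubling any embedded quotes.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Build a query that counts point/polygon pairs that intersect.
/// Assumption: both tables have a geometry column named `geom`.
fn spatial_join_count_sql(points_table: &str, polygons_table: &str) -> String {
    format!(
        "SELECT COUNT(*) FROM {pts} AS pt JOIN {polys} AS poly \
         ON ST_Intersects(poly.geom, pt.geom)",
        pts = quote_ident(points_table),
        polys = quote_ident(polygons_table),
    )
}
```

The resulting string can be passed to the connection's query methods, for example after loading a second table next to parcels. Building SQL from identifiers is only a sketch; a production tool would validate table names against the catalog first.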

The Escape Hatch Nobody Talks About

Here's something I don't see discussed much in the spatial community: Rust gives you direct access to the hardware when you need it. The standard library exposes SIMD intrinsics through std::arch, and the asm! macro supports inline assembly. When you absolutely need to squeeze every cycle out of a hot path, say a geometric predicate that runs a billion times, you can drop down to hand-tuned SIMD instructions.

I'm not doing that everywhere. That would be insane. But knowing the escape hatch exists means I can optimize the critical 1% without rewriting the whole application. It's future headroom. When customers show up with datasets that stress the current implementation, I have somewhere to go.
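
For a sense of what that escape hatch can look like, here is a hedged sketch of a point-in-bounding-box count. It uses the std::arch SSE2 intrinsics rather than inline assembly, and it is not taken from FlowForge; the function names and the [min_x, min_y, max_x, max_y] bbox layout are assumptions for illustration. On x86_64 it tests two points per iteration, and on other targets it falls back to the scalar version.

```rust
/// Scalar reference: count points inside bbox = [min_x, min_y, max_x, max_y].
fn count_in_bbox_scalar(xs: &[f64], ys: &[f64], bbox: [f64; 4]) -> usize {
    xs.iter()
        .zip(ys)
        .filter(|&(&x, &y)| x >= bbox[0] && x <= bbox[2] && y >= bbox[1] && y <= bbox[3])
        .count()
}

/// SSE2 version: processes two points per loop iteration.
#[cfg(target_arch = "x86_64")]
fn count_in_bbox_simd(xs: &[f64], ys: &[f64], bbox: [f64; 4]) -> usize {
    use std::arch::x86_64::*;

    let n = xs.len().min(ys.len());
    let pairs = n / 2; // each 128-bit register holds two f64 values
    let mut count = 0usize;

    // SAFETY: SSE2 is part of the x86_64 baseline, and every unaligned load
    // reads the two f64 values at indices 2*i and 2*i + 1, both below `n`.
    unsafe {
        let min_x = _mm_set1_pd(bbox[0]);
        let min_y = _mm_set1_pd(bbox[1]);
        let max_x = _mm_set1_pd(bbox[2]);
        let max_y = _mm_set1_pd(bbox[3]);
        for i in 0..pairs {
            let x = _mm_loadu_pd(xs.as_ptr().add(2 * i));
            let y = _mm_loadu_pd(ys.as_ptr().add(2 * i));
            let in_x = _mm_and_pd(_mm_cmpge_pd(x, min_x), _mm_cmple_pd(x, max_x));
            let in_y = _mm_and_pd(_mm_cmpge_pd(y, min_y), _mm_cmple_pd(y, max_y));
            // movemask packs the sign bit of each lane into the low two bits
            let mask = _mm_movemask_pd(_mm_and_pd(in_x, in_y));
            count += mask.count_ones() as usize;
        }
    }

    // Handle the odd trailing point, if any, with the scalar path
    count + count_in_bbox_scalar(&xs[2 * pairs..n], &ys[2 * pairs..n], bbox)
}

/// Portable fallback for targets without the x86_64 intrinsics.
#[cfg(not(target_arch = "x86_64"))]
fn count_in_bbox_simd(xs: &[f64], ys: &[f64], bbox: [f64; 4]) -> usize {
    count_in_bbox_scalar(xs, ys, bbox)
}
```

Keeping the scalar function next to the SIMD one gives you a reference to test against, which matters because hand-written vector code is easy to get subtly wrong. Whether this beats what the compiler auto-vectorizes depends on the workload, so measure before committing to it.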

What This Means for FlowForge

FlowForge isn't about making you learn Rust. You'll never write a line of it. The visual workflow builder, the drag-and-drop interface, the familiar spatial operations—that's what you interact with.

But underneath, when you connect a shapefile reader to a spatial join to a GeoParquet writer, you're running Rust code that processes your data as fast as the hardware allows. No Python interpreter tax. No Java VM warmup. Just compiled code moving bytes.

The stack choice isn't a technical curiosity. It's the foundation that makes FlowForge possible.

Next Up

This is the first post in a series about how FlowForge was built. Next time, I'll get into why we chose Tauri over Electron for the desktop application—and why that decision matters more than you'd think for a spatial tool.

If you're curious about the DuckDB piece specifically, I'm also starting a separate series on DuckDB for spatial work. First post drops soon.

FlowForge is a spatial ETL tool built for engineers who are tired of waiting. Check it out at flowforgelabs.io.