WebAssembly, the Safer Alternative to Integrating Native Code in Java

Aug 14, 2024 13 min read

Benjamin Eckel
CTO and Co-founder @Dylibso

reviewed by

Olimpiu Pop
Tech Executive and Engineer Focused on a Holistic Approach

Key Takeaways

Dynamic linking in Java involves loading native libraries at runtime, which can bypass the JVM's safety and performance guarantees, leading to potential security risks and memory safety issues.
Porting native code to the JVM retains the JVM's benefits, including platform-independent distribution and runtime safety, but keeping pace with the original project's development requires significant effort.
WebAssembly (Wasm) offers a portable and secure alternative, allowing native code to run safely within JVM applications.
Using Chicory, developers can run Wasm-compiled code, like SQLite, in the JVM environment, benefiting from enhanced portability and security.
Wasm's sandboxing and memory model provide strong security guarantees, preventing unauthorised access to system resources and host memory.

When working in a managed ecosystem like the JVM, we often need to execute native code. This usually happens if you need crypto, compression, database, or networking code written in C.

Take SQLite, for example: according to its authors, it is the most widely deployed codebase, and it is frequently used in JVM applications. But SQLite is written in C, so how does it run in our JVM applications?

Dynamic linking is the most common way we deal with this problem today. We’ve been doing this in all our programming languages for decades, and it works well. However, it creates a host of problems when used with the JVM. Until recently, the alternative was porting the codebase to another programming language, which comes with its own challenges.

Problems with Dynamic Linking

To understand the problems with dynamic linking, it’s important to explain how it works. When we want to run some native code, we start by asking the system to load the native library (we’re using some Java Native Access (JNA) pseudocode here to simplify things):

import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.ptr.PointerByReference;

interface LibSqlite extends Library {
    // Loads libsqlite3.dylib on my mac
    LibSqlite INSTANCE = Native.load("sqlite3", LibSqlite.class);

    int sqlite3_open(String filename, PointerByReference db);
    // ... other function definitions here
}

For an easy mental model, imagine this reads the native code for SQLite from disk and "appends" it to the native code of the JVM.
We can then get a handle to a native function and execute it:

var ptr = new PointerByReference(); // receives the sqlite3* handle
int result = LibSqlite.INSTANCE.sqlite3_open("chinook.sqlite", ptr);

JNA helps by automatically mapping our Java types to C types and then doing the inverse with the return values.

When sqlite3_open is called, our CPU jumps to that native code. The native code exists outside the guarantees of the JVM but at the same level. It has all the capabilities of the process the JVM is running in. This brings us to the first problem with dynamic linking.

Runtime: Escaping the JVM

When we jump to the native code at runtime, we escape the JVM's safety and performance guarantees. The JVM can no longer help us with memory faults, segmentation faults, observability, etc. Also note that this code can see all the memory and has all the permissions and capabilities of the whole process. So, if a vulnerability or malicious payload makes it in, you may be in deep trouble.

Memory safety is becoming an increasingly important topic for software practitioners. The US government has deemed memory vulnerabilities a significant enough problem to start pushing vendors away from non-memory-safe languages. I think it’s great to start new projects in memory-safe languages, but I believe the likelihood of these foundational codebases being ported away from C and C++ is low, and the ask to port them is unreasonable. Even so, the effort is valid and may eventually impact your business. For example, the government is also considering shifting some liability to the people who write and run software services. If this happens, it may increase the financial and compliance risk of running native code this way.

Distribution: Multiple Deployment Targets

The second problem with dynamic linking is that we can no longer distribute our library or application as just a jar. This ruins the most significant benefit of the JVM: shipping platform-independent code. We now need to ship a native version of our library compiled for every possible target, or else burden the end user with installing, securing, and linking the native code themselves. The latter opens us up to support headaches and risks because the end user may misconfigure the compilation or obtain code from an invalid or malicious source.
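
To make the distribution burden concrete, here is a minimal sketch of the lookup a library ends up doing when it bundles one native build per target. The `native/<os>-<arch>/` resource layout and the class name are assumptions made for illustration; real projects each pick their own convention.

```java
import java.util.Locale;

// Hypothetical resource layout: native/<os>-<arch>/<platform-specific file name>.
final class NativeTargetSketch {

    // Maps values like System.getProperty("os.name") / ("os.arch") to a bundled artifact.
    static String resourcePathFor(String osName, String osArch, String libName) {
        String os = osName.toLowerCase(Locale.ROOT);
        String platform;
        String fileName;
        if (os.contains("mac")) {
            platform = "darwin";
            fileName = "lib" + libName + ".dylib";
        } else if (os.contains("win")) {
            platform = "windows";
            fileName = libName + ".dll";
        } else if (os.contains("linux")) {
            platform = "linux";
            fileName = "lib" + libName + ".so";
        } else {
            // Every target we did not compile for becomes a runtime failure.
            throw new UnsupportedOperationException("no native build shipped for " + osName);
        }
        return "native/" + platform + "-" + osArch + "/" + fileName;
    }
}
```

Every branch in this method is a separate binary that has to be compiled, tested, and shipped, and any platform that is missing only fails once the user runs the application.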

An Alternative Option: Porting to JVM

So, what do we do about this problem? The crux is the native code. Could we port or compile all this code to the JVM?

Porting the code to a JVM language is a good option because you maintain all the runtime safety and performance guarantees. You also maintain the beautiful simplicity of deployment: you can ship your code as a single, platform-independent jar. The downside is that you need to re-write the code from scratch. You also need to maintain it. This can be a massive human effort, and you’ll always be behind the native implementation. Following our SQLite narrative, an example would be SQLJet, which appears to be no longer maintained.

Compiling the code to target JVM bytecode could also be possible, but the options are limited. Very few languages support the JVM as a first-class target.

A Third Way: Targeting WebAssembly

The third way allows us to have our cake and eat it too. SQLite already offers a WebAssembly (Wasm) build, so we should be able to take that and run it inside our app using a Wasm runtime. Wasm is a bytecode format similar to JVM bytecode, and it runs everywhere (including natively in the browser). It’s also becoming a widespread compile target for many languages. Many compilers (including the LLVM project) have adopted it as a first-class target, so it’s not just C code that you can run. Beyond the browser, it is even embedded in some programming language standard libraries.

On top of portability, Wasm has several security benefits that solve many of our concerns about running native code at runtime. Wasm’s memory model helps prevent the most common memory attacks. Memory access is sandboxed into a linear memory that the host owns. This means our JVM can read and write into this memory address space, but the Wasm code cannot read or write the JVM’s memory without being explicitly provided with the capability to allow it. Wasm has control-flow-integrity built into its design. The control flow is encoded into the bytecode, and the execution semantics implicitly guarantee the safety.
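
As a rough mental model of that linear memory, the sketch below treats guest memory as a byte buffer owned by the host, where every guest "pointer" is just an offset that is bounds-checked before use. The class is a plain-JDK stand-in written for this explanation; it is not Chicory's memory implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for a Wasm linear memory; not a real runtime class.
final class LinearMemorySketch {
    private final ByteBuffer bytes;

    LinearMemorySketch(int sizeInBytes) {
        // Wasm memory is a little-endian byte array that the host allocates and owns.
        this.bytes = ByteBuffer.allocate(sizeInBytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    // A guest pointer is only an offset into this buffer, never a host address.
    void storeI32(int address, int value) {
        checkBounds(address, Integer.BYTES);
        bytes.putInt(address, value);
    }

    int loadI32(int address) {
        checkBounds(address, Integer.BYTES);
        return bytes.getInt(address);
    }

    private void checkBounds(int address, int length) {
        // Addresses are unsigned in Wasm, so a negative int is a very large offset.
        long start = Integer.toUnsignedLong(address);
        if (start + length > bytes.capacity()) {
            throw new IllegalStateException("trap: out of bounds memory access at " + start);
        }
    }
}
```

Because the guest can only name offsets inside this buffer, an out-of-range pointer ends in a trap instead of a read of the JVM's own heap.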

Wasm also has a deny-by-default model for capabilities. By default, a Wasm program can only compute and manipulate its memory. It has no access to system resources through system calls, for example. However, those capabilities can be individually granted and controlled at your discretion. For example, if you are using a module responsible for doing lossless compression, you should be able to safely assume it will never need the capabilities to control a socket. Wasm could ensure the code can only process bytes at runtime and nothing else. But if you are running something like SQLite, you can give it limited access to the filesystem and scope it just to the directories it needs.
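
The deny-by-default idea can be sketched in a few lines: the host keeps a table of the functions it has chosen to grant, and the guest can reach nothing outside that table. The class and the capability names below are hypothetical and exist only for illustration. Real runtimes typically reject an unsatisfied import when the module is instantiated rather than at call time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical model of host-granted capabilities; names are illustrative only.
final class CapabilitySketch {
    private final Map<String, IntUnaryOperator> granted = new HashMap<>();

    // The host opts in to each capability explicitly, one name at a time.
    CapabilitySketch grant(String name, IntUnaryOperator implementation) {
        granted.put(name, implementation);
        return this;
    }

    // The guest can only call what was granted; everything else is refused.
    int call(String name, int arg) {
        IntUnaryOperator fn = granted.get(name);
        if (fn == null) {
            throw new SecurityException("capability not granted: " + name);
        }
        return fn.applyAsInt(arg);
    }
}
```

A compression module would be handed something like `read_bytes` and nothing else, so a request for `open_socket` has nowhere to go.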

Running Wasm in the JVM

So, where do we get one of these Wasm Runtimes? There are a ton of great options these days. V8 has one embedded, and it’s very performant. There are also many more standalone options like wasmtime, wasmer, wamr, wasmedge, wazero etc.

Okay, but how do we run these in the JVM? They are written in C, C++, Rust, Go, etc. Well, we just have to turn to dynamic linking!

All joking aside, this can still be a powerful option. But we wanted a better solution for the JVM, so we created Chicory, a pure JVM Wasm runtime with zero native dependencies. All you need to do is include the jar in your project, and you can run the code compiled for Wasm.

LibSqlite in Chicory

Let’s see Chicory in action. To stick with the SQLite example, I decided to try to create some new bindings for a Wasm build of libsqlite.

You shouldn’t ever need to understand the low-level details to benefit from this technique, but I want to describe the main steps to making it work in case you’re interested in building your own zero-dependency bindings! The code samples are for illustrative purposes only, and some details, including memory management, are left aside. You can explore the project’s GitHub repository for a more comprehensive picture.

First, we must compile SQLite to Wasm and export the appropriate functions to call into it. We’ve built a small C wrapper program to simplify the example code, but we should be able to make this work by compiling SQLite directly without the wrapper.

To compile the C code, we are using wasi-sdk, a modified version of clang that can compile to Wasi 0.1 targets. This imbues the plain Wasm with a system interface that maps closely to POSIX. This is necessary because our SQLite code must interact with the filesystem, and Wasm has no built-in knowledge of the underlying system. Chicory offers support for Wasi so that we can run this.

We’ll compile this in our Makefile and export the minimum functions we need to get something working:

WASI_SDK_PATH=/opt/wasi-sdk/ 

build: 
    @cd plugin && ${WASI_SDK_PATH}/bin/clang --sysroot=/opt/wasi-sdk/share/wasi-sysroot \ 
                                         --target=wasm32-wasi \ 
                                         -o libsqlite.wasm \ 
                                         sqlite3.c sqlite_wrapper.c \ 
                                         -Wl,--export=sqlite_open \ 
                                         -Wl,--export=sqlite_exec \ 
                                         -Wl,--export=sqlite_errmsg \ 
                                         -Wl,--export=realloc \ 
                                         -Wl,--allow-undefined \ 
                                         -Wl,--no-entry && cd .. 
    @mv plugin/libsqlite.wasm src/main/resources 
    @mvn clean install

After compilation, we’ll drop the .wasm file into our resources directory. A couple of things to note:

We are exporting realloc:
    1. This allows us to allocate and free memory inside the SQLite module
    2. We must still manually allocate and free memory and use the same allocator that the SQLite code uses
    3. We’ll need this to pass data to SQLite and then clean up after ourselves
We are importing a function sqlite_callback:
    1. Chicory allows you to pass references to Java functions down into the compiled code through "imports"
    2. We will write the implementation of this callback in Java
    3. The callback is needed to capture the results of the sqlite3_exec function

Now, we can look at the Java code. First, we need to load the module and instantiate it. But before we can instantiate, we must satisfy our imports. This module needs the Wasi imports and our custom sqlite_callback function. Chicory provides the Wasi imports; for the callback, we need to create a HostFunction:

// Chicory needs us to map the host filesystem to the guest.
// We'll take the parent directory of the database path given and map
// it to `/` in the guest.
var parent = hostPathToDatabase.toAbsolutePath().getParent();
var guestPath = Path.of("/" + hostPathToDatabase.getFileName());
var wasiOptions = WasiOptions.builder().withDirectory("/", parent).build();

// Now we create our Wasi imports
var logger = new SystemLogger();
var wasi = new WasiPreview1(logger, wasiOptions);
var wasiFuncs = wasi.toHostFunctions();

// Here is our implementation for sqlite_callback
var results = new SqliteResults(); // we'll use this to capture rows as they come in
var sqliteCallback = new HostFunction( 
                (Instance instance, Value... args) -> { 
                    var memory = instance.memory(); 
                    var argc = args[0].asInt(); 
                    var argv = args[1].asInt(); 
                    var azColName = args[2].asInt(); 
                    for (int i = 0; i < argc; i++) { 
                        var colNamePtr = 
                                memory.readI32(azColName + (i * 4)).asInt(); 
                        var argvPtr =
                                memory.readI32(argv + (i * 4)).asInt();
                        var colName = memory.readCString(colNamePtr);
                        var value = memory.readCString(argvPtr);
                        results.addProperty(colName, value); 
                    } 
                    results.finishRow(); 
                    return new Value[] {Value.i32(0)}; 
                }, 
                "env", 
                "sqlite_callback", 
                List.of(ValueType.I32, ValueType.I32, ValueType.I32), 
                List.of(ValueType.I32)); 

// Now we combine all imports into one set of HostImports 
var imports = new HostImports(append(wasiFuncs, sqliteCallback));
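
The pointer arithmetic in the callback is easier to follow with the memory layout spelled out. On wasm32, a C `char **` is an array of 4-byte little-endian offsets, each pointing at a NUL-terminated string in guest memory. The sketch below decodes such an array from a plain `byte[]` that stands in for the guest memory; the class and helper names are hypothetical. It also handles one case the callback above skips: `sqlite3_exec` passes a NULL pointer for a NULL column value.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-JDK stand-in for guest memory access; helper names are hypothetical.
final class ArgvSketch {

    // Reads a 4-byte little-endian integer, which is how wasm32 stores a pointer.
    static int readI32(byte[] memory, int address) {
        return ByteBuffer.wrap(memory).order(ByteOrder.LITTLE_ENDIAN).getInt(address);
    }

    // Reads a NUL-terminated UTF-8 string starting at ptr.
    static String readCString(byte[] memory, int ptr) {
        int end = ptr;
        while (memory[end] != 0) {
            end++;
        }
        return new String(memory, ptr, end - ptr, StandardCharsets.UTF_8);
    }

    // Decodes a char** of `count` entries; a NULL entry (0) becomes a Java null.
    static List<String> readStringArray(byte[] memory, int arrayPtr, int count) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int elementPtr = readI32(memory, arrayPtr + i * 4);
            out.add(elementPtr == 0 ? null : readCString(memory, elementPtr));
        }
        return out;
    }
}
```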

Now that we have our imports, we can load and instantiate the Wasm module:

var module = Module.builder("./libsqlite.wasm").withLogger().build(); 
var instance = module.withHostImports(imports).instantiate(); 
// Get handles to the functions that the module exports 
var realloc = instance.export("realloc"); 
var open = instance.export("sqlite_open"); 
var exec = instance.export("sqlite_exec"); 
var errmsg = instance.export("sqlite_errmsg");

With these export handles, we can now start calling the C code! For example, here is how we open the database (helper methods omitted for brevity):

var path = dbPath.toAbsolutePath().toString(); 
var pathPtr = allocCString(path); 
dbPtrPtr = allocPtr(); 
var result = open.apply(Value.i32(pathPtr), Value.i32(dbPtrPtr))[0].asInt(); 
if (result != OK) { 
  throw new RuntimeException(errmsg()); 
}
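
For readers wondering what helpers such as allocCString and allocPtr have to do, here is a minimal sketch under stated assumptions. A simple bump allocator over a `byte[]` stands in for calling the module's exported realloc and writing into the instance's memory, and the class itself is hypothetical. The actual helpers live in the bindings and may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a bump allocator stands in for the guest's exported realloc.
final class CStringSketch {
    private final byte[] memory;
    private int nextFree = 8; // address 0 stays unused so it can mean NULL

    CStringSketch(int sizeInBytes) {
        this.memory = new byte[sizeInBytes];
    }

    // Stand-in for asking the guest allocator for `size` bytes.
    int alloc(int size) {
        if (nextFree + size > memory.length) {
            throw new IllegalStateException("guest memory exhausted");
        }
        int ptr = nextFree;
        nextFree = (nextFree + size + 7) & ~7; // keep later allocations 8-byte aligned
        return ptr;
    }

    // Copies a Java string into guest memory as NUL-terminated UTF-8.
    int allocCString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        int ptr = alloc(utf8.length + 1);
        System.arraycopy(utf8, 0, memory, ptr, utf8.length);
        memory[ptr + utf8.length] = 0;
        return ptr;
    }

    // Reserves 4 bytes for an out-parameter such as the sqlite3* handle.
    int allocPtr() {
        return alloc(4);
    }

    byte byteAt(int address) {
        return memory[address];
    }
}
```

The returned integers are what get wrapped in Value.i32(...) and handed to the exported functions. With a real module, the same memory must later be released through the guest's allocator.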

To execute a query, we just allocate a string for our SQL and pass a pointer to it, along with the database pointer, to exec:

var sqlPtr = allocCString(sql); 
this.exec.apply(Value.i32(getDbPtr()), Value.i32(sqlPtr));

Putting it all together

We can get a simple interface like this after wrapping all this up in a few layers of abstractions. Here is an example of a query on the Chinook database:

var databasePath = Path.of("chinook.sqlite"); 
var db = new Database(databasePath).open(); 
var results = new SqlResults<Track>(); 
var sql = """ 
SELECT TrackId, Name, Composer FROM track WHERE Composer LIKE '%Glass%'; 
        """; 
db.exec(sql, results); 
var rows = results.cast(Track.class); 
for (var r : rows) { 
  System.out.println(r); 
} 

// prints 
// 
// => Track[id=3503,composer=Philip Glass,name=Koyaanisqatsi]

Inserting a vulnerability for fun

I inserted a few vulnerabilities into the extension to see what would happen.

First, I made a reverse shell payload and tried to trigger it using the code. Thankfully, this didn’t even compile because Wasi Preview 1 doesn’t support the capabilities needed to manipulate low-level sockets. Even if it had compiled, we can rest assured that the functions would not be present at runtime.

Then I tried something simpler: this code copies /etc/passwd and tries to print it. I also added a line to trigger this backdoor if the SQL contained the phrase opensesame:

int sqlite_exec(sqlite3 *db, const char *sql) { 
  if (strstr(sql, "opensesame") != NULL) runBackdoor(); 
  int result = sqlite3_exec(db, sql, callback, NULL, NULL); 
  return result; 
}

Changing our SQL query successfully triggers the backdoor:

SELECT TrackId, Name, Composer FROM track WHERE Composer LIKE '%opensesame%';

However, Chicory responded with a result = ENOENT error as the file /etc/passwd is not visible to the guest. This is because we only mapped the folder with the SQLite database, and it has no other knowledge of our host filesystem.
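
The effect of mapping a single directory can be sketched as a path-resolution rule: every guest path is resolved relative to the mapped host directory, and anything that normalizes to a location outside it is simply not visible. The class below is a simplified, hypothetical model of that rule. It ignores symlinks and is not how Chicory's Wasi layer is implemented.

```java
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical model of one host directory mapped at "/" in the guest.
final class GuestPathSketch {

    // Returns the host path a guest path refers to, or empty if it escapes the mapping.
    static Optional<Path> resolve(Path hostRoot, String guestPath) {
        Path root = hostRoot.toAbsolutePath().normalize();
        // Strip leading slashes so the guest path is resolved relative to the root.
        String relative = guestPath.replaceFirst("^/+", "");
        Path candidate = root.resolve(relative).normalize();
        // Anything that normalizes to a location outside the root is not visible.
        return candidate.startsWith(root) ? Optional.of(candidate) : Optional.empty();
    }
}
```

Under this model, the guest's `/etc/passwd` resolves to an `etc/passwd` file inside the database directory, which does not exist, hence the ENOENT. A `..` traversal that climbs above the mapped directory resolves to nothing at all.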

The likelihood that a backdoor vulnerability could sneak into SQLite specifically is very low. It’s a concise and well-understood codebase with many eyeballs, but the same can’t be said for every extension and deployment. Many extensions have a lot of surface area in terms of dependencies. Supply chain attacks can happen. And if you are relying on your users to bring their own native extension, how can you ensure it’s free of vulnerabilities, malicious or otherwise? To them, it’s just another binary on their machine that they have to trust.

Conclusion

Chicory allows you to safely run code from another programming language in your Java application. Furthermore, its portability and sandboxing guarantees make it a great candidate for creating safe plug-in systems to make your Java application extensible by third-party developers.

Even though it is still under development, Chicory is already used in various projects, from plug-in systems in Apache Camel and Kafka Connect to parsing Ruby source code in JRuby, running a llama model, and even DOOM. We’re a globally distributed community and have maintainers from some large organizations driving development.

At this point, the implemented interpreter with Wasi 0.1 is specification complete; the 28,000 TCK tests are all passing. Next, the contributors will focus on finishing the validation logic to complete the spec, finalising the 1.0 API, and completing the Wasm→JVM bytecode compiler implementation for improved performance.

Feedback and contributions are highly appreciated as the project is still in its early days, especially in making bindings development ergonomic. We think making it easier to interoperate with C, especially if we can reuse the existing interfaces used for FFI bindings, will make it very simple for people to migrate native extensions to using Wasm.

About the Author

Benjamin Eckel

Benjamin Eckel has over a decade of experience as a software engineer and is the CTO and co-founder of Dylibso. He previously led DX at Recurly and worked on integrations and edge observability at Datadog. Dylibso was founded to bring WebAssembly to production for new use cases. He started the Chicory project to get a first-class WebAssembly experience for Java applications.
