WASM GC isn't ready for realtime graphics

113 points by todsacerdoti 15 hours ago | 46 comments

Not surprising tbh, no automatic memory management solution is ready for realtime graphics - it always requires a lot of manual care to reduce memory management overhead (for instance working around the GC instead of using it), which then kinda defeats the purpose of the 'automatic' in automatic memory management. There simply is no memory management silver bullet when performance matters.

pjmlp 4 hours ago | prev | next |

To be fair, neither are WebGL and WebGPU compared with their native API counterparts; the best you can get are Shadertoy demos and product visualisation on ecommerce sites.

That's due to tooling, sandboxing, and not having any control over which GPU gets selected, or over why the browser blackboxes it and switches to software rendering.

chilmers 19 minutes ago | root | parent |

Figma uses WebGL for rendering and they seem to be doing ok.

weinzierl 9 hours ago | prev | next |

Wasn't WASM GC a prerequisite for getting direct DOM access from WASM? Does progress for WASM GC mean progress for DOM access as well?

Every time I check back on that the initiative seems to run under a different name. What is the best way to track progress on that front?

josephg 8 hours ago | root | parent |

It’s not a prerequisite for using the DOM from wasm.

See, for example, the rust web frameworks of leptos and dioxus. They’re honestly great, and usable today as replacements for react and friends. (With the single caveat that wasm bundle size is a bit bigger than .js size).

They work by exposing a number of browser methods through to wasm, and then they call them through a custom wasm/JS API bridge. All rust objects and DOM objects are completely isolated. Rust objects are allocated via an embedded malloc implementation and JS objects are managed by V8 (or whatever). but the DOM can still be manipulated via (essentially) message passing over an RPC like interface.
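A minimal sketch of what the JS side of such a bridge can look like, assuming a handle-table design. All names here (HandleTable, makeImports, NodeLike) are hypothetical and not taken from leptos, dioxus or wasm-bindgen; the node factory is a parameter so that a browser could pass something wrapping document.createElement:

```typescript
// Sketch of a JS-side bridge. Wasm code only ever holds integer handles;
// the real JS/DOM objects live in a table on the JS side.
// All names (HandleTable, makeImports, NodeLike) are hypothetical.
interface NodeLike {
  children: NodeLike[];
}

class HandleTable<T> {
  private slots: (T | undefined)[] = [];
  private freeList: number[] = [];

  // Store a value and return the integer the wasm side will hold.
  insert(value: T): number {
    const handle = this.freeList.pop() ?? this.slots.length;
    this.slots[handle] = value;
    return handle;
  }

  get(handle: number): T {
    const value = this.slots[handle];
    if (value === undefined) throw new Error(`stale handle ${handle}`);
    return value;
  }

  // The wasm side has to call this explicitly; nothing is collected for it.
  release(handle: number): void {
    if (this.slots[handle] === undefined) throw new Error(`double release of ${handle}`);
    this.slots[handle] = undefined;
    this.freeList.push(handle);
  }
}

// Functions of this shape would go into the import object passed to
// WebAssembly.instantiate; every argument and result is a plain number.
function makeImports(table: HandleTable<NodeLike>, createNode: () => NodeLike) {
  return {
    create_node: (): number => table.insert(createNode()),
    append_child: (parent: number, child: number): void => {
      table.get(parent).children.push(table.get(child));
    },
    release: (handle: number): void => table.release(handle),
  };
}
```

The explicit release call is the part that wasm-gc would make unnecessary: here a forgotten release keeps the JS object alive forever.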

But the rust code needs to compile malloc specially for wasm. This is ok in rust - malloc is 75kb or something. But in languages like C#, Go or Python, the runtime GC is much bigger and harder to fit in a little wasm bundle.

The upside of wasm-gc is that this divide goes away. Objects are just objects, shared between both languages. So wasm bundles can use & reference JS/DOM objects directly. And wasm programs can piggyback on V8’s GC without needing to ship their own. This is good in rust, and great in GC languages. I saw an example with blazor where a simple C# wasm todo app went from 2mb or something to 10kb when wasmgc was used.

TLDR: wasm-gc isn’t strictly needed. You can use DOM from wasm today. It just makes wasm bundles smaller and wasm-dom interaction easier (and theoretically faster).

weinzierl 35 minutes ago | root | parent | next |

That's why I wrote direct DOM access above. Sure, we can load extra JS in addition to WASM and funnel everything through JS. Some say it does not matter.

I think it does, but it is hard to track the initiatives that tackle this. That's why I'm asking.

flohofwoe 27 minutes ago | root | parent |

WASM-GC is essentially a way to hook into an externally provided garbage collector, it doesn't help much with calling into web APIs.

The DOM has been designed as a JS API (for better or worse); accessing it from WASM will always require going through some FFI layer (this layer may be hidden and automatically created at runtime, but it still needs to exist).

The question is just how much marshalling needs to happen in that FFI layer. Making the DOM (or other Web APIs) actually WASM friendly would require an alternative DOM API which looks more like a C API.

There's also a middle way of adding explicit 'garbage free' functions to web APIs which can be called with less overhead in the JS engine. For instance, WebGPU has such functions, and they were specifically added to reduce marshalling and GC overhead when called from WASM.

E.g. GC-free Web APIs would be much more useful than GC support in WASM for interacting with the browser side. GC support in WASM is mainly useful for languages that depend on garbage collection, because those can delegate the GC runtime tasks to the JS engine's garbage collector.

Out of curiosity, why is malloc 75kb? That seems like a crazy amount of code (is this after linking and dead code removal for platform-specific magic?)

kvdveer 36 minutes ago | root | parent |

Malloc can indeed be implemented in a handful of bytes, but that's not going to perform well.

Doing malloc well is actually quite a bit of work. You need to group allocations by size, manage alignment, request and release pages from the OS/browser, implement realloc, etc. A typical malloc implementation is actually a combination of several different allocation strategies.
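To give a feel for just the "group allocations by size" part, here is a toy sketch, assuming made-up size classes and a fixed capacity. ToyAllocator and SIZE_CLASSES are hypothetical names, offsets stand in for pointers, and everything that makes a real malloc big (page growth, coalescing, realloc, large allocations) is left out:

```typescript
// Toy segregated-fit allocator over a fixed byte range.
// Hypothetical sketch: no page growth, no coalescing, no realloc.
const SIZE_CLASSES = [16, 32, 64, 128, 256];

class ToyAllocator {
  private bump = 0;
  // One free list of recycled offsets per size class.
  private freeLists: number[][] = SIZE_CLASSES.map(() => []);

  constructor(private capacity: number) {}

  // Smallest size class that fits the request.
  private classIndex(size: number): number {
    const i = SIZE_CLASSES.findIndex((c) => size <= c);
    if (i < 0) throw new Error(`size ${size} exceeds largest class`);
    return i;
  }

  alloc(size: number): number {
    const i = this.classIndex(size);
    const recycled = this.freeLists[i].pop();
    if (recycled !== undefined) return recycled;
    // Nothing to recycle: carve a fresh block off the end.
    const offset = this.bump;
    if (offset + SIZE_CLASSES[i] > this.capacity) throw new Error("out of memory");
    this.bump += SIZE_CLASSES[i];
    return offset;
  }

  // The caller passes the size back, similar to sized deallocation in C++.
  free(offset: number, size: number): void {
    this.freeLists[this.classIndex(size)].push(offset);
  }
}
```

Because every class is a multiple of 16 and the bump pointer starts at 0, all returned offsets are 16-byte aligned; a real allocator has to handle alignment far more generally.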

flohofwoe 14 minutes ago | root | parent |

The best solution is to reduce the amount of alloc/free calls in your code, then you can use a slow-and-small allocator like emmalloc: https://github.com/emscripten-core/emscripten/blob/main/syst...

(e.g. if memory management overhead shows up in the profiler, the proper solution is not to go looking for a faster general-purpose allocator, but to reduce the amount of calls into the allocator)
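One common way to cut the number of allocator calls is a per-frame arena: allocate once up front, hand out ranges during the frame, and reset at the end. A minimal sketch, assuming float-only scratch data; FrameArena is a hypothetical name, not something from emscripten:

```typescript
// Per-frame bump arena: one allocation at startup, zero during frames.
// Callers get a start index into `data` instead of a new object.
class FrameArena {
  readonly data: Float32Array;
  private used = 0;

  constructor(capacity: number) {
    this.data = new Float32Array(capacity);
  }

  // Reserve `count` floats and return the start index into `data`.
  reserve(count: number): number {
    if (this.used + count > this.data.length) throw new Error("arena exhausted");
    const start = this.used;
    this.used += count;
    return start;
  }

  // Called once per frame; everything reserved so far becomes reusable.
  reset(): void {
    this.used = 0;
  }
}
```

With this shape the general-purpose allocator is only hit once, so its speed stops mattering and a small one is good enough.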

quotemstr 8 hours ago | root | parent | prev |

Wasm GC also solved the problem of reference cycles between objects in disparate heap managers leading to memory leaks. It's not just a performance or size win: it's a correctness win.

out_of_protocol 11 hours ago | prev | next |

Really liked NaCl (and PNaCl) idea, which allows running arbitrary code, sanitized, with ~90% speed of native execution. Playing Bastion game in browser was refreshing. Unfortunately communication with js code and bootstrap issues (can't run code without plugin, no one except chrome supported this) ruined that tech

thegeomaster 10 hours ago | root | parent | next |

WASM nowadays has become quite the monstrosity compared to NaCl/PNaCl. Just look at this WASM GC spaghetti, trying to compile a GC'd language but hooking it up to V8/JavaScriptCore's GC, while upholding a strict security model... That sounds like it won't cause any problems whatsoever!

Sometimes I wonder if the industry would have been better off with NaCl as a standard. Old, mature tooling would by and large still be applicable (it's still your ordinary x86/ARM machine code) instead of the nascent and buggy ecosystem we have now. I don't know why, but the JS folks just keep reinventing everything all the time.

duskwuff 8 hours ago | root | parent | next |

> Old, mature tooling would by and large still be applicable (it's still your ordinary x86/ARM machine code)

It wasn't, though. Since NaCl ran code in the same process as the renderer, it depended upon a verifier for security, and required the generated code to follow some unusual constraints to support that verification. For example, on x86, all branch targets were required to be 32-byte aligned, and all indirect branches were required to use a specific instruction sequence to enforce that alignment. Generating code to meet these constraints required a modified compiler, and reduced code density and speed.
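The effect of that indirect-branch sequence can be sketched as plain arithmetic: the target address is masked so it can only land on a 32-byte boundary. This models the masking step only, not the actual x86 instructions, and maskBranchTarget is a name made up for illustration:

```typescript
// Model of the masking step applied before an indirect branch:
// clear the low 5 bits so the target is a multiple of 32.
const BUNDLE_SIZE = 32;

function maskBranchTarget(addr: number): number {
  // `>>> 0` keeps the result an unsigned 32-bit value.
  return (addr & ~(BUNDLE_SIZE - 1)) >>> 0;
}
```

Since any computed target gets rounded down like this, the verifier only needs to check code reachable from 32-byte boundaries, which is also why the compiler had to pad code to those boundaries.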

In any case, NaCl would have run into the exact same GC issues if it had been used more extensively. The only reason it didn't was that most of the applications it saw were games which barely interacted with the JS/DOM "world".

thegeomaster 7 hours ago | root | parent |

I simplified in my comment. It was a much better story for tooling, since you could reuse large parts of existing backends/codegen, optimization passes, and debugging. The mental model of execution would remain too, rather than being a weird machine code for a weird codesize-optimized stack machine.

I would wager the performance implications of NaCl code, even for ARM which required many more workarounds than x86 (whose NaCl impl has a "one weird trick" aura), were much better than for modern WASM.

It's hard to say if it would've run into the same issues. For one, it would've been easier to port native GCs: they don't run afoul of W^X rules, they just read memory if that, which you can do performantly in NaCl on x86 due to the segments trick. I also suspect the culture could've more easily evolved towards shared objects where you would be able to download/parse/verify a stdlib once, and then keep using it.

I agree it was because the applications were games, but for another second-order reason: they were by and large C/C++ codebases where memory was refcounted manually. Java was probably the second choice, but those were the days when Java applets were still auto-loading, so there was likely no need for anybody to try.

duskwuff 5 hours ago | root | parent |

> It's hard to say if it would've run into the same issues. For one, it would've been easier to port native GCs...

WASM GC isn't just about memory management for the WASM world; it's about managing references (including cyclical refs!) which cross the boundary into the non-WASM world. Being able to write a GC within the WASM (or NaCl) world doesn't get you that functionality.

I'm reminded of writing JavaScript way back in the old Internet Explorer days (6 and to a lesser extent 7), when you had to manually null out any references to DOM elements if you were done with them, or else the JS and the DOM nodes wouldn't get garbage collected because IE had two different garbage collectors and cycles between them didn't get collected immediately.
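A toy model of that failure mode, assuming two mark-and-sweep heaps that cannot trace into each other. Heap, collectIsolated and collectUnified are hypothetical names for illustration, not how any browser engine is structured:

```typescript
interface Obj {
  name: string;
  localRefs: Obj[];   // references within the same heap
  foreignRefs: Obj[]; // references into the other heap
}

class Heap {
  readonly objects = new Set<Obj>();
  alloc(name: string): Obj {
    const obj: Obj = { name, localRefs: [], foreignRefs: [] };
    this.objects.add(obj);
    return obj;
  }
}

// Separate collectors: this heap cannot trace through the other one, so
// anything the other heap points at must conservatively count as a root.
function collectIsolated(heap: Heap, other: Heap, roots: Obj[]): void {
  const incoming = [...other.objects].flatMap((o) => o.foreignRefs);
  const marked = new Set<Obj>();
  const stack = [...roots, ...incoming];
  while (stack.length > 0) {
    const obj = stack.pop()!;
    if (marked.has(obj)) continue;
    marked.add(obj);
    stack.push(...obj.localRefs);
  }
  for (const obj of heap.objects) if (!marked.has(obj)) heap.objects.delete(obj);
}

// One collector for both heaps: trace local and foreign edges alike.
function collectUnified(heaps: Heap[], roots: Obj[]): void {
  const marked = new Set<Obj>();
  const stack = [...roots];
  while (stack.length > 0) {
    const obj = stack.pop()!;
    if (marked.has(obj)) continue;
    marked.add(obj);
    stack.push(...obj.localRefs, ...obj.foreignRefs);
  }
  for (const heap of heaps) {
    for (const obj of heap.objects) if (!marked.has(obj)) heap.objects.delete(obj);
  }
}
```

If a wasm-side object and a JS-side object point at each other and nothing else references them, running collectIsolated on each heap frees neither, because each side sees the other's reference as a root. collectUnified with no roots frees both.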

Vampiero 7 hours ago | root | parent | prev |

> I don't know why, but the JS folks just keep reinventing everything all the time.

It's because they only know the web. They have never seen what real programmers actually do. They only live in their stupid web bubble thinking it's all there is.

Same here, and the irony is Mozilla opposing it hardly matters nowadays for the Firefox browser market, it is Google driving where WebAssembly goes.

Remember, the NaCl and PNaCl SDKs came with support for C, C++ and OCaml, the latter being an example for GC languages.

CyberDildonics 8 hours ago | root | parent | prev |

What does this have to do with wasm gc?

sestep 11 hours ago | prev | next |

I was excited to read this post because I haven't yet tried WasmGC for anything beyond tiny toy examples, but was disappointed to find no actual numbers for performance. I don't know the author well enough to be able to assess their assertions that various things are "slow" without data.

davexunit 5 hours ago | root | parent |

This was a quickie post for me. Just a tale of what my experience has been. No in-depth, apples to apples comparison so I don't know the magnitude of the performance differential.

o11c 8 hours ago | prev | next |

> Unsatisfying workarounds [...] Use linear memory for bytevectors

It never makes sense to use GC for leaf memory if you're in a language that offers both, since mere refcounting (or a GC'ed object containing a unique pointer) is trivial to implement.

There are a lot of languages where it's expensive to make the mistake this post is making. (I don't know much about WASM in particular; it may still have other errors).

davexunit 8 hours ago | root | parent |

Sorry but it's just a different choice not a mistake. I do realtime graphics just fine in non-web managed memory languages.

simon_void 10 hours ago | prev | next |

so what about realtime graphics with wasm without GC? (compiled from languages not needing a GC like Rust, C/C++, Odin, ...)

pjmlp 4 hours ago | root | parent | next |

Better, but WebGPU and WebGL aren't going to win any performance prizes either, and tooling is pretty much non existent.

Nothing like PIX, Instruments or RenderDoc; SpectorJS is the only thing you get, almost 15 years after WebGL 1.0.

And at the hardware level they support, it is about PlayStation 3 kind of graphics, if the browser doesn't block the GPU or select the integrated one instead of the dedicated one.

You are left with shaders as the only way to actually push the hardware.

davexunit 10 hours ago | root | parent | prev |

As mentioned, that works quite well already but it's not the topic of this post.

Dwedit 8 hours ago | prev | next |

I just wish WASM could use more than one ArrayBuffer at a time. Would eliminate unnecessary copying for interop with JS code.

Dwedit 2 hours ago | root | parent |

Well I just thought of something obvious... Have a function that lets you pass in an ArrayBuffer, then it brings it into the virtual address space of the WASM program. Function would return the virtual address that was assigned to that array buffer. From there, you call into WASM again with that pointer, and the program can take action.

Then there would be another function to relinquish ownership of the ArrayBuffer.

throwaway290 2 hours ago | root | parent |

There's no SharedArrayBuffer support? Or I misunderstand the idea

flohofwoe 10 minutes ago | root | parent |

There is, but then you'd need to declare the entire WASM heap as a single SharedArrayBuffer. It only makes sense for shared-memory multithreading (but note that support for SharedArrayBuffer only works in 'cross-origin isolated' contexts).
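A small sketch of what "the entire heap" means in practice, using the standard WebAssembly.Memory constructor. The page counts are arbitrary, and the isolation check is written defensively because the crossOriginIsolated global only exists in browsers:

```typescript
// Asking for shared memory makes the whole wasm heap one SharedArrayBuffer;
// there is no way to mark just a sub-range of it as shared.
// `maximum` is required when `shared` is true.
const sharedMemory = new WebAssembly.Memory({ initial: 1, maximum: 4, shared: true });
const heapIsShared = sharedMemory.buffer instanceof SharedArrayBuffer;

// In browsers, `crossOriginIsolated` reports whether the page is in a
// cross-origin isolated context. Outside browsers it is simply absent.
const isolated =
  (globalThis as { crossOriginIsolated?: boolean }).crossOriginIsolated === true;
```

So this is an all-or-nothing switch for the module's memory, which is a different thing from mapping one extra ArrayBuffer into the address space.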

amelius 10 hours ago | prev | next |

Shouldn't it be possible to implement your own GC in WASM? Why does WASM try to be everything?

davexunit 10 hours ago | root | parent | next |

Slower, single threaded, greatly increases binary size, separate heap from JS so bad interop with extern refs. Wasm GC is a great thing.

You can't GC together with the host environment if you do a custom GC (i.e. a wasm object and a JS object in a cycle wouldn't have any way to ever be GC'd).

yes, it's regularly done. But I think you are misunderstanding. WASM GC isn't a GC implementation.

fulafel 4 hours ago | root | parent | prev |

Yes, this is how it's done eg with Python and Go.

An advantage of a common GC could be interop between languages.

kevingadd 12 hours ago | prev |

It's sort of baffled me that people appear to be shipping real code using WasmGC since the limitations described in this post are so severe. Maybe it's fine because they're just manipulating DOM nodes? Every time I've looked at WasmGC I've gone "there's no way I could use this yet" and decided to check back a year later and see if it's There Yet.

Hopefully it gets there. The uint8array example from this post was actually a surprise to me, I'd just assumed it would be efficient to access a typed array via WasmGC!

Beyond the limitations in this post there are other things needed to be able to target WasmGC with existing stuff written in other languages, like interior references or dependent handles. But that's okay, I think, it can be worthwhile for it to exist as-is even if it can't support e.g. existing large-scale apps in memory safe languages. It's a little frustrating though.

wffurr 7 hours ago | root | parent | next |

>> The uint8array example from this post was actually a surprise to me, I'd just assumed it would be efficient to access a typed array via WasmGC!

The problem is that the Scheme i8 array is not actually a Uint8Array with WasmGC. It’s a separate heap allocated object that is opaque to the JS runtime.

In the linear memory Wasm model, the Scheme i8 array is allocated in the wasm memory array, and so one can create a Uint8Array view that exactly maps to the same bytes in the linear memory buffer. This isn’t possible (yet?) with the opaque WasmGC object type.
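For reference, the linear-memory case looks roughly like this on the JS side. The pointer and length are made-up values standing in for wherever the wasm program put its bytevector:

```typescript
// Linear-memory model: wasm memory is exposed to JS as an ArrayBuffer,
// so a typed array can alias a byte range of it without copying.
const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64 KiB

// Hypothetical location of a bytevector inside the wasm heap.
const ptr = 1024;
const len = 16;

// The view shares storage with the wasm memory; writes go straight through.
const view = new Uint8Array(memory.buffer, ptr, len);
view.fill(0xab);

// A second view over the whole memory sees the same bytes.
const whole = new Uint8Array(memory.buffer);
```

One caveat with this approach: growing a non-shared memory detaches the old buffer, so views like this have to be recreated after memory.grow().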

davexunit 7 hours ago | root | parent |

Yes, that's right. I'm hoping there will be a way to do this in a future revision of Wasm GC.

Definitely a lot is missing, yeah, and adding more will take time. But it works well already for pure computational code. For example, Google Sheets uses WasmGC for Java logic:

https://web.dev/case-studies/google-sheets-wasmgc#the_final_...

refulgentis 10 hours ago | root | parent | prev |

I've been shipping a Flutter app that uses it for months. Pretty heavy stuff, it's doing everything from LLM inference to model inference to maintaining a vector store and indexeddb in your browser.

Frame latency feels like it's gone, there's 100% a significant decrease in perceived latency.

I did have a frustrating performance issue with 3rd party code doing "source code parsing" via RegEx, thought it was either the library's or Flutter's fault, but from the article content, sounds like it was WASM GC. (saw a ton of time spent converting objects from JS<->WASM on a 50 KLOC file)

From that perspective, the article sounds a bit maximalist in its claims, but only from my perspective.

I think if you read "real time graphics" as "3d game" it gives a better understanding of where it's at, my anecdata aside.

stiles11 an hour ago | root | parent | next |

What's the name of the app I want to try it out

When you said "jump in perceived latency", did you mean perceived latency went up or down?

refulgentis 9 hours ago | root | parent |

Down, significantly

ripped_britches 8 hours ago | root | parent | prev |

Which libraries caused these problems for you?

refulgentis 8 hours ago | root | parent |

Don't wanna name names, because it's on me, it's a miracle it exists, and works.

I don't think there's a significant # of alternatives, so hopefully Flutter syntax highlighting library, as used in a package for making markdown columns, is enough to be helpful.

Problem was some weird combo of lots of regex and an absolutely huge amount of code. It's one of those problems it's hard for me to draw many conclusions from:

- Flutter may be using browser APIs for regex, so there's some sort of JS/WASM barrier copying cost

- The markdown column renderer is doing nothing at all to handle this situation lazily, i.e. if any portion of the column is displayed, syntax highlighting must be done on the complete markdown input

- Each different color text, so pretty much every word, gets its own object in the view hierarchy, tens if not hundreds of thousands in this case. Can't remember if this is due to the syntax highlighting library or the markdown package

- Regex is used to parse the code and for all I know one of them has pathological performance like backtracking unintentionally.