Unpacking JetStream 3: How Browser Benchmarks Evolve with Modern Web Demands


Benchmarks are vital tools for browser engine developers, but as the web evolves, benchmarks must too. The recent release of JetStream 3.0, a collaborative effort by Google, Mozilla, and WebKit, marks a significant update. This Q&A explores the key changes, especially regarding WebAssembly and the “infinity problem” that exposed the limitations of previous versions.

Why was JetStream 3.0 necessary?

Benchmarks are among the best tools for driving performance improvements in browser engines, but they can quickly become outdated as web technologies evolve. The original JetStream 2 suite focused on workloads common at its release, such as large C/C++ applications compiled to WebAssembly (Wasm) with long startup times. Over the years, browser engines optimized these paths so aggressively that startup times for many Wasm modules dropped below measurable thresholds. This made the old benchmarks less useful for guiding real-world performance gains. JetStream 3.0 refreshes the suite to reflect modern best practices, especially in how Wasm is used across libraries, image decoders, and UI frameworks. It also shifts from measuring isolated microbenchmarks to evaluating more holistic, application-like scenarios, ensuring that optimizations benefit actual user experiences rather than just synthetic tests.

Source: webkit.org

What does JetStream 3 prioritize beyond simply being a "faster" benchmark?

JetStream 3 isn’t just about making things faster—it fundamentally changes how performance is measured. The new suite emphasizes real-world relevance by incorporating workloads that reflect how developers actually use WebAssembly today. Instead of scoring startup and runtime separately, it integrates both into a single, unified metric that captures the full user experience. It also addresses the “infinity problem” (see below) by using more precise timing methods and scoring formulas that avoid distortion from near-zero times. Additionally, JetStream 3 scales workloads to match the complexity of modern web applications, including larger data sets and more intricate logic. This ensures that optimizations derived from the benchmark translate to tangible speedups in production sites, rather than just improving scores on a synthetic test.
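As a hypothetical sketch (the actual JetStream 3 formula is not reproduced here), blending the two phases can be as simple as scoring the total elapsed time rather than scoring each phase on its own:

```javascript
// Hypothetical sketch of a blended score (not the real JetStream 3
// formula): startup and runtime are measured as one continuous span,
// so a near-zero reading in either phase cannot distort the result
// by itself. The constant 5000 mirrors JetStream 2's scoring constant.
function blendedScore(startupMs, runtimeMs, constant = 5000) {
  const totalMs = startupMs + runtimeMs;
  // A small floor on the denominator keeps the score finite even for
  // extremely fast runs.
  return constant / Math.max(totalMs, 0.001);
}

blendedScore(0, 50);  // 100: instant startup, 50 ms of runtime
blendedScore(10, 40); // 100: same total, so the same score
```

The design point is that a vanishing startup time no longer produces a degenerate measurement; it simply shifts more of the measured span onto runtime.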

How does JetStream 3 measure WebAssembly differently than JetStream 2?

JetStream 2 measured WebAssembly in two distinct phases: Startup and Runtime. This approach assumed users would tolerate long one-time startup costs for high throughput, which was typical for early Wasm adopters like large C/C++ games. However, as browser engines optimized startup times to sub-millisecond levels, the separate phases became misleading. JetStream 3 blends startup and runtime into a unified scoring mechanism that reflects how Wasm is used on the modern web—where it appears in critical paths, such as image decoding or UI frameworks. The new suite uses high-resolution timing (e.g., performance.now()) instead of Date.now() to avoid rounding issues. It also employs a scoring formula that prevents extreme values (like infinity) from dominating overall scores, ensuring fair comparisons across different subtests.
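The resolution difference is easy to demonstrate. In this sketch (not the actual JetStream harness), the 8-byte buffer is the smallest valid Wasm module—just the magic number and version header—so instantiation finishes in well under a millisecond:

```javascript
// Smallest valid Wasm module: magic number "\0asm" plus version 1,
// with no sections. Real benchmarks compile much larger modules.
const emptyWasm = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// Date.now() has 1 ms granularity: a sub-millisecond instantiation
// will very likely read as 0 ms.
const coarseStart = Date.now();
new WebAssembly.Instance(new WebAssembly.Module(emptyWasm));
const coarseElapsed = Date.now() - coarseStart;

// performance.now() reports fractional milliseconds, so even
// microsecond-scale work yields a usable, nonzero-capable measurement.
const fineStart = performance.now();
new WebAssembly.Instance(new WebAssembly.Module(emptyWasm));
const fineElapsed = performance.now() - fineStart;
```

With the coarse clock, dividing a scoring constant by `coarseElapsed` is exactly where the infinity problem described below came from.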

What was the “infinity problem” in JetStream 2, and why did it happen?

The “infinity problem” arose because JetStream 2 used the formula Score = 5000 / Time for each subtest, with time measured using Date.now(). This function rounds down to the nearest millisecond, so any execution time below 1 ms became 0 ms. Since dividing by zero yields infinity, the subtest score would become infinite, rendering all other scores irrelevant. This was not just a theoretical issue: WebKit optimized startup times so effectively that for many small Wasm workloads, the instantiation time effectively hit zero. The team had to patch JetStream 2.2 to clamp the maximum score to 5000, but that was a band-aid. The underlying problem showed that browsers had outgrown the benchmark’s assumptions, making a new version essential for meaningful measurement.
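A simplified reconstruction of the pitfall (not the actual harness code) makes the failure mode concrete:

```javascript
// JetStream 2's per-subtest formula: Score = 5000 / Time, with Time
// coming from Date.now(), which rounds down to whole milliseconds.
function jetstream2Score(timeMs) {
  return 5000 / timeMs;
}

// The JetStream 2.2 band-aid: clamp the score at 5000.
function jetstream22Score(timeMs) {
  return Math.min(jetstream2Score(timeMs), 5000);
}

jetstream2Score(50);  // 100: an ordinary result
jetstream2Score(0);   // Infinity: one such subtest swamps the whole suite
jetstream22Score(0);  // 5000: capped, but the measurement is still meaningless
```

The clamp kept the arithmetic sane, but a capped score carries no information about how fast startup actually was—hence the need for higher-resolution timing in JetStream 3.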

How did browser engines like WebKit contribute to solving the infinity problem?

WebKit’s JavaScriptCore team tackled the infinity problem through a combination of engineering optimizations and benchmark redesign. On the optimization side, they streamlined Wasm instantiation paths, eliminating redundant checks and reducing memory allocation overhead. This brought startup times down dramatically, but also highlighted the need for better benchmarking. For JetStream 3, WebKit collaborated with Google and Mozilla to define new scoring methods that use high-resolution timers (like performance.now()) and adopt a formula that doesn’t produce infinity from near-zero times. They also introduced workload scaling, ensuring that typical Wasm use cases involve tasks large enough to yield measurable durations. These changes mean that even if a browser achieves extremely fast startup, it won’t break the benchmark’s scoring, and the benchmark remains a valid tool for driving further improvements.
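JetStream combines subtest scores with a geometric mean (this is documented for JetStream 2 and assumed here to be representative). A geometric mean dampens a single large-but-finite outlier—but one infinite input still makes the whole result infinite, which is why fixing the timing, not just the aggregation, was essential:

```javascript
// Geometric mean via log-space summation, the standard way JetStream-style
// suites aggregate subtest scores.
function geometricMean(scores) {
  const logSum = scores.reduce((sum, s) => sum + Math.log(s), 0);
  return Math.exp(logSum / scores.length);
}

geometricMean([100, 100, 100]);      // ≈ 100
geometricMean([100, 100, 10000]);    // ≈ 464: the outlier is tempered
geometricMean([100, 100, Infinity]); // Infinity: aggregation can't save it
```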

What does the shift in WebAssembly benchmarking tell us about modern web usage?

The evolution from JetStream 2 to JetStream 3 mirrors how WebAssembly’s role on the web has changed. Initially, Wasm was used primarily for heavy compute tasks like gaming or scientific simulations, where users accepted long load times for sustained performance. Today, Wasm is embedded in critical-path libraries—for example, image decoders, compression algorithms, and UI frameworks—where every millisecond matters. A “zero” startup time in a microbenchmark no longer represents success; instead, it signals that the benchmark itself is outdated. The shift to integrated, application-like workloads shows that developers and browser engineers now care deeply about end-to-end responsiveness, not just raw throughput. This reflects a maturing ecosystem where Wasm is a core part of the web platform, and benchmarks must keep pace with real-world usage patterns.

What engineering changes did JavaScriptCore make to improve Wasm performance in JetStream 3?

In preparation for JetStream 3, the JavaScriptCore team focused on optimizing the entire Wasm pipeline, not just startup. They improved the compilation tier system to better predict which functions need immediate optimization versus later tiering. They also reduced memory overhead during module instantiation by reusing shared structures and eliminating redundant parsing. For runtime performance, they enhanced the Just-In-Time (JIT) compiler to generate more efficient machine code for Wasm operations, especially for loops and function calls. Additionally, they worked on concurrency improvements, allowing Wasm validation and compilation to happen off the main thread without blocking user interaction. These changes, combined with the benchmark’s new scoring model, ensure that JetStream 3 rewards genuine efficiency gains rather than narrow tweaks, and WebKit’s scores reflect real-world speedups that benefit web users.
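The off-main-thread aspect is visible even through the standard WebAssembly JS API (this sketch shows general web-platform usage, not JavaScriptCore internals): the asynchronous compile entry points return promises, which leaves engines free to validate and compile on background threads while the caller stays responsive.

```javascript
// Smallest valid Wasm module (magic number + version), standing in for a
// real module's bytes.
const emptyWasm = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

async function instantiateOffMainThread(bytes) {
  // WebAssembly.compile returns a Promise; the engine may do the heavy
  // compilation work on background threads before it resolves.
  const module = await WebAssembly.compile(bytes);
  // Instantiation of an already-compiled module is comparatively cheap.
  return new WebAssembly.Instance(module);
}
```

In a browser, `WebAssembly.compileStreaming(fetch(url))` goes a step further by compiling while the bytes are still downloading.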

How does JetStream 3 ensure its benchmarks remain relevant for current and future web applications?

JetStream 3 incorporates modular, updatable components that reflect evolving web standards. The suite is designed to be extended as new Wasm features (like threading or SIMD) become mainstream, and it includes workloads that stress these capabilities. It also uses a weighted scoring system that prioritizes realistic scenarios over artificial microbenchmarks. By collaborating across browser vendors, the developers ensure that no single engine can game the test—optimizations must benefit actual web content. The use of high-resolution timing and clamping formulas prevents distortions from extreme improvements. Finally, JetStream 3’s open-source nature allows the community to propose new subtests that match upcoming use cases, such as WebAssembly GC or component model interactions. This forward-looking design keeps the benchmark a reliable yardstick for browser performance well into the future.
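As a hypothetical illustration of weighted aggregation (the actual JetStream 3 weights are not specified here), a weighted geometric mean lets realistic application workloads count for more than synthetic microbenchmarks:

```javascript
// Hypothetical weighted geometric mean; entries are { score, weight }
// pairs, where weight expresses how representative a subtest is of
// real web content.
function weightedGeometricMean(entries) {
  const totalWeight = entries.reduce((sum, e) => sum + e.weight, 0);
  const logSum = entries.reduce((sum, e) => sum + e.weight * Math.log(e.score), 0);
  return Math.exp(logSum / totalWeight);
}

// An application-like workload (weight 3) pulls the overall score toward
// itself more strongly than a microbenchmark (weight 1) can.
weightedGeometricMean([
  { score: 100, weight: 3 },  // application-like subtest
  { score: 1000, weight: 1 }, // synthetic microbenchmark
]); // ≈ 178, far closer to 100 than to 1000
```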
