# java-performance

> Java performance optimization for low-latency systems. Use when analyzing JMH benchmarks, optimizing hot paths, implementing lock-free algorithms, tuning JVM flags, or working with concurrent data structures. Triggers on performance issues, benchmark analysis, VarHandle/UNSAFE usage, memory barriers, false sharing, MPSC queues, or JCTools patterns.

- Author: milesfuller
- Repository: fullerstack-io/fullerstack-humainary
- Version: 20260103053230
- Stars: 3
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/fullerstack-io/fullerstack-humainary
- Web: https://mule.run/skillshub/@@fullerstack-io/fullerstack-humainary~java-performance:20260103053230

---

---
name: java-performance
description: Java performance optimization for low-latency systems. Use when analyzing JMH benchmarks, optimizing hot paths, implementing lock-free algorithms, tuning JVM flags, or working with concurrent data structures. Triggers on performance issues, benchmark analysis, VarHandle/UNSAFE usage, memory barriers, false sharing, MPSC queues, or JCTools patterns.
---

# Java Performance Optimization

## Core Principles

1. **Measure first** - Use JMH for microbenchmarks; never guess where time goes
2. **Understand the hardware** - CPU cache lines (typically 64 bytes), memory barriers, branch prediction
3. **Minimize allocations on hot paths** - But don't assume allocation is the problem without proof
4. **Lock-free when possible** - VarHandle atomics, CAS loops, intrusive data structures

## Quick Reference

### VarHandle Atomic Operations

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Setup: findVarHandle throws checked exceptions, so wrap the lookup
private static final VarHandle FIELD;
static {
    try {
        FIELD = MethodHandles.lookup().findVarHandle(MyClass.class, "field", Type.class);
    } catch (ReflectiveOperationException e) {
        throw new ExceptionInInitializerError(e);
    }
}

// Operations (strongest to weakest memory ordering)
FIELD.getAndSet(this, newVal);          // Atomic swap, volatile (full fence) semantics
FIELD.compareAndSet(this, exp, newVal); // CAS, volatile (full fence) semantics
FIELD.getVolatile(this);                // Volatile read (sequentially consistent)
FIELD.setVolatile(this, val);           // Volatile write (sequentially consistent)
FIELD.getAcquire(this);                 // Acquire only (weaker than volatile)
FIELD.setRelease(this, val);            // Release only (weaker than volatile)
FIELD.getOpaque(this);                  // Coherent, atomic access; no fence, no ordering with other variables
```

### False Sharing Prevention

```java
// Pad between producer and consumer fields (~128 bytes, i.e. two 64-byte cache lines, for safety)
// Note: the JVM may reorder fields within a class; JCTools pins the layout by
// declaring padding in superclasses.
@SuppressWarnings("unused")
private long p0, p1, p2, p3, p4, p5, p6, p7;       // 64 bytes
@SuppressWarnings("unused")
private long p8, p9, p10, p11, p12, p13, p14;      // 56 bytes (120 bytes of padding in total)
```

### MPSC Queue Pattern (JCTools-style)

```java
// Producer: atomic swap + link (HEAD is the producer end)
Node prev = (Node) HEAD.getAndSet(this, newNode);
prev.next = newNode; // next must be volatile; or use setRelease on a VarHandle
                     // for Node.next (the lazySet equivalent) for weaker ordering

// Consumer: drain with spin-wait for visibility (tail is the consumer end)
Node node = tail;
Node next = node.next;
if (next == null && node != head) {
    // Spin: producer swapped but hasn't linked yet
    Thread.onSpinWait();
}
```

## Detailed References

- **JMH benchmarking**: See [references/jmh-patterns.md](references/jmh-patterns.md) for benchmark setup, pitfalls, and interpretation
- **Low-latency patterns**: See [references/low-latency.md](references/low-latency.md) for lock-free algorithms, memory barriers, intrusive structures
- **JVM tuning**: See [references/jvm-tuning.md](references/jvm-tuning.md) for GC selection, flags, and profiling

## Common Pitfalls

| Pitfall | Solution |
|---------|----------|
| `System.nanoTime()` in hot path | Measure with JMH instead; each `System.nanoTime()` call adds ~20-30 ns of overhead |
| Separate volatile read+write | Use atomic `getAndSet` (single operation) |
| No padding between fields written by different threads | Add ~128 bytes of padding (two cache lines) |
| Blocking in callbacks | Use virtual threads or async |
| Reversing LIFO to FIFO on every drain | Consider true FIFO queue design |
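
The MPSC queue pattern in the Quick Reference is only a fragment, so a fuller sketch may help show how the producer swap and the consumer spin fit together. The class below is a minimal illustration, not the JCTools implementation or the repository's code: the names `MpscLinkedQueueSketch`, `Node`, `offer`, and `poll` are hypothetical, and it assumes Java 9+ for `VarHandle` and `Thread.onSpinWait()`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

/** Hypothetical illustration class; not part of JCTools or the source repository. */
@SuppressWarnings("unchecked")
final class MpscLinkedQueueSketch<E> {

    private static final class Node<E> {
        E value;
        volatile Node<E> next;
        Node(E value) { this.value = value; }
    }

    private static final VarHandle HEAD;
    private static final VarHandle NEXT;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            HEAD = lookup.findVarHandle(MpscLinkedQueueSketch.class, "head", Node.class);
            NEXT = lookup.findVarHandle(Node.class, "next", Node.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Producer end: the most recently enqueued node. Producers swap it atomically.
    private volatile Node<E> head;
    // Consumer end: an already-consumed stub whose successor holds the next value.
    // Only the single consumer thread touches it, so a plain field is enough.
    private Node<E> tail;

    MpscLinkedQueueSketch() {
        Node<E> stub = new Node<>(null);
        tail = stub;
        head = stub;
    }

    /** Safe to call from any number of producer threads. */
    void offer(E value) {
        Node<E> node = new Node<>(value);
        Node<E> prev = (Node<E>) HEAD.getAndSet(this, node); // atomic swap
        NEXT.setRelease(prev, node); // link; the release store publishes node.value
    }

    /** Must be called from one consumer thread only. Returns null when empty. */
    E poll() {
        Node<E> node = tail;
        Node<E> next = (Node<E>) NEXT.getAcquire(node);
        if (next == null) {
            if (node == head) {
                return null; // no producer has swapped past the stub: queue is empty
            }
            // A producer swapped head but has not linked yet: spin until the link is visible.
            while ((next = (Node<E>) NEXT.getAcquire(node)) == null) {
                Thread.onSpinWait();
            }
        }
        E value = next.value;
        next.value = null; // next becomes the new stub; drop the reference for GC
        tail = next;
        return value;
    }
}
```

A producer would call `queue.offer(item)` from any thread, while a single dedicated consumer thread loops on `queue.poll()`. The stub node keeps `offer` free of null checks, at the cost of one extra allocation when the queue is created.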
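
Core Principle 4 mentions CAS loops without showing one. As a rough sketch of the idea, the hypothetical class below (`MaxTrackerSketch`, with illustrative methods `record` and `max`) keeps a running maximum across threads by retrying a `VarHandle` compare-and-exchange until it either wins or finds that another thread has already stored a larger value.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

/** Hypothetical example class: tracks the largest sample seen across threads. */
final class MaxTrackerSketch {
    private static final VarHandle MAX;
    static {
        try {
            MAX = MethodHandles.lookup()
                    .findVarHandle(MaxTrackerSketch.class, "max", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile long max = Long.MIN_VALUE;

    /** Lock-free update: retry until the CAS succeeds or the sample is no longer a new maximum. */
    void record(long sample) {
        long current = (long) MAX.getAcquire(this);
        while (sample > current) {
            // compareAndExchange returns the value it actually found in the field.
            long witness = (long) MAX.compareAndExchange(this, current, sample);
            if (witness == current) {
                return; // our CAS won
            }
            current = witness; // another thread got there first: re-check against its value
        }
    }

    long max() {
        return (long) MAX.getAcquire(this);
    }
}
```

Using `compareAndExchange` rather than `compareAndSet` avoids a separate re-read after a failed attempt, since the failed call already reports the current value.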