Breaking the Limits: How Folia Made a 1,000-Player Minecraft Server a Reality

Check out the impressive results of the large-scale Folia test that took place on June 18th, 2023. Learn more about our findings and technical challenges in this post.

Introduction

Folia emerges as a promising fork of Paper, boasting an innovative implementation of regionalized multithreading on the server. Traditional Minecraft servers have always faced limitations when it came to player capacity, often struggling to support more than a few hundred players at a time. This is because Minecraft servers primarily rely on a single thread to handle all game logic and player interactions. 
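
As a rough illustration of what regionalized multithreading means for plugin code, here is a minimal sketch using the region scheduler API that Folia-capable Paper builds expose. The class name, coordinates, and block change below are placeholders for illustration, not code from our test:

import org.bukkit.Bukkit;
import org.bukkit.Location;
import org.bukkit.Material;
import org.bukkit.plugin.java.JavaPlugin;

public final class RegionSchedulerExample extends JavaPlugin {

    @Override
    public void onEnable() {
        // A location somewhere in the overworld; the coordinates are placeholders.
        Location target = new Location(Bukkit.getWorlds().get(0), 1000, 64, 1000);

        // On Folia, game logic must run on the thread that currently owns the region
        // containing the affected chunks, so work is handed to the region scheduler
        // instead of a single global main thread.
        Bukkit.getRegionScheduler().execute(this, target, () ->
                target.getBlock().setType(Material.GLOWSTONE));
    }
}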

Spottedleaf, Michael, and I conducted this test to evaluate and analyze Folia's performance and stability under various conditions. We would like to thank Tubbo for streaming this test event.

Our Test

We wanted to conduct a test with Folia and see how it performs on “regular” hardware and configurations. The previous public test ran on absurdly powerful hardware, which would not be realistic for many use cases. However, it's important to note that this test only provides a glimpse into the potential of Folia and its regionalized multithreading capabilities.

The purpose of this test was to gather as much data as possible, while testing different game configurations and seeing how they performed. 

Configuration

Hardware

Neofetch on our test machine.

Our test was conducted on Hetzner’s AX102 with the following configuration:

Software

$ uname -a
Linux test-fsn1-game01 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64 GNU/Linux

$ java -version
openjdk version "21-testing" 2023-09-19
OpenJDK Runtime Environment (build 21-testing-builds.shipilev.net-openjdk-jdk-shenandoah-b110-20230615)
OpenJDK 64-Bit Server VM (build 21-testing-builds.shipilev.net-openjdk-jdk-shenandoah-b110-20230615, mixed mode, sharing)

Minecraft

Our Minecraft server was running Minecraft 1.20.1 on Folia build 09d8e7d (Oops). The server ran with a 100 GiB heap and used Shenandoah as the garbage collector. Furthermore, Spottedleaf and Michael decided that we should try generational Shenandoah GC in OpenJDK 21.

Our conversation about Java 21.

Paper Configuration

config/paper-global.yml:

chunk-loading-basic:
  player-max-chunk-generate-rate: 40.0
  player-max-chunk-load-rate: 40.0
  player-max-chunk-send-rate: 40.0
chunk-system:
  io-threads: 2
  worker-threads: 1
misc:
  region-file-cache-size: 512
proxies:
  proxy-protocol: true
thread-regions:
  threads: 6

config/paper-world-defaults.yml:

environment:
  treasure-maps:
    enabled: false

Spigot Configuration

settings:
  netty-threads: 6

Bukkit Configuration

spawn-limits:
  monsters: 9
  animals: 7
  water-animals: 4
  water-ambient: 7
  water-underground-creature: 3
  axolotls: 3
  ambient: 4
ticks-per:
  monster-spawns: 30
  water-spawns: 30
  water-ambient-spawns: 30
  water-underground-creature-spawns: 30
  axolotl-spawns: 30
  ambient-spawns: 30

Minecraft Configuration

allow-nether=false
hide-online-players=true
max-players=1001
network-compression-threshold=-1
spawn-protection=0
simulation-distance=5
view-distance=8

JVM Flags

-Xms100G
-Xmx100G
-XX:+AlwaysPreTouch
-XX:+UnlockDiagnosticVMOptions
-XX:+UnlockExperimentalVMOptions
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseLargePages
-XX:LargePageSizeInBytes=2M
-XX:+UseShenandoahGC
-XX:ShenandoahGCMode=generational
-XX:-ShenandoahPacing
-XX:+ParallelRefProcEnabled
-XX:ShenandoahGCHeuristics=adaptive
-XX:ShenandoahInitFreeThreshold=55
-XX:ShenandoahGarbageThreshold=30
-XX:ShenandoahMinFreeThreshold=20
-XX:ShenandoahAllocSpikeFactor=10
-XX:ParallelGCThreads=10
-XX:ConcGCThreads=3
-Xlog:gc*:logs/gc.log:time,uptime:filecount=15,filesize=1M
-Dchunky.maxWorkingCount=600

JMX flags were stripped.

Initial Thread Allocations

Total: 18

Tools

Methodology

The server was prepared with a pre-generated 100k x 100k block world. Our custom plugin distributed new players to the least-occupied region, using predefined spawn points as shown below. The reasoning behind this was to avoid concentrating a high number of players in any one area. Furthermore, Folia benefits from having multiple regions due to its regionalized multithreading implementation, allowing for better utilization of CPU resources and improved performance.

Spawn points plotted on a plane.
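
The actual distribution plugin is not published, but the idea can be sketched with a small, hypothetical Bukkit listener that tracks how many players have been sent to each predefined spawn point and assigns each joining player to the least-occupied one. The class name, coordinates, and use of Spigot's PlayerSpawnLocationEvent are assumptions for illustration, not the code we actually ran:

import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.bukkit.Location;
import org.bukkit.event.EventHandler;
import org.bukkit.event.Listener;
import org.bukkit.plugin.java.JavaPlugin;
import org.spigotmc.event.player.PlayerSpawnLocationEvent;

public final class SpawnSpreadPlugin extends JavaPlugin implements Listener {

    // Predefined spawn points spread across the pre-generated map (coordinates are placeholders).
    private List<Location> spawnPoints;
    // How many players have been assigned to each spawn point so far.
    private final Map<Location, Integer> occupancy = new ConcurrentHashMap<>();

    @Override
    public void onEnable() {
        spawnPoints = List.of(
                new Location(getServer().getWorlds().get(0), -40000, 80, -40000),
                new Location(getServer().getWorlds().get(0), 0, 80, 0),
                new Location(getServer().getWorlds().get(0), 40000, 80, 40000));
        spawnPoints.forEach(point -> occupancy.put(point, 0));
        getServer().getPluginManager().registerEvents(this, this);
    }

    @EventHandler
    public void onSpawn(PlayerSpawnLocationEvent event) {
        // Assign the joining player to the least-occupied spawn point so no single
        // region ends up with a disproportionate number of players.
        Location leastOccupied = spawnPoints.stream()
                .min(Comparator.comparingInt(occupancy::get))
                .orElseThrow();
        occupancy.merge(leastOccupied, 1, Integer::sum);
        event.setSpawnLocation(leastOccupied);
    }
}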

The test was presented as an event and was streamed by Tubbo. Players were spread into 49 different teams across the map. Each team consisted of around 20 players.

Results

The event started around 16:00 UTC and we were able to gather 1,000 players on the server. 

1,000 players shown in Grafana.

Shortly after we unfroze the players, we experienced some lag that lasted for about a minute. We suspected this was caused by the sudden burst of player movement overwhelming the Netty threads; outbound traffic peaked at almost 2 Gbps during that period.

Network throughput graph on Grafana.

The server ran fine for a while until it didn't. Our six region ticking threads were completely utilized. A normal Paper server ticks on a single thread and can probably handle around 100 players with optimized settings. By the same logic, we should have had at least 10 threads for 1,000 players to run smoothly. However, Folia has more overhead than a normal Paper server due to scheduling, so that should be kept in mind.

Output of /tps.

Despite the lag, the server ran surprisingly well on a consumer-grade CPU and a commonly available hardware configuration. Sustaining 1,000 players at a playable TPS might have been possible with better thread allocations, but we don't know for sure since we never got the chance to test that. Our CPU usage hovered around 10-14 logical cores out of 32, which suggests we could have allocated more region threads and IO threads. However, we only had 16 physical cores available, so pushing usage beyond 16 logical cores may have resulted in decreased performance, depending on how the workload is scheduled.

JVM CPU metrics in Grafana.

After a while, our server crashed. Preliminary analysis suggests that this was caused by an unforeseen bug in the custom patch we had implemented. The fix was supposed to be deployed before the test, but we had built Folia from the wrong branch. Regardless, we also took the opportunity to increase the thread counts.

Soon after, the server was up and running with our patch applied. We had ~630 concurrent players and it performed well at a constant 20 TPS.

Smooth 20 TPS with 600 players.

Output from /tps.

For fun, we enabled chat for a short duration and everyone got kicked with the following message:

Oops.

This has not been investigated yet, but it is most likely related to incorrect handling of chat signing. It is unknown whether this is related to Folia.

We were using generational Shenandoah GC on Java 21. During the period when 1,000 players were online, heap allocation peaked at ~7.9 GB/s, and GC throughput hovered around 2-3 GB/s when averaged over a minute.

Heap allocation graph.

GC throughput graph.

During the entire test, our GC pauses were mostly fine. The median GC pause duration was ~3 ms. 

GC pauses graph.

Conclusion

What Next?

Links

Thanks To

Supporting Folia & PaperMC

Interested in supporting the development of Folia and PaperMC software? See sponsors.

Written by Cubxity

Full-stack developer
