Fork of Paper which adds regionised multithreading to the dedicated server.
Latest commit: Use coordinate-based locking to increase chunk system parallelism (Spottedleaf, 31b5b1575b, 2023-05-14)
A significant overhead in Folia comes from the chunk system's
locks, the ticket lock and the scheduling lock. The public
test server, which had ~330 players, had significant performance
problems with these locks: ~80% of the time spent ticking
was _waiting_ for the locks to free. Given that it used
around 15 cores total at peak, this is a complete and utter loss
of potential.

To address this issue, I have replaced the ticket lock and scheduling
lock with two ReentrantAreaLocks. The ReentrantAreaLock takes a
shift, which is used internally to group positions into sections.
This grouping is necessary, as the possible radius of area that
needs to be acquired for any given lock usage is up to 64. As such,
the shift is critical to reduce the number of areas required to lock
for any lock operation. Currently, it is set to a shift of 6, which
is identical to the ticket level propagation shift (and, it must be
at least the ticket level propagation shift AND the region shift).
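
As a rough illustration of that grouping, the sketch below (hypothetical names, not Folia's actual ReentrantAreaLock internals) shows how a shift of 6 maps chunk coordinates into 64x64 sections, and how many sections an acquire over a given radius can span:

// Sketch of shift-based section grouping; names here are hypothetical.
final class SectionKeys {
    static final int LOCK_SHIFT = 6; // sections are 2^6 = 64 chunks wide

    // all chunks inside the same 64x64 area map to the same section key
    static long sectionKey(final int chunkX, final int chunkZ) {
        final int sectionX = chunkX >> LOCK_SHIFT;
        final int sectionZ = chunkZ >> LOCK_SHIFT;
        return ((long) sectionZ << 32) | (sectionX & 0xFFFFFFFFL);
    }

    // number of sections an acquire over [center - radius, center + radius] touches on one axis
    static int sectionsSpanned(final int center, final int radius) {
        return ((center + radius) >> LOCK_SHIFT) - ((center - radius) >> LOCK_SHIFT) + 1;
    }
}

With a radius of 64 and a shift of 6, an acquire spans at most 3 sections per axis (9 total) rather than thousands of individual chunk positions, which is why the shift is critical to keeping lock operations cheap.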

The chunk system locking changes required a complete rewrite of the
chunk system tick, chunk system unload, and chunk system ticket level
propagation - as all of the previous logic only works with a single
global lock.

This does introduce two other section shifts: the lock shift, and the
ticket shift. The lock shift is simply what shift the area locks use,
and the ticket shift represents the size of the ticket sections.
Currently, these values are just set to the region shift for simplicity.
However, they are not arbitrary: the lock shift must be at least the size
of the ticket shift and must be at least the size of the region shift.
The ticket shift must also be >= the ceil(log2(max ticket level source)).
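
As a small sketch (hypothetical helper, not Folia code), the constraints above amount to the following checks:

// Sketch of the shift constraints described above; not actual Folia code.
final class ShiftConstraints {
    // ceil(log2(n)) for n >= 1
    static int ceilLog2(final int n) {
        return n <= 1 ? 0 : (32 - Integer.numberOfLeadingZeros(n - 1));
    }

    static void validate(final int lockShift, final int ticketShift, final int regionShift,
                         final int maxTicketLevelSource) {
        if (lockShift < ticketShift || lockShift < regionShift) {
            throw new IllegalArgumentException("lock shift must be >= ticket shift and >= region shift");
        }
        if (ticketShift < ceilLog2(maxTicketLevelSource)) {
            throw new IllegalArgumentException("ticket shift must be >= ceil(log2(max ticket level source))");
        }
    }
}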

The chunk system's ticket propagator is now global state, instead of
region state. This cleans up the logic for ticket levels significantly,
and removes usage of the region lock in this area, but it also means
that the addition of a ticket no longer creates a region. To alleviate
the side effects of this change, the global tick thread now processes
ticket level updates for each world every tick to guarantee eventual
ticket level processing. The chunk system also provides a hook to
process ticket level changes in a given _section_, so that the
region queue can guarantee that after adding its reference counter
that the region section is created/exists/won't be destroyed.

The ticket propagator operates by updating the sources in a single ticket
section, and propagating the updates to its 1 radius neighbours. This
allows the ticket updates to occur in parallel or selectively (see above).
Currently, the process ticket level update function operates by
polling from a concurrent queue of sections to update and simply
invoking the single section update logic. This allows the function
to operate completely in parallel, provided the queue is ordered correctly.
Additionally, this limits the area used in the ticket/scheduling lock
when processing updates, which should massively increase parallelism compared
to before.
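
The polling scheme can be sketched roughly as follows (hypothetical types; the real code uses the ReentrantAreaLock and the chunk system's own section bookkeeping):

import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-ins for the real types; a sketch of the polling scheme, not Folia's code.
record SectionPos(int x, int z) {}

interface AreaLock {
    AutoCloseable lock(int fromSectionX, int fromSectionZ, int toSectionX, int toSectionZ);
}

final class TicketUpdateProcessor {
    private final AreaLock ticketLock;
    private final ConcurrentLinkedQueue<SectionPos> dirtySections = new ConcurrentLinkedQueue<>();

    TicketUpdateProcessor(final AreaLock ticketLock) {
        this.ticketLock = ticketLock;
    }

    void markDirty(final SectionPos section) {
        dirtySections.add(section);
    }

    // Multiple threads may call this concurrently: each drains one section at a time,
    // holding only the area covering that section and its 1-radius neighbours.
    void processTicketUpdates() throws Exception {
        SectionPos section;
        while ((section = dirtySections.poll()) != null) {
            try (final AutoCloseable held = ticketLock.lock(section.x() - 1, section.z() - 1,
                                                            section.x() + 1, section.z() + 1)) {
                propagateSingleSection(section); // update sources here, push levels into neighbours
            }
        }
    }

    private void propagateSingleSection(final SectionPos section) {
        // omitted: the actual level propagation within the section
    }
}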

The chunk system ticket addition for expirable ticket types has been modified
to no longer track exact tick deadlines, as this relies on what region the
ticket is in. Instead, the chunk system tracks a map of
lock section -> (chunk coordinate -> expire ticket count) and every ticket
has been changed to have a removeDelay count that is decremented each tick.
Each region searches its own sections to find tickets to try to expire.
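
A minimal sketch of that bookkeeping (hypothetical names; the real map lives inside the chunk system):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only; TicketStore is a hypothetical stand-in for the real chunk system types.
interface TicketStore {
    // decrement removeDelay on every expirable ticket in this chunk, return how many expired
    int decrementRemoveDelays(long chunkKey);
}

final class ExpirableTicketTracker {
    // lock section key -> (chunk key -> count of expirable tickets in that chunk)
    private final Map<Long, Map<Long, Integer>> expireCounts = new ConcurrentHashMap<>();

    void onExpirableTicketAdded(final long sectionKey, final long chunkKey) {
        expireCounts.computeIfAbsent(sectionKey, k -> new ConcurrentHashMap<>())
                    .merge(chunkKey, 1, Integer::sum);
    }

    // Each region calls this once per tick for the lock sections it owns, so no
    // cross-region tick deadline tracking is required.
    void tickOwnedSections(final Iterable<Long> ownedSections, final TicketStore tickets) {
        for (final long sectionKey : ownedSections) {
            final Map<Long, Integer> chunks = expireCounts.get(sectionKey);
            if (chunks == null) {
                continue;
            }
            chunks.replaceAll((chunkKey, count) -> count - tickets.decrementRemoveDelays(chunkKey));
            chunks.values().removeIf(count -> count <= 0);
        }
    }
}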

Chunk system unloading has been modified to track unloads by lock section.
The ordering is determined by which section a chunk resides in.
The unload process now removes from unload sections and processes
the full unload stages (1, 2, 3) before moving to the next section, if possible.
This allows the unload logic to only hold one lock section at a time for
each lock, which is a massive parallelism increase.
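
The per-section unload loop can be sketched like this (hypothetical types standing in for the real unload machinery and area locks):

import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch only: hypothetical stand-ins, not the actual chunk system unload code.
interface UnloadCandidate {
    void runUnloadStages(); // stages 1, 2, 3 as described above
}

interface SectionLock {
    AutoCloseable lockSection(long sectionKey);
}

final class SectionedUnloader {
    // lock section key -> chunks queued for unload in that section
    private final Map<Long, Queue<UnloadCandidate>> unloadSections = new ConcurrentHashMap<>();

    void queueUnload(final long sectionKey, final UnloadCandidate chunk) {
        unloadSections.computeIfAbsent(sectionKey, k -> new ConcurrentLinkedQueue<>()).add(chunk);
    }

    // Drains one section at a time, completing all unload stages for that section before
    // moving to the next, so only one lock section per lock is held at any point.
    void processUnloads(final SectionLock lock) throws Exception {
        for (final Long sectionKey : unloadSections.keySet()) {
            final Queue<UnloadCandidate> queued = unloadSections.remove(sectionKey);
            if (queued == null) {
                continue;
            }
            try (final AutoCloseable held = lock.lockSection(sectionKey)) {
                for (final UnloadCandidate chunk : queued) {
                    chunk.runUnloadStages();
                }
            }
        }
    }
}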

In stress testing, these changes lowered the locking overhead to only 5%
from ~70%, which completely fixes the original problem as described.

Overview

Folia groups nearby loaded chunks to form an "independent region." See the PaperMC documentation for exact details on how Folia will group nearby chunks. Each independent region has its own tick loop, which is ticked at the regular Minecraft tickrate (20TPS). The tick loops are executed on a thread pool in parallel. There is no main thread anymore, as each region effectively has its own "main thread" that executes the entire tick loop.

For a server with many spread-out players, Folia will create many spread-out regions and tick them all in parallel on a thread pool of configurable size. Thus, Folia should scale well for servers like this.

Folia is also its own project; it will not be merged into Paper for the foreseeable future.

A more detailed but abstract overview: Project overview.

FAQ

What server types can benefit from Folia?

Server types that naturally spread players out, like skyblock or SMP, will benefit the most from Folia. The server should have a sizeable player count, too.

What hardware will Folia run best on?

Ideally, at least 16 cores (not threads).

How to best configure Folia?

First, it is recommended that the world is pre-generated so that the number of chunk system worker threads required is reduced greatly.

The following is a very rough estimate based on testing done, before Folia was released, on the test server we ran that peaked at ~330 players. It is not exact and will require further tuning - just take it as a starting point.

The total number of cores available on the machine should be taken into account. Then, allocate threads for:

  • netty IO: ~4 threads per 200-300 players
  • chunk system IO threads: ~3 per 200-300 players
  • chunk system workers, if the world is pre-generated: ~2 per 200-300 players
  • There is no good estimate for chunk system workers if the world is not pre-generated; on the test server we ran, we allocated 16 threads and chunk generation was still slow at ~300 players.
  • GC settings: ???? However, GC settings do allocate concurrent threads, and you need to know exactly how many. This is typically controlled by the -XX:ConcGCThreads=n flag. Do not confuse this flag with -XX:ParallelGCThreads=n, as parallel GC threads only run while the application is paused by GC and as such should not be taken into account.

After all of that allocation, the remaining cores on the system, up to 80% total allocation (total threads allocated < 80% of the CPUs available), can be allocated to tick threads (under the global config, threaded-regions.threads).

The reason you should not allocate more than 80% of the cores is that plugins, or even the server itself, may make use of additional threads that you cannot configure or even predict.

Additionally, the above is all a rough guess based on player count, but it is very likely that the thread allocation will not be ideal, and you will need to tune it based on usage of the threads that you end up seeing.
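
As a purely hypothetical worked example using the rough per-player guesses above (these are not measurements): on a 32-core machine running a pre-generated world for ~300 players, you might start with ~4 netty IO threads, ~3 chunk system IO threads, ~2 chunk system workers, and (say) 2 concurrent GC threads, i.e. 11 allocated threads. 80% of 32 cores is roughly 25, leaving about 14 tick threads for threaded-regions.threads as a starting point before tuning.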

Plugin compatibility

There is no more main thread. I expect every single plugin that exists to require some level of modification to function in Folia. Additionally, multithreading of any kind introduces possible race conditions in plugin-held data - so, there are bound to be changes that need to be made.

So, have your expectations for compatibility at 0.

API plans

Currently, there is a lot of API that relies on the main thread. I expect basically zero plugins that are compatible with Paper to be compatible with Folia. However, there are plans to add API that would allow Folia plugins to be compatible with Paper.

For example, consider the Bukkit Scheduler, which inherently relies on a single main thread. Folia's RegionScheduler and EntityScheduler allow scheduling tasks to the "next tick" of whatever region "owns" a location or an entity. These could be implemented on regular Paper as well, except there they would schedule to the main thread - in both cases, the task executes on the thread that "owns" the location or entity. This concept applies in general, as current (single-threaded) Paper can be viewed as one giant "region" that encompasses all chunks in all worlds.
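
A rough usage sketch of the two schedulers mentioned above (the exact method signatures may differ between versions; treat this as illustrative of the concept rather than an authoritative API reference):

import org.bukkit.Bukkit;
import org.bukkit.Location;
import org.bukkit.entity.Entity;
import org.bukkit.plugin.Plugin;

// Sketch: schedule work onto the region that owns a location or an entity.
final class SchedulerExamples {
    // Run a task on whatever region owns this location, on that region's next tick.
    static void runAtLocation(final Plugin plugin, final Location location, final Runnable task) {
        Bukkit.getRegionScheduler().execute(plugin, location, task);
    }

    // Run a task on whatever region owns this entity; the 'retired' callback runs if the
    // entity is removed before the task can execute.
    static void runOnEntity(final Plugin plugin, final Entity entity, final Runnable task) {
        entity.getScheduler().run(plugin, scheduledTask -> task.run(), /* retired */ () -> {});
    }
}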

It is not yet decided whether to add this API to Paper itself directly or to Paperlib.

The new rules

First, Folia breaks many plugins. To aid users in figuring out which plugins work, only plugins that have been explicitly marked by the author(s) to work with Folia will be loaded. By placing "folia-supported: true" into the plugin's plugin.yml, plugin authors can mark their plugin as compatible with regionised multithreading.

The other important rule is that the regions tick in parallel, not concurrently. They do not share data, they do not expect to share data, and sharing of data will cause data corruption. Code running in one region must under no circumstances access or modify data that belongs to another region. Just because multithreading is in the name, it doesn't mean that everything is now thread-safe; in fact, only a few things were made thread-safe to make this happen. As time goes on, the number of thread context checks will only grow, even if it comes at a performance penalty - nobody is going to use or develop for a server platform that is buggy as hell, and the only way to prevent and find these bugs is to make bad accesses fail hard at the source of the bad access.

This means that Folia compatible plugins need to take advantage of API like the RegionScheduler and the EntityScheduler to ensure their code is running on the correct thread context.

In general, it is safe to assume that a region owns chunk data within approximately 8 chunks of the source of an event (i.e. if a player breaks a block, the region can probably access the 8 chunks around that block). But, this is not guaranteed - plugins should take advantage of the upcoming thread-check API to ensure correct behavior.

The only guarantee of thread-safety comes from the fact that a single region owns data in certain chunks - and if that region is ticking, then it has full access to that data. This data is specifically entity/chunk/poi data, and is entirely unrelated to ANY plugin data.

Normal multithreading rules apply to data that plugins store or access, whether it is their own or another plugin's - events/commands/etc. are called in parallel because regions are ticking in parallel (we CANNOT call them in a synchronous fashion, as this opens up deadlock issues and would handicap performance). There are no easy ways out of this; it depends solely on what data is being accessed. Sometimes a concurrent collection (like ConcurrentHashMap) is enough, but often a concurrent collection used carelessly will only hide threading issues, which then become near impossible to debug.

Current API additions

To properly understand API additions, please read Project overview.

  • RegionScheduler, AsyncScheduler, GlobalRegionScheduler, and EntityScheduler acting as a replacement for the BukkitScheduler. The entity scheduler is retrieved via Entity#getScheduler, and the rest of the schedulers can be retrieved from the Bukkit/Server classes.
  • Bukkit#isOwnedByCurrentRegion to test if the current ticking region owns positions/entities (see the sketch below)
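
A rough sketch of using the ownership check together with the region scheduler (signatures approximate; illustrative only):

import org.bukkit.Bukkit;
import org.bukkit.Location;
import org.bukkit.Material;
import org.bukkit.plugin.Plugin;

// Sketch: only touch region-owned data when the current thread's region owns it;
// otherwise hop to the owning region first.
final class OwnershipExample {
    static void setAirSafely(final Plugin plugin, final Location location) {
        if (Bukkit.isOwnedByCurrentRegion(location)) {
            location.getBlock().setType(Material.AIR); // safe: the current region owns this chunk
        } else {
            // not owned here: schedule onto the owning region instead of touching it directly
            Bukkit.getRegionScheduler().execute(plugin, location,
                () -> location.getBlock().setType(Material.AIR));
        }
    }
}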

Thread contexts for API

To properly understand API additions, please read Project overview.

General rules of thumb:

  1. Commands for entities/players are called on the region which owns the entity/player. Console commands are executed on the global region.

  2. Events involving a single entity (i.e. a player breaking/placing a block) are called on the region owning the entity. Events involving actions on an entity (such as entity damage) are invoked on the region owning the target entity.

  3. The async modifier for events is deprecated - all events fired from regions or the global region are considered synchronous, even though there is no main thread anymore.

Current broken API

  • Most API that interacts with portals / respawning players / some player login API is broken.
  • ALL scoreboard API is considered broken (this is global state that I've not figured out how to properly implement yet)
  • World loading/unloading
  • Entity#teleport. This will NEVER UNDER ANY CIRCUMSTANCE come back; use teleportAsync instead
  • Could be more

Planned API additions

  • Proper asynchronous events. This would allow the result of an event to be completed later, on a different thread context. This is required to implement some things like spawn position select, as asynchronous chunk loads are required when accessing chunk data out-of-region.
  • World loading/unloading
  • More to come here

Planned API changes

  • Super aggressive thread checks across the board. This is absolutely required to prevent plugin devs from shipping code that may randomly break random parts of the server in entirely undiagnosable manners.
  • More to come here

Maven information

  • Maven Repo (for folia-api):
<repository>
    <id>papermc</id>
    <url>https://repo.papermc.io/repository/maven-public/</url>
</repository>
  • Artifact Information:
<dependency>
    <groupId>dev.folia</groupId>
    <artifactId>folia-api</artifactId>
    <version>1.19.4-R0.1-SNAPSHOT</version>
    <scope>provided</scope>
</dependency>

License

The PATCHES-LICENSE file describes the license for api & server patches, found in ./patches and its subdirectories except when noted otherwise.

The fork is based off of PaperMC's fork example found here. As such, this project contains modifications to it; please see the repository for license information of modified files.