I noticed during my stress testing that the total size of the
light list was far too large, which indicates many duplicates.
For me, this caused many GC problems which made stress testing
harder.
It turns out, it was possible for the light list recalculation
logic to occur _and_ the addition of the light list data from
the NBT data. Since there is no logic to de-duplicate this list,
every chunk load would re-add all light sources into the light
list and the light list would grow uncontrollably.
Since the recalculation logic would often run, I have
decided to solve this by discarding the data on disk and always
just calculating the list from the chunk data alone. Additionally,
I have applied an optimization from Vanilla 1.20 to avoid
searching sections without light sources by first checking the
palette for possible block sources.
Now my stress tests do not have issues with GC at all.
In 1.18, every chunk section is initialised to a non-null value
and recalcBlockCounts() is invoked for each section.
However, in a standard world, most sections are empty. In such cases,
recalcBlockCounts() would iterate over ever position - even though
the block data would all be air. To avoid this, we skip
searching the section unless the palette indicates there _could_ be
a non-air block state or non-empty fluid state.
Chunk loading initially showed that recalcBlockCounts() over
sections with a ZeroBitStorage data to to take ~20% of the process,
now it takes <1%.
Since the chunk load task was not scheduled, the entity/poi load
task fields will not be set, but the task complete counter
will not be adjusted. Thus, the chunk load task will not complete.
To resolve this, detect when the entity/poi tasks were not scheduled
and decrement the task complete counter in such cases.
It must be marked as completed during that lock hold since the
waiters field is set to null. Thus, any other thread attempting
a cancellation will fail to remove from waiters. Also, any
other thread attempting to cancel may set the completed field
to true which would cause accept() to fail as well.
Completion was always designed to happen while holding the
scheduling lock to prevent these race conditions. The code
was originally set up to complete while not holding the
scheduling lock to avoid invoking callbacks while holding the
lock, however the access to the completion field was not
considered.
Resolve this by marking the callback as completed during the
lock, but invoking the accept() function after releasing
the lock. This will prevent any cancellation attempts to be
blocked, and allow the current thread to complete the callback
without any issues.
Since multiple regions can exist, there are concurrent accesses
in this class. To prevent deadlock, the monitor is not held
when recalculating permissions, as Permissable holds its own
lock.
This fixes CMEs originating from this class.
The concurrent access occurs on the Netty IO threads when
serializing packets. Thus, it seems it was an oversight of
the implementator of this function as there are typically
more than one Netty IO thread.
Fixes https://github.com/PaperMC/Folia/issues/11
A significant overhead in Folia comes from the chunk system's
locks, the ticket lock and the scheduling lock. The public
test server, which had ~330 players, had signficant performance
problems with these locks: ~80% of the time spent ticking
was _waiting_ for the locks to free. Given that it used
around 15 cores total at peak, this is a complete and utter loss
of potential.
To address this issue, I have replaced the ticket lock and scheduling
lock with two ReentrantAreaLocks. The ReentrantAreaLock takes a
shift, which is used internally to group positions into sections.
This grouping is neccessary, as the possible radius of area that
needs to be acquired for any given lock usage is up to 64. As such,
the shift is critical to reduce the number of areas required to lock
for any lock operation. Currently, it is set to a shift of 6, which
is identical to the ticket level propagation shift (and, it must be
at least the ticket level propagation shift AND the region shift).
The chunk system locking changes required a complete rewrite of the
chunk system tick, chunk system unload, and chunk system ticket level
propagation - as all of the previous logic only works with a single
global lock.
This does introduce two other section shifts: the lock shift, and the
ticket shift. The lock shift is simply what shift the area locks use,
and the ticket shift represents the size of the ticket sections.
Currently, these values are just set to the region shift for simplicity.
However, they are not arbitrary: the lock shift must be at least the size
of the ticket shift and must be at least the size of the region shift.
The ticket shift must also be >= the ceil(log2(max ticket level source)).
The chunk system's ticket propagator is now global state, instead of
region state. This cleans up the logic for ticket levels significantly,
and removes usage of the region lock in this area, but it also means
that the addition of a ticket no longer creates a region. To alleviate
the side effects of this change, the global tick thread now processes
ticket level updates for each world every tick to guarantee eventual
ticket level processing. The chunk system also provides a hook to
process ticket level changes in a given _section_, so that the
region queue can guarantee that after adding its reference counter
that the region section is created/exists/wont be destroyed.
The ticket propagator operates by updating the sources in a single ticket
section, and propagating the updates to its 1 radius neighbours. This
allows the ticket updates to occur in parallel or selectively (see above).
Currently, the process ticket level update function operates by
polling from a concurrent queue of sections to update and simply
invoking the single section update logic. This allows the function
to operate completely in parallel, provided the queue is ordered right.
Additionally, this limits the area used in the ticket/scheduling lock
when processing updates, which should massively increase parallelism compared
to before.
The chunk system ticket addition for expirable ticket types has been modified
to no longer track exact tick deadlines, as this relies on what region the
ticket is in. Instead, the chunk system tracks a map of
lock section -> (chunk coordinate -> expire ticket count) and every ticket
has been changed to have a removeDelay count that is decremented each tick.
Each region searches its own sections to find tickets to try to expire.
Chunk system unloading has been modified to track unloads by lock section.
The ordering is determined by which section a chunk resides in.
The unload process now removes from unload sections and processes
the full unload stages (1, 2, 3) before moving to the next section, if possible.
This allows the unload logic to only hold one lock section at a time for
each lock, which is a massive parallelism increase.
In stress testing, these changes lowered the locking overhead to only 5%
from ~70%, which completely fix the original problem as described.
Instead, we can just check the loaded chunk's block position for
the lodestone block, as that is at least safe enough for the light
engine compared to the POI access. This should make it safe for
off-region access.
Fixes https://github.com/PaperMC/Folia/issues/60
In general, worldstate read/write is unacceptable during
data deserialization and is racey even in Vanilla. But in Folia,
some accesses may throw and as such we need to fix this directly.
Fixes https://github.com/PaperMC/Folia/issues/57
The returned TE may be in the world, in which case it is unsafe
for the current thread to modify or access its contents.
Fixes https://github.com/PaperMC/Folia/issues/52
This is to prevent block physics from tripping thread checks by
far exceeding the bounds of the current region. While this does
add explicit block update suppression techniques, it's better
than the server crashing.
Perform thread checks on the chunk send and warn when the
world is mismatched. I suspect that the world mismatches for
an unknown reason, but need to confirm it to chase it down.
Change overview:
- Rework limiting
- Remove mid tick updates
- Introduce consistency checks
The old limiting logic used an intervalled counter, but
did not account for possible slight changes in mid tick
invoke rate as it relied heavily on mid-tick logic. Due to
the removal of mid tick updates, it is now important that
the logic functions correctly no matter what rate it is invoked
at. The new logic directly tracks the last update time and
allocates an amount based proportional on the rate targetted,
which makes the logic call rate independent.
The removal of mid tick updates is done to eliminate recursive
call risk, and to additionally reduce the lock pressure on the
chunk system by grouping chunk loads onto one part of the tick
rather than spreading it out. The limiting rework should ensure
that this does not negatively affect rates, but it will decrease
the perceived smoothness of chunk generation/loading at low rates.
Introduce more consistency checks such as correct tick thread
and ticking-after-removal checks. Also, perform checks during the player
chunk loader tick to avoid updating potentially removed
players during the tick.
The checks are primarily made to try to hunt down a bug that
is causing the player chunk loader to double send a chunk
to a player.
Due to TPS catchup being removed, a lost tick will always
affect block breaking.
Additionally, constant low TPS would affect block breaking
in an intrusive manner to the user when there is no need
for that to occur.
The expected behavior is that the entity is only dismounted
_if_ the teleport takes place, not regardless of whether
the teleport takes place.
To adhere to the expected behavior, we need to create a new teleport
flag so that the NMS teleportAsync can perform the dismount.
Required to add some basic hacks to expose a regionstats object
in the TickRegions data class, but this ensures that the retrieval
is thread-safe and could possibly support other data exposure
for async reads.
Additionally, since DecimalFormat is not thread-safe we need
to use ThreadLocals to instantiate them. Change the format
as well to use commas to separate groups of digits when
formatting large numbers so they are easier to read.
For example, 1000 becomes 1,000.
Vanilla inserted the alive check so that dead players
could see the chunks around them, but in Folia we do not
remove dead players so chunks will still load for them.
This prevents the client from having chunks in memory it
should not.
The inventory was being cleared by removing the entity
from the world, but Folia changed the remove to be
before the entity copy which caused the new entity
to have a cleared inventory. Fix this by creating
a post-dimension copy callback that will clear
the inventory of the old entity.
Fixes https://github.com/PaperMC/Folia/issues/29
If the player moves out of range by the time the block is destroyed,
then the exception would throw and remove the player from the world
Additionally, when players fail to tick instead of removing
the player from the world, kick them to prevent a limbo state
The MinecraftServer executor will throw, and thanks
to CompletableFuture's awful exception handling
the exception was not logged or handled. Add
exception handling as well.
Fixes https://github.com/PaperMC/Folia/issues/27
The code to stop all brain tasks is required to pass the current
game time to the tasks it stops. But, when a villager is being
portalled, the copied entity does not have any running tasks. So,
we can simply return early before invoking getGameTime if there
are no running tasks.
Fixes https://github.com/PaperMC/Folia/issues/23
Now that there is no on-main task, the completion logic
for the status task is completed with the results passed
by the off-main task. Thus, the chunk system saw a non-null
throwable and assumed a fatal crash. The old on-main task
did not pass the throwable through in this case, which allowed
the chunk to re-generate.
Fixes https://github.com/PaperMC/Folia/issues/7
Before, it returned the center chunk section. Also, now instead
of approximating the center chunk from the allocated sections,
actually retrieve all chunks inside the region directly.
Now, entity/global/location schedulers implement a generic run,
runDelayed, and runAtFixedRate methods that provide a ScheduledTask
value that can be used to interact with the scheduled task.
Add also an async task scheduler that implements the same methods,
except the delays/periods are in time and not ticks, as the scheduler
is independent of the server tick process.
Additionally, throw on some unimplemented APIs now.
Mid-tick task execution acquires the ticket lock at least,
which can possibly be a significant performance bottleneck
at a high tick thread + region count. This change should reduce
the impact from scaling the region threads, but is not a fix
to the underlying issue.
The concurrent access occurred in the region merge/split logic,
as it did not own the ticket lock. To resolve this, we simply
make the ticket add/remove acquire the read lock of
the region lock.
Sometimes a region is empty, but only when the threaded regioniser
crashes. So, it should not prevent shutdown.
Additionally, only remove pending teleports after setting the
current region.
This will prevent a race condition where the region is cancelled
and immediately re-scheduled, and where this all happens while
the tick thread is blocking on tryMarkTicking.
Everyone had access to /tps for the private tests, but we cannot
trust everyone to use it responsibly for the public test
(i.e revealing coordinates of all logged in players).
In the case that the entity has a null callback, it means the
entity has not been added to the world - so, we should treat
it the same as entity#isRemoved.
This may be useful for plugins which want to perform operations
over large areas outside of the buffer zone provided by the
regionaliser, as it is not guaranteed that anything outside
of the buffer zone is owned. Then, the plugins may use
the schedulers depending on the result of the ownership
check.
This event allows plugins to perform synchronous operations before
any region will tick. Plugins will not have to worry about the
possibility of a region ticking in parallel while listening
to the event.
- Add thread check for loadChunk
- Make isChunkGenerated use the region task queue to schedule
to "main"
- Don't complete async chunk future if not in the owning thread
for the chunk
Plugins must add "folia-supported: true" to their plugin.yml
otherwise the server will refuse to load them.
Since Folia is a major breakage for plugins, the vast majority
of plugins will not function correctly on Folia. To prevent
user confusion from this, we will refuse to load the plugin
and provide a log indicating why - which will be much
more helpful than some random error log caused by
a breakage.
The generics pose a problem, and the parameter passed to the
Consumer is not needed in API.
Additionally, stop trying to cancel Bukkit scheduler tasks on
plugin disable as the Bukkit scheduler does not work.
We should only iterate over the local region's entities, not the
global entity list to set up the spawner state, as everything else
about the spawner state (player count / chunk count) is regionised.
Additionally, move the last spawner state to regionised state so that
paper mobcaps command functions as expected.
Additionally, process ticket updates as well if either
the mid tick logic did anything or whether we processed
any chunk tasks.
We process the mid tick logic at the start to be consistent
with the inbetween task execution logic (which is not implemented),
and we process ticket updates to ensure that any full status changes
are processed from chunk tasks.
The repeated I/O of creating the directory for the regionfile
or for checking if the file exists can be heavy in
when pushing chunk generation extremely hard - as each chunk gen
request may effectively go through to the I/O thread.
The softlock would occur when a dependency tree finished executing
all of its task and searched for the highest dependency tree
to queue tasks from, only to have that such tree be filled
with purged tasks. Because it would select an empty
tree to pull tasks from, it would not select another
tree to execute tasks from as this logic is done after a task
is executed.
The place/portal async function now track entities that have been
removed from the world but have not teleported. When the server
shuts down, these entities will have their passenger tree restored
and re-added to the entity slices at the location they were teleporting to,
or in the case of portals that did not run placeAsync yet,
the location they entered the portal on. This should ensure that
for regular teleports that the entity is placed at its correct
target location, and for portalling to ensure that either
the entity is placed at the portal entrace location (where
they entered) or the portal destination. In any case,
the entity is preserved in a location and will survive
the shutdown.
Additionally, move player saving until after the worlds save. This
is to ensure that the save logic is performed only after
all teleportations have completed.
Fix some other misc issues as well:
- Fix double nether portal creation by checking if a portal exists again
before creating it, fixing a race condition where two entites would portal
and neither would see that the other created a portal.
- Make all remove ticket add an unknown ticket.
In general this behavior is better since it means that unloads will only
ever occur at the next tick, rather than during the tick logic. Thus,
there will be no cases where a chunk is unloaded unexpectedly.
- Do not use fastFloor for calculating chunk position from block position
It is not going to return a good value outside of [-1024, 1024]
- Always perform mid tick update for ticking regionised player chunk loader
If no entities were loaded, no chunks were loaded, and nothing else -
the logic would not have otherwise ran. This fixed some rare cases of
chunks never loading for players after logging in.
We can just synchronise on all of the map data accesses, but
this means we need to be careful about ensuring that no
sync loads occur, otherwise we could block other threads for
long periods of time.
Namely, everything after FEATURES. By creating a dependency
chain indicating what chunks are in use, we can safely
schedule completely independent tasks in parallel. This
will allow the chunk system to scale beyond 10 threads
per world.
Currently this patch needs some more testing.
The simple solution is that we ignore entities/positions that are not
in the current region. Making retrieval of items in inventory
thread-safe is not going to happen.
This is so that it may be accessed concurrently
from many regions.
Additionally, make sure it does not leak memory by limiting
the number of entries it will cache.
Now, all of the sleep status changes are pushed to the global
tick thread. Had to modify the wake up all players routine
to use the task scheduler to ensure the player is woken up
on the right region context.
Fix erroring while crashing on the global tick thread due to
region field being null.
Now, the spawning should be running Vanilla logic; except
that it is calculated per region (which is what per player
was effectively achieving anyways).
Will be removed later once this is all public.
Fixed many issues with teleporting players on vehicles
while in the same region.
Additionaly, make sure dead players are dismounted - as
this logic was previously done by remove.
Re-add regiontodo.txt
Because game time is now on the global tick thread, we no longer
need adjustments for any block entity. But, just in case we do
updateTicks functions now.