# Project overview

Described in this document is the abstract overview
of changes done by Folia. Folia splits the chunks within all loaded worlds
into independently ticking regions so that the regions are ticked
independently and in parallel. Described first will be intra region
operations, and then inter region operations.

## Rules for independent regions

In order to ensure that regions are independent, the rules for
maintaining regions must ensure that a ticking region
has no directly adjacent neighbour regions which are ticking.
The following rules guarantee the invariant is upheld:
1. Any ticking region may not grow while it is ticking.
2. Any ticking region must initially own a small buffer of chunks outside
its perimeter.
3. Regions may not _begin_ to tick if they have a neighbouring adjacent
region.
4. Adjacent regions must eventually merge to form a single region.

Additionally, to ensure that a region is not composed of independent regions
(which would hinder parallelism), regions composed of more than
one independent area must eventually be split into independent regions
when possible.

Finally, to ensure that ticking regions may store and maintain data
about the current region (i.e. tick count, entities within the region, chunks
within the region, block/fluid tick lists, and more), regions have
their own data object that may only be accessed while ticking the region and
by the thread ticking the region. Also, there are callbacks for merging
or splitting regions so that the data object may be updated appropriately.

The implementation of these rules is described by [REGION_LOGIC.md](REGION_LOGIC.md).

The end result of applying these rules is that a ticking region can ensure that
only the current thread has write access to any data contained within the region,
and that at any given time the number of independent regions is close to maximum.

## Intra region operations

Intra region operations refer to any operations that only deal with data
for a single region by the owning region, or to merge/split logic.

### Ticking for independent regions

Independent regions tick independently and in parallel. To tick independently
means that regions maintain their own deadlines for scheduling the next tick. For
example, consider two regions A and B such that A's next tick start is at t=15ms
and B's next tick start is at t=0ms. Consider the following sequence of events:
1. At t = 0ms, B begins to tick.
2. At t = 15ms, A begins to tick.
3. At t = 20ms, B finishes its tick. It is then scheduled to tick again at t = 50ms.
4. At t = 50ms, B begins its 2nd tick.
5. At t = 70ms, B finishes its 2nd tick and is scheduled to tick again at t = 100ms.
6. At t = 95ms, A finishes its _first_ tick. Since its next scheduled start of
t = 65ms has already passed, it is scheduled to tick again immediately at t = 95ms.

It is important to note that at no time was B's schedule affected by the fact that
A fell behind its 20TPS target.

To implement the described behavior, each region maintains a repeating
task on a scheduled executor (see SchedulerThreadPool) that schedules
tasks according to an earliest start time first scheduling algorithm. The
algorithm is similar to EDF, but schedules according to start time. However,
given that the deadline for each tick is 50ms + the start time, it behaves
identically to the EDF algorithm.

The EDF-like algorithm is selected so that, as long as the thread pool is
not maximally utilised, all regions that take <= 50ms to tick will
maintain 20TPS. However, the scheduling algorithm is neither NUMA aware
nor CPU core aware - it will not make attempts (when n regions > m threads)
to pin regions to certain cores.
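
Below is a minimal sketch of the earliest start time first scheduling described
above. It is not Folia's SchedulerThreadPool; the Region type, the single
lock/queue pair, and the sleep-based waiting are simplifying assumptions (a real
implementation must also wake workers when a region with an earlier start time
is inserted):

```java
import java.util.PriorityQueue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class TickScheduler {
    static final long TICK_NANOS = 50_000_000L; // 50ms per tick = 20TPS

    static final class Region implements Comparable<Region> {
        long nextStart; // absolute nanotime at which the next tick should begin
        void tick() { /* region tick logic */ }
        @Override public int compareTo(Region other) {
            return Long.compare(this.nextStart, other.nextStart);
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = this.lock.newCondition();
    private final PriorityQueue<Region> byStartTime = new PriorityQueue<>();

    void schedule(Region region, long startTime) {
        this.lock.lock();
        try {
            region.nextStart = startTime;
            this.byStartTime.add(region);
            this.notEmpty.signal();
        } finally {
            this.lock.unlock();
        }
    }

    // Worker loop: pull the region with the earliest start time, wait until
    // that time, tick it, then reschedule it. If the tick ran longer than
    // 50ms, Math.max() makes the next tick start immediately - exactly the
    // behavior of region A in the timeline above.
    void runWorker() throws InterruptedException {
        for (;;) {
            Region region;
            this.lock.lock();
            try {
                while (this.byStartTime.isEmpty()) {
                    this.notEmpty.await();
                }
                region = this.byStartTime.poll();
            } finally {
                this.lock.unlock();
            }

            long delayNanos = region.nextStart - System.nanoTime();
            if (delayNanos > 0L) {
                Thread.sleep(delayNanos / 1_000_000L, (int) (delayNanos % 1_000_000L));
            }

            region.tick();

            this.schedule(region, Math.max(region.nextStart + TICK_NANOS, System.nanoTime()));
        }
    }
}
```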

Since regions tick independently, they maintain their own tick counters. The
implications of this are described in the next section.

### Tick counters

In standard Vanilla, there are several important tick counters: Current Tick,
Game Time Tick, and Daylight Time Tick. The Current Tick counter is used
for determining the tick number since the server has booted. The Game Time
Tick is maintained per world and is used to schedule block ticks
for redstone, fluids, and other physics events. The Daylight Time Tick
is simply the number of ticks since noon, maintained per world.

In Folia, the Current Tick is maintained per region. The Game Time Tick
is split into two counters: Redstone Time and Global Game Time.
Redstone Time is maintained per region. Global Game Time and
Daylight Time are maintained by the "global region."

At the start of each region tick, the global game time tick and
daylight time tick are copied from the global region, and any time
the current region retrieves those values, it retrieves them from
the copy taken at the start of the tick. This ensures that any two
calls to retrieve the tick number throughout the tick report the
same tick number.
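
A minimal sketch of this snapshotting, with hypothetical field and method
names (Folia's actual region data classes differ):

```java
// The counters the global region owns and publishes.
interface GlobalRegion {
    long getGameTime();
    long getDayTime();
}

final class RegionTickData {
    long currentTick;      // per-region tick counter
    long redstoneTick;     // per-region redstone time
    long gameTimeSnapshot; // copied from the global region at tick start
    long dayTimeSnapshot;  // copied from the global region at tick start

    void beginTick(GlobalRegion global) {
        // Snapshot once, so every read within this tick agrees.
        this.gameTimeSnapshot = global.getGameTime();
        this.dayTimeSnapshot = global.getDayTime();
        this.currentTick++;
        // redstoneTick is incremented separately, since it can be
        // paused (see the Level#tickTime discussion below).
    }

    long getGameTime() { return this.gameTimeSnapshot; }
    long getDayTime() { return this.dayTimeSnapshot; }
}
```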

The global game time is maintained for a couple of reasons:
1. There needs to be a counter representing how many ticks a world
has existed for, since the game does track the total number of days
the world has gone on for.
2. A significant amount of new entity AI code uses game time (for
a reason I cannot divine) to store absolute deadlines of tasks.
It is not impossible to write code to adjust the deadlines of
all of these tasks, but the amount of work is significant.

#### Global region

The global region is a single scheduled task, always scheduled
to run at 20TPS, that is responsible for maintaining data that is not
tied to any specific region: game rules, global game time, daylight time,
console command handling, world border, weather, and others. Unlike the other
regions, the global region does not need to perform any special logic
for merging or splitting because it is never split or merged - there is
only one global region at any time. The global region does not own
any region-specific data.

#### Merging and splitting region tick times

Since redstone and current ticks are maintained per region, there needs
to be appropriate logic to adjust the tick deadlines used by the block/fluid
tick scheduler, and anything else that schedules by redstone/current
absolute tick time, so that the relative deadline is unaffected.

When merging a region x (from) into a region y (into or to),
we can either adjust the deadlines of both x and y, or of just one of them.
It is simply easier to adjust one, and arbitrarily the region x is chosen.
Then, the deadlines of x must be adjusted so that, given the current
tick of y, the relative deadlines remain unchanged.

Consider a deadline d1 = from tick + relative deadline in region x.
We then want the adjusted deadline d2 to be d2 = to tick + relative deadline
in region y, so that the relative tick deadline is maintained. We can
achieve this by applying an offset o to d1 so that d1 + o = d2, where the
offset is o = to tick - from tick. This offset must be calculated
for the redstone tick and the current tick separately, since the logic to
increase the redstone tick can be turned off by the Level#tickTime field.
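
A worked sketch of this adjustment, with hypothetical names:

```java
final class MergeTickOffsets {
    // o = to tick - from tick, computed separately for the current
    // tick and the redstone tick at the moment of the merge.
    static long offset(long fromTick, long toTick) {
        return toTick - fromTick;
    }

    // d1 = fromTick + relative, so d1 + o = toTick + relative = d2.
    static long adjustDeadline(long deadline, long offset) {
        return deadline + offset;
    }

    public static void main(String[] args) {
        // Region x is at tick 100 with a block tick scheduled at d1 = 110
        // (relative deadline 10); region y is at tick 400.
        long o = offset(100L, 400L);       // 300
        long d2 = adjustDeadline(110L, o); // 410 = y's tick 400 + the same
        System.out.println(o + " " + d2);  // relative deadline of 10
    }
}
```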

Finally, the split case is easy - when a split occurs,
the independent regions from the split inherit the redstone/current tick
from the parent region. Thus, the relative deadlines are maintained as there
is no tick number change.

In all cases, redstone or any other events scheduled by current tick
remain unaffected when regions split or merge, as the relative deadline
is maintained by applying an offset in the merge case and by copying
the tick number in the split case.

## Inter region operations

Inter region operations refer to operations that work with regions other than
the current ticking region, which are in a completely unknown state. These
regions may be transient, may be ticking, or may not even exist.

### Utilities to assist operations

In order to assist in inter region operations, several utilities are provided.
In NMS, these utilities are the EntityScheduler, the RegionizedTaskQueue,
the global region task queue, and the region-local data provider
RegionizedData. The Folia API has similar analogues, but does not expose
a region-local data provider: the NMS data provider holds critical
locks and is invoked in critical areas of code when performing any
callback logic, and is thus highly susceptible to fatal plugin errors
involving lengthy I/O or world state modification.

#### EntityScheduler

The EntityScheduler allows tasks to be scheduled to be executed on the
region that owns the entity. This is particularly useful when dealing
with entity teleportation, as once an entity begins an asynchronous
teleport the entity cannot tick until the teleport has completed, and
the timing of the completion is undefined.
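
For illustration, this is how a plugin might use the Folia API analogue,
assuming the Entity#getScheduler() accessor with a run(plugin, task, retired)
signature; the retired callback fires if the entity is removed before the
task can run:

```java
import org.bukkit.entity.Entity;
import org.bukkit.plugin.Plugin;

final class EntityTaskExample {
    static void igniteOnOwningRegion(Plugin plugin, Entity entity) {
        entity.getScheduler().run(
            plugin,
            task -> entity.setFireTicks(100), // runs on the owning region's thread
            () -> plugin.getLogger().info("entity removed before the task ran")
        );
    }
}
```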

#### RegionizedTaskQueue

The RegionizedTaskQueue allows tasks to be scheduled to be executed on
the next tick of the region that owns a specific location, creating
such a region if it does not exist. This is useful for tasks that may
need to edit or retrieve world/block/chunk data outside the current region.
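
A usage sketch of the Folia API analogue, assuming Bukkit.getRegionScheduler()
with a run(plugin, location, task) signature:

```java
import org.bukkit.Bukkit;
import org.bukkit.Location;
import org.bukkit.Material;
import org.bukkit.plugin.Plugin;

final class RegionTaskExample {
    static void setBlockOnOwningRegion(Plugin plugin, Location location) {
        Bukkit.getRegionScheduler().run(plugin, location,
            // Runs on the next tick of the region owning the location,
            // creating the region first if it does not exist.
            task -> location.getBlock().setType(Material.STONE));
    }
}
```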

#### Global region task queue

The global region task queue is simply used to perform edits on data
that the global region owns, such as game rules, day time, weather,
or to execute commands using the console command sender.
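
A usage sketch of the Folia API analogue, assuming
Bukkit.getGlobalRegionScheduler() with a run(plugin, task) signature:

```java
import org.bukkit.Bukkit;
import org.bukkit.GameRule;
import org.bukkit.plugin.Plugin;

final class GlobalTaskExample {
    static void freezeDaylight(Plugin plugin) {
        Bukkit.getGlobalRegionScheduler().run(plugin, task ->
            // Game rules are global-region data, so they are edited here.
            Bukkit.getWorlds().forEach(world ->
                world.setGameRule(GameRule.DO_DAYLIGHT_CYCLE, Boolean.FALSE)));
    }
}
```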

#### RegionizedData

The RegionizedData class allows regions to define region-local data,
which allows regions to store data without having to consider concurrent
access from other regions. For example, the per-region
entity/chunk/block/fluid tick lists are maintained this way so that regions
do not need to consider concurrent access to these data sets.
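
A hypothetical sketch of the shape of such region-local data - not the real
NMS RegionizedData class - using a per-region block tick list as the example:

```java
import java.util.ArrayList;
import java.util.List;

final class RegionBlockTicks {
    // Block positions (packed into longs) with pending scheduled ticks.
    private final List<Long> scheduledTicks = new ArrayList<>();

    // Only ever called by the owning region's thread, so no locking needed.
    void schedule(long packedPosition) {
        this.scheduledTicks.add(packedPosition);
    }

    // Merge callback: fold this region's data into the region it merges into.
    void mergeInto(RegionBlockTicks into) {
        into.scheduledTicks.addAll(this.scheduledTicks);
    }

    // A split callback would partition scheduledTicks by which new region
    // owns each position (omitted for brevity).
}
```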

These utilities allow various cross-region issues to be resolved in a
simple fashion, such as editing block/entity/world state from any region
by using task queues, or avoiding concurrency issues by using
RegionizedData. More advanced operations such as teleportation,
player respawning, and portalling all make use of these utilities
to ensure the operation is thread-safe.

### Entity intra and inter dimension teleports

Entities need special logic in order to teleport safely between
regions or between dimensions. In all cases however, the call to
teleport/place an entity must be invoked on the region owning the entity.
The EntityScheduler can be used to easily schedule code to execute in such
a context.

#### Simple teleportation

In a simple teleportation, the entity already exists in a world at a location,
and the target location and dimension are known.
This operation is split into two parts: transform and async place.
The transform operation removes the entity from the current
world, then adjusts the position. The async place operation schedules a task
to the target location using the RegionizedTaskQueue to add the entity to
the target dimension at the target position.
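
A hypothetical sketch of the two steps (all names here are illustrative,
not Folia's actual internals):

```java
final class SimpleTeleportSketch {
    interface RegionTaskQueue {
        // Schedules a task onto the region owning the given chunk,
        // creating that region if it does not exist.
        void schedule(String world, int chunkX, int chunkZ, Runnable task);
    }

    interface TeleportingEntity {
        void removeFromCurrentWorld();
        void setPosition(double x, double y, double z);
        void addToWorld(String world);
    }

    // Must be invoked on the region currently owning the entity.
    static void teleport(RegionTaskQueue queue, TeleportingEntity entity,
                         String targetWorld, double x, double y, double z) {
        // Transform: remove from the current world, then adjust position.
        entity.removeFromCurrentWorld();
        entity.setPosition(x, y, z);

        // Async place: add the entity on the region owning the target chunk.
        queue.schedule(targetWorld,
            (int) Math.floor(x) >> 4, (int) Math.floor(z) >> 4,
            () -> entity.addToWorld(targetWorld));
    }
}
```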

The various implementation details, such as non-player entities being
copied in the transform operation, are left out, as those are not relevant
for the high level overview.

Things such as player login and player respawn are generally
considered simple teleportation. The player login case only differs
in that the player does not exist in any world at the start, and that the
async place must additionally find a position to spawn the player at.
Player respawn is similar to player login, differing only in that
the player is already in a world at the time of respawn.

#### Portal teleport

Portal teleport differs from simple teleportation as portalling does
_not_ know the exact location of the teleport. Thus, the transform step
does not update the entity position; rather, a new operation is inserted
between transform and async place: async search/create, which is responsible
for finding and/or creating the exit portal.

Additionally, the current Vanilla code can refuse a portal if the
entity is a non-player and the nether exit portal does not already exist. But
since the portal location is only determined asynchronously, it is
too late to abort - so, the portal logic has been re-done so that there is no
difference between players and non-player entities. Now both entities and
players create exit portals, whether it be for the nether or end.

#### Shutdown during teleport

Since the teleport happens over multiple steps, the server shutdown
process must deal with uncompleted teleportations manually.

## Server shutdown process

The shutdown process occurs by spawning a separate shutdown thread,
which then runs the shutdown logic:
1. Shut down the tick region scheduler, stopping any further ticks
2. Halt metrics processing
3. Disable plugins
4. Stop accepting new connections
5. Send disconnect (but do not remove) packets to all players
6. Halt the chunk systems for all worlds
7. Execute shutdown logic for all worlds by finishing all pending teleports
for all regions, then saving all chunks in the world, and finally
saving the level data for the world (level.dat and other .dat files)
8. Save all players
9. Shut down the resource manager
10. Release the level lock
11. Halt remaining executors (Util executor, region I/O threads, etc.)

The important difference from Vanilla is that the player kick and
world saving logic is replaced by steps 5-8.

For step 5, the players cannot be kicked before teleportations are finished,
as kicking would save the player dat file. So, the save is moved to after
the teleportations complete.

For step 6, the chunk system halt is done before saving so that all chunk
generation is halted. This reduces the load on the server as it shuts
down, which may be critical in memory-constrained scenarios.

For step 7, teleportations are completed differently depending on the type:
simple or portal.

Simple teleportations are completed by forcing
the entity being teleported to be added to the entity chunk specified
by the target location. This allows the entity to be saved at the target
position, as if the teleportation had completed before shutdown.

Portal teleportations are completed by forcing the entity being teleported
to be added to the entity chunk specified by the location the entity
teleported _from_. Since the target location is not known, the entity
can only be placed back at the origin. While this behavior is not ideal,
the shutdown logic _must_ account for any broken world state - which means
that finding or creating the target exit portal may not be an option.

The teleportation completion must be performed before the world save so that
the entities whose teleports completed are saved.

For step 8, players are saved only after the teleportations are completed.

The remaining steps are Vanilla.