70 lines
2.3 KiB
ReStructuredText
70 lines
2.3 KiB
ReStructuredText
|
|
.. SPDX-License-Identifier: GPL-2.0
|
||
|
|
|
||
|
|
:Author: Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>
|
||
|
|
|
||
|
|
==========
|
||
|
|
Swap Table
|
||
|
|
==========
|
||
|
|
|
||
|
|
Swap table implements swap cache as a per-cluster swap cache value array.
|
||
|
|
|
||
|
|
Swap Entry
|
||
|
|
----------
|
||
|
|
|
||
|
|
A swap entry contains the information required to serve the anonymous page
|
||
|
|
fault.
|
||
|
|
|
||
|
|
Swap entry is encoded as two parts: swap type and swap offset.
|
||
|
|
|
||
|
|
The swap type indicates which swap device to use.
|
||
|
|
The swap offset is the offset of the swap file to read the page data from.
|
||
|
|
|
||
|
|
Swap Cache
|
||
|
|
----------
|
||
|
|
|
||
|
|
Swap cache is a map to look up folios using swap entry as the key. The result
|
||
|
|
value can have three possible types depending on which stage of this swap entry
|
||
|
|
was in.
|
||
|
|
|
||
|
|
1. NULL: This swap entry is not used.
|
||
|
|
|
||
|
|
2. folio: A folio has been allocated and bound to this swap entry. This is
|
||
|
|
the transient state of swap out or swap in. The folio data can be in
|
||
|
|
the folio or swap file, or both.
|
||
|
|
|
||
|
|
3. shadow: The shadow contains the working set information of the swapped
|
||
|
|
out folio. This is the normal state for a swapped out page.
|
||
|
|
|
||
|
|
Swap Table Internals
|
||
|
|
--------------------
|
||
|
|
|
||
|
|
The previous swap cache is implemented by XArray. The XArray is a tree
|
||
|
|
structure. Each lookup will go through multiple nodes. Can we do better?
|
||
|
|
|
||
|
|
Notice that most of the time when we look up the swap cache, we are either
|
||
|
|
in a swap in or swap out path. We should already have the swap cluster,
|
||
|
|
which contains the swap entry.
|
||
|
|
|
||
|
|
If we have a per-cluster array to store swap cache value in the cluster.
|
||
|
|
Swap cache lookup within the cluster can be a very simple array lookup.
|
||
|
|
|
||
|
|
We give such a per-cluster swap cache value array a name: the swap table.
|
||
|
|
|
||
|
|
A swap table is an array of pointers. Each pointer is the same size as a
|
||
|
|
PTE. The size of a swap table for one swap cluster typically matches a PTE
|
||
|
|
page table, which is one page on modern 64-bit systems.
|
||
|
|
|
||
|
|
With swap table, swap cache lookup can achieve great locality, simpler,
|
||
|
|
and faster.
|
||
|
|
|
||
|
|
Locking
|
||
|
|
-------
|
||
|
|
|
||
|
|
Swap table modification requires taking the cluster lock. If a folio
|
||
|
|
is being added to or removed from the swap table, the folio must be
|
||
|
|
locked prior to the cluster lock. After adding or removing is done, the
|
||
|
|
folio shall be unlocked.
|
||
|
|
|
||
|
|
Swap table lookup is protected by RCU and atomic read. If the lookup
|
||
|
|
returns a folio, the user must lock the folio before use.
|