P9 XIVE Exploitation
====================

.. _xive-device-tree:

I - Device-tree updates
-----------------------

1) The existing OPAL ``/interrupt-controller@0`` node remains

   This node represents both the emulated XICS source controller and
   an abstraction of the virtualization engine. This represents the
   fact that OPAL set_xive/get_xive functions are still supported
   though they don't provide access to the full functionality.

   It is still the parent of all interrupts in the device-tree.

   New or modified properties:

   - ``compatible`` : This is extended with a new value ``ibm,opal-xive-vc``

2) The new ``/interrupt-controller@<addr>`` node

   This node represents both the emulated XICS presentation controller
   and the new XIVE presentation layer.

   Unlike the traditional XICS, there is only one such node for the whole
   system.

   New or modified properties:

   - ``compatible`` : This contains at least the following strings:

     - ``ibm,opal-intc`` : This represents the emulated XICS presentation
       facility and might be the only string present if the version of
       OPAL doesn't support XIVE exploitation.
     - ``ibm,opal-xive-pe`` : This represents the XIVE presentation
       engine.

   - ``ibm,xive-eq-sizes`` : One cell per size supported, contains log2
     of size, in ascending order.

   - ``ibm,xive-#priorities`` : One cell, the number of supported priorities
     (the priorities will be 0...n)

   - ``ibm,xive-provision-page-size`` : Page size (in bytes) of the pages to
     pass to OPAL for provisioning internal structures
     (see opal_xive_donate_page). If this is absent, OPAL will never require
     additional provisioning. The page must be naturally aligned.

   - ``ibm,xive-provision-chips`` : The list of chip IDs for which provisioning
     is required. Typically, if a VP allocation returns OPAL_XIVE_PROVISIONING,
     opal_xive_donate_page() will need to be called to donate a page to
     *each* of these chips before trying again.

   - ``reg`` property contains the addresses & sizes for the register
     ranges corresponding respectively to the 4 rings:

     - Ultravisor level
     - Hypervisor level
     - Guest OS level
     - User level

     For any of these, a size of 0 means this level is not supported.

   - ``single-escalation-support`` (optional). When present, indicates that
     the "single escalation" feature is supported, thus enabling the use
     of the OPAL_XIVE_VP_SINGLE_ESCALATION flag.

3) Interrupt descriptors

   The interrupt descriptors (aka "interrupts" properties and parts
   of "interrupt-map" properties) remain 2 cells. The first cell is
   a global interrupt number which represents a unique interrupt
   source in the system and is an abstraction provided by OPAL.

   The default configuration for all sources in the IVT/EAS is to
   issue that number (it's internally a combination of the source
   chip and per-chip interrupt number but the details of that
   combination are not exposed and subject to change).

   The second cell remains as usual "0" for an edge interrupt and
   "1" for a level interrupt.

4) IPIs

   Each ``cpu`` node now contains an ``interrupts`` property which has
   one entry (2 cells per entry) for each thread on that core
   containing the interrupt number for the IPI targeted at that
   thread.

5) Interrupt targets

   Targeting of interrupts uses processor targets and priority
   numbers. The processor target encoding depends on which API is
   used:

   - The legacy opal_set/get_xive() APIs only support the old
     "mangled" (ie. shifted by 2) HW processor numbers.

   - The new opal_xive_set/get_irq_config API (and other
     exploitation mode APIs) use a "token" VP number which is
     described in II-2. Unmodified HW processor numbers are valid
     VP numbers for those APIs.

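
As an illustration of how a client might consume ``ibm,xive-eq-sizes``, the
sketch below picks the largest supported queue order that does not exceed a
desired one. The helper name ``xive_pick_eq_order`` is hypothetical, and the
cells are assumed to have already been read from the device-tree and converted
to CPU endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pick the largest supported queue order (log2 of the size) that does
 * not exceed `wanted`. `sizes` holds the cells of "ibm,xive-eq-sizes",
 * which the device-tree lists in ascending order.
 * Returns 0 if no supported size fits.
 */
static uint32_t xive_pick_eq_order(const uint32_t *sizes, size_t count,
                                   uint32_t wanted)
{
    uint32_t best = 0;

    assert(sizes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (sizes[i] > wanted)
            break;              /* ascending: nothing later can fit */
        best = sizes[i];
    }
    return best;
}
```

A return value of 0 is used here as "no match" purely by convention of this
sketch; a caller would then fall back to the smallest advertised size or fail.
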
II - General operations
-----------------------

Most configuration operations are abstracted via OPAL calls; there is
no direct access or exposure of such things as real HW interrupt or VP
numbers.

OPAL sets up all the physical interrupts and assigns them numbers. It
also allocates enough virtual interrupts to provide an IPI per physical
thread in the system.

All interrupts are pre-configured masked and must be set to an explicit
target before first use. The default interrupt number is programmed
in the EAS and will remain unchanged if the targeting/unmasking is
done using the legacy set_xive() interface.

An interrupt "target" is a combination of a target processor number
and a priority.

Processor numbers are in a single domain that represents both the
physical processors and any virtual processor or group allocated
using the interfaces defined in this specification. These numbers
are an OPAL maintained abstraction and are only partially related
to the real VP numbers:

In order to maintain the grouping ability, when VPs are allocated
in blocks of naturally aligned powers of 2, the underlying HW
numbers will respect this alignment.

.. note:: The block group mode extension makes the numbering scheme
   a bit more tricky than simple powers of two however, see below.

1) Interrupt numbering and allocation

   As specified in the device-tree definition, interrupt numbers
   are abstracted by OPAL to be a 30-bit number. All HW interrupts
   are "allocated" and configured at boot time along with enough
   IPIs for all processor threads.

   Additionally, in order to be compatible with the XICS emulation,
   all interrupt numbers present in the device-tree (ie all physical
   sources or pre-allocated IPIs) will fit within a 24-bit number
   space.

   Interrupt sources that are only usable in exploitation mode, such
   as escalation interrupts, can have numbers covering the full 30-bit
   range. The same is true of interrupts allocated dynamically.

   The hypervisor can allocate additional blocks of interrupts,
   in which case OPAL will return the resulting abstracted global
   numbers. They will have to be individually configured to map
   to a given number at the target and be routed to a given target
   and priority using opal_xive_set_irq_config(). This call is
   semantically equivalent to the old opal_set_xive() which is
   still supported with the addition that opal_xive_set_irq_config()
   can also specify the logical interrupt number.

2) VP numbering and allocation

   A VP number is a 64-bit number. The internal make-up of that number
   is opaque to the OS. However, it is a discrete integer: when a chunk
   of VPs is allocated, the number returned is the "base" of that chunk
   and is naturally aligned on the (power of two) size of the chunk. The
   OS will do basic arithmetic to get to all the VPs in the range.

   Groups, when supported, will also be numbers in that space.

   The physical processors numbering uses the same number space.

   The underlying HW VP numbering is hidden from the OS. The APIs
   use the system processor numbers as presented in
   ``ibm,ppc-interrupt-server#s`` (which correspond to the PIR register
   content) to represent physical processors within the same number
   space as dynamically allocated VPs.

   .. note:: Note about block group mode:

      The block group mode shall as much as possible be handled
      transparently by OPAL.

      For example, on a 2-chip machine, a request to allocate
      2^n VPs might result in an allocation of 2^(n-1) VPs per
      chip allocated across 2 chips. The resulting VP numbers
      will encode the order of the allocation allowing OPAL to
      reconstitute which bits are the block ID bits and which bits
      are the index bits in a way transparent to the OS. The overall
      range of numbers passed to Linux will still be contiguous.

      That implies however a limitation: We can only allocate within
      power-of-two number of blocks. Thus the VP allocator will limit
      itself to the largest power of two that can fit in the number
      of available chips in the machine: A machine with 3 good chips
      will only be able to allocate VPs from 2 of them.

3) Group numbering and allocation

   The group numbers are in the *same* number space as the VP
   numbers. OPAL will internally use some bits of the VP number
   to encode the group geometry.

   [TBD] OPAL may or may not allocate a default group of all physical
   processors, per-chip groups or per-core groups. This will be
   represented in the device-tree somehow...

   [TBD] OPAL will provide interfaces for allocating groups


   .. note:: Note about P/Q bit operation on sources:

      opal_xive_get_irq_info() returns a certain number of flags
      which define the type of operation supported. The following
      rules apply based on what those flags say:

      - The Q bit isn't functional on an LSI interrupt. There is no
        guarantee that the special combination "01" will work for an
        LSI (and in fact it will not work on the PHB LSIs). However
        just setting P to 1 is sufficient to mask an LSI (just don't
        EOI it while masked).

      - The recommended setting for an interrupt that is
        temporarily masked by a driver is "10". This means a new
        occurrence while masked will be recorded and a "StoreEOI"
        will replay it appropriately.

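
The two numbering rules above (base-plus-index arithmetic inside an allocated
VP block, and the power-of-two limit imposed by block group mode) can be
sketched as follows. Both helper names are hypothetical and the code only
models the arithmetic; it makes no OPAL call.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that is <= nr_chips (0 if there are no chips). */
static uint32_t xive_usable_chips(uint32_t nr_chips)
{
    uint32_t n = 1;

    if (nr_chips == 0)
        return 0;
    while (n <= nr_chips / 2)
        n *= 2;
    return n;
}

/* VP number of entry `index` in a block of 2^order VPs starting at `base`. */
static uint64_t xive_vp_in_block(uint64_t base, uint32_t order, uint64_t index)
{
    uint64_t count;

    assert(order < 64);
    count = 1ull << order;
    assert((base & (count - 1)) == 0);  /* base is naturally aligned */
    assert(index < count);              /* stay inside the block */
    return base + index;
}
```

With 3 good chips, ``xive_usable_chips(3)`` yields 2, matching the example in
the note on block group mode.
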
III - Event queues
------------------

Each virtual processor or group has a certain number of event queues
associated with it. Each corresponds to a given priority. The number
of supported priorities is provided in the device-tree
(``ibm,xive-#priorities`` property of the xive node).

By default, OPAL populates at least one queue for every physical thread
in the system. The number of queues and the size used is implementation
specific. If the OS wants to re-use these to save memory, it can query
the VP configuration.

opal_xive_get_queue_info() can be used to query a queue configuration
(ie, to obtain the current page and size for the queue itself, but also
to collect some configuration flags for that queue such as whether it
coalesces notifications etc...) and to obtain the MMIO address of the
queue EOI page (in the case where coalescing is enabled).
opal_xive_set_queue_info() is used to change that configuration.

IV - OPAL APIs
--------------

.. warning:: *All* the calls listed below may return OPAL_BUSY unless
   explicitly documented not to. In that case, the call
   should be performed again. The OS is allowed to insert a
   delay, though no minimum or maximum delay is specified.
   This will typically happen when performing cache update
   operations in the XIVE, if they result in a collision.

.. warning:: Calls that are expected to be called at runtime
   simultaneously without conflicts such as getting/setting
   IRQ info or queue info are fine to do so concurrently.

   However, there is no internal locking to prevent races
   between things such as freeing a VP block and getting/setting
   queue infos on that block.

   These aren't fully specified (yet) but common sense shall
   apply.

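
The OPAL_BUSY rule above lends itself to a small retry wrapper. The following
is a minimal sketch under stated assumptions: ``opal_retry_busy``, the callback
types and ``fake_opal_call`` are hypothetical names introduced for
illustration, and the numeric value of ``OPAL_BUSY`` is assumed and should be
taken from the OPAL API header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPAL_BUSY (-2)  /* assumed value; use the one from opal-api.h */

typedef int64_t (*opal_call_fn)(void *arg);
typedef void (*opal_delay_fn)(void);

/* Repeat `call` until it returns something other than OPAL_BUSY. */
static int64_t opal_retry_busy(opal_call_fn call, void *arg,
                               opal_delay_fn delay)
{
    int64_t rc;

    assert(call != NULL);
    for (;;) {
        rc = call(arg);
        if (rc != OPAL_BUSY)
            return rc;
        if (delay)
            delay();    /* optional: no min/max delay is specified */
    }
}

/* Stand-in for a real OPAL call: busy until *arg reaches zero. */
static int64_t fake_opal_call(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return OPAL_BUSY;
    }
    return 0;           /* success */
}
```

A real caller would wrap the actual OPAL entry point in a small adapter
matching ``opal_call_fn``; calls documented as never returning OPAL_BUSY do not
need the wrapper.
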
.. _OPAL_XIVE_RESET:

OPAL_XIVE_RESET
^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_reset(uint64_t version);

The OS should call this once when starting up to re-initialize the
XIVE hardware and the OPAL XIVE related state back to all defaults.

It can call it a second time before handing over to another kernel
(ie. kexec) to re-enable XICS emulation.

The "version" argument should be set to 1 to enable the XIVE
exploitation mode APIs or 0 to switch back to the default XICS
emulation mode.

Future versions of OPAL might allow higher versions than 1 to
represent newer versions of this API. OPAL will return an error
if it doesn't recognize the requested version.

Any page of memory that the OS has "donated" to OPAL, either backing
store for EQDs or VPDs or actual queue buffers will be removed from
the various HW maps and can be re-used by the OS or freed after this
call regardless of the version information. The HW will be reset to
a (mostly) clean state.

It is the responsibility of the caller to ensure that no other
XIVE or XICS emulation call happens simultaneously to this. This
basically should happen on an otherwise quiescent system. In the
case of kexec, it is recommended that the CPPR of all processors be
lowered first.

.. note:: This call always executes fully synchronously, never returns
   OPAL_BUSY and will work regardless of whether VPs and EQs are left
   enabled or disabled. It *will* spend a significant amount of time
   inside OPAL and as such is not suitable to be performed during normal
   runtime.

.. _OPAL_XIVE_GET_IRQ_INFO:

OPAL_XIVE_GET_IRQ_INFO
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_get_irq_info(uint32_t girq,
                                  uint64_t *out_flags,
                                  uint64_t *out_eoi_page,
                                  uint64_t *out_trig_page,
                                  uint32_t *out_esb_shift,
                                  uint32_t *out_src_chip);

Returns info about an interrupt source. This call never returns
OPAL_BUSY.

* out_flags returns a set of flags. The following flags
  are defined in the API (some bits are reserved, so any bit
  not defined here should be ignored):

  - OPAL_XIVE_IRQ_TRIGGER_PAGE

    Indicates that the trigger page is a separate page. If that
    bit is clear, there is either no trigger page or the trigger
    can be done in the same page as the EOI, see below.

  - OPAL_XIVE_IRQ_STORE_EOI

    Indicates that the interrupt supports the "Store EOI" option,
    ie a store to the EOI page will move Q into P and retrigger
    if the resulting P bit is 1. If this flag is 0, then a store
    to the EOI page will do a trigger if OPAL_XIVE_IRQ_TRIGGER_PAGE
    is also 0.

  - OPAL_XIVE_IRQ_LSI

    Indicates that the source is a level sensitive source and thus
    doesn't have a functional Q bit. The Q bit may or may not be
    implemented in HW but SW shouldn't rely on it doing anything.

  - OPAL_XIVE_IRQ_SHIFT_BUG

    Indicates that the source has a HW bug that shifts the bits
    of the "offset" inside the EOI page left by 4 bits. So when
    this is set, use 0xc000, 0xd000... instead of 0xc00, 0xd00...
    as offsets in the EOI page.

  - OPAL_XIVE_IRQ_MASK_VIA_FW

    Indicates that a FW call is needed (either opal_set_xive()
    or opal_xive_set_irq_config()) to successfully mask and unmask
    the interrupt. The operations via the ESB page aren't fully
    functional.

  - OPAL_XIVE_IRQ_EOI_VIA_FW

    Indicates that a FW call to opal_xive_eoi() is needed to
    successfully EOI the interrupt. The operation via the ESB page
    isn't fully functional.

* out_eoi_page and out_trig_page outputs will be set to the
  EOI page physical address (always) and the trigger page address
  (if it exists).
  The trigger page may exist even if OPAL_XIVE_IRQ_TRIGGER_PAGE
  is not set. In that case out_trig_page is equal to out_eoi_page.
  If the trigger page doesn't exist, out_trig_page is set to 0.

* out_esb_shift contains the size (as an order, ie 2^n) of the
  EOI and trigger pages. Current supported values are 12 (4k)
  and 16 (64k). Those cannot be configured by the OS and are set
  by firmware but can be different for different interrupt sources.

* out_src_chip will be set to the chip ID of the HW entity this
  interrupt is sourced from. It's meant to be informative only
  and thus isn't guaranteed to be 100% accurate. The idea is for
  the OS to use that to pick up a default target processor on
  the same chip.

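
To make the flag rules above concrete, here is a minimal sketch of two
decisions an OS might derive from the opal_xive_get_irq_info() outputs: which
offset to use inside the EOI page when OPAL_XIVE_IRQ_SHIFT_BUG is set, and
whether triggers go through the EOI page itself. The helper names are
hypothetical and the numeric flag values are assumptions that must be checked
against the OPAL API header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values; verify them against opal-api.h before use. */
#define OPAL_XIVE_IRQ_TRIGGER_PAGE 0x01ull
#define OPAL_XIVE_IRQ_SHIFT_BUG    0x08ull

/* Offset to use inside the EOI page, with the SHIFT_BUG workaround applied. */
static uint64_t xive_esb_offset(uint64_t flags, uint64_t offset)
{
    assert((offset >> 60) == 0);    /* leave room for the 4-bit shift */
    if (flags & OPAL_XIVE_IRQ_SHIFT_BUG)
        offset <<= 4;               /* 0xc00 becomes 0xc000 */
    return offset;
}

/* True when triggers are done in the same page as the EOI. */
static bool xive_trigger_in_eoi_page(uint64_t flags, uint64_t eoi_page,
                                     uint64_t trig_page)
{
    if (trig_page == 0)
        return false;               /* no trigger page at all */
    if (flags & OPAL_XIVE_IRQ_TRIGGER_PAGE)
        return false;               /* trigger page is separate */
    return trig_page == eoi_page;
}
```

Any flag bit not listed in the API should simply be ignored, which is why both
helpers test individual bits rather than comparing the whole flags word.
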
.. _OPAL_XIVE_EOI:

OPAL_XIVE_EOI
^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_eoi(uint32_t girq);

Performs an EOI on the interrupt. This should only be called if
OPAL_XIVE_IRQ_EOI_VIA_FW is set as otherwise direct ESB access
is preferred.

.. note:: This is the *same* opal_xive_eoi() call used by OPAL XICS
   emulation. However the XIRR parameter is re-purposed as "GIRQ".

   The call will perform the appropriate function depending on
   whether OPAL is in XICS emulation mode or native XIVE exploitation
   mode.

.. _OPAL_XIVE_GET_IRQ_CONFIG:

OPAL_XIVE_GET_IRQ_CONFIG
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_get_irq_config(uint32_t girq, uint64_t *out_vp,
                                    uint8_t *out_prio, uint32_t *out_lirq);

Returns the current configuration of an interrupt source. This is
the equivalent of opal_get_xive() with the addition of the logical
interrupt number (the number that will be presented in the queue).

* girq: The interrupt number to get the configuration of as
  provided by the device-tree.

* out_vp: Will contain the target virtual processor where the
  interrupt is currently routed to. This can return 0xffffffff
  if the interrupt isn't routed to a valid virtual processor.

* out_prio: Will contain the priority of the interrupt or 0xff
  if masked

* out_lirq: Will contain the logical interrupt assigned to the
  interrupt. By default this will be the same as girq.

.. _OPAL_XIVE_SET_IRQ_CONFIG:

OPAL_XIVE_SET_IRQ_CONFIG
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_set_irq_config(uint32_t girq, uint64_t vp, uint8_t prio,
                                    uint32_t lirq);

This allows configuration and routing of a hardware interrupt. This is
equivalent to opal_set_xive() with the addition of the ability to
configure the logical IRQ number (the number that will be presented
in the target queue).

* girq: The interrupt number to configure, as provided by the
  device-tree.

* vp: The target virtual processor. The target VP/Prio combination
  must already exist, be enabled and populated (ie, a queue page must
  be provisioned for that queue).

* prio: The priority of the interrupt.

* lirq: The logical interrupt number assigned to that interrupt

  .. note:: Note about masking:

     If the prio is set to 0xff, this call will cause the interrupt to
     be masked (*). This function will not clobber the source P/Q bits (**).
     It will however set the IVT/EAS "mask" bit if the prio passed
     is 0xff which means that interrupt events from the ESB will be
     discarded, potentially leaving the ESB in a stale state. Thus
     care must be taken by the caller to "cleanup" the ESB state
     appropriately before enabling an interrupt with this.

     (*) Escalation interrupts cannot be masked via this function

     (**) The exception to this rule is interrupt sources that have
     the OPAL_XIVE_IRQ_MASK_VIA_FW flag set. For such sources, the OS
     should make no assumption as to the state of the ESB and this
     function *will* perform all the necessary masking and unmasking.

.. note:: This call contains an implicit opal_xive_sync() of the interrupt
   source (see OPAL_XIVE_SYNC below)

It is recommended for an OS exploiting the XIVE directly to not use
this function for temporary driver-initiated masking of interrupts
but to directly mask using the P/Q bits of the source instead.

Masking using this function is intended for the case where the OS has
no handler registered for a given interrupt anymore or when registering
a new handler for an interrupt that had none. In these cases, losing
interrupts happening while no handler was attached is considered fine.

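
The masking recommendations above amount to a small decision: temporary,
driver-initiated masking goes through the source P/Q bits unless the source
requires firmware, while "no handler attached" masking goes through
opal_xive_set_irq_config() with a priority of 0xff. A minimal sketch of that
decision follows; the enum, the helper names and the flag value are
assumptions introduced for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag value; verify it against opal-api.h before use. */
#define OPAL_XIVE_IRQ_MASK_VIA_FW 0x10ull

#define XIVE_PRIO_MASKED 0xff   /* prio that masks via set_irq_config */

enum xive_mask_method {
    XIVE_MASK_ESB_PQ,           /* write the P/Q bits of the source */
    XIVE_MASK_FIRMWARE,         /* opal_xive_set_irq_config(), prio 0xff */
};

/* Choose how to mask, given the source flags and the kind of masking. */
static enum xive_mask_method xive_pick_mask_method(uint64_t irq_flags,
                                                   bool temporary)
{
    if (irq_flags & OPAL_XIVE_IRQ_MASK_VIA_FW)
        return XIVE_MASK_FIRMWARE;      /* ESB masking not functional */
    return temporary ? XIVE_MASK_ESB_PQ : XIVE_MASK_FIRMWARE;
}

/* Priority to hand to the firmware call for a masked or unmasked source. */
static uint8_t xive_fw_prio(bool masked, uint8_t prio)
{
    assert(prio != XIVE_PRIO_MASKED);   /* 0xff is reserved for masking */
    return masked ? XIVE_PRIO_MASKED : prio;
}
```

Escalation interrupts are outside the scope of this sketch, since they cannot
be masked through opal_xive_set_irq_config().
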
.. _OPAL_XIVE_GET_QUEUE_INFO:

OPAL_XIVE_GET_QUEUE_INFO
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_get_queue_info(uint64_t vp, uint32_t prio,
                                    uint64_t *out_qpage,
                                    uint64_t *out_qsize,
                                    uint64_t *out_qeoi_page,
                                    uint32_t *out_escalate_irq,
                                    uint64_t *out_qflags);

This returns information about a given interrupt queue associated
with a virtual processor and a priority.

* out_qpage: will contain the physical address of the page where the
  interrupt events will be posted or 0 if none has been configured
  yet.

* out_qsize: will contain the log2 of the size of the queue buffer
  or 0 if the queue hasn't been populated. Example: 12 for a 4k page.

* out_qeoi_page: will contain the physical address of the MMIO page
  used to perform EOIs for the queue notifications.

* out_escalate_irq: will contain a girq number for the escalation
  interrupt associated with that queue.

  .. warning:: The "escalate_irq" is a special interrupt number, depending
     on the implementation it may or may not correspond to a normal
     XIVE source. Those interrupts have no triggers, and will not
     be masked by opal_xive_set_irq_config() with a prio of 0xff.

  .. note:: The state of the OPAL_XIVE_VP_SINGLE_ESCALATION flag passed to
     opal_xive_set_vp_info() can change the escalation irq number,
     so make sure you only retrieve this after having set the flag
     to the desired value. When set, all priorities will have the
     same escalation interrupt.

* out_qflags: will contain flags defined as follows:

  - OPAL_XIVE_EQ_ENABLED

    This must be set for the queue to be enabled and thus a valid
    target for interrupts. Newly allocated queues are disabled by
    default and must be disabled again before being freed (allocating
    and freeing of queues currently only happens along with their
    owner VP).

    .. note:: A newly enabled queue will have the generation set to 1
       and the queue pointer to 0. If the OS wants to "reset" a queue
       generation and pointer, it thus must disable and re-enable
       the queue.

  - OPAL_XIVE_EQ_ALWAYS_NOTIFY

    When this is set, the HW will always notify the VP on any new
    entry in the queue, thus the queue's own P/Q bits won't be relevant
    and using the EOI page will be unnecessary.

  - OPAL_XIVE_EQ_ESCALATE

    When this is set, the EQ will escalate to the escalation interrupt
    when failing to notify.

.. _OPAL_XIVE_SET_QUEUE_INFO:

OPAL_XIVE_SET_QUEUE_INFO
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_set_queue_info(uint64_t vp, uint32_t prio,
                                    uint64_t qpage,
                                    uint64_t qsize,
                                    uint64_t qflags);

This allows the OS to configure the queue page for a given processor
and priority and adjust the behaviour of the queue via flags.

* qpage: physical address of the page where the interrupt events will
  be posted. This has to be naturally aligned.

* qsize: log2 of the size of the above page. A 0 here will disable
  the queue.

* qflags: Flags (see definitions in opal_xive_get_queue_info)

  .. note:: This call will reset the generation bit to 1 and the queue
     production pointer to 0.

  .. note:: The PQ bits of the escalation interrupts and of the queue
     notification will be set to 00 when OPAL_XIVE_EQ_ENABLED is
     set, and to 01 (masked) when disabling it.

  .. note:: This must be called at least once on a queue with the flag
     OPAL_XIVE_EQ_ENABLED in order to enable it after it has been
     allocated (along with its owner VP).

  .. note:: When the queue is disabled (flag OPAL_XIVE_EQ_ENABLED cleared)
     all other flags and arguments are ignored and the queue
     configuration is wiped.

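
Since ``qpage`` has to be naturally aligned on the queue size given by
``qsize``, an OS may want to validate the pair before making the call. The
following is a minimal sketch of such a check; both helper names are
hypothetical and the check does not consult ``ibm,xive-eq-sizes``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size in bytes of a queue whose log2 size is `qsize`. */
static uint64_t xive_queue_bytes(uint64_t qsize)
{
    assert(qsize < 64);
    return 1ull << qsize;
}

/* Check that `qpage` is naturally aligned for a queue of 2^qsize bytes. */
static bool xive_queue_args_ok(uint64_t qpage, uint64_t qsize)
{
    if (qsize == 0)
        return true;            /* a size of 0 disables the queue */
    if (qsize >= 64)
        return false;
    return (qpage & (xive_queue_bytes(qsize) - 1)) == 0;
}
```

For example, a 64k queue (``qsize`` of 16) at physical address 0x11000 is
rejected, while the same address is fine for a 4k queue (``qsize`` of 12).
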
.. _OPAL_XIVE_DONATE_PAGE:

OPAL_XIVE_DONATE_PAGE
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_donate_page(uint32_t chip_id, uint64_t addr);

This call is used to donate pages to OPAL for use by VP/EQ provisioning.

The pages must be of the size specified by the "ibm,xive-provision-page-size"
property and naturally aligned.

All donated pages are forgotten by OPAL (and thus returned to the OS)
on any call to opal_xive_reset().

The chip_id should be the chip on which the pages were allocated or -1
if unspecified. Ideally, when a VP allocation request fails with the
OPAL_XIVE_PROVISIONING error, the OS should allocate one such page
for each chip in the system and hand it to OPAL before trying again.

.. note:: It is possible that the provisioning ends up requiring more than
   one page per chip. OPAL will keep returning the above error until
   enough pages have been provided.

.. _OPAL_XIVE_ALLOCATE_VP_BLOCK:

OPAL_XIVE_ALLOCATE_VP_BLOCK
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_alloc_vp_block(uint32_t alloc_order);

This call is used to allocate a block of VPs. It will return a number
representing the base of the block which will be aligned on the alloc
order, allowing the OS to do basic arithmetic to index VPs in the block.

The VPs will have queue structures reserved (but not initialized nor
provisioned) for all the priorities defined in the "ibm,xive-#priorities"
property.

This call might return OPAL_XIVE_PROVISIONING. In this case, the OS
must allocate pages and provision OPAL using opal_xive_donate_page(),
see the documentation for opal_xive_donate_page() for details.

The resulting VPs must be individually enabled with opal_xive_set_vp_info
below with the OPAL_XIVE_VP_ENABLED flag set before use.

For all priorities, the corresponding queues must also be individually
provisioned and enabled with opal_xive_set_queue_info.

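
The allocate, donate, retry sequence described above can be sketched as a
loop. This is an illustration under stated assumptions, not a definitive
implementation: the ``xive_ops`` indirection, ``xive_alloc_vps`` and the
``fake_*`` stand-ins are hypothetical names that only exist so the loop can be
exercised without firmware, the value of ``OPAL_XIVE_PROVISIONING`` is assumed
and should come from the OPAL API header, and OPAL_BUSY handling is left out
for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPAL_XIVE_PROVISIONING (-31)  /* assumed value; see opal-api.h */

struct xive_ops {
    int64_t (*alloc_vp_block)(uint32_t order);
    int64_t (*donate_page)(uint32_t chip_id, uint64_t addr);
    uint64_t (*get_page)(uint32_t chip_id);  /* OS allocator, 0 on failure */
};

/* Allocate 2^order VPs, donating pages while OPAL asks for provisioning. */
static int64_t xive_alloc_vps(const struct xive_ops *ops, uint32_t order,
                              const uint32_t *chips, size_t nr_chips)
{
    assert(ops != NULL);
    for (;;) {
        int64_t rc = ops->alloc_vp_block(order);

        if (rc != OPAL_XIVE_PROVISIONING || nr_chips == 0)
            return rc;          /* VP base number, or another error */

        /* One page per chip listed in "ibm,xive-provision-chips". */
        for (size_t i = 0; i < nr_chips; i++) {
            uint64_t page = ops->get_page(chips[i]);

            if (page == 0)
                return OPAL_XIVE_PROVISIONING;  /* out of memory */
            rc = ops->donate_page(chips[i], page);
            if (rc != 0)
                return rc;
        }
    }
}

/* Stand-ins used only to exercise the loop. */
static int fake_pages_needed = 2;

static int64_t fake_alloc(uint32_t order)
{
    (void)order;
    return fake_pages_needed > 0 ? OPAL_XIVE_PROVISIONING : 0x400;
}

static int64_t fake_donate(uint32_t chip_id, uint64_t addr)
{
    (void)chip_id;
    (void)addr;
    fake_pages_needed--;
    return 0;
}

static uint64_t fake_get_page(uint32_t chip_id)
{
    return 0x10000ull * (chip_id + 1);
}
```

The loop repeats because provisioning may require more than one page per chip;
it ends as soon as the allocation returns anything other than
OPAL_XIVE_PROVISIONING.
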
.. _OPAL_XIVE_FREE_VP_BLOCK:

OPAL_XIVE_FREE_VP_BLOCK
^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_free_vp_block(uint64_t vp);

This call is used to free a block of VPs. It must be called with the same
*base* number as was returned by opal_xive_alloc_vp_block() (any index into
the block will result in an OPAL_PARAMETER error).

The VPs must all have been previously disabled with opal_xive_set_vp_info
below, with the OPAL_XIVE_VP_ENABLED flag cleared, before the block is freed.

All the queues must also have been disabled.

Failure to do any of the above will result in an OPAL_XIVE_FREE_ACTIVE error.

.. _OPAL_XIVE_GET_VP_INFO:

OPAL_XIVE_GET_VP_INFO
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_get_vp_info(uint64_t vp,
                                 uint64_t *flags,
                                 uint64_t *cam_value,
                                 uint64_t *report_cl_pair,
                                 uint32_t *chip_id);

This call returns information about a VP:

* flags:

  - OPAL_XIVE_VP_ENABLED

    Returns the enabled state of the VP

  - OPAL_XIVE_VP_SINGLE_ESCALATION (if available)

    Returns whether single escalation mode is enabled for this VP
    (see opal_xive_set_vp_info()).

* cam_value: This is the value to program into the thread management
  area to dispatch that VP (ie, an encoding of the block + index).

* report_cl_pair: This is the real address of the reporting cache line
  pair for that VP (defaults to 0, ie disabled)

* chip_id: The chip that VCPU was allocated on

.. _OPAL_XIVE_SET_VP_INFO:

OPAL_XIVE_SET_VP_INFO
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_set_vp_info(uint64_t vp,
                                 uint64_t flags,
                                 uint64_t report_cl_pair);

This call configures a VP:

* flags:

  - OPAL_XIVE_VP_ENABLED

    This must be set for the VP to be usable and cleared before freeing it.

    .. note:: This can be used to disable the boot time VPs though this
       isn't recommended. This must be used to enable allocated VPs.

  - OPAL_XIVE_VP_SINGLE_ESCALATION (if available)

    If this is set, the queues are configured such that all priorities
    turn into a single escalation interrupt. This results in the loss of
    priority 7 which can no longer be used. This needs to be set
    before any interrupt is routed to that priority and queue 7 must not
    have been already enabled.

    This feature is available if the "single-escalation-support" property
    is present in the xive device-tree node.

    .. warning:: When enabling single escalation, any pre-existing routing
       and configuration of the individual queues escalation
       is lost (except queue 7 which is the new merged escalation).
       When subsequently disabling it, the previous values are not
       restored: the field is cleared and escalation is disabled on
       all the queues.

* report_cl_pair: This is the real address of the reporting cache line
  pair for that VP or 0 to disable.

.. note:: When disabling a VP, all other VP settings are lost.

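
Because single escalation mode takes priority 7 away, an OS may want one place
that decides whether a given priority can be used on a VP. A minimal sketch
follows; the helper name is hypothetical, the flag value is an assumption to be
checked against the OPAL API header, and ``nr_prios`` is assumed to be the cell
read from ``ibm,xive-#priorities`` interpreted as a count (priorities 0 to
``nr_prios - 1``).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag value; verify it against opal-api.h before use. */
#define OPAL_XIVE_VP_SINGLE_ESCALATION 0x2ull

/* Can interrupts be routed to `prio` on a VP configured with `vp_flags`? */
static bool xive_prio_usable(uint64_t vp_flags, uint32_t nr_prios,
                             uint32_t prio)
{
    assert(nr_prios > 0);
    if (prio >= nr_prios)
        return false;           /* not a supported priority */
    if ((vp_flags & OPAL_XIVE_VP_SINGLE_ESCALATION) && prio == 7)
        return false;           /* queue 7 carries the merged escalation */
    return true;
}
```

Such a check belongs before any call that routes an interrupt or enables a
queue, since the flag has to be set before queue 7 is put to use.
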
.. _OPAL_XIVE_ALLOCATE_IRQ:

OPAL_XIVE_ALLOCATE_IRQ
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_allocate_irq(uint32_t chip_id);

This call allocates a software IRQ on a given chip. It returns the
interrupt number or a negative error code.

.. _OPAL_XIVE_FREE_IRQ:

OPAL_XIVE_FREE_IRQ
^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_free_irq(uint32_t girq);

This call frees a software IRQ that was allocated by
opal_xive_allocate_irq. Passing any other interrupt number
will result in an OPAL_PARAMETER error.

.. _OPAL_XIVE_SYNC:

OPAL_XIVE_SYNC
^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_sync(uint32_t type, uint32_t id);

This call is used to synchronize some HW queues to ensure various changes
have taken effect to the point where their effects are visible to the
processor.

* type: Type of synchronization:

  - XIVE_SYNC_EAS: Synchronize a source. "id" is the girq number of the
    interrupt. This will ensure that any change to the PQ bits or the
    interrupt targeting has taken effect.

  - XIVE_SYNC_QUEUE: Synchronize a target queue. "id" is the girq number
    of the interrupt. This will ensure that any previous occurrence of the
    interrupt has reached the in-memory queue and is visible to the processor.

    .. note:: XIVE_SYNC_EAS and XIVE_SYNC_QUEUE can be used together
       (ie. XIVE_SYNC_EAS | XIVE_SYNC_QUEUE) to completely synchronize
       the path of an interrupt to its queue.

* id: Depends on the synchronization type, see above

.. _OPAL_XIVE_DUMP:

OPAL_XIVE_DUMP
^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_dump(uint32_t type, uint32_t id);

This is a debugging call that will dump in the OPAL console various
state information about the XIVE.

* type: Type of info to dump:

  - XIVE_DUMP_TM_HYP: Dump the TIMA area for hypervisor physical thread.
    "id" is the PIR value of the thread.

  - XIVE_DUMP_TM_POOL: Dump the TIMA area for the hypervisor pool.
    "id" is the PIR value of the thread.

  - XIVE_DUMP_TM_OS: Dump the TIMA area for the OS.
    "id" is the PIR value of the thread.

  - XIVE_DUMP_TM_USER: Dump the TIMA area for the "user" area (unsupported).
    "id" is the PIR value of the thread.

  - XIVE_DUMP_VP: Dump the state of a VP structure.
    "id" is the VP id.

  - XIVE_DUMP_EMU: Dump the state of the XICS emulation for a thread.
    "id" is the PIR value of the thread.

.. _OPAL_XIVE_GET_QUEUE_STATE:

OPAL_XIVE_GET_QUEUE_STATE
^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_get_queue_state(uint64_t vp, uint32_t prio,
                                     uint32_t *out_qtoggle,
                                     uint32_t *out_qindex);

This call saves the queue toggle bit and index. This must be called on
an enabled queue.

* vp, prio: The target queue

* out_qtoggle: toggle bit of the queue

* out_qindex: index of the queue

.. _OPAL_XIVE_SET_QUEUE_STATE:

OPAL_XIVE_SET_QUEUE_STATE
^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_set_queue_state(uint64_t vp, uint32_t prio,
                                     uint32_t qtoggle,
                                     uint32_t qindex);

This call restores the queue toggle bit and index that was previously
saved by a call to opal_xive_get_queue_state(). This must be called on
an enabled queue.

* vp, prio: The target queue

* qtoggle: toggle bit of the queue

* qindex: index of the queue

.. _OPAL_XIVE_GET_VP_STATE:

OPAL_XIVE_GET_VP_STATE
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

   int64_t opal_xive_get_vp_state(uint64_t vp_id,
                                  uint64_t *out_state);

This call saves the VP HW state in "out_state". The format matches the
XIVE NVT word 4 and word 5. This must be called on an enabled VP.

* vp_id: The target VP

* out_state: Location where the state is to be stored