.. _skiboot-6.4:
|
|
|
|
skiboot-6.4
===========
|
|
|
|
skiboot v6.4 was released on Tuesday July 16th 2019. It is the first
|
|
release of skiboot 6.4, which becomes the new stable release
|
|
of skiboot following the 6.3 release, first released May 3rd 2019.
|
|
|
|
Skiboot 6.4 will mark the basis for op-build v2.4.
|
|
|
|
skiboot v6.4 contains all bug fixes as of :ref:`skiboot-6.0.20`,
|
|
and :ref:`skiboot-6.3.2` (the currently maintained stable releases).
|
|
|
|
For details on how the skiboot stable releases work, see :ref:`stable-rules`.
|
|
|
|
Over skiboot 6.3, we have the following changes:
|
|
|
|
.. _skiboot-6.4-new-features:
|
|
|
|
New features
------------
|
|
|
|
Since skiboot v6.4-rc1:
|
|
|
|
- npu2-opencapi: Add opencapi support on ZZ
|
|
|
|
This patch adds opencapi support on ZZ. It hard-codes the required
|
|
device tree entries for the NPU and links. The alternative was to use
|
|
HDAT, but it somehow proved too painful to do.
|
|
|
|
The new device tree entries activate the npu2 init code on ZZ. On
|
|
systems with no opencapi adapters, it should go unnoticed, as presence
|
|
detection will skip link training.
|
|
|
|
Since skiboot v6.3:
|
|
|
|
- platforms/nicole: Add new platform
|
|
|
|
Nicole is a new platform from YADRO; it's a storage controller for the
|
|
TATLIN server. It's based on the IBM Romulus reference design (POWER9).
|
|
|
|
- platform/zz: Add new platform type
|
|
|
|
We have new platform types under ZZ. Let's add them. With this fix
|
|
- nvram: Flag dangerous NVRAM options
|
|
|
|
Most nvram options used by skiboot are just for debug or testing for
|
|
regressions. They should never be used long term.
|
|
|
|
We've hit a number of issues in testing and the field where nvram
|
|
options have been set "temporarily" but haven't been properly cleared
|
|
after, resulting in crashes or real bugs being masked.
|
|
|
|
This patch marks most nvram options used by skiboot as dangerous and
|
|
prints a chicken to remind users of the problem.
|
|
|
|
- hw/phb3: Add verbose EEH output
|
|
|
|
Add support for the pci-eeh-verbose NVRAM flag on PHB3. We've had this
|
|
on PHB4 since forever and it has proven very useful when debugging EEH
|
|
issues. When testing changes to the Linux kernel's EEH implementation
|
|
it's fairly common for the kernel to crash before printing the EEH log
|
|
so it's helpful to have it in the OPAL log where it can be dumped from
|
|
XMON.
|
|
|
|
Note that unlike PHB4 we do not enable verbose mode by default. The
|
|
nvram option must be used to explicitly enable it.
|
|
|
|
- Experimental support for building without FSP code
|
|
|
|
Now, with CONFIG_FSP=0/1 we have:
|
|
|
|
- 1.6M/1.4M skiboot.lid
|
|
- 323K/375K skiboot.lid.xz
|
|
|
|
- doc: travis-ci deploy docs!
|
|
|
|
Documentation is now automatically deployed if you configure Travis CI
|
|
appropriately (we have done this for the open-power branch of skiboot).
|
|
|
|
- Big OPAL API Documentation improvement
|
|
|
|
A lot more OPAL API calls are now (at least somewhat) documented.
|
|
- opal/hmi: Report NPU2 checkstop reason
|
|
|
|
The NPU2 is currently not passing any information to Linux to explain
|
|
the cause of an HMI. NPU2 has three Fault Isolation Registers and over
|
|
30 of those FIR bits are configured to raise an HMI by default. We
|
|
won't be able to fit all possible state in the 32-bit xstop_reason
|
|
field of the HMI event, but we can still try to encode up to 4 HMI
|
|
reasons.
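As a rough illustration of how four reasons could fit in 32 bits, the sketch below gives each reason one byte: two bits for the FIR number and six bits for the bit position within that FIR. The byte layout and the helper names (``encode_reason``, ``pack_xstop_reason``) are assumptions made for this example, not the encoding skiboot necessarily uses.

```python
MAX_REASONS = 4  # a 32-bit field has room for four 8-bit reason codes


def encode_reason(fir_number, bit):
    """Pack one reason into a byte: 2 bits of FIR number, 6 bits of bit index."""
    if not (0 <= fir_number < 3 and 0 <= bit < 64):
        raise ValueError("FIR number must be 0..2 and bit must be 0..63")
    return (fir_number << 6) | bit


def pack_xstop_reason(reasons):
    """Pack up to four (fir_number, bit) pairs into one 32-bit value.

    Any reasons beyond the fourth are dropped because they do not fit.
    """
    value = 0
    for slot, (fir_number, bit) in enumerate(reasons[:MAX_REASONS]):
        value |= encode_reason(fir_number, bit) << (8 * slot)
    return value
```

With this layout, ``pack_xstop_reason([(0, 5), (2, 63)])`` puts FIR 0 bit 5 in the lowest byte and FIR 2 bit 63 in the next one.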
|
|
- opal-msg: Enhance opal-get-msg API
|
|
|
|
Linux uses :ref:`OPAL_GET_MSG` API to get OPAL messages. This interface
|
|
supports up to 8 params (64 bytes). We have a requirement to send bigger data to
|
|
Linux. This patch enhances OPAL to send bigger data to Linux.
|
|
|
|
- Linux will use "opal-msg-size" device tree property to allocate memory for
|
|
OPAL messages (previous patch increased "opal-msg-size" to 64K).
|
|
- Replaced the `reserved` field in "struct opal_msg" with `size`, so that Linux
|
|
side opal_get_msg users can detect the actual data size.
|
|
- If buffer size < actual message size, then opal_get_msg will copy partial
|
|
data and return OPAL_PARTIAL to Linux.
|
|
- Add new variable "extended" to "opal_msg_entry" structure to keep track
|
|
of messages that have more than 64 bytes of data. We will allocate separate
|
|
memory for these messages and, once the kernel consumes a message, we will
|
|
release that memory.
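The partial-copy behaviour described above can be modelled in a few lines. This is only a sketch: ``get_msg_sketch`` is a hypothetical stand-in for the firmware call, and the numeric value used for ``OPAL_PARTIAL`` is assumed to match the OPAL API headers.

```python
OPAL_SUCCESS = 0
OPAL_PARTIAL = -3  # assumed value; check opal-api.h before relying on it


def get_msg_sketch(buffer_size, message):
    """Model copying `message` into a caller buffer of `buffer_size` bytes.

    Returns (status, copied_bytes). When the buffer is too small, only the
    part that fits is copied and the status reports a partial transfer.
    """
    if buffer_size < len(message):
        return OPAL_PARTIAL, message[:buffer_size]
    return OPAL_SUCCESS, message
```

A caller that sees the partial status knows from the message's `size` field how much data it missed.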
|
|
- core/opal: Increase opal-msg-size size
|
|
|
|
The kernel will use the `opal-msg-size` property to allocate memory for opal_msg.
|
|
We want to send bigger data from OPAL to kernel. Hence increase
|
|
opal-msg-size to 64K.
|
|
- hw/npu2-opencapi: Add initial support for allocating OpenCAPI LPC memory
|
|
|
|
Lowest Point of Coherency (LPC) memory allows the host to access memory on
|
|
an OpenCAPI device.
|
|
|
|
Define 2 OPAL calls, :ref:`OPAL_NPU_MEM_ALLOC` and :ref:`OPAL_NPU_MEM_RELEASE`, for
|
|
assigning and clearing the memory BAR. (We try to avoid using the term
|
|
"LPC" to avoid confusion with Low Pin Count.)
|
|
|
|
At present, we use a fixed location in the address space, which means we
|
|
are restricted to a single range of 4TB, on a single OpenCAPI device per
|
|
chip. In future, we'll use some chip ID extension magic to give us more
|
|
space, and some sort of allocator to assign ranges to more than one device.
|
|
- core/fast-reboot: Add im-feeling-lucky option
|
|
|
|
Fast reboot gets disabled for a number of reasons e.g. the availability
|
|
of nvlink. However this doesn't actually affect the ability to perform fast
|
|
reboot if no nvlink device is actually present.
|
|
|
|
Add an nvram option for fast-reset: if it's set to
"im-feeling-lucky" then perform the fast-reboot regardless of whether it has
|
|
previously been disabled.
|
|
|
|
- platforms/astbmc: Check for SBE validation step
|
|
|
|
On some POWER8 astbmc systems an update to the SBE requires pausing at
|
|
runtime to ensure integrity of the SBE. If this is required the BMC will
|
|
set a chassis boot option IPMI flag using the OEM parameter 0x62. If
|
|
Skiboot sees this flag is set it waits until the SBE update is complete
|
|
and the flag is cleared.
|
|
|
|
Unfortunately the mystery operation that validates the SBE also leaves
|
|
it in a bad state and unable to be used for timer operations. To
|
|
work around this the flag is checked as soon as possible (i.e. when IPMI
|
|
and the console are set up), and once complete the system is rebooted.
|
|
- Add P9 DIO interrupt support
|
|
|
|
On P9 there are GPIO ports 0, 1 and 2 for GPIO interrupts, and the DIO interrupt
|
|
is used to handle the interrupts.
|
|
|
|
Add support for the DIO interrupts:
|
|
|
|
1. Add dio_interrupt_register(chip, port, callback) to register the
|
|
interrupt;
|
|
2. Add dio_interrupt_deregister(chip, port, callback) to deregister;
|
|
3. When an interrupt occurs on the port, the callback is invoked, and the
|
|
interrupt status is cleared.
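The register, deregister and dispatch flow listed above can be modelled as follows. This is a toy model for one chip, not the skiboot implementation: the class name, the ``raise_interrupt`` helper and the status bitmask are all assumptions made for illustration.

```python
class DioInterrupts:
    """Toy model of per-port DIO interrupt callbacks on a single chip."""

    NUM_PORTS = 3  # GPIO ports 0, 1 and 2

    def __init__(self):
        self.callbacks = {}  # port -> registered callback
        self.status = 0      # bit N set means port N has a pending interrupt

    def register(self, port, callback):
        """Register a callback; fails for a bad port or an occupied port."""
        if not 0 <= port < self.NUM_PORTS or port in self.callbacks:
            return False
        self.callbacks[port] = callback
        return True

    def deregister(self, port, callback):
        """Remove a callback; fails unless it is the one registered."""
        if self.callbacks.get(port) is not callback:
            return False
        del self.callbacks[port]
        return True

    def raise_interrupt(self, port):
        """Simulate hardware flagging an interrupt on a port."""
        self.status |= 1 << port

    def handle(self):
        """Invoke the callback for each pending port, then clear its status."""
        for port in range(self.NUM_PORTS):
            if self.status & (1 << port):
                callback = self.callbacks.get(port)
                if callback is not None:
                    callback(port)
                self.status &= ~(1 << port)
```

Clearing the status bit even when no callback is registered keeps a stray interrupt from staying pending forever in this model.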
|
|
|
|
|
|
Removed features
----------------
|
|
|
|
Since skiboot v6.3:
|
|
|
|
- pci/iov: Remove skiboot VF tracking
|
|
|
|
This feature was added a few years ago in response to a request to make
|
|
the MaxPayloadSize (MPS) field of a Virtual Function match the MPS of the
|
|
Physical Function that hosts it.
|
|
|
|
The SR-IOV specification states that the MPS field of the VF is "ResvP".
|
|
This indicates the VF will use whatever MPS is configured on the PF and
|
|
that the field should be treated as a reserved field in the config space
|
|
of the VF. In other words, a SR-IOV spec compliant VF should always return
|
|
zero in the MPS field. Adding hacks in OPAL to make it non-zero is...
|
|
misguided at best.
|
|
|
|
Additionally, there is a bug in the way pci_device structures are handled
|
|
by VFs that results in a crash on fast-reboot that occurs if VFs are
|
|
enabled and then disabled prior to rebooting. This patch fixes the bug by
|
|
removing the code entirely. This patch has no impact on SR-IOV support on
|
|
the host operating system.
|
|
- Remove POWER7 and POWER7+ support
|
|
|
|
It's been a good long while since either OPAL POWER7 user touched a
|
|
machine, and even longer since they'd have been okay using an old
|
|
version rather than tracking master.
|
|
|
|
There's also been no testing of OPAL on POWER7 systems for an awfully
|
|
long time, so it's pretty safe to assume that it's very much bitrotted.
|
|
|
|
It also saves a whole 14kb of xz compressed payload space.
|
|
- Remove remnants of :ref:`OPAL_PCI_GET_PHB_DIAG_DATA`
|
|
|
|
Never present in a public OPAL release, and only kernels prior to 3.11
|
|
would ever attempt to call it.
|
|
- Remove unused :ref:`OPAL_GET_XIVE_SOURCE`
|
|
|
|
While this call was technically implemented by skiboot, no code has ever called
|
|
it, and it was only ever implemented for the p7ioc-phb back-end (i.e. POWER7).
|
|
Since this call was unused in Linux, and POWER7 with OPAL was only ever
|
|
available internally, it should be safe to remove the call.
|
|
- Remove unused :ref:`OPAL_PCI_GET_XIVE_REISSUE` and :ref:`OPAL_PCI_SET_XIVE_REISSUE`
|
|
|
|
These seem to be remnants of one of the OPAL incarnations prior to
|
|
OPALv3. These calls have never been implemented in skiboot, and never
|
|
used by an upstream kernel (nor a PowerKVM kernel).
|
|
|
|
It's rather safe to just document them as never existing.
|
|
- Remove never implemented :ref:`OPAL_PCI_SET_PHB_TABLE_MEMORY` and document why
|
|
|
|
Never used by upstream Linux or the PowerKVM tree. Never implemented in
|
|
skiboot (not even in ancient internal only tree).
|
|
|
|
So, it's incredibly safe to remove.
|
|
- Remove unused :ref:`OPAL_PCI_EEH_FREEZE_STATUS2`
|
|
|
|
This call was introduced all the way back at the end of 2012, before
|
|
OPAL was public. The #define for the OPAL call was introduced to the
|
|
Linux kernel in June 2013, and the call was never used in any kernel
|
|
tree ever (as far as we can find).
|
|
|
|
Thus, it's quite safe to remove this completely unused and completely
|
|
untested OPAL call.
|
|
- Document the long removed :ref:`OPAL_REGISTER_OPAL_EXCEPTION_HANDLER` call
|
|
|
|
I'm pretty sure this was removed in one of our first ever service packs.
|
|
|
|
Fixes: https://github.com/open-power/skiboot/issues/98
|
|
- Remove last remnants of :ref:`OPAL_PCI_SET_PHB_TCE_MEMORY` and :ref:`OPAL_PCI_SET_HUB_TCE_MEMORY`
|
|
|
|
Since we have not supported p5ioc systems since skiboot 5.2, it's pretty
|
|
safe to just wholesale remove these OPAL calls now.
|
|
- Remove remnants of :ref:`OPAL_PCI_SET_PHB_TCE_MEMORY`
|
|
|
|
There's no reason we need remnants hanging around that aren't used, so
|
|
remove them and save a handful of bytes at runtime.
|
|
|
|
Simultaneously, document the OPAL call removal.
|
|
|
|
|
|
Secure and Trusted Boot
-----------------------
|
|
|
|
Since skiboot v6.3:
|
|
|
|
- trustedboot: Change PCR and event_type for the skiboot events
|
|
|
|
The existing skiboot events are being logged as EV_ACTION, however, the
|
|
TCG PC Client spec says that EV_ACTION events should have one of the
|
|
pre-defined strings in the event field recorded in the event log. For
|
|
instance:
|
|
|
|
- "Calling Ready to Boot",
|
|
- "Entering ROM Based Setup",
|
|
- "User Password Entered", and
|
|
- "Start Option ROM Scan".
|
|
|
|
None of the EV_ACTION pre-defined strings are applicable to the existing
|
|
skiboot events. Based on recent discussions with other POWER teams, this
|
|
patch proposes a convention on what PCR and event types should be used
|
|
for skiboot events. This also changes the skiboot source code to follow
|
|
the convention.
|
|
|
|
The TCG PC Client spec defines several event types, other than
|
|
EV_ACTION. However, many of them are specific to UEFI events and some
|
|
others are related to platform or CRTM events, which is more applicable
|
|
to hostboot events.
|
|
|
|
Currently, most of the hostboot events are extended to PCR[0,1] and
|
|
logged as either EV_PLATFORM_CONFIG_FLAGS, EV_S_CRTM_CONTENTS or
|
|
EV_POST_CODE. The "Node Id" and "PAYLOAD" events, though, are extended
|
|
to PCR[4,5,6] and logged as EV_COMPACT_HASH.
|
|
|
|
For the lack of an event type that fits the specific purpose,
|
|
EV_COMPACT_HASH seems to be the most adequate one due to its
|
|
flexibility. According to the TCG PC Client spec:
|
|
|
|
- May be used for any PCR except 0, 1, 2 and 3.
|
|
- The event field may be informative or may be hashed to generate the
|
|
digest field, depending on the component recording the event.
|
|
|
|
Additionally, the PCR[4,5] seem to be the most adequate PCRs. They would
|
|
be used for skiboot and some skiroot events. According to the TCG PC
|
|
Client spec, PCR[4] is intended to represent the entity that manages the
|
|
transition between the pre-OS and OS-present state of the platform.
|
|
PCR[4], along with PCR[5], identifies the initial OS loader.
|
|
|
|
In summary, for skiboot events:
|
|
|
|
- Events that represent data should be extended to PCR 4.
|
|
- Events that represent config should be extended to PCR 5.
|
|
- For the lack of an event type that fits the specific purpose,
|
|
both data and config events should be logged as EV_COMPACT_HASH.
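The convention summarised above can be sketched as a small measurement helper. The PCR choice and the event type come from the text; the SHA-256 extend operation is the standard TPM construction, and the function names, the dictionary of PCR values and the event log tuple format are hypothetical.

```python
import hashlib

EV_COMPACT_HASH = 0x0000000C  # event type value defined by the TCG PC Client spec
PCR_DATA = 4     # skiboot events that represent data
PCR_CONFIG = 5   # skiboot events that represent config


def pcr_for_event(kind):
    """Pick the PCR for an event according to the convention above."""
    if kind == "data":
        return PCR_DATA
    if kind == "config":
        return PCR_CONFIG
    raise ValueError("unknown event kind: %r" % (kind,))


def extend(pcr_value, measurement):
    """TPM-style extend: new PCR = H(old PCR || H(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr_value + digest).digest()


def log_event(pcrs, event_log, kind, name, measurement):
    """Extend the chosen PCR and record the event as EV_COMPACT_HASH."""
    index = pcr_for_event(kind)
    pcrs[index] = extend(pcrs[index], measurement)
    event_log.append((index, EV_COMPACT_HASH, name))
```

Both kinds of event share one event type, so a verifier replaying the log only needs the PCR index to tell data from config.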
|
|
|
|
Sensors
-------
|
|
|
|
Since skiboot v6.3:
|
|
|
|
- occ-sensors: Check if OCC is reset while reading inband sensors
|
|
|
|
OCC may not be able to mark the sensor buffer as invalid while going
|
|
down RESET. If OCC never comes back we will continue to read the stale
|
|
sensor data. So verify if OCC is reset while reading the sensor values
|
|
and propagate the appropriate error.
|
|
|
|
IPMI
----
|
|
|
|
Since skiboot v6.3:
|
|
|
|
- ipmi: ensure forward progress on ipmi_queue_msg_sync()
|
|
|
|
BT responses are handled using a timer doing the polling. To hope to
|
|
get an answer to an IPMI synchronous message, the timer needs to run.
|
|
|
|
We can't just check all timers though as there may be a timer that
|
|
wants a lock that's held by a code path calling ipmi_queue_msg_sync(),
|
|
and if we did enforce that as a requirement, it's a pretty subtle
|
|
API that is asking to be broken.
|
|
|
|
So, if we just run a poll function to crank anything that the IPMI
|
|
backend needs, then we should be fine.
|
|
|
|
This issue shows up very quickly under QEMU when loading the first
|
|
flash resource with the IPMI HIOMAP backend.
|
|
|
|
NPU2
----
|
|
|
|
Since skiboot v6.4-rc1:
|
|
|
|
- witherspoon: Add nvlink peers in finalise_dt()
|
|
|
|
This information is consumed by Linux so it needs to be in the DT. Move
|
|
it to finalise_dt().
|
|
|
|
Since skiboot v6.3:
|
|
|
|
- npu2: Increase timeout for L2/L3 cache purging
|
|
|
|
On NVLink2 bridge reset, we purge all L2/L3 caches in the system.
|
|
This is an asynchronous operation, and we have a 2ms timeout here. There are
|
|
reports that this is not enough and "PURGE L3 on core xxx timed out"
|
|
messages appear (for the reference: on the test setup this takes
|
|
280us..780us).
|
|
|
|
This defines the timeout as a macro and changes this from 2ms to 20ms.
|
|
|
|
This adds a tracepoint to tell how long it took to purge all the caches.
|
|
- npu2: Purge cache when resetting a GPU
|
|
|
|
After putting all a GPU's links in reset, do a cache purge in case we
|
|
have CPU cache lines belonging to the now-inaccessible GPU memory.
|
|
- npu2-opencapi: Mask 2 XSL errors
|
|
|
|
Commit f8dfd699f584 ("hw/npu2: Setup an error interrupt on some
|
|
opencapi FIRs") converted some FIR bits default action from system
|
|
checkstop to raising an error interrupt. For 2 XSL error events that
|
|
can be triggered by a misbehaving AFU, the error interrupt is raised
|
|
twice, once for each link (the XSL logic in the NPU is shared between
|
|
2 links). So a badly behaving AFU could impact another, unsuspecting
|
|
opencapi adapter.
|
|
|
|
It doesn't look good and it turns out we can do better. We can mask
|
|
those 2 XSL errors. The error will also be picked up by the OTL logic,
|
|
which is per link. So we'll still get an error interrupt, but only on
|
|
the relevant link, and the other opencapi adapter can stay functional.
|
|
- npu2: Clear fence state for a brick being reset
|
|
|
|
Resetting a GPU before resetting an NVLink leads to occasional HMIs
|
|
which fence some bricks and prevent the "reset_ntl" procedure from
|
|
succeeding at the "reset_ntl_release" step - the host system requires
|
|
reboot; there may be other cases like this as well.
|
|
|
|
This adds clearing of the fence bit in NPU.MISC.FENCE_STATE for
|
|
the NVLink which we are about to reset.
|
|
- npu2: Fix clearing the FIR bits
|
|
|
|
FIR registers are SCOM-only so they cannot be accessed with the indirect
|
|
write, and yet we use SCOM-based addresses for these; fix this.
|
|
|
|
- npu2: Reset NVLinks when resetting a GPU
|
|
|
|
Resetting a V100 GPU brings its NVLinks down and if an NPU tries using
|
|
those, an HMI occurs. We were lucky not to observe this as the bare metal
|
|
does not normally reset a GPU and when passed through, GPUs are usually
|
|
before NPUs in QEMU command line or Libvirt XML and because of that NPUs
|
|
are naturally reset first. However, a simple change of the device order
|
|
brings HMIs.
|
|
|
|
This defines a bus control filter for a PCI slot with a GPU with NVLinks
|
|
so when the host system issues secondary bus reset to the slot, it resets
|
|
associated NVLinks.
|
|
- npu2: Reset PID wildcard and refcounter when mapped to LPID
|
|
|
|
Since 105d80f85b "npu2: Use unfiltered mode in XTS tables" we do not
|
|
register every PID in the XTS table so the table has one entry per LPID.
|
|
Then we added a reference counter to keep track of the entry use when
|
|
switching GPU between the host and guest systems (the "Fixes:" tag below).
|
|
|
|
The POWERNV platform setup creates such entries and references them
|
|
at the boot time when initializing IOMMUs and only removes it when
|
|
a GPU is passed through to a guest. This creates a problem as POWERNV
|
|
boots via kexec and no dereferencing happens; the XTS table state remains
|
|
undefined. So when the host kernel boots, skiboot thinks there are valid
|
|
XTS entries and does not update the XTS table which breaks ATS.
|
|
|
|
This adds the reference counter and the XTS entry reset when a GPU is
|
|
assigned to LPID and we cannot rely on the kernel to clean that up.
|
|
|
|
PHB4
----
|
|
|
|
Since skiboot v6.3:
|
|
|
|
- hw/phb4: Make phb4_training_trace() more general
|
|
|
|
phb4_training_trace() is used to monitor the Link Training Status
|
|
State Machine (LTSSM) of the PHB's data link layer. Currently it is only
|
|
used to observe the LTSSM while bringing up the link, but sometimes it's
|
|
useful to see what's occurring in other situations (e.g. link disable, or
|
|
secondary bus reset). This patch renames it to phb4_link_trace() and
|
|
allows the target LTSSM state and a flexible timeout to help in these
|
|
situations.
|
|
- hw/phb4: Make pci-tracing print at PR_NOTICE
|
|
|
|
When pci-tracing is enabled we print each trace status message and the
|
|
final trace status at PR_ERROR. The final status messages are similar to
|
|
those printed when we fail to train in the non-pci-tracing path and this
|
|
has resulted in spurious op-test failures.
|
|
|
|
This patch reduces the log-level of the tracing message to PR_NOTICE so
|
|
they're not accidentally interpreted as actual error messages. PR_NOTICE
|
|
messages are still printed to the console during boot.
|
|
- hw/phb4: Use read/write_reg in assert_perst
|
|
|
|
While the PHB is fenced we can't use the MMIO interface to access PHB
|
|
registers. While processing a complete reset we inject a PHB fence to
|
|
isolate the PHB from the rest of the system because the PHB won't
|
|
respond to MMIOs from the rest of the system while being reset.
|
|
|
|
We assert PERST after the fence has been erected which requires us to
|
|
use the XSCOM indirect interface to access the PHB registers rather than
|
|
the MMIO interface. Previously we did that when asserting PERST in the
|
|
CRESET path. However in b8b4c79d4419 ("hw/phb4: Factor out PERST
|
|
control") this was re-written to use the raw in_be64() accessor. This
|
|
means that CRESET would not be asserted in the reset path. On some
|
|
Mellanox cards this would prevent them from re-loading their firmware
|
|
when the system was fast-reset.
|
|
|
|
This patch fixes the problem by replacing the raw {in|out}_be64()
|
|
accessors with the phb4_{read|write}_reg() functions.
|
|
|
|
- hw/phb4: Assert Link Disable bit after ETU init
|
|
|
|
The cursed RAID card in ozrom1 has a bug where it ignores PERST being
|
|
asserted. The PCIe Base spec is a little vague about what happens
|
|
while PERST is asserted, but it does clearly specify that when
|
|
PERST is de-asserted the Link Training and Status State Machine
|
|
(LTSSM) of a device should return to the initial state (Detect)
|
|
defined in the spec and the link training process should restart.
|
|
|
|
This bug was worked around in 9078f8268922 ("phb4: Delay training till
|
|
after PERST is deasserted") by setting the link disable bit at the
|
|
start of the FRESET process and clearing it after PERST was
|
|
de-asserted. Although this fixed the bug, the patch offered no
|
|
explanation of why the fix worked.
|
|
|
|
In b8b4c79d4419 ("hw/phb4: Factor out PERST control") the link disable
|
|
workaround was moved into phb4_assert_perst(). This is called
|
|
always in the CRESET case, but a following patch resulted in
|
|
assert_perst() not being called if phb4_freset() was entered following a
|
|
CRESET since p->skip_perst was set in the CRESET handler. This is bad
|
|
since a side-effect of the CRESET is that the Link Disable bit is
|
|
cleared.
|
|
|
|
This, combined with the RAID card ignoring PERST results in the PCIe
|
|
link being trained by the PHB while we're waiting out the 100ms
|
|
ETU reset time. If we hack skiboot to print a DLP trace after returning
|
|
from phb4_init_hw() we get: ::

  PHB#0001[0:1]: Initialization complete
  PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
  PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
  PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
  PHB#0001[0:1]: TRACE:0x0000183101000000 29ms training GEN1:x16:config
  PHB#0001[0:1]: TRACE:0x00001c5881000000 30ms training GEN1:x08:recovery
  PHB#0001[0:1]: TRACE:0x00001c5883000000 30ms training GEN3:x08:recovery
  PHB#0001[0:1]: TRACE:0x0000144883000000 33ms presence GEN3:x08:L0
  PHB#0001[0:1]: TRACE:0x0000154883000000 33ms trained GEN3:x08:L0
  PHB#0001[0:1]: CRESET: wait_time = 100
  PHB#0001[0:1]: FRESET: Starts
  PHB#0001[0:1]: FRESET: Prepare for link down
  PHB#0001[0:1]: FRESET: Assert skipped
  PHB#0001[0:1]: FRESET: Deassert
  PHB#0001[0:1]: TRACE:0x0000154883000000 0ms trained GEN3:x08:L0
  PHB#0001[0:1]: TRACE: Reached target state
  PHB#0001[0:1]: LINK: Start polling
  PHB#0001[0:1]: LINK: Electrical link detected
  PHB#0001[0:1]: LINK: Link is up
  PHB#0001[0:1]: LINK: Went down waiting for stabilty
  PHB#0001[0:1]: LINK: DLP train control: 0x0000105101000000
  PHB#0001[0:1]: CRESET: Starts
|
|
|
|
What has happened here is that the link is trained to 8x Gen3 33ms after
|
|
we return from phb4_init_hw(), and before we've waited out the 100ms
|
|
that we normally wait after re-initialising the ETU. When we "deassert"
|
|
PERST later on in the FRESET handler the link is in the L0 (normal) state. At
|
|
this point we try to read from the Vendor/Device ID register to verify
|
|
that the link is stable and immediately get a PHB fence due to a PCIe
|
|
Completion Timeout. Skiboot attempts to recover by doing another CRESET,
|
|
but this will encounter the same issue.
|
|
|
|
This patch fixes the problem by setting the Link Disable bit (by calling
|
|
phb4_assert_perst()) immediately after we return from phb4_init_hw().
|
|
This prevents the link from being trained while PERST is asserted which
|
|
seems to avoid the Completion Timeout. With the patch applied we get: ::

  PHB#0001[0:1]: Initialization complete
  PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
  PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
  PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
  PHB#0001[0:1]: TRACE:0x0000909101000000 29ms presence GEN1:x16:disabled
  PHB#0001[0:1]: CRESET: wait_time = 100
  PHB#0001[0:1]: FRESET: Starts
  PHB#0001[0:1]: FRESET: Prepare for link down
  PHB#0001[0:1]: FRESET: Assert skipped
  PHB#0001[0:1]: FRESET: Deassert
  PHB#0001[0:1]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
  PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
  PHB#0001[0:1]: TRACE:0x0000001101000000 24ms GEN1:x16:detect
  PHB#0001[0:1]: TRACE:0x0000102101000000 36ms presence GEN1:x16:polling
  PHB#0001[0:1]: TRACE:0x0000183101000000 97ms training GEN1:x16:config
  PHB#0001[0:1]: TRACE:0x00001c5881000000 97ms training GEN1:x08:recovery
  PHB#0001[0:1]: TRACE:0x00001c5883000000 97ms training GEN3:x08:recovery
  PHB#0001[0:1]: TRACE:0x0000144883000000 99ms presence GEN3:x08:L0
  PHB#0001[0:1]: TRACE: Reached target state
  PHB#0001[0:1]: LINK: Start polling
  PHB#0001[0:1]: LINK: Electrical link detected
  PHB#0001[0:1]: LINK: Link is up
  PHB#0001[0:1]: LINK: Link is stable
  PHB#0001[0:1]: LINK: Card [9005:028c] Optimal Retry:disabled
  PHB#0001[0:1]: LINK: Speed Train:GEN3 PHB:GEN4 DEV:GEN3
  PHB#0001[0:1]: LINK: Width Train:x08 PHB:x08 DEV:x08
  PHB#0001[0:1]: LINK: RX Errors Now:0 Max:8 Lane:0x0000
|
|
|
|
|
|
Simulators
----------
|
|
|
|
Since skiboot v6.3:
|
|
|
|
- external/mambo: Bump default POWER9 to Nimbus DD2.3
|
|
- external/mambo: fix tcl startup code for mambo bogus net (repost)
|
|
|
|
This fixes a couple issues with external/mambo/skiboot.tcl so I can use the
|
|
mambo bogus net.
|
|
|
|
* newer distros (ubuntu 18.04) allow tap device to have a user specified
|
|
name instead of just tapN so we need to pass in a name not a number.
|
|
* need some kind of default for net_mac, and need the mconfig for it
|
|
to be set from an env var.
|
|
- skiboot.tcl: Add option to wait for GDB server connection
|
|
|
|
Add an environment variable which makes Mambo wait for a connection
|
|
from gdb prior to starting simulation.
|
|
- mambo: Integrate addr2line into backtrace command
|
|
|
|
Gives nice output like this: ::

  systemsim % bt
  pc: 0xC0000000002BF3D4 _savegpr0_28+0x0
  lr: 0xC00000000004E0F4 opal_call+0x10
  stack:0x000000000041FAE0 0xC00000000004F054 opal_check_token+0x20
  stack:0x000000000041FB50 0xC0000000000500CC __opal_flush_console+0x88
  stack:0x000000000041FBD0 0xC000000000050BF8 opal_flush_console+0x24
  stack:0x000000000041FC00 0xC0000000001F9510 udbg_opal_putc+0x88
  stack:0x000000000041FC40 0xC000000000020E78 udbg_write+0x7c
  stack:0x000000000041FC80 0xC0000000000B1C44 console_unlock+0x47c
  stack:0x000000000041FD80 0xC0000000000B2424 register_console+0x320
  stack:0x000000000041FE10 0xC0000000003A5328 register_early_udbg_console+0x98
  stack:0x000000000041FE80 0xC0000000003A4F14 setup_arch+0x68
  stack:0x000000000041FEF0 0xC0000000003A0880 start_kernel+0x74
  stack:0x000000000041FF90 0xC00000000000AC60 start_here_common+0x1c
|
|
|
|
- mambo: Add addr2func for symbol resolution
|
|
|
|
If you supply a VMLINUX_MAP/SKIBOOT_MAP/USER_MAP addr2func can guess
|
|
at your symbol name. i.e. ::

  systemsim % p pc
  0xC0000000002A68F8
  systemsim % addr2func [p pc]
  fdt_offset_ptr+0x78
|
|
|
|
- lpc-port80h: Don't write port 80h when running under Simics
|
|
|
|
Simics doesn't model LPC port 80h. Writing to it terminates the
|
|
simulation due to an invalid LPC memory access. This patch adds a
|
|
check to ensure port 80h isn't accessed if we are running under
|
|
Simics.
|
|
- device-tree: speed up fdt building on slow simulators
|
|
|
|
Trade size for speed and avoid de-duplicating strings in the fdt.
|
|
This costs about 2kB in fdt size, and saves about 8 million instructions
|
|
(almost half of all instructions) booting skiboot in mambo.
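To make the size-versus-speed trade concrete, the sketch below builds a flattened-tree style string table with and without de-duplication. It is a simplified model, not libfdt or skiboot code, and ``build_strings`` is a hypothetical helper: with de-duplication every property name triggers a search of the table built so far, while without it each name is simply appended.

```python
def build_strings(names, dedup):
    """Return (string_table, offsets) for a list of property names.

    With dedup=True each name is searched for in the existing table and
    reused if present; with dedup=False it is always appended, which is
    faster per name but produces a larger table.
    """
    table = bytearray()
    offsets = []
    for name in names:
        encoded = name.encode("ascii") + b"\0"
        pos = table.find(encoded) if dedup else -1
        if pos < 0:
            pos = len(table)
            table += encoded
        offsets.append(pos)
    return bytes(table), offsets
```

For names that repeat often, such as ``compatible`` and ``reg``, the de-duplicated table stays small but each insertion costs a scan, which is the work the patch avoids on slow simulators.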
|
|
- fast-reboot: skip read-only memory checksum for slow simulators
|
|
|
|
Skip the fast reboot checksum, which costs about 4 million cycles
|
|
booting skiboot in mambo.
|
|
- nx: remove check on the "qemu,powernv" property
|
|
|
|
commit 95f7b3b9698b ("nx: Don't abort on missing NX when using a QEMU
|
|
machine") introduced a check on the property "qemu,powernv" to skip NX
|
|
initialization when running under a QEMU machine.
|
|
|
|
The QEMU platforms now expose a QUIRK_NO_RNG in the chip. Testing the
|
|
"qemu,powernv" property is not necessary anymore.
|
|
- plat/qemu: add a POWER8 and POWER9 platform
|
|
|
|
These new QEMU platforms have characteristics closer to real OpenPOWER
|
|
systems that we use today and define a different BMC depending on the
|
|
CPU type. New platform properties are introduced for each,
|
|
"qemu,powernv8", "qemu,powernv9" and these should be compatible with
|
|
existing QEMUs which only expose the "qemu,powernv" property.
|
|
- libc/string: speed up common string functions
|
|
|
|
Use compiler builtins for the string functions, and compile the
|
|
libc/string/ directory with -O2.
|
|
|
|
This reduces instructions booting skiboot in mambo by 2.9 million in
|
|
slow-sim mode, or 3.8 million in normal mode, for less than 1kB image size
|
|
increase.
|
|
|
|
This can result in the compiler warning more cases of string function
|
|
problems.
|
|
- external/mambo: Add an option to exit Mambo when the system is shutdown
|
|
|
|
Automatically exiting can be convenient for scripting. Will also exit
|
|
due to a HW crash (e.g. an unhandled exception).
|
|
|
|
VESNIN platform
---------------
|
|
|
|
Since skiboot v6.3:
|
|
|
|
- platforms/vesnin: PCI inventory via IPMI OEM
|
|
|
|
Replace raw protocol with OEM message supported by OpenBMC's IPMI
|
|
plugins.
|
|
|
|
BMC-side implementation (IPMI plug-in):
|
|
https://github.com/YADRO-KNS/phosphor-pci-inventory
|
|
|
|
Utilities
---------
|
|
|
|
Since skiboot v6.3:
|
|
|
|
- opal-gard: Account for ECC size when clearing partition
|
|
|
|
When 'opal-gard clear all' is run, it works by erasing the GUARD then
|
|
using blocklevel_smart_write() to write nothing to the partition. This
|
|
second write call is needed because we rely on libflash to set the ECC
|
|
bits appropriately when the partition contained ECCed data.
|
|
|
|
The API for this is a little odd with the caller specifying how much
|
|
actual data to write, and libflash writing size + size/8 bytes
|
|
since there is one additional ECC byte for every eight bytes of data.
|
|
|
|
We currently do not account for the extra space consumed by the ECC data
|
|
in reset_partition() which is used to handle the 'clear all' command.
|
|
This results in the partition following the GUARD partition being
|
|
partially overwritten when the command is used. This patch fixes the
|
|
problem by reducing the length we would normally write by the number
|
|
of ECC bytes required.
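The arithmetic behind the fix can be checked with a short sketch. It assumes only what the text states, one ECC byte for every eight data bytes; the helper names are made up for this example and are not the libflash API.

```python
ECC_GROUP = 8  # number of data bytes protected by one ECC byte


def size_with_ecc(data_len):
    """Bytes actually written to flash for `data_len` bytes of data."""
    return data_len + data_len // ECC_GROUP


def size_minus_ecc(raw_len):
    """Largest 8-byte-aligned data length whose ECC-expanded size fits in raw_len."""
    return (raw_len // (ECC_GROUP + 1)) * ECC_GROUP


def overrun(partition_len, data_len):
    """Bytes written past the end of the partition (0 if the write fits)."""
    return max(0, size_with_ecc(data_len) - partition_len)
```

Asking for a write of the full partition length overruns it by one eighth of its size, whereas writing ``size_minus_ecc(partition_len)`` bytes of data stays inside the partition.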
|
|
|
|
|
|
Build and debugging
-------------------
|
|
|
|
Since skiboot v6.3:
|
|
|
|
- Disable -Waddress-of-packed-member for GCC9
|
|
|
|
We throw a bunch of errors in errorlog code otherwise, which we should
|
|
fix, but we don't *have* to yet.
|
|
|
|
- Fix a lot of sparse warnings
|
|
- With new GCC comes larger GCOV binaries
|
|
|
|
So we need to change our heap size to make more room for data/bss
|
|
without having to change where the console is or have more fun moving
|
|
things about.
|
|
- Intentionally discard fini_array sections
|
|
|
|
Produced in a SKIBOOT_GCOV=1 build, and never called by skiboot.
|
|
- external/trace: Add follow option to dump_trace
|
|
|
|
When monitoring traces, an option like the tail command's '-f' (follow)
|
|
is very useful. This option continues to append to the output as more
|
|
data arrives. Add an '-f' option to allow dump_trace to operate
|
|
similarly.
|
|
|
|
Tail also provides a '-s' (sleep time) option that
|
|
accompanies '-f'. This controls how often new input will be polled. Add
|
|
a '-s' option that will make dump_trace sleep for N milliseconds before
|
|
checking for new input.
|
|
- external/trace: Add support for dumping multiple buffers
|
|
|
|
dump_trace can only dump one trace buffer at a time. It would be handy
|
|
to be able to dump multiple buffers and to see the entries from these
|
|
buffers displayed in correct timestamp order. Each trace buffer is
|
|
already sorted by timestamp so use a heap to implement an efficient
|
|
k-way merge. Use the CCAN heap to implement this sort. However, the CCAN
|
|
heap does not have a 'heap_replace' operation. We need to 'heap_pop'
|
|
then 'heap_push' to replace the root which means rebalancing twice
|
|
instead of once.
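
A minimal sketch of the k-way merge idea, assuming each buffer is simply an ascending array of timestamps. It uses a small hand-written binary min-heap rather than the CCAN heap, and every name here (``struct cursor``, ``merge_buffers``, ``MAX_BUFS``) is hypothetical. As in the description above, replacing the root is done as a pop followed by a push.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BUFS 16	/* illustrative limit; extra buffers are ignored */

/* One reader per trace buffer; ts[] is already sorted ascending. */
struct cursor {
	const uint64_t *ts;
	size_t len;
	size_t pos;
};

struct min_heap {
	struct cursor *slot[MAX_BUFS];
	size_t count;
};

/* Timestamp of the next unread entry of a cursor. */
static inline uint64_t head(const struct cursor *c)
{
	return c->ts[c->pos];
}

static inline void swap_slots(struct min_heap *h, size_t a, size_t b)
{
	struct cursor *tmp = h->slot[a];

	h->slot[a] = h->slot[b];
	h->slot[b] = tmp;
}

static inline void heap_push(struct min_heap *h, struct cursor *c)
{
	size_t i = h->count++;

	h->slot[i] = c;
	/* Sift the new entry up until its parent is not larger. */
	while (i > 0 && head(h->slot[i]) < head(h->slot[(i - 1) / 2])) {
		swap_slots(h, i, (i - 1) / 2);
		i = (i - 1) / 2;
	}
}

static inline struct cursor *heap_pop(struct min_heap *h)
{
	struct cursor *top = h->slot[0];
	size_t i = 0;

	h->slot[0] = h->slot[--h->count];
	/* Sift the moved entry down until both children are not smaller. */
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, m = i;

		if (l < h->count && head(h->slot[l]) < head(h->slot[m]))
			m = l;
		if (r < h->count && head(h->slot[r]) < head(h->slot[m]))
			m = r;
		if (m == i)
			break;
		swap_slots(h, i, m);
		i = m;
	}
	return top;
}

/* Merge k sorted buffers into out[]; returns the number of entries. */
static inline size_t merge_buffers(struct cursor *bufs, size_t k, uint64_t *out)
{
	struct min_heap h = { .count = 0 };
	size_t n = 0;

	for (size_t i = 0; i < k && i < MAX_BUFS; i++)
		if (bufs[i].pos < bufs[i].len)
			heap_push(&h, &bufs[i]);

	while (h.count > 0) {
		struct cursor *c = heap_pop(&h);	/* first rebalance */

		out[n++] = c->ts[c->pos++];
		if (c->pos < c->len)
			heap_push(&h, c);		/* second rebalance */
	}
	return n;
}
```

With k buffers and n entries in total, each entry costs one pop and at most one push, so the merge runs in O(n log k) time.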
|
|
- external/trace: mmap trace buffers in dump_trace
|
|
|
|
The current lseek/read approach used in dump_trace does not correctly
|
|
handle certain aspects of the buffers. It does not use the start and end
|
|
position that is part of the buffer so it will not begin from the
|
|
correct location. It does not move back to the beginning of the trace
|
|
buffer file as the buffer wraps around. It also does not handle the
|
|
overflow case of the writer overwriting where the reader is up to.
|
|
|
|
Mmap the trace buffer file so that the existing reading functions in
|
|
extra/trace.c can be used. These functions already handle the cases of
|
|
wrapping and overflow. This reduces code duplication and uses functions
|
|
that are already unit tested. However, this requires a kernel where the
|
|
trace buffer sysfs nodes are able to be mmapped (see
|
|
https://patchwork.ozlabs.org/patch/1056786/)
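
As a rough illustration of the mmap approach, the sketch below maps a whole file read-only using standard POSIX calls. The function name ``map_trace_file`` is hypothetical and this is not the dump_trace code; it also assumes that the file reports its size through ``fstat()``.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a whole file read-only. Returns the mapping
 * and stores its length in *len, or returns NULL on failure.
 */
static inline void *map_trace_file(const char *path, size_t *len)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return NULL;
	}
	/* MAP_SHARED so that later writes to the buffer stay visible. */
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping remains valid after close() */
	if (p == MAP_FAILED)
		return NULL;
	*len = (size_t)st.st_size;
	return p;
}
```

The caller would hand the returned pointer to the existing trace reading functions and release it with ``munmap()`` when done.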
|
|
- core/trace: Export trace buffers to sysfs
|
|
|
|
Every property in the device-tree under /ibm,opal/firmware/exports has a
|
|
sysfs node created in /firmware/opal/exports. Add properties with the
|
|
physical address and size for each trace buffer so they are exported.
|
|
- core/trace: Add pir number to debug_descriptor
|
|
|
|
The names given to the trace buffers when exported to sysfs should show
|
|
what cpu they are associated with to make it easier to understand their
|
|
output. The debug_descriptor currently stores the address and length of
|
|
each trace buffer and this is used for adding properties to the device
|
|
tree. Extend debug_descriptor to include a cpu associated with each
|
|
trace. This will be used for creating properties in the device-tree
|
|
under /ibm,opal/firmware/exports/.
|
|
- core/trace: Change trace buffer size
|
|
|
|
We want to be able to mmap the trace buffers to be used by the
|
|
dump_trace tool. As mmapping is done in terms of pages it makes sense
|
|
that the size of the trace buffers should be page aligned. This is
|
|
slightly complicated by the space taken up by the header at the
|
|
beginning of the trace and the room left for an extra trace entry at the
|
|
end of the buffer. Change the size of the buffer itself so that the
|
|
entire trace buffer size will be page aligned.
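
The sizing constraint can be sketched like this. The layout (header, then ring, then room for one extra entry) follows the description above, but the helper names and any concrete sizes passed to them are assumptions for illustration, not the values used in skiboot.

```c
#include <stddef.h>

#define PAGE_SIZE_64K 0x10000u

/* Total footprint of a trace buffer: header, ring, and the slack kept
 * at the end for one extra entry. */
static inline size_t total_size(size_t header_len, size_t ring_len,
				size_t max_entry_len)
{
	return header_len + ring_len + max_entry_len;
}

/* Hypothetical helper: ring size that makes the whole structure fill
 * exactly total_pages pages of 64K. */
static inline size_t ring_size_for(size_t total_pages, size_t header_len,
				   size_t max_entry_len)
{
	return total_pages * PAGE_SIZE_64K - header_len - max_entry_len;
}
```

The point is that the ring itself ends up with an odd-looking size so that the structure as a whole is a multiple of the page size.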
|
|
- core/trace: Change buffer alignment from 4K to 64K
|
|
|
|
We want to be able to mmap the trace buffers to be used by the
|
|
dump_trace tool. This means that the trace buffers must be page
|
|
aligned. Currently they are aligned to 4K. Most power systems have a
|
|
64K page size. On systems with a 4K page size, 64K aligned will still be
|
|
page aligned. Change the allocation of the trace buffers to be 64K
|
|
aligned.
|
|
|
|
The trace_info struct that contains the trace buffer is actually what is
|
|
allocated aligned memory. This means the trace buffer itself is not
|
|
actually aligned and this is the address that is currently exposed
|
|
through sysfs. To get around this, change the address that is exposed to
|
|
sysfs to be the trace_info struct. This means the lock in trace_info is
|
|
now visible too.
|
|
- external/trace: Use correct width integer byte swapping
|
|
|
|
The trace_repeat struct uses be16 for storing the number of repeats.
|
|
Currently be32_to_cpu conversion is used to display this member. This
|
|
produces an incorrect value. Use be16_to_cpu instead.
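
To see why the width matters, the sketch below models what the two conversions do on a little-endian host, where each is a plain byte swap of the given width. ``swap16`` and ``swap32`` are stand-ins written for this example, not the skiboot macros.

```c
#include <stdint.h>

/* What a 16-bit big-endian to CPU conversion does on a little-endian host. */
static inline uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

/* What a 32-bit big-endian to CPU conversion does on a little-endian host. */
static inline uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0xff00u) |
	       ((v << 8) & 0xff0000u) | (v << 24);
}
```

A big-endian 16-bit count of 3 is read by a little-endian CPU as the raw value 0x0300. ``swap16(0x0300)`` recovers 3, while ``swap32(0x0300)`` moves the byte into the upper half of the word and yields 0x00030000.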
|
|
- core/trace: Put boot_tracebuf in correct location.
|
|
|
|
A position for the boot_tracebuf is allocated in skiboot.lds.S.
|
|
However, without a __section attribute the boot trace buffer is not
|
|
placed in the correct location, meaning that it also will not be
|
|
correctly aligned. Add the __section attribute to ensure it will be
|
|
placed in its allocated position.
|
|
- core/lock: Add debug options to store backtrace of where lock was taken
|
|
|
|
Contrary to popular belief, skiboot developers are imperfect and
|
|
occasionally write locking bugs. When we exit skiboot, we check if we're
|
|
still holding any locks, and if so, we print an error with a list of the
|
|
locks currently held and the locations where they were taken.
|
|
|
|
However, this only tells us the location where lock() was called, which may
|
|
not be enough to work out what's going on. To give us more to go on with,
|
|
we can store backtrace data in the lock and print that out when we
|
|
unexpectedly still hold locks.
|
|
|
|
Because the backtrace data is rather big, we only enable this if
|
|
DEBUG_LOCKS_BACKTRACE is defined, which in turn is switched on when
|
|
DEBUG=1.
|
|
|
|
(We disable DEBUG_LOCKS_BACKTRACE in some of the memory allocation tests
|
|
because the locks used by the memory allocator take up too much room in the
|
|
fake skiboot heap.)
|
|
- libfdt: upgrade to upstream dtc.git 243176c
|
|
|
|
Upgrade libfdt/ to github.com/dgibson/dtc.git 243176c ("Fix bogus
|
|
error on rebuild")
|
|
|
|
This copies dtc/libfdt/ to skiboot/libfdt/, with the only change in
|
|
that directory being the addition of README.skiboot and Makefile.inc.
|
|
|
|
This adds about 14kB text, 2.5kB compressed xz. This could be reduced
|
|
or mostly eliminated by cutting out fdt version checks and unused
|
|
code, but tracking upstream is a bigger benefit at the moment.
|
|
|
|
This loses commits:
|
|
|
|
- 14ed2b842f61 ("libfdt: add basic sanity check to fdt_open_into")
|
|
- bc7bb3d12bc1 ("sparse: fix declaration of fdt_strerror")
|
|
|
|
As well as some similar prehistoric changes, which is the
|
|
punishment for us not being good downstream citizens and sending
|
|
things upstream! Syncing to upstream will make that effort simpler
|
|
in future.
|
|
|
|
General Fixes
|
|
-------------
|
|
|
|
Since skiboot v6.4-rc1:
|
|
|
|
- libflash: Fix broken continuations
|
|
|
|
Some of the libflash debug messages don't print a newline at the end of
|
|
the line and assume that the next print will be contiguous with the
|
|
last. This isn't true in skiboot since log messages are prefixed with a
|
|
timestamp. This results in funny looking output such as: ::
|
|
|
|
LIBFLASH: Verifying...
|
|
LIBFLASH: reading page 0x01963000..0x01964000...[3.084846885,7] same !
|
|
LIBFLASH: reading page 0x01964000..0x01965000...[3.086164489,7] same !
|
|
|
|
Fix this by moving the "same !" debug message to a new line with the
|
|
prefix "LIBFLASH: ..." to indicate it's a continuation of the last
|
|
statement.
|
|
|
|
First reported in https://github.com/open-power/skiboot/issues/51
|