The error occurred when switching from MAXIMUM PERFORMANCE to MAXIMUM AVAILABILITY Data Guard mode on an Exadata system. The primary RAC database, running on version 19c, crashed with an ORA-600 error code. The error mentioned that the redo log for group 45, sequence 509, was incorrectly located on DAX storage, causing issues. The incident details pointed to an internal error code [kfk_iodone_invalid_buffer], indicating that an I/O buffer did not pass HARD checks. The error was associated with the LGWR background process.

Changes
The issue arose after changing the Data Guard mode to MAXIMUM AVAILABILITY; switching to MAXIMUM PROTECTION did not trigger the ORA-600 error. Interestingly, a 12c database within the same Exadata rack did not experience the ORA-600 error. The problem manifested when the online or standby redo log files were located on Extreme Flash Cell or High Capacity Cell nodes. The system applied the following relevant patches: Patch 30165493, fixing log file fast sync parameters for PMEMLOG, Bug 31119057 associated with the ORA-600 error, and Bug 31305624 linked to instance crashes.

Solution
When transitioning to MAXIMUM AVAILABILITY Data Guard mode on Exadata, specific steps must be taken to resolve the ORA-600 [kfk_iodone_invalid_buffer] error.

Firstly, dynamic parameters must be set on all database instances: ‘_smart_log_threshold_usec‘ should be set to 0.

For Exadata systems with PMEM, ‘_exa_pmemlog_threshold_usec‘ must also be set to 0.

Furthermore, it is necessary to download and implement patch 31305624 from support.oracle.com.

Lastly, updating the system to version 19.6.0.0.200114DBRU or a higher version that includes the bug fix is part of the solution.

The symptoms include issues such as the ‘ALTER DATABASE OPEN’ process not completing as logged in the alert log file. DIA0 (Hang Manager reports) sessions are also blocked while waiting for ‘gc freelist.’ The hang manager reports instances waiting for ‘cursor: pin S wait on X’ and ‘gc freelist’, which can lead to extended waiting times and potentially block other sessions.

Diagnosis by MMAN and Development

MMAN (Memory Manager) indicates an ORA-4031 error related to the shared pool, potentially caused by the bug identified as BUG 31459369. This bug leads to multiple incidents of ORA-00600 [15709], [29] during parallel execution. Development has confirmed that SGA_TARGET usage can result in an imbalance in the number of Lock Elements (LE) assigned to LMS processes on NUMA machines, along with setting a minimum size for the buffer cache due to this bug.

Workaround Suggested

To address the issues caused by the bug, the recommended workaround is to establish a minimum size for the database buffer cache and shared pool. By setting these minimum sizes, the workaround aims to mitigate the effects of the bug identified as BUG 31459369, which triggers incidents of ORA-00600 [15709], [29] with parallel execution.

ORA-04091 is an Oracle error code that indicates that a trigger, stored procedure, or package has encountered a mutating table error. This error occurs when a trigger or procedure tries to modify a table that is already being modified by the same transaction.

To resolve the ORA-04091 error, you can consider the following approaches:

1. Restructure the code: Review the trigger or procedure code to see if there is a way to rewrite it without causing the mutating table error. This may involve using a compound trigger, autonomous transactions, or other techniques to avoid the conflict.

2. Use a pragma directive: You can use the PRAGMA AUTONOMOUS_TRANSACTION directive to create an autonomous transaction within the trigger or procedure. This allows the trigger or procedure to perform its operations independently of the main transaction, avoiding the mutating table error.

3. Use a Statement-level trigger: To avoid changing table errors, use triggers at the statement level instead of at the row level. Statement-level triggers fire once for each DML statement, not for each affected row. This ensures that they do not cause conflicts with the table state.

It is important to carefully analyze the code and the specific scenario to determine the most appropriate solution for the ORA-04091 error.