The error occurred when switching from MAXIMUM PERFORMANCE to MAXIMUM AVAILABILITY Data Guard mode on an Exadata system. The primary RAC database, running on version 19c, crashed with an ORA-600 error code. The error mentioned that the redo log for group 45, sequence 509, was incorrectly located on DAX storage, causing issues. The incident details pointed to an internal error code [kfk_iodone_invalid_buffer], indicating that an I/O buffer did not pass HARD checks. The error was associated with the LGWR background process.

Changes
The issue arose after changing the Data Guard mode to MAXIMUM AVAILABILITY; switching to MAXIMUM PROTECTION did not trigger the ORA-600 error. Interestingly, a 12c database within the same Exadata rack did not experience the ORA-600 error. The problem manifested when the online or standby redo log files were located on Extreme Flash Cell or High Capacity Cell nodes. The system applied the following relevant patches: Patch 30165493, fixing log file fast sync parameters for PMEMLOG, Bug 31119057 associated with the ORA-600 error, and Bug 31305624 linked to instance crashes.

Solution
When transitioning to MAXIMUM AVAILABILITY Data Guard mode on Exadata, specific steps must be taken to resolve the ORA-600 [kfk_iodone_invalid_buffer] error.

Firstly, dynamic parameters must be set on all database instances: ‘_smart_log_threshold_usec‘ should be set to 0.

For Exadata systems with PMEM, ‘_exa_pmemlog_threshold_usec‘ must also be set to 0.

Furthermore, it is necessary to download and implement patch 31305624 from support.oracle.com.

Lastly, updating the system to version 19.6.0.0.200114DBRU or a higher version that includes the bug fix is part of the solution.

The symptoms include issues such as the ‘ALTER DATABASE OPEN’ process not completing as logged in the alert log file. DIA0 (Hang Manager reports) sessions are also blocked while waiting for ‘gc freelist.’ The hang manager reports instances waiting for ‘cursor: pin S wait on X’ and ‘gc freelist’, which can lead to extended waiting times and potentially block other sessions.

Diagnosis by MMAN and Development

MMAN (Memory Manager) indicates an ORA-4031 error related to the shared pool, potentially caused by the bug identified as BUG 31459369. This bug leads to multiple incidents of ORA-00600 [15709], [29] during parallel execution. Development has confirmed that SGA_TARGET usage can result in an imbalance in the number of Lock Elements (LE) assigned to LMS processes on NUMA machines, along with setting a minimum size for the buffer cache due to this bug.

Workaround Suggested

To address the issues caused by the bug, the recommended workaround is to establish a minimum size for the database buffer cache and shared pool. By setting these minimum sizes, the workaround aims to mitigate the effects of the bug identified as BUG 31459369, which triggers incidents of ORA-00600 [15709], [29] with parallel execution.

This error is triggered when trying to update a column in a table that is part of a correlated subquery involving the same table. This issue was encountered in Oracle databases of versions equal to or greater than 12.1 but below 12.2, with 12.1.0.2 being the confirmed affected version.

Description:
The error was caused by a regression that prevented the correct marking of the compare column in cases where there is a correlation column from the same table but a different view. This regression led to the ORA-00600 [qeselupdpre_20] error when executing certain update statements within the specified version range of Oracle databases.

Version Affected:
The problem affects Oracle database versions from 12.1 to 12.2, with the specific version 12.1.0.2 (Server Patch Set) confirmed to experience this issue. It’s crucial to note that this problem does not extend to versions 12.2 and above.

Workaround:
Unfortunately, there is no available workaround for the ORA-00600 [qeselupdpre_20] error in the affected versions of the Oracle database. Users encountering this issue are advised to proceed directly to the fix provided by Oracle.

Fixed:
The fix for the bug causing the ORA-00600 [qeselupdpre_20] error is first included in Oracle database version 12.2.0.1 (Base Release). Users experiencing this problem should consider upgrading their Oracle database to version 12.2.0.1 or later to resolve this issue.

Bug Description

The bug described in Bug 32043701 involves row cache locks for sequences in Real Application Clusters (RAC) due to the S-optimization feature for dc_sequences. This bug creates locks on the data dictionary rows, primarily to serialize changes to the data dictionary and wait for a lock on a data dictionary cache.

How to Fix
To fix this bug, the workaround is to turn off the S-optimization feature for the sequences (dc_sequences) cache. This can be achieved by explicitly listing the S-optimized enqueue types in the _lm_share_lock_restype config parameter and skipping the enqueue type QN. The specific setting to address this bug is to set _lm_share_lock_restype=’LNQAQBQCQDQEQFQGQHQIQJQKQLQMQOQPQQQRQSQTQUQVQWQXQYQZ’. It is crucial to ensure that this setting has the same value in all instances to resolve the issue reported in bug 32043701 effectively.

ORA-04091 is an Oracle error code that indicates that a trigger, stored procedure, or package has encountered a mutating table error. This error occurs when a trigger or procedure tries to modify a table that is already being modified by the same transaction.

To resolve the ORA-04091 error, you can consider the following approaches:

1. Restructure the code: Review the trigger or procedure code to see if there is a way to rewrite it without causing the mutating table error. This may involve using a compound trigger, autonomous transactions, or other techniques to avoid the conflict.

2. Use a pragma directive: You can use the PRAGMA AUTONOMOUS_TRANSACTION directive to create an autonomous transaction within the trigger or procedure. This allows the trigger or procedure to perform its operations independently of the main transaction, avoiding the mutating table error.

3. Use a Statement-level trigger: To avoid changing table errors, use triggers at the statement level instead of at the row level. Statement-level triggers fire once for each DML statement, not for each affected row. This ensures that they do not cause conflicts with the table state.

It is important to carefully analyze the code and the specific scenario to determine the most appropriate solution for the ORA-04091 error.

Buffer busy waits occur when multiple processes compete for access to specific blocks in the buffer cache of an Oracle database. In order to address this issue, it is essential to identify which blocks are causing contention and understand the reasons behind it. The most effective solution is to eliminate the root cause of the contention.

In many cases, buffer busy waits for data blocks occur because multiple processes are repeatedly accessing the same blocks. For example, if several users are scanning the same index, the first session quickly processes the blocks that are already in the buffer cache. However, when a block needs to be read from disk, other sessions scanning the same index catch up and also require that block. As a result, they have to wait for the buffer cache since another session is already reading the block from disk.

To mitigate buffer busy waits, it is important to analyze and optimize the access patterns to these blocks. This may involve techniques like tuning queries, optimizing indexing strategies, or reconsidering the nature of concurrent operations on the database. By resolving the contention and optimizing these operations, the buffer busy waits can be reduced, leading to improved performance and reduced wait times for accessing the buffer cache.

Oracle Data Guard is a powerful, high-availability solution that ensures Oracle databases’ continuous availability and protection. Optimizing Oracle Net, which handles the network communication between the primary and standby systems, is essential to achieve optimal performance. This article will focus on optimizing Oracle Net for Data Guard by adjusting the TCP send and receive socket buffers, modifying session data unit (SDU) size, and disabling the TCP Nagle algorithm.

To achieve high network throughput, it is recommended to set the TCP send and receive socket buffers to the bandwidth-delay product (BDP) of the network link between the primary and standby systems. The BDP is calculated as the product of the network bandwidth and latency. For example, if the bandwidth is 622 Mbit/s per second and the latency is 30 ms, the minimum recommended size for the TCP socket buffers would be 6,997,500 bytes. This value can be set using the Oracle Net parameters RECV_BUF_SIZE and SEND_BUF_SIZE.

Oracle Net buffers data into session data units (SDUs) before sending them to the network layer. By default, the SDU size is set to 8192 bytes, which might be insufficient for Data Guard operations that involve sending redo data in larger chunks. To improve performance, increasing the SDU size to 64 KB is recommended. This ensures that Data Guard can send larger pieces of redo data without chopping them up and causing additional overhead.

To preempt delays in buffer flushing in the TCP protocol stack, it is advised to disable the TCP Nagle algorithm. This can be achieved by setting TCP.NODELAY to YES in the sqlnet.ora file on both the primary and standby systems. Disabling the Nagle algorithm helps reduce latency and improve the network communication’s responsiveness.

It is important to note that the actual values of the send and receive socket buffer sizes may be limited by the host operating system or memory constraints. The default values for these parameters are specific to the operating system. On Linux, for example, the default values are 128 KB for SEND_BUF_SIZE and 174,700 bytes for RECV_BUF_SIZE. These default values are often modified to higher when the oracle-database-preinstall-19c package is installed, but they are still lower than the recommended values for Data Guard.

To determine the default and maximum amount of receive and send socket memory, you can use the following commands on the Linux operating system:
– To check the receive socket memory:
# cat /proc/sys/net/core/rmem_default
# cat /proc/sys/net/core/rmem_max
– To check the send socket memory:
# cat /proc/sys/net/core/wmem_default
# cat /proc/sys/net/core/wmem_max

If the default values are lower than recommended, you can modify the /etc/sysctl.conf file. For example, you can set the maximum send and receive socket memory to 10 MB (10485760 bytes) with the following commands:
# echo ‘net.core.wmem_max=10485760’ >> /etc/sysctl.conf
# echo ‘net.core.rmem_max=10485760’ >> /etc/sysctl.conf

Additionally, you must set the socket memory’s minimum, initial, and maximum sizes in bytes. Use the following commands:
# echo ‘net.ipv4.tcp_rmem=10240 131072 10485760’ >> /etc/sysctl.conf
# echo ‘net.ipv4.tcp_wmem=10240 131072 10485760’ >> /etc/sysctl.conf

Once the changes are made, reload the modified settings with the command:
# sysctl -p

It is worth mentioning that starting from Oracle Database 12c Release 1 (12.1), Oracle Net supports larger session data unit (SDU) sizes, with an upper limit of 2 MB. The larger SDU size can benefit networks with high bandwidth-delay products and ample host resources, allowing for better utilization of the available network bandwidth. However, the recommended SDU size for Data Guard is 64 KB. This recommendation is subject to change based on ongoing testing and future advancements.

By optimizing Oracle Net for Data Guard, you can significantly improve the performance and efficiency of your Data Guard configuration. Implementing the recommended settings for TCP socket buffers SDU size, and disabling the TCP Nagle algorithm will ensure smoother and faster network communication between the primary and standby systems, ultimately enhancing your Oracle databases’ overall reliability and availability.

For more information, see:

https://docs.oracle.com/en/database/oracle/oracle-database/19/sbydb/index.html

https://education.oracle.com/oracle-database-19c-data-guard-administration-workshop/courP_86809290

 

 

If Performance Objectives are not being met, Quality of Service Management makes a recommendation. Each recommendation focuses on improving the highest-ranked Performance Class by exceeding its Performance Objective. Submissions may include changing consumer group mappings – and reprioritizing work within existing resource boundaries. For example, changing consumer group mappings may involve promoting a specific workload to get a more significant share of resources, or demoting a competing workload to make additional resources available to the target Performance Class. Another recommendation is to move servers between server pools and reprioritize resources between them to meet workload demands; so effectively, taking a node out of one pool and adding it to another pool gives more resources to the Performance Class running in that pool. And another recommendation is moving CPUs between databases within a server pool – reprioritize CPU resources within existing server pool boundaries. And this is called instance caging, where the CPU count parameter is set to limit the amount of CPUs an instance can use on a node.

The Quality of Service Management recommendations to improve the performance of a particular Performance Class adds more of the bottleneck resource – such as CPU time – for that Performance Class, making the bottleneck resource available more quickly to work requests in the Performance Class. Adding more resources to a Performance Class that is not performing well means taking resources away from another Performance Class. The Performance Class where the resources are removed should be less business-critical than the one being helped. So overall, the reallocation of resources should be beneficial to the business. When generating recommendations, Quality of Service Management evaluates the impact of the system performance as a whole. For example, suppose the improvement for one Performance Class is rather tiny, but the adverse effects on another Performance Class are significant. In that case, Quality of Service Management might report that the performance gain is too small and not recommended. If there is more than one way to resolve the bottleneck, Quality of Service Management advises the best overall recommendation. It is invariable, such as the calculated impact on all the Performance Classes and the predicted disruption and settling time associated with the action. And using Oracle Enterprise Manager, you can view the current and the alternative recommendations. Performance data is sent to Oracle Enterprise Manager for display on the Quality of Service Management Dashboard and Performance History pages. By default, Oracle Database QoS Management does not automatically implement recommendations. Instead, it suggests improving performance, which the administrator must then implement by clicking the Implement button. From version 12.1.0.2, Quality of Service Management allows you to specify authorized automatic actions that it can implement without the intervention of an administrator.

When patching an Oracle home, you might run into problems. OPatch might not behave as expected. It is helpful to know where to look for error messages or find additional information to pass on to Oracle Support if you want to log an SR for your problem. So OPatch maintains logs for apply, rollback, and lsinventory operations. The OPatch log files are located in the $ORACLE_HOME/cfgtoollogs/opatch. Each time you run OPatch, a new log file is created, and each log file is tagged with the operation’s timestamp. OPatch maintains an index of processed commands and log files in the opatch_history.txt file – and that is also in the above-mentioned $ORACLE_HOME/cfgtoollogs/opatch directory. So if you change the directory to $ORACLE_HOME/cfgtoollogs/opatch, you’ll see that every time you run OPatch, a log file is created with a date stamp. And then, at the bottom, you’ll see an opatch_history.txt file. look at the file, you’ll see a record of each time you ran the opatch apply, opatch rollback, or lsinventory command. The DBMS_QOPATCH package provides a PL/SQL or a SQL interface to view the installed database patches.

The package returns the patch and patch metadata information available as part of the “opatch lsinventory -xml” command in real-time. So it’s basically a way to see which patches are applied to your database home but from a PL/SQL or a SQL interface. So you basically get an XML-formatted return of your patch information. The DBMS_QOPATCH package allows users to query what patches are installed from SQL*Plus, write wrapper programs to create reports and do validation checks across multiple environments, and also to check patches installed on cluster nodes from a single location. If you log into SQL*Plus as the sys user and then perform select DBMS_QOPATCH.GET_OPATCH_LSINVENTORY from dual. And you’ll see just lots of XML information – which you can use an XML parser to make sense of it.

You query V$INSTANCE_RECOVERY view and consistently receive an OPTIMAL_LOGFILE_SIZE value that is greater than the size of your smallest online redo log file. The OPTIMAL_LOGFILE_SIZE column of the V$INSTANCE_RECOVERY view can be used to determine the appropriate size for all of the online redo log files in your database. If the value of the OPTIMAL_LOGFILE_SIZE column is greater than the size of your smallest online redo log file, you should change the size of all online redo log files to be at least this value. In addition, the FAST_START_MTTR_TARGET initialization parameter simplifies the configuration of recovery time from instance or system failure. After adjusting the size of your online redo log files, you may be able to adjust the value of this initialization parameter for better performance. This is done by rerunning the MTTR advisor after changing the size of your online redo log file to achieve more optimal results. However, running the MTTR advisor is not the best option in this situation for improving instance recovery performance.