Oracle Memory Leak Bug in IMPAC II Production System


The Version 8.1.6.3.1 Oracle Enterprise Edition RDBMS Engine contains a bug known as a "memory leak." Under heavy load, the Oracle RDBMS Engine fails to clear loaded, no longer used SQL Statements from the Oracle Memory Area known as the shared pool. This condition manifests itself in an ORA-4031 error that indicates a condition whereby Oracle is unable to load new SQL statements into memory. New sessions requiring attention from Oracle stack up, causing a general slowdown in performance. The performance degenerates to the degree that database shutdown and restart become unavoidable. The heavy user load and link account activities, along with the shared pool memory fragmentation, are two problems feeding on each other and compounding performance degradation.

To mitigate the problem, we have implemented every Oracle recommendation, but the problem has persisted. The DBA team has taken care of the fragmentation problem by writing a script to flush the shared pool area at regular intervals (five minutes) if it falls below a threshold level of 20%. When the shared pool is flushed, it deprives the system of often used, cached queries. This is in itself a performance-degrading factor. The more often it is flushed and the more often cached queries are inaccessible, the more performance suffers. The solution has worked for some weeks while we have been closely monitoring it. This does not eliminate the shared pool fragmentation but is meant as an interim solution until we upgrade to Oracle Version 8.1.7.1. Oracle told us that a complete solution to this problem is an upgrade to the new version.

However, we cannot upgrade to the new version at this time, for it requires significant planning for downtime and regression testing. We need to upgrade both machines and therefore, the databases will be exported and re-imported. When the link accounts and long ad hoc queries are issued to the database at peak hours, they put a large demand on the SGA. Shared pool memory fragmentation is accelerated, and the flushing mechanism is activated at an increasingly fast pace. When we confront a problem, we can identify a number of solutions, but can only implement them one at a time, for we need to observe the results and trace them back to the source. Implementation of a series of solutions at the same time prevents us from following the chain of causality. Presently, we are taking a parallel approach in which the DBAs are participating with SAs and QAs to test for the migration to Oracle Version 8.1.7.1. They have created an 8.1.7.1 version of IMPPRD on imapcg.

We are investigating the following solutions:

  1. We have examined memory availability and have concluded that we can increase the shared pool another 300 MB. We have changed the Oracle parameter based on this decision. The SAs will monitor the paging and swapping activities, and if adverse conditions are detected, we will release the appropriate memory.


  2. We will contact the Oracle developers and investigate the possibility of backporting the fix in the new Oracle version to the current version.


  3. The DBAs are asking User Support to consider restricting the Link Account activities to off-peak hours.


  4. A script was written and put in place to monitor and collect information about the SQL statements and hopefully identify and isolate the performance-degrading and unoptimized SQL statements that contribute to the shared pool memory fragmentation.


  5. Using old versions of ODBC may contribute to the fragmentation of the shared pool memory. A compatibility matrix between the middle-tier applications and database versions is being developed.