
Thursday, June 24, 2010

ACL and XDB Problems

Problem statement:

After upgrading a database from Oracle 11.1.0.6 to 11.1.0.7, an XDB installation was performed to be able to send emails from the database.
As of Oracle 11g, sending email requires the creation of an Access Control List (ACL), which controls which database users may access a remote host for relaying (or sending) the email.

On the Internet there are scripts available to create such an ACL.

We used the script below to first create a procedure that takes care of all the steps involved in creating an ACL:

create or replace procedure mailserver_acl(
  aacl       varchar2,
  acomment   varchar2,
  aprincipal varchar2,
  aisgrant   boolean,
  aprivilege varchar2,
  aserver    varchar2,
  aport      number)
is
begin
  -- Drop the ACL if it already exists
  begin
    DBMS_NETWORK_ACL_ADMIN.DROP_ACL(aacl);
    dbms_output.put_line('ACL dropped.....');
  exception
    when others then
      dbms_output.put_line('Error dropping ACL: ' || aacl);
      dbms_output.put_line(sqlerrm);
  end;
  -- Create the ACL and grant the privilege to the principal
  begin
    DBMS_NETWORK_ACL_ADMIN.CREATE_ACL(aacl, acomment, aprincipal, aisgrant, aprivilege);
    dbms_output.put_line('ACL created.....');
  exception
    when others then
      dbms_output.put_line('Error creating ACL: ' || aacl);
      dbms_output.put_line(sqlerrm);
  end;
  -- Assign the ACL to the mail server and port
  begin
    DBMS_NETWORK_ACL_ADMIN.ASSIGN_ACL(aacl, aserver, aport);
    dbms_output.put_line('ACL assigned.....');
  exception
    when others then
      dbms_output.put_line('Error assigning ACL: ' || aacl);
      dbms_output.put_line(sqlerrm);
  end;
  commit;
  dbms_output.put_line('ACL committed.....');
end;

/


Now we used the following call to this procedure to create an ACL for a user:

begin
  mailserver_acl(
    'mailserver_acl.xml',  -- Name of the ACL list
    'ACL used for Email',  -- Just some comment about what this list is/does
    'SCOTT',               -- The user/role to grant this privilege to
    TRUE,                  -- Is the privilege granted (TRUE) or denied (FALSE)
    'connect',             -- Which privilege is granted
    'our.mail.server',     -- To which server
    25);                   -- On which port
end;
/


The first time the result was:
ORA-46105: Unable to load security class DAV::dav

Looking at the database objects, an Invalid package was detected and recompiled.
After recompiling the package and retrying the procedure, the below problem appeared:
ORA-46105: Unable to load security class http://xmlns.oracle.com/plsql:network


Some googling around suggested that the problem lay in a failed XDB installation. This turned out to be true: there were still some invalid XDB objects that couldn't be compiled.
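
To see for yourself whether any XDB objects are invalid, a query along these lines against DBA_OBJECTS can be used (a minimal sketch; run it as SYSDBA):

SQL> select owner, object_name, object_type, status
     from   dba_objects
     where  owner = 'XDB'
     and    status = 'INVALID';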


The solution:

The below steps brought us to a solution. It took an SR (Service Request) with Oracle Support to get the steps right, because the Notes aren't that clear:

First make a note of the tablespace used by the XDB user in the database.
Also write down the name of the TEMP tablespace of that database.
These values will be needed in the reinstallation steps.
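
If in doubt, these values can be looked up with a simple query against DBA_USERS, for example:

SQL> select username, default_tablespace, temporary_tablespace
     from   dba_users
     where  username = 'XDB';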

First deinstall the current XDB installation. Refer to Note 742014.1 if you're using applications like APEX or Oracle Multimedia.

$ sqlplus /nolog
SQL> conn / as sysdba
SQL> spool xdb_reins.log
SQL> shutdown immediate
SQL> startup;
SQL> @?/rdbms/admin/catnoqm.sql


This will completely deinstall the XDB installation and also drop the XDB user.
Now shut down the database again, then exit and restart SQL*Plus.

SQL> shutdown immediate
SQL> quit;

It is very important at this stage to exit and restart SQL*Plus.
Use the statements below to reinstall the XDB packages.

Note that you need to supply the values noted above as arguments to the SQL script.

In our case we entered 'xdb' for the password of the XDB user, 'sysaux' for the tablespace, and 'temp' for the temporary tablespace.

SQL> conn / as sysdba
SQL> spool xdb_reins.log append
SQL> startup;
SQL> @?/rdbms/admin/catqm.sql <xdb> <target tablespace> <temp tablespace>
SQL> spool off
SQL> quit
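
To verify afterwards that the reinstallation registered XDB correctly, the component status can be checked in DBA_REGISTRY, for instance:

SQL> select comp_id, comp_name, version, status
     from   dba_registry
     where  comp_id = 'XDB';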

This is where the Oracle Note about reinstalling XDB ends.

It is thanks to the good assistance of Oracle Support (and no, I don't get paid for this statement) that our reinstallation succeeded.

There is another step to take!!
Here is what it is:

You have to re-register the XML components of the EXFSYS schema.
Note 558834.1 discusses these steps, but I'll also list them here.
I strongly recommend reading the Note first, before executing these steps!!


-- Register XML Schemas for the Rules Manager component (part of CATPROC)

SQL> connect / as sysdba
SQL> alter session set current_schema = EXFSYS;
SQL> @?/rdbms/admin/rulpbs.sql

SQL> commit;


Note 558834.1 also shows steps for other schemas, but those only need to be executed for Oracle interMedia.

After all these steps and another restart of the database, our ACL could be created and we were able to start using the UTL_MAIL package.
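
Once the ACL creation succeeds, its contents can be verified with queries like these against the DBA_NETWORK_ACLS and DBA_NETWORK_ACL_PRIVILEGES views:

SQL> select acl, host, lower_port, upper_port
     from   dba_network_acls;

SQL> select acl, principal, privilege, is_grant
     from   dba_network_acl_privileges;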

Enable Flashback for RAC

Enabling Flashback in RAC is just a bit more complex than enabling flashback in a single instance.
First we need to make sure that:

- Our RAC database is running in ARCHIVELOG mode. This can be checked using:

SQL> select log_mode from v$database;

LOG_MODE
--------------
ARCHIVELOG

- The database is using some kind of shared storage, preferably ASM.
In the statements below a diskgroup of ASM is used. Remember to use the '+' sign in front of the diskgroup name, like '+FB01'.


Now we need to set the required parameters in the spfile to support the flashback function:

SQL> alter system set db_recovery_file_dest_size=20G scope=spfile sid='*';
SQL> alter system set db_recovery_file_dest='+FB01' scope=spfile sid='*';
SQL> alter system set db_flashback_retention_target=1440 scope=spfile sid='*';


The value for db_flashback_retention_target represents the period (in minutes) for which you want to be able to flash back the data. This value is of course restricted by the available disk space. The default is 1440 minutes.
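
Once flashback is running, a rough idea of how much space the chosen retention will need can be obtained from V$FLASHBACK_DATABASE_LOG, for example:

SQL> select retention_target,
            flashback_size/1024/1024           as current_mb,
            estimated_flashback_size/1024/1024 as estimated_mb
     from   v$flashback_database_log;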

Now we need to shut down all instances of the RAC database. This means the complete database will be down during this action. If the database name is RACDB, the statement to shut down the database looks like:

$ srvctl stop database -d RACDB

On one of the nodes, start an instance in mount mode:

SQL> startup mount;

Then turn flashback on using the command:

SQL> alter database flashback on;

Verify the setting with the below SQL:

SQL> select log_mode, flashback_on from v$database;

LOG_MODE     FLASHBACK_ON
------------ ------------------
ARCHIVELOG   YES

Now shutdown the instance again and restart the cluster database:

SQL> shutdown immediate
SQL> quit;

$ srvctl start database -d RACDB

Remember that a shutdown of the database may also have stopped the resources for transparent application failover. These then need to be restarted as well.

This concludes the enabling of flashback on RAC.


Bug 7334226: RAC instance crash possible with FLASHBACK on

Please look up Note 7334226.8 on the Oracle Support site.
It states that all RAC versions up to 11.1.0.7 suffer from possible instance crashes if flashback is enabled. The issue is fixed in patch set update 11.1.0.7.2 and in the 11.2.0.1 base release.

Wednesday, June 16, 2010

SSH equivalence on RAC

A couple of months ago, we set up our first RAC cluster, consisting of two LPARs on an IBM AIX Power6 server. Although it has been months, and the cluster has now been running a production database successfully for over a month, I do want to share the frustrations we've had with setting up the user equivalence needed for the installation.


For the newbies among us, first of all a short explanation of the phrase "SSH equivalence":

SSH equivalence means that across different servers a user, and especially its credentials and rights, is considered equal to the one on the local server.
With SSH equivalence configured, a logged-on user can access a remote server without a password, if and only if an account with the same settings also exists on that remote server.

SSH equivalence is needed during the installation of the RAC software. All parts of the Oracle installation (CRS, ASM and the RDBMS) are each installed on the local host and afterwards copied to all remote nodes using this SSH equivalence.


First of all, a short description of how to set up SSH equivalence. Of course, the Oracle installation manual has a more detailed explanation:

Generate a keypair on each node:
[crsprd@node1 ~]$ cd
[crsprd@node1 ~]$ mkdir .ssh
[crsprd@node1 ~]$ chmod 700 ~/.ssh
[crsprd@node1 ~]$ /usr/bin/ssh-keygen -t rsa

Create an authorized_keys file from the public keyfile:
[crsprd@node1 ~]$ cd .ssh
[crsprd@node1 ~]$ cat id_rsa.pub >> authorized_keys
[crsprd@node1 ~]$ ls

The result will be three files; one of them is the file "authorized_keys".
Copy the authorized_keys file to the other node(s):
First make sure the .ssh directory exists in the home directory of the user on each node.

[crsprd@node1 ~]$ scp authorized_keys node2:/opt/crsprd/ora/home/.ssh/

When asked whether to accept the RSA key, enter yes:
This will add the remote node to the local file called 'known_hosts'.
If this file does not exist, it is created automatically.

[crsprd@node1 ~]$ scp authorized_keys node2:/opt/crsprd/ora/home/.ssh/

The authenticity of host 'node2 (xxx.xxx.0.181) can't be established.
RSA key fingerprint is 7e:60:60:ae:40:40:d1:a6:f7:4e:zz:me:a7:48:ae:f6:7e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,xxx.xxx.0.181' (RSA) to the list of known hosts
authorized_keys 100% 828 7.5MB/s 00:00

Log on to the other node(s) as the same user and enter the .ssh directory.

[crsprd@node1 ~]$ ssh node2
The authenticity of host node2 (xxx.xxx.0.181) can’t be established.
RSA key fingerprint is z3:z3:33:z3:z3:33:zz:76:z3:z3:z3.
Are you sure you want to continue connecting? (yes/no)? yes

Add the local public key to the authorized keys file:
[crsprd@node2 ~]$ cd .ssh
[crsprd@node2 ~]$ cat id_rsa.pub >> authorized_keys

Repeat this action for every node in the cluster, until you've added the last node.
Then copy this file back to every node (in this example there are only two nodes).
The result must be that on every host the authorized_keys file is identical.

[crsprd@node2 ~]$ scp authorized_keys node1:/opt/crsprd/ora/home/.ssh/

Now you should have the SSH equivalence setup.
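
A quick way to verify the equivalence is to run a simple command over SSH in both directions; if no password or passphrase is asked, the setup is OK (the hostnames below are just this example's node names):

[crsprd@node1 ~]$ ssh node2 date
[crsprd@node2 ~]$ ssh node1 date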


We followed these steps from the Oracle installation manual, and then the problems began:

The first problem we encountered was that we were forced to enter the passphrase for the SSH connection every time, with every command.
As a result, none of the software could be installed.

Only with the assistance of an onsite Oracle consultant, and only after he saw me configuring the SSH equivalence, was the problem solved:

Don't enter a passphrase!!

Just press <Enter> when prompted.
Now that was a great thing to discover. We were trying to secure the SSH connection by means of this passphrase, whereas Oracle always expects an empty passphrase (at least according to this consultant).

As soon as we had discovered this, we were able to start the installation of the CRS, and it succeeded completely.

Then, however, problem two appeared. We were using separate accounts for CRS, ASM and the RDBMS. In the Oracle documentation we used, I never found a statement that, when using multiple users, all of these users need SSH equivalence separately.

Okay, okay... in hindsight I must say it is quite logical that all of these users need this ability. But until we found this out, I had been assuming that the CRS (and therefore the CRS owner) would be responsible for the distribution of the installation files across the nodes.


Wednesday, June 9, 2010

Create RAC database results in PRKP-1001 and CRS-0215

During the installation of a two-node RAC cluster we decided to install CRS and ASM at the latest stable Oracle level (11.1.0.7) and, for reasons of application compatibility, to install the database with release 10.2.0.4.

All installation actions went fine. CRS, ASM and the database installation had their own owners and installation directories; all environments were physically separated.
CRS, ASM and the listener were running OK.
However, during the creation of the 10.2.0.4 database we ran into trouble.

When using 'dbca' for the database creation, the final step of starting the just-created instance failed with the PRKP-1001 and CRS-0215 errors, giving no real clue of what was going on.

Thinking it must be a kind of compatibility problem, we tried all kinds of combinations.
Creating the database with the 10.2.0.4 software, but administering it with the 11.1.0.7 release: nothing really worked.
We even tried to install the database software with one of the other owners, just to rule out any possibility of conflicts, but that also didn't work.


After a couple of hours of digging through the logfiles and banging our heads against the wall, we solved the problem!!

It turned out that the use of a central TNS_ADMIN location, i.e. the location where the listener.ora and tnsnames.ora reside, wasn't correctly passed through to the running resources. Although 'dbca' was aware of this central location, it failed to update the cluster resources responsible for the database with this information.

It resulted in the resources looking for listener.ora and tnsnames.ora in $OH/network/admin.
Errors ORA-01078 and ORA-00119 appeared in the logs of the database.
The startup of the cluster resources could then be fixed by copying the listener.ora and tnsnames.ora to the $OH/network/admin directory.

But that was not the initial goal.
We had specifically created a central location for these files.

Then, finally, with the help of Oracle Support we also solved that challenge:
It is possible to tell the resources afterwards what the TNS_ADMIN location is.
This can be done by setting the environment variable TNS_ADMIN to the correct value for every resource involved, as shown below:

$> export TNS_ADMIN=/opt/oracle/network/admin
$> srvctl setenv db -d <dbname> -t TNS_ADMIN=/opt/oracle/network/admin
$> srvctl setenv inst -d <dbname> -i <instance1> -t TNS_ADMIN=/opt/oracle/network/admin
$> srvctl setenv inst -d <dbname> -i <instance2> -t TNS_ADMIN=/opt/oracle/network/admin

Executing these statements from one node only is sufficient, as this information is saved into the Oracle Cluster Registry.
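
To check afterwards that the environment variable really got stored, srvctl can list the saved environment for the database (a small sketch; substitute your own database name):

$> srvctl getenv database -d <dbname>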

Tuesday, June 8, 2010

LOCK_SGA and ASMM

With the introduction of ASMM (Oracle 10g) and AMM (Oracle 11g), the option to lock the SGA into memory has effectively fallen out of use: creating a database using dbca, especially with Oracle 11g, will turn on AMM by default.
By default lock_sga is set to FALSE.

If one favors the use of LOCK_SGA, AMM needs to be disabled. 
To do this, both MEMORY_TARGET & MEMORY_MAX_TARGET need to be removed from the init- or spfile.
For databases started with a plain-text init.ora file, edit the file and simply remove the lines with these two parameters.
If either one does not exist in the file, it doesn't matter; just remove the ones that are present.
Now add a value for SGA_TARGET, and optionally minimum values for the different buffers. 
Finally set LOCK_SGA to TRUE. 
Save the file and bounce the database. 
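
As an illustration, the relevant fragment of such an edited init.ora could look like this (the 4544M value is simply taken from the spfile example further down; pick your own sizes):

# AMM parameters removed (previously present):
# memory_target=4544M
# memory_max_target=4544M

sga_target=4544M
lock_sga=true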

For databases started with a spfile, follow the below instructions. First an example of how settings can look:

SQL> show parameter memory_target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
memory_target                        big integer 4544M

SQL> show parameter memory_max_target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
memory_max_target                    big integer 4544M

SQL> show parameter sga_target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
sga_target                           big integer 0

MEMORY_TARGET is set, and so is MEMORY_MAX_TARGET.
The latter most of the time calculates or inherits its value from MEMORY_TARGET.
SGA_TARGET is set to zero.

Now execute the following: 

SQL> alter system reset memory_target scope=spfile sid='*'; 

 System altered. 

SQL> alter system reset memory_max_target scope=spfile sid='*';
alter system reset memory_max_target scope=spfile sid='*'
*
ERROR at line 1:
ORA-32010: cannot find entry to delete in SPFILE

Most of the time there is no entry for MEMORY_MAX_TARGET in the spfile, so this error can be ignored.

SQL> alter system set sga_target=4544M scope=spfile;

System altered. 

SQL> alter system set lock_sga=true scope=spfile; 

System altered. 

Now bounce the database. 
Please note: It might be necessary for RAC systems to repeat the 'reset' statements specifically for each different instance name.
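
A hedged example of what such a per-instance reset could look like (the instance names are made up for this example):

SQL> alter system reset memory_target scope=spfile sid='RACDB1';
SQL> alter system reset memory_target scope=spfile sid='RACDB2';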

ASM Stripe on Stripe


We have a two-node RAC cluster running on AIX 5.3 (TL 10).
It is the first server running on ASM.
During a performance analysis we noticed that the RAC system with ASM shows more I/O latency than its predecessor, which was a DG system on normal AIX file systems.
ASM is configured with two diskgroups, each consisting of multiple disks. The disks are virtual raw devices, configured through an IBM VIO server and residing on a SAN. The SAN itself stripes its data across multiple disks using a RAID5 solution.
As is well known, ASM also uses striping as a method to spread data across the disks of a diskgroup.

In this post we are going to investigate whether the "extra" striping ASM does has a negative effect on I/O performance. Before we start, a short explanation of how the striping stacks up: because the disks offered to and used by ASM are virtual, they already exist in a striped manner on the real physical disks of the SAN. Any data within the ASM diskgroup is striped as well, so it is theoretically double striped.

For the test we use a similar system, although not a RAC, configured with the same ASM and RDBMS versions (11.1.0.7).
On the SAN storage six (6) virtual devices are configured: five (5) disks of 16Gb each, forming a striped ASM diskgroup, and one (1) disk of 64Gb that forms a diskgroup with just this single disk and therefore no striping. Both diskgroups use 'External' redundancy.
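
For reference, the diskgroup creation could look roughly like this when connected to the ASM instance (the device and diskgroup names are made up for this sketch):

SQL> create diskgroup DG_STRIPE external redundancy
     disk '/dev/rhdisk10', '/dev/rhdisk11', '/dev/rhdisk12',
          '/dev/rhdisk13', '/dev/rhdisk14';

SQL> create diskgroup DG_SINGLE external redundancy
     disk '/dev/rhdisk15';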

On the diskgroup using the 5 disks a database is created, called STRIPEDB.
On the single disk diskgroup a database is created, called NOSTRIPE.
Both databases are identical. Tablespace sizes and memory settings are equal.


First of all: what would we expect the result to be?
a) The striped diskgroup is faster
b) The non-striped diskgroup is faster
c) The speed of both diskgroups is the same

Before I started the tests, my bet was on b) or c).
That is because the extra overhead should cause extra wait time (b), or the SAN cache would even everything out and there would be no noticeable difference in I/O throughput (c).

The test itself consists of the following steps:
1) Create a 10Gb tablespace
2) Create a 20Gb tablespace
3) Create a 40Gb tablespace with two datafiles of 20Gb each
4) Import a medium-sized database dump (18Gb dumpfile) into the empty databases

For all of these steps, the SQL timings and/or the load time will be measured.
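
The tablespace timings can, for instance, simply be taken from SQL*Plus, roughly like this (tablespace and diskgroup names are purely illustrative):

SQL> set timing on
SQL> create tablespace ts_10g datafile '+DG_STRIPE' size 10G;
SQL> create tablespace ts_20g datafile '+DG_STRIPE' size 20G;
SQL> create tablespace ts_40g datafile '+DG_STRIPE' size 20G, '+DG_STRIPE' size 20G;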


The results:
At first the results were somewhat disappointing to me. The creation of both the 10Gb and 20Gb tablespaces was a few seconds faster on the striped diskgroup, but the difference was not large:

10Gb tablespace: 1:09 for the striped diskgroup and 1:11 for the non striped diskgroup
20Gb tablespace: 2:12 for the striped diskgroup and 2:13 for the non striped diskgroup

The creation of the 40Gb tablespace, however, gave better results for the non-striped diskgroup: on the striped diskgroup this took 4:41 minutes, while on the non-striped diskgroup it only took 4:17 minutes, so almost half a minute faster!!

Currently the large tests loading the dumpfiles are still running.


The first one, loading an 18Gb dump into the striped database, took almost 9 hours to complete. To be precise, it took 8:56 hrs.

The second import totally surprised me. It took an astonishing 14 hours to complete.

I'm still wondering whether the cause lies somewhere else. My feeling was that the extra overhead of stripe on stripe should have made I/O performance worse.
But it doesn't!!
It clearly improves the I/O performance, which will of course depend on the number of disks in the striped diskgroup.


Reason for this difference: the UNIX/SAN guys guessed that either the SAN cache helps, and/or the stripe size and layout happen to be a very good match. But in any case, writing in parallel to five disks must be faster than writing to a single disk, although all I/O is virtual.