TRANSPARENT APPLICATION FAILOVER FOR RAC

RAC provides near-continuous availability by hiding failures from end-user clients and application server clients. This provides continuous, uninterrupted data access. Transparent Application Failover (TAF) is what applications use to sync up with Oracle RAC availability. TAF in the database reroutes application clients to an available database node in the cluster when the connected node fails. Application clients do not see error messages describing loss of service.

if the users connection to Node 1 dies, their transaction is rolled back but they can continue work without having to manually reconnect. To get a good understanding of how the TAF architecture works, it is helpful to walk through a failover scenario where a user is querying the database to retrieve 1000 rows from the database. Assume that the user is connected to Node 1/ Instance 1.

1. The heartbeat mechanism between the various nodes in the cluster checks to see if the other node is available and is participating in the cluster configuration.

2. Let’s assume that a user on his/her terminal performs a query against the database to retrieve 1000 rows from the database via Instance 1.

3. The initial 500-rows are retrieved from the database via Instance 1 and returned to the user for browsing through his/her interface.

4. While the user is browsing through the first 500 rows through his browser, Node 1 fails and does not participate in the clustered configuration.

5. Node 2 checks for the heartbeat of the other participating nodes deducts that Node 1 is not available.

6. While the user is unaware of the failure and scrolls through the remaining 500 rows on the window. The process try’s to connect to Instance 1 but detects that the Instance is not available.

7. Using the entries in the tnsnames.ora file the user connection to the other available node is established.

8. User is now connected to Node 2.

9. Oracle re-executes the query using the connection on Instance 2 and displays the remaining rows to the user. If the data was available in the buffer cache, the rows are returned back to the user instantaneously. However, if the rows are not available, Oracle has to perform an I/O operation. This would be delayed until the recovery process has completed.

when Node 1 fails, any SELECT statements that had partially executed on Instance 1 are migrated as part of the failover process and are displayed through Instance 2 when the user process fails over to Node 2. All this happens transparently without any interruption to the user. It should be noted that along with the ‘SELECT’ statement, the following are also failed over:

Client/server connection.

User session state.

Prepared statements.

Active cursors that have begun to return results to the user.

The benefits that such a feature adds in meeting the high availability requirements in today’s machine critical applications are overwhelming and the first question that arises is this, why did Oracle not introduce such a feature before or why is this feature not available among other databases? Though the mechanism is very useful in meeting today’s high availability requirements, implementing such a feature is complex, basically because the database connections are not stateless. This means that during the moment of failure, the database, the users and the transactions are in a specific state of operation, such as:

Retrieving data from the database

Database has a connection via Oracle Net to the Instance on Node 1

User connecting to the database has password and other user-authentication information

The session has language and character set information that is specific to the Instance on which the user has established connection

There are cursors open and in execution

'SELECT’ cursors are open and partially scrolled by the user

INSERT, UPDATE, and DELETE statements and PL/SQL procedures are being executed

It should be noted from the scenarios above that only SELECT statements are failed over from one node to another; transactional statements are not failed over by configuration of TAF. However, transactional statements can programmatically be transferred from Node 1 to Node 2 by proper validation of Oracle returned error messages and taking appropriate actions. This should be a preferred method to avoid any user interruptions as well as keeping the database or system failures transparent to the user. Among the transactional statements, the following do not failover or are not protected when a node fails:

PL/SQL server-side package variables

Global temporary tables

Effect of any ALTER SESSION statement

Applications not using the JDBC thick driver

Transactional statements, ie Statements that include INSERT, UPDATE and DELETE operations

While the failover is in process, it is user friendly to inform the user via the application interface that the activity or command issued may take some time. This information could be communicated by validating the various error messages returned by Oracle as part of the node and connection failure. Some of the common Oracle error codes that should be handled by the application to track and transfer transactional statements include:

• ORA-01012: not logged on to Oracle

• ORA-01033: Oracle initialization or shutdown in progress

• ORA-01034: ORACLE not available

• ORA-01089: immediate shutdown in progress – no operations are permitted

• ORA-03113: end-of-file on communication channel

• ORA-03114: not connected to ORACLE

• ORA-12203: TNS: unable to connect to destination

• ORA-12500: TNS: listener failed to start a dedicated server process

• ORA-12571: TNS: packet writer failure

Transparent Application Failover Configuration :

Using Oracle RAC and TAF Together

The continuous availability features of Oracle RAC and TAF come together when these products cooperate in restarting failed transactions. Let's take a closer look at how this works.
Within each connected Oracle Net client, tnsnames.ora file parameters define the failover types and methods for that client. The parameters direct Oracle RAC and TAF on how to restart any transactions that may be in-flight during a hardware failure on the node.
It is important to note that TAF failover control is external to the Oracle RAC cluster, and each Oracle Net client may have unique failover types and methods, depending on processing requirements. The following is a client tnsnames.ora file entry for a node, including its current TAF failover parameters:

bubba.world =
  (DESCRIPTION_LIST =
    (FAILOVER = true)
    (LOAD_BALANCE = true)
    (DESCRIPTION =
    (ADDRESS =
      (PROTOCOL = TCP)
      (HOST = redneck)(PORT = 1521))
      (CONNECT_DATA =
        (SERVICE_NAME = bubba)
        (SERVER = dedicated)
        (FAILOVER_MODE = 
           (BACKUP=cletus)
           (TYPE=select)           
           (METHOD=preconnect)
           (RETRIES=20)
           (DELAY=3)
        )
      )
    )

The failover_mode section of the tnsnames.ora file lists the parameters and their values:

BACKUP=cletus. This names the backup node that will take over failed connections when a node crashes. In this example, the primary server is bubba, and TAF will reconnect failed transactions to the clients instance in case of server failure.

TYPE=select. This tells TAF to restart all in-flight transactions from the beginning of the transaction (and not to track cursor states within each transaction).

METHOD=preconnect. This directs TAF to create two connections at transaction startup time: one to the primary bubba database and a backup connection to the clients database. In case of instance failure, the clients database will be ready to resume the failed transaction.

RETRIES=20. This directs TAF to retry a failover connection up to 20 times.

DELAY=3. This tells TAF to wait three seconds between connection retries.

Remember, you must set these TAF parameters in every tnsnames.ora file on every Oracle Net client that needs transparent failover.
Putting It All Together
An Oracle Net client can be a single PC or a huge application server. In the architectures of giant Oracle RAC systems, each application server has a customized tnsnames.ora file that governs the failover method for all connections that are routed to that application server.

Watching TAF in Action

The transparency of TAF operation is a tremendous advantage to application users, but DBAs need to quickly see what has happened and where failover traffic is going, and they need to be able to get the status of failover transactions. To provide this capability, the Oracle data dictionary has several new columns in the V$SESSION view that give the current status of failover transactions.
The following query calls the new FAILOVER_TYPE, FAILOVER_METHOD, and FAILED_OVER columns of the V$SESSION view. Be sure to note that the query is restricted to nonsystem sessions, because Oracle data definition language (DDL) and data manipulation language (DML) are not recoverable with TAF.

select
   username, 
   sid, 
   serial#, 
   failover_type, 
   failover_method, 
   failed_over
from
   v$session
where
   username not in ('SYS','SYSTEM',
'PERFSTAT')
and
   failed_over = 'YES';

You can run this script against the backup node after an instance failure to see those transactions that have been reconnected with TAF. Remember, TAF will quickly redirect transactions, so you'll only see entries for a short period of time immediately after the failover. A backup node can have a variety of concurrent failover transactions, because the tnsnames.ora file on each Oracle Net client specifies the backup node, the failover type, and the failover method.

Transparent Application Failover (TAF) Configuration For Standby database (Data Guard) :

The following configurations require that the client reconnect to the database server when the primary database fails over to the standby database:

Local TNS configuration
Oracle Names server configuration
DNS configuration

However, if you are using an Oracle Call Interface (OCI) client, you can use transparent application failover (TAF). TAF is the ability of applications to automatically reconnect to the database if the connection fails. If the client application is not involved in a database transaction, then users may not notice the failure of the primary database server.

The following example shows address information for the ProductDB database and the Standby1 database:

ProductDB=( DESCRIPTION=
            (ADDRESS=(PROTOCOL=tcp)(PORT=1521)(HOST=PRMYHOST.foo.com))
            (CONNECT_DATA=(SID=db1)(FAILOVER_MODE=(BACKUP=Standby1)
                                                  (TYPE=session)
                                                  (METHOD=basic)))
          )
Standby1 =( DESCRIPTION=
            (ADDRESS=(PROTOCOL=tcp)(PORT=1521)(HOST=STBYHOST.foo.com))
            (CONNECT_DATA=(SID=db1))
          )

Sequence of events:

Client application is connected to the ProductDB database.
The primary database in PRMYHOST.foo.com fails.
The standby database is activated as the new production database.
When the client application fails to connect to PRMYHOST.foo.com, it uses the database specified with the BACKUP parameter in the FAILOVER_MODE clause, and automatically connects to STBYHOST.foo.com.

Asif's Oracle Repository

Search This Blog

TRANSPARENT APPLICATION FAILOVER FOR RAC

Transparent Application Failover Configuration :

Using Oracle RAC and TAF Together

Watching TAF in Action

Transparent Application Failover (TAF) Configuration For Standby database (Data Guard) :

Labels

Comments

Post a Comment

Popular posts from this blog

19c ORACLE HOME Cloning -Linux/Solaris

EBS Standby Role Tranistion using standby database and standby application Tier

EBS R12.2.4 AutoConfig could not successfully execute the following scripts followed by error "txkGenADOPWrapper.pl INSTE8_APPLY 1"