Thursday, August 22, 2019

Concepts Of SCAN Part 2





In our previous article, we discussed how database connectivity was established prior to Oracle Database RAC 11g Release 2. Now we will look at how database connectivity works using SCAN. But in order to understand this, we first have to understand how server pools work.

Traditionally, when you connected to a cluster, you had dedicated clusters for specific applications. Let's take an example: we have an application called CRM, an application called ERP, and an application called DWH. By default you would create a database cluster for each one of them individually. When you size a database, you always size for the peak load, and remember that the peak load is not a constant requirement. A peak load might occur once a week, once a month, and so on. So when you size your database cluster in terms of its resources, you typically keep the peak load in mind. The result is that your infrastructure investment lies idle most of the time.

So in our example, we will have one RAC for CRM, one RAC for ERP and one RAC for DWH:


Under such circumstances, you are not fully using the three nodes in each of the clusters, nine nodes in total. Yet you still have to invest in hardware, software including licenses, and support, which is priced on the number of CPUs or cores you have on these clusters. This is not the best use of resources. So what do we mean by the best use? Are we actually going to run all these nodes at 80% or 90% utilization? Rarely. That is problem number one.
The second problem: suppose that on a particular day, the CRM cluster loses two nodes for whatever reason. You cannot share resources from the ERP or DWH cluster by moving a node. So you have resources lying idle in the other two clusters, but you are not able to use them in the cluster that needs them.

So, in 11g Release 2, Oracle introduced the concept of server pools. The idea is: don't create dedicated clusters; instead, create one common cluster. It's a single cluster that will be shared by the three applications.

By doing that, we use our resources in the best way and share them whenever there is a need. So instead of creating three different clusters of three nodes each, we could have a single cluster with nine nodes.


By creating this kind of large cluster, we now have the flexibility to decide who will use how much of it, and to change that from time to time dynamically. In our example, we are going to configure the server pool parameters dynamically. We will create three server pools:
  • One server pool used by the CRM application, called CRM_POOL
  • One server pool used by the ERP application, called ERP_POOL
  • One server pool used by the DWH application, called DWH_POOL
In our example, we have a one-to-one relationship between server pool and application. In other cases, you could have multiple applications connected to the same server pool.


Given that we are going to create three server pools within one cluster, we need to define the parameters of each server pool, such as:
  • What is the minimum number of nodes that must be part of the server pool
  • What is the maximum number of nodes that can be part of the server pool
  • What importance, or priority, is given to the server pool
  • Etc.
There are other parameters that we can look at later. Among these server pools, let's give the top priority to CRM. The importance is what defines the priority: the larger the number, the more important the pool. So let's say our CRM pool has an importance of 1000, our ERP pool, which is the next most important, has an importance of 700, and our DWH pool, the least important of the three, has an importance of 500. The importance is relative: we are not saying that DWH is not important, but in the context of our ERP and CRM, DWH has a lower priority. That is how we constructed our example; in your organization, it might be different.

So let's say we need a minimum of 2 nodes for CRM, 2 nodes for ERP, and 3 nodes for DWH (because our data warehouse, DWH, does a lot of work, we would like to give it more resources as its minimum number of servers). As for the maximum number of nodes, let's say CRM may use 3 nodes, ERP 3 nodes, and DWH 3 nodes.
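As a rough sketch, assuming the 11g Release 2 srvctl syntax (-g pool name, -l minimum, -u maximum, -i importance) and the names and values from our example, the three server pools could be created like this:

    srvctl add srvpool -g CRM_POOL -l 2 -u 3 -i 1000
    srvctl add srvpool -g ERP_POOL -l 2 -u 3 -i 700
    srvctl add srvpool -g DWH_POOL -l 3 -u 3 -i 500

    srvctl config srvpool    # lists the configured pools with their min, max and importance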


Now, how does it work?

Suppose that our cluster is down and the first node starts up. The clusterware has to decide which server pool this node should belong to, and this is determined by the importance. Among the three server pools we have, CRM is the one with the highest importance, so the first node joins the CRM pool.
The second node starts up and also has to identify which server pool to join. Because CRM has an importance of 1000 and a minimum of 2 nodes, and that minimum is not yet met, the second node also joins the CRM pool.
In the same manner, the next two nodes join the ERP pool, and the next three nodes join the DWH pool. So once seven nodes have started, the allocation between the three server pools is 2-2-3.


When the eighth node starts, the clusterware sees that all three server pools have met their minimum requirement. It now works toward the maximum requirements, so the eighth node joins the most important pool, which is CRM, and the ninth node joins the ERP pool.
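At any point, you can ask the clusterware which servers ended up in which pool. For example (standard 11g Release 2 commands; the output naturally depends on your cluster):

    srvctl status srvpool -a    # each pool with its active server count and server names
    crsctl status server -f     # per-server details, including the ACTIVE_POOLS attribute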


Now suppose something goes wrong and two nodes fall out of the cluster, and those two nodes are part of the CRM server pool. The CRM server pool now has only one available node.


Since the minimum requirement of the CRM pool is 2 nodes and it is the most important pool, the clusterware takes a node from a less important pool that can spare one: DWH is already at its minimum of 3, but ERP, which needs only 2 nodes as its minimum, still has 3, so one of the ERP nodes is dynamically moved into the CRM pool.


Now all three server pools meet their minimum requirement again. In the same manner, if a node goes down in the ERP pool, a node is moved from the DWH pool to the ERP pool, and so on: lower-importance pools give up servers so that higher-importance pools can meet their minimums. And when a node comes back up, it joins the DWH pool, to bring it back to its minimum requirement. You can imagine the other scenarios.
What did we realize from this? With the same set of nodes, we have been able to share the load across the different server pools, and in case of failures we could ensure that the most important server pool, and the database running in it, remains available. And as the database administrator, I don't need to do anything.
This is the fundamental benefit of server pools.

Now, the next question is: how do we control where our database runs? When you create a database on Grid Infrastructure, you have to create it either as policy-managed or as administrator-managed.


Administrator-managed is like running prior to Oracle 11g Release 2: when you create that kind of database, you explicitly specify on which nodes of the cluster the database will be available, meaning on which nodes its instances will be started. When you create services, you also specify on which nodes those services will be available.
When you create a policy-managed database, you don't specify the nodes on which it will be available; you specify the server pool in which it will be available.
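As a hedged sketch of the difference, using 11g Release 2 srvctl syntax and invented names (database CRM, an example Oracle home path, service crm_svc), the two registration styles could look like this; you would pick one or the other for a given database:

    # administrator-managed: each instance is pinned to a named node
    srvctl add database -d CRM -o /u01/app/oracle/product/11.2.0/dbhome_1
    srvctl add instance -d CRM -i CRM1 -n node1
    srvctl add instance -d CRM -i CRM2 -n node2

    # policy-managed: the database is placed in a server pool instead of on named nodes
    srvctl add database -d CRM -o /u01/app/oracle/product/11.2.0/dbhome_1 -g CRM_POOL
    srvctl add service -d CRM -s crm_svc -g CRM_POOL -c UNIFORM   # UNIFORM = the service runs on every node of the pool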






We also have to know that one database can be part of multiple server pools; that is, if you create a new database and make it part of the CRM pool as well as the ERP pool, its instances will be available in both pools. There is also a parameter called cardinality, which specifies how many instances to bring up in each server pool. So if you say 2 instances, then even when the server pool has 5 available nodes, the database will only start instances on 2 of them, because the cardinality is 2.
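One way to see what the clusterware has recorded for a database resource is to dump its attributes. Assuming a hypothetical resource name ora.crm.db (the real name depends on your database), something like this shows the pools and cardinality the resource is bound to:

    crsctl status resource ora.crm.db -f | grep -iE 'SERVER_POOLS|CARDINALITY'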

That is how server pools work.

With this in place, what do we do with our applications?

Earlier, prior to 11g Release 2, we listed all the VIPs in the application's connection string. For backward compatibility, you can still do the same in 11g Release 2 and above. But imagine that you have 12 nodes or more in a cluster: every application that connects must carry a list of all 12 or more VIPs in its connection string. That's a pretty long list, and when the application tries to connect, it walks through the 12 VIPs one after another to find one it can connect to.
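To make the contrast concrete, here is roughly what the two styles could look like in tnsnames.ora. All host and service names are invented for illustration (only three VIPs are shown; a 12-node cluster would list twelve), and the SCAN entry anticipates what the next article covers:

    CRM_VIPS =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = node3-vip)(PORT = 1521))
        )
        (CONNECT_DATA = (SERVICE_NAME = crm_svc))
      )

    CRM_SCAN =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = cluster-scan.example.com)(PORT = 1521))
        (CONNECT_DATA = (SERVICE_NAME = crm_svc))
      )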

To make this easier, Oracle introduced SCAN.

SCAN (Single Client Access Name) is a clusterware-managed resource with a single host name, the SCAN name, which has three associated IP addresses. More about SCAN in the next publication.






