Load Balancing and Failover Using EAServer
 

Introduction

As computing moves forward, applications must become more dependable and more available to the user community. This need becomes paramount as the user community broadens in scope and becomes more widely distributed, a common occurrence as companies expand their work forces and customer bases. In this environment, a distributed application should be at least as available as a client/server application, and preferably much more so.

To help developers design highly available solutions, Sybase EAServer includes load balancing and high availability features. This paper discusses these features; understanding them and their appropriate uses will help you design and build robust, responsive, and highly available applications.

Overview

EAServer implements both load balancing and high availability through clustering. A cluster is a logical grouping of application servers that work together as a single logical server to manage client object requests among themselves. 

Clustering facilitates load balancing, a process that routes client requests to arbitrary servers within the cluster to process the work. Load balancing ensures the even distribution of client work throughout the cluster, which allows you to scale your application across multiple machines. By adding servers to the cluster, you can scale the capacity of your application server tier almost without limit.

A clustered architecture also allows you to build a highly available application. The availability of multiple application servers within the cluster allows your application to continue to service client requests even after the physical failure of one or even many of those servers.

To illustrate this discussion, we will use the example of a fictional bank that will build a distributed application with EAServer. The bank will build a cluster of 50 servers, one server in each state in the United States. The bank has been mandated to build an application that not only handles the existing client load, but also scales dynamically as new users come online. The application, which deals with financial transactions, must be available to the customer base 24 hours a day, with no interruption.

Document Outline

  1. Introduction/Overview
  2. Load Balancing
  3. Failover/High Availability
  4. Maintenance of a Cluster
  5. Conclusion
     

Load Balancing

To support both the bank's current user load as well as the anticipated increased user load in the months to come, the bank's IT shop has decided to implement EAServer clustering. 

Basic Installation

The bank has branches in all 50 states. The first step is to install EAServer at each site in each state. Each site server should be configured with its network listeners listening on the server's own hostname. This is easily set on the Listeners tab in Jaguar Manager for that server. Additionally, each server participating in the cluster must have the exact same password for the jagadmin user. To use operating system authentication, each server within the cluster should also have an identical set of users in the underlying operating system.

How Load Balancing Works under the Covers

Naming services drive load balancing in EAServer. The naming services map a component's name to its physical location. These services are used implicitly whenever a client performs a lookup operation to get a stub reference to a server object. They are like a phone book for component instances, allowing you to find a component instance by looking up its name.

The example below uses Java to demonstrate the use of these naming services:

// Assume the SessionManager::Session instance has already been created
org.omg.CORBA.Object obj = session.lookup("finance/financeLogin");
Factory factory = FactoryHelper.narrow(obj);
In this example, a factory is returned from the Session's lookup operation. The lookup operation asks the naming services where a physical instance of finance/financeLogin exists on the network. The factory can create instances of the needed object, just as factories build things people need in the real world. 

The naming services are aware of the components installed on each server in the cluster. This awareness comes through a process known as binding. Binding is the act of registering your components with the name server. The name service itself is simply a service component running inside each EAServer. Any server with the name service enabled is known as a name server.

In our bank example, we have 50 servers ready to be in the cluster. To have load balancing, at least one name server among the 50 must be able to act as the phone book. Any EAServer can act as a name server. In our example, we will choose the server in New York State to be the name server. As every other server (each known as a "member server") in the cluster boots, it will connect to the server in New York to "bind" its components to the name server. The process of binding informs the name server which components are currently available in any given server. 

As each server binds to the name server, the name server becomes aware of everywhere a client could access a given object. 

As a result, the name server becomes a logical dispatcher, directing clients to servers that are capable of providing an object instance. When the client executes a lookup operation on the session object, the name server will return a list of every bound server that could provide an instance of that object. 

IORs and ORBs 

The name server returns this list as an array of server profiles within an IOR (Interoperable Object Reference). An IOR is a string that contains the information needed to uniquely identify a component instance. CORBA ORBs, for example, use the IOR to communicate with the proper object instance. The name server generates IORs automatically, in a process that is transparent to both the developer and the user.

One of the values within the IOR is the server profile. A profile contains all the information needed to connect to a server, and the name server returns an array of these profiles in the IOR, ordered randomly. When the client creates an instance of the stub, the client ORB chooses servers from the array in the given order. So, for example, when the client executes . . .

org.omg.CORBA.Object obj = factory.create();
financeLogin = financeLoginHelper.narrow(obj);
. . . the call to factory.create() causes the ORB to retrieve the first server in the array of profiles. The ORB then creates the object instance on that particular server. Because the provided list was randomly generated and each subsequent call to session.lookup() by another client will retrieve another list of servers in a random order, load balancing is achieved. 

As the ORB opens sockets to servers to facilitate the creation of object instances, those sockets are cached in the ORB itself. When a new method invocation occurs, the ORB first checks the socket cache. If no socket is available in the cache, the ORB finds the next server in the random list and opens a new socket to that server. In this way, once a client connects to a server, it will favor that particular server until it needs to execute two concurrent method invocations, as typically happens in multithreaded client applications that make multiple simultaneous method invocations.

We can configure how long the ORB keeps cached sockets available by setting the following property:

Properties props = new Properties();
props.put("com.sybase.CORBA.socketReuseLimit","50");
With the setting above, the ORB keeps any cached socket until 50 method invocations have occurred; after this limit, the socket is disposed of. This property allows the client developer to trade off the performance boost of socket caching against the increased load balancing gained by moving through the array of valid server profiles.
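
Where this property goes depends on how the client ORB is initialized. The following is a minimal sketch assuming the typical EAServer Java client initialization; the ORB class property and its value are assumptions that should be verified against your installation.

// Assumes: import java.util.Properties; import org.omg.CORBA.ORB;
Properties props = new Properties();
props.put("org.omg.CORBA.ORBClass", "com.sybase.CORBA.ORB");  // Sybase client ORB (assumed typical setting)
props.put("com.sybase.CORBA.socketReuseLimit", "50");         // reuse each cached socket for up to 50 calls
ORB orb = ORB.init(args, props);                              // args: the program's command-line arguments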

Implicit and Partitioned Load Balancing

EAServer supports two types of load balancing: implicit and partitioned. In the case of our bank, each client that looks up a component instance and narrows it back to a stub reference will receive a list of all 50 servers in the cluster that have that component available. By default, if every server in the cluster has exactly the same list of installed components, you get implicit load balancing: the name server returns all 50 servers in the profile array.

However, we could also choose to install only some of the components on some of the servers and others onto other servers in the cluster. For example, imagine that some of our servers have very fast CPUs, and others are physically located near the database. We might only install CPU-intensive components on the fast CPU machines and database-intensive components on those machines close to the database server. In this way we achieve partitioned load balancing. During binding, the name server is informed as to which components each server has. It will only return servers in the profile array that actually contain the requested component. When we physically partition the components among given servers, we can control more closely how the load balancing is distributed.

Creating the Cluster

To configure load balancing, a cluster must be set up. We have already installed the member servers at all 50 sites. The next step is to connect to the one server that we will consider our primary or master server. This primary server will contain the master configuration for our cluster. For our example, we'll choose the New York server to be the primary server. Once we are connected to the New York server, our first step is to be sure that this server is also configured to be a name server. We can set this on the server's property sheet on the "Naming Service" tab. We select "Enable as Name Server" as shown in Figure 1. 

Figure 1: Setting up the name server's property sheet on the "Naming Service" tab.

Once this is set up, we now can configure the cluster. We must be sure that all 50 servers are up and can be connected to. We'll select the "Clusters" folder and right click to create a new cluster. For our bank example, we'll name the new cluster BankCluster and select OK. This will bring up the property sheet for our new cluster. This property sheet can be used to configure all the member servers in the cluster, the name servers, and any other property of the cluster itself. 

Each server, including the name server, should be entered into the list on the "Servers" tab. Only the name server should be listed on the "Name Servers" tab.

Once the cluster is completed, the primary server now has all of the configuration values for the cluster. The next step is to configure the remaining 49 servers. As you might have guessed, this is an enormous job. That's why EAServer has implemented cluster synchronization.

Cluster Synchronization

Because a cluster is a group of servers acting as a single unit, it's important to keep the applications running on this unit in sync. Cluster synchronization ensures that all servers within the cluster have at least the same components, configurations, and security information. It's an automated process that copies the primary server's configuration to every member of the cluster.

When setting up a cluster for the first time, we use synchronization to copy the cluster settings made on the primary server to every other member of a cluster. First we set up the cluster on the primary. To synchronize the whole cluster, we right click on the cluster and choose to synchronize. 

Figure 2: Synchronization property sheet.

Synchronization Settings

The synchronization property sheet has several configurable settings (see Figure 2). The first and most obvious is the user name and password for the jagadmin user on the cluster; as mentioned earlier, the jagadmin password must be identical on every machine in the cluster. Once the user name is entered, the appropriate cluster name is chosen from the drop-down list box.

Also in the "targets" area is a text field marked "Servers." This field can be used to synchronize with any server, even one that is not a member of a cluster. This feature is very useful for distributing a software release between servers, for example between development and test servers.

The last section consists of the "options" settings, where individual options can be set. The options are as follows:

  1. All Cluster Files
    By default, only a subset of configuration files is distributed during a cluster synchronization. Selecting this option distributes extra information to the cluster, including the security databases that contain the digital IDs, etc.
  2. All Package Files
    This option forces all packages and components installed with a server to be synchronized. It is enabled by default. 
  3. All Servlet Files
    This setting is enabled by default and forces all servlet classes installed in a server to be synchronized.
  4. All App Files
    This option synchronizes all applications, including the packages, Web applications, and the other resources contained in them.
  5. All WebApp Files
    This option synchronizes all Web applications, including the Servlets, JSPs, and other files contained in them.
  6. All Connector Files
    This option synchronizes all Connector files.
  7. Verbose
    When enabled, this setting prints verbose information to the server's log file, including the percentage complete.
  8. Restart
    This option causes each of the servers to restart when synchronization completes.
  9. New Primary
You may only start synchronization from the cluster's primary server, and there can be only one primary server in any cluster. If you need to designate a new primary server, enable this check box and start synchronization from that server. This check box also acts as a safeguard against accidentally starting synchronization from a server that is not the primary; synchronization can only be started from another server by using this option to override the safeguard.
  10. New Version
For synchronization to distribute a release to every server in a cluster with many servers, such as our bank's, every server must be available at the time synchronization occurs. If any one server is down, that server would continue to use old configuration values and component implementations. To address this issue, the server's repository can be given a version number during synchronization. This version number allows the servers in the cluster to verify that they are actually running the correct version of the repository. Version numbers should be used any time there is a risk that one server in the cluster may not be synchronized.
  11. Refresh
    This option causes a refresh on each of the servers after synchronization completes.

Completing the Synchronization

Once we have verified the configuration values, we click OK to begin the synchronization. When synchronization is complete, a dialog box indicates any errors or warning conditions that may have occurred.

Note: If you haven't set a password for the jagadmin user, the dialog box will warn you of this during synchronization. If this is not a risk to your system, you can ignore this message.

When synchronization is complete, each server in the cluster should be rebooted. Clustering information is distributed to each server, and each is enrolled in the cluster. As each server reboots, it will pause to connect to the primary server to verify the repository version number. The primary server must be up to enable other servers to boot. 

This version check can be disabled, however, by setting the cluster property com.sybase.jaguar.cluster.startup=disable_check on the "All Properties" tab of the cluster's property sheet. Even so, whenever a configuration is changed on the primary server, the cluster should be synchronized.

Developing for Load Balancing

Load balancing is handled automatically by the ORBs and requires only that the client application use the naming services. This is most easily done using the SessionManager interfaces. The first step is to use the SessionManager interface to establish a SessionManager::Session with the server. A session represents an authenticated session with the server and allows client applications to find component instances within the server.

String ior = "iiop://newyork:9000";
org.omg.CORBA.Object obj = orb.string_to_object(ior);
Manager manager = ManagerHelper.narrow(obj);

In this case, we create a string holding the human-readable IOR for the name server. Because the name server will provide the location of the components, the only physical address the client needs is that of the name server. Once a Manager object is created, we can use it to build a session for a specific user name and password.

In the code below, the Manager object is used to create a session for a given user name and password. The session object can then be used to look up component instances by package and component name.

Session session = manager.createSession(user,pass);
The session's lookup operation returns an object reference for a Factory object. The Factory object carries the full list of servers that can provide the given object, in the random order supplied by the naming service.

When an instance is needed, a call to the create() operation creates a stub reference for the client to use:

obj = session.lookup("Package/Component");
Factory factory = FactoryHelper.narrow(obj);
obj = factory.create();
Component comp = ComponentHelper.narrow(obj);

The client, and even the developer who wrote the call, is completely unaware of which server in the cluster the instance is running on. The list of servers is seeded at runtime by the name server, and a server is chosen when the object is created.
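
Putting these pieces together, a complete client might look like the sketch below. The ORB class property, the SessionManager package names, the credentials, and the financeLogin stub classes are assumptions based on the typical EAServer client pattern and the example component used earlier; substitute your own generated stubs and login values.

import java.util.Properties;
import org.omg.CORBA.ORB;
import SessionManager.Manager;
import SessionManager.ManagerHelper;
import SessionManager.Session;
import SessionManager.Factory;
import SessionManager.FactoryHelper;

public class BankClient
{
    public static void main(String[] args)
    {
        // Initialize the client ORB (ORB class value assumed from the usual EAServer setup)
        Properties props = new Properties();
        props.put("org.omg.CORBA.ORBClass", "com.sybase.CORBA.ORB");
        ORB orb = ORB.init(args, props);

        // Only the name server's address is needed; it knows where every component lives
        org.omg.CORBA.Object obj = orb.string_to_object("iiop://newyork:9000");
        Manager manager = ManagerHelper.narrow(obj);

        // Authenticate and create a session (user name and password are placeholders)
        Session session = manager.createSession("bankuser", "bankpass");

        // The lookup returns the randomized list of servers that offer this component
        obj = session.lookup("finance/financeLogin");
        Factory factory = FactoryHelper.narrow(obj);

        // create() selects a server from that list -- this is where load balancing happens
        obj = factory.create();
        financeLogin login = financeLoginHelper.narrow(obj);
        // ... invoke business methods on login ...
    }
}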
 

Back to document outline.

Failover / High Availability

One goal of computing is to make applications more reliable and more transparent to users. In a distributed architecture, however, there are many tiers and, as a result, many more potential points of failure. At the same time, the user base is much larger and broader than that of a client/server application, so keeping the system highly available is imperative. Sybase EAServer provides this level of dependability through its support for failover and high availability.

High availability in EAServer is provided through clustering and the naming services, much like load balancing. To provide a highly available application, the failure of any server process should be transparent not only to the user, but also to the developer working on the application. The cluster should be able to handle the failure of any server before, during, or after a single method invocation, and the act of failover should be nearly imperceptible to the client working with the application.

Because the naming services know every server in the cluster that has an available instance of any given object, they have enough information to fail over to any other server within the cluster. The limiting conditions are that 1) at least one name server must be running on which to do lookups, and 2) an instance of the object must be available.

Basic Installation

In an EAServer cluster of N servers, between two and N of those servers must be configured as name servers to achieve high availability. At least two name servers are needed so that clients can still perform lookup operations if any one name server fails. The more servers within the cluster that are configured as name servers, the fewer points of failure within the cluster.

The tradeoff comes in performance when booting the servers. Because each member server must bind to every name server, adding name servers to the cluster means each member server has more servers to bind to. This binding can affect boot performance, but the advantage of reducing points of failure far outweighs the performance cost except in very large clusters. Figure 3 shows setting up name servers in the cluster property sheet.

Figure 3: Setting up name servers in the cluster property sheet.

In the case of our bank, we make 50 percent of the servers name servers, in addition to our primary server. With this configuration, failover remains available unless every one of those name servers fails.

We accomplish this configuration in the same fashion that we added the New York server in the load-balancing demo. 

  • First we log in to each of the servers that we will designate as name servers and enable them on the server's property sheet. 
  • Second, on the primary server we go to the Cluster's property sheet. Selecting the Name Servers tab, we add each server's URL to the list. 
  • Finally, as before, we synchronize the cluster to distribute the changes. 
Note: The act of binding is controlled by a bind password. This prevents a malicious application from binding its own components into your cluster and thereby introducing a Trojan horse. The bind password is generated on the primary server and distributed across the cluster. Except in certain specific cases, it's best not to change this generated password.

Automatic failover is not enabled by default for a component. There may be cases where automatic failover is not desirable. It is the responsibility of the server's manager to enable this option, and it must be enabled on a component-by-component basis.

For the bank, we enable automatic failover for the Teller component. Opening the property sheet for this component allows us to do so. Enabling the "Automatic Transaction Demarcation" checkbox makes the "Automatic Failover" checkbox available. Once both have been enabled, we simply close the property sheet and refresh the component.

How Stateful Failover Works

We will not cover this topic here, since it is not used in our example. For details and guidelines on using failover for stateful components, see Chapter 25, "Managing Persistent Component State," in the EAServer Programmer's Guide.

 

How Stateless Failover Works Under The Covers

When a client executes a lookup operation for a component, the name server returns an IOR to the client containing an array of profiles. Each entry in the array identifies a server on which the requested component is installed and available. The list is returned in random order, and the client ORB connects to these servers in the order provided.

When the client ORB is thrown an org.omg.CORBA.COMM_FAILURE, the ORB checks the component's properties to see whether it is enabled for automatic failover. If it is not, the COMM_FAILURE is simply passed through to the client application. If automatic failover is enabled, however, the ORB consumes the exception, flushes the socket cache contained within the ORB itself, retrieves the next server in the profile array, and re-executes the failed method call.
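
If automatic failover is not enabled for the component, the client application sees the COMM_FAILURE itself and can choose to recover manually. A minimal sketch of that manual path is shown below; the Package/Component names and the doWork() method are placeholders, and a real client would decide how many retries are appropriate.

// Manual recovery when automatic failover is disabled for the component
Component comp = ComponentHelper.narrow(factory.create());
try {
    comp.doWork();                                    // placeholder business method
} catch (org.omg.CORBA.COMM_FAILURE cf) {
    // The server went away mid-call. Look the component up again so the name
    // server can supply a fresh profile list, then retry once on another server.
    org.omg.CORBA.Object obj = session.lookup("Package/Component");
    Factory factory2 = FactoryHelper.narrow(obj);
    comp = ComponentHelper.narrow(factory2.create());
    comp.doWork();
}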

For stateless components, the method is simply re-executed on another server in the cluster. It is possible that, in its profile array, a client could be provided with a server that is no longer running. In this case, when the client attempts to connect to that server, the latency before failover is determined by two ORB properties:

com.sybase.CORBA.RetryCount
com.sybase.CORBA.RetryDelay
RetryCount specifies how many times the ORB will retry an initial connection, and RetryDelay specifies how long the ORB waits between tries. Together they define the latency that will occur when an initial connection fails. 
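
Like the socket reuse limit, these are client ORB properties set at initialization. The values below are purely illustrative; smaller values fail over faster at the cost of giving a briefly unreachable server less chance to respond.

// props: the same java.util.Properties object later passed to ORB.init()
// Illustrative values only -- tune them for your own failover-latency requirements
props.put("com.sybase.CORBA.RetryCount", "3");     // retry a failed initial connection 3 times
props.put("com.sybase.CORBA.RetryDelay", "2000");  // wait between retries (value assumed to be in milliseconds)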

Figure 4: Enabling heartbeat on the server's property sheet.

To reduce the chances that the name server will return a profile for a server that is currently not available, we can enable server heartbeats (see Figure 4). The heartbeat is a "ping" that is sent from one server to another to be sure that the server is still accepting requests. When the heartbeat is enabled, the name server will ping all servers bound to it to be sure they are still available. If any server is unavailable, the name server will unbind its components. This process reduces the chances that a name server may provide a client with a server profile for a downed server. 

Developing for High Availability

One of the great advantages of the high availability approach in EAServer is that building a highly available component is almost no different from building any other component. Failover is handled transparently by the ORBs. The client's responsibility is to provide the client ORB with a semicolon-separated list of every possible name server. For example:

String IOR = "iiop://newyork:9000;maryland:9000;nevada:9000; ... ... ."
The list of name servers will be used by the client ORB if the current name server becomes unavailable. The ORB then connects to another name server to execute lookup operations. 
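
The list is used exactly as the single name server address was in the load-balancing example; nothing else in the client changes. A brief sketch, reusing the Manager lookup from earlier:

// Any reachable name server in the semicolon-separated list can satisfy the bootstrap
String ior = "iiop://newyork:9000;maryland:9000;nevada:9000";
org.omg.CORBA.Object obj = orb.string_to_object(ior);
Manager manager = ManagerHelper.narrow(obj);
Session session = manager.createSession(user, pass);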

Building Server-Side Transactional Components

There are some considerations to keep in mind when building server-side transactional components that will take advantage of automatic failover. Consider, for example, a transactional component that inserts rows. A situation can arise during failover in which an action occurs twice. At our bank, a component named Account has an insert() method that inserts a row into the database. A client calls this method; the insert() method inserts the row and commits the transaction. Just as EAServer returns control from the method, the server is shut down. Because the client never received a return status and its socket has closed, the ORB fails over to a second server and inserts the same record again.

To prevent this situation, we develop the component to use a "handshake" design pattern, sketched in the example that follows the list below.

  • Rather than directly inserting a row, the Account component has a new method called getAccountNumber() that returns the next available account number. 
  • The client calls insert(account) and passes this account number back in. 
  • The insert then inserts the row with that account number as the primary key and commits the transaction. 
  • Now suppose EAServer is shut down before returning control to the client application. 
  • The ORB therefore re-executes the insert(account) method. 
  • This time, the duplicate insert will fail, with a primary key violation in the RDBMS. 
    The component will simply consume the primary key error and return a good status to the client because in the end, the proper row did get inserted.
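
A minimal sketch of this handshake pattern is shown below. The getAccountNumber() and insert() methods come from the steps above; the JDBC data source, the account table, and the SQLState check are hypothetical details used only to illustrate that a replayed insert fails harmlessly on the primary key.

// Hypothetical server-side sketch of the Account component's insert method.
// dataSource, the account table, and the SQLState test are illustrative only.
public void insert(int account) throws java.sql.SQLException
{
    java.sql.Connection con = dataSource.getConnection();
    try {
        java.sql.PreparedStatement ps = con.prepareStatement(
            "INSERT INTO account (account_no) VALUES (?)");
        ps.setInt(1, account);      // account_no is the primary key the client obtained first
        ps.executeUpdate();
        ps.close();
    } catch (java.sql.SQLException e) {
        // SQLState class 23 indicates an integrity-constraint (duplicate key) violation:
        // the interrupted first invocation already inserted this row, so treat it as success.
        if (e.getSQLState() == null || !e.getSQLState().startsWith("23")) {
            throw e;
        }
    } finally {
        con.close();
    }
}

On the client side, the calls pair up as follows:

int account = accountStub.getAccountNumber();   // step 1: obtain the key
accountStub.insert(account);                    // step 2: safe to re-execute on failover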

Back to document outline.

Maintenance of a Cluster

Servers within a cluster require very little maintenance. That maintenance primarily includes:
  • Pushing new software releases across the cluster. 
  • Adding and removing members from a cluster at any time.
When a new software release is ready, it should be installed on the primary server. Synchronization will then push the release across the cluster. It's important to note that synchronization is not store-and-forward: all servers in the cluster must be online and accepting requests.

To reduce the possibility that a missed synchronization could leave one server in the cluster out of sync, the "New Version" option should always be checked when synchronizing a new software release. This option causes the repository version number of every server in the cluster to be incremented. When servers boot and join the cluster, they check their own version against the primary server's. If a member server finds that its version does not match the repository version, it places itself in "Admin Mode." A server in admin mode accepts connections only from Jaguar Manager and the jagadmin user. This prevents clients from connecting to, and using objects on, a server that should be considered off-line.

To bring a server in admin mode back online, we can either synchronize to that server from the primary server, or bring it online manually. When synchronization occurs, the new version of the repository is pushed out to the target server, and the server is taken out of admin mode and made available to clients. Bringing the server online manually is easily accomplished through Jaguar Manager: we simply right click the server's icon and choose "Set Ready."

As mentioned in the previous section, it is possible to use partitioned load balancing, in which certain components are placed on specific servers to distribute the workload. We might encounter a case in which we want to move components among servers at runtime. Because the member servers bind to their name servers at boot time, the name server is not automatically aware of such changes; it is updated when the member servers rebind to it. To force this, we right click the cluster in Jaguar Manager and choose to rebind. This causes every member server to rebind to its name servers and update the naming information.
 

Back to document outline.

Conclusion

Sybase EAServer provides developers with a highly scalable and available application server solution. This solution is easy to administer and maintain, and it reduces the amount of code changes required of application developers. The application server handles high availability seamlessly, with no single point of failure, which allows development shops to seize the advantages of distributed computing without risking downtime for a distributed user base. As distributed application development moves forward, keeping applications available to the user community is no longer a distant goal. EAServer helps make it a reality today.

 

 

 



© Copyright 2010, Sybase Inc.