Introduction
As computing moves forward, applications must become increasingly more dependable
and more available to the user community. This need becomes paramount as
the user community broadens in scope and increases in distribution?a common
occurrence as companies expand their work forces and customer bases. Within
this environment, a distributed application should be at least as available
as a client/server application?preferably much more.
To ensure the design of highly available solutions for developers, Sybase
EAServer includes load balancing and high availability features. This
paper discusses these features. Understanding them and their appropriate
uses will help you to design and build robust, responsive, and highly
available applications.
Overview
EAServer implements both load balancing and high availability through clustering.
A cluster is a logical grouping of application servers that work together
as a single logical server to manage client object requests among themselves.
Clustering facilitates load balancing, a process that routes client requests
to arbitrary servers within the cluster to process the work. Load balancing
ensures the even distribution of client work throughout the cluster, which
allows you to scale your application across multiple machines. If you've
got enough servers in the cluster, load balancing can literally allow
you to scale your application server code infinitely.
A clustered architecture also allows you to build a highly available
application. The availability of multiple application servers within the
cluster allows your application to continue to service client requests
even after the physical failure of one or even many of those servers.
To illustrate this discussion, we will use the example of a fictional
bank that will build a distributed application with EAServer. The bank
will build a cluster of 50 servers, one server in each state in the United
States. The bank has been mandated to build the application to not only
handle the existing client load, but also to dynamically scale as new
users come online. The application, which deals with financial transactions,
must be available to the customer base 24 hours a day, with no interruption.
Document Outline
-
Introduction/Overview
-
Load Balancing
-
Failover/High
Availability
-
Maintenance
of a Cluster
-
Conclusion
Load Balancing
To support both the bank's current user load as well as the anticipated
increased user load in the months to come, the bank's IT shop has decided
to implement EAServer clustering.
Basic Installation
The bank has branches in all 50 states. The first step is to install
EAServer at each site in each state. Each site server should be configured
to have its network listeners listening to the server's own hostname.
This is easy set in the listener's tab in Jaguar Manager for that server.
Additionally, each server participating in the cluster should have the
exact same password for the jagadmin user. To implement Operating System
authentication, each server within the cluster should also have an identical
set of users in the underlying operating system.
How Load Balancing Works under the Covers
Naming services drive load balancing in EAServer. The naming services
map a component's name to its physical location. These services are implicitly
used whenever a client accomplishes a lookup operation to get a stub reference
to a server object. They are like a phone book for component instances,
allowing you to find a component instance by looking up its name.
The example below uses Java to demonstrate the use of these naming services:
//Assume session instance is already created
ObjectRef obj = session.lookup("finance/financeLogin");
Factory factory = FactoryHelper.narrow(obj);
In this example, a factory is returned from the Session's lookup operation.
The lookup operation asks the naming services where a physical instance
of finance/financeLogin exists on the network. The factory can create instances
of the needed object, just as factories build things people need in the
real world.
The naming services are aware of each server in the cluster's installed
components. This happens through a process known as binding. Binding is
the act of registering your components to the name server. The name service
itself is simply a service component running inside of each EAServer.
Any server with the name service enabled is known as a name server.
In our bank example, we have 50 servers ready to be in the cluster. To
have load balancing, at least one name server among the 50 must be able
to act as the phone book. Any EAServer can act as a name server. In our
example, we will choose the server in New York State to be the name server.
As every other server (each known as a "member server") in the cluster
boots, it will connect to the server in New York to "bind" its components
to the name server. The process of binding informs the name server which
components are currently available in any given server.
As each server binds to the name server, the name server becomes aware
of everywhere a client could access a given object.
As a result, the name server becomes a logical dispatcher, directing
clients to servers that are capable of providing an object instance. When
the client executes a lookup operation on the session object, the name
server will return a list of every bound server that could provide an
instance of that object.
IORs and ORBs
The name server returns this list as an array of server profiles within
an IOR. An IOR is a string that contains information that uniquely identifies
a component instance. CORBA Orbs, for example, use the IOR to communicate
with the proper object instance. The name server automatically generates
IORs in a process that is transparent to both the developer and the user.
One of the values within the IOR is a server profile. The profile contains
all the information needed to connect to a server. The name server simply
returns an array of profiles in the IOR. That array of possible servers
is provided in a random order to the client. When the client creates an
instance of the stub, the client ORB will choose servers from the array
in the given order. So, for example, when the client executes . . .
ObjectRef obj = factory.create();
financeLogin = financeLoginHelper.narrow(obj);
. . . the call to factory.create() causes the ORB to retrieve the first
server in the array of profiles. The ORB then creates the object instance
on that particular server. Because the provided list was randomly generated
and each subsequent call to session.lookup() by another client will retrieve
another list of servers in a random order, load balancing is achieved.
As the ORB opens sockets to servers to facilitate the creation of object
instances, those sockets are cached in the ORB itself. When a new method
invocation occurs, the ORB will first check the socket cache. If no socket
is available in the cache, the ORB will find the next server in the random
list and will open a new socket to this server. In this way, once a client
connects to a server, it will favor that particular server-until it needs
to execute two concurrent method invocations. This will probably occur
in multithreaded client applications that are making multiple simultaneous
method invocations.
We can configure how long the ORB keeps cached sockets available by setting
the following property:
Properties props = new Properties();
props.put("com.sybase.CORBA.socketReuseLimit","50");
Using the code above, the ORB will keep any cached socket in its cache until
50 method invocations have occurred. After this limit, the socket will be
disposed of. This property allows the client developer to trade off the
performance boost of socket caching with the increased load balancing gained
by moving through the array of valid server profiles.
Implicit and Partitioned Load Balancing
EAServer supports two types of load balancing: implicit and partitioned.
In the case of our bank, each client that tries to look up a component
instance and narrow it back to a stub reference will receive a list of
all 50 servers in the cluster that have that component available. By default,
if every server in the cluster has the exact same list of installed components,
it will receive implicit load balancing as the name server returns all
50 servers in the profile array.
However, we could also choose to install only some of the components
on some of the servers and others onto other servers in the cluster. For
example, imagine that some of our servers have very fast CPUs, and others
are physically located near the database. We might only install CPU-intensive
components on the fast CPU machines and database-intensive components
on those machines close to the database server. In this way we achieve
partitioned load balancing. During binding, the name server is informed
as to which components each server has. It will only return servers in
the profile array that actually contain the requested component. When
we physically partition the components among given servers, we can control
more closely how the load balancing is distributed.
Creating the Cluster
To configure load balancing, a cluster must be set up. We have already
installed the member servers at all 50 sites. The next step is to connect
to the one server that we will consider our primary or master server.
This primary server will contain the master configuration for our cluster.
For our example, we'll choose the New York server to be the primary server.
Once we are connected to the New York server, our first step is to be
sure that this server is also configured to be a name server. We can set
this on the server's property sheet on the "Naming Service" tab. We select
"Enable as Name Server" as shown in Figure 1.
Figure 1: Setting up the name server's property sheet on the "Naming
Service" tab.
Once this is set up, we now can configure the cluster. We must be sure
that all 50 servers are up and can be connected to. We'll select the "Clusters"
folder and right click to create a new cluster. For our bank example,
we'll name the new cluster BankCluster and select OK. This will bring
up the property sheet for our new cluster. This property sheet can be
used to configure all the member servers in the cluster, the name servers,
and any other property of the cluster itself.
Each server, including the name server should be entered into the list
on the "Servers" tab. Only the name server should be listed in the "Name
Servers" tab.
Once the cluster is completed, the primary server now has all of the
configuration values for the cluster. The next step is to configure the
remaining 49 servers. As you might have guessed, this is an enormous job.
That's why EAServer has implemented cluster synchronization.
Cluster Synchronization
Because a cluster is a group of servers acting as a single unit, it's
important to keep the applications running on this unit in sync. Cluster
synchronization ensures that all servers within the cluster have at least
the same components, configurations, and security information. It's an
automated process that copies the primary server's configuration to every
member of the cluster.
When setting up a cluster for the first time, we use synchronization
to copy the cluster settings made on the primary server to every other
member of a cluster. First we set up the cluster on the primary. To synchronize
the whole cluster, we right click on the cluster and choose to synchronize.
Figure 2: Synchronization property sheet.
Synchronization Settings
The synchronization property sheet has several configurable settings
(see Figure 2). The first and most obvious is the user name and password
for the jagadmin user on the cluster. As mentioned earlier, for clustering,
the jagadmin password must be identical on every machine in the cluster.
Once the user name is selected, the appropriate cluster name is chosen
from the drop down list box.
Also in the "targets" area is a text field marked as "Servers." This
field can be used to synchronize with any server, even one not a member
of a cluster. This feature is very useful for distributing a software
release between servers-between development and test servers for example.
The last section consists of the "options" settings, where individual
options can be set. The options are as follows:
- All Cluster Files
By default, only a subset of configuration files is distributed during
a cluster synchronization. Selecting this option distributes extra information
to the cluster, including the security databases that contain the digital
IDs, etc.
- All Package Files
This option forces all packages and components installed with a server
to be synchronized. It is enabled by default.
- All Servlet Files
This setting is enabled by default and forces all servlet classes installed
in a server to be synchronized.
- All App Files
This option synchronizes all applications,
including the packages, Web applications, and the other resources contained
in them.
- All WebApp Files
This option synchronizes all Web applications, including the Servlets,
JSPs, and other files contained in them.
- All Connector Files
This option synchronizes all Connector
files.
- Verbose
When enabled, this setting prints verbose information to the server's
log file, including the percentage complete.
- Restart
This option causes each of the servers to restart when synchronization
completes.
- New Primary
You may only start synchronization from the cluster's primary server,
and there can be only one primary server in any cluster. If you find
it necessary to create a new primary server, enable this check box and
start synchronization from the new primary server. This check box also
prevents an accidental synchronization start from any server that is
not the primary server. Synchronization can only start from another
server by using this option to override this safeguard.
- New Version
For synchronization to distribute a release to every server in a situation
like our bank, which has many servers in the cluster, every server must
be available at the time synchronization occurs. If any one server is
down, that server would be using old configuration values and component
implementations. To address this issue, the server's repository can
be given a version number during synchronization. This version number
allows the servers in the cluster to verify that they are actually running
on the correct version of the repository. Version numbers should be
used anytime there's a risk that one server in the cluster may not be
synchronized.
- Refresh
This option causes a refresh on each of the servers after synchronization
completes.
Completing the Synchronization
Once we have checked the configuration values checked, we click OK to
begin the synchronization. When synchronization is complete, a dialog
box indicates any errors or warning conditions that may have occurred.
Note: If you haven't set a password for the jagadmin user, the dialog
box will warn you of this during synchronization. If this is not a risk
to your system, you can ignore this message.
When synchronization is complete, each server in the cluster should be
rebooted. Clustering information is distributed to each server, and each
is enrolled in the cluster. As each server reboots, it will pause to connect
to the primary server to verify the repository version number. The primary
server must be up to enable other servers to boot.
This boot can be disabled, however by checking by setting the cluster
property com.sybase.jaguar.cluster.startup=disable_check in the
"All Properties" tab of the Cluster's property sheet. But whenever a configuration
is changed on the primary server, the cluster should be synchronized.
Developing for Load Balancing
Load balancing is handled automatically by the Orbs and requires only
that the client application use the naming services. This is most easily
handled using the SessionManager interfaces. The first step is to utilize
the SessionManager interface to build a SessionManager::Session with the
server. A session represents an authenticated session with the server
and allows client applications to find component instances within the
server.
String ior = "iiop://newyork:9000";
ObjectRef obj = orb.string_to_object(ior);
Manager manager = ManagerHelper.narrow(obj);
In this case, we create a string to hold the human readable IOR for the
name server. Because the name server will provide the location of the
components, the only physical address needed is that of the name server.
Once a manager object is created, we can use it to build a session for
a specific user and password.
In the code below, the session object is used to look up instances of
a components based on that component's package and name.
Session session = manager.createSession(user,pass);
The session object returns an object reference for a Factory object. The
Factory object contains the full list of servers that can provide the given
object in the random order as provided by the naming service.
When an instance is needed, a call to the create() operation creates
a stub reference for the client to use:
obj = session.lookup("Package/Component");
Factory factory = FactoryHelper.narrow(obj);
obj = Compofactory.create();
Component comp = ComponentHelper.narrow(obj);
The client-and even the developer who created the call-is completely
unaware of on which server the instance is running within the cluster.
The list of servers is seeded at runtime by the name server and chosen
when the object is created.
Back
to document outline.
Failover / High Availability
One goal of computing is to make applications more reliable and more transparent
to users as we move forward. In a distributed architecture, however, there
are many tiers and as a result many more single points of failure. At the
same time, the user base is much larger and broader than that of a client/server
application, so keeping that system highly available is imperative. Sybase
EAServer provides this level of dependability via its support for failover
and high availability.
High availability in EAServer is provided for using clustering and the
naming services, much like load balancing. To provide a highly available
application, the failure of any server process should be transparent not
only to the user, but also to the developer working on the application.
The cluster should be able to handle a failure of any server before, during,
or after a single method invocation. The act of failover should be nearly
imperceptible to the client working with the application.
Because naming services know every server in the cluster that has an
available instance of any given object, the naming services have enough
information to failover to any other server within the cluster. The limiting
conditions are that 1) we must have at least one name server running on
which to do lookups and 2) an instance of the object must be available.
Basic Installation
In an EAServer cluster on any number of servers-say N servers, you must
have between two and N servers to have high availability. Two name servers
at a minimum are needed in case any one name server fails because clients
still need a name server to do lookup operations. The more servers within
the cluster that are configured as name servers, the less points of failure
within you're the cluster.
The tradeoff comes in performance when booting the server. Because each
member server must bind to its name server, as name servers are added
to the cluster, each member server has more name servers it must now bind
to. This binding can affect performance when booting, but the advantages
of reducing failure points well outweigh the performance implications
except in very large clusters. Figure 3 displays setting up name servers
in the cluster property sheet.
Figure 3: Setting up name servers in the cluster property sheet.
In the case of our bank, we make 50 percent of the servers name servers
in addition to our primary server. With this configuration, we still provide
failover until all 25 name servers fail.
We accomplish this configuration in the same fashion that we added the
New York server in the load-balancing demo.
- First we log in to each of the servers that we will designate as name
servers and enable them on the server's property sheet.
- Second, on the primary server we go to the Cluster's property sheet.
Selecting the Name Servers tab, we add each server's URL to the list.
- Finally, as before, we synchronize the cluster to distribute the changes.
Note: The act of binding is controlled by a bind password. This is done
to prevent a malicious application from trying to bind its own components
into your cluster, thus introducing a Trojan Horse. The bind password is
generated on the primary server and distributed across the cluster. Except
in certain specific cases, its best to not change this generated password.
Automatic failover is not enabled by default for a component. There may
be cases where automatic failover is not desirable. It is the responsibility
of the server's manager to enable this option, and it must be enabled
on a component-by-component basis.
For the bank, we make the Teller component use automatic failover. Opening
the property sheet for this component allows us to enable this. We enable
the "Automatic Transaction Demarcation" checkbox to allow the enabling
of the "Automatic Failover" checkbox. Once both have been enabled, we
can simply close the property sheet and refresh the component.
How Stateful Failover Works