Natural interaction refers to the way we communicate with each other. It is the words we use, the meanings we agree to assign to them and the ways we combine those words into larger meaningful structures. By this definition we are separating the language (the words, sentences and paragraphs) from the mode of transmission we may be using. Our language is essentially the same whether we type the words, write them out by hand, or say them aloud.
Most of our interaction with people involves using our natural language (e.g., English).
We all have tremendous experience expressing complex concepts and navigating great subtlety of meaning using natural language. The ideal behind Answers Anywhere is to apply our experience, skill and comfort with natural language to our interactions with technology.
Understanding Language: Top Down & Bottom Up
The most common approach to natural interaction is to start at the top: model the language.
Top down natural interaction systems begin with the components of the language (the words) and analyze text in terms of the rules of the language: parts of speech, the structure of sentences and so on. A good top down natural interaction system requires a large dictionary, a thesaurus, and a full understanding of the ways people create sentences. It also requires a great deal of general cultural knowledge, including the (often elusive) relationship between the meaning of idiomatic expressions like "he bought the farm" and the literal meanings of the words themselves.
An alternative is to consider language from the bottom up, starting with the tasks to be performed and considering all the different ways potential users might phrase their wishes. Called domain-specific, this approach does not need to process every possible sentence in the target language. It must merely recognize the subset that is relevant to its particular responsibilities.
Domain-specific natural interaction, the basis of Answers Anywhere, can provide an impressive degree of accuracy, measured as the percentage of requests that translate into the operation intended by the user. It can also be extended to support additional vocabulary and additional application functions without having to restart from scratch.
Although top down natural interaction is useful for general language understanding, for example to capture and catalogue the ideas found in large numbers of documents, domain-specific interaction works better as an adjunct to graphical user interfaces in query or command-and-control applications.
Let us use the find command from the UNIX operating system as an example. This command searches a directory hierarchy and identifies files that match a set of criteria, including file name, file type, age and size. It is both one of the more useful commands and among the most complex. We will apply natural interaction to a few of its more common capabilities.
In a UNIX shell (command) window with a natural interaction front end, we can ask it to find files of interest:
“show me web pages that are less than a week old and more than 30kb”
The front end examines this input and translates it into its equivalent as a shell command:
find -mtime -7 -size +30720c \( -name \*.html -o -name \*.htm -o -name \*.shtml \) -printf "%10s %t %P\n"
This command tells find to start in the current directory (the default when no directories are specified) and identify files that were modified less than seven days ago (-mtime -7; the minus indicates less than), that are bigger than 30,720 bytes (-size +30720c; the plus indicates greater than and the c specifies bytes rather than the default of disk blocks) and whose names end in .html, .htm or .shtml. Files that match the criteria will be printed according to the -printf specification: file size, modification date/time and the file’s path.
The shell executes the command and returns the files it finds:
84531 Thu Jan 4 06:49:00 PST 2001 Docs/PolicyReference.html
38392 Fri Jan 5 22:15:48 PST 2001 public_html/Bookshelf.html
44392 Fri Jan 5 22:17:08 PST 2001 public_html/Java-Embed.html
The role of the domain-specific natural interaction (DNI) interface is to serve as a translator between the free form text provided by the user and a more structured command language used to drive the application.
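As a rough illustration of the structured side of that translation, the sketch below assembles the find command from the earlier example out of three already-interpreted values. The function name build_find_args and its parameters are hypothetical and are not part of Answers Anywhere; the find options are the ones used in the example above.

```python
import shlex


def build_find_args(max_age_days, min_size_kb, extensions):
    """Assemble the argument list for the find command in the example.

    Hypothetical helper: the parameters stand for values that the
    front end has already extracted from the user's request.
    """
    args = [
        "find",
        "-mtime", f"-{max_age_days}",        # modified less than N days ago
        "-size", f"+{min_size_kb * 1024}c",  # larger than N kilobytes, in bytes
    ]
    # Group the name tests in parentheses, joined with -o (OR).
    args.append("(")
    for i, ext in enumerate(extensions):
        if i > 0:
            args.append("-o")
        args += ["-name", f"*.{ext}"]
    args.append(")")
    # Print size, modification time and path, as in the example.
    args += ["-printf", "%10s %t %P\\n"]
    return args


args = build_find_args(7, 30, ["html", "htm", "shtml"])
# shlex.join quotes the arguments so the line can be pasted into a shell.
print(shlex.join(args))
```

Building a list of arguments rather than one string leaves the quoting of characters such as * and ( to a single, well-tested place.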
Agent Oriented Software
Agent-oriented Software Engineering defines the agent as the fundamental unit of software. An agent processes requests either directly or by combining its processing with results produced by other agents. Agents are wired together in a network. This network structure defines the communication paths between agents, which in turn determine the way agents get requests and provide responses. The network can be thought of as a basic tree layout with one enhancement: a node can connect to multiple nodes above it in the tree. (This explanation assumes a tree with its root at the top, a popular layout in computer science, which is less common in nature.)
The agent network operates by passing requests from agent to agent. A request begins at the root of the tree and flows down (down-chain) to other agents. Agents examine the request and decide for themselves whether they have anything to contribute. Responses flow back up-chain using the same message paths as the request. The network permits an agent to have multiple up-chain connections. In such cases the down-chain agent will receive the same request from every agent above it. It will only process the request once, however, and will send the same response to all of its up-chain agents.
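The "process once, answer everyone" behavior can be sketched in a few lines. This is a minimal illustration and not the Answers Anywhere implementation: the Agent class, its handle method and the per-request cache are assumptions made for the example, and the response is just a nested tuple of agent names.

```python
class Agent:
    """Toy agent that forwards a request down-chain and caches its response."""

    def __init__(self, name):
        self.name = name
        self.down_chain = []
        self.calls = 0      # how many times the request was actually processed
        self._cache = {}    # request text -> response already produced

    def connect(self, *agents):
        self.down_chain.extend(agents)
        return self

    def handle(self, request):
        # The same request arriving from a second up-chain agent
        # gets the response that was already computed.
        if request in self._cache:
            return self._cache[request]
        self.calls += 1
        responses = tuple(a.handle(request) for a in self.down_chain)
        response = (self.name, responses)
        self._cache[request] = response
        return response


# NUMBER has two up-chain agents, SIZE and AGE.
number = Agent("NUMBER")
size = Agent("SIZE").connect(number)
age = Agent("AGE").connect(number)
find = Agent("FIND").connect(size, age)

result = find.handle("files more than 30kb")
print(number.calls)
```

Here NUMBER is reached through both SIZE and AGE but does its work only once; both up-chain agents receive the identical response object.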
Figure 1 shows the example of an agent network for the UNIX find command we discussed earlier. Each circle represents an agent; the arrows show the paths a request takes to reach every agent in the network. The tree begins with the FIND agent, which receives the request from the user and passes it along to the agents that are down-chain.
Figure 1 - An agent network for the find command
These second level agents map to the different kinds of requests the network understands:
* FOLDER understands where the search is to begin
* SIZE understands file size specifications
* AGE understands relative modification dates
* FILETYPE identifies files by their extensions
* ORDER handles requests to sort the results or to provide a subset (e.g. the five biggest or newest files).
Agents further down the chain break the problem down further, into specific file types to be found or different ways of specifying time or date. We will explore this example in more detail shortly.
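One simple way to picture such a network is as a table of down-chain connections. The sketch below is a partial reconstruction and should be read as an assumption: it lists only the connections that the text of this paper states or implies (the lower-level agent names come from the examples in later sections), not the full layout of Figure 1.

```python
# Down-chain connections named in the text. The complete Figure 1 layout
# is not reproduced here, so this table is a partial, assumed reconstruction.
DOWN_CHAIN = {
    "FIND": ["FOLDER", "SIZE", "AGE", "FILETYPE", "ORDER"],
    "FOLDER": ["FOLDERNAME"],
    "SIZE": ["BIGGER", "NUMBER"],
    "AGE": ["OLDER", "NUMBER"],
    "FILETYPE": ["IMAGES", "MEDIA"],
}


def up_chain(agent):
    """Return the agents directly above the given agent, sorted by name."""
    return sorted(
        parent for parent, children in DOWN_CHAIN.items() if agent in children
    )


def all_agents():
    """Return every agent name that appears anywhere in the table."""
    names = set(DOWN_CHAIN)
    for children in DOWN_CHAIN.values():
        names.update(children)
    return names


# Agents with more than one up-chain connection are what make the
# network more than a plain tree.
shared = sorted(a for a in all_agents() if len(up_chain(a)) > 1)
print(shared)
```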
The Agent Process Flow
The agent network processes a request in two phases. Phase one relates to interpretation of the request - that is, the determination of the user’s intent. Phase two is the actuation phase, where the network uses its understanding of the request to generate a command to the application.
Phase one begins when the top-level agent receives the request from the outside world. It passes the request to its down-chain agents, which pass it along to their down-chain agents, and so on until every agent has seen the request. The leaf nodes then examine the request, each one deciding whether it recognizes anything in the request that it knows how to process. If it sees anything, it makes a claim on whatever part of the request it thinks it understands. An agent may make multiple claims on multiple parts of the request, including claims on overlapping parts of the request. If it sees nothing of interest in the request, it sends a "no claim" message up the chain.
An up-chain agent waits until it gets a response from every agent down the chain. It looks at the claims it receives and combines those claims with its own thoughts about the request. It may make its own claim based on the down-chain agent claims; it may reject those claims based on its own, better understanding of the request and make a claim unrelated to those it received; or it may decide that neither it nor its down-chain agents have anything to contribute and send a "no claim" message to its up-chain agent. In this way claims and "no claim" responses travel up the tree until they reach the top-level agent.
It is possible, in fact likely, that an agent will get multiple claims from the agents below it. A set of rules is used to determine the relative strength of each claim. It is up to the up-chain agent to decide whether to pass along multiple claims or to send only the strongest.
Once the top-level agent has received responses from the rest of the agents, it begins the second phase: the generation of the command to the application. This time, the request is passed only to those agents whose claims were accepted in phase one. Each leaf agent has its chance to contribute some part of the command being generated. For example, given a request for "files more than 30kb", in phase one the BIGGER agent claims the string "more than", the NUMBER agent claims "30" and the SIZE agent combines these claims with its own to claim "more than 30kb".
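The phase one claims in this example can be sketched as character spans over the request. The Claim class, the regular expressions and the function names below are illustrative assumptions; real agents are driven by their own rule sets, which this paper does not list.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    agent: str
    start: int  # offset where the claimed text begins
    end: int    # offset just past the claimed text

    def text(self, request):
        return request[self.start:self.end]


def leaf_claim(agent, pattern, request):
    """A leaf agent claims the first span matching its pattern, or nothing."""
    m = re.search(pattern, request)
    return Claim(agent, m.start(), m.end()) if m else None


def size_claim(request):
    """SIZE combines its down-chain claims with its own knowledge of units."""
    bigger = leaf_claim("BIGGER", r"more than", request)
    number = leaf_claim("NUMBER", r"\d+", request)
    # SIZE's own contribution: a unit directly after the number.
    unit = re.compile(r"kb").match(request, number.end) if number else None
    if not (bigger and number and unit):
        return None  # stands in for the "no claim" message
    return Claim("SIZE", bigger.start, unit.end())


request = "files more than 30kb"
claim = size_claim(request)
print(claim.text(request))
```

The combined claim covers more of the request than either down-chain claim, which is what lets an up-chain agent later judge it as the stronger one.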
In phase two the BIGGER agent generates "+" (its translation of "more than" to the find command’s requirements), NUMBER generates "30" and SIZE combines and extends these into "-size +$((30 * 1024))c". (The shell evaluates the arithmetic expression in the double parentheses to produce 30720, the number of bytes in 30 KB.) This processing continues up the tree until the FIND agent has a complete command. It passes the command on to a special actuation agent that tells the application what to do. The application sends back an answer, which is displayed to the user.
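The actuation step for this example can be sketched as three small functions, one per agent. The function names and lookup tables are hypothetical; only the generated fragments ("+", "30" and the -size expression) come from the description above.

```python
def bigger_actuate(claimed_text):
    """BIGGER translates a comparison phrase into find's sign prefix."""
    signs = {"more than": "+", "less than": "-", "under": "-"}
    return signs[claimed_text]


def number_actuate(claimed_text):
    """NUMBER passes the digits through unchanged."""
    return claimed_text


def size_actuate(comparison, number, unit):
    """SIZE combines the down-chain fragments into a -size test."""
    multipliers = {"kb": 1024}  # assumed unit table, for illustration only
    sign = bigger_actuate(comparison)
    n = number_actuate(number)
    # Leave the multiplication to the shell, as in the text.
    return f"-size {sign}$(({n} * {multipliers[unit]}))c"


fragment = size_actuate("more than", "30", "kb")
print(fragment)
```

When the shell expands the arithmetic, 30 * 1024 becomes 30720, which matches the -size +30720c test in the earlier find command.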
Figure 2 shows the agent network after processing the request "What pictures are there in my agents folder?" In this figure, agents that made winning claims are marked with white circles; agents that made no claim are solid red. In this case the IMAGES agent claims the word "pictures", FILETYPE forwards the claim from IMAGES; the FOLDERNAME agent claims "agents" (based on its knowledge that there is a folder of that name) and FOLDER claims "in my agents folder", combining the claim of the FOLDERNAME agent with its own rules about the ways to specify a folder. In this example, none of the agents make competing claims for the same words in the request.
Figure 2 - Processing a simple request
Agents in Competition
Figure 3 shows the request "find music files that are about two months old". Here we see that two of the agent circles are half-filled. This shows agents whose claims were rejected.
Figure 3 - A request with competing claims
In this example, the MEDIA agent claims "music", FILETYPE combines with that to claim "music files", NUMBER claims "two" and AGE combines with that claim to claim "two months old". These are the successful claims. But they aren’t the only ones. Since NUMBER is connected to both AGE and SIZE, it sends its claim on "two" to both agents. SIZE doesn’t see anything else in the request that it understands, so it forwards the claim on "two" up to the FIND agent. FIND compares "two" to "two months old", prefers the latter claim and rejects the claim from SIZE in favor of the one from AGE. Finally, the FOLDERNAME agent makes a claim on "music", since there is a folder called "Music". FOLDER rejects that claim, since the request lacks any of the surrounding words the agent requires for a folder specification.
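The comparison FIND makes here can be sketched with a single rule. This paper does not spell out the actual strength rules, so the rule below (a claim loses to any overlapping claim that covers more of the request) is an assumption chosen to reproduce this one example; the Claim class and the helper names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    agent: str
    start: int
    end: int

    @property
    def length(self):
        return self.end - self.start

    def overlaps(self, other):
        return self.start < other.end and other.start < self.end


def make_claim(agent, text, request):
    """Locate the claimed text in the request and record its span."""
    start = request.index(text)
    return Claim(agent, start, start + len(text))


def resolve(claims):
    """Assumed rule: reject a claim if a longer overlapping claim exists."""
    accepted, rejected = [], []
    for c in claims:
        beaten = any(
            o is not c and c.overlaps(o) and o.length > c.length for o in claims
        )
        (rejected if beaten else accepted).append(c)
    return accepted, rejected


request = "find music files that are about two months old"
claims = [
    make_claim("FILETYPE", "music files", request),
    make_claim("AGE", "two months old", request),
    make_claim("SIZE", "two", request),
]
accepted, rejected = resolve(claims)
print([c.agent for c in accepted], [c.agent for c in rejected])
```

The FOLDERNAME claim on "music" is left out because FOLDER rejects it before it ever reaches FIND.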
Let’s try one last request: "Are there any word docs under 200?" This request is ambiguous; without any kind of unit after 200 we don’t know whether we’re talking about file size or age. As we can see in Figure 4 both BIGGER and OLDER make claims against the word "under". (Despite their names, these agents also handle concepts of smaller and newer.)
The SIZE agent combines the claim of BIGGER with that of NUMBER to make its own claim on "under 200". AGE makes the same claim. ORDER also makes a claim on "200", which the FIND agent rejects in favor of the stronger claims from SIZE and AGE. FOLDERNAME made a claim against a folder called "docs", which was rejected by FOLDER as in the previous example.
Figure 4 - An ambiguous request
What do we do in the face of two claims of equal strength? The decision belongs to the programmer, who could give precedence to one or the other agent. But in this case there is really no reason for the network to believe that we meant one thing versus another. Here is a situation where the best course is to let the user decide how to proceed.
The programmer allows for ambiguity by identifying places where it should be permitted and then providing an action to take place when it occurs. The network knows what caused the ambiguity and generates a specific set of choices back to the user:
Which one do you mean: File size in kilobytes, or Last modification in days?
We can pick one of the options presented, at which point the request will be resubmitted with one of the competing claims chosen over the other. The important thing is for the network to be as precise as possible in telling us what it needs to know. In other words, it is far better to provide a targeted request for clarification than to return some generic "huh?" response.
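A targeted clarification of this kind can be sketched as follows. The function, the use of claim length as a stand-in for strength and the description table are assumptions for illustration; the wording of the question is the one shown above.

```python
def clarification(claims, descriptions):
    """Return a targeted question if the strongest claims tie, else None.

    claims is a list of (agent, claimed_text) pairs. Strength is
    approximated by the length of the claimed text, which is an
    assumption made for this sketch.
    """
    best = max(len(text) for _, text in claims)
    tied = [agent for agent, text in claims if len(text) == best]
    if len(tied) < 2:
        return None  # a single winner, so there is nothing to ask
    options = ", or ".join(descriptions[agent] for agent in tied)
    return f"Which one do you mean: {options}?"


# Programmer-supplied descriptions for agents where ambiguity is permitted.
DESCRIPTIONS = {
    "SIZE": "File size in kilobytes",
    "AGE": "Last modification in days",
}

claims = [("SIZE", "under 200"), ("AGE", "under 200"), ("ORDER", "200")]
question = clarification(claims, DESCRIPTIONS)
print(question)
```

Only the tied agents need a description, which mirrors the idea that the programmer decides where ambiguity is allowed.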
The Agent Network Application Architecture
Up to now we have discussed the inner workings of agent networks without consideration for what is required to integrate them into a larger application. We have assumed that user requests arrive at the top level agent as if by magic, that the command generated by the network somehow causes something useful to happen and that the result of that something useful makes its way back to the user. It is time to understand how all the pieces fit together.
Figure 5 represents the architecture of a server-side application with a natural interaction interface. The system supports multiple modalities, with users submitting requests from a variety of wired and wireless devices. These requests are received by specialized servers based on the wrapper for the request: an SMS server for cell phone messages transmitted using the Short Message Service protocol; a web or WAP server for cell phones, PDAs and computers using web browsers; an email responder for PDAs and computers sending requests to a special email address; and perhaps a speech recognition system that receives voice requests from cellular or wired phones. (Note that Answers Anywhere does not transcribe the waveform into text; rather, it relies on a speech recognition engine for this transcription.)
Figure 5 - Application integration
All of these requests are routed to the agent network via a special agent called an interaction agent. This agent is responsible for receiving the request and using some service-specific bit of information to associate it with earlier requests that should be treated as part of the same conversation. Different services use different information to make this association: a telephone number, an email address, an HTTP cookie. All that matters is that we have some way of knowing whether this request is within the context of an ongoing conversation, or whether it should begin a new conversation.
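The association between requests and conversations can be sketched as a lookup keyed by the service and its identifying value. The class and method names are hypothetical, and the sketch ignores details such as when a conversation should expire, which this paper does not describe.

```python
import itertools


class InteractionAgent:
    """Toy interaction agent that maps service-specific keys to conversations."""

    def __init__(self):
        self._conversations = {}        # (service, key) -> conversation id
        self._ids = itertools.count(1)  # source of new conversation ids

    def conversation_for(self, service, key):
        """Return the ongoing conversation for this sender, or start one."""
        ident = (service, key)
        if ident not in self._conversations:
            self._conversations[ident] = next(self._ids)
        return self._conversations[ident]


agent = InteractionAgent()
first = agent.conversation_for("sms", "+1-555-0100")          # phone number
again = agent.conversation_for("sms", "+1-555-0100")          # same sender
other = agent.conversation_for("email", "user@example.com")   # email address
print(first, again, other)
```

Including the service name in the key keeps an email address and a cookie value from colliding if they happen to be the same string.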
The interaction agent passes the request on to a user agent. The user agent retrieves information associated with this particular user and conversation, including personal preferences, learned keywords and other behavior and contextual information from prior interactions. An example of the latter would be an interaction like this:
User: "show me web files more than a month old"
The system returns a list of files.
User: "which of these is the most recent?"
The system understands that the user is referring to the result of the previous request.
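A minimal sketch of this kind of context handling follows. It assumes that the user agent simply remembers the last result list per conversation and treats the word "these" as a reference to it; the class, the method names and the file names are hypothetical.

```python
class UserAgent:
    """Toy user agent that remembers the last results of each conversation."""

    def __init__(self):
        self._last_results = {}  # conversation id -> list of results

    def remember(self, conversation_id, results):
        self._last_results[conversation_id] = list(results)

    def scope_for(self, conversation_id, request):
        """Return the earlier results if the request refers back to them.

        Assumed rule for this sketch: the word "these" points at the
        previous result list. Returns None when there is no back-reference.
        """
        if "these" in request.lower().split():
            return self._last_results.get(conversation_id)
        return None


user_agent = UserAgent()
# Hypothetical results of "show me web files more than a month old".
user_agent.remember(1, ["old/index.html", "old/about.htm"])

scope = user_agent.scope_for(1, "which of these is the most recent?")
fresh = user_agent.scope_for(1, "show me pictures in my agents folder")
print(scope, fresh)
```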
The user agent passes the request to the top-level interpretation agent, which sends it to the rest of the network as we have already discussed. At the end of the actuation phase, the top-level agent sends the actuation command to a custom actuation agent. This is the agent that interacts with the back end application. How it does so depends on the developer and the specific application. If the application has a Java interface, the actuation agent can use its methods directly. Alternatively, it may need to use the Java Native Interface to integrate with a non-Java application. If the application runs as a separate process, there may be remote procedure calls, Java’s Remote Method Invocation (RMI) or some similar networking protocol involved. In most cases the application sends a response back to the actuation agent. The actuation agent forwards the response to the user agent, which sends it to the interaction agent, then to the user.
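For the find example, where the application is a shell command run as a separate process, an actuation agent might look roughly like the sketch below. It is written in Python purely for illustration, although the paper describes a Java-based system. The class name and the injectable run parameter are assumptions.

```python
import subprocess


class ActuationAgent:
    """Toy actuation agent that hands a generated command to a back end."""

    def __init__(self, run=None):
        # The back end is injectable so that other integrations (or a
        # test double) can replace the default shell runner.
        self._run = run or self._run_in_shell

    @staticmethod
    def _run_in_shell(command):
        # shell=True lets the shell expand $(( ... )) arithmetic. Only use
        # it for commands generated by the network, never for raw user text.
        done = subprocess.run(
            command, shell=True, capture_output=True, text=True
        )
        return done.stdout

    def actuate(self, command):
        """Run the command and return the response as a list of lines."""
        return self._run(command).splitlines()


# A stand-in back end that returns canned output instead of running find.
fake = ActuationAgent(run=lambda command: "a.html\nb.html\n")
lines = fake.actuate("find -name '*.html'")
print(lines)
```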
Figure 6 shows the interface agents for the find command network. Both the interaction agent and the user agent are generally used without modification. The connection between the application server or servers and the interaction agent can be customized. Applications may have their own custom actuation agents. Agents below and including the top agent (e.g., FIND) are responsible for interpreting the user request and creating the response.
A set of rules governs the behavior of each agent. These rules are written in an agent programming language called AASAP. Programmers can write their own set of AASAP rules for each agent or reuse pre-existing ones. Answers Anywhere provides a set of predefined AASAP templates and a library of agent network components to simplify the creation of agent networks.
Figure 6 - External interface agents
In this paper we have discussed some elements of agent-oriented software architecture and examined a particular implementation of agents and its application to user interface design. As the number of technological products we encounter in our daily lives increases, the need for these products to present convenient, expressive and natural styles of interface becomes ever more obvious. Developing these interfaces will require that we consider new models of development that offer much greater productivity and flexibility, even as they maintain the level of reuse, ease of integration, performance and memory footprint we require.
Copyright © 2004 iAnywhere Solutions, Inc. All rights reserved. Sybase, the Sybase logo, iAnywhere Solutions, the iAnywhere Solutions logo, Adaptive Server, MobiLink, and SQL Anywhere are trademarks of Sybase, Inc. or its subsidiaries. All other trademarks are property of their respective owners.