Notes of the fifth Z39.50 Implementor's Group Meeting, held at the Library of Congress, 20-21 May 1991. Prepared by Joe Zeeman, Software Kinetics Ltd.

Attendees:

Apple Computer: Eric Roth, Janet Vratny-Watts
Central Intelligence Agency: Mark Zimmerman
Chemical Abstracts International: Les Wibberley
CNRI: David Ely
Dartmouth College: Eric Bivona
Data Research Associates: James Michael, Sean Donelan
Duke University: Boris Vychodil, Juraj Horacek
Dynix, Inc.: Steve Jaynes
Florida Center for Library Automation: Mark Hinnebusch
Library of Congress: Ray Denenberg, Larry Dixson, Ralph Orlik, Kaushi Belani
Maxwell Online: Oren Sreebny
Mead Data Central: Peter Ryall
National Library of Medicine: Ed Sequira
NeXT Computers: Jack Greenfield
NOTIS Systems Inc.: Sara Randall
OCLC: Ralph LeVan
PLS Inc.: Larry Fitzpatrick
PSI: Wengyik Yeong
Research Libraries Group: Richard Fuchs, Lennie Stovel, Jay Field
Software Kinetics Ltd.: Joe Zeeman
Sun Microsystems: Andy Bensky
Thinking Machines, Inc.: Brewster Kahle
University of California, Berkeley: John Kunze, Cecilia Preston
University of California, Division of Library Automation: Clifford Lynch, Mark Needleman, Michael Thwaites, Margery Tibbetts
Virginia Tech (VPI&SU): Carol Terry
VTLS, Inc.: Cathy Winfrey

1. Status Reports

Participants introduced themselves and gave status reports on work under way at their organizations.

1. FCLA have received a Title II grant to implement Z39.50 over OSI. The IBM OSI/CS product is being used, and the application is being implemented in C.
2. UC-DLA is implementing Z39.50 over TCP/IP. An early application will provide Penn State with an origin to access the Melvyl target.
3. DRA are currently debugging their OSI protocol stack. They are using ISODE 6.8 and have found it to be a distinct improvement over version 6.0.
4. Software Kinetics reported that the National Library of Canada intends to begin work soon on a project to develop an SR kernel, which will be available for use in the public domain, much as the ISODE is a public domain OSI kernel. The project is intended to encourage Canadian users to adopt the protocol as soon as possible. The National Library also intends to initiate a Canadian implementors' group similar to the ZIG.
5. PSI has a prototype of version 2 working. No further development is taking place at present.
6. Mead Data Central are not yet in the implementation phase. They have commercial requirements to support multiple protocol stacks (TCP/IP, OSI and SNA) running in multiple environments.
7. Thinking Machines have reached a "good crystallization point", with public domain versions of their WAIS server available on 27 Internet hosts, and origin implementations available for NeXT, Macintosh, X Windows and GNU Emacs. There is considerable public interest, and articles on WAIS have recently been published in Byte, MacUser and Release 1.0.
8. NOTIS Systems currently have interface coding in hand and are about to start coding the Z39.50 protocol machine. They hope to speed up the timeline for their full Z39.50 implementation.
9. Dartmouth College hope to use Z39.50 to link their Campus-Wide Information System to resources outside the campus.
10. OCLC have a working implementation of the 1988 version of Z39.50. Further development awaits an actively interested partner.
11. UC Berkeley currently have most of the search and init facilities completed.
12. VTLS Inc. was present as an observer only.
13. The CIA was present as an observer only.
14. Maxwell Online was present as an observer only.
15. NeXT Inc. are working on a transducer-based information architecture, to which the Z39.50 protocol seems closely related. They are using the Thinking Machines WAIS implementation.
16. Chemical Abstracts International are planning an international STM ("Scientific, Technical, Medical") document retrieval project, for which SR is intended to be used.
17. Dynix was present as an observer only.
18. The Research Libraries Group have designed a technical architecture running under their Orville system and are upgrading the testbed originally used for testing the LSP protocols to support Z39.50. They are at present negotiating an agreement with OCLC to offer a bi-directional Z39.50 link.
19. The Library of Congress are replacing their LSP software with a fully OSI-conformant Z39.50 implementation, using the IBM OSI/CS product for the lower layer services. IBM have recently prioritized the identified bugs in OSI/CS as follows: support for recursive definitions in ASN.1 has a high priority; support for the "non-encoded form" of external data types has a high priority; presentation layer context switching has a low priority; and OSI/CS over TCP/IP will not be supported.
20. Duke University is developing an interface to OCLC's Newton database engine using Z39.50 over a TCP/IP connection. The user interface will be based on Microsoft Windows.
21. Virginia Tech was present at this meeting as an observer only.
22. Sun Microsystems have started implementing Z39.50 and are currently formalizing a specification based on the protocol.

2. Z39.50 Version 2, Draft 3

Ray Denenberg distributed the third draft of Version 2 of Z39.50 (ZIG 91/08) and walked through the significant changes. One change which led to discussion was the version numbering. In order to support interworking of Z39.50 and ISO SR implementations, this version was designated version 2 of Z39.50 (199x); there is additionally a notional version 1 (actually SR) which all implementations of version 2 are expected to support. Some suggested that greater consistency would be achieved by calling this version of Z39.50 version 1. Ray undertook to examine the implications of this and to clarify the text of the first paragraph of section 4.

3. Attribute Set bib-1

Larry Dixson distributed a draft of an "informational appendix" describing the nature and use of the bib-1 attribute set (ZIG 91/09). There was some discussion about the desirability of adding more attributes to this set, as well as adding more attribute sets. In the end it was decided to stabilize the attribute set for this version of the protocol. Detailed discussion of the document was postponed until Tuesday. Brewster Kahle suggested the addition of a "best section" use attribute for retrieval from full-text databases; on discussion it was determined that this was more properly an element set name.

4. Future work items for ISO TC 46

Sally McCallum reported on the recent TC46 editing meeting. The final texts for ISO 10162 and 10163 (SR) had been completed and all comments had been satisfied. The texts would go to France by the end of June for translation and should complete the standards process by late autumn. The meeting had also considered future work items, including the development of a test suite and expansions to the protocol. Proposed new work items included resource and access control, and request batching.
Liv Holm of Norway had been working on browse and had identified four distinct kinds: simple browsing of indexes, simple browsing of result sets, navigational browsing of indexes, and hierarchical browsing of databases. It was decided that only the first two would be included in the current work item.

5. Maintenance agency report

Ray Denenberg reported on the activities of the Z39.50 maintenance agency. Most work continued to centre on preparing version 2 of the standard. In addition, a technical report and an implementors' guide had been proposed. Ray would undertake the technical report, which would include a discussion of object identifiers, the application layer structure, etc.; he wanted help from actual implementors in preparing the implementors' guide. During discussion of the various sources of information about Z39.50, Brewster Kahle undertook to put the ZIG archives on Thinking Machines' WAIS server. Ray distributed a new version of the implementors list (ZIG 91/10) and indicated that it would be made publicly available toward the end of the year. In order to be kept on the list, implementors must send a brief description of their implementation to the maintenance agency. The maintenance agency is also charged with coordinating work on the development of testing procedures and test suites for Z39.50 [there was more discussion of this at the end of the meeting]. The maintenance agency was responsible for reflecting the various US positions on SR to ISO TC46 and had coordinated the work on preparing the US position paper on Canada's batch proposal. The current version of Z39.50 would not go to ballot until the late autumn, so there would be at least one more ZIG meeting before the text, attribute sets, etc. would have to be finalized.

6. Shortening top-level ASN.1 identifiers

John Kunze introduced this item and distributed a background paper covering this and the next six agenda items (ZIG 91/11). It was agreed that the ASN.1 module name would be shortened from "ANSIZ39-50-2" to "IR". A number of people felt that this should be a local matter, however, and it was pointed out that this change would still result in compiler-generated variable names longer than 30 characters.

7. Distribution of machine-readable ASN.1 specification

Although, again, this was felt to be properly a local matter, it was recognized that errors would be minimized if implementors with ASN.1 compilers could use the standard module as maintained by the maintenance agency. The difficulty of editing a single ASN.1 text to allow both reasonable printed output and machine processing was pointed out. However, Ray Denenberg undertook to maintain the ASN.1 specification with a line length of no more than 71 characters (required by those with IBM compilers). It was also pointed out that a machine-readable text of the standard would only be available until it became a full NISO-approved standard, since NISO is dependent on standards sales for its operating revenue.

8. Clarification of Association creation and termination

The conclusion of the discussion was that there was agreement on the need for a new "CLOSE" service for Z39.50, which would inform a target that the origin wished to cease any further exchange of Z39.50 APDUs, but that the association should remain in place to allow other ASEs to be invoked. Ray added that this had been raised by the Germans at the TC46 meeting and would be the subject of a new work item. All parties therefore recognized the requirement for such a new service, which would be in addition to the existing Abort and Release services. The wording of section 4.2.1.2 of the draft standard would be re-examined to see if the relationship between the service user, the ACSE and the Z39.50SE could be clarified.

9. Role of Reverse Polish Notation in the RPNQuery

The new text in section 3.2.2.1.1 of the standard clarified much of this. It was pointed out that the protocol neither requires a stack nor specifies an implementation of one; it describes an abstract stack to indicate the way a query is assumed to be processed at the target system.
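To illustrate the abstract model, the following sketch (in C; the ResultSet type and the do_search and combine routines are hypothetical names, not drawn from the standard) evaluates a postfix token sequence against an explicit stack:

    /* Illustrative stack evaluation of a postfix (RPN) query.  The
       ResultSet type and the do_search and combine routines are
       hypothetical; the standard mandates neither a stack nor this
       representation. */
    #include <stddef.h>

    typedef enum { OPERAND, AND, OR, AND_NOT } RpnTag;

    typedef struct {
        RpnTag      tag;
        const char *term;          /* attribute list omitted for brevity */
    } RpnToken;

    typedef struct ResultSet ResultSet;              /* target-side set */
    extern ResultSet *do_search(const char *term);   /* hypothetical    */
    extern ResultSet *combine(RpnTag op, ResultSet *a, ResultSet *b);

    /* Evaluate a postfix token sequence; a well-formed query leaves
       exactly one set on the stack (depth of 32 assumed here). */
    ResultSet *eval_rpn(const RpnToken *q, size_t n)
    {
        ResultSet *stack[32];
        size_t sp = 0, i;

        for (i = 0; i < n; i++) {
            if (q[i].tag == OPERAND) {
                stack[sp++] = do_search(q[i].term);
            } else {                   /* operator pops two, pushes one */
                ResultSet *b = stack[--sp];
                ResultSet *a = stack[--sp];
                stack[sp++] = combine(q[i].tag, a, b);
            }
        }
        return stack[0];
    }

The stack here is purely descriptive: a target that evaluates the query some other way, but produces the same result set, behaves identically as far as the protocol is concerned.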
10. Order of evaluation of non-commutative operators

This problem had been solved by the new text in the standard and the informative appendix on bib-1.

11. Wild card language specified via attribute type/value pair

John Kunze proposed the definition of new relation type attributes to support wild card matching of various kinds, including UNIX-style regular expressions. In discussion it was decided that simply defining additional attributes was not sufficient; additional error states would be required as well, together with an explicit statement of how this wildcarding would work. John agreed to submit a proposal to the list.

12. Effect of ASN.1 style on implementations

This led to a discussion of ASN.1 style. There was some agreement that the use of indirectly defined types has an advantage in terms of ease of maintenance of the protocol: when the definition of ReferenceId is changed, only a single definition need be changed, rather than every occurrence of the type. Some implementors felt, however, that many of these named types were used sufficiently infrequently that the indirection could safely be removed. It was recommended that implementors who felt strongly about this should use hand-optimization. A related problem is that there are a few instances in which named values have not been given to types (see, for instance, p. 27 of draft 3):

    SEQUENCE { [0] IMPLICIT DatabaseName OPTIONAL,

It was agreed that this was probably an error that should be looked at.

13. CNRI Knowbots project with NLM

[This item was brought forward for discussion in David Ely's presence.] David Ely described the relationship of this project to Z39.50. The Knowbots project is a joint project of the National Science Foundation and the National Library of Medicine, and is part of the NSF's Digital Library project. The ultimate goal is the creation of "knowbots", information retrieval agents that actively and autonomously seek information required by individuals or groups. The current intention is to make use of Z39.50 as far as possible, but it is not yet clear how well the protocol fits the knowbot model and canonical query language. Appendix A presents an extract from the Merit LinkLetter, vol. 4, no. 1 (March-April 1991), describing the Knowbots project.

14. New Structure Attribute for Personal Name

Michael Thwaites introduced this item by pointing out the difference between searching for a name as a phrase and as a structured term. Melvyl, for instance, allows a searcher to indicate the boundary of a surname by using a comma (Thomas, John). This form of name is subject to special matching rules and is treated very differently from a name entered as a phrase (John Thomas or Thomas John). There therefore appears to be a requirement to be able to indicate that a search term is structured as a name. As an alternative it was suggested that it might be preferable to define "first-name" and "last-name" search attributes, but this was objected to on the grounds that they would have to be duplicated for each existing name use attribute. In the end it was decided that there should be a new attribute of type "structure", called "formatted name", and that the intersystem format should be that specified for a name in AACR2. This has the advantage of being more general than a format for personal names alone: it allows dates to be included, and corporate names to be transferred as well.

15. New value for all attribute types called Not-applicable

Michael Thwaites introduced this discussion as well. UC-DLA had identified a requirement to be able to inform a target system that a particular attribute type was not applicable in a given query. In discussion it became clear that the real requirement was to be able to tell the target system not to apply its default for that attribute type. This applied particularly to the truncation attribute type, for which there was now a "do not truncate" value. It seemed meaningless to specify that a system should not use its default without specifying what it should use instead, and the conclusion was that this attribute value was not needed.

16. Tighten up position and completeness attribute descriptions

This was deferred until discussion of the attribute set.

17. List of extensions

Les Wibberley asked if there was a list anywhere of proposed extensions to the protocol. Ray Denenberg said that there was no such list, but that there should be and that maintaining it was a maintenance agency function. Individuals who had, or wished to request, extensions were asked to let Ray know. Clifford Lynch mentioned a number of new attributes required to support full-text retrieval, including use attributes of "byte", "line" and "document-id". This led to a lengthy discussion of document ids and types and their relation to the Thinking Machines WAIS implementation. One particular concern centred on whether presentation of a sub-document should be requested by means of a special search or by means of a special element set name. There was similar discussion as to whether a document type was properly a search attribute or a record syntax name. In the end it was agreed that Brewster Kahle would prepare a brief presentation on the requirements for supporting WAIS. One conclusion which emerged was that the elements of the WAIS document id were already present in the Z39.50 protocol, with the exception of a means of identifying the system which "owned" the number. It was agreed that this could most easily be accommodated through the addition of an attribute to indicate the number's owner; this was subsequently generalized to an "authority format indicator".

18. Connectionless Z39.50 service

Peter Ryall distributed a document outlining Mead Data Central's requirements for this item and the next (ZIG 91/12). Ray Denenberg pointed out that this, or some other form of asynchronous operation, was seen by TC46 as a general need. This item was closely connected with the following one.

19. Target driven periodic query invocation

This was required to support the many "clippings", current awareness and SDI (Selective Dissemination of Information) services operated by commercial vendors. There are a number of problems with such a service.
Among them are how to present the result, how to notify the user that new results are available, how to model the service (target driven or origin driven), and whether and how to use external delivery mechanisms. It was clear that there is a requirement to support such a service, but much further work will be required. Ray Denenberg proposed that the problem of persistent result sets be treated as a separate work item, since it was required by a number of new or extended services. There was much discussion as to whether the service should be modelled as target initiated, or whether the origin should be expected to poll the target on a periodic basis. No clear conclusion was reached, and it was evident that much more work is required on this proposal. It was, however, pointed out that an SDI-like mechanism might also meet the Canadian requirement for batch searching.

20. Canadian batch proposal to TC46 and US position

Ray Denenberg distributed the two papers which had been issued at TC46 (ZIG 91/13 and ZIG 91/14). Discussion of this item had been subsumed under the previous item.

21. Redirection of associations for bridging

Mark Hinnebusch introduced this item by saying that FCLA intended to support both OSI and TCP/IP protocol stacks, and suggested that institutions building applications over different stacks may want to act as bridges to enable otherwise incompatible systems to interoperate. The question was what mechanism would be most appropriate for such a bridging service. Several possibilities were described in discussion. One would be for an application to act as an APDU relay between incompatible systems; a difficulty with this approach is that it allows neither an end-to-end presentation context nor an end-to-end application context to be negotiated. Another approach would be to use the transport layer bridge described in RFC 1006. This approach, which provides OSI Transport class 0 over TCP, is expected to be widely implemented, and allows the upper layers to act as if they were in a pure OSI environment. However, many of the early Z39.50 implementations will send raw Z39.50 APDUs directly over TCP, without using the OSI upper layer services; such implementations will not be able to use the RFC 1006 approach. On the other hand, a simple application relay may be relatively easy to kludge together.
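For reference, the RFC 1006 mechanism amounts to carrying each transport PDU over TCP (conventionally on port 102) behind a 4-octet TPKT header: a version octet of 3, a reserved octet, and a 16-bit length that counts the header itself. A minimal sketch of the sending side, with error recovery and the Transport class 0 machinery elided:

    /* RFC 1006 framing: each TPDU travels over TCP inside a 4-octet
       TPKT header (version 3, a reserved octet, and a 16-bit length
       that includes the header itself).  Error recovery elided. */
    #include <stdint.h>
    #include <unistd.h>

    /* Send one TPKT-framed TPDU on a connected TCP socket (port 102). */
    int tpkt_send(int sock, const uint8_t *tpdu, uint16_t len)
    {
        uint8_t  hdr[4];
        uint16_t total = (uint16_t)(len + 4); /* length covers the header */

        hdr[0] = 3;                           /* TPKT version             */
        hdr[1] = 0;                           /* reserved                 */
        hdr[2] = (uint8_t)(total >> 8);       /* length, big-endian       */
        hdr[3] = (uint8_t)(total & 0xff);

        if (write(sock, hdr, 4) != 4)
            return -1;
        return write(sock, tpdu, len) == (ssize_t)len ? 0 : -1;
    }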
22. Proximity review

ZIG 91/06, defining operators for proximity searching, was revisited, and confusions and misunderstandings were ironed out. The definition of "distance" was revised to read "Distance is the difference between the ordinal position values of the two operands." The xor operator was deleted, since its effect can be achieved by a combination of other operators: (A and-not B) or (B and-not A). There was a lengthy discussion of whether the Z39.50 model allows information relating to the position of a search term to be part of the result set. It was suggested that there should be no difference between using a single complex query of the form (dogs and house) or (cats and house), and using multiple queries and operations on result sets:

1. dogs
2. cats
3. 1 or 2
4. house
5. 3 and 4

The intermediate results created during the processing of the single complex query are identical to the multiple result sets. OCLC's result sets can be used in proximity operations because each record id is stored together with a number indicating the ordinal position of the term in the record. This allows a result set to be used as an argument to a proximity operator; it is not, however, the same as keeping the original query with the result set. Ray Denenberg explained the position of the original drafters of the standard, which was that a result set could not include such information. The discussion went on to consider whether the current definition of a result set allowed multiple postings of a single record to be present. There was nothing in the wording of the standard which disallowed this, and indeed the original drafters had recognized that some systems may only be able to operate in such a way. It was pointed out that duplicate postings would be required for retrieval from full-text databases. One problem identified was that there was no model given of the use of a result set in a query, only of its use in presentation. Clifford Lynch pointed out that there were potentially serious problems in allowing duplicates to be either implicitly or explicitly present in a result set; for instance, it could lead to a situation where result sets that present identically would behave differently when used in a query. It was agreed that this problem could not be resolved during the course of the meeting, and that revisiting the result set model would be an agenda item for the next meeting. It was pointed out that the set of operators could be simplified by moving the and-not-prox operator into the sequence defining proximityOperator; that is, the sequence could contain a boolean value indicating a "not" operation. Mark undertook to revise the document and reissue it as ZIG 91/15.
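As a sketch of the OCLC-style result set entry described above (the Posting type and its field names are illustrative, not OCLC's actual format), each posting pairs a record id with the term's ordinal position, which is all a proximity operator needs:

    /* Illustrative result set entry carrying term position, so the
       set can serve as a proximity operand.  Names are hypothetical. */
    #include <stdlib.h>

    typedef struct {
        long record_id;    /* record in which the term occurred */
        long position;     /* ordinal position of the term      */
    } Posting;

    /* True if a and b fall in the same record and their distance,
       the difference of the ordinal position values, is within
       limit, per the revised definition above. */
    int within_proximity(const Posting *a, const Posting *b, long limit)
    {
        return a->record_id == b->record_id &&
               labs(a->position - b->position) <= limit;
    }

Note that such an entry records where a term occurred, but not which query produced it; this is precisely the distinction drawn in the discussion.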
23. WAIS requirements

Brewster Kahle made a presentation on the attribute and query requirements that would allow WAIS to conform to the standard, and distributed a paper describing the WAIS document identifier (ZIG 91/16). His proposal involved the use of query attributes to specify the portion of a document to be returned in response to a query. [Note: the WAIS implementation normally involves two independent queries: the first returns a list of document ids in response to a "normal" search, and the second uses one of those document ids to retrieve all or part of a single document. WAIS searches do not expect or use a result set maintained by the server, and do not use a separate presentation service.] In the proposed scheme, byte or line position would be searchable, and only those portions of documents meeting the criteria would be returned. A query to return a portion of a document might specify, for example, a document id and a byte range: "AFI=WAIS" AND "LocalSystemNo=DowJones-server.A-Database.DocumentNo.12345" AND "Bytes<2000" AND "Bytes>0". Many in the group were uncomfortable with the idea of using a special kind of search to return a subset of a document. It was suggested instead that the ElementSetNames parameter be used to specify the subdocument to be returned. Since this parameter is currently defined to be a VisibleString, it would be impossible to define a structure for its use in version 2 of Z39.50, and WAIS would need to define string values externally to the standard (e.g. "BYTES 0-2000"). The definition of a structured means of specifying segmentation of large records for presentation purposes could in the meantime form a work item for version 3. Another issue raised was that of document type. Brewster's original proposal included an additional search attribute, "document-type". In the course of a very lengthy discussion it was determined that, although document type could legitimately be used as a search attribute (e.g. "restrict the search to only those documents of type 'WordPerfect'"), its use by WAIS was actually to specify a syntax for presentation of the document. It was therefore felt that the PreferredRecordSyntax parameter was more appropriate for this information. A WAIS search also requires additional use attributes, notably an attribute for "items about" and an attribute for relevance feedback. The first was determined to be different from "subject" in that the target is free to apply various tests of relevance, to substitute synonyms, etc.; it was decided to call it "conceptual content". The second identifies a document relevant to the search and asks for other items like it; it is therefore a document id in form, and a subject or contextual attribute in meaning. Another issue raised by WAIS was the need for extensible APDUs, to allow experimental data to be transferred between implementors. It was suggested that there was already an agreement to include the UserInformation field in every APDU; upon examining the minutes it was determined that the issue had been raised but never resolved. It was now too late to include this in version 2 of the standard, but it would be proposed for inclusion in version 3. One problem with this kind of extensibility is the danger that it would lead to multiple, mutually unintelligible dialects of the protocol.
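Under the element-set-name approach, the string values would be a private convention between origin and target, since version 2 gives ElementSetNames no internal structure. A minimal sketch of target-side handling, assuming the illustrative "BYTES <start>-<end>" form mentioned above:

    /* Parse an externally agreed element set name of the illustrative
       form "BYTES <start>-<end>" into a byte range.  Returns 0 on
       success, -1 if the name does not follow the convention. */
    #include <stdio.h>

    int parse_byte_range(const char *esn, long *start, long *end)
    {
        return sscanf(esn, "BYTES %ld-%ld", start, end) == 2 ? 0 : -1;
    }

A present request carrying the element set name "BYTES 0-2000" would then return only the first 2000 bytes of the document, with no special search required.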
24. EXPLAIN service

Clifford Lynch presented the current status of work on the Explain service. Three models have been proposed: Liv Holm has prepared a first draft of Explain as a separate protocol service; Cliff earlier proposed the use of a special database (WAIS currently uses such a database); and Sun Microsystems has recently proposed handling Explain by means of special element set names, which would return information relating to the database being searched, attribute sets, etc. Each of these models has advantages and disadvantages. The separate service as currently defined does not scale very well, since each kind of explanation must be predefined. The special database requires the specification of additional record syntaxes as well as a new search attribute set; it also requires a target to keep multiple result sets, one for the actual search and one for the Explain search. The use of element set names does not handle more global requests for information well, such as provision of a list of all databases. The consensus of the discussion was that the special database approach offered the most flexible way of implementing an Explain service, though neither of the other models was precluded by using a database. There were also a number of problems in implementing a database. For instance, it would not be sufficient for a target simply to give a list of attributes supported for each database; what the origin needs to know is what combinations of attributes are supported for a database. Similarly, an Explain service based on a database could not easily give an end user guidance or assistance in the form of context-sensitive help; this was, however, seen to be a different problem from Explain, and subject to separate investigation. This item led to a renewed discussion as to where the responsibility for "fuzzying" a search lay. The consensus was that this was a responsibility of the origin system.

25. Z39.50 implementation recognition of SR Object Identifiers

Ray Denenberg described the Object Identifier tree structure defined by the ASN.1 standard, and how both SR and Z39.50 would fit into it. Most implementors indicated that they would ensure their implementations recognized both SR and Z39.50 object identifiers.

26. Local record syntaxes

The related question of how and where to register local record syntaxes was discussed at some length. While the OID tree descending from Z39.50 (ISO Member-body USA Z39.50 abstract-syntaxes) could easily be extended to allow locally specified record syntaxes (... local ), there was some question as to how many of these local syntaxes would and should be defined, whether the Z39.50 tree was the most appropriate place to register them, and whether the existence of such local syntaxes would compromise interoperability. An additional problem was that of the transfer syntaxes associated with these abstract syntaxes: where they should be registered, and how their use should be negotiated for any given document retrieval. After much discussion, it was decided to postpone any decision until more thought could be given to the matter, with discussion over the list, etc.

27. Comments on Draft 3 of Version 2

There was an inconsistency in the standard between the uses of "null result set" and "empty result set"; Ray would correct this. The wording of part of section 3.2.2.1.2 still suggested that a query using a result set as an argument would always be unsuccessful: "... prior to processing the query, the existing result set whose name is specified by the parameter Result-set-name will be deleted ...". Ray undertook to find a better wording. The meaning of Maximum-record-size was clarified: it specifies the maximum size of a data record; the message containing a maximum-size record will be larger, and systems must make allowance for this. The definitions in Appendix F still required units; Clifford Lynch undertook to send a proposal to the list.

28. Comments on bib-1 attribute set

Clifford Lynch raised a number of concerns, among them the structure types "year" and "date". The problem with year is that it does not allow for the transmission of BC dates, or of dates using any other calendar. The specification of date uses an ISO standard format; this, however, places all the processing burden on the origin, and it was thought desirable to allow the transmission of an unnormalized date as well, which the target could process. In the course of discussion it became apparent that there is a requirement to specify the encoding standard(s) used in search terms; the only encoding rule currently mentioned is the prohibition on the use of the ASCII space character in the year structure. Failure to specify the encoding standard would lead to interoperability problems if, for example, one system used ASCII encodings while another used EBCDIC. An additional use attribute for "body-of-text" was agreed upon.

29. APDU test suite

In discussion of this item a distinction was made between a test suite, an "exerciser" and a reference implementation. A full test suite would be expensive and complex to develop, and it was generally agreed that what was immediately required was a set of APDUs which could be used to exercise implementations as they were developed and debugged. Jim Michael reported that the question of test suites was currently under study by the Standards Development Committee of NISO. The committee would not be meeting again until after the next ZIG meeting, so comments and requirements could be directed through this group. It was agreed that Michael Thwaites of UC-DLA would act as the focal point for exercise APDUs.
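Since the APDUs on the wire are BER encodings of the ASN.1 definitions, a simple exerciser can be little more than a hand-rolled tag-length-value encoder plus a library of canned values. A sketch of such a helper follows (the function name is assumed; single-octet tags only, and values shorter than 65536 octets):

    /* Hand-rolled BER tag-length-value encoder for building exercise
       APDUs by hand.  Assumes a single-octet tag and a value shorter
       than 65536 octets; returns the number of octets written. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    size_t ber_tlv(uint8_t *buf, uint8_t tag, const uint8_t *val, size_t len)
    {
        size_t n = 0;

        buf[n++] = tag;
        if (len < 128) {                    /* short form length       */
            buf[n++] = (uint8_t)len;
        } else {                            /* long form, 2 length octets */
            buf[n++] = 0x82;
            buf[n++] = (uint8_t)(len >> 8);
            buf[n++] = (uint8_t)(len & 0xff);
        }
        memcpy(buf + n, val, len);
        return n + len;
    }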
30. MARBI proposal for MARC record for systems

Mark Hinnebusch raised this item by mentioning that MARBI had recently issued a discussion paper on a MARC format for describing information systems. The paper had been posted to the list.

31. Any other business

Andy Bensky of Sun gave a very brief (the time being 3:00 PM) description of the implementation being planned by Sun.

32. Next meeting

The next meeting was fixed for 9-10 September at OCLC in Dublin, Ohio. The possibility of beginning the meeting on Sunday the 8th and running for three days was suggested.

APPENDIX: KNOWBOTS

The following is an excerpt from the Merit LinkLetter, vol. 4, no. 1 (March-April 1991):

KNOWBOTS(TM) DELIVER THE GOODS

Have you ever wished for a little robot to sit at your workstation and perform the tedious chore of finding and retrieving information from databases distributed around the world? Well, hold onto your keyboards, because the Corporation for National Research Initiatives (CNRI) is working on a project which is the debut of exactly that kind of tool, one which will automate the searching of multiple disparate databases. CNRI has been working with the National Library of Medicine (NLM) and the National Science Foundation (NSF) on a utility for database searches in the Medline databases of the NLM. All the databases are now accessed via public networks, but two of them, ELHILL and TOXNET, will soon be accessible over the Internet, perhaps as soon as June of this year. The work currently being done at CNRI targets the electronic databases at the National Library of Medicine's Lister Hill Center, known as the MEDLARS system. The prototype NLM Multiple Database Access Project was demonstrated recently at the American College of Radiology conference at the Lister Hill Center. The Multiple Database Access Project is part of a larger CNRI project called Digital Library Systems and applies the Knowbot technology of the DLS to the Multiple Database Access effort.

Initial project goals nearly complete

At the outset of the project, a number of goals were defined, which included: 1) providing parallel access to NLM's multiple databases, 2) extending the NLM's form-based user interface for Macs and PCs (Grateful Med) to UNIX workstations, 3) supporting non-text information retrieval, and 4) supporting Internet access to the Medline databases. Three of the four are nearly complete. The fourth, supporting non-text information retrieval, is currently under investigation.

The heart of the project

At the heart of the project is the "Knowbot(TM)" (KNOWledge roBOT), an active, intelligent program which acts on behalf of the user to carry out a search and retrieval task. A Knowbot exchanges messages with other Knowbots and moves from one system to another to carry out the user's wishes. When the Knowbot sets out on an assignment, several processes occur:

- The user interface, called the "user agent", contains functions such as query forms and login menus for the various databases available. The user formulates a query on the user agent and presses the "send" button.
- The user agent places the query inside a Knowbot and encapsulates it with the appropriate "travel instructions" for traversing the Internet.
- The Knowbot is then transmitted across the Internet to the "database server", where it is received and verified. The database server contains software which expedites access to the databases.
- Next the database server runs a series of small programs to process the Knowbot: a) the Knowbot's generic syntax is translated into the appropriate syntax for the database being queried; b) the query is sent, which is equivalent to dialing in to the appropriate database; c) when a response is received, it is translated back into the generic syntax of the Knowbot, and the beginning and end of each record are marked.
- A Knowbot transports the retrieved records back to the user agent, where yet another small Knowbot reformats the response into a friendly syntax, which is then displayed to the user.

Figure 1 provides a graphic representation of this process.

Additional Features

An additional option to forms-based access is to open a "transparent" window to ELHILL, TOXNET, and to two Johns Hopkins Welch Library databases: On-line Mendelian Inheritance in Man (OMIM) and the Genome DataBase (GDB). This is, in effect, a telnet screen. The window offers direct access to the interactive interfaces of the standard ELHILL, TOXNET, OMIM and GDB systems. (See Figure 1.) It is possible to have multiple Knowbot queries running while simultaneously doing manual interactive searches in this transparent window. In addition, since the user agent stores an encrypted form of the login for each database, the user only needs to provide login information once for each database accessed.

Flexible design

The general design of the CNRI system is very flexible, with the user agent and database server separable across the Internet. In the present experimental implementation, the user agent typically runs on a SUN 4/110 workstation at CNRI, the database server on a SUN 3/160 at the National Library of Medicine, the two NLM database systems on Telenet (but ELHILL is soon to be accessible on the Internet), and the OMIM and GDB systems via the Internet. Demonstrations using Network Computing Devices X display stations as well as the SUN 4/110 workstations have been conducted for NLM and for NSF.

Looking to the future - short term

In the next few months Knowbots will be written to perform multiple searches from a single request. For example, the user will complete one Knowbot search form and the single Knowbot will locate and access multiple Medline databases until it finds the information requested. The current Knowbot-based system will be extended to support queries to databases other than ELHILL and TOXNET. The database server will be enhanced to support queries which are not database specific by making use of information about the contents of the various MEDLARS databases.

For the long term

Knowbots are general tools for implementing complex, distributed computations, processes and services. Researchers at CNRI and elsewhere are exploring applications of Knowbots as part of a more general examination of a national information infrastructure. Looking further into the future, two of many possibilities are:

- Resident Knowbot. A Knowbot is instructed to remain resident at a gateway and to query a given database at the time when new citations are posted. The Knowbot is programmed to search for topics of interest to the user; when appropriate citations are located, a message is sent to the user listing the citations and their locations.
- Image Processing.
If a user's personal workstation does not have enough power to process needed calculations quickly, a Knowbot is written to carry the data to a supercomputer, where the calculations are completed, and to return the results to the user.

Additional CNRI Projects

The Corporation for National Research Initiatives is involved in a number of networking research projects in the areas of High Speed Digital Networking ("Gigabits"), Digital Library Systems, Inter-Organizational Messaging, and Internet Research. For more information about Knowbots or other CNRI projects, contact:

Corporation for National Research Initiatives
1895 Preston White Drive, Suite 100
Reston, VA 22091
703/620-8990

- Susan Calcari, Merit/NSFNET