From: Denise A Troll
[Originally created 5/30/97. Amended 7/29/97: Correction from Mary Jackson, US report #15]

ZIG MEETING - DAY ONE
April 7, 1997
Library of Congress, Washington DC

1. ELECTION OF MINUTES-TAKER

Les Wibberley will take minutes until Denise Troll arrives; then Troll will take minutes.

1.5. MODIFICATION OF AGENDA

* General discussion of Z39.50 version 4 requested as part of the discussion of pruning.
* Include 4.65, a general discussion of Explain topics, as a separate item following 4.6, the discussion of Explain syntax.
* Merge 4.7 and 4.8 (ONE API and implementation of Explain).
* Cover MIME issues under 6.8.

2. INTRODUCTION AND STATUS REPORTS

Based on the ZIG Attendance List posted by Ray Denenberg and notes taken by Les Wibberley.

Overview:
 2 Australia
 1 Belgium
 6 Canada
 1 Denmark
 1 France
 1 Luxembourg
 1 Norway
 2 Sweden
 4 United Kingdom
47 United States
___________________
66 TOTAL ATTENDANCE AT APRIL 1997 ZIG

AUSTRALIA
(1) Sonya Finnigan - DSTC Pty Ltd.; Australia; sonya@dstc.edu.au
Major effort to define SQL extensions to Z39.50; the national museums of Australia are putting their collections online and moving toward Z39.50; plan to run a testbed of the SQL extensions to Z39.50. Zorba initiative: looking at information discovery and search using object-oriented techniques.
(2) Janifer Gatenby - Stowe Computing; Australia; 100625.1240@compuserve.com
V3 client and server; working with the National Library to define a union catalog profile including an update feature.

BELGIUM
(1) Jo Rademakers - Libis-Net; Belgium; johan.rademakers@libis.kuleuven.ac.be
Building a gateway to various systems.

CANADA
(1) Bill Monhemius - Sea Change Corporation; Canada; bill@seachange.com
BookWare Pro and 2000; implements Version 3 via YAZ; looking at Scan and Explain.
(2) Andrew Oates - Geac Canada Limited; Canada; a.oates@geac.com
Z39.50 is a central component in their environment; 3 clients in commercial release: PC client, Geopac, GeoWeb.
(3) Rolande St-Gelais - DRA Information Inc.; Canada; rolande@dra.com
(4) Julia Taminiau - Best-Seller Inc.; Canada; jtamini@bestseller.com
(5) Fay Turner - National Library of Canada; Canada; fay.turner@nlc-bnc.ca
Distributed union catalog project; 12 libraries starting up this summer; developing a client to access multiple systems.
(6) Joe Zeeman - CGI Group; Canada; em2516@cgi.ca
Markets Amicus, with Z39.50 at the front; testing an HTTP interface to the Z39.50 server in the next couple of weeks; Z39.50 gateway to remote sites.

DENMARK
(1) Adam Dickmeiss - Index Data; Denmark; adam@index.ping.dk

FRANCE
(1) Patrick Moreau - Bibliotheque Nationale de France - Catalogue Collectif de France; France; patrick.moreau@bnf.fr

LUXEMBOURG
(1) Gordon Pedersen - EFILA/European Commission; Luxembourg; gpedersen@ip.lu
Several European projects implementing Z39.50, expanding to ILL in the future.

NORWAY
(1) Liv Aasa Holm - BRODD - Oslo College; Norway; Liv.A.Holm@brodd.hioslo.no
ONE project: trial service in May; 9 systems going live, one of them a museum system; Item Order, Explain, Scan, and Sort are included; the work included re-indexing the databases.

SWEDEN
(1) Anders Janmyr - Bibliotekstjänst AB; Sweden; anders.janmyr@mail.btj.se
Developing a Java server/target implementation for Swedish libraries; integration into library systems in a year.
(2) Anders Samuelson - Bibliotekstjänst AB; Sweden; anders.samuelson@btj.se

UNITED KINGDOM
(1) Robert Bull - Crossnet Systems Ltd.; UK; bull@sil.com
(2) Ed Davidson - Fretwell-Downing Data Systems Ltd.; UK; edavidso@fdgroup.co.uk
Working on an Oracle-based library system.
(3) Denis Lynch - SilverPlatter Information Ltd.; UK; DenisL@SilverPlatter.com
Z39.50 server gateway to their DB server.
(4) Mike Wheatley - The British Library; UK; Mike.Wheatley@BL.UK
Working on MARC, character sets, and implementing Explain; several Z39.50 projects and an ILL implementation; project CHASE; public domain character set conversion software.

UNITED STATES OF AMERICA
(1) Kandi Arndt - Aurora Simulation, Inc.; USA; karndt@aurorasim.com
(2) Randy Arndt - Aurora Simulation, Inc.; USA; rarndt@aurorasim.com
DOD implementation using Oracle.
(3) Chris Buckley - Sabir Research; USA; chrisb@sabir.com
TIPSTER project for US defense agencies; adding the Z39.50 protocol to TIPSTER, using the type-102 query.
(4) Eliot Christian - U.S. Geological Survey; USA; echristi@usgs.gov
(5) Wayne Davison - The Research Libraries Group, Inc.; USA; bb.wed@rlg.org
(6) Ray Denenberg - Library of Congress; USA; ray@rden.loc.gov
(7) Larry Dixson - Library of Congress; USA; ldix@loc.gov
3 servers: a homegrown mainframe server; SiteSearch for foreign resource records; and a third server.
(8) Eric Ferrin - Penn State University; USA; egf@psu.edu
Web gateway to Z39.50.
(9) Rich Fuchs - Research Libraries Group; USA; rfb@lyra.rlg.org
(10) Jeff Graubart-Cervone - Ameritech Library Services; USA; graubart@als.ameritech.com
Adding Explain; auto-configuration with Lucent, SilverPlatter, and others by summer (Version 3 self-configuring project); 3 library management systems using Z39.50.
(11) Rebecca Guenther - Library of Congress; USA; rgue@loc.gov
(12) Ajay Gupte - NASA/Hughes - CEOS Engineering; USA; agupte@eos.hitc.com
(13) Mark Hinnebusch - FCLA; USA; mark@mark.fcla.ufl.edu
(14) Paul Huf - Gaylord Brothers; USA; huf@gaylord.com, huf@servtech.com
Z39.50 client and server.
(15) Mary Jackson - Association of Research Libraries; USA; mary@cni.org
NAILDD project, encouraging implementation of the ISO ILL Protocol.
(16) William Jordan - University of Washington Libraries; USA; bjordan@u.washington.edu
Moving Willow to a Java interface, getting close; combined catalog for schools, using the BRS search engine.
(17) Mark Kelly - Defense Intelligence Agency; USA; mark@markkelly.com
(18) Emily Koechlin - U.S. National Library of Medicine; USA; emily_koechlin@occshost.nlm.nih.gov
Version 2 server with access to Medline.
(19) Manette Lazear - MITRE; USA; manette@mitre.org
Research on attribute sets; consulting on Z39.50.
(20) Ralph LeVan - OCLC; USA; rrl@oclc.org
Several areas of Z39.50 application at OCLC. FirstSearch: lots of databases, with many Z39.50 clients accessing them. SiteSearch: commercial product (WebZ, Zserver, and search engine); the new release includes support for thesaurus and sort; a complete redesign is underway on a Java platform; working with GEAC to set standards for patron-level holds; doing a database update application. CIC work: WebZ packaged for the Big Ten schools to support a virtual union catalog, now in production. Cataloging: starting its own Z39.50 effort; working on a Z39.50 server to set holdings and do cataloging.
(21) David Loy - Knight-Ridder Information; USA; loy@krinfo.com
Z39.50 server used as an internal gateway with customers, including CARL.
(22) Clifford Lynch - University of California; USA; clifford.lynch@ucop.edu
Has the draft version of the Attribute Architecture working group report, which will be covered tomorrow.
(23) Mojmir Mazur - USPTO; USA; mazur@uspto.gov
Using the CGI Amicus product; in 14 days USPTO will be deploying 2000 seats of their DLL Z39.50 client, which has been adapted to USPTO needs; experimenting with an Amicus prototype server; provides access to several databases; 200-400 seats of a Web client to Z39.50; patent data on the mainframe.
(24) Randy Menakes - Ameritech Library Services; USA; rmenakes@als.ameritech.com
(25) Nassib Nassar - CNIDR; USA; nrn@cnidr.org
ISITE software is a public domain implementation of Z39.50.
(26) Mark Needleman - University of California - DLA; USA; mhn@dla.ucop.edu
(27) Ralph Orlik - Library of Congress; USA; orlik@mail.loc.gov
Version 2 client and server; upgrading the server to add all bibliographic and authority files; most server activity comes via the CNIDR gateway, which has been upgraded to a stateful gateway; plan to add 30 more use attributes, and proximity. Client: searches a SiteSearch system with records from other national libraries; working with OCLC on a cataloger service; software to merge retrieved records.
(28) Paul Over - NIST; USA; over@potomac.ncsl.nist.gov
(29) George Percivall - Hughes/NASA; USA; gperciva@eos.hitc.com
Geospatial searching applications.
(30) Mark Piekenbrock - Chemical Abstracts Service; USA; mpiekenbrock@cas.org
(31) Margaret St. Pierre - Blue Angel Technologies; USA; saint@bluangel.com
Java-based Z39.50 toolkit, multithreaded; integrating Z39.50 with full-text search engines; will be posting information; completed the LC server; up and running for interoperability soon.
(32) Cecilia M. Preston - Preston & Lynch; USA; cecilia@well.com
(33) Sara Randall - PALCI; USA; srandall@lehigh.edu
(34) Lou Reich - NASA/CSC; USA; louis.i.reich@gsfc.nasa.gov
(35) Doug Rendall - Ameritech Library Services; USA; drendall@als.ameritech.com
(36) Thorn Roby - CARL Corporation; USA; troby@carl.org
Version 3 server, Version 2 client; Web-to-Z39.50 gateway to the Dialog server.
(37) Rose Smith - University of Wisconsin - Madison; USA; rose.smith@doit.wisc.edu
Z39.50 server based on NLC; working on Scan. Client: moving toward an OCLC WebZ front-end.
(38) Stuart Soffer - Taliesin Software Research, Inc.; USA; Soffer@compuserve.com
(39) Pat Stevens - OCLC; USA; pat_stevens@oclc.org
(40) Lennie Stovel - Research Libraries Group; USA; bl.mds@rlg.org
Version 2 server; Web gateway based on ISITE; next effort is on the client end.
(41) Terry Sullivan - FCLA; USA; fcltps@nervm.nerdc.ufl.edu
(42) Kevin Thomas - Ovid Technologies, Inc.; USA; kevint@ovid.com
Version 3 Zserver and Zclient; a Web product that interfaces to the Ztarget; looking at doing a Java client that will talk Z39.50.
(43) Margery Tibbetts - University of California, Library Automation; USA; margery.tibbetts@ucop.edu
Moving toward a Version 3 implementation; Z39.50 to all internal databases; servers have built-in URLs for articles, so a full-text article can be retrieved via hyperlink; client going public next Monday.
(44) Xavier Trevisani - Geac Computers; USA; x.trevisani@geac.com
(45) Denise Troll - Carnegie Mellon University; USA; troll+@andrew.cmu.edu
Interoperability projects with SIRSI.
(46) Robert Waldstein - Lucent Technologies; USA; wald@lucent.com
Interoperating with UMI to exchange articles via title, ISSN, and date; using SICIs internally.
(47) Les Wibberley - CAS; USA; les.wibb@cas.org
Version 3 clients and servers; stateful Web/Z39.50 gateway.

3. ADMINISTRATIVE MATTERS

3.1. REGISTER OF IMPLEMENTORS (Denenberg)

The list of registered participants will be used as the ZIG attendance sheet; circle or add your name.
Review the Register of Implementors: initial your entry if it is current, or write an update. Your entry in the Register must be updated at least once a year.

3.2. GENERAL MAINTENANCE AGENCY BUSINESS (R. Denenberg)

We discussed what to put on the Web at the last ZIG meeting. The issue was whether to register implementors, products, hosts, databases, etc. We decided that RD should sponsor a list of hosts available for testing. A list of servers available for testing was put up last fall, but there are only about 15-20 entries so far. Is this a useful service? Should it be continued? Yes -- the information is useful and is being used. Please register any hosts that are available (for free) for testing new clients under development.

3.3. ATTRIBUTE WORKING GROUP UPDATE (C. Lynch)

Postponed until tomorrow.

Announcement: Z39.50/SQL meeting tonight hosted by Sonya Finnigan.

[break]

KNOWN SPEAKER KEY (included upon request):
BW = Bob Waldstein
CL = Cliff Lynch
DL = Denis Lynch
ED = Ed Davidson
FT = Fay Turner
JG = Janifer Gatenby
JGC = Jeff Graubart-Cervone
JZ = Joe Zeeman
KT = Kevin Thomas
LH = Liv Holm
LS = Lennie Stovel
LW = Les Wibberley
MH = Mark Hinnebusch
MN = Mark Needleman
MSP = Margaret St. Pierre
RD = Ray Denenberg
RF = Rich Fuchs
RL = Ralph LeVan
SF = Sonya Finnigan
SR = Sara Randall

4. VERSIONS 2 AND 3 ISSUES

4.1. PENDING CLARIFICATIONS AND INTERPRETATIONS (Denenberg)

(See handout; note that the pages are not in the right order.)

RD briefly described the process: items are first discussed on the ZIG list, then summarized and documented as "pending clarifications and interpretations." "Defect reports" are handled similarly. Sometimes the line between a defect report and a pending clarification is vague. The Maintenance Agency for Z39.50 is also going to become the Maintenance Agency for ISO 23950 (the ISO version of the standard). The Maintenance Agency appreciates feedback on pending clarifications and interpretations.

Do we need a new category for the "collective wisdom of the ZIG"? Maybe call it a FAQ, but that carries the implication of silly questions. How do we handle SGML? Is this an implementors agreement? A clarification? No consensus or decision. (SGML was discussed later at the ZIG.)

RD resumed: there is nothing earth-shattering or controversial in the pending clarifications and interpretations this time, but there are 14 pending items. He posts a synopsis and analysis to the ZIG list, then waits a month for objections. Clarifications and interpretations are ratified at ZIG meetings. Items discussed:

* "Explain database term list" needs some additional information and clarification.
* Clarified "Another piggyback question" by specifying that resources are limited at the server, but resource control is not in effect (LS). LW: define terms in the text or provide web links to definitions, e.g., partial-4, diagnostic 1005 or 1006.
* "Protocol error in version 2": many implementations ignore rather than shut down when they encounter a protocol error (though shutting down is proper).
* "Segmentation and version 3": the issue is whether you can legitimately do segmentation in version 2 (BW). You can do it (it will work), but is it proper? No: the requisite parameters are not negotiated in version 2. You must use version 3 if you want segmentation.

DECISION ==> With the exception of "HTML without GRS", the list of pending clarifications and interpretations was approved with minor editorial changes.

4.2. DEFECT REPORTS (Denenberg)

RD: this is the first time we have progressed defect reports. It is not always clear when something is a defect and when it just needs clarification. He described the procedure: when a defect report is approved, it represents a change to the standard approved by the ZIG. By default, the change will be incorporated into the next version of the standard. Simple edits do not require balloting for inclusion in the standard; substantial changes do require balloting for adoption by NISO. "Technical 1" is assigned if the proposed solution is likely to be adopted. "Technical 2" is assigned if the issue is controversial and not expected to be easily resolved. Items discussed:

* Defect report 96-05, the Explain icon: if people who are implementing Explain are having problems, then we need to re-open the issue of including icons with the brief element set name. Apparently no one implementing Explain actually has icons in their Explain database. DL proposed moving the icon from "brief" to "description" and getting rid of the brief-1 element set name. DECISION ==> RD will post DL's proposal for comments.
* Defect report 97-01, semantics of InternationalString (intended to clarify the standard, not change it): Why include ISO 2022, the rules for GeneralString (LH)? Concern that this will be too complex to implement (LW). Minor debate between LW and RD about whether to approve the proposed correction (language) or to keep the issue open. The ONE project does not yet have implementation experience in this area. MH: rather than revise this text, how about fixing SUTRS to allow a newline character? Some consensus... but (BW) if we put a newline in VisibleString, that violates the standard; it is OK to put a newline in InternationalString. SUTRS in version 2 is VisibleString. The proposed solution is for version 2 to be VisibleString plus the newline character. BW: what about other records, e.g., diagnostics? RL: there is not really a problem here; there are no clients or servers not working. MH: but the standard needs clarification. DECISION ==> RD: we are going to fix SUTRS rather than clarify the semantics of InternationalString. Archive the textual clarification as miscellaneous commentary or ZIG wit and wisdom.
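For orientation, a minimal sketch of the types at issue. This is a hedged reading, assuming the production names of the 1995 ASN.1 (simplified here); the version-2 line describes how implementations behave, not text quoted from the standard:

    -- Hedged sketch of the SUTRS/newline issue (not normative text):
    InternationalString ::= GeneralString    -- ISO 2022 rules; a newline is legal here
    SutrsRecord         ::= InternationalString
    -- In version 2, SUTRS is effectively:
    --   SutrsRecord ::= VisibleString       -- no newline allowed
    -- Proposed fix: treat version 2 as VisibleString plus the newline character.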
4.3. CHARACTER SET NEGOTIATION FOR RECORDS (Holm)

The proposal is to negotiate whether or not records are retrieved according to the negotiated character sets. MN: if we approve the proposal, we need additional diagnostics. MH suggested that the proposed negotiation be optional -- the default is false. CL raised the question of how this would interact with specific syntaxes; some records may have hard-wired character sets. This is a proposed alternative to developing E-SPEC with variants. LW: if this is negotiated in init, can it be overridden with E-SPEC with variants? This should be specified in the text. RD agreed to clarify the text.

There was some discussion of whether we need to negotiate language for records (MSP, RD, DL, JZ). CL: we need prose to clarify that this is a function of the server and the record syntax, and that understanding what this really does requires an agreement outside of the standard. This led to much discussion and disagreement. Even more discussion is required to fully understand the issue and craft the text and ASN.1. RL proposed making it a private negotiation record with its own OID, but the ONE project does not want that. RD: they could use both the public and the private negotiation records. DL: suggestion for the time being: go with a private OID.
Apparently no one is using character-set negotiation except the ONE project and others outside of the US -- hence the push for a public negotiation record. There was some disagreement over what the issue really is. How does the client request character set and language right up front? MH: it is far too fuzzy to do. Is it the same for museum records and archives? LW: what is really needed is clarifying text for the proposal; do not force a private OID. MSP and LW: some commercial search engines do require character set negotiation for records, not just the ONE project. ONE and MSP are willing to work with a private OID until sufficient implementation experience warrants making it a public OID. RL: we are going to be issuing clarifications for the rest of our lives.... RD: no we will not; it will not last that long -- maybe we need an expiration date. RL: an expiration date will not recall code. LW: we need to draft text and work on the semantics; he would like to see a public OID before August 1997. The ONE project people are willing to work on semantics over the next few days. RD: to clarify, you can publicize a private OID. ONE is intended to be a publicly accessible system; all software will eventually be public domain. The ONE project has 16 partners, merging 6 major catalogs in Europe; people accessing these databases will have access to 20 million records -- it should be a public OID. MH summarized: this is a very important negotiation format with general applicability. There are serious concerns about semantics. We will revisit the topic later this ZIG.

4.4. RECORD SYNTAX AND NESTED RECORDS (Graubart-Cervone)

(See "Proposals on Record Syntax and Schemas Discovered as Problems During the Implementation of Client Explain", Jeff Graubart-Cervone.)

JGC: the proposal is somewhat controversial. He does not expect the ZIG to approve everything; more discussion is needed on the list. Background: while implementing client Explain, JGC discovered that there is no way to Explain MARC records, etc. There is no record syntax for Z39.50. We have US MARC records, other MARC records, GRS-1, etc. Each has a record syntax distinguishable from the schema of elements that go into that syntax. He encountered many problems when trying to Explain MARC records, including trying to Explain local tags. We need some conventions for indicators, fixed fields, etc. JGC claims that we are not doing record syntaxes properly. He proposes for version 4:

(1) change the definition of record syntax
(2) change the definition of schema
(3) adopt ISO 2709 and ASN.1/BER record syntaxes and a new record syntax model (which calls for much further study of what constitutes a record syntax)
(4) nested record composition
(5) allow AccessInfo to be nested in ElementInfo [corollary to (4)]
(6) modify RecordTag (required to Explain MARC)

LH: ISO 2709 is a transfer syntax. MH: the dichotomy of transfer (record) syntax and abstract record (structure) syntax that we did away with is coming back. LW: the standard specifies the default transfer syntax when no transfer syntax is named. RD: the JGC proposal combines many issues. What is the key pragmatic issue? You cannot Explain US MARC. Can we solve that without changing the architecture of the standard? Do we need a single record syntax for MARC and then use schemas for the variants? Is it too late in the game to talk about doing that? DL: one issue is Explain; another issue is the terminology of schemas, etc. JGC disagrees.
DL says "well, you're wrong"; we have had this discussion many times and we all agreed to leave MARC record syntax implicit. JGC: yes, but in the context of Explain, it gains new importance -- what parser should I use to parse the data I get back? RL: why is not "US MARC" sufficient information about the record syntax to parse the record (acknowledging that some special fields would be problematic)? MH: there is some bickering about what records are really US MARC. JZ: that is a separate problem, just like the problem of requesting authority vs bibliographic record is a separate problem. We need to pick one of JGC's problems to solve.... JGC: we need to be able to Explain local fields in Z39.50 version 2 transfer syntax. Here is a concise problem statement: we cannot associate tags with elements. There is only one spot in Explain for record syntax OID. DL: the problem with MARC is that it is both an abstract syntax and a transfer syntax. Z39.50 v2 is the same. Technically we should register a new OID if we do non-standard stuff with MARC records, but many implementors are cheating. There are two possibilities with a MARC database: (1) you asked for MARC and I try to send you MARC ==> (somewhat) honest: it is MARC-like (2) you asked for MARC and I use Z39.2 as the transfer syntax ==> dishonest: it is local syntax LW: this relates to the TagSet discussion and proposal. Different people use the same MARC tags with different semantics. Schema may help determine the proper semantics, and there is a place in Explain for schema, but how do we specify the schema in the response? DL: there is no need to do this dynamically because Explain does it static-ly. MH: what we ought to do is to define a record syntax .... RD: no, we do not need a new record syntax; we need some new schema. Explain lets you specify schema for MARC record syntax. JGC: when you are Explaining your private transfer syntax, what OID do you use? See Appendix 5, page 92 of the standard. RD: we do not have any transfer syntaxes registered for Z39.50, but we could. AGREEMENT ==> the appropriate transfer syntax for MARC is ISO2709; the appropriate transfer syntax for BER is ISO8825. LH: the ONE project also uses line format (?) for transfer syntax that is not ISO. [Sorry, but I drifted and missed the point.] What should we do? RD: you can do an out-of-band agreement (cheat) or you have to negotiate in the presentation layer. JGC's other required issue: Explain does not let you explicitly specify a tag with a certain indicator value and certain semantic. By now, many of us are really confused -- even CL asked: "what problem are we trying to solve?" JGC will settle for something less than comprehensive, as long as they can talk about local tags and exceptions. He needs to craft text for the record tag. DECISION ==> We will resolve this on the ZIG list. For further study: OPAC record is required by some people (JGC). We need some way to agree on this. RD summarized the history of our need for schema and E-SPEC for OPAC record. MH sees no need or advantage to distinguishing different kinds of OPAC records; why not just use GRS? What if you want to specify a different flavor of MARC or if you want to restrict local holdings, etc.? Why not use GRS? CL: how big is this problem? It would be simple to make OPAC record default to US MARC if this is a small problem. LH: it would be nice to request tags by semantics and get the right thing in US MARC, CAN-MARC, DEN-MARC, etc. MH: but that is a different problem. 
JGC: how does E-SPEC work when there are nested records in GRS? If you had another record buried as an external, how do you request it in E-SPEC? DL: it is a variant. MH and RD want something other than a variant: if you want to package a MARC record within GRS, you must do it with an OID. The short answer: "there is no good way to do this" (DL). RD: you cannot yet request a particular kind of MARC, only MARC. We need to assign body part types or change the protocol. BW: this is related to the discussion on the ZIG list about whether there should be a MIME type for each MARC format. We do not have OIDs for body parts.

4.5. OPAC RECORD (Turner)

Postponed until tomorrow.

4.6. EXPLAIN SYNTAX (St. Pierre)

(See proposal handout.)

MSP: can we deliver Explain records in GRS-1 rather than using the Explain syntax, as an alternative to having to code the Explain syntax? We can do it by defining an Explain schema (as the proposal illustrates). RL has no problem with this if it is a method "in addition to" the Explain syntax, but not "an alternative to" the Explain syntax. What will servers have to support? JZ has the opposite view; he prefers "replacing" the Explain syntax rather than providing two alternative ways to deliver these records -- to avoid the risk of the proposal creating more problems than it solves. The PDUs are in ASN.1/BER; both client and server must support ASN.1, but both need not support GRS-1. RD: if we had had GRS-1 when we started talking about Explain, we would have done it that way; the same pertains to Extended Services. Two different approaches bother RD, but he does not think it creates more problems than it solves. MH: nothing today says that you cannot do this.

RD: the ZDSR discussion has not caught on or been implemented. Nonetheless, several commercial vendors participated. One of the requirements was to implement Explain, and to profile data with GRS-1 for document descriptors. After serious deliberation, no one wanted to implement the Explain syntax. MH: they wanted ZDSR servers and clients, but not to destabilize the standard Explain altogether. Anything that indicates that Explain is unstable is bad; we are having a bad enough time convincing management to implement Explain. DL's practical view of the situation: the NISO document is a nice standard, but many implementors do not attend ZIG meetings. If we spring some new thing on these people (for example, "ignore those 50 pages in the standard about Explain"), implementors of real servers are going to have to implement real Explain anyway. SF: implementors may choose to do one thing now, and another thing in the future. RL: agreeing to abandon old baggage is a hard thing to do. Is this proposal an alternative or a replacement? LW proposed accepting it as an alternative, but not deprecating the standard Explain. RL: but that means you have to implement it both ways; one way must be mandatory to avoid that. CL: be very careful about ZDSR because it has no constituency. The issue is that we do not have a lot of client support for Explain because there are not many servers. If you require two methods, then you virtually guarantee that there will be no Explain clients. This is a red herring. MH: why are implementors not being given permission from management to code Explain? On the server side, the work is all collecting and codifying data. DL: how much work is the Explain syntax really, and is it worth it? The mechanics of an Explain-based client are not the issue; the issue is collecting and codifying the data.
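Stepping back to the mechanics of MSP's proposal: a hypothetical fragment of an Explain record rendered as GRS-1 tagged elements under an assumed Explain schema. The tag names and values here are purely illustrative assumptions; no such schema has been approved:

    -- Hypothetical GRS-1 value sketch (abbreviated ASN.1 value notation):
    -- a few TargetInfo-like elements carried as string-tagged elements.
    {
      { tagType 3, tag "name",        content "Sample Z39.50 Server" },
      { tagType 3, tag "description", content "Explain data delivered via GRS-1" },
      { tagType 3, tag "recent-news", content "..." }
    }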
CL: if the idea is to replace ASN.1/BER with some other syntax to lower the barrier to Explain, it is not clear that GRS-1 will work. GRS-1 encoding is scary -- we would have to move all the text defining the tags over to GRS-1. Maybe we should consider an ASCII transfer syntax. LW: would we have to re-ballot the standard to do this? RD: not necessarily; perhaps we could handle it with a new definition or a defect report. An "alternative" could become a "replacement" over time. What does a "supplement" mean? RD: that we adopt it five years from now based on the subversive group who implemented it this way. MSP will not publicly state that she wants this as a replacement for the existing Explain ASN.1. MH: if people want to do this as an alternative, the ZIG should encourage them to do so and go away. DECISION ==> MH concludes that this proposal is a "supplement." DL: "IR-Explain-1" is a name reserved for databases that support Explain ASN.1/BER. A GRS-1 version is fine, but it cannot be called IR-Explain-1 unless an ASN.1 version is also available. JZ: is anyone prepared to finish the Explain schema begun in the proposal?

4.65. EXPLAIN LIST (Waldstein or Graubart-Cervone)

Makx Dekkers must give up the Explain list, and BW and JGC are looking for keepers of the list. Ameritech may be able to maintain it. MH thinks the ZIG list is not too busy, so this discussion could move there. JGC: people do not want to advertise bugs on the main ZIG list. MH: so Ameritech will maintain the Explain list. JGC can keep the list of Explain issues. RD can point to this list as a separate item on the Maintenance Agency web page. BW: or we could move the discussion to the real ZIG list, but keep bug reports in private mail. JGC agrees.

4.7. ONE IMPLEMENTATION OF EXPLAIN (Kusej)

The ONE project implemented Explain on a UNIX machine using a mini-SQL system. They wrote the data-entry system that sits in front of this. ONE publicized the schema as a public document from the project. There is also a second Explain system built in ONE; the British Library wrote its data-entry system (built with Microsoft Access). The data is represented in text files that can link to other files, icons, etc. The whole thing is loaded at start-up and can be searched. It supports named result sets, etc. It must be integrated with your software. Seven of the servers will be based on the latter approach, and two servers on the mini-SQL approach. They provided an RPN-to-SQL converter. MH: public domain Explain code could do a lot to jump-start Explain.

4.8. ONE API (Bull & Jorgensen)

The ONE API was inherited from the German Library Project. ONE added access control, prompt-1, resource control, trigger, the Explain record, sort, and character set negotiation. It has ILL request and status. The code is in the public domain now, available from the ONE and German Library FTP servers. It is primarily a UNIX system (supporting 6-7 flavors of UNIX). They are porting it to Windows NT; the port is going well, and the only work remaining is the TCP/IP daemon for the server. The origin port is done and incorporated into the ONE client, but it has not yet been tested sufficiently.

5. PROFILES

5.1. PROFILES AND PROFILING (Denenberg)

RD briefly described the process of approving profiles and explained that not everything has to go through the formal approval process. He thinks the Maintenance Agency web page on profiles is confusing. On the web page: CL suggested that we do not just tag profiles as "approved" but "approved by" somebody.
RL agrees and wants a current-status field added. DECISION ==> RD agrees to make these changes.

What about the whole class of profiles that are not yet approved? It is difficult to articulate where we are with each profile, e.g., the collections profile and the various companion profiles, the cataloging profile, ZDSR, etc. BW: some of the profiles are government and marketing profiles; others actually affect the behavior of the target and origin. The longer we can go without negotiating profiles -- where we can discover what we need (the profile in use) -- the better. RD: yes. There are two classes of profiles: procurement profiles and discovery profiles. He is concerned about the procurement profiles. Once a profile is approved, people get the impression that it has been implemented. So some profiles have not been approved, because we chose to do some implementation and testing before submitting them to the approval process. Once approved, it becomes more difficult to change a profile. A "stable" status for a profile means we are waiting for an implementation base. Do we maintain the OIW for the purpose of approving profiles?

RL: give me an example of a profile that needs approval. RD: the cataloging profile. MH: the ZIG is not a legitimate approving body. RL: the OIW can approve profiles. Simple example: RL would like to see some agreement about patron-side holds. There may be only half a dozen people in the world who care. He would expect the ZIG list to have a post about it, a list of participants in the discussion, and perhaps a list of implementors. RD: a "profile" is different from an "implementors agreement." The process for implementors agreements is working: implementors agreements are approved by the ZIG, and they are specific details about how to carry out a function. Profiles generally pertain to a particular application or type of information. RL: is the example of patron-side holds a potential profile or an implementors agreement? MH disagrees with RD: what it is depends on the context, not the content, and it may change with the political and economic context. RD is satisfied with the current process. Dealing with these procurement profiles: if there is a body that approves a profile (other than the OIW), the Maintenance Agency can maintain it; when there is no appropriate body other than the OIW to approve it, then we convene the OIW and rubber-stamp it. RL: this is theoretically acceptable. JZ: but bogus; the OIW is hypothetical, but the rubber-stamp is real. Why do we perpetuate a process that does nothing? DL: there is nothing to be done. RL: the ZIG has changed over time; now the ZIG is a legitimate body to address profiles (it was not in the past). CL: "You are building up to a world of problems here." The ZIG is not an organizational body that can approve things very well. It typically approves things through the Maintenance Agency, which has to be careful about vocabulary and assertions. Registering a profile for reference is different from scrutinized approval. We need to be clear up front about the level of review that each profile has had. RD: but where do you find approval for a cataloging profile or an item-order profile, etc.? Is it possible for the person who brings a profile to the ZIG to specify the body that should approve it? If the ZIG does not approve profiles, who validates the technical end of it? CL: "validation" is a poor choice of terms, but "reviewed" or "vetted" is acceptable.
MH: the ZIG will gladly give its "opinion", but it cannot "validate" or "approve." There is legal quicksand here: we cannot put the ZIG imprimatur on these profiles; if we do, we will have vendors suing us later because the profile was not validated.

ZIG MEETING - DAY TWO
April 8, 1997
Library of Congress, Washington DC

4.5. OPAC RECORD (Turner)

(Postponed from yesterday. See discussion paper.)

The paper discusses using Z39.50 to access distributed union catalogs, including access to local detailed holdings and circulation information -- which steps outside the MARC rules for bibliographic records, for example, by allowing multiple instances of detailed holdings. Some vendors objected and want to use the OPAC record instead of the MARC record. Vendors need to agree on which options to implement (for interoperability). For Z39.50 to be effectively applied, we need to agree on how to transfer holdings information. Many different numbers and tags are currently being used for holdings information.

JZ: the paper discusses several ways to transfer holdings, for example, using locally defined fields, multiple searches, a virtual Canadian union catalog, or the Z39.50 OPAC record. The OPAC record has the advantage of enabling transfer of circulation information too, which is not possible with the MARC bibliographic or holdings records. Nonetheless, there are drawbacks with the OPAC record (see page 3 of the discussion paper). For example, the HoldingsAndCircData data type does not contain the granularity of information available in the MARC holdings record; designed for OPAC display purposes, the OPAC record does not contain information needed for union catalog and interlibrary loan work.

The problem with having separate MARC holdings records is that users expect one record to be retrieved (the bibliographic record) and to do only one search. How do users request the holdings record without doing a second search? GEAC does the second search automatically. RL described the scenario: the user does an A&I search in OVID, then a known-item search in a union catalog to find the document. Disagreement! Performance issues: will people wait for summary holdings information? MH: there is no mechanism inside the OPAC record to do partial retrieval; the implication is doing segmentation on huge holdings records. RL: who is the consumer of this holdings information? Is it the end-user or the client? Is the data consistent enough for clients to handle it? FT: people will use this information. We are trying to replicate the same information that people get from a union catalog, but instead of having all the information in one system, we now have a union catalog that is distributed. MH: the choices are limited:
(1) do a follow-on search to retrieve holdings
(2) use the OPAC record with (a) segmentation or (b) disenfranchising large holdings
(3) increase client buffer size
(4) a smart client could restrict holdings information
Users will look at lots of titles before deciding that they want holdings. Why transfer all of that information when it is not necessary? SR: huge holdings are a red herring; users should be able to get summary information (location and call number) quickly, if not detailed holdings information. She is concerned that vendors have been talking about the virtual union catalog and holdings information for 5 years. For book holdings, a summary call number is adequate, but not for a serial.
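For reference in this discussion, a compressed sketch of the OPAC record structure, simplified from the published record syntax (field names abbreviated and tags omitted here; treat this as a reading aid, not normative ASN.1):

    OPACRecord ::= SEQUENCE {
      bibliographicRecord  EXTERNAL OPTIONAL,          -- e.g., a US MARC record
      holdingsData         SEQUENCE OF HoldingsRecord OPTIONAL }
    HoldingsRecord ::= CHOICE {
      marcHoldingsRecord   EXTERNAL,                   -- a MARC holdings record
      holdingsAndCirc      HoldingsAndCircData }       -- display-oriented; its
                           -- circulation data carries flags such as a simple
                           -- availableNow BOOLEAN, the point raised below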
BW is concerned about the OPAC record and the performance hit of a second search for holdings unless the user has shown interest in the holdings. DL: but really short summaries are useful in the initial search; most users are interested in knowing if a copy is available. MH: have you thought about doing this in GRS-1? Yes, but it does not fit well with the logic of Z39.50 (??). MH: talking about what percentage of users need detailed holdings is a waste of time. RL: but why implement something that only one person on the planet wants? LS: summary holdings are often automatically generated from detailed holdings, not stored separately. We need to figure out what to do. MH: if GRS had been around when we did the OPAC record, we would not have done the OPAC record; we should recast this in GRS-1. LS: there is no standard that has all the fields we need. MH, RD, and LW agree that the GRS solution settles structure issues, but not content issues. The ZIG is not a content expert; we can make our best shot and give it to MARBI to fix and standardize. GRS is our easiest path for future revisions. JGC: we do not have a mechanism in GRS-1 or E-SPEC to do this. JZ: I am confused. Ameritech, OVID, and perhaps others have implemented the OPAC record. MH: what will happen if we deprecate it? CL: it could be years before people can generate holdings in GRS-1. FT: it also says something about the stability of the standard. MH has not heard a clear and concise statement of the problem. Summary holdings can be solved by (??), but no solution has been proposed for detailed holdings. He does not want to put a search PDU inside the OPAC record to do a subsequent search for holdings. Can circulation information legitimately be put into the 877 MARC record field? The OPAC record has a simple Boolean for available or not. BW: the MARC record seems to have a schema, so why not fix it where it is inadequate? There are multiple versions of MARC.... (??) suggested using different element set names for summary holdings, detailed holdings, etc., instead of different (well-known) database names. This led to a discussion of whether this is nit-picking. RL: this is not a nit if you have multiple holdings databases. MH: is the issue here still transfer of holdings, or is it discovery? BW: maybe we need to develop an element specification for this purpose (maybe something other than E-SPEC-1). Element specifications indicate which elements to retrieve. CL predicts the failure of distributed union catalogs if we make implementation too tedious. This will need extensive profiling. MH: what if we use the OPAC record to retrieve the bibliographic record and summary holdings, but provide a different mechanism to retrieve detailed holdings? Summary holdings may currently be available in 15 different fields in the MARC record. Maybe it should be set to retrieve initially only the holdings from the site where the query originated. Sometimes A&I databases and catalog holdings databases are in different locations (local vs remote). DL: that is irrelevant; the problem is that you have two separate entities, and how do we know where the holdings database is and what it is called? BW: the question is who knows enough to do these things, the user or the client? He predicts that clients will get dumber (web), not smarter. Who should know where the holdings are, the client or the server? RL: it cannot be the server. MH: Explain can indicate where the holdings database is. FT: her prime objective is satisfying library staff doing known-item searches (e.g., ILL).
MH: yes, but we need to propose one solution for staff and end-users, for known-item and subject searches. (??) is doing this already: when users search the catalog, the client automatically pulls out the unique record number, then searches a circulation and holdings database on a separate server. Should a holdings cookie be sent with the bibliographic record? Can we do that with a MARC record? LS: yes, we could use the 001 field. MN: a two-level solution with double and triple searching -- one solution for the catalog world, and one solution for A&I databases -- is fine. JGC: how can we associate the catalog or A&I databases with a holdings database? How do we find out where the holdings are? Can we do this with Explain? No. LS: we should (continue to) allow bibliographic databases to include holdings information. But a union catalog has lots of holdings databases associated with it.... MH: there is no sign of closure on this. RD: but at least we are beginning to engineer solutions, which means that it is time to take it offline. Perhaps we should pursue this at lunch or dinner today. People agreed to continue the discussion at dinner. Some people want a documented plan to read and approve. Those interested in the topic will meet with FT at 5:00 today.

3.3. ATTRIBUTE WORKING GROUP UPDATE (C. Lynch)

(Postponed from yesterday. See handout.)

CL: the subgroup has been meeting for about a year. The problem definition paper was discussed at length at the Gainesville ZIG meeting; an interim report was presented at the Brussels ZIG. The current handout is the final product of the working group, though it will be revised for clarification. The group tried to sort out the framework that should govern the definition and use of future attributes for Z39.50. The most important thing the group recognized was that we need a framework for attributes that enumerates the types of attributes that can be defined by different groups. Attribute sets should be managed by area specialists, not the ZIG: we should have bibliographic experts, A&I experts, geo-spatial experts, etc. The framework would enable the different groups to work together and not have to reinvent the wheel.

When they started, the notion was that the group would define attribute set classes, a class being an enumeration of attribute types to which conformant attribute sets would adhere. They rapidly abandoned this idea because all current attribute sets are of one class. They concluded that, though new classes may surface, the framework defined here should provide the right functionality. What constitutes the current attribute class and its rules? One rule was not to use attribute repetition as a substitute for Booleans. The group took a strong position on data typing: ASN.1 gives us an extensible data-type system; we believe this should be used where appropriate to give specificity and precision to queries. The easiest way to recognize that you are within the attribute set class is to encounter an OID defined as being conformant to that class. The group did not define the interaction (behavior) between historic attribute sets (e.g., BIB-1) and attribute sets conformant to the new class. The group proposed the following attribute types (see the sketch after this list):

(1) "Use attributes" are similar to BIB-1 use attributes. They are intellectual access points to information resources, based on intellectual concepts. Repetition of use attributes is permitted in this architecture, which means that use attributes can be nested to provide context. Use attributes are numeric values.
The group defined a second thing, parallel to the use attribute but mutually exclusive with it: the database fieldname. A fieldname corresponds to the name field in a schema somewhere. This enables the recognition of both generic access points and specific structural elements of specific databases. Fieldnames can also be nested to give a path inside a fielded database. Fieldnames are based on real fields in a database. They are character-based; we may also want numerics here. You cannot combine fieldnames and use attributes in a query operand or nest the two types together.

(2) "Query management" attributes deal with the execution of queries and can appear (potentially rewritten) in queries that come back from the server, specifically: (a) term weight, (b) hit count, and (c) stop wording. Each of these is numeric and non-repeating. The group proposed normalizing with some pair of values.

(3) "Qualifying" attributes provide additional information about what is going on with an operand. For example: (a) a language attribute to specify the language that the term value is in. It is uncertain whether this should be character-based or numeric; we could adopt the three-character language codes currently in use. The language attribute is non-repeatable. (b) a content authority attribute to define the source of a term derived from a thesaurus, etc. It should probably be non-repeatable, but there is some marginal argument to make it repeatable (e.g., a term that is authoritative in multiple languages). (c) Expansion and interpretation, a broad catch-all attribute for specifying part of speech, stemming, case sensitivity. Open question of numeric or character-based; we probably need both.

(4) "Comparison operator" attributes are somewhat similar to relation attributes in BIB-1. The group deliberately chose not to call them relation attributes to avoid confusion with BIB-1. (BIB-1 sometimes uses equality when you are really doing containment or pattern matching.) The group recommends that comparison operators (a longer list than the relation operators in BIB-1) be strongly typed (numeric, string, or other ASN.1 data types), and include the notion of matching, e.g., grep, regular expressions. Truncation should be folded into the match operators. This was a very interesting discussion. There are different kinds of truncation: general-purpose lexical truncation; language-based matching (match this phrase against some kind of text); truncating all the words in a phrase and matching them. The group tried to eliminate ambiguity here by proposing one set of comparison operators to deal with lexical strings, and another set to deal with language (normalizing punctuation, identifying word boundaries, extracting key terms from hyphenated terms, etc.).

(5) "Format or structure" attributes enable tagging different information about character-string term values where the client might want to indicate what normalization, etc., has or has not been done. There are important boundary conditions here. For example, the group could not agree on what to do with personal names (complicated normalization and matching algorithms). Do people want a heavy-weight ASN.1 structure for the canonical form of names, or do you want to handle them as strings? Format and structure attributes are useful for handling them as strings. These attributes also include indirection, for example, a URL or URN pointing to the term value -- which may need to be broken out as its own attribute type.
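A hypothetical sketch of how a nested fieldname path plus a qualifying attribute might look in a single operand. The attribute-type names and the complex (string-list) attribute values are illustrative assumptions, not assigned definitions from the report:

    -- Hedged value sketch (abbreviated ASN.1 value notation):
    -- "match the term in the city field nested within the address field,
    --  treating it as English-language text."
    attributes {
      { attributeType field-name, attributeValue complex { "address", "city" } },
      { attributeType language,   attributeValue complex { "eng" } }
    }
    term "Washington"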
We need non-unary operators. Use External in Term to specify the data type. When we do version 4, we probably want to replace Term with SequenceOf to support non-unary operators. Version 4 should also provide attributes for operators.

Summary of where we should go next: First, avoid political debates by naming attribute sets in a way that does not imply legitimacy or precedence. CL proposed that the ZIG define at least two attribute sets: a starter set, and one other. The next step is to assign groups to complete the starter material. Then decide whether and when to bring various applications forward under this new architecture. What do we do with STAS? Perhaps a better place to start would be a successor to BIB-1. There is broad consensus that BIB-1 has problems, but someone other than the ZIG should work on it. Maybe we need to address A&I databases and catalogs separately. It will probably be necessary to define additional attributes beyond the starter set that (CL proposed) the ZIG develops. All attributes currently in BIB-1 need to be brought forward in the new architecture. This was hard work, but by the end of it, the group had reached general agreement.

Comments from other working group members: LW: the group also considered the findings and thoughts of the query type 102 (relevance ranking) and STAS groups.

RD: how do we interpret a query when its attributes come from different sets? DECISION ==> The conclusion agreed on was to alter the semantics so that the top-level OID in the query takes precedence over the subordinate OID.

Discussion ensued. BW: thanks, this looks good; but he is concerned about having both use attributes and names, and knowing which is the preferred thing to send. RF: it is not clear how to distinguish an author use attribute (a field-100 query) from an author query. CL: there is lots of ambiguity here. RL: this is a great opportunity for profiles, and a richer language to express profiling in (because it allows strings, etc.). RD: two mutually exclusive attribute types (use and fieldname) in a single operand -- is the value of the fieldname the same as that included in a retrieval request? Do we assume use of an integer? MN: are you implying that all developers have to use the specific attributes proposed? CL: yes, in terms of using the attribute types, but no, there is no need to restrict implementation to those attributes specifically defined in the report (handout). LW: we need to indicate the schema in a version 4 search request. Philosophically people agreed, but we need a mechanism for this. MH: when we looked at the way lists of attributes were used, lots of people were using them to avoid Booleans. People also used them to specify the order of goodness of the attributes. Does nesting prevent this? Can we structure the nest and somehow indicate at each level the ordered list of alternatives? RD: yes (version 3). CL: mechanisms for specifying alternatives should be in the carrier syntax for the query, not the query itself. CL and MH agree to put it in the ASN.1. MH: is it illegal to mix attributes from multiple classes? CL: not illegal, but he is not sure what results you will get. Mixing attributes from multiple classes in an operand will create problems; the group did not discuss this. It is probably possible to specify precedence in thesauri. Aligning the CIP and GEO attribute sets, or making the profiles compatible -- what problems do you predict? CL: the group's work may be helpful as a taxonomy for where to fit things. The CIP and GEO profiles must define many data types for spatial things (currently handled as strings, not numeric coordinates). Eliot (??):
the group's work will be useful with GILS, but as you nest, what happens to data typing? Can the semantics get broader? CL: the last data type (the bottom-level thing) is what you are really searching. There is no inherent semantic or inheritance in the nest, only field containment. ??: what is the context for containment of the entire chain of use attributes? CL: the basic issue in nested chains of fields is whether the chain is anchored or unanchored. Can you limit it to starting there? Do use attributes have to be handled differently in the two situations? CL is uncertain. RF: this work is helpful, but there is not enough progress toward normalizing terms, e.g., personal names. We are not sending canonical forms of names in the query; the same is true of truncation notation. CL: the root of the problem with personal names is that there is no consensus on what the canonical form of a name is and what its components are. If we could identify the components of a name, you could write an ASN.1 structure to represent that and map various text name forms into it. RF: but if you had to form a consensus about it, you could at least make the operations explicit. RD: we are not going to develop that ASN.1 notation here today, but it would be good to know if such notation would be helpful. Is the approach of defining ASN.1 structures for names worthwhile? LW: you will still run into different perceptions about last names, especially internationally. RL is interested in explicitly normalized name terms, but clients do not get sufficient information from users to actually make use of a name ASN.1 structure. Area experts should define canonical structures for specific kinds of information. This work can be done now; no need to wait for version 4. Some discussion and confusion about what the group's work means for the way we have been doing things. [I got lost here.] MH does not like the idea of brand new, never-before-seen ASN.1, even as an External. LW: you can assign an OID to the ASN.1 External. What happens if you get ASN.1 that you have never seen before? It "should" be ignored, but is it? This is perfectly legal. Agreed. But (MH) we never did this so blatantly. JGC: this will be a great burden on existing clients; users enter a query that the client must now structure and normalize, though the server is better suited to this. CL: as a client, you can either throw strings at servers to do their best, or you can be very explicit at the client, parse the query, and stuff it into a structure that explicitly tells the server what to do.

[Lunch break]

NEXT MEETINGS

August 20-22, Copenhagen, Denmark, with a tutorial on August 19 (LW, RL, MH). January 21-23 in Orlando, FL. (ALA is January 9-15, New Orleans.) Then late May or early June in Washington DC. (RD must have every third meeting in DC.)

6.3. TAGSET PROPOSAL (Denenberg)

(See handout "TagSet Proposal.")

RD: there has been lots of discussion on the ZIG list proposing new elements for TagSet G. The proposal addresses outstanding issues that were raised. There is a certain neutrality in TagSet G and TagSet M. When we came up with the idea of schemas for the 1995 standard, the TagSet was a supporting concept. Tag paths are composed of elements. When defining a schema and you need an element to put into a particular tag path, you can use (import) tag types and tag sets already defined, or create and define your own. Reasons for having distinguished tag sets: if more than one schema needs a particular element, then that element is a candidate for TagSet G.
Advantages: (1) an easy way for a schema to reference the OID of a TagSet, and (2) useful when you do not want to reference some other profile. MH: TagSet M is tightly coupled to Z39.50 because it is data about retrieval, but he is not sure about TagSet G. RD: it was not our intention to put discipline-specific tags in TagSet G, which is meant to be generally useful meta-data. TagSet G is retrieval-level (resource) meta-data; TagSet M is transport-level (record) meta-data. MH: how does this work? If I can map 5 of my 300 data elements into TagSet G, what does this mean? RL: it means that everybody will recognize (at least) those 5 tags.

[Side issue: the mapping of tagTypes to tagSets. RD: what if I have a schema that is to reference both Collections and GILS information -- how do I do it? There are only three reserved tagType numbers: 1 = TagSet-M, 2 = TagSet-G, 3 = locally defined. Every tagType above 3 is mapped explicitly by a schema to a tagSet; so, for example, the GILS schema maps 4 to the GILS tagSet and the Collections schema maps 4 to the Collections tagSet. So although this does not cause any ambiguity, it does add complexity to an implementation. A sketch follows.]
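A hypothetical illustration of that mapping, using GRS-1-style (tagType, tag) pairs; the tag values are made up for the example:

    -- tagType 1 = TagSet-M, 2 = TagSet-G, 3 = local (reserved by the standard);
    -- tagType 4 and above take their meaning from the schema in force.
    { tagType 2, tag 1 }     -- a TagSet-G element: recognized everywhere
    { tagType 4, tag 52 }    -- under the GILS schema: an element of the GILS
                             -- tagSet; under the Collections schema the same
                             -- pair names an element of the Collections tagSet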
MH is worried about us agreeing not to have semantics for the tags in the generic set (G). We could also have solved this problem by referencing other profiles (and their TagSets) rather than creating a new generic TagSet. MH is not sure we should be doing this at all. RD and MH argue. What is the Dublin Core group doing that may be beneficial to us or to TagSet G? Nothing.... BW got lost in their argument.

Is TagSet G for when you do not have a schema or a profile? Is it like a mini BIB-1? MH: can we put this outside of the standard? Are we re-inventing BIB-1? RD: we already have TagSet G, so why are you raising this point now? LW tries to clarify with practical applications. RL: there is a large community that says there are common semantics for tags that cross disciplines (the work of the Dublin Core). CL: there is no problem using TagSets defined by other groups. MH: yes, but should we make them part of the standard? He thinks not. When you are trying to do general information retrieval, without a specifically defined context, then use generic TagSet G.

CL: the Dublin Core people have spent lots of time struggling with scope, terms, types of objects, etc. They have also tried to understand primary applicability. RD's TagSet proposal is a "grocery list without a scope statement." TagSet G came out of GILS, which provided the scope.

MH: TagSet G is currently part of the standard; we can add the proposed stuff if we want and take it all out in version 4. The standard has very good ways of referencing other contexts, e.g., schemas. TagSet G is defined in the standard as implicit in every schema. RD: we can talk about taking TagSet G out of the standard for version 4. (MH does not object to the concept of certain elements, or even the proposed elements, being implicit in all schemas. What he objects to is including this list of elements as part of the standard.)

RD: we do not have an architecture that guides both retrieval and searching. Our retrieval architecture is currently better than our searching architecture. MH: we do not have a body of experience to really evaluate either. He advocates a certain set of tags in TagSet M that deal with retrieval, and not including TagSet G in the standard.

Today's issue is not whether something is included in the actual text of the standard, but whether or not a universal set of tags is acceptable. MH: no, today's discussion is about whether to double the size of TagSet G. CL: why add these particular elements to TagSet G? What is the scope of this TagSet? RL agreed that the proposal lacks a scope statement, and proposed that OCLC put some "rigor and rationale" to TagSet G. RD warned about the impact on the existing standard. RL: if it is not rational now, and we want it to be rational after OCLC works on it, something has got to give. LS: the existing TagSet has no semantics (in the standard); it is just a number space. RL: we could officially abandon that number. OCLC will help us "again." See page 141 (section 2.2.2) of the NISO standard for the pertinent text.

CL: the Dublin Core has a substantial following in the IR community. We will have to do a mapping from Dublin Core to Z39.50. MH: maybe we can replace TagSet G with the Dublin Core.

RL will prepare: (a) a rationale for the existence of TagSet G (what we are trying to accomplish), (b) a scope statement (what we might add in the future), and (c) some determination of whether there are things in TagSet G already that are out of scope. FT: we should do the same thing with GILS. RL recognizes the problem and will try to address it in what he prepares.

5. PROFILES / UPDATES

5.1. CIP UPDATE (Percivall)

(See handout.)

Status of the CEOS (Committee on Earth Observation Satellites) Catalogue Interoperability Protocol (CIP), which aims to enhance catalog interoperability and ordering through standardization. Catalog services include directory, inventory, browse and guide. The CEOS protocol task team created a user requirements document for an Interoperable Catalog System (ICS), a system design document, and a protocol specification.

The system has three levels: clients, servers, and middleware that routes messages to users. Collections are defined from the user's point of view. Users can create collections on the fly. The data are really organized underneath according to the satellite that gathered and communicated them. Four terms are used to describe how to retrieve information from the four data structures:
(1) discovery (of arbitrary collections)
(2) navigation (from one collection to another)
(3) searching (based on collection attributes)
(4) locating (finding a collection based on URL or URN)

Users can search locally or do a distributed search. They can search collections or products. CIP attributes were developed from multiple profiles and expanded. They have collection-specific attributes. The semantics of local attributes are handled using Explain.

The new release provides secure ordering using a task package in Z39.50 Extended Services. Item Order was inadequate for their purposes. They provide four operations: order validate and quote, submit order, monitor order status, and cancel order. Security is provided by authenticating the user and doing non-repudiation for orders. Users belong to groups, and groups have privileges. An agency retrieval manager can act as proxy for a group.

In early 1998 they will enable services for search and retrieval of geo-spatial and climatic research data. A "lessons learned" document should be available soon. The ZIG expressed interest in CIP experience with Explain, etc.

5.2. UNION CATALOG PROFILE (Gatenby & Pearce)

This profile was first presented at the Brussels ZIG (October 1996). The purpose of the profile is to update the local catalog and simultaneously send the update to the union catalog. (US MARC is becoming standard in Australia.)
The profile suggests three revisions to Extended Service Update:

(1) Record locking -- The profile currently does not assume record locking, but uses a date-time stamp or magic cookie to verify the version of the record. The ZIG requested record locking in Brussels. Three alternative solutions were discussed:
(a) use the method GEAC uses (lock the task package)
(b) create a new extended service called "lock" and negotiate it in the init
(c) add new actions within Extended Services Update to lock and unlock records

GEAC: the problem is when to lock the records. When the user starts working, the record is locked to say they have the most recent one; then the client (not the server) releases the lock. When the user wants to update the database, they lock the record again, get a new copy, and check for differences. Update does not work if the record is not locked. OCLC lets records be locked for a long time (e.g., a week) if that is how long it takes the librarian to do the work. RL wants the standard to support both what OCLC does and what GEAC does (implementor's choice). The standard does allow choice.

RD has a problem with a task package called "lock." This is not a good general solution for Extended Services. What if you want to lock multiple records, or retrieve the task package itself? The Australian ZIG folks prefer new actions or no locking at all. The granularity of the time-and-date stamp is not fine enough for current purposes.

LW, DL and RL think creating new actions for lock and unlock is a good idea. It is clean and satisfies the current need. MH: is this a profile or a general ZIG solution? A general solution would require more work. LW: this is a good solution for MARC records, but not for other record formats. It would be nice if we could generalize this for GRS or SUTRS records, etc.

JG was not sure who did Update.... RL: how does authentication fit into this? If authentication is required, how can there be a question about who did the Update? The owner of the lock. A new action field for lock and unlock is an appropriate solution to the problem. LW suggested a record-independent way to provide a time-date stamp as a fail-safe.

JG explained the flow: the user retrieves a record, then submits a request to lock it (the request contains the unique record identifier and a time-date stamp); the server then confirms that the record is locked, or sends a new, locked version of the record. DL and RD agree: the server should be able to do this. RL thinks this is an unnecessary burden to put on the server. An alternative would be to return the record with a lock response. RD: you need to set this up as a wait situation and return a task package. What happens if the lock succeeds, but the server cannot send the record? We need diagnostics. Servers not willing to return records in the task package should return diagnostics.

DECISION ==> DL, LW, RL, RD and JG will work together to figure out how they would use this and help flesh out the profile. The profile does not require locking.

Disagreement about whether or not there should be a way to specify a "wait" period, and whether to distinguish between the (suggested) retention time of the lock and the retention time of the task package. The task package must stay around for status information or for statistical purposes. MH: should we talk, not about retention time, but "expiry"? RD: NISO would not let me use that term in the standard.

DECISION ==> Consensus reached: add optional actions for locking and unlocking to Extended Services Update (not a change to the ASN.1). Also add a parameter for specifying the time of the lock (this is a change to the ASN.1). RD is not sure when we will see these changes.
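A sketch of how the agreed additions might look in the Update extended service. The new action values and the parameter name are hypothetical; the existing actions are paraphrased from the current Update definition.

    action [1] IMPLICIT INTEGER {
        recordInsert   (1),
        recordReplace  (2),
        recordDelete   (3),
        elementUpdate  (4),
        specialUpdate  (5),
        recordLock     (6),   -- proposed: lock the identified record
        recordUnlock   (7)    -- proposed: release the lock
    }
    -- plus, per the decision above, something like:
    lockRetentionTime [9] IMPLICIT INTEGER OPTIONAL  -- seconds; tag number invented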
(??): what is the scope of the lock? Only the record, or the record and its attachments? Do we need a way to specify that?

(2) Special instruction element (Update request) -- It is not good to put this kind of information (e.g., merge one record with another in an Update request) in the OtherInformation parameter, because that parameter can be legitimately ignored. JG proposed a new element called "special instruction" -- which cannot be ignored -- to handle things like merge.

Discussion / digression: We discussed this in the past, but we may need a "merge" action that is different from delete or replace. Who or what knows how to merge records? JG and MH say the server knows. (??) says no, based on two years' experience: the only one with the intelligence to merge the records is the human being using the client. We spent some time clarifying what JG means by "merge." Perhaps a better name would be "preserve links" or "move links" -- evidently we want to (a) replace the mother record but keep all links and children, grandchildren, etc., and (b) merge two mother records and keep all links and children, grandchildren, etc., of both mothers.

DECISION ==> LS gets us back on track: give JG the parameter she needs and asked for (special instruction). Agreed.

(3) Target response -- Inclusion of both a database record and a diagnostic message in the task package: JG wants both. Confronted with a real application rather than theoretical talk, RD does not have a problem with this. The changes will not break GEAC, but they do raise issues of interoperability.

ZIG MEETING - DAY THREE
April 9, 1997
Library of Congress, Washington DC

5.3. ONE PROFILE FOR BIB-1 (Holm)

(See handout "BIB-1 profile for ONE.")

LH: ONE began with just a limited set of use attributes, but that was not enough. They added truncation, then realized that structure attributes were also needed in some cases; structure attributes are currently not working properly. They chose a few legal combinations (see page 2 of the handout). A "phrase" is a sequence of one or more words where the sequence is important. A phrase does not have to start at the beginning of a field, though it may. They need to look at position in field -- and at user satisfaction, since a phrase does not necessarily begin the field. The ONE profile is an extension of the ATS-1 profile (Author, Title, Subject).

LS asked for some clarification of the vocabulary (e.g., any, year). RL: why is Scan not required? The phrases are more helpful if users can Scan. LH: Scan is required; the handout just omitted it. Users can search for more than they can scan. RD: is this part of a more general profile for ONE? The handout addresses more than just BIB-1, e.g., preferred message size. LH: yes. Questions to clarify target requirements (section 5.2 in the handout).
5.4. MODELS Z39.50 LIBRARY INTEROPERABILITY PROFILE (Davidson)

Acronyms: UKOLN = UK Office for Library & Information Networking; MODELS = Moving to Distributed Environments for Library Services.

The MODELS program is a series of two-day workshops to address problems with distributed library systems:
(1) document discovery and request
(2) meta-data for network information objects
(3) organizing access to printed scholarly resources (including "clumps" -- see below)
(4) integrating access to resources across multiple domains (prototypes available)
(5) managing access to a distributed library resource

They may be able to "clump" several databases together as one, if only temporarily. Currently clumps are theoretical, but there is funding to do research on them.

The milieu: large-scale resource discovery across the UK higher education sector. Z39.50 is the enabling technology. Interoperability depends on semantics -- MODELS / "clumping" requires an interoperability profile. They draw on other profiles, including ATS-1 and ONE. The MODELS profile incorporates the concept of an incrementally sophisticated "family" of profiles, which means that interoperability will get more complex over time:
* level 1 - assumes version 2, based on ATS-1 (one use attribute for each)
* level 2 - assumes version 3, richer search capability (classification, control number, date-range searching, etc.)
* level 3 - level 2 + profiled use of Scan
* level 4 - level 3 + profiled use of Explain

Levels 1 and 2 are currently well defined; levels 3 and 4 are not yet clearly defined. Conformant systems must support at least one of the following transfer syntaxes: US MARC, UK MARC, or SUTRS. A brief record must contain only Author, Title and Publication Date (which may be a date range in the case of a serial). A full record may contain as much as the server is willing to give. This is predicted to change with experience.

"Conformant" means that you support some of the things in the profile exactly as described in the profile; otherwise you fail the query. "Fully conformant" means that all attribute combinations in the profile are supported according to the semantic rules. There was some discussion of this vocabulary (RD, KT). KT recommends looking at the baseline requirements for version 3.

The actual profile is very large. It describes the semantics of attribute combinations. Appendices include BIB-1 attribute set descriptions with mappings of use attributes to both US MARC and UK MARC fields and subfields. Future developments: additional support for serials holdings within level 2, level 3 Scan profiling, and level 4 Explain profiling. The milieu may change to need a higher-level, cross-domain search profile (e.g., archive, library). The profile is available (Word 6.0) at http://www.ukoln.ac.uk/models/zprofile.doc

They may collect and harvest Explain records from other people's databases to facilitate discovery. They would use these to create transient clumps.

BW: my brief records are not conformant with yours; management requires location information -- he does not want to have to negotiate profiles. ED is also leery of the brief record as profiled and understands the desire not to have to negotiate profiles. MH: will this lead to the development of new software? Current clients may be configurable, but current servers are not (as configurable). ED: the UK uses mostly American software. SR: is anyone examining the political aspects of the technical issues that you are addressing?
ED: the background framework is to facilitate discovery by students, faculty, etc. SR: are you addressing issues or policies about lending, borrowing, etc.? ED: it is just beginning to dawn on folks what all of this means. There are currently only a half dozen Z39.50 servers ("toys") in the UK. There is fear that a Z39.50 server will mean that people can steal their records, and that everything will be on the web. The need to do local ILL in the UK (rather than automatically getting everything from the British Library) may drive Z39.50 implementations.

5.5. PTO EXPERIENCE

PTO is deploying 2000 seats of a Z39.50-based client (a CGI binary stack developed for the National Library of Canada), version 2, for patent examiners. They use the CAS messaging interface, which facilitates queries in version 3 (through the MSI messenger, to retrieve patents). Patents are structured documents, some up to 3000 pages long. Two million patents are tagged and stored behind the messenger; the database is growing at a rate of 200K per year. All of the patents are searchable and include clipped images (approximately half a page). They decided to use Z39.50 because they want to be independent of search engines, using the same client and a gateway with an account.

The client will be deployed electronically over a TCP/IP network. It is a 16-bit client developed for 386 and 486 machines. Besides this DLL client, they are going to deploy another client in April, based on Netscape and originally developed by CNIDR. This client will be available outside of the patent office network. Casual users (as opposed to professional examiners) can use the Netscape client to talk to a server that translates HTTP queries to Z39.50. Depository libraries will use both the web and DLL clients.

In June 1997, they are going to deploy a middle server called Amicus (CGI software developed for the NLC). Amicus is based on Z39.50 and SQL. Databases will include European Abstracts; Family Data (pointers to the same patent in the preferred language); a database of structured documents that guides examiners in doing classified searches; and a server for handling images. Amicus uses the Topic search engine. They will compare retrieval results from the CAS messenger and Amicus/Topic queries. They will also enable outside accounts (outside of the Patent Office), Patent Office access to databases outside of PTO, and a firewall.

BW: is the Patent Office selling access to patent databases? JZ: by policy, outsiders have no Z39.50 access to patents. For PTO to sell access would put PTO in competition with other vendors.

JZ: CGI provided the Z39.50 client API to the Patent Office (a Windows DLL based on an API used in the Amicus client). What was most interesting was discovering the huge learning curve of application developers on user interface issues. They spent substantially more time on support than expected. They discovered that the original API was a close approximation of the protocol messages, complexity, structure, etc., and built a new API on top of it to simplify things.

The big architectural decision made early on was not to return images using Z39.50. Records would contain pointers, but clients would use HTTP to retrieve the (Group IV fax) images. The image viewer is integrated with the searching interface. The production version currently supports Scan, SUTRS, and some other version 3 features. They modified the query parser to use the messenger query language rather than Z39.58 (the messenger syntax was preferred by patent examiners).
They decided not to build support for record syntaxes tightly integrated with the ASN.1, but to keep them separate so that they can be decoded. They changed the implementation to be even more configurable.

On the server side, they are in the process of providing the Amicus search server with the changes to date that match what the Patent Office needs. One set of data is provided on tape as SGML documents (DTD specified by the World Intellectual Property Office); an overlapping set of data comes on tape in ISO 2709 format, a fascinating MARC-like encoding with a different use of subfields. It is "intellectually refreshing" to see this different approach. The SGML data loaded into Amicus is only the front page of the patent, which is straightforward because it is highly structured.

One of the projects they did for the Patent Office was to change the full-text engine behind the Amicus server; they changed from Fulcrum to Topic (currently in testing). The product can now use either engine. Query management is all table-based. They have talked to Verity about Topic eventually supporting Z39.50 queries (rather than needing a translator or tables).

5.6. ZDSR UPDATE (Denenberg)

RD: we had a ZDSR (Simple Distributed Search and Ranked Retrieval) meeting at the Brussels ZIG. We are working on preliminary draft 5 now. Final draft 5 will probably be the last draft/version until we have implementation experience. We presume that there is no constituency for ZDSR, but we need to continue this work because the important work has already been done (!).

When we started ZDSR a year ago, we thought it would take 2-3 weeks to profile; it really took months. Had the profile been available during the window of opportunity when people needed it, it would probably have been implemented. We want to be ready if the window of opportunity appears again. Also, ZDSR has some features that just are not available in any other profile, e.g., ranked retrieval. Probably the most important reason to keep this profile alive is not to perpetuate current web (distributed) searching, but to provide distributed searching of real indexes, with better search results than the typical web-crawler. If we could get one or two renegade companies to implement ZDSR, this may change. Currently web vendors do not want to go to the expense; they are satisfied with current search results. RD: STARTS is a bad protocol for what it tries to do. ZSTARTS is somewhat better. We need to have this profile available for when the need arises (again).

MH: the approach of trying to get the browser manufacturers to do this is not the way to go. The better approach is to get the database providers to do it. LW: ZIG members could organize our clout to make a business case for vendors to do this. RL: OCLC is adding a module to do Z39.50 queries with Altavista. The point is that while Altavista is not doing the Z39.50, they do recognize the need for an API. MH: we used to think of Z39.50 clients sitting on the desktop, but it may have been better to put Z39.50 into the web-crawlers. BW does not want the performance hit of web-crawler searches. MH predicts that the web's static index will fail.

LW asked a question about the wording "any retrieval is outside the scope of the profile": can we encapsulate init, search, present and close? DECISION ==> RD: yes; he will add text that says this is consistent with the profile.

4.3. CHARACTER-SET AND LANGUAGE NEGOTIATION CLARIFICATION

(Postponed from earlier.)
The ONE project has proposed an additional field to request that the server return records in the character set negotiated in the init. There was a request for clarification that this can be overridden with E-SPEC-1. If a character set has been successfully negotiated, but for some reason the target cannot retrieve a record in that character set, then the target should return a diagnostic. LW wants the origin to be able to use E-SPEC-1 to get that problematic record (i.e., to override for this particular present). The clarification will be posted to the ZIG list, along with the diagnostic.

6. NEW WORK AND VERSION 4 ISSUES

6.1. Z39.50 / SQL SUMMARY (Finnigan)

The proposal is NOT to do SQL per se over Z39.50, but just querying and retrieval (no updates, etc.). SF proposed (a) a new SQL query type, (b) a new FieldName attribute type, (c) minor additions to Explain, and (d) SQL record and error syntaxes. The meeting Monday evening focused on core issues that must be agreed on before implementation details can be addressed.

Why do SQL within Z39.50? Suppose you want to find the names of authors and the number of books each has published on a particular topic in the last ten years: you can do one SQL search, or many Z39.50 queries, to get this information.

The type-SQL query conforms to query expressions in SQL3. The new ResultSetName _may_ conform to tableName in SQL3. There is a core of things that a result set can do, and another group of things the result set can do depending on the query type (e.g., type 1, type 101). These differences should be elaborated in Z39.50 version 4. No consensus was reached on how to use Z39.50's abstract notion of a database in SQL; this needs to be refined. SF proposed using the string FieldName (from CL's report) for mapping table fields to database fields. This was discussed on Monday, but more work needs to be done.

The SQL record syntax (SQL-RS) eliminates the need to tag each individual field of each record, is not tied to a particular schema, etc. It would support SQL3 (which is in draft stage) and input from other standards groups and commercial vendors. SF is talking to vendors of object-relational databases to make sure that the implementation can handle their data.

MSP: what sort of data types do you need that are not in GRS? CL told SF he didn't think GRS was adequate -- otherwise she would not have proposed a new record syntax. Without a new record syntax, you would have to tag all elements and negotiate elements (E-SPEC?). SQL-RS is template data (column names) followed by table data (rows that fill the columns) -- and all of it comes in one PDU. In contrast, GRS sends each row (so to speak) in a PDU: GRS sends one External per record; SQL-RS sends one External per result set.

MH: this needs to be worked out, but not at this ZIG. SF would like a discussion group at the next ZIG. LW: relevant to this discussion, do you expect the client at the Z39.50 origin to be typically SQL-smart -- or just smart enough to express an SQL query, with its main job on getting results being to display them rather than to do SQL work? SF: it could be both, which is why she proposed the SQL record syntax.
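To ground SF's motivating example: a single type-SQL query could carry the whole author/count question as one SQL3 query expression. The ASN.1 below is a sketch only -- the type name, tags, fields, and the SQL table and column names are invented; no query-type ASN.1 was agreed at the meeting.

    SQLQuery ::= SEQUENCE {
        queryExpression  [1] IMPLICIT InternationalString,
        -- e.g., "SELECT author, COUNT(*) FROM books
        --        WHERE subject = 'metadata' AND pub_year >= 1987
        --        GROUP BY author"
        resultSetName    [2] IMPLICIT InternationalString OPTIONAL
    }

Doing the same with type-1 queries would take many searches plus client-side counting, which is the comparison SF drew above.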
6.2. SGML VIA Z39.50 (Denenberg)

RD: the goal for this meeting is to get some sense of how people want to do simple-case SGML. If you are transferring structured data, or you have different elements or types within a given record, or complex SGML with external links, then you must use GRS -- which was developed for that purpose. But if you have plain-vanilla, simple SGML, people want to be able to transfer it without packaging it in GRS. Assuming that is correct, there are a couple of issues and alternatives.

One approach is to assign an OID for SGML. But why would we do that for SGML and not for the other MIME types? (In the past we did this for HTML as a special case.) Is it good or not good to assign an OID for SGML? The argument against assigning OIDs for SGML and HTML is that there are other bodies that register these body types -- why should we maintain a parallel register? Those in favor of assigning an OID argue that it is a better model than the MIME model, which is based on character strings. If we were to go with the OID approach, that would probably require a change to the variant-1 spec of GRS, because we will still want to transfer this material in GRS, but we do not want two ways to transfer SGML.

The other side of this: deprecate the HTML OID and just go with MIME types. The implication of this approach is that you cannot have a record syntax, so it would sit in the queue as a change for version 4. For now we could define a MIME record syntax with two fields, one for the type and one for the content. This approach would be essentially deprecated when we got to version 4 and made a protocol change (added ASN.1 for the MIME type).
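A minimal sketch of what that interim two-field syntax might look like; the type and field names are invented.

    MimeRecord ::= SEQUENCE {
        mediaType  [1] IMPLICIT InternationalString,  -- IANA media type, e.g., "text/sgml"
        content    [2] IMPLICIT OCTET STRING          -- the document itself
    }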
RD is not sure how you represent SGML using MIME. We invented a Z39.50 way to do it (3-4 years ago, when SGML had no MIME type, only HTML). Now there are two registered MIME types for SGML: text/sgml and application/sgml (to be interpreted by an application).

There was dissatisfaction with a new record syntax. MH: SGML and HTML are the only MIME types that have structure. RD wants to hear an argument about why we are having this discussion in the first place. Why is GRS inadequate for transferring SGML? MH: my server serves SGML but does not do GRS. BW: the web is doing SGML worse than it does other things; his perception is that there are two different ways to specify record syntaxes (OIDs and IANA/MIME types). He has a private OID for PDF, and does not want the wrappers and E-SPEC that GRS implies.

RD: why is GRS worse? If we define an IANA syntax with just body type and content... MH: to avoid a namespace issue you (RD) want to force people to do GRS? LW: but we want to avoid multiple ways to do the same thing. MH: but we already have a mechanism for recognizing.... The general response to MSP's Explain proposal was to do it only one way. MH: but this is disingenuous -- now we have a situation where there is no structure; GRS dealt with structure. It is easier to sell an existing OID to management; having to implement GRS is a deterrent. LW: we can put the GRS code in the public domain. MH and RD fight for a while.

JGC: it is OK to have OIDs and put things in GRS. Objection from MH and KT: clients would have to know all of GRS-1. This led to a debate about the content and goal of the discussion. MH proposed an OID for all document format types UNTIL version 4; we do something else in version 4. RL: by assigning a separate OID, it becomes trivial to use record syntax to say what you want. BW: the basic issue is that the current registry of document types is strings! We need OSI OIDs. OIDs have more international acceptance; there are no language problems and no ambiguity. RD is not completely opposed to the OID solution, but he is scared. It is not a name- or number-space problem, but an administration problem. The alternative is to use GRS to check the available variants and request one of them.

DL: we are crafting a solution, but we do not understand the problem. What we want to do is to retrieve documents -- to tell the server to send me what it has and to tell me what it is (OID). We do not need discovery here. This whole discussion was unnecessary. JGC: if we supply an SGML OID, we are missing the DTD. RD and MH agree: we will NOT register OIDs for individual SGML DTDs.

[Lunch break]

RD: we will not decide here on the one right way to do this. Assume that we are talking about a non-GRS method to transfer different types of files. We have covered the various options. Should we be defining OIDs as record syntaxes for the types of documents that we want to transfer? Can we do this the way X.400 does? Assigning OIDs is OK; just look at the model that X.400 uses to resolve the OIDs. MH: if we decide to do this, we should register types as they are requested. LW: will there be any attempt to define a record syntax in this context? HTML and SGML may be enough like record syntaxes to say OK, but what about proprietary examples (e.g., MS Word)? RD: there is no number-space problem, so we can register even proprietary formats. Could we structure the record syntax tree so that MIME types go levels down? JGC: we do have a number-space problem; the text says.... RD: we will correct the text.

MH: in the case where you are not attempting to discover (you have only one type), the solution is to use the OIDs that will be defined. In the case where you have to select from one of multiple record syntaxes, or you have to discover the syntax, then use GRS. LW: just say that there are two ways to do this, two kinds of acceptable practice.

DECISION ==> Send types to be registered to RD. Within a few months we will have a better idea of the scope of the problem. We will assign provisional OIDs immediately and (if there is lots of activity) ratify them at the next ZIG. We need to discuss the record syntax tree, since we do not want a flat list.

6.3. PRUNING Z39.50 (Denenberg)

(See handout / proposal.)

RD began with some preliminary observations: CL provided the motivation for RD to prune Z39.50. Pruned Z39.50 is NOT version 4 or an attempt to begin version 4. There is no intention of deprecating functionality not included in Pruned Z39.50. MH: what is the purpose of this document? RD: to provide a self-contained document, a supplement to the actual (forthcoming) version 4, that enables implementors to do a stripped-down version of Z39.50 that is NOT compatible with version 3.

ZIG participants seemed to agree that many implementors are afraid of the whole standard, but there was considerable controversy over the Pruned Z39.50 proposal. Should it be compatible with version 3? (The proposal throws out some version 3 mandatory parameters.) Why does it assume version 4 instead of version 3? How can you prune version 4 when we do not yet have version 4?

RD: in the past, the discussion of Z39.50 Lite quickly became pejorative. MH: that was because it was linked to ZDSR. What are our objectives? Suggestions included:
(a) to toss out useless parameters
(b) to give implementors a document that is smaller than the whole standard
(c) to provide a fast track for implementing a small set of quick and easy services
(d) to provide stateless service (MH: encapsulation makes the standard look stateless.)

We need to decide what we want to do before we can decide how to engineer it in the standard, and whether to engineer it in version 3 or version 4 (BW, DL, MH, SF).
MH, BW: if the goal of Z39.50 Pruned is to simplify implementation, some of the services included in the draft are seldom used and not simple to implement. MH, LW: if people implement Z39.50 Pruned, they may (will?) not be compatible with full Z39.50 implementations -- which is contrary to the whole purpose of Z39.50.

DT: we have complained at the last six ZIG meetings about the size and density of the standard. Do we really need a new version of the standard, or do we just need to test the documentation and revise it, or prepare supplementary documentation? MH: we talked about this before, but no one has the time or resources to test and revise the documentation.

At this chaotic point, MH and DT had to leave for the airport, so someone else must provide details of what happened during the last hour of the ZIG meeting.