From adv-search-list-request@lexis-nexis.com Sat Sep 30 10:42:15 1995
Organization: National Institute of Standards and Technology, Gaithersburg, MD
Date: Sat, 30 Sep 95 10:23:52 EDT
From: lwibberle@cas.org (Les Wibberley - CAS - ext. 2330)
Subject: Raw Notes from the 9/95 Advanced Query Group meeting
To: adv-search-list@lexis-nexis.com
Cc: lwibberle@cas.org, jlahm@cas.org, jskevingt@cas.org, lwibberle@cas.org, rledwith@cas.org, tbangert@cas.org, vnichols@cas.org, mmarquand@cas.org, jsjostrom@cas.org
Cc: thill@cas.org, janson@cas.org, jamoss@cas.org, mpiekenbr@cas.org, jdrobina@cas.org
Content-Length: 22474

Dear Advanced Query folks,

Here are my raw notes from the Advanced/Type 102 Query meeting this past week in D.C. Please note that these are very raw notes (accuracy not guaranteed), and are not intended to serve as official meeting minutes. Peter and/or other participants may wish to clarify/correct some of these items, as necessary.

-Les.

Type 102/RLQ meeting 9/27/95 - 9/28/95

Peter: provided overview & background of RLQ effort. (New draft of RLQ distributed, dated 9/22/95.)

Q: Is the result set resulting from the restriction accessible to the client?
A: Not yet determined. But could specify a result set as the restriction. This hasn't come up yet. Could return name of the result set in the additionalsearchinfo structure (as an intermediate result set).

Ray: would like to be able to specify the restriction as an RPN query. Need to identify what the requirements are before specifying how to specify it.

Q: How would query reformulation happen?
A: Simplest: return the query in the form submitted for review, perhaps as an external, like within additionalsearchinfo. Ray didn't like this approach. But the current searchResult-1 already contains the target's interpretation of the query: subqueryInterpretation. Since the type is query, this could carry the type 102 query.

Q: Use of ResourceControl where target prompts the user for reformulation?
User review of the results of query formulation should be optional. The subqueryInterpretation could be carried in either ResourceControl or SearchResponse.

Q: Does the target maintain the state of the query formulation?
The protocol definition should neither assume nor preclude a stateful or stateless implementation. The user may reformulate, throw away, or submit as formulated.

Q: How many modes of usage do you envision?
A: Two flags: return reformulation and search. Any combination of the possible settings.

Q: What are people currently doing along these lines? What mode will people expect to be standard behavior (most common)?
Set reformulation OFF, returnResultSet ON, mData OFF.

Q: What if server doesn't support returning reformulated query?
A: This would be a diagnostic.

Q: What if reformulated Query doesn't fit in the SearchResponse?
A: Could create a retrievable metadata element containing the reformulated query, which can be retrieved via Present, and use segmentation. This is a good example of the creation of result-level metadata. Create a logical result set containing one record containing the reformulated result set, which the client can retrieve using present with segmentation. Make this an option to returnReformulatedQuery.

Issue of annotating the query on return (actually expanded, plus human strings). But the query ASN.1 would include optional fields to carry the annotation. The client could turn the query around and resubmit to the target.

Q: Necessary to remove before resubmitting to the target? Return the query twice: one normal, one annotated?

Issue of reformulation by database. Additionalsearchinfo does allow reformulated queries to be returned by database. This is perhaps a valid issue, since the databases may reside on different servers, which use different search engines with different reformulation logic. In oversized case, the result set could contain a record for each database.

Above was discussion of reformulation (item 6 on the agenda).
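A minimal sketch of the request modes discussed above, assuming Python dataclasses as a stand-in for the eventual ASN.1. All names here (SearchOutputRequest fields, SearchOutcome, handle_search, the result set names, the size limit) are hypothetical illustrations, not from the RLQ draft. It shows the "most common" defaults, the diagnostic when a target cannot return a reformulated query, and the oversized case where the reformulated query is parked in a one-record result set for retrieval via Present. Stopping the search on the diagnostic is an assumption; the notes do not say.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchOutputRequest:
    # Defaults follow the "most common" mode named in the notes:
    # reformulation OFF, returnResultSet ON, mData OFF.
    return_reformulated_query: bool = False
    return_result_set: bool = True
    m_data: bool = False


@dataclass
class SearchOutcome:
    result_set: Optional[str] = None                # name of the result set, if searched
    reformulated_query: Optional[str] = None        # returned inline when it fits
    reformulation_record_set: Optional[str] = None  # one-record set to Present otherwise
    diagnostic: Optional[str] = None


def handle_search(query: str,
                  flags: SearchOutputRequest,
                  supports_reformulation: bool = True,
                  max_inline_chars: int = 40) -> SearchOutcome:
    """Toy target: apply the two independent flags from the notes."""
    outcome = SearchOutcome()
    if flags.return_reformulated_query:
        if not supports_reformulation:
            # Assumption: the diagnostic ends processing of this request.
            outcome.diagnostic = "reformulated query not supported"
            return outcome
        reformulated = f"expanded({query})"  # placeholder for real reformulation
        if len(reformulated) <= max_inline_chars:
            outcome.reformulated_query = reformulated
        else:
            # Too large for the SearchResponse: client retrieves it with
            # Present (and segmentation) from a logical one-record set.
            outcome.reformulation_record_set = "reformulation-1"
    if flags.return_result_set:
        outcome.result_set = "results-1"
    return outcome
```

With both flags off nothing is searched or reformulated, which matches the note that any combination of settings is possible even if not every combination is useful.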
Paul: How will we proceed? Just jump into the agenda?

Return to restrictions (item 1 on agenda).

Q: Does the restriction need to support multiple databases?
For example, look at the examples on p.5. Database names need to be specified by component, since it may be a multicomponent query. This could also be accomplished by having an attribute for databasename, and expressing it in the query.

Peter had issued a list of candidates for restrictions. Not sure about supporting the restriction subset of a group (cluster) database (2 out of 100). Other components included: result set, RPN query, etc. See 9/5/95 message. Redefine document as perhaps a record or something else.

John: How about making this http compatible?

Amended asn.1 handout has proposal for specifying restriction.

Q: Need to include list of databases excluded by the databasenames (subset of cluster database name).
Not peculiar to the type 102 case. Therefore, propose this to the ZIG for consideration. This could be done via an attribute for databasename, and use it with an ANDNOT operator in the boolean query. This would be useful in type 1/101 queries as well. Ability to apply part of the query criteria to a subset of a collection. This same concept applies to the digital library profile, as well.

Restrict set is a combination of inclusions and exclusions.

Q: Is a sequence of restrict criteria adequate to express this?
The resultsetids could be expressed via the RPNQuery. So could localDocids? What is the precedence of evaluation of the restrictSet? This may need to be spelled out explicitly.

localDocids may need to be characterstrings, instead of Octetstring? Turn docid into a URL? What is the requirement? Where do the docids come from? From a previous result set record? Need to restrict the answers from a specific resultsetid? Can we collapse these all into an RPN query?
We could use the restriction feature of the attribute plus result set, define a new USE attribute, where the term is the number, and a range could be expressed with an AND construct. It appears that all of these restrictions can be rolled into the RPNQuery under the RPNQuery. Now, is it an RPNQuery, or a Sequence of RPNQuery?

Needlist then collapses into an RPNQuery. No. Needlist is a sequence. RestrictSet collapses into RPNQuery.

combineNeedLists says how to combine the results of the queries specified by the needlists. Issue of the meaning of default. Leave it in. There will be a wide variation of how servers will interpret, and a client will not always have control over this aspect. How about databasenames? Addweight is an algorithm, perhaps based on the weights applied to each individual need statement. External allows for other algorithms to be specified.

Ray: add an indicator to combineNeedLists, a new flag:
1. use specified algorithm, or fail search.
2. recommend algorithm, but use other if better.
3. do what you think is best.

Q: Can we remove database names?
Yes, define an attribute for database name.

Under the NeedStatement, can the rqquery be optional? Could do an RPNQuery, with relevance feedback. Allow the type 102 query to degenerate to an RPN query in the simplest case.

The original set of database names in the Search request is the original universe of databases.

RelInfo comments: relevance is perhaps the same as weight. Perhaps collapse Relinfo into Rqquery? These are documents cited for relevance feedback. WAIS profile has a way with a combination of attributes to express relevance feedback. The WAIS approach may not work in this environment. This is different than querying. This provides a set of documents similar to these identified documents with relevance. Can express positive and negative aspect of relevance. Need to define what 0...1 means. If 0=irrelevant, 1=relevant, then .5 is don't care. Could have separate flag relevant vs irrelevant.
The relevance indicates the degree to which it is relevant/irrelevant.

Q: Need to add databasename? Why is this a localDocid? Could this be a URL?
Some question about this issue - defer resolution on this.

Tomorrow: meet here at 9am room G35.

Chris Buckley opposed to putting the databases down within the type 1 query. Trying to merge databases is one of the unsolved research problems in this area. Trying to merge ranked list results is unsolved; doing it at an intermediate stage if databases are buried down in the query is hopeless. Current syntax only combines it at the top level, where you can specify how things can be combined. Trying to specify this at a lower level, including databases at both the lower and higher level, is not reasonable to try to support. If you split a single query into multiple queries, have to have a mechanism for handing back the results for multiple queries, so that the server can combine the results for the client.

Question of supporting multiple databases? How to simplify the options? Keep support for the multiple databases in the protocol ASN.1. But initial implementations may be limited to single databases. If reformulation must occur per database, then a reformulation may need to be returned one per database.

Q: Define a query more powerful than initial implementation, and profile down for initial implementation; later perhaps revise the query based on implementation experience.
A number of the search engines on the net are doing this type of stuff (like InfoSeek), but not in the same way. InfoSeek is a prime example.

Issue of supporting databases as an attribute? This happened as a byproduct of simplifying the restrictSet definition, rolling it into the RPNQuery structure. If this causes concern in the ZIG, should we back off on it rather than debate at length? Ray: No, prefer to keep the definition simple and elegant, for better acceptance.
Noted need to ensure that the requirements and needs captured in the text are adhered to, and that we don't lose sight of it relative to the ASN.1.

How to proceed? Continue with the ASN.1, or discuss the document? Work from the ASN.1, but refer to the text.

First, recap the discussion from yesterday, and reconfirm our agreements.
1. What localDocids mean, and how to use. Under restrictset and under RelInfo. Under restrictset, fold into the query.
2. Change to combineNeedList. Sequence of 3 options, plus choice already there. 3 options: server choice, use specified algorithm, or try to use specified algorithm.

Chris: Does this already get covered under clientServerInfo by reformClause? How specific do we want to get? Combining needlist rolled into reformulating? Reformclause can be attached to every operand of the query.

Opinion that combining the need lists is separate from the reformClause. There are tuning knobs at different levels of the query. combineNeedLists addresses how you do fusion - research topic. The reformClause addresses a different need. The clientserverinfo is at highest level, as an overall default, but can also be attached at the per-term level. The reformclause is perhaps not needed at the top level. Still useful to have top-level clientServerInfo. Need at both levels.

How about the relationship between combineNeedLists and clientServerInfo? Agree that they are separate? Apply at separate stages of the query. The reformclause applies to reformulation. Keep structures as they are. Agreed to the extensions to combineNeedLists.

Issue of DatabaseNames being rolled into the RPNQuery. Databasenames in the search apply to the RLQuery. Tentatively: collapse RestrictSet to an RPNQuery.

Chris: not collapse databasenames into the RPNQuery. Not treat them as an attribute in a rankedlist system. It is a matter of semantics. The databasenames in the restriction are also the set of databases that the rlquery applies to.
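A minimal sketch of the three combineNeedLists options agreed above (server choice, use the specified algorithm or fail, try the specified algorithm), assuming Python as illustration. The names CombinePolicy, SearchFailed, add_weight, max_score and combine_need_lists are hypothetical; the weighted sum is only one guess at what "Addweight" might do, and the server default (max score) is an arbitrary placeholder. The "try" option is simplified here to "fall back only when the named algorithm is unsupported", which is narrower than "use other if better".

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple

Ranked = Dict[str, float]                # document id -> score
NeedLists = List[Tuple[float, Ranked]]   # (need weight, ranked results)


class CombinePolicy(Enum):
    SERVER_CHOICE = 1       # do what you think is best
    REQUIRE_SPECIFIED = 2   # use specified algorithm, or fail the search
    PREFER_SPECIFIED = 3    # recommend algorithm, server may use another


class SearchFailed(Exception):
    """Raised when a required combining algorithm is unavailable."""


def add_weight(need_lists: NeedLists) -> Ranked:
    # Hypothetical reading of "Addweight": sum of need weight * score.
    combined: Ranked = {}
    for weight, ranked in need_lists:
        for doc, score in ranked.items():
            combined[doc] = combined.get(doc, 0.0) + weight * score
    return combined


def max_score(need_lists: NeedLists) -> Ranked:
    # Placeholder server default: best unweighted score per document.
    combined: Ranked = {}
    for _weight, ranked in need_lists:
        for doc, score in ranked.items():
            combined[doc] = max(combined.get(doc, score), score)
    return combined


SUPPORTED: Dict[str, Callable[[NeedLists], Ranked]] = {"addWeight": add_weight}


def combine_need_lists(need_lists: NeedLists,
                       policy: CombinePolicy,
                       algorithm: Optional[str] = None) -> Ranked:
    if policy is CombinePolicy.SERVER_CHOICE:
        return max_score(need_lists)
    chosen = SUPPORTED.get(algorithm)
    if chosen is not None:
        return chosen(need_lists)
    if policy is CombinePolicy.REQUIRE_SPECIFIED:
        raise SearchFailed(f"combining algorithm not supported: {algorithm}")
    return max_score(need_lists)  # PREFER_SPECIFIED falls back
```

Keeping the policy separate from the algorithm name mirrors the agreement that combining need lists is a distinct knob from the reformClause.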
Agreed compromise: sequence of databaseNames and RPNQuery comprises the RestrictSet. Query is optional.

Database names semantics: inclusion, exclusion, or what? "These are the only databases to be used" was the intended meaning. This should be a subset of the set of databases specified in the SearchRequest. Perhaps add another parameter which specifies excluded databases? Make it a choice between included or excluded databases. All referenced databases are a subset of the full list. Make both the database list & query optional.

RelInfo - relevance - allow negative numbers? Yes.

AttributeSet parameter applies to everything except the RPNQuery (RPNQuery requires an AttributeSet to be specified). This applies to everything except the RPNQuery (the RestrictSet).

Q: Rename relInfo?
Call it FeedbackInfo. Range is -1 to 1. Negative is anti-relevant. 0 = don't care. 1 is highly relevant. -1 is highly anti-relevant.

How about the localDocid? Not make it octet string? Let the ZIG decide what we specify. Perhaps refer to the bib-1 doc-id, and its description within the WAIS profile?

Q: Need to reference parts of the document to be relevant?
Yes. Supply a subsection of a document as relevant text. Need at least the ability to submit a section of text. Perhaps change localDocId to be a choice of a full document or text. Text could be either user input or textual subset from a document. Also allow for an EXTERNAL to carry other relevance information (such as a CXF structure). Make RelInfo a Choice: documentid, document text, and external.

How is human-entered text handled? Express it via the query as part of the need statement? Or include it here? Is this user input really feedback information? No. So it is just additional search criteria. If it is negative feedback, need to include ability to indicate negative relevance.

Note that we considered that weight within OperandPlusWeight might in the future be extended to include negative values to indicate negative weights.
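A minimal sketch of the two structures as agreed above, assuming Python dataclasses in place of the ASN.1: a RestrictSet made of an optional database list (a choice of included or excluded names, always a subset of the SearchRequest databases) plus an optional RPN query, and a FeedbackInfo carrying a choice of document id, document text, or external data with relevance in the range -1 to 1. Class and function names are hypothetical, and the RPN query is represented as a plain string purely as a stand-in.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class DatabaseSelection:
    names: List[str]
    exclude: bool = False  # choice: included (False) or excluded (True)


@dataclass
class RestrictSet:
    databases: Optional[DatabaseSelection] = None  # optional
    rpn_query: Optional[str] = None                # optional stand-in for RPNQuery


def effective_databases(search_request_dbs: List[str],
                        restrict: Optional[RestrictSet]) -> List[str]:
    """Databases the ranked query applies to after restriction."""
    if restrict is None or restrict.databases is None:
        return list(search_request_dbs)
    selection = restrict.databases
    unknown = set(selection.names) - set(search_request_dbs)
    if unknown:
        # Referenced databases must be a subset of the SearchRequest list.
        raise ValueError(f"not in SearchRequest: {sorted(unknown)}")
    if selection.exclude:
        return [d for d in search_request_dbs if d not in selection.names]
    return [d for d in search_request_dbs if d in selection.names]


@dataclass
class DocumentId:
    value: str


@dataclass
class DocumentText:
    text: str  # a section of a document supplied as relevant text


@dataclass
class ExternalInfo:
    payload: bytes  # e.g. some externally defined structure


@dataclass
class FeedbackInfo:
    source: Union[DocumentId, DocumentText, ExternalInfo]
    relevance: float  # -1 highly anti-relevant, 0 don't care, 1 highly relevant

    def __post_init__(self) -> None:
        if not -1.0 <= self.relevance <= 1.0:
            raise ValueError("relevance must be within -1..1")
```

For example, `effective_databases(["A", "B", "C"], RestrictSet(DatabaseSelection(["B"], exclude=True)))` yields the databases A and C, while the same selection without `exclude` yields only B.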
SearchOutputRequest - big topic. Defer metadata for now.

Q: Why not throw out returnResultSet?
Ray: No. Perhaps rename: DoSearch or execute Search to better reflect what it is doing. Could have any combination between this and returnReformulatedQuery. Could turn both off, and just return metadata.

See note about metadata Peter sent out. This is metadata at various levels. It provides metadata, which would be returned in some way. Some question about using a tagset to define returned metadata. Defer this topic for now. Perhaps define a Tagset for this, perhaps reuse Tags from TagSet-M.

There are some questions about just how independent the 3 items in the SearchOutputRequest parms can be. Do all combinations make sense? Probably not. Need for response statuses to reflect various combinations of success/failure of portions of the Search request. Current SearchStatus is only a boolean. Make SearchOutputRequest a choice among the various options, to control the legal choices?

Real issue is how to return the detailed status of the request. Add a sequence of status codes or diagnostics to either the AdditionalSearchInfo, or the ServerClientInfo structure.

The entire structure is sent to the target, and returned to the client. Easiest thing may be to echo the entire structure.

Ray: perhaps break up which pieces are sent vs returned. Have a structure such that RankedQuery contains clientServerInfo, but not ServerClientInfo, and vice versa on response. Problem: this info is attached on the operand level.

Chris: everything is informally called Query. Idea is for client to send a query, target to be able to return the same query.

Ray's proposal:

    RQ ::= seq {
        [1]
         :
        [5]
        but not [6] serverClientInfo

    ServerInfo ::=
        [1] RQ
        [2] ServerClientInfo

    RQ ::= choice {
        RQRequest  [1] RQ
        RQResponse [2]

There is a problem that the client/server, server/client information is also at the operand level. This precludes the above approach, without a lot of restructuring.
Chris: ability to iteratively operate on the query at the client and server, refining the definition at both ends. This is a difference between the traditional RPNQuery and this type of Query. On request, this is carried as a query; it comes back in additionalsearchinfo.

Break up pieces and send appropriately? At the lower level, this allows the client and server to exchange details about the query.

Q: Does the client send the serverClientInfo back?
Yes, but server ignores it. Does the server remove left-over serverClientInfo? Perhaps.

Chris: 2 issues here:
1. We may want to say that the query back to client doesn't have any serverClient info at the top level, only in queries returning metadata. At low level, to return metadata within the query structure. A server ignores ServerClient info arriving from client.

Remaining agenda:

2. Where the metadata will go.

4. Support boolean operators within the type 102 query? If we do it, support under OperandPlusWeight structure, under operand, as a new choice. That would retain the integrity of the query structure. Not at the operator level. Howard/West wanted it originally, but it might not be needed, at least initially. An rlqAND with weight of 1 is not the same as a boolean AND. It also depends on the weight of the operands.

Q: Add it now, since we know where we go, or wait to add it?
Perhaps need a list of features and how we see them used, which features are placeholders, etc. Perhaps start a rationale document to try to capture some of these ideas/proposals, and the disposition/position on each. We may also need to think about Explain, and using it to explain which of these features/functions, etc. is supported. Could define a record for Type 102 query details. Add a comment in the ASN.1 marking the place where a boolean query might go in the future (resolution of item 3).

5. Need to talk some about the attribute types we have. Agreed to submit this list of attributes types to Cliff's ZIG attribute group. Peter's Sept.
8 message, LHW's response.

Chris: Location in doc consists of content and meta-content. Cut down to items 1&2.

Q: Perhaps split metadata out of this attribute type. Define a very simple attribute set which contains attributes which apply to all databases.

Peter: example: in Freestyle, ability to express indexes, subsets of documents. Most useful ones are headline, abstract, etc.

Kevin: Why do we need attributes in the RLQ part of the query?
Because without attributes, the usefulness of the type102 query is significantly reduced.

Should we try to take on the whole issue of an attribute set?

Chris: No one really agrees with this approach for the attribute sets for type 102.

Chris: Does location in document get covered by the context concept? How much overlap is there in concept? Disadvantage of separating one dimension of attribute from the others. Is context more like proximity? Context speaks more to the relation of the location of multiple terms, where locationinDoc pertains to location of a given element.

Ray: Q: would renaming context "proximity" help?

Chris: Attributes do not include elements. Attributes imply a structure on a document, which may not represent its actual structure. Elements in the document do not necessarily correspond to attributes in bib-1, and so mappings are not always adequate.

Ray: if we're willing to defer this attribute discussion, this could be folded into the new attribute discussion of the ZIG. Need to have representation at that meeting. Submit these ideas to that group. Chris will craft a statement about this. Cliff is aware of these proposed attribute types. Ray concerned about the impact of defining yet-another attribute set, without syncing up with the ZIG attribute effort.

Ray: metadata: possibly create a pseudo-result set with one record, containing the metadata. That could be the resultset corresponding to the resultset in the query. Do we want to be able to retrieve the metadata, using a record syntax?
Then the tagset would fit in. The serverClient structure was intended to contain the metadata. If you create a result record of metadata, you can eliminate some of the mData information in the ASN.1. Can use usual Present features (segmentation, etc.). Need to create result set of metadata, plus actual results? Perhaps return high-level metadata in the search response.

Chris: metadata is not all related to the record database. Applies to the record in search. High-level metadata in a separate record in a pseudo-result set. Record-level metadata would be retrieved from the result set, via GRS-1.

Chris: concerned re use of GRS-1, since it's a hammer which solves all problems. Concerned about a record which contains transient metadata (query level metadata).

Peter: GRS-1 does solve a lot of problems, and provides consistent syntax for information.

Ray: perhaps profile use of GRS-1 for use with type 102 query. Type 102 query will ultimately become part of Z39.50 standard. Alternatively, could define a Type 102 record syntax for retrieving metadata.

Chris: Counter-proposal. Record syntax defined for type102 record syntax, and allow use of GRS-1, as well.

Ray: pseudo database for high-level search-level metadata, which could be retrieved via GRS-1. Record level metadata would be brought back from the search result set via present. Logically, this is additional-search-info (search level metadata). Can only come back in a Search Response.

Peter: general need for result-set-level metadata, even for type 1 queries. For V4, want the ability to reference relationships between search results. One element in the metadata resultset could be a pointer/name to the related search result set. This could provide a linkage.

Ray: is there a requirement to create intermediate result sets in type 102? No, not yet. This could be modeled as representing a Need statement, for example. Any requirement for a client to present records from one of those intermediate result sets? No, not yet.
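A minimal sketch of the pseudo-result-set idea discussed above, assuming a toy in-memory store in Python. It shows search-level metadata held in a companion one-record result set that the client retrieves with the ordinary Present mechanism, with one element naming the related search result set as the linkage Peter described. The class names, the "-metadata" naming convention and the relatedResultSet key are all hypothetical; nothing here reflects GRS-1 or any agreed record syntax.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResultSet:
    name: str
    records: List[dict]


class ResultSetStore:
    """Toy target-side store of named result sets."""

    def __init__(self) -> None:
        self._sets: Dict[str, ResultSet] = {}

    def create(self, name: str, records: List[dict], metadata: dict) -> str:
        """Store the results plus a one-record metadata pseudo-result set.

        Returns the name of the metadata set (hypothetical convention).
        """
        self._sets[name] = ResultSet(name, list(records))
        meta_name = f"{name}-metadata"
        # One element points back at the related search result set.
        meta_record = dict(metadata, relatedResultSet=name)
        self._sets[meta_name] = ResultSet(meta_name, [meta_record])
        return meta_name

    def present(self, name: str, start: int = 1, count: int = 1) -> List[dict]:
        """Return records start..start+count-1 (1-based, as in Present)."""
        records = self._sets[name].records
        return records[start - 1:start - 1 + count]
```

A client would search, receive the metadata set name, and then present record 1 of that set to read the search-level metadata, using the same request it uses for ordinary records.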
Kevin: for set-level metadata, perhaps under otherinfo in Present request, indicate that this is a request for set-level metadata. Perhaps specify record zero to retrieve the metadata for the resultset. This last suggestion was agreed to as the current working approach.

Another idea suggested was to create dynamically generated database of metadata, with one record per result set, containing the metadata for that resultset. It would have a well-defined name: IR-ResultSet-Metadata. Each record would have a key of the name of the result set it pertains to.

Les agreed to post these raw notes to the adv.query list.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Les Wibberley                       Internet: les.wibb@cas.org
Chemical Abstracts Service
2540 Olentangy River Rd.            Voice: (614) 447-3600 Extension 2330
Columbus, Ohio 43210                FAX: (614) 447-3854 or 447-3697
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~