ZNG Archives -- August 2002 (#7)


Date: Thu, 22 Aug 2002 14:58:45 +0100
Reply-To: "Z39.50 Next-Generation Initiative" <[log in to unmask]>
Sender: "Z39.50 Next-Generation Initiative" <[log in to unmask]>
From: Robert Sanderson <[log in to unmask]>
Subject: Re: Indexes
Comments: To: "Z39.50 Next-Generation Initiative" <[log in to unmask]>
Comments: cc: [log in to unmask]
In-Reply-To: <[log in to unmask]>
Content-Type: TEXT/PLAIN; charset=US-ASCII

> >The problem I'm having is that I no longer remember how we thought we
> >could distinguish between "first-characters-in-field" and
> >"first-words-in-field" searches in Bath. In other words, what would
> >Were we specifying the "firstWords" search with proximity?

> My meeting notes say:
> Left anchored is assumed.
> Exact match is assumed
> Unanchored begins with a question mark

> My notes do say that our prose would relate different SRW/U queries to Bath
> searches. We did not want specific Bath elements in the query syntax. I
> agree with Mike that we should not load the index names.

I agree with a somewhat rambling caveat:

Index Names are supposed to represent the attribute combinations in
Z39.50. So we can say 'titleWord' not (1=4, 3=3, 4=6)

So we're already loading the difference between structure 1, completeness 1
and structure 3 completeness 6 into 'title' vs 'titleWord'

I think this is an acceptable level of semantics in index names, so long
as there is a recommendation to use one particular naming scheme
(foo, fooWord)
Obviously this can be ignored, but so can attribute combinations, it just
means that the searches will produce possibly unexpected results due to
lousy configuration.

titleFirstWordsIncludingLeadingArticles is a search, not an index.
'First' or any other description of the location of the term should be
part of the query language, not the name of the index. And as we can
express it with either proximity or truncation, there's no need for it to
be in the index.

This brings me to my next questions:

1. Why even fooWord? We can express it with an unanchored search with
   spaces. eg: foo="? term *"

I don't think that this is sensible, but it's the logical conclusion and
there needs to be an answer to give to it.

The index name describes what is in the index. fooWord contains the
individual words from 'foo', which implies that the server best knows how
to extract a 'word' from its data with respect to which normalisation and
extraction routines to use. The search shouldn't have to know which
normalisation/extraction routines are used, so there needs to be a 'word'
index.

For first words, it does know the extraction routine -- take the first
words in the field. Right?

2. If someone searches with: titleWord="search term"

The agreement was to fail it, if my memory serves? As there's no single
word which matches 'search term'?

The search should have been titleWord="search" and titleWord="term" ?

Rob

--
      ,'/:.          Rob Sanderson ([log in to unmask])
    ,'-/::::.        http://www.o-r-g.org/~azaroth/
  ,'--/::(@)::.      Special Collections and Archives, extension 3142
,'---/::::::::::.    Twin Cathedrals:  telnet: liverpool.o-r-g.org 7777
____/:::::::::::::.              WWW:  http://liverpool.o-r-g.org:8000/
I L L U M I N A T I
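
To make question 2 above concrete, here is a minimal sketch of the two behaviours under discussion: failing titleWord="search term" outright, or rewriting it as one clause per word. The function names are hypothetical, the (foo, fooWord) suffix check assumes the naming scheme recommended in the message, and the whitespace split is only a stand-in for whatever word extraction the server really uses.

```python
def is_word_index(index_name: str) -> bool:
    # Assumption: word indexes follow the (foo, fooWord) naming scheme.
    return index_name.endswith("Word")


def strict_word_query(index_name: str, term: str) -> str:
    """Fail a multi-word term against a word index, as recalled in the message."""
    if is_word_index(index_name) and len(term.split()) > 1:
        raise ValueError(
            f"no single word in {index_name} can match {term!r}"
        )
    return f'{index_name}="{term}"'


def rewrite_word_query(index_name: str, term: str) -> str:
    """Alternative: turn the term into one clause per word, joined by 'and'."""
    if not is_word_index(index_name):
        # Non-word indexes keep the term as a single value.
        return f'{index_name}="{term}"'
    # Naive whitespace split; a real server would apply its own
    # normalisation and extraction routines here.
    words = term.split()
    if not words:
        raise ValueError("empty search term")
    return " and ".join(f'{index_name}="{word}"' for word in words)


if __name__ == "__main__":
    print(rewrite_word_query("titleWord", "search term"))
```

Under these assumptions, rewrite_word_query("titleWord", "search term") yields titleWord="search" and titleWord="term", while strict_word_query raises an error for the same input.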


LISTSERV.LOC.GOV