UNICODE-MARC Archives -- September 2006 (#6)


Date: Sun, 10 Sep 2006 08:37:04 -0400
Reply-To: UNICODE-MARC Discussion List <[log in to unmask]>
Sender: UNICODE-MARC Discussion List <[log in to unmask]>
From: Edward Summers <[log in to unmask]>
Subject: Re: Character Repertoire Expansion TIme?
Comments: To: UNICODE-MARC Discussion List <[log in to unmask]>
In-Reply-To: <[log in to unmask]>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

On Sep 9, 2006, at 2:23 PM, Karen Coyle wrote:

> The proposal is available at:
> http://www.loc.gov/marc/marbi/2006/2006-09.html
>
> The main concern is what will happen during the time that we have a
> mixed environment, with some systems having gone to Unicode, but
> others still using MARC-8. Sharing data in that mixed environment
> will mean that there will be times when a MARC-8 based system will
> receive a Unicode record with characters that are not valid in
> MARC-8. The issues are explained in this report:
> http://www.loc.gov/marc/marbi/2005/2005-report01.pdf

The only problem I see here is that as long as MARC-8 is supported and extended as a transmission encoding, there will *always* be a mixed environment, since the rest of the computing world is using Unicode for internationalization support. After a quick read of the PDF, this appears to have been the original position of the MARC21 community back in the early 1990s... but experience has shown that libraries have been slow to order (presumably at a cost) the enhanced Unicode support that their vendors provide.

Thanks very much for the pointer to the proposal. I imagine that in section 3 (Proposal), "XXXX;" should be "&#xXXXX;". If not, this could lead to some profound problems with strings like "feed;".

I know I'm late to the party, but I remain unconvinced that receiving a record with MARC-8 interspersed with what amount to Unicode HTML entities is easier to process than a MARC record that says it's UTF-8 (using position 9 in the leader) and actually contains UTF-8. The proposal seems to presume that, since OPACs are web applications, the HTML entities in the transmitted MARC data will flow all the way through into the HTML an OPAC emits.

To a disconnected outsider who has lurked on this list since the beginning and implemented MARC and MARC-8 software support, it seems like OCLC's internals and business models are leaking out into the MARC21 specification.
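To make the two options concrete, here is a minimal Python sketch. It is not taken from the proposal or from any LC or OCLC software. It reads leader position 09, where MARC 21 uses a blank for MARC-8 and "a" for UCS/Unicode, and it escapes characters outside a repertoire as hexadecimal numeric character references of the form &#xXXXX;. MARC8_REPERTOIRE is a hypothetical, ASCII-only stand-in for the real MARC-8 repertoire, and escaping a literal "&" as &#x0026; is this sketch's own choice to keep round trips unambiguous. The BARE_HEX pattern shows why the "&#x" prefix matters: four hex digits plus a semicolon also match ordinary text such as "feed;".

```python
import re

# Hypothetical stand-in for the MARC-8 repertoire: printable ASCII only,
# minus "&" (which this sketch always escapes). Real MARC-8 covers far more,
# e.g. "é" as a combining acute plus "e".
MARC8_REPERTOIRE = {chr(c) for c in range(0x20, 0x7F)} - {"&"}

# Hexadecimal numeric character reference, e.g. "&#x00E9;".
NCR = re.compile(r"&#x([0-9A-Fa-f]{4,6});")

# For comparison: a marker made of bare hex digits plus ";" (no "&#x").
BARE_HEX = re.compile(r"\b[0-9A-Fa-f]{4};")


def leader_coding_scheme(leader: str) -> str:
    """Report the encoding declared in leader/09 of a MARC 21 record."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is 24 characters long")
    code = leader[9]
    if code == "a":
        return "UTF-8"   # 'a' = UCS/Unicode
    if code == " ":
        return "MARC-8"  # blank = MARC-8
    raise ValueError(f"unexpected leader/09 value: {code!r}")


def escape_for_marc8(text: str) -> str:
    """Replace characters outside the stand-in repertoire with &#xXXXX; NCRs.

    A literal "&" is escaped too (as &#x0026;), so every "&" in the output
    starts a reference and decoding cannot misread ordinary text.
    """
    return "".join(
        ch if ch in MARC8_REPERTOIRE else f"&#x{ord(ch):04X};" for ch in text
    )


def unescape_ncrs(text: str) -> str:
    """Turn &#xXXXX; references back into the characters they name."""
    return NCR.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    print(leader_coding_scheme("00000nam a2200000 a 4500"))  # UTF-8
    print(escape_for_marc8("caf\u00e9 & co"))  # caf&#x00E9; &#x0026; co
    print(BARE_HEX.search("we feed; them") is not None)      # True
    print(NCR.search("we feed; them") is None)               # True
```

A real converter would work from the Library of Congress MARC-8 code tables rather than the stand-in set used here. The contrast still holds: the UTF-8 path needs only the leader check, while the MARC-8 path needs an escaping convention that every downstream system has to agree on.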
I say this while realizing at the same time that being the custodian of a large MARC data set like WorldCat probably changes one's perspective on this problem a bit :-)

Admittedly, I have little knowledge of OCLC's current subscription plans. But if I were OCLC and wanted to encourage the use of UTF-8 in library data while still supporting libraries that lack UTF-8 support in their catalogs, here's what I might do. Alert subscribers that OCLC is moving to two subscription plans in 2007:

Plan A: receive wholesome/shiny MARC records encoded as UTF-8 at $29.95/month (current subscription rate)
Plan B: receive possibly incomplete MARC records encoded as MARC-8 (since some of OCLC's Unicode data can't be encoded in MARC-8) at $34.95/month

Each year the cost difference between A and B could increase further, and librarians' sense of what's right and market forces would take care of the rest. OCLC/RLG's position in the marketplace should make this even easier :-)

//Ed


LISTSERV.LOC.GOV