medlinebib program

Documentation for the medlinebib program is below, with links to related programs in the "see also" section.

{   version = 1.86; (* of medlinebib.p 2008 Jul 24}

(* begin module describe.medlinebib *)
(*
name
   medlinebib: convert medline Unix query format to bibtex format

synopsis
   medlinebib(query: in, medlinebibp: inout, bibformat: out, output: out)

files
   query:  The Medline format query file in Unix format created by Entrez.

   bibformat:  The reference in the query rendered in bibtex format.
      The title is wrapped according to variable titlelinesize.

   medlinebibp:  parameters to control the program.  The file must contain the
        following parameters, one per line:

      1. The version number of the program.  This allows the user to be
         warned if an old parameter file is used.

      2. If the first character of the second line is 'd' then the program
         runs in debugging mode.  This means that it will show the parts
         of the reference as it parses them.

      3. If the first character of the third line is 'e' then the program
         will create additional non-standard bibtex parts for the Medline
         components.  This will make a bulky entry, but it will contain
         all of the medline data.  Any cases of double quotes (") are
         converted to single quotes to protect the bibtex file.

      4. If the first character of the fourth line is 'f' then the program
         uses the final author to make the bibtex key.  Otherwise
         the second author is used (or none when there is only one author).

      5. If the first character of the fifth line is 'd' then the program
         will double dash page numbers: 1--5, otherwise it will single dash.

      6. The title line size, titlelinesize.  This is the number of
         characters that the lines will be wrapped to.

      Note:  as of version 1.47, medlinebibp will be automatically upgraded
      to include parameter 5 and any later parameters.  This means that
      medlinebib will read in the medlinebibp and write it out again.
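
   The six parameters above can be read with a few lines of code.  The
   following is only a sketch in Python, not the Pascal code of medlinebib
   itself.  The names MedlinebibParams and parse_medlinebibp are made up for
   illustration, and it assumes that a numeric parameter is the first
   whitespace-separated token on its line, with any remaining text on the
   line treated as a comment.

```python
from dataclasses import dataclass


@dataclass
class MedlinebibParams:
    version: float          # parameter 1
    debug: bool             # parameter 2: line starts with 'd'
    extended: bool          # parameter 3: line starts with 'e'
    final_author_key: bool  # parameter 4: line starts with 'f'
    double_dash: bool       # parameter 5: line starts with 'd'
    titlelinesize: int      # parameter 6


def parse_medlinebibp(text):
    """Parse the six one-per-line parameters described above."""
    lines = text.splitlines()
    if len(lines) < 6:
        raise ValueError("expected 6 parameter lines, found %d" % len(lines))

    def flag(line, letter):
        # Only the first character of the line is significant.
        return line[:1] == letter

    return MedlinebibParams(
        # Assumption: the number comes first, the rest of the line is comment.
        version=float(lines[0].split()[0]),
        debug=flag(lines[1], "d"),
        extended=flag(lines[2], "e"),
        final_author_key=flag(lines[3], "f"),
        double_dash=flag(lines[4], "d"),
        titlelinesize=int(lines[5].split()[0]),
    )


# A hypothetical parameter file, for illustration only.
example = "1.86 version\nn no debugging\ne extended\nf final author\nd double dash\n70 titlelinesize\n"
params = parse_medlinebibp(example)
```

   Because only the first character of lines 2 to 5 matters, any other
   character (here 'n') turns the corresponding option off.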

   output: messages to the user

description

   Convert Medline format to bibtex format.

   The program takes a medline format in file 'query' and creates a bibtex
   file, 'bibformat'.

   While you can go to the trouble of downloading the medline format,
   I have revised the script (now called medquery) so that if one
   saves a page directly from pubmed it will be automatically
   converted.  When one saves a page it comes out as a 'query.fcgi'
   file (query.fcgi.html on my mac).  The medquery script searches
   through this and plucks out the PMID identifier.  Then it reaches
   across the internet using wget to obtain the medline format.  The
   medline format is then converted to bibtex.  This all happens so
   fast that the complexity doesn't matter.
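
   The PMID-plucking step can be sketched as follows.  This is Python, not
   the csh medquery script, and the pattern is an assumption: it supposes
   that the saved page contains the text "PMID:" followed by the number,
   possibly with HTML tags or spaces in between.  The real page layout has
   changed over time and may not match.  The name extract_pmid is
   hypothetical.

```python
import re

# Assumption: "PMID:" is followed by the digits, with only whitespace
# or HTML tags in between.  The real saved page may be laid out differently.
PMID_PATTERN = re.compile(r"PMID:(?:\s|<[^>]+>)*(\d+)")


def extract_pmid(html):
    """Return the first PMID found in the saved page, or None."""
    match = PMID_PATTERN.search(html)
    return match.group(1) if match else None
```

   The PMID is returned as a string so that it can be pasted directly into
   whatever request is used to fetch the medline format.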

   To use the program:

   1. Set up atchange and wget on your computer.

   2. Set up atchange running in your home directory on an 'automate'
      file containing:

query.fcgi
  medquery

/tmp/query.fcgi
 echo moving query.fcgi to home for processing
 mv /tmp/query.fcgi ~

   3. Start at the PubMed web page

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi

   and retrieve a paper abstract.

   4.  Save the abstract.  This will create a query.fcgi file in your
   home directory.

   2005 May 11 Note:  Because pubmed keeps changing the format of
   their save mechanism, just save the page directly using your
   browser save mechanism.  It may be called query.fcgi.html or
   query.fcgi.  The medquery script will extract the PMID (PubMed ID)
   from the saved html page (using the name query.fcgi).  Hopefully
   this will be more stable, and it is certainly faster to save the
   page directly.

   5.  Creating the query.fcgi file will trigger atchange to run the
   medquery script, which converts from medline to bibtex format.  The
   bibformat file will appear in your home directory.  Successive
   references are also stored in a 'bib' file in your home directory.
   The medlinebibp file is automatically created.  (The medquery script
   will clean up after itself by putting the medline format file into
   /tmp.)

   6.  In the directory where you keep your references you can have a
   pointer to the resulting bib file.  Of course there are other ways
   of automating this, but for me it makes the conversion rather
   rapid.  I just go to my reference directory, edit my bib file and
   read in the new entry.

   Note: medlinebib changes page numbers in the form 507-10 to the
   form 507--510.
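
   The page number expansion can be sketched like this.  It is a Python
   illustration of the rule, not the program's own code, and expand_pages
   is a hypothetical name.  Two assumptions are made: anything other than a
   single simple range is returned unchanged, and the missing digits are
   still filled in when single dashes are requested.

```python
import re

# Matches exactly one range such as "507-10"; anything longer is left alone.
SIMPLE_RANGE = re.compile(r"^(\d+)-(\d+)$")


def expand_pages(pages, double_dash=True):
    """Expand an abbreviated page range, e.g. '507-10' -> '507--510'."""
    match = SIMPLE_RANGE.match(pages.strip())
    if match is None:
        # Extra material, e.g. '233-44; discussion 244-50': do nothing.
        return pages
    first, last = match.groups()
    if len(last) < len(first):
        # Borrow the missing leading digits from the first page number.
        last = first[: len(first) - len(last)] + last
    dash = "--" if double_dash else "-"
    return first + dash + last
```

   For example, expand_pages("507-10") gives "507--510", and the same call
   with double_dash=False gives "507-510".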

examples

   Try searching for

      Schneider TD

documentation

see also

   Pubmed link: 
   http://www.ncbi.nlm.nih.gov/entrez/query.fcgi

   Parameter file:  medlinebibp

   Unix csh script:  medquery

   atchange is described at:
   http://www.lecb.ncifcrf.gov/~toms/atchange.html

   wget information:
   http://www.lecb.ncifcrf.gov/~toms/wget.html
   ftp://gnjilux.cc.fer.hr/pub/unix/util/wget/

   Sort the bibtex file alphabetically: sortbibtex.p

author

   Thomas Dana Schneider

bugs

********************************************************************************
   If there are too many names, Entrez says "et al" for the
   last name.  This gets represented as:

 and a. l. et",

   Who is Al L. Et?  :-)

   It should be recognized and made:

 and {\em et al}",
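
   One possible fix, sketched in Python rather than in the program's Pascal:
   test each author name for the placeholder before formatting it.  The
   helper format_name is a deliberately tiny stand-in for the real name
   formatting, and all three function names are hypothetical.

```python
def format_name(au_name):
    """Very small stand-in formatter: 'McAllister WT' -> 'W. T. McAllister'."""
    parts = au_name.split()
    last = parts[0]
    initials = parts[1] if len(parts) > 1 else ""
    dotted = " ".join(letter + "." for letter in initials)
    return (dotted + " " + last).strip()


def is_et_al(au_name):
    """True if the name is the 'et al' placeholder rather than a person."""
    return au_name.strip().rstrip(".").lower() == "et al"


def author_field(au_names):
    """Join the names with 'and', keeping the placeholder as emphasized text."""
    formatted = [
        r"{\em et al}" if is_et_al(name) else format_name(name)
        for name in au_names
    ]
    return "\n and ".join(formatted)
```

   The check is made on the raw name, before it is split into last name
   and initials, so "et al" never reaches the code that would turn it
   into "a. l. et".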

********************************************************************************

 Authors with names like:

    La Branche H

 should be processed to "LaBranche".
 The only way to recognize this is the lowercase letters in
 the second part of the last name - rather subtle.

********************************************************************************

If you make the medlinebib program smart enough to re-format the
reference titles to less than 80 characters per line in the
output bibformat file, then the sortbibtex program will run flawlessly
using it as the input file. Otherwise, it gets hung on the title lines
that are greater than 80 chars/line.
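
The wrapping asked for here, controlled by a line size such as
titlelinesize, can be sketched with Python's standard textwrap module.
This is an illustration only, not the program's Pascal code.  The function
name wrap_title and the default width of 70 are assumptions.

```python
import textwrap


def wrap_title(title, titlelinesize=70):
    """Return a bibtex title field wrapped to titlelinesize characters."""
    # Collapse runs of whitespace, then protect the title with braces.
    field = 'title = "{' + " ".join(title.split()) + '}",'
    # Never split inside a word; a single very long word may still exceed
    # the limit, which this sketch accepts.
    return textwrap.fill(field, width=titlelinesize,
                         break_long_words=False, break_on_hyphens=False)
```

Any titlelinesize below 80 keeps the wrapped lines within the limit
described above, provided no single word is longer than the line.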

********************************************************************************

   1998 Jan 11
   Bielinsky.Gerbi1998 is a case in which [In Process Citation] goes from one
   line to the next; the program does not handle this yet.

********************************************************************************

2000 Aug 17

The program does not fix page numbers if there is more material:

pages = "233-44; discussion 244-50",

2005 Nov 04.  A special case of this occurs in Biotechniques because
they often have advertisements in the middle of the paper.  For
example:

@article{Rong.McAllister1999,
author = "M. Rong
 and R. Castagna
 and W. T. McAllister",
title = "{Cloning and purification of bacteriophage K11 RNA polymerase}",
journal = "Biotechniques",
volume = "27",
pages = "690--2, 694",
pmid = "10524308",
year = "1999"}

Such cases are too complex for this pea brain program to handle so it
does nothing and thereby avoids messing up the page numbers.  Note:
the original page number string at pubmed, '690-2, 694' is incorrect. 
It removes 693 (which is an ad) but not 691 (which is also an ad).

********************************************************************************

technical notes

   The entire title is surrounded by {} to protect capitalized words.  (Done
   1997 March 20)

   Medline insists on inserting " [In Process Citation]" into the title of new
   partially completed (?) references.  The program removes this string when
   it is found at the end of the title. (Done 1997 June 14)
   See bug note above.
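
   A sketch of the removal in Python (not the program's own code; the name
   strip_in_process is hypothetical).  It joins the title lines before
   looking for the string, which is one way the line-spanning case in the
   bug note could be handled.

```python
MARKER = "[In Process Citation]"


def strip_in_process(title_lines):
    """Join title lines and drop a trailing '[In Process Citation]'."""
    # Joining first means a marker split across two lines is still found.
    title = " ".join(" ".join(title_lines).split())
    if title.endswith(MARKER):
        title = title[: -len(MARKER)].rstrip()
    return title
```

   Only a marker at the very end of the title is removed; the same text
   in the middle of a title is left alone.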

   1998 June 30: The program now handles Jr cases such as
      AU  - Kazazian HH Jr
   by combining the Jr with the last name in the bibliography (as HH
   {Kazazian Jr}) and by dropping it from the keyname.
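
   The Jr rule, together with the choice of second or final author for the
   key, can be sketched in Python as below.  This is not the program's
   Pascal code and all function names are hypothetical.  Assumptions: the
   last token of a name holds the initials, 'Sr' is treated like 'Jr'
   (the text only mentions Jr), initials are written with periods as in
   the Rong.McAllister1999 entry, and a single-author key is just the
   last name followed by the year.

```python
SUFFIXES = ("Jr", "Sr")  # 'Sr' is an assumption; the text only mentions Jr


def split_au(au_name):
    """Split 'Kazazian HH Jr' into (last name, initials, suffix)."""
    parts = au_name.split()
    suffix = ""
    if len(parts) > 2 and parts[-1] in SUFFIXES:
        suffix = parts.pop()
    if len(parts) == 1:
        return parts[0], "", suffix
    return " ".join(parts[:-1]), parts[-1], suffix


def bibtex_name(au_name):
    """Format a name for the author field, keeping Jr with the last name."""
    last, initials, suffix = split_au(au_name)
    dotted = " ".join(letter + "." for letter in initials)
    if suffix:
        last = "{" + last + " " + suffix + "}"
    return (dotted + " " + last).strip()


def key_name(au_name):
    """Last name only; the suffix is dropped from the key."""
    return split_au(au_name)[0]


def make_key(au_names, year, use_final_author=True):
    """Build a key such as 'Rong.McAllister1999'."""
    first = key_name(au_names[0])
    if len(au_names) == 1:
        return first + year  # assumed form when there is no other author
    other = au_names[-1] if use_final_author else au_names[1]
    return first + "." + key_name(other) + year
```

   With these definitions bibtex_name("Kazazian HH Jr") gives
   "H. H. {Kazazian Jr}" while key_name gives plain "Kazazian".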

   1999 Sep 5:  I upgraded mq to the medquery script.  This script uses wget
   to grab the medline format.  This means that you can get a pubmed
   reference and just save it.  Medquery doesn't care whether you save it as
   mac, pc or unix, and it will get the medline format by wget.  Then it
   converts to bibtex format.  So you only have to click on save twice - it's
   much faster!

   2000 July 27: The old medline link

http://www4.ncbi.nlm.nih.gov/Entrez/medline.html

   is no longer active.  It produced a "query" file.  The old link now
   automatically takes one to the new location:

http://www4.ncbi.nlm.nih.gov/entrez/query.fcgi

   This produces a "query.fcgi" file.

*)
(* end module describe.medlinebib *)
{This manual page was created by makman 1.44}
{created by htmlink 1.52}