{ version = 1.86; (* of medlinebib.p 2008 Jul 24}
(* begin module describe.medlinebib *)
(*
name
medlinebib: convert medline Unix query format to bibtex format
synopsis
medlinebib(query: in, medlinebibp: inout, bibformat: out, output: out)
files
query: The Medline format query file in Unix format created by Entrez.
bibformat: The reference in the query rendered in bibtex format.
The title is wrapped according to variable titlelinesize.
medlinebibp: parameters to control the program. The file must contain the
following parameters, one per line:
1. The version number of the program. This allows the user to be
warned if an old parameter file is used.
2. If the first character of the second line is 'd' then the program
runs in debugging mode. This means that it will show the parts
of the reference as it parses them.
3. If the first character of the third line is 'e' then the program
will create additional non-standard bibtex parts for the Medline
components. This will make a bulky entry, but it will contain
all of the medline data. Any cases of double quotes (") are
converted to single quotes to protect the bibtex file.
4. If the first character of the fourth line is 'f' then the program
uses the final author to make the bibtex key. Otherwise
the second author is used (or none when there is only one author).
5. If the first character of the fifth line is 'd' then the program
will double dash page numbers: 1--5, otherwise it will single dash.
6. The title line size, titlelinesize. This is the number of
characters that the lines will be wrapped to.
Note: as of version 1.47, medlinebibp will be automatically upgraded
to include parameter 5 and any later parameters. This means that
medlinebib will read in the medlinebibp and write it out again.
output: messages to the user
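A minimal sketch of the key choice controlled by parameter 4, written
in Python rather than the program's own Pascal. The function name
make_key and the key for a single author (last name plus year) are
assumptions for illustration; the "First.OtherYear" shape is taken from
the Rong.McAllister1999 entry shown under bugs.

```python
def make_key(last_names, year, use_final=True):
    """Build a BibTeX key from author last names and the year.

    use_final=True mimics an 'f' on the fourth parameter line: the
    final author is paired with the first author. Otherwise the
    second author is used.
    """
    first = last_names[0]
    if len(last_names) == 1:
        # Assumption: with one author the key is just name plus year.
        return f"{first}{year}"
    other = last_names[-1] if use_final else last_names[1]
    return f"{first}.{other}{year}"


print(make_key(["Rong", "Castagna", "McAllister"], "1999"))
print(make_key(["Rong", "Castagna", "McAllister"], "1999", use_final=False))
```

With the final author selected this gives Rong.McAllister1999; with the
second author it gives Rong.Castagna1999.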
description
Convert Medline format to bibtex format.
The program takes a Medline format reference in file 'query' and creates
a bibtex file, 'bibformat'.
While you can go to the trouble of downloading the medline format,
I have revised the script (now called medquery) so that if one
saves a page directly from pubmed it will be automatically
converted. When one saves a page it comes out as a 'query.fcgi'
file (query.fcgi.html on my mac). The medquery script searches
through this and plucks out the PMID identifier. Then it reaches
across the internet using wget to obtain the medline format. The
medline format is then converted to bibtex. This all happens so
fast that the complexity doesn't matter.
To use the program:
1. Set up atchange and wget on your computer.
2. Set up atchange running in your home directory on an 'automate'
file containing:
query.fcgi
medquery
/tmp/query.fcgi
echo moving query.fcgi to home for processing
mv /tmp/query.fcgi ~
3. Start at the PubMed web page
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
and retrieve a paper abstract.
4. Save the abstract. This will create a query.fcgi file in your
home directory.
2005 May 11 Note: Because pubmed keeps changing the format of
their save mechanism, just save the page directly using your
browser save mechanism. It may be called query.fcgi.html or
query.fcgi. The medquery script will extract the PMID (PubMed ID)
from the saved html page (using the name query.fcgi). Hopefully
this will be more stable, and it is certainly faster to save the
page directly.
5. Creating the query.fcgi file will trigger atchange to run the
medquery script, which converts from medline to BibTeX format. The
bibformat file will appear in your home directory. Successive
references are also stored in a 'bib' file in your home directory.
The medlinebibp file is automatically created. (The medquery script
will clean up after itself by putting the medline format file into
/tmp.)
6. In the directory where you keep your references you can have a
pointer to the resulting bib file. Of course there are other ways
of automating this, but for me it makes the conversion rather
rapid. I just go to my reference directory, edit my bib file and
read in the new entry.
Note: medlinebib changes page numbers in the form 507-10 to the
form 507--510.
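The page expansion in the note above can be sketched as follows. This
is a Python illustration, not the program's Pascal code; the function
name expand_pages and its double_dash argument (standing in for
parameter 5) are assumptions. In this sketch, any string that is not a
plain "first-last" range is returned unchanged; how the real program
treats such strings is discussed under bugs.

```python
import re


def expand_pages(pages, double_dash=True):
    """Expand an abbreviated range such as '507-10' to '507--510'."""
    match = re.fullmatch(r"(\d+)-(\d+)", pages.strip())
    if match is None:
        # Not a simple range (for example extra text after it):
        # this sketch leaves the string alone.
        return pages
    first, last = match.groups()
    if len(last) < len(first):
        # Borrow the missing leading digits from the first page.
        last = first[: len(first) - len(last)] + last
    dash = "--" if double_dash else "-"
    return f"{first}{dash}{last}"


print(expand_pages("507-10"))
print(expand_pages("1-5", double_dash=False))
```

The first call prints 507--510 and the second prints 1-5.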
examples
Try searching for
Schneider TD
documentation
see also
Pubmed link:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
Parameter file: medlinebibp
Unix csh script: medquery
atchange is described at:
http://www.lecb.ncifcrf.gov/~toms/atchange.html
wget information:
http://www.lecb.ncifcrf.gov/~toms/wget.html
ftp://gnjilux.cc.fer.hr/pub/unix/util/wget/
Sort the bibtex file alphabetically: sortbibtex.p
author
Thomas Dana Schneider
bugs
********************************************************************************
If there are too many names, Entrez says "et al" for the
last name. This gets represented as:
and a. l. et",
Who is Al L. Et? :-)
It should be recognized and made:
and {\em et al}",
********************************************************************************
Authors with names like:
La Branche H
should be processed to "LaBranche".
The only way to recognize this is the lowercase letters in
the second part of the last name - rather subtle.
********************************************************************************
If you make the medlinebib program smart enough to re-format the
reference titles to less than 80 characters per line in the
output bibformat file, then the sortbibtex program will run flawlessly
using it as the input file. Otherwise, it gets hung on the title lines
that are greater than 80 chars/line.
********************************************************************************
1998 Jan 11
Bielinsky.Gerbi1998 is a case in which [In Process Citation] goes from one
line to the next; the program does not handle this yet
********************************************************************************
2000 Aug 17
The program does not fix page numbers if there is more material:
pages = "233-44; discussion 244-50",
2005 Nov 04. A special case of this occurs in Biotechniques because
they often have advertisements in the middle of the paper. For
example:
@article{Rong.McAllister1999,
author = "M. Rong
and R. Castagna
and W. T. McAllister",
title = "{Cloning and purification of bacteriophage K11 RNA polymerase}",
journal = "Biotechniques",
volume = "27",
pages = "690--2, 694",
pmid = "10524308",
year = "1999"}
Such cases are too complex for this pea brain program to handle so it
does nothing and thereby avoids messing up the page numbers. Note:
the original page number string at pubmed, '690-2, 694' is incorrect.
It removes 693 (which is an ad) but not 691 (which is also an ad).
********************************************************************************
technical notes
The entire title is surrounded by {} to protect capitalized words. (Done
1997 March 20)
Medline insists on inserting " [In Process Citation]" into the title of new
partially completed (?) references. The program removes this string when
it is found at the end of the title. (Done 1997 June 14)
See bug note above.
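A minimal sketch of the title cleanup, in Python rather than the
program's Pascal. The names MARKER, join_title_lines and
strip_in_process are hypothetical. Joining the continuation lines
before testing for the marker is one possible way to cope with a marker
that wraps across lines; it is not something the program is stated to
do.

```python
MARKER = " [In Process Citation]"


def join_title_lines(lines):
    """Join a wrapped title into one string with single spaces."""
    return " ".join(line.strip() for line in lines)


def strip_in_process(title):
    """Remove the marker only when it ends the title."""
    title = title.rstrip()
    if title.endswith(MARKER):
        return title[: -len(MARKER)]
    return title


wrapped = ["Mapping of replication origins [In Process", "      Citation]"]
print(strip_in_process(join_title_lines(wrapped)))
```

Here the wrapped title is an invented example; the call prints it
without the marker.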
1998 June 30: The program now handles Jr cases such as
AU - Kazazian HH Jr
by combining the Jr with the last name in the bibliography (as HH
{Kazazian Jr}) and by dropping it from the keyname.
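The Jr handling can be sketched like this, in Python rather than the
program's Pascal. The function name parse_author is hypothetical, and
the dotted initials ("H. H.") are an assumption based on the author
lines in the bibtex example under bugs. The sketch assumes the AU value
has the form "Lastname Initials" with an optional trailing Jr.

```python
def parse_author(au):
    """Split a Medline AU value into (bibtex_name, key_name)."""
    parts = au.split()
    suffix = ""
    if len(parts) > 2 and parts[-1] == "Jr":
        suffix = parts.pop()
    initials = parts.pop() if len(parts) > 1 else ""
    last = " ".join(parts)
    dotted = " ".join(letter + "." for letter in initials)
    if suffix:
        # Braces keep the last name and Jr together in BibTeX.
        surname = "{" + last + " " + suffix + "}"
    else:
        surname = last
    # The key name never carries the Jr.
    return f"{dotted} {surname}".strip(), last


print(parse_author("Kazazian HH Jr"))
```

For "Kazazian HH Jr" the bibliography name is H. H. {Kazazian Jr} and
the key name is Kazazian.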
1999 Sep 5: I upgraded mq to the medquery script. This script uses wget
to grab the medline format. This means that you can get a pubmed
reference and just save it. Medquery doesn't care whether you save it as
mac, pc or unix, and it will get the medline format by wget. Then it
converts to bibtex format. So you only have to click on save twice - it's
much faster!
2000 July 27: The old medline link
http://www4.ncbi.nlm.nih.gov/Entrez/medline.html
is no longer active. It produced a "query" file. The old link now
automatically takes one to the new location:
http://www4.ncbi.nlm.nih.gov/entrez/query.fcgi
This produces a "query.fcgi" file.
*)
(* end module describe.medlinebib *)
{This manual page was created by makman 1.44}
{created by htmlink 1.52}