Structure of ALTO Files

An ALTO file consists of three major sections as children of the root <alto> element:

The <Description> section holds <MeasurementUnit>, <sourceImageInformation> and <Processing>.
The <Styles> section holds <TextStyle> and <ParagraphStyle>.
The <Layout> section holds one <Page> element per page.

A page consists of margins and a print space, all of which are non-intersecting rectangular areas within the page area. Each of these can contain any number of objects such as lines, images, text blocks and more. A text block is divided into text lines, which are in turn divided into strings and spaces.

The global structure of the ALTO file is as follows:

    <alto>
      <Description>
        <MeasurementUnit/>
        <sourceImageInformation/>
        <Processing/>
      </Description>
      <Styles>
        <TextStyle/>
        <ParagraphStyle/>
      </Styles>
      <Layout>
        <Page>
          <TopMargin/>
          <LeftMargin/>
          <RightMargin/>
          <BottomMargin/>
          <PrintSpace/>
        </Page>
      </Layout>
    </alto>
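As an illustration of this nesting, the following is a minimal Python sketch that lists the top-level sections of an ALTO document and rebuilds the text of each line from its strings and spaces. The sample document is hypothetical and heavily trimmed. It assumes the element names TextBlock, TextLine, String and SP and the CONTENT attribute on String, and it omits the XML namespace that real ALTO files declare, so tag names would need a namespace prefix when reading actual files. The function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed ALTO-like document (namespace omitted for brevity).
SAMPLE = """<alto>
  <Description>
    <MeasurementUnit>pixel</MeasurementUnit>
  </Description>
  <Styles/>
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Hello"/>
            <SP/>
            <String CONTENT="world"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Second"/>
            <SP/>
            <String CONTENT="line"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def top_level_sections(alto_xml):
    """Return the tag names of the direct children of the root element."""
    root = ET.fromstring(alto_xml)
    return [child.tag for child in root]


def extract_lines(alto_xml):
    """Return the text of each TextLine, joining String and SP children in order."""
    root = ET.fromstring(alto_xml)
    lines = []
    for text_line in root.iter("TextLine"):
        parts = []
        for child in text_line:
            if child.tag == "String":
                # Assumed: the recognized text is stored in the CONTENT attribute.
                parts.append(child.get("CONTENT", ""))
            elif child.tag == "SP":
                # A space element separates two strings on the same line.
                parts.append(" ")
        lines.append("".join(parts))
    return lines


if __name__ == "__main__":
    print(top_level_sections(SAMPLE))
    print(extract_lines(SAMPLE))
```

Walking the children of each TextLine in document order, rather than collecting all String elements at once, keeps the spaces in their original positions.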
ALTO (Analyzed Layout and Text Object) is an XML Schema that defines technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema used within the administrative metadata section of the Metadata Encoding and Transmission Schema (METS). However, ALTO instances can also exist as standalone documents, used independently of METS.
October 29, 2012