Baseprint Document Format (BDF)

BDF is a format for encoding a Baseprint document snapshot. These snapshots can be identified with a SoftWare Hash IDentifier (SWHID). Baseprint document snapshots exemplify the concept of "baseprint" discussed in the document "What is a baseprint?".

Objectives

BDF aims to minimize format rot. Unlike authoring formats such as LaTeX and Markdown, BDF is designed for redistribution and archiving.

BDF'23

BDF'23 is the version of BDF supported by the software that could read and write BDF circa 2023.

Technically, BDF'23 is not a file format but a format for a directory-like data structure. This structure is addressable as a Git tree and SWHID directory.

While it is being worked on, BDF data is often stored temporarily in a file system directory. For public long-term storage, however, BDF data is stored in a SWHID-addressable Git tree or an equivalent "directory" object in the Software Heritage Archive.
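To make the addressing concrete, here is a minimal sketch of how a flat directory could be hashed as a Git tree and written as a SWHID. It assumes that the SWHID of a directory is `swh:1:dir:` followed by the hex Git tree hash, and it handles only regular, non-executable files with no subdirectories. The function names and the sample content are hypothetical and are not taken from any Baseprint tool.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> bytes:
    """Return the raw 20-byte SHA-1 that Git assigns to an object."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).digest()


def flat_tree_id(files: dict[str, bytes]) -> bytes:
    """Hash a flat directory (regular files only, no subdirectories)."""
    body = b""
    # Git orders tree entries by the raw bytes of their names.
    for name in sorted(files, key=lambda n: n.encode("utf-8")):
        blob_id = git_object_id("blob", files[name])
        # "100644" is the mode Git records for a non-executable regular file.
        body += b"100644 " + name.encode("utf-8") + b"\0" + blob_id
    return git_object_id("tree", body)


def directory_swhid(files: dict[str, bytes]) -> str:
    """Format the tree hash as a SWHID directory identifier (assumed form)."""
    return "swh:1:dir:" + flat_tree_id(files).hex()


# Hypothetical snapshot content, for illustration only.
snapshot = {"article.xml": b"<article/>\n"}
print(directory_swhid(snapshot))
```

Because the identifier depends only on file names, modes, and contents, the same snapshot should get the same identifier whether it sits in a local directory, a Git repository, or an archive.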

A BDF'23 directory contains a file named article.xml, encoded in a subset of the JATS XML format. This file format can informally be referred to as Baseprint JATS XML. As of October 2023, all applications that read Baseprint document snapshots encoded in BDF'23 use the epijats Python library.
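As a rough illustration of the layout described above, the following sketch checks that a directory holds an article.xml file whose root element is `article`, the root element of a JATS article. It uses only the Python standard library rather than epijats, and it does not check conformance to the Baseprint JATS XML subset. The function name `looks_like_bdf_directory` is hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def looks_like_bdf_directory(directory: Path) -> bool:
    """Cheap sanity check: article.xml exists and its root is <article>."""
    article = directory / "article.xml"
    if not article.is_file():
        return False
    try:
        root = ET.parse(article).getroot()
    except ET.ParseError:
        # Not well-formed XML, so it cannot be JATS.
        return False
    return root.tag == "article"


# Demonstration on a throwaway directory with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "article.xml").write_text(
        "<article><front/><body/></article>", encoding="utf-8"
    )
    print(looks_like_bdf_directory(demo))  # prints True
```

A check like this only rules out obviously wrong input. Actually reading a snapshot is the job of a library such as epijats.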

BDF Reading Software

epijats:
An open-source library used by perm.pub, popgen.es, BaseprintPress, and Baseprinter (for previews).

Snapshot Writing Software

Baseprinter:
A BDF authoring tool available through GitHub Actions, a container, or a local installation.
Pandoc:
A document converter that can output JATS XML, which can be BDF-compatible.
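As a sketch of the Pandoc route, the snippet below builds and runs a Pandoc command that writes standalone JATS XML. The `--to jats`, `--standalone`, and `--output` options are standard Pandoc options. The helper names and file names are hypothetical, and nothing here guarantees that the result falls within the JATS subset that BDF'23 readers accept.

```python
import shutil
import subprocess


def pandoc_jats_command(source: str, output: str) -> list[str]:
    """Build a Pandoc command line that writes standalone JATS XML."""
    return ["pandoc", source, "--to", "jats", "--standalone", "--output", output]


def convert_to_jats(source: str, output: str) -> None:
    """Run Pandoc, failing early if it is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(pandoc_jats_command(source, output), check=True)
```

For example, `convert_to_jats("paper.md", "article.xml")` would write article.xml next to a hypothetical paper.md, after which the output can be checked with BDF reading software.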