What is PDF?

What is PDF and why is it indispensable for exchanging electronic documents?

Updated: August 26, 2024 Author: Vitaly

Find out what PDF is and how it became the standard for document exchange. This article will reveal all the secrets of this popular format, including its technical characteristics and unique capabilities. You will also learn how to create and edit a PDF.

What is the PDF format?

PDF (Portable Document Format) is a file format designed to present documents consistently, regardless of the software, hardware, or operating system used to view them. The format faithfully reproduces every component of the layout: text, images, and other graphic elements.

PDF differs from popular image formats (JPEG, PNG, and others) because it is designed to represent a paper page, not a grid of pixels. The main characteristics of a raster image are usually its size in pixels and its color depth. A vector PDF, by contrast, can be rendered at almost any pixel resolution, and its pages are measured in physical units such as inches. In addition, a PDF often contains multiple pages.
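To make the difference concrete, here is a minimal sketch of how a viewer might turn a page size into pixels. It assumes the default PDF unit of one point (1/72 inch); the function name `page_size_in_pixels` and the chosen DPI values are illustrative only.

```python
POINTS_PER_INCH = 72  # default PDF unit: 1 point = 1/72 inch


def page_size_in_pixels(width_pt, height_pt, dpi):
    """Return the (width, height) in pixels of a page rasterized at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 8.5 x 11 inches, i.e. 612 x 792 points.
LETTER = (612, 792)

for dpi in (72, 150, 300):
    print(dpi, "dpi ->", page_size_in_pixels(*LETTER, dpi))
```

The same page yields a different pixel size at every resolution, which is why a PDF page has no single "size in pixels".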

Unlike other document formats (docx, xlsx, and others), PDF is designed to be independent of the specific program used for viewing. A correctly formed PDF is displayed identically on any operating system, regardless of which fonts are installed on the computer, while a typical docx file can look quite different depending on whether you open it in Microsoft Word or OpenOffice. On the other hand, docx is very easy to edit, unlike PDF.

Format details

PDF was developed by Adobe Systems in 1993. The format is based on technology that originates from PostScript, a page description language that Adobe began developing in 1982 and released in 1984. PostScript was originally intended to control printers and ensure that documents print correctly. When you read a PDF on a computer or smartphone, the viewer emulates the operation of a printer, but instead of paper, the output is displayed on the screen.

PDF documents contain various objects such as text, images, forms, and annotations. When the viewer opens a file, it executes a set of instructions that lay out all the objects on each page, so that the user sees the document as intended.
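The idea of "objects plus drawing instructions" can be illustrated by assembling a tiny one-page PDF by hand. This is only a sketch, not how real authoring tools work: the helper name `build_minimal_pdf` is hypothetical, the text is assumed to be plain ASCII without parentheses or backslashes, and the page relies on the standard Helvetica font rather than an embedded one.

```python
def build_minimal_pdf(text="Hello, PDF"):
    """Assemble a one-page PDF by hand and return it as bytes."""
    # Content stream: the drawing instructions the viewer executes.
    # BT/ET delimit text, Tf selects the font, Td moves, Tj shows text.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object in the file
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: lets a viewer jump straight to any object.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


pdf_bytes = build_minimal_pdf()
print(len(pdf_bytes), "bytes")
```

Writing `pdf_bytes` to a file with a `.pdf` extension should give a document that common viewers can open; the point of the sketch is that the page is described by instructions, not stored as pixels.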

A PDF is made up of several kinds of components:

  • Object. Contains text, images and other graphic elements.
  • Structural. The logical structure of a document, including headings, paragraphs, and metadata.
  • Fonts. PDF can store embedded fonts to ensure correct text display.
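  • Data compression. The PDF format supports various compression algorithms, including Flate (the algorithm used in ZIP), LZW, and JPEG, to reduce file size. Flate and LZW are lossless, while JPEG trades a small loss of image quality for much smaller files.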
  • Elements of interactivity. Later versions of the format support embedded multimedia objects like video and audio, as well as interactivity elements such as hyperlinks, forms, and buttons.
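As a small illustration of lossless compression, the sketch below uses Python's standard `zlib` module, which implements the same Deflate algorithm that PDF's Flate filter is based on. The sample data is made up for the demonstration and is not taken from a real document.

```python
import zlib

# Repetitive data compresses well, much like the content of a typical page.
original = b"BT /F1 12 Tf 72 720 Td (The same line of text) Tj ET\n" * 200

compressed = zlib.compress(original, 9)   # 9 = strongest compression level
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("lossless round trip:", restored == original)
```

The restored bytes are identical to the input, which is what "without losing quality" means for the lossless filters; JPEG-compressed images do not have this property.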

Using PDF, you can create documents of any complexity, such as booklets, e-books, or manuals with complex graphic elements, while ensuring that they display correctly on the screen.

Characteristics and capabilities of the format

Over the course of its development, the PDF format has evolved into several variants, each designed to solve specific problems and meet specific requirements.

  • PDF/A. A special version of the format designed for long-term archiving of documents. PDF/A eliminates elements that could cause rendering instability or the inability to open the file in the future, such as encryption or links to external resources.
  • PDF/X. It was developed for professional printing and ensures that all file elements needed for printing, including fonts and images, are embedded in the document. This format eliminates features such as interactivity and encryption to ensure predictability and stability when printed.
  • PDF/E. This version is used in engineering and technical fields. It supports 3D drawings, engineering data and other specialized elements, making it useful for sharing and archiving engineering documents.
  • PDF/UA. It is aimed at making documents accessible to people with disabilities. It includes structure (tagging) and metadata requirements to ensure correct interaction with screen readers and other assistive technologies.
  • PDF/VT. This version is designed for variable and transactional printing, which is widely used in the custom printing industry, for example to produce invoices, notices, and other variable-data documents.
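A file that claims PDF/A conformance normally says so in its XMP metadata through a `pdfaid:part` entry. The sketch below looks for that marker in raw bytes. It rests on assumptions: the metadata is stored uncompressed, the conventional `pdfaid` prefix is used, and the function name `guess_pdfa_part` is hypothetical. It only reports what the file declares and does not validate conformance.

```python
import re

# Matches both XMP spellings: <pdfaid:part>1</pdfaid:part> and pdfaid:part="1".
_PDFA_PART = re.compile(rb"pdfaid:part\s*(?:>|=\s*[\"'])\s*(\d)")


def guess_pdfa_part(data):
    """Return the declared PDF/A part number, or None if no marker is found."""
    match = _PDFA_PART.search(data)
    return int(match.group(1)) if match else None


# Made-up metadata fragment for demonstration purposes.
sample = (b'<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">'
          b"<pdfaid:part>1</pdfaid:part>"
          b"<pdfaid:conformance>B</pdfaid:conformance>"
          b"</rdf:Description>")

print(guess_pdfa_part(sample))
```

For real checks, a dedicated validator is the appropriate tool, since a declaration alone does not prove that the file follows the standard.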

It is also worth noting that PDF documents support MRC (Mixed Raster Content) technology. This technology divides a page image into three layers (background, mask, and foreground), each of which can be compressed using the algorithm best suited to it. The background contains large areas such as colored fills or images. The mask defines the outlines and boundaries of objects, and the foreground contains small details, such as text or small pictures.

All three layers are compressed individually using appropriate algorithms, such as JPEG for the background and JBIG2 for text. JBIG2 groups similar-looking characters into clusters, creating a single dictionary for the entire document. As a result, each symbol is stored only once, together with a list of the coordinates where it appears on the page. In the final step, the viewer recombines the layers into a single image. This approach provides a significant reduction in file size without noticeable loss of quality.
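The dictionary idea can be sketched in a few lines. This is a deliberately simplified toy, not the JBIG2 algorithm: it merges only bit-for-bit identical glyph bitmaps, whereas a real encoder also matches shapes that are merely similar. The function name and the tiny bitmaps are invented for illustration.

```python
def build_symbol_dictionary(glyphs):
    """Store each distinct bitmap once and record where it is placed.

    `glyphs` is a list of (bitmap, x, y), where bitmap is a tuple of row strings.
    """
    dictionary = []   # unique bitmaps, each stored only once
    index_of = {}     # bitmap -> its position in `dictionary`
    placements = []   # (symbol index, x, y) for every occurrence on the page
    for bitmap, x, y in glyphs:
        if bitmap not in index_of:
            index_of[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return dictionary, placements


# Two tiny made-up glyph shapes.
GLYPH_A = (".#.", "#.#", "###", "#.#")
GLYPH_B = ("##.", "###", "#.#", "###")

# A scanned "page" containing the sequence A, B, A.
page = [(GLYPH_A, 0, 0), (GLYPH_B, 8, 0), (GLYPH_A, 16, 0)]

dictionary, placements = build_symbol_dictionary(page)
print(len(dictionary), "symbols stored for", len(placements), "glyphs on the page")
```

On a page of text, the number of distinct symbols is far smaller than the number of glyphs, which is where the size savings come from.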

MRC technology is especially useful for files with text and images, making it great for digitizing, scanning, and archiving paper documents.

Comparison of PDF with other electronic formats of printed documents

In addition to PDF, there are other formats intended to serve as an electronic version of a printed document. PDF differs from them in being more versatile: it supports a greater variety of content types and offers efficient compression for each. This is both an advantage and a disadvantage of the PDF format. On the one hand, it can replace almost all other formats of this type without losing functionality. On the other hand, this versatility has made the PDF format overly complex and expensive to maintain. For example, the specification for version 1.7 of the PDF standard, which is not even the latest, runs to 747 pages.

Differences between PDF and DjVu

What is PDF and how is it different from DjVu

DjVu is a format developed in the 1990s for compressing scanned documents and images. Its main feature is the use of image segmentation and compression technology, which can significantly reduce file sizes while maintaining fairly high quality. However, there are significant differences between PDF and DjVu:

  • Functionality. PDF supports a wide range of features, including text, vector graphics, multimedia, hyperlinks and forms, making it a versatile tool for creating and distributing documents. DjVu is focused exclusively on scanned images of text documents.
  • Resolution. A vector PDF can be rasterized at any resolution, whereas a DjVu file is tied to the resolution at which the original image was scanned.
  • File size. DjVu is able to significantly reduce file size compared to PDF because it works exclusively with images and does not support embedded fonts. Therefore, it is often used for archiving and distributing large volumes of scanned documents, such as books or old manuscripts.
  • Compatibility. PDF is recognized as an international standard (ISO 32000) and is supported on all major platforms and devices. It can be opened with standard software such as the Microsoft Edge browser on Windows. DjVu, on the other hand, is less common and often requires special programs or plugins to open it.

PDF vs EPUB comparison

What is PDF and how is it different from EPUB

EPUB (Electronic Publication) is a file format for electronic books and other publications intended for reading on a screen. Despite its apparent similarity to PDF, the two formats have fundamental differences that make them suitable for different tasks.

The main advantage of EPUB is its ability to adapt to different screen sizes and font changes. Content automatically adjusts to fit the screen size and orientation, making it easy to read on a variety of devices, from smartphones to large monitors. PDF, on the other hand, maintains fixed formatting, which ensures that the document displays accurately, but may not be easy to read on small screens.
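The difference between reflowable and fixed layout can be hinted at with plain text. The sketch below uses Python's standard `textwrap` module to re-wrap the same sentence for two assumed screen widths; real EPUB readers do far more (fonts, images, styles), so this only illustrates the principle.

```python
import textwrap

text = ("Content automatically adjusts to fit the screen size and "
        "orientation, making it easy to read on a variety of devices.")


def reflow(text, columns):
    """Break `text` into lines no wider than `columns` characters."""
    return textwrap.wrap(text, width=columns)


phone = reflow(text, 30)     # narrow screen: more, shorter lines
monitor = reflow(text, 80)   # wide screen: fewer, longer lines

print("\n".join(phone))
print("-" * 80)
print("\n".join(monitor))
```

The words are the same in both cases; only the line breaks change. A PDF, by contrast, fixes the line breaks once, when the document is created.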

Like PDF, EPUB supports embedded multimedia elements and various interactive components, like hyperlinks and annotations. At the same time, its ability to adapt content to the user’s screen makes it a more convenient solution for mobile devices.

PDF is widely used in professional and business circles to exchange documents that should look the same on all platforms. EPUB is more common in the field of e-books and educational materials, where the adaptability of content for different devices is important.

EPUB is based on XHTML and other web technologies. An EPUB file is essentially a ZIP archive that stores a miniature website with all its components: markup, styles, images, and fonts. To verify this, just open the file with an archiver.
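The same check can be done programmatically. The sketch below builds a toy EPUB-like archive in memory and lists its contents with Python's standard `zipfile` module. The file names under `OEBPS/` follow a common convention but are assumptions, the XML bodies are placeholders, and the result is not a valid, complete EPUB.

```python
import io
import zipfile


def make_toy_epub():
    """Build a tiny EPUB-like ZIP archive in memory and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry comes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        # Placeholder bodies: a real book needs proper XML in these files.
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/chapter1.xhtml", "<html><body>Hi</body></html>")
        archive.writestr("OEBPS/style.css", "body { margin: 1em; }")
    return buffer.getvalue()


def list_epub(data):
    """Return the names of the files stored inside the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


for name in list_epub(make_toy_epub()):
    print(name)
```

To inspect a real book, `zipfile.ZipFile("book.epub").namelist()` works the same way, where `book.epub` stands for any EPUB file you have on disk.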

EPUB file structure

How to open and edit PDF

All modern browsers can open PDFs without any extra extensions. However, their editing and commenting capabilities are limited or absent. For these tasks, you need a dedicated viewer, for example Adobe Acrobat Reader, which is free and can be installed on Windows or macOS. Mobile versions are available for iOS and Android.

What is PDF and how to edit it in Acrobat Reader

Adobe does not release Acrobat Reader for the Linux family of operating systems, but Linux users can turn to alternatives. For example, Okular is a cross-platform open-source viewer from the KDE project that covers most of the viewing and annotation capabilities of Acrobat Reader, is free, and is available for both Linux and Windows.

Opening a PDF document in Okular

You can also create a PDF file with Microsoft Word or its free alternative, LibreOffice Writer.

How to Save PDF in LibreOffice Writer

Editing a file saved as PDF is harder, because the format was not originally designed for it. It is usually easier to edit the original document and save it as a PDF again. For cases when this is not an option, Adobe offers the paid Acrobat Pro for $25 per month. There are also free and open-source solutions. For example, the LibreOffice suite includes a vector graphics editor called Draw, which works well with PDF documents and can edit them.

How to Edit PDF in LibreOffice Draw

If the capabilities of LibreOffice Draw seem insufficient to you, there are more professional free solutions such as Inkscape or Scribus. The latter program is similar to Adobe InDesign and has all the necessary functions for publishing and printing.

How to Edit PDF in Scribus
How to Edit PDF in Inkscape

Conclusion

PDF has become the standard for document exchange thanks to its technical characteristics and versatility. It provides reliable storage and transmission of data while preserving all design elements, which makes it essential for creating, distributing, and archiving documents in various fields. Compared to other formats like DjVu, PDF offers more functionality and better compatibility, but DjVu may be preferable when you need to minimize file size while digitizing and archiving scanned materials.