How to turn web pages into PDFs with Puppeteer and Node.js

As a web developer, you may want to create a PDF of a web page to share with your customers, use in presentations, or add to your web app as a new feature. Whatever your reason, Puppeteer, Google’s Node API for headless Chrome and Chromium, makes the task quite simple.

In this tutorial, we will see how to convert web pages to PDF with Puppeteer and Node.js. Let’s start with a quick introduction to what Puppeteer is.

What is Puppeteer, and why is it useful?

In Google’s own words, Puppeteer is “a Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol.”

What is a headless browser?

If you are unfamiliar with the term, a headless browser is a browser without a GUI. In other words, it is simply another browser that understands how to render HTML pages and process JavaScript. Because it has no GUI, you interact with a headless browser through a command line or from a program.

Even though Puppeteer runs the browser in headless mode by default, you can configure it to drive non-headless Chrome or Chromium.

What can you do with Puppeteer?

Puppeteer’s powerful browser capabilities make it an ideal candidate for web app testing and web scraping.

Here are a few use cases where Puppeteer gives web developers the right functionality:

  • Generate screenshots and PDFs of web pages
  • Automate form submission
  • Crawl web pages
  • Run automated UI tests while keeping the test environment up to date
  • Generate pre-rendered content for single-page applications (SPAs)

Set up the project environment

You can use Puppeteer on the backend and frontend to generate the PDF. In this tutorial, we are using a node backend for the task.

Initialize npm and set up a basic Express server to begin the tutorial.

Be sure to install the Puppeteer npm package before starting.
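A typical setup might look like the commands below. They assume a fresh, empty project directory and that Express is installed alongside Puppeteer; adjust them to your own project.

```shell
# Create a package.json with default values
npm init -y

# Install the web server framework and Puppeteer
npm install express puppeteer
```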

Convert web pages to PDF

Now we come to the exciting part of the tutorial. With Puppeteer, we only need a few lines of code to convert web pages to PDF.

First, create a browser instance using Puppeteer’s launch function.

After that, we create a new page and navigate to the given URL.

We set the waitUntil option to networkidle0. With this option, Puppeteer considers navigation finished once there have been no network connections for at least 500 ms. It is one way to determine whether the site has finished loading. It is not exact, and Puppeteer offers other options, but it is one of the most reliable for most cases.
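The page-loading step can be sketched as a small helper. This is only an illustration: the name openPage is hypothetical, and it assumes the browser argument was created earlier with Puppeteer's launch function.

```javascript
// Hypothetical helper: opens a new tab in an already launched browser
// and navigates it to `url`.
// Assumes `browser` came from `await require('puppeteer').launch()`.
async function openPage(browser, url) {
  const page = await browser.newPage();

  // 'networkidle0' treats navigation as finished once there have been
  // no network connections for at least 500 ms.
  await page.goto(url, { waitUntil: 'networkidle0' });

  return page;
}
```

Passing the browser in, rather than launching it inside the helper, lets several pages share one browser instance.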

Finally, we create a PDF from the loaded page content and save it to our device.

The page’s pdf function is quite flexible and allows a lot of customization, which is fantastic. Some of the options we use are as follows:

  • printBackground: When this option is set to true, Puppeteer prints any background colors or images used on the web page in the PDF.
  • path: Specifies where to save the generated PDF file. If you leave it out, the PDF is only returned in memory, which avoids writing to disk.
  • format: You can set the PDF format to any of the given options: Letter, A4, A3, A2, etc.
  • margin: You can specify margins for the generated PDF with this option.
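The options above could be collected in one place, roughly as follows. The helper name buildPdfOptions, the A4 format, and the 20px margins are example choices made for this sketch, not values the tutorial requires.

```javascript
// Hypothetical helper that gathers the PDF options discussed above.
// The format and margin values are illustrative defaults.
function buildPdfOptions(outputPath) {
  const options = {
    format: 'A4',
    printBackground: true, // include background colors and images
    margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' },
  };

  // Only set `path` when a file should be written. Without it,
  // page.pdf() still resolves with the PDF bytes in memory.
  if (outputPath) {
    options.path = outputPath;
  }

  return options;
}
```

The result would be passed straight to the page, for example `await page.pdf(buildPdfOptions('page.pdf'))`.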

When the PDF creation is finished, close the browser connection with browser.close().
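Put together, the whole conversion might look like the sketch below. The function name savePageAsPdf and the launcher parameter are assumptions of this example: launcher defaults to the real puppeteer package and only exists so that a stand-in can be supplied, for instance in tests.

```javascript
// Sketch of the full flow: launch, load the page, write the PDF, close.
// `launcher` is a hypothetical parameter that defaults to the puppeteer
// package; it is not part of Puppeteer's own API.
async function savePageAsPdf(url, outputPath, launcher = require('puppeteer')) {
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    await page.pdf({ path: outputPath, format: 'A4', printBackground: true });
  } finally {
    // Close the browser even if navigation or PDF generation fails,
    // so no Chromium process is left running.
    await browser.close();
  }
}
```

A call such as `await savePageAsPdf('https://example.com', 'example.pdf')` would then write the file to the current directory.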

Create an API to generate and serve PDFs from URLs

With the knowledge gathered so far, we can now create a new endpoint that receives a URL as a query string and streams the generated PDF back to the client.

Here is the code:
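One possible shape for such an endpoint is sketched below. The factory name createPdfHandler is hypothetical, it takes the puppeteer module as an argument, and the URL validation and the 400 and 500 responses are additions made for this example.

```javascript
// Hypothetical factory: returns an Express-style handler for GET /pdf.
// `launcher` is expected to be the puppeteer module (or a stand-in).
function createPdfHandler(launcher) {
  return async function pdfHandler(req, res) {
    // Validate the `target` query parameter before starting a browser.
    let target;
    try {
      target = new URL(req.query.target);
    } catch (err) {
      res.status(400).send('Query parameter "target" must be a valid URL');
      return;
    }
    if (target.protocol !== 'http:' && target.protocol !== 'https:') {
      res.status(400).send('Only http and https URLs are supported');
      return;
    }

    let browser;
    try {
      browser = await launcher.launch();
      const page = await browser.newPage();
      await page.goto(target.href, { waitUntil: 'networkidle0' });

      // No `path` option, so the PDF stays in memory and is never written to disk.
      const pdf = await page.pdf({ format: 'A4', printBackground: true });

      res.set('Content-Type', 'application/pdf');
      res.send(Buffer.from(pdf));
    } catch (err) {
      res.status(500).send('Failed to generate the PDF');
    } finally {
      if (browser) {
        await browser.close();
      }
    }
  };
}
```

In an Express app, the handler would be mounted with something like `app.get('/pdf', createPdfHandler(require('puppeteer')))` followed by `app.listen(3000)`.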

If you start the server and visit the /pdf route with a target query parameter containing the URL we want to convert, the server will serve the generated PDF directly without storing it on disk.

URL example: http://localhost:3000/pdf?target=

This will generate a PDF like the one shown in the image: