viavario / pdfgenerator
PDF Generator using headless Chrome on AWS Lambda
Requires
- guzzlehttp/guzzle: 7.7.0
Requires (Dev)
- clean/phpdoc-md: ^0.19.1
PDFGenerator allows you to easily generate PDF documents or PNG screenshots of any webpage using PHP.
In the past we used PhantomJS, wkhtmltopdf and TCPDF to generate PDFs from a given URL or HTML. However, these libraries are no longer actively developed or maintained and lack support for newer HTML and CSS features. Google Chrome has a built-in feature to capture a webpage as a PDF or PNG, with better support for those features.
Using Amazon Lambda and Amazon API Gateway, you can minimize running costs and get scalable performance and flexible security controls. Lambda runs headless Chrome as an on-demand service that takes screenshots of any website or submitted HTML content, without you having to set up a server. On the API Gateway you can configure security controls such as API keys, rate limiting, and quotas to protect your screenshot service against third parties or inappropriate use.
Puppeteer is used to communicate with Chrome to load content and take screenshots programmatically.
Requirements
To deploy Chrome on Amazon Lambda you will need:
- An Amazon AWS account to deploy the Lambda function
- Node.js installed on your local machine
- Serverless installed on your local machine
To use this library in PHP:
- Composer
- PHP 7.2.5 or later (the minimum supported by the required guzzlehttp/guzzle 7.7.0)
Deploying Chrome on Amazon Lambda
- Go to the official Node.js website, download the installer, and follow the installation instructions to install Node.js on your local machine.
- Install the Serverless Framework via npm, which was installed along with Node.js.
Open up a terminal and run the following command to install Serverless:
npm install -g serverless
Next, you'll have to set up your credentials so that Serverless is able to connect to your Amazon AWS account to deploy the Lambda function and API Gateway.
The easiest way is to create a new IAM user with its own set of AWS access keys and attach a custom JSON policy that limits the Serverless Framework's access to your AWS account. This way you don't have to sign up for an account on https://serverless.com either. Follow these steps:
- Log in to your AWS account and go to the Identity & Access Management (IAM) page.
- Click on Users and then Add user. Enter a name in the first field to remind you that this user is related to the service you are deploying with the Serverless Framework, like serverless-servicename-agent1. Enable Programmatic access by clicking the checkbox. Click Next to go through to the Permissions page. Click on Create policy. Select the JSON tab, and add a JSON file. You can use this gist as a guide.
- When you are finished, select Review policy. You can assign this policy a Name and Description, then choose Create policy. Check that everything looks good and click Create user. Later, you can create different IAM users for different apps and different stages of those apps. That is, if you don't use separate AWS accounts for stages/apps, which is most common.
- View and copy the access key and secret of the new IAM user, replace <your-key-here> and <your-secret-key-here> in the command below with them, and run the command:
serverless config credentials --provider aws --key <your-key-here> --secret <your-secret-key-here>
- Navigate to the screenshot-service directory in this repository in your terminal window and install the required packages by running:
npm install
- Open screenshot-service/serverless.yml and change the region on line 10 to the region you wish to deploy to on Amazon AWS. Also change the corresponding reference to the required Lambda layer on line 27. See https://github.com/shelfio/chrome-aws-lambda-layer and https://docs.aws.amazon.com/general/latest/gr/rande.html#regional-endpoints for more information about the available regions and the corresponding references for the Lambda layer.
region: eu-west-3
...
layers: # reference to the already existing layer with Chrome
- arn:aws:lambda:eu-west-3:764866452798:layer:chrome-aws-lambda:31
- Next, deploy the screenshot service to Amazon AWS:
serverless deploy --stage production
If you haven't configured any API keys yet in the API Gateway for the screenshot service, you can simply copy the endpoint URL of the GET request, append ?filename=screenshot.png&url=[URL of your website] to the endpoint URL, and open it in your browser.
Note that you have to set API Key Required to false for the /capture - GET - Method Request in the API to test this without API keys:
- Go to Amazon API Gateway
- Open the API for the screenshot service
- Click on GET under /capture in the Resources panel
- Click on Method Request
- Click on the pencil icon next to API Key Required, change the setting from true to false, and click the check mark icon to save the setting.
- Click on the Actions button, and choose Deploy API from the menu to redeploy the API to test the GET-request without API Keys.
Make sure to set API Key Required to true after you have verified that the screenshot service works in order to prevent unauthorized use or access.
For example: https://##########.execute-api.eu-west-3.amazonaws.com/dev/capture?filename=screenshot.png&url=https://www.google.com/finance/quote/TSLA:NASDAQ?hl=en
returns a screenshot of the Tesla stock page on Google Finance.
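The target URL in the example above contains its own `?`, so it is safer to percent-encode it before appending it to the endpoint. The following is a minimal sketch of that step. `buildCaptureUrl()` and the `example` host name are hypothetical, and the sketch assumes the service reads the `url` parameter after standard query-string decoding.

```php
<?php
/**
 * Build a test URL for the /capture endpoint.
 * buildCaptureUrl() is a hypothetical helper, not part of the library.
 */
function buildCaptureUrl(string $endpoint, string $targetUrl, string $filename = 'screenshot.png'): string
{
    // http_build_query() percent-encodes the values, so a "?" or "&" inside
    // the target URL cannot be mistaken for part of the outer query string.
    $query = http_build_query(
        ['filename' => $filename, 'url' => $targetUrl],
        '',
        '&',
        PHP_QUERY_RFC3986
    );

    return rtrim($endpoint, '/') . '/capture?' . $query;
}

// Placeholder endpoint; replace it with the one printed by `serverless deploy`.
echo buildCaptureUrl(
    'https://example.execute-api.eu-west-3.amazonaws.com/dev/',
    'https://www.google.com/finance/quote/TSLA:NASDAQ?hl=en'
), PHP_EOL;
```

Trimming the trailing slash from the endpoint means the helper works whether or not the endpoint was copied with one.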
Securing your screenshot service
To protect the screenshot service against unauthorized use or access, set up API keys for the API Gateway. If you do not, other people may be able to use your installation to create screenshots, which will add to your bill.
Make sure to redeploy the API for the API Key requirement to take effect!
See https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-setup-api-key-with-console.html on how to set up API Keys.
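Once an API key is required, a plain browser request no longer works, because the key has to be sent with each request. By default, API Gateway reads it from the `x-api-key` header. The following is a minimal sketch of a manual check from PHP without this library. `apiKeyContextOptions()` is a hypothetical helper and the timeout is an arbitrary choice.

```php
<?php
/**
 * Build the options for a stream context that sends the API key.
 * apiKeyContextOptions() is a hypothetical helper, not part of the library.
 */
function apiKeyContextOptions(string $apiKey, float $timeoutSeconds = 30.0): array
{
    return [
        'http' => [
            'method'        => 'GET',
            // API Gateway reads API keys from the x-api-key header by default.
            'header'        => "x-api-key: {$apiKey}\r\n",
            'timeout'       => $timeoutSeconds,
            // Return the body on 4xx/5xx too, so an error message can be inspected.
            'ignore_errors' => true,
        ],
    ];
}

$options = apiKeyContextOptions('<your-api-key>');
$context = stream_context_create($options);

// Placeholder URL; uncomment once it points at your own deployment.
// $body = file_get_contents('https://##########.execute-api.eu-west-3.amazonaws.com/dev/capture?filename=screenshot.png&url=https%3A%2F%2Fexample.com', false, $context);
```

A request that omits the header, or sends a key that is not attached to a usage plan for the API, is typically rejected with HTTP 403.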
Installation of this library
composer require viavario/pdfgenerator
Usage
<?php
include_once __DIR__.'/vendor/autoload.php';

use viavario\pdfgenerator\PDFGenerator;

// The invoke URL of the API Gateway, or the endpoint returned by the
// serverless deploy command, without `capture` at the end
$endpoint = 'https://##########.execute-api.eu-west-3.amazonaws.com/dev/';
$apiKey   = '<your-api-key>'; // Your API key configured in the API Gateway
$filename = 'screenshot.pdf'; // Change to screenshot.png to get a PNG image

$generator = new PDFGenerator($endpoint, $apiKey);
$generator->setURL('https://google.com')
    ->setFilename($filename)
    ->setMargins('1.5cm')
    ->setFormat('A4')
    // The screenshot service automatically increases the height of the viewport
    // to take a full page screenshot
    ->setViewportSize(1920, 1080)
    ->setOrientation(PDFGenerator::ORIENTATION_PORTRAIT);

try {
    // Without a filename argument, generate() returns the file as binary data
    $data = $generator->generate();
    // Change the Content-Type to image/png if you changed the extension of the
    // filename to .png
    header('Content-Type: application/pdf');
    echo $data;
} catch (\Exception $e) {
    echo $e->getMessage();
}
Methods
PDFGenerator::__construct
Description
public __construct (string $endpoint, string $apiKey)
Class constructor.
Parameters
(string) $endpoint
: The endpoint URL
(string) $apiKey
: The API key configured in the API Gateway
Return Values
void
PDFGenerator::displayHeaderFooter
Description
public displayHeaderFooter (bool $displayHeaderFooter)
Display the footer and header template.
Parameters
(bool) $displayHeaderFooter
: set to true to display the header and footer in the PDF document
Return Values
PDFGenerator
PDFGenerator::generate
Description
public generate (string $filename)
Generate a screenshot.
Parameters
(string) $filename
: The filename to write to (defaults to null which will cause the method to return binary data)
Return Values
mixed
If no filename is specified, then the generated file is returned as binary data.
If a filename is specified, then true is returned when the file was written
successfully, or false otherwise.
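The two return modes call for different handling: binary data when no filename is given, and a boolean when one is. The following is a minimal sketch of a wrapper that makes the difference explicit. `capture()` is a hypothetical helper, not part of the library. It expects a configured PDFGenerator but only relies on the `generate()` behaviour described above.

```php
<?php
/**
 * Write a capture to disk, or return the raw bytes when no path is given.
 * capture() is a hypothetical wrapper, not part of the library.
 *
 * @param object      $generator a configured PDFGenerator (or compatible object)
 * @param string|null $path      where to write the file, or null for binary data
 * @return string|true binary data without a path, true after a successful write
 */
function capture(object $generator, ?string $path = null)
{
    if ($path === null) {
        // No filename: generate() returns the file contents as binary data.
        return $generator->generate();
    }

    // With a filename: generate() returns true on success, false otherwise.
    if ($generator->generate($path) !== true) {
        throw new RuntimeException("Could not write capture to {$path}");
    }

    return true;
}
```

For example, `capture($generator, __DIR__ . '/report.pdf')` writes the file and throws if the write failed, while `capture($generator)` returns the bytes for streaming to the client.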
PDFGenerator::omitBackground
Description
public omitBackground (bool $omitBackground)
Hides default white background and allows capturing screenshots with transparency. Defaults to false.
Parameters
(bool) $omitBackground
: set to true to omit white backgrounds
Return Values
PDFGenerator
PDFGenerator::preferCSSPageSize
Description
public preferCSSPageSize (bool $preferCSSPageSize)
Give any CSS @page size declared in the page priority over what is declared in width and height or format options. Defaults to false, which will scale the content to fit the paper size.
Parameters
(bool) $preferCSSPageSize
: set to true to give the CSS page size priority
Return Values
PDFGenerator
PDFGenerator::printBackground
Description
public printBackground (bool $printBackground)
Set to true to print backgrounds.
Parameters
(bool) $printBackground
: Set to true to print backgrounds
Return Values
PDFGenerator
PDFGenerator::setContent
Description
public setContent (string $content)
Sets the content.
Parameters
(string) $content
: the HTML content to take a screenshot of
Return Values
PDFGenerator
PDFGenerator::setFilename
Description
public setFilename (string $filename)
Sets the filename.
Parameters
(string) $filename
: Set the filename for the output. Only extensions .pdf and .png are allowed.
Return Values
PDFGenerator
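Because the filename extension decides whether a PDF or a PNG is produced, the Content-Type header sent with the result has to match it. The following is a minimal sketch of that mapping. `contentTypeFor()` is a hypothetical helper, not part of the library. It lowercases the extension for the lookup. Whether the library itself accepts upper-case extensions is not stated.

```php
<?php
/**
 * Map an output filename to the Content-Type header value to send with it.
 * contentTypeFor() is a hypothetical helper, not part of the library.
 */
function contentTypeFor(string $filename): string
{
    // Only the two extensions that setFilename() allows.
    $types = ['pdf' => 'application/pdf', 'png' => 'image/png'];
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    if (!isset($types[$extension])) {
        throw new InvalidArgumentException(
            "Unsupported extension in \"{$filename}\"; use .pdf or .png"
        );
    }

    return $types[$extension];
}
```

With this helper, `header('Content-Type: ' . contentTypeFor($filename));` keeps the header in step with the value passed to `setFilename()`.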
PDFGenerator::setFooterTemplate
Description
public setFooterTemplate (string $html)
Sets the Footer template.
Should be valid HTML markup with the following classes used to inject printing values into them:
- date: formatted print date
- title: document title
- url: document location
- pageNumber: current page number
- totalPages: total pages in the document
Parameters
(string) $html
: HTML Markup
Return Values
PDFGenerator
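The classes listed above are placed on empty elements, which Chrome fills in when it prints each page. The following is a minimal sketch of a footer that shows a label and a page counter. `footerTemplate()` is a hypothetical helper, not part of the library. The inline font size is an assumption: page styles are not applied inside print templates, so text without an explicit size may be too small to read.

```php
<?php
/**
 * Build a footer such as "Quarterly report - page 2 of 7".
 * footerTemplate() is a hypothetical helper, not part of the library.
 */
function footerTemplate(string $label): string
{
    // Escape the caller's text; the empty spans are filled in by Chrome.
    $safeLabel = htmlspecialchars($label, ENT_QUOTES, 'UTF-8');

    return '<div style="font-size: 9px; width: 100%; text-align: center;">'
        . $safeLabel
        . ' - page <span class="pageNumber"></span>'
        . ' of <span class="totalPages"></span>'
        . '</div>';
}
```

A call such as `$generator->displayHeaderFooter(true)->setFooterTemplate(footerTemplate('Quarterly report'));` would then apply it. The bottom margin probably needs to be large enough to leave room for the footer.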
PDFGenerator::setFormat
Description
public setFormat (void)
Sets the format of the page.
The format options are:
- Letter: 8.5in x 11in
- Legal: 8.5in x 14in
- Tabloid: 11in x 17in
- Ledger: 17in x 11in
- A0: 33.1in x 46.8in
- A1: 23.4in x 33.1in
- A2: 16.54in x 23.4in
- A3: 11.7in x 16.54in
- A4: 8.27in x 11.7in
- A5: 5.83in x 8.27in
- A6: 4.13in x 5.83in
Parameters
The generated signature lists no parameters, but the usage example passes the format name as a string, as in setFormat('A4').
Return Values
PDFGenerator
PDFGenerator::setHeaderTemplate
Description
public setHeaderTemplate (string $html)
Sets the Header template.
Should be valid HTML markup with the following classes used to inject printing values into them:
- date: formatted print date
- title: document title
- url: document location
- pageNumber: current page number
- totalPages: total pages in the document
Parameters
(string) $html
: HTML Markup
Return Values
PDFGenerator
PDFGenerator::setHttpAuthentication
Description
public setHttpAuthentication (void)
Sets the HTTPAuthentication username and password.
Parameters
This function has no parameters.
Return Values
PDFGenerator
PDFGenerator::setMargins
Description
public setMargins (string $margins, string $right, string $bottom, string $left)
Sets the margins for the page.
Parameters
(string) $margins
: Top margin, or all margins
(string) $right
: Right margin
(string) $bottom
: Bottom margin
(string) $left
: Left margin
Return Values
PDFGenerator
return this instance
PDFGenerator::setOrientation
Description
public setOrientation (string $orientation)
Set the page orientation.
Parameters
(string) $orientation
: PDFGenerator::ORIENTATION_LANDSCAPE or PDFGenerator::ORIENTATION_PORTRAIT
Return Values
PDFGenerator
PDFGenerator::setURL
Description
public setURL (string $url)
Set the URL.
Parameters
(string) $url
: the URL of the page to take a screenshot of
Return Values
PDFGenerator
PDFGenerator::setViewportSize
Description
public setViewportSize (void)
Sets the viewport height and viewport width.
Parameters
The generated signature lists no parameters, but the usage example passes two integers, as in setViewportSize(1920, 1080).
Return Values
PDFGenerator