I recently stumbled upon a few good discussions on how to optimize PDF files for search engines, so I decided to sum up what I learned in one blog post.
The first question that arises: do search engines actually index PDF files?
The major search engines have been indexing PDFs for quite some time now. We’ve been seeing PDFs rank in the search results of all the major search engines, and for some time now Google has even offered HTML versions of some PDF files. This indicates that Google is in fact indexing the PDFs. People have also tested this directly, for example by searching for unique content buried deep inside a PDF to see whether Google returns the file, and it did.
Now that we know that Google does index these files, how do they rank?
PDF files appear in normal search results. They seem to get no “special treatment” and are ranked as any HTML document would be. This leads us to the conclusion that the same ranking factors apply to PDF files as to any other type of indexed web content. In other words, the ranking factors you take into consideration when optimizing a web page work for PDFs as well: link to them with relevant anchor text, and use bold, italic, H1, H2 and so on. Simply treat the PDF as any other URL and not as a “download file”. You should also be careful when choosing the file name, as it will be as important as, or even more important than, the names of the HTML files in the website’s URLs. For example, if this article were a PDF, I’d name it How_to_SEO_optimize_PDF_files.pdf
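If you publish a lot of PDFs, a tiny helper can turn a document title into a descriptive file name like the one above. This is a minimal Python sketch; the function name `pdf_filename` and the choice to treat any punctuation as a word break are my own assumptions, not part of any existing tool.

```python
import re

def pdf_filename(title: str, sep: str = "_") -> str:
    """Build a descriptive PDF file name from a title (hypothetical helper).

    Any run of characters other than ASCII letters and digits is treated
    as a word break, so punctuation never ends up in the URL.
    """
    words = re.sub(r"[^A-Za-z0-9]+", " ", title).split()
    return sep.join(words) + ".pdf"

print(pdf_filename("How to SEO optimize PDF files"))
# How_to_SEO_optimize_PDF_files.pdf
print(pdf_filename("How to SEO-optimize PDF files?", sep="-"))
# How-to-SEO-optimize-PDF-files.pdf
```

The separator is a parameter so you can switch to hyphens if you prefer them. Note that non-ASCII letters are dropped, so transliterate titles first if that matters for your site.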
Also watch out for the obvious things: don’t apply any encryption or password protection to PDFs you want indexed. And make sure the text in your file really is text, and was not turned into curves or images when you exported the PDF.
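Both problems can be spotted before you upload. The sketch below is a rough, standard-library-only Python heuristic, not a real PDF parser. It assumes that an `/Encrypt` entry anywhere in the raw file means the document is encrypted, and that text survived the export if some content stream (raw or zlib/FlateDecode-compressed) contains a `BT ... Tj/TJ ... ET` text block. Other compression filters or unusual export tools can fool it, so a dedicated PDF library (for example pypdf) is the more reliable check. The function names are hypothetical.

```python
import re
import zlib

# Body of each "stream ... endstream" section in the raw file bytes.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text object (BT ... ET) containing a Tj or TJ "show text" operator.
TEXT_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b.*?\bET\b", re.DOTALL)

def inspect_pdf_bytes(data: bytes) -> dict:
    """Rough stdlib-only checks; a heuristic, not a PDF parser."""
    # Encrypted PDFs point to an encryption dictionary via /Encrypt
    # in the trailer (or in the cross-reference stream dictionary).
    encrypted = b"/Encrypt" in data
    has_text = False
    for match in STREAM_RE.finditer(data):
        body = match.group(1)
        try:
            # Most content streams are FlateDecode (zlib). A decompressobj
            # tolerates trailing bytes such as the EOL before "endstream".
            body = zlib.decompressobj().decompress(body)
        except zlib.error:
            pass  # not zlib data: search the raw bytes instead
        if TEXT_RE.search(body):
            has_text = True
            break
    return {"encrypted": encrypted, "has_text": has_text}

def inspect_pdf(path: str) -> dict:
    """Read a PDF from disk and run the heuristics above."""
    with open(path, "rb") as f:
        return inspect_pdf_bytes(f.read())
```

Usage would be something like `inspect_pdf("How_to_SEO_optimize_PDF_files.pdf")` on your exported file. If a file is encrypted, its content streams are encrypted too, so `has_text` will usually come back `False` as well.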
One more piece of advice comes more from the usability point of view than the SEO one: keep your files small. Don’t use big images, and don’t create PDFs that weigh tens of megabytes. If your PDFs rank well, people will land directly on them, which means they will have to download or “stream” them. If your files are too big, this will result in a really high bounce rate, as people will not wait long for the PDF to load.
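To catch heavy files before they go live, you could list every PDF under your site’s document folder with its size and estimate how long each one takes to download. A minimal Python sketch: the 5 MB budget, the 4 Mbit/s connection and the `public/docs` folder are arbitrary assumptions (the advice above only warns against tens of megabytes), and the estimate ignores latency and protocol overhead.

```python
from pathlib import Path

# Arbitrary example budget in decimal megabytes; tune it for your audience.
MAX_PDF_BYTES = 5_000_000

def pdf_sizes(root: str) -> list[tuple[str, int]]:
    """Return (path, size in bytes) for every .pdf file under root."""
    return [
        (str(p), p.stat().st_size)
        for p in sorted(Path(root).rglob("*"))
        if p.is_file() and p.suffix.lower() == ".pdf"
    ]

def oversized(sizes: list[tuple[str, int]], limit: int = MAX_PDF_BYTES) -> list[tuple[str, int]]:
    """Keep only the entries larger than limit."""
    return [(path, size) for path, size in sizes if size > limit]

def download_seconds(size_bytes: int, mbit_per_s: float) -> float:
    """Idealized transfer time: bits to send divided by link speed."""
    return size_bytes * 8 / (mbit_per_s * 1_000_000)

# A 20 MB PDF on a 4 Mbit/s connection needs about 40 seconds.
print(download_seconds(20_000_000, 4))  # 40.0

docs = Path("public/docs")  # hypothetical folder holding your PDFs
if docs.is_dir():
    for path, size in oversized(pdf_sizes(str(docs))):
        print(f"{path}: {size / 1_000_000:.1f} MB")
```

Forty seconds of staring at a loading PDF is exactly the kind of wait that drives visitors back to the search results.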