KeyLimeTie Blog

PDFBox in .NET - Easily Convert PDFs to Text

By Brian Pautsch – 11/23/2005 8:57:45 PM. Posted to Code Snippets.

I've been working with the Lucene.NET library for over a year now and am constantly finding new ways to integrate it into websites and applications. Several of my customers use a customized version of the Desktop Search application, and I have integrated the Website Spider for Lucene that I created into many websites. For some time now, I've been looking for an open source way to extract the raw text from PDF files. I searched all over the place and have finally found it! PDFBox is an open source Java PDF library. The .NET version is made available through IKVM.NET.

Download code (5.8 MB, including all PDFBox DLLs)

To convert a PDF to text, it's this simple:
using org.pdfbox.pdmodel;
using org.pdfbox.util;
...
// Load the PDF, extract its text, and show it in the text box
PDDocument objDocument = PDDocument.load(strFileName);
PDFTextStripper objTextStripper = new PDFTextStripper();
txtText.Text = objTextStripper.getText(objDocument);
objDocument.close(); // release the file once the text has been extracted
...

It seems to run pretty fast: I extracted the text from a 2.3 MB PDF in 5.2 seconds.
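If you want to check the speed on your own files, here is a minimal sketch of how a timing like this could be measured. It assumes a console application that references the PDFBox and IKVM DLLs from the download; the class name PdfToTextTimer and the command-line argument are hypothetical, and Stopwatch requires .NET 2.0 or later.

```csharp
using System;
using System.Diagnostics;
using org.pdfbox.pdmodel;
using org.pdfbox.util;

// Hypothetical console wrapper around the snippet above; not part of the download.
class PdfToTextTimer
{
    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: PdfToTextTimer <file.pdf>");
            return;
        }

        // Time both loading the document and extracting its text
        Stopwatch timer = Stopwatch.StartNew();
        PDDocument document = PDDocument.load(args[0]);
        try
        {
            PDFTextStripper stripper = new PDFTextStripper();
            string text = stripper.getText(document);
            timer.Stop();

            Console.WriteLine("Extracted {0} characters in {1:F1} sec",
                text.Length, timer.Elapsed.TotalSeconds);
        }
        finally
        {
            // Always release the file, even if extraction throws
            document.close();
        }
    }
}
```

Results will vary with the machine and with how the PDF is built, so treat any single number as a rough guide only.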

Comments

On 1/8/2010 dips said:
I want to convert PDF to DOC. I am using PDFBox-0.7.3 from Code Project.

On 1/8/2010 Raghu said:
I'm using a web application. In it I referenced IKVM.GNU.Classpath.dll, IKVM.Runtime.dll, and PDFBox-0.7.2.dll, and when I executed it, an exception was thrown like...

On 1/8/2010 Maulik said:
Neat solution for anybody interfacing with PDF. Thanks for sharing your code!

On 12/4/2006 Tathagata Roy said:
But for a 25 MB PDF file it's taking almost 1 minute. How can I make it faster? Any ideas?

On 7/3/2006 Brian Pautsch said:
I just tried it and it works. Your browser might be blocking the download. Try right-clicking on the link and selecting "Save Target As...".

On 7/2/2006 Phil said:
The link to download the code does not work.

On 6/26/2006 Brian Pautsch said:
Which link isn’t working for you? I tried them all and they work.
