I've been working with the Lucene.NET library for over a year now and am constantly finding new ways to integrate it into websites and applications. Several of my customers use a customized version of the Desktop Search application. I have also integrated the Website Spider for Lucene I created into many websites. For some time now, I've been looking for an open source approach to extracting the raw text from PDF files. I searched all over the place and have finally found it!
PDFBox is an open source Java PDF library. The .NET version is available at IKVM.NET.
Download code (5.8 MB, including all PDFBox DLLs)
To convert a PDF to text, it's this simple:
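using org.pdfbox.pdmodel;
using org.pdfbox.util;
...
// Load the PDF and extract its raw text into the text box.
PDDocument objDocument = PDDocument.load(strFileName);
PDFTextStripper objTextStripper = new PDFTextStripper();
txtText.Text = objTextStripper.getText(objDocument);
objDocument.close(); // release the file once the text has been extracted
...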
It seems to run pretty fast: I extracted the text from a 2.3 MB PDF in 5.2 seconds.
Comments
On 1/8/2010 dips said:
I want to convert PDF to DOC. I am using PDFBox-0.7.3 from CodeProject.
On 1/8/2010 Raghu said:
I'm using a web application. In it I referenced IKVM.GNU.Classpath.dll, IKVM.Runtime.dll, and PDFBox-0.7.2.dll, and when I executed it, an exception was thrown like...
On 1/8/2010 Maulik said:
Neat solution for anybody interfacing with PDF. Thanks for sharing your code!
On 12/4/2006 Tathagata Roy said:
But for a 25 MB PDF file it's taking almost 1 minute. How can I make it faster? Any ideas?
On 7/3/2006 Brian Pautsch said:
I just tried it and it works. Your browser might be blocking it. Try right-clicking the link and selecting "Save Target As...".
On 7/2/2006 Phil said:
The link to download the code does not work.
On 6/26/2006 Brian Pautsch said:
Which link isn’t working for you? I tried them all and they work.