DevExpress PDF: Working with PDF Files in Code (What’s New in 13.2)

DevExpress Data Blog
19 November 2013

DevExpress PDF Viewer

In 13.1 we released a beta of our new PDF viewer. This addition to our control suite lets you view PDF documents inside your application. Because documents can be loaded by file name or from a stream, you control where they come from and how they are displayed.

Consider the scenario, however, where you have hundreds (or thousands) of PDF documents. Sooner or later the boss will ask for the one document that contains some specific text (any lawyers out there?). How do you search these documents efficiently without opening each one by hand? Starting in 13.2 we are greatly expanding what you can do with these documents in code.

using System;
using System.Linq;
using DevExpress.Pdf;

// Text to look for and the document to search.
var search = "parameters";
PdfDocumentProcessor processor = new PdfDocumentProcessor();
processor.LoadDocument("CSharpSpec.pdf");
var searchParams = new PdfTextSearchParameters
{
    CaseSensitive = false,
    WholeWords = true
};

// Each FindText call with the same arguments returns the next match.
var results = processor.FindText(search, searchParams);
while (results.Status == PdfTextSearchStatus.Found)
{
    // Join the words that make up this match into one string.
    var text = string.Join(", ",
                    results.Words.Select(p => p.Text).ToArray());

    Console.WriteLine("Found \"{0}\" on page {1}",
        text,
        results.PageIndex);

    // Advance to the next match.
    results = processor.FindText(search, searchParams);
}

Console.WriteLine("That's all folks!");
Console.ReadKey();

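Notice how little code it takes to load a document into the PDF document processor and search for specific text: call FindText repeatedly with the same arguments, and each call returns the next match until the status is no longer Found. Now imagine doing this across your entire library of PDF documents!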
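To make the "entire library" idea concrete, here is a minimal sketch that repeats the same search over every PDF in a folder. It only uses the DevExpress calls shown above; the folder path, the class name, and the `DevExpress.Pdf` namespace are assumptions for illustration, and error handling is deliberately coarse.

```csharp
using System;
using System.IO;
using DevExpress.Pdf; // assumed namespace for PdfDocumentProcessor

public static class PdfLibrarySearch
{
    public static void Main(string[] args)
    {
        // Hypothetical folder and search term; replace with your own.
        string folder = args.Length > 0 ? args[0] : @"C:\Docs\Pdfs";
        string search = "parameters";

        var searchParams = new PdfTextSearchParameters
        {
            CaseSensitive = false,
            WholeWords = true
        };

        // Walk the folder tree and search each PDF in turn.
        foreach (string file in Directory.EnumerateFiles(
                     folder, "*.pdf", SearchOption.AllDirectories))
        {
            try
            {
                // A fresh processor per file, so no search state carries over.
                var processor = new PdfDocumentProcessor();
                processor.LoadDocument(file);

                var results = processor.FindText(search, searchParams);
                while (results.Status == PdfTextSearchStatus.Found)
                {
                    Console.WriteLine("{0}: match at page index {1}",
                        file, results.PageIndex);
                    results = processor.FindText(search, searchParams);
                }
            }
            catch (Exception ex)
            {
                // One unreadable or damaged file should not stop the whole run.
                Console.WriteLine("{0}: skipped ({1})", file, ex.Message);
            }
        }
    }
}
```

If the processor in your version is disposable, wrapping it in a using block would release each file promptly; that is worth checking against the documentation for your release.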

Working with PDFs in Code

In the age of “big data” it is imperative that we, as developers, have the ability to work with any type of data, be it structured or unstructured. Indeed, the best way to get the most value from our data is to be able to handle all of it together. I think this tool will greatly help with PDFs!

As always, if there are any comments and/or questions, feel free to get a hold of me!

Seth Juarez
Email: sethj@devexpress.com
Twitter: @SethJuarez

8 comment(s)
Alexey Mironov

Great news!

19 November, 2013
Cristian Tempestini

Very interesting feature! Will it also be possible to find text by regex?

20 November, 2013
Michel Jallet

Is there a save-as function in this release?

I would like to save a PDF file as a PDF image. Is that possible?

21 November, 2013
George (DevExpress)

@Cristian: Although regular expressions for finding text are not supported out of the box, it is possible to use the standard .NET tools over the text after it has been extracted from a PDF file.
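As a rough illustration of that approach, the sketch below runs a standard .NET regular expression over text that has already been pulled out of a PDF. How the text is extracted is left out on purpose; `PdfTextRegex`, `FindMatches`, and the sample string are hypothetical names and data, not part of any DevExpress API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PdfTextRegex
{
    // Returns every match of `pattern` in text that was already
    // extracted from a PDF. Extraction itself is up to the caller.
    public static List<string> FindMatches(string extractedText, string pattern)
    {
        var matches = new List<string>();
        foreach (Match m in Regex.Matches(extractedText, pattern, RegexOptions.IgnoreCase))
        {
            matches.Add(m.Value);
        }
        return matches;
    }

    public static void Main()
    {
        // Stand-in for text extracted from a PDF page.
        string pageText = "Optional parameters are covered in 10.6.1 and 7.5.";

        // Dotted section numbers: something a plain text search cannot express.
        foreach (string hit in FindMatches(pageText, @"\d+(\.\d+)+"))
        {
            Console.WriteLine(hit);
        }
    }
}
```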

@Michel: The PdfViewer.CreateBitmap method has been available since the 13.1 version. A similar method has been implemented in the PDF Processor.

21 November, 2013
James S K Makumbi

Seth,

You are THE MAN!! Now to get 13.2. This is EXACTLY the problem I try to address in my software. I also need to eliminate the issue of having to download the PDF to users' PCs or devices.

27 November, 2013
Mohsen Abo-Ghaly

Hello,

We need it for ASP.NET.

1 December, 2013
Seth Juarez (DevExpress)

Mohsen:

In 13.2 you can indeed access PDF files in code to do searches and the like. Showing PDFs in the browser is a matter of setting the response content type to application/pdf and pushing the bits to the browser. Let me know if this works!
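For anyone who wants to try the browser route, here is a minimal sketch of a classic ASP.NET handler that does just that. It uses only standard System.Web calls; the handler name and the file path are hypothetical, and a real handler would validate which file is being requested.

```csharp
using System.Web;

// Hypothetical handler; register it for a URL of your choice.
public class PdfHandler : IHttpHandler
{
    public bool IsReusable
    {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context)
    {
        // Hypothetical location of the document on the server.
        string path = context.Server.MapPath("~/App_Data/CSharpSpec.pdf");

        // Tell the browser it is receiving a PDF and should show it inline.
        context.Response.ContentType = "application/pdf";
        context.Response.AddHeader("Content-Disposition",
            "inline; filename=CSharpSpec.pdf");

        // Stream the file to the client without buffering it in memory.
        context.Response.TransmitFile(path);
    }
}
```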

-Seth

3 December, 2013
Mohsen Abo-Ghaly

Thanks Seth,

Yes, we can display a PDF in the browser, but that is not the issue. We want to work with it the same way we can in WinForms.

A lot of this can be done without DevExpress, but with DevExpress it would be better and more advanced, so we want to work with PDFs using DevExpress, not Adobe.

The same as "ASP.NET: Spreadsheet Control".

Thanks

11 March, 2014
