[MBS] DynaPDF ParseContent (MBS Xojo Plugin Mailinglist archive)

[MBS] DynaPDF ParseContent
Date: 03.04.18 10:09 (Tue, 3 Apr 2018 11:09:11 +0200)
From: Jean-Luc Arnaud
Hi all,

I'm currently extracting pages from PDF files to JPEG files using
RenderPageToImage.
This works very well but is a little bit slow (about 2 minutes for 270
pages).
However, some files have empty pages that I have to filter out and not
export. I do that by running RenderPageToImage into a buffer, analysing
the buffer size, and then running RenderPageToImage again to write the
JPEG files.

Here are my questions:
- How can I save the buffer directly as a JPEG file, so I don't have to
run RenderPageToImage again?
- I found ParseContent, which is drastically faster than
RenderPageToImage (5 s for 270 pages!), but it generates TIFF files. Is it
possible to direct its output to a buffer instead of a TIFF file? That way
I could analyse the buffer size and avoid running RenderPageToImage twice.
- In addition, could ParseContent generate JPEG files directly (it seems
the PDF pages are actually TIFF images)?

Many thanks in advance for any help.

Re: [MBS] DynaPDF ParseContent
Date: 03.04.18 13:19 (Tue, 3 Apr 2018 14:19:27 +0200)
From: Christian Schmitz

> On 03.04.2018 at 11:09, Jean-Luc Arnaud <<email address removed>> wrote:
>
> Hi all,
>
> I'm currently extracting pages from PDF files to JPEG files using
> RenderPageToImage.

Okay. If you pass in an empty FolderItem, the JPEG is created in memory, and you can retrieve it with GetImageBuffer.
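The render-once workflow this implies (render each page to memory, inspect the buffer size, and write the same bytes to disk instead of rendering a second time) can be sketched as follows. This is a minimal, language-neutral Python sketch, not MBS or Xojo code: `render_page` is a hypothetical callback standing in for "RenderPageToImage with an empty FolderItem, then GetImageBuffer", and the 20,000-byte blank-page threshold is an assumption that would need tuning for real documents.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical stand-in for "RenderPageToImage into memory + GetImageBuffer":
# takes a page number and returns that page's JPEG bytes.
RenderFn = Callable[[int], bytes]


def export_non_blank_pages(
    render_page: RenderFn,
    page_numbers: Iterable[int],
    out_dir: Path,
    min_bytes: int = 20_000,  # assumed blank-page threshold; tune per document set
) -> List[Path]:
    """Render each page once, skip pages whose JPEG is suspiciously small,
    and write the remaining buffers straight to disk (no second render)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for page in page_numbers:
        jpeg = render_page(page)          # exactly one render per page
        if len(jpeg) < min_bytes:         # near-uniform (empty) pages compress very well
            continue
        path = out_dir / f"page_{page:04d}.jpg"
        path.write_bytes(jpeg)            # save the in-memory buffer unchanged
        written.append(path)
    return written
```

The size check works because JPEG output of a nearly uniform page is far smaller than that of a page with real content, so the buffer length is a cheap proxy for "empty" without decoding the image.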

> This works very well but is a little bit slow (about 2 minutes for 270
> pages).

That is not a bad time, but it depends on how much content is on the pages (especially transparency effects) and on how high the resolution is.
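To make the resolution point concrete: PDF page sizes are given in points (1/72 inch), so the rendered pixel count grows with the square of the resolution, and doubling the DPI produces roughly four times as many pixels to rasterize and encode. The helper below is a small illustrative sketch; the function name is hypothetical, and A4 (595 x 842 pt) is used only as an example page size.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF user space unit: 1 pt = 1/72 inch


def raster_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page rendered at `dpi`, rounded to the nearest pixel."""
    return (round(width_pt / POINTS_PER_INCH * dpi),
            round(height_pt / POINTS_PER_INCH * dpi))


# An A4 page at two resolutions: about 2.2 vs. 8.7 megapixels.
a4_150 = raster_size(595, 842, 150)   # (1240, 1754)
a4_300 = raster_size(595, 842, 300)   # (2479, 3508)
```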

> - How can I save the buffer directly as a JPEG file, so I don't have to
> run RenderPageToImage again?

Why do you do it twice anyway?

> - I found ParseContent, which is drastically faster than
> RenderPageToImage (5 s for 270 pages!), but it generates TIFF files. Is it
> possible to direct its output to a buffer instead of a TIFF file?

ParseContent does something different.
It extracts the content of a page, so if the page has an image, it can give you that image.

Of course, if a page contains only one big image, you could simply extract it.

What do you do to get a TIFF?
Maybe we can add a parameter for JPEG.

Sincerely
Christian