Current Situation
I'm developing a WordPress plugin that processes PDFs uploaded through the admin panel to create a flipbook. The process involves:

PDF upload (via admin panel)

Thumbnail generation (using Imagick)

Text and coordinate extraction (using Smalot PdfParser)

Storing processed data in the wp-content/uploads folder

Displaying the PDF as a flipbook on the frontend (using PDF.js)

Issues
The thumbnail generation process is taking too long.

Smalot PdfParser is not providing accurate text coordinates and dimensions.

After processing, it takes 20-30 seconds to load the PDF on the frontend.

The entire process (upload, processing, and initial display) is slow, especially for larger PDFs.

Technical Details
WordPress version: [Your WordPress version]

PHP version: [Your PHP version]

Maximum PDF size: 75MB

Maximum page count: 140

Libraries used:

Imagick (for thumbnails)

Smalot PdfParser (for text extraction)

PDF.js (for frontend rendering)

Hosting limitations:

No Node.js support

exec() is disabled for security reasons (can't use Ghostscript)
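
For anyone trying to reproduce these constraints, here is a minimal sketch of how they could be checked from PHP. The function names `flipbook_function_disabled` and `flipbook_environment_report` are hypothetical and not part of the plugin. The sketch assumes only the standard `disable_functions` ini setting and the Imagick extension's `queryFormats()` method.

```php
<?php
/**
 * Returns true when $name appears in a comma-separated disable_functions list.
 * $disabledList defaults to the live ini value; pass a string to check other values.
 */
function flipbook_function_disabled(string $name, ?string $disabledList = null): bool
{
    $disabledList = $disabledList ?? (string) ini_get('disable_functions');
    $disabled = array_filter(
        array_map('trim', explode(',', strtolower($disabledList))),
        'strlen'
    );
    return in_array(strtolower($name), $disabled, true);
}

/** Collects the settings that matter for server-side PDF processing (hypothetical helper). */
function flipbook_environment_report(): array
{
    $hasImagick = extension_loaded('imagick') && class_exists('Imagick');
    return [
        'php_version'        => PHP_VERSION,
        'exec_available'     => function_exists('exec') && !flipbook_function_disabled('exec'),
        'imagick_loaded'     => $hasImagick,
        // queryFormats('PDF') returns ['PDF'] only when this build can handle PDFs.
        'imagick_reads_pdf'  => $hasImagick && in_array('PDF', Imagick::queryFormats('PDF'), true),
        'memory_limit'       => ini_get('memory_limit'),
        'max_execution_time' => ini_get('max_execution_time'),
    ];
}
```

Dumping the result of `flipbook_environment_report()` on the production host would also give the PHP version for the Technical Details above.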

Questions
How can I optimize this processing workflow, especially the backend operations (thumbnail generation and text extraction)?

Can you recommend an alternative to Smalot PdfParser for text and coordinate extraction that provides more accurate results and works within typical WordPress hosting constraints?

Should I consider processing these files on a separate server? If so, what would be the best approach considering I can't use Node.js on the main server?

Are there any caching strategies or asynchronous processing techniques I could implement to improve performance within a WordPress environment?

How can I achieve performance closer to what I got from my local Node.js implementation (described under "What I have tried") while working within WordPress hosting limitations?

Goal
Process PDFs quickly and accurately, ideally reducing the total processing time to under 10 seconds for a 75 MB, 140-page PDF (roughly 70 ms per page), while working within typical WordPress hosting constraints.

Any insights or suggestions on improving the overall performance and accuracy of this system, while keeping costs down, would be greatly appreciated. Thank you!

What I have tried:

Developed a working system using Node.js locally
Result: Fast and accurate processing, but can't be used in production due to hosting limitations.

Increased PHP memory limit and execution time
Result: Can handle larger files, but didn't significantly improve speed.

Implemented basic caching for processed PDFs
Result: Slightly faster for repeat views, but initial processing is still slow.

Tested other PHP-based PDF processing libraries (e.g., FPDI, TCPDF)
Result: Similar performance issues or lack of needed features.
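
To make the caching attempt concrete, here is a minimal sketch of a content-addressed cache, where identical PDFs reuse earlier output instead of being processed again. This is an illustration under stated assumptions, not the plugin's actual code: the function names, the `flipbook-cache` directory, and the `done.json` marker file are all hypothetical.

```php
<?php
/** Cache directory derived from the PDF's content, so identical files share output. */
function flipbook_cache_dir(string $pdfPath, string $baseDir): string
{
    $hash = hash_file('sha256', $pdfPath);
    if ($hash === false) {
        throw new RuntimeException("Cannot read PDF: $pdfPath");
    }
    return rtrim($baseDir, '/\\') . '/flipbook-cache/' . $hash;
}

/** The marker file is written last, so its presence means the output is complete. */
function flipbook_is_processed(string $cacheDir): bool
{
    return is_file($cacheDir . '/done.json');
}

/**
 * Runs $process only when no finished cache entry exists.
 * $process receives the PDF path and the cache directory and writes its output there.
 */
function flipbook_process_once(string $pdfPath, string $baseDir, callable $process): string
{
    $cacheDir = flipbook_cache_dir($pdfPath, $baseDir);
    if (flipbook_is_processed($cacheDir)) {
        return $cacheDir; // cache hit: skip thumbnails and text extraction
    }
    if (!is_dir($cacheDir) && !mkdir($cacheDir, 0755, true) && !is_dir($cacheDir)) {
        throw new RuntimeException("Cannot create cache directory: $cacheDir");
    }
    $process($pdfPath, $cacheDir);
    file_put_contents($cacheDir . '/done.json', json_encode(['completed' => time()]));
    return $cacheDir;
}
```

In WordPress, `$baseDir` would typically come from `wp_upload_dir()['basedir']`. This only helps repeat uploads and repeat views; it does not shorten the first processing run.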
Updated 16-Aug-24 7:33am

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)