Introduction
If your web app or service must run a possibly huge server-side process on demand, then this article is for you.
Background
When our page calls a server action and the underlying PHP script takes longer than the timeout configured in the web server (Apache, Nginx, etc.), the response will surely come back as a 504 (Gateway Timeout) error. If you can avoid the problem by restructuring the app or the UI flow (without damaging the user experience), I recommend doing so. If there is no way around that one huge request, this article will put you on the way to getting it working without multithreading, shell calls, or third-party classes.
I propose a paging and multi-request technique that has these advantages:
- It's relatively simple to implement. You can migrate your monolithic script with few code changes.
- It's easy to tune overall performance. The number of requests (and with it the total overhead) is inversely proportional to the page size, and the page size is easy to adjust until you get a good user experience.
- You can use a progress bar to show the process state in as much detail as you want, although more detail means less performance.
Using this pattern is not free of disadvantages:
- It's not compatible with atomic processes. If your script must execute in a single shot (inside one global transaction), or calls another script over which you have no control at all, you cannot use this technique. In those cases, consider migrating your server-side app from PHP to a platform that allows multithreading, such as .NET or JSP, or running background processes through shell commands, Java, or Node.js.
- You will need client-side JavaScript to issue the multiple requests, detect the final one, handle errors, and so on.
As you can see, not every scenario is covered by this technique, but many problems will find a perfect fit here.
Using the code
The first step is to think about how to divide the task into the smallest logical fragments possible; these will be called elements. Every request iterates over all of the elements, but only the elements belonging to the requested page are actually processed.
For this example, we will import a CSV file into a database. This is the script that processes one page (import.php):
<?php
$path = '/userFiles/import.csv';
$page = (int) $_POST["page"];  // zero-based page index sent by the client
$pageSize = 1000;              // number of elements (lines) per request
$first = $pageSize * $page;    // first line belonging to the requested page
$line = 0;
$processed = 0;
$file = fopen($path, "r");
// Compare strictly with false: stream_get_line() returns false at EOF,
// but an empty line ("") must not stop the loop.
while (($lineStr = stream_get_line($file, 65536, "\n")) !== false) {
    if ($line >= $first) {
        insertLineInDatabaseOrDoAnyHeavyThingWithIt($line, $lineStr);
        $processed++;
        if ($processed == $pageSize) break;  // page complete
    }
    $line++;
}
fclose($file);
echo $processed;  // 0 tells the client there is nothing left to do
That was the server side; now comes the client side, in JavaScript:
// "import" is a reserved word in JavaScript, so the launcher needs another name.
function startImport() {
    pageImport(0);
}

function pageImport(page) {
    $.post("/import.php", {
        page: page
    }, function (data) {
        // The response body is the number of lines processed in this page.
        if (parseInt(data, 10) > 0)
            pageImport(page + 1);  // more elements may remain: request next page
        else
            alert("Work finished");
    });
}
Two functions are required: the first launches the first page; the second processes one page and recursively calls itself for the next page until no more records are processed. It is not difficult to imagine how to implement a progress bar by using JSON to return the processed lines and the total number of pages in the same response, as sketched below.
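For instance, here is a minimal sketch of that JSON response, replacing the final echo of import.php. The countCsvLines() helper is hypothetical and only illustrates the idea:

// Sketch only: countCsvLines() is a hypothetical helper returning the total
// number of lines in the file. Counting on every request has its own cost;
// a real implementation might cache the value or compute it on page 0 only.
$totalLines = countCsvLines($path);
header('Content-Type: application/json');
echo json_encode(array(
    'processed'  => $processed,
    'totalPages' => (int) ceil($totalLines / $pageSize)
));

On the client, the $.post success callback can then read data.processed and data.totalPages (jQuery parses the response as JSON automatically when the Content-Type is application/json) and update a progress bar before requesting the next page.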
Points of Interest
Note that the smaller $pageSize is, the lower the probability of a 504 error, but the longer the whole process takes, because each request reads all the CSV lines prior to the current page. The optimal value of $pageSize is as large as possible while still completing a page in less time than the timeout configured on the server. For example, with a 60-second timeout, processing a page should take no more than 45-50 seconds, or even less. This is easy to measure with the F12 developer tools in the browser.
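If that sequential re-read becomes the bottleneck, one possible mitigation (a sketch that goes beyond the script above, with an extra "offset" parameter assumed on both ends) is to return the byte offset reached at the end of each page and have the client post it back, so the next request can fseek() straight to where the previous one stopped:

// Sketch: skip the already-processed part of the file in constant time
// instead of re-reading it line by line. The "offset" parameter is assumed
// to be 0 on the first request and to echo back the value returned by the
// previous response.
$offset = isset($_POST["offset"]) ? (int) $_POST["offset"] : 0;
$file = fopen($path, "r");
fseek($file, $offset);

$processed = 0;
while ($processed < $pageSize
        && ($lineStr = stream_get_line($file, 65536, "\n")) !== false) {
    // Line numbering is page-local in this sketch.
    insertLineInDatabaseOrDoAnyHeavyThingWithIt($processed, $lineStr);
    $processed++;
}

$newOffset = ftell($file);  // where the next page must resume
fclose($file);
echo json_encode(array('processed' => $processed, 'offset' => $newOffset));

The extra piece of client-side state buys a total run time that no longer grows with the number of pages already completed.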
History
No changes.