Simple Way to Convert HTML Table Data into PHP Array

25 Jan 2016 | CPOL | 4 min read
Convert HTML table data from a website into a PHP array

Introduction

Sometimes you need to read the data in an HTML table on a website, and perhaps store the values in a database rather than just display them. To do that in PHP, you first need to convert the data in the HTML table into a PHP array.

This tip gives a simple explanation of how to do it in a Windows environment.

Preparation

This is my environment.

  • OS: Windows 8 Enterprise 64-bit
  • XAMPP: XAMPP Version 5.6.3 (PHP Version 5.6.3)
  • Development Tool: Notepad++
  • Browser: Google Chrome (Version 46.0.2490.80 m)
  • Localhost: localhost:8080

Here we go. Once XAMPP is installed, open the XAMPP control panel and start Apache, making sure it starts without errors. (Apache is the web server; it runs your PHP code when you request the page in your browser.) Then go to the xampp/htdocs folder on your computer (C:\xampp\htdocs in a default installation). That is where we'll put our PHP source code.

As the first step, create a folder named MyConverter in xampp/htdocs. Inside xampp/htdocs/MyConverter, create a PHP file named index.php (you can use Notepad or Notepad++).

Before we go further, take a look at these web pages, which I prepared for testing the code:

  1. http://teskusman.esy.es/index.html
  2. http://teskusman.esy.es/index2.html

These are example web pages that each contain an HTML table. The first contains a table of fruits, and the second contains an attendance list. Below are the tables from the URLs above.

Table from http://teskusman.esy.es/index.html

Table from http://teskusman.esy.es/index2.html

As I said, these are the example tables that we want to convert into PHP arrays in this tip.

Source Code

Okay, we are ready for the second step. Open index.php in your text editor (Notepad++ or whatever you prefer). First, we load the content of the web page and hold it in $htmlContent (or whatever name you like). Then we create a new DOMDocument instance, which gives us access to the page's DOM, and load $htmlContent into it.

We only want the table data, so we only need the table header cells (<th> tags) and the table data cells (<td> tags). In this tip, we store them in the variables $Header and $Detail respectively. Note that getElementsByTagName() returns every matching element in the document, so this approach assumes the page contains a single table. This is the code.

PHP
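<?php

	//#Download the page; file_get_contents() returns false on failure
	$htmlContent = file_get_contents("http://teskusman.esy.es/index.html");
	if($htmlContent === false) { die("Could not load the page."); }
		
	//#Parse the HTML into a DOM tree
	$DOM = new DOMDocument();
	$DOM->loadHTML($htmlContent);
	
	//#Collect every header cell (<th>) and every data cell (<td>) in the document
	$Header = $DOM->getElementsByTagName('th');
	$Detail = $DOM->getElementsByTagName('td');
?>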
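
One practical note on the loading step: real-world pages are often not perfectly valid HTML, and loadHTML() reports each problem as a PHP warning. The sketch below shows one common way to keep those warnings out of the output with libxml_use_internal_errors(). The inline markup is a made-up stand-in for a downloaded page, and the variable names $previous, $loaded, $headerCount and $detailCount are only illustrative.

```php
<?php

// Made-up sample markup standing in for a downloaded page (an assumption for this sketch).
// Some libxml versions warn about HTML5 tags such as <section>.
$htmlContent = '<section><table><tr><th>Name</th><th>Color</th></tr>'
             . '<tr><td>Apple</td><td>Red</td></tr></table></section>';

// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);

$DOM = new DOMDocument();
$loaded = $DOM->loadHTML($htmlContent);

// Discard the collected warnings and restore the earlier setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The table cells are still reachable even though the markup triggered warnings.
$headerCount = $DOM->getElementsByTagName('th')->length;
$detailCount = $DOM->getElementsByTagName('td')->length;

echo "th: $headerCount, td: $detailCount\n";
```

Restoring the previous setting afterwards keeps the change local, so other XML or HTML parsing in the same script behaves as before.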

In the next step, we want to get the header names of the HTML table. It's easy: we loop over the header cells with foreach(), read each header name with textContent, and store it in an array. The source code now looks like this:

PHP
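<?php

	$htmlContent = file_get_contents("http://teskusman.esy.es/index.html");
	if($htmlContent === false) { die("Could not load the page."); }
		
	$DOM = new DOMDocument();
	$DOM->loadHTML($htmlContent);
	
	$Header = $DOM->getElementsByTagName('th');
	$Detail = $DOM->getElementsByTagName('td');

	//#Get header name of the table
	$aDataTableHeaderHTML = array(); //start empty so print_r() works even without <th> cells
	foreach($Header as $NodeHeader) 
	{
		//textContent is the cell's text without tags; trim() removes surrounding whitespace
		$aDataTableHeaderHTML[] = trim($NodeHeader->textContent);
	}
	print_r($aDataTableHeaderHTML); die();
?>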

To look at the header names, open your browser and go to localhost:8080/MyConverter (I use port 8080; yours depends on your port setting). Then press Ctrl+U to switch to view-source mode, where the print_r() output keeps its line breaks, and press F5 to refresh. The output is the array of header names. You can also try changing the URL in file_get_contents() to http://teskusman.esy.es/index2.html and compare the result.

The array of header HTML table from http://teskusman.esy.es/index.html
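
To see what textContent and trim() contribute in the header loop, here is a small self-contained sketch. The table markup is invented for illustration: its header cells contain nested tags and extra spaces, which the real test pages may or may not have.

```php
<?php

// Invented sample table; header cells contain nested tags and stray whitespace.
$html = '<table><tr><th> <b>No</b> </th><th>Fruit <i>Name</i></th></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$headers = array();
foreach ($dom->getElementsByTagName('th') as $th) {
    // textContent joins the text of all descendant nodes and drops the tags;
    // trim() then removes the padding around the result.
    $headers[] = trim($th->textContent);
}

print_r($headers);
```

The resulting array holds the plain strings "No" and "Fruit Name", which is what makes the header text usable as array keys later on.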

 

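That's just the beginning: we now have the header names. Next, we want to get the table's detail rows. The loop below counts the cells it has read in $i and moves on to the next row index $j each time $i reaches a multiple of the number of headers. Here is the code.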

PHP
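<?php

	$htmlContent = file_get_contents("http://teskusman.esy.es/index.html");
	if($htmlContent === false) { die("Could not load the page."); }
		
	$DOM = new DOMDocument();
	$DOM->loadHTML($htmlContent);
	
	$Header = $DOM->getElementsByTagName('th');
	$Detail = $DOM->getElementsByTagName('td');

	//#Get header name of the table
	$aDataTableHeaderHTML = array();
	foreach($Header as $NodeHeader) 
	{
		$aDataTableHeaderHTML[] = trim($NodeHeader->textContent);
	}
	//print_r($aDataTableHeaderHTML); die();

	//#Stop early if there are no headers, otherwise the modulo below would divide by zero
	$iColumnCount = count($aDataTableHeaderHTML);
	if($iColumnCount == 0) { die("No <th> elements found."); }

	//#Get row data/detail table without header name as key
	$aDataTableDetailHTML = array();
	$i = 0; //number of cells read so far
	$j = 0; //index of the current row
	foreach($Detail as $sNodeDetail) 
	{
		$aDataTableDetailHTML[$j][] = trim($sNodeDetail->textContent);
		$i = $i + 1;
		//after every $iColumnCount cells, move on to the next row
		$j = $i % $iColumnCount == 0 ? $j + 1 : $j;
	}
	print_r($aDataTableDetailHTML); die();
?>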

Open your browser, go to view-source mode, and refresh the page again. You'll see the PHP array of the HTML table like this:

The array of detail HTML table from http://teskusman.esy.es/index.html
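
The grouping of a flat list of cells into rows can also be sketched with PHP's built-in array_chunk(), which splits an array into pieces of a fixed size. This is an alternative to the counter-based loop, not the code used in this tip, and the header and cell values below are invented for illustration.

```php
<?php

// Invented values standing in for the header names and the flat list of <td> texts.
$aDataTableHeaderHTML = array('No', 'Name');
$aCells = array('1', 'Apple', '2', 'Banana', '3', 'Cherry');

// Split the flat list into rows of as many cells as there are headers.
$aRows = array_chunk($aCells, count($aDataTableHeaderHTML));

print_r($aRows);
```

Like the loop, this assumes every row has exactly one cell per header. array_chunk() requires a size of at least 1, so the header count should be checked before calling it.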

 

Okay, that works, but we want each row keyed by header name instead of by the numeric index of each element. To map the HTML table data that way, we use the complete code below, which puts all the steps together.

PHP
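<?php

	$htmlContent = file_get_contents("http://teskusman.esy.es/index.html");
	if($htmlContent === false) { die("Could not load the page."); }
		
	$DOM = new DOMDocument();
	$DOM->loadHTML($htmlContent);
	
	$Header = $DOM->getElementsByTagName('th');
	$Detail = $DOM->getElementsByTagName('td');

	//#Get header name of the table
	$aDataTableHeaderHTML = array();
	foreach($Header as $NodeHeader) 
	{
		$aDataTableHeaderHTML[] = trim($NodeHeader->textContent);
	}
	//print_r($aDataTableHeaderHTML); die();

	//#Stop early if there are no headers, otherwise the modulo below would divide by zero
	$iColumnCount = count($aDataTableHeaderHTML);
	if($iColumnCount == 0) { die("No <th> elements found."); }

	//#Get row data/detail table without header name as key
	$aDataTableDetailHTML = array();
	$i = 0; //number of cells read so far
	$j = 0; //index of the current row
	foreach($Detail as $sNodeDetail) 
	{
		$aDataTableDetailHTML[$j][] = trim($sNodeDetail->textContent);
		$i = $i + 1;
		//after every $iColumnCount cells, move on to the next row
		$j = $i % $iColumnCount == 0 ? $j + 1 : $j;
	}
	//print_r($aDataTableDetailHTML); die();
	
	//#Get row data/detail table with header name as key and outer array index as row number
	$aTempData = array();
	for($i = 0; $i < count($aDataTableDetailHTML); $i++)
	{
		for($j = 0; $j < $iColumnCount; $j++)
		{
			//a short last row gets null for its missing cells instead of raising a notice
			$aTempData[$i][$aDataTableHeaderHTML[$j]] = isset($aDataTableDetailHTML[$i][$j]) ? $aDataTableDetailHTML[$i][$j] : null;
		}
	}
	$aDataTableDetailHTML = $aTempData; unset($aTempData);
	print_r($aDataTableDetailHTML); die();
?>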

As before, open your browser, go to view-source mode, and press F5 to refresh the page. You'll see the PHP array of the HTML table, now keyed by header name.

The array of detail HTML table with the key from http://teskusman.esy.es/index.html
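
The mapping step on its own can be sketched with PHP's built-in array_combine(), which builds an array from one list of keys and one list of values. This is an alternative to the nested for loops, shown with invented data; the names $aHeader, $aRows and $aMapped are only illustrative.

```php
<?php

// Invented header names and rows, in the shape produced by the earlier steps.
$aHeader = array('No', 'Name');
$aRows = array(
    array('1', 'Apple'),
    array('2', 'Banana'),
);

$aMapped = array();
foreach ($aRows as $aRow) {
    // array_combine() needs the same number of keys and values,
    // so rows with a different cell count are skipped here.
    if (count($aRow) === count($aHeader)) {
        $aMapped[] = array_combine($aHeader, $aRow);
    }
}

print_r($aMapped);
```

Skipping mismatched rows is a design choice for this sketch; padding them with null, as a nested loop can do, is an equally reasonable option.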

 

Now you can try changing the URL in file_get_contents() to http://teskusman.esy.es/index2.html or to another URL that contains an HTML table.

Points of Interest

This is just a step-by-step demonstration of converting an HTML table into a PHP array. A better algorithm could map the keys and values directly instead of working in three separate passes (get the header names, get the details, then map the header names to the detail data). With a little enhancement, the resulting PHP array is ready to use with a database (e.g. to insert the data).
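
As one possible shape for such a direct mapping, the sketch below walks the table row by row (<tr>) and keys each row by header name as it goes. htmlTableToArray() is a hypothetical helper name, not a PHP function, and the sketch assumes a single, non-nested table whose first header row uses <th> cells. The sample markup is invented.

```php
<?php

/**
 * Convert the first <table> in an HTML string into an array of rows,
 * each keyed by header text. Hypothetical helper for illustration only.
 */
function htmlTableToArray($html)
{
    // Keep parser warnings about imperfect markup out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $dom->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return array(); // no table on the page
    }

    $headers = array();
    $rows = array();
    foreach ($table->getElementsByTagName('tr') as $tr) {
        // The first row that contains <th> cells supplies the keys.
        $ths = $tr->getElementsByTagName('th');
        if ($ths->length > 0 && count($headers) === 0) {
            foreach ($ths as $th) {
                $headers[] = trim($th->textContent);
            }
            continue;
        }

        $cells = array();
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->textContent);
        }

        // Only rows with one cell per header can be mapped safely.
        if (count($headers) > 0 && count($cells) === count($headers)) {
            $rows[] = array_combine($headers, $cells);
        }
    }

    return $rows;
}

// Usage with invented sample markup.
$html = '<table><tr><th>No</th><th>Name</th></tr>'
      . '<tr><td>1</td><td>Apple</td></tr>'
      . '<tr><td>2</td><td>Banana</td></tr></table>';
$result = htmlTableToArray($html);
print_r($result);
```

Working per row means the code no longer depends on counting cells across the whole document, so a row with a missing cell does not shift every row after it.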

The function getElementsByTagName() is very useful here. With this DOM function, we can work with any HTML content on a web page, not only tables. Maybe we want to find the links inside a web page and follow them to find more links; that could grow into a web crawler, one of the essential parts of a search engine like Google's. Who knows?
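
As a small illustration of the link idea, the sketch below collects the href values of all <a> elements in a page. The markup is invented, and this covers only the first step of a crawler (finding links), not following them.

```php
<?php

// Invented sample page with two real links and one anchor without href.
$html = '<p><a href="/a.html">A</a> <a name="x">no href</a> '
      . '<a href="http://example.com/b">B</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$links = array();
foreach ($dom->getElementsByTagName('a') as $a) {
    // Named anchors have no href, so check before reading the attribute.
    if ($a->hasAttribute('href')) {
        $links[] = $a->getAttribute('href');
    }
}

print_r($links);
```

Relative links such as /a.html would still need to be resolved against the page's own URL before they could be fetched.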

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)