Parse an HTML Table with PHP

by jl2 on October 14th, 2009

I recently was in the position where I needed to parse a table within an HTML file on a number of different pages.  To save myself some time I wrote this simple script to handle the parsing programatically.

The script will work with most simple tables where the <th> tag has been used to define headers. It is unlikely to work with nested tables! Essentially, it worked for the purposes it was created for but your milage may vary!

function parseTable($html)
{
  // Find the table
  preg_match("/<table.*?>.*?<\/[\s]*table>/s", $html, $table_html);
 
  // Get title for each row
  preg_match_all("/<th.*?>(.*?)<\/[\s]*th>/", $table_html[0], $matches);
  $row_headers = $matches[1];
 
  // Iterate each row
  preg_match_all("/<tr.*?>(.*?)<\/[\s]*tr>/s", $table_html[0], $matches);
 
  $table = array();
 
  foreach($matches[1] as $row_html)
  {
    preg_match_all("/<td.*?>(.*?)<\/[\s]*td>/", $row_html, $td_matches);
    $row = array();
    for($i=0; $i<count($td_matches[1]); $i++)
    {
      $td = strip_tags(html_entity_decode($td_matches[1][$i]));
      $row[$row_headers[$i]] = $td;
    }
 
    if(count($row) > 0)
      $table[] = $row;
  }
  return $table;
}

Download parseTable.php

Share and Enjoy:
  • email
  • Print
  • Twitter
  • del.icio.us
  • Facebook
  • Digg
  • Technorati
  • StumbleUpon
  • Slashdot
  • Ping.fm
  • LinkedIn
  • Google Bookmarks
  • MySpace
  • Netvibes
  • Reddit
  • Tumblr

From PHP

52 Comments

Trackbacks & Pingbacks

Leave a Reply

Note: XHTML is allowed. Your email address will never be published.

Subscribe to this comment feed via RSS