Parse an HTML Table with PHP
I recently needed to parse a table from an HTML file on a number of different pages. To save myself some time, I wrote this simple script to handle the parsing programmatically.
The script will work with most simple tables where the <th> tag has been used to define headers. It is unlikely to work with nested tables! Essentially, it worked for the purpose it was created for, but your mileage may vary!
function parseTable($html) {
    // Find the first table; bail out if there is none
    if (!preg_match("/<table\b.*?>.*?<\/\s*table>/si", $html, $table_html)) {
        return array();
    }

    // Get the title for each column (\b stops <thead> from matching as <th>)
    preg_match_all("/<th\b.*?>(.*?)<\/\s*th>/si", $table_html[0], $matches);
    $row_headers = $matches[1];

    // Iterate over each row
    preg_match_all("/<tr\b.*?>(.*?)<\/\s*tr>/si", $table_html[0], $matches);
    $table = array();
    foreach ($matches[1] as $row_html) {
        preg_match_all("/<td\b.*?>(.*?)<\/\s*td>/si", $row_html, $td_matches);
        $row = array();
        for ($i = 0; $i < count($td_matches[1]); $i++) {
            // Strip tags before decoding so escaped text such as &lt;b&gt; survives
            $td = trim(html_entity_decode(strip_tags($td_matches[1][$i])));
            // Fall back to the column index when there is no matching header
            $key = isset($row_headers[$i]) ? $row_headers[$i] : $i;
            $row[$key] = $td;
        }
        if (count($row) > 0) {
            $table[] = $row;
        }
    }
    return $table;
}
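For comparison, here is a rough sketch of the same idea built on PHP's bundled DOM extension rather than regular expressions. This is not the script offered for download: the name parseTableDom is made up for this example, and it assumes the DOM extension is enabled and that the input is ASCII or declares its own character set. Like the regex version, it only looks at the first table and does nothing special for nested tables.

```php
<?php
// Hypothetical helper (not part of the original script): returns the same
// shape of array as parseTable(), one associative array per data row.
function parseTableDom(string $html): array
{
    // loadHTML() rejects empty input, so handle that case up front.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Only the first table in the document is considered.
    $table = $doc->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return [];
    }

    $xpath = new DOMXPath($doc);

    // Header text from every <th> inside the table.
    $headers = [];
    foreach ($xpath->query('.//th', $table) as $th) {
        $headers[] = trim($th->textContent);
    }

    $rows = [];
    foreach ($xpath->query('.//tr', $table) as $tr) {
        $row = [];
        $i = 0;
        foreach ($xpath->query('./td', $tr) as $td) {
            // Fall back to the column index when a header is missing.
            $key = $headers[$i] ?? $i;
            $row[$key] = trim($td->textContent);
            $i++;
        }
        // Rows without <td> cells (such as the header row) are skipped.
        if (count($row) > 0) {
            $rows[] = $row;
        }
    }
    return $rows;
}

// Example input, invented for illustration.
$html = '<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Alice</td><td>30</td></tr>
  <tr><td>Bob</td><td>25</td></tr>
</table>';

$rows = parseTableDom($html);
print_r($rows);
```

For the sample table this should give two rows, each keyed by Name and Age, with the header row left out. The DOM route costs a few more lines but copes with attributes, line breaks inside cells, and entity decoding without any pattern tuning.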
Download parseTable.php
2 Comments
Thank you! I have been all over google looking for a simple version of this that I can use. This is absolutely perfect! This is the first I found with good regex.
Thanks,
Derek
it rocks… thanks