PHP HowTo: Match Complete HTML Tag With Regular Expression

The Problem

We want to find a specific HTML tag in a string. For example, we might need to extract the content of a div tag. The difficulty arises when tags are nested, as in this HTML:

<body>
  <div class="top-nav">
  </div>
  <div class="book" id="1">
    <div class="inner">
    </div>
  </div>
 
  <div class="book" id="2">
    <div class="inner">
    </div>
  </div>
 
  <div class="menu">
  </div>
</body>

Analysis

We are going to use regular expressions. To build the correct one, let's walk through a few attempts and see where each falls short.

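We can start with the following regex: /<div class="book" id="\d*">.*<\/div>/s. The s modifier is needed because, by default, the dot does not match newlines, so without it the pattern would not match this multi-line HTML at all. Applied to the HTML above, it matches everything from the first div with class book to the last closing div tag: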

<div class="book" id="1">
    <div class="inner">
    </div>
  </div>
 
  <div class="book" id="2">
    <div class="inner">
    </div>
  </div>
 
  <div class="menu">
  </div>

This happens because quantifiers in regular expressions are greedy by default. We can change this behavior by adding the U modifier: /<div class="book" id="\d*">.*<\/div>/sU or by appending a question mark after the asterisk: /<div class="book" id="\d*">.*?<\/div>/s. In both cases the result is the same: the regular expression returns two matches, and both are incomplete for our purposes:

<div class="book" id="1">
    <div class="inner">
    </div>

and

<div class="book" id="2">
    <div class="inner">
    </div>
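
The difference between the greedy and lazy variants can be checked with a short script. This is a minimal sketch: the variable names are illustrative and not part of the original text, and all patterns carry the s modifier so that the dot also matches newlines.

```php
<?php
// Sample HTML from the problem statement.
$html = <<<'HTML'
<body>
  <div class="top-nav">
  </div>
  <div class="book" id="1">
    <div class="inner">
    </div>
  </div>

  <div class="book" id="2">
    <div class="inner">
    </div>
  </div>

  <div class="menu">
  </div>
</body>
HTML;

// Greedy: .* runs on to the last </div> in the input.
$greedy = '/<div class="book" id="\d*">.*<\/div>/s';
// Lazy via the question mark: .*? stops at the first </div> it can reach.
$lazy = '/<div class="book" id="\d*">.*?<\/div>/s';
// Lazy via the U modifier, which inverts the greediness of all quantifiers.
$ungreedy = '/<div class="book" id="\d*">.*<\/div>/sU';

preg_match_all($greedy, $html, $greedyMatches);
preg_match_all($lazy, $html, $lazyMatches);
preg_match_all($ungreedy, $html, $ungreedyMatches);

echo count($greedyMatches[0]), "\n"; // 1
echo count($lazyMatches[0]), "\n";   // 2
var_dump($lazyMatches[0] === $ungreedyMatches[0]); // bool(true)
```

The single greedy match swallows both books and the menu div, while each lazy match ends at the closing tag of the inner div.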

It seems we are very close. All we need is to replace the .* part with something that matches every complete div tag nested inside the div we are interested in.
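
One possible way to express "every complete nested div tag" is a recursive subpattern, which PCRE (the engine behind PHP's preg_ functions) supports. The sketch below is an illustration under stated assumptions, not a finished solution: the group name content and the variable names are hypothetical, the markup is assumed to be well formed, and attribute values are assumed not to contain a > character.

```php
<?php
// Sample HTML from the problem statement.
$html = <<<'HTML'
<body>
  <div class="top-nav">
  </div>
  <div class="book" id="1">
    <div class="inner">
    </div>
  </div>

  <div class="book" id="2">
    <div class="inner">
    </div>
  </div>

  <div class="menu">
  </div>
</body>
HTML;

// The x modifier ignores literal whitespace in the pattern and allows
// comments, so spaces inside the opening tag are written as \s+.
$pattern = '~
    <div\s+class="book"\s+id="\d*">   # opening tag of the div we want
    (?<content>                       # content is any repetition of:
        (?:
            [^<]++                    #   text that contains no tag
          | <(?!/?div\b)              #   a "<" that does not open or close a div
          | <div\b[^>]*>              #   a nested opening div tag ...
            (?&content)               #   ... its content, by recursion ...
            </div>                    #   ... and its own closing tag
        )*
    )
    </div>                            # closing tag paired with the outer div
~x';

preg_match_all($pattern, $html, $matches);

echo count($matches[0]), "\n"; // 2
echo $matches[0][0], "\n";     // the complete first book div
```

Because the content group can never consume an unpaired closing div tag, each match ends at the closing tag that belongs to the opening one. Input that breaks the assumptions above (unbalanced tags, div-like text inside comments or scripts) is likely to defeat this pattern, in which case a real HTML parser such as PHP's DOMDocument is generally the safer tool.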

The Solution

TODO

 
php/howtos/matchcompletehtmltag.txt · Last modified: 2009/10/31 23:39 (external edit)
 