The DOM extension allows operations on XML and HTML documents through the DOM API with PHP.
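As a minimal sketch of what working through the DOM API looks like, the following builds a small XML document, serializes its root element, and parses the result back. The element names, attribute and text are made up for illustration; the calls are the standard `DOMDocument`/`DOMElement` methods.

```php
<?php
// Build a tiny XML document node by node.
$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->createElement('books');
$doc->appendChild($root);

// createElement()'s second argument becomes the element's text content.
$book = $doc->createElement('book', 'PHP Basics');
$book->setAttribute('id', '1');
$root->appendChild($book);

// Serialize only the root element, so no XML declaration is emitted.
$xml = $doc->saveXML($root);
echo $xml, "\n"; // <books><book id="1">PHP Basics</book></books>

// Parse the string again and read a value back through the same API.
$parsed = new DOMDocument();
$parsed->loadXML($xml);
$title = $parsed->getElementsByTagName('book')->item(0)->textContent;
echo $title, "\n"; // PHP Basics
```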
Note:
The DOM extension uses UTF-8 encoding. Use mb_convert_encoding(), UConverter::transcode(), or iconv() to handle other encodings.
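A minimal sketch of converting input to UTF-8 before handing it to DOM, assuming the source bytes are ISO-8859-1 (Latin-1); the sample string is made up. It uses iconv(), but mb_convert_encoding() or UConverter::transcode() could be used the same way if those extensions are available.

```php
<?php
// Assumption: this input is ISO-8859-1, where byte 0xE9 is "é".
$latin1 = "<note>caf\xE9</note>";

// Convert to UTF-8 first; DOM expects UTF-8 input here.
// mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1') would do the same job.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

$doc = new DOMDocument();
$doc->loadXML($utf8);

// textContent is returned as a UTF-8 string.
$text = $doc->documentElement->textContent;
echo $text, "\n"; // café
```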
Be careful when using this with partial HTML: it only accepts complete HTML documents with at least an HTML element and a BODY element. If you are working with partial HTML and wrap it in the missing elements without declaring the character encoding in a META element, the document will be treated as ISO-8859-1 and UTF-8 strings will be mangled. Example:
<?php
// getHtmlBody() stands for whatever returns your UTF-8 HTML fragment.
$body = getHtmlBody();
$doc = new DOMDocument();
$doc->loadHTML("<html><body>" . $body . "</body></html>");
// $doc will treat your HTML as ISO-8859-1.
// This is correct, but may not be what you want if your source is UTF-8.
?>
Declaring the charset in META elements fixes this:
<?php
$body = getHtmlBody();
$doc = new DOMDocument();
$doc->loadHTML("<html><head><meta charset=\"UTF-8\"><meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"></head><body>" . $body . "</body></html>");
// $doc will now treat your HTML correctly as UTF-8.
?>
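To check that a wrapped fragment really was decoded as UTF-8, you can read it back through the DOM. This is a sketch only: parseUtf8Fragment() is a hypothetical helper, it declares the charset through the http-equiv META element alone, and the sample fragment is made up.

```php
<?php
// Hypothetical helper: wrap a UTF-8 HTML fragment in a minimal document
// that declares its charset, so loadHTML() does not fall back to ISO-8859-1.
function parseUtf8Fragment(string $body): DOMDocument
{
    $doc = new DOMDocument();
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head>'
        . '<body>' . $body . '</body></html>'
    );
    return $doc;
}

$doc = parseUtf8Fragment('<p>Grüße</p>');

// The paragraph's text comes back intact rather than as mojibake.
$p = $doc->getElementsByTagName('p')->item(0);
echo $p->textContent, "\n"; // Grüße
```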