(PHP 5, PHP 7, PHP 8)
DOMDocument::loadXML — Load XML from a string
Loads an XML document from a string.
If an empty string is passed as the source, a warning will be generated. This warning is not generated by libxml and cannot be handled using libxml's error handling functions.
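Since libxml's error handling cannot intercept this case, one option is to check for an empty string before calling loadXML(). The following is only a sketch; the helper name loadXmlOrNull is hypothetical and not part of the DOM extension.

```php
<?php
// Hypothetical helper: skip the call entirely when the source is empty,
// so the empty-source diagnostic is never triggered.
function loadXmlOrNull(string $source): ?DOMDocument
{
    if ($source === '') {
        return null;
    }
    $doc = new DOMDocument();
    // loadXML() returns false when the string cannot be parsed.
    return $doc->loadXML($source) ? $doc : null;
}

$doc = loadXmlOrNull('<root><node/></root>');
if ($doc !== null) {
    echo $doc->saveXML();
}
```

Returning null keeps the sketch short; throwing an exception would work equally well if the caller prefers that.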
| Version | Description |
|---|---|
| 8.3.0 | This function now has a tentative bool return type. |
| 8.0.0 | Calling this function statically will now throw an Error. Previously, an E_DEPRECATED was raised. |
Example #1 Creating a Document
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><node/></root>');
echo $doc->saveXML();
?>
Keep in mind that with the default parameters this function does not handle large files well: if a text node is longer than 10 MB, it can fail with an error stating:
DOMDocument::loadXML(): internal error Extra content at the end of the document in Entity
even though the XML is fine.
The cause is a definition in parserInternals.h of libxml:
#define XML_MAX_TEXT_LENGTH 10000000
To allow the function to process larger files, pass LIBXML_PARSEHUGE as an option and it will work just fine:
$domDocument->loadXML($xml, LIBXML_PARSEHUGE);
Possible values for the options parameter can be found here: http://us3.php.net/manual/en/ref.libxml.php#libxml.constants
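As a rough illustration of the note above, this sketch builds a document whose single text node is just over 10,000,000 characters and loads it with LIBXML_PARSEHUGE. The helper name loadHugeXml is hypothetical. Because the flag relaxes libxml's built-in size limits, the sketch assumes the input comes from a trusted source.

```php
<?php
// Hypothetical helper: parse XML that may contain very large text nodes.
// LIBXML_PARSEHUGE relaxes libxml's hardcoded limits, so only use it on
// input you trust.
function loadHugeXml(string $xml): DOMDocument
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml, LIBXML_PARSEHUGE)) {
        throw new RuntimeException('Could not parse XML');
    }
    return $doc;
}

// A text node one character longer than XML_MAX_TEXT_LENGTH.
$xml = '<root>' . str_repeat('a', 10000001) . '</root>';
$doc = loadHugeXml($xml);
echo strlen($doc->documentElement->textContent), "\n";
```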
loadXML reports an error instead of throwing an exception when the XML is not well formed. This is annoying if you are trying to call loadXML() in a try...catch statement. Apparently it's a feature, not a bug, because this conforms to a specification.
If you want to catch an exception instead of generating a report, you could do something like:
<?php
function HandleXmlError($errno, $errstr, $errfile, $errline)
{
    // Convert only loadXML() warnings into exceptions.
    if ($errno == E_WARNING && (substr_count($errstr, "DOMDocument::loadXML()") > 0)) {
        throw new DOMException($errstr);
    }
    return false;
}

function XmlLoader($strXml)
{
    set_error_handler('HandleXmlError');
    $dom = new DOMDocument();
    try {
        $dom->loadXML($strXml);
    } finally {
        // Restore the previous handler even if an exception was thrown.
        restore_error_handler();
    }
    return $dom;
}
?>
Returning false in function HandleXmlError() causes a fallback to the default error handler.
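Another way to get an exception, sketched here, is to let libxml collect its errors with libxml_use_internal_errors() and throw from the first one. The helper name loadXmlOrThrow is hypothetical. The empty-string check is there because that case is not reported through libxml.

```php
<?php
// Hypothetical helper: throws DOMException when the XML is not well formed.
function loadXmlOrThrow(string $xml): DOMDocument
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    try {
        $dom = new DOMDocument();
        if ($xml === '' || !$dom->loadXML($xml)) {
            $errors = libxml_get_errors();
            $message = $errors ? trim($errors[0]->message) : 'Empty or invalid XML';
            throw new DOMException($message);
        }
        return $dom;
    } finally {
        // Leave libxml's error state as we found it.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}

try {
    loadXmlOrThrow('<root><a></root>');
} catch (DOMException $e) {
    echo 'Caught: ', $e->getMessage(), "\n";
}
```

Compared with a custom error handler, this avoids matching on the warning text, at the cost of reporting only the first libxml error.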
A call to loadXML() will overwrite the XML declaration previously created in the constructor of DOMDocument. This can cause encoding problems if there is no XML declaration in the loaded XML and you don't have control over the source (e.g. if the XML is coming from a webservice). To fix this, set encoding AFTER loading the XML using the 'encoding' class property of DOMDocument. Example:
Bad situation:
test.xml:
<test>
<hello>hi</hello>
<field>ø</field>
</test>
test.php:
$xmlDoc = new DOMDocument("1.0", "utf-8"); // Parameters here are overwritten anyway when using loadXML(), and are not really relevant
$testXML = file_get_contents("test.xml");
$xmlDoc->loadXML($testXML);
// Print the contents to a file or in a log function to get the output, using $xmlDoc->saveXML()
Output:
<?xml version="1.0"?>
<test>
<hello>hi</hello>
<field>ø</field>
</test>
Good situation:
test.xml:
<test>
<hello>hi</hello>
<field>ø</field>
</test>
test.php:
$xmlDoc = new DOMDocument("1.0", "utf-8"); // Parameters here are overwritten anyway when using loadXML(), and are not really relevant
$testXML = file_get_contents("test.xml");
$xmlDoc->loadXML($testXML);
$xmlDoc->encoding = "utf-8";
// Print the contents to a file or in a log function to get the output, using $xmlDoc->saveXML()
Output:
<?xml version="1.0" encoding="utf-8"?>
<test>
<hello>hi</hello>
<field>ø</field>
</test>
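The two situations above can be reproduced without a separate file. This is only a sketch: the XML is inlined as a string, and the ø is written as the escape "\u{F8}" so the example does not depend on the encoding of the script file.

```php
<?php
$doc = new DOMDocument('1.0', 'utf-8');

// No XML declaration in the loaded string, so the constructor's
// encoding is discarded by loadXML().
$doc->loadXML("<test><hello>hi</hello><field>\u{F8}</field></test>");
$before = $doc->saveXML();

// Setting the encoding AFTER loading restores it in the declaration.
$doc->encoding = 'utf-8';
$after = $doc->saveXML();

echo $before, $after;
```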
earth at anonymous dot com, the preserveWhiteSpace property needs to be set to false for formatOutput to work properly, for some reason.
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->loadXML($xmlStr);
...
$element->appendChild(...);
...
$dom->formatOutput = true;
$xmlStr = $dom->saveXML();
echo $xmlStr;
This would format the output nicely.
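A self-contained version of the snippet above might look like the following sketch. The input string and the element name added are made up for illustration.

```php
<?php
$xmlStr = "<root>\n    <item>one</item>\n</root>";

$dom = new DOMDocument();
// Must be set before loadXML() so whitespace-only text nodes are dropped.
$dom->preserveWhiteSpace = false;
$dom->loadXML($xmlStr);

// Add a new element after loading.
$dom->documentElement->appendChild($dom->createElement('added'));

$dom->formatOutput = true;
$out = $dom->saveXML();
echo $out;
```

With preserveWhiteSpace left at its default of true, the original indentation would survive as text nodes, and the appended element would typically end up on the same line as the closing tag.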
The documentation states that loadXML can be called statically, but this is misleading. This feature seems to be a special-case hack, and its use seems to be discouraged according to http://bugs.php.net/bug.php?id=41398. Calling the method statically will fail with an error if the code runs with E_STRICT error reporting enabled.
The documentation should be changed to make it clear that a static call is against recommended practice and won't work with E_STRICT.
Note that loadXML crops off leading and trailing whitespace and line breaks.
When using loadXML and appendChild to add a chunk of XML to an existing document, you may want to force a line break between the end of the XML chunk and the next line (usually a closing tag) in the output file:
$childDocument = new DOMDocument;
$childDocument->preserveWhiteSpace = true;
$childDocument->loadXML(..XML-Chunk..);
$mNewNode = $mainDocument->importNode($childDocument->documentElement, true);
$ParentNode->appendChild($mNewNode);
$ParentNode->appendChild($mainDocument->createTextNode("\n "));
Although it is said that DOM should not be used to make 'pretty' XML output, I struggled with this to get something that was readable for testing. Another solution is to use createDocumentFragment()->appendXML(..XML-Chunk..) instead, which seems not to trim off line breaks like DOMDocument->loadXML() does.
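The fragment-based alternative mentioned above could be sketched like this. The chunk contents are made up for illustration.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root/>');

// Parse the chunk in the context of the target document, so no
// importNode() call is needed.
$fragment = $doc->createDocumentFragment();
$fragment->appendXML("<a>1</a>\n<b>2</b>\n");

// Appending a fragment moves its children into the parent.
$doc->documentElement->appendChild($fragment);

$out = $doc->saveXML();
echo $out;
```

Unlike loadXML(), appendXML() accepts a chunk with several top-level elements, which is why the sketch can add both elements in one call.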
For some reason, when you set DOMDocument's property 'recover' to true, using '@' to mask errors thrown by loadXML() won't work.
Here's my workaround:
function maskErrors() {}
set_error_handler('maskErrors');
$dom->loadXML($xml);
restore_error_handler();
You could also simply call error_reporting(0); and then set error_reporting back to its original state.
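A further option, sketched below, is to have libxml keep its errors internally with libxml_use_internal_errors(). Nothing is printed during parsing, and the problems can still be inspected afterwards. The malformed input string is made up for illustration.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->recover = true; // try to build a tree from malformed input
$dom->loadXML('<root><a>text</root>');

// Inspect what went wrong, then reset libxml's error state.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

printf("%d problem(s) reported\n", count($errors));
```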
When using loadXML() to parse a string that contains entity references (e.g., &nbsp;), be sure that those entity references are properly declared through the use of a DOCTYPE declaration; otherwise, loadXML() will not be able to interpret the string.
Example:
<?php
$str = <<<XML
<?xml version="1.0" encoding="iso-8859-1"?>
<div>This&nbsp;is a non-breaking space.</div>
XML;

$dd1 = new DOMDocument();
$dd1->loadXML($str);
echo $dd1->saveXML();
?>
Given the above code, PHP will issue a Warning about the entity 'nbsp' not being properly declared. Also, the call to saveXML() will return nothing but a trimmed-down version of the original XML declaration; everything else is gone, all because of the undeclared entity.
Instead, explicitly declare the entity first:
<?php
$str = <<<XML
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE root [
<!ENTITY nbsp "&#160;">
]>
<div>This&nbsp;is a non-breaking space.</div>
XML;

$dd2 = new DOMDocument();
$dd2->loadXML($str);
echo $dd2->saveXML();
?>
Since the 'nbsp' entity is defined in the DOCTYPE, PHP no longer issues that Warning; the string is now well-formed, and loadXML() understands it perfectly.
You can also use references to external DTDs in the same way (e.g., <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">), which is particularly important if you need to do this for many different documents with many different possible entities.
Also, as a side note, entity references created by createEntityReference() do not need this kind of explicit declaration.
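To illustrate the side note about createEntityReference(), here is a small sketch that builds the same div through the DOM API instead of parsing a string. No DOCTYPE is declared; the entity reference node is simply serialized by name.

```php
<?php
$doc = new DOMDocument('1.0', 'iso-8859-1');
$div = $doc->createElement('div');
$doc->appendChild($div);

// Text, then an entity reference node, then more text.
$div->appendChild($doc->createTextNode('This'));
$div->appendChild($doc->createEntityReference('nbsp'));
$div->appendChild($doc->createTextNode('is a non-breaking space.'));

$out = $doc->saveXML();
echo $out;
```

Note that the resulting string still contains an undeclared entity, so feeding it back into loadXML() would run into the same warning described above.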
Instead of doing this:
<?php
$str = <<<XML
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE root [
<!ENTITY nbsp "&#160;">
]>
<div>This&nbsp;is a non-breaking space.</div>
XML;

$dd2 = new DOMDocument();
$dd2->loadXML($str);
echo $dd2->saveXML();
?>
simply use loadHTML() rather than loadXML().
While loadXML() expects its input to have a leading XML declaration to deduce the encoding used, there's no such concept in (non-XML) HTML documents. Thus, the libxml library underlying the DOM functions peeks at the <META> tags to figure out the encoding used.
See http://xmlsoft.org/encoding.html.
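A minimal sketch of the loadHTML() approach: the HTML parser already knows the standard HTML entities, so no DOCTYPE with entity declarations is needed. The check for U+00A0 at the end is only there to show that the entity was resolved.

```php
<?php
$dom = new DOMDocument();
// HTML parsing resolves &nbsp; without any entity declaration.
$dom->loadHTML('<div>This&nbsp;is a non-breaking space.</div>');

$div = $dom->getElementsByTagName('div')->item(0);

// The entity becomes a real no-break space (U+00A0) in the DOM.
$hasNbsp = strpos($div->textContent, "\u{00A0}") !== false;
var_dump($hasNbsp);
```

Keep in mind that loadHTML() wraps a fragment like this in html and body elements, so the resulting tree differs from what loadXML() would produce.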