xml_set_character_data_handler

(PHP 4, PHP 5, PHP 7, PHP 8)

xml_set_character_data_handler — Set up character data handler

Description

xml_set_character_data_handler(XMLParser $parser, callable|string|null $handler): true

Sets the character data handler function for the XML parser parser.

Parameters

parser

The XML parser.

handler

If null is passed, the handler is reset to its default state.

Warning

An empty string will also reset the handler, however this is deprecated as of PHP 8.4.0.

If handler is a callable, the callable is set as the handler.

If handler is a string, it can be the name of a method of an object set with xml_set_object().

Warning

This is deprecated as of PHP 8.4.0.

Warning

As of PHP 8.4.0, the callable is checked to be valid while setting the handler, not when it is called. This means that xml_set_object() must be called prior to setting a method string as the callback. However, as this behaviour is also deprecated as of PHP 8.4.0, using a proper callable for the method is recommended instead.
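A minimal sketch of the recommended form, assuming PHP 8.0 or later: the method is passed as an array callable, so neither xml_set_object() nor a method-name string is needed. The class TextCollector and its method onCharacterData are hypothetical names chosen for illustration only.

```php
<?php
// Hypothetical collector class; the class and method names are illustrative.
final class TextCollector
{
    public string $text = '';

    public function onCharacterData(XMLParser $parser, string $data): void
    {
        // Append every chunk the parser hands over.
        $this->text .= $data;
    }
}

$collector = new TextCollector();
$parser = xml_parser_create();

// A proper callable: [object, method name]. No xml_set_object() call is involved.
xml_set_character_data_handler($parser, [$collector, 'onCharacterData']);

xml_parse($parser, '<greeting>Hello, world</greeting>', true);

echo $collector->text, PHP_EOL;
```

On PHP 8.1 and later the first-class callable syntax $collector->onCharacterData(...) should work equally well in place of the array form.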

The signature of the handler must be:

handler(XMLParser $parser, string $data): void

parser: The XML parser calling the handler.
data: Character data as a string.

The character data handler is called for every piece of text in the XML document. It can be called multiple times for a single fragment of text (e.g. for non-ASCII strings), so a handler should not assume that one call delivers the complete text of an element.
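Because the text of one element may arrive in several calls, a common pattern is to append each chunk to a buffer and read the buffer only when the element ends. The following is a minimal sketch of that pattern, assuming PHP 8.0 or later and elements without nested children; the variable names $buffer and $values are illustrative.

```php
<?php
$buffer = '';   // text collected for the element currently being parsed
$values = [];   // completed text, keyed by element name

$parser = xml_parser_create();
// Keep element names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function (XMLParser $p, string $name, array $attrs) use (&$buffer): void {
        $buffer = ''; // a new element starts: reset the buffer
    },
    function (XMLParser $p, string $name) use (&$buffer, &$values): void {
        $values[$name] = $buffer; // element is complete: the buffer holds all of its text
        $buffer = '';
    }
);

xml_set_character_data_handler(
    $parser,
    function (XMLParser $p, string $data) use (&$buffer): void {
        $buffer .= $data; // may run several times for one text node
    }
);

xml_parse($parser, '<title>Caf&#233; &amp; bar</title>', true);

echo $values['title'], PHP_EOL;
```

The result does not depend on where the parser happens to split the text, since the pieces are only joined, never interpreted individually.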

Return Values

Always returns true.

Changelog

Version	Description
8.4.0	Passing a non-callable string to `handler` is now deprecated, use a proper callable for methods, or `null` to reset the handler.
8.4.0	The validity of `handler` as a callable is now checked when setting the handler instead of checking when calling it.
8.0.0	`parser` expects an XMLParser instance now; previously, a valid `xml` resource was expected.


User Contributed Notes 9 notes

flobee

20 years ago

Re. the Philippe Marc and caruna_gadde examples:

I found out that the xml_set_character_data_handler callback function can be called more than once for the same element, in particular even when the content is just a few chars long (this happened on Windows).

So a check like the following gives you the full answer, and it may be needed for long strings too.
e.g.:
<?php
xml_set_character_data_handler($this->parser, "cdata");
// ...
function cdata($parser, $cdata) {
    // ...
    if (isset($this->data[$this->currentItem][$this->currentField])) {
        // a previous chunk exists for this field: append to it
        $this->data[$this->currentItem][$this->currentField] .= $cdata;
    } else {
        $this->data[$this->currentItem][$this->currentField] = $cdata;
    }
}
?>

unspammable-iain at iaindooley dot com

20 years ago

re: jasson at omegavortex dot com below, another way to deal with whitespace issues is:

        function charData($parser,$data)
        {
            $char_data = trim($data);

            if($char_data)
                $char_data = preg_replace('/  */',' ',$data);

            $this->cdata .= $char_data;
        }

This means that:

    <p>here is my text <a href="something">my text</a> 
    and here is some more after some spaces at the
    beginning of the line</p>

comes out properly. You could do further replacements if you want to deal with tabs in your files. I only ever use spaces. If you only use trim() then you would lose the space before the <a> tag above, but trim() is a good way to check for completely empty char data; then just replace more than one space with a single space. This will preserve a single space at the beginning and end of the cdata.
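One way to extend this idea to tabs and newlines is to collapse every run of whitespace rather than only runs of spaces. This is a sketch, not part of the note above; the function name normalizeCharData is hypothetical.

```php
<?php
// Hypothetical helper: collapse whitespace runs in one chunk of character data.
function normalizeCharData(string $data): string
{
    // Whitespace-only chunks (e.g. indentation between tags) are dropped entirely.
    if (trim($data) === '') {
        return '';
    }

    // Replace each run of spaces, tabs or newlines with a single space.
    // Leading and trailing runs become one space, so word boundaries survive.
    return preg_replace('/\s+/', ' ', $data) ?? $data;
}

echo '[', normalizeCharData("here is  my\ttext \n"), ']', PHP_EOL;
```

As with the original approach, a chunk consisting only of whitespace is discarded, which also removes a lone space between two inline elements; whether that is acceptable depends on the document being parsed.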

jhill at live dot com

17 years ago

To detect that concatenation of data is taking place, you can keep track of whether the last function call was to the data processing function.
e.g. using the $this->inside_data variable below:
<?php
xml_set_element_handler($this->parser, "start_tag", "end_tag");
xml_set_character_data_handler($this->parser, "contents");

protected function contents($parser, $data)
{
    switch ($this->current_tag) {
        case "name":
            if ($this->inside_data)
                $this->name .= $data; // need to concatenate data
            else
                $this->name = $data;
            break;
        ...
    }
    $this->inside_data = true;
}

protected function start_tag($parser, $name)
{
    $this->current_tag = $name;
    $this->inside_data = false;
}

protected function end_tag() {
    $this->current_tag = '';
    $this->inside_data = false;
}
?>

ben at removethis emediastudios dotcom

20 years ago

I too love the undocumented "splitting" functionality :-p.

Rather than concatenating the data based on whether or not the current tag name has changed from the previous tag name, I suggest always concatenating like the following, with the $catData variable being unset in the endElement function:
<?php

function endElement($parser, $data) {
  global $catData;

  // Because we are at an element end we know any splitting is finished
  unset($GLOBALS['catData']);
}

function characterData($parser, $data) {
  global $catData;

  // Concatenate data in case splitting is taking place
  $catData .= $data;

}

?>
This got me around a problem with data like the following where, because characterData is not called for empty tags, the previous and current tag names were the same even though splitting was not taking place.

<companydept>
<companydeptID></companydeptID>
<companyID>1</companyID>
<companydeptName></companydeptName>
</companydept>
<companydept>
<companydeptID></companydeptID>
<companyID>2</companyID>
<companydeptName></companydeptName>
</companydept>
<companydept>
<companydeptID></companydeptID>
<companyID>3</companyID>
<companydeptName></companydeptName>
</companydept>

yarouc at email dot cz

20 years ago

It would be nice if someone could complete the documentation of this function. I think that the "splitting" behaviour should (at least) be mentioned within the documentation, if not explained (please!). I'm not quite sure whether the cut comes after each 1024 bytes/chars of data.

My experience looks as follows:
[xmlFile]
...
    <label>slo|?ca</label>
    <comment>comment|?&#345; slo?cy</comment>
...
[/xmlFile]
(Places where the character data got split are marked with pipes. Plus there was latin small letter 'r' with caron instead of &#345;.)

Since the splitting is not mentioned in the documentation, one could assume that it is a bug, especially when you work with UTF-8 and the cuts come right before some special characters.
(Should the concatenating of $cData be considered to be the proper & 'final' way of processing character data?)

Also I'd suggest adding another line in "Description" when a function has an alternate usage (instead of hiding it within the "Note" :o); in this particular case I'd prefer this:

Description:
bool xml_set_character_data_handler ( resource parser, callback handler )
bool xml_set_character_data_handler ( resource parser, object reference, method name )

... there are dozens of functions, of course, where the documentation works this way (I mean not mentioning the alternate usage in the "Description" part).

Have a nice day
  Yarouc

Philippe Marc

21 years ago

How to override the 1024 characters limitation of xml_set_character_data_handler.
Took me some time to find out how to deal with that!

When calling a basic XML parser: 
$parseurXML = xml_parser_create();
xml_set_element_handler($parseurXML, "opentagfunction", "closetagfunction");
xml_set_character_data_handler($parseurXML, "textfunction");

The textfunction only receives 1024 characters at once, even if the text is 4000 characters long. In fact, the parser seems to split the data in pieces of 1024 characters. The way to handle that is to concatenate them.

example:
If you have an XML tag called UNIPROT_ABSTRACT containing a 4000 characters protein description:
function textfunction($parser, $text)
    {
     // both variables live in the global scope and are shared with the tag handlers
     global $last_tag_read, $uniprot;
     if ($last_tag_read=='UNIPROT_ABSTRACT') $uniprot.=$text;
    }
The function is called 4 times and receives 1024+1024+1024+928 characters that will be concatenated in the $uniprot variable using the ".=" concatenation operator.

Easy to do, but not documented!

Brad dot Harrison at griffith dot edu dot au

21 years ago

If you need to trim the white space for HTML code and don't rely on spaces for formatting text (if you are then it is time to use Style Sheets) then this code will come in very useful.

 $data=eregi_replace(">"."[[:space:]]+"."<","><",$data);
 $data=eregi_replace(">"."[[:space:]]+",">",$data);
 $data=eregi_replace("[[:space:]]+"."<","<",$data);
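The ereg family of functions was removed in PHP 7.0, so the three calls above no longer run on current versions. A possible equivalent using preg_replace() is sketched below; the sample markup in $data is made up for illustration.

```php
<?php
// Illustrative input: markup with indentation and padding around the text.
$data = "<ul>\n  <li> one </li>\n  <li>two</li>\n</ul>";

// Same three steps as the eregi_replace() calls, with \s standing in for [[:space:]].
$data = preg_replace('/>\s+</', '><', $data); // whitespace between two tags
$data = preg_replace('/>\s+/', '>', $data);   // whitespace right after a tag
$data = preg_replace('/\s+</', '<', $data);   // whitespace right before a tag

echo $data, PHP_EOL; // prints <ul><li>one</li><li>two</li></ul>
```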

dan30odd08 at hotmail dot com

22 years ago

I just want to mention that I ran into a problem when parsing an XML file using the character data handler. If you happen to have a string which is also the name of an internal PHP function stored in your XML data file and you want to output it as a string, the parser doesn't seem to recognize it.
   I found a way around this problem. In my case I was storing a string with the value read. This would not allow me to output the data, so to work around this problem I added a backslash for every character in the data element.

   e.g.      <xml>
    from    <element>read</element>
    to       <element>////read</element>

I don't know if anyone has run into this problem or not, but I thought I would put it here in case someone is getting stuck with this.

ken at positive-edgue dot com

23 years ago

The handler function is called several times when the parser processes character data. It doesn't return the entire string as it suggests. There are special cases that will always force the parser to stop scanning and call the character data handler. This is when:

- The parser runs into an entity reference, such as &amp; (&) or &apos; (')
- The parser finishes parsing an entity
- The parser runs into the new-line character (\n)
- The parser runs into a series of tab characters (\t)

And perhaps others.

For instance, if we have this xml content:

<mytag name="Ken Egervari" title="Chief Technology Officer">
    Ken has been Positive Edge&apos;s Chief Technology Officer for 2 years.
</mytag>

The parser will call the character data handler 6 times.  This is what will happen:

1    \n
2    \t
3    Ken has been Positive Edge
4    '
5    s Chief Technology Officer for 2 years.
6    \n

I hope that helps people out.
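To see how a particular installation splits the text, the chunks can be recorded and printed in escaped form. This is a minimal sketch assuming PHP 8.0 or later; the exact number of calls and the split points are likely to differ between the underlying XML libraries and their versions, so no specific output is claimed here. The sample input and the variable name $chunks are illustrative.

```php
<?php
$chunks = [];

$parser = xml_parser_create();
xml_set_character_data_handler(
    $parser,
    function (XMLParser $p, string $data) use (&$chunks): void {
        $chunks[] = $data; // record every piece exactly as it was delivered
    }
);

xml_parse($parser, "<mytag>\n\tKen&apos;s text.\n</mytag>", true);

// Print each piece JSON-encoded so that whitespace-only chunks stay visible.
foreach ($chunks as $i => $chunk) {
    printf("%d  %s\n", $i + 1, json_encode($chunk));
}

// However the text was split, joining the pieces restores the full string.
$text = implode('', $chunks);
```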
