With all the fancy XML parsers, serializers and deserializers around, it is easy to forget that PHP has built-in functions for defining highly customized XML parsers. You can use the xml_* functions to assemble an XML parser step by step.

Setting up an XML parser - the procedural way

First, you create a new parser resource using

$encoding = 'UTF-8';
$parser = xml_parser_create($encoding);

The function takes a single, optional argument: the target encoding of the parser, which is the encoding of the strings passed to your handler functions.

Next, we need to define functions that are to be called for each tag that is opened, and also one for each end tag. We should then register these functions:

function startElement($parser, $name, array $attributes)
{
}

function endElement($parser, $name)
{
}

xml_set_element_handler($parser, 'startElement', 'endElement');

Optionally, we can define a function for character data, that is, the text between tags (including the contents of CDATA sections):

function cdata($parser, $cdata)
{
}

xml_set_character_data_handler($parser, 'cdata');
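
The parser may deliver one run of text in several calls, so a character data handler usually appends to a buffer instead of assuming it sees the whole text at once. A minimal sketch of that idea, assuming a made-up sample document and a hypothetical $buffer variable (it already uses xml_parse(), which is introduced next):

```php
<?php
// Hypothetical sample input: plain text followed by a CDATA section.
$xmlData = '<msg>Hello <![CDATA[<b>world</b>]]></msg>';

$parser = xml_parser_create('UTF-8');

// $buffer is an illustrative name; it accumulates every piece of text.
$buffer = '';

xml_set_character_data_handler(
    $parser,
    function ($parser, $cdata) use (&$buffer) {
        // May be called more than once for a single run of text.
        $buffer .= $cdata;
    }
);

// true marks this as the final (and only) chunk of data.
xml_parse($parser, $xmlData, true);

echo $buffer, "\n";
```

The handler is registered as a closure here only to keep the sketch short; a named function such as cdata() works the same way.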

Finally, call

$result = xml_parse($parser, $xmlData);

xml_parse() returns 1 on success and 0 on failure. If you want to capture the output of your handler functions, use output buffering (ob_start(), ob_get_contents() and ob_end_clean()). If the return value is 0, you can find out what the problem was by calling

echo xml_error_string(xml_get_error_code($parser));
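
Put together, capturing the handler output and checking for errors could look roughly like this. It is only a sketch: parseToString() is a hypothetical helper name, the handlers simply echo the element names, and case folding is switched off so that names are not upper-cased.

```php
<?php
// Hypothetical helper: parses $xml and returns whatever the handlers echoed.
function parseToString(string $xml): string
{
    $parser = xml_parser_create('UTF-8');

    // By default element names are upper-cased; keep them as written.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($parser, $name, array $attributes) {
            echo "open $name\n";
        },
        function ($parser, $name) {
            echo "close $name\n";
        }
    );

    // Buffer everything the handlers print while parsing.
    ob_start();
    $result = xml_parse($parser, $xml, true);
    $output = ob_get_clean();

    if ($result !== 1) {
        throw new RuntimeException(sprintf(
            '%s on line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }

    return $output;
}

echo parseToString('<a><b/></a>');

try {
    parseToString('<a><b></a>'); // malformed: <b> is never closed
} catch (RuntimeException $e) {
    echo 'Parse error: ', $e->getMessage(), "\n";
}
```

Throwing an exception is a design choice of this sketch; returning false or logging the message would work just as well.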

An object-oriented parser

All of this is procedural code. Fortunately, there is a nice solution: we can wrap all the functions in a class. The element and character data handlers become methods of this class, and inside the constructor we set the current instance as the parser's object using

xml_set_object($parser, $this);

An object-oriented parser would look like

class ObjectOrientedXmlParser
{
    private $parser;

    public function __construct($encoding = 'UTF-8')
    {
        $this->parser = xml_parser_create($encoding);

        xml_set_object($this->parser, $this);
        xml_set_element_handler($this->parser, 'startElement', 'endElement');
        xml_set_character_data_handler($this->parser, 'cdata');
    }

    public function parse($data, $final = true)
    {
        return xml_parse($this->parser, $data, $final);
    }

    public function startElement($parser, $name, array $attributes)
    {
        var_dump(func_get_args());
    }

    public function cdata($parser, $cdata)
    {
        var_dump(func_get_args());
    }

    public function endElement($parser, $name)
    {
        var_dump(func_get_args());
    }

    public function __destruct()
    {
        if (is_resource($this->parser)) {
            xml_parser_free($this->parser);
        }
    }
}

As you can see, I've added two extra methods: __destruct() makes sure the parser resource gets freed, and parse() is a wrapper for xml_parse().

To use the parser, all you have to do is execute these statements:

$parser = new ObjectOrientedXmlParser;
$parser->parse($xmlData);
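
The second argument of parse() matters when the document does not arrive in one piece: every chunk but the last is passed with a false value, and only the last one is marked as final. A minimal sketch, assuming a hypothetical ChunkedXmlParser class and a made-up sample document. It registers the handlers as [$this, 'method'] callables instead of calling xml_set_object(), which recent PHP versions (8.4, as far as I know) deprecate:

```php
<?php
// Hypothetical class, used only to illustrate chunked parsing.
class ChunkedXmlParser
{
    private $parser;

    // Element names in the order in which they were opened.
    public $names = [];

    public function __construct($encoding = 'UTF-8')
    {
        $this->parser = xml_parser_create($encoding);

        // Keep element names as written instead of upper-casing them.
        xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, 0);

        xml_set_element_handler(
            $this->parser,
            [$this, 'startElement'],
            [$this, 'endElement']
        );
    }

    public function parse($data, $final = false)
    {
        return xml_parse($this->parser, $data, $final);
    }

    public function startElement($parser, $name, array $attributes)
    {
        $this->names[] = $name;
    }

    public function endElement($parser, $name)
    {
        // Nothing to do in this sketch.
    }
}

// Made-up sample document, split into 16-byte chunks to simulate a stream.
$xmlData = '<items><item id="1"/><item id="2"/></items>';
$chunks = str_split($xmlData, 16);
$last = count($chunks) - 1;

$parser = new ChunkedXmlParser();
foreach ($chunks as $i => $chunk) {
    // Only the last chunk is marked as final.
    if ($parser->parse($chunk, $i === $last) !== 1) {
        throw new RuntimeException('XML parse error');
    }
}

echo implode(', ', $parser->names), "\n";
```

In real code the chunks would typically come from fread() on a file or socket instead of str_split().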

Keeping track of global state

In the procedural style you may need to keep track of some kind of global state, since the startElement and endElement functions are called in a very much "stand-alone" way: no context is available except for the $parser resource itself. This is why many examples in the PHP documentation resort to the bad practice of using the global keyword for variables that are needed inside the functions. Creating the XML parser in an object-oriented fashion makes these global variables unnecessary and therefore makes the parser much more self-contained.
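
To illustrate, here is a small sketch in which the state lives in private properties, so two parsers can run side by side without sharing anything. DepthTracker and its methods are hypothetical names made up for this example:

```php
<?php
// Hypothetical class: tracks nesting depth in properties, not in globals.
class DepthTracker
{
    private $parser;
    private $depth = 0;
    private $maxDepth = 0;

    public function __construct()
    {
        $this->parser = xml_parser_create('UTF-8');
        xml_set_element_handler(
            $this->parser,
            [$this, 'startElement'],
            [$this, 'endElement']
        );
    }

    // Parses a complete document and returns its deepest nesting level.
    public function maxDepthOf($xml)
    {
        xml_parse($this->parser, $xml, true);

        return $this->maxDepth;
    }

    public function startElement($parser, $name, array $attributes)
    {
        $this->depth++;
        $this->maxDepth = max($this->maxDepth, $this->depth);
    }

    public function endElement($parser, $name)
    {
        $this->depth--;
    }
}

// Each instance keeps its own counters.
$a = new DepthTracker();
$b = new DepthTracker();

echo $a->maxDepthOf('<a><b><c/></b></a>'), "\n"; // 3
echo $b->maxDepthOf('<a/>'), "\n";               // 1
```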

Using SplStack to store previous elements

A trick you can use to keep track of your place inside the tree of elements is to use an SplStack.

You build up a stack by adding things to it. You can then look at the things on the stack and, if you like, remove them again. It works like a real-world stack does: the last thing you put on top of it will be the first thing you take from it (LIFO: last in, first out).

$stack = new \SplStack;
$stack->push('html');
$stack->push('body');
$stack->push('p');

foreach ($stack as $name) {
    echo $name . ', ';
}

// output: p, body, html,
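
Besides iterating, you will mostly use top() to peek at the last item and pop() to remove it. A short sketch with the same three items:

```php
<?php
$stack = new SplStack();
$stack->push('html');
$stack->push('body');
$stack->push('p');

echo $stack->top(), "\n"; // p: peek at the top item without removing it
echo $stack->pop(), "\n"; // p: remove the top item and return it
echo $stack->top(), "\n"; // body: the item below is now on top
echo count($stack), "\n"; // 2: number of items left
```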

Using a stack inside the ObjectOrientedXmlParser allows us to keep track of our place in the XML tree:

class ObjectOrientedXmlParser
{
    // ....

    private $stack;

    public function __construct($encoding = 'UTF-8')
    {
        // ...

        $this->stack = new \SplStack;
        $this->stack->push('#root');
    }

    public function startElement($parser, $name, $attributes)
    {
        echo 'Previous tags: ';
        foreach ($this->stack as $previousName) {
            echo $previousName . ', ';
        }
        echo "\n";

        $this->stack->push($name);

        // ...
    }

    public function endElement($parser, $name)
    {
        $this->stack->pop();
    }

    // ...
}

Now, whenever a new element starts, the names of the elements that are currently open (plus the #root marker) are printed, innermost first. Afterwards, the current element name is pushed onto the stack (that is, added on top of it), and endElement() pops it off again when the element closes. Of course, the stack is also available inside the character data handler method.
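
As an example of using the stack inside the character data handler, the following sketch records the text found at each element path. PathTrackingParser and its $texts property are hypothetical names; the handlers are registered as [$this, 'method'] callables, and case folding is disabled so the names keep their original case:

```php
<?php
// Hypothetical class: collects text per element path, e.g. "/html/body/p".
class PathTrackingParser
{
    private $parser;
    private $stack;

    // Maps an element path to the text found directly inside it.
    public $texts = [];

    public function __construct($encoding = 'UTF-8')
    {
        $this->parser = xml_parser_create($encoding);
        xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler(
            $this->parser,
            [$this, 'startElement'],
            [$this, 'endElement']
        );
        xml_set_character_data_handler($this->parser, [$this, 'cdata']);

        $this->stack = new SplStack();
    }

    public function parse($data)
    {
        return xml_parse($this->parser, $data, true);
    }

    public function startElement($parser, $name, array $attributes)
    {
        $this->stack->push($name);
    }

    public function endElement($parser, $name)
    {
        $this->stack->pop();
    }

    public function cdata($parser, $cdata)
    {
        // Iterating an SplStack yields the top item first, so reverse it
        // to get the path from the outermost element inwards.
        $names = array_reverse(iterator_to_array($this->stack, false));
        $path = '/' . implode('/', $names);

        // Append, because text may arrive in several pieces.
        $this->texts[$path] = ($this->texts[$path] ?? '') . $cdata;
    }
}

$parser = new PathTrackingParser();
$parser->parse('<html><body><p>Hi</p></body></html>');

print_r($parser->texts);
```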
