Using SimpleXML To Read & Parse XML

posted by John Owen on July 17th, 2009, in PHP

Before we dive into SimpleXML, let’s first look at XML. XML (Extensible Markup Language) is a flexible text format for creating structured documents. It’s the W3C’s recommended standard for defining data formats and sharing data on the Web.

An XML document comprises elements, attributes, processing instructions, comments, and entities:

Element: Text delimited by an opening and a closing tag. A tag is a name enclosed within angle brackets.

Attribute: A piece of qualifying information for an element. An attribute consists of a name, an equals sign, and an attribute value delimited by either single-quotes or double-quotes.

Processing instruction: The software that is reading an XML document is referred to as a processor. A processing instruction is additional information embedded in the document to inform the processor and possibly change its behaviour.

Comment: An XML comment begins with the characters: less-than, exclamation mark, minus, minus; and ends with the characters: minus, minus, greater-than. Any text within a comment is intended for a human reader and is ignored by the processor.

Entity: An entity is a compact form that represents other text. Entities are used to specify problematic characters and to include slabs of text defined elsewhere. An entity reference consists of an ampersand, a name, and a semi-colon.
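
To make these five building blocks concrete, here is a minimal sketch in PHP. The document and the names in it (note, priority, style.css) are made up for illustration. It uses simplexml_load_string(), which parses XML held in a string rather than in a file.

```php
<?php
// A made-up document containing one of each building block.
$source = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<!-- A comment: meant for human readers only -->
<note priority="high">Fish &amp; chips</note>
XML;

$note = simplexml_load_string($source);

print $note->getName() . "\n";    // element name: note
print $note['priority'] . "\n";   // attribute value: high
print $note . "\n";               // element text, entity decoded: Fish & chips
```

The processing instruction and the comment are accepted by the parser but are not exposed as child elements, which is why only the element, its attribute and its text are read here.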

Before we start using SimpleXML we need an XML file. Here’s the example we’ll be using for the first part of the tutorial.

XML Feed

<?xml version="1.0" encoding="utf-8" ?>
<people title="Names">
	<name1>Jim</name1>
	<name2>Bob</name2>
	<name3>Sam</name3>
</people>

Copy this and create a new XML file from it. I named it names.xml.

Step 1 – Loading The File

We need to load this file into PHP so use this piece of code:

$xml = simplexml_load_file('names.xml');
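
simplexml_load_file() returns false when the file is missing or is not well-formed XML, so it may be worth checking the result before using it. A minimal sketch follows; the helper name loadXmlOrNull is made up for this example and is not part of SimpleXML.

```php
<?php
// Hypothetical helper: returns the parsed document, or null after
// printing whatever errors libxml reported.
function loadXmlOrNull(string $path): ?SimpleXMLElement
{
    // Collect parser errors ourselves instead of letting PHP raise warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = is_readable($path) ? simplexml_load_file($path) : false;

    if ($xml === false) {
        print "Could not load $path\n";
        foreach (libxml_get_errors() as $error) {
            print '  line ' . $error->line . ': ' . trim($error->message) . "\n";
        }
        libxml_clear_errors();
    }

    // Restore the previous error-handling mode.
    libxml_use_internal_errors($previous);
    return $xml === false ? null : $xml;
}

$xml = loadXmlOrNull('names.xml');
if ($xml !== null) {
    print $xml->name1;
}
```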

Step 2 – Basic Reading

An XML feed is made of elements. Each element stores a value. For example:

<name1>Jim</name1>

To start I will show you how to just retrieve the value of one element. I will print the value of name1 to the page.

# You can also use "echo"; it's the same thing.
# Displays the value of name1 which is "Jim"
print $xml->name1;

If you look at the XML file, you’ll see that the “people” element has an attribute called “title” with the value of “Names”. Use this piece of code to retrieve an attribute.

# You read an attribute just like you would read an array element: $array['key']
print $xml['title'];
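
One detail worth knowing: $xml->name1 and $xml['title'] are SimpleXMLElement objects, not strings. print converts them for you, but when you store or compare values it is safer to cast them first. A short sketch, with the same data supplied inline through simplexml_load_string() so that it runs without names.xml:

```php
<?php
$xml = simplexml_load_string(
    '<people title="Names"><name1>Jim</name1><name2>Bob</name2><name3>Sam</name3></people>'
);

// Cast to get plain PHP strings.
$name  = (string) $xml->name1;
$title = (string) $xml['title'];

var_dump($xml->name1 instanceof SimpleXMLElement); // the uncast value is an object
var_dump($name === 'Jim');                         // the cast value is a real string
var_dump(isset($xml->name4));                      // isset() tells you whether an element exists
```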

Here’s the full code so far. I added an extra line to make the output easier to understand. You can remove it if you want.

# Load the names.xml file
$xml = simplexml_load_file('names.xml');
# Displays the value of <name1>, which is "Jim" (you can also use "echo"; it's the same thing)
print $xml->name1;
# Separates the outputs (easier to read & understand)
print "<p></p>";
# You read an attribute just like you would read an array element: $array['key']
print $xml['title'];

The output would look like this:

Jim
 
Names
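
Because the child elements in this file all have different names (name1, name2, name3), reading them one at a time does not scale well. One possible approach, sketched below with the same data inline, is to loop over children(), which returns every child element whatever its name. The $names array is only there to show how the values could be collected.

```php
<?php
$xml = simplexml_load_string(
    '<people title="Names"><name1>Jim</name1><name2>Bob</name2><name3>Sam</name3></people>'
);

$names = [];
foreach ($xml->children() as $child) {
    // getName() gives the tag name; the string cast gives its text.
    $names[$child->getName()] = (string) $child;
    print $child->getName() . ": " . $child . "<br/>";
}
```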

Not much at the moment but this is just the basics. Now onto reading multiple items!

Step 3 – Parsing Multiple Items

Grab this XML data:

<?xml version="1.0" encoding="utf-8" ?>
<people title="People">
 
	<item id="1">
		<name>
			<first>Bob</first>
			<last>Turner</last>
		</name>
		<age>42</age>
	</item>
 
	<item id="2">
		<name>
			<first>Andrew</first>
			<last>Reed</last>
		</name>
		<age>11</age>
	</item>
 
	<item id="3">
		<name>
			<first>Martin</first>
			<last>Surf</last>
		</name>
		<age>21</age>
	</item>
 
	<item id="4">
		<name>
			<first>Joan</first>
			<last>Cliffe</last>
		</name>
		<age>37</age>
	</item>
 
	<item id="5">
		<name>
			<first>Sue</first>
			<last>Beach</last>
		</name>
		<age>45</age>
	</item>
 
	<item id="6">
		<name>
			<first>Gabriel</first>
			<last>Owen</last>
		</name>
		<age>5</age>
	</item>
 
	<item id="7">
		<name>
			<first>Jack</first>
			<last>Truscott</last>
		</name>
		<age>17</age>
	</item>
 
	<item id="8">
		<name>
			<first>Jim</first>
			<last>Fourgorn</last>
		</name>
		<age>14</age>
	</item>
 
	<item id="9">
		<name>
			<first>Mike</first>
			<last>Snider</last>
		</name>
		<age>13</age>
	</item>
 
</people>

I named my XML data “people.xml”. We will now display all nine items with a few short lines of code. I will go through the code step by step.

# Load the people.xml file
$xml = simplexml_load_file('people.xml');
# Start a foreach loop. Translation: for every <item> in the XML file, put it into the variable $item.
# $item can then be used to read all the elements inside that <item>.
foreach($xml->item as $item) {
	# These three prints display the id attribute of the <item>, the first and last name joined together,
	# and then the age. The <br/> and <p></p> are for spacing out the results.
	print "ID: " . $item['id'] . "<br/>";
	print "Name: " . $item->name->first . " " . $item->name->last . "<br/>";
	print "Age: " . $item->age . "<p></p>";
}

This chunk contains only five actual lines of code, yet it can parse an XML file with hundreds of “item” elements. The comments in the code explain it step by step. Here is the output:

ID: 1
Name: Bob Turner
Age: 42
 
ID: 2
Name: Andrew Reed
Age: 11
 
ID: 3
Name: Martin Surf
Age: 21
 
ID: 4
Name: Joan Cliffe
Age: 37
 
ID: 5
Name: Sue Beach
Age: 45
 
ID: 6
Name: Gabriel Owen
Age: 5
 
ID: 7
Name: Jack Truscott
Age: 17
 
ID: 8
Name: Jim Fourgorn
Age: 14
 
ID: 9
Name: Mike Snider
Age: 13
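
If you would rather work with the data than print it straight away, the same loop can collect each item into an ordinary PHP array. This is only a sketch: it uses a shortened copy of people.xml inline so that it runs on its own, and the array keys (id, name, age) are my own choice.

```php
<?php
// Shortened copy of people.xml (first two items only).
$source = <<<'XML'
<people title="People">
    <item id="1"><name><first>Bob</first><last>Turner</last></name><age>42</age></item>
    <item id="2"><name><first>Andrew</first><last>Reed</last></name><age>11</age></item>
</people>
XML;
$xml = simplexml_load_string($source);

$people = [];
foreach ($xml->item as $item) {
    $people[] = [
        'id'   => (int) $item['id'],                                // attribute as an integer
        'name' => $item->name->first . ' ' . $item->name->last,     // joined into one string
        'age'  => (int) $item->age,                                 // element text as an integer
    ];
}

print count($people) . " people loaded\n";
print $people[1]['name'] . "\n";
```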

Step 4 – Filtering

Say we only want to find the item that has the id of 2. This can be achieved easily by adding an IF statement inside the foreach loop. Here’s the updated code:

# Load the people.xml file
$xml = simplexml_load_file('people.xml');
# Start a foreach loop. Translation: for every <item> in the XML file, put it into the variable $item.
# $item can then be used to read all the elements inside that <item>.
foreach($xml->item as $item) {
	if ($item['id'] == 2) {
		# These three prints display the id attribute of the <item>, the first and last name joined together,
		# and then the age. The <br/> and <p></p> are for spacing out the results.
		print "ID: " . $item['id'] . "<br/>";
		print "Name: " . $item->name->first . " " . $item->name->last . "<br/>";
		print "Age: " . $item->age . "<p></p>";
	}
}

Explanation: The loop visits each item in turn. The first item has an id of 1, so the condition is false, nothing is printed, and the loop moves on to the next item. That item has an id of 2, which makes the IF statement’s condition true, so the code inside the IF statement runs. The remaining items fail the test and are skipped.
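
As an alternative to looping and testing every item, SimpleXML also offers an xpath() method that can select matching elements directly. This is a different technique from the IF statement above, shown only as a sketch, with a shortened copy of people.xml inline so that it runs on its own.

```php
<?php
// Shortened copy of people.xml (first three items only).
$source = <<<'XML'
<people title="People">
    <item id="1"><name><first>Bob</first><last>Turner</last></name><age>42</age></item>
    <item id="2"><name><first>Andrew</first><last>Reed</last></name><age>11</age></item>
    <item id="3"><name><first>Martin</first><last>Surf</last></name><age>21</age></item>
</people>
XML;
$xml = simplexml_load_string($source);

// Select every <item> whose id attribute equals "2".
// xpath() returns an array of matching SimpleXMLElement objects.
$matches = $xml->xpath('//item[@id="2"]');

foreach ($matches as $item) {
    print "ID: " . $item['id'] . "<br/>";
    print "Name: " . $item->name->first . " " . $item->name->last . "<br/>";
    print "Age: " . $item->age . "<p></p>";
}
```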

If you want to filter but display more than one item then change the condition to this:

($item['id'] == 2 OR $item['id'] == 3)
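
When the list of wanted ids grows, chaining comparisons gets unwieldy. One option is to keep the ids in an array and test each item with in_array(). A sketch, again with shortened data inline; the variable names $wanted and $found are made up for the example.

```php
<?php
// Shortened copy of people.xml (first three items only).
$source = <<<'XML'
<people title="People">
    <item id="1"><name><first>Bob</first><last>Turner</last></name><age>42</age></item>
    <item id="2"><name><first>Andrew</first><last>Reed</last></name><age>11</age></item>
    <item id="3"><name><first>Martin</first><last>Surf</last></name><age>21</age></item>
</people>
XML;
$xml = simplexml_load_string($source);

$wanted = [2, 3];   // the ids we want to display
$found  = [];

foreach ($xml->item as $item) {
    // Cast the attribute to an integer so the strict in_array() check can match.
    if (in_array((int) $item['id'], $wanted, true)) {
        $found[] = (string) $item->name->first;
        print "ID: " . $item['id'] . "<br/>";
        print "Name: " . $item->name->first . " " . $item->name->last . "<p></p>";
    }
}
```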

Let’s say we only want to find the people who are between the ages of 10 and 21. To do this we will use two IF statements. Since we’re comparing an element’s value rather than an attribute, we first copy the age into a variable, cast to an integer so that the comparisons are numeric. Here’s the full code:

# Load the people.xml file
$xml = simplexml_load_file('people.xml');
# Start a foreach loop. Translation: for every <item> in the XML file, put it into the variable $item.
# $item can then be used to read all the elements inside that <item>.
foreach($xml->item as $item) {
	# Read the <age> element as an integer
	$age = (int) $item->age;
	if ($age >= 10) {
		if ($age <= 21) {
			# These three prints display the id attribute of the <item>, the first and last name joined together,
			# and then the age. The <br/> and <p></p> are for spacing out the results.
			print "ID: " . $item['id'] . "<br/>";
			print "Name: " . $item->name->first . " " . $item->name->last . "<br/>";
			print "Age: " . $item->age . "<p></p>";
		}
	}
}

The $age variable is set to the age in the item element. The first IF statement checks whether the age is greater than or equal to 10. If it is, the second IF statement checks whether the age is less than or equal to 21. If both are true, the item is displayed.
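
The two nested IF statements can also be written as a single condition joined with &&. The sketch below wraps that test in a small function; the name isBetween is hypothetical, and the data is a shortened copy of people.xml so that the example runs on its own.

```php
<?php
// Hypothetical helper: true when $value lies between $min and $max, inclusive.
function isBetween(int $value, int $min, int $max): bool
{
    return $value >= $min && $value <= $max;
}

// Shortened copy of people.xml.
$source = <<<'XML'
<people title="People">
    <item id="1"><name><first>Bob</first><last>Turner</last></name><age>42</age></item>
    <item id="2"><name><first>Andrew</first><last>Reed</last></name><age>11</age></item>
    <item id="3"><name><first>Martin</first><last>Surf</last></name><age>21</age></item>
    <item id="6"><name><first>Gabriel</first><last>Owen</last></name><age>5</age></item>
</people>
XML;
$xml = simplexml_load_string($source);

$matches = [];
foreach ($xml->item as $item) {
    if (isBetween((int) $item->age, 10, 21)) {
        $matches[] = (int) $item['id'];
        print "ID: " . $item['id'] . "<br/>";
        print "Name: " . $item->name->first . " " . $item->name->last . "<br/>";
        print "Age: " . $item->age . "<p></p>";
    }
}
```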

Output:

ID: 2
Name: Andrew Reed
Age: 11
 
ID: 3
Name: Martin Surf
Age: 21
 
ID: 7
Name: Jack Truscott
Age: 17
 
ID: 8
Name: Jim Fourgorn
Age: 14
 
ID: 9
Name: Mike Snider
Age: 13

Step 5 – Reading A Standard RSS Feed

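Here is a simplified RSS feed. Note that its <item> elements sit directly under <rss>. In most real RSS feeds they are nested inside <channel>, in which case you would loop over $xml->channel->item instead of $xml->item.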

<?xml version="1.0" encoding="utf-8" ?>
<rss version="0.91">
<channel>
    <title>Rarest Animals</title>
    <link>http://www.animalinfo.org/rarest.htm</link>
    <description>A list of the world's rarest mammals.</description>
</channel>
 
<item>
    <title>The Addax</title>
    <link>http://www.animalcorner.co.uk/wildlife/addax.html</link>
    <description>The Addax (Addax nasomaculatus), sometimes called the 'screw horn antelope', 
    	because of its twisted horns, is a large, desert dwelling member of the antelope family, 
    	closely related to the Oryx.</description>
</item>
<item>
    <title>Ethiopian Wolf</title>
    <link>http://en.wikipedia.org/wiki/Ethiopian_Wolf</link>
    <description>
    The Ethiopian wolf (Canis simensis) is a carnivorous mammal of the family Canidae. 
    It is also known as the Abyssinian wolf, Abyssinian fox, red jackal, red fox, 
    Simien fox, or Simien jackal among other names.</description>
</item>
</rss>

Let’s take a look at the code to read this and display it. It assumes the feed has been saved as feed.xml.

#Load Feed
$xml = simplexml_load_file('feed.xml');
#Display Feed title and description with a link to the page
print '<a href="' . $xml->channel->link . '"><h3>' . $xml->channel->title . '</h3></a>';
print $xml->channel->description;
print '<br/><br/>';
#Display the items
foreach($xml->item as $item){
	# Display title with link to page
	print '<a href="' . $item->link . '"><h4>' . $item->title . '</h4></a>';
	# display description
	print $item->description;
	# break up the items
	print '<p></p>';
}

This will output:

Rarest Animals
A list of the world's rarest mammals.
 
The Addax
The Addax (Addax nasomaculatus), sometimes called the 'screw horn antelope', because of its twisted horns, is a large, desert dwelling member of the antelope family, closely related to the Oryx.
 
Ethiopian Wolf
The Ethiopian wolf (Canis simensis) is a carnivorous mammal of the family Canidae. It is also known as the Abyssinian wolf, Abyssinian fox, red jackal, red fox, Simien fox, or Simien jackal among other names.

This code is essentially the same as the foreach loop in Step 3, just with different element names: for every item in the XML file, print its elements.
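
Two things are worth guarding against with feeds you do not control: text that contains HTML special characters, and feeds that nest their <item> elements inside <channel>. The sketch below handles both. The function name renderFeed is made up for illustration, and the inline feed is a shortened copy of the one above with a shortened description.

```php
<?php
// Hypothetical helper: builds the HTML for a feed and returns it as a string.
function renderFeed(SimpleXMLElement $rss): string
{
    // Escape text before placing it inside HTML.
    $escape = function ($value) {
        return htmlspecialchars(trim((string) $value), ENT_QUOTES, 'UTF-8');
    };

    // Use the items under <channel> if there are any, otherwise those under <rss>.
    $items = isset($rss->channel->item) ? $rss->channel->item : $rss->item;

    $html = '<h3><a href="' . $escape($rss->channel->link) . '">'
          . $escape($rss->channel->title) . '</a></h3>'
          . '<p>' . $escape($rss->channel->description) . '</p>';

    foreach ($items as $item) {
        $html .= '<h4><a href="' . $escape($item->link) . '">'
               . $escape($item->title) . '</a></h4>'
               . '<p>' . $escape($item->description) . '</p>';
    }

    return $html;
}

// Shortened copy of the feed (one item, shortened description).
$source = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<rss version="0.91">
<channel>
    <title>Rarest Animals</title>
    <link>http://www.animalinfo.org/rarest.htm</link>
    <description>A list of the world's rarest mammals.</description>
</channel>
<item>
    <title>The Addax</title>
    <link>http://www.animalcorner.co.uk/wildlife/addax.html</link>
    <description>A large, desert dwelling member of the antelope family.</description>
</item>
</rss>
XML;

$feed = simplexml_load_string($source);
print renderFeed($feed);
```

Returning a string instead of printing inside the function makes the output easy to reuse, for example to cache it or to place it inside a larger page.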

Finish

If you have any problems or questions just post a comment! Thanks, don’t forget to follow us on twitter!
