How to Parse XML in Python (10 Methods + Examples)

Expert Network Defense Engineer
Parsing XML is a common task in Python, whether you are working with configuration files, web scraping, or APIs. This guide provides 10 different solutions with code examples, use cases, comparison tables, and FAQs. By the end, you’ll know which method fits your project best.
🔹 What is XML Parsing?
XML (eXtensible Markup Language) is widely used to store and transport data. Parsing XML means reading the XML structure and extracting useful information. In Python, you have multiple ways to achieve this, ranging from built-in libraries to advanced frameworks.
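To make "reading the structure and extracting useful information" concrete, here is a minimal sketch using the standard library's ElementTree (covered in Solution 1). The catalog document and its field names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
xml_data = """
<catalog>
  <book id="b1"><title>Dune</title><price>9.99</price></book>
  <book id="b2"><title>Emma</title><price>4.50</price></book>
</catalog>
"""

# Parsing turns the text into a tree of Element objects.
root = ET.fromstring(xml_data)

books = []
for book in root.findall("book"):
    books.append({
        "id": book.get("id"),                     # attribute value
        "title": book.findtext("title"),          # text of a child element
        "price": float(book.findtext("price")),   # convert text to a number
    })

print(root.tag)
print(books)
```

Each element exposes three things you will usually care about: its tag, its attributes, and its text.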
🔹 Solution 1: Using xml.etree.ElementTree (Standard Library)
python
import xml.etree.ElementTree as ET
xml_data = '''<root><item>Apple</item><item>Banana</item></root>'''
root = ET.fromstring(xml_data)
for child in root:
    print(child.tag, child.text)
✅ Pros: Built-in, fast, no dependencies.
⚠️ Cons: No schema validation, only a subset of XPath, weaker for complex XML.
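One place where "complex XML" tends to trip people up with ElementTree is namespaces. The sketch below assumes a small Atom-like feed (the `ex` namespace URI and the `rank` element are invented) and shows that an unqualified lookup finds nothing until you pass a prefix map.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced document; the "ex" namespace is made up.
xml_data = """
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:ex="http://example.com/ns">
  <entry><title>First</title><ex:rank>1</ex:rank></entry>
  <entry><title>Second</title><ex:rank>2</ex:rank></entry>
</feed>
"""

root = ET.fromstring(xml_data)

# Prefixes used here are local to this script; only the URIs must match.
ns = {"atom": "http://www.w3.org/2005/Atom", "ex": "http://example.com/ns"}

# An unqualified name does not match elements in the default namespace.
plain = root.findall("entry")

entries = root.findall("atom:entry", ns)
titles = [e.findtext("atom:title", namespaces=ns) for e in entries]
ranks = [int(e.findtext("ex:rank", namespaces=ns)) for e in entries]

print(plain, titles, ranks)
```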
🔹 Solution 2: Using xml.dom.minidom
python
from xml.dom import minidom
doc = minidom.parseString('<root><item>Apple</item></root>')
items = doc.getElementsByTagName("item")
for item in items:
    print(item.firstChild.data)
✅ Pros: Pretty printing, simple to use.
⚠️ Cons: Memory-heavy for large XML.
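The pretty-printing mentioned above can be sketched with `toprettyxml`. The two-space indent is just a choice made for this example.

```python
from xml.dom import minidom

compact = "<root><item>Apple</item><item>Banana</item></root>"

# Parse the compact string into a DOM document.
doc = minidom.parseString(compact)

# Serialize it again with one element per line and a two-space indent.
pretty = doc.toprettyxml(indent="  ")
print(pretty)
```

The output also includes an XML declaration on the first line, which you may want to strip if you only need the indented body.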
🔹 Solution 3: Using lxml
python
from lxml import etree
xml_data = '''<root><item>Apple</item><item>Banana</item></root>'''
root = etree.fromstring(xml_data)
for item in root.findall("item"):
    print(item.text)
✅ Pros: Very fast, XPath support, validation.
⚠️ Cons: Requires installation (pip install lxml).
🔹 Solution 4: Using BeautifulSoup
python
from bs4 import BeautifulSoup
xml_data = """<root><item>Apple</item><item>Banana</item></root>"""
soup = BeautifulSoup(xml_data, "xml")
for item in soup.find_all("item"):
    print(item.text)
✅ Pros: Beginner-friendly, flexible parsing.
⚠️ Cons: Slower than lxml.
🔹 Solution 5: Using defusedxml (Secure Parsing)
python
from defusedxml.ElementTree import fromstring
xml_data = '<root><item>Apple</item></root>'
root = fromstring(xml_data)
for child in root:
    print(child.text)
✅ Pros: Prevents XML vulnerabilities (XXE, Billion Laughs).
⚠️ Cons: Limited features.
🔹 Solution 6: Using xmltodict
python
import xmltodict
xml_data = """<root><item>Apple</item><item>Banana</item></root>"""
parsed = xmltodict.parse(xml_data)
print(parsed["root"]["item"])
✅ Pros: Converts XML → Python dict directly.
⚠️ Cons: Not ideal for streaming large XML.
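For the large-file case, one standard-library alternative is `ElementTree.iterparse`, which yields elements as they are read instead of building the whole document first. This is a sketch under assumptions: the generated data stands in for a large file, and `iter_item_texts` is a hypothetical helper written for this example, not part of any library.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a large file on disk (generated here so the example is self-contained).
xml_text = "<root>" + "".join(f"<item>Item {i}</item>" for i in range(1000)) + "</root>"


def iter_item_texts(source, tag="item"):
    """Hypothetical helper: yield the text of each <tag> element as it is parsed."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem.text
            # Drop the element's content once the caller has consumed it.
            # Note: the emptied element itself stays attached to the root.
            elem.clear()


source = io.BytesIO(xml_text.encode("utf-8"))

count = 0
last = None
for text in iter_item_texts(source):
    count += 1
    last = text

print(count, last)
```

With a real file you would pass the path or an open binary file object as `source`.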
🔹 Solution 7: Using pandas
python
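import pandas as pd
from io import StringIO
xml_data = """<root><row><name>John</name></row><row><name>Jane</name></row></root>"""
# Wrap the string in StringIO: passing literal XML to read_xml is deprecated in recent pandas versions
df = pd.read_xml(StringIO(xml_data))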
print(df)
✅ Pros: Perfect for data analysis.
⚠️ Cons: Requires flat, table-like XML; the default parser also needs lxml installed.
🔹 Solution 8: Using Regex (Not Recommended)
python
import re
xml_data = '<root><item>Apple</item><item>Banana</item></root>'
items = re.findall(r'<item>(.*?)</item>', xml_data)
print(items)
✅ Pros: Quick hacks.
⚠️ Cons: Fragile, breaks on nested/complex XML.
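To see how quickly the regex approach breaks, the sketch below runs the same pattern against slightly less tidy input (an attribute, a CDATA section, and an escaped ampersand, all invented for the example) and compares it with ElementTree.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input with three common XML features.
xml_data = (
    '<root>'
    '<item id="1">Apple</item>'                # attribute on the tag
    '<item><![CDATA[Banana & Kiwi]]></item>'   # CDATA section
    '<item>Fig &amp; Date</item>'              # escaped entity
    '</root>'
)

# Same pattern as above: misses the first item, leaks markup from the others.
regex_items = re.findall(r"<item>(.*?)</item>", xml_data)

# A real parser handles all three cases.
parsed_items = [item.text for item in ET.fromstring(xml_data).iter("item")]

print(regex_items)   # ['<![CDATA[Banana & Kiwi]]>', 'Fig &amp; Date']
print(parsed_items)  # ['Apple', 'Banana & Kiwi', 'Fig & Date']
```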
🔹 Solution 9: Using Scrapy (For Web Crawling)
python
import scrapy
class XMLSpider(scrapy.Spider):
    name = "xml_spider"
    start_urls = ["https://example.com/data.xml"]
    def parse(self, response):
        for item in response.xpath("//item/text()").getall():
            yield {"item": item}
✅ Pros: Scalable, great for scraping XML feeds.
⚠️ Cons: Overkill for simple parsing.
🔹 Solution 10: Using Scrapeless API (Best Alternative)
Instead of maintaining parsing logic yourself, you can use Scrapeless Scraping Browser. It automatically:
- Handles dynamic content
- Extracts structured data (JSON, XML)
- Bypasses anti-bot protection
python
import requests
url = "https://api.scrapeless.com/xml-extract"
payload = {"url": "https://example.com/data.xml"}
response = requests.post(url, json=payload)
print(response.json())
✅ Pros: No setup, robust, scalable.
⚠️ Cons: Paid service.
🔹 Comparison Table
Method | Ease of Use | Speed | Security | Best For |
---|---|---|---|---|
ElementTree | ⭐⭐⭐ | Fast | ❌ | Simple XML |
minidom | ⭐⭐ | Medium | ❌ | Pretty-printing |
lxml | ⭐⭐⭐⭐ | Very Fast | ✅ | Complex XML, XPath |
BeautifulSoup | ⭐⭐⭐ | Slow | ❌ | Beginners |
defusedxml | ⭐⭐ | Medium | ✅ | Secure parsing |
xmltodict | ⭐⭐⭐⭐ | Fast | ❌ | Dict conversion |
pandas | ⭐⭐⭐ | Medium | ❌ | Data analysis |
Regex | ⭐ | Fast | ❌ | Quick hacks only |
Scrapy | ⭐⭐⭐ | Medium | ✅ | Crawling feeds |
Scrapeless API | ⭐⭐⭐⭐ | Very Fast | ✅ | Enterprise-grade parsing |
🔹 Real-World Scenarios
- Config files → ElementTree
- Large datasets → lxml
- APIs → xmltodict
- Data science → pandas
- Secure apps → defusedxml
- Web scraping → Scrapy or Scrapeless
🔹 FAQ
Q1: What is the fastest way to parse XML in Python?
👉 lxml is the fastest open-source solution. Scrapeless API is faster for production-grade tasks.
Q2: How do I prevent XML security issues?
👉 Use defusedxml or Scrapeless API, which sanitize inputs.
Q3: Can I convert XML directly into JSON?
👉 Yes, xmltodict or Scrapeless API can do this.
Q4: Which method is best for web scraping?
👉 Use Scrapy for small projects, Scrapeless for enterprise needs.
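As a rough illustration of the XML to JSON conversion from Q3, here is a standard-library sketch. It is not how xmltodict is implemented: `element_to_dict` is a hypothetical helper that ignores attributes and mixed content, and it only mirrors the general idea of mapping repeated tags to lists.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Hypothetical helper: convert an element to nested dicts/lists/strings.

    Simplifications: attributes and text mixed with child elements are ignored.
    """
    children = list(elem)
    if not children:
        return elem.text
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tag: collect the values in a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


root = ET.fromstring("<root><item>Apple</item><item>Banana</item></root>")
data = {root.tag: element_to_dict(root)}
json_text = json.dumps(data)
print(json_text)  # {"root": {"item": ["Apple", "Banana"]}}
```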
🔹 Conclusion
Python offers many ways to parse XML, from built-in libraries like ElementTree to advanced solutions like lxml and Scrapy. If you need scalable, secure, and maintenance-free parsing, consider using Scrapeless Scraping Browser.