How to Parse XML in Python (10 Methods + Examples)

Michael Lee

Expert Network Defense Engineer

24-Sep-2025

Parsing XML is a common task in Python, whether you are working with configuration files, web scraping, or APIs. This guide provides 10 different solutions with code examples, use cases, comparison tables, and FAQs. By the end, you’ll know which method fits your project best.

🔹 What is XML Parsing?

XML (eXtensible Markup Language) is widely used to store and transport data. Parsing XML means reading the XML structure and extracting useful information. In Python, you have multiple ways to achieve this, ranging from built-in libraries to advanced frameworks.

🔹 Solution 1: Using `xml.etree.ElementTree` (Standard Library)

import xml.etree.ElementTree as ET

xml_data = '''<root><item>Apple</item><item>Banana</item></root>'''
root = ET.fromstring(xml_data)

for child in root:
    print(child.tag, child.text)

✅ Pros: Built-in, fast, no dependencies.
⚠️ Cons: No schema validation, only a subset of XPath, and not hardened against malicious input.
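
Real documents usually carry attributes and nested elements, not just text. The sketch below shows how ElementTree's `findall()`, `findtext()`, and `get()` handle them. The `catalog` document and its element and attribute names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative.
xml_data = """
<catalog>
    <item id="1" category="fruit"><name>Apple</name><price>0.50</price></item>
    <item id="2" category="fruit"><name>Banana</name><price>0.25</price></item>
    <item id="3" category="vegetable"><name>Carrot</name><price>0.10</price></item>
</catalog>
"""

root = ET.fromstring(xml_data)

# findall() accepts a limited XPath subset, including attribute predicates.
fruit_names = [
    item.findtext("name") for item in root.findall("item[@category='fruit']")
]

# get() reads an attribute; findtext() reads the text of a child element.
prices = {item.get("id"): float(item.findtext("price")) for item in root.iter("item")}

print(fruit_names)
print(prices)
```

Attribute values and element text always come back as strings, so numeric fields such as `price` need an explicit conversion.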

🔹 Solution 2: Using `xml.dom.minidom`

from xml.dom import minidom

doc = minidom.parseString('<root><item>Apple</item></root>')
items = doc.getElementsByTagName("item")

for item in items:
    print(item.firstChild.data)

✅ Pros: Pretty printing, simple to use.
⚠️ Cons: Memory-heavy for large XML.
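
Since pretty printing is the main reason to reach for minidom, here is a minimal sketch of it using `toprettyxml()`. The two-space indent is an arbitrary choice.

```python
from xml.dom import minidom

compact = "<root><item>Apple</item><item>Banana</item></root>"
doc = minidom.parseString(compact)

# toprettyxml() re-serializes the DOM with one element per line and the given indent.
pretty = doc.toprettyxml(indent="  ")
print(pretty)
```

The output starts with an XML declaration followed by the indented tree, which is handy when inspecting machine-generated XML by eye.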

🔹 Solution 3: Using `lxml`

from lxml import etree

xml_data = '''<root><item>Apple</item><item>Banana</item></root>'''
root = etree.fromstring(xml_data)

for item in root.findall("item"):
    print(item.text)

✅ Pros: Very fast, XPath support, validation.
⚠️ Cons: Requires installation (pip install lxml).

🔹 Solution 4: Using `BeautifulSoup`

from bs4 import BeautifulSoup

xml_data = """<root><item>Apple</item><item>Banana</item></root>"""
soup = BeautifulSoup(xml_data, "xml")

for item in soup.find_all("item"):
    print(item.text)

✅ Pros: Beginner-friendly, flexible parsing.
⚠️ Cons: Slower than lxml, and the "xml" parser used above still requires lxml to be installed.

🔹 Solution 5: Using `defusedxml` (Secure Parsing)

from defusedxml.ElementTree import fromstring

xml_data = '<root><item>Apple</item></root>'
root = fromstring(xml_data)

for child in root:
    print(child.text)

✅ Pros: Prevents XML vulnerabilities (XXE, Billion Laughs).
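⚠️ Cons: Extra dependency (pip install defusedxml); by default it rejects any document that declares entities, including harmless ones.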
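
To make the risk concrete, the sketch below feeds a harmless document with an internal entity to the standard-library parser, which expands it silently. Entity-expansion attacks such as Billion Laughs rely on the same mechanism with nested, much larger entities. The `defusedxml` import is wrapped in `try`/`except` only so the sketch still runs where the package is not installed.

```python
import xml.etree.ElementTree as ET

# A harmless document that declares an internal entity in its DTD.
xml_with_entity = """<?xml version="1.0"?>
<!DOCTYPE root [<!ENTITY fruit "Apple">]>
<root><item>&fruit;</item></root>"""

# The standard-library parser expands the entity without complaint.
root = ET.fromstring(xml_with_entity)
expanded = root.find("item").text
print(expanded)

# defusedxml is treated as optional so this sketch runs without it.
try:
    from defusedxml import EntitiesForbidden
    from defusedxml.ElementTree import fromstring as safe_fromstring
except ImportError:
    safe_fromstring = None

if safe_fromstring is not None:
    try:
        safe_fromstring(xml_with_entity)
    except EntitiesForbidden:
        # With default settings, defusedxml refuses entity declarations.
        print("defusedxml refused the document")
```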

🔹 Solution 6: Using `xmltodict`

import xmltodict

xml_data = """<root><item>Apple</item><item>Banana</item></root>"""
parsed = xmltodict.parse(xml_data)

print(parsed["root"]["item"])

✅ Pros: Converts XML → Python dict directly.
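⚠️ Cons: Builds the whole result in memory by default, so large files need its streaming mode or a different parser; a tag that appears once becomes a single value while a repeated tag becomes a list.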
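
For documents too large to load at once, one option is to stream them. The sketch below uses the standard library's `iterparse()` rather than xmltodict, so it is a swapped-in technique shown only to illustrate the idea. The in-memory `BytesIO` object stands in for a large file on disk.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a large file; a real call would pass a file path or open file.
xml_stream = io.BytesIO(
    b"<root><item>Apple</item><item>Banana</item><item>Cherry</item></root>"
)

items = []
# With events=("end",), each element is yielded once its closing tag is read.
for event, elem in ET.iterparse(xml_stream, events=("end",)):
    if elem.tag == "item":
        items.append(elem.text)
        elem.clear()  # release the element's content after processing it

print(items)
```

In a real pipeline each item would typically be written out or aggregated immediately instead of being collected in a list.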

🔹 Solution 7: Using `pandas`

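from io import StringIO

import pandas as pd

xml_data = """<root><row><name>John</name></row><row><name>Jane</name></row></root>"""
# Wrap literal XML in StringIO; passing a raw XML string is deprecated in recent pandas.
df = pd.read_xml(StringIO(xml_data))
print(df)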

✅ Pros: Perfect for data analysis.
⚠️ Cons: Works best on flat, row-like XML, and its default parser requires lxml to be installed.

🔹 Solution 8: Using Regex (Not Recommended)

import re

xml_data = '<root><item>Apple</item><item>Banana</item></root>'
items = re.findall(r'<item>(.*?)</item>', xml_data)
print(items)

✅ Pros: Quick hacks.
⚠️ Cons: Fragile, breaks on nested/complex XML.
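
The fragility is easy to reproduce. In the sketch below the input is still valid XML, but one element has an attribute and another uses a CDATA section. The same regex is compared against ElementTree.

```python
import re
import xml.etree.ElementTree as ET

# Valid XML that differs only slightly from the sample above.
xml_data = '<root><item id="1">Apple</item><item><![CDATA[Banana]]></item></root>'

# The regex skips the element with an attribute and returns raw CDATA markup.
regex_items = re.findall(r"<item>(.*?)</item>", xml_data)

# A real parser handles both cases.
parsed_items = [item.text for item in ET.fromstring(xml_data).iter("item")]

print(regex_items)
print(parsed_items)
```

The regex returns only `['<![CDATA[Banana]]>']`, while the parser returns `['Apple', 'Banana']`.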

🔹 Solution 9: Using Scrapy (For Web Crawling)

import scrapy

class XMLSpider(scrapy.Spider):
    name = "xml_spider"
    start_urls = ["https://example.com/data.xml"]

    def parse(self, response):
        for item in response.xpath("//item/text()").getall():
            yield {"item": item}

✅ Pros: Scalable, great for scraping XML feeds.
⚠️ Cons: Overkill for simple parsing.

🔹 Solution 10: Using Scrapeless API (Best Alternative)

Instead of maintaining parsing logic yourself, you can use Scrapeless Scraping Browser. It automatically:

Handles dynamic content
Extracts structured data (JSON, XML)
Bypasses anti-bot protection

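import requests

# Illustrative endpoint and payload; confirm the current URL and any required
# authentication in the Scrapeless documentation.
url = "https://api.scrapeless.com/xml-extract"
payload = {"url": "https://example.com/data.xml"}

response = requests.post(url, json=payload, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())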

✅ Pros: No setup, robust, scalable.
⚠️ Cons: Paid service.

🔹 Comparison Table

| Method | Ease of Use | Speed | Security | Best For |
|---|---|---|---|---|
| ElementTree | ⭐⭐⭐ | Fast | ❌ | Simple XML |
| minidom | ⭐⭐ | Medium | ❌ | Pretty-printing |
| lxml | ⭐⭐⭐⭐ | Very Fast | ✅ | Complex XML, XPath |
| BeautifulSoup | ⭐⭐⭐ | Slow | ❌ | Beginners |
| defusedxml | ⭐⭐ | Medium | ✅ | Secure parsing |
| xmltodict | ⭐⭐⭐⭐ | Fast | ❌ | Dict conversion |
| pandas | ⭐⭐⭐ | Medium | ❌ | Data analysis |
| Regex | ⭐ | Fast | ❌ | Quick hacks only |
| Scrapy | ⭐⭐⭐ | Medium | ✅ | Crawling feeds |
| Scrapeless API | ⭐⭐⭐⭐ | Very Fast | ✅ | Enterprise-grade parsing |

🔹 Real-World Scenarios

Config files → ElementTree
Large datasets → lxml
APIs → xmltodict
Data science → pandas
Secure apps → defusedxml
Web scraping → Scrapy or Scrapeless

🔹 FAQ

Q1: What is the fastest way to parse XML in Python?
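👉 lxml is generally the fastest of the open-source libraries covered here. Scrapeless API instead offloads fetching and extraction to a hosted service, which can matter more than raw parse speed for production-grade tasks.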

Q2: How do I prevent XML security issues?
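👉 Use defusedxml, which rejects dangerous constructs such as external entities and entity-expansion bombs instead of processing them, or offload parsing to Scrapeless API.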

Q3: Can I convert XML directly into JSON?
👉 Yes. xmltodict turns XML into a Python dict that json.dumps() can serialize, and Scrapeless API can return structured JSON.
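
If installing xmltodict is not an option, a similar conversion can be sketched with the standard library alone. The `element_to_dict` helper below is hypothetical and deliberately minimal: it ignores attributes and mixed content, and it is not a drop-in replacement for xmltodict.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element to plain Python data (hypothetical helper).

    Leaf elements become their text; repeated child tags become lists.
    Attributes and mixed content are ignored to keep the sketch short.
    """
    children = list(elem)
    if not children:
        return elem.text
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Promote to a list on the second occurrence of a tag.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


xml_data = "<root><item>Apple</item><item>Banana</item></root>"
root = ET.fromstring(xml_data)
as_json = json.dumps({root.tag: element_to_dict(root)})
print(as_json)
```

For the sample input this prints `{"root": {"item": ["Apple", "Banana"]}}`.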

Q4: Which method is best for web scraping?
👉 Use Scrapy when you want to build and run your own crawler, and Scrapeless when you need a managed service for enterprise workloads.

🔹 Conclusion

Python offers many ways to parse XML, from built-in libraries like ElementTree to advanced solutions like lxml and Scrapy. If you need scalable, secure, and maintenance-free parsing, consider using Scrapeless Scraping Browser.

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.
