
How to Parse XML in Python (10 Methods + Examples)

Michael Lee

Expert Network Defense Engineer

24-Sep-2025

Parsing XML is a common task in Python, whether you are working with configuration files, web scraping, or APIs. This guide walks through 10 different methods with code examples, use cases, a comparison table, and an FAQ. By the end, you’ll know which method fits your project best.


🔹 What is XML Parsing?

XML (eXtensible Markup Language) is widely used to store and transport data. Parsing XML means reading the XML structure and extracting useful information. In Python, you have multiple ways to achieve this, ranging from built-in libraries to advanced frameworks.


🔹 Solution 1: Using xml.etree.ElementTree (Standard Library)

```python
import xml.etree.ElementTree as ET

xml_data = '''<root><item>Apple</item><item>Banana</item></root>'''
root = ET.fromstring(xml_data)

for child in root:
    print(child.tag, child.text)
```

✅ Pros: Built-in, fast, no dependencies.
⚠️ Cons: Limited validation, weaker for complex XML.
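ElementTree also handles attributes and targeted child lookups. A minimal sketch (the catalog/book XML here is illustrative, not from the examples above):

```python
import xml.etree.ElementTree as ET

xml_data = '''<catalog>
  <book id="1"><title>Dune</title></book>
  <book id="2"><title>Hyperion</title></book>
</catalog>'''

root = ET.fromstring(xml_data)

# findall() matches direct children by tag; .get() reads attributes
titles = {b.get("id"): b.find("title").text for b in root.findall("book")}
print(titles)  # {'1': 'Dune', '2': 'Hyperion'}
```

For files on disk, `ET.parse("file.xml").getroot()` gives you the same root element.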


🔹 Solution 2: Using xml.dom.minidom

```python
from xml.dom import minidom

doc = minidom.parseString('<root><item>Apple</item></root>')
items = doc.getElementsByTagName("item")

for item in items:
    print(item.firstChild.data)
```

✅ Pros: Pretty printing, simple to use.
⚠️ Cons: Memory-heavy for large XML.
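The pretty-printing mentioned above comes from `toprettyxml()`, which re-serializes the DOM with indentation; a short sketch:

```python
from xml.dom import minidom

doc = minidom.parseString("<root><item>Apple</item><item>Banana</item></root>")

# toprettyxml() returns an indented string -- handy for debugging
pretty = doc.toprettyxml(indent="  ")
print(pretty)
```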


🔹 Solution 3: Using lxml

```python
from lxml import etree

xml_data = '''<root><item>Apple</item><item>Banana</item></root>'''
root = etree.fromstring(xml_data)

for item in root.findall("item"):
    print(item.text)
```

✅ Pros: Very fast, XPath support, validation.
⚠️ Cons: Requires installation (pip install lxml).
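The XPath support mentioned above is a major reason to reach for lxml; a hedged sketch using an illustrative `price` attribute:

```python
from lxml import etree

xml_data = b"<root><item price='1'>Apple</item><item price='2'>Banana</item></root>"
root = etree.fromstring(xml_data)

# Full XPath 1.0: predicates, attribute filters, text() extraction
cheap = root.xpath("//item[@price='1']/text()")
print(cheap)  # ['Apple']
```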


🔹 Solution 4: Using BeautifulSoup

```python
from bs4 import BeautifulSoup

xml_data = """<root><item>Apple</item><item>Banana</item></root>"""
soup = BeautifulSoup(xml_data, "xml")

for item in soup.find_all("item"):
    print(item.text)
```

✅ Pros: Beginner-friendly, flexible parsing.
⚠️ Cons: Slower than lxml.


🔹 Solution 5: Using defusedxml (Secure Parsing)

```python
from defusedxml.ElementTree import fromstring

xml_data = '<root><item>Apple</item></root>'
root = fromstring(xml_data)

for child in root:
    print(child.text)
```

✅ Pros: Prevents XML vulnerabilities (XXE, Billion Laughs).
⚠️ Cons: Limited features.


🔹 Solution 6: Using xmltodict

```python
import xmltodict

xml_data = """<root><item>Apple</item><item>Banana</item></root>"""
parsed = xmltodict.parse(xml_data)

print(parsed["root"]["item"])
```

✅ Pros: Converts XML → Python dict directly.
⚠️ Cons: Not ideal for streaming large XML.


🔹 Solution 7: Using pandas

```python
from io import StringIO

import pandas as pd

xml_data = """<root><row><name>John</name></row><row><name>Jane</name></row></root>"""
# Recent pandas versions expect a path or file-like object, so wrap
# literal XML strings in StringIO
df = pd.read_xml(StringIO(xml_data))
print(df)
```

✅ Pros: Perfect for data analysis.
⚠️ Cons: Requires structured XML.
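When the records you want are not the root's direct children, `read_xml` accepts an `xpath` argument to select which elements become rows; a sketch with illustrative `name`/`age` columns:

```python
from io import StringIO

import pandas as pd

xml_data = """<root><row><name>John</name><age>30</age></row>
<row><name>Jane</name><age>25</age></row></root>"""

# xpath picks the repeating element; each match becomes one DataFrame row
df = pd.read_xml(StringIO(xml_data), xpath="./row")
print(df)
```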


🔹 Solution 8: Using Regex (Not Recommended)

```python
import re

xml_data = '<root><item>Apple</item><item>Banana</item></root>'
items = re.findall(r'<item>(.*?)</item>', xml_data)
print(items)

✅ Pros: Quick hacks.
⚠️ Cons: Fragile, breaks on nested/complex XML.
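To see the fragility concretely, here is a sketch of the same pattern failing on nested tags, compared with a real parser:

```python
import re
import xml.etree.ElementTree as ET

# The non-greedy match stops at the FIRST closing tag, splitting the
# outer element and returning mangled content
xml_data = "<root><item>Apple<item>Seed</item></item></root>"
print(re.findall(r"<item>(.*?)</item>", xml_data))  # ['Apple<item>Seed']

# A real parser resolves the nesting correctly
root = ET.fromstring(xml_data)
outer = root.find("item")
print(outer.text, outer.find("item").text)  # Apple Seed
```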


🔹 Solution 9: Using Scrapy (For Web Crawling)

```python
import scrapy

class XMLSpider(scrapy.Spider):
    name = "xml_spider"
    start_urls = ["https://example.com/data.xml"]

    def parse(self, response):
        for item in response.xpath("//item/text()").getall():
            yield {"item": item}
```

✅ Pros: Scalable, great for scraping XML feeds.
⚠️ Cons: Overkill for simple parsing.


🔹 Solution 10: Using Scrapeless API (Best Alternative)

Instead of maintaining parsing logic yourself, you can use Scrapeless Scraping Browser. It automatically:

  • Handles dynamic content
  • Extracts structured data (JSON, XML)
  • Bypasses anti-bot protection

```python
import requests

url = "https://api.scrapeless.com/xml-extract"
payload = {"url": "https://example.com/data.xml"}

response = requests.post(url, json=payload)
print(response.json())
```

✅ Pros: No setup, robust, scalable.
⚠️ Cons: Paid service.


🔹 Comparison Table

| Method | Ease of Use | Speed | Best For |
|---|---|---|---|
| ElementTree | ⭐⭐⭐ | Fast | Simple XML |
| minidom | ⭐⭐ | Medium | Pretty-printing |
| lxml | ⭐⭐⭐⭐ | Very Fast | Complex XML, XPath |
| BeautifulSoup | ⭐⭐⭐ | Slow | Beginners |
| defusedxml | ⭐⭐ | Medium | Secure parsing |
| xmltodict | ⭐⭐⭐⭐ | Fast | Dict conversion |
| pandas | ⭐⭐⭐ | Medium | Data analysis |
| Regex | – | Fast | Quick hacks only |
| Scrapy | ⭐⭐⭐ | Medium | Crawling feeds |
| Scrapeless API | ⭐⭐⭐⭐ | Very Fast | Enterprise-grade parsing |

🔹 Real-World Scenarios

  • Config files → ElementTree
  • Large datasets → lxml
  • APIs → xmltodict
  • Data science → pandas
  • Secure apps → defusedxml
  • Web scraping → Scrapy or Scrapeless

🔹 FAQ

Q1: What is the fastest way to parse XML in Python?
👉 lxml is the fastest open-source solution. Scrapeless API is faster for production-grade tasks.

Q2: How do I prevent XML security issues?
👉 Use defusedxml or Scrapeless API, which sanitize inputs.

Q3: Can I convert XML directly into JSON?
👉 Yes, xmltodict or Scrapeless API can do this.

Q4: Which method is best for web scraping?
👉 Use Scrapy for small projects, Scrapeless for enterprise needs.


🔹 Conclusion

Python offers many ways to parse XML, from built-in libraries like ElementTree to advanced solutions like lxml and Scrapy. If you need scalable, secure, and maintenance-free parsing, consider using Scrapeless Scraping Browser.

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.
