
SE Can't Code

A Tokyo based Software Engineer. Not System Engineer :(

Tips, about XML.

HTML is an awkward format for communication between computer systems. For example, browsers handle this HTML code without problems:

        <b>hello world!

There is no end tag for <b> in this code. Not every tag in HTML needs an end tag, and browsers tolerate missing ones. But that leniency makes a web page's contents difficult to parse. So web APIs offer many formats that make it easier for computer systems to parse web content.
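To see how forgiving HTML tooling is, here is a small sketch using html.parser from the Python standard library. That module is my choice for illustration and is not used elsewhere in this post, and the TagLogger class name is made up. It simply records what the parser sees in the unclosed snippet.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records every start tag, end tag, and text run."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


parser = TagLogger()
parser.feed("<b>hello world!")  # note: no </b>
parser.close()                  # flush any buffered text

print(parser.events)
```

No exception is raised and no ("end", "b") event ever shows up. The parser just moves on, so the program reading the page has to guess where the bold text stops.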

XML(Extensible Markup Language)

XML is a consistent markup language, and that is the difference between XML and HTML. XML elements always have end tags (or are written as self-closing empty elements). HTML's tags are predefined, but in XML you can create your own tags at your discretion. There is also XHTML, which is similar to HTML but follows XML's rules. Because XHTML is consistent, computer systems can parse it reliably and render it correctly.
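The strictness is easy to check with Python's built-in minidom parser (covered in the next section). This is only a sketch: is_well_formed is a name I made up, and the two sample strings are my own examples.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        # minidom is built on expat, which raises ExpatError on malformed input
        return False


unclosed = "<p><b>hello world!</p>"    # <b> is never closed
closed = "<p><b>hello world!</b></p>"  # every tag has its end tag

print(is_well_formed(unclosed))  # False
print(is_well_formed(closed))    # True
```

The same text that a browser would happily render as HTML is rejected outright as XML. That is what makes XML predictable for programs.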

Parsing XML

I use the built-in parser in Python. Python has a library called "minidom", which you can import like this:

>>> from xml.dom import minidom

DOM (Document Object Model) refers to the in-memory representation of the XML document. minidom is a simple, lightweight DOM parser. Because it loads the whole document into memory, it breaks down when you try to parse large data, but it is a good choice when you parse small data. To parse an XML string, you write something like this:

>>> x = minidom.parseString("<mytag>contents!<children><item>1</item><item>2</item></children></mytag>")

minidom objects have various methods. Let's use toprettyxml, which returns the document as an indented string:

>>> print(x.toprettyxml())

So pretty! You can see the structure of the XML document. Next, I use getElementsByTagName from minidom.

>>> x.getElementsByTagName("item")[0].childNodes[0].nodeValue
  • getElementsByTagName("item")[0] refers to the first item element, <item>1</item>.
  • childNodes[0] refers to the text node inside that element, 1.

And if you ask for nodeValue, you get the content of that text node, which here is 1.
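Putting the steps together, here is a minimal script version of the session above. Only the item tag name comes from this post. The mytag and children element names in SAMPLE, and the item_values helper, are assumptions for illustration.

```python
from xml.dom import minidom, Node

# Sample document: <mytag> and <children> are assumed names; only <item>
# appears in the text above.
SAMPLE = "<mytag>contents!<children><item>1</item><item>2</item></children></mytag>"


def item_values(xml_text):
    """Return the text inside every <item> element, in document order."""
    doc = minidom.parseString(xml_text)
    values = []
    for item in doc.getElementsByTagName("item"):
        first = item.firstChild
        # An empty element such as <item/> has no child nodes, so check first.
        if first is not None and first.nodeType == Node.TEXT_NODE:
            values.append(first.nodeValue)
    return values


doc = minidom.parseString(SAMPLE)
print(doc.toprettyxml(indent="  "))  # indented view of the whole tree
print(item_values(SAMPLE))           # ['1', '2']
```

Looping over getElementsByTagName instead of indexing with [0] avoids an IndexError when the document has no item elements at all.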


XML is a good hypermedia format. There are other good formats as well, for example JSON, Atom, AtomPub … It is important to choose a format that is suitable for your application.
