HTML is a complex format for communication between computer systems. For example, this HTML code is processed without problems.
<form> <b> hello world! </form>
There is no end tag for <b> in this code. HTML doesn't require end tags for every tag. But this makes it difficult to parse a web page's contents from HTML. So, there are many Web API formats which make it easy for computer systems to parse web contents.
XML (Extensible Markup Language)
XML is a consistent markup language. This is the difference between XML and HTML: XML always has end tags. HTML's tags are predefined, but you can create XML tags at your own discretion. There is also XHTML, which is similar to HTML. Because XHTML is consistent, computer systems can parse it correctly for rendering.
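To see how strict XML is about end tags, here is a minimal sketch using minidom from Python's standard library. The two input strings are just illustrations: one leaves out the end tag for <b>, the other includes it.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Illustrative input: the <b> element is never closed.
broken = "<form> <b> hello world! </form>"

try:
    minidom.parseString(broken)
    parsed_ok = True
except ExpatError as err:
    # The XML parser rejects the document instead of guessing.
    parsed_ok = False
    print("not well-formed XML:", err)

# With the end tag present, the same text parses without errors.
fixed = minidom.parseString("<form> <b> hello world! </b> </form>")
print(fixed.documentElement.tagName)
```

A browser would still show the first string, but an XML parser stops at the mismatched tag.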
>>> from xml.dom import minidom
DOM refers to the in-memory representation of the XML document. Minidom is a simple and fast DOM parser. When you try to parse big data, minidom struggles, because it loads the whole document into memory. But it is the better choice when you parse small data. So, to parse the subject XML, you say something like below:
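A minimal sketch of the parsing step could look like this. The XML string is a made-up sample, not the document from this article; only the name x matches the code that follows.

```python
from xml.dom import minidom

# Made-up sample document (an assumption for illustration).
xml_text = "<items><item>1.</item><item>2.</item></items>"

# parseString builds the whole DOM tree in memory at once,
# which is why minidom fits small documents better than big ones.
x = minidom.parseString(xml_text)

# The root element of the parsed document.
print(x.documentElement.tagName)
```

If the XML is in a file, minidom.parse takes a file name or an open file object instead of a string.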
Minidom offers various methods. Let's use toprettyxml. It returns a result like below:
>>> print(x.toprettyxml())
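As a self-contained sketch of the same call, again with a made-up sample document, you can also pass the optional indent argument to choose the indentation string:

```python
from xml.dom import minidom

# Made-up sample document (an assumption for illustration).
x = minidom.parseString("<items><item>1.</item><item>2.</item></items>")

# toprettyxml returns the document as an indented string.
# indent is optional; by default a tab character is used.
pretty = x.toprettyxml(indent="  ")
print(pretty)
```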
So pretty! You can see the structure of the XML document. Next, I use getElementsByTagName in minidom.
- getElementsByTagName("item") refers to the <item> elements.
- childNodes refers to 1.
And if you say nodeValue on a text node, you get its text. It returns a result like below:
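The three steps above can be sketched together like this. The sample document and the variable names (items, first_text, values) are assumptions for illustration.

```python
from xml.dom import minidom

# Made-up sample document (an assumption for illustration).
x = minidom.parseString("<items><item>1.</item><item>2.</item></items>")

# All <item> elements in the document, in document order.
items = x.getElementsByTagName("item")

# The first child of the first <item> is its text node.
first_text = items[0].childNodes[0]

# nodeValue of a text node is the text itself.
print(first_text.nodeValue)

# The same idea applied to every <item>.
values = [item.childNodes[0].nodeValue for item in items]
print(values)
```

Note that nodeValue is None on an element node, so you have to go down to the text node through childNodes first.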