Avian’s Blog

Electronics and Free Software

BeautifulSoup tips, part 2

25.05.2008 19:49

Here's another interesting catch in BeautifulSoup: you can iterate through a Tag's child nodes simply by using the Tag object as an iterable, for example in a for loop like this:

for t in tag:
	# do something with t
	pass
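Here is a minimal sketch of that, assuming BeautifulSoup 4 (the `bs4` package) on Python 3 with the built-in `html.parser` backend. The HTML fragment is made up for illustration. Iterating a Tag visits its direct children, the same nodes listed in its `contents` attribute, and those children can be text nodes as well as tags:

```python
from bs4 import BeautifulSoup, NavigableString

# A made-up HTML fragment, parsed with the stdlib-backed "html.parser".
soup = BeautifulSoup("<p>Hello <b>world</b>!</p>", "html.parser")
p = soup.p

# Iterating a Tag yields its direct children in document order,
# i.e. the same nodes as p.contents: text nodes and tags mixed together.
children = [child for child in p]
for child in children:
    print(type(child).__name__)  # prints NavigableString, Tag, NavigableString (one per line)
```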

However, what if tag is a NavigableString? If you're doing a recursive search through the tree, this will happen sooner or later. Since NavigableString doesn't have any child nodes, you might expect this for loop to raise an exception, right? Well, not exactly.

Since NavigableString derives from the unicode class, it can also be used as an iterable object. This time, however, you'll iterate through the single characters of its contents (which are plain unicode objects themselves).
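The character iteration comes from the string type itself, not from BeautifulSoup, so it can be reproduced with the standard library alone. In this sketch `FakeNavigableString` is a hypothetical stand-in: a bare subclass of `str`, which is the Python 3 counterpart of `unicode` and the type bs4's NavigableString derives from.

```python
class FakeNavigableString(str):
    """Hypothetical stand-in: like bs4's NavigableString, a subclass of str."""

text = FakeNavigableString("Hi!")

# Iterating a str subclass yields its characters, not child nodes,
# and each character is a plain str rather than a FakeNavigableString.
chars = [c for c in text]
print(chars)                               # ['H', 'i', '!']
print(all(type(c) is str for c in chars))  # True
```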

This behaviour was the source of some weird parsing errors in code I was working on. So before iterating through a tag, check that it isn't an instance of NavigableString (or of one of its subclasses):

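from BeautifulSoup import NavigableString  # BeautifulSoup 3; with bs4: from bs4 import NavigableString

if not isinstance(tag, NavigableString):
	for t in tag:
		# do something with t
		pass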
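Here is how the check might look inside a recursive walk, again sketched with bs4 on Python 3 and `html.parser`. `collect_text` is a hypothetical helper, not part of BeautifulSoup; it gathers the text of every NavigableString below a tag:

```python
from bs4 import BeautifulSoup, NavigableString

def collect_text(node, out):
    """Recursively append the text of every NavigableString under node to out."""
    if isinstance(node, NavigableString):
        # Leaf: a text node. Iterating it would yield single characters,
        # so record the whole string instead of recursing into it.
        out.append(str(node))
        return out
    # node is a Tag (or the BeautifulSoup object): iterate its children.
    for child in node:
        collect_text(child, out)
    return out

soup = BeautifulSoup("<div><p>Hello <b>world</b></p><p>!</p></div>", "html.parser")
print(collect_text(soup.div, []))  # ['Hello ', 'world', '!']
```

Without the isinstance check, the recursion would descend into single characters, and since a one-character string yields itself when iterated, it would never bottom out and would eventually raise RecursionError. Note also that in bs4, comments and similar nodes are NavigableString subclasses, so they pass the same check and end up in the result unless filtered out.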
Posted by Tomaž | Categories: Code