Please note that using this library is not as complicated as it sounds. It consists of only
275 lines of Python, which is several orders of magnitude shorter than this documentation.
    from atropine import go, check, special
    from atropine.atropine import Atropine
    import re

    atropine = Atropine('''
    <!-- snip -->
    <table id="earningsTable">
      <tbody>
        <tr>
          <td class="headerTableCell"> Quarterly Earnings </td>
          <td class="dataTableCell">
            <span class="unhelpfulClassName">GBP</span>
            <span class="unhelpfulClassName">123.45</span>
          </td>
        </tr>
      </tbody>
    </table>''', ignorewhitespace=True)

    qearningsregex = re.compile(r'quarterly earnings', re.IGNORECASE)

    atropine = atropine.resolve(
        go.only(tag='table', attrs=dict(id='earningsTable')),
        go.child(0), check.has(tag='tbody'),
        go.child(0), check.has(tag='tr'),
        go.child(0), check.has(tag='td', cls='headerTableCell',
                               onlytext=qearningsregex),
        go.nextsib, check.has(tag='td', cls='dataTableCell'),
        special.collect('earnings-info', alltext=True))

    (currency, amount) = atropine.collection['earnings-info']
    amount = int(float(amount) * 100)
    # store these variables somewhere
Atropine.current
Atropine.registerchecker(name, function)
Atropine.getchecker(name)
Atropine.istextnode(tag)
Atropine.onlytext(tag)
Atropine.assimilate(tag)
Atropine.resolve(resolver, [resolver, ...])
A null resolver is any resolver that asserts something about the current node but does not change it. Conversely, a directional resolver is one that locates some node and sets it as the current one.
Atropine.resolve returns a new Atropine instance that represents the current tag at the end of the resolve call.
    import random

    def randomchild(atropine):
        # Atropine.assimilate is identical to assigning to
        # atropine.current, but it asserts that its argument is not
        # BeautifulSoup.Null or None
        atropine.assimilate(random.choice(atropine.current.contents))

    # then use it just as you would any other resolver
    atropine.resolve(randomchild)
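To make the difference between null and directional resolvers concrete, here is a minimal sketch that does not use the library at all. The `Cursor` class, the dict-based tree, and the helpers `first_child` and `is_named` are hypothetical stand-ins invented for illustration. The sketch also assumes that a failed assertion raises `AssertionError`, which may not match how Atropine reports failures.

```python
class Cursor:
    """Hypothetical stand-in for an Atropine instance: it holds one current node."""

    def __init__(self, current):
        self.current = current

    def resolve(self, *resolvers):
        # Work on a fresh cursor so the original is left untouched,
        # mirroring the statement that resolve returns a new instance.
        new = Cursor(self.current)
        for resolver in resolvers:
            resolver(new)
        return new


def first_child(cursor):
    # Directional resolver: locates a node and makes it the current one.
    cursor.current = cursor.current["children"][0]


def is_named(name):
    # Null resolver factory: the returned resolver asserts something about
    # the current node but never changes which node is current.
    def resolver(cursor):
        assert cursor.current["name"] == name, "expected %r" % name
    return resolver


# A toy tree of plain dicts standing in for parsed markup.
tree = {"name": "table", "children": [{"name": "tbody", "children": []}]}

root = Cursor(tree)
result = root.resolve(is_named("table"), first_child, is_named("tbody"))
```

After this runs, `result.current` is the `tbody` node while `root.current` is still the `table` node, which is the behaviour the description of `resolve` suggests.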
Writing Your Own Checkers
    def ntextnodes(atropine, n):
        # check.equal is a utility function equal(x, y) that returns
        # x == y if y is not a sequence, or x in y if y is a sequence
        textnodes = [t for t in atropine.current.contents
                     if atropine.istextnode(t)]
        return check.equal(len(textnodes), n)

    atropine.registerchecker('ntextnodes', ntextnodes)

    # you can now use this like so:
    atropine.resolve(check.has(tag='td', ntextnodes=4))
    atropine.resolve(check.has(tag='td', ntextnodes=(1, 2, 3, 4)))
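The behaviour attributed to check.equal in the comment above can be sketched as a standalone function. This is a re-implementation written from that one-line description, not the library's source. In particular, which types count as a "sequence" is an assumption here: only lists, tuples, and sets do, and strings are compared as single values.

```python
def equal(x, y):
    """Sketch of the described semantics: membership test for sequences,
    plain equality otherwise."""
    # Assumption: strings are iterable but are treated as single values,
    # so equal('t', 'td') is a comparison, not a substring test.
    if isinstance(y, (list, tuple, set, frozenset)):
        return x in y
    return x == y


# One expected value, or any of several acceptable values.
exact = equal(4, 4)
any_of = equal(4, (1, 2, 3, 4))
miss = equal(5, (1, 2, 3, 4))
```

This is what lets a single checker such as ntextnodes accept either one expected count or a tuple of acceptable counts.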
indexonparent