Java provides some interesting libraries that have no exact equivalent in Python. In my case, the awesome boilerpipe library lets me remove the uninteresting parts of HTML pages, such as menus, footers and other "boilerplate" content.
Boilerpipe is written in Java, which leaves two solutions: using Java from Python or reimplementing boilerpipe ...