HOANG Cong Duy Vu's research logs
Thursday, 28 April 2011
Boilerpipe - Boilerplate Removal and Fulltext Extraction from HTML pages
Link: http://code.google.com/p/boilerpipe/
The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
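To make the idea of "clutter" removal concrete, here is a toy sketch in Python. It is not boilerpipe's actual algorithm or API (boilerpipe is a Java library). It only illustrates one plausible heuristic, under the assumption that main content tends to sit in longer text blocks with few links, while navigation and footers are short and link-heavy. The names `BlockCollector` and `extract_main_text`, and the thresholds `min_words` and `max_link_density`, are made up for this illustration.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries (an assumption for this sketch).
BLOCK_TAGS = {"p", "div", "li", "td", "tr", "table", "ul", "ol",
              "h1", "h2", "h3", "body"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts the words inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # list of (block_text, words_inside_links)
        self._words = []
        self._link_words = 0
        self._in_link = 0      # nesting depth of <a> tags
        self._skip = 0         # nesting depth of <script>/<style>

    def _flush(self):
        # Close the current block, if it contains any text.
        if self._words:
            self.blocks.append((" ".join(self._words), self._link_words))
        self._words = []
        self._link_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return
        words = data.split()
        self._words.extend(words)
        if self._in_link:
            self._link_words += len(words)

    def close(self):
        super().close()
        self._flush()


def extract_main_text(html, min_words=10, max_link_density=0.3):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_words in parser.blocks:
        n_words = len(text.split())
        if n_words >= min_words and link_words / n_words <= max_link_density:
            kept.append(text)
    return "\n".join(kept)
```

Calling `extract_main_text(html)` on a page's HTML string returns the surviving blocks joined by newlines. The thresholds are arbitrary and would need tuning on real pages.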