Biography: |
Trafilatura is a Python library for downloading, parsing, and scraping web page data. It also provides tools for website navigation and for extracting links from sitemaps and feeds. It extracts the main text of web pages while preserving some of their structure, a task also known as boilerplate removal or HTML text cleaning. Results can be output in TXT, CSV, JSON, and XML formats. |
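The main-text extraction described above can be sketched as follows. This is a minimal illustration, not an official recipe: it assumes the `trafilatura` package is installed, and `SAMPLE_HTML` and `extract_main_text` are hypothetical names introduced only for this example. It calls `trafilatura.extract` on an in-memory HTML string, so no network access is needed.

```python
from typing import Optional

# Hypothetical sample page: navigation and footer are boilerplate,
# the <article> element holds the main text.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <nav><a href="/">Home</a> <a href="/about">About</a></nav>
    <article>
      <h1>Boilerplate removal</h1>
      <p>Main-text extraction keeps the paragraphs that carry the content
      of a page and discards navigation bars, footers, and similar
      recurring elements that surround it.</p>
      <p>The extracted text can then be serialized in several formats,
      which makes it easier to feed into later processing steps.</p>
    </article>
    <footer>Copyright notice and site links</footer>
  </body>
</html>
"""


def extract_main_text(html: str, output_format: str = "txt") -> Optional[str]:
    """Return the main text of an HTML document, or None if nothing is found.

    Hypothetical helper: a thin wrapper around trafilatura.extract.
    """
    # Imported here so this module still loads where trafilatura is absent.
    import trafilatura

    return trafilatura.extract(html, output_format=output_format)


if __name__ == "__main__":
    # Plain text output; "json" or "xml" can be passed instead.
    print(extract_main_text(SAMPLE_HTML))
```

For a live page, the HTML string would typically come from `trafilatura.fetch_url(url)`, which returns `None` when the download fails, so the result should be checked before it is passed to the extractor.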