Data warehousing is a generally accepted method of providing corporate decision support. Today, most of the information in these warehouses originates from sources within the company, although the changes that affect the company often originate outside it. Companies therefore need to look beyond their own enterprises for valuable information that increases their knowledge of customers, suppliers, competitors, etc. The largest and most frequently accessed information source is the Web, which holds a growing amount of useful business information. The Web currently relies primarily on HTML, which makes mechanical extraction of information difficult. In the near future, XML is expected to replace HTML as the language of the Web, bringing more structure and a stronger focus on content. One problem wh...