The Web has developed into the largest source of electronic documents, so it would be very useful to process this information automatically. This is, however, not a trivial problem: most documents are written in HTML (Hypertext Markup Language), which does not support semantic description of the content. The goal of this work is to create a modular system for extracting information from HTML documents and processing it further. Here, further processing means storing the extracted information in an XML document or a relational database. The system's modularity makes it possible to combine various information extraction and storage methods, so the system can be used for various tasks.
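The modular idea described above can be illustrated with a minimal sketch. It is not the system presented in this work. It only assumes that extraction and storage sit behind two small interfaces so that either side can be swapped. All names here (`Extractor`, `Store`, `TableRowExtractor`, `XmlStore`, `SqliteStore`, `run_pipeline`) are hypothetical, and the table-row extraction rule is a deliberately simple stand-in for a real extraction method.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Dict, List, Protocol

Record = Dict[str, str]


class Extractor(Protocol):
    """Hypothetical interface: any extraction method maps HTML to records."""
    def extract(self, html: str) -> List[Record]: ...


class Store(Protocol):
    """Hypothetical interface: any storage method persists records."""
    def save(self, records: List[Record]) -> None: ...


class TableRowExtractor(HTMLParser):
    """Illustrative extractor: each <tr> of <td> cells becomes one record."""

    def __init__(self, fields: List[str]) -> None:
        super().__init__()
        self.fields = fields
        self._rows: List[Record] = []
        self._cells = None      # cells of the current row, None outside <tr>
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_cell = True

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            # Keep only rows whose cell count matches the expected fields.
            if len(self._cells) == len(self.fields):
                values = [c.strip() for c in self._cells]
                self._rows.append(dict(zip(self.fields, values)))
            self._cells = None

    def extract(self, html: str) -> List[Record]:
        self.reset()
        self._rows, self._cells, self._in_cell = [], None, False
        self.feed(html)
        self.close()
        return self._rows


class XmlStore:
    """Stores records as <record> elements under a <records> root."""

    def __init__(self) -> None:
        self.root = ET.Element("records")

    def save(self, records: List[Record]) -> None:
        for rec in records:
            node = ET.SubElement(self.root, "record")
            for name, value in rec.items():
                ET.SubElement(node, name).text = value

    def to_string(self) -> str:
        return ET.tostring(self.root, encoding="unicode")


class SqliteStore:
    """Stores records as rows of a relational table (SQLite, for brevity)."""

    def __init__(self, conn: sqlite3.Connection, table: str, fields: List[str]) -> None:
        self.conn, self.table, self.fields = conn, table, fields
        cols = ", ".join(f'"{f}" TEXT' for f in fields)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')

    def save(self, records: List[Record]) -> None:
        marks = ", ".join("?" for _ in self.fields)
        rows = [tuple(r[f] for f in self.fields) for r in records]
        self.conn.executemany(f'INSERT INTO "{self.table}" VALUES ({marks})', rows)
        self.conn.commit()


def run_pipeline(html: str, extractor: Extractor, stores: List[Store]) -> List[Record]:
    """Extract once, then hand the same records to every configured store."""
    records = extractor.extract(html)
    for store in stores:
        store.save(records)
    return records
```

Because `run_pipeline` depends only on the two interfaces, a different extraction method or storage backend could be plugged in without touching the rest of the code, which is the property the abstract attributes to the system's modularity.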
The explosive growth and popularity of the Web has resulted in a huge amount of digital information ...
Information and content integration are believed to be a possible solution to the problem of informa...
Abstract. The advance of the Web has significantly and rapidly changed the way of information organi...
This thesis presents a mechanism based on eXtensible Markup Language (XML) to extract data from HTML...
In this paper we propose a model of a Web site that describes the logical structure of the contained data. Fur...
Abstract. The World Wide Web has become one of the most important information repositories. However...
With the development of the Internet, the World Wide Web has become an invaluable information source...
We present new techniques for supervised wrapper generation and automated web information extraction...
This work describes the scope of creating an application for extracting and following data from HTML sites....
Current methods of information extraction from HTML documents are mostly based on the discovery of ...
This work contains a brief overview of technologies for representation and obtaining data on WWW and...
The goals of this work are the design and implementation of an application which will allow effective data ex...
We present an approach for exploiting knowledge from documents in the web. It is based on the integr...
This is an electronic version of the paper presented at the International e-Conference of Computer S...