This thesis presents a mechanism based on eXtensible Markup Language (XML) to extract data from HTML-based Web pages and populate relational databases. This task is performed by a system called the XML-based Web Agent (XWA). The data extraction is done in three phases. First, the Web pages are converted to well-formed XML documents to facilitate their processing. Second, the data is extracted from the well-formed XML documents and formatted into valid XML documents. Finally, the valid XML documents are mapped into tables to be stored in a relational database. To extract specific data from the Web, the XWA requires information about the Web pages from which to extract the data, the location of the data within the Web pages, and how the extra...
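The three phases described above can be illustrated with a minimal Python sketch. It is not the XWA implementation: the function names (`html_to_xml`, `extract`, `store`), the path-based extraction rules, and the sample page are all assumptions made for illustration, and the second phase only produces a fixed record structure rather than validating against a DTD or schema.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML (partial list, for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class _TreeBuilder(HTMLParser):
    """Phase 1 helper: turns sloppy HTML into a well-formed element tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # hypothetical wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        elem = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close everything up to the nearest matching open tag;
        # stray end tags with no match are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not data.strip():
            return
        parent = self.stack[-1]
        if len(parent):  # text after a child element goes into its tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html):
    """Phase 1: HTML page -> well-formed XML tree."""
    builder = _TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def extract(root, record_path, fields):
    """Phase 2: pull the located data into a fixed <records> structure."""
    out = ET.Element("records")
    for node in root.findall(record_path):
        rec = ET.SubElement(out, "record")
        for name, path in fields.items():
            ET.SubElement(rec, name).text = (node.findtext(path) or "").strip()
    return out


def store(records, conn, table, columns):
    """Phase 3: map each <record> to a row (table/column names are trusted config)."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")
    marks = ", ".join("?" for _ in columns)
    rows = [[rec.findtext(c) for c in columns] for rec in records.findall("record")]
    conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    conn.commit()


# Hypothetical sample page with unquoted attributes and an unclosed <br>.
PAGE = """<html><body><table>
<tr class=book><td class=title>XML Basics<br></td><td class=price>12.50</td></tr>
<tr class=book><td class=title>SQL &amp; You</td><td class=price>8.00</td></tr>
</table></body></html>"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    tree = html_to_xml(PAGE)
    recs = extract(tree, ".//tr[@class='book']",
                   {"title": ".//td[@class='title']", "price": ".//td[@class='price']"})
    store(recs, conn, "books", ["title", "price"])
    print(conn.execute("SELECT title, price FROM books").fetchall())
```

The extraction rules here are plain element paths supported by `xml.etree.ElementTree`; a real agent would likely need a richer rule language and a proper validation step before loading the database.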
The presented thesis deals with the task of automatic information extraction from HTML documents for...
The Extensible Markup Language (XML) provides a simple, extendable, well-structured, platform indepe...
This application demonstrates XML as an excellent backend technology for storing and sharing data in a hi...
This work contains a brief overview of technologies for representing and obtaining data on the WWW and...
Data extraction from web documents is becoming more popular and widely used for many tasks. The obje...
With the development of the Internet, the World Wide Web has become an invaluable information source...
Efficient access to data, sharing data, extracting information from data, and making use of the info...
XML technology has emerged during recent years as a popular choice for representing and exc...
EXtensible Markup Language (XML) is a self-describing meta-language and is fast emerging as a standard for...
One of the promises of the Semantic Web is to support applications that easily...
The goals of this work are the design and implementation of an application which will allow effective data ex...
The Web environment has developed into the largest source of electronic documents, so it would be very u...
The World Wide Web has become one of the most important information repositories. However...
The World Wide Web no longer consists just of HTML pages. Our work sheds light...
This thesis deals with data extraction from web pages written in HTML. It describes methods...