This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields by the authors; in reality, however, they are often bogus. It is therefore advantageous to extract titles from HTML documents automatically. In this paper, we take a supervised machine learning approach to the problem. We first propose a specification of HTML titles, that is, a 'definition' of HTML titles. Next, we employ two learning methods to perform the task. In one method, we utilize features extracted from the DOM (Document Object Model) tree; in the other, we utilize features based on vision. We also combine the two methods to further enhance the ...
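To make the idea of DOM-derived features concrete, the following is a minimal sketch, not the authors' feature set. It walks an HTML document with Python's standard `html.parser` module and records a few illustrative structural features for each text node in the body. The feature names (`depth`, `in_heading`, `in_bold`, `position`) and the helper `extract_candidates` are assumptions introduced here for illustration; a supervised learner could take such records as input.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Text inside these tags is never treated as a title candidate from the body.
SKIP_TAGS = {"script", "style", "title"}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BOLD_TAGS = {"b", "strong"}


class CandidateCollector(HTMLParser):
    """Collects text nodes together with simple DOM-path features."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open tags, outermost first
        self.candidates = []   # one feature dict per non-empty text node

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Tolerate sloppy markup: pop back to the matching open tag if any.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if not text or any(t in SKIP_TAGS for t in self.stack):
            return
        self.candidates.append({
            "text": text,
            "tag": self.stack[-1] if self.stack else "",
            "depth": len(self.stack),
            "in_heading": any(t in HEADING_TAGS for t in self.stack),
            "in_bold": any(t in BOLD_TAGS for t in self.stack),
            "length": len(text),
            "position": len(self.candidates),  # order of appearance
        })


def extract_candidates(html):
    """Return illustrative feature dicts for each text node (hypothetical helper)."""
    parser = CandidateCollector()
    parser.feed(html)
    parser.close()
    return parser.candidates
```

Each returned dictionary describes one candidate text unit. In a supervised setting, such records would be labeled as title or non-title and passed to a classifier; the choice of classifier and the actual feature set are outside what this sketch assumes.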
Abstract: Automatic titling (i.e. providing titles) is one of the key domains of Web site accessibility. ...
The Title Tag is an HTML code. The text embedded in the title tag of a web page appears as the title...
We consider the problem of efficient and template-independent news extraction on the Web. The popula...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Title...
In this paper, we present an analysis based on linguistic and typographic features that allows for t...
The HTML title tag information should identify and describe exactly what a web page contains. This p...
This is an interactive resource which demonstrates how to enter a website title in HTML. Students ar...
Abstract. Titles are denoted by the TITLE element within a web page. We queried the title against th...
In this paper, we show how we can learn to select good words for a document title. We view the probl...
This paper examines the feasibility of discovering "title-like" terms using a decision tree classifi...
We present a novel method for open domain named entity extraction by exploiting the collective hidde...
In this paper we present an algorithm for automatic extraction of textual elements, namely titles an...
Automatic titling of text documents is an essential task for several applications (automatic heading...