This paper is concerned with the automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; in reality, however, HTML titles are often bogus. It is therefore desirable to extract titles automatically from the bodies of HTML documents, an issue that does not seem to have been investigated previously. In this paper, we take a supervised machine learning approach to the problem. We propose a specification for HTML titles, and we use format information such as font size, position, and font weight as features in title extraction. Our method significantly outperforms the baseline method of using the lines in the largest font size as the title (20.9%-32.6% improve...
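As an illustration of the baseline named in the abstract (taking the lines in the largest font size as the title) and of the kind of format features it mentions, the following is a minimal sketch. It is not the paper's implementation: the `TextUnit` record, the function names, and the particular feature encodings (relative font size, bold flag, relative position) are assumptions introduced here, and the sketch presumes that font size and weight have already been resolved for each line of the page body.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextUnit:
    """Hypothetical record for one rendered line of an HTML body."""
    text: str
    font_size: float  # resolved font size, assumed positive
    bold: bool        # True if the line is set in a bold weight
    position: int     # 0-based index in document order


def largest_font_baseline(units: List[TextUnit]) -> str:
    """Baseline: join the non-empty lines set in the largest font size."""
    candidates = [u for u in units if u.text.strip()]
    if not candidates:
        return ""
    max_size = max(u.font_size for u in candidates)
    chosen = sorted(
        (u for u in candidates if u.font_size == max_size),
        key=lambda u: u.position,
    )
    return " ".join(u.text.strip() for u in chosen)


def format_features(unit: TextUnit, units: List[TextUnit]) -> Dict[str, float]:
    """Assumed feature encoding for a supervised title classifier."""
    max_size = max(u.font_size for u in units)
    return {
        "relative_font_size": unit.font_size / max_size,
        "is_bold": 1.0 if unit.bold else 0.0,
        "relative_position": unit.position / max(len(units) - 1, 1),
    }


if __name__ == "__main__":
    page = [
        TextUnit("Home | About", 10.0, False, 0),
        TextUnit("Title Extraction", 24.0, True, 1),
        TextUnit("Body text of the page.", 12.0, False, 2),
    ]
    print(largest_font_baseline(page))
    print(format_features(page[1], page))
```

In a supervised setting, feature dictionaries of this kind would be computed for every candidate line and passed to a classifier trained on labeled titles; the baseline function needs no training and serves only as a point of comparison.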
Ten important guidelines to consider when optimizing your (X)HTML title tags for search/retrieval. T...
We present a novel method for open domain named entity extraction by exploiting the collective hidde...
The Title Tag is an HTML code. The text embedded in the title tag of a web page appears as the title...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Title...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web p...
In this paper, we present an analysis based on linguistic and typographic features that allows for t...
The HTML title tag information should identify and describe exactly what a web page contains. This p...
This is an interactive resource which demonstrates how to enter a website title in HTML. Students ar...
In this paper, we show how we can learn to select good words for a document title. We view the probl...
Automatic titling of text documents is an essential task for several applications (automatic heading...
Abstract. Titles are denoted by the TITLE element within a web page. We queried the title against th...
Abstract. Automatic titling (i.e., providing titles) is one of the key domains of Web site accessibility. ...
This paper examines the feasibility of discovering "title-like" terms using a decision tree classifi...
In this paper we present an algorithm for automatic extraction of textual elements, namely titles an...