Sensitive unclassified information is defined as any unclassified information that may cause adverse consequences for government facilities. In this chapter, we explore the use of categorization techniques and information extraction to discover this kind of information in scanned documents. We show that the combined use of a k-dependence Bayesian categorization engine and a semi-automated review application reduces by nearly 95% the number of man-hours required to redact sensitive unclassified information. We also discuss, and provide statistics on, how OCR errors can affect information extraction tasks.
In this paper we use information retrieval metrics to evaluate the effect of a document sanitization...
The volume of information generated each day is increasing at a staggering rate. Much of this ever-...
With the advent of the internet, large numbers of text documents are published and shared every day. Each...
In this paper, we report on the identification of document type using a k-dependence Bayesian catego...
This paper presents a complete system able to categorize handwritten documents...
The sensitivity review of government records is essential before they can be released to the officia...
In a system where medical paper document images have been converted to a digital format by a scannin...
OCR error has been shown not to affect the average accuracy of text retrieval or text categorization...
The digital processing of electronic documents is widely exploited across many domains to improve th...
The U.S. government protects a massive number of documents as part of its Official Security Classifi...
This thesis is about the identification of unintelligible documents using machine learning technique...
Information extraction is a process of extracting relevant data in a specified structured format fro...
Modern computer networks make it possible to distribute documents quickly and economically...
Modern international trade activities rely heavily on thousands of daily information artifacts repor...