Title: A Tool for Transformation of PDF to Text Author: Jonáš Bujok Department: Institute of Formal and Applied Linguistics (32-UFAL) Supervisor: Mgr. Jan Raab, Institute of Formal and Applied Linguistics (32-UFAL) Abstract: In this thesis we describe a procedure for extracting text information from PDF (Portable Document Format) files. The thesis focuses mainly on Central European languages. We designed, described and implemented a program for this purpose. Besides the program and its description, the thesis contains information about the PDF object structure, its syntax, and the logic necessary for a proper understanding of how text is located in a PDF file. We also discuss filters, fonts and all other PDF objects that the program needs to p...
English: This Final Degree Project (FDP) is a collaboration with a Free Software project: GNU PDF. T...
The article discusses the algorithms for detecting and extracting lines, paragraphs with their prope...
In this poster we present a recent extension of the OntoGene text mining utilities, which enables th...
This bachelor's thesis is concerned with text extraction from PDF documents that contain mainly multi-co...
The PDF format is nowadays often called the universal document format. PDF to speech converter s...
The purpose of the thesis was to create an application that works as an assistive tool for translati...
The Concordia INdexing and DIscovery system (CINDI) is an information discovery and retrieval system...
Interest in the new publishing phenomenon known as the e-book has grown enormously in the last few years. Th...
Extracting text out of PDF documents is never an easy task when a higher degree of accuracy and cons...
Information can include text, pictures and signatures that can be scanned into a document format, su...
The result of my diploma work is a library for the Java programming language that transforms PDF to XHTML fil...
Text preprocessing and segmentation are critical tasks in search and text mining applications. Due t...
Paper presented at the Language Resources and Evaluation Conference (LREC) 2018, held on ...
The purpose of this article is to demonstrate a practical use case of the PDF/A file format for digitiza...