Companies often process invoices manually, therefore automation could reduce manual labor. The aim of this thesis is to evaluate which OCR-engine, Tesseract or OCRopus, performs best at interpreting invoices. This thesis also evaluates if it is possible to use machine learning to automatically process invoices based on previously stored data. By interpreting invoices with the OCR-engines, it results in the output text having few spelling errors. However, the invoice structure is lost, making it impossible to interpret the corresponding fields. If Naïve Bayes is chosen as the algorithm for machine learning, the prototype can correctly classify recurring invoice lines after a set of data has been processed. The conclusion is, neither of the t...
Information Extraction is a sub-field of Natural Language Processing that aims to extract structured...
Previous research has compared the performance of OCR (optical character recognition) engines strict...
The ability to automatically read, recognize, and extract different information from unstructured te...
Companies often process invoices manually, therefore automation could reduce manual labor. The aim o...
Large corporations receive and send large volumes of invoices containing various fields detailing a ...
When an organization receives invoices, accountants specify accounts and cost centers related to the...
Manually typing long numbers on paper invoices is tedious and timeconsuming work. The task of typing...
Companies with more than 3 million SEK in revenue per year are by law in Sweden required to bookkeep...
Purchase invoice processing is generally considered as one of the most burdensome processes in finan...
Optical character recognition (Optical Character Recognition (OCR)) has many applications, such as...
Even when optical character recognition has been researched since first half of twentieth century, i...
Att manuellt hantera och klassificera stora mängder textdokument tar mycket tid och kräver mycket pe...
The study has a purpose designed to investigate within which framework automation of accounting and ...
With the development in the area of machine learning, society has become more dependent on applicati...
Information Extraction is a sub-field of Natural Language Processing that aims to extract structured...
Previous research has compared the performance of OCR (optical character recognition) engines strict...
The ability to automatically read, recognize, and extract different information from unstructured te...
Companies often process invoices manually, therefore automation could reduce manual labor. The aim o...
Large corporations receive and send large volumes of invoices containing various fields detailing a ...
When an organization receives invoices, accountants specify accounts and cost centers related to the...
Manually typing long numbers on paper invoices is tedious and timeconsuming work. The task of typing...
Companies with more than 3 million SEK in revenue per year are by law in Sweden required to bookkeep...
Purchase invoice processing is generally considered as one of the most burdensome processes in finan...
Optical character recognition (Optical Character Recognition (OCR)) has many applications, such as...
Even when optical character recognition has been researched since first half of twentieth century, i...
Att manuellt hantera och klassificera stora mängder textdokument tar mycket tid och kräver mycket pe...
The study has a purpose designed to investigate within which framework automation of accounting and ...
With the development in the area of machine learning, society has become more dependent on applicati...
Information Extraction is a sub-field of Natural Language Processing that aims to extract structured...
Previous research has compared the performance of OCR (optical character recognition) engines strict...
The ability to automatically read, recognize, and extract different information from unstructured te...