was to automate the task of identifying the activities and skills of a collection of enterprises, namely Belgian and French open source companies. To avoid manual annotation through visual analysis of the websites' content, a tool chain was developed to collect the content of websites and extract the important terms. Standard software libraries were identified that make it possible to clean up HTML documents and to perform the part-of-speech tagging used for extracting terminology. This procedure is supplemented by the extraction and recognition of named entities. The terms extracted from the HTML pages of a company website were then merged and filtered, and a circular tag cloud was generated. This presentation facilitates the ident...
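The tool chain described above (clean the HTML, extract candidate terms per page, then merge and filter them per website) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, and a small stopword filter stands in for the part-of-speech tagging step, which would need a dedicated NLP library. The names `TextExtractor`, `html_to_text`, `page_terms`, `site_terms`, `STOPWORDS`, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Strip markup from one HTML page and return its text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Assumption: a tiny stopword list replaces real part-of-speech filtering.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "for", "to", "in",
             "we", "our", "is", "are", "with"}


def page_terms(html):
    """Count candidate terms (lowercased words, stopwords removed) in one page."""
    words = re.findall(r"[a-zA-Z][a-zA-Z\-]+", html_to_text(html).lower())
    return Counter(w for w in words if w not in STOPWORDS)


def site_terms(pages, min_count=2):
    """Merge term counts over all pages of a site and drop rare terms."""
    merged = Counter()
    for html in pages:
        merged.update(page_terms(html))
    return {term: count for term, count in merged.items() if count >= min_count}
```

The resulting term counts could serve as the weights of a tag cloud; a real pipeline would replace `page_terms` with tagger-based terminology extraction and add named entity recognition.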
Nowadays, society finds itself in the so-called “Information Age”, in which the combination of expon...
This paper demonstrates a method to transform and link textual information scraped from companies' w...
While NLP tools are now widely available, their use can be problematic considering the lack of homog...
This paper presents a system that uses the domain name of a German business website to locate its in...
Personalization is increasingly vital, especially for enterprises to be able to reach their...
We describe an application of information extraction from company websites focusing on product offer...
Enterprises provide professionally authored content about their products/services in different langu...
The aim of this thesis is to train a named entity recognition model on a dataset created using structu...
This dissertation proposes a method and a system for the identification of entities (persons, locati...
The presented thesis deals with the task of automatic information extraction from HTML documents for...
One of the results of the modern era is the massive production and usage of manifold electronic resources....
This paper addresses the problem of categorizing terms or lexical entities into a predefined set of ...
Regional Innovation Systems describe the relations between actors, structures and infrastructures i...