De-identifying textual data is an important task for publishing and sharing data among researchers while protecting the privacy of the individuals referenced therein. While supervised learning approaches have been applied successfully to this task in the clinical domain, existing methods are hard to transfer to other domains and languages because preparing the linguistic resources they require takes considerable cost and time. This paper presents an efficient unsupervised algorithm to detect all substrings occurring fewer than k times in the input string, based on the assumption that such rare sequences are likely to contain sensitive information, such as names of people and rare diseases, that may identify individuals. The proposed algorithm work...
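To make the detection task concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm, whose details are cut off in the abstract above. The function names `count_substrings` and `minimal_rare_substrings` are hypothetical. Reporting only the minimal rare substrings (those whose proper substrings all occur at least k times) is an assumption made here to keep the output small, since every superstring of a rare substring is itself rare.

```python
from collections import Counter


def count_substrings(text, max_len):
    """Count overlapping occurrences of every substring up to max_len characters."""
    counts = Counter()
    n = len(text)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            counts[text[start:end]] += 1
    return counts


def minimal_rare_substrings(text, k, max_len=None):
    """Return substrings occurring fewer than k times whose proper substrings
    all occur at least k times.

    Brute force (quadratic time and memory in len(text)); illustration only.
    """
    if max_len is None:
        max_len = len(text)
    counts = count_substrings(text, max_len)
    rare = set()
    for sub, freq in counts.items():
        if freq >= k:
            continue
        # Frequency never increases when a substring is extended, so checking
        # the two longest proper substrings is enough to establish minimality.
        if len(sub) == 1 or (counts[sub[:-1]] >= k and counts[sub[1:]] >= k):
            rare.add(sub)
    return rare


if __name__ == "__main__":
    print(sorted(minimal_rare_substrings("abababx", k=2)))
```

In a de-identification setting, the spans covered by these rare substrings would be candidates for masking. A practical implementation would likely replace the brute-force counting with an index such as a suffix array.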
We study the problem of efficiently removing equal frequency n-gram substrings from an n-gram set, f...
String data are often disseminated to support applications such as location-ba...
Finding similar substrings/substructures is a central task in analyzing huge amounts of st...
We propose new frequent substring pattern mining which can enumerate all substrings with statistical...
We propose new frequent substring pattern mining which can enumerate all substrings with statistical...
String data are often disseminated to support applications such as location-based service provisi...
String data are often disseminated to support applications such as location-based service provision ...
String data are often disseminated to support applications such as location-based service provision ...
Since early stages of bioinformatics, substrings played a crucial role in the search and discovery o...
In this paper we study the problem of estimating the number of occurrences of substrings in textual ...
Strings are used to model genomic, natural language, and web activity data, and are thus often share...
Strings are used to model genomic, natural language, and web activity data, and are thus often share...
Part 3: MHDW. An increasing number of applications, in domains ranging from bio-...
Strings are used to model genomic, natural language, and web activity data, and are thus often share...
Strings are used to model genomic, natural language, and web activity data, and are thus often share...