Fuzzy search is often used in digital forensic investigations to find words that are stringologically similar to a chosen keyword. However, a common complaint is the high rate of false positives in big data environments. This chapter describes the design and implementation of cedas, a novel constrained edit distance approximate string matching algorithm that provides complete control over the types and numbers of elementary edit operations considered in approximate matches. The unique flexibility of cedas facilitates fine-tuned control of precision-recall trade-offs. Specifically, searches can be constrained to the union of matches resulting from any exact edit combination of insertion, deletion and substitution operations performed on the ...
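To illustrate what constraining a search to exact combinations of edit operations means, the following is a minimal sketch, not the cedas implementation (which this abstract does not describe). It uses a brute-force recursion to enumerate every (insertions, deletions, substitutions) count triple that turns a keyword into a candidate word, then accepts the candidate only if one of those triples is in a caller-supplied allowed set. The function names `edit_profiles` and `constrained_match` and the triple representation are assumptions made for this example.

```python
from functools import lru_cache


def edit_profiles(keyword: str, candidate: str, max_ops: int) -> set:
    """Return every (insertions, deletions, substitutions) triple, using at
    most max_ops operations in total, that transforms keyword into candidate.
    Hypothetical helper for illustration; not taken from cedas."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int, budget: int) -> frozenset:
        # Profiles that transform keyword[i:] into candidate[j:].
        if i == len(keyword) and j == len(candidate):
            return frozenset({(0, 0, 0)})
        found = set()
        both_left = i < len(keyword) and j < len(candidate)
        if both_left and keyword[i] == candidate[j]:
            # Characters agree: advance both without spending an operation.
            found |= go(i + 1, j + 1, budget)
        if budget > 0:
            if both_left and keyword[i] != candidate[j]:
                # Substitution of keyword[i] by candidate[j].
                found |= {(a, b, c + 1) for a, b, c in go(i + 1, j + 1, budget - 1)}
            if j < len(candidate):
                # Insertion of candidate[j] into the keyword.
                found |= {(a + 1, b, c) for a, b, c in go(i, j + 1, budget - 1)}
            if i < len(keyword):
                # Deletion of keyword[i].
                found |= {(a, b + 1, c) for a, b, c in go(i + 1, j, budget - 1)}
        return frozenset(found)

    return set(go(0, 0, max_ops))


def constrained_match(keyword: str, candidate: str, allowed: set) -> bool:
    """True if candidate is reachable from keyword by one of the allowed
    exact (insertions, deletions, substitutions) combinations."""
    max_ops = max((sum(triple) for triple in allowed), default=0)
    return bool(edit_profiles(keyword, candidate, max_ops) & set(allowed))


if __name__ == "__main__":
    only_one_substitution = {(0, 0, 1)}
    print(constrained_match("cat", "cut", only_one_substitution))
    print(constrained_match("cat", "cart", only_one_substitution))
```

An unconstrained search with edit distance 1 would accept both "cut" and "cart" for the keyword "cat". Restricting the allowed set to a single substitution rejects "cart", which requires an insertion; this is the kind of precision gain the abstract attributes to constraining operation types and counts.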
Knowledge discovery in big data is one of the most important applications of computing machinery tod...
Over the past few years the popularity of approximate matching algorithms (a.k.a. fuzzy hashing) has...
The paper presents a new text searching method that is a modification of the Boyer-Moore algorith...
The technical aspects of digital forensics are often dependent upon the progress made in other scien...
Fast similarity search is important for time-sensitive applications. Those include both enterprise a...
We survey the current techniques to cope with the problem of string matching that allows errors. Thi...
Approximate pattern matching entails finding approximate occurrences of a search pattern P in a sear...
The first step prior to data mining is often to merge databases from different sources. Entries in t...
One of the initial hurdles in taking advantage of big data is the ability to quickly analyze and est...
Applying the noisy channel model to search query spelling correction requires an error model and a l...
This paper proposes a new method for approximate string search, specifically candidate generation in...
For health and human services, fraud detection and other security services, identity resolution is a...
The obvious need for using modern computer networking capabilities to enable the effective sharing of...