In large integration projects, one is often confronted with poorly documented databases. One way to gather information on database schemas is to search for inclusion dependencies (INDs). These provide a solid basis for deducing foreign key constraints, because a satisfied IND is a precondition for any potential (semantically valid but missing) foreign key constraint. In this paper we present and compare several algorithms to identify unary INDs. The obvious way is to run an appropriate SQL statement for each potential IND to test whether it is satisfied. We show that this approach is not efficient enough for large databases. Therefore, we developed database-external approaches that are up to several orders of magnitude faster than an SQL-based approach. We tested our...
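To make the idea of testing one candidate unary IND concrete, here is a minimal sketch in Python with SQLite. It is not the paper's implementation: the abstract shows neither the SQL statement nor the database-external algorithms, so the function names, the demo tables, and the choice to ignore NULLs in the dependent column are all assumptions made for illustration. The first function issues one SQL query per candidate; the second exports the distinct values and compares them outside the database, which only hints at the database-external direction.

```python
import sqlite3


def ind_holds_sql(conn, dep, ref):
    """Test the unary IND dep ⊆ ref with a single SQL query.

    dep and ref are (table, column) pairs. Identifiers are interpolated
    directly, so this sketch assumes they come from a trusted catalog.
    Assumption: NULLs in the dependent column are ignored.
    """
    (dep_table, dep_col), (ref_table, ref_col) = dep, ref
    query = (
        f'SELECT COUNT(*) FROM "{dep_table}" AS d '
        f'WHERE d."{dep_col}" IS NOT NULL '
        f'AND NOT EXISTS (SELECT 1 FROM "{ref_table}" AS r '
        f'WHERE r."{ref_col}" = d."{dep_col}")'
    )
    # The IND is satisfied when no dependent value lacks a match.
    return conn.execute(query).fetchone()[0] == 0


def ind_holds_external(conn, dep, ref):
    """Same test, but compare the exported distinct values in Python."""
    (dep_table, dep_col), (ref_table, ref_col) = dep, ref
    dep_values = {
        v for (v,) in conn.execute(
            f'SELECT DISTINCT "{dep_col}" FROM "{dep_table}" '
            f'WHERE "{dep_col}" IS NOT NULL'
        )
    }
    ref_values = {
        v for (v,) in conn.execute(
            f'SELECT DISTINCT "{ref_col}" FROM "{ref_table}"'
        )
    }
    return dep_values <= ref_values


# Hypothetical demo schema: orders.customer_id should reference customer.id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER);
CREATE TABLE orders (customer_id INTEGER);
INSERT INTO customer VALUES (1), (2), (3);
INSERT INTO orders VALUES (1), (3), (3), (NULL);
""")

forward = ind_holds_sql(conn, ("orders", "customer_id"), ("customer", "id"))
backward = ind_holds_sql(conn, ("customer", "id"), ("orders", "customer_id"))
```

In this demo, `forward` is true because every non-NULL `orders.customer_id` appears in `customer.id`, and `backward` is false because customer 2 has no order. A schema with n attributes has on the order of n² candidate pairs, so issuing one such query per candidate grows expensive quickly, which is the cost the abstract refers to.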
Data integration in the life sciences is currently implemented by very costly manually curated proje...
Declarative pattern mining implies to define common frameworks and atomic oper...
Data Mining (DM) represents the process of extracting interesting and previously unknown knowledge f...
Large data integration projects must often cope with undocumented data sources. Schema discovery aim...
Data sources for data integration often come with spurious schema definitions such as undefined fore...
Foreign keys form one of the most fundamental constraints for relational datab...
Foreign keys form one of the most fundamental constraints for relational datab...
Inclusion dependencies together with functional dependencies form the most imp...
Inclusion dependencies together with functional dependencies form the most fundamen...
Relational database schemas must be semantically enriched to reflect knowledge about the data, as ne...
Relational database schemas must be semantically enriched to reflect knowledge about the data, as ne...
Functional dependencies (FDs) and inclusion dependencies (INDs) are the most fundamental database in...
Since real world databases are known to be very large, they raise problems of the access. Therefore,...
Determining relationships such as functional or inclusion dependencies within and across databases i...
Matching dependencies (MDs) are recently proposed for various data quality applications such as dete...