Title: V-DIF: Virtual Data Integration Framework
Year of Publication: 2018
Publisher: International Journal of Computer Systems (IJCS)
ISSN: 2394-1065
Series: Volume 05, Number 5, May 2018
Authors: Ali Zidane El Qutaany, Ali Hamid El Bastawissy, Osman Hegazi

Citation:

Ali Zidane El Qutaany, Ali Hamid El Bastawissy, Osman Hegazi, "V-DIF: Virtual Data Integration Framework", In International Journal of Computer Systems (IJCS), pp. 26-32, Volume 5, Issue 5, May 2018.

BibTeX:

@article{key:article,
	author = {Ali Zidane El Qutaany and Ali Hamid El Bastawissy and Osman Hegazi},
	title = {V-DIF: Virtual Data Integration Framework},
	journal = {International Journal of Computer Systems (IJCS)},
	year = {2018},
	volume = {5},
	number = {5},
	pages = {26--32},
	month = {May}
}


Abstract

Data integration is the process of combining data residing at homogeneous, autonomous, and heterogeneous data sources and providing users with a unified global schema (GS). Users pose their queries in terms of the GS and expect accurate, complete, and unambiguous answers. A data integration system processes users’ queries transparently by translating each query into a set of sub-queries over the participating local sources (LSs) through the mappings defined between the GS and the LSs. Even if none of the participating data sources has internal inconsistencies, mutual inconsistencies appear in the answers to users’ queries because of the integration process. To ensure unambiguous answers, the data integration process should be followed by detecting and resolving such inconsistencies. Most of the data integration frameworks in the literature concentrate on the data integration process and avoid or ignore the other two processes (inconsistency detection and resolution). A few frameworks consider detecting and resolving inconsistencies but do not consider the interfacing, or linkage, between the three processes. Interfacing means that each process serves its successor by preparing the parameters that successor needs. We developed a Virtual Data Integration Framework (V-DIF) and tested it over 8 heterogeneous information sources. V-DIF meets most of the users’ expectations. This article introduces the theoretical part of the framework, which ensures the interfacing between the three processes.
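
The three processes named in the abstract (integration, inconsistency detection, inconsistency resolution) and the idea of interfacing between them can be illustrated with a minimal sketch. It is not the V-DIF algorithm: the source names, attribute names, sample rows, and the source-preference resolution strategy are all assumptions made for illustration. The point it shows is that the integration step keeps the originating source of every row, so that detection and resolution receive the parameters they need.

```python
from collections import defaultdict

# Hypothetical local sources (LSs); each uses its own attribute names.
LOCAL_SOURCES = {
    "ls1": [{"emp_id": 1, "full_name": "Sara Ali", "town": "Cairo"}],
    "ls2": [{"id": 1, "name": "Sara Ali", "city": "Giza"},
            {"id": 2, "name": "Omar Nabil", "city": "Alexandria"}],
}

# Hypothetical mappings: global schema (GS) attribute -> local attribute.
MAPPINGS = {
    "ls1": {"id": "emp_id", "name": "full_name", "city": "town"},
    "ls2": {"id": "id", "name": "name", "city": "city"},
}


def sub_query(source, gs_attrs):
    """Translate a GS projection into a scan over one local source."""
    mapping = MAPPINGS[source]
    for row in LOCAL_SOURCES[source]:
        yield {attr: row[mapping[attr]] for attr in gs_attrs}


def integrate(gs_attrs, key="id"):
    """Integration: group sub-query answers by key, keeping the source."""
    groups = defaultdict(list)
    for source in LOCAL_SOURCES:
        for row in sub_query(source, gs_attrs):
            groups[row[key]].append((source, row))
    return groups


def detect(groups):
    """Detection: find attributes with conflicting values for one key."""
    conflicts = {}
    for k, rows in groups.items():
        for attr in rows[0][1]:
            if len({row[attr] for _, row in rows}) > 1:
                conflicts[(k, attr)] = [(s, row[attr]) for s, row in rows]
    return conflicts


def resolve(groups, conflicts, preference):
    """Resolution (assumed strategy): trust the most preferred source."""
    rank = {source: i for i, source in enumerate(preference)}
    answer = []
    for k in sorted(groups):
        merged = dict(groups[k][0][1])
        for attr in merged:
            if (k, attr) in conflicts:
                best = min(conflicts[(k, attr)], key=lambda sv: rank[sv[0]])
                merged[attr] = best[1]
        answer.append(merged)
    return answer


groups = integrate(["id", "name", "city"])
conflicts = detect(groups)
answer = resolve(groups, conflicts, preference=["ls2", "ls1"])
```

Here the two sources agree on the name for key 1 but disagree on the city, so `detect` reports a single mutual inconsistency, and `resolve` settles it in favour of `ls2` because that source is listed first in `preference`. Other resolution strategies (for example voting or recency) could be substituted without changing the first two steps.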

Keywords

Data Integration, virtual integration, detectors, data fusion, duplicate and inconsistency detection, duplicate and inconsistency resolution.