Towards Comparative Analysis of Resumption Techniques in ETL

Authors

M. Muddasir, R. K, and D. R

DOI:

https://doi.org/10.24002/ijis.v3i2.3776

Keywords:

data warehouse, block based, resumption, failed ETL.

Abstract

Data warehouses are loaded with data from sources such as operational databases. A failure of the loading process, or of any upstream process such as extraction or transformation, is expensive because the data becomes unavailable for analysis. With the advent of e-commerce and many real-time applications, real-time data analysis has become the norm, so any failure while data is being loaded into the data warehouse must be handled efficiently. Techniques for handling failures in the processes that populate the warehouse are therefore as important as the loading process itself, and alternative arrangements must be in place so that the warehouse is still populated on time when a failure occurs. This paper explores the various ways in which a failed warehouse-population process can be resumed. Several resumption techniques are compared, and a novel block-based technique is proposed to improve one of the existing resumption techniques.
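The abstract does not describe how the proposed block-based technique works, so the following is only a minimal, hypothetical Python sketch of the general idea behind block-based resumption, not the authors' algorithm. Source rows are split into fixed-size blocks, and a checkpoint records the last block that was fully committed, so a restarted load skips the completed blocks instead of reloading everything. The names `Checkpoint`, `split_into_blocks`, `load_with_block_resumption`, and the `fail_at_block` hook are illustrative assumptions, and an in-memory list stands in for the warehouse table.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class Checkpoint:
    # Hypothetical persisted state: index of the last block that was fully
    # committed to the target. -1 means nothing has been loaded yet.
    last_committed_block: int = -1


def split_into_blocks(rows: Sequence[T], block_size: int) -> List[List[T]]:
    # Deterministic partitioning: a restarted run must see the same block
    # boundaries, otherwise the checkpoint index would be meaningless.
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [list(rows[i:i + block_size]) for i in range(0, len(rows), block_size)]


def load_with_block_resumption(
    rows: Sequence[T],
    block_size: int,
    target: List[T],
    checkpoint: Checkpoint,
    fail_at_block: Optional[int] = None,  # hypothetical hook to simulate a crash
) -> int:
    """Load rows block by block, skipping blocks already committed.

    Returns the number of blocks loaded during this run.
    """
    loaded = 0
    for idx, block in enumerate(split_into_blocks(rows, block_size)):
        if idx <= checkpoint.last_committed_block:
            continue  # committed by an earlier, interrupted run
        if idx == fail_at_block:
            raise RuntimeError(f"simulated failure before committing block {idx}")
        # In a real warehouse, inserting the block and advancing the checkpoint
        # should happen in one transaction so that they cannot diverge.
        target.extend(block)
        checkpoint.last_committed_block = idx
        loaded += 1
    return loaded
```

For example, loading `list(range(10))` with `block_size=3` and a simulated failure at block 2 leaves rows 0 to 5 in the target and the checkpoint at block 1. A second call without the failure loads only the remaining two blocks and the target ends up holding all ten rows with no duplicates. The trade-off in such a scheme is block size: smaller blocks lose less work on failure but add more checkpoint writes.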

Published

2021-02-15

How to Cite

Muddasir, M., K, R., & R, D. (2021). Towards Comparative Analysis of Resumption Techniques in ETL. Indonesian Journal of Information Systems, 3(2), 82–93. https://doi.org/10.24002/ijis.v3i2.3776