Identifying Duplicate Records by Using Estimation of Distribution Algorithms to Learn the Semantics

Publication year: 1384 (Solar Hijri)
Document type: Conference paper
Language: English

Length: 8 pages (PDF)

National scientific document ID: ACCSI11_218

Indexing date: 5 Azar 1390 (Solar Hijri)

Abstract:

When data is gathered from various sources for inclusion in integrated information systems such as data warehouses, the likelihood of duplicate and inconsistent records increases. A flexible, automatic reasoning mechanism is required to clean the data so that users can draw accurate statistics and reports from it for enterprise decision making. In this paper, we employ a deduplication approach built on a fuzzy logic framework. The fuzzy inference system is then optimized by means of the Bayesian Optimization Algorithm, a member of the class of Estimation of Distribution Algorithms, which can learn complex multivariate relations of bounded order. This class of algorithms is inspired by the breeder genetic algorithm, which draws on the science of livestock breeding. The experiments reveal that this approach is capable of eliminating duplicates that are fraught with uncertainty, and the resulting data is therefore of better quality.
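The general idea in the abstract can be illustrated with a minimal sketch. It is not the authors' system. It swaps in the Univariate Marginal Distribution Algorithm (UMDA), the simplest Estimation of Distribution Algorithm, for the Bayesian Optimization Algorithm, so it models each bit independently and learns no multivariate relations. The toy records, the 0.8 threshold, the bit-mask encoding, and all function names are assumptions made for illustration only.

```python
import random
from difflib import SequenceMatcher

# Hypothetical toy data: (record_a, record_b, is_duplicate).
# Fields are (name, city, phone); none of this comes from the paper.
PAIRS = [
    (("John Smith", "Tehran", "555-0101"), ("Jon Smith", "Tehran", "555-0101"), True),
    (("Mary Jones", "Shiraz", "555-0199"), ("Mary Jones", "Tabriz", "555-0142"), False),
    (("Ali Rezaei", "Tehran", "555-0123"), ("Ali Rezaee", "Tehran", "555-0188"), True),
    (("Sara Karimi", "Isfahan", "555-0170"), ("Sina Karimi", "Isfahan", "555-0170"), False),
]
THRESHOLD = 0.8  # assumed cut-off on the fuzzy duplicate degree


def similarity(a, b):
    """String similarity in [0, 1], used here as a fuzzy membership degree."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def duplicate_degree(rec_a, rec_b, mask):
    """Fuzzy AND (minimum) over the fields switched on by the bit mask."""
    sims = [similarity(x, y) for x, y, bit in zip(rec_a, rec_b, mask) if bit]
    return min(sims) if sims else 0.0


def fitness(mask):
    """Number of labelled pairs that this mask classifies correctly."""
    return sum(
        (duplicate_degree(a, b, mask) >= THRESHOLD) == label
        for a, b, label in PAIRS
    )


def umda(n_bits, pop_size=30, n_select=10, generations=15, seed=0):
    """UMDA: sample from per-bit probabilities, select, re-estimate, repeat."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # initial model: every bit is on with probability 0.5
    best = None
    for _ in range(generations):
        population = [
            tuple(int(rng.random() < p) for p in probs) for _ in range(pop_size)
        ]
        population.sort(key=fitness, reverse=True)
        if best is None or fitness(population[0]) > fitness(best):
            best = population[0]
        selected = population[:n_select]
        # Re-estimate the marginals from the selected individuals,
        # clamped so that no bit becomes permanently fixed.
        probs = [
            min(0.95, max(0.05, sum(ind[i] for ind in selected) / n_select))
            for i in range(n_bits)
        ]
    return best, fitness(best)


best_mask, best_score = umda(n_bits=3)
print("fields used (name, city, phone):", best_mask, "correct pairs:", best_score)
```

The bit mask stands in for the parameters of a fuzzy inference system. A Bayesian Optimization Algorithm would replace the independent per-bit probabilities with a Bayesian network learned from the selected individuals, which is what allows it to capture dependencies between parameters.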

Keywords:

Duplicate Elimination, Estimation of Distribution Algorithms, Fuzzy Inference System

Authors

Saied Haidarian Shahri¹

Control and Intelligent Processing Center of Excellence (CIPCE), Department of Computer and Electrical Engineering, University of Tehran, Tehran, Iran

Caro Lucas

Control and Intelligent Processing Center of Excellence (CIPCE), Department of Computer and Electrical Engineering, University of Tehran, Tehran, Iran

Babak N. Araabi¹,²

School of Cognitive Sciences, Institute for Studies in Theoretical Physics and Mathematics, Tehran, Iran
