Title: Review Paper on Outlier Detection Techniques
Year of Publication: 2015
Publisher: International Journal of Computer Systems (IJCS)
ISSN: 2394-1065
Series: Volume 2, Number 11
Authors: Kanchan D. Shastrakar, Professor Pravin G. Kulurkar


Kanchan D. Shastrakar, Pravin G. Kulurkar, "Review Paper on Outlier Detection Techniques", International Journal of Computer Systems (IJCS), 2(11), pp: 517-519, November 2015.

BibTeX

	author = {Kanchan D. Shastrakar and Pravin G. Kulurkar},
	title = {Review Paper on Outlier Detection Techniques},
	journal = {International Journal of Computer Systems (IJCS)},
	year = {2015},
	volume = {2},
	number = {11},
	pages = {517-519},
	month = {November}


Outlier mining is the task of discovering data records that behave exceptionally compared with the other records in a dataset. Outliers do not conform to the other data objects in the dataset. There are many effective approaches for detecting outliers in numerical data, and most of the earliest work on outlier detection was performed by the statistics community on numeric data. For categorical datasets, however, the available approaches are limited. By combining NAVF (Normally distributed Attribute Value Frequency) and ROAD (Ranking-based Outlier Analysis and Detection), a new hybrid approach for outlier detection in categorical datasets will be formed.
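
To illustrate the frequency-based idea behind NAVF, the following is a minimal Python sketch of attribute value frequency scoring on categorical records. It is an illustration under stated assumptions, not the authors' procedure: the function names avf_scores and flag_outliers and the parameter k are hypothetical, and the cutoff of mean minus k standard deviations is only one plausible way to apply a normal-distribution assumption to the scores. ROAD and the proposed hybrid are not shown.

```python
from collections import Counter
from statistics import mean, pstdev


def avf_scores(records):
    """Return one score per record: the mean frequency of its attribute values.

    Records made of rare values get low scores, so a low score suggests an outlier.
    """
    n_attrs = len(records[0])
    # Count how often each value occurs within each attribute (column).
    counts = [Counter(row[j] for row in records) for j in range(n_attrs)]
    return [
        sum(counts[j][row[j]] for j in range(n_attrs)) / n_attrs
        for row in records
    ]


def flag_outliers(records, k=3.0):
    """Return indices of records whose score falls below mean - k * std.

    Assumption: the scores are treated as roughly normally distributed,
    and k = 3 is an illustrative default, not a value taken from the paper.
    """
    scores = avf_scores(records)
    cutoff = mean(scores) - k * pstdev(scores)
    return [i for i, score in enumerate(scores) if score < cutoff]


# Hypothetical toy data: 20 identical records and one record with rare values.
records = [("a", "x")] * 20 + [("b", "y")]
outlier_indices = flag_outliers(records)
```

On this toy data the common records each score 20 and the rare record scores 1, so only the last record falls below the cutoff.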




Keywords: NAVF, ROAD, Outliers, Categorical.