TEXT-MINING-BASED DOCUMENT TOPIC CLUSTERING WITH THE K-MEANS ALGORITHM: A CASE STUDY OF AUSTRALIAN EMBASSY JAKARTA DOCUMENTS

Authors

  • Wishnu Hardi, National Library of Australia
  • Wisnu Ananta Kusuma, Information Technology for Library, Bogor Agricultural University
  • Sulistyo Basuki, University of Indonesia

DOI:

https://doi.org/10.37014/visipustaka.v21i1.77

Keywords:

text mining, content analysis, document clustering, K-Means algorithm

Abstract

The Australian Embassy in Jakarta holds a large collection of media release documents. Analyzing the key patterns in this collection can yield new insight into the significant topic groups the documents cover. The K-Means algorithm was used as a non-hierarchical clustering method that partitions data objects into clusters. The method works by minimizing data variation within each cluster and maximizing data variation between clusters. Of the documents issued between 2006 and 2016, 839 were examined to determine term frequencies and to generate clusters. The clustering result was evaluated by appointing an expert to validate it. The results showed 57 meaningful terms grouped into 3 clusters. “People-to-people links”, “economic cooperation”, and “human development” were chosen to represent the topics of the Australian Embassy Jakarta media releases from 2006 to 2016. The research concludes that text mining can be used to cluster documents into topic groups. It provides a more systematic clustering process because the text analysis proceeds through a number of stages with specifically set parameters.
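The clustering step described in the abstract can be illustrated with a minimal sketch. It counts term frequencies per document and then runs K-Means, which alternates between assigning each document to its nearest centroid and recomputing each centroid as the mean of its members. The four toy documents, the function names, the use of raw term counts, and the fixed starting centroids are all assumptions made for illustration. They are not taken from the study, whose preprocessing and parameter settings are not given in the abstract.

```python
from collections import Counter


def term_frequencies(doc):
    """Lowercase a document, split on whitespace, and count each term."""
    return Counter(doc.lower().split())


def vectorize(docs):
    """Turn documents into term-count vectors over a shared sorted vocabulary."""
    counts = [term_frequencies(d) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, init_indices, max_iter=100):
    """Plain K-Means; init_indices picks the starting centroids (hypothetical choice)."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned vectors.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def within_cluster_ss(vectors, labels, centroids):
    """Within-cluster sum of squares, the variation that K-Means minimizes."""
    return sum(sq_dist(v, centroids[lab]) for v, lab in zip(vectors, labels))


# Invented toy documents, not drawn from the embassy media releases.
docs = [
    "scholarship students education exchange",
    "students exchange scholarship program",
    "trade investment economic partnership",
    "economic trade investment growth",
]

vocab, vectors = vectorize(docs)
labels, centroids = kmeans(vectors, k=2, init_indices=[0, 2])
wcss = within_cluster_ss(vectors, labels, centroids)
print(labels, wcss)
```

On this toy input the two education-related documents fall into one cluster and the two trade-related documents into the other. A real corpus would typically also need stopword removal, stemming, and a term-weighting scheme before clustering.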

Published

2019-12-04

Section

Articles