Clustering Bank Credit Data Using the Average Linkage Agglomerative Hierarchical Clustering Algorithm
Abstract
Data mining is the development of models that represent pattern discovery using historical data. Such models can be applied to data for prediction (classification and regression), population segmentation (clustering), and determining relationships within a population (association). Clustering is the process of organizing objects into groups whose members are similar in some respect. Similarity can be measured in two ways: by shape or by distance. Clustering methods fall into several types: partitioning, hierarchical, overlapping, and hybrid. In hierarchical clustering, every data point must belong to a cluster, and a data point assigned to one cluster cannot move to another. Hierarchical clustering has two forms: divisive (top-down) and agglomerative (bottom-up). There are four agglomerative algorithms: single linkage, centroid linkage, complete linkage, and average linkage. Average linkage is considered the best of these hierarchical algorithms, but it has the highest computation time. In this study, the customers of a bank are clustered using agglomerative hierarchical clustering with average linkage. The data attributes used are: checking status, credit duration, credit history, credit purpose, credit amount, savings status, employment, commitment, personal status, other parties, residence since, property ownership, age, other payment plans, housing status, existing credits, job, number of dependents, home telephone, foreign worker, and class. The dataset comprises 1000 instances, of which 25%, 50%, and 75% were used as training data, while the entire dataset was used as testing data.
Keywords: data mining, dataset, clustering, agglomerative hierarchical clustering, average linkage
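To make the bottom-up merging described in the abstract concrete, the following is a minimal sketch of agglomerative clustering with average linkage. It is not the author's implementation: the Euclidean distance, the function names, and the toy customer data (credit duration and age only) are assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance (an assumed choice), Python 3.8+


def average_linkage_distance(cluster_a, cluster_b, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative_average_linkage(points, n_clusters):
    """Start with one cluster per point and merge the closest pair
    (by average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage_distance(
                clusters[pair[0]], clusters[pair[1]], points
            ),
        )
        # Merge cluster b into cluster a; a < b, so index a stays valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical toy data: (credit duration in months, age in years).
customers = [(6, 25), (8, 27), (7, 24), (48, 60), (50, 58), (46, 62)]
groups = agglomerative_average_linkage(customers, n_clusters=2)
```

Each merge in this naive version recomputes every pairwise cluster distance, which gives a rough sense of why average linkage is costly on larger datasets. Once merged, a point never leaves its cluster, matching the hierarchical property stated in the abstract.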
DOI: https://doi.org/10.32528/justindo.v4i1.2418
Copyright (c) 2019 Ginanjar Abdurrahman
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.