Introduction: In recent decades, health insurance has become a vital component of the healthcare system, driven by continuous technological advances and the growing complexity of medical services. As the industry faces new challenges, efforts to find innovative ways to improve service quality, optimize resource management, and increase the satisfaction of insured individuals have intensified. One significant approach is to apply data mining techniques to identify the behavioral patterns of health insurance policyholders during outpatient visits to diagnostic and treatment facilities.
Methods: This descriptive cross-sectional study used health insurance claims data from Bushehr province, Iran. Sampling was done by the census method: the statistical population comprised all 1,420,579 outpatient referrals made in 2018 to diagnostic and medical centers by individuals covered by health insurance in Bushehr province, collected by the researcher directly from the medical records database. After data preparation, the analysis was performed in SPSS Clementine 12.0. Insurance start time, number of visits, and the value of the insurance type were used to build K-means models in two modes: a demographic mode and a recency-frequency-monetary (RFM) mode.
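The RFM mode can be illustrated with a minimal plain-Python sketch. It assumes one reading of the Methods: recency is taken as days since insurance start, frequency as the number of referrals, and the monetary component as a numeric value attached to the insurance type. The record schema, the `TYPE_VALUE` numbers, and the helper names (`rfm_features`, `standardize`, `kmeans`) are hypothetical, and the farthest-first initialization is a deterministic simplification, not necessarily what SPSS Clementine does internally.

```python
import math
from datetime import date

# Hypothetical numeric "value" per insurance type; placeholder numbers only,
# not values taken from the study.
TYPE_VALUE = {"rural": 1.0, "other": 2.0}


def rfm_features(records, ref_date):
    """Build one [R, F, M] vector per insured person.

    records: iterable of (insured_id, insurance_start, insurance_type) tuples,
    one tuple per outpatient referral (hypothetical schema).
    """
    people = {}
    for pid, start, itype in records:
        person = people.setdefault(pid, {"start": start, "type": itype, "visits": 0})
        person["visits"] += 1  # frequency: count of referrals
    return {
        pid: [
            float((ref_date - p["start"]).days),  # recency proxy: days since insurance start
            float(p["visits"]),                   # frequency: number of visits
            TYPE_VALUE[p["type"]],                # monetary proxy: insurance-type value
        ]
        for pid, p in people.items()
    }


def standardize(vectors):
    """Z-score each column so days, counts and type values share one scale."""
    cols = list(zip(*vectors))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(v, means, sds)] for v in vectors]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Add the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    demo = [("a", date(2015, 3, 1), "rural"), ("b", date(2018, 6, 1), "other")]
    feats = rfm_features(demo, date(2018, 12, 31))
    ids = list(feats)
    labels, _ = kmeans(standardize([feats[i] for i in ids]), k=2)
    print(dict(zip(ids, labels)))
```

Standardizing first matters because recency in days would otherwise dominate the Euclidean distance. The study's four-cluster solution would correspond to calling `kmeans(..., k=4)` on the full feature set.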
Results: The RFM-based clustering had the lower root mean square deviation (21, versus 21.65 for the demographic clustering), and Dunn's index likewise confirmed that the RFM-based clustering was better. The RFM-based K-means algorithm divided the data into four clusters, with 44% of the insured in Cluster 1, 4% in Cluster 2, 22% in Cluster 3, and 30% in Cluster 4. Cluster 2 (women with other insurance classes; 4% of the population) had the most referrals, whereas Cluster 3 (women with rural insurance; 22% of the population) had the fewest.
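Dunn's index compares how well separated the clusters are with how spread out they are internally. The sketch below follows the textbook definition (smallest distance between points in different clusters divided by the largest within-cluster diameter, so higher is better); it is an illustration only and not necessarily how the study or Clementine computed the reported index. The function name `dunn_index` is hypothetical.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn's index: smallest between-cluster distance divided by
    largest within-cluster diameter (higher is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    groups = list(clusters.values())
    # Largest diameter: maximum pairwise distance inside any single cluster.
    max_diam = max((math.dist(a, b) for g in groups for a, b in combinations(g, 2)),
                   default=0.0)
    # Smallest separation: minimum distance between points of different clusters.
    min_sep = min(math.dist(a, b)
                  for g, h in combinations(groups, 2) for a in g for b in h)
    return math.inf if max_diam == 0 else min_sep / max_diam
```

For two tight, well-separated groups such as `[(0, 0), (0, 1)]` and `[(5, 0), (5, 1)]`, the index is 5.0; mixing the groups into the wrong clusters drops it to 0.2. Because this direct form compares every pair of points, it is quadratic in the number of points, so on a dataset of over a million referrals it would in practice be computed on a sample or with a centroid-based variant.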
Conclusion: The resulting model divided the insured into four clusters. It allows the organization to predict the referral patterns of each insured person based on age, gender, and insurance type, and to provide services tailored to each cluster. Using these models and techniques in the decision-making process is expected to improve the satisfaction of insured individuals.