Korean J Intern Med > Volume 27(2); 2012 > Article
Kim, Shin, Kim, and Kim: Comorbidity Study on Type 2 Diabetes Mellitus Using Data Mining



The aim of this study was to analyze comorbidity in patients with type 2 diabetes mellitus (T2DM) by using association rule mining (ARM).


We used data from patients who visited Keimyung University Dongsan Medical Center from 1996 to 2007. Of 411,414 total patients, T2DM was present in 20,314. The Dx Analyze Tool was developed for data cleansing and data mart construction, and to reveal associations of comorbidity.


Eighteen associations reached threshold (support, ≥ 3%; confidence, ≥ 5%). The highest association was found between T2DM and essential hypertension (support, 17.43%; confidence, 34.86%). Six association rules were found among three comorbid diseases. Among them, essential hypertension was an important node between T2DM and stroke (support, 4.06%; confidence, 8.12%) as well as between T2DM and dyslipidemia (support, 3.44%; confidence, 6.88%).


Essential hypertension plays an important role in the association between T2DM and its comorbid diseases. The Dx Analyze Tool is practical for comorbidity studies that use an enormous clinical database.


According to national health statistics in Korea, the prevalence of type 2 diabetes mellitus (T2DM) increased from 8.6% in 2001 to 9.5% in 2007, while the prevalence of T2DM in the United States was 10.7% in 2007. Furthermore, the prevalence of T2DM in 2007 was higher in men (11.6%) than in women (7.8%). The prevalence was highest in men aged 60-69 years (26.6%) and in women aged 70-79 years (19.5%) [1].
Patients with T2DM have an increased incidence of disease in several internal organs and tissues. Chronic microvascular and macrovascular diseases have greater influence on the long-term prognosis of patients with T2DM than acute complications [2]. Investigating the associations of these complications with comorbid diseases by using patient diagnostic data is helpful in predicting their incidence and thus more effectively treating patients with T2DM.
Association rule mining (ARM) describes how two items are related by exploring patterns in a way that differs from other analysis techniques [3]. An association rule generated by ARM expresses the relation between X and Y in the form "X → Y" or "If X, then Y," which is read as "If item X exists, item Y coexists" [4]. A rule does not necessarily imply cause and effect. Instead, it identifies simultaneous occurrence between items in antecedent X and consequent Y. ARM makes it possible to analyze associations not only between two diseases but also among three or more comorbidities that can be calculated from existing statistics. One study revealed the accompanying diseases of attention deficit/hyperactivity disorder by applying ARM to diagnostic data from the National Health Insurance Database of Taiwan [5]. Another study analyzed stroke and its comorbid diseases by ARM [6]. Therefore, the current study was conducted to determine the relations among complications, the various diseases that accompany T2DM, and three or more comorbidities, using ARM based on large amounts of clinical data.


Study population

Data from 411,414 patients examined at the Keimyung University Dongsan Medical Center from 1996 to 2007 were analyzed using the Dx Analyze Tool. Among these patients, 20,314 had T2DM, with a total of 145,306 diagnostic data entries. As the control group for the analysis, 20,314 patients without a diagnosis of T2DM were included, with a total of 57,379 diagnostic data entries.

Data collection

The workflow of the association analysis of T2DM comorbid diseases is shown in Fig. 1. First, data were collected from the database of patients examined at Keimyung University Dongsan Medical Center from 1996 to 2007. Personal information of the subjects such as name, gender, age, and contact details was not collected.

Analysis method

For the current study, we developed the Dx Analyze Tool using the Apriori algorithm (C# 2.0, MS Access DB) [4,7] to analyze the association between clinical diagnoses. The Dx Analyze Tool, which refines the data and extracts association rules between a specific disease and its related diseases, involves five steps: data retention, data cleansing, data mart construction, selection of Dx code, and analysis by the Apriori algorithm. The Apriori algorithm is an ARM technique. Its rules specify that when item-set A appears, item-set B appears with it. The rules are evaluated by support (the proportion of all cases in which disease A and disease B occur together) and confidence (the proportion of cases with disease A in which disease B also occurs). The formulas for support and confidence have been previously described [4,8,9] and are presented below.
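Under the standard definitions of Agrawal et al. [4], which are assumed here to be the ones intended, a rule A → B has support = n(A and B) / N and confidence = n(A and B) / n(A), where N is the total number of records and n(·) counts the records containing the given codes. The following is a minimal Python sketch of the level-wise Apriori procedure, not the Dx Analyze Tool itself (which was written in C#). The patient records, thresholds, and function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: one set of ICD-10 codes per patient (not study data).
patients = [
    {"E11", "I10", "I63"},
    {"E11", "I10"},
    {"E11", "H25"},
    {"E11", "I10", "E78"},
    {"I10"},
    {"K29"},
    {"E11"},
    {"H25", "K29"},
]


def support(itemset, records):
    """Fraction of all records that contain every code in itemset."""
    return sum(itemset <= r for r in records) / len(records)


def apriori(records, min_support):
    """Return {frozenset: support} for all itemsets meeting min_support."""
    items = sorted({code for r in records for code in r})
    level = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates of size k that are frequent enough.
        kept = {}
        for s in level:
            sup = support(s, records)
            if sup >= min_support:
                kept[s] = sup
        frequent.update(kept)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        level = [
            c for c in candidates
            if all(frozenset(sub) in kept for sub in combinations(c, k - 1))
        ]
    return frequent


def rules_from(antecedent, frequent, min_confidence):
    """Rules 'antecedent -> rest' built from frequent itemsets."""
    base = frequent.get(antecedent)
    if base is None:
        return []
    out = []
    for itemset, sup in frequent.items():
        if antecedent < itemset:  # proper subset: something is left over
            conf = sup / base
            if conf >= min_confidence:
                out.append((sorted(itemset - antecedent), sup, conf))
    return sorted(out, key=lambda r: -r[1])


frequent = apriori(patients, min_support=0.25)
rules = rules_from(frozenset({"E11"}), frequent, min_confidence=0.5)
for consequent, sup, conf in rules:
    print(f"E11 -> {consequent}: support={sup:.3f}, confidence={conf:.3f}")
```

With these toy records, the only rule that passes both thresholds is E11 → I10, with support 0.375 (3 of 8 records) and confidence 0.6 (3 of the 5 records containing E11).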
Using SPSS version 18.0 (SPSS Inc., Chicago, IL, USA), the chi-square test was used to review the association rules generated by the Dx Analyze Tool and to discern differences between groups with or without T2DM in the distribution of diseases appearing in the association rules. Together, the results from the Dx Analyze Tool and the chi-square test indicated that meaningful association rules exist between T2DM and other diseases.
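As a rough illustration of this group comparison, the sketch below computes a Pearson chi-square statistic for a single 2 × 2 table (T2DM vs. control, disease present vs. absent). It is a plain-Python stand-in for the SPSS procedure, not the original analysis. The counts and the function name are hypothetical, and because the source does not state whether a continuity correction was applied, none is applied here.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

                 disease present   disease absent
        T2DM            a                b
        control         c                d

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, chosen only for illustration (not taken from Table 3).
stat, p = chi_square_2x2(a=300, b=700, c=200, d=800)
print(f"chi2 = {stat:.2f}, p = {p:.2e}")
```

A p-value below 0.05 in such a table would correspond to the criterion used to decide whether a disease is distributed differently between subjects with and without T2DM.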


Diseases frequently accompanying T2DM

Diseases that frequently accompany T2DM are summarized in Table 1. The most frequent disease was essential hypertension (34.86% of subjects with T2DM), followed by gastritis and duodenitis (15.61%), senile cataract (15.43%), disorders of lipoprotein metabolism and other lipidemias (13.64%), and retinal disease (12.78%).

Association rule resulting from the Apriori algorithm

The association rules between T2DM and comorbid diseases generated by the Apriori algorithm are presented in Table 2. The thresholds were set at > 3% for support and > 5% for confidence, and 18 rules satisfying these conditions were generated. The rule with the highest support and confidence was T2DM→essential hypertension (support, 17.43%; confidence, 34.86%). Other rules with high support and confidence were T2DM→gastritis/duodenitis (support, 7.80%; confidence, 15.61%), T2DM→senile cataract (support, 7.71%; confidence, 15.43%), T2DM→disorders of lipoprotein metabolism and other lipidemias (support, 6.82%; confidence, 13.64%), and T2DM→retinal disease (support, 6.39%; confidence, 12.78%). The rules showing an association among three or more diseases were T2DM→essential hypertension and stroke (support, 4.06%; confidence, 8.12%), T2DM→essential hypertension and disorders of lipoprotein metabolism and other lipidemias (support, 3.44%; confidence, 6.88%), and T2DM→senile cataract and retinal disease (support, 3.39%; confidence, 6.78%).
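The reported confidence values are almost exactly twice the corresponding support values. One plausible reading, stated here as an assumption rather than something the source spells out, is that support is computed over all 40,628 subjects (the n given for Table 2) and confidence over the 20,314 subjects with T2DM. The short Python check below tests whether the reported pairs fit that reading within rounding. The variable names and the tolerance are illustrative choices.

```python
# Group sizes reported in the study: 20,314 patients with T2DM plus an
# equal-sized control group, 40,628 in total.
N_T2DM = 20_314
N_TOTAL = 40_628

# (support %, confidence %) as reported for rules with T2DM as antecedent.
reported = {
    "essential hypertension": (17.43, 34.86),
    "gastritis/duodenitis": (7.80, 15.61),
    "senile cataract": (7.71, 15.43),
    "disorders of lipoprotein metabolism": (6.82, 13.64),
    "retinal disease": (6.39, 12.78),
    "essential hypertension + stroke": (4.06, 8.12),
    "essential hypertension + lipoprotein disorders": (3.44, 6.88),
    "senile cataract + retinal disease": (3.39, 6.78),
}

# Assumption: support uses N_TOTAL as denominator and confidence uses N_T2DM,
# so confidence / support should equal N_TOTAL / N_T2DM.
ratio = N_TOTAL / N_T2DM

# Both percentages are rounded to two decimals, so allow a small tolerance.
TOLERANCE = 0.02

consistent = {
    name: abs(conf - ratio * sup) <= TOLERANCE
    for name, (sup, conf) in reported.items()
}

for name, ok in consistent.items():
    print(f"T2DM -> {name}: {'consistent' if ok else 'check'}")
```

Under this assumption the ratio is exactly 2, and every listed pair agrees with it to within rounding. This also shows why the support values depend on the size of the control group, whereas the confidence values do not.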

Statistical examination of ARM analysis results

The results of the statistical analysis to determine the distribution of diseases occurring with or without T2DM are summarized in Table 3. Subjects with T2DM were more likely than those without T2DM to have disorders of lipoprotein metabolism and other lipidemias, senile cataract, retinal disorders, essential hypertension, angina pectoris, heart failure, cerebral infarction, gastroesophageal reflux disease, gastric ulcer, gastritis and duodenitis, osteoporosis without pathological fracture, and chronic renal failure (p < 0.05).


This study was conducted to analyze the association between T2DM and comorbid diseases. Prior to this study, a pilot study was performed in which the comorbidity of cerebral infarction patients [6] and essential hypertension patients [10] was analyzed by ARM. On the basis of the pilot study, the present study constructed a data mart by refining diagnostic data extracted from patients of our medical center. Association rules involving three or more diseases in T2DM comorbidity were ascertained by developing a program that generates association rules by applying the Apriori algorithm of ARM.
T2DM is frequently accompanied by one or more components of metabolic syndrome such as obesity, dyslipidemia, and hypertension. A patient with hypertension is 2.4 times more likely to develop cerebrovascular disease [11]. A study that examined diabetic complications in 5,652 patients with diabetes from 13 university hospitals in Korea reported that hypertension and dyslipidemia were comorbid conditions in 60.4% and 44.1%, respectively, of these patients. Additionally, 38.4% and 44.7% of patients had retinopathy and neuropathy, respectively [2]. Another study [12] reported that 77.9% of 4,240 patients with T2DM from 13 university hospitals in Korea had metabolic syndrome, with the prevalence of each component of metabolic syndrome being 56.8% for central obesity, 42.0% for hypertriglyceridemia, 65.1% for low high-density lipoprotein cholesterol, and 74.9% for hypertension. Despite different research methods, the results of the present study agree with previous studies and link T2DM with essential hypertension, disorders of lipoprotein metabolism and other lipidemias, retinal disease, cerebral infarction, and angina pectoris. Specifically, T2DM and essential hypertension had the highest association, and this association produced the following association rules: T2DM→essential hypertension and cerebral infarction, T2DM→essential hypertension and disorders of lipoprotein metabolism and other lipidemias, and T2DM→essential hypertension and angina pectoris. A previous comorbidity study on cerebral infarction using the Apriori algorithm revealed the rule disorders of lipoprotein metabolism and essential hypertension→cerebral infarction, as well as the rule T2DM and essential hypertension→cerebral infarction [6].
Patients with T2DM often have irregular diet patterns, which deleteriously influence glucose control, lipid metabolism, and micronutrient intake [13]. In addition, T2DM is progressive and generally incurable, predisposing patients to several complications related to poor glucose regulation [14]. The use of medications to counteract the complications of diet and the disease itself can cause and exacerbate gastric disorders. This is reflected in the link found in the present study between T2DM and gastroesophageal reflux disease, gastric ulcer, and gastritis and duodenitis.
Fasting glucose and diabetes correlate with the occurrence of cataracts, and metabolic disorders of the body increase the risk of cataracts. Specifically, the risk of cataracts increases with low levels of high-density lipoprotein cholesterol, hypertension, and high fasting glucose [15]. The present data also support an association of T2DM with senile cataract and essential hypertension. However, an association with dyslipidemia was not found and this requires further study.
Although the present study showed that T2DM is associated with heart failure and chronic renal failure, other studies on T2DM did not show such results [2,11,14]. Park et al. [16] investigated the cause of death in 680 patients with T2DM and reported that cerebrovascular disease (15.0%), ischemic heart disease (15.6%), infectious disease (25.3%), cancer (21.9%), congestive heart failure (7.1%), kidney disease (4.7%), and other diseases are major causes of death, which offers support for an association rule for T2DM, congestive heart failure, and chronic kidney disease.
In the present study, 7.21% (1,464 patients) of the patients with T2DM displayed accompanying osteoporosis without pathological fracture, and the association rule of T2DM→osteoporosis without pathological fracture was generated. Patients with T2DM were found to have more concurrent osteologic diseases than nondiabetic patients, suggesting that patients with T2DM may have decreased bone density [17].
This study determined comorbidities using the association rules generated for the diagnosis data of patients with T2DM by applying ARM from previous studies. While the possibility exists that doctors added diagnoses excessively to increase prescriptions or that comorbidities were found but not recorded, the majority of cases were diagnosed accurately, and the few inaccuracies were filtered by using large amounts of clinical data.
This study is significant because it used a large amount of data generated from electronic medical records in clinical use, constructed a data mart, and analyzed the comorbidity of T2DM using a program that automates the Apriori algorithm. However, a limitation of the present study is that the data came from a single medical institution. Data from other medical facilities should be collected and analyzed to demonstrate the relevance of the program and its results. Furthermore, the Apriori algorithm is limited in determining precedence or causality of disease. Therefore, future studies should be conducted to identify the temporal course of diseases considering chronology (e.g., the sequential pattern of disease occurrence).


This work was supported by a grant from the Regional Technology Innovation Program of the Ministry of Knowledge Economy (MKE) (RTI04-01-01).

Conflict of interest

No potential conflict of interest relevant to this article was reported.


1. Ministry of Health & Welfare. Korean Centers for Disease Control & Prevention. 2007 National Health Statistics: National Health and Nutrition Examination Survey 4th. Seoul: Korean Centers for Disease Control & Prevention, 2008.
2. Lim S, Kim DJ, Jeong IK, et al. A nationwide survey about the current status of glycemic control and complications in diabetic patients in 2006: The Committee of the Korean Diabetes Association on the Epidemiology of Diabetes Mellitus. Korean Diabetes J 2009;33:48–57.
3. Bae HS, Cho DH, Suk KH, et al. Data Mining Using SAS Enterprise Miner. 2nd ed. Seoul: Kyowoosa, 2008.
4. Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data; 1993 May 26-28; Washington, DC. New York: ACM, 1993:207–216.
5. Tai YM, Chiu HW. Comorbidity study of ADHD: applying association rule mining (ARM) to National Health Insurance Database of Taiwan. Int J Med Inform 2009;78:e75–e83. PMID: 19853501.
6. Lee IH, Shin AM, Son CS, et al. Association analysis of comorbidity of cerebral infarction using data mining. J Korean Soc Phys Ther 2010;22:75–81.
7. Association rule learning [Internet]. Wikipedia. San Francisco (CA): Wikimedia Foundation Inc., 2012 [cited 2012 Mar 30]. Available from: http://en.wikipedia.org/wiki/Association_rule_learning.
8. Kang HC, Han ST, Choi JH, Kim ES, Kim MK. Data Mining with SAS Enterprise Miner 4.0: Methodology and Application. 3rd ed. Seoul: Jayuacademi, 2002.
9. Heo MH, Lee YG. Data Mining Modeling and Case. 2nd ed. Seoul: Hannarae, 2008.
10. Shin AM, Lee IH, Lee GH, et al. Diagnostic analysis of patients with essential hypertension using association rule mining. Healthc Inform Res 2010;16:77–81. PMID: 21818427.
11. Chung HS, Seo JA, Kim SG, et al. Relationship between metabolic syndrome and risk of chronic complications in Koreans with type 2 diabetes. Korean Diabetes J 2009;33:392–400.
12. Kim TH, Kim DJ, Lim S, et al. Prevalence of the metabolic syndrome in type 2 diabetic patients. Korean Diabetes J 2009;33:40–47.
13. Ahn HJ, Han KA, Koo BK, et al. Analysis of meal habits from the viewpoint of regularity in Korean type 2 diabetic patients. Korean Diabetes J 2008;32:68–76.
14. Kim SG, Choi DS. The present state of diabetes mellitus in Korea. J Korean Med Assoc 2008;51:791–798.
15. Park SS, Lee EH. Relations of cataract to metabolic syndrome and its components: based on the KNHANES 2005, 2007. J Korean Ophthalmic Opt Soc 2009;14:103–108.
16. Park SK, Park MK, Suk JH, et al. Cause-of-death trends for diabetes mellitus over 10 years. Korean Diabetes J 2009;33:65–72.
17. Lipscombe LL, Jamal SA, Booth GL, Hawker GA. The risk of hip fractures in older individuals with diabetes: a population-based study. Diabetes Care 2007;30:835–841. PMID: 17392544.
Figure 1
Schematic diagram of the study workflow.
Table 1
High frequency comorbid diseases with type 2 diabetes mellitus (n = 20,314)
Table 2
Association rules between type 2 diabetes mellitus and comorbid diseases (n = 40,628)

E11, type 2 diabetes mellitus; I10, essential (primary) hypertension; K29, gastritis and duodenitis; H25, senile cataract; E78, disorders of lipoprotein metabolism and other lipidemias; H36, retinal disorders in diseases classified elsewhere; I63, cerebral infarction; I20, angina pectoris; N18, chronic renal failure; K25, gastric ulcer; M81, osteoporosis without pathological fracture; I50, heart failure; K21, gastroesophageal reflux disease.

Table 3
Statistical analysis of the association rule mining results (n = 40,628)

Values are presented as number (%).

E11, type 2 diabetes mellitus; E78, disorders of lipoprotein metabolism and other lipidemias; H25, senile cataract; H36, retinal disorders in diseases classified elsewhere; I10, essential (primary) hypertension; I20, angina pectoris; I50, heart failure; I63, cerebral infarction; K21, gastroesophageal reflux disease; K25, gastric ulcer; K29, gastritis and duodenitis; M81, osteoporosis without pathological fracture; N18, chronic renal failure.
