TY - JOUR
T1 - Privacy-preserving multiple linear regression of vertically partitioned real medical datasets
AU - Kikuchi, Hiroaki
AU - Hamanaga, Chika
AU - Yasunaga, Hideo
AU - Matsui, Hiroki
AU - Hashimoto, Hideki
AU - Fan, Chun I.
N1 - Publisher Copyright:
© 2018 Information Processing Society of Japan.
PY - 2018
Y1 - 2018
N2 - This paper studies the feasibility of privacy-preserving data mining (PPDM) in epidemiological studies. As the data-mining algorithm, we focus on multiple linear regression, which can identify the most significant factors among many candidate variables, such as a patient's history of various diseases. We aim to identify a linear model that quantifies the most significant causes of death from distributed datasets containing patient and disease information. We conduct an experiment on a real medical dataset related to stroke, applying multiple regression with six predictors, including age, sex, and medical scales such as the Japan Coma Scale and the modified Rankin Scale. The contributions of this paper are (1) a practical privacy-preserving protocol for multiple linear regression over vertically partitioned datasets; (2) a demonstration of the feasibility of the proposed system using a real medical dataset distributed between two parties: the hospital, which holds detailed disease information while patients are hospitalized, and the local government, which keeps track of residents even after they have left the hospital; (3) an evaluation of the accuracy and performance of the PPDM system, which allows us to estimate the expected processing time for an arbitrary number of predictors; and (4) a study of the complexity of extended models of vertical partitioning.
KW - Epidemiology
KW - Privacy
KW - Privacy-preserving data mining
UR - http://www.scopus.com/inward/record.url?scp=85063649143&partnerID=8YFLogxK
U2 - 10.2197/ipsjjip.26.638
DO - 10.2197/ipsjjip.26.638
M3 - Article
AN - SCOPUS:85063649143
SN - 0387-5806
VL - 26
SP - 638
EP - 647
JO - Journal of Information Processing
JF - Journal of Information Processing
ER -