TY - GEN
T1 - Probabilistic Model Incorporating Auxiliary Covariates to Control FDR
AU - Qiu, Lin
AU - Murrugarra-Llerena, Nils
AU - Silva, Vítor
AU - Lin, Lin
AU - Chinchilli, Vernon M.
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/10/17
Y1 - 2022/10/17
N2 - Controlling the false discovery rate (FDR) while leveraging side information in multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on test-level covariates while ignoring metrics about those covariates. This strategy may not be optimal for complex large-scale problems, where indirect relations often exist among test-level covariates and auxiliary metrics or covariates. We incorporate auxiliary covariates among test-level covariates in a deep black-box framework (named NeurT-FDR) that boosts statistical power and controls FDR for multiple hypothesis testing. Our method parametrizes the test-level covariates as a neural network and adjusts the auxiliary covariates through a regression framework, which enables flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR makes substantially more discoveries on three real datasets than competitive baselines.
UR - https://www.scopus.com/pages/publications/85140852222
UR - https://www.scopus.com/pages/publications/85140852222#tab=citedBy
U2 - 10.1145/3511808.3557672
DO - 10.1145/3511808.3557672
M3 - Conference contribution
AN - SCOPUS:85140852222
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 4419
EP - 4423
BT - CIKM 2022 - Proceedings of the 31st ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 31st ACM International Conference on Information and Knowledge Management, CIKM 2022
Y2 - 17 October 2022 through 21 October 2022
ER -