Archer Gong Zhang, Jiahua Chen
Abstract: In many applications, we collect samples from multiple interconnected populations. These population distributions share some latent structure, so it is advantageous to analyze the samples jointly to make efficient inferences on the multiple distributions and their functionals. One effective way to connect the distributions is the density ratio model (DRM). A key ingredient of the DRM is that the log density ratios are linear combinations of prespecified functions; the vector formed by these functions is called the basis function. The benefit of the DRM depends to a large degree on correctly specifying the basis function. In applications, users may lack the knowledge needed to choose a suitable basis function, and much discussion has been devoted to this topic. In this article, we consider the still-open problem of choosing the basis function data-adaptively so as to alleviate the risk of severe model misspecification. We propose a data-adaptive approach to the choice of basis function based on functional principal component analysis. Under certain conditions, we show that this approach leads to consistent estimation of the basis function. Our simulation results show that the proposed adaptive choice can achieve an efficiency gain. We use a real-data example from economics to demonstrate this efficiency gain and the ease of use of our approach.
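For concreteness, the DRM is commonly written as below. The notation is ours and may differ from the body of the article: $F_0, \dots, F_m$ are the population distributions, $F_0$ is taken as the reference, $\mathbf{q}(x)$ is the $d$-dimensional basis function, and $\alpha_k$ is a normalizing constant that makes $F_k$ integrate to one. The second display is a worked example of why the choice of $\mathbf{q}(x)$ matters. When $F_0 = N(\mu_0, \sigma_0^2)$ and $F_k = N(\mu_k, \sigma_k^2)$, the log density ratio is exactly quadratic in $x$.

```latex
% Density ratio model with reference distribution F_0
dF_k(x) = \exp\{\alpha_k + \boldsymbol{\beta}_k^{\top}\mathbf{q}(x)\}\, dF_0(x),
\qquad k = 1, \dots, m.

% Normal example: log ratio of N(mu_k, sigma_k^2) to N(mu_0, sigma_0^2)
\log\frac{f_k(x)}{f_0(x)}
  = \log\frac{\sigma_0}{\sigma_k} - \frac{\mu_k^2}{2\sigma_k^2} + \frac{\mu_0^2}{2\sigma_0^2}
  + \Big(\frac{\mu_k}{\sigma_k^2} - \frac{\mu_0}{\sigma_0^2}\Big) x
  - \Big(\frac{1}{2\sigma_k^2} - \frac{1}{2\sigma_0^2}\Big) x^2 .
```

In this illustrative case, $\mathbf{q}(x) = (x, x^2)^{\top}$ is a correctly specified basis function. Taking $\mathbf{q}(x) = x$ alone would misspecify the model whenever $\sigma_k \neq \sigma_0$, which is the kind of risk a data-adaptive choice aims to reduce.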