Crowd counting in congested scenes is an essential yet challenging task for detecting abnormal crowds in contemporary urban planning. Counting accuracy has improved significantly with the rapid development of deep learning over the past decade. However, current models remain fragile in real-world applications, mainly due to two inherent weaknesses: (1) scale variations degrade counting accuracy; and (2) the overwhelming number of parameters in deep neural networks leads to low efficiency. To address these two limitations, we propose a Feature Pyramid Attention Network (FPANet). Specifically, FPANet consists of three modules: a feature pyramid module, an attention module, and a multiscale aggregation module. The feature pyramid module adopts a lightweight architecture to extract multiscale features. The attention module focuses on crowd regions and suppresses misleading information. The multiscale aggregation module adaptively fuses the discriminative knowledge extracted at different granularities. Additionally, a multi-group structure further improves the efficiency of FPANet. Experimental results on five crowd counting benchmark datasets, i.e., ShanghaiTech, UCF_CC_50, UCF-QNRF, WorldExpo’10, and NWPU-Crowd, and two cross-domain datasets, i.e., CARPK and PUCPR+, demonstrate that FPANet achieves superior performance in terms of accuracy, efficiency, and generalization.
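The two efficiency and fusion ideas above can be illustrated with a small, dependency-free sketch. It rests on two assumptions that are not taken from the paper. First, the multi-group structure is treated like a grouped convolution, where splitting channels into `groups` cuts the weight count roughly by that factor. Second, adaptive multiscale aggregation is modeled as a softmax-weighted sum of per-scale density maps. The helper names `conv_params`, `softmax`, and `fuse_scales` are hypothetical, and FPANet's actual layers may differ.

```python
import math
from typing import List, Sequence


def conv_params(c_in: int, c_out: int, k: int, groups: int = 1, bias: bool = True) -> int:
    """Learnable parameters of a k x k convolution whose channels are split into `groups` groups.

    Assumption: the multi-group structure behaves like a standard grouped convolution.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    # Each output channel only sees c_in // groups input channels.
    weights = k * k * (c_in // groups) * c_out
    return weights + (c_out if bias else 0)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scale scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(density_maps: Sequence[Sequence[float]], scores: Sequence[float]) -> List[float]:
    """Fuse flattened per-scale density maps with softmax weights (hypothetical fusion rule)."""
    weights = softmax(scores)
    length = len(density_maps[0])
    return [sum(w * m[i] for w, m in zip(weights, density_maps)) for i in range(length)]


if __name__ == "__main__":
    dense = conv_params(256, 256, 3, groups=1)
    grouped = conv_params(256, 256, 3, groups=4)
    print(dense, grouped)  # 590080 147712

    # Two toy 2-pixel density maps from different scales, equally scored.
    fused = fuse_scales([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
    # Summing a density map gives the estimated crowd count.
    print(fused, sum(fused))  # [2.0, 3.0] 5.0
```

With four groups, the 3x3 layer in this example keeps about a quarter of its weights. In a learned model, the fusion scores would come from the network rather than being fixed by hand.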