Crowd counting plays a crucial role in the development of smart cities. However, scale variation and background interference degrade the performance of crowd counting in real-world scenarios. To address these problems, this paper proposes a novel attentive hierarchy ConvNet (AHNet). AHNet extracts hierarchical features with a purpose-designed discriminative feature extractor and mines semantic features in a coarse-to-fine manner through a hierarchical fusion strategy. In addition, a re-calibrated attention (RA) module is built in at several levels to suppress the influence of background interference, and a feature enhancement (FE) module is built to recognize head regions at various scales. Experimental results on five people-crowd datasets and two cross-domain vehicle-crowd datasets show that the proposed AHNet achieves competitive accuracy and generalization.
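
To make the coarse-to-fine idea concrete, the following is a minimal NumPy sketch of attention-gated hierarchical fusion. It is an illustration under stated assumptions, not the AHNet implementation: the function names (`upsample2x`, `attention_gate`, `fuse_coarse_to_fine`), the sigmoid gating, the nearest-neighbour upsampling, and the additive fusion are all hypothetical stand-ins, since the abstract does not specify the internals of the RA or FE modules.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def attention_gate(x):
    """Hypothetical stand-in for an attention module (not the paper's RA design).

    Builds a per-pixel sigmoid mask from the channel mean and rescales the
    features with it, so low-response (background-like) pixels are damped.
    """
    mask = 1.0 / (1.0 + np.exp(-x.mean(axis=0, keepdims=True)))
    return x * mask


def fuse_coarse_to_fine(features):
    """Fuse a feature pyramid from the coarsest level down to the finest.

    `features` is a list of (C, H, W) arrays ordered fine to coarse; each
    level is assumed to have half the spatial resolution of the previous one.
    """
    fused = attention_gate(features[-1])
    for finer in reversed(features[:-1]):
        # Assumed fusion rule: gated finer features plus upsampled coarser result.
        fused = attention_gate(finer) + upsample2x(fused)
    return fused


# Toy three-level pyramid with 4 channels.
pyramid = [np.ones((4, 8, 8)), np.ones((4, 4, 4)), np.ones((4, 2, 2))]
fused = fuse_coarse_to_fine(pyramid)
print(fused.shape)
```

The output keeps the resolution of the finest level while accumulating context from the coarser ones, which is the general pattern a hierarchical fusion strategy follows.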