Ronny Berndtsson

Professor, Dep Director, MECW Dep Scientific Coordinator

Inverse method using boosted regression tree and k-nearest neighbor to quantify effects of point and non-point source nitrate pollution in groundwater


  • Alireza Motevalli
  • Seyed Amir Naghibi
  • Hossein Hashemi
  • Ronny Berndtsson
  • Biswajeet Pradhan
  • Vahid Gholami

Summary, in English

Nitrate pollution of groundwater has increased dramatically worldwide due to growth in population and agricultural productivity. The resulting nitrate concentration in groundwater is usually a combination of various point and non-point pollutant sources. It is often difficult to distinguish between these sources, since groundwater is formed in large and complex catchments where various natural processes and anthropogenic influences contribute to a given downstream nitrate concentration. For such conditions, this paper uses a methodology that can inversely determine the type and location of the main nitrate pollutant source. The methodology builds on two state-of-the-art data mining techniques, boosted regression tree (BRT) and k-nearest neighbor (KNN). These techniques are used to produce a nitrate pollution vulnerability map. The methodology can mitigate the effects of subjective judgement when determining the importance of different sources and mechanisms for nitrate transport. The investigated conditioning factors are hydrogeological, hydrological, anthropogenic, topographic, and soil-related. Thus, the proposed methodology is used to separate natural processes from anthropogenic effects on nitrate pollution. To calculate the groundwater vulnerability maps, a groundwater nitrate concentration of 40 mg/L (suggested by WHO with a 20% risk margin) was selected as a general threshold for identifying polluted areas, which resulted in 96 polluted wells. Non-polluted locations were selected from well data with nitrate concentrations below 15 mg/L (96 non-polluted wells). The models were trained on 70% of the polluted and 70% of the non-polluted site data. The remaining data, 30% of the polluted and 30% of the non-polluted sites, were used to validate the simulation results. Results showed that the BRT produced outputs with higher performance than the KNN algorithm. The final ranking based on the BRT model showed that hydraulic conductivity, river density, soil, slope percent, net recharge, and distance from villages, in that order, were more important than the other factors.
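
The labelling, 70/30 split, and KNN step described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the two-feature well records, the choice of k = 3, and the use of "at or above 40 mg/L" as the polluted cut-off are all hypothetical, and the demo data are synthetic. The 40 mg/L value presumably corresponds to the WHO guideline of 50 mg/L reduced by the 20% margin. The boosted regression tree model is not reproduced here.

```python
import math
import random
from collections import Counter

# Thresholds taken from the summary. Treating exactly 40 mg/L as polluted
# is an assumption; the summary does not say whether the bound is inclusive.
POLLUTED_MIN_MG_L = 40.0
CLEAN_MAX_MG_L = 15.0


def label_well(nitrate_mg_l):
    """Return 1 (polluted), 0 (non-polluted), or None (well not used)."""
    if nitrate_mg_l >= POLLUTED_MIN_MG_L:
        return 1
    if nitrate_mg_l < CLEAN_MAX_MG_L:
        return 0
    return None  # intermediate concentrations are excluded


def stratified_split(samples, train_fraction=0.7, seed=0):
    """Split (features, label) pairs so each class is divided 70/30."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in (0, 1):
        group = [s for s in samples if s[1] == cls]
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid


def knn_predict(train, query, k=3):
    """Majority vote among the k training sites closest to the query."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic wells for illustration only: two made-up conditioning
    # factors per well plus a random nitrate concentration in mg/L.
    rng = random.Random(1)
    wells = [((rng.random(), rng.random()), rng.uniform(0, 80))
             for _ in range(200)]
    labelled = [(x, label_well(c)) for x, c in wells]
    labelled = [s for s in labelled if s[1] is not None]
    train, valid = stratified_split(labelled)
    hits = sum(knn_predict(train, x) == y for x, y in valid)
    print(f"validation accuracy: {hits / len(valid):.2f}")
```

With 96 wells in each class, a 70/30 split of this kind gives 67 training and 29 validation wells per class. In practice the conditioning factors would be rescaled before computing distances, since KNN is sensitive to the units of each factor.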


  • Middle Eastern Studies
  • Division of Water Resources Engineering
  • LTH Profile Area: Water
  • Centre for Advanced Middle Eastern Studies (CMES)
  • MECW: The Middle East in the Contemporary World

Publishing year







Journal of Cleaner Production



Document type

Journal article




  • Oceanography, Hydrology, Water Resources
  • Environmental Sciences


  • Boosted regression tree
  • Data mining
  • GIS
  • Inverse modeling
  • K-nearest neighbors
  • Nitrate pollution




  • ISSN: 0959-6526