Model My Watershed Technical Documentation
This reference document is intended to provide technical documentation and references for the data layers, data analysis algorithms, models, computational framework, and other components that together create the hydrologic and water quality output delivered by the Model My Watershed web application available at https://wikiwatershed.org or https://app.wikiwatershed.org/.
2.Layers (Viewable Mapped Data)
Model My Watershed provides a number of geospatial data layers for visualization, analysis and modeling. Detailed information and data sources for each layer are provided below, organized by type, in the order in which they appear in the Layer selector in the lower left of the map.
Layers that are not available for visualization, but are used for analysis and modeling functions, are described under Section 2.6 Additional Data Layers.
Continental US Medium Resolution Stream Network
From NHDplusV2 Medium Resolution (1:100,000-scale) NHDFlowlines (similar to https://catalog.data.gov/dataset/medium-resolution-national-hydrography-dataset-flowline-feature-line or https://databasin.org/datasets/89e82ce1f6cb42dba509ff46ba51f67f)
Unfortunately, the NHD high resolution Flowline vector dataset (nominally at 1:24,000-scale) is not yet available within NHDplusV2.
Blue lines are rendered with styling that depends on user zoom extent and on the stream order.
- Larger streams are attributed with thicker blue lines.
- Small streams appear/disappear as the user zooms in and out of the map area.
Delaware River Basin High Resolution Stream Network
The Delaware River Basin High resolution stream network was derived from the 1/3 arc second (10 m) resolution digital elevation model (DEM) from the USGS national elevation dataset obtained from the National Map (https://viewer.nationalmap.gov/launch/) using FTP download options for the domain covering the Delaware River Basin.
This work was done by Model My Watershed partners at Utah State University, David Tarboton and Nazmus Sazib, using Terrain Analysis using Digital Elevation Models (TauDEM) software. Tarboton is the lead developer of the TauDEM software (see http://hydrology.usu.edu/taudem).
The processing steps used were:
- Define the Delaware River Basin Terrain Analysis domain. The parts of the DEM that occupied ocean or estuary area identified from National Hydrography Dataset and other data sources were masked out in this DEM, setting a no data value for grid cells more than 100 m from the shore and -50 m for grid cells within 100 m of the coast. This ensured that grid cells adjacent to the shore drained into the ocean/estuary, while at the same time avoiding unnecessary terrain analysis for ocean/estuary areas. The DEM was then clipped to the Delaware River Basin watershed boundary from NHDPlus, with a 5 km buffer around the edges to avoid edge effects where the watershed boundary and DEM are inconsistent.
- Pitremove. The TauDEM pitremove function was used to hydrologically condition the DEM. This raised the level of any grid cells completely surrounded by higher terrain to the level of the lowest pour point around their edge so that there is a path of non increasing elevation from each grid cell to the domain edge along which water could drain.
- D8 flow directions. The TauDEM D8 flow direction function was used to compute the single flow direction associated with each grid cell to one of its eight adjacent neighbors.
- D8 Contributing area. The TauDEM D8 Contributing area function was used to calculate the number of grid cells draining through each grid cell counting itself.
- Determine outlets to the ocean/estuary. Outlet points where contributing area is greater than 5000 grid cells (Approx 0.5 km2) and the flow leaves the domain were determined as the downstream ends of a temporary stream network mapped using TauDEM with 5000 grid cell contributing area threshold. These outlet points were used in calculations below to constrain the work to areas upstream of these outlets. It was deemed not meaningful to delineate a stream network for areas less than 0.5 km2 draining directly to the ocean.
- Peuker Douglas valley filter. The TauDEM Peuker Douglas filter was used to identify valley grid cells. This filter selects all grid cells, examines each set of 2 x 2 grid cells, and unselects the highest elevation cell. Cells remaining selected at the end are “potential valley cells.”
- Weighted D8 Contributing area. The TauDEM D8 contributing area function was used with the Peuker Douglas valley filter result as a weighted input. This calculates the number of potential valley grid cells draining through each grid cell.
- Define stream grid. The TauDEM threshold function was used to define as candidate stream grids the grid cells in the Weighted D8 contributing area result exceeding input contributing area thresholds. Contributing area thresholds of 20, 50, and 100 grid cells were evaluated. After visual inspection and comparison with contour crenulations and high resolution NHD streams, a threshold of 50 grid cells was chosen.
- Calculate stream network. The TauDEM Stream Network function was used to delineate a stream network of lines (GIS vector shapes) from the 50 cell threshold stream grid. The result is a geographic feature set (set of lines) in GIS shapefile format.
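As a rough illustration of the Peuker Douglas valley filter step in the list above, the following Python sketch applies the 2 x 2 rule to a small elevation grid. It is not the TauDEM implementation: the function name `peuker_douglas_valleys`, the tie handling (the first highest cell found in a window is unselected), and the lack of any DEM smoothing are assumptions made only for illustration.

```python
def peuker_douglas_valleys(dem):
    """Flag potential valley cells in a DEM given as a list of rows.

    All cells start selected; in every 2 x 2 window the highest cell
    is unselected. Cells still selected at the end are potential valleys.
    """
    rows, cols = len(dem), len(dem[0])
    selected = [[True] * cols for _ in range(rows)]
    for r in range(rows - 1):
        for c in range(cols - 1):
            window = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
            # Assumption: on ties, the first highest cell in the window is used.
            hr, hc = max(window, key=lambda rc: dem[rc[0]][rc[1]])
            selected[hr][hc] = False
    return selected


# Toy DEM (hypothetical values): a valley runs down the middle column.
dem = [[5, 1, 6],
       [6, 2, 7],
       [7, 3, 8]]
valleys = peuker_douglas_valleys(dem)
```

In this toy grid the middle column stays selected along its whole length. The two corner cells in the top row also stay selected, because they are never the highest cell in any window they belong to.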
Note that this procedure, and in particular the use of the Peuker Douglas valley filter and weighted contributing area functions, results in a stream network that adapts to the complexity of the topography. Where the topography is complex, as would be reflected by a high degree of crenulation in contours, the drainage density of the resulting stream network is high and reflects this. Where the topography is less complex (smooth contours) the drainage density is low. The basis for this is that the mapping of valley grid cells produces a skeletonized (disconnected) stream map that reflects the variability of drainage density across the topography. These valley grid cells were then formed into a connected stream network by using them as input to a weighted contributing area calculation that counted only these grid cells.
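The idea of connecting valley cells through a weighted contributing area can be sketched in a few lines of Python. This is a simplified illustration, not TauDEM code: it assumes a small pit-free DEM with no flat areas, uses hypothetical function names (`d8_receivers`, `contributing_area`), takes a hand-written valley mask as the weight grid, and applies an illustrative threshold of 2 cells rather than the 50 cells used for the Delaware River Basin.

```python
import math

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_receivers(dem):
    """Map each cell to its steepest downslope neighbor (None if it has none)."""
    rows, cols = len(dem), len(dem[0])
    receivers = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBORS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    # Drop divided by distance (diagonals are longer).
                    slope = (dem[r][c] - dem[rr][cc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best, best_slope = (rr, cc), slope
            receivers[(r, c)] = best
    return receivers


def contributing_area(dem, weights=None):
    """Sum of weights draining through each cell, counting the cell itself.

    With weights=None every cell counts as 1 (plain D8 contributing area).
    """
    rows, cols = len(dem), len(dem[0])
    area = {(r, c): (1.0 if weights is None else float(weights[r][c]))
            for r in range(rows) for c in range(cols)}
    receivers = d8_receivers(dem)
    # Visit cells from highest to lowest so every donor is complete
    # before its total is passed to the receiving cell.
    for cell in sorted(area, key=lambda rc: dem[rc[0]][rc[1]], reverse=True):
        down = receivers[cell]
        if down is not None:
            area[down] += area[cell]
    return area


dem = [[5, 1, 6],
       [6, 2, 7],
       [7, 3, 8]]
valley_mask = [[1, 1, 1],   # hypothetical valley-filter result (1 = valley cell)
               [0, 1, 0],
               [0, 1, 0]]
plain = contributing_area(dem)
weighted = contributing_area(dem, valley_mask)
stream_cells = {cell for cell, a in weighted.items() if a >= 2}
```

Here the plain contributing area at the outlet cell (0, 1) is all 9 cells, while the weighted version counts only the 5 valley cells draining to it. Thresholding the weighted result keeps cells (0, 1) and (1, 1) as stream cells.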
For additional detail on the rationale for this approach refer to the following references (full citations in References list below): Tarboton & Ames (2001); Tarboton et al. (1992); Tarboton et al. (1991).
For additional detail on the TauDEM software and use of each function refer to http://hydrology.usu.edu/taudem/taudem5/documentation.html. The TauDEM software is open source and may be obtained from the following websites:
- Precompiled files and installer http://hydrology.usu.edu/taudem/taudem5/index.html
- Source code https://github.com/dtarb/TauDEM
Delaware River Basin T(X) Concentration(s) from SRAT
Estimated in-stream concentrations of Total Nitrogen (TN), Total Phosphorus (TP) or Total Suspended Solids (TSS), derived within the Delaware River Basin from the Stream Reach Assessment Tool (SRAT) modeling effort. SRAT-estimated in-stream concentrations are shown in MMW by color-coding the NHDplusV2 stream network in colors ranging from green to yellow to orange to red, with greens indicating the lowest concentrations and reds indicating the highest.
The Stream Reach Assessment Tool (SRAT) modeling effort was funded by the William Penn Foundation (WPF) Delaware River Watershed Initiative (DRWI). SRAT is derived from calibrated MapShed model runs of all HUC-12 areas within the Delaware River Basin, downscaling MapShed results to NHDplusV2 catchment scales and routing loads through the NHDplusV2 medium resolution stream network. For more details regarding SRAT, see https://www.streamreachtools.org/overview/. The Stream Reach Assessment Tool is brought to you through the collaborative work of many DRWI partners.
Many additional SRAT-derived model output data layers can be visualized and analyzed in Model My Watershed, including the visualization of pollutant loading rates and stream concentrations at the NHD catchment and stream segment level. See below for more details.
Land: USGS National Land Cover Database
- NLCD-2011: http://www.mrlc.gov/nlcd2011.php
- Legend to colors and land use types: https://www.mrlc.gov/nlcd11_leg.php
Soil: Hydrologic Soil Groups from gSSURGO
Gridded Soil Survey Geographic (gSSURGO) 2016. Database for the Conterminous United States. United States Department of Agriculture (USDA), Natural Resources Conservation Service (NRCS). Obtained from the USDA Geospatial Data Gateway at https://gdg.sc.egov.usda.gov/
For more information and official gSSURGO User Guide, see https://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey/geo/?cid=nrcs142p2_053628
Hydrologic Soil Groups is one gSSURGO soil category, based on water infiltration rates during wet, saturated conditions. Low infiltration rate soils translate to high runoff potential. For more information, see these USDA NRCS publications:
- Hydrologic Soils Group (HSG) Questions & Answers: https://www.nrcs.usda.gov/wps/PA_NRCSConsumption/download?cid=stelprdb1262857&ext=pdf
- National Engineering Handbook, Part 630 Hydrology, Chapter 7 Hydrologic Soil Groups at https://directives.sc.egov.usda.gov/OpenNonWebContent.aspx?content=17757.wba
Climate: Mean Monthly Precipitation and Temperature
Gridded mean monthly values for precipitation and temperature were obtained from the PRISM Climate Group (http://prism.nacse.org/) and are the “AN81m” datasets. Briefly, these layers were created from a modelling effort (Climatologically-Aided Interpolation process) that utilized nationally available records for the time period 1981-2010. Documentation can be found at http://prism.nacse.org/documen
Pennsylvania Urbanized Areas
US EPA Urbanized Area boundaries, developed by the USEPA to support a number of analytical needs. In Pennsylvania, these boundaries are used to identify areas within which various municipal entities have a responsibility for reducing pollutant loads (primarily sediment, nitrogen and phosphorus).
EPA web reference here: https://www.epa.gov/npdes/urbanized-area-maps-npdes-ms4-phase-ii-stormwater-permits
DRB Catchment Water Quality Data, T(X) Loading Rates from SRAT Catchments
Estimated average-catchment loading rates for Total Nitrogen (TN), Total Phosphorus (TP), or Total Suspended Solids (TSS), derived within the Delaware River Basin from the Stream Reach Assessment Tool (SRAT) modeling effort. SRAT-estimated loading rates are shown in MMW by shading the NHDplusV2 catchments areas, where darker shades indicate higher mean annual loading rates in mass per unit area (e.g., lbs/acre or kg/ha).
The Stream Reach Assessment Tool (SRAT) modeling effort was funded by the William Penn Foundation (WPF) Delaware River Watershed Initiative (DRWI). SRAT is derived from calibrated MapShed model runs of all HUC-12 areas within the Delaware River Basin, downscaling MapShed results to NHDplusV2 catchment scales and routing loads through the NHDplusV2 medium resolution stream network. For more details regarding SRAT, see https://www.streamreachtools.org/overview/.
Many additional SRAT-derived model output data layers can be visualized and analyzed in Model My Watershed, including the visualization of pollutant loading rates and stream concentrations at the NHD catchment and stream segment level. See below for more details.
USGS Subbasin unit (HUC-8)
US Geological Survey Hydrologic Units of the eight-digit level (Hydrologic Unit Code 8), averaging 700 square miles (1,813 square kilometers). Although USGS names the HUC-8 level as “subbasin” scale, these hydrological units are not equivalent to true hydrographic basins or watersheds, because the main river/stream within a given HUC-8 area can often have contributions from additional, upstream HUC-8 areas.
USGS Watershed Unit (HUC-10)
US Geological Survey Hydrologic Units of the ten-digit level (Hydrologic Unit Code 10), averaging 227 square miles (588 square kilometers). Although USGS names the HUC-10 level as “watershed” scale, these hydrological units are not equivalent to true hydrographic basins or watersheds, because the main river/stream within a given HUC-10 area can often have contributions from additional, upstream HUC-10 areas.
USGS Subwatershed Unit (HUC-12)
US Geological Survey Hydrologic Units of the twelve-digit level (Hydrologic Unit Code 12), averaging 40 square miles (104 square kilometers). Although USGS names the HUC-12 level as “subwatershed” scale, these hydrological units are not equivalent to true hydrographic basins or watersheds, because the main river/stream within a given HUC-12 area can often have contributions from additional, upstream HUC-12 areas.
County lines for each state in the continental United States.
Congressional Districts for the United States House of Representatives, for the 113th Congress: 1/3/2013-1/3/2015.
School District boundaries in the continental United States.
Sub-county municipal boundaries for the State of Pennsylvania were developed by various state agencies. The most recent version can be found at
(Currently only available within the Delaware River Basin)
Click on a point on the map to see Observation Station information, current data, historical data, and a link to directly access the primary data source for each station.
Delaware Environmental Observing System (18)
- Delaware Environmental Observing System
- Real-time weather data from Delaware
NOAA Tides and Currents (5)
- NOAA National Ocean Service, Center for Operational Oceanographic Products and Services, https://tidesandcurrents.noaa.gov
- Real-time water monitoring systems along the tidal portion of the Delaware River main stem – at Philadelphia, Marcus Hook, Delaware City, Reedy Point, and Ship John Shoal locations
- Example at Philadelphia: https://tidesandcurrents.noaa.gov/stationhome.html?id=8545240
USGS National Water Information System (214)
- USGS National Water Information System
- Sites with real-time or recent surface-water, groundwater, or water-quality data.
- Example at Chadds Ford: https://waterdata.usgs.gov/nwis/uv/?site_no=01481000
EPA Permitted Point Sources ([# of sites])
ESRI ArcGIS World Imagery
Satellite with Roads
Google Maps hybrid map type
Google Maps terrain map type
2.6.Additional Data Layers
In addition to the data layers that are visualized on the map, the Model My Watershed (MMW) web app also has access to additional data layers (not visualized) for use by the various modeling functions.
Farm animal populations for an area of interest are estimated from county-level data from USDA, by first calculating an average “animals per farmland acres” for each animal type for each county.
Data source: https://www.nass.usda.gov/Quick_Stats/index.php
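The county-density approach described above can be sketched as follows. Only the first step (an average "animals per farmland acre" for each animal type in a county) comes from the text. Scaling that density by the farmland acreage inside the area of interest is an assumption about the second step, and the function name and sample numbers are hypothetical.

```python
def estimate_animals(county_counts, county_farmland_acres, aoi_farmland_acres):
    """Estimate animal numbers in an area of interest from county totals.

    county_counts: dict of animal type -> county-level head count.
    Step 1 computes animals per farmland acre for the county.
    Step 2 (assumed) multiplies that density by the AoI farmland acreage.
    """
    if county_farmland_acres <= 0:
        return {animal: 0.0 for animal in county_counts}
    estimates = {}
    for animal, count in county_counts.items():
        density = count / county_farmland_acres      # animals per farmland acre
        estimates[animal] = density * aoi_farmland_acres
    return estimates


# Hypothetical county with 200,000 farmland acres; AoI contains 1,000 of them.
estimates = estimate_animals({"cattle": 100000, "chickens": 500000},
                             county_farmland_acres=200000,
                             aoi_farmland_acres=1000)
```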
Point source discharges of pollutants (primarily large municipal and industrial wastewater treatment plants) can be viewed as a backdrop layer within MMW (under “Observations” as described above). This layer was created using locational information (i.e., latitude/longitude) and discharge information compiled by US EPA. For point sources collected within the Delaware River Basin measures of discharge (effluent) and concentration (nitrogen and phosphorus) were taken directly from state level Discharge Monitoring Reports.
3.Choose Area of Interest (AoI)
A suite of tools (along the top of the map screen) to select areas within the lower 48 United States and begin the modeling process by summarizing land use, hydrologic soil groups, and other statistics. Options include: Select by Boundary, Free Draw, and Delineate Watershed.
3.1.Select by Boundary
Choose a predefined boundary from several boundary types as described above in Section 2.3 Boundaries. First select the boundary type, then use this selection tool to enable a “hover over” function that shows the name of each bounded area. Once it is activated, click the desired area to generate land use and hydrologic soils analysis within the area (among other statistics).
Available boundary types for area selection are a subset of the boundary types viewable in the Layers Selector in the lower left of map. For more information and data sources for these layers, see Section 2.3 Boundaries of this guide.
A tool any user can deploy to draw a polygon and, upon closing the polygon (double click to close), clip land use and hydrologic soil groups (among other statistics) for the area within the polygon.
A single click anywhere on the map selects a 1 km2 area and clips land use and hydrologic soil groups (among other statistics) for that area.
This tool selects an Area of Interest by automatically delineating a watershed from a point on a stream network using topographic data represented as a digital elevation model (DEM).
Once the user clicks on the map, the tool moves downhill from that point to “snap” onto a second point on the nearest stream flowline. The tool then calculates the watershed upstream of this second point using the Rapid Watershed Delineation algorithms. The methods for moving downhill to the stream and watershed delineation both use a grid of flow directions derived from a digital elevation model (DEM).
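The two operations described above, moving downhill to a stream and collecting everything upstream of the snapped point, can be sketched on a toy flow direction grid. This is not the Rapid Watershed Delineation code. The flow directions are stored as a hypothetical dictionary from each cell to the cell it drains to, and the function names are invented for illustration.

```python
def snap_to_stream(clicked, receivers, stream_cells):
    """Follow flow directions downhill from the clicked cell to a stream cell.

    Returns the clicked cell itself if no stream cell lies downstream.
    """
    cell, seen = clicked, set()
    while cell is not None and cell not in seen:
        if cell in stream_cells:
            return cell
        seen.add(cell)
        cell = receivers.get(cell)
    return clicked


def delineate(outlet, receivers):
    """Return the set of cells whose flow path passes through the outlet."""
    donors = {}
    for cell, down in receivers.items():
        if down is not None:
            donors.setdefault(down, []).append(cell)
    basin, stack = set(), [outlet]
    while stack:
        cell = stack.pop()
        if cell not in basin:
            basin.add(cell)
            stack.extend(donors.get(cell, []))
    return basin


# Hypothetical 3 x 3 grid: each cell maps to the cell it drains to.
receivers = {
    (0, 0): (0, 1), (0, 1): None,   (0, 2): (0, 1),
    (1, 0): (1, 1), (1, 1): (0, 1), (1, 2): (1, 1),
    (2, 0): (2, 1), (2, 1): (1, 1), (2, 2): (2, 1),
}
stream_cells = {(0, 1), (1, 1)}

outlet = snap_to_stream((2, 2), receivers, stream_cells)
watershed = delineate(outlet, receivers)
```

A click at cell (2, 2) moves through (2, 1) and snaps to the stream at (1, 1). The delineated watershed is the six cells that drain through that point.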
The tool returns the delineated watershed area and boundary, which are provided to the Analyze Area of Interest functions. Two DEMs and stream networks are presently available for watershed delineation as listed below.
Continental US medium resolution streams and NHDPlus DEM
Selecting “Snap to Continental US medium resolution streams” moves downhill from the point you click to snap onto the nearest point on the medium resolution National Hydrography Dataset (NHDPlus flowline) and calculates the watershed upstream of this point using the 30 m resolution NHDPlus flow direction grid for the continental US.
If the point you click does not have an NHDPlus flowline downstream of it (e.g., it is in an internally draining area), the watershed is calculated from the point you click. To learn more about NHDPlus see http://www.horizon-systems.com/nhdplus/. The Model My Watershed watershed delineation uses the NHDPlus version 2.1 data model with the latest content version accessed 11/22/16.
Delaware high resolution streams and 1/3 arc sec (10 m) resolution DEM
Selecting “Snap to Delaware high resolution streams” moves downhill from the point you click to snap onto the nearest point on the Delaware high resolution stream network and calculates the watershed upstream of this point using a 1/3 arc second (10 m) resolution digital elevation model for the Delaware River Basin obtained from the National Elevation Dataset.
The stream network snapped to was delineated using TauDEM as described above (Delaware River Basin High Resolution Stream Network overlay).
Upload a polygon for your area.
- Must be a shapefile (zip containing shp and prj files) or geojson
- Only the first feature is used
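The "only the first feature is used" rule can be illustrated for GeoJSON input with the Python standard library. This is a minimal sketch and not the upload code used by Model My Watershed. The function name `first_feature_geometry` and the sample coordinates are hypothetical.

```python
import json


def first_feature_geometry(geojson_text):
    """Return the geometry of the first feature in a GeoJSON document."""
    data = json.loads(geojson_text)
    kind = data.get("type")
    if kind == "FeatureCollection":
        features = data.get("features") or []
        if not features:
            raise ValueError("FeatureCollection contains no features")
        return features[0]["geometry"]   # later features are ignored
    if kind == "Feature":
        return data["geometry"]
    return data  # assumption: anything else is a bare geometry object


square = [[-75.2, 39.9], [-75.1, 39.9], [-75.1, 40.0], [-75.2, 40.0], [-75.2, 39.9]]
example = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon", "coordinates": [square]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [-75.0, 40.0]}},
    ],
})
aoi = first_feature_geometry(example)
```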
4.Analyze Area of Interest (AoI)
Once an area of interest is selected, Model My Watershed automatically performs geospatial analyses on mapped data layers within the area. Summary statistics are provided in graphs and tables for each of these data layers that impact stormwater runoff and/or water quality:
- Land.
- For more information on data sources, see Section 2.2 Layers: Coverage Grids.
- For more information on data sources, see Section 2.2 Layers: Coverage Grids.
- For more information on data sources, see Section 2.6 Layers: Additional Data Layers.
- Point Sources.
- For more information on data sources, see Section 2.6 Layers: Additional Data Layers.
- Water Quality (Delaware River Basin Only).
- For more information on data sources, see Section 2.2 Layers: Coverage Grids.
- For more information on data sources, see Section 2.1 Layers: Streams.
- Stream length in agricultural and non-agricultural areas is calculated using an implied riparian width of approximately 30 m, and an implied buffer of approximately 15 m, using the following methodology:
- A stream vector line is rasterized to a 1 pixel string, with pixels the same size as an NLCD pixel (30 m). Under the hood, GeoTrellis uses Bresenham’s Line Drawing algorithm to rasterize a line to pixels. The specific GeoTrellis code can be found at https://github.com/locationtech/geotrellis/blob/4a718f2b64e02d2f05f0be6627fd76ec3b9b8d14/raster/src/main/scala/geotrellis/raster/rasterize/Rasterizer.scala#L281-L328
- This approach therefore assumes an implied riparian width of approximately 30m, and an implied buffer of approximately 15 m.
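The rasterization approach above can be sketched in Python. This is an illustration rather than a port of the GeoTrellis code. The Bresenham routine is the standard integer form. Treating NLCD classes 81 (Pasture/Hay) and 82 (Cultivated Crops) as "agricultural" and converting pixel counts to length by multiplying by the 30 m cell size are assumptions, and the function names are hypothetical.

```python
AG_CLASSES = {81, 82}   # assumption: NLCD Pasture/Hay and Cultivated Crops
CELL_SIZE_M = 30        # NLCD pixel size


def bresenham(r0, c0, r1, c1):
    """Return the grid cells on a line between two cells (standard Bresenham)."""
    cells = []
    dr, dc = abs(r1 - r0), -abs(c1 - c0)
    sr = 1 if r0 < r1 else -1
    sc = 1 if c0 < c1 else -1
    err = dr + dc
    while True:
        cells.append((r0, c0))
        if r0 == r1 and c0 == c1:
            return cells
        e2 = 2 * err
        if e2 >= dc:
            err += dc
            r0 += sr
        if e2 <= dr:
            err += dr
            c0 += sc


def stream_length_by_land_use(vertices, nlcd):
    """Rasterize a stream polyline and tally length in ag and non-ag pixels.

    vertices: list of (row, col) cells along the stream line.
    Returns (agricultural_m, non_agricultural_m).
    """
    cells = set()
    for (r0, c0), (r1, c1) in zip(vertices, vertices[1:]):
        cells.update(bresenham(r0, c0, r1, c1))   # shared vertices count once
    ag = sum(1 for r, c in cells if nlcd[r][c] in AG_CLASSES)
    return ag * CELL_SIZE_M, (len(cells) - ag) * CELL_SIZE_M


# Hypothetical one-row land cover grid (41 = Deciduous Forest).
nlcd = [[82, 82, 41, 41, 81]]
ag_m, non_ag_m = stream_length_by_land_use([(0, 0), (0, 2), (0, 4)], nlcd)
```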
5.Model Water Quantity & Quality
There are currently two models to choose from to 1) predict how water moves through your Area of Interest and 2) predict the water quality of water running off of your Area of Interest.
5.1.Site Storm Model
The Model My Watershed (MMW) Site Storm Model simulates a single 24-hour storm by applying a hybrid of the Source Loading and Management Model (SLAMM), TR-55, and the simplest of the Food and Agriculture Organization of the United Nations evaporation models for runoff quantity and EPA’s STEP-L model for water quality over the selected Area of Interest within the continental United States.
The results are calculated based on actual land cover data (from the USGS National Land Cover Database 2011, NLCD2011) and actual soil data (from the USDA Gridded Soil Survey Geographic Database, gSSURGO, 2016) for the selected land area of interest. For more information and data sources, see Section 2.2 Coverage Grids.
This model is used to calculate runoff for all “natural” land-use types. All of our TR-55 curve number info is here: https://github.com/WikiWatershed/tr-55/blob/develop/tr55/tables.py
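For orientation, the standard TR-55 curve number runoff relation can be written as a short Python function. This sketch is not taken from the WikiWatershed tr-55 repository: the function name is hypothetical, depths are in inches, and the initial abstraction ratio of 0.2 is the conventional TR-55 default, which the Site Storm Model may treat differently.

```python
def scs_runoff_inches(precip_in, curve_number, ia_ratio=0.2):
    """Runoff depth (inches) from the SCS curve number equation.

    S  = 1000 / CN - 10          potential maximum retention
    Ia = ia_ratio * S            initial abstraction
    Q  = (P - Ia)^2 / (P - Ia + S)   for P > Ia, otherwise 0
    """
    s = 1000.0 / curve_number - 10.0
    ia = ia_ratio * s
    if precip_in <= ia:
        return 0.0   # all rainfall is abstracted before runoff begins
    return (precip_in - ia) ** 2 / (precip_in - ia + s)


runoff = scs_runoff_inches(3.0, 80)   # 3 inch storm on CN 80 land
```

With a curve number of 80, S is 2.5 inches and Ia is 0.5 inches, so a 3 inch storm yields 2.5 squared divided by 5, or 1.25 inches of runoff.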
The Source Loading and Management Model (SLAMM) is used to calculate runoff for urban land-use types. For additional information on SLAMM, see http://dnr.wi.gov/topic/stormwater/standards/slamm.html. All of our SLAMM curve number info is here: https://github.com/WikiWatershed/tr-55/blob/develop/tr55/tables.py
5.2.Watershed Multi-Year Model
The Watershed Multi-Year Model in Model My Watershed (MMW) simulates 30 years of daily water, nutrient and sediment fluxes using the Generalized Watershed Loading Function Enhanced (GWLF-E) model that was developed for the MapShed desktop modeling application by Barry M. Evans, Ph.D., and his group at Penn State University. The GWLF-E model is also one of five watershed models available within EPA’s BASINS multi-purpose modeling application.
Model My Watershed will eventually become the primary framework for running the latest GWLF-E model version, replacing MapShed and BASINS (by 2018?) because these two desktop applications are built on the aging MapWindow GIS package that is no longer supported. For that reason, in late 2014 we ported all GWLF-E code from Visual Basic to Python, with all subsequent code development in this open source repository: https://github.com/WikiWatershed/gwlf-e/. Similarly, all of the MapWindow-based geoprocessing routines have been rewritten to operate with the open-source GeoTrellis geographic data processing engine and framework, with all new code in this repository: https://github.com/WikiWatershed/model-my-watershed.
5.2.1.The GWLF Model
The core watershed multi-year simulation model used in MMW and MapShed (GWLF-E) is an enhanced version of the Generalized Watershed Loading Function (GWLF) model first developed by researchers at Cornell University (Haith and Shoemaker, 1987). The original DOS-compatible version of GWLF was rewritten in Visual Basic by Evans et al. (2002) to facilitate integration with ArcView© and other GIS software packages, and tested extensively in the U.S. and elsewhere. Since 2002 it has been substantially enhanced; see Section 5.2.2 GWLF-Enhancements.
The advantage of GWLF (and GWLF-E) is the ease of use and reliance on input datasets less complex than those required by other watershed-oriented water-quality models such as SWAT, SWMM, and HSPF (Deliman et al., 1999). The model has also been endorsed by the U.S. EPA as a good “mid-level” model that contains algorithms for simulating most of the key mechanisms controlling nutrient and sediment fluxes within a watershed (U.S. EPA, 1999).
The GWLF model provides the ability to simulate runoff, sediment, and nutrient (nitrogen and phosphorus) loads from a watershed given variable-size source areas (e.g., agricultural, forested, and developed land). It also has algorithms for calculating septic system loads, and allows for the inclusion of point source discharge data. It is a continuous simulation model that uses daily time steps for weather data and water balance calculations. Monthly calculations are made for sediment and nutrient loads based on the daily water balance accumulated to monthly values.
GWLF is considered to be a combined distributed/lumped parameter watershed model. For surface loading, it is distributed in the sense that it allows multiple land use/cover scenarios, but each area is assumed to be homogeneous in regard to various “landscape” attributes considered by the model. Additionally, the model does not spatially distribute the source areas, but simply aggregates the loads from each source area into a watershed total; in other words there is no spatial routing. For subsurface loading, the model acts as a lumped parameter model using a water balance approach. No distinctly separate areas are considered for sub-surface flow contributions. Daily water balances are computed for an unsaturated zone as well as a saturated subsurface zone, where infiltration is simply computed as precipitation plus snowmelt, less surface runoff and evapotranspiration.
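The daily infiltration statement at the end of the paragraph above can be read as a one-line balance. The sketch below is an interpretation, not GWLF-E source code: it groups precipitation with snowmelt as inputs and surface runoff with evapotranspiration as losses, uses centimeters for all depths, and clamps the result at zero, which is an added assumption.

```python
def daily_infiltration_cm(precip_cm, snowmelt_cm, runoff_cm, et_cm):
    """Infiltration = (precipitation + snowmelt) - (surface runoff + ET).

    Assumption: a negative balance is reported as zero infiltration.
    """
    return max(0.0, (precip_cm + snowmelt_cm) - (runoff_cm + et_cm))


wet_day = daily_infiltration_cm(2.0, 0.5, 0.75, 0.25)
dry_day = daily_infiltration_cm(0.0, 0.0, 0.0, 0.3)
```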
With respect to major processes, GWLF simulates surface runoff using the SCS-CN approach with daily weather (temperature and precipitation) inputs from the EPA Center for Exposure Assessment Modeling (CEAM) meteorological data distribution. Erosion and sediment yield are estimated using monthly erosion calculations based on the USLE algorithm (with monthly rainfall-runoff coefficients) and monthly KLSCP values for each source area (i.e., land cover/soil type combination). A sediment delivery ratio based on watershed size and a transport capacity based on average daily runoff is then applied to the calculated erosion to determine sediment yield for each source area. Surface nutrient losses are determined by applying dissolved N and P coefficients to surface runoff and a sediment coefficient to the yield portion for each agricultural source area.
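The erosion and sediment yield steps described above can be sketched as a product of factors. This is a simplified illustration and not the GWLF-E code: the function names and sample values are hypothetical, units are left generic, and the transport capacity adjustment based on average daily runoff is deliberately omitted.

```python
def monthly_erosion(rainfall_erosivity, k, ls, c, p, area):
    """USLE-style erosion for one source area in one month.

    rainfall_erosivity: monthly rainfall-runoff erosivity term.
    k, ls, c, p: the USLE soil erodibility, slope length/steepness,
    cover, and practice factors (the "KLSCP" values).
    """
    return rainfall_erosivity * k * ls * c * p * area


def sediment_yield(erosion, delivery_ratio):
    """Portion of eroded soil that reaches the watershed outlet."""
    return delivery_ratio * erosion


# Hypothetical source area and factor values.
erosion = monthly_erosion(100.0, k=0.25, ls=2.0, c=0.5, p=1.0, area=10.0)
delivered = sediment_yield(erosion, delivery_ratio=0.2)
```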
Point source discharges can also contribute to dissolved losses and are specified in terms of kilograms per month. Manured areas, as well as septic systems, can also be considered. Urban nutrient inputs are all assumed to be solid-phase, and the model uses an exponential accumulation and wash-off function for these loadings. Subsurface losses are calculated using dissolved N and P coefficients for shallow groundwater contributions to stream nutrient loads, and the subsurface submodel only considers a single, lumped-parameter contributing area.
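An exponential accumulation and wash-off scheme of the kind mentioned above can be sketched as follows. The functional forms are the common exponential ones. The default rate constants (0.12 per day for accumulation decay and 1.81 per cm of runoff for wash-off) and the order of operations within a day are assumptions that should be checked against the GWLF manual, and all function names are hypothetical.

```python
import math


def accumulate(load, buildup_rate, decay_per_day=0.12):
    """Surface load after one more day of exponential accumulation."""
    carry = load * math.exp(-decay_per_day)
    added = (buildup_rate / decay_per_day) * (1.0 - math.exp(-decay_per_day))
    return carry + added


def washoff_fraction(runoff_cm, washoff_per_cm=1.81):
    """Fraction of the accumulated load removed by a day's runoff."""
    return 1.0 - math.exp(-washoff_per_cm * runoff_cm)


def simulate(buildup_rate, daily_runoff_cm):
    """Return the load washed off on each day, starting from a clean surface."""
    load, washed = 0.0, []
    for runoff in daily_runoff_cm:
        load = accumulate(load, buildup_rate)    # assumed: build up first
        removed = washoff_fraction(runoff) * load
        washed.append(removed)
        load -= removed
    return washed


washed = simulate(1.0, [0.0, 0.0, 2.0, 0.0])
```

With these assumed constants, a long dry spell drives the surface load toward the buildup rate divided by the decay constant, and about 1.27 cm of runoff removes roughly 90 percent of whatever has accumulated.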
Evapotranspiration is determined using daily weather data and a cover factor dependent upon land use/cover type. Finally, a water balance is performed daily using supplied or computed precipitation, snowmelt, initial unsaturated zone storage, maximum available zone storage, and evapotranspiration values.
It is beyond the scope of this document to provide specific details on the structure and technical components underlying the original GWLF model. For users interested in such details, a copy of the GWLF manual can be found at www.mapshed.psu.edu. Additional details on the updated version of this model (GWLF-E) and the geoprocessing routines used in MapShed (and by extension, MMW) to prepare input data to the model can also be found in the MapShed Users’ Manual also available at this website.
Since 2002, following its initial incorporation into MapShed (and its precursor, AVGWLF), the GWLF-E model has been substantially enhanced to include a number of routines and functions not found in the original GWLF model.
A significant revision in one of the earlier versions of AVGWLF was the inclusion of a streambank erosion routine. This routine is based on an approach often used in the field of geomorphology in which monthly streambank erosion is estimated by first calculating an average watershed-specific Lateral Erosion Rate (LER). After a value for LER has been computed, the total sediment load generated via streambank erosion is then calculated by multiplying the above erosion rate by the total length of streams in the watershed (in meters), the average streambank height (in meters), and an average soil bulk density value (in kg/m3). In MapShed, these stream bank and erosion rate parameters were optimized for models using the high resolution stream flow line dataset available for Pennsylvania. In Model My Watershed, which uses NHDplus v2 medium resolution flow lines, we use a sediment erosion adjustment factor of 1.4 to make bank erosion estimates in MMW comparable to those in MapShed for Pennsylvania.
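The streambank load calculation described above is a simple product, sketched here in Python. The function name and sample values are hypothetical, the LER is taken as a given input in meters per month, and applying the 1.4 adjustment factor as a direct multiplier on the load is an assumption about how Model My Watershed uses it.

```python
def streambank_load_kg(ler_m_per_month, stream_length_m, bank_height_m,
                       bulk_density_kg_m3, adjustment=1.0):
    """Monthly streambank sediment load in kilograms.

    load = LER x total stream length x bank height x soil bulk density,
    optionally scaled by an adjustment factor (assumed multiplicative).
    """
    volume_m3 = ler_m_per_month * stream_length_m * bank_height_m
    return adjustment * volume_m3 * bulk_density_kg_m3


# Hypothetical watershed: 10 km of streams, 1.5 m banks, 1500 kg/m3 soil.
unadjusted = streambank_load_kg(0.01, 10000.0, 1.5, 1500.0)
adjusted = streambank_load_kg(0.01, 10000.0, 1.5, 1500.0, adjustment=1.4)
```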
In later versions, the original water balance routine within GWLF was extended to simulate water withdrawals from surface and groundwater sources. Within MapShed, information contained in an optional “water extraction” GIS layer can be used to estimate the volume of water taken from various sources within a watershed each month. For surface water withdrawals, the estimated cumulative water volume is subtracted from the simulated “stream flow” component of the monthly water balance calculations. For groundwater withdrawals, this volume is subtracted from the “subsurface” component of the monthly water balance calculations. (Note: this particular routine is not yet implemented in MMW, although the GWLF-E model does allow for “extracted” water to be simulated).
Other recent model revisions include the implementation of an agricultural tile drainage routine, the capability to consider point source effluent (i.e., flows) in the hydrology for a given area, the incorporation of new routines for more direct simulation of loads from farm animals, a new pathogen load estimation routine, and the ability to consider the potential effects of best management practices (BMPs) and other mitigation activities on pollutant loads.
Another significant change has been an improvement in the simulation of hydrology and loads from urban areas. In the original version of GWLF used with AVGWLF, such simulation could only be accomplished for two basic types of urbanized or developed land (i.e., low-density development and high-density development). However, in very intensively developed watersheds, it may be more appropriate to use more complex routines for a wider range of urban landscape conditions. Consequently, additional modeling routines have been included with the version of GWLF used in MapShed and MMW to address this situation. These new functions are based on the RUNQUAL model developed by Haith (1993) at Cornell University. With these routines, runoff volumes are calculated from procedures given in the U.S. Soil Conservation Service’s Technical Release 55 (U.S. Soil Conservation Service, 1986). Contaminant loads are based on exponential accumulation and washoff functions similar to those used in the SWMM (Huber and Dickinson, 1988) and STORM (Hydrologic Engineering Center, 1977) models. The pervious and impervious fractions of each land use type are modeled separately, and runoff and contaminant loads from the various surfaces are calculated daily and aggregated monthly in the model output. With the RUNQUAL-derived routines, it is assumed that the area being simulated is small enough so that travel times are on the order of one day or less. (A copy of the RUNQUAL manual that contains more details about this model can also be found at www.mapshed.psu.edu).
5.2.3.GIS-Based Estimation of Model Input Parameters
Similar to what is done using the desktop version of MapShed, various web-based geoprocessing routines are used to parameterize input data for the GWLF-E watershed model implemented within Model My Watershed (MMW).
Once model parameter values have been estimated, they are subsequently written to a model input file that is then automatically processed by the GWLF-E model to simulate hydrology and pollutant transport for the “area of interest” (typically a watershed) identified by the user. To support the modeling process in MMW, a number of nationally-available data sets are used. Brief descriptions of the key data sets used are provided below, along with a web link that identifies the source of this data.
2011 National Land Cover Dataset
Primarily used to estimate/assign values for curve numbers, various USLE factors, dissolved TN and TP runoff concentrations, impervious surface fractions, and pollutant accumulation rates in urban areas (see http://www.mrlc.gov/nlcd2011.php).
GSSURGO Soils Data
Primarily used to estimate various USLE factors, curve numbers by land use/cover type, available water-holding capacity, soil P content, and dissolved P concentration in runoff (see https://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey/geo/?cid=nrcs142p2_053628).
30-Meter Elevation Data
Primarily used to estimate slope and slope-length by land cover category for use in the USLE soil loss equation and mean watershed slope for use in the streambank erosion equation (see http://eros.usgs.gov/elevation-products).
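As a reminder of how these inputs come together, the standard form of the USLE multiplies five factors. The sketch below assumes the conventional annual form of the equation; GWLF-E applies its own erosion formulation on a daily basis, so this is an illustration of the factor structure only, and the function name is hypothetical.

```python
def usle_soil_loss(r: float, k: float, ls: float, c: float, p: float) -> float:
    """Soil loss A = R * K * LS * C * P.

    r  : rainfall erosivity factor
    k  : soil erodibility factor
    ls : combined slope-length and slope-steepness factor
    c  : cover and management factor
    p  : support practice factor
    The units of the result follow the units chosen for R and K.
    """
    for name, value in (("R", r), ("K", k), ("LS", ls), ("C", c), ("P", p)):
        if value < 0:
            raise ValueError(f"{name} factor must be non-negative")
    return r * k * ls * c * p
```

Because the factors are multiplicative, an error in any single GIS-derived factor carries through proportionally to the estimated soil loss.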
Discharge Monitoring Report (DMR) Data
This represents a national database on “point source” pollutant discharges available from USEPA. (Note: for the Delaware River Basin, this data set was enhanced using data available from various state agencies as well.) This dataset was used to assign default values for effluent discharges from point sources within a given watershed (see http://cfpub.epa.gov/dmr/ez_search.cfm).
Estimates of Shallow Groundwater Nitrogen Concentration
Primarily used to set default values for groundwater N concentration for any given watershed (see https://water.usgs.gov/GIS/dsdl/gwava-s/index.html#gwava-dw). This dataset was developed by researchers at the U.S. Geological Survey, and a description of the spatial modeling process used is provided by Nolan and Hitt (2006).
County-Level Farm Animal Populations
These data are available from USDA, and were used to estimate farm animal populations weighted by “farmland acres” for any given watershed (see https://www.nass.usda.gov/Quick_Stats/index.php).
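The farmland-weighted apportionment can be sketched as below. This is an assumed reading of the weighting described above, not the MMW implementation: each county's animal counts are scaled by the share of that county's farmland that falls inside the watershed. The function name and the dictionary layout are hypothetical.

```python
def estimate_watershed_animals(county_animals, county_farmland_acres, watershed_farmland_acres):
    """Apportion county-level animal counts to a watershed by farmland share.

    county_animals           : {county: {animal_type: head count}}
    county_farmland_acres    : {county: total farmland acres in the county}
    watershed_farmland_acres : {county: farmland acres of that county inside the watershed}
    """
    totals = {}
    for county, acres_inside in watershed_farmland_acres.items():
        county_acres = county_farmland_acres[county]
        if county_acres <= 0:
            continue  # no farmland in the county, so nothing to apportion
        share = acres_inside / county_acres
        for animal, count in county_animals[county].items():
            totals[animal] = totals.get(animal, 0.0) + count * share
    return totals
```

This approach treats animals as evenly spread over a county's farmland, which is a simplification where livestock operations are concentrated in a few locations.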
USEPA National Climate Data
A database of national-scale daily weather data was previously compiled by USEPA for use in various environmental simulation models. In the case of MMW, these data were used to estimate daily weather data (i.e., precipitation and temperature; compiled for the time period 1960-1990) for use in driving the daily runoff and erosion calculations in the GWLF-E model (see https://www.epa.gov/exposure-assessment-models/meteorological-data).
Estimates of Baseflow
This dataset, prepared by the US Geological Survey, depicts estimates of baseflow on a 1-km grid cell basis for the conterminous United States (see http://water.usgs.gov/lookup/getspatial?bfi48grd ). It was created by interpolating baseflow index (BFI) values from USGS stream gages throughout the country. Baseflow is the component of stream flow that can be attributed to groundwater/shallow subsurface discharge into streams. For use in MMW, this dataset is used to estimate the recession coefficient used by the GWLF-E model. In this case, a regression equation was developed by correlating average BFI values for a number of watersheds across the country against calibrated recession coefficients established by one of the MMW co-developers (B. Evans) for the same watersheds as part of previous studies.
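The regression step can be illustrated with an ordinary least-squares fit. The functional form and coefficients of the actual MMW regression are not given here, so this sketch assumes a simple straight-line relationship purely for illustration; the function names are hypothetical, and any numbers passed to them in an example would be placeholders rather than values from the calibration studies.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_recession_coefficient(mean_bfi, slope, intercept):
    """Apply the fitted line to the mean BFI of a new watershed."""
    return slope * mean_bfi + intercept
```

In use, x would hold the average BFI of each previously calibrated watershed and y the corresponding calibrated recession coefficient; the fitted line is then applied to the mean BFI of the user's area of interest.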
Estimates of Soil Phosphorus Concentration
This dataset is used to estimate the amount of phosphorus attached to eroded soil generated by precipitation events as well as the concentration of dissolved phosphorus in runoff. The national soil P layer used in MMW was created using various geo-referenced sample datasets developed by the U.S. Geological Survey (see Smith et al., 2014; see also https://pubs.usgs.gov/of/2014/1082/). Some example national maps produced by USGS from these sample datasets can be viewed and downloaded at https://mrdata.usgs.gov/soilgeochemistry/#/detail/element/15. Unfortunately, none of the national map layers available at the latter site could be directly used within MMW due to their generalized nature (particularly with respect to data categorization). Consequently, for use within MMW, the original geo-referenced sample datasets in Excel file format were obtained from one of the USGS scientists (Federico Solano), and surface interpolation routines were used to create a number of intermediate spatial datasets representing soil P concentrations for different land cover types and soil depths across the country. These were subsequently processed and combined to create one national map depicting mean soil P values (in units of mg of P/kg of soil) at a 1-km grid cell resolution.
Estimates of Soil Nitrogen Concentration
Estimates of soil nitrogen concentration are needed to calculate the amount of nitrogen attached to eroded soil produced by precipitation events in a given area. Within MMW, these estimates are derived from a national soil nitrogen map produced by research scientists at the Oak Ridge National Laboratory (see Hargrove and Post, 1998). In that study, the map was developed from a USDA National Soil Characterization Database linked back to spatial information in a STATSGO soil map using soil taxonomic relationships (see http://geobabble.org/~hnw/esri98). For use in MMW, the national soil nitrogen layer was provided directly by Dr. William Hargrove, one of its principal developers.
Learn More About GWLF-E Algorithms
Those interested in learning more about how various equations and algorithms are used to estimate values for various GWLF-E model parameters can find additional details in the MapShed Users Manual available for download at the MapShed web site (www.mapshed.psu.edu).
As described elsewhere in the technical documentation for Model My Watershed (MMW), there are two basic modelling approaches included in this tool. The simpler model provides pollutant load estimates based on literature-based “event mean concentrations” and user-supplied rainfall values. The other more comprehensive “multi-year” model is based on the MapShed desktop software application developed by Dr. Barry Evans and his group at Penn State University (Evans and Corradini, 2016). The MapShed model itself includes a GIS-based front-end for assembling input data for an enhanced version of the GWLF model originally developed by Haith and Shoemaker (1987). This enhanced model (called GWLF-E) is the model upon which the “multi-year” model included in MMW is based.
To provide a preliminary assessment of the accuracy of the multi-year model, a limited amount of calibration was performed using modeled results and observed stream data for 39 test watersheds located in specific geographic regions around the country. Because little time was initially assigned to this task under current MMW funding, the limited calibration was undertaken using stream data and load calculations previously compiled by one of the lead modelers involved in the development of the “multi-year” model included in the Model My Watershed (MMW) application (i.e., Dr. Evans) as part of other projects that he has conducted over the last 15 years or so (e.g., Evans et al., 2002; Evans, 2007; and Evans, 2010).
As part of these earlier projects, daily stream flow and water quality data were obtained from USGS at www.waterdata.usgs.gov/nwis , and daily and/or monthly pollutant loads for calibration periods ranging from about 1990 to 2015 were subsequently computed for each corresponding drainage area using a variety of statistical methods (primarily the FLUX model from the U.S. Army Corps of Engineers [Walker, 1999]). For the purposes of the current assessment, loads previously computed in this fashion were then used as the “observed” loads against which MMW-simulated loads were compared. Figure 1 depicts the distribution of these different test sites across the country, and Table 1 summarizes the size, pollutant loads, and original calibration periods associated with each of these sites.
Using the previously-derived observed load estimates described above, mean annual loads (and loading rates) were computed for each of the test sites and compared against the corresponding estimates from MMW. In this case, MMW-based load estimates were derived using the sub-basin modeling routines that account for in-stream attenuation as nitrogen, phosphorus and sediment loads travel to the outlet of any given drainage area. A primary focus of this particular activity was to “fine-tune” the attenuation factors in order to achieve a “best-fit” between the observed and predicted loads across the 39 test sites.
Table 1. Summary data for calibration test sites.
Model Results and Discussion
As described above, MMW was used to estimate nutrient and sediment loads for each of the calibration test sites. As part of the calibration process, the loads delivered to the outlet of the drainage areas represented by the calibration test sites were calculated and subsequently compared to the observed loads at each outlet. With the new attenuation routine implemented in MMW, nutrient and sediment loads are attenuated (i.e., reduced) as the loads move from upstream NHD catchments to downstream NHD catchments based on the presence (percent) of open water and wetland areas within each intervening catchment down to the drainage area outlet. During the calibration process, the attenuation rates were incrementally adjusted in successive model runs until a “best fit” was achieved across all of the test sites in terms of matching observed and simulated loads. Table 2 shows these loads (expressed as loading rates in kg/ha) for each of the calibration sites.
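The idea of attenuating a load as it passes through successive catchments can be sketched as follows. The actual attenuation function and calibrated rates used in MMW are not stated here, so the linear dependence on percent open water and wetland, the parameter name rate_per_percent, and the function name are all assumptions made for this illustration.

```python
def attenuate_load(load_kg, percent_water_wetland, rate_per_percent):
    """Route a load through intervening catchments, reducing it in each one.

    load_kg               : load entering the first downstream catchment
    percent_water_wetland : percent open water plus wetland for each intervening
                            catchment, listed in upstream-to-downstream order
    rate_per_percent      : hypothetical fraction of load retained per percent
                            of open water and wetland cover
    """
    remaining = load_kg
    for pct in percent_water_wetland:
        retained_fraction = min(1.0, rate_per_percent * pct)  # cannot retain more than 100%
        remaining *= 1.0 - retained_fraction
    return remaining
```

Under this sketch, calibration amounts to repeating the routing with different values of the rate for each pollutant and keeping the value that best matches the observed loads across all test sites.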
Figures 2 through 4 graphically show the comparisons between the observed and simulated loads for the calibration points using the mean annual loading rate (in kg/ha) as a standardized unit of measure. As can be seen from these figures, the MMW model simulations provided reasonably good estimates of the total nitrogen and total phosphorus loads on a mean annual basis (i.e., R² = 0.92 and R² = 0.84, respectively). In the case of total suspended sediment (TSS) loads, the model results were less accurate (R² = 0.71).
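For readers who want to reproduce this kind of comparison with their own data, the two summary statistics can be computed as sketched below. The exact definitions used in the MMW assessment are not stated, so this example assumes that the coefficient of determination is the squared Pearson correlation between observed and simulated loading rates and that the average difference is the percent difference between total simulated and total observed values. The function names are hypothetical.

```python
def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov * cov / (var_o * var_s)


def mean_percent_difference(observed, simulated):
    """Percent by which simulated values exceed (+) or fall short of (-) observed values."""
    return 100.0 * (sum(simulated) - sum(observed)) / sum(observed)
```

Note that a high squared correlation does not rule out a systematic bias: a model that under-predicts every site by the same proportion still scores close to 1, which is why the two statistics are reported together.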
As can be seen for TN, MMW under-predicted loads by about 16% on average. In this case, it is suspected that the under-prediction may be due, in part, to the general unavailability of good data on nitrogen discharges from wastewater treatment plants across the country. In MMW, data from the USEPA are used to estimate nitrogen loads from these sources. However, in many states only ammonia concentrations (which are generally a very small fraction of TN) are typically required by regulatory agencies and subsequently reported to EPA.
Additionally, in MMW, county-level data on farm animal populations from USDA are used to estimate animal numbers for any given watershed or area of interest on an area-weighted basis (i.e., by area of agricultural land). It is highly unlikely, though, that farm animal populations are as uniformly distributed as this algorithm implies. In general, the problem of under-estimating TN appears to worsen in watersheds having very large TN loading rates (e.g., higher than about 10 kg/ha). For example, in three of the test sites used in Pennsylvania (i.e., Lehigh River, Chiques Creek, and Schuylkill River), it is known that the farm animal populations and/or TN loads from wastewater discharges are higher than those estimated by MMW based on locally-available data.
Table 2. Comparison of observed and simulated loads for the calibration sites. (NA = Stream data not available)
In the case of TP, MMW-predicted loads were only about 2% higher on average than observed loads. However, as shown by the lower R² value (0.84), MMW was less accurate in predicting these loads than TN loads, particularly in cases where lower loading rates occurred (below about 1 kg/ha). As with TN, it may be that inaccurate estimates of wastewater discharges and farm animal populations are adversely influencing TP load estimates from MMW. Because much of the phosphorus load generated within a given watershed is attached to stream-transported sediment, however, it is also possible that inaccurate estimates of sediment loads (as discussed below) are adversely affecting these load estimates.
As shown in Figure 4, sediment loads predicted by MMW are about 24% lower than observed loads on average, and the R² value (0.71) is lower than that for either TN or TP. This is not surprising as in-stream samples of sediment are known to be very problematic, and it could be that some of the inaccuracy in the modeled results comes from the use of imprecisely-calculated “observed” values. It is also likely that prediction errors may be arising due to the more “empirical” nature of the streambank erosion routine in the GWLF-E model in comparison to those used for calculating sediment erosion from upland sources. However, it is hoped that future improvements in this routine that allow for better distribution of streambank-eroded loads on a stream segment basis (rather than the more “uniform” approach used now) will improve these results. In any case, it is believed that the simulation results do capture the relative magnitudes of sediment loads in streams that are relatively “natural” versus those heavily influenced by agriculture and human development reasonably well.
As described earlier, only a limited amount of calibration could be performed due to a lack of funding to accomplish this activity in the original scope of work. However, with future funding, it is anticipated that additional calibration work will be completed. In particular, as implied by the map in Figure 1, additional work needs to be undertaken in other regions of the country that have different weather patterns, landscape conditions, cropping practices, etc. from those reflected by the locations of existing test sites used for this limited calibration in order to provide a higher level of confidence in the pollutant loading estimates produced by Model My Watershed elsewhere across the country. For those so inclined, MMW currently provides the ability to download an input (gms) file generated for any given watershed or area of interest. Once downloaded, this file can be read by the desktop version of the GWLF-E model available at https://wikiwatershed.org/mapshed/software/ and then subsequently edited to support other calibration efforts.
Evans, B.M., D.W. Lehning, K.J. Corradini, G.W. Petersen, E. Nizeyimana, J.M. Hamlett, P.D. Robillard, R.L. Day, 2002. A comprehensive GIS-based modelling approach for predicting nutrient loads in watersheds. J. Spatial Hydrology 2(2).
Evans, B.M., 2007. Summary of Work Undertaken Related to Adaptation of AVGWLF for Use in New England and New York. Final Report to the New England Interstate Water Pollution Control Commission, Penn State Institutes of Energy and the Environment, 116 pp.
Evans, B.M., 2010. Adaptation of the AVGWLF Watershed Model for Use in Texas and Surrounding States: Phase 1. Report to the Texas State Soil and Water Conservation Board, Penn State Institutes of Energy and the Environment, 157 pp.
Evans, B.M. and K.J. Corradini, 2016. MapShed Users Guide (Version 1.5), Penn State Institutes of Energy and the Environment, Penn State University, 140 pp.
Haith, D.A. and L.L. Shoemaker, 1987. Generalized Watershed Loading Functions for Stream Flow Nutrients. Water Resources Bulletin, 23(3), pp. 471-478. https://doi.org/10.1111/j.1752-1688.1987.tb00825.x
Walker, W. W., 1999. Simplified Procedures for Eutrophication Assessment and Prediction: User Manual. Prepared for U.S. Army Corps of Engineers, Instruction Report W-96-2, 239 pp.
6.Framework for Web App
The WikiWatershed web application is built from many frameworks and components that run within an Amazon Web Services (AWS) cloud infrastructure, most of which is not visible to the user. Its principal design goals are to allow intensive geoprocessing and spatial modeling for arbitrarily defined geographies and to process variable user loads, all while delivering output at speeds suitable for the web. The entire software stack is open source and available at https://github.com/WikiWatershed. The following sections explain the specific technologies used to achieve those goals.
A simplified architectural diagram showing these high level components.
6.2.Computation and Execution
The core functionality of the Model My Watershed web application runs on the following services and frameworks.
Amazon EC2 is the main computation service that provides CPU, memory and I/O (input/output) resources to the application code. The code and its dependencies are compiled into Amazon Machine Images (AMI) which can then be loaded onto EC2 instances and added to a fleet of servers responding to web requests and computing model results. The application decouples various processing roles from each other by isolating logical functionality into their own AMI so that scaling can happen for specific components of the system independently of each other. The main categories of server types are:
- AppServer: handles web requests and initializing modeling jobs
- Worker: handles the asynchronous execution of geoprocessing and modeling tasks
- Tiler: handles requests to generate map tiles from vector based data sources
AWS ElasticLoadBalancer (ELB) and AutoScalingGroups (ASG) are used to distribute web traffic across multiple EC2 instances. The number of instances is controlled through an ASG profile, which can increase or decrease the capacity of the infrastructure by adding or removing EC2 instances of any particular type.
Celery is an open source distributed task queue. Long-running geoprocessing requests are decomposed into jobs that perform partial calculations concurrently across the worker machines; the partial results are then reassembled and returned to the user.
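The split, compute, and reassemble pattern described for Celery can be illustrated without a task queue. The sketch below deliberately swaps in Python's standard-library concurrent.futures thread pool as a stand-in for Celery workers so that it runs on a single machine; it is not the MMW task code, and the land-cover counting task, function names, and chunk size are assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(cells, chunk_size):
    """Decompose one large request into smaller, independent pieces."""
    return [cells[i:i + chunk_size] for i in range(0, len(cells), chunk_size)]


def count_land_cover(chunk):
    """Partial calculation: count cells per land cover code in one chunk."""
    counts = {}
    for code in chunk:
        counts[code] = counts.get(code, 0) + 1
    return counts


def merge_counts(partials):
    """Reassemble the partial results into a single answer."""
    merged = {}
    for partial in partials:
        for code, n in partial.items():
            merged[code] = merged.get(code, 0) + n
    return merged


def run_job(cells, chunk_size=4, workers=3):
    """Run the partial calculations concurrently, then combine them."""
    chunks = split_into_chunks(cells, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_land_cover, chunks))
    return merge_counts(partials)
```

In a Celery deployment the same structure would be spread across separate worker machines, with the queue handling delivery of each partial task and collection of its result.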
Apache Spark is a fast and general engine for large-scale data processing with tight integration with our main geoprocessing tool, GeoTrellis (see description below).
Spark Job Server is a project providing a standard HTTP based interface into a Spark Context, allowing us to submit Scala based Spark jobs from our Python code.
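Submitting a job over HTTP from Python might look roughly like the following. This sketch only builds the request and does not send it. The /jobs endpoint with appName, classPath, context, and sync query parameters follows the REST interface described in the Spark Job Server project's README as understood here; the host, application name, class path, and job configuration shown in any example are hypothetical and are not taken from the MMW code base.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_job_request(base_url, app_name, class_path, job_config, context=None, sync=False):
    """Build (but do not send) a POST request that submits a Spark job.

    base_url   : root URL of the job server (hypothetical in any example)
    app_name   : name under which the job's JAR was uploaded
    class_path : fully qualified name of the Scala job class
    job_config : job configuration text sent as the request body
    context    : optional name of an existing Spark context to run in
    sync       : if True, ask the server to wait for the result
    """
    params = {"appName": app_name, "classPath": class_path}
    if context:
        params["context"] = context
    if sync:
        params["sync"] = "true"
    url = f"{base_url.rstrip('/')}/jobs?{urlencode(params)}"
    return Request(url, data=job_config.encode("utf-8"), method="POST")
```

The returned object can be passed to urllib.request.urlopen to actually submit the job once a job server is reachable at the given address.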
Django Web Framework is a Python WSGI compatible framework that serves the backend API routes, provides an interface into the backend database, and handles our user authentication workflows.
Raster analysis and model data are chunked and stored as RDDs, a Spark data format, on Amazon Simple Storage Service, S3. S3 provides low latency, redundantly distributed object storage with an HTTP interface. Our source code can make use of a spatial indexing system allowing us to read subsets of the raster data to do our analyses and modelling routines.
Rapid Watershed Delineation requires disk access to its raster and vector input, which is stored on a snapshot of an Amazon Elastic Block Store (EBS) volume. This data volume can be attached to running instances of the Worker EC2 type as they come online.
Amazon’s Relational Database Service provides us with a general purpose database, with a PostgreSQL compatible protocol. The PostGIS spatial extension is enabled to allow us to store and query geometry data.
GeoTrellis is an open-source, raster-focused geoprocessing engine. It is maintained by Azavea, Inc., but belongs to the LocationTech and Eclipse Foundation open source group.
- Provides raster processing at web speed
- Community support: 6,500 commits, 14 releases; 54 contributors
Windshaft is a web-based map server built on top of Mapnik, a popular vector rendering engine. Windshaft is used to convert our vector data sources into styled map images that can be overlaid or selected in the app.
Both the Site Storm Model and Watershed Multi-Year Model have been created as open source Python modules, available for installation from the Python Package Index.
The workflow of doing spatial analysis on both vector and raster data sources, aggregating and aligning the intermediate data input and the actual execution of the model is orchestrated through all of the technologies listed above, often in seconds, to produce the results that are provided to the user.
Deliman, P.N., R.H. Glick, and C.E. Ruiz, 1999. Review of Watershed Water Quality Models. U.S. Army Corps of Engineers, Tech. Rep. W-99-1, 26 pp.
Evans, B.M., D.W. Lehning, K.J. Corradini, G.W. Petersen, E. Nizeyimana, J.M. Hamlett, P.D. Robillard, R.L. Day, 2002. A comprehensive GIS-based modelling approach for predicting nutrient loads in watersheds. J. Spatial Hydrology 2(2), (www.spatialhydrology.com ).
Haith, D.A. and L.L. Shoemaker, 1987. Generalized Watershed Loading Functions for Stream Flow Nutrients. Water Resources Bulletin, 23(3), pp. 471-478.
Haith, D.A., 1993. RUNQUAL: Runoff Quality from Development Sites: Users Manual. Dept. Agricultural and Biol. Engineering, Cornell University, 34 pp.
Huber, W.C. and R.E. Dickinson, 1988. Storm water management model, version 4: User’s manual. Cooperative agreement CR-811607. U.S. Environmental Protection Agency, Athens, GA.
Hydrologic Engineering Center, 1977. Storage, treatment, overflow, runoff model (STORM). U.S. Army Corps of Engineers, Davis, CA.
Nolan, B. T. and K.T. Hitt, 2006. Vulnerability of shallow groundwater and drinking-water wells to nitrate in the United States. Environ. Sci. Technol., Vol. 40, pp. 7834-7840.
Smith, D.B., W.F. Cannon, L.G. Woodruff, F. Solano, and K.J. Ellefsen, 2014. Geochemical and mineralogical maps for soils of the conterminous United States. U.S. Geological Survey, Open-File Report 2014-1082, 386 pp.
Tarboton, D. G. and D. P. Ames, (2001), “Advances in the mapping of flow networks from digital elevation data,” in World Water and Environmental Resources Congress, Orlando, Florida, May 20-24, ASCE, http://hydrology.usu.edu/dtarb/asce2001.pdf
Tarboton, D. G., R. L. Bras and I. Rodriguez-Iturbe, (1992), “A Physical Basis for Drainage Density,” Geomorphology, 5(1/2): 59-76, http://dx.doi.org/10.1016/0169-555X(92)90058-V.
Tarboton, D. G., R. L. Bras and I. Rodriguez-Iturbe, (1991), “On the Extraction of Channel Networks from Digital Elevation Data,” Hydrological Processes, 5(1): 81-100, http://dx.doi.org/10.1002/hyp.3360050107.
U.S. Environmental Protection Agency, 1999. Protocols for developing nutrient TMDLs. EPA 841-B-99-007. Office of Water (4503 F), Washington, D.C.
U.S. Soil Conservation Service, 1986. Urban hydrology for small watersheds. Technical Release NO. 55 (2nd edition). U.S. Department of Agriculture, Washington, DC.
WikiWatershed is an initiative of Stroud™ Water Research Center. The Stroud Center seeks to advance knowledge and stewardship of freshwater systems through global research, education, and watershed restoration.
- Anthony Aufdenkampe, Ph.D., LimnoTech
- David Arscott, Ph.D., Stroud Water Research Center
- Barry Evans, Ph.D., Penn State Institutes of Energy and the Environment
- David Tarboton, Ph.D., Utah State University
- Matt McFarland, Azavea
- Steven Kerlin, Ph.D., Stroud Water Research Center
- Sara Geleskie Damiano, Stroud Water Research Center
- Academy of Natural Sciences of Drexel University
- Concord Consortium
- Millersville University
- Penn State Institutes of Energy and the Environment
- University of Washington
- Utah State University
See the About page for more information about these organizations and individuals.
9.Send Us Feedback
Please help us improve this guide. You can leave feedback about individual sections (look for the “Was this helpful? Yes or No” text). If your answer is no, or if you see something that needs to be changed, please use the “Suggest an edit” link and fill out a quick form.
Additional guides can be accessed on the Documentation page. If you would like to submit a question to the scientists and educators developing WikiWatershed, please use the contact form on the Help page.