Exploring Ponkapoag Pond with Python GIS: Part I

One exciting thing about open-source coding for me is when I can use it as a Geographic Information System (GIS). Advantages are that I can thoroughly document my workflow, and the GIS capabilities are both highly-advanced and free! In this post I am focusing on elevation data with a high spatial resolution, a characteristic that allows visualization of extremely detailed terrain and brings its own set of challenges.

This is the beginning of a GIS/remote sensing task with the end goal of mapping a large, ecologically important bog at Ponkapoag Pond in the wonderful Blue Hills Reservation just outside of Boston, MA. Being relatively new to Python, I’m going into detail that experienced Python coders probably don’t need and can skip over, but I hope may be helpful to others!

I. Loading the modules

I used the GIS capabilites for raster data available in Python through rasterio, along with numpy for convenient processing of rasters as numeric arrays and matplotlib for visualizing them.

import rasterio
import numpy as np
import matplotlib.pyplot as plt
from rasterio.merge import merge
from rasterio.plot import show

II. Loading and examining the elevation data

I downloaded two bare-earth Digital Surface Model (DSM) rasters from OLIVER (MassGIS Online Mapping Tool) that just cover the pond and immediate surroundings. These DSMs were processed from 2010-15 LiDAR (Light Detection and Ranging) data covering the entire state (see here).

The LiDAR data are huge, being over 1,000 GB even for a small state like Massachusetts! The DSMs I dowloaded were 1-m horizontal resolution GeoTIFFs, and at only ~1.5 x 1.5 km still ~9 MB each.

src1 = rasterio.open('C:/temp/files/19_03264672.tif')
src2 = rasterio.open('C:/temp/files/19_03274672.tif')

After loading the elevation data, I checked their metadata for consistency (one example below). This also provided me with information important for working with them, such as their having one band (count: 1), no missing data (nodata: None), the data type (dtype: float32), and the projected coordinate system (epsg: 26919; see here for the details on this Universal Transverse Mercator projection).

src1.meta()

{'driver': 'GTiff', 
 'dtype': 'float32', 
 'nodata': None, 
 'width': 1500, 
 'height': 1500, 
 'count': 1, 
 'crs': CRS({'init': 'epsg:26919'}), 
 'transform': (325500.0, 1.0, 0.0, 4674000.0, 0.0, -1.0), 
 'affine': Affine(1.0, 0.0, 325500.0, 
        0.0, -1.0, 4674000.0)}

I then used matplotlib to plot them side by side.

fig, (axr, axg) = plt.subplots(1,2, figsize=(8,4))
show(src1, ax=axr, title='src1')
show(src2, ax=axg, title='src2')
plt.show()

Whoops, something wrong here. While I can see them, they don’t look much like elevation data! I checked the range of elevation values by reading in the images as arrays.

src1_read = src1.read(1)
src2_read = src2.read(1)
print('src1 min =', src1_read.min(),'src1 max =', src1_read.max()) 
print('src2 min =', src2_read.min(),'src2 max =', src2_read.max())

src1 min = -3.4028235e+38 src1 max = 70.23359
src2 min = -3.4028235e+38 src2 max = 82.24581

Now it’s becoming clearer. Based on this very negative minimum elevation the metadata are not correct. To see the number of digits after the decimal point for this float32 number that is required to correctly update the metadata (16; which is another story¹), one can add '%.16e' when printing it.

print('%.16e' % src1_read.min())
print('%.16e' % src2_read.min())

-3.4028234663852886e+38
-3.4028234663852886e+38

Updating the metadata was then a simple process of manipulating the files directly (the 'r+' option allows read/write access). The Python with as structure is a handy way of opening a file, doing something, then closing it.

with rasterio.open('C:/temp/files/19_03264672.tif', 'r+') as dataset: 
    dataset.nodata = -3.4028234663852886e+38

with rasterio.open('C:/temp/files/19_03274672.tif', 'r+') as dataset:
    dataset.nodata = -3.4028234663852886e+38

Now to reopen the files and try the display again (this time also applying a nice terrain colormap).

src1 = rasterio.open('C:/temp/files/19_03264672.tif')
src2 = rasterio.open('C:/temp/files/19_03274672.tif')

fig, (axr, axg) = plt.subplots(1,2, figsize=(8,4))
show(src1, ax=axr, title='src1', cmap='terrain')
show(src2, ax=axg, title='src2', cmap='terrain')
plt.show()

That’s more like it! Checking the stats again, with the no data values excluded (masked=TRUE), now shows a reasonable range of elevations.

src1_read = src1.read(1, masked=True)
src2_read = src2.read(1, masked=True)

print('src1 min =', src1_read.min(),'src1 max =', src1_read.max()) 
print('src2 min =', src2_read.min(),'src2 max =', src2_read.max())

src1 min = 44.54896 src1 max = 70.23359
src2 min = 45.79777 src2 max = 82.24581

III. Merge

With the metadata corrected to handle the nodata values, I can properly merge the two GeoTIFFs. In the procedure below, the brackets [] are a convenient way to create a collection of source files to merge. The merge() function creates two elements, the first being the inputs merged into a single array (assigned to mosaic), and the second being coordinate system information (assigned to out_trans, used below in metadata creation).

srcs_to_mosaic = [src1, src2]
mosaic, out_trans = merge(srcs_to_mosaic)

Before moving on to exporting the merged data as a new GeoTIFF, I used numpy to create an array with the nodata values masked out for matplotlib display.

mosaic_masked = np.ma.masked_array(mosaic, mosaic == -3.4028234663852886e+38)

Just to see what a masked array looks like here is an abbreviated representation of it. The -- correspond to the no data values in the right top rows of the array.

mosaic_masked

masked_array(
  data=[[[66.03388214111328, 65.95572662353516, 66.01012420654297, ...,
          --, --, --],
         [66.01231384277344, 65.97007751464844, 65.99173736572266, ...,
          --, --, --],
         [65.99140930175781, 65.97484588623047, 66.0059814453125, ...,
          --, --, --],
         ...,
         [49.2547721862793, 49.16815948486328, 49.02363204956055, ...,
          72.78544616699219, 72.83187103271484, 72.86965942382812],
         [49.296844482421875, 49.22859191894531, 49.12592315673828, ...,
          72.798828125, 72.82592010498047, 72.88390350341797],
         [49.398563385009766, 49.25251388549805, 49.220428466796875,
          ..., 72.8497085571289, 72.8565444946289, 72.90929412841797]]],
  mask=[[[False, False, False, ...,  True,  True,  True],
         [False, False, False, ...,  True,  True,  True],
         [False, False, False, ...,  True,  True,  True],
         ...,
         [False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False]]],
  fill_value=1e+20,
  dtype=float32)

matplotlib provides a lot of useful customization for plotting, for example here I added an elevation legend horizontally below the merged data. The .squeeze() function applied to the masked array prevents a TypeError: Invalid dimensions for image data error.

fig, ax = plt.subplots(figsize=(8,6))
im = ax.imshow(mosaic_masked.squeeze(), cmap='terrain')
# ax.set_axis_off() # if the axis ticks and numbers are not desired
cb = fig.colorbar(im, orientation='horizontal', pad=0.15)
cb.set_label('elevation (m)', fontsize= 14)
plt.show()

IV. Save the merge as an image

To save the merged data as a new image file, I first copied the metadata from one of the sources and updated the relevant parameters using the merged array information.

out_meta = src1.meta.copy()

out_meta.update({'height': mosaic.shape[1],
                  'width': mosaic.shape[2],
                  'transform': out_trans})

out_meta

{'driver': 'GTiff',
 'dtype': 'float32',
 'nodata': -3.4028234663852886e+38,
 'width': 3000,
 'height': 1500,
 'count': 1,
 'crs': CRS({'init': 'epsg:26919'}),
 'transform': Affine(1.0, 0.0, 325500.0,
        0.0, -1.0, 4674000.0),
 'affine': Affine(1.0, 0.0, 325500.0,
        0.0, -1.0, 4674000.0)}

Then, I wrote the data to the new file with the metadata added. The ** adds the metadata using keyword arguments (often termed “kwargs”; see more info on them and “args” here).

out_fp = 'C:/temp/files/mosaic.tif'

with rasterio.open(out_fp, 'w', **out_meta) as dest:
     dest.write(mosaic)

Now to clean up a bit and take a peek at the new mosaic!

src1.close()
src2.close()

src3 = rasterio.open('C:/temp/files/mosaic.tif')
show(src3, title='saved GeoTIFF', cmap='terrain');

V. Visualization and wrap-up

Zooming-in to an area of interest (AOI) can easily be done in GIS software (e.g., QGIS), but it can also be done with matplotlib. The first step is to get the bounding box of the full image in the units of its projection (meters). While the following bounds function does provide this, it is not in the order that matplotlib needs (left, right, bottom, top).

src3.bounds

BoundingBox(left=325500.0, bottom=4672500.0, right=328500.0, top=4674000.0)

Happily, there is a way to easily get it through rasterio (see here).

spatial_extent = rasterio.plot.plotting_extent(src3)
spatial_extent

(325500.0, 328500.0, 4672500.0, 4674000.0)

Then by experimentation (or by looking around in QGIS first, like I did!) to get the desired bounds of the zoom, it is applied after the full image spatial extent parameter is set. The x and y axes are now showing the projected coordinates.

zoomed_extent = (325500.0, 326250.0, 4673500.0, 4674000.0)

fig, ax = plt.subplots(figsize=(6,6))
im = ax.imshow(mosaic_masked.squeeze(), cmap='terrain', extent=spatial_extent)
ax.set_xlim(zoomed_extent[:2]) # selects the left and right bounds
ax.set_ylim(zoomed_extent[2:4]) # selects the bottom and top bounds
cb = fig.colorbar(im, orientation='horizontal', pad=0.1)
cb.set_label('elevation (m)', fontsize= 14)
plt.show()

While zooming in helps to show more detail, for example here in looking at the northwest corner of the bog, there is a useful technique for creating a 3D-like representation of the surface called a hillshade (or shaded-relief) to show even more.

To create the hillshade from the mosaic I used a GDAL (Geospatial Data Abstraction Library; https://www.gdal.org/) function in a subprocess². There are a variety of parameters to set in addition to the input and output, but key ones for manipulating how it displays are the azimuth (-az) and altitude (-alt) of the hypothetical sun, which I specified to be in the northwest in the mid-afternoon.

import subprocess 

subprocess.call(['gdaldem', 'hillshade', 'C:/temp/files/mosaic.tif', \
     'C:/temp/files/mosaic_hs.tif', '-z', '1.0', '-s', '1', \
     '-az', '315.0', '-alt', '45.0', '-of', 'GTiff']);

src3.close()
src4 = rasterio.open('C:/temp/files/mosaic_hs.tif')
src4_read = src4.read(1, masked=True)

Finally, I wanted to put this all together by showing the zoomed-in hillshade next to a recent aerial photo to illustrate the tremendous amount of information the LiDAR-derived elevation data can convey.

I downloaded a leaf-on 2016 National Agriculture Imagery Program (NAIP) aerial photo using The National Map viewer (https://viewer.nationalmap.gov/basic/). Reading in the red, green, and blue bands separately allowed me to display a composite RGB image with matplotlib.

src5 = rasterio.open('C:/temp/files/naip2016_clip.tif')

src5_read_r = src5.read([1], masked=True)
src5_read_g = src5.read([2], masked=True)
src5_read_b = src5.read([3], masked=True)

It was then just a matter of getting the extent of this aerial photo too, and applying a new zoomed-in extent to both it and the hillshade image. The numpy dstack function creates an RGB composite.

spatial_extent_a = rasterio.plot.plotting_extent(src5) 

# another AOI I picked to zoom to
zoomed_extent2 = (325560.0, 326463.0, 4673049.0, 4673613.0)

fig, (axr, axg) = plt.subplots(1,2, figsize=(16,16))

axr.imshow(src4_read, cmap='Greys', extent=spatial_extent)
axr.set_axis_off()
axr.set_xlim(zoomed_extent2[:2])
axr.set_ylim(zoomed_extent2[2:4])

axg.imshow(np.dstack((src5_read_r.squeeze(), 
                          src5_read_g.squeeze(), 
                          src5_read_b.squeeze())), extent=spatial_extent_a)
axg.set_axis_off()
axg.set_xlim(zoomed_extent2[:2])
axg.set_ylim(zoomed_extent2[2:4])
fig.tight_layout()
plt.show()

The level of terrain detail in the hillshade is really amazing, with features such as the greens and sand traps on the golf course and the bog shoreline clearly shown. As a step toward modeling and mapping the bog habitat, if you look closely there is also some interesting variation in the bog’s hillshade texture corresponding to what is visible in the aerial photo that I’ll be focusing on next. Stay tuned!

¹ Why 16 digits following the decimal? It may simply have been generated that way during raster production. Here is a link that notes this exact number was obtained from a raster in ESRI binary grid format: https://github.com/OSGeo/gdal/issues/1071. There are other ways I could have dealt with this ‘no data’ issue, for example by making a numpy array and using calculations (e.g., mask all values <0), but editing the metadata seemed more efficient to me.

² As best I can understand at this point in my Python experience, key advantages of using a subprocess include not spawning a shell and accurate return codes directly from the program; see the accepted answer here in Stack Overflow: https://stackoverflow.com/questions/44730935/advantages-of-subprocess-over-os-system.

Jim Sheehan

Ecologist, GIS and remote sensing analyst, developing data scientist, and founder of Natural Data Solutions.

Exploring Ponkapoag Pond with Python GIS: Part I – Raster Data

Published by Jim Sheehan on February 2, 2019February 2, 2019

I. Loading the modules

II. Loading and examining the elevation data

III. Merge

IV. Save the merge as an image

V. Visualization and wrap-up

Jim Sheehan

0 Comments

Leave a Reply Cancel reply

Scraping and Visualizing the GISD with R

Exploring Ponkapoag Pond with Python GIS: Part I – Raster Data

Published by Jim Sheehan on February 2, 2019February 2, 2019

I. Loading the modules

II. Loading and examining the elevation data

III. Merge

IV. Save the merge as an image

V. Visualization and wrap-up

Jim Sheehan

0 Comments

Leave a Reply Cancel reply

Related Posts

Scraping and Visualizing the GISD with R