Where is That Facility?

By Casey J. McLaughlin

Regulating pollutants requires knowing about the pollutants but also where they are.  When President Nixon established the Environmental Protection Agency in 1970, he took parts of several agencies and combined them into a new and expanded agency tasked with regulating environmental pollutants. One might think after 40 years that we would be a well-oiled machine, but it is more complicated than that.  Self-reporting, split jurisdictions between federal and state, and functions driven by different legislation, mandates and funding sources have combined to cause some confusion. Without going into the why, regulated facilities have multiple hands touching multiple data multiple times in different ways for different purposes.  The Facility Registry System (FRS) integrates data across the programs into one source of comprehensive environmental information about federally regulated facilities.

The Facility Registry System (FRS) is the geospatial component of the Envirofacts database.  Envirofacts integrates a variety of facility information from multiple databases into one searchable place.  In other words, Envirofacts is a single point of access to search U.S. EPA environmental data.  With those two systems introduced, let us look at some different questions and how FRS helps.

Question 1: How many facilities are near my house?

I believe this is one of the main questions Envirofacts and FRS answer.  The web search interface groups the facilities together and users can see how many total sites are nearby.  The interface links users with various reports about inspections, compliance notes and more.  Very useful.
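A count like this boils down to a distance test against each facility's point location. The sketch below is only an illustration and not how Envirofacts or FRS work internally: the `facilities_near` function, the record fields, and the coordinates are all made up, and it assumes the haversine great-circle formula with a mean Earth radius is accurate enough for a "near my house" search.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed good enough for short distances)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def facilities_near(home, facilities, radius_km):
    """Return the facilities whose point location lies within radius_km of home."""
    home_lat, home_lon = home
    return [f for f in facilities
            if haversine_km(home_lat, home_lon, f["lat"], f["lon"]) <= radius_km]

# Made-up records for illustration only
facilities = [
    {"name": "Facility A", "lat": 39.10, "lon": -94.58},
    {"name": "Facility B", "lat": 39.11, "lon": -94.60},
    {"name": "Facility C", "lat": 38.63, "lon": -90.19},
]
home = (39.10, -94.59)
nearby = facilities_near(home, facilities, radius_km=5.0)
print(len(nearby), [f["name"] for f in nearby])
```

A real search would run against the full registry and probably use a spatial index rather than checking every record, but the question being asked is the same.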

Question 2: Suppose the Arch in St. Louis has fallen into the river, what facilities would be affected?

I would immediately open my mapping software/Geographic Information System (GIS) and select the 3 points adjacent to the Arch (2 blue dots on the shore and a third in the river, a little north of the Arch).  While it looks like 3 results, the identify tool returns 5!  The Jefferson National Expansion Memorial has 3 points that are on top of each other.  Spatially they are the same, but three different programs track the facility.  Still, this is very useful information because it focuses the response by directing responders to specific names and locations.  They know what they should be looking for. During our response to floods in the summer of 2011, we strategically sent people into facilities identified as “at-risk” based on their proximity to flood-prone areas and at-risk levees.  Find out more at: http://www.epaosc.org/site/site_profile.aspx?site_id=7103
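The "3 dots, 5 results" surprise is easy to reproduce: several program records can share one coordinate pair. Here is a minimal sketch of grouping records by location. The site names, program names, coordinates, and the `group_colocated` helper are all hypothetical, and rounding to 5 decimal places (roughly a meter) is an assumed tolerance, not an FRS rule.

```python
from collections import defaultdict

def group_colocated(records, decimals=5):
    """Group records whose coordinates match after rounding to `decimals` places."""
    groups = defaultdict(list)
    for rec in records:
        key = (round(rec["lat"], decimals), round(rec["lon"], decimals))
        groups[key].append(rec)
    return dict(groups)

# Hypothetical records: five results drawn at three map locations
records = [
    {"site": "Riverfront site", "program": "Program A", "lat": 38.62460, "lon": -90.18480},
    {"site": "Riverfront site", "program": "Program B", "lat": 38.62460, "lon": -90.18480},
    {"site": "Riverfront site", "program": "Program C", "lat": 38.62460, "lon": -90.18480},
    {"site": "Shoreline facility", "program": "Program A", "lat": 38.62610, "lon": -90.18350},
    {"site": "River facility", "program": "Program B", "lat": 38.62800, "lon": -90.18100},
]

for (lat, lon), recs in group_colocated(records).items():
    programs = sorted(r["program"] for r in recs)
    print(f"({lat}, {lon}): {len(recs)} record(s) from {programs}")
```

Rounding is a blunt tool (two points a hair apart can land in different buckets), so treat this as a quick check rather than a substitute for a proper spatial join.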

Question 3:  Where are the smokestacks for facility x?

Ah, here comes the rub with FRS (one of them, anyway).  Envirofacts provides one location per facility and does not contain “sub-facility” information!  This data does exist, but getting it isn’t straightforward and it isn’t part of the core FRS dataset yet.

The Facility Registry System is a rich source of information.  Like just about every database, it is imperfect and doesn’t answer every question.  It has been a major undertaking modernizing a bunch of databases developed long ago by many different people with many different purposes.  Envirofacts gets the best data the Agency has into a useful format for internal and external users.  The process is in progress and sees frequent improvements. (YAY!)

Casey McLaughlin is a first generation Geospatial Enthusiast who has worked with EPA since 2003 as a contractor and now as the Regional GIS Lead. He currently holds the rank of #1 GISer in EPA Region 7’s Environmental Services Division.

Editor's Note: The opinions expressed herein are those of the author alone. EPA does not verify the accuracy or science of the contents of the blog, nor does EPA endorse the opinions or positions expressed. You may share this post. However, please do not change the title or the content. If you do make changes, please do not attribute the edited title or content to EPA or the author.

EPA's official web site is www.epa.gov. Some links on this page may redirect users from the EPA website to a non-EPA, third-party site. In doing so, EPA is directing you only to the specific content referenced at the time of publication, not to any other content that may appear on the same webpage or elsewhere on the third-party site, or be added at a later date.

EPA is providing this link for informational purposes only. EPA cannot attest to the accuracy of non-EPA information provided by any third-party sites or any other linked site. EPA does not endorse any non-government websites, companies, internet applications or any policies or information expressed therein.

Getting a Good Spatial Location

By Casey McLaughlin

How can we get an accurate spatial location for mapping property sampling?  Is an address enough? Is collecting a latitude/longitude location enough?  EPA samples properties for a variety of reasons, but for the sake of this post, let’s assume we are sampling several properties adjacent to a recent chemical fire.  Samplers get property access by knocking on doors and having owners sign an access agreement.  After completing their work, a sampler will have a field sheet, an address, and often a latitude and longitude (from GPS).  Field work done!

The job is not quite finished, however, until we turn that information into actionable intelligence – information we can use!  For me, the best way to use the field information is to first see it displayed in a map.  Others may see this data in other formats, but a map is a great tool for understanding the situation.  Does a map show us that the location data collected in the field is good enough?  Let’s look at this example to find out:

Figure 1

Figure 1 highlights three pieces of information that we like having: parcel lines, GPS locations, and the final points.  The parcel lines can be a great source of information and look great for making maps, but are not always available for all projects.  I prefer starting a project with parcels and narrowing the relevant area down, but that is not usually possible.  What I often get from field staff are either addresses or latitude/longitude coordinates.  In my opinion, both have their issues, as shown by the blue dots on the map. Look at blue points B and C.  The GPS location could have been taken from the end of the driveway where there are fewer trees, but on the map I might find it difficult to know which polygon is B and which is C.  The white B and C points clearly illustrate their relationship to each residential dwelling.

Figure 2

Take note of how zooming the map out (Figure 2) will quickly cause overlapping points.  Also note that only three samples were taken (perhaps not everyone was home), therefore it would be easy to overlook the other properties in the cul-de-sac that may need sampling.  Matching points with an authoritative properties dataset can easily show us which properties may still need sampling.
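Matching points to parcels is, at its core, a point-in-polygon test. The sketch below uses the standard ray-casting method on made-up square parcels in a local planar coordinate system. The parcel ids, sample ids, and helper names are assumptions for illustration, and real parcel data would need a proper projection and most likely a GIS library.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def match_samples_to_parcels(samples, parcels):
    """Map each parcel id to the sample ids that fall inside it."""
    matches = {parcel_id: [] for parcel_id in parcels}
    for sample_id, (x, y) in samples.items():
        for parcel_id, polygon in parcels.items():
            if point_in_polygon(x, y, polygon):
                matches[parcel_id].append(sample_id)
                break  # a point belongs to at most one parcel
    return matches

# Hypothetical parcels as simple squares in local planar coordinates (meters)
parcels = {
    "parcel-1": [(0, 0), (30, 0), (30, 30), (0, 30)],
    "parcel-2": [(30, 0), (60, 0), (60, 30), (30, 30)],
    "parcel-3": [(60, 0), (90, 0), (90, 30), (60, 30)],
    "parcel-4": [(90, 0), (120, 0), (120, 30), (90, 30)],
}
samples = {"A": (10, 12), "B": (45, 20), "C": (100, 5)}

matches = match_samples_to_parcels(samples, parcels)
unsampled = [pid for pid, ids in matches.items() if not ids]
print(matches)
print("Still need sampling:", unsampled)
```

Note that a GPS point taken at the end of a driveway could fall outside every parcel, which is exactly the kind of mismatch this check would surface.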

There are a number of mechanisms for determining spatial locations, including GPS, mobile phones, surveying, and addresses. Each can be used, but choosing the right method should be determined by considering both the immediate field needs and long-term project needs such as creating general site maps, point maps, area maps, or public information maps.

Casey McLaughlin is a first generation Geospatial Enthusiast who has worked with EPA since 2003 as a contractor and now is the Regional GIS Lead. He currently holds the rank of #1 GISer in the EPA Region 7’s Environmental Services Division.

Keep in Touch with Your Past

By Casey McLaughlin

What happens to collected data?  Every day, the world is stockpiling vast amounts of data, and often this data has a spatial component tying it to a specific location on the earth.  For environmental data, the spatial location is a fundamental component.  Geospatial location defines the context of a single piece of data and its relationship with other nearby data. The general growth in data may seem obvious in our increasingly technical world (smart phones are data-making machines).  We are very focused on collecting data about where we are right now (real-time sensors, crowd sourcing, traffic counts).  Historical data can and should place newer data in context.

EPA, formed in the early 1970s, has collected and maintained a mountain of environmental data.  I am familiar with our region’s cleanup program records; along with tabular data (sampling dates, results, QA/QC information), we also collect spatial data in the form of latitude/longitude coordinates, addresses, aerial surveys, sampling plans, and excavation maps.  Unfortunately, keeping data is not simple.  Technology has continuously improved.

Wikipedia Cassette Tape Image

Tell me about this music? Who recorded it? Where is my favorite song?

Data collection techniques and sensors have advanced.  Systems have changed.  Data and technology have evolved so that retrieving data from an old database technology may not be possible.  How can I get data from a 5 ¼” floppy disk?  Think about having a stack of old cassette tapes.  How would you get songs into your digital library (easily)?  Imagine the label has worn off 100 cassettes – how would you know where a specific song was located on just one of those tapes?

Old data technology is just one of many complications to using historical and modern data.  Over the last 15 years, EPA has put forth a tremendous effort in digitizing reports.  Certainly electronic record-keeping is superior to paper records?  Don’t get me wrong, electronic records are crucial (hello, I’m an information specialist!). Not all electronic formats, however, are appropriate for data analysis.  Can you run a correlation between the data in an archived (paper or scanned) report and new data?  What about using a FoxPro database stored on a 3 ½” diskette?  (This happened to me a few years ago and it took me a while to FIND a machine with a disk drive.)  How can managers compare post-excavation sampling locations (often represented on multiple versions of a map) with a proposed residential development?  What do you do when you encounter an over-sized map which is now in the electronic record as an “Unscanned Item”?

Using data from the past with data from the future is not trivial and geospatial data has some special considerations beyond technology.  Resolution of aerial imagery may be coarse, but without a time machine, we don’t have the option of re-acquiring imagery from 1950 (talk to me later about a T.A.R.D.I.S.)!  Geo-referencing old photos or CAD drawings can be problematic because of projection complications (oh, what a GIS topic!) and lack of reproducible local control points.  Avoiding the gruesome details, properly dealing with old geospatial data requires some thought and expertise.

Keeping data is not the same as being able to use data in the future.

Most every EPA project report has an element of spatial data.   Having a map electronically may not be good enough for using it with new data in the future.  Properly acquiring spatial data helps make informed decisions now.  Properly maintaining and storing spatial data will help make future decisions better.

Casey McLaughlin is a first generation Geospatial Enthusiast who has worked with EPA since 2003 as a contractor and now as the Regional GIS Lead. He currently holds the rank of #1 GISer in EPA Region 7’s Environmental Services Division.

Healthy Waters…there could be an app for that!

By Christina Catanese
Find out how to submit your App for the Environment! Are you a mobile app developer? Do you know one? Well, now is your chance to show us what you can do with EPA data on a mobile device!

EPA’s Apps for the Environment Challenge is a contest that puts your tech-savvy to the test.  EPA challenges you to find new ways to combine and deliver environmental data in a mobile app.  You can use EPA data by itself, or combine it with other environmental and health data to make a useful resource for individuals or communities.  Besides addressing one of EPA Administrator Lisa Jackson’s Seven Priorities, the only limit to what you can create is your own imagination!  You have until September 16, 2011 to submit your application, and you can get all the details here.

Not a coder but think you might have the next big idea for an environmental app?  There’s a place for your input.  Visit EPA’s Data and Developer Forum to submit your idea for an app, as well as submit comments or questions about EPA’s existing apps, data resources, and data sets.  The brainstorming has already started, so check out the ideas for apps that others have had to get inspired!

We’d like to challenge you one step further and encourage you to come up with an app that uses water data about the Mid-Atlantic region.  That’s right, we’re talking about a Healthy Waters App! There are lots of places to find data about the waters of our region.  EPA and state websites have loads of interesting data, including water quality monitoring and assessment, Total Maximum Daily Loads (TMDLs), permitted facilities, non-point source projects, drinking water sources and facilities, beach sampling, and clean water grants, just to name a few!  You might also find interesting water data from other federal agencies, like the USGS, Forest Service, National Park Service, or CDC.  What other sources of water data can you think of?

We’d love to hear from you in our own comments section: how would your Healthy Waters App use Mid-Atlantic data?  I’m no computer scientist, but if I could make a Healthy Waters App, I think I would make one where I could type in my address (or let my cell phone GPS determine my location) and have it tell me where my drinking water comes from, any consumer confidence reports the facility has issued, what watershed I’m currently in, and any impairments that nearby waterbodies have, all shown on a map of course.  Or maybe I would want it to tell me where the nearest EPA-funded water project is.  Or maybe I would want to have mobile beach advisory alerts, so I knew when and where it was safe to go for a swim.  Or maybe…

Well that’s enough from me!  Tell us about the Healthy Waters app that you would make, and get cracking on your code to submit your app to the challenge!

About the Author: Christina Catanese has worked at EPA since 2010, and her work focuses on data analysis and management, GIS mapping and tools, communications, and other tasks that support the work of Regional water programs. Originally from Pittsburgh, Christina has lived in Philadelphia since attending the University of Pennsylvania, where she earned a B.A. in Environmental Studies and Political Science and an M.S. in Applied Geosciences with a Hydrogeology concentration. Trained in dance (ballet, modern, and other styles) from a young age, Christina continues to perform, choreograph and teach in the Philadelphia area.

Science Wednesday: BP Oil Spill Data Tools – Part II

Each week we write about the science behind environmental protection. Previous Science Wednesdays.

For the past eight weeks I’ve had the privilege of being involved in a small slice of EPA’s coordinated response to the tragedy of the BP oil spill. Spending time in the Public Information Officers (PIO) room of the Emergency Operations Center (EOC) here in Washington, DC, has only furthered my resolve that this is an Agency where people truly live the mission of protecting public health and the environment. Part of that dedication is a commitment to sharing the information and environmental data we have on the EPA’s BP spill website.

Since oil began pouring into the Gulf of Mexico, EPA has collected thousands of samples for chemicals related to oil and dispersants in the air, water and sediment. Jeffrey Levy’s blog post last week mentioned how the principles of open government and transparency govern our actions here as we post the EPA’s  air, water, and sediment sampling and air monitoring data as quickly as possible.

On the website, we’ve focused on providing data as well as presenting EPA’s interpretation of it. Up until now, one way we’ve been providing the data is in chunks in .CSV files (a generic file that any spreadsheet program can read) or in a PDF spreadsheet. That’s pretty good, but we can do better. So we’re pretty excited to be offering a few new tools that offer increased flexibility and options for people to access the data. Last week, Jeffrey mentioned Socrata and Google Earth, and today we’re announcing a new tool that gives you the ability to download data based upon criteria you select. You can choose the date range you want, choose between air monitoring data and data from sampling efforts (air, sediment, surface water, waste, or oil sample results from mousse, oily debris, tar, and weathered oil), and select either all of the states in which we’re gathering data (Alabama, Florida, Louisiana, and Mississippi) or just one of them.
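For readers who still have one of the .CSV files, the same kind of criteria-based selection can be approximated locally. The following is a rough sketch only: the column names (`sample_date`, `state`, `medium`, `analyte`, `result`) and every row are invented for illustration and are not the layout or contents of EPA's actual files, and `filter_rows` is a hypothetical helper rather than part of the download tool.

```python
import csv
import io
from datetime import date

# Made-up rows; the column names are assumptions, not the real file layout
SAMPLE_CSV = """sample_date,state,medium,analyte,result
2010-05-03,Louisiana,air,analyte-1,0.2
2010-05-10,Alabama,sediment,analyte-2,4.1
2010-06-01,Louisiana,surface water,analyte-2,1.3
2010-06-15,Florida,air,analyte-3,0.5
"""

def filter_rows(csv_text, start, end, media=None, states=None):
    """Return rows dated within [start, end], optionally limited by medium and state."""
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        sampled = date.fromisoformat(row["sample_date"])
        if not (start <= sampled <= end):
            continue
        if media is not None and row["medium"] not in media:
            continue
        if states is not None and row["state"] not in states:
            continue
        selected.append(row)
    return selected

rows = filter_rows(
    SAMPLE_CSV,
    start=date(2010, 5, 1),
    end=date(2010, 6, 10),
    media={"air", "surface water"},
    states={"Louisiana"},
)
for row in rows:
    print(row["sample_date"], row["state"], row["medium"], row["analyte"], row["result"])
```

Leaving `media` or `states` as `None` means "no restriction", which mirrors the idea of selecting all states or just one.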

Download-tool

This is the first version, and in the coming weeks we’ll be adding features to the data download tool, such as searching by chemical, chemical category, or county. We will be phasing out posting of the spreadsheets because we believe that putting you in the driver’s seat for how to sort and organize the data is a better way to share it. We welcome your ideas for future versions and encourage you to visit the sampling and monitoring data download tool, try it out, and share your feedback on ways we can improve it. We’ll work to incorporate as many of the suggestions as we can, so we’re hoping to see an active and constructive discussion in the comment section below so we can improve this tool together.

About the author: When not serving in the Emergency Operations Center, Melissa Anley-Mills is the news director for EPA’s Office of Research and Development. She joined the Agency in 1998 as a National Urban Fellow.

Science Wednesday: OnAir – Huge Datasets Pose Challenges but Hold Promise

During a recent visit to Harvard, I sat down with Francesca Dominici, a biostatistician and former director of the Johns Hopkins Particulate Matter Research Center.

Dominici confessed that she has spent much of her time at Harvard thus far figuring out how to transfer, store and manage all of the data that has accumulated over years of research.

How hard could it be to move data, I wondered?

Her projects at Hopkins included a national study showing hospital admissions and mortality associated with exposure to air pollution particles.

“We’re using all data on particulate matter and particulate matter composition for every single monitoring station in the United States from the first date it has been available up until 2007.”

This includes years’ worth of ambient air data from every zip code in the country.

To get information on human health effects, Dominici uses Medicare data, including “every hospitalization for every person older than 65,” amounting to over 48 million subjects.

In all, the data (which continue to grow) add up to seven terabytes, Dominici said.

How much is a terabyte? It would take 1,000 1-gigabyte flash drives to hold a terabyte. Now, imagine 7,000 of those flash drives, and you can wrap your mind around how much data Dominici has on her hands.

As a way to cope with the mass of information, Dominici explained that it helps to pick and choose what data to work with at any given time. She compared the process to using a storage closet, where you can put away winter clothes during the summer months and take them out again when it gets cold.

“The good news… is that you don’t need to manage it dynamically, all at once,” she said.
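One common way to work with data that will not fit in memory all at once is to process it in batches and keep only running totals. The sketch below illustrates that general idea and is not a description of Dominici's actual workflow: the `running_mean_by_chunk` function, the `station` and `pm25` column names, and the generated stand-in file are all assumptions.

```python
import csv
import io

def running_mean_by_chunk(lines, column, chunk_size):
    """Compute the mean of one numeric column while holding at most chunk_size values in memory."""
    reader = csv.DictReader(lines)
    total, count = 0.0, 0
    chunk = []
    for row in reader:
        chunk.append(float(row[column]))
        if len(chunk) == chunk_size:
            total += sum(chunk)
            count += len(chunk)
            chunk = []  # put this batch "back in the closet"
    # Fold in the final partial chunk, if any
    total += sum(chunk)
    count += len(chunk)
    return total / count if count else None

# Stand-in for a large file; a real run would pass an open file object instead
fake_file = io.StringIO("station,pm25\n" + "\n".join(f"S{i},{i % 10}" for i in range(1000)))
print(running_mean_by_chunk(fake_file, "pm25", chunk_size=100))
```

The same pattern extends to other summaries that can be built up incrementally, such as counts or sums per station.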

Despite the challenges of handling and analyzing such a vast amount of information, Dominici thinks the efforts will be fruitful.

“I have high confidence in the national study because I can see real improvements in getting sharper results as more data becomes available,” she said.

One study using the data, published in the Journal of the American Medical Association (JAMA), showed that causes of death and hospitalization related to air pollution differed in different parts of the country. “Cardiovascular risks tended to be higher in counties located in the Eastern region of the United States,” the study reported.

As analysis continues, other questions about air pollution risks will be answered. For now though, Dominici is neck deep in data, and it seems she likes it that way.

“As a statistician, I really like to do this because I can have an impact,” she said.

“Going from seven terabytes of data to estimates that have an impact on policy… it’s very, very satisfying.”

About the Author: A student contractor with EPA’s Office of Research and Development, Becky Fried is a regular “Science Wednesday” contributor.

I Know That Data Is Here Somewhere…

I have to admit that after 15 years of working at EPA, I still have trouble finding environmental data. Web searches don’t help that much so I rely on people like my friend Tim to email me data about hazardous waste. But I shouldn’t have to know every database manager to get EPA’s data.

It turns out that I’m not the only person with this problem. Last year EPA’s Office of Environmental Information hosted the National Dialogue on Access to Environmental Information to learn about the information access needs of our major audiences. We held listening sessions throughout the country and encouraged people to comment using blogs and wikis. From the thousands of comments we received we developed EPA’s Information Access Strategy, which describes key themes and a direction for EPA to address these needs. One of the common themes was: we need environmental data, but we don’t know where to find it. In response to these comments we’ve built Data Finder, a single place to find EPA’s data sources, so people can access and understand environmental information.

Screenshot of Data Finder homepage

Data Finder points to data sources: EPA-hosted web sites where numerical data can be downloaded. You can find data sources by clicking on key words or by typing terms into a search box. One click brings you to the source itself. By making EPA’s data easier to find, understand, and use, Data Finder complements the Obama Administration’s commitment to a transparent and participatory government. It helps lay the foundation for more open conduct of Agency business and broader, more effective participation by the public.

I think Data Finder is a good first step for finding EPA’s data, but I know it only contains a subset of the data that’s out there. Please try Data Finder and tell us what information you’d like to see and how to make the site more useful. We’ll post your comments and tell you how we’re updating the site in response to your comments. And let’s leave Tim out of this.

About the author: Ethan McMahon has worked at EPA since 1994. Most recently he helped develop the Agency’s Information Access Strategy and the 40-page Report on the Environment: Highlights Document. Prior to working at EPA he evaluated alternative refrigerants and designed high efficiency heat pumps. Ethan believes that making information available can enable lots of people to find solutions to environmental problems.

Why Data About Data Matters

photo of field gear scattered around

About the author: Molly O’Neill is EPA’s Assistant Administrator for the Office of Environmental Information and Chief Information Officer.

In my first job after college, I was an environmental biologist/analyst. I spent some of that time taking surface water, sediment, groundwater, soil, and biological samples in the field. Of course, I followed the EPA standard sampling procedures and believe me they are quite extensive – 14-hour days were common and much of that time was to ensure that the quality of the sample was not compromised. There is a lot of documentation that goes along with each sample taken. After those long days in the field, I used to think, does all this documentation really make a difference?

Last week I participated in a listening session with a stakeholder group as part of the National Dialogue for Access to Environmental Information. One of the important themes that kept coming up during the discussion was the necessity of having access to quality data. This means that the data sample and results are not compromised and that the information about the data sample is not lost or forgotten along the way. For example, a community may take water samples at a local beach at a specific place and time, and then post the results to a website. These results are then consumed by other interested parties and made available to the public in a variety of ways. The data about the data, or “metadata”, doesn’t always travel with the data set, and therefore secondary users of this data may draw the wrong conclusions. In this case, without the time and place recorded with the sample, someone might assume that a local beach is currently contaminated when that is no longer accurate.
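One way to picture "metadata traveling with the data" is a record that cannot be created without its time, place, units, and method. This is a minimal sketch with made-up values. The `BeachSample` fields, the `is_current` helper, and the two-day freshness cutoff are all assumptions for illustration and do not represent any EPA standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class BeachSample:
    """One result bundled with the metadata a secondary user needs to interpret it."""
    location: str
    collected_at: datetime
    analyte: str
    value: float
    units: str
    method: str

def is_current(sample, as_of, max_age=timedelta(days=2)):
    """True if the sample is recent enough to describe present conditions (assumed cutoff)."""
    return timedelta(0) <= as_of - sample.collected_at <= max_age

# Made-up sample for illustration only
sample = BeachSample(
    location="Hypothetical Beach, north access",
    collected_at=datetime(2008, 7, 1, 9, 30),
    analyte="Indicator bacteria",
    value=120.0,
    units="count per 100 mL",
    method="hypothetical field method",
)

print(is_current(sample, as_of=datetime(2008, 7, 2, 9, 0)))   # checked the next morning
print(is_current(sample, as_of=datetime(2008, 8, 15, 9, 0)))  # checked weeks later
```

A secondary user who receives only `value` has no way to run a check like `is_current`, which is the risk described above.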

Along that same theme, there was concern that while new mapping tools allow almost anyone to grab data sets (including some of EPA’s) and plot them on a map, combining data sets doesn’t always make sense. Data Set A + Data Set B doesn’t necessarily = Conclusion C. These are good cautions, and the takeaway for me was that while providing access to data is good, providing access to the metadata is equally important. We also need to invest in describing each data set and why it is collected.

Getting back to my first job and the question about whether the documentation with a sample is important, you bet the answer is yes! If you have comments on how we might enhance access to environmental information, please check out our National Dialogue web site.
