Non-profits & IRS 990s

US non-profit organizations (NPOs) are legal entities organized and operated for a collective, public, or social benefit. Many different types of organizations may operate as non-profits, including charities, business associations, churches, social service providers of various sorts, consumer cooperatives, private schools, and even some political organizations. Non-profits recognized as tax-exempt are generally exempt from federal income tax, but most must still file an annual “informational” return, the Form 990, with the Internal Revenue Service. As a matter of federal law, these informational returns are public records, freely available to anyone who wants to see them. They can be a rich source of information about how an NPO is funded and how it spends its money, especially when combined with other public records. This blog post outlines the rationale for my own recent look into what can be learned about an individual NPO from its IRS 990 data. Or you can skip straight to the bottom for links to the R code and to a draft case report created with it.

GuideStar, itself a non-profit organization, has built its business model around being the “Premier destination for nonprofits and nonprofit research.” That is, it provides research services about non-profits to whoever needs that sort of information, and it makes specific use of IRS data in profiling NPOs. Non-profit IRS filings are also a rich source of information for news organizations: browse the archives of any significant newspaper and you will find stories that depend on 990 data. So, how might you make this trove of information work for you? The first challenge is to get hold of IRS 990s for the organizations you’re interested in.

One convenient place to get 990s is the ProPublica Nonprofit Explorer. You have to search for each NPO you are interested in and download its forms individually, one tax year at a time. The form for a given NPO and tax year is available either as a scanned PDF, if it was filed on paper, or as an XML file, if it was filed electronically. All NPOs have been required to file electronically since 2019.
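Because an e-filed return is just XML, pulling a few line items out of one takes only a short script. The sketch below is in Python rather than the R used for the linked report, and it is an illustration only. The element names `CYTotalRevenueAmt` and `CYTotalExpensesAmt` and the `http://www.irs.gov/efile` namespace are assumptions based on recent IRS e-file schema versions; tag names have changed between schema years, so check them against the files you actually download. The sample document is a made-up fragment, not a real return.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://www.irs.gov/efile}IRS990' -> 'IRS990'."""
    return tag.rsplit("}", 1)[-1]


def extract_fields(xml_text: str, fields: list[str]) -> dict[str, str]:
    """Return the text of the first element whose local name matches each field.

    Matching on local names sidesteps the namespace. Fields that do not
    appear in the return are simply left out of the result.
    """
    root = ET.fromstring(xml_text)
    found: dict[str, str] = {}
    for elem in root.iter():  # document order
        name = local_name(elem.tag)
        text = (elem.text or "").strip()
        if name in fields and name not in found and text:
            found[name] = text
    return found


# Made-up fragment; element names are assumptions, not a verified schema.
sample = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnData>
    <IRS990>
      <CYTotalRevenueAmt>1000</CYTotalRevenueAmt>
      <CYTotalExpensesAmt>900</CYTotalExpensesAmt>
    </IRS990>
  </ReturnData>
</Return>"""

print(extract_fields(sample, ["CYTotalRevenueAmt", "CYTotalExpensesAmt"]))
# → {'CYTotalRevenueAmt': '1000', 'CYTotalExpensesAmt': '900'}
```

Matching on local names rather than fully qualified tags keeps the lookup short. The trade-off is that it would not distinguish two identically named elements in different namespaces, which is unlikely within a single return.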

The second data source used in this project is grant-level data from the National Institutes of Health. The NIH Exporter web tool gives access to information about every individual grant the NIH has made, including the direct and indirect funds paid to the grantee each year for each grant. The data files can be downloaded for each fiscal year since 1985, in either CSV or XML format.
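Once the yearly CSV files are downloaded and combined, summarizing one grantee's funding is a small aggregation. The following is a minimal Python sketch, not the author's R code. It assumes the column names `FY`, `ORG_NAME`, `DIRECT_COST_AMT` and `INDIRECT_COST_AMT`, which you should check against the headers of the files you actually download. The sample rows and organization names are made up.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict


def funds_by_year(csv_text: str, org_name: str) -> dict[int, tuple[int, int]]:
    """Sum direct and indirect costs per fiscal year for one organization.

    Column names (FY, ORG_NAME, DIRECT_COST_AMT, INDIRECT_COST_AMT) are
    assumptions about the export format, not a verified schema.
    """
    totals: dict[int, list[int]] = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Case-insensitive match on the grantee's name.
        if row["ORG_NAME"].strip().upper() != org_name.upper():
            continue
        year = int(row["FY"])
        # Blank amounts are treated as zero.
        totals[year][0] += int(float(row["DIRECT_COST_AMT"] or 0))
        totals[year][1] += int(float(row["INDIRECT_COST_AMT"] or 0))
    # Return (direct, indirect) pairs in fiscal-year order.
    return {year: (d, i) for year, (d, i) in sorted(totals.items())}


# Made-up rows for illustration only.
sample = """FY,ORG_NAME,DIRECT_COST_AMT,INDIRECT_COST_AMT
2019,EXAMPLE LAB,100000,50000
2019,EXAMPLE LAB,20000,
2019,OTHER UNIVERSITY,999,1
2020,EXAMPLE LAB,30000,15000
"""

print(funds_by_year(sample, "example lab"))
# → {2019: (120000, 50000), 2020: (30000, 15000)}
```

Matching on the organization name is the simplest filter, but names can vary across years in real data. A stable identifier column, if the export provides one, would be a more reliable key.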

In highlighting what can be learned from these data sources, I use Haskins Laboratories, a 501(c)(3) non-profit organization, as a case study. Why Haskins Labs? First, because there is both IRS data and NIH grant data available for the Labs. As a non-profit organization, it is required to file an IRS 990 each year, and has been filing those documents electronically since 2014. Electronic filing is significant because it means that the 990 data is available in XML format, making it easy to access programmatically. Form 990s that have been filed as paper documents are only available as scanned PDF files, often not of very good quality. This makes data extraction error prone and tedious (but not impossible!). With regard to grant data, Haskins is a research organization and has received significant NIH research funding since at least 1970. So, there is a wealth of information about the grants that NIH has made to Haskins over the years.

There are certainly other NPOs that get NIH funding, but most of them are large and complicated organizations (e.g., Yale University). Haskins, on the other hand, is small enough that these data sources can reveal something about its financial history without too much effort. That said, I’m really just scratching the surface of what is available in its 990s and NIH grant history.

Another reason for using Haskins records in this exercise is more personal. I was a senior scientist on the Haskins research staff from 2002 until 2019, so I am curious about what nuggets of historical interest can be found in these public data sources. Certainly, there are other accounts of the lab’s history that are more detailed and more human than anything gleaned from these federal records. The initial work on this project was done in late summer of 2021, when the most recent IRS 990 available was for the 2019 tax year. NIH data, by contrast, are both more up to date and go back further in time. Together, the IRS and NIH data might provide an interesting complement to other views of Haskins’ history, but constructing an overview of that history is not really my goal. I’m simply using my connection to the Labs as one source of motivation for this project.

The real point of this exercise is to pick the lowest-hanging fruit from all that might be gathered from these two federal sources. I am not trying for a comprehensive account of what might be found there, and there are certainly other sources beyond these two that could be incorporated; see the linked report below for some thoughts on that. The draft report is best thought of as a rough, simple overview of the source data, not tailored to any particular goal. One can easily imagine projects based on either data source that would examine in depth how individual organizations change over time, compare groups of organizations, or both.

This document is a work in progress and is likely to remain so. I returned to it briefly in spring 2022 to do some minor cleanup and prepare a web release to accompany the original PDF report. I hope to get back to it later in 2022, once the IRS releases Form 990s for the 2020 tax year (public release tends to lag submission to the IRS by about 18 months). The most current version of the report will ‘always’ be available as a web page and as a PDF file. If you really want to get into the weeds, you can grab the R code used to create the report from GitHub.

If this data doodle piques your curiosity, or if you have questions about the process, feel free to reach out to me. I am always happy to talk shop.