The Philanthropedia Whitepaper
This whitepaper serves two main purposes: (1) to make the case for using experts to identify high-impact nonprofits, and (2) to explain our progress towards a specific methodology to date.
Our overall conclusions are that:
- Our methodology captures expert opinion about high-impact nonprofits in different social causes.
- Using experts to identify high-impact nonprofits offers unique advantages: the information about nonprofits is of high quality, and it costs little to gather.
We begin this paper by explaining the problem Philanthropedia is trying to solve in the philanthropy sector and Philanthropedia’s solution to this problem. We then review our research methodology in detail and outline its strengths and limitations. Next, we look at our research in climate change as a case study to evaluate the results and analyze the data. Finally, we explain how we intend to modify our methodology based on the results of our work to date.
We invite you to download the whitepaper, the accompanying summary slides, and/or the executive summary. You may also look through the summary slides embedded below.
We hope you find these materials interesting and invite your feedback by contacting us at feedback@myphilanthropedia.org or leaving your comments for us below.
Discussion
I just noticed that the links broke over the weekend. I am working on fixing them now - my apologies!
I'm writing on behalf of GiveWell, which takes a great interest in Philanthropedia's work as it (a) shares a common goal and set of values with our organization; (b) could be of direct use to us in identifying high-priority organizations to investigate. As we've written before, we have major concerns with Philanthropedia's model as it stands.
I feel that this whitepaper is a commendable step toward encouraging openness and discussion around Philanthropedia's model, that it clarifies a number of issues, and that it lists promising steps for improvement.
However, I also feel that it (a) overstates what Philanthropedia has established about its model, and that it (b) does not address the single largest question in our minds: who are the "experts" and how are they chosen? This comment will focus on these two issues.
Point (a): The whitepaper overstates what Philanthropedia has established about its model.
The whitepaper states, "We have shown that our methodology can capture the relative impact of nonprofits within the same sector." This is a strong claim, and I do not believe the analysis Philanthropedia provides (Section III of the whitepaper) supports it. The analysis of Section III is as follows:
- Tests for a "good expert network." The paper observes that the set of professionals polled by Philanthropedia has a high number (139) of experts with a high average number of years of experience (13), and diverse occupations and geographic locations.
I agree with the goal of a large, diverse expert network, and I would be concerned if the set of polled professionals were highly geographically concentrated or reported low numbers for years of experience. In this sense, the tests are good ones to have run. However, serious questions remain about who the "experts" are and how relevant their experience, knowledge and biases are (more below); these tests do not address this issue.
- Tests for quality of expert reviews. The paper states, "The expert reviews are based on the strengths and areas for improvement that experts submitted in the second survey. The quality of these expert reviews varies and is much harder to systematically analyze due to its qualitative nature. Nevertheless, through interviews with representatives from some top nonprofits and other nonprofit experts we found that the expert reviews capture some of the most important characteristics of these charities." Substantially more information would be needed about these interviews - and the people with whom they were conducted - in order for any outsider, including myself, to assess this claim.
We have previously questioned the quality of the expert reviews on microfinance, noting that in many cases they appear not even to be based on opinions about charities' effectiveness in helping people, and the content of this section is not sufficient to address these concerns.
- Tests for data validity/reliability. Page 32 notes 75%+ correlations between charities' scores on different elements of Philanthropedia's survey. Again, I feel that this is a test worth running and one that could have raised concerns, but it should be noted that it is only a test for internal consistency, not external validity.
- Correlation matrix on page 34. The paper lists the top-ranked charities for climate change, along with how many total votes they received and how many votes they received from different subgroups of experts. It then examines correlations between the total number of votes a charity received and the number it received from each subgroup. It concludes: "First, the expert self-rating scale is a meaningful selection criterion as evidenced by increasing correlations for the higher self-ratings. Second, all three major groups of experts agree with the final recommendations, although foundation professionals and researchers correlate slightly lower results, as evidenced by the very high correlations for both cold calls and referrals."
I believe there is a flaw in the calculations here that leads to overstated correlations and an overstated relationship between "expert self-rating" and correlation with other voters. Since this is a relatively minor and technical issue, I've put the details in another comment immediately following this one (as a sort of "footnote" to this paragraph).
A larger issue, in my view, is that only the final-stage votes are examined: this matrix examines levels of agreement within the top 15 charities, but does not indicate to what extent the groups agree on which charities should make the top 15 in the first place. More on this immediately below.
- Correlation matrix on page 36. The paper finds low correlation between charities' ratings and basic measures of their "prominence" such as 2007 revenue, Google hits, age, # employees, and Charity Navigator metrics. It concludes that "fears that experts are influenced disproportionately by these external variables and thus vote on the basis of a 'popularity contest' are largely unfounded."
However, the matrix looks only at correlations within the top 15 charities, whereas the far more important question - it seems to me - is which charities "make the cut" of being recommended at all, and why.
I would guess that if Philanthropedia looked across the full universe of eligible charities rather than simply the top 15, it would find a very strong relationship between "prominence" measures and Philanthropedia ratings. I also don't think this would in itself be a bad thing - I expect that charities regarded highly by experts will be more likely to succeed in becoming prominent.
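The concern about looking only within the top 15 can be illustrated with a small simulation. The sketch below is not based on Philanthropedia's data: it assumes a hypothetical universe of 200 charities in which a "rating" is simply a "prominence" score plus random noise, and every name and parameter in it is made up for illustration. It compares the prominence/rating correlation across the full universe with the correlation among only the 15 top-rated charities.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def simulate_range_restriction(n_charities=200, top_k=15, trials=300, seed=0):
    """Average correlation across all charities vs. within the top_k by rating.

    Assumption (hypothetical model): rating = prominence + independent noise,
    so prominence genuinely drives ratings in the full universe.
    """
    rng = random.Random(seed)
    full_sum = 0.0
    top_sum = 0.0
    for _ in range(trials):
        prominence = [rng.gauss(0, 1) for _ in range(n_charities)]
        rating = [p + rng.gauss(0, 1) for p in prominence]
        full_sum += pearson(prominence, rating)
        # Keep only the charities that "make the cut".
        top = sorted(range(n_charities), key=lambda i: rating[i], reverse=True)[:top_k]
        top_sum += pearson([prominence[i] for i in top], [rating[i] for i in top])
    return full_sum / trials, top_sum / trials


full_corr, top_corr = simulate_range_restriction()
print(f"all charities: {full_corr:.2f}, top 15 only: {top_corr:.2f}")
```

Under these assumptions the correlation within the top 15 comes out noticeably lower than the correlation across the full universe, even though prominence drives ratings by construction. A low correlation inside the top 15 therefore says little about whether prominence influences which charities reach the top 15.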
It's also worth noting that experts could be biased toward the people they personally have good relationships with, rather than charities that are generally prominent.
I'm not advocating that Philanthropedia conduct its analysis differently (except in the one case of the table on page 34). Many of the concerns about expert quality and expert biases cannot be answered simply with statistical analysis (as outlined below, I believe that increased transparency and disclosure is a more promising path to addressing the major concerns). The point I am making is that the analysis Philanthropedia has conducted only - at best - checks for some very basic issues with the data, and should not be said to demonstrate (or even suggest) that Philanthropedia's network is high-quality or free of biases.
Point (b): the paper does not address the single largest question in our minds: who are the "experts" and how are they chosen?
As we've previously written, we are concerned about the representativeness and credibility of the professionals polled by Philanthropedia. We have advocated that Philanthropedia "publish exhaustive details of how 'experts' are defined, selected and invited." As far as I can tell, this whitepaper does not do so.
The most detailed description I see of the process for identifying experts is as follows:
We look at a number of factors to determine expertise. We look at years of experience working in the sector, job title and occupation, professional affiliations and/or academic background, and we ask experts to self-rate their expertise on a scale from 1-5, where 5 is "most" expert. Our minimum criteria are having 2 years of relevant experience and a 3 on the Philanthropedia-developed expert self-rating scale … we specifically identify and invite foundation professionals, academics/researchers, nonprofit senior staff, and policy makers interested in having a representative expert network along two dimensions: profession type and geography … We target experts in a social cause through a combination of cold calls and warm referrals (on the basis of professional and personal connections).
This description makes it clear that Philanthropedia staff themselves are exercising considerable judgment, particularly regarding what sort of experience is considered to be relevant. It does not make clear how (and how systematically/consistently) Philanthropedia evaluates and weighs the different factors listed and decides whom to contact; it does not disclose the full list of people contacted.
As stated previously, we believe it is important not only for Philanthropedia to provide a clear and detailed explanation of how it identifies experts, but to publish the names of all experts who are invited to participate in its process, whether or not they accept.
I am encouraged by the whitepaper's statement that in the future, "we will require all experts to be listed on our website if they choose to participate in the survey" - this, in my view, would be a major step forward.
Bottom line
We have major questions about the quality, credibility and representativeness of the people being polled by Philanthropedia. Until these questions are resolved, we cannot confidently describe its output as "recommendations of relevant experts."
It appears to me that Philanthropedia has attempted to address these questions primarily through statistical analysis of its output. I am skeptical that such analysis can answer our key questions, and I feel strongly that the specific analysis Philanthropedia has done does not answer these questions. I feel that a more promising path is for Philanthropedia to substantially increase its level of transparency: publish the full details of how it defines and identifies "experts," and the full list of "experts" who have been asked to provide recommendations.
Philanthropedia appears to recognize the need for more disclosure, and it has committed to taking steps toward this. I am optimistic that in the future, the level of disclosure will increase substantially, making it possible to assess the true meaning of Philanthropedia's recommendations. If so, we will then either (a) recognize the recommendations as a major resource for donors and a force for good in the nonprofit sector, or (b) start the conversation about what needs to change in order for (a) to come about.
Regarding the correlation matrix on page 34 (this is a fairly minor and technical point, which is why I've put it in a separate comment from my main thoughts):
The correlations listed are between the total number of votes and the number of votes from a given subgroup - where the latter are included in the former. Note that even if there were no relationship between the rankings of different subgroups, the observed correlations would still be positive simply because they are comparing two sets of numbers that have a substantial common element. (To give an analogy, the number of votes a political candidate receives from Democrats will almost always have a positive correlation with the total number of votes s/he receives, but this does not indicate a relationship between Democrats' and non-Democrats' votes.)
This means that (a) each correlation is likely to be "inflated" relative to what it is said to be representing (i.e., the degree of agreement between different subgroups); (b) the larger the subgroup, the more "inflated" the correlation will be. Thus, Philanthropedia's observation that experts with higher self-ratings have higher correlations with the total number of votes could simply be an artifact of the fact that there are more experts that self-rate as 5 than experts who self-rate as 4, and more experts who self-rate as 4 than experts who self-rate as 3. (This is in fact the case.)
When one examines the correlations between, for example, researchers' votes and non-researchers' votes (as opposed to between researchers' votes and total votes), the numbers and the patterns change. No subgroup is 90%+ correlated with its non-members; in most cases the correlations are between 70% and 80%. And the relationship between expert self-rating and correlation with other voters is no longer present (in fact experts who self-rate as "4" have the highest correlation with other voters, at 85%).
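The part-whole effect described above can be reproduced with a short simulation. This is a minimal sketch on made-up data, not a re-analysis of Philanthropedia's votes: it assumes two hypothetical, disjoint subgroups of 10 and 40 voters whose vote counts for each of 15 charities are drawn independently and uniformly at random, so by construction the subgroups do not agree with each other at all.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def simulate_part_whole(n_charities=15, small_size=10, large_size=40,
                        trials=500, seed=0):
    """Average correlations when two subgroups vote independently of each other.

    Assumption (hypothetical model): each subgroup's vote count for a charity
    is uniform between 0 and the subgroup's size.
    """
    rng = random.Random(seed)
    sums = {"small_vs_total": 0.0, "large_vs_total": 0.0, "small_vs_large": 0.0}
    for _ in range(trials):
        small = [rng.randint(0, small_size) for _ in range(n_charities)]
        large = [rng.randint(0, large_size) for _ in range(n_charities)]
        total = [s + l for s, l in zip(small, large)]  # total includes each subgroup
        sums["small_vs_total"] += pearson(small, total)
        sums["large_vs_total"] += pearson(large, total)
        sums["small_vs_large"] += pearson(small, large)
    return {name: value / trials for name, value in sums.items()}


result = simulate_part_whole()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

In this setup both subgroups correlate positively with the total, and the larger subgroup correlates much more strongly, while the correlation between the two subgroups themselves averages close to zero. Comparing a subgroup with its non-members, rather than with a total that contains it, removes this artifact.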
-Holden Karnofsky
www.givewell.net
Holden,
Thanks for the thoughtful comments and for reading our whitepaper. As always, we appreciate the critique as it gives us an opportunity to improve. Let me provide a few answers to keep the conversation going:
Point 1: Who are the experts?
I am surprised to see this comment, because we have addressed this issue many times, both on our blog as well as on the whitepaper. At the same time, I realize that it is a core issue for us. Let me attempt to explain one more time - our approach is very straightforward:
- Step 1: identify as many experts as possible, looking into every foundation, nonprofit, research institution, media outlet, consultancy, association, and other institution engaged in the social cause we are researching. The end result is a very long list (of, say, 400 people) comprising every foundation professional, senior nonprofit staff member, researcher, journalist, policy maker, consultant, etc. who has relevant experience and, as a result, we believe, could have an intelligent opinion about high-impact nonprofits in the space. We always try to be as inclusive as possible.
- Step 2: interview, email, and call as many of these experts as possible, and grow the list further by asking them to refer thought leaders we might have missed. We usually get another 100-200 names this way. We always try to be as inclusive as possible.
- Step 3: survey the experts, with a target to have at least 50 experts on the local level, and 80 experts on the national level. We make every effort to get as many people to participate as possible, and put special emphasis on recruiting academics and foundation professionals. After the survey is complete, we remove votes from people with less than 2 years of experience or self-rating less than 3.
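The post-survey filter in Step 3 can be sketched in a few lines. This is an illustrative sketch only, not Philanthropedia's actual code: the `Respondent` type, its field names, and the sample respondents are hypothetical, and only the two thresholds (at least 2 years of experience, self-rating of at least 3) come from the description above.

```python
from dataclasses import dataclass

MIN_YEARS_EXPERIENCE = 2  # threshold stated in Step 3
MIN_SELF_RATING = 3       # threshold stated in Step 3, on the 1-5 self-rating scale


@dataclass
class Respondent:
    """Hypothetical record for one survey participant."""
    name: str
    years_experience: float
    self_rating: int


def is_eligible(respondent):
    """True if the respondent's votes are kept after the survey closes."""
    return (respondent.years_experience >= MIN_YEARS_EXPERIENCE
            and respondent.self_rating >= MIN_SELF_RATING)


# Made-up respondents, for illustration only.
respondents = [
    Respondent("A", years_experience=13, self_rating=5),
    Respondent("B", years_experience=1, self_rating=4),   # too little experience
    Respondent("C", years_experience=6, self_rating=2),   # self-rating too low
    Respondent("D", years_experience=2, self_rating=3),   # exactly at both thresholds
]

kept = [r.name for r in respondents if is_eligible(r)]
print(kept)
```

The sketch treats both thresholds as inclusive, which matches "less than 2 years" and "less than 3" being the grounds for removal.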
I think there are three other relevant facts. First, people opt out if they don't feel qualified to answer the questions. Second, as explained in the whitepaper, we do checks after we close the surveys to make sure that the people who participated have relevant expertise - this is much easier than it sounds because we can look at their experience (which we ask for in the surveys) as a proxy for expertise, and at their answers, which are usually very obviously either high or low quality (see Point 2 below for more on this). Third, we have designed our methodology to be very "error tolerant" in the sense that making a few mistakes in the expert selection step would have no impact on the final results (that is why we have the luxury of being inclusive in earlier steps).
So, overall, we are as inclusive as we can be in inviting experts to participate, and only remove experts who obviously lack the qualifications or obviously provide low-quality responses – and so we do not exercise considerable judgment. On the basis of the above, we feel we have answered this question quite decisively - it is possible to recruit relevant social cause experts in a very methodical fashion.
On the point of transparency of this step, I largely share your comments: of course we need to provide more information to build trust and credibility. That's probably the only way to answer your question/concern in a convincing way. That is why we are redesigning our webpage to include a report card for each social cause we cover, as explained in the whitepaper. I think there are some practical limitations to some of your suggestions (e.g. we might run into privacy issues if we disclose the full list of people who we attempted to contact). I would also question the usefulness of publishing the full list of people we attempted to contact - other than perhaps to try to "pressure" these experts into participating next time around. Besides, the list would be so long, it would be completely un-actionable and useless. However, we are totally on board with the theme of disclosing more information and this will be reflected on the webpage in the next few months.
Point 2: Quality of expert reviews
Indeed, assessing the quality of this type of information is very challenging and we recognized that early on, which is why we decided to go for an alternative approach: let the data speak for itself. We could ask for testimonials/quotes from the people we consulted with or perhaps even publish interviews with them; however, we believe that a far more effective way to demonstrate the quality of expert reviews is to simply publish them.
In our opinion, despite a few examples that are arguably low quality, the vast majority of expert comments accurately describe the nonprofits' strengths and weaknesses and answer the question we asked in the survey well (it is listed in the whitepaper). At the same time, we think there is a way to improve on this point substantially. Namely, what we care about most is the evidence for impact on which the expert bases the recommendation. That is why we have changed this question substantially in our new survey versions. Combined with Philanthropedia gaining more momentum and our methods for recruiting experts getting better, we hypothesize that the quality of expert reviews will rise significantly. We are currently running four social causes here in the Bay Area with the new surveys and are very impressed with the quality of responses. Unfortunately, I cannot disclose more at this point, but we will certainly be publishing information on this in the future.
Point 3: Statistical analyses run and their limitations
I agree with the substance behind your comments: most of our analyses are what I would call "defensive" in the sense that we are checking for very basic issues that could theoretically prevent us from creating a usable set of information. We feel that this is a crucial step in the beginning since our methodology is unique and original, and we ought to make sure we are not making some basic mistakes.
On your comment about the correlation matrix on page 34, we actually performed additional tests (including some of the ones you mentioned) that we did not publish. Our goal was to establish which expert group contributed which nonprofit recommendation. I look forward to publishing this when I get a spare minute. I am mentioning this because I feel that this type of analysis answers the question discussed in the whitepaper better. In any case, I think your point that the size of the expert sub-groups is an important intervening variable is a very important one, and I will certainly take a second look. If the correlations are actually a bit lower as you mention (i.e. in the 70-80% range), that’s actually good news for us – some of our advisors warned us that our correlations are “too” high in the sense that eliminating a group of experts would not make a difference, which of course undermines the whole point of recruiting a large and diverse expert network! Thank you for pointing that out.
As for running the analyses on only the top nonprofits: this is an unfortunate limitation of our data set, as explained in the whitepaper. In the future, we plan to slightly expand the number of nonprofits that we gather information on so that we can study our results better (there are big limitations here: for example, research indicates that asking respondents to distribute percentage allocations among 25 nonprofits makes the survey impossible for most people to complete).
On the broader comment of the limitations of statistics: I could not agree more! Analyses such as the ones we ran were necessary in our opinion in order to lay a foundation upon which we can build our methodology. Our approach going forward would be to make more information available so that partners such as GiveWell can study the organizations and/or our research more in-depth, which is what you guys do so well.
In conclusion, we don’t see the whitepaper as the “last word” on our methodology, but rather as an important foundational step. As such, there are bound to be many questions left unanswered. At the same time, we have made significant changes to our methodology and are seeing substantially improved results with our new research. We would love to use this discussion to learn more about what kind of things to study once these new data sets are available in a few weeks.
More comments welcome! Thank you again Holden and Elie for taking the time!
A short note: you can use HTML to format your response (e.g. <strong></strong> to make something bold). Initially, the HTML will not be active, but I periodically review comments and enable it if there are no problems (this limitation is due to security issues, my apologies).