Discussion: combining and weighing information from different domains and approaches
There is a substantial body of research on the motivators of charitable giving (see literature reviews here), but the evidence is often far from definitive. For a variety of reasons, the most practical issues have not been explored systematically. Academics have tended to focus on broader theoretical questions about human behavior in general, or on questions relevant to government policy. Charities and fundraisers have often relied on conventional industry wisdom and anecdotal evidence. Where they have conducted studies and A/B tests, these have not always been done rigorously or in depth, and the results have not always been shared with the broader community. To some extent charities, like organizations in any “consumer” sector, are in competition with one another; on the other hand, many idealistic third-sector professionals do share broad goals. External consultants and platforms who advise charities and facilitate fundraising have an incentive to overstate their expertise and the extent to which “they have the answers”.
Where we do have evidence, it often comes from studies scattered across disciplines (economics, psychology, marketing, sociology, data science, etc.), with different methodologies and approaches (laboratory and field experiments with real giving or in-lab public goods games, real-world trials, observational data, survey-based or hypothetical choices, etc.), targeting different domains (e.g., student populations, likely givers, MTurk participants, and different charities and cause areas). Experimental methods vary greatly, as do the assumptions underlying observational studies and the general statistical and inferential approaches. For many issues we will find creditable work claiming different answers to the same question. Some work may find a positive effect, some a negative effect, and some will claim a null result (which often merely reflects an underpowered design). Other work may find heterogeneous results (by domain, population, psychometrics, etc.) or mediators of an effect; here we should be cautious that “fishing” and multiple comparisons may overstate significance.
In putting these results together, some judgment is required. Which studies should we credit and which should we ignore? When is it meaningful to combine evidence from different domains? In doing so, e.g., in “averaging” estimated effects (and considering the probability bounds on this average), how much weight should we give each study?
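To make concrete what such a weighted “average” might look like, here is a minimal sketch (in Python) of inverse-variance pooling of study estimates. The study labels and numbers are invented purely for illustration, and this is only one of several possible weighting schemes.

```python
import math

# Hypothetical per-study estimates of the effect of an appeal on the average
# donation (in dollars), each with a standard error. Invented for illustration.
studies = {
    "lab_experiment":      (1.20, 0.60),   # (estimate, standard error)
    "field_trial":         (0.40, 0.25),
    "online_panel_survey": (0.90, 0.45),
}

# Inverse-variance ("fixed-effect") pooling: each estimate is weighted by
# 1 / SE^2, so more precise studies count for more in the combined estimate.
weights = {name: 1.0 / se ** 2 for name, (_, se) in studies.items()}
total_weight = sum(weights.values())
pooled = sum(w * studies[name][0] for name, w in weights.items()) / total_weight
pooled_se = math.sqrt(1.0 / total_weight)

# Approximate 95% confidence interval ("probability bounds") on the pooled effect.
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect: {pooled:.2f}  (95% CI {low:.2f} to {high:.2f})")
```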
Statisticians and applied researchers have paid considerable attention to these issues of “meta-analysis” and information synthesis. This work is highly relevant to medical and health research, to assessing the impact of policy interventions, and to many other areas. Theoretical results and prescriptions, e.g., for optimal Bayesian updating, generally require strong assumptions and abstraction. Real-world application of these methods will always involve some judgment calls and compromise. (See GiveWell.org for a particularly relevant example.)
For the Innovations in Fundraising project, we will continually consult and communicate with experts in the field on the most widely accepted and justifiable approaches. Where subjective judgment is necessary, we will be explicit about this, and we will poll and consult creditable experts and try to elicit their “best guesses”. However, we will avoid making subjective judgments and follow clear, objective rules whenever possible. We will also make the “default” rules and assumptions clear to the user, and provide links explaining why we made these choices. We will allow the user to adjust the assumptions, and give some guidelines on what they might want to consider in doing so. We avoid subjective choices for several reasons: subjectivity limits the scalability and reproducibility of this project, it can lead to perceptions of unfairness and alienate people from getting involved, and it makes the project less transparent and less broadly acceptable.
For example, a key issue will be how much weight to give to a particular class of studies, e.g., unpublished working papers, or experimental results based on hypothetical choices. Suppose a fundraiser for the Against Malaria Foundation wants to predict the impact of explicitly pre-covering overhead costs on the average “warm list” donor’s contribution, in order to decide how strongly to pursue this approach. The meta-estimate (the impact on the average contribution, in dollars) will depend on which studies we include and how much weight we give to each. As a default, we might include all published studies and working papers by authors who have published in this area, involving real charitable donations and between-participant comparisons of two appeals. The user could choose to exclude studies involving donations to non-international or non-health charities. By default, coefficients could be combined using a random-effects model that allows for true heterogeneity across studies (with the fixed-effect approach available as a variation), yielding an estimated effect and a 95% confidence interval. Our default might also be to downweight unpublished “gray literature” studies, or studies that have not been pre-registered or replicated. In doing this, we will state the downweighting explicitly, explain our general reasoning and how we chose the particular weights, and allow the user to adjust them (with guidelines for doing so).
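As a rough illustration of the kind of calculation behind such defaults, the sketch below pools invented study estimates with a DerSimonian-Laird random-effects model and applies a simple multiplicative downweighting to “gray literature” studies. The study names, numbers, and the 0.5 downweighting factor are assumptions made up for this example, and the variance formula under downweighting is only a heuristic; a real analysis would rely on an established package (e.g., the R packages reviewed by Polanin et al., 2016, cited below).

```python
import math

# Invented studies of "explicitly covering overhead costs" appeals: effect on
# the mean donation in dollars, its standard error, and whether the study is
# an unpublished working paper ("gray" literature). Illustrative numbers only.
studies = [
    {"name": "published_field_A", "effect": 1.10, "se": 0.40, "gray": False},
    {"name": "published_field_B", "effect": 0.30, "se": 0.35, "gray": False},
    {"name": "working_paper_C",   "effect": 1.80, "se": 0.70, "gray": True},
]

def pool(studies, random_effects=True, gray_weight=1.0):
    y = [s["effect"] for s in studies]
    v = [s["se"] ** 2 for s in studies]
    w = [1.0 / vi for vi in v]                 # inverse-variance (fixed-effect) weights
    fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    tau2 = 0.0
    if random_effects:
        # DerSimonian-Laird estimate of the between-study variance tau^2
        q = sum(wi * (yi - fe) ** 2 for wi, yi in zip(w, y))
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(studies) - 1)) / c)
    # Re-weight, additionally multiplying gray-literature weights by gray_weight.
    # (Treating the downweighted sum of weights as a precision is a heuristic,
    # not a rigorous variance formula.)
    w_star = [(gray_weight if s["gray"] else 1.0) / (vi + tau2)
              for s, vi in zip(studies, v)]
    est = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return est, (est - 1.96 * se, est + 1.96 * se)

# Default: random effects, counting unpublished studies at half weight;
# the fixed-effect, no-downweighting variant is the second call.
print(pool(studies, random_effects=True, gray_weight=0.5))
print(pool(studies, random_effects=False, gray_weight=1.0))
```

The fixed-effect variant simply sets the between-study variance to zero; exposing both, along with the downweighting factor, is the kind of user-adjustable default described above.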
How should these weights be chosen? The relevant question is how much we should expect evidence from one domain, obtained with one approach, to be informative about our domain of interest. This is obviously a difficult problem. We can seek empirical answers, at least in part, by measuring the past reliability and consistency of such results, particularly in cases where we have a “gold standard” for comparison (see, e.g., LaLonde, 1986). The weights can then be based on the extent to which previous results in one domain have carried over to the other, and the extent to which one type of evidence has replicated and had predictive value. Where such evidence is absent, we need to look to statistical and social-science (economics, psychology, etc.) theory and accepted practice, and survey the relevant community of scholars and practitioners to determine which evidence to include and which to exclude or downweight.
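One very simple way such calibration could be operationalized is sketched below: for past cases where both “proxy” estimates (lab or hypothetical-choice studies) and a field “gold standard” exist, score each evidence type by its historical prediction error and weight it in proportion to the inverse of that error. The cases, numbers, and the inverse-MSE rule itself are illustrative assumptions, not an established procedure.

```python
# Hypothetical calibration: for topics where we have both "proxy" estimates
# (a lab study and a hypothetical-choice study) and a field "gold standard",
# score each evidence type by how badly it has missed the field result in the
# past, and weight in proportion to the inverse of that error.
past_cases = [
    # (lab_estimate, hypothetical_estimate, field_gold_standard), in dollars
    (0.8, 1.5, 0.6),
    (0.2, 0.9, 0.3),
    (1.1, 2.0, 0.9),
]

def mean_sq_error(proxy_index):
    errors = [(case[proxy_index] - case[2]) ** 2 for case in past_cases]
    return sum(errors) / len(errors)

inverse_mse = {"lab": 1.0 / mean_sq_error(0),
               "hypothetical": 1.0 / mean_sq_error(1)}
total = sum(inverse_mse.values())
relative_weights = {k: v / total for k, v in inverse_mse.items()}
print(relative_weights)   # here the lab evidence would get the larger share
```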
Update: Recent applied work by Eva Vivalt on RCTs in developing countries seems particularly relevant; we aim to incorporate her approaches.
References and discussion of specific issues in meta-analysis
Wikipedia provides a decent general discussion of meta-analysis and its key issues (although its choice of which “new methods” to highlight seems somewhat arbitrary). Below, we give some specific work, focusing on its relevance to this project.
Various articles and texts give general rules, tips, and theoretical justifications for how to do this:
- “…Non-technical primer for conducting a meta-analysis to synthesize correlational data” [1]
- "Wheat from chaff: Meta-analysis as quantitative literature review." (Broad discussion for economists, see also responses in same journal)
How do we think about combining …
- Studies with outcomes measured in different units. How can we reasonably “standardize” these? (One common option, the standardized mean difference, is sketched after this list.)
- Studies with different types and magnitudes of interventions? Does the “average effect size” even exist in a meaningful way? (See Datacolada on this.)
- “…When Studies Contain Multiple Measurements” [2]; e.g., a single study may report the effect on whether a participant donates, the average donation, her stated intention to give in the future, and coefficients with or without controls; these are not independent measures.
- Studies that are run in succession, with an implicit or explicit stopping rule? (How) should the reported significance be adjusted? [3]
- How to adjust for a potential “file drawer” problem and publication bias? Datacolada argue that this is not a serious problem (though the argument is somewhat implicit) and that the proposed solutions do not work. Other approaches involve imputing or adjusting for the likely “missing data”.
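On the first question above (outcomes measured in different units), one common though imperfect answer is to convert each study’s result to a standardized mean difference such as Hedges’ g. The sketch below applies the standard textbook formulas to invented numbers; whether such standardized effects are truly comparable across very different interventions is exactly the concern raised in the second bullet.

```python
import math

def hedges_g(m_treat, sd_treat, n_treat, m_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference (Hedges' g) for a two-group comparison."""
    df = n_treat + n_ctrl - 2
    pooled_sd = math.sqrt(((n_treat - 1) * sd_treat ** 2 +
                           (n_ctrl - 1) * sd_ctrl ** 2) / df)
    d = (m_treat - m_ctrl) / pooled_sd        # Cohen's d
    correction = 1 - 3 / (4 * df - 1)         # small-sample (Hedges) correction
    return d * correction

# Two invented studies with outcomes in different units; once standardized,
# both are on the same (standard-deviation) scale and can in principle be pooled.
print(hedges_g(12.0, 8.0, 150, 10.5, 7.5, 150))    # donation amount, in dollars
print(hedges_g(0.34, 0.47, 400, 0.29, 0.45, 400))  # donation incidence (0/1 outcome)
```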
Measuring whether there is “evidentiary value” in a set of reported/published findings
- Funnel plots (and trim-and-fill, Orwin's fail-safe N, etc.); a minimal plotting sketch follows this list
- The P-curve approach [4]
- Replication projects
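As a small illustration of the first of these diagnostics, the sketch below draws a funnel plot from invented study results; a sparse corner of small, imprecise, null-or-negative studies is the usual informal warning sign of possible publication bias.

```python
import matplotlib.pyplot as plt

# Invented study results: estimated effects and their standard errors.
effects    = [0.10, 0.25, 0.40, 0.55, 0.70, 0.20, 0.65]
std_errors = [0.05, 0.12, 0.20, 0.28, 0.35, 0.10, 0.33]

# A funnel plot shows each study's effect (x-axis) against its precision
# (here the standard error, on an inverted y-axis). Absent publication bias,
# points should scatter roughly symmetrically around the pooled effect.
fig, ax = plt.subplots()
ax.scatter(effects, std_errors)
ax.invert_yaxis()                 # most precise studies at the top
ax.set_xlabel("Estimated effect")
ax.set_ylabel("Standard error")
ax.set_title("Funnel plot (illustrative data)")
plt.show()
```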
Broad practical meta-analyses
- AidGrade, which synthesizes impact evaluations of interventions in developing countries and offers a ‘build your own meta-analysis’ interface
See also: “What works to increase charitable donations? A meta-review with meta-meta-analysis” (Noetel et al., preprint)
Specific academic meta-analysis examples
Charitable giving
Andrews, K.R. et al., 2008. The legitimization of paltry favors effect: A review and meta-analysis. Communication Reports, 21(2), pp.59–69.
Bolkan, S., & Rains, S. A. (2017). The Legitimization of Paltry Contributions as a Compliance-Gaining Technique: A Meta-Analysis Testing Three Explanations. Communication Research, 44(7), 976–996. https://doi.org/10.1177/0093650215602308
Butts, M. M., Lunt, D. C., Freling, T. L., & Gabriel, A. S. (2019). Helping one or helping many? A theoretical integration and meta-analytic review of the compassion fade literature. Organizational Behavior and Human Decision Processes, 151, 16–33. https://doi.org/10.1016/j.obhdp.2018.12.006
Lee, S., & Feeley, T. H. (2017). A meta-analysis of the pique technique of compliance. Soc. Influ., 12(1), 15–28. https://doi.org/10.1080/15534510.2017.1305986
Lee, S., Moon, S.-I., & Feeley, T. H. (2016). A Meta-Analytic Review of the Legitimization of Paltry Favors Compliance Strategy. Psychological Reports, 118(3), 748–771. https://doi.org/10.1177/0033294116647690
Economics: Beliefs and attitudes
Behavior in lab social dilemmas
Johnson, N.D. & Mislin, A.A., 2011. Trust games: A meta-analysis. Journal of Economic Psychology. Available at: http://www.sciencedirect.com/science/article/pii/S0167487011000869.
Lane, T., 2016. Discrimination in the laboratory : A meta-analysis of economics experiments. European Economic Review, 90, pp.375–402. Available at: http://dx.doi.org/10.1016/j.euroecorev.2015.11.011.
Oosterbeek, H., Sloof, R. & Van De Kuilen, G., 2004. Cultural differences in ultimatum game experiments: Evidence from a meta-analysis. Experimental Economics, 7(2), pp.171–188.
Policies and practical treatments
Cadario, R. & Chandon, P., 2017. Which Healthy Eating Nudges Work Best? A Meta-Analysis of Field Experiments. , (July). Available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3090829. [1]
Polanin, J.R., Hennessy, E.A. & Tanner-Smith, E.E., 2016. A Review of Meta-Analysis Packages in R. Journal of Educational and Behavioral Statistics, 42(2), pp.206–242. Available at: http://jeb.sagepub.com/cgi/doi/10.3102/1076998616674315.
Sagarin, B.J., Ambler, J.K. & Lee, E.M., 2014. An Ethical Approach to Peeking at Data. Perspectives on Psychological Science, 9(3), pp.293–304.
Simonsohn, U., Nelson, L.D. & Simmons, J.P., 2014. P-curve: a key to the file-drawer. Journal of Experimental Psychology: General, 143(2), p.534.