Monitoring & Evaluation
In this section we provide an overview of why we attach so much weight to the monitoring and evaluation of our projects at the OAD. It covers:
- Basic understanding of monitoring and evaluation
- Defining key terms
- Reasons to evaluate
- Constraints on evaluation
- When not to evaluate
Research has shown that interventions often run into unexpected barriers and can produce unexpected negative, as well as positive, impacts. As a result, non-profit organisations and other intervention providers have come under increasing pressure from funders and other stakeholders to provide information about their performance. Monitoring and evaluation refer to a combination of activities and procedures used to effectively measure, report and learn from project performance.
Monitoring and evaluation are considered good practice for organisations engaged in social intervention because they enable providers to:
- determine whether a project successfully produced desired outcomes
- understand how a project worked or did not work
- check for unforeseen negative consequences that require attention in future
- identify which activities produce the largest or most important positive impacts
- identify which strategies are most cost-effective (i.e. resource efficient) in achieving project goals
- assess how meaningful, sustainable, accessible and relevant a project was for participants and other stakeholders
- learn from experience how to improve and build on past projects
- demonstrate impact to others.
Monitoring refers to indicators used throughout the life of a project to measure progress and record how a project was designed, implemented and delivered. Monitoring data capture whether a project is implemented in a way that is consistent with its design and whether any unanticipated barriers were encountered (for example, no one applied to take part in the project). The monitoring phase of a project also provides a framework through which to track progress and identify problems early on, thus offering a way through which corrective action and improvements can be made ‘on the ground’ and recorded without compromising project objectives.
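To make the idea concrete, monitoring data such as planned versus actual delivery can be recorded in a simple structure so that deviations are flagged early. The sketch below is purely illustrative: the `Indicator` class, its tolerance rule and the workshop figures are invented for this example, not an OAD tool.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    """One monitoring indicator: a planned target vs. the value recorded so far."""
    name: str
    planned: float
    actual: float = 0.0

    def on_track(self, tolerance: float = 0.1) -> bool:
        # On track if within `tolerance` (10% by default) of the planned target.
        return self.actual >= self.planned * (1 - tolerance)


# Hypothetical indicators for a small workshop project
indicators = [
    Indicator("participants enrolled", planned=40, actual=12),
    Indicator("sessions delivered", planned=8, actual=8),
]

# Flag indicators that have fallen behind so corrective action can be taken early
flagged = [ind.name for ind in indicators if not ind.on_track()]
print(flagged)  # → ['participants enrolled']
```

Here the enrolment indicator falls outside the tolerance, signalling that corrective action is needed before the project drifts further from its design.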
Evaluation focuses on “what works”, as well as how, for whom and under what conditions. The key difference between evaluation and monitoring is that evaluation is about using data, including monitoring and post-project outcome data, in order to draw conclusions about project impact. Different types of evaluation are used to address different dimensions of project impact and effectiveness (see Table below for a summary of evaluation types). Different research designs are used for different types of evaluation; and different types of evaluations are of interest to different stakeholders. It is considered good practice to conduct an impact or outcome evaluation in combination with an economic evaluation wherever possible; however, these two types of evaluation are also the most technically challenging.
Types of Evaluation
|Evaluation Type|Focus Questions|
|---|---|
|Impact (aka Summative) Evaluation|Did the project achieve its target outcomes, and to what extent?|
|Process (aka Formative) Evaluation|How, for whom and under what conditions did the project work?|
Astronomy for development is concerned with activities that involve people rather than stars. That is to say, Astro4Dev projects are projects that seek to affect human development, not achieve scientific objectives. Astro4Dev projects are thus classed as “social interventions”: interventions, policies, practices or programmes that seek to improve social welfare outcomes by addressing social, economic, health, psychological, educational or behavioural problems.
Diverse examples of social interventions include humanitarian relief after natural disasters; HIV prevention campaigns; ‘sin’ taxes on cigarettes; reductions in classroom sizes to improve education outcomes; drug treatment in prisons to reduce reoffending; and after-school clubs to improve childhood school performance and reduce delinquency. Even though the ‘social problems’ that they address are not always made explicit, all of the OAD projects conducted to date can be classed as social interventions too.
Evaluation is discussed widely across fields of social intervention, but everyone seems to mean something slightly different by it and there is no consensus on a technical definition. The Gates Foundation uses the following definition:
Evaluation is the systematic, objective assessment of an ongoing or completed intervention, project, policy, program, or partnership. Evaluation is best used to answer questions about what actions work best to achieve outcomes, how and why they are or are not achieved, what the unintended consequences have been, and what needs to be adjusted to improve execution.
There are two main types of evaluation, focused on different questions listed in the definition above:
- Impact (aka Summative) Evaluation, which focuses on whether and to what extent target outcomes were achieved (i.e. measuring impact)
- Process (aka Formative) Evaluation, which focuses on how, for whom and under what conditions a project works
Evaluation is typically paired with monitoring (hence the phrase “Monitoring and Evaluation”, sometimes abbreviated to “M&E”).
Monitoring focuses on determining how a project was delivered and whether this deviated from its original plan or design. Monitoring can be used to identify obstacles to implementation (e.g. under-budgeting; the need for more staff training etc.) and ensure accountability (e.g. reduce misallocation of funds or even deliberate corruption in aid organisations).
The results of both kinds of evaluation (impact and process) and of project monitoring are combined to provide lessons for future projects and improve practice over time.
Evaluations are often required by funders (e.g. by nearly all government funds for education and development projects). There are, however, many more reasons to conduct evaluations. These include:
- Cost-effectiveness: Determining whether a project affected its intended outcomes, and the size of those effects, makes it possible to select between different projects and ensure that limited resources are allocated to those that are most effective.
- Unintended consequences: Human beings are complex and social systems are even more so. Social initiatives conducted over the past century (and most people’s personal experience!) provide substantial evidence that even the simplest interventions and efforts to help others can have significant and far-reaching unintended consequences. Sometimes these are good, but it is surprisingly easy to cause inadvertent harm. Evaluation provides a way to check for harm and mitigate the risk of repeating and/or ‘scaling up’ an approach that causes unforeseen harm.
- Improving practice over time: Another consequence of social complexity is that it can be difficult to design a project that has a large impact. It usually takes time and a ‘learning cycle’ to arrive at an approach that really works. Evaluation enables lessons about what works to be incorporated into future practice so that project designs improve and become more effective over time (this is the learning cycle embedded in the OAD Impact Cycle).
- Contributing to knowledge and practice: Evaluation offers lessons for others about what went well, what worked and what did not. Since evaluation findings can be shared, evaluations can be used to improve not only OAD project designs and practice but also broader understanding of what works in related fields (e.g. effective techniques for science communication), and to improve the design of similar projects conducted by other organisations and actors (e.g. other science unions, AstroEdu, UNAWE etc.).
- Demonstrating impact to stakeholders: For a project to be sustainable, it must be supported by all key stakeholders: the people who deliver the project (e.g. OAD project leaders and teams); those who fund it (e.g. the OAD, the IAU, Kickstarter donations etc.); those who participate (e.g. students, teachers); and any other affected communities or parties (e.g. school districts). Evaluations give stakeholders evidence that the project team takes its objectives seriously and prioritises achieving positive outcomes (i.e. by taking the risk that an evaluation could show that the project does not work). If an evaluation shows positive results (or once a project has been improved to the point where it does), these results can be used to build trust with target participants and attract support for the project’s continuation.
- Increasing funding and scale of delivery: Most large international funders will not allocate significant resources to any intervention that has not been demonstrated to bring about positive outcomes. Organisations that require rigorous impact evaluation before scaling up include the Gates Foundation (the largest private philanthropic foundation in the world), UN agencies, Oxfam, and most government aid departments such as USAID and the UK’s DFID.
Evaluations are particularly important to conduct when projects are:
- using innovative methods that have not been tested
- operating in a domain where knowledge is lacking about what works
- working on problems whose mechanisms are poorly understood
- attempting to change behaviours, attitudes or social structures
- trying to affect outcomes that are difficult to observe
For example, suppose an OAD project proposal focuses on improving gender inclusion in the sciences. Such a project would probably be a good candidate for a built-in evaluation. We know that there are multiple interacting causes of disparities in gender representation in the sciences and that these causes are likely to vary over time and across contexts. We do not know of any highly effective solutions to these causes, and probably do not know what all the causes are. Furthermore, there have been several high-profile examples of projects that sought to enhance girls’ entry into science and failed to do so, or actually had the opposite effect (e.g. ‘Science: It’s a Girl Thing!’). An evidence-based design (incorporating what is known) combined with a carefully designed evaluation framework would help mitigate the risk of causing unforeseen harm and ensure that the project contributed to evolving knowledge and practice in this area. At best, the project would be found to be effective and could be adopted by others; at worst, it would contribute to our understanding of what does (and does not) work and thus help future projects design more effective approaches.
Evaluation is not always appropriate, and ideal evaluations are rarely possible. Evaluations are likely to be particularly limited for smaller projects, where a large, expensive measurement exercise would end up dwarfing the costs of the project itself.
In determining whether and how to evaluate a project, the following constraints should be kept in mind:
- Cost: International development agencies, funders and organisations typically allocate approximately 10-20% of each project’s budget to monitoring and evaluation. While this might seem like a lot, evaluation can enable much larger gains over time by ensuring that the most effective projects are identified and scaled up while the most ineffective (and any harmful!) projects are abandoned.
- Burden on implementation: This is a particularly serious concern for OAD projects, which are often delivered by volunteers in their spare time. Ideally, evaluation design and data collection would be conducted by a separate (independent) team. This has the advantage of reducing the burden on those implementing the project and reducing the risk of conflicts of interest (since those who design or implement a project are likely to be invested in its success!). In reality, it is often impossible to budget for or engage independent evaluators.
- Restricting outcomes: When outcomes are difficult to measure, there is a risk that attention will be paid only to what is measurable. If measures are poorly designed or inadequate, this can create problems; even when measures are good, they must necessarily focus on a narrow selection of outcomes, and other outcomes of interest may then be ignored. For example, the selection of science & research indicators may influence policymakers to focus on areas of science output that are easily (or currently) measured rather than on outputs that are harder to measure or that have more restricted (local) relevance (see a blog on this topic by the LSE).
Taken together, these constraints mean that (where an evaluation is appropriate) monitoring and evaluation activities should be embedded in the project and data collection burdens minimised. This can mean streamlining reporting and auditing requirements, limiting the number of outcomes measured, and limiting the kind of data collected (e.g. the level of detail about participants). Technological solutions should also be considered wherever possible.
It is also important to be aware that these constraints mean that a single evaluation is unlikely to answer all questions about a project. For example, the first randomized controlled trial (RCT) conducted by the OAD was an impact evaluation of in-school astronomy activities for children. To limit costs and implementation burdens, only five questions were asked, all of which concerned the children’s social identities and attitudes towards in-group and out-group members. These questions were based on the project’s core theory. While the results did not support the theory, many questions remained unanswered: Did the activities increase interest in science? Did the students’ intergroup attitudes change in other ways not captured by those questions? It could be that the project had positive effects in domains that were simply not measured. The experiment will thus need to be repeated with different outcome measures that capture all of the ‘intended outcomes’ the OAD community expects, and with more open-ended explorations to identify possible unexpected outcomes, before we can reach conclusions about whether it ‘worked’ in a broader sense.
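To illustrate the kind of analysis behind such an impact evaluation, the sketch below runs a simple permutation test on the difference in mean scores between a treatment and a control group. The data and the `permutation_test` function are invented for illustration; they are not the OAD trial’s actual questions, scores or results.

```python
import random
from statistics import mean


def permutation_test(treated, control, n_perm=10_000, seed=1):
    """Two-sided permutation test on the difference in group means.

    Returns the approximate p-value: the share of random relabellings
    whose mean difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly relabel who was "treated"
        diff = mean(pooled[:n]) - mean(pooled[n:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / n_perm


# Invented 1-5 attitude scores for one survey question
treated = [4, 5, 3, 4, 4, 5, 3, 4]   # children who did the activities
control = [3, 4, 3, 3, 4, 2, 3, 3]   # children who did not
p = permutation_test(treated, control)
print(p)
```

A small p-value would suggest the observed difference is unlikely to arise from random assignment alone. A permutation test is a reasonable default for data like these because it makes no distributional assumptions about the 1-5 attitude scores.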
Not all OAD projects need to be, or even should be, evaluated; for example, projects that:
- are not clearly defined as a replicable project
- are too complex to evaluate as a whole (evaluate replicable components, not complex combinations)
- do not aim to change observable outcomes (e.g. projects aiming to improve inspiration, which is not an observable outcome)
- test no clear hypothesis
Evaluation is a low priority when the results of our efforts are easily observable. It is also a low priority when our projects are conducting basic scientific research, developing but not distributing products or tools, or creating new data sets or analyses. In such cases, our projects’ self-reported progress data and existing protocols (such as those governing clinical trials) provide sufficient feedback for decision making and improvement.