Data preparation

The data in this report was downloaded from Researchfish in April 2018 and is correct as of that date. Because the data in the system is live, researchers may change their outputs in future, and the data could also change as a result of further cleaning or mapping. AMRC does not have the in-depth knowledge of the awards that the individual charities have, so the dataset was cleaned only at a high level and the data used is based purely on what the principal investigators entered into Researchfish. Any seemingly obvious outliers were explored further by AMRC, either by searching online databases or by contacting the funder. Additional steps taken to prepare, clean and supplement the dataset are outlined below.

Data cleaning

  • Test awards or any awards without submitted outcomes were removed.
  • Some output types ask for specific dates (Publications, Spin outs and Further Funding). In these cases, outputs were removed if they were reported as occurring before the start date of the award they were attributed to.
  • Unpublished publications were removed.
  • At one time, the ‘spin outs’ section was used to collect information on collaborations with large companies as well as information on spin outs from the PI's own lab. In this report, collaborations that had been reported as spin outs were removed where possible.
  • Further Funding instances from the same organisation as the award it was attributed to were removed.
  • An output was often attributed to more than one award, from the same funder or from different funders. These were de-duplicated where possible when calculating unique numbers for the high-level analyses (for example, the total number of publications produced by all funders). Duplicate outputs from the same principal investigator were excluded by removing any outputs with identical output IDs (assigned to the output by Researchfish). However, it is far harder to identify identical outputs that were entered by multiple principal investigators, as these will have different output IDs.
  • For the breakdowns of the outputs by different categories (e.g. health category), outputs were not de-duplicated. This means that often the sum of the number of outputs in categories will be higher than the figure for the total number of outputs.
  • If no month was given with the year of an output, January was used as the default. This is consistent with the approach of other funders who have analysed Researchfish data.
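
Three of the cleaning steps above (defaulting a missing month to January, removing outputs dated before the award start, and counting unique outputs by output ID) can be sketched as follows. This is a minimal illustration only, not the procedure AMRC actually ran: the record layout, the field names and the sample values are hypothetical, and comparing dates at month level is an assumption.

```python
from datetime import date

# Hypothetical records; field names are illustrative, not the Researchfish schema.
outputs = [
    {"output_id": "P1", "award_id": "A1", "year": 2015, "month": 6},
    {"output_id": "P1", "award_id": "A2", "year": 2015, "month": 6},     # same output, second award
    {"output_id": "P2", "award_id": "A1", "year": 2012, "month": None},  # predates award A1
    {"output_id": "P3", "award_id": "A2", "year": 2014, "month": None},  # month missing
]
award_start = {"A1": date(2013, 4, 1), "A2": date(2014, 1, 1)}


def output_month(record):
    """Return (year, month) for an output, defaulting a missing month to January."""
    return (record["year"], record["month"] or 1)


def drop_pre_award(records, starts):
    """Keep outputs dated in or after the month their award started (assumed granularity)."""
    kept = []
    for r in records:
        start = starts[r["award_id"]]
        if output_month(r) >= (start.year, start.month):
            kept.append(r)
    return kept


def unique_outputs(records):
    """Count distinct output IDs, so an output linked to several awards counts once."""
    return len({r["output_id"] for r in records})


cleaned = drop_pre_award(outputs, award_start)
print(len(cleaned), unique_outputs(cleaned))
```

In this toy data, three award-output links survive the date check but only two distinct outputs remain, which is why category breakdowns that are not de-duplicated can sum to more than the unique total.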

Categorisation and supplementation

  • Locations were standardised and countries were added based on the GRID ID of the organisation or institution where relevant (for example RHT, Collaborations, Further Funding, Next Destination).
  • Organisations were categorised according to sector for relevant outputs (Collaborations, Further Funding, Next Destination).
  • When funders upload awards to Researchfish, it is not mandatory to assign them a grant type classification (e.g. project, PhD studentship). Where possible, AMRC has added these classifications based on data previously submitted to AMRC by funders. The classifications were also mapped onto the relevant “project”, “people” or “infrastructure” category. If no classification was available, awards were coded as unknown. In total, 112 awards (1.7%) had a grant type of unknown.
  • The HRCS coding of awards was taken from several data sources. Where possible, coding was taken from the 2014 UKCRC HRCS report dataset, as these codes have been double-coded to ensure quality control. Some funders coded the awards themselves; awards that were not coded by the funder were autocoded through Uber Research. In total, 92% of the awards were assigned health category codes and 79% of the awards were assigned research activity codes. The remaining awards were either outside the Health Research Classification System or could not be coded because they had insufficient information.
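
The fallback to "unknown" for awards without a grant type classification can be sketched as below. The lookup table and the sample awards are hypothetical and are included only to make the example runnable; the report does not give the actual mapping from grant types to the “project”, “people” and “infrastructure” categories.

```python
# Hypothetical lookup from grant type to broad category; the real mapping is not given here.
CATEGORY_BY_GRANT_TYPE = {
    "project": "project",
    "PhD studentship": "people",
}


def categorise(grant_type):
    """Return the broad category, or "unknown" when no classification is available."""
    if grant_type is None:
        return "unknown"
    return CATEGORY_BY_GRANT_TYPE.get(grant_type, "unknown")


def share_unknown(grant_types):
    """Percentage of awards coded as unknown, rounded to one decimal place."""
    categories = [categorise(g) for g in grant_types]
    return round(100 * categories.count("unknown") / len(categories), 1)


# Illustrative awards only: one of the four has no grant type.
awards = ["project", "PhD studentship", None, "project"]
print(share_unknown(awards))
```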

Additional notes

  • Due to rounding of numbers and percentages, figures may not always sum to the stated totals or to 100%.
  • If further funding amounts were given in other currencies, Researchfish automatically applied the exchange rate in effect at the point at which the output was entered into Researchfish. The equivalent amount in Pounds Sterling is given in the download field [Funding Organisation Further Funding Value].
  • When calculating time since award start date, only the calendar year was used. This means that an award that started on 31 December 2013 is counted as starting in 2013, and outputs collected in May 2016 are counted as 3 years after the award start date.
  • Collaborations can be made up of multiple partners. For this report, each separate partner is counted as a ‘collaborator’.
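
Two of the notes above, the calendar-year calculation of time since award start and the effect of rounding on totals, can be illustrated with a short sketch. The function name and the sample counts are hypothetical.

```python
def years_since_start(award_start_year, output_year):
    """Difference in calendar years; month and day are ignored."""
    return output_year - award_start_year


# Example from the note above: award starts 31 December 2013, outputs collected May 2016.
print(years_since_start(2013, 2016))  # prints 3

# Rounding: three equal categories each round to 33%, so the rounded shares sum to 99%.
counts = [1, 1, 1]
rounded = [round(100 * c / sum(counts)) for c in counts]
print(rounded, sum(rounded))  # prints [33, 33, 33] 99
```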