As part of the 2015 Dowling Review of research collaboration between universities and the private sector, the Royal Academy of Engineering (RAEng) explored different ways to illustrate these collaborations in the UK. Technopolis Group was commissioned to build on prior work — for the RAEng and the Engineering and Physical Sciences Research Council (ESPRC) — which tested a new approach to obtaining company mentions from unstructured text information.
Our source of information was a database of 6,642 impact case studies submitted in the UK’s latest Research Excellence Framework (REF) exercise, in which universities describe the impact of their research and their different networks and collaborations.
The information contained in these impact case studies amounts to 10,200 full pages of written text. This called for the use of automated machine-reading techniques. We used the latest version of the Name Entity Recognition (NER) classifier published by the Stanford Natural Language Processing Group.
Tagging and classifying
NER algorithms take text and tag it with specific metadata, locating and classifying expressions into different sets of predefined categories or entities. Some of these algorithms are pre-trained with large volumes of textual data and use advanced techniques to detect names in the text and classify them by type of entity. Using a 3-class model for English language, the NER classifier tags organisations, locations and personal names present in a text. As is usually the case when using these types of ‘big data’ techniques, some old fashioned manual cleaning was also put in place.
The final result is a simple but quite powerful set of word clouds identifying the companies that interact the most with universities — or are mentioned the most by universities — in the UK.