Technopolis Group has used machine-reading algorithms to classify and analyse text in a huge dataset — with the aim of illustrating the landscape of research collaborations between universities and companies in the UK.

As part of the 2015 Dowling Review of research collaboration between universities and the private sector, the Royal Academy of Engineering (RAEng) explored different ways to illustrate these collaborations in the UK. Technopolis Group was commissioned to build on prior work — for the RAEng and the Engineering and Physical Sciences Research Council (ESPRC) — which tested a new approach to obtaining company mentions from unstructured text information.

Our source of information was a database of 6,642 impact case studies submitted in the UK’s latest Research Excellence Framework (REF) exercise, in which universities describe the impact of their research and their different networks and collaborations.

The information contained in these impact case studies amounts to 10,200 full pages of written text. This called for the use of automated machine-reading techniques. We used the latest version of the Name Entity Recognition (NER) classifier published by the Stanford Natural Language Processing Group.

Companies menntioned in 5 or more impact case studies (175 companies)

Tagging and classifying

NER algorithms take text and tag it with specific metadata, locating and classifying expressions into different sets of predefined categories or entities. Some of these algorithms are pre-trained with large volumes of textual data and use advanced techniques to detect names in the text and classify them by type of entity. Using a 3-class model for English language, the NER classifier tags organisations, locations and personal names present in a text. As is usually the case when using these types of ‘big data’ techniques, some old fashioned manual cleaning was also put in place.

The final result is a simple but quite powerful set of word clouds identifying the companies that interact the most with universities — or are mentioned the most by universities — in the UK.

Qué hay de nuevo?

Todos los artículos Todas las noticias