The types of data encompass open government data collected or produced by public institutions (related to businesses, public tenders, demographics, transport), social media, sensor data (produced by mobile sensing devices, such as mobile phones or cars, or by fixed sensing devices) and satellite technologies (e.g. location, communication, earth observation). More specifically, industrial modernisation can be captured by accessing data sources that are freely available to any internet user, such as company websites, product pages, industrial blogs, community-based data collection networks, news flows and social media. These data sources, combined with novel data techniques (web scraping, crawling, text-mining), allow us to monitor industrial transformation at three key levels: firms, industries (industrial value chain reconfigurations) and the broader industrial ecosystem (national or regional framework conditions).
At the level of firms, for example, the extent to which manufacturing enterprises offer innovative services, such as digital applications, remote maintenance or environmental audits, can be determined by text-mining company websites and looking for communication on these subjects. Similarly, tracking the services and products offered by the firms in one industry can capture digitalisation trends. One can also analyse key industrial blogs and capture relevant industrial trends linked to specific geographic locations. Another interesting source of information is job vacancies published online, as they reflect the skills most in demand and can help in designing effective education and labour market policy.¹
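The website text-mining idea described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in any particular study: the keyword lists, the category names and the function names (html_to_text, count_service_mentions) are assumptions made for the example, and a real project would need curated, multilingual vocabularies and a crawler that respects each site's terms of use.

```python
import re
from html.parser import HTMLParser

# Hypothetical keyword lists, chosen only for illustration.
SERVICE_KEYWORDS = {
    "digital applications": ["mobile app", "digital platform"],
    "remote maintenance": ["remote maintenance", "remote monitoring",
                           "predictive maintenance"],
    "environmental audits": ["environmental audit", "energy audit"],
}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return lower-cased visible text with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip().lower()


def count_service_mentions(html, keywords=SERVICE_KEYWORDS):
    """Count whole-phrase keyword matches per service category."""
    text = html_to_text(html)
    counts = {}
    for category, phrases in keywords.items():
        counts[category] = sum(
            len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))
            for phrase in phrases
        )
    return counts
```

Applied to the already downloaded pages of each firm in a sample, count_service_mentions yields a per-firm profile of which service themes are communicated. Simple phrase counts of this kind say nothing about whether a service is actually delivered, so they are best read as a signal of communication, as the paragraph above puts it.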
Besides data available through the web, community-based collection networks bring together venture capital firms and partners with the objective of tracking investments in technologies and new business models. Harnessing such data sources can reveal more about the firm investments necessary for industrial modernisation, and do so closer to real time, than any other statistical data. Through social media analytics, the data generated by online groups, polls, surveys, blogs and chat rooms can assist in examining social, political and economic issues. Thus, the application of social media data can be relatively wide. Companies such as mobile telecoms operators have also identified the business opportunity of sharing data with policy-makers, which can help to uncover trends and support foresight.
The facets of industrial transformation that policy-makers usually wish to monitor include business practices, business models, firm investments, skills, the technical knowledge of companies and the enabling conditions. Policy-makers are interested not just in the trends themselves but in the type and depth of each trend and the related dynamics.
The advantages of gathering and analysing such data compared to, for instance, survey data are manifold. Social media and the internet can reveal real behaviour instead of communicated intentions. Moreover, crawlers and web scrapers can be run continuously, in near real time, meaning that dynamics can also be monitored over time. Another advantage comes from the possibility of connecting all these data sources and reaching a particularly high number of observations, which would be impossible or very costly through questionnaires.
Despite the increased possibilities offered by open and big data, their quality and utility in the policy process are sometimes questioned, as are their representativeness and the analytical methods applied. It is often asked to what extent we can trust the results compared to traditional data sources. Based on the experience of numerous projects implemented so far in this area, we would argue that usability depends on how the results are interpreted and presented. The accuracy and representativeness of open and big data will naturally grow with time as digital service providers spread and more and more data are generated on the various digital interfaces. Nevertheless, analysts need to be cautious about the conclusions they draw, and the storyline behind the specific data source, the observations and the processing techniques has to be explained. Cleaning data, improving quality and putting in place the right analytical algorithms, including machine learning, takes time and deserves sufficient attention.
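One simple way to make the representativeness question concrete is to compare how a web-scraped sample is distributed across sectors with the distribution in an official business register. The sketch below is an assumption-laden illustration, not a procedure prescribed here: the function name coverage_report, the sector labels and the idea of reporting a post-stratification weight are all introduced for the example.

```python
def coverage_report(sample_counts, population_counts):
    """Compare sample and population shares per stratum (e.g. sector).

    Returns, for each stratum in the population, the sample share, the
    population share and a post-stratification weight (population share
    divided by sample share). The weight is None when the stratum is
    absent from the sample, since no weighting can repair that gap.
    """
    n_sample = sum(sample_counts.values())
    n_population = sum(population_counts.values())
    if n_population == 0:
        raise ValueError("population_counts must not be empty")

    report = {}
    for stratum, pop_count in population_counts.items():
        in_sample = sample_counts.get(stratum, 0)
        sample_share = in_sample / n_sample if n_sample else 0.0
        population_share = pop_count / n_population
        weight = population_share / sample_share if sample_share else None
        report[stratum] = {
            "sample_share": sample_share,
            "population_share": population_share,
            "weight": weight,
        }
    return report
```

A weight far from 1 flags a sector that is over- or under-represented among the scraped websites, which is the kind of caveat that should accompany any conclusion drawn from the data. The counts passed in would come from the analyst's own sample and from a reference source such as a business register.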
The use of big data for policy is, however, conditional on reaching a shared view on privacy, technical standards, intellectual property rights (IPR), transparency and inclusion.² Only common efforts by data scientists and practitioners can integrate new data sources into policy-making and contribute to the development of a Code of Conduct that accounts for data quality, IPR, privacy and skills development and that contains shared definitions to ensure effective knowledge sharing. In order to use big data for policy, stakeholders need to develop and agree on policies, regulations and codes of conduct for collecting, storing, processing and using data.
Despite the issues inherent in tackling novel data sources, as presented above, their potential to monitor industrial modernisation and predict future changes is powerful, as several existing studies have shown. The challenge will be to trust such data sources and to collect indicators over time so as to allow long-term trend analysis.
1 – Technopolis Group, Dialogic, University of Cambridge (2018). Study on the potential of servitisation and other forms of product-service provision for EU SMEs.
2 – Poel et al. (2015). Data for Policy: A study of big data and other innovative data-driven approaches