Policy-makers and analysts are increasingly interested in “big data” (and “open data”), which can give valuable insights into industrial trends. The capacity to process and understand these trends can be transformative.
The real potential of big data lies in the traceability of behaviour: the online interactions of web users (individuals, groups, organisations). For example, a behavioural advertising agency would base its campaigns on detailed user profiles.
The types of data discussed in this article include:
- Open government data, collected or produced by public institutions (related to businesses, public tenders, demographics, transport)
- Related social media accounts
- Sensor data (produced by mobile sensor devices, such as mobile phones or cars, or by fixed sensing devices)
- Satellite technologies (e.g. location, communication, earth observation programmes)
- Information published by companies (company websites, product pages)
- Community-generated data (specialist blogs, community-based data collection systems, news channels and social media posts)
Combining data sources — a powerful tool
Combining these data sources with new data-gathering techniques (web scraping, crawling, text-mining) allows us to monitor industrial transformation at three levels: the individual firm, the industry, and the broader (national or regional) industrial ecosystem. For example, the extent to which manufacturing firms offer innovative services, such as digital applications, remote maintenance or environmental audits, can be determined by text-mining their company websites.
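As a minimal sketch of such text-mining, the snippet below scans the visible text of a company web page for service-related keywords. The keyword list and the sample page are hypothetical illustrations; a real pipeline would fetch pages with a crawler and use a far richer vocabulary.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping script/style."""
    def __init__(self):
        super().__init__()
        self._skip = False
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False
    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

# Hypothetical keyword patterns for the "innovative services" named above.
SERVICE_KEYWORDS = {
    "digital application": r"digital\s+app(lication)?s?",
    "remote maintenance": r"remote\s+maintenance",
    "environmental audit": r"environmental\s+audits?",
}

def detect_services(html: str) -> set:
    """Return which service keywords appear in the page's visible text."""
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks).lower()
    return {name for name, pattern in SERVICE_KEYWORDS.items()
            if re.search(pattern, text)}

# Illustrative page; in practice the HTML would come from a web crawler.
sample = """<html><body>
  <h1>ACME Machine Tools</h1>
  <p>We offer remote maintenance contracts and environmental audits.</p>
</body></html>"""

print(sorted(detect_services(sample)))
```

Run over many company sites, the share of firms matching each keyword becomes a simple firm-level indicator of servitisation.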
At the industry level, value chains are a useful monitoring tool. A value chain is the set of processes that must take place for a specific product or service to reach the market; it might include materials, design, hardware, production, post-processing, logistics, marketing and sales, service and recycling. Value chains differ for each product or service and in each market; in other words, they are not set in stone.
Similarly, tracking the products and services offered by firms in a particular industry can reveal digitalisation trends. One can also analyse blog articles by key players and capture relevant industrial trends linked to specific geographic locations.
Evaluating the skills market
Online job vacancies provide another interesting source of information, as they illustrate what skills are most in demand and can provide useful inputs for the design of an effective education and labour market policy.
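A simple way to turn vacancy texts into a skills indicator is to count how many postings mention each skill. The vacancy texts and the skill vocabulary below are made up for illustration; real analyses would scrape job portals and use an established skills taxonomy.

```python
from collections import Counter

# Illustrative vacancy texts; a real pipeline would scrape job portals.
vacancies = [
    "Data analyst needed: Python, SQL and statistics required.",
    "Maintenance engineer: PLC programming, Python scripting a plus.",
    "ML engineer wanted: Python, SQL, cloud experience.",
]

# Hypothetical skill vocabulary; real taxonomies are far larger.
SKILLS = ["python", "sql", "statistics", "plc", "cloud"]

def skill_demand(ads):
    """Count in how many vacancies each skill is mentioned."""
    counts = Counter()
    for ad in ads:
        text = ad.lower()
        counts.update(skill for skill in SKILLS if skill in text)
    return counts

demand = skill_demand(vacancies)
print(demand.most_common())  # most-demanded skills first
```

Tracked over time, such counts show which skills are gaining or losing demand, which is the kind of signal education and labour-market policy can act on.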
Besides the data available through the web, community-based collection networks offer venture capital firms and other interested parties the means to track investments in new business models. These real-time sources can reveal more about the investment needed for industrial modernisation than official statistics can.
Social media analytics, and the data generated by online groups, polls, surveys, blogs and chat rooms, can help examine social, political and economic issues. The applications of social media data are relatively wide. Companies such as mobile telecom operators now realise that sharing data with policy-makers is a business opportunity that can help uncover trends and make forecasts.
What do policy-makers want to know?
The facets of industrial transformation that policy-makers usually wish to monitor include business practices, business models, company investments, skills, technical knowledge of companies and the enabling conditions for growth.
Policy-makers are interested not just in trends, but also in the type and depth of each trend and its related dynamics. The advantages of analysing such data compared with, for instance, survey data are numerous: social media analytics, for example, can reveal actual behaviour rather than publicly communicated intentions.
Moreover, crawler and web-scraping programmes can run in near real time, meaning that these dynamics can be monitored continuously. Another advantage comes from the possibility of connecting all these data sources and reaching a particularly detailed level of observation, something that is not possible, or very costly, with questionnaires.
Not everyone accepts these new data sources
Despite the increased opportunities offered by open and big data, their value (and that of the associated methods) in the policy-making process is not always accepted. People often ask to what extent they can trust the results compared with traditional data sources. Based on our experience of numerous projects in this area, we would argue that usability depends on how the results are interpreted and presented. The accuracy and representativeness of open and big data will naturally grow with time, as digital service providers multiply and more and more data are generated.
The need for caution
Nevertheless, analysts need to be cautious about the conclusions they draw. The story behind each data source must be clearly explained, as should the analysts’ observations and processing techniques. Cleaning data, improving quality and deploying the right analytical algorithms, including machine learning, takes time, focus and investment. Data cleaning, the first step in preparing data for analysis, is a particularly labour-intensive task: it involves detecting, correcting or deleting records that are corrupted, inaccurate, incomplete or simply irrelevant.
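The cleaning steps just described can be sketched as follows. The records and field names are hypothetical; a real pipeline would also add type coercion rules, validation against reference data and fuzzy deduplication.

```python
# Illustrative raw records with the problems named above: an incomplete
# row, a corrupted numeric field, and an exact duplicate.
raw_records = [
    {"company": "ACME", "country": "AT", "employees": "120"},
    {"company": "", "country": "DE", "employees": "45"},            # incomplete
    {"company": "Beta GmbH", "country": "DE", "employees": "n/a"},  # corrupted
    {"company": "ACME", "country": "AT", "employees": "120"},       # duplicate
]

def clean(records):
    """Drop incomplete or corrupted rows, normalise types, deduplicate."""
    seen = set()
    cleaned = []
    for rec in records:
        if not rec.get("company") or not rec.get("country"):
            continue                      # delete incomplete records
        if not rec.get("employees", "").isdigit():
            continue                      # delete corrupted numeric fields
        key = (rec["company"], rec["country"])
        if key in seen:
            continue                      # delete exact duplicates
        seen.add(key)
        cleaned.append({**rec, "employees": int(rec["employees"])})
    return cleaned

result = clean(raw_records)
print(result)  # only the one valid, deduplicated record survives
```

Even this toy example shows why cleaning is labour-intensive: each rule encodes a judgement about what counts as corrupted, incomplete or duplicated, and those judgements must be documented alongside the results.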
A code of conduct is needed
The use of big data for policy is, however, conditional on reaching a shared view on privacy, technical standards, intellectual property rights (IPR), transparency and inclusion. Only a joint effort by data scientists and practitioners can integrate new data sources into the policy-making process. This collaboration can contribute to the development of a code of conduct covering data quality, IPR, privacy, skills development and shared definitions that make knowledge easier to share.
In order to use big data for policy, stakeholders need to agree on policies, regulations and codes of conduct for collecting, storing, processing and using data.
But despite the issues involved in using new data sources, their potential to monitor industrial modernisation and predict future change is powerful, as several existing studies demonstrate (NESTA, 2014; Technopolis Group, 2018). The challenge will be to build trust in such data sources, collect indicators over time and enable long-term trend analysis.
Before joining Technopolis Group as a specialist in innovation and industrial policies, Kincsö Izsak worked as a policy officer at the European Commission’s Directorate-General for Enterprise and Industry in the Unit ‘Support for Industrial Innovation’.
Dr María del Carmen Calatrava Moreno is a consultant in the Vienna office of Technopolis Group and specialises in ICT and the use of computer-based methods to extract knowledge from large amounts of data.