Abstract: The recently released “20 Measures” marks a landmark move toward better data governance in China. It proposes a system that separates the rights to hold data resources, process data, and manage data products. It also calls, for the first time, for the establishment of an “algorithm review” system, a critical step toward ensuring legal and regulatory compliance.
Ⅰ. DATA RIGHTS
The “20 Measures” establishes an innovative property rights system that separates the rights to hold data resources, process data, and manage data products. This novel arrangement is well suited to the digital economy.
Unlike other production factors such as land, labor, and capital, data is formed with many participants involved, which can blur its ownership. The separation of the three rights discards the traditional question of “who owns the data” and instead focuses on the three forms of data, assigning to each the corresponding holding, processing-and-use, and management rights; it defines data owners along three dimensions – public, corporate, and personal – and makes clear that rights are determined based on the source of the data and how it is generated. This allows the rights and interests of data generators, data processors, and data users to be fully protected.
The three forms proposed in the “20 Measures” – data resources, data products, and data assets – serve as the foundation for separating property rights. Based on the characteristics and functions of data, the three forms can be interpreted as follows: data resources include information from public institutions, nature, businesses, and individuals that can be recorded in the form of data; data products are the results of collecting, cleaning, and processing data resources; and data assets are data products used in business activities.
Ⅱ. DATA CIRCULATION AND TRADING
Another breakthrough in the “20 Measures” is a full-process compliance and regulatory rule system based on the characteristics of data. It addresses six questions regarding data transactions and circulation: 1) What data can be circulated? 2) How to assess data quality? 3) What software and hardware are required for data circulation? 4) Where is data circulated? 5) How to price data? 6) How to control data quality?
The framework gives full consideration to the challenges posed by data, which differs from traditional production factors in the supply, demand, and transaction links. Data contains personal privacy and commercial secrets, and it is non-exclusive, non-rivalrous, and not depleted by use. Due to information asymmetry, data cannot circulate in the market the way land, labor, and capital do.
Therefore, to build a sound data market, we must resolve difficulties facing both the supply and the demand side and break the bottlenecks in data trading. The biggest problems are a shortage of effective data supply and a lack of reputable data brands. Data is in great demand, but its transaction and circulation are not very profitable and involve high compliance and security risks, so many suppliers are unwilling, or do not dare, to sell. Meanwhile, the demand side has a hard time finding the right data and the right supplier, and faces difficulties in data integration and data protection. The data transaction mechanism also needs improvement, for example in dispute settlement.
The “20 Measures” seeks to resolve these bottlenecks by increasing effective supply and improving transaction efficiency, an innovative approach that leaves space for future development. On trading venues, the “20 Measures” places emphasis on “building a standardized and efficient data trading venue” and proposes to “promote the development of various types of data trading venues, with a focus on the role of national-level data trading venues in compliance regulation and basic service”. In the foreseeable future, special attention should be paid to over-the-counter (OTC) transactions, which are an important complement to exchange trading.
So far, three major models for data trading have emerged: 1) the point-to-point model, where businesses trade data or data mining services under contract; 2) the data intermediary model, where data brokers bring together buyers and providers, collecting and mining data on one end and providing custom, value-added data or data-related services on the other; and 3) the data marketplace model, where data marketplaces such as data exchanges serve as platforms for data transactions.
Global experience suggests that building data marketplaces is not easy, and most of them remain small. According to Maximize Market Research, data intermediaries around the world recorded a total transaction volume of around 257.2 billion USD in 2021, a figure expected to reach 365.7 billion USD by 2029. Statistics from Grand View Research show that the global data marketplace segment stood at 780 million USD in 2021, with the B2B segment contributing 58% of total revenue, and is expected to reach 5.09 billion USD by 2030. Many data marketplaces have failed or closed, such as Microsoft’s Azure DataMarket (closed in 2018), Kasabi (2010-2012), Austria’s Data Market Austria, and Swivel.com.
Most data transactions today rely on brand-name data brokers, which are prevalent across sectors in the United States. For example, CoreLogic provides over 99% of the data on residential and commercial real estate in the country.
The crux of the difficulty in data transactions lies in severe information asymmetry and a weak foundation of trust, and a sound transaction model must solve this issue. The point-to-point model is feasible because it enables a direct match between the supply and demand sides; the data broker model dominates the market because it helps reduce information asymmetry and build trust; the data exchange model has made only limited progress, since data products are hard to standardize, unless data exchanges can also act as data brokers or bring in a large number of data brokers. Given that the data trading market system has only just been established, we should let the “efficient market” and the “well-functioning government” both play their due roles. If data brokers can solve the problem, there is no need to differentiate on-exchange transactions from OTC transactions. It should be noted that OTC transactions should also be regulated.
Ⅲ. ALGORITHM AUDITING
The “20 Measures” proposes that the goal of data governance is to “build a safe, high-trust, inclusive, innovation-friendly, fair, open, and well-regulated environment for the data market”, and for the first time it puts forward the idea of setting up an “algorithm review” system. Algorithms play a huge role in enhancing business efficiency and credit risk control in the digital economy, yet issues such as black-box algorithms and algorithmic bias keep surfacing. The key problem is that most businesses and consumers in the digital economy cannot ascertain whether an algorithm is fair, and the regulator has been unable to achieve true look-through regulation. “Algorithm review” is therefore an important step toward ensuring that business operations comply with laws and regulations.
However, the “20 Measures” does not specify who should conduct the algorithm review or how. The core of algorithm governance involves three dimensions: first, businesses adopt compliance management and set up technology ethics guidelines to ensure that technology is used for good; second, an algorithm filing mechanism is established so that algorithmic rules are at least transparent to the regulator; third, the regulator or a commissioned third party organizes regular or ad hoc algorithm audits, or initiates an audit whenever complaints are made. Going forward, a feasible path for algorithm review is “algorithm auditing” designed by the regulator and implemented by market institutions.
Algorithm auditing refers to the process of collecting data on an algorithm as it operates in a given environment and, on that basis, assessing the algorithm’s legitimacy and fairness. In 2016, the US Executive Office of the President issued a report outlining promising avenues for promoting algorithm auditing. In terms of industry practice, leading audit firms are also actively involved: Deloitte, for example, has introduced algorithm auditors and created auditing toolkits, providing such services especially for government clients. China can draw on this international experience to further clarify the path, framework, and assessment criteria for algorithm auditing and thereby set up an algorithm auditing system.
Generally, there are two paths for algorithm auditing: one emphasizes algorithm code transparency, and the other emphasizes the evaluation of inputs, outputs, and results. The former requires businesses to hand over their core algorithms so that an independent third-party company or public institution can assess them directly. Its disadvantages include possible refusal by businesses to provide their algorithms for fear of exposing trade secrets, and the lack of means to verify that the algorithm provided is the one actually in use.
Under algorithm auditing that emphasizes inputs, outputs, and outcomes, input auditing means the platform is required to specify which key dimensions it relies on to provide personalized services; output auditing means the platform is required to clarify the primary goals its algorithm is designed to pursue; and outcome auditing means the platform is required to report on and assess the effectiveness of the algorithm.
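To make the outcome-auditing idea concrete, the following is a minimal sketch, not prescribed by the “20 Measures”, of how an auditor might test whether an algorithm’s logged decisions differ across a protected attribute. The sample data, column names, and the 0.8 benchmark (borrowed from the “four-fifths rule” used in US employment practice) are illustrative assumptions only.

```python
# Minimal outcome-audit sketch: compare an algorithm's approval rates
# across groups. All data and thresholds below are hypothetical.
from collections import defaultdict

def disparate_impact_ratio(records, group_key, decision_key):
    """Return (ratio of lowest to highest group approval rate, rates).

    records: iterable of dicts, each holding a group label and a 0/1
    decision. A ratio near 1.0 suggests similar outcomes across groups;
    0.8 mirrors the "four-fifths rule", cited here only as one possible
    benchmark, not a standard set by the "20 Measures".
    """
    approved = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r[group_key]] += 1
        approved[r[group_key]] += r[decision_key]
    rates = {g: approved[g] / total[g] for g in total}
    return min(rates.values()) / max(rates.values()), rates

# Hypothetical audit sample: decisions logged from a platform's algorithm.
sample = [
    {"group": "A", "approved": 1}, {"group": "A", "approved": 1},
    {"group": "A", "approved": 1}, {"group": "A", "approved": 0},
    {"group": "B", "approved": 1}, {"group": "B", "approved": 0},
    {"group": "B", "approved": 0}, {"group": "B", "approved": 1},
]

ratio, rates = disparate_impact_ratio(sample, "group", "approved")
print(f"approval rates by group: {rates}")
print(f"disparate impact ratio: {ratio:.2f} "
      f"({'flag for review' if ratio < 0.8 else 'within benchmark'})")
```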
Businesses should report on their use of algorithms along the following dimensions. First, what objectives the company wishes to achieve with the algorithm, which indicators are used to measure its performance, and to what extent the interests of different stakeholders are incorporated in its design. Second, what datasets are used in training, how they are collected or filtered, and how representative they are. Third, what algorithmic techniques are used and why they were chosen. Fourth, to what extent the algorithm has achieved its intended goals, how accurate its predictions are, and how well stakeholders’ interests are protected. Fifth, what arrangements are in place for personal information protection and data security.
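One way such a disclosure could be standardized is as a machine-readable record covering the five dimensions above. The schema and field names below are hypothetical illustrations; the “20 Measures” prescribes no such format.

```python
# Hypothetical machine-readable algorithm disclosure covering the five
# reporting dimensions above. Field names and sample values are invented
# for illustration only.
from dataclasses import dataclass

@dataclass
class AlgorithmDisclosure:
    objectives: list[str]               # goals the algorithm pursues
    performance_indicators: list[str]   # metrics used to measure it
    stakeholder_considerations: str     # how stakeholder interests enter design
    training_datasets: list[str]        # datasets used, with collection notes
    data_representativeness: str        # assessment of dataset coverage
    techniques: list[str]               # model families and rationale
    outcome_assessment: str             # accuracy and goal attainment
    privacy_and_security: str           # personal data and security measures

# Example disclosure a platform might file with the regulator.
report = AlgorithmDisclosure(
    objectives=["rank listings by relevance"],
    performance_indicators=["click-through rate", "complaint rate"],
    stakeholder_considerations="seller exposure balanced against buyer relevance",
    training_datasets=["12 months of on-platform interaction logs"],
    data_representativeness="covers active user segments; new users sparse",
    techniques=["gradient-boosted trees, chosen for tabular data"],
    outcome_assessment="most ranking goals met in offline evaluation",
    privacy_and_security="fields pseudonymized; access logged and audited",
)
print(report.objectives, report.techniques)
```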
Other indicators recommended for consideration in algorithm auditing include discrimination, effectiveness, transparency, direct impact, security, and accessibility. The choice of indicators can, of course, be tailored during implementation to the features of the business.
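As an illustration of how these indicators might be combined into a single audit result, the sketch below computes a weighted composite score. The 0-5 scale, the per-indicator scores, and the weights are all placeholder assumptions; in practice an auditor would calibrate them to the business under review, as noted above.

```python
# Illustrative composite audit score over the indicators named above.
# Scores and weights are invented placeholders, not prescribed values.

INDICATORS = ["discrimination", "effectiveness", "transparency",
              "direct_impact", "security", "accessibility"]

def composite_score(scores, weights=None):
    """Weighted average of per-indicator scores on a 0-5 scale."""
    if weights is None:
        weights = {k: 1.0 for k in scores}  # equal weights by default
    total_weight = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total_weight

# Hypothetical audit result for one platform's algorithm.
audit = {"discrimination": 4, "effectiveness": 5, "transparency": 2,
         "direct_impact": 3, "security": 4, "accessibility": 3}

# An auditor might weight discrimination and security more heavily.
weights = {"discrimination": 2.0, "effectiveness": 1.0, "transparency": 1.0,
           "direct_impact": 1.0, "security": 1.5, "accessibility": 1.0}

print(f"composite audit score: {composite_score(audit, weights):.2f} / 5")
```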
Scoring algorithms along the above dimensions can give stakeholders and the public a comprehensive understanding of whether the algorithm a platform adopts complies with applicable regulations. It can also motivate companies to use compliant algorithms, both for their own development and for the healthy growth of China’s digital economy.
This article was released on CF40’s WeChat blog on December 30, 2022. The views expressed herein are the author’s own and do not represent those of CF40 or other organizations. The article was translated by CF40 and has not been reviewed by the author.