Advancing UN Comtrade for physical trade flow analysis: Addressing the issue of missing values (2022)


Resources, Conservation and Recycling, Volume 186, November 2022, 106525


Trade contributes to the redistribution of resources among countries and regions. One of the most widely used sources of trade data is the United Nations Commodity Trade Statistics Database (UN Comtrade). Nevertheless, data quality issues still limit its validity, trustworthiness, and use. A critical issue is missing commodity weight information, because identifying the suppliers and consumers in the global market relies heavily on data quality. Thus, trade analysis needs reliable methods for filling in missing physical values. Using statistical approaches, we estimate missing physical values across commodities, countries/areas, and years. The impact of the processed data varies considerably among countries and commodities; for example, compared with the original data, South Africa's net weight rose by 117% and that of clocks and watches by 63% (HS0, 1988–2019). The direction of net trade flows was reversed for 10,594 records. Finally, the bilateral asymmetry problem was alleviated. Overall, this paper introduces a novel approach for improving data accuracy.


Trade has long been a subject of great interest in various fields (Chen et al. 2019). In recent years, there has been increasing interest in evaluating the impact of trade on people's lives and the planet (Dalin et al. 2017; Zhang et al. 2017). Trade plays a crucial role in redistributing resources and wealth among countries and regions (Xu et al. 2020b; Yang et al. 2020); for example, trade is vital to circular economy development (Wang et al. 2020b). Over the last few decades, resource trade has accounted for a growing share of global resource extraction, reaching 24.8% in 2017 (Figure A1). Therefore, it is necessary to quantify the material flows that accompany trade, which involve diverse processes.

The United Nations Commodity Trade Statistics Database (UN Comtrade), established in the early 1960s, is one of the most extensive and accurate international trade statistics databases. For more than 50 years, it has supplied a wealth of trade information to policymakers, business communities, academic institutions, and the general public (Comtrade 2019). It stores standardized annual and monthly trade statistics supplied by countries/areas and reflects detailed international commodity flows between partners, covering up to 99% of global merchandise trade (Comtrade 2019). However, many studies, including our earlier work, have shown that missing values in UN Comtrade pose statistical issues that can lead to considerable misinterpretation of trade (Espinoza and Soulier 2016; Nakajima et al. 2018; Shi et al. 2021), and the problem worsens as the proportion of missing values rises. This problem exists across all commodities, countries, and years (Table A1). Missing data lead to underestimated material flows and environmental impacts, and they can also cause net flow reversals (e.g., a country shifting from net importer to net exporter). There is thus a pressing need to address the problems associated with missing values in the UN Comtrade database.
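A minimal sketch of how a net flow reversal can be detected, assuming hypothetical export and import weight totals (in kg) for one country and one commodity, before and after missing weights are filled. The function names and figures are illustrative and are not taken from the paper.

```python
def net_position(exports_kg: float, imports_kg: float) -> str:
    """Classify a country's net position for one commodity from trade weights."""
    net = exports_kg - imports_kg
    if net > 0:
        return "net exporter"
    if net < 0:
        return "net importer"
    return "balanced"


def flow_reversed(before: tuple[float, float], after: tuple[float, float]) -> bool:
    """True if filling missing weights turns a net importer into a net exporter, or vice versa.

    Each argument is an (exports_kg, imports_kg) pair.
    """
    positions = {net_position(*before), net_position(*after)}
    return positions == {"net exporter", "net importer"}


# Hypothetical example: some export records lacked a net weight, so the
# reported export total (500 kg) understates exports before filling.
before = (500.0, 800.0)
after = (1100.0, 800.0)  # after adding 600 kg of estimated export weight

print(net_position(*before), "->", net_position(*after))
print("reversed:", flow_reversed(before, after))
```

Only a change of sign counts as a reversal here; a shift to or from "balanced" is not flagged, which is a design choice of this sketch.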

In recent years, the proportion of missing values (missing monetary values, missing physical values, and both missing; see Figure A2) has increased. Missing weight data are the most common, and in this study we focus on missing physical values. Missing physical values may have the following causes: 1) Little attention to trade weight: the customs reports of various countries focus mainly on monetary values rather than weights; 2) Wrong unit conversion (Brewer et al. 2020a; Brewer et al. 2020b): some commodities, such as natural gas, are not reported in kilograms (kg), which can introduce errors when their units are converted to kg; 3) Missing unit conversion: customs data are given as "quantity", whereas UN Comtrade data are released as "net weight", and UN Comtrade may overlook some data when filling in the "net weight" field, resulting in "false data missing".
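The three missing-value categories and cause 3) can be sketched together. This is a minimal illustration under stated assumptions: the `Record` fields are hypothetical rather than UN Comtrade's actual column names, `None` marks a missing value, and only quantities already reported in kg are copied into net weight. Other units (such as the m3 record below) would still need a commodity-specific conversion factor, which is not modelled here.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical minimal layout; not UN Comtrade's actual schema.
    value_usd: Optional[float]      # monetary value (None = missing)
    net_weight_kg: Optional[float]  # physical value (None = missing)
    quantity: Optional[float]       # quantity as reported by customs
    quantity_unit: str              # e.g. "kg", "m3", "items"


def missing_category(r: Record) -> str:
    """Assign a record to one of the missing-value categories."""
    has_value = r.value_usd is not None
    has_weight = r.net_weight_kg is not None
    if has_value and has_weight:
        return "complete"
    if has_value:
        return "missing physical value"
    if has_weight:
        return "missing monetary value"
    return "both missing"


def recover_false_missing(r: Record) -> Record:
    """Copy quantity into net weight when the quantity is already reported in kg."""
    if r.net_weight_kg is None and r.quantity is not None and r.quantity_unit == "kg":
        return replace(r, net_weight_kg=r.quantity)
    return r


records = [
    Record(1000.0, 250.0, 250.0, "kg"),
    Record(800.0, None, 400.0, "kg"),   # "false data missing": weight is recoverable
    Record(500.0, None, 30.0, "m3"),    # needs a unit conversion; stays missing
    Record(None, 120.0, 120.0, "kg"),   # missing monetary value
]

before = Counter(missing_category(r) for r in records)
after = Counter(missing_category(recover_false_missing(r)) for r in records)
print("before:", dict(before))
print("after: ", dict(after))
```

In this toy set, the kg-denominated record moves from "missing physical value" to "complete", while the m3 record remains missing.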

Previous studies have used the global average price of a specific commodity or linear regression (Dittrich and Bringezu 2010; Dittrich et al. 2012; Farhan 2015). In addition, the UN Statistics Division (UNSD) estimates missing data using median unit prices (United Nations Statistics Division, September 2017). These methods, however, ignore the heterogeneity across commodities, reporters, and years, which limits their usefulness for the UN Comtrade database. For instance, they assume that the price of a car is the same worldwide, which is inconsistent with reality. Furthermore, these methods should not be applied to customs data reported in kg, because the trade weight can then be determined directly from the "quantity".
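To illustrate why ignoring heterogeneity matters, here is a minimal sketch of unit-price imputation (estimated weight = trade value / median unit price). The median is computed either per commodity only, analogous to a single global price, or per commodity, reporter, and year. The dictionary keys, reporters "X" and "Y", and all figures are hypothetical. This is a generic illustration, not the UNSD procedure or the framework proposed in this paper.

```python
from collections import defaultdict
from statistics import median


def median_unit_prices(records, key):
    """Median unit price (USD per kg) for each group returned by `key`."""
    groups = defaultdict(list)
    for r in records:
        # Skip records whose value is missing or whose weight is missing (None) or zero.
        if r["value_usd"] is not None and r["net_weight_kg"]:
            groups[key(r)].append(r["value_usd"] / r["net_weight_kg"])
    return {k: median(v) for k, v in groups.items()}


def impute_weight(record, prices, key):
    """Estimate net weight (kg) as value / median unit price of the record's group."""
    price = prices.get(key(record))
    if not price or record["value_usd"] is None:  # no usable price or no value
        return None
    return record["value_usd"] / price


def by_commodity(r):
    return r["commodity"]


def by_commodity_reporter_year(r):
    return (r["commodity"], r["reporter"], r["year"])


known = [
    {"commodity": "cars", "reporter": "X", "year": 2019, "value_usd": 2000.0, "net_weight_kg": 100.0},
    {"commodity": "cars", "reporter": "X", "year": 2019, "value_usd": 4000.0, "net_weight_kg": 200.0},
    {"commodity": "cars", "reporter": "Y", "year": 2019, "value_usd": 1000.0, "net_weight_kg": 100.0},
]
missing = {"commodity": "cars", "reporter": "Y", "year": 2019, "value_usd": 500.0, "net_weight_kg": None}

global_estimate = impute_weight(missing, median_unit_prices(known, by_commodity), by_commodity)
group_estimate = impute_weight(
    missing, median_unit_prices(known, by_commodity_reporter_year), by_commodity_reporter_year
)
print(global_estimate, group_estimate)
```

Here the commodity-wide median (20 USD/kg) yields 25 kg for reporter Y's record, whereas Y's own median (10 USD/kg) yields 50 kg. The choice of grouping key alone doubles the estimate.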

This paper is the third in a series addressing data quality issues in the UN Comtrade database. Our first paper presents the status quo, causes, existing solutions, and challenges of data quality issues (Chen et al. 2022). The second paper establishes an improved framework for identifying outliers (Jiang et al. 2022). This third paper develops a framework to handle missing physical values in UN Comtrade for all commodities from 1988 to 2019. We also quantify the resulting data quality improvements to examine the influence of missing physical values. The rest of this paper is structured as follows. The second section summarizes the primary methods used in this study. The third section presents the results and a brief critique of the findings. The fourth section compares these methods and discusses the limitations of this research. The fifth section highlights the main conclusions of this study.

Section snippets

The classification of data and data miss

UN Comtrade data include statistics in the original unit of measure (as indicated in Table 1). Missing values comprise missing monetary values, missing physical values, and both missing. Missing monetary values result primarily from omissions, whereas the reasons for missing physical values are more complicated (as introduced in the first section). This study presents results only for missing physical values. With the same framework, missing monetary values can be handled


In Material Flow Analysis (MFA) research, trade records and trade weights are very important. Using graph theory, trade records can characterize the network structure formed by trading countries, while trade weights reflect complex changes in material flows between countries. First, this section shows the distribution of missing values by time, country, and commodity, which demonstrates the change in trade records compared with discarding missing physical values
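The effect of discarding records on network-based MFA can be illustrated with a minimal sketch that treats each reporter-partner pair with a known weight as a directed link. The records, the country labels "A", "B", "C", and the filled-in 30 kg are hypothetical. The point is only that discarding records with missing weights removes links as well as mass.

```python
def network_summary(records):
    """Count directed reporter->partner links and total kg, using only records with a known weight."""
    usable = [r for r in records if r["net_weight_kg"] is not None]
    links = {(r["reporter"], r["partner"]) for r in usable}
    return len(links), sum(r["net_weight_kg"] for r in usable)


original = [
    {"reporter": "A", "partner": "B", "net_weight_kg": 100.0},
    {"reporter": "A", "partner": "C", "net_weight_kg": None},  # weight missing
    {"reporter": "B", "partner": "C", "net_weight_kg": 50.0},
]
# The same records after a (hypothetical) estimate fills the missing weight.
filled = [dict(r, net_weight_kg=30.0) if r["net_weight_kg"] is None else r for r in original]

print("discarding missing:", network_summary(original))
print("after filling:     ", network_summary(filled))
```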

Method comparison

In most circumstances, no single method suits all countries and commodities. We therefore compare method performance by commodity and by country. All method performance results are available on figshare (Zhang et al. 2022).
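A per-group comparison of this kind can be sketched as follows. The sketch assumes that held-out records with known weights are used for evaluation and that mean absolute percentage error (MAPE) is the score. Both the metric and the method labels (`method A`, `method B`) are hypothetical stand-ins. The paper's actual performance criteria and its seven methods are not reproduced here, and all numbers are invented for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error (%) between known and estimated weights."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def select_best(holdout):
    """Pick the lowest-MAPE method for each (country, commodity) group.

    holdout maps (country, commodity) -> (actual_weights, {method: predicted_weights}).
    """
    best = {}
    for group, (actual, predictions) in holdout.items():
        errors = {method: mape(actual, pred) for method, pred in predictions.items()}
        best[group] = min(errors, key=errors.get)
    return best


holdout = {
    ("country P", "commodity X"): (
        [100.0, 200.0],
        {"method A": [110.0, 180.0], "method B": [150.0, 260.0]},
    ),
    ("country P", "commodity Y"): (
        [50.0, 80.0],
        {"method A": [80.0, 40.0], "method B": [52.0, 78.0]},
    ),
}

print(select_best(holdout))
```

Within the same hypothetical country, the two commodities end up with different best methods. This mirrors the per-commodity selection pattern the section describes.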

Based on the performance evaluation results, different methods are selected for different commodities within the same country. Here we take China as an example, since this pattern can also be identified in other countries. As shown in Fig. 6, Method 1 works


This paper presented a framework of seven methods for dealing with the missing values of all commodities in the UN Comtrade database. One of the main advantages of our framework is that it is based on estimating the data distribution, which better captures the heterogeneity of commodities across countries and years and thus increases estimation accuracy. These estimation procedures significantly improved the data quality of the

CRediT authorship contribution statement

Zhihe Zhang: Resources, Investigation, Methodology, Validation, Formal analysis, Writing – original draft. Zhihan Jiang: Investigation, Methodology. Chuke Chen: Resources, Formal analysis, Visualization, Writing – review & editing. Xu Zhang: Writing – review & editing. Heming Wang: Conceptualization, Supervision, Writing – review & editing, Funding acquisition, Methodology, Formal analysis. Nan Li: Conceptualization, Supervision, Writing – review & editing. Peng Wang: Methodology, Writing –

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


This research was supported by the National Natural Science Foundation of China (Nos. 41871204, 71961147003, 52170184, and 52070034).

References (33)

  • Wei-Qiang Chen et al. Sustainable cycles and management of plastics: a brief review of RCR publications in 2019 and early 2020

    Resour., Conserv. Recycl.


  • B. Chen et al. Global energy flows embodied in international trade: a combination of environmentally extended input–output analysis and complex network analysis

    Appl. Energy


  • Brewer, T.D.; Abbott, D.; Lal, N.; Sharp, M.; am Thow; Andrew, N.L. (2020a): A method for cleaning trade data for...
  • Brewer, T.D.; Andrew, N.L.; Sharp, M.K.; am Thow; Kottage, H.; Jones, S. (2020b): A method for cleaning trade data for...
  • Chuke Chen et al.

    Advancing UN Comtrade for physical trade flow analysis: review of data quality issues and solutions

    SSRN J.


  • Comtrade, U.N. (2019): UN Comtrade. In United Nations Commodity Trade Statistics...

