Cryptoassets such as cryptocurrencies and tokens are increasingly traded on decentralized exchanges. The advantage for users is that the funds are not in the custody of a centralized external entity. However, these exchanges are prone to manipulative behavior.
Wash-Trading Transaction Identification Dataset
We provide labeled address data extracted from two Ethereum exchange sites, IDEX and EtherDelta. Each trade in the data involves two participating accounts and a token amount that was exchanged for a certain Ether amount.
Data Set Characteristics: graph files
Task: Classification - Given the trade networks of the two exchange sites, IDEX and EtherDelta, identify which transactions are involved in wash trading.
Data start date (UTC): 09/27/2017 10:57 pm (IDEX), 02/09/2017 11:56 pm (EtherDelta)
Data end date (UTC): 05/04/2020 1:22 pm (IDEX), 05/04/2020 1:22 pm (EtherDelta)
Number of trades: 5,340,537 (IDEX), 3,573,512 (EtherDelta)
Number of traders: 249,911 (IDEX), 323,598 (EtherDelta)
Number of tokens: 1,206 (IDEX), 6,551 (EtherDelta)
Full Dataset
All data files (3.3GB) are available on Box.
Cite Our Dataset:
@inproceedings{chartalistNeurips2022,
author = {Kiarash Shamsi and Yulia R. Gel and Murat Kantarcioglu and Cuneyt G. Akcora},
title = {Chartalist: Labeled Graph Datasets for UTXO and Account-based Blockchains},
booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference
on Neural Information Processing Systems 2022, NeurIPS 2022, November 29-December
1, 2022, New Orleans, LA, USA},
pages = {1--14},
year = {2022},
url = {https://openreview.net/pdf?id=10iA3OowAV3}
}
Baseline (Edge Classification for Wash Trade Detection)
The wash trade detection task is approached in two steps. First, a candidate set of potential wash trades is built by repeatedly identifying trade cycles, in the form of strongly connected components, across multiple time windows. Second, the candidate set is evaluated on whether each participating account’s position (asset balances) has not, or almost not, changed. The second step can also serve as an evaluation criterion when a different candidate generation mechanism is employed. For example, one could devise a random walk strategy or use graph convolutional networks to identify a candidate set and then check whether the result conforms to the legal definition of wash trading.
@inproceedings{victor2021detecting,
title={Detecting and quantifying wash trading on decentralized cryptocurrency exchanges},
author={Victor, Friedhelm and Weintraud, Andrea Marie},
booktitle={Proceedings of the Web Conference 2021},
pages={23--32},
year={2021}
}
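As an illustration of the two-step baseline described above, the following is a minimal Python sketch, not the reference implementation. It makes several simplifying assumptions that are not part of the dataset: trades are for a single token, a single fixed window size is used instead of multiple windows, and the `Trade` record, the `find_wash_candidates` function, and the `tolerance` threshold (relative to the traded volume inside a component) are hypothetical names and choices made for this example. Strongly connected components are computed with Kosaraju's algorithm using only the standard library.

```python
from collections import defaultdict
from typing import NamedTuple


class Trade(NamedTuple):
    # Hypothetical record layout; the real files may use different fields.
    timestamp: int   # seconds since epoch
    seller: str      # account giving up the token
    buyer: str       # account receiving the token
    amount: float    # token amount traded


def strongly_connected_components(edges):
    """Kosaraju's algorithm (iterative) over a set of directed (u, v) edges."""
    fwd, rev, nodes = defaultdict(set), defaultdict(set), set()
    for u, v in edges:
        fwd[u].add(v)
        rev[v].add(u)
        nodes.update((u, v))

    # Pass 1: depth-first search on the graph, recording finish order.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:  # all neighbours explored: node is finished
                order.append(node)
                stack.pop()

    # Pass 2: search the reversed graph in decreasing finish order.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in rev[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def find_wash_candidates(trades, window, tolerance=0.01):
    """Return (accounts, trades) pairs that pass both steps of the sketch."""
    # Group trades into fixed, non-overlapping time windows.
    buckets = defaultdict(list)
    for t in trades:
        buckets[t.timestamp // window].append(t)

    results = []
    for _, bucket in sorted(buckets.items()):
        # Step 1: candidate sets are SCCs with at least two accounts.
        edges = {(t.seller, t.buyer) for t in bucket}
        for comp in strongly_connected_components(edges):
            if len(comp) < 2:
                continue
            inner = [t for t in bucket if t.seller in comp and t.buyer in comp]
            # Step 2: each account's net token position must be (almost) unchanged.
            net, volume = defaultdict(float), 0.0
            for t in inner:
                net[t.seller] -= t.amount
                net[t.buyer] += t.amount
                volume += t.amount
            if all(abs(net[a]) <= tolerance * volume for a in comp):
                results.append((comp, inner))
    return results
```

Calling `find_wash_candidates(trades, window=3600)` on a list of `Trade` records returns the account sets, together with their internal trades, whose net positions stay within the tolerance inside a one-hour window. The position check in step 2 is independent of how candidates were generated, so it could in principle be reused to evaluate candidate sets produced by other methods. Self-trades (seller equals buyer) are ignored by this sketch and would need separate handling.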