Decentralized Exchange Classification Dataset: AlphaCore
The Ethereum blockchain stores around 200M addresses and their transactions. Smart contracts have opened a new financial frontier known as Decentralized Finance. Ethereum contains two types of addresses: externally owned addresses and smart contract addresses. Externally owned addresses (EOA) have private keys that are managed by real-life entities. Some of these entities are ordinary users, whereas others are organizations such as blockchain exchanges. There are two types of exchanges: centralized exchanges (cex) manage users' Ethereum accounts by storing and using the users' private keys, whereas decentralized exchanges (dex) act as bridges that convert one Ethereum asset into another without storing users' private keys. Identifying the types of addresses and transactions has emerged as an important task in this large ecosystem of stablecoins, ERC20 and ERC721 (NFT) tokens, and decentralized exchanges.
We provide labeled address data extracted from the Ethereum blockchain between Oct-16-2018 and May-04-2020. The extracted transfers span thousands of token networks, so we first consider the top 100 networks by number of transfers, excluding the USD Tether token network, which is too large for efficient algorithm comparison (>4M nodes). We downloaded node labels in May 2020 from Etherscan.io, a prominent Ethereum block explorer that curates and maintains address labels. In total, 296 addresses from 149 centralized and decentralized exchanges are listed publicly; these are likely to be frequently used addresses. We also provide address labels (label, address, name, asset) for addresses in the 0.1-depth AlphaCore of the stablecoin network.
Data Set Characteristics: graph files
Task 1: Classification - Given a token transaction network and a list of centralized (cex) and decentralized (dex) addresses, predict which other Ethereum addresses belong to an exchange.
Task 2: Core decomposition - Given a token transaction network, identify its cores by using node features. Use the list of centralized (cex) and decentralized (dex) addresses as your ground truth with the assumption that cex and dex addresses appear in the highest core of the network (see the AlphaCore article cited below for a justification of this assumption).
Challenge: The dataset is clean and has no missing data. The only expected challenge is the large graph size of the token networks.
Number of instances: 28 networks.
Classification target: Exchange addresses.
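The two tasks above can be set up with very little machinery. The following is a minimal Python sketch, not the authors' pipeline: `degree_features` and `top_core_recall` are hypothetical helper names, the choice to count distinct counterparties (rather than raw transfers) is an assumption, and the toy addresses are made up. It derives in- and out-degree features for Task 1 and, for Task 2, measures how many labeled exchange addresses fall into the highest core of a decomposition.

```python
from collections import defaultdict

def degree_features(edges):
    """Map each address to (in_degree, out_degree).

    Assumption: degrees count distinct counterparties, so repeated
    transfers between the same pair of addresses are counted once.
    """
    senders = defaultdict(set)    # address -> distinct addresses that sent to it
    receivers = defaultdict(set)  # address -> distinct addresses it sent to
    for src, dst in edges:
        receivers[src].add(dst)
        senders[dst].add(src)
    nodes = set(senders) | set(receivers)
    return {n: (len(senders.get(n, ())), len(receivers.get(n, ()))) for n in nodes}

def top_core_recall(core_number, exchange_addresses):
    """Fraction of labeled exchange addresses that lie in the highest core.

    Labeled addresses that do not appear in the network are ignored.
    """
    labeled = [a for a in exchange_addresses if a in core_number]
    if not labeled:
        return 0.0
    top = max(core_number.values())
    return sum(core_number[a] == top for a in labeled) / len(labeled)

# Toy example with made-up addresses.
edges = [("a", "x"), ("b", "x"), ("x", "c"), ("a", "x")]
features = degree_features(edges)
recall = top_core_recall({"x": 2, "a": 1, "b": 2}, {"x", "a", "not_in_graph"})
```

The feature dictionary can be fed to any standard classifier for Task 1, while `top_core_recall` gives one simple way to score a core decomposition against the cex/dex ground truth under the stated highest-core assumption.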
Main Dataset
The archive (8.3 GB) contains 28 Ethereum-based token networks in CSV format.
The exchange label file contains all address labels.
The stablecoin exchange label file contains all address labels for addresses in the 0.1-depth AlphaCore of the stablecoin network.
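Because the archive is large, it may help to stream each network file instead of loading it whole. The sketch below is a hedged illustration in Python: the column layout of the CSV files is not described here, so the column positions (`src_idx`, `dst_idx`), the delimiter, and the sample rows are all assumptions to adjust after inspecting a real file. Addresses are lowercased because Ethereum hex addresses differ in letter case only through checksum formatting, which makes matching against the label file easier.

```python
import csv
import io

def read_edges(handle, src_idx=0, dst_idx=1, delimiter=",", skip_header=False):
    """Yield (sender, receiver) pairs from an open CSV handle, one row at a time.

    Assumption: sender and receiver sit in columns src_idx and dst_idx.
    Rows that are too short to contain both columns are skipped.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    if skip_header:
        next(reader, None)
    for row in reader:
        if len(row) <= max(src_idx, dst_idx):
            continue
        yield row[src_idx].strip().lower(), row[dst_idx].strip().lower()

# Made-up sample rows standing in for one token network file.
sample = io.StringIO("0xAbC,0xdef,1589000000,10\n0xdef,0x123,1589000100,5\n")
edges = list(read_edges(sample))
```

For a real file, the same function can be called on `open(path, newline="")`, with the path and column positions set to match the downloaded data.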
Cite Our Dataset:
@inproceedings{chartalistNeurips2022,
author = {Kiarash Shamsi and Yulia R. Gel and Murat Kantarcioglu and Cuneyt G. Akcora},
title = {Chartalist: Labeled Graph Datasets for UTXO and Account-based Blockchains},
booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference
on Neural Information Processing Systems 2022, NeurIPS 2022, November 29-December
1, 2022, New Orleans, LA, USA},
pages = {1--14},
year = {2022},
url = {https://openreview.net/pdf?id=10iA3OowAV3}
}
Baseline (AlphaCore)
We employ k-core and weighted k-core decomposition as baselines and add our recent AlphaCore decomposition results for comparison. The first insight is that the data depth-based AlphaCore outperforms the other two core decomposition algorithms. The results indicate that addresses of centralized and decentralized exchanges can be discovered by using the in- and out-degrees of addresses. This is a promising result for automating address type discovery in Ethereum asset networks.
@inproceedings{victor2021alphacore,
title={Alphacore: Data Depth based Core Decomposition},
author={Victor, Friedhelm and Akcora, Cuneyt G and Gel, Yulia R and Kantarcioglu, Murat},
booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
pages={1625--1633},
year={2021}
}
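For readers who want to reproduce the plain k-core baseline mentioned above, the following Python sketch computes core numbers by repeatedly peeling low-degree nodes. It is an illustration under assumptions, not the implementation used for the reported results: the graph is treated as undirected and unweighted, self-loops and repeated edges are ignored, and `k_core_numbers` is a hypothetical name. It does not cover weighted k-core or AlphaCore, and this simple version is not tuned for networks with millions of nodes.

```python
from collections import defaultdict

def k_core_numbers(edges):
    """Return {node: core number} for an undirected, unweighted simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {n: len(neighbors) for n, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Nodes whose remaining degree is at most k cannot be in the (k+1)-core.
        queue = [n for n in remaining if degree[n] <= k]
        if not queue:
            # Nothing left to peel at this level: jump to the next level.
            k = min(degree[n] for n in remaining)
            continue
        while queue:
            n = queue.pop()
            if n not in remaining:
                continue  # already peeled via a duplicate queue entry
            remaining.discard(n)
            core[n] = k
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        queue.append(m)
    return core

# Toy graph: a triangle (a, b, c) with one pendant node d attached to a.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
cores = k_core_numbers(toy_edges)
```

In the toy graph the triangle forms the 2-core and the pendant node has core number 1, which mirrors the intuition that heavily connected exchange addresses are expected to sit in the innermost cores.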