
Account blockchains, such as Ethereum, do not use the UTXO data structure of Bitcoin. Unlike UTXO coin transactions that can involve as few as two or as many as thousands of addresses, coin transactions on account blockchains always involve only two addresses: sender and receiver.
 

On account blockchains, we can observe the following networks:

  1. Coin transaction network: Similar to the UTXO transaction network, this network is created from the coin (ether) transfers between addresses. Network edges carry only the native currency (coin) of the blockchain.
  2. Token transaction networks: Crypto asset trading networks that are created by internal smart contract transactions.
  3. Trace network: Interactions between all address types. The name "trace" reflects that a single transaction can trigger a cascade of calls to smart contracts or externally owned addresses.

    

Coin Transaction Network

The coin transaction network contains three node types, namely externally owned addresses (EOAs), contract addresses, and the NULL address, together with the coin transfers between them. Because an ordinary transaction cannot be initiated by a contract address or the NULL address, every network edge starts from an EOA. Specifically, transaction network edges run from i) EOA to EOA, ii) EOA to contract, and iii) EOA to NULL address.

Each network edge may carry the following features: i) coin amount, ii) account nonce, iii) gas price, and iv) timestamp.
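
As a minimal sketch of how such a network could be held in memory, the snippet below stores each coin transfer as a directed edge carrying the four features listed above and labels it by receiver type. The field names, the all-zero `NULL_ADDRESS` placeholder, and the sample values are assumptions made for illustration; they are not the Chartalist file schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: an all-zero address stands in for the NULL address.
# A real dataset may encode it differently.
NULL_ADDRESS = "0x" + "0" * 40


@dataclass(frozen=True)
class CoinEdge:
    sender: str      # always an EOA for an ordinary transaction
    receiver: str    # EOA, contract, or NULL address
    amount_wei: int  # coin amount
    nonce: int       # sender's account nonce
    gas_price: int   # gas price in wei
    timestamp: int   # UNIX time


def build_network(edges):
    """Group edges by sender; parallel edges are kept (a multigraph)."""
    out_edges = defaultdict(list)
    for e in edges:
        out_edges[e.sender].append(e)
    return out_edges


def edge_kind(edge, contracts):
    """Label an edge by receiver type, given a set of known contract addresses."""
    if edge.receiver == NULL_ADDRESS:
        return "EOA->NULL"
    if edge.receiver in contracts:
        return "EOA->contract"
    return "EOA->EOA"


# Hypothetical sample data.
edges = [
    CoinEdge("0xaaa", "0xbbb", 10**18, 0, 20 * 10**9, 1661040000),
    CoinEdge("0xaaa", "0xccc", 5 * 10**17, 1, 21 * 10**9, 1661040012),
    CoinEdge("0xbbb", NULL_ADDRESS, 0, 7, 19 * 10**9, 1661040024),
]
network = build_network(edges)
kinds = [edge_kind(e, contracts={"0xccc"}) for e in edges]
```

Keeping parallel edges matters here because the same pair of addresses can transact many times, and each transfer has its own amount, nonce, gas price, and timestamp.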

Token Transaction Network

A token transaction network has EOA, NULL, and smart contract addresses as nodes. We outline three types of transactions that a data scientist must know to analyze token networks.

  1. A creation transaction that assigns an address to the token and initializes its smart contract and state variables.
  2. A trade transaction that moves some tokens between addresses.
  3. A management transaction that can only be initiated by the smart contract creator (or any address that the owner specifies). The transaction may delete the contract, or forward its balance (in ether or tokens) to another address.
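
The three transaction types above could be told apart with a simple heuristic, sketched below for a single token. It assumes an ERC-20 style token, a transaction record with `sender`, `to`, and `input_data` fields, and a known set of `managers` (the creator plus any addresses the owner has authorized). All of these names are hypothetical, and the rule "any non-transfer call from a manager is a management transaction" is a simplification.

```python
from dataclasses import dataclass
from typing import Optional

# Standard ERC-20 function selectors (first 4 bytes of the call data).
TRANSFER = "0xa9059cbb"       # transfer(address,uint256)
TRANSFER_FROM = "0x23b872dd"  # transferFrom(address,address,uint256)


@dataclass
class Tx:
    sender: str
    to: Optional[str]  # None models a contract-creation transaction
    input_data: str    # hex-encoded call data, starting with "0x"


def classify(tx, token_address, managers):
    """Return 'creation', 'trade', 'management', or 'other' for one token."""
    if tx.to is None:
        return "creation"
    if tx.to != token_address:
        return "other"
    selector = tx.input_data[:10].lower()  # "0x" plus 8 hex characters
    if selector in (TRANSFER, TRANSFER_FROM):
        return "trade"
    if tx.sender in managers:
        return "management"
    return "other"


# Hypothetical sample data; "0x12345678" is a made-up selector.
token = "0xtoken"
managers = {"0xowner"}
txs = [
    Tx("0xowner", None, "0x60806040"),
    Tx("0xalice", token, TRANSFER + "00" * 64),
    Tx("0xowner", token, "0x12345678"),
    Tx("0xalice", token, "0x12345678"),
]
labels = [classify(tx, token, managers) for tx in txs]
```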

Trace Network

Ethereum stores an ecosystem of addresses, smart contracts, and decentralized organizations. In the transaction and token networks, we studied financial relationships between addresses. Here we shift our focus to call dependencies, inheritance, and other interactions between Ethereum addresses. A trace network stores these non-financial relationships: nodes are EOA and smart contract addresses, and edges are interactions between user-contract and contract-contract pairs.
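
To illustrate, the sketch below turns a list of call traces into weighted trace-network edges and labels each pair as user-contract or contract-contract. The `Trace` record, its field names, and the sample addresses are assumptions for illustration, not the format of any particular trace export.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    tx_hash: str    # transaction that triggered the call
    caller: str
    callee: str
    call_type: str  # e.g. "call", "delegatecall", "staticcall"


def pair_kind(caller, callee, contracts):
    """Describe both endpoints, e.g. 'user-contract' or 'contract-contract'."""
    a = "contract" if caller in contracts else "user"
    b = "contract" if callee in contracts else "user"
    return f"{a}-{b}"


def trace_edges(traces, contracts):
    """Count interactions per (caller, callee, pair kind) triple."""
    counts = Counter()
    for t in traces:
        counts[(t.caller, t.callee, pair_kind(t.caller, t.callee, contracts))] += 1
    return counts


# Hypothetical cascade: one user transaction triggers two internal calls.
contracts = {"0xrouter", "0xtoken", "0xlib"}
traces = [
    Trace("0xtx1", "0xalice", "0xrouter", "call"),
    Trace("0xtx1", "0xrouter", "0xtoken", "call"),
    Trace("0xtx1", "0xrouter", "0xlib", "delegatecall"),
]
edges = trace_edges(traces, contracts)
contract_to_contract = sum(n for (_, _, kind), n in edges.items() if kind == "contract-contract")
```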

Dataset

We provide ether transactions from the Ethereum blockchain for the period from August 21st to October 1st, 2022. On an average day during this period, there were approximately 480,000 addresses, with 1 million edges connecting them. Ethereum changed its block creation process during this time, moving from the costly Proof-of-Work method to the more efficient Proof-of-Stake algorithm in two phases on September 6th and 15th, 2022.

Ethereum transaction network (Download): contains approximately 480,000 addresses, with 1 million edges connecting them, on an average day.

Date: From Aug-21-2022 To Oct-01-2022

Cite Our Dataset:

@inproceedings{chartalistNeurips2022,
  author    = {Kiarash Shamsi and Yulia R. Gel and Murat Kantarcioglu and Cuneyt G. Akcora},
  title     = {Chartalist: Labeled Graph Datasets for UTXO and Account-based Blockchains},
  booktitle = {Advances in Neural Information Processing Systems 35: Annual Conference
               on Neural Information Processing Systems 2022, NeurIPS 2022, November 29-December
               1, 2022, New Orleans, LA, USA},
  pages     = {1--14},
  year      = {2022},
  url       = {https://openreview.net/pdf?id=10iA3OowAV3}
}

Temporal Nature

The temporal aspect is implicit and vital in almost all blockchain tasks. For this reason, all our data is tagged with temporal information.

Price prediction models have found past price to be the most informative attribute. Even in time-agnostic applications, such as network core decomposition, blockchain researchers divide the transaction network into 24-hour snapshots (as the entire network is too big) and study them in isolation. In another example, malicious actors begin using a ransomware money laundering pattern at some point in time, and ML models should learn the origin of the pattern and apply it to future cases. In this sense, blockchains are a particularly important temporal data source, and many models, such as time series analysis and anomaly detection, can benefit from the availability of Chartalist data.

For Ethereum, we provide timestamps in UNIX time.
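
As a minimal sketch of the 24-hour snapshot approach, the snippet below groups edges into UTC calendar days using their UNIX timestamps. The `(sender, receiver, unix_ts)` tuple layout and the sample addresses are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def daily_snapshots(edges):
    """Group (sender, receiver, unix_ts) edges by UTC calendar day."""
    snapshots = defaultdict(list)
    for sender, receiver, ts in edges:
        day_start = ts - ts % SECONDS_PER_DAY  # midnight UTC of that day
        day = datetime.fromtimestamp(day_start, tz=timezone.utc).date()
        snapshots[day].append((sender, receiver))
    return snapshots


# Hypothetical edges; 1661040000 is 2022-08-21 00:00:00 UTC.
t0 = 1661040000
edges = [
    ("0xaaa", "0xbbb", t0),
    ("0xbbb", "0xccc", t0 + 3600),
    ("0xaaa", "0xccc", t0 + SECONDS_PER_DAY + 5),
]
snapshots = daily_snapshots(edges)
days = sorted(snapshots)
```

Each snapshot can then be analyzed in isolation, for example by computing its node and edge counts or running a core decomposition on that day's graph.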