Ethereum Security Data Analytics - DSPYT
- Author: Pavel Fedotov (@pfedprog)
Security and regulatory compliance have emerged as top priorities as cryptocurrencies move closer to mainstream adoption. When fraud or other misconduct occurs on a data-based protocol such as Ethereum, forensic data analytics can help trace transactions and provide the evidence needed to penalize wrongdoing. It is therefore essential to develop data analytics solutions and test them on real case studies.
We presented ideas and methods for data collection in the Data Collection Ideas post. In this post, we use Python and Dune Analytics to collect and examine data from the Ethereum blockchain.
Lazarus OFAC sanctions data
The Office of Foreign Assets Control (OFAC) of the United States Department of the Treasury sanctioned virtual currency mixer Blender, which is used by the Democratic People's Republic of Korea to facilitate hostile cyber operations and the laundering of stolen virtual currency.
According to OFAC, on March 23, 2022, a blockchain project connected to the online game Axie Infinity was the target of the largest virtual currency heist to date, worth nearly USD 620 million, and Blender was used to process over USD 20.5 million of the illicit proceeds. The Lazarus Group, a DPRK state-sponsored cyber hacking group, was responsible. Under severe U.S. and UN sanctions, the DPRK has turned to illicit activities, such as cyber-enabled heists from cryptocurrency exchanges and financial institutions, to raise money for its unlawful weapons of mass destruction and ballistic missile programs.
OFAC has identified four more virtual currency wallet addresses used by the Lazarus Group to launder the remaining proceeds stolen in the March 2022 Axie Infinity heist, and has published an updated list of the group's Bitcoin and Ethereum addresses. This expands on OFAC's April 14, 2022 identification of the initial getaway wallet address and its attribution of the Axie Infinity heist to the DPRK's Lazarus Group.
We have also compiled the list of Ethereum addresses into a text document on GitHub: Addresses List.
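Before joining an address list against transaction data, it usually pays to normalize it. The sketch below assumes the GitHub text file holds one address per line (possibly with blank or "#" comment lines); that layout is an assumption, not something confirmed by the file itself. It keeps only well-formed addresses, lowercases them so checksummed and lowercase spellings match, and removes duplicates.

```python
import re

# An Ethereum address is "0x" followed by 40 hexadecimal characters.
ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")


def load_addresses(text: str) -> list:
    """Parse one address per line and return them lowercased and de-duplicated.

    Blank lines, "#" comment lines, and malformed entries are skipped.
    Lowercasing makes checksummed (mixed-case) and lowercase spellings of
    the same address compare equal; first-seen order is preserved.
    """
    seen = set()
    addresses = []
    for line in text.splitlines():
        candidate = line.strip()
        if not candidate or candidate.startswith("#"):
            continue
        if not ADDRESS_RE.fullmatch(candidate):
            continue  # not a well-formed address
        normalized = candidate.lower()
        if normalized not in seen:
            seen.add(normalized)
            addresses.append(normalized)
    return addresses


if __name__ == "__main__":
    # Placeholder addresses, not entries from the OFAC list.
    mixed = "0x" + "0" * 38 + "Ab"
    sample = "\n".join(["# comment", mixed, mixed.lower(), "", "not-an-address"])
    print(load_addresses(sample))  # one entry: the lowercased placeholder
```

With a downloaded copy of the list, `load_addresses(open(path).read())` would return the cleaned addresses, where `path` is wherever the file was saved.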
Dune Analytics
To analyze the nature of the transactions and identify trends, we query transaction data with Dune Analytics and extract the related smart contract metadata with a Python script.
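Saved Dune queries can also be pulled into Python for further processing. The following is a minimal sketch, assuming Dune's v1 REST API serves a saved query's latest results at `/query/{query_id}/results`, authenticates with an `X-Dune-API-Key` header, and returns rows under `result.rows`; check these details against Dune's current API documentation. The query ID used below is hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Assumed base URL of Dune's public REST API (v1).
DUNE_API = "https://api.dune.com/api/v1"


def latest_results_request(query_id: int, api_key: str) -> Request:
    # Build, but do not send, a GET request for a saved query's latest results.
    return Request(f"{DUNE_API}/query/{query_id}/results", headers={"X-Dune-API-Key": api_key})


def rows_from_payload(payload: dict) -> list:
    # Result rows are assumed to sit under payload["result"]["rows"].
    return payload.get("result", {}).get("rows", [])


def fetch_rows(query_id: int, api_key: str) -> list:
    # Send the request and decode the JSON body into a list of row dicts.
    with urlopen(latest_results_request(query_id, api_key), timeout=60) as response:
        return rows_from_payload(json.load(response))
```

Keeping request construction and response parsing separate from the network call makes both easy to check offline; `fetch_rows(123, "YOUR_API_KEY")` (hypothetical ID and key) would perform the actual download.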
Dune helps users trace the flow of funds between contracts through a visual, data-centric interface. If tokens are received by one contract and then sent on to another, Dune queries can follow them and expose attempts to hide their movement between contracts. Dune Analytics also provides interactive charts that support in-depth analysis of blockchain data.
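The idea of following funds from contract to contract can be illustrated with a breadth-first search over transfer records. This is a generic sketch, not how Dune works internally: it assumes transfers are available as (sender, receiver) pairs and reports how many hops each reachable address is from a starting address such as an exploiter wallet. The address labels in the usage example are hypothetical.

```python
from collections import deque


def trace_funds(transfers, source, max_hops=3):
    """Follow outgoing transfers breadth-first, starting at `source`.

    `transfers` is an iterable of (sender, receiver) pairs. Returns a dict
    mapping every address reachable within `max_hops` to its hop distance
    from `source` (the source itself is at distance 0).
    """
    outgoing = {}
    for sender, receiver in transfers:
        outgoing.setdefault(sender, []).append(receiver)

    hops = {source: 0}
    queue = deque([source])
    while queue:
        address = queue.popleft()
        if hops[address] == max_hops:
            continue  # reached the hop limit; do not expand further
        for receiver in outgoing.get(address, []):
            if receiver not in hops:
                hops[receiver] = hops[address] + 1
                queue.append(receiver)
    return hops
```

For example, with transfers exploiter → mixer → wallet_a → exchange, `trace_funds(transfers, "exploiter")` places the exchange three hops away. The hop limit keeps the search from sweeping in large, unrelated parts of the graph, such as every customer of a busy exchange.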
In particular, we queried the Ethereum blockchain to obtain weekly deposit transaction data, weekly withdrawal transaction data, granular tabular transaction data, and ERC721 transaction data.
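ERC721 and ERC20 tokens emit an event with the same signature, `Transfer(address,address,uint256)`, so NFT transfers have to be told apart by the shape of the log: ERC721 indexes the token ID and therefore has four topics, while ERC20 has three and stores the amount in `data`. The sketch below assumes raw logs arrive as dicts with `address` and `topics` fields (hex strings), as many Ethereum JSON-RPC clients return them; the field names of any particular data source may differ.

```python
# keccak256("Transfer(address,address,uint256)"), shared by ERC-20 and ERC-721.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic: str) -> str:
    # An indexed address is left-padded to 32 bytes; keep the last 20 bytes.
    return "0x" + topic[-40:].lower()


def decode_erc721_transfer(log: dict):
    """Return the decoded transfer, or None if the log is not an ERC-721 Transfer."""
    topics = log["topics"]
    # ERC-721 indexes tokenId, giving 4 topics; ERC-20 has only 3.
    if len(topics) != 4 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    return {
        "contract": log["address"].lower(),
        "from": topic_to_address(topics[1]),
        "to": topic_to_address(topics[2]),
        "token_id": int(topics[3], 16),
    }
```

Filtering decoded transfers on a `to` address from the sanctions list yields the NFT deposits into that address.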
Next, we transform the data into tables and graphs with the help of Dune Analytics: North Korean Lazarus Group Dashboard.
Lazarus Group Total Eth Value Deposited
Deposit Share by weekly period
NFT ERC721 Deposits
Largest Transactions
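The dashboard charts above (total ETH deposited, deposit share by week, largest transactions) rest on simple aggregations. As a rough offline equivalent, the sketch below assumes deposits are available as (timestamp, value in wei) pairs and that weeks start on Monday; Dune's own weekly bucketing may be configured differently. `Decimal` is used so wei-to-ETH conversion does not lose precision.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from decimal import Decimal

WEI_PER_ETH = Decimal(10) ** 18


def week_start(ts: datetime) -> date:
    # Floor a timestamp to the Monday that opens its week.
    return (ts - timedelta(days=ts.weekday())).date()


def weekly_summary(deposits):
    """Sum ETH per week and compute each week's share of the grand total.

    `deposits` is an iterable of (timestamp, value_wei) pairs.
    """
    totals = defaultdict(Decimal)
    for ts, value_wei in deposits:
        totals[week_start(ts)] += Decimal(value_wei) / WEI_PER_ETH
    grand_total = sum(totals.values(), Decimal(0))
    shares = {week: amount / grand_total for week, amount in totals.items()} if grand_total else {}
    return dict(sorted(totals.items())), dict(sorted(shares.items()))


def largest(deposits, n=10):
    # The n individual deposits with the highest value.
    return sorted(deposits, key=lambda d: d[1], reverse=True)[:n]
```

For instance, deposits of 1 and 3 ETH in the week of March 21, 2022 and 4 ETH on March 28, 2022 give each week 4 ETH and a 0.5 share.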
Etherscan Flagged Latest Data Set
Etherscan is the leading blockchain explorer, search, API, and analytics platform for Ethereum, a decentralized smart contract platform. To promote equitable access to blockchain data, Etherscan provides developers with direct access to its block explorer data and services via GET/POST requests. Because Etherscan's APIs are provided as a community service without warranty, the data may be skewed or contain incorrect labels.
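Etherscan GET requests select an endpoint through `module` and `action` query parameters plus an API key. A minimal sketch, assuming the classic `https://api.etherscan.io/api` base URL (newer API versions may use a different base URL and require extra parameters such as a chain id, so the base is left configurable):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the classic Etherscan API; newer API versions may use a
# different base URL and require extra parameters (e.g. a chain id).
ETHERSCAN_API = "https://api.etherscan.io/api"


def etherscan_url(module, action, api_key, base=ETHERSCAN_API, **params):
    # Etherscan selects an endpoint via the `module` and `action` query parameters.
    query = {"module": module, "action": action, **params, "apikey": api_key}
    return f"{base}?{urlencode(query)}"


def fetch_result(url):
    # Responses are JSON objects with "status", "message" and "result" fields.
    with urlopen(url, timeout=30) as response:
        return json.load(response)["result"]
```

For contract metadata, `etherscan_url("contract", "getsourcecode", key, address=addr)` builds a source-code lookup; `etherscan_url("account", "txlist", key, address=addr, startblock=0, endblock=99999999, sort="asc")` lists an address's normal transactions. Given the no-warranty caveat above, results are best cross-checked against a second source.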
Using Dune Analytics queries, we obtained the following data sets of addresses labeled as phishing: Weekly Aggregated Etherscan Flagged Latest Data Set and Tabular Etherscan Flagged Latest Data Set.
Next, we visualize the data set with the help of Dune Analytics: Etherscan Phishing Accounts Addresses Dune Analytics.
Etherscan Flagged Data Set Total Eth
Top Scam Addresses Receivers Latest
Amount of ETH sent to flagged addresses over time
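Ranking flagged receivers, as in the "Top Scam Addresses Receivers" chart, amounts to filtering transactions by a set of labeled addresses and summing per receiver. The sketch below assumes transactions are (receiver, value in ETH) pairs and compares addresses case-insensitively; the sample addresses in the check are placeholders.

```python
from collections import Counter


def top_flagged_receivers(transactions, flagged, n=10):
    """Rank flagged addresses by the total ETH they received.

    `transactions` is an iterable of (receiver, value_eth) pairs and
    `flagged` a collection of addresses; comparison is case-insensitive.
    """
    flagged = {address.lower() for address in flagged}
    received = Counter()
    for receiver, value_eth in transactions:
        if receiver.lower() in flagged:
            received[receiver.lower()] += value_eth
    return received.most_common(n)
```

Grouping the same filtered transactions by week instead of by receiver would give the amount of ETH sent to flagged addresses over time.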
Phishing & Scams going through Bitly Data Set
According to Ranjeet Vidwans, when a URL shortener compresses a link, the real domain name of the website is hidden behind a random string of letters and digits. This string contains no information that would alert recipients that they are following a malicious link or being sent to a spoofed website where their credentials may be stolen. Because most people don't know better or can't resist clicking, compromised shortened URLs appear in phishing emails and social media posts, where the full malicious address would otherwise discourage some people from clicking.
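One defensive step is to expand a short link without following it and then check whether the real host belongs to the site it claims to be. A minimal sketch: the shortener list is illustrative only, `redirect_target` reads the `Location` header of a single HEAD response (http.client does not follow redirects), and the domain check accepts only the trusted host or its subdomains, so look-alikes are rejected.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit

# Illustrative list; extend it with the shorteners you want to catch.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com"}


def is_shortened(url: str) -> bool:
    return (urlsplit(url).hostname or "") in SHORTENER_HOSTS


def redirect_target(short_url: str) -> Optional[str]:
    # Send a HEAD request without following redirects and read the Location header.
    parts = urlsplit(short_url)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    conn = http.client.HTTPSConnection(parts.hostname, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().getheader("Location")
    finally:
        conn.close()


def on_trusted_domain(url: str, trusted: str) -> bool:
    """True only if the host is `trusted` itself or one of its subdomains.

    Look-alike hosts such as "opensea.io.example.com" or "opensea-io.com"
    fail even though they contain the trusted name.
    """
    host = (urlsplit(url).hostname or "").lower()
    trusted = trusted.lower()
    return host == trusted or host.endswith("." + trusted)
```

A link would then be flagged when `is_shortened(url)` holds and `on_trusted_domain(redirect_target(url), "opensea.io")` is false. A chain of several shorteners would need the expansion step repeated.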
Here is a Medium article detailing an example of a phishing attack using Bitly links. Below are the Bitly data sets we collected, originally published by the Dune Analytics user 409h: Weekly Aggregated Bitly Deposits DataSet, Tabular Bitly Deposits DataSet, Weekly Aggregated Bitly Withdrawals DataSet, and Tabular Bitly Withdrawals DataSet.
Bitly 2017-2019
On further research, we found more recent scams of the same kind that require more thorough investigation: OpenSea phishing scam swindled millions in NFTs and Scam Alert: OpenSea Phishing Emails.
More Collected Data Sets
Suspicious address data
OpenseaPhishing437073.csv - OpenSea Minting Scam
OpenseaPhishing1320652.csv - All ERC721 NFTs transactions to a suspicious address
Premium Dataset. Hourly Crypto Market Anomaly Score: Ocean Protocol
Ethereum NFT dataset on Kaggle
Ethereum Fraud Detection Dataset on Kaggle
Etherscan Phishing Accounts Addresses Data
Eth-Phish/Hack (Phish/Hack in Ethereum): introduced by Zhou et al. in Behavior-aware Account De-anonymization on Ethereum Interaction Graph, this data set contains sampled 2-hop subgraphs centered on Phish/Hack accounts in the Ethereum interaction graph.
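To make "2-hop subgraph centered on an account" concrete, the sketch below extracts the full (unsampled) k-hop neighbourhood of one address from a list of interaction edges, treating edges as undirected. The Eth-Phish/Hack authors additionally sample these subgraphs, and their exact procedure is not reproduced here; the node names in the check are placeholders.

```python
from collections import defaultdict


def k_hop_subgraph(edges, center, k=2):
    """Collect the k-hop neighbourhood of `center`, treating edges as undirected.

    `edges` is a list of (a, b) pairs. Returns (nodes, induced_edges): every
    node within k hops of `center`, and the input edges whose endpoints both
    lie in that node set.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    nodes = {center}
    frontier = {center}
    for _ in range(k):
        # Step one hop outward, ignoring nodes that were already reached.
        frontier = {n for node in frontier for n in neighbours[node]} - nodes
        nodes |= frontier
    induced_edges = [(a, b) for a, b in edges if a in nodes and b in nodes]
    return nodes, induced_edges
```

Because neighbourhoods grow quickly around busy addresses, real pipelines typically cap or sample neighbours per hop, which is the kind of sampling the data set description refers to.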