The first international conference in Spain about Big Data, with leading experts in data mining, data cleansing, distributed storage, cloud computing, sharing, data analysis and visualisation. Big Data is a technological challenge and a business opportunity. The Big Data Spain 2013 conference will introduce Big Data to developers and business managers in Madrid.
November 7th, 9:00AM
November 8th, 5:00PM
This workshop combines the interest of the Bitcoin data with the power of Google's solutions. Google BigQuery is the cloud tool and API that allows data explorers to focus on what ultimately matters: The data and its possibilities. In this workshop we'll get hands-on experience focused on datasets ripe for exploring. We'll quickly present the basic building blocks for participants to ask questions that might have never been asked before - and get answers in mere seconds.
Paradigma's Alberto Gómez Toribio (@gotoalberto) helped compile a never-before-released dataset of Bitcoin's blockchain.
NOTICE: Google BigQuery has a free monthly querying quota that you'll be able to use at the workshop. BigDataSpain will send you a special code via email to apply for a $1000 credit, to take your querying even further. Please apply before Friday so your credit can be approved in time for the workshop.
Google Cloud Platform is offering all BigDataSpain attendees $2,000 of credit to get started building your web or mobile app. Apply at cloud.google.com/starterpack with the code we will give you at the conference. This offer includes $1,000 in Google App Engine credit and $1,000 in Google Compute Engine credit. App Engine is a full development stack (PaaS), and Compute Engine lets you run workloads on Linux virtual machines (IaaS).
About the Bitcoin dataset
The dataset consists of text files (600+ MB compressed, 5 GB uncompressed) containing information about Bitcoin transaction IDs, from transaction f7883 (in 2012) up to some point in 2013. You may find a description of the blockchain at http://blockchain.info/. These are the fields of a given transaction: https://blockchain.info/block/0000000000000028e1c6cdbc69b61cb1db11523d3389b24725cb1ffa93bfcfb3
You can cat the information of individual transactions:
You see 6 records corresponding to 2 inputs and 3 outputs (2 x 3 = 6), as per https://blockchain.info/es/tx6eea75038c52a8d114c6ac56019da791ee7026521d989ef3109284708b5bd112
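The record count follows from pairing every input of a transaction with every output. A minimal sketch in Python, using made-up placeholder addresses (they are not taken from the dataset), shows why 2 inputs and 3 outputs yield 6 rows:

```python
from itertools import product

# Hypothetical placeholder addresses, for illustration only.
inputs = ["in_A", "in_B"]
outputs = ["out_X", "out_Y", "out_Z"]

# One record per (origin, destination) pair of the same transaction.
records = list(product(inputs, outputs))

for origin, destination in records:
    print(origin, "->", destination)

print(len(records))  # 2 x 3 = 6
```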
The fields are
- Origin address: account sourcing the money
- Destination address: account receiving the money
- Output of the previous transaction: helps calculate the balance of accounts
- Amount: 1 BTC is currently worth 1
- Date: time when the transaction was first recorded, in GMT+0, with a tolerance of +/-3 minutes
- Transaction ID: field to consolidate several senders and receivers of a transaction
- Block ID: the block where the transaction was inserted
- Block Height: ID of the relation, amount and order of the blocks
There are lots of things that can be queried on the dataset, from temporal patterns to filtering, etc.
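As a starting point for exploring the files locally, the fields listed above could be mirrored in a small record type. This is only a sketch: the field names, their order, the tab separator and the Unix-timestamp date format are all assumptions, since the exact file layout is not described here, and the sample line is made up.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class TxRecord(NamedTuple):
    """One row of the dataset; names and order are assumptions."""
    origin: str
    destination: str
    prev_output: str
    amount_btc: float
    date: datetime
    tx_id: str
    block_id: str
    block_height: int


def parse_record(line: str, sep: str = "\t") -> TxRecord:
    """Parse one line, assuming `sep`-separated fields and a Unix timestamp."""
    origin, destination, prev_output, amount, ts, tx_id, block_id, height = (
        line.rstrip("\n").split(sep)
    )
    return TxRecord(
        origin=origin,
        destination=destination,
        prev_output=prev_output,
        amount_btc=float(amount),
        date=datetime.fromtimestamp(int(ts), tz=timezone.utc),
        tx_id=tx_id,
        block_id=block_id,
        block_height=int(height),
    )


# Made-up sample line, not real data.
sample = "addr_a\taddr_b\tprev_0\t1.5\t1356998400\ttx_1\tblock_1\t200000"
record = parse_record(sample)
print(record.date.isoformat(), record.amount_btc)
```

If the real files use a different separator or date format, only `parse_record` would need to change.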
These are some questions that can be answered with this dataset:
- Find (or predict) temporal patterns in the transactions: look at world conflicts and government and economic news, and compare them with the data. Is Bitcoin a safe-haven investment?
- There are 1,320 million euros' worth of bitcoins in circulation, but which is the busiest account?
- Most people sleep from 23:00 to 08:00 and are active from 08:00 to 23:00. Which timezone has the most transactions? Which timezone has the largest transaction volume by amount?
- Are some accounts being used to obfuscate movements? You can create patterns to detect this and estimate the volume of these transactions.
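Two of the questions above (the busiest account and the busiest time of day) can be prototyped on a handful of rows before running them at scale. The sketch below uses plain Python and made-up rows; the tuple layout and the Unix timestamps are assumptions for illustration, not the dataset's actual format.

```python
from collections import Counter
from datetime import datetime, timezone

# Made-up rows: (origin, destination, amount_btc, unix_timestamp).
rows = [
    ("addr_a", "addr_b", 1.0, 1356998400),  # 2013-01-01 00:00 UTC
    ("addr_a", "addr_c", 2.5, 1357002000),  # 2013-01-01 01:00 UTC
    ("addr_b", "addr_a", 0.5, 1357002600),  # 2013-01-01 01:10 UTC
    ("addr_c", "addr_d", 4.0, 1357041600),  # 2013-01-01 12:00 UTC
]

appearances = Counter()     # times each address appears as origin or destination
count_by_hour = Counter()   # number of rows per UTC hour of day
volume_by_hour = Counter()  # BTC moved per UTC hour of day

for origin, destination, amount, ts in rows:
    appearances[origin] += 1
    appearances[destination] += 1
    hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
    count_by_hour[hour] += 1
    volume_by_hour[hour] += amount

busiest_account, hits = appearances.most_common(1)[0]
busiest_hour = count_by_hour.most_common(1)[0][0]

print(busiest_account, hits)
print(busiest_hour, volume_by_hour[busiest_hour])
```

Because one transaction spans several rows (one per input and output pair), a real analysis would probably deduplicate by Transaction ID before counting. Mapping the busiest UTC hours onto the 08:00 to 23:00 waking window is one possible way to guess the most active timezone.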