Hive – FitBit JSON example

Here’s a quick mockup of working with the JSON data (with nested array elements).

1) Create an HDFS directory:
hadoop fs -mkdir -p /<hdfs_path>/fitbit

2) Copy your files to the local Unix server (into your home directory, perhaps)
WINSCP / SCP / FTP / WGET, for example

3) Put your files into HDFS:
hadoop fs -put /<local_unix_path>/*.json /<hdfs_path>/fitbit

Then from the Hive CLI:

use <your database or just use default>;

drop table if exists fitbit;

create external table fitbit
( `activities-heart` array<struct<customHeartRateZones:string
,dateTime:string
,heartRateZones:array<struct>
,value:string>>

,`activities-heart-intraday` struct<dataset:array<struct>
,datasetIntrerval:string
,datasetType:string>
)

ROW FORMAT SERDE
‘org.openx.data.jsonserde.JsonSerDe’
STORED AS textfile
LOCATION
‘/<hdfs_path>/fitbit’
TBLPROPERTIES (
‘serialization.null.format’=”);

select * from fitbit;

I just made up a couple of examples of querying the data:
———————————————————————-
— How many calories did I burn each day?
———————————————————————-
select datetime
,sum(calories)
from (select expl.datetime
,expl.heartratezones.caloriesout as calories_array
from fitbit
lateral view explode (`activities-heart`) tbl as expl) z
lateral view explode (calories_array) tbl as calories
group by datetime;

———————————————————————-
— How many times did my heart beat each day?
———————————————————————-
select datetime
,sum(heartbeats)
from (select expl.datetime
,expl2.value as heartbeats
from fitbit
lateral view explode (`activities-heart`) tbl as expl
lateral view explode (`activities-heart-intraday`.dataset) tbl2 as expl2) z
group by datetime;