Here’s a quick mockup of working with the JSON data (with nested array elements).
1) Create an HDFS directory:
hadoop fs -mkdir -p /<hdfs_path>/fitbit
2) Copy your files to the local Unix server (into your home directory, perhaps):
use WinSCP, scp, FTP, or wget, for example
3) Put your files into HDFS:
hadoop fs -put /<local_unix_path>/*.json /<hdfs_path>/fitbit
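JsonSerDe expects one complete JSON object per line of each file. A hypothetical record in the shape the table below expects (the field names follow the DDL; a real Fitbit export may nest things differently, so adjust to match your data) can be sketched in Python:

```python
import json

# A minimal, hypothetical record matching the Hive table layout below.
# Field names and values here are illustrative, not a real Fitbit export.
record = {
    "activities-heart": [
        {
            "customHeartRateZones": None,
            "dateTime": "2016-01-01",
            "heartRateZones": [
                {"caloriesOut": "120.5", "max": "94", "min": "30",
                 "minutes": "600", "name": "Out of Range"}
            ],
            "value": "64"
        }
    ],
    "activities-heart-intraday": {
        "dataset": [
            {"time": "00:00:00", "value": "62"},
            {"time": "00:01:00", "value": "63"}
        ],
        "datasetInterval": "1",
        "datasetType": "minute"
    }
}

# Each record must serialize to a single line in the .json files you -put into HDFS.
line = json.dumps(record)
print("\n" in line)  # False -- one object, one line
```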
Then from the Hive CLI:
use <your database or just use default>;
drop table if exists fitbit;
create external table fitbit
( `activities-heart` array<struct<customHeartRateZones:string
,dateTime:string
,heartRateZones:array<struct<caloriesOut:string
,max:string
,min:string
,minutes:string
,name:string>>
,value:string>>
,`activities-heart-intraday` struct<dataset:array<struct<time:string
,value:string>>
,datasetInterval:string
,datasetType:string>
)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
STORED AS textfile
LOCATION
'/<hdfs_path>/fitbit'
TBLPROPERTIES (
'serialization.null.format'='');
select * from fitbit;
I just made up a couple of examples of querying the data:
----------------------------------------------------------------------
-- How many calories did I burn each day?
----------------------------------------------------------------------
select datetime
,sum(calories)
from (select expl.datetime
,expl.heartratezones.caloriesout as calories_array
from fitbit
lateral view explode (`activities-heart`) tbl as expl) z
lateral view explode (calories_array) tbl as calories
group by datetime;
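To sanity-check the lateral view logic, the same explode-and-sum can be sketched in plain Python over toy records shaped like the table (a sketch only, not a substitute for running it in Hive):

```python
from collections import defaultdict

# Toy records shaped like the fitbit table (values are strings, as in the DDL).
rows = [
    {"activities-heart": [
        {"dateTime": "2016-01-01",
         "heartRateZones": [{"caloriesOut": "100.0"}, {"caloriesOut": "250.5"}]},
    ]},
    {"activities-heart": [
        {"dateTime": "2016-01-02",
         "heartRateZones": [{"caloriesOut": "300.0"}]},
    ]},
]

# explode(`activities-heart`) -> one row per array element;
# exploding the caloriesOut values then summing groups by day.
calories_per_day = defaultdict(float)
for row in rows:
    for day in row["activities-heart"]:       # first lateral view
        for zone in day["heartRateZones"]:    # second lateral view
            calories_per_day[day["dateTime"]] += float(zone["caloriesOut"])

print(dict(calories_per_day))
# {'2016-01-01': 350.5, '2016-01-02': 300.0}
```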
----------------------------------------------------------------------
-- How many times did my heart beat each day?
----------------------------------------------------------------------
select datetime
,sum(heartbeats)
from (select expl.datetime
,expl2.value as heartbeats
from fitbit
lateral view explode (`activities-heart`) tbl as expl
lateral view explode (`activities-heart-intraday`.dataset) tbl2 as expl2) z
group by datetime;
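The intraday query follows the same pattern; in Python terms it looks like this (again a toy sketch, assuming one-minute samples where each value is beats per minute, so the sum approximates total beats):

```python
from collections import defaultdict

# One toy record: a day's summary plus its intraday minute-by-minute samples.
rows = [
    {"activities-heart": [{"dateTime": "2016-01-01"}],
     "activities-heart-intraday": {
         "dataset": [{"time": "00:00:00", "value": "62"},
                     {"time": "00:01:00", "value": "63"},
                     {"time": "00:02:00", "value": "65"}],
         "datasetInterval": "1",
         "datasetType": "minute"}},
]

# explode(`activities-heart`) supplies the date; explode(dataset) supplies samples.
# Summing beats-per-minute over one-minute samples estimates total beats per day.
beats_per_day = defaultdict(int)
for row in rows:
    for day in row["activities-heart"]:
        for sample in row["activities-heart-intraday"]["dataset"]:
            beats_per_day[day["dateTime"]] += int(sample["value"])

print(dict(beats_per_day))  # {'2016-01-01': 190}
```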