1) Set up the big data infrastructure for the data lake team, including environment provisioning and Spark, Hadoop, and Storm cluster setup.
2) Designed the overall architecture for the Storm topologies and Spark DStream data pipelines, covering data transformations and analytics.
3) Built a robust Spark Streaming system to process 4-5 TB of real-time data for complex analytics jobs.
4) Used Spark DStreams to consume data from Kafka, transform it, and write it to HDFS for EDW analysis.
5) Created Hadoop MapReduce and Spark jobs to run complex analysis logic over petabytes of data.
6) Developed the data pipeline architecture for data transformation.
7) Worked with Hive and HBase for simple data aggregation and analytics.
8) Created simple Pig ETL jobs to analyze the data.
9) Used Spark's machine learning library (MLlib) to recommend games to potential clients.
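The Kafka-to-HDFS step described above can be illustrated with a minimal sketch of the per-record transformation that a DStream `map` would apply before writing to HDFS. This is plain Python with hypothetical field names (`user`, `action`, `ts`), not the actual production code:

```python
import json
from datetime import datetime, timezone


def transform_event(raw: str) -> str:
    """Parse a raw Kafka message, keep the fields the EDW needs,
    and stamp it with a processing time before it lands in HDFS.
    The input schema here is an assumption for illustration."""
    event = json.loads(raw)
    flat = {
        "user_id": event["user"]["id"],
        "action": event["action"],
        "ts": event["ts"],
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(flat)
```

In a Spark Streaming job, a function like this would typically be applied with `stream.map(transform_event)` and the result persisted to an HDFS path for downstream EDW queries.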
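The MapReduce analysis jobs follow the classic map/shuffle/reduce pattern; a minimal sketch of that aggregation logic in plain Python (the tab-separated log layout and field positions are assumptions):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def map_phase(lines: Iterable[str]) -> Iterable[Tuple[str, int]]:
    # Mapper: emit (event_type, 1) for each log line.
    # Assumes tab-separated records with the event type in column 2.
    for line in lines:
        fields = line.split("\t")
        yield fields[1], 1


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    # Reducer: sum the counts per key, as happens after the shuffle.
    totals: Dict[str, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)
```

The same logic maps directly onto a Spark job as `rdd.map(...).reduceByKey(lambda a, b: a + b)`.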
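The game-recommendation idea can be sketched with a tiny user-based collaborative filter: score games a user has not played by the similarity-weighted ratings of other users. This toy example in plain Python only illustrates the concept; the actual work used Spark's machine learning library, and the users, games, and ratings below are made up:

```python
from math import sqrt

# Toy user -> {game: rating} matrix (illustrative data only).
ratings = {
    "alice": {"chess": 5.0, "poker": 3.0},
    "bob":   {"chess": 4.0, "go": 5.0},
    "carol": {"poker": 4.0, "go": 4.0},
}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse rating vectors."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[g] * v[g] for g in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user: str, k: int = 1) -> list:
    """Rank unseen games by similarity-weighted ratings of other users."""
    scores: dict = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for game, rating in their.items():
            if game not in ratings[user]:
                scores[game] = scores.get(game, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

At cluster scale, this neighborhood approach is usually replaced by matrix factorization such as MLlib's ALS, which learns latent user and item factors instead of comparing users pairwise.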