About Us

About the Authors:

Mark Grover:

Mark is a committer on Apache Bigtop, a committer and PMC member on Apache Sentry (incubating), and a contributor to the Apache Hadoop, Apache Hive, Apache Sqoop, and Apache Flume projects. He is also a section author of O'Reilly's book on Apache Hive, Programming Hive, and served as a reviewer for O'Reilly's Apache Sqoop Cookbook. He currently works as a Software Engineer at Cloudera.

Mark has presented at numerous conferences, including OSCON 2013, and at meet-ups; a complete list is included in his LinkedIn profile. He presents on Hive and Bigtop and, more importantly, on how the various projects in the Hadoop ecosystem integrate with one another to function as a well-oiled machine. Mark has also written a series of blog posts on Apache Hive for Safari Books Online:
Blog post 1: Introduction to Hive
Blog post 2: Tips on partitioning data in Hive
Blog post 3: Tips on using joins in Hive
Blog post 4: Tips on writing UDFs in Hive

He has also written a blog post on Apache ZooKeeper for IBM developerWorks, which can be viewed here.

Videos of Mark’s interviews and presentations:
Video interview with O’Reilly Programming is available here.
Hive and HCatalog 101 presentation to New York City Hadoop User Group (NYC HUG) is available here.
Presentation on Cloudera Impala, a real-time query engine for Apache Hadoop, to the Bay Area Hadoop User Group (Bay Area HUG) is available here.

LinkedIn, Twitter, GitHub

Ted Malaska:

Ted is a Senior Solutions Architect at Cloudera, helping clients be successful with Hadoop and the Hadoop ecosystem. Previously, he was a Lead Architect at the Financial Industry Regulatory Authority (FINRA), helping build out a number of solutions, from web applications and service-oriented architectures to big data applications. He has also contributed code to Apache Flume, Apache Avro, YARN, and Apache Pig.

LinkedIn, Twitter, GitHub

Jonathan Seidman:

Jonathan is a Solutions Architect at Cloudera working with partners to integrate their solutions with Cloudera’s software stack. Previously, he was a technical lead on the big data team at Orbitz Worldwide, helping to manage the Hadoop clusters for one of the most heavily trafficked sites on the internet. He's also a co-founder of the Chicago Hadoop User Group and Chicago Big Data, technical editor for Hadoop in Practice, and has spoken at a number of industry conferences on Hadoop and big data, including:
Extending Your Data Infrastructure with Hadoop, Big Data TechCon, 2013
Integrating Hadoop Into the Enterprise, Hadoop Summit, 2012
Extending the Enterprise Data Warehouse with Hadoop, Chicago Data Summit, 2011, which can be viewed here.
Distributed Data Analysis with Hadoop and R, OSCON and Strange Loop, 2011
Hadoop and Hive at Orbitz, Hadoop World, 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB, 2010

Selected publications include:
SectorFileSystem: Running Hadoop MapReduce over the Sector File System, Jonathan Seidman and Collin Bennett, Open Cloud Consortium.
Implementing a Cross-language Interface to Multiple Distributed File Systems Using Thrift, Jonathan Seidman, Collin Bennett, Robert Grossman, Open Cloud Consortium.

LinkedIn, Twitter, GitHub

Gwen Shapira:

Gwen is a Solutions Architect turned Software Engineer at Cloudera and a committer on Apache Sqoop. She has 15 years of experience working with customers to design scalable data architectures. She was formerly a senior consultant at Pythian, an Oracle ACE Director, and a board member at NoCOUG. Gwen is a frequent speaker at industry conferences and maintains a popular blog.

Her blog can be viewed here.
Various slides from her presentations can be found here.
Article at Oracle Magazine
5-minute Ignite video
Presenting “Big Disasters” at OakTable World 2012.
Presenting “Making Sense of Big Data” at NYC Database Week.
Presenting “ETL with Hadoop” at Surge 2013.

LinkedIn, Twitter, GitHub