Big Data and Data Lakes Agenda

Integrating Relational Data with Hadoop and Spark

Monday, Mar 19, 2018 | 8:30 AM - 10:10 AM | Lab 10 | Session ID: 4636A

Hadoop systems are becoming more popular for the scalable storage and processing of all kinds of data. At the same time, demand is growing to analyze this data in Hadoop efficiently with SQL. This lab shows how this can be achieved with IBM Big SQL. In addition, there is a need for more complex analysis and advanced analytics, such as machine learning, on data stored in Hadoop and relational database systems. Spark is a framework that is particularly well suited to this task. In this lab, we will show how processing in Spark can be integrated very easily with Big SQL and IBM Db2 Warehouse.

Managing and Analyzing Data in the Cloud with Cloud Object Storage and Cloud SQL

Monday, Mar 19, 2018 | 8:30 AM - 9:10 AM | Breakers K | Session ID: 1478A

Meet the experts for Cloud Object Storage and Cloud SQL to discuss your use cases, requirements and ideas for cloud-based and cloud-native data management and analysis.

Speakers

Torsten Steinbach, IBM
Michael Factor, IBM

The Big Data Dudes: Beginners Guide to Big SQL

Monday, Mar 19, 2018 | 10:30 AM - 12:10 PM | Lab 21 | Session ID: 2903A

Join Aaron and Henry, the dynamic duo behind The Big Data Dudes, for this fun-filled lab session where you will get hands-on time with Big SQL on HDP. The guys will briefly introduce you to this SQL-on-Hadoop engine and then fully immerse you in a lab. You will utilize a variety of different Big SQL interfaces, write and execute Big SQL queries, and more. At a Big Data Dudes lab you never know what's going to happen next... This is going to be HOT!!!

Speakers:

  • Henry L. Quach, IBM
  • Aaron Ritchie, IBM

Unified Governance, Cognitive Classification and Metadata Management for Regulatory Compliance

Monday, Mar 19, 2018 | 10:30 AM - 11:10 AM | Breakers L | Session ID: 8718A

This session will illustrate an integrated solution for unified governance, cognitive classification, and metadata management using IBM InfoSphere 11.7, StoredIQ, governance/quality/curation dashboards, IBM InfoSphere Information Governance Catalog (IGC)/OpenIGC, and IBM InfoSphere Information Analyzer. The use case is to support a company's compliance group to address GDPR regulations. This session will clarify and discuss best practices on the following topics: unified governance components, unstructured data suite, metadata management, governance/lineage, profiling and data quality, reporting dashboard, and Hadoop integration.

Speakers

Barry S. Rosen, IBM
Russell Anderson, IBM

Save Time and Effort Mapping Data in Jupyter Notebooks

Monday, Mar 19, 2018 | 10:30 AM - 11:10 AM | Reef F | Session ID: 2332A

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. The Jupyter stack is built from the ground up to be extensible and hackable. The Developer Advocacy team at IBM Analytics has developed an open source library of useful time-saving and anxiety-reducing tools we call "Pixiedust." It was designed to ease the pain of charting, saving data to the cloud and exposing Python data structures to Scala code. I'll talk about how I built geospatial visualization into Pixiedust, putting data from Spark-based analytics on maps using Mapbox GL.

Speaker

Raj Singh, IBM

Banking Guidebook: Succeed in Digital Transformation with Cloud, Big Data and Advanced Analytics

Monday, Mar 19, 2018 | 11:30 AM - 12:10 PM | Breakers L | Session ID: 3799A

This session delivers a working guidebook and case studies that provide a roadmap for banks to be successful driving digital transformation by leveraging a combination of cloud, big data, advanced analytics, and cognitive microservices including identity resolution and match processing engines. These methods allow SMB banks to compete against big banks and coming FinTech disruption. Key methods, architectures and successful deployments driving management of customer relationships, financial crimes, risk and regulatory compliance will be presented. Lower costs and gain leverage with smaller teams, and drive to real-time digital customer relationships. Learn and gain confidence from those that are doing it today.

Speakers

Timothy G. Davis, IBM
Derek Baughman, First Hawaiian Bank

The Future of Enterprise Analytics and Data Management

Monday, Mar 19, 2018 | 12:30 PM - 1:10 PM | Breakers H | Session ID: 3619A

Today, businesses need real-time predictive analytics insight derived from transactional data as it flows into the organization. Critical initiatives such as customer interaction, counter-fraud, and risk management require real-time facts. Data latency and movement work in opposition to real-time decisions. This session will discuss IBM technologies like Db2 Analytics Accelerator, Db2 Query Management Facility (QMF) and more that are helping to deliver extraordinary analytics insight in unison with the most current transactional data and new offerings like Machine Learning for z/OS. Learn how IBM is continuing its investment in the Db2 utilities and tools portfolio to answer your needs today and in the future.

Speaker

Udo Hertz, IBM

Enhanced Productivity through Data Preparation Tools on the Watson Data Platform

Monday, Mar 19, 2018 | 12:30 PM - 1:10 PM | Reef E | Session ID: 4053A

Learn features of the Watson Data Platform that cut down the time needed to create high-quality data sets through rapid iterations over data import cycles. This session is applicable to data scientists, data engineers and data analysts. Topics covered through an in-depth demo include metrics and data visualization, quick UI operations and advanced coding operations to collaboratively build complex data shaping flows that can reshape your data of any type and size.

Speaker

Sonali Surange, IBM

IBM Db2 Family Data Virtualization Deep Dive

Monday, Mar 19, 2018 | 1:30 PM - 2:10 PM | Breakers H | Session ID: 4680A

Data federation is more essential than ever when it comes to making the most of varied data types and sources. The extended IBM Db2 family of database and data warehouse products provides virtualization capabilities that increase your flexibility across the family and across third-party and open source data repositories. Capabilities such as these are essential when you're facing more and more data sources all the time. We'll begin reviewing this solution by looking at access, security, SQL, monitoring and administration. We'll also look at the Common SQL Engine and how unified data access is fueled by built-in data federation capabilities. This will also include an overview of IBM Big SQL, which works with Hadoop implementations.

Speaker

Michael D. Connor, IBM

Unify Your IBM Db2 Warehouse with Hortonworks Data Platform to Enhance Analytical Insights

Monday, Mar 19, 2018 | 2:30 PM - 4:10 PM | Lab 21 | Session ID: 1221A

Accelerate your enterprise's data-driven decisions by taking advantage of the partnership between IBM's Big SQL hybrid SQL engine and Hortonworks' Apache Hadoop ecosystem. Using IBM Fluid Query to limit data movement for blistering-fast performance, attendees will learn how to easily marry data between their IBM Db2 Warehouse private cloud and Hortonworks Data Platform. Cognos Analytics will be used to visualize query federation between the two platforms.

Speakers:

  • Ashwin Balu, IBM
  • Piotr Pruski, Hortonworks

Using IBM Big SQL and Apache Spark to Create a Self-Service Data Lake

Monday, Mar 19, 2018 | 3:30 PM - 4:10 PM | Breakers H | Session ID: 3648A

Landing data in Hadoop typically requires a schema to be defined so that the data can be accessed. This means business users must rely on administrators to first create a table and then grant privileges. But even for the database administrator there are challenges. What format is the data stored in? What are the datatypes? How many columns compose the data? What should be the permissions on the datasets? The integration of Apache Spark and IBM Big SQL allows users to query data directly without having to define or know the schema of the data and without requiring an administrator. This session shares how to implement this capability, as well as how to implement security, auditing, monitoring and workload management of these queries.

Managing Data across Its Lifecycle the IBM Optim Way

Monday, Mar 19, 2018 | 3:30 PM - 4:10 PM | Breakers I | Session ID: 5616A

IBM's Optim portfolio of solutions targets the challenge our customers face with managing data across its lifecycle, from the time it is created to when it gets archived. Along the way, it also needs to be protected. In this session, we dive deeper into the IBM Optim portfolio to discuss the latest being offered and to peek into the roadmap as well.

Speakers

Peter Costigan, IBM
Ken Berridge, IBM

Driving Business Value with New Approaches and Data-Driven Analytics Design

Monday, Mar 19, 2018 | 3:30 PM - 4:10 PM | Breakers L | Session ID: 8524A

In today's business landscape, data and analytics are essential to your business differentiation, with new initiatives driven by competition, modernization, efficiency, and compliance. But traditional approaches have been complex, difficult and brittle. You can't afford to wait or invest in long-running projects of unproven value. In this session, leveraging real-world examples, you will learn new approaches to optimize your data design in support of a focused business objective today, while building the foundation to support future business that minimizes cost and redesign.

Speakers

Paul Christensen, IBM
Stephen Romaine, IBM

How Mizuho Bank is Developing Its Next Data Lake with IBM Machine Learning

Tuesday, Mar 20, 2018 | 10:30 AM - 11:10 AM | Lagoon K | Session ID: 2448A

To accelerate its digital strategy for business development, Mizuho Bank is expanding its data lake with a digital sandbox called “BigData lab” to evaluate and develop two new use cases: 1) analysis of unstructured data (web logs/social data) with IBM Watson Explorer and Db2 Warehouse on Cloud to gain new insights; and 2) machine learning to enhance the quality and speed of analytical applications. In this session, Mizuho Bank's IT manager summarizes the architecture and shares results of the PoCs, along with the bank's midterm plan for its next data lake with hybrid cloud and machine learning for advanced digital strategy.

Speakers:

  • Ikumi Iemura, Mizuho Bank
  • Shutaro Nonami, IBM
  • Kewei Wei, IBM
  • Avijit Chatterjee, IBM
  • Toshihiko Kubo, IBM

Analytics Lifecycle Demo

Tuesday, Mar 20, 2018 | 11:30 AM - 11:50 AM | Cloud and Data Campus Think Tank A1 | Session ID: 9025A

In this session, we'll use a demonstration to explain the importance of the components that make up the analytics lifecycle, and how they can all work together for a continuous cycle of improvement in a digital organization. Come see how the analytics lifecycle can work for you to drive operational improvements, increased revenue, and better data-driven decision-making.

Speakers

Brad Molzen, IBM
Gio Carraro, IBM

Harnessing the Data Tsunami at Marriott

Tuesday, Mar 20, 2018 | 11:30 AM - 12:10 PM | Breakers H | Session ID: 3287A

Marriott is undertaking a quantum leap in the use of data and analytics to transform the guest experience and drive our vision to become the world's favorite travel company. Marriott is growing rapidly, fueled by acquisitions, organic growth and increased global travel: a perfect storm of torrents of data and real-time analytics. We will describe the challenges in connecting meaningfully with hundreds of millions of consumers across 100-plus countries with different languages, customs, and expectations. We provide insight into our Marriott data platform, including open-source and IBM products such as Db2 Warehouse and IBM Big SQL. We highlight the accomplishments and challenges as we deliver a high-performance, scalable data platform.

Speakers

Timothy G. Davis, IBM
Eric Tagliere, Marriott International

Data Told Me the Earth is Flat

Tuesday, Mar 20, 2018 | 11:30 AM - 12:10 PM | Breakers K | Session ID: 7191A

In this provocative talk, Jason will cover common follies in adopting a data-driven culture without appropriate data governance processes.

Speaker

Jason Federoff, USAA

Honda R&D's Big Data Analytics Sandbox Activities with IBM Data Science Experience and IBM Cloud

Tuesday, Mar 20, 2018 | 12:30 PM - 1:10 PM | Breakers H | Session ID: 1195A

Honda R&D has chosen IBM Data Science Experience as a sandbox environment for data analytics, with an IBM Cloud database as a data lake. Honda R&D has succeeded with various PoCs and prototype projects in the past. Yet they faced challenges with sharing their knowledge, data and information, and had been struggling with a huge silo in big data analytics activities. Now, they have begun an enterprise-wide big data project to enable their activities and effort to become enterprise assets. In this sandbox, many engineers in Honda R&D can experiment by trial-and-error with analytics. Also, Honda is trying to create a process to collect all data in Honda R&D. This talk will share Honda's long analytics journey.

Speakers

Kyoka Nakagawa, Honda R&D Co., Ltd.
Keisuke Nishio, IBM

Empower Your Business with a Hybrid Cloud Data Architecture

Tuesday, Mar 20, 2018 | 12:30 PM - 1:10 PM | Cloud and Data Campus Large Theater A | Session ID: 8104A

Among the best ways to deliver a business advantage are to be more efficient at new application development, and to leverage all your data sources for deeper insights. Success depends on your ability to converge data on the right data architecture for increased responsiveness. We will explain the benefits of a common database and data warehouse strategy so you can land transaction data, then easily analyze or move it without application rewrites. See how AMC Networks has partnered with IBM to build an enterprise hybrid cloud platform for industry-leading insights into audience preferences. Find out how to get your development projects off to a solid start on this flexible hybrid cloud platform that puts data to work for you.

Speakers

Thomas Chu, IBM
Matthias Funke, IBM

Building a Hadoop Environment to Centralize Reporting and Analytics: Insight from Verizon

Tuesday, Mar 20, 2018 | 1:30 PM - 1:50 PM | Cloud and Data Campus Think Tank B | Session ID: 8019A

Let Verizon show you how to build a Hadoop environment from which you can also provide centralized reporting and analytics. This approach will remove the need for data marts, which can result in loss of data and erosion of data quality, not to mention the additional cost to deploy and manage data marts. A centralized, secured Hadoop environment also gives business users access to quality customer insights, expanding its value beyond a narrow group of IT users.

Speakers

Pandit J. Prasad, IBM
Randy Torres, Verizon

ODPi: Unifying Open Metadata and Governance

Tuesday, Mar 20, 2018 | 2:30 PM - 3:10 PM | Breakers K | Session ID: 7518A

This panel will describe trends in metadata and governance, and will explain the work of the members of ODPi to build an open ecosystem for interfaces, repositories, tools and experts to collaborate and exchange content while adhering to governance guidelines and imperatives. Join this session if you are responsible for metadata and governance, or for implementing General Data Protection Regulation (GDPR) in your institution. Chief Data Officers and staff seeking to create value from their data will benefit from the discussion, as well as metadata and governance builders.

Speakers

Alan Gates, Hortonworks
Ferd Scheepers, ING
Mandy Chessell, IBM
John Mertic, Linux Foundation
Susan Malaika, IBM

Big Data Integration with IBM InfoSphere Governance Catalog at TD Bank

Wednesday, Mar 21, 2018 | 10:30 AM - 10:50 AM | Cloud and Data Campus Think Tank A1 | Session ID: 2438A

TD Bank will walk through some of the industry-related challenges with expanding its footprint into a big data platform, and the need to establish a catalog of assets and data through IBM InfoSphere Governance Catalog (IGC). We will provide an overview of options and tooling for a big data platform and what a migration from legacy data warehouses into a useable data lake looks like at scale. We will speak to some of the challenges of integrating various file types, databases and sources, and discuss the need to catalog these assets to establish data lineage and data quality. Our lab will overlay some of the challenges experienced to date, and highlight the inner workings of IGC and its integration with the growing world of big data.

Speakers

Pirabu Pathmasenan, IBM
Ahamed Mohideen, TD Bank
Sherman Chung, TD Bank

AMC Networks' Enterprise Hybrid Cloud Solution with Advanced Analytics

Wednesday, Mar 21, 2018 | 10:30 AM - 11:10 AM | Breakers H | Session ID: 3621A

AMC Networks has been at the forefront of television, producing critically acclaimed shows like "Mad Men" and others. AMC thinks it is essential to uncover insights into audience preferences and viewing patterns to make smarter scheduling and market decisions, yet they face huge data volumes. Partnering with IBM, AMC built an enterprise hybrid cloud platform with industry-leading analytics and the new IBM Integrated Analytics System and Db2 Warehouse on Cloud. In this session, AMC shares their story and lessons learned. They also explain how the combination of high-performance analytics, including Data Science Experience and Apache Spark, gives business analysts the ability to conduct intense data investigations with speed and ease.

Speakers

Michael Kwok, IBM
Vitaly Tsivin, AMC Networks

Data Told Me the Earth is Flat

Wednesday, Mar 21, 2018 | 11:00 AM - 11:20 AM | Cloud and Data Campus Think Tank A1 | Session ID: 7191B

In this provocative talk, Jason will cover common follies in adopting a data-driven culture without appropriate data governance processes.

Speaker

Jason Federoff, USAA

Big Data Integration with IBM InfoSphere Governance Catalog at TD Bank

Wednesday, Mar 21, 2018 | 11:00 AM - 11:20 AM | Cloud and Data Campus Think Tank A1 | Session ID: 2438B

TD Bank will walk through some of the industry-related challenges with expanding its footprint into a big data platform, and the need to establish a catalog of assets and data through IBM InfoSphere Governance Catalog (IGC). We will provide an overview of options and tooling for a big data platform and what a migration from legacy data warehouses into a useable data lake looks like at scale. We will speak to some of the challenges of integrating various file types, databases and sources, and discuss the need to catalog these assets to establish data lineage and data quality. Our lab will overlay some of the challenges experienced to date, and highlight the inner workings of IGC and its integration with the growing world of big data.

Speakers

Pirabu Pathmasenan, IBM
Ahamed Mohideen, TD Bank
Sherman Chung, TD Bank

BayCare Provides Cognitive Patient Care with Watson Explorer

Wednesday, Mar 21, 2018 | 11:30 AM - 12:10 PM | Breakers J | Session ID: 3239A

BayCare, one of the largest community-based health systems in Florida, sought to improve their care management approach by implementing an IBM Watson-based solution for patient population identification. In this session, BayCare will share their journey in implementing a cognitive solution to better understand unstructured information, more effectively generate insights into patient care scenarios, and ultimately better inform clinical decision-making. Find out how BayCare leveraged IBM Watson Explorer and Healthcare Annotators to gain up to 14 hours of productivity per day and increase patient identification accuracy from 51% to 92%.

Speakers

Christine Livingston, Perficient
Apparsamy Balaji, BayCare

Accessing Citizens Bank's Data Lake through the Eye of IBM Db2 Big SQL

Wednesday, Mar 21, 2018 | 12:30 PM - 1:10 PM | Breakers H | Session ID: 8956A

Hear how Citizens Bank end-users access a data lake using Db2 Big SQL from client tools and business intelligence (BI) engines. One of the most challenging issues in managing a data lake is to give end-users access to the data with acceptable performance via a familiar language and tools. This session will give an overview of the data lake architecture, configuration and security, and share the Citizens Data Lake Data Strategy. We will also describe the end-user experience using the Db2 Big SQL interface with various query tools and BI tools.

Speakers

Nagapriya Tiruthani, IBM
Jessica Yau, IBM
James Gudaitis, Citizens Bank

Exploring IBM Db2 Event Store Developer Edition with Scala and Python Notebooks

Thursday, Mar 22, 2018 | 8:30 AM - 10:10 AM | Lab 21 | Session ID: 6415A

Learn how to use IBM Db2 Event Store Developer Edition through Scala and Python Notebooks. The hands-on exercise will share an example of connecting to Event Store from an external application as well. The exercise will demonstrate how to perform a fast ingestion of data into Event Store and execute Spark queries against Event Store.

Speakers:

  • Ajaykumar B. Gupte, IBM

Real-World Convergence: Using Data Governance to Drive Your Test Data Management Processes

Thursday, Mar 22, 2018 | 10:30 AM - 10:50 AM | Cloud and Data Campus Think Tank A1 | Session ID: 3015A

Join this session for a discussion of how to integrate the two worlds of data governance and test data management. You will learn how to leverage the investment of information in your data governance catalog to drive test data management processes on multiple platforms. Discover how to utilize the metadata contained in both systems to ensure that sensitive information is tightly controlled and follows the policies outlined by your organization.

Speakers

Steven Beatty, State Farm Insurance Co.
Jared Wagner, State Farm

Scotia Bank's Governed Data Lake with Diyotta and IBM Information Governance Catalog

Thursday, Mar 22, 2018 | 10:30 AM - 10:50 AM | Cloud and Data Campus Think Tank C | Session ID: 5614A

International banking spans multiple countries, each with its own operational and analytical data platforms to fulfill business, operations and regulatory requirements. The Enterprise Data Lake is a centralized Hadoop repository that acts as a crucial information asset for the entire organization. However, there are multiple challenges in building an ecosystem that ensures that the platform can adhere to the enterprise standards in terms of security, governance and quality. Above all, this new architecture required a balance of tools and standards to make it a successful and continuously evolving strategy. In this session, we will talk about Scotia Bank's journey to establish a transparent global data lake using IBM IGC and Diyotta.

Speakers

Sanjay Vyas, Diyotta
Alex Pain-Andrejin, Scotiabank

Readying the Data Lake for Insight with IBM BigIntegrate, Best Practices and Deployment Experience

Thursday, Mar 22, 2018 | 10:30 AM - 10:50 AM | Cloud and Data Campus Think Tank D1 | Session ID: 3833A

Successfully deploy IBM BigIntegrate as the premier data integration and governance tool within the Hadoop cluster. Manage hybrid scenarios spanning traditional RDBMS data sources to ingesting and loading into a data lake. Learn about new product features that will help make BigIntegrate even stronger and easier to use within Hadoop. See a demo of BigIntegrate's integration and deployment using Apache Ambari to demonstrate how simple it is to manage and configure BigIntegrate within your Hadoop cluster. Master YARN Resource Management by understanding how to effectively allocate resources to ETL flows. Listen to best practices and deployment tips from Aetna and hear how they successfully implemented BigIntegrate within their environment.

Speakers

Scott Brokaw, IBM
Richard Pietrycha, Aetna

Readying the Data Lake for Insight with IBM BigIntegrate, Best Practices and Deployment Experience

Thursday, Mar 22, 2018 | 11:00 AM - 11:20 AM | Cloud and Data Campus Think Tank D1 | Session ID: 3833B

Successfully deploy IBM BigIntegrate as the premier data integration and governance tool within the Hadoop cluster. Manage hybrid scenarios spanning traditional RDBMS data sources to ingesting and loading into a data lake. Learn about new product features that will help make BigIntegrate even stronger and easier to use within Hadoop. See a demo of BigIntegrate's integration and deployment using Apache Ambari to demonstrate how simple it is to manage and configure BigIntegrate within your Hadoop cluster. Master YARN Resource Management by understanding how to effectively allocate resources to ETL flows. Listen to best practices and deployment tips from Aetna and hear how they successfully implemented BigIntegrate within their environment.

Speakers

Scott Brokaw, IBM
Richard Pietrycha, Aetna

Large-Scale Data Lake Architecture for an Enterprise: A Real Life Experience

Thursday, Mar 22, 2018 | 12:30 PM - 1:10 PM | Breakers H | Session ID: 1506A

Data Lakes are hot! Data Lakes are powering mobile, social, cloud and analytics. Basically, everything exciting going on in IT right now is using a big data platform. On top of this, businesses can actually make money directly through data lakes by performing deep analytics. This session will discuss data lakes and the realities of what is going on today and where this is going in the future. We will introduce a proven data lake architecture in this space, and share how it fits into an enterprise environment. The presentation is based on a real-life implementation, and is meant to give attendees excellent content to take forward to your clients to have a discussion about how they can participate in building a data lake.

Speakers

Mani Kandasamy, IBM
Ameet Shetty, SunTrust
Srinivasan Ramanujam, SunTrust

Charting the Data Lake: From Business Issues to Hadoop with IBM Big SQL

Thursday, Mar 22, 2018 | 12:30 PM - 1:10 PM | Cloud and Data Campus Small Theater C | Session ID: 2554A

Many firms are shifting their use of NoSQL and Hadoop technologies to make them components of their mainstream data management environments. This talk describes best practices emerging from a range of engagements with clients in banking, insurance and healthcare that are trying to establish the appropriate degree of consistency and standardization across their Hadoop and NoSQL deployments. Areas addressed include: when/how to use different data models for deployment; normalization considerations; how various Hadoop-related technologies are being used together; tuning/performance considerations for data ingest and query; and how to achieve overall governance. This session will also look at expected future trends.

Speakers

Pat O'Sullivan, IBM
Austin Clifford, IBM