Big Data and Data Lakes Agenda
Integrating Relational Data with Hadoop and Spark
Monday, 8:30 AM - 10:10 AM | Lab 10 | Session ID: 4636A
Hadoop systems are becoming more popular for storing and scalably processing all kinds of data. At the same time, the desire to analyze this data in Hadoop efficiently with SQL is increasing. This lab shows how this can be achieved with IBM Big SQL. In addition, there is a need for more complex analysis and advanced analytics, such as machine learning, on data stored in Hadoop and relational database systems. Spark is a framework that is particularly well suited to this task. In this lab, we will show how processing in Spark can be integrated very easily with Big SQL and IBM Db2 Warehouse.
Speakers:
Managing and Analyzing Data in the Cloud with Cloud Object Storage and Cloud SQL Monday, Mar 19, 2018 | 8:30 AM - 9:10 AM | Breakers K | Session ID: 1478A Meet the experts for Cloud Object Storage and Cloud SQL to discuss your use cases, requirements and ideas for cloud-based and cloud native data management and analysis. |
Speakers Torsten Steinbach, IBM |
The Big Data Dudes: Beginners Guide to Big SQL Monday, 10:30 AM - 12:10 PM | Lab 21 | Session ID: 2903A Join Aaron and Henry, the dynamic duo behind The Big Data Dudes, for this fun-filled lab session where you will get hands-on time with Big SQL on HDP. The guys will briefly introduce you to this SQL-on-Hadoop engine and then fully immerse you in a lab. You will utilize a variety of different Big SQL interfaces, write and execute Big SQL queries, and more. At a Big Data Dudes lab you never know what's going to happen next... This is going to be HOT!!! Speakers:
Unified Governance, Cognitive Classification and Metadata Management for Regulatory Compliance Monday, Mar 19, 2018 | 10:30 AM - 11:10 AM | Breakers L | Session ID: 8718A This session will illustrate an integrated solution for unified governance, cognitive classification, and metadata management using IBM InfoSphere 11.7, StoredIQ, governance/quality/curation dashboards, IBM InfoSphere Information Governance Catalog (IGC)/OpenIGC, and IBM InfoSphere Information Analyzer. The use case is to support a company's compliance group to address GDPR regulations. This session will clarify and discuss best practices on the following topics: unified governance components, unstructured data suite, metadata management, governance/lineage, profiling and data quality, reporting dashboard, and Hadoop integration. |
Speakers Barry S. Rosen, IBM |
Save Time and Effort Mapping Data in Jupyter Notebooks Monday, Mar 19, 2018 | 10:30 AM - 11:10 AM | Reef F | Session ID: 2332A The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. The Jupyter stack is built from the ground up to be extensible and hackable. The Developer Advocacy team at IBM Analytics has developed an open source library of useful time-saving and anxiety-reducing tools we call "Pixiedust." It was designed to ease the pain of charting, saving data to the cloud and exposing Python data structures to Scala code. I'll talk about how I built geospatial visualization into Pixiedust, putting data from Spark-based analytics on maps using Mapbox GL. |
Speaker Raj Singh, IBM |
Banking Guidebook: Succeed in Digital Transformation with Cloud, Big Data and Advanced Analytics Monday, Mar 19, 2018 | 11:30 AM - 12:10 PM | Breakers L | Session ID: 3799A This session delivers a working guidebook and case studies that provide a roadmap for banks to be successful driving digital transformation by leveraging a combination of cloud, big data, advanced analytics, and cognitive microservices including identity resolution and match processing engines. These methods allow SMB banks to compete against big banks and coming FinTech disruption. Key methods, architectures and successful deployments driving management of customer relationships, financial crimes, risk and regulatory compliance will be presented. Lower costs and gain leverage with smaller teams, and drive to real-time digital customer relationships. Learn and gain confidence from those that are doing it today. |
Speakers Timothy G. Davis, IBM |
The Future of Enterprise Analytics and Data Management Monday, Mar 19, 2018 | 12:30 PM - 1:10 PM | Breakers H | Session ID: 3619A Today, businesses need real-time predictive analytics insight derived from transactional data as it flows into the organization. Critical initiatives such as customer interaction, counter-fraud, and risk management require real-time facts. Data latency and movement work in opposition to real-time decisions. This session will discuss IBM technologies like Db2 Analytics Accelerator, Db2 Query Management Facility (QMF) and more that are helping to deliver extraordinary analytics insight in unison with the most current transactional data and new offerings like Machine Learning for z/OS. Learn how IBM is continuing its investment in the Db2 utilities and tools portfolio to answer your needs today and in the future. |
Speaker Udo Hertz, IBM |
Enhanced Productivity through Data Preparation Tools on the Watson Data Platform Monday, Mar 19, 2018 | 12:30 PM - 1:10 PM | Reef E | Session ID: 4053A Learn features of the Watson Data Platform that cut down the time needed to create high-quality data sets through rapid iterations over data import cycles. This session is applicable to data scientists, data engineers and data analysts. Topics covered through an in-depth demo include metrics and data visualization, quick UI operations and advanced coding operations to collaboratively build complex data shaping flows that can reshape your data of any type and size. |
Speaker Sonali Surange, IBM |
IBM Db2 Family Data Virtualization Deep Dive Monday, Mar 19, 2018 | 1:30 PM - 2:10 PM | Breakers H | Session ID: 4680A Data federation is more essential than ever when it comes to making the most of varied data types and sources. The extended IBM Db2 family of database and data warehouse products provides virtualization capabilities that increase your flexibility across the family and across third-party and open source data repositories. Capabilities such as these are essential when you're facing more and more data sources all the time. We'll begin reviewing this solution by looking at access, security, SQL, monitoring and administration. We'll also look at the Common SQL Engine and how unified data access is fueled by built-in data federation capabilities. This will also include an overview of IBM Big SQL, which works with Hadoop implementations. |
Speaker Michael D. Connor, IBM |
Unify Your IBM Db2 Warehouse with Hortonworks Data Platform to Enhance Analytical Insights Monday, 2:30 PM - 4:10 PM | Lab 21 | Session ID: 1221A Accelerate your enterprise's data-driven decisions by taking advantage of the partnership between IBM's Big SQL hybrid SQL engine and Hortonworks' Apache Hadoop ecosystem. Using IBM Fluid Query to limit data movement for blistering-fast performance, attendees will learn how to easily marry data between their IBM Db2 Warehouse private cloud and Hortonworks Data Platform. Cognos Analytics will be used to visualize query federation between the two platforms. Speakers:
Using IBM Big SQL and Apache Spark to Create a Self-Service Data Lake Monday, Mar 19, 2018 | 3:30 PM - 4:10 PM | Breakers H | Session ID: 3648A Landing data in Hadoop typically requires a schema to be defined so that the data can be accessed. This means business users must rely on administrators to first create a table and then grant privileges. But even for the database administrator there are challenges. What format is the data stored in? What are the datatypes? How many columns compose the data? What should be the permissions on the datasets? The integration of Apache Spark and IBM Big SQL allows users to directly query data without having to define or know the schema of the data and without requiring an administrator. This session shares how to implement this capability, as well as how to implement security, auditing, monitoring and workload management of these queries. |
Managing Data across Its Lifecycle the IBM Optim Way Monday, Mar 19, 2018 | 3:30 PM - 4:10 PM | Breakers I | Session ID: 5616A IBM's Optim portfolio of solutions targets the challenge our customers face with managing data across its lifecycle, from the time it is created to when it gets archived. Along the way, it also needs to be protected. In this session, we dive deeper into the IBM Optim portfolio to discuss the latest being offered and to peek into the roadmap as well. |
Speakers Peter Costigan, IBM |
Driving Business Value with New Approaches and Data-Driven Analytics Design Monday, Mar 19, 2018 | 3:30 PM - 4:10 PM | Breakers L | Session ID: 8524A In today's business landscape, data and analytics are essential to your business differentiation, with new initiatives driven by competition, modernization, efficiency, and compliance. But traditional approaches have been complex, difficult and brittle. You can't afford to wait or invest in long-running projects of unproven value. In this session, leveraging real-world examples, you will learn new approaches to optimize your data design in support of a focused business objective today, while building the foundation to support future business that minimizes cost and redesign. |
Speakers Paul Christensen, IBM |
How Mizuho Bank is Developing Its Next Data Lake with IBM Machine Learning Tuesday, 10:30 AM - 11:10 AM | Lagoon K | Session ID: 2448A To accelerate its digital strategy for business development, Mizuho Bank is expanding its data lake with a digital sandbox called “BigData lab” to evaluate and develop two new use cases: 1) Analysis of unstructured data (web logs/social data) with IBM Watson Explorer and Db2 Warehouse on Cloud to gain new insights; and 2) Machine learning to enhance the quality and speed of analytical applications. In this session, Mizuho Bank's IT manager summarizes the architecture and shares results of the PoCs, along with the bank's midterm plan for its next data lake with hybrid cloud and machine learning for advanced digital strategy. Speakers:
Analytics Lifecycle Demo Tuesday, Mar 20, 2018 | 11:30 AM - 11:50 AM | Cloud and Data Campus Think Tank A1 | Session ID: 9025A In this session, we'll use a demonstration to explain the importance of the components that make up the analytics lifecycle, and how they can all work together for a continuous cycle of improvement in a digital organization. Come see how the analytics lifecycle can work for you to drive operational improvements, increased revenue, and better data-driven decision-making. |
Speakers Brad Molzen, IBM |
Harnessing the Data Tsunami at Marriott Tuesday, Mar 20, 2018 | 11:30 AM - 12:10 PM | Breakers H | Session ID: 3287A Marriott is undertaking a quantum leap in the use of data and analytics to transform the guest experience and drive our vision to become the world's favorite travel company. Marriott is growing rapidly, fueled by acquisitions, organic growth and increased global travel: a perfect storm of torrents of data and real-time analytics. We will describe the challenges in connecting meaningfully with hundreds of millions of consumers across 100-plus countries with different languages, customs, and expectations. We provide insight into our Marriott data platform, including open-source and IBM products such as Db2 Warehouse and IBM Big SQL. We highlight the accomplishments and challenges as we deliver a high-performance, scalable data platform. |
Speakers Timothy G. Davis, IBM |
Data Told Me the Earth is Flat Tuesday, Mar 20, 2018 | 11:30 AM - 12:10 PM | Breakers K | Session ID: 7191A In this provocative talk, Jason will cover common follies in adopting a data-driven culture without appropriate data governance processes. |
Speaker Jason Federoff, USAA |
Honda R&D's Big Data Analytics Sandbox Activities with IBM Data Science Experience and IBM Cloud Tuesday, Mar 20, 2018 | 12:30 PM - 1:10 PM | Breakers H | Session ID: 1195A Honda R&D has chosen IBM Data Science Experience as a sandbox environment for data analytics, with an IBM Cloud database as a data lake. Honda R&D has succeeded with various PoCs and prototype projects in the past. Yet they faced challenges with sharing their knowledge, data and information, and had been struggling with a huge silo in big data analytics activities. Now, they have begun an enterprise-wide big data project to enable their activities and effort to become enterprise assets. In this sandbox, many engineers in Honda R&D can experiment by trial-and-error with analytics. Also, Honda is trying to create a process to collect all data in Honda R&D. This talk will share Honda's long analytics journey. |
Speakers Kyoka Nakagawa, Honda R&D Co., Ltd. |
Empower Your Business with a Hybrid Cloud Data Architecture Tuesday, Mar 20, 2018 | 12:30 PM - 1:10 PM | Cloud and Data Campus Large Theater A | Session ID: 8104A Among the best ways to deliver a business advantage are to be more efficient at new application development, and to leverage all your data sources for deeper insights. Success depends on your ability to converge data on the right data architecture for increased responsiveness. We will explain the benefits of a common database and data warehouse strategy so you can land transaction data, then easily analyze or move it without application rewrites. See how AMC Networks has partnered with IBM to build an enterprise hybrid cloud platform for industry-leading insights into audience preferences. Find out how to get your development projects off to a solid start on this flexible hybrid cloud platform that puts data to work for you. |
Speakers Thomas Chu, IBM |
Building a Hadoop Environment to Centralize Reporting and Analytics: Insight from Verizon Tuesday, Mar 20, 2018 | 1:30 PM - 1:50 PM | Cloud and Data Campus Think Tank B | Session ID: 8019A Let Verizon show you how to build a Hadoop environment from which you can also provide centralized reporting and analytics. This approach will remove the need for data marts, which can result in loss of data and erosion of data quality, not to mention the additional cost to deploy and manage data marts. A centralized, secured Hadoop environment also gives business users access to quality customer insights, expanding its value beyond a narrow group of IT users. |
Speakers Pandit J. Prasad, IBM |
ODPi: Unifying Open Metadata and Governance Tuesday, Mar 20, 2018 | 2:30 PM - 3:10 PM | Breakers K | Session ID: 7518A This panel will describe trends in metadata and governance, and will explain the work of the members of ODPi to build an open ecosystem for interfaces, repositories, tools and experts to collaborate and exchange content while adhering to governance guidelines and imperatives. Join this session if you are responsible for metadata and governance, or for implementing General Data Protection Regulation (GDPR) in your institution. Chief Data Officers and staff seeking to create value from their data will benefit from the discussion, as well as metadata and governance builders. |
Speakers Alan Gates, Hortonworks |
Big Data Integration with IBM InfoSphere Governance Catalog at TD Bank Wednesday, Mar 21, 2018 | 10:30 AM - 10:50 AM | Cloud and Data Campus Think Tank A1 | Session ID: 2438A TD Bank will walk through some of the industry-related challenges with expanding its footprint into a big data platform, and the need to establish a catalog of assets and data through IBM InfoSphere Governance Catalog (IGC). We will provide an overview of options and tooling for a big data platform and what a migration from legacy data warehouses into a useable data lake looks like at scale. We will speak to some of the challenges of integrating various file types, databases and sources, and discuss the need to catalog these assets to establish data lineage and data quality. Our lab will overlay some of the challenges experienced to date, and highlight the inner workings of IGC and its integration with the growing world of big data. |
Speakers Pirabu Pathmasenan, IBM |
AMC Networks' Enterprise Hybrid Cloud Solution with Advanced Analytics Wednesday, Mar 21, 2018 | 10:30 AM - 11:10 AM | Breakers H | Session ID: 3621A AMC Networks has been at the forefront of television, producing critically acclaimed shows like "Mad Men" and others. AMC thinks it is essential to uncover insights into audience preferences and viewing patterns to make smarter scheduling and market decisions, yet they face huge data volumes. Partnering with IBM, AMC built an enterprise hybrid cloud platform with industry-leading analytics and the new IBM Integrated Analytics System and Db2 Warehouse on Cloud. In this session, AMC shares their story and lessons learned. They also explain how the combination of high-performance analytics, including Data Science Experience and Apache Spark, gives business analysts the ability to conduct intense data investigations with speed and ease. |
Speakers Michael Kwok, IBM |
Data Told Me the Earth is Flat Wednesday, Mar 21, 2018 | 11:00 AM - 11:20 AM | Cloud and Data Campus Think Tank A1 | Session ID: 7191B In this provocative talk, Jason will cover common follies in adopting a data-driven culture without appropriate data governance processes. |
Speaker Jason Federoff, USAA |
Big Data Integration with IBM InfoSphere Governance Catalog at TD Bank Wednesday, Mar 21, 2018 | 11:00 AM - 11:20 AM | Cloud and Data Campus Think Tank A1 | Session ID: 2438B TD Bank will walk through some of the industry-related challenges with expanding its footprint into a big data platform, and the need to establish a catalog of assets and data through IBM InfoSphere Governance Catalog (IGC). We will provide an overview of options and tooling for a big data platform and what a migration from legacy data warehouses into a useable data lake looks like at scale. We will speak to some of the challenges of integrating various file types, databases and sources, and discuss the need to catalog these assets to establish data lineage and data quality. Our lab will overlay some of the challenges experienced to date, and highlight the inner workings of IGC and its integration with the growing world of big data. |
Speakers Pirabu Pathmasenan, IBM |
BayCare Provides Cognitive Patient Care with Watson Explorer Wednesday, Mar 21, 2018 | 11:30 AM - 12:10 PM | Breakers J | Session ID: 3239A BayCare, one of the largest community-based health systems in Florida, sought to improve their care management approach by implementing an IBM Watson-based solution for patient population identification. In this session, BayCare will share their journey in implementing a cognitive solution to better understand unstructured information, more effectively generate insights into patient care scenarios, and ultimately better inform clinical decision-making. Find out how BayCare leveraged IBM Watson Explorer and Healthcare Annotators to gain up to 14 hours of productivity per day and increase patient identification accuracy from 51% to 92%. |
Speakers Christine Livingston, Perficient |
Accessing Citizens Bank's Data Lake through the Eye of IBM Db2 Big SQL Wednesday, Mar 21, 2018 | 12:30 PM - 1:10 PM | Breakers H | Session ID: 8956A Hear how Citizens Bank end-users access a data lake using Db2 Big SQL from client tools and business intelligence (BI) engines. One of the most challenging issues in managing a data lake is to give end-users access to the data with acceptable performance via a familiar language and tools. This session will give an overview of the data lake architecture, configuration and security, and share the Citizens Data Lake Data Strategy. We will also describe the end-user experience using the Db2 Big SQL interface with various query tools and BI tools. |
Speakers Nagapriya Tiruthani, IBM |
Exploring IBM Db2 Event Store Developer Edition with Scala and Python Notebooks Thursday, 8:30 AM - 10:10 AM | Lab 21 | Session ID: 6415A Learn how to use IBM Db2 Event Store Developer Edition through Scala and Python Notebooks. The hands-on exercise will share an example of connecting to Event Store from an external application as well. The exercise will demonstrate how to perform a fast ingestion of data into Event Store and execute Spark queries against Event Store. Speakers: Ajaykumar B. Gupte, IBM
Real-World Convergence: Using Data Governance to Drive Your Test Data Management Processes Thursday, Mar 22, 2018 | 10:30 AM - 10:50 AM | Cloud and Data Campus Think Tank A1 | Session ID: 3015A Join this session for a discussion of how to integrate the two worlds of data governance and test data management. You will learn how to leverage the investment of information in your data governance catalog to drive test data management processes on multiple platforms. Discover how to utilize the metadata contained in both systems to ensure that sensitive information is tightly controlled and follows the policies outlined by your organization. |
Speakers Steven Beatty, State Farm Insurance Co. |
Scotia Bank's Governed Data Lake with Diyotta and IBM Information Governance Catalog Thursday, Mar 22, 2018 | 10:30 AM - 10:50 AM | Cloud and Data Campus Think Tank C | Session ID: 5614A International banking spans multiple countries, each with its operational and analytical data platforms to fulfill business, operations and regulatory requirements. Enterprise Data Lake is a centralized Hadoop repository that acts as a crucial information asset for the entire organization. However, there are multiple challenges in building an ecosystem that ensures that the platform can adhere to the enterprise standards in terms of security, governance and quality. Above all, this new architecture required a balance of tools and standards to make it a successful and continuously evolving strategy. In this session, we will talk about Scotia Bank's journey to establish a transparent global data lake using IBM IGC and Diyotta. |
Speakers Sanjay Vyas, Diyotta |
Readying the Data Lake for Insight with IBM BigIntegrate, Best Practices and Deployment Experience Thursday, Mar 22, 2018 | 10:30 AM - 10:50 AM | Cloud and Data Campus Think Tank D1 | Session ID: 3833A Successfully deploy IBM BigIntegrate as the premier data integration and governance tool within the Hadoop cluster. Manage hybrid scenarios spanning traditional RDBMS data sources to ingesting and loading into a data lake. Learn about new product features that will help make BigIntegrate even stronger and easier to use within Hadoop. See a demo of BigIntegrate's integration and deployment using Apache Ambari to demonstrate how simple it is to manage and configure BigIntegrate within your Hadoop cluster. Master YARN Resource Management by understanding how to effectively allocate resources to ETL flows. Listen to best practices and deployment tips from Aetna and hear how they successfully implemented BigIntegrate within their environment. |
Speakers Scott Brokaw, IBM |
Readying the Data Lake for Insight with IBM BigIntegrate, Best Practices and Deployment Experience Thursday, Mar 22, 2018 | 11:00 AM - 11:20 AM | Cloud and Data Campus Think Tank D1 | Session ID: 3833B Successfully deploy IBM BigIntegrate as the premier data integration and governance tool within the Hadoop cluster. Manage hybrid scenarios spanning traditional RDBMS data sources to ingesting and loading into a data lake. Learn about new product features that will help make BigIntegrate even stronger and easier to use within Hadoop. See a demo of BigIntegrate's integration and deployment using Apache Ambari to demonstrate how simple it is to manage and configure BigIntegrate within your Hadoop cluster. Master YARN Resource Management by understanding how to effectively allocate resources to ETL flows. Listen to best practices and deployment tips from Aetna and hear how they successfully implemented BigIntegrate within their environment. |
Speakers Scott Brokaw, IBM |
Large-Scale Data Lake Architecture for an Enterprise: A Real Life Experience Thursday, Mar 22, 2018 | 12:30 PM - 1:10 PM | Breakers H | Session ID: 1506A Data Lakes are hot! Data Lakes are powering mobile, social, cloud and analytics. Basically, everything exciting going on in IT right now is using a big data platform. On top of this, businesses can actually make money directly through data lakes by performing deep analytics. This session will discuss data lakes and the realities of what is going on today and where this is going in the future. We will introduce a proven data lake architecture in this space, and share how it fits into an enterprise environment. The presentation is based on a real-life implementation, and is meant to give attendees excellent content to take forward to your clients to have a discussion about how they can participate in building a data lake. |
Speakers Mani KANDASAMY, IBM |
Charting the Data Lake: From Business Issues to Hadoop with IBM Big SQL Thursday, Mar 22, 2018 | 12:30 PM - 1:10 PM | Cloud and Data Campus Small Theater C | Session ID: 2554A Many firms are shifting their use of NoSQL and Hadoop technologies to make them components of their mainstream data management environments. This talk describes best practices emerging from a range of engagements with clients in banking, insurance and healthcare that are trying to establish the appropriate degree of consistency and standardization across their Hadoop and NoSQL deployments. Areas addressed include: when/how to use different data models for deployment; normalization considerations; how various Hadoop-related technologies are being used together; tuning/performance considerations for data ingest and query; and how to achieve overall governance. This session will also look at expected future trends. |
Speakers Pat O'Sullivan, IBM |