IBM Edge 2016 - Session Preview

IBM Edge 2016 | October 23 - October 27 | Mandalay Bay, Las Vegas, NV

The Truth about SQL and Data Warehousing on Hadoop

Session ID: DMT-1121 | 2016-10-26 | 10:00 AM - 10:45 AM

SQL and data warehousing on Hadoop continues to be a hot topic in 2016. With at least 24 SQL on Hadoop solutions available on the market, surely at least one of them is suitable for data warehousing workloads? How do you choose? Which workloads work and which don't? What does this mean for an existing data warehouse? In this session, you'll hear about recent IBM Lab performance studies comparing Hive, Impala, HAWQ, Spark SQL, and Big SQL, along with some lessons learned.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Paul Yip, IBM

Hadoop Security Primer

Session ID: DMT-1122 | 2016-10-24 | 02:00 PM - 02:45 PM

Hadoop platforms are made up of more than 20 components. Securing such a platform can be daunting. In this session, we will provide a model for Hadoop security, including options, considerations and best practices. The session includes an overview of basic information on Hadoop components and ODPi compliance. Regarding security, we start from the ground up.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Paul Yip, IBM
John Mertic, The Linux Foundation
Uday B. Kale, IBM

Sparkified dashDB

Session ID: DMT-1479 | 2016-10-25 | 02:00 PM - 02:45 PM

Learn about IBM's deeply integrated, open source based analytics in IBM dashDB, built on Apache Spark. This new dashDB capability combines your SQL-based descriptive analytics with advanced analytics methods such as machine learning in a very elegant fashion. You can use pre-built, Spark-based predictive SQL routines or run your custom Spark analytics and transformations via SQL. You can also run Spark workloads in dashDB via REST APIs, or use it interactively with Jupyter Notebooks, simply using dashDB as an operational, multi-tenant Spark analytics service that combines data persistence with data analysis. dashDB integrates Spark in a highly optimized way, effectively leveraging its own MPP architecture for Spark computation.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Torsten Steinbach, IBM

R Analytics Inside IBM Data Warehouse Offerings

Session ID: DMT-1490 | 2016-10-24 | 04:00 PM - 04:45 PM

With PureData for Analytics on-premise, IBM dashDB in public cloud and dashDB Local in private cloud, you get a compatible family of relational data warehouse offerings that all provide a very deep integration of R-based analytics. Be it interactive analytics with R using integrated RStudio, seamless push-down of complex operations using an R DataFrame API into the database, publishing of R-driven web applications, or the deployment of R logic into the data warehouse and then running it via a REST API or even running it in a scale-out parallel R computation engine: All of this is available to you. This session will introduce you to all these capabilities and options, and walk you through a set of usage examples.

Program

Sessions

Track

Data management

Level

Advanced

Speakers

Torsten Steinbach, IBM

Sensor Overload!: Taming the Raging Manufacturing Big Data Torrent

Session ID: DMT-1633 | 2016-10-24 | 01:00 PM - 01:45 PM

Hi-tech manufacturers produce more data than most types of businesses from a huge array of sensors and embedded diagnostic equipment on highly automated production lines. A hard disk drive contains hundreds of highly engineered components and the entire manufacturing process can take over six months, from developing the silicon wafers, heads, arms, platters, motors and logic circuits with embedded CPUs to testing the fully assembled drives. Imagine the countless streams of big data emanating from factories producing hundreds of thousands of drives per day. In this session, Seagate will discuss how it collects and transports this data into an Enterprise Hadoop cluster ready to be analyzed and modeled by data scientists and analysts.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Nicholas Berg, Seagate Technology

Spark for Dummies

Session ID: DMT-1658 | 2016-10-24 | 08:00 AM - 08:45 AM

This session is for technical and non-technical people who want to clearly understand what Hadoop and Spark are all about. The discussion will explain the technical concepts in an easy-to-understand way, so anyone can grasp how these new technologies work. To reinforce the explanations, you'll see easy-to-understand demos that everyone can follow. If you don't yet know what Hadoop or Spark is, this is the session for you!

Program

Sessions

Track

Data management

Level

Introductory

Speakers

Luis Reina Julia, IBM

Birds of a Feather: The New Way to Work in Hybrid Data Warehousing

Session ID: DMT-1717 | 2016-10-24 | 03:00 PM - 03:45 PM

There is just too much data and too many new applications to fit it all on your traditional data warehouse. So what can you do? Keep core analytics on your traditional data warehouse and use new technologies for new analytics, self-service, short-lived needs, and for data that is born on the cloud. Come ask all your questions and hear from our expert panel in this "birds of a feather" session.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Matthias FUNKE, IBM
James Cho, IBM
John Park, IBM
Hemant Suri, IBM

Ten Use Cases to Get Started with Modern Data Warehousing

Session ID: DMT-1771 | 2016-10-24 | 08:00 AM - 08:45 AM

Learn ten new ideas for getting started with cloud and private cloud data warehousing. In this session, we will provide you with ideas to get started with your data warehouse environment. We have worked with many customers looking to leverage a modern data warehouse in the public cloud and private cloud, and we would like to share ten use cases that are proven to propel you into the future. These use cases range from seeking drastic performance increases to leveraging a warehouse for all of your new, innovative applications.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Aislinn Shea, IBM

Enterprise Analytics for IBM IMS Data Using Apache Spark

Session ID: DMT-2019 | 2016-10-25 | 03:00 PM - 03:45 PM

IBM's enterprise clients depend on the qualities of service provided by IBM z Systems: security, scalability and availability. With respect to analytics, they need capabilities that match, and that can deal with the large-scale data processing that keeps these businesses humming. Apache Spark is a natural fit in this space. It provides analytics processing that is on par with large-scale data processing. You can pull IBM Information Management System (IMS) data into Spark, and use data science programming languages like Scala, Python and R to gain new insights about your IMS data. Join this session to discover how your operational IMS data can become a key asset in your analytics solutions.

Program

Sessions

Track

Data management

Level

Introductory

Speakers

Richard Tran, IBM

Justified Big Data Performance: Transform Your Business with IBM DB2 Analytics Accelerator

Session ID: DMT-2041 | 2016-10-25 | 03:00 PM - 03:45 PM

Big data is causing huge challenges as businesses try to capture analytical insights for their existing and new large data sources. When evaluating potential big data solutions there are many platforms, databases and application integration options. The IBM DB2 Analytics Accelerator (IDAA) solution has many unique advantages over other platforms that need to be understood, emphasized and leveraged for the best business solution. This presentation will take you through the steps of an IDAA evaluation for tens of billions of rows, utilizing the IBM Data Studio tools and the IDAA virtual server, and will cover the potentially huge CPU savings and the justification for your analytical platform for any type of business solution.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Dave Beulke, Pragmatic Solutions, Inc

Business Drivers for Cloud Versus On-Premise Analytics

Session ID: DMT-2232 | 2016-10-26 | 11:00 AM - 11:45 AM

Business requirements have never been as demanding as they are now. Combined with the fact that information technology is at its most significant inflection point in more than 40 years, clients must quickly adapt the way they manage their data and analytic processes in order to gain optimal competitive advantage in the market. This session will outline the steps to identify the appropriate environment for enterprise analytics: on-premise, cloud, hybrid and more. We will highlight use cases across various applications and processes, and discuss advantages and trade-offs to consider across scalability, performance, security and costs. Explore the options, both short- and long-term, that will provide a platform for innovation.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Namik Hrle, IBM

Big Data Tooling: IBM Data Server Manager Provides Big SQL Web Tooling

Session ID: DMT-2241 | 2016-10-27 | 02:00 PM - 02:45 PM

IBM Data Server Manager and Big SQL support select Apache Hadoop platforms and are included with several IBM BigInsights Offerings. Use IBM Data Server Manager in support of Apache Hadoop to explore your Big SQL database, monitor the performance of your Hadoop database, execute queries, tune selected queries, and more.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Anson Kokkat, IBM

Using a Metadata Catalog to Get Cognitive about Your Data

Session ID: DMT-2469 | 2016-10-27 | 10:00 AM - 10:45 AM

Metadata is "data about data." Further (from Wikipedia): "The database catalog of a database instance consists of metadata in which definitions of database objects such as base tables, views (virtual tables), synonyms, value ranges, indexes, users, and user groups are stored. The main purpose of metadata is to facilitate discovery of relevant information..." IBM Information Management System (IMS) has a metadata catalog, enabling mobile and cloud clients to discover and extend the use of data and business processes from IMS-based systems of record. Learn how to implement the IMS Catalog, and see how this strategic catalog is used to support many newer IMS features and overall IMS simplification.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Nancy G. Stein, IBM

End-to-End Analytics in the Cloud: A Case Study

Session ID: DMT-2626 | 2016-10-24 | 01:00 PM - 01:45 PM

How do you run real-world analytics in the cloud? You need cloud-based capabilities for data ingestion, a fit-for-purpose data lake and a variety of analytics options ranging from Spark and machine learning to cognitive. You need skill-appropriate tools to support roles like the Data Scientist, the Business Analyst and the Data Engineer. In this session, we look at a retail customer case study that ties all these aspects together.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

John Thomas, IBM

Getting Started with Big SQL Features, Including Spark Integration

Session ID: DMT-3512 | 2016-10-24 | 04:00 PM - 06:30 PM

Big SQL is an industry-standard SQL query interface for big data. This query engine is derived from decades of IBM R&D investment in RDBMS, including database parallelism and query optimization. Big SQL supports familiar tools and applications via standard JDBC and ODBC drivers. Through the technical introduction in this hands-on lab, participants will learn about recent and core features available in Big SQL, including how to query data stored in Hadoop's HDFS, Hadoop's HBase, as well as how to leverage our just-released Spark integration.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Xabriel J Collazo Mojica, IBM
Sameer Jorapur, IBM

How to Build a Data Lake: A Case Study for Data Engineers

Session ID: DMT-2777 | 2016-10-26 | 10:00 AM - 10:20 AM

Data Scientists and Business Analysts work with data in a data lake to derive insights. Data Engineers ensure that the lake is populated with the right data, under control and governance. This session looks at capabilities available in the cloud for Data Engineers to work with a data lake. We use a case study to explore how to ingest different types of data into the data lake, store it, and make it available for analytics.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

John Thomas, IBM

Share Data, Assets and Experience for Better Insights with the IBM Analytics Platform

Session ID: DMT-2811 | 2016-10-26 | 11:00 AM - 11:45 AM

To develop timely and compelling analytics solutions, data scientists, data engineers and business analysts must collaborate. Our team implemented three multi-channel retail business analytics solutions on different cloud analytics platforms. How do data scientists communicate and share insights with business analysts? How do business analyst needs translate into tasks for the data engineer? Learn from our experience which analytics platform offers new levels of simplicity and end-to-end integration across multiple data sources, data flows and data-centric roles within your organization.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Siva Anne, IBM

IBM BigInsights Roadmap and Direction

Session ID: DMT-2900 | 2016-10-25 | 01:00 PM - 01:45 PM

Learn about the direction and strategy for BigInsights, IBM's open data platform for Apache Hadoop and Apache Spark. This session will provide the details about the latest features and capabilities, as well as explain where we are headed in the near future.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Pandit Prasad, IBM
Priya Krishnan, IBM
Rohan Vaidyanathan, IBM

SQL and IBM Big SQL on Hadoop for DB2 DBAs

Session ID: DMT-2903 | 2016-10-27 | 09:00 AM - 09:45 AM

Many organizations are starting to adopt Hadoop. How can you, as a DB2 DBA, leverage your skills to support Hadoop? In this session, we will cover the things you need to know to have an intelligent conversation about SQL on Hadoop. This includes related technologies like Hive, YARN, HBase/Phoenix, Spark SQL, Big SQL... and how it all comes together. Come discover how your DB2 skills can be of exceptional value for Hadoop.

Program

Sessions

Track

Data management

Level

Introductory

Speakers

Paul Yip, IBM

The IT Economics of Hadoop Environments

Session ID: DMT-2929 | 2016-10-27 | 12:00 PM - 12:45 PM

Hadoop is generally a cost-effective way to deal with big data. A closer look at costs reveals that the operational costs of a Hadoop environment can be quite different from upfront acquisition costs. Some factors that drive total cost, such as accurate sizing for performance, are visible. But many cost factors, such as SQL compliance and ease of management, are hidden. This session examines the total cost of Hadoop environments based on customer studies with IBM BigInsights.

Program

Sessions

Track

Data management

Level

Introductory

Speakers

Allen Kliethermes, IBM

How Close Are Different Spark Platforms? Are They Clones, Siblings or Distant Relations?

Session ID: DMT-2982 | 2016-10-24 | 10:00 AM - 10:45 AM

Spark is at the forefront of big data initiatives, with cloud vendors lining up claiming support for Spark. However, there's an underlying assumption that all Spark platforms are essentially the same, and that deploying Spark applications on one platform is the same as on another. This session looks at experiences using Spark for core analytic tasks on different cloud vendor platforms. We highlight the technical challenges faced while trying to implement Spark, from initial ease of use, capabilities to easily ingest and combine different types of data, use of notebooks to develop and deploy analytic insights, as well as other issues encountered. Learn about key differences across Spark platforms, and their usability and performance impacts.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Brian Haan, IBM

IBM BigInsights Big SQL Best Practices and Troubleshooting Tips

Session ID: DMT-3214 | 2016-10-24 | 03:00 PM - 03:45 PM

IBM BigInsights Big SQL leverages IBM's strength in SQL engines to provide seamless ANSI SQL access to data across any system from Hadoop, via JDBC or ODBC, whether that data exists in Hadoop or a relational database. This means that developers familiar with the SQL programming language can access data in Hadoop without having to learn new languages or skills. It presents a structured view of your existing data, using an optimal execution strategy. You can leverage MapReduce parallelism when needed for complex data sets and avoid it when it hinders, using direct access for smaller, low-latency queries. This session will walk through an introduction to Big SQL, and look at best practices and troubleshooting tips.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Deepak RANGARAO, IBM
Zach Zakharian, IBM

Citizens Bank Data Lake Implementation: Selecting BigInsights ViON Spark/Hadoop Appliance for ETL

Session ID: DMT-3260 | 2016-10-24 | 02:00 PM - 02:45 PM

Citizens Bank, formerly part of the Royal Bank of Scotland, is implementing a BigInsights Hadoop Data Lake with PureData System for Analytics (Netezza) to support all of its internal data initiatives. The goal is to provide an improved experience for customers and to grow market share. Along their ETL journey, we've used Netezza SQL, Hadoop and finally IBM BigIntegrate and BigInsights. Testing BigIntegrate on BigInsights yielded the productivity, maintenance and performance that Citizens was looking for, and this all came prepackaged in the ViON Hadoop Appliance that was rolled into its data centers, greatly simplifying entry into the Hadoop world.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Dana Rafiee, Destiny Corporation
John DiFranco, Citizens Bank

IBM BigInsights: Simplifying the Journey to Enlightenment

Session ID: DMT-3333 | 2016-10-26 | 08:00 AM - 08:45 AM

When should a business embark on the big data journey? Is Hadoop as complex as it seems? How do we get data to the business faster? This presentation is for those considering implementing a big data strategy. The typical BI professional relies on traditional methods when it comes to building out operational reporting structures, and may find the concept of a data lake intimidating. Partnering with IBM's BigInsights will simplify the implementation of a big data platform. The speaker will share a journey revealing the use case that initiated the need to implement a big data platform, and explaining how business users can quickly gain data intelligence utilizing the advanced, ad hoc tools included with the BigInsights solution.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Linda Zimmerman, Delhaize America

Next-Generation Architecture: Creating a Modern Data Infrastructure for a Cognitive Future

Session ID: DMT-3415 | 2016-10-26 | 08:00 AM - 08:45 AM

The next generation of information architecture is radically different from the database-driven, on-premise constructs of earlier generations. Leading organizations are pushing the boundaries of data architecture with a next-generation platform designed for the fast, voluminous and varied data of today. This session, based on an IBM Institute for Business Value expert perspective, will guide executives towards a clear understanding of the critical components in a modern data infrastructure.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Daniel Sutherland, IBM
Andy MARTORELLI, IBM

First Steps Towards a Data Lake: Insight from Southwest Power Pool

Session ID: DMT-3451 | 2016-10-26 | 09:00 AM - 09:45 AM

Creating a data lake infrastructure is a journey. This session will discuss Southwest Power Pool's data lake vision, why we selected the IBM BigInsights product, phase 1 implementation of the data lake, and future plans.

Program

Sessions

Track

Data management

Level

Not applicable

Speakers

Srinivas Kolluru, Southwest Power Pool
Russell Mason, Southwest Power Pool

Modernizing Your Data Warehouse with Hadoop

Session ID: DMT-3507 | 2016-10-24 | 05:00 PM - 05:45 PM

Join this session to learn about how managing unstructured data with open source analytic tools provides significant performance benefits while delivering unprecedented business insights. The discussion will also cover tips and tricks for optimizing your Hadoop infrastructure.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Dwaine SNOW, IBM

Spark Today and Tomorrow: How IBM Can Help You on Your Journey

Session ID: DMT-3513 | 2016-10-25 | 05:00 PM - 05:45 PM

This session will explore Spark capabilities and future development directions. The discussion will outline how Spark can help organizations enhance their analytical capabilities using this high-powered, open source offering.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Niru Anisetti, IBM

Data without Governance Is a Liability: Data Lake Best Practices

Session ID: DMT-3515 | 2016-10-27 | 11:00 AM - 11:45 AM

Data enablement is a crucial capability for successfully turning data into actionable information, i.e., creating business value from data. The lack of this capability is a significant stressor to data management and analytics delivery across industries. Adding worry to challenge, data professionals have to face this fact daily: data without governance is a material liability. This talk outlines some common assumptions and blind spots related to data lake implementation strategies which, once understood, can dramatically increase data enablement across an organization. Focus areas include components of the data lake, the value of governed data and best practices for architecture, technology and deployment.

Program

Sessions

Track

Data management

Level

Advanced

Speakers

David Stevens, IBM

Accelerate Your Data Science Delivery with Integrated Notebooks and IBM BigInsights

Session ID: DMT-3516 | 2016-10-27 | 01:00 PM - 01:45 PM

Notebooks are super-charging data science because they provide data scientists with a UI for Apache Spark on a Hadoop Cluster. Notebooks accelerate data science because they support collaboration, enable reproducible research, and empower data scientists to do deeper data exploration, and create powerful visualizations. Notebooks on Hadoop will allow data science and analytics professionals to push and pursue advanced analytics and data science in directions they have yet to fully imagine. This talk provides a quick overview of what a Notebook is, and then demos analytics capabilities on Hadoop using Spark and Notebooks on an IBM BigInsights cluster. Come learn about why Notebooks are the future of data science and analytics.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Rohan Vaidyanathan, IBM
Chris Snow, IBM

Constant Contact: An Online Marketing Leader's Data Lake Journey

Session ID: DMT-3517 | 2016-10-26 | 11:00 AM - 11:45 AM

Join this session to learn how Constant Contact, a leading online marketing company, uses IBM BigInsights to deliver value to its clients.

Program

Sessions

Track

Data management

Level

Introductory

Speakers

Matt Laudato, Constant Contact

Getting Value Out of Your Hadoop Cluster with IBM BigInsights 4.2

Session ID: DMT-3518 | 2016-10-26 | 09:00 AM - 09:45 AM

BigInsights 4.2 is IBM's latest release of its industry-leading, enterprise-level open analytics platform for Hadoop. This release puts the full range of analytics for Hadoop, Spark and SQL into the hands of advanced analytics and data science teams on a single platform. Specifically for IBM Open Platform (IOP), it includes new Apache components, currency updates to existing components and integration with Apache Spark. For value-adds like IBM Big SQL, it introduces a wide range of new functional capabilities and performance enhancements for RDBMS offload and consolidation. This talk will cover new feature highlights, including how to upgrade to this new release.

Program

Sessions

Track

Data management

Level

Advanced

Speakers

Hebert Pereyra, IBM
SCOTT GRAY, IBM

IBM Open Platform: A Technical Overview

Session ID: DMT-3519 | 2016-10-24 | 11:00 AM - 11:45 AM

The IBM Open Platform (IOP) is IBM's Apache Hadoop distribution, which includes Spark. 100% open source, it includes the most recently available components. Not all distributions are the same! Join this session to learn what IOP includes and what benefits the various components provide. We will also discuss new components that are being added and ways for you to access the latest updates.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Pandit Prasad, IBM
Di Li, IBM
Demetrios Dimatos, IBM

Using Spark to Overcome the Force of Data Gravity

Session ID: DMT-3559 | 2016-10-27 | 10:00 AM - 10:45 AM

Data has gravity. That is, as data accumulates, it builds mass; and as it builds mass, there is a greater likelihood that additional services, applications and analytics will be attracted to this data. Additionally, as data mass evolves, services and applications are more likely to be "drawn to the data," rather than vice versa. Enter Spark, which will access data where it resides, process it, and then return the answer or write some data back out. This session will discuss how Spark is impacted by data gravity, and how that impacts where data should be stored, and/or Spark should be run to optimize performance and efficiency.
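
The "drawn to the data" idea corresponds to data-locality scheduling in Hadoop and Spark: a task is preferably run on a node that already stores its input, so the data does not have to cross the network. The sketch below is a simplified illustration under stated assumptions: it uses a greedy, in-order policy and hypothetical names (`place_tasks`, node labels `n1`, task labels `t1`). It is not Spark's scheduler; Spark, for instance, can wait a configurable time for a local slot before falling back to a less local one.

```python
def place_tasks(block_locations, free_slots):
    """Hypothetical greedy, locality-aware task placement.

    block_locations: task -> set of nodes that hold the task's input block.
    free_slots: node -> number of free task slots (not modified).
    Returns task -> (node, "local" | "remote"), or (None, None) if no slot is free.
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is unchanged
    placement = {}
    for task, nodes in block_locations.items():
        # Prefer a node that already stores the input: move compute to the data.
        local = [n for n in sorted(nodes) if slots.get(n, 0) > 0]
        if local:
            node, kind = local[0], "local"
        else:
            # Fall back to any free node; the input must then be read remotely.
            remote = [n for n in sorted(slots) if slots[n] > 0]
            if not remote:
                placement[task] = (None, None)
                continue
            node, kind = remote[0], "remote"
        slots[node] -= 1
        placement[task] = (node, kind)
    return placement


placement = place_tasks(
    {"t1": {"n1", "n2"}, "t2": {"n1"}, "t3": {"n1"}},
    {"n1": 1, "n2": 1, "n3": 1},
)
print(placement)
```

Because the policy is greedy and in order, `t1` takes `n1` even though it could have run locally on `n2`, which forces `t2` to read its input remotely. Real schedulers add smarter placement or waiting to reduce exactly this kind of avoidable data movement.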

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Paul Yip, IBM
Dwaine SNOW, IBM

A Data Science Introduction for Database Girls/Guys

Session ID: DMT-3561 | 2016-10-24 | 05:00 PM - 05:45 PM

The focus of this session is to help database administrators and data analysts get a glimpse of the world of predictive modeling and machine learning... without the deep math. In this session, we will provide foundational knowledge on predictive modeling and machine learning, including how the data is shaped to support this work. By knowing what data science and advanced analytics professionals do in their day-to-day work, and what they care about, you'll be able to have a semi-intelligent conversation about how to best work with these users.

Program

Sessions

Track

Data management

Level

Introductory

Speakers

Jacques Roy, IBM
Willem Hendriks, IBM

Ensuring High Availability within Your Hadoop Cluster

Session ID: DMT-3562 | 2016-10-27 | 08:00 AM - 08:45 AM

This session will focus on how to enable high availability for critical components in your IBM BigInsights installation. It covers configuration and management of HA solutions for metadata in your Hadoop cluster. This includes components in the IBM Open Platform (IOP) stack, such as HDFS, Resource Managers, HBase and Hive metastore. We will go over recovery processing after a master node failure. The session will also explore server setup procedures for switching to a standby Ambari server in the event of a primary server failure. For IBM BigInsights value-adds, we will introduce the Big SQL HA feature with automatic metadata and log replication, and go over the management of primary and secondary head nodes.
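
The general idea behind log replication between a primary and a secondary head node can be shown with a toy model. This is a hypothetical sketch, not how Big SQL HA is implemented: the classes `Node` and `HAPair`, synchronous replication before acknowledgment, and manual `failover()` are all assumptions chosen for illustration. The point it shows is that if every acknowledged write is already in the secondary's log, promoting the secondary after a primary failure loses no acknowledged metadata.

```python
class Node:
    """Hypothetical head node: an append-only log plus the state it implies."""

    def __init__(self, name):
        self.name = name
        self.log = []
        self.state = {}
        self.alive = True

    def apply(self, record):
        key, value = record
        self.log.append(record)
        self.state[key] = value


class HAPair:
    """Hypothetical primary/secondary pair with synchronous log shipping."""

    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary

    def write(self, key, value):
        if not self.primary.alive:
            raise RuntimeError("primary is down; call failover() first")
        record = (key, value)
        self.primary.apply(record)
        if self.secondary.alive:
            # Replicate before acknowledging, so an acknowledged write
            # survives a later primary failure.
            self.secondary.apply(record)
        return "ack"

    def failover(self):
        """Promote the secondary if the primary has failed; return the active node's name."""
        if not self.primary.alive and self.secondary.alive:
            self.primary, self.secondary = self.secondary, self.primary
        return self.primary.name


pair = HAPair(Node("head1"), Node("head2"))
pair.write("table_t1", "schema v1")
pair.primary.alive = False  # simulate a master node failure
print(pair.failover(), pair.primary.state)
```

Synchronous replication trades some write latency for the guarantee above; an asynchronous design would acknowledge sooner but could lose the most recent records on failover.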

Program

Sessions

Track

Data management

Level

Advanced

Speakers

Hebert Pereyra, IBM
George Lapis, IBM

Loading Data into a Business Intelligence Cloud Using IBM BigInsights on Cloud

Session ID: DMT-3563 | 2016-10-26 | 10:00 AM - 10:45 AM

Partitioning a table on one or more columns allows data to be organized in such a way that querying the table with predicates that reference the partitioning columns results in better performance. Let's take a look at how IBM Big SQL's LOAD HADOOP statement can be used to load data into a partitioned table. Also, we'll discuss dynamic and static partitioning in the context of a cloud deployment.
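
Why predicates on partitioning columns help, and how static and dynamic partitioning differ, can be sketched with a small in-memory model. This is an illustrative Python sketch, not Big SQL's LOAD HADOOP implementation; the class `PartitionedTable` and its methods are hypothetical names. A static load fixes the partition values for the whole load, a dynamic load derives them from each row, and a scan skips (prunes) every partition whose values contradict an equality predicate on a partitioning column.

```python
from collections import defaultdict


class PartitionedTable:
    """Hypothetical in-memory partitioned table. Each partition is keyed by a
    tuple of partition-column values, mirroring directory layouts such as
    year=2016/month=10 on HDFS."""

    def __init__(self, partition_cols):
        self.partition_cols = list(partition_cols)
        self.partitions = defaultdict(list)

    def load_static(self, rows, partition_values):
        """Static partitioning: the caller fixes the partition values for the
        whole load, so every row lands in the same partition."""
        key = tuple(partition_values[c] for c in self.partition_cols)
        for row in rows:
            self.partitions[key].append({**row, **partition_values})

    def load_dynamic(self, rows):
        """Dynamic partitioning: each row's own column values decide which
        partition it is written to."""
        for row in rows:
            key = tuple(row[c] for c in self.partition_cols)
            self.partitions[key].append(dict(row))

    def scan(self, **predicates):
        """Return (matching rows, number of partitions read). Equality
        predicates on partitioning columns prune whole partitions before
        any of their rows are read."""
        result, partitions_read = [], 0
        for key, rows in self.partitions.items():
            values = dict(zip(self.partition_cols, key))
            if any(c in values and values[c] != v for c, v in predicates.items()):
                continue  # pruned: no row in this partition can match
            partitions_read += 1
            result.extend(r for r in rows
                          if all(r.get(c) == v for c, v in predicates.items()))
        return result, partitions_read


table = PartitionedTable(["year"])
table.load_static([{"id": 1}, {"id": 2}], {"year": 2015})
table.load_dynamic([{"id": 3, "year": 2016}, {"id": 4, "year": 2016}])
rows, read = table.scan(year=2016)
print([r["id"] for r in rows], read)  # [3, 4] 1
```

A filter on a non-partitioning column (for example `table.scan(id=1)`) cannot prune anything and reads every partition, which is why the choice of partitioning columns should follow the predicates your queries actually use.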

Program

Sessions

Track

Data management

Level

Advanced

Speakers

Sampada Basarkar, IBM
Meline Nikoghossian, IBM

Tuning Hadoop and Spark to Improve Cluster Performance

Session ID: DMT-3564 | 2016-10-24 | 09:00 AM - 09:45 AM

Is your Hadoop or Spark cluster running slower than dial-up? Do jobs take a long time to complete? Do queries require more time than desired? Do data ingest or export rates need improvement? Want to move from the Hadoop slow lane into the fast lane? Run (don't walk) to this session! Learn how to isolate and resolve bottlenecks that hurt cluster performance; understand resource limits for CPU, I/O and network bandwidth; monitor resource usage within the cluster; and get expert advice on tuning your cluster, including recommendations for Linux kernel, network communications, file system, JVM, HDFS, Hadoop and Spark frameworks.

Program

Sessions

Track

Data management

Level

Advanced

Speakers

Stewart Tate, IBM Corp
Eric Yang, IBM

NoSQL 101: A Field Guide to the World of Modern Data Stores

Session ID: DMT-3565 | 2016-10-24 | 08:00 AM - 08:45 AM

Choose your database wisely... There are many types of databases and data analysis tools to choose from when building your application. Should you use a relational database? How about a key-value store? Maybe a document database? Is a graph database the right fit? What about polyglot persistence and the need for advanced analytics? If you feel a bit overwhelmed, don't worry. This session lays out the various database options and analytic solutions available to meet your app's unique needs. You'll see how data can move across databases and development languages, so you can work in your favorite environment without the friction and productivity loss of the past.

Program

Sessions

Track

Data management

Level

Introductory

Speakers

Lawrence Weber, IBM

IBM DataWorks Data Access: Are You Open for Data?

Session ID: DMT-3582 | 2016-10-27 | 11:00 AM - 11:45 AM

IBM DataWorks makes the promise of data access a reality by integrating ingestion, data preparation, storage and governance with a powerful shop-for-data experience. This provides a foundation of trusted data access that enables deep collaboration and expanded access to new data sources and data types with control and context. DataWorks integrates this trusted access layer with all user experiences and analytics processing capabilities to drive analytics and infuse insights into day-to-day business processes.

Program

Sessions

Track

Data management

Level

Intermediate

Speakers

Harsha Kapre, IBM