BigData News Sunday, April 1: Machine learning, introductory Python material, deep learning frameworks & more…

BigData News TLDR / Table of Contents

  • A list of 10 useful Github repositories made up of IPython (Jupyter) notebooks, focused on teaching data science and machine learning.
  • This post is made up of a collection of 10 Github repositories consisting in part, or in whole, of IPython (Jupyter) Notebooks, focused on transferring data science and machine learning concepts.
  • They go from introductory Python material to deep learning with TensorFlow and Theano, and hit a lot of stops in between.
  • So here they are: 10 useful IPython Notebook Github repositories, in no particular order. This warmup notebook is from postdoctoral researcher Randal Olson, who uses the common Python ecosystem data analysis/machine learning/data science stack to work with the Iris dataset.
  • This is an eclectic mix, put together by John Wittenauer, with notebooks for Python implementation of Ng’s Coursera course exercises, Udacity’s TensorFlow-oriented deep learning course exercises, and the Spark edX course exercises.

Tags: machine learning, introductory Python material, deep learning, IPython Notebook, IPython Notebook Github

  • When we originally created the repo, there were many little tips and tricks we had to use to ensure we were using the same model between frameworks and it was done in an optimal way.
  • Of course, while it is tempting to compare different frameworks using metrics such as speed and inference time, they aren't meant to suggest anything about the overall performance of the framework, since they omit important comparisons such as: help and support, availability of pre-trained models, custom layers and architectures, data-loaders, debugging,…
  • There are many popular deep learning frameworks that are leveraged in the community, and this is one effort to help AI developers and data scientists leverage different deep learning frameworks as applicable.
  • A related effort is the Open Neural Network Exchange (ONNX), which is an open source interoperability standard for transferring deep learning models between frameworks.
  • In contrast, the repo we are releasing as a full version 1.0 today is like a Rosetta Stone for deep learning frameworks, showing the model building process end to end in the different frameworks.

Tags: deep learning frameworks, different frameworks, deep-learning frameworks

  • This post summarizes the contents of a recent O’Reilly article outlining a number of methods for interpreting machine learning models, beyond the usual go-to measures.
  • An article on machine learning interpretation appeared on O’Reilly’s blog back in March, written by Patrick Hall, Wen Phan, and SriSatish Ambati, which outlined a number of methods beyond the usual go-to measures.
  • I approach complex machine learning model interpretability as an advocate of automated machine learning, since I feel the two techniques are flip sides of the same coin: if we are going to be using automated techniques to generate models on the front-end, then devising and employing appropriate ways to simplify and…
  • If the surrogate model is created by training, say, a simple linear regression or a decision tree with original input data and predictions from the more complex model, the characteristics of the simple model can then be assumed to be an accurately descriptive stand-in of the more complex model.
  • Sensitivity analysis: this technique helps to determine whether intentionally perturbed data, or similar data changes, modify model behavior and destabilize the outputs; it is also useful for investigating model behavior for particular scenarios of interest or corner cases. Global variable importance measures: typically the domain of…
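
The surrogate-model and sensitivity-analysis ideas summarized above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the method from the O'Reilly article: `complex_model` is a made-up stand-in for an opaque model, and the helper names `fit_linear_surrogate`, `r_squared`, and `sensitivity` are hypothetical. The surrogate is a two-feature linear regression trained on the original inputs and the complex model's predictions; its R² against those predictions gives a rough sense of how far the simple model can be trusted as a stand-in.

```python
import random


def complex_model(x1, x2):
    # Hypothetical opaque model (an assumption for this sketch): in practice
    # this would be the predict function of the model being explained.
    return 3.0 * x1 + 0.5 * x2 ** 2


def fit_linear_surrogate(rows, preds):
    """Least-squares fit of preds ~ b0 + b1*x1 + b2*x2; returns (b0, b1, b2)."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    my = sum(preds) / n
    s11 = s12 = s22 = s1y = s2y = 0.0
    for (x1, x2), y in zip(rows, preds):
        a, b, c = x1 - m1, x2 - m2, y - my
        s11 += a * a
        s12 += a * b
        s22 += b * b
        s1y += a * c
        s2y += b * c
    # Solve the 2x2 normal equations on centered data.
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def r_squared(rows, preds, coefs):
    """Fidelity of the surrogate: share of prediction variance it reproduces."""
    b0, b1, b2 = coefs
    my = sum(preds) / len(preds)
    ss_res = sum((y - (b0 + b1 * x1 + b2 * x2)) ** 2
                 for (x1, x2), y in zip(rows, preds))
    ss_tot = sum((y - my) ** 2 for y in preds)
    return 1.0 - ss_res / ss_tot


def sensitivity(model, rows, feature, delta):
    """Mean absolute change in output when one feature is shifted by delta."""
    total = 0.0
    for row in rows:
        shifted = list(row)
        shifted[feature] += delta
        total += abs(model(*shifted) - model(*row))
    return total / len(rows)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(500)]
preds = [complex_model(*r) for r in rows]  # labels come from the complex model

coefs = fit_linear_surrogate(rows, preds)
fidelity = r_squared(rows, preds, coefs)
sens = [sensitivity(complex_model, rows, j, 0.1) for j in (0, 1)]

print("surrogate coefficients (b0, b1, b2):", coefs)
print("surrogate fidelity (R^2):", fidelity)
print("sensitivity to a +0.1 shift per feature:", sens)
```

Note that the surrogate is fit to the complex model's predictions rather than to ground-truth labels, so its coefficients describe the model, not the data. A low R² would suggest the linear stand-in should not be read as an explanation at all.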

Tags: human domain knowledge, Surrogate models, variable importance measures, surrogate model, Wen Phan

Top Big Data Courses

The Ultimate Hands-On Hadoop - Tame your Big Data! (31,889 students enrolled)

By Sundog Education by Frank Kane
  • Design distributed systems that manage "big data" using Hadoop and related technologies.
  • Use HDFS and MapReduce for storing and analyzing data at scale.
  • Use Pig and Spark to create scripts to process data on a Hadoop cluster in more complex ways.
  • Analyze relational data using Hive and MySQL
  • Analyze non-relational data using HBase, Cassandra, and MongoDB
  • Query data interactively with Drill, Phoenix, and Presto
  • Choose an appropriate data storage technology for your application
  • Understand how Hadoop clusters are managed by YARN, Tez, Mesos, Zookeeper, Zeppelin, Hue, and Oozie.
  • Publish data to your Hadoop cluster using Kafka, Sqoop, and Flume
  • Consume streaming data using Spark Streaming, Flink, and Storm


Taming Big Data with MapReduce and Hadoop - Hands On! (13,894 students enrolled)

By Sundog Education by Frank Kane
  • Understand how MapReduce can be used to analyze big data sets
  • Write your own MapReduce jobs using Python and MRJob
  • Run MapReduce jobs on Hadoop clusters using Amazon Elastic MapReduce
  • Chain MapReduce jobs together to analyze more complex problems
  • Analyze social network data using MapReduce
  • Analyze movie ratings data using MapReduce and produce movie recommendations with it.
  • Understand other Hadoop-based technologies, including Hive, Pig, and Spark
  • Understand what Hadoop is for, and how it works
