# BigData News Saturday, March 17 Prediction interval forecasts, Probabilistic forecasting, Open source & more…

## BigData News TLDR / Table of Contents

- Probabilistic Forecasting: Learning Uncertainty
- The majority of industry and academic numeric predictive projects deal with deterministic or point forecasts of expected values of a random variable given som…
*prediction interval forecasts, probabilistic forecasting, multiple quantile forecasts, point forecasts, probabilistic forecasts*

- 18 Big Data tools you need to know!!
- In today’s era of digital transformation, big data has given organizations an edge to analyze customer behavior & hyper-personalize every interact…
*open source, Apache Hadoop, big data, Apache Hadoop data, big data tools*

- 5 Things to Know About Machine Learning
*data preparation, machine learning*

- Probabilistic forecasting comes in three main flavors, the estimation of quantiles, prediction intervals, and full density functions.
- If F_t is strictly increasing, the quantile q(t, τ) with proportion τ ∈ [0, 1] of the random variable Y_t is uniquely defined as the value x such that P(Y_t < x) = τ, or equivalently as the inverse of the distribution function, q(t, τ) = F_t^(-1)(τ).
- A prediction interval produced at time t for future horizon t+k is defined by its lower and upper bounds, which are the quantile forecasts q(t+k, α_l) and q(t+k, α_u).
- When the future density function is assumed to take a certain form, this is called parametric probabilistic forecasting; here, the forecasts are produced by a SARIMA model assuming a normal density.
- This can be done either by gathering a finite set of quantile forecasts with chosen nominal proportions spread over the unit interval (the most common approach being quantile regression), or through direct distribution estimation methods such as kernel density estimation.
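As a sketch of the quantile ideas above (the function names and data here are illustrative, not from the original article), the pinball loss that quantile regression minimizes, and an empirical prediction interval built from quantiles at nominal proportions α_l and α_u, can be written in a few lines of NumPy:

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss: penalizes under- and over-prediction
    asymmetrically according to the target proportion tau."""
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

def prediction_interval(samples, alpha_l=0.05, alpha_u=0.95):
    """Form a central prediction interval from predictive samples by
    taking empirical quantiles at the chosen nominal proportions."""
    return np.quantile(samples, alpha_l), np.quantile(samples, alpha_u)
```

With τ = 0.5 the pinball loss reduces to half the mean absolute error, which is why the median is the point forecast that minimizes it.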

@KirkDBorne:

Probabilistic Forecasting — Learning Uncertainty: https://t.co/xLtzRV1rrV #abdsc #BigData #DataScience… https://t.co/KvKNrTdTxc

- While Apache Hadoop is the most well-established tool for analyzing big data, there are thousands of big data tools out there.
- Flume is a framework for populating Hadoop with data from web servers, application servers and mobile devices.
- It allows for a unified view of all data in Hadoop clusters and lets diverse tools, including Pig and Hive, process any data elements without needing to know where in the cluster the data is physically stored.
- JSON: Many of today’s NoSQL databases store data in the JSON (JavaScript Object Notation) format that has become popular with Web developers.
- Kafka: a distributed publish-subscribe messaging system that offers a solution capable of handling all data flow activity and processing these data on a consumer website.
- This type of data (page views, searches, and other user actions) is a key ingredient of the current social web.
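To make the JSON point concrete, here is a minimal, hypothetical example of the kind of user-activity record (a page-view event, as mentioned above) that a NoSQL document store might hold or a Kafka message might carry, serialized with Python's standard `json` module. The field names are illustrative:

```python
import json

# Hypothetical user-activity event: the kind of semi-structured record
# that document stores keep and that Kafka moves between producers and
# consumers.
event = {"user_id": 42, "action": "page_view", "path": "/products/17"}

payload = json.dumps(event)      # serialize for storage or transport
restored = json.loads(payload)   # parse on the consuming side
```

Because the record is self-describing, any consumer can parse it without a shared schema, which is part of why JSON became the default interchange format for this kind of event data.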

@granvilleDSC:

Checking out “18 Big Data tools you need to know!!” on #DataScience Central: https://t.co/mdltXnT7om #DataScientist… https://t.co/r0j27KhfQK

- It is fairly well established that data preparation takes a disproportionate amount of time in a machine learning task.
- Some of the best machine learning advice I can offer: since you are ultimately destined to spend so much of your time preparing data for The Big Show, resolving to be the very best data preparation professional around is a pretty good goal.
- For some more practical insight into data preparation, the original post links a couple of places to start out.
- So you have modeled some data with a particular algorithm, spent time tuning your hyperparameters, performed some feature engineering and/or selection, and you’re happy that you have squeezed out…
- Other times, random splits of data will be useful; it depends on further factors such as the state of the data when you get it (is it split into train/test already?)
- Jupyter Notebooks have become a de facto data science development tool, with most people running notebooks locally or via some other configuration-heavy method such as in Docker containers, or in a virtual machine.
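As a minimal illustration of the random train/test split mentioned above (assuming scikit-learn is available; the data here is synthetic):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # ten samples, two features
y = np.arange(10)                  # matching targets

# Hold out 20% of the rows at random; fixing random_state makes the
# split reproducible across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
```

When the data arrives already split into train/test sets, or has temporal ordering, a random split like this is not appropriate; that is the kind of "further factor" the point above refers to.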

@kdnuggets:

5 Things to Know About #MachineLearning https://t.co/JnLBN9rsNp https://t.co/jaGQ7AMz7E

### Top Big Data Courses

#### The Ultimate Hands-On Hadoop - Tame your Big Data! (31,889 students enrolled)

*By Sundog Education by Frank Kane*

- Design distributed systems that manage "big data" using Hadoop and related technologies.
- Use HDFS and MapReduce for storing and analyzing data at scale.
- Use Pig and Spark to create scripts to process data on a Hadoop cluster in more complex ways.
- Analyze relational data using Hive and MySQL
- Analyze non-relational data using HBase, Cassandra, and MongoDB
- Query data interactively with Drill, Phoenix, and Presto
- Choose an appropriate data storage technology for your application
- Understand how Hadoop clusters are managed by YARN, Tez, Mesos, Zookeeper, Zeppelin, Hue, and Oozie.
- Publish data to your Hadoop cluster using Kafka, Sqoop, and Flume
- Consume streaming data using Spark Streaming, Flink, and Storm

#### Taming Big Data with MapReduce and Hadoop - Hands On! (13,894 students enrolled)

*By Sundog Education by Frank Kane*

- Understand how MapReduce can be used to analyze big data sets
- Write your own MapReduce jobs using Python and MRJob
- Run MapReduce jobs on Hadoop clusters using Amazon Elastic MapReduce
- Chain MapReduce jobs together to analyze more complex problems
- Analyze social network data using MapReduce
- Analyze movie ratings data using MapReduce and produce movie recommendations with it.
- Understand other Hadoop-based technologies, including Hive, Pig, and Spark
- Understand what Hadoop is for, and how it works
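The MapReduce pattern these courses teach can be sketched in plain Python: a mapper emits (key, value) pairs and a reducer aggregates them per key. This is a single-process stand-in for what a framework like MRJob would distribute across a Hadoop cluster, using the classic word-count example:

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    """Map step: emit a (word, 1) pair for each word in the line."""
    for word in line.split():
        yield word.lower(), 1

def reducer(pairs):
    """Reduce step: sum the counts for each distinct key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big tools", "big data"]
result = reducer(chain.from_iterable(mapper(l) for l in lines))
# result == {"big": 3, "data": 2, "tools": 1}
```

In a real cluster, the framework shuffles the mapper output so that all pairs sharing a key reach the same reducer, which is what lets the aggregation scale across machines.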
