BigData News Thursday, February 8
BigData News TLDR / Table of Contents
- Big help for your first big data project. | Informatica US
- Our big data management workbook covers strategies for managing data, identifying initiatives, and enlisting sponsors and stakeholders, to drive results.
- big data, email address, big data analytics, cutting-edge analytics initiatives, high-priced data scientists
- Machine Learning Summarized in One Picture
- Here is a nice summary of traditional machine learning methods, from Mathworks. I also decided to add the following picture below, as it illustrates a metho…
- supervised learning, supervised learning category, kernel density estimation, neural networks, Logistic Regression
- Artificial Intelligence May Have Cracked Freaky 600-Year-Old Manuscript
- Since its discovery over a hundred years ago, the 240-page Voynich manuscript, filled with seemingly coded language and inscrutable illustrations, has confounded linguists and cryptographers. Using artificial intelligence, Canadian researchers have taken a huge step forward in unraveling the document’s hidden meaning.
- Machine learning mega-benchmark: GPU providers (part 2) | RaRe Technologies
- We recently published a large-scale machine learning benchmark using word2vec, comparing several popular hardware providers and ML frameworks on pragmatic aspects such as cost, ease of use, stability, scalability and performance.
- GPUs, end GPUs segment, lower end GPUs, expensive AWS GPUs, high-end GPUs division
- Deep Learning for Natural Language Processing: Tutorials with Jupyter Notebooks
- At untapt, all of our models involve Natural Language Processing (NLP) in one way or another. Our algorithms consider the natural, written language of our users’ work experience and, based on…
- deep learning, Natural Language Processing, deep learning approaches, Deep learning algorithms, hands-on Jupyter notebooks
- What to Do With the Wealth of Information in Credit Card Transaction Descriptions
- Credit card transactions create a wealth of data about consumers. How can banks use that data and automated machine learning to drive better business results?
- credit risk, transaction descriptions, credit card transactions, credit risk models, personal credit card
- SpringML Achieves Machine Learning Partner Specialization in Google Cloud Partner Program
- PLEASANTON, Calif., Feb. 7, 2018 /PRNewswire/ — SpringML Achieves Machine Learning Partner Specialization in Google Cloud Partner Program.
- Google Cloud, machine learning, Google Cloud partner, Google Cloud specialization, Partner Specialization Program
- Big Data is a chicken-or-egg scenario on the farm
- It’s one of the most- and least-connected industries out there. While agriculture has remained one of the least digitized sectors, there is a bountiful harvest of data that could give some economic leverage back to the farmer. But what will it take for farmers to invest?
- big data, food safety, big data analytics, recent Big Data, data science life
- Video: Jane Robbins’ Testimony to Congress: On Consent and Student Data Privacy
- On January 30, 2018, Jane Robbins, a lawyer with the American Principles Project, testified to Congress’s House Education and Workforce Committee. She strongly opposed the recommendations of the Commission on Evidence-based Policy (CEP) that there should be an expansion of federal agencies’ access to data collected on U.S. citizens, or…
- data, college education data, discrete data point, American Principles Project, consent
- The Healthcare IT Consulting Daily
- Virtelligence: global healthcare consulting and technology services. By Chris Leon
Tweeted At: Thu Feb 08 03:45:01 +0000 2018
- How to Manage Big Data and Deliver Transformative Insights – Much-hyped but genuinely transformative, big data analytics is a technology that can fully deliver on sky-high promises—but only if the data is managed correctly.
- Too many high-priced data scientists spend most of their time just prepping bad data, rather than running business-redefining experiments.
Tweeted At: Thu Feb 08 12:48:31 +0000 2018
Author: Vincent Granville
- Here is a nice summary of traditional machine learning methods, from Mathworks.
- I also decided to add the following picture below, as it illustrates a method that was very popular 30 years ago but that seems to have been forgotten recently: mixtures of Gaussians.
- Note that you can use a mixture of any distributions, not just Gaussian, for instance, (data-driven) estimated distributions such as those based on kernel density estimation.
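As a rough illustration of the point about mixing arbitrary distributions, here is a minimal sketch in plain Python. It is not taken from the article: the sample values, weights and bandwidth are hypothetical, and the function names are chosen only for this example. It combines one Gaussian component with one data-driven component built by kernel density estimation.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def kde_pdf(x, samples, bandwidth):
    """Gaussian kernel density estimate: an equal-weight mixture with
    one Gaussian of width `bandwidth` centered on each sample."""
    return sum(gaussian_pdf(x, s, bandwidth) for s in samples) / len(samples)

def mixture_pdf(x, components):
    """Density of a mixture: the weighted sum of the component densities.

    `components` is a list of (weight, pdf) pairs whose weights sum to 1.
    """
    return sum(weight * pdf(x) for weight, pdf in components)

# Hypothetical data and weights, for illustration only.
samples = [4.2, 4.9, 5.1, 5.6, 6.3]
components = [
    (0.6, lambda x: gaussian_pdf(x, 0.0, 1.0)),   # parametric component
    (0.4, lambda x: kde_pdf(x, samples, 0.5)),    # data-driven component
]

density_at_5 = mixture_pdf(5.0, components)
```

Because each component is itself a valid density and the weights sum to 1, the mixture also integrates to 1, whichever families the components come from.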
Tweeted At: Thu Feb 08 16:01:13 +0000 2018
Publish Date: 2018-01-29T16:00:00+00:00
Author: George Dvorsky
- Since its discovery over a hundred years ago, the 240-page Voynich manuscript, filled with seemingly coded language and inscrutable illustrations, has confounded linguists and cryptographers.
- Using artificial intelligence, Canadian researchers have taken a huge step forward in unraveling the document’s hidden meaning. Named after Wilfrid Voynich, the Polish book dealer who procured the manuscript in 1912, the document is written in an unknown script that encodes an unknown language, a double-whammy of unknowns that has, until this…
- Some have even suggested the document is an elaborate hoax. For Greg Kondrak, an expert in natural language processing at the University of Alberta, this seemed a perfect task for artificial intelligence.
- For the second step, the researchers entertained a hypothesis proposed by previous researchers: that the script was created with alphagrams, that is, words in which the text has been replaced by an alphabetically ordered anagram (for example, an alphagram of GIZMODO would read DGIMOOZ).
- For the final step, the researchers deciphered the opening phrase of the manuscript, and presented it to colleague Moshe Koppel, a computer scientist and native Hebrew speaker.
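The alphagram idea mentioned above is easy to sketch in Python. This is only an illustration of the definition, not the researchers' actual decoding method; the small vocabulary and the helper names are hypothetical.

```python
from collections import defaultdict

def alphagram(word):
    """Return the letters of `word` in alphabetical order."""
    return "".join(sorted(word))

def build_index(vocabulary):
    """Map each alphagram to every vocabulary word that produces it.

    Reversing an alphagram is ambiguous in general, so each key holds a list.
    """
    index = defaultdict(list)
    for word in vocabulary:
        index[alphagram(word)].append(word)
    return index

print(alphagram("GIZMODO"))  # DGIMOOZ

# Hypothetical vocabulary, for illustration only.
index = build_index(["LISTEN", "SILENT", "GIZMODO"])
candidates = index[alphagram("ENLIST")]  # words sharing the same letters
```

The lookup shows why alphagrams are hard to undo: several words can collapse to the same sorted string, so a decoder needs extra context to choose among the candidates.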
Tweeted At: Thu Feb 08 15:53:30 +0000 2018
Publish Date: 2018-02-08T14:17:45+00:00
- We include the following HW platforms in this benchmark: Amazon Web Services AWS EC2, Google Cloud Engine GCE, IBM Softlayer, Hetzner, Paperspace and LeaderGPU.
- This HW provider list should be a good assortment of platforms with virtual instances (AWS, GCE), bare metal infrastructure (Softlayer), dedicated servers (Hetzner) and comparatively newer players specialized in providing GPUaaS (LeaderGPU, Paperspace).
- GPU prices change frequently, but at the moment, AWS provides K80 GPUs (p2 instances) starting at $0.9/hr, billed in one-second increments, whereas the more powerful Tesla V100 GPUs (p3 instances) start at $3.06/hr.
- Paperspace rivals GCE in the low cost league with rates for dedicated GPUs starting from Quadro M4000 at $0.4/hr to Tesla V100 at $2.3/hr.
- IBM Softlayer is one of the very few platforms on the market that provides bare metal servers with GPUs on a monthly and hourly basis.
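To make the hourly rates quoted above easier to compare, here is a small arithmetic sketch. The rates are the ones cited in the excerpt; the 10-hour job length is a hypothetical value, and the helper names are invented for this example. The benchmark itself measures real run times, which this sketch does not attempt to reproduce.

```python
# Hourly rates in USD as quoted in the excerpt above (they change frequently).
RATES_PER_HOUR = {
    "AWS K80 (p2)": 0.90,
    "AWS Tesla V100 (p3)": 3.06,
    "Paperspace Quadro M4000": 0.40,
    "Paperspace Tesla V100": 2.30,
}

def job_cost(rate_per_hour, hours):
    """Cost of running a job for `hours` at a flat hourly rate."""
    return rate_per_hour * hours

def break_even_speedup(cheap_rate, expensive_rate):
    """How many times faster the pricier GPU must finish the same job
    for its total cost to match the cheaper GPU's."""
    return expensive_rate / cheap_rate

# Hypothetical 10-hour job on the cheapest AWS option.
k80_cost = job_cost(RATES_PER_HOUR["AWS K80 (p2)"], 10)

# Speedup a V100 needs over a K80 on AWS to cost the same per job.
needed = break_even_speedup(
    RATES_PER_HOUR["AWS K80 (p2)"], RATES_PER_HOUR["AWS Tesla V100 (p3)"]
)
print(f"K80 job cost: ${k80_cost:.2f}, V100 break-even speedup: {needed:.1f}x")
```

The hourly price alone says little: a more expensive GPU is the cheaper choice whenever its measured speedup exceeds the break-even ratio.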
Tweeted At: Thu Feb 08 14:05:06 +0000 2018
Publish Date: 2017-11-13T03:36:45.495000+00:00
Author: Jon Krohn
- At untapt, all of our models involve Natural Language Processing (NLP) in one way or another.
- Our algorithms consider the natural, written language of our users’ work experience and, based on real-world decisions that hiring managers have made, we can assign a probability that any given job applicant will be invited to interview for a given job opportunity.
- We have found deep learning approaches to be uniquely well-suited to solving them.
- Deep learning algorithms: trivially include millions of model parameters that are free to interact non-linearly; can incorporate learning units designed specifically for use with sequential data, like natural language or financial time series data; and are typically far more efficient in production environments than traditional machine learning approaches for NLP. To share my love…
- Following on from my acclaimed Deep Learning with TensorFlow LiveLessons, which introduced the fundamentals of artificial neural networks, my Deep Learning for Natural Language Processing LiveLessons similarly embrace interactivity and intuition, enabling you to rapidly develop a specialization in state-of-the-art NLP.
Tweeted At: Wed Feb 07 19:45:12 +0000 2018
Publish Date: 2017-10-30T11:59:57+00:00
Author: Colin Priest
- This quickly equates to billions of credit card transactions every day, which is precisely the type of big data that banks and fintechs are starting to use to better understand their customers.
- In a new predictive model I developed using transaction descriptions, which have become a powerful predictor in our customers’ credit risk models, I have been able to exceed the accuracy of traditionally designed credit risk models by up to 100%.
- Consider the word cloud above, which visualizes the transaction descriptions in a credit risk model.
- Until recently, most banks and fintechs didn’t use transaction descriptions for scoring credit risk.
- And now there is automated machine learning, expert software that automatically builds complex machine learning algorithms from historical data, enabling credit risk modeling staff to quickly ramp up their techniques.
Tweeted At: Thu Feb 08 18:11:26 +0000 2018
Publish Date: 2018-02-07T12:00:00+00:00
- “We are encouraged by the rapid expansion of the machine learning market, and are delighted to deepen our partnership with Google, one of the world’s leaders in this area,” said Charles Landry, CEO of SpringML.
- SpringML’s apps and services apply machine learning to today’s most pressing business problems, so that customers receive insights that they can trust to drive business growth.
- SpringML proudly uses Google Cloud Platform technology as part of the solutions that we build for our clients the world over.
- Obtaining the rigorous Google Cloud specialization in machine learning helps further validate the mission of SpringML, as well as add credibility to our capabilities.
- “SpringML is constantly looking for ways to solve our customers’ business problems using AI and machine learning,” said Girish Reddy, CTO of SpringML.
Tweeted At: Sun Feb 04 00:00:01 +0000 2018
Publish Date: 2018-02-01T13:00:00+00:00
Author: Tony Baer (Ovum)
- There’s hardly a lack of scenarios for digitizing farming, and the common thread across all the use cases is that each involves Big Data. Among the obvious use cases, precision farming uses sensory data to tell farmers exactly where to plant and how much to water and how to fertilize…
- Food safety and spoilage prevention can be enhanced through use of smart devices that detect ambient humidity, temperature, chemical contamination, and the presence of gasses signaling the presence of harmful microbes. There are other Big Data use cases not necessarily specific to agriculture with the potential for eliminating cost and waste…
- When your GPS-equipped tractor or combine gets machine data from the equipment and agronomic data on field conditions, does the data belong to the farmer or the manufacturer? According to Todd J. Janzen, an attorney who specializes in agricultural law, there is nothing on the books that protects agricultural data.
- In this case, those transactions would consist overwhelmingly of IoT data from sensors that are deployed with food.
- According to a survey conducted for Wal-Mart in China, food shoppers indicated that they would be willing to pay premiums of up to 30% more if they had a smartphone app that could scan the QR code of the food item to identify the source and wholesomeness. For farmers, such programs…
Tweeted At: Wed Feb 07 17:01:38 +0000 2018
Publish Date: 2018-02-06T23:22:47+00:00
Author: Christel Swasey
- She strongly opposed the recommendations of the Commission on Evidence-based Policy (CEP) that there should be an expansion of federal agencies’ access to data collected on U.S. citizens, or that there should be permission given to researchers to access that data without citizens’ consent.
- Allowing the government to vacuum up mountains of such data and employ it for whatever purposes it deems useful, without the citizen’s consent or in some cases even his knowledge, conflicts deeply with this truth about the dignity of persons.
- Bear in mind that the analyses contemplated by the commission go further than merely sharing discrete data points among agencies; they involve creating new information about individuals via matching data, drawing conclusions, and making predictions about those individuals; so in essence the government would have information about a citizen even…
- Our founding principles, which enshrine consent of the governed, dictate that a citizen’s data belong to him rather than to the government.
- In closing, I reiterate my respect for the value of unbiased research as the foundation for policymaking, but speaking for the millions of parents with whom we work in various states whose concerns about education policy and data have been minimized by various levels of government for years, I urge…
Tweeted At: Thu Feb 08 05:49:25 +0000 2018
Author: Chris Leon
Top Big Data Courses
The Ultimate Hands-On Hadoop - Tame your Big Data! (31,889 students enrolled) By Sundog Education by Frank Kane
- Design distributed systems that manage "big data" using Hadoop and related technologies.
- Use HDFS and MapReduce for storing and analyzing data at scale.
- Use Pig and Spark to create scripts to process data on a Hadoop cluster in more complex ways.
- Analyze relational data using Hive and MySQL
- Analyze non-relational data using HBase, Cassandra, and MongoDB
- Query data interactively with Drill, Phoenix, and Presto
- Choose an appropriate data storage technology for your application
- Understand how Hadoop clusters are managed by YARN, Tez, Mesos, Zookeeper, Zeppelin, Hue, and Oozie.
- Publish data to your Hadoop cluster using Kafka, Sqoop, and Flume
- Consume streaming data using Spark Streaming, Flink, and Storm
Taming Big Data with MapReduce and Hadoop - Hands On! (13,894 students enrolled) By Sundog Education by Frank Kane
- Understand how MapReduce can be used to analyze big data sets
- Write your own MapReduce jobs using Python and MRJob
- Run MapReduce jobs on Hadoop clusters using Amazon Elastic MapReduce
- Chain MapReduce jobs together to analyze more complex problems
- Analyze social network data using MapReduce
- Analyze movie ratings data using MapReduce and produce movie recommendations with it.
- Understand other Hadoop-based technologies, including Hive, Pig, and Spark
- Understand what Hadoop is for, and how it works
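For readers unfamiliar with the MapReduce pattern the course covers, here is a minimal single-process sketch in plain Python. It deliberately uses neither MRJob nor Hadoop, it is not taken from the course, and the "user,movie,rating" line format and all function names are assumptions made for this example. It averages movie ratings in three phases: map, shuffle, reduce.

```python
from collections import defaultdict

def rating_mapper(line):
    """Map step: turn one 'user,movie,rating' line into a (movie, rating) pair."""
    _user_id, movie_id, rating = line.split(",")
    yield movie_id, float(rating)

def shuffle(pairs):
    """Shuffle step: group all values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def mean_reducer(_movie_id, ratings):
    """Reduce step: collapse one movie's ratings into their average."""
    return sum(ratings) / len(ratings)

def run_job(lines):
    """Run map, shuffle and reduce in sequence on an in-memory dataset."""
    pairs = (pair for line in lines for pair in rating_mapper(line))
    groups = shuffle(pairs)
    return {key: mean_reducer(key, values) for key, values in groups.items()}

# Hypothetical input lines, for illustration only.
averages = run_job(["u1,m1,4", "u2,m1,2", "u1,m2,5"])
```

On a real cluster the mapper and reducer run on different machines and the shuffle moves data between them; the mapper and reducer logic keeps the same shape.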