Roaring Elephant

Bite-Sized Big Tech

Episode 308 – Infra as Code: Building Blocks with Rosemary Wang

Today we are joined by our special guest Rosemary Wang, Developer Advocate at HashiCorp and author of the book "Infrastructure as Code: Patterns and Practices" and the topic is, quite obviously: Infrastructure as Code. In this second episode with Rosemary, we continue with more information on the components you'll likely need. Head over to the Manning Publications website for more information about Rosemary's book: "Infrastructure as Code: Patterns and Practices". Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-08-02 14:00:00

Episode 307 – Infra as Code: Introduction with Rosemary Wang

Today we are joined by our special guest Rosemary Wang, Developer Advocate at HashiCorp and author of the book "Infrastructure as Code: Patterns and Practices" and the topic is, quite obviously: Infrastructure as Code. In this first episode with Rosemary, we start with an introduction to the topic. Head over to the Manning Publications website for more information about Rosemary's book: "Infrastructure as Code: Patterns and Practices". Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-07-26 14:00:00

Episode 306 – Personal Branding

As a final episode in our recent coverage of Career advice, we're tackling the concept of Personal branding and why it is important for your career. And we finish off with potential sources of help and inspiration when you are indeed thinking of making a switch. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-07-19 14:00:00

Episode 305 – Our Own Careers – Present Day

After a couple of episodes (trying) to give career advice, we figured it made sense to talk about our own career paths. Now both Dave and Jhon have been "around", so this will be a mini series inside the careers track, and here is the final installment bringing it to the present day. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-07-12 14:00:00

Episode 304 – Our Own Careers – The Middle Years

After a couple of episodes (trying) to give career advice, we figured it made sense to talk about our own career paths. Now both Dave and Jhon have been "around", so this will be a mini series inside the careers track, and here is the second installment covering our middle years. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-07-05 14:00:00

Episode 303 – Our Own Careers – The Early Days

After a couple of episodes (trying) to give career advice, we figured it made sense to talk about our own career paths. Now both Dave and Jhon have been "around", so this will be a mini series inside the careers track, and here is the first installment covering our early days. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-06-28 14:00:00

Episode 302 – To be or not to be… a Manager

Continuing our series on career choices, today we go deep on the age-old question: does career equal a manager position or is that a misconception? With your co-hosts having very different goals, this is sure to be a good one! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-06-21 14:00:00

Episode 301 – The (Career) Niche Trap

After the small pause, looking back at 300 episodes of Roaring Elephant goodness, we're back this week dispensing career advice. This time we focus on the risks of allowing yourself to become stuck in a niche role that could hinder your career chances. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-06-14 14:00:00

Episode 300 – The Road to 300…

Three centuries of Roaring Elephant podcasts! Well, in number of episodes then... We don't actually believe the universe would be able to survive an actual 300 years of our content! :) Why don't you join us on this relaxing episode where we take a slow meandering walk down memory lane... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-06-07 14:00:00

Episode 299 – (Career) Change is Good!

Following on from last week's discussion about career choices, we now discuss why, in our opinion at least, fear of change is rarely necessary and more often than not a cause for career stagnation! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-05-31 14:00:00

Episode 298 – The winding (career) path

As we have been active in the amazing world of IT for a while now and both of us are involved with the process of hiring people, Dave as a hiring manager and Jhon as a mentor and frequent candidate interviewer, we are going to try to share some of our experience and (let's be brave) maybe even some wisdom! Unrelated to this episode, you may have caught a glimpse of Dave in the wild recently when he talked about the key tenets of successful presales leadership at demofest. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-05-24 14:00:00

Episode 297 – Open Source Misunderstandings: OSS Vendors aren’t Different.

As a conclusion to our continuing series about misunderstandings that (still) surround Open Source, today we are covering the misconception that Open Source vendors are just the same as regular vendors of proprietary software. This may come as a bit of a shock to some, but since good OSS vendors are well embedded in the open source community and culture, should it really come as a surprise that this also influences the interactions and sales cycles? Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-05-17 14:00:00

Episode 296 – Open Source Misunderstandings: Community Doesn’t Matter!

In our continuing series about misunderstandings that (still) surround Open Source, today we are covering the misconception that it does not matter if your enterprise open source vendor is part of, and contributing a healthy amount to, the open source community. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-05-10 14:00:00

Episode 295 – Open Source Misunderstandings: Open Source is Fragmented!

In our continuing series about misunderstandings that (still) surround Open Source, today we are covering the misconception that Open Source is hopelessly fragmented and therefore impossible to use in an Enterprise setting. While we'll agree to a part of it, we would suggest it's actually one of Open Source's many strengths and nothing to be afraid of! For this episode, we found inspiration at the following web page: https://enterprisersproject.com/article/2022/2/4-common-misunderstandings-about-enterprise-open-source-software Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-05-03 14:00:00

Episode 294 – Open Source Misunderstandings: Open Source is Just Cheap!

In our continuing series about misunderstandings that (still) surround Open Source, today we are covering the misconception that the only reason you should look at open source is because it's cheap. While the license cost may indeed be lower or absent, there are reasons why the total cost of ownership may well be higher than that of proprietary software, but the benefits can also far outweigh that... For this episode, we found inspiration at the following web page: https://enterprisersproject.com/article/2022/2/4-common-misunderstandings-about-enterprise-open-source-software Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-04-26 14:00:00

Episode 293 – Open Source Misunderstandings: Do-It-Yourself is More Costly?

In a new series about misunderstandings that (still) surround Open Source, today we are covering the misconception that DIY is just more expensive. And while that may be true in pure monetary terms, switching to Open Source also brings a number of great advantages that, while hard to express in hard currency, are definitely worth having! For this episode, we found inspiration at the following web page: https://enterprisersproject.com/article/2022/2/4-common-misunderstandings-about-enterprise-open-source-software Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-04-19 14:00:00

Episode 292 – Open Source Misunderstandings: Security is “Their” Problem

In a new series about misunderstandings that (still) surround Open Source, today we are covering the misconception that security is "their" problem. That's simply not the case. When you leverage the awesome power that is Open Source, it is definitely your responsibility to make sure you have done everything you could to make sure your product or service is secure! For this episode, we found inspiration at the following web page: https://enterprisersproject.com/article/2022/2/4-common-misunderstandings-about-enterprise-open-source-software Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-04-12 14:00:00

Episode 291 – Digital Adoption Platforms?

Out with the old, in with the new... snake-oil? Apparently the industry has not learned from the mistakes that were made with Digital Transformation projects, but never fear: the solution to all your ailments is here with the bright new marketing hype of Digital Adoption Platforms! Or are we just too cynical? You be the judge! For this episode, we got our inspiration from the following web sources: https://www.toolbox.com/tech/innovation/guest-article/the-enterprise-kingmaker-digital-adoption-platforms/ https://www.c-bstatesman.com/live-updates-digital-adoption-platform-dap-software-market-2022-development-stat-walkme-applearn-useriq-appcues-whatfix/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-04-05 14:00:00

Episode 290 – CyberSecurity Needs YOU!

We are under constant threat! And by "we", we mean our Tech environment, of course. Malware is rampant and software isn't always the best prepared to fight off the endless onslaught. So who do we blame? The CISO, the Cert team, that security person? Well, maybe we need to look a little closer to home... Inspiration for this episode was found here: (But we really only used the title of that first link, to be honest, 'cause the article itself is rather "meh"....) Who should own cybersecurity in your organization? - TechBullion; https://risk3sixty.com/2020/09/14/how-to-design-a-security-program-organizational-structure-that-supports-your-business-goals-part-2/ https://www.dayblink.com/wp-content/uploads/2019/04/Structuring-the-InfoSec-Org_vFINAL_04.pdf Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-03-29 14:00:00

Episode 289 – The Sticky Employee Conundrum

Hiring junior employees and training them to become productive members of the team has gotten "out of style" due to the perceived or real problems with longer term employee retention. In this episode, we take a deeper look at this "sticky" topic. For this episode, we found inspiration in the following web resource: How to get reskilled employees to stick around - Protocol Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-03-22 15:00:00

Episode 288 – The Future of Open Source Security?

In recent times it has become more and more obvious that the security of Open Source software is something to be taken seriously. For this reason, the Linux Foundation, under the banner of the OpenSSF, is experimenting with a new project, prosaically named the Alpha & Omega Project. The Alpha & Omega project was recently introduced to the world on a webinar that is currently available here. And the project is looking for community members. Get involved in the OpenSSF Working Groups: Securing Critical Projects (identifying critical open source projects, which feed into Alpha & Omega); Best Practices for OSS Developers (building high quality “leave behind” material for maintainers when we engage); Vulnerability Disclosures (improve the vulnerability disclosure process). Join the Alpha-Omega announcements mailing list. Slack channel: #alpha_omega at slack.openssf.org. Fill out the interest form. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-03-15 15:00:00

Episode 287 – AI vs Business & Public Clouds Growing up

Some Cloud updates today where apparently AWS is no longer lagging behind and also, should AI influence business operations, or the other way around? There's something to be said for both opinions, and we don't hold back! This episode was inspired by the following online resources: It’s time for a new cloud price war: https://www.protocol.com/newsletters/protocol-enterprise/aws-price-war-microsoft-google?rebelltitem=1#rebelltitem1 Do Companies Have To Adjust To AI Or Vice Versa?: https://www.forbes.com/sites/naveenjoshi/2022/02/24/do-companies-have-to-adjust-to-ai-or-vice-versa/?sh=409cfd742e6a Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-03-08 15:00:00

Episode 286 – The Importance of Open Source

It's not a secret that Open Source is widely used in organizations everywhere. At least, for our listeners this should definitely be no secret! In this episode, we discuss a couple of stats that Dave found on the internet and we're not quite agreeing with them... Shocker! I know! In this episode, we used the following online resource as inspiration: Open source is more important than ever, say developers. Here's what's driving adoption: https://www.zdnet.com/article/open-source-is-more-important-than-ever-say-developers-heres-why/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-03-01 15:00:00

Episode 285 – Cloud Costs Nightmare Statistics

Cloud statistics apparently cause insomnia, or at least the internet claims that to be true... As true defenders of a good night's sleep, your co-hosts will get to the bottom of this for you and see if we can keep the night terrors at bay! In this final part, we look at budget and costs and why it's never as rosy as you might have predicted! The following website was the inspiration for this episode: https://www.thesslstore.com/blog/19-cloud-computing-statistics-that-will-keep-you-awake-at-night/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-02-22 15:00:00

Episode 284 – Cloud Security Nightmare Statistics

Cloud statistics apparently cause insomnia, or at least the internet claims that to be true... As true defenders of a good night's sleep, your co-hosts will get to the bottom of this for you and see if we can keep the night terrors at bay! In this second part, we concentrate on the topic of Cloud Security, or lack thereof! The following website was the inspiration for this episode: https://www.thesslstore.com/blog/19-cloud-computing-statistics-that-will-keep-you-awake-at-night/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-02-15 15:00:00

Episode 283 – Cloud Adoption Nightmare Statistics

Cloud statistics apparently cause insomnia, or at least the internet claims that to be true... As true defenders of a good night's sleep, your co-hosts will get to the bottom of this for you and see if we can keep the night terrors at bay! In this first part, we concentrate on the topic of Cloud Adoption. The following website was the inspiration for this episode: https://www.thesslstore.com/blog/19-cloud-computing-statistics-that-will-keep-you-awake-at-night/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-02-08 15:00:00

Episode 282 – Enterprise I.T. Knowledge Atrophy

In today's episode, we started talking about the trends towards no-code and low-code, got a bit sidetracked towards commoditized A.I. and ended up talking mostly about the apparent voluntary atrophy of internal I.T. knowledge due to outsourcing knowledge to applications. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-02-01 15:00:00

Episode 281 – Open Source Adoption Journey

Open Source is definitely making a lot of headway in the corporate world and today the Roaring Elephant takes a look at how far this adoption of the new reality has actually progressed. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-01-25 15:00:00

Episode 280 – Solo Maintainer OSS Projects @Risk!

A cry for help or wanton vandalism by Open Source Project maintainers? Last week, the maintainer of the popular NPM packages colors and faker was responsible for breaking compatibility far and wide. As it turns out, this was a cry for attention for the plights of small but important OSS projects that are used widely by "Billion Dollar mega corporations" while the few (or even single) maintainers are merely scraping by to pay the bills. A little bit of research proves that this is far from a solitary issue and neither is it a recent development. Over the last decade, many projects that you and I use every day have almost disappeared because the maintainer just could not spend the time for free anymore. How did it get this far and how could this possibly be improved? Listen to this episode to hear our thoughts on the matter. Here are the articles we used in this episode: https://www.bleepingcomputer.com/news/security/dev-corrupts-npm-libs-colors-and-faker-breaking-thousands-of-apps/ https://increment.com/open-source/the-rise-of-few-maintainer-projects/ https://www.propublica.org/article/the-worlds-email-encryption-software-relies-on-one-guy-who-is-going-broke https://www.informationweek.com/it-life/ntp-s-fate-hinges-on-father-time- https://arstechnica.com/information-technology/2014/04/tech-giants-chastened-by-heartbleed-finally-agree-to-fund-openssl/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-01-18 15:00:00

Episode 279 – The Lidle Cloud?

Cloud Computing is everywhere and apparently more companies feel it's a good idea to claim a piece of that fluffy pie. Recently, the parent company of the well-known supermarket chain Lidl decided to join in the fun and they have now solidified their initiative by gobbling up a cloud security firm.... Is this a sign of things to come, or is the new Lidl cloud destined to stay... little? We also take a quick look at how the UK government is going to penalize unsafe password policies... A good thing, for sure, but why is the suggested law already full of security holes? Here are the articles that were referenced in this episode: Lidl parent company becomes cloud provider | Computable.nl; Schwarz Group Acquires Israeli Hybrid Cloud Cybersecurity Innovator XM Cyber (prnewswire.com); Huge fines and a ban on default passwords in new UK law - BBC News. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-01-11 15:00:00

Episode 278 – The 2021 Open Source Jobs Report (2)

The yearly job report from The Linux Foundation was released in cooperation with edX. In last week's first part we went over the findings of the Linux Foundation and in this episode, we'll cover the edX angle a bit more. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2022-01-04 15:00:00

Episode 277 – The 2021 Open Source Jobs Report

The yearly job report from The Linux Foundation was released in cooperation with edX. In this first part we'll go over the findings of the Linux Foundation and in the next episode, we'll cover the edX angle a bit more. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-12-28 15:00:00

Episode 276 – IBM Fails @ Open Source Cloud?

When we saw IBM put up a "Guide to open source development in the Cloud" website, we were expecting so much.... And we got so little... While we've often lauded IBM for their open source initiatives, this "guide" reads incomplete and pretty much a waste of space. See if you agree with us? Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-12-21 15:00:00

Episode 275 – High Performance podCast (2)

Today we are going deeper, higher, farther and better than we ever went before: it's HPC time at the Roaring Elephant and since we definitely do not qualify as experts on that particular matter, we are overjoyed to welcome Walter Lioen and Valeriu Codreanu, who are Surf's rocket scientists when it comes to High Performance Computing! Snellius Top500 & Green500: Since this episode was recorded, the new TOP500 and Green500 have been announced and the newly launched Dutch supercomputer Snellius scored a respectable 313th and 403rd place on the TOP500 and an absolutely incredible 6th spot in the Green500! Huge congratulations to Walter and Valeriu and the entire team at Surf for this outstanding result! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-12-14 15:00:00

Episode 274 – High Performance podCast

Today we are going deeper, higher, farther and better than we ever went before: it's HPC time at the Roaring Elephant and since we definitely do not qualify as experts on that particular matter, we are overjoyed to welcome Walter Lioen and Valeriu Codreanu, who are Surf's rocket scientists when it comes to High Performance Computing! As it turned out, we had way too much content for a single episode so look forward to part 2, next week. Snellius Top500 & Green500: Since this episode was recorded, the new TOP500 and Green500 have been announced and the newly launched Dutch supercomputer Snellius scored a respectable 313th and 403rd place on the TOP500 and an absolutely incredible 6th spot in the Green500! Huge congratulations to Walter and Valeriu and the entire team at Surf for this outstanding result! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-12-07 15:00:00

Episode 273 – John Mertic’s Linux Foundation Update

Another year of The Roaring Elephant Podcast has passed and as has become a bit of a tradition, we invite the incredible source of energy that is John Mertic over to talk to us about all the great stuff The Linux Foundation is doing. Every year, again and again, John succeeds in surprising us with all the different initiatives that are under the stewardship of the Linux Foundation, and this year was definitely not an exception! Here are the different initiatives John covered today:
ASWF: The Academy Software Foundation
The Academy Software Foundation (ASWF) was founded in August 2018 by the Academy of Motion Picture Arts & Sciences (AMPAS) as a result of a two-year survey by the Science and Technology Council into the use of Open Source Software (OSS) across the motion picture industry. The survey found that almost 84% of the industry uses open source software, particularly for animation and visual effects, but challenges including siloed development, managing multiple versions of OSS libraries (versionitis) and varying governance and licensing models need to be addressed in order to ensure a healthy open source community. The mission of the ASWF is to increase the quality and quantity of open source contributions by developing a governance model, legal framework and community infrastructure that lowers the barrier to entry for developing and using open source software.
LF Energy
LF Energy is an open source foundation focused on the power systems sector, hosted within The Linux Foundation. LF Energy provides a neutral, collaborative community to build the shared digital investments that will transform the world’s relationship to energy. LF Energy brings together stakeholders to solve the complex, interconnected problems associated with the decarbonization of energy by using resilient, secure and flexible open source software. The digitalization of power systems enables the abstraction of the world’s largest machine into composable software defined infrastructure.
Digitalization also means that operators can “network electrons” by orchestrating the metadata about an electron in ways never before possible. Digitalization facilitates a radically energy-efficient future. When every electron counts, renewable and distributed energy provides humanity with the tools to address climate change by decarbonizing the grid, powering the transition to e-mobility, and supporting the urbanization of world populations.
O3DF: The Open 3D Foundation
The O3D Foundation (O3DF) is the umbrella open source organization that houses the O3DE project. The O3DF can be thought of as a sister foundation to CNCF or OpenJS, which serve as umbrella organizations under the Linux Foundation to house multiple open source projects and pool resources amongst the member community to sustain these projects. O3DE is the first project of the O3DF, but it is expected for the O3DF to house multiple projects in the future.
And some more Linux Foundation goodness: the LFX Platform and the Open Mainframe Project.
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-11-30 15:00:00

Episode 272 – Is HPE Saying “NO” to Public Cloud?

In a reaction to a "Cloud First" initiative from the UK government, HPE has released a manifesto claiming this is a bad thing. Now, the government plans are already a couple of years old, so why is HPE speaking out against it only now? Well, that's only one of the many questions we have! Here is a link to the HPE Manifesto. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-11-23 15:00:00

Episode 271 – Faster, Safer & Cheaper Cloud Infra? (2)

Unikernels, Nano Kernels... It was all new to your co-hosts but this tech has been around for a while now and promises faster, safer & cheaper Cloud Infrastructure. Is this just hype or is it everything you could possibly hope for? We invited Ian Eyberg, CEO of NanoVMs and subject matter expert, to tell us all about it. Get more information from the following websites that were mentioned during the interview with Ian: NanoVMs; Nanos.org; OPS - Easily Build and Run Unikernels. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-11-16 15:00:00

Episode 270 – Faster, Safer & Cheaper Cloud Infra?

Unikernels, Nano Kernels... It was all new to your co-hosts but this tech has been around for a while now and promises faster, safer & cheaper Cloud Infrastructure. Is this just hype or is it everything you could possibly hope for? We invited Ian Eyberg, CEO of NanoVMs and subject matter expert, to tell us all about it. Get more information from the following websites that were mentioned during the interview with Ian: NanoVMs; Nanos.org; OPS - Easily Build and Run Unikernels. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-11-09 15:00:00

Episode 269 – OSS Data Tech that CAN!

We found some statistics around Open source data sources and we're very keen to find that OSS Data Tech that can solve all your needs, today! Or we'll find out one more time that statistics are less trustworthy than we should like... For this episode, we got some inspiration from this (not so good) article on Datanami. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-11-02 15:00:00

Episode 268 – Coding Like a Pro! (2)

Jort Rodenburg, author of Code Like a Pro in C# (published by Manning), joins your co-hosts in this second episode where we discuss good software development practices. Oh, and yeah, he wrote a book! :D Be sure to check out "Code Like a Pro in C#" by Jort Rodenburg! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-10-26 14:00:00

Episode 267 – Coding Like a Pro!

Jort Rodenburg, author of Code Like a Pro in C# (published by Manning), joins your co-hosts in this and the next episode to discuss good software development practices. Oh, and yeah, he wrote a book! :D Be sure to check out "Code Like a Pro in C#" by Jort Rodenburg! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-10-19 14:00:00

Episode 266 – Best Enterprise Communication Choices

Email as a fundamental way of communication in your enterprise may be much maligned and despised; however, ditching email altogether, as some Internet sources would like you to do, is definitely not the best path forward. At least we don't think so. A better way, possibly even the best way forward, is to use tools for the purpose they are built for. Tools that are a good fit for real-time communications are not a good fit for other, slower communication channels. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-10-12 14:00:00

Episode 265 – Securing the Open Source Ecosystem (2)

Join David A. Wheeler, Director of Open Source Supply Chain Security at the Linux Foundation, and us for a discussion about securing the Open Source Ecosystem. The Linux Foundation has had a couple of initiatives around this and they now all culminate in the recently minted OpenSSF. Find out more about the OpenSSF and other things David talked about using the following links: OpenSSF Security Metrics Project: https://metrics.openssf.org/search?q=kubernetes Best Practices Badge: https://bestpractices.coreinfrastructure.org/ Best Practices Badge "passing" criteria: https://bestpractices.coreinfrastructure.org/criteria/0 Best Practices Badge Statistics: https://bestpractices.coreinfrastructure.org/project_stats OpenSSF edX Courses: https://openssf.org/edx-courses/ OpenSSF Security Scorecards: https://github.com/ossf/scorecard Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-10-05 14:00:00

Episode 264 – Securing the Open Source Ecosystem

Join David A. Wheeler, Director of Open Source Supply Chain Security at the Linux Foundation and us for a discussion about securing the Open Source Ecosystem. The Linux Foundation has had a couple of initiatives around this and they now all culminate in the recently minted Open SSF. Find out more about the Open SSF foundation and other things David talked about using the following links:
OpenSSF Security Metrics Project: https://metrics.openssf.org/search?q=kubernetes
Best practices Badge: https://bestpractices.coreinfrastructure.org/
Best practices Badge "passing" criteria: https://bestpractices.coreinfrastructure.org/criteria/0
Best Practices Badge Statistics: https://bestpractices.coreinfrastructure.org/project_stats
OpenSSF edX Courses: https://openssf.org/edx-courses/
OpenSSF Security Scorecards: https://github.com/ossf/scorecard
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-09-28 14:00:00

Episode 263 – Who OWNS Open Source

With Open Source quickly becoming the backbone of most software, both commercial and free, the question about who actually OWNS open source gets asked louder and louder.... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-09-21 14:00:00

Episode 262 – OpenDS4All: The Road to Data Science (Part 2/2)

Taking your first steps on the road to Data Science is definitely a confusing and uncertain endeavor. With the amount of courses and other information available, it becomes extremely hard to find a straight road towards a result that is actually valuable in your journey towards becoming a professional Data Scientist. Under the umbrella of the Linux Foundation, the recently formed OpenDS4All project is trying hard to solve this problem for you. Join us in this first part of our interview with André de Waal to find out how you can benefit from this! Many thanks to André de Waal for joining us and don't forget to catch our next episode where we will conclude this interview. More information on DS4All can be found on their website. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-09-14 14:00:00

Episode 261 – OpenDS4All: The Road to Data Science

Taking your first steps on the road to Data Science is definitely a confusing and uncertain endeavor. With the amount of courses and other information available, it becomes extremely hard to find a straight road towards a result that is actually valuable in your journey towards becoming a professional Data Scientist. Under the umbrella of the Linux Foundation, the recently formed OpenDS4All project is trying hard to solve this problem for you. Join us in this first part of our interview with André de Waal to find out how you can benefit from this! Many thanks to André de Waal for joining us and don't forget to catch our next episode where we will conclude this interview. More information on DS4All can be found on their website. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-09-07 14:00:00

Episode 260 – Right & Wrong Ways to Cloud Adoption

Moving to cloud can be done in many different ways, but not all paths will be as forgiving. Unfortunately, some internet sources give really bad advice, at least we think so. Here is the Roaring Elephant to set the record straight and offer some do's and don'ts! We typically list the internet sources we used for inspiration here, but considering we really do not agree with the source in this case, we really do not want to aid in spreading the misinformation... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-08-31 14:00:00

Episode 259 – The Perfect Team (3)

Assembling a perfect team turns out to be closer to alchemy than science! During our research for the Corporate Culture series, we came across an article describing some research that was done by Google on the very subject. We really liked the article but feel it fell short on a couple of points, so here is us talking about our take on this subject. For this episode, we took some inspiration from the following article: What Google Learned From Its Quest to Build the Perfect Team – The New York Times Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-08-24 14:00:00

Episode 258 – Saying ‘NO’ to Kubernetes?

Voluntarily or not, everything falls victim to the Hype Cycle and Kubernetes is no exception. Should you say 'NO' to Kubernetes? Well, at least one company that seems to be a poster child for the technology published a blog explaining why they are not using it... and the internet exploded.... For this episode, we found inspiration in the following internet resources:
No, we don’t use Kubernetes | Ably Blog: Data in Motion
Application reveals which congressmen are distracted on their phones (newsbeezer.com)
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-08-17 14:00:00

Episode 257 – The Perfect Team (2)

Assembling a perfect team turns out to be closer to alchemy than science! During our research for the Corporate Culture series, we came across an article describing some research that was done by Google on the very subject. We really liked the article but feel it fell short on a couple of points, so here is us talking about our take on this subject. For this episode, we took some inspiration from the following article: What Google Learned From Its Quest to Build the Perfect Team - The New York Times Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-08-10 14:00:00

Episode 256 – Software Update Hell

Keeping software up to date is not always easy and even the biggest out there are not immune to "Software Update Hell"! For this episode, we got inspiration from the following online resources:
Migrating Facebook to MySQL 8.0 - Facebook Engineering (fb.com)
Bleak Cyborg Future from Brain-Computer Interfaces if We're Not Careful - AIP Publishing LLC
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-08-03 14:00:00

Episode 255 – The Perfect Team

Assembling a perfect team turns out to be closer to alchemy than science! During our research for the Corporate Culture series, we came across an article describing some research that was done by Google on the very subject. We really liked the article but feel it fell short on a couple of points, so here is us talking about our take on this subject. For this episode, we took some inspiration from the following article: What Google Learned From Its Quest to Build the Perfect Team - The New York Times Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-07-27 14:00:00

Episode 254 – The End of JEDI

This episode was inspired by the cancelation of the much talked about 10 billion dollar Pentagon JEDI cloud project. Do enormous single vendor contracts make sense? Is 10 years really a realistic timeline to limit the technological advances to a single provider? Well, the Pentagon ultimately decided against it (or did they?). Should you? Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-07-20 14:00:00

Episode 253 – Corporate Culture Impact (5)

In this continuation of the last episode, we discuss the remaining 4 items on the "Corporate Culture to-do list for the dream company". We're slowly closing off this series on the subject with one more, maybe two more episodes coming up. Do let us know how you are enjoying this series! In this episode, we took inspiration from this wonderful article on the Harvard Business Review: https://hbr.org/2013/05/creating-the-best-workplace-on-earth Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-07-13 14:00:00

Episode 252 – Open Source Snowflake

The SaaS technology company Snowflake has been in the news repeatedly, taking a firm stance on their views on Open Source and it is rather puzzling why they choose to do this... In this episode, the following articles were referenced:
https://techexec.com.au/?p=3892
https://finance.yahoo.com/news/c3-ai-ai-partners-snowflake-140702850.html
https://gcn.com/articles/2021/05/18/open-source-serverless.aspx
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-07-06 14:00:00

Episode 251 – Corporate Culture Impact (4)

After focusing on what can go wrong when it comes to Corporate Culture and what the impact can be, we're switching gears now and looking at the alternative. During our research for this series we fortunately also came across content that covered how it would be possible to build a corporate culture that is supportive and nurturing, and in this first episode, we're looking at what kind of impact this can have on the organization and the people. In this episode, we took inspiration from this wonderful article on the Harvard Business Review: https://hbr.org/2013/05/creating-the-best-workplace-on-earth Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-06-29 14:00:00

Episode 250 – Emerging Technology Adoption

In the technology space especially, but in general as well, organizations need to "keep up" or become obsolete. But how fast should emerging technology be adopted, if at all? The following articles were referenced in this episode:
https://www.helpnetsecurity.com/2021/05/24/top-emerging-technologies/
https://www.bbc.co.uk/news/world-57394831
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-06-22 14:00:00

Episode 249 – Corporate Culture Impact (3)

Continuing our journey on everything about Company Culture, this episode focusses on what actually feeds a Toxic Culture and how one could potentially avoid going down that unfortunate path. Make sure to listen to the two earlier episodes on this topic if you want to get the full picture! Here are the links to the web resources we touched on this episode:
Toxic workplace culture pissing off employees and pushing away customers?
7 Signs of a Toxic Workplace Culture
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-06-15 14:00:00

Episode 248 – Microsoft Data sovereignty

Microsoft just released a blog post where they "guarantee" your data will not move out of the EU. Sounds great, but what are they really after... Why can't we all just get along? Data and national borders are not the best of bedfellows and Microsoft is coming to the rescue with a firm guarantee that data that happens in the EU, stays in the EU! Or do they... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-06-08 14:00:00

Episode 247 – Corporate Culture Impact (2)

In this second episode on Company Culture, we move away from the specific case of Basecamp, though we'll still use it as an example, specifically their new stance on how a company should distance itself from politics and social issues. Is that really a good idea? Is it even possible? Join us in this episode where we dive deeper into that aspect of Company Culture. Here are links to the web resources that we referenced during this episode:
Changes at Basecamp
Amazon extends ban on police use of its facial recognition software ‘until further notice’
Thousands of Google employees asked CEO Sundar Pichai to stop providing AI tech for the US military’s drones
Military work is a lightning rod in Silicon Valley, but Microsoft will sell the Pentagon all the AI it needs
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-06-01 14:00:00

Episode 246 – Old Dogs, New Clouds?

In this episode, we look at how old tech is trying to retake their glory days by embracing Cloud and, apparently, just plain automation is no longer enough...
Old Dog, New Tricks?
Cloud continues to make life harder and harder for the traditional big tech companies. Is embracing cloud a way out for them, or do they just have their heads in the clouds....
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-05-25 14:00:00

Episode 245 – Corporate Culture Impact (1)

That Company Culture is a huge part of why someone chooses to work for a certain company (or not) should come as no surprise to anyone really. However, as recent news has demonstrated, there are still many ways to stumble by misjudging the consequences of certain decisions made by company execs. We'll be covering this subject over multiple episodes and in this first episode we discuss not only why Basecamp made the news, but also why this news inspired us to do this episode series. Here are links to the web content we referenced during the episode:
The story of Basecamp’s disastrous policy: https://www.theverge.com/2021/5/4/22419512/basecamp-political-speech-policy-fallout
Changes at Basecamp: https://world.hey.com/jason/changes-at-basecamp-7f32afc5
Basecamp Just Broke the Mold to Grow Better, and Faster: https://www.realclearmarkets.com/articles/2021/05/07/basecamp_just_broke_the_mold_to_grow_better_and_faster_776155.html
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-05-18 14:00:00

Episode 244 – Holy Tracking Bat-Signal!

A couple of weeks ago, Signal pulled off an astonishing feat: not only did it do a remarkable job of getting the word out that the big ad companies are tracking you, they actually got Facebook to do most of the talking for them!
When advertising goes Viral
It feels like sheer genius: Signal tells the truth about tracking and gets Facebook to help spread the word.
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-05-11 14:00:00

Episode 243 – Open Source Safety

Continuing on last week's Open Source Roundup, this time we focus on the concept of security and safety with Open Source projects. This week's inspiration came from these articles:
Is Open Source More Secure Than Closed Source?: https://devops.com/is-open-source-more-secure-than-closed-source/
Open-source, cloud-native projects: 5 key questions to assess risk: https://techbeacon.com/app-dev-testing/open-source-cloud-native-projects-5-key-questions-assess-risk
7 Reasons to Get Serious About Your Open-Source Strategy: https://builtin.com/software-engineering-perspectives/open-source-program-office
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-05-04 14:00:00

Episode 242 – Open Source Roundup

Today's episode features a roundup of Open Source related topics, more specifically about the importance of communities. Inspiration was taken from these online articles:
What's the point of open source without contributors? Turns out, there are several: https://www.techrepublic.com/article/whats-the-point-of-open-source-without-contributors-turns-out-there-are-several/
Apache Software Foundation retires slew of Hadoop-related projects: https://www.zdnet.com/article/apache-software-foundation-retires-slew-of-hadoop-related-projects/
New Linux Foundation project takes blockchain and the open source approach to the insurance industry: https://www.techrepublic.com/article/new-linux-foundation-project-takes-blockchain-and-the-open-source-approach-to-the-insurance-industry/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-04-27 14:00:00

Episode 241 – Oracle: 0 – Free API: 1

After a decade of court battles, Oracle finally lost their case against Google about copying the Java API for use in Android. The fact that Java has not been part of Android since version 7 (Nougat) did not mean Oracle would stop the litigation. Oh, no... Oracle wanted their 9 billion dollar claim and they were prepared to go to the Supreme Court for that...
Today's Articles:
https://www.androidpolice.com/2021/04/05/supreme-court-gives-google-the-final-win-versus-oracle/
https://cloud.google.com/blog/topics/developers-practitioners/back-popular-demand-google-cloud-products-4-words-or-less-2021-edition
https://lists.apache.org/x/thread.html/rab2a820507f7c846e54a847398ab20f47698ec5bce0c8e182bfe51ba%40%3Cdev.mesos.apache.org%3E
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-04-20 14:00:00

Episode 240 – Yahoo Answers… R.I.P.?

Everybody's favorite agony aunt is being put out to pasture and we felt we could not let this occasion go unnoticed. Instead we decided to regale our audience with some of our favorite picks of the cornucopia that is Yahoo Answers! And yes, if you had totally forgotten it still existed, so did we...
Our Fabulous Picks:
The Good:
A question for programmers. Is it too late to start programming at age 35, but to get a job not just for fun. ? https://answers.yahoo.com/question/index?qid=20210406090153AAv17hO
What is big data about? https://answers.yahoo.com/question/index;_ylt=AwrC1ClLGHBgvlIAMhdPmolQ;_ylu=Y29sbwNiZjEEcG9zAzEEdnRpZAMEc2VjA3Ny?qid=20130528083829AANsKIH
The Bad:
What are some ways to print your emails.? https://answers.yahoo.com/question/index?qid=20210406211203AAT7E0t
How i can build an artificial intelligence and what programming language I must use? https://answers.yahoo.com/question/index;_ylt=AwrC0CYDGnBgawoAqBRPmolQ;_ylu=Y29sbwNiZjEEcG9zAzEEdnRpZAMEc2VjA3Ny?qid=20141220041546AAVsLRJ
The Ugly:
Can machines and computer learn? https://answers.yahoo.com/question/index;_ylt=AwrC1zF3GHBg_xYASglPmolQ;_ylu=Y29sbwNiZjEEcG9zAzUEdnRpZAMEc2VjA3Ny?qid=20101031054037AAXrV4j
Is it ideal to keep your data totally off your computer? ? https://answers.yahoo.com/question/index?qid=20210405001740AAilRYs
Bonus Round:
Why don't computers run on coal? https://answers.yahoo.com/question/index?qid=20210330060136AA2MQf9
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-04-13 14:00:00

Episode 239 – Living on the Edge!

While discussing an article on Edge Computing, we discovered we have pretty diverging ideas on the subject and thus, a Roaring Elephant episode was born! General Edge computing concepts, where it came from and where it could possibly end up in the future all come up in this episode. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-04-06 14:00:00

Episode 238 – Stop Tech Fails

Big technology projects have the reputation of failing often and hard. For this episode, we took inspiration from the "Fishbone of Problems" put together by Stephen J. Andriole to discuss some potential pitfalls. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-03-30 14:00:00

Episode 237 – Open Source @ Google

As you might expect, Open Source @ Google works just a little differently. Thanks to their scale, they are able to push the "monorepo" paradigm to the extreme where they compile all the critical open source projects from their own repository. In this episode we try to answer the question of whether everybody should adopt Google's approach or not. And unsurprisingly, the answer is the ubiquitous "it depends".... The topic for this episode was the result of reading a number of online articles which we referenced during the discussion:
How Google manages open source
Open Source Patching
Open source: Google wants new rules for developers working on 'critical' projects
'We're finding bugs way faster than we can fix them': Google sponsors 2 full-time devs to improve Linux security
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-03-23 15:00:00

Episode 236 – Enterprise Architect Musings…

We recently came across some content on the do's and don'ts, good and bad, ups and downs and lefts and rights when it comes to the much discussed, often misunderstood but definitely vital role of the Enterprise Architect. While we were discussing it, we decided to switch on the microphones and record our musings on the matter. Now please do not expect any kind of conclusion at the end of this episode: it's not that kind of episode, at all! It's much more just two tech veterans reminiscing and grumbling over past experiences.... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-03-16 15:00:00

Episode 235 – The CentOS [R]Evolution

Recently there has been some outcry from the community about Red Hat's plans with their free CentOS distribution. Moving to the newly proposed "CentOS Stream" style of making the distro available does seem to make sense for Red Hat, but how does it affect the rest of the world?
CentOS Stream Timeline
March 31, 2016: RedHat launches developer program with “free” subscriptions available
September 24, 2019: Transforming the development experience within CentOS
December 8, 2020: CentOS Stream: Building an innovative future for enterprise Linux
December 8, 2020: CentOS Project shifts focus to CentOS Stream
January 20, 2021: New Year, new Red Hat Enterprise Linux programs: Easier ways to access RHEL
February 25, 2021: Extending no-cost Red Hat Enterprise Linux to open source organizations
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-03-09 15:00:00

Episode 234 – The Kubernetes “EASY” Button

With the release of GKE Autopilot by Google, there is now an "easy" button for Kubernetes deployments everywhere. But is an easy button really what the world needs, especially for something as flexible, transformative and yes, definitely complex as Kubernetes? Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-03-02 15:00:00

Episode 233 – Combating Global Misinformation & No More Cookies from GitHub

In this week's episode, we give our take on how reliable the information on the internet really is. Also, is GitHub leading the way towards a cookie-less future? And if so, what will the consequences be? Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-02-23 15:00:00

Episode 232 – Graph Data Science? Slack Outage Postmortem.

In this week's episode we discuss Neo4j's take on Graph Data Science and take a look at the postmortem that Slack released following their worldwide outage a month ago. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-02-16 15:00:00

Episode 231 – Remote Working with Rodolphe @Remotive.io (3/3)

About a year ago, we had the lucky coincidence to have a discussion with Rodolphe Dutel on the subject of working remotely. Twelve months later, the world is a much different place and working from home has become the norm, especially in the more technical workplaces, so we touch base with Rodolphe and get his insights and experiences. In this final episode on the topic, for this year at least, we finish off looking at how we feel, think, hope and fear the situation will stabilize once the troubles are behind us. If you are interested in working remotely, check out the Remotive.io job board! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-02-09 15:30:00

Episode 230 – Remote Working with Rodolphe @Remotive.io (2/3)

About a year ago, we had the lucky coincidence to have a discussion with Rodolphe Dutel on the subject of working remotely. Twelve months later, the world is a much different place and working from home has become the norm, especially in the more technical workplaces, so we touch base with Rodolphe and get his insights and experiences. This second part focuses more on how we will transition from the current extreme situation into a more long-term, sustainable organization. If you are interested in working remotely, check out the Remotive.io job board! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-02-02 15:30:00

Episode 229 – Remote Working with Rodolphe @Remotive.io (1/3)

About a year ago, we had the lucky coincidence to have a discussion with Rodolphe Dutel on the subject of working remotely. Twelve months later, the world is a much different place and working from home has become the norm, especially in the more technical workplaces, so we touch base with Rodolphe and get his insights and experiences. In this first part of our interview with Rodolphe, we look at the current "working from home" situation. In upcoming episodes, we'll try to predict how the "working remotely" situation will evolve in the future. If you are interested in working remotely, check out the Remotive.io job board! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-01-26 15:00:00

Episode 228 – The State of Microservices 2020

With the advent of architectures based on Microservices, the days of the monolith really do seem to be numbered. Recently, the good people at The Software House released a report on the State of Microservices 2020 and when reading it, we decided we wanted to dig into our personal experiences and offer our view on the data. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-01-19 15:00:00

Episode 227 – 2021 Predictions for Open Source

We found someone brave enough to do some rather bold (and some not so bold) predictions regarding the future of Linux. Obviously we disagree but since we're not brave enough to make our own predictions, we're just going to dissect Jack Wallen's hard work instead! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-01-12 15:00:00

Episode 226 – Let’s talk about 2030’s Internet (2/2)

We're continuing our crystal ball peering, trying to divine what the internet will be like in ten years' time. Where we looked more at the geopolitical factors last time, we're looking at the technology angle in this episode! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2021-01-05 15:00:00

Episode 225 – Let’s talk about 2030’s Internet

With the disaster that was 2020 coming to an end, we think we can say that the global Internet has been a savior to the economy and, to be frank, our sanity. We decided to do some future gazing today and see if we can predict how the Internet will evolve in the next decade... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-12-29 15:00:00

Episode 224 – GitHub’s State of the Octoverse

The GitHub "State of the Octoverse" for 2020 is out and we're dedicating this episode to dissecting all the wonderful information presented there. https://youtu.be/3yigp4E2zug Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-12-22 15:00:00

Episode 223 – In-Memory Computing with John DesJardins

This week we're visited by John DesJardins, who regaled us with his knowledge of in-memory computing. https://youtu.be/LHoO2XAQXqM Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-12-15 15:00:00

Episode 222 – 10 years of OpenStack: Success or Failure?

Did a decade of OpenStack deliver the cloud environment everybody flocks to? Not really.... Fortunately, Matt Turck saves the episode, if not the year, with his 2020 Data and AI landscape poster! https://youtu.be/WtYcGq2b-00
OpenStack, the little tech that couldn't?
Ten years ago, the new cloud craze called OpenStack hit the scene and they did so with a lot of noise. A decade later, what is left of this once darling of the hypervisor orchestration layer component technology? And yes, that convoluted phrase was written especially as a pun on the incredibly complex gaggle of projects that formed the "Open Stack" and just may have been its downfall...
Pretty visuals are pretty!
Matt Turck is at it again and we are very grateful for this yearly graphic representation of the Data and AI landscape. We try to give his hard work a little bit of attention every year and here is the 2020 edition. Thanks Matt!
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-12-08 15:00:00

Episode 221 – Byte Sized Python with Michael Kennedy

Today we are joined by Michael Kennedy, who is an expert on Python, the little scripting language that is helping solve the world's biggest problems today. There seems to be no stopping it and it has become the go-to language for anything from devops glue scripts to data scientists' machine learning jobs and everything in between. So how does one get started with Python and what are the tips & tricks of the trade? https://youtu.be/8uxSVk8-TK0
Find more Michael Kennedy goodness at:
Michael Kennedy on Technology: Cutting-Edge Developer and Technology Writin
A podcast on Python and related technologies
Python Bytes is a weekly podcast hosted by Michael Kennedy and Brian Okken. Python Bytes podcast delivers headlines directly to your earbuds. If you want to stay up on the Python developer news but don't have time to scour reddit, twitter, and other news sources, just subscribe and you'll get the best picks delivered weekly. Have feedback for the show? Send it to contact_pythonbytes@talkpython.fm. We'd love to hear from you.
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-12-01 15:00:00

Episode 220 – Year 5: Definitely a two-handed podcast now! (Ft John Mertic.)

We decided long ago that we do not want to work on our birthday, so we were very happy indeed to welcome John Mertic back to the podcast. John talks extensively about all the things the Linux Foundation has been up to and Dave and Jhon simply sit back and enjoy the show. :) https://youtu.be/4InDVh7WlGM For more information about the different projects John talked about, please go to the following web sites: With ODPi, building products and services and ensuring the right data is understood, shared and protected is simple. We’re developing the only vendor neutral, open source standards to enable best practices for data governance, connectivity, business intelligence, and analytics. ODPi helps… ODPI The LF AI & Data Foundation supports open source projects within artificial intelligence, machine learning, deep learning and the data space. You can think of us as a greenhouse growing and sustaining open source AI, ML, DL and Data projects from seed to fruition. LF AI & Data Open source software has become pervasive. Because of its development process, the OSS that ultimately reaches end users has a chain of contributors and dependencies. It is important that those responsible for their user or organization’s security are able to understand and verify the security of this dependency chain. OpenSSF Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-11-24 15:00:00

Episode 219 – Roaring News

It's been a while but News is back! We're taking a long, deep look at how we see the tech industry evolving through this pandemic lockdown environment. And we finish the episode off with Dave's contribution about some Industrial Design... https://youtu.be/37mlzYC-cSI Predicting the Future is not easy... Using this article as a reason to discuss a topic, rather than the article itself, we offer up our views on how the current lockdown situation will likely evolve over time. While we definitely do not have a working crystal ball, we do feel we have a decent enough idea to do a little bit of future gazing... Everybody thinks they have good taste, that really cannot be true... Dave certainly missed his calling: interior decorating and industrial design appear to be his real strength! And obviously, Jhon really does not agree with Dave's assessment! (You really did not expect anything different now, did you?) Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-11-17 15:00:33

Episode 218 – Talking “Effective ML” with Keith McCormick (2/2)

With Machine Learning and AI being on everybody's minds and lips these days, we invited Keith McCormick to join us and discuss the do's and don'ts of leveraging the undeniable power of ML in your organization. In last week's first part, we covered the technology part and today we have the human side of the story for you! https://youtu.be/B5GZa_N3Uz0 Find out more about Keith on his blog where you can find links to his numerous courses on LinkedIn Learning. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-11-10 15:00:00

Episode 217 – Talking “Effective ML” with Keith McCormick (1/2)

With Machine Learning and AI being on everybody's minds and lips these days, we invited Keith McCormick to join us and discuss the do's and don'ts of leveraging the undeniable power of ML in your organization. In this first part, we cover the technology part and we'll have the human side of the story for you next week! https://youtu.be/b-zofn0IiN0 Find out more about Keith on his blog where you can find links to his numerous courses on LinkedIn Learning. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-11-03 15:00:00

Episode 216 – CI/CD with Chris Short (2/2)

We're joined by Chris Short from Red Hat to discuss the ins and outs of Continuous Integration and Continuous Deployment, often abbreviated to CI/CD. With his years of experience to draw on, we let loose with any and all questions we could think of! https://youtu.be/gvCNHeSnglY Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-10-27 15:00:00

Episode 215 – Roaring News

This news episode is pretty much dominated by IBM. The company has gone through so many divestments and reorganizations recently, the acquisition of Red Hat being only one of the noteworthy events, that the sudden news about IBM tearing their IT infrastructure support organization away from the mothership was enough to keep your co-hosts busy for most of this episode. We did find some time however to discuss the new Kubernetes GUIs: Doom and Minecraft! https://youtu.be/PtIoeS4qxEM Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-10-20 14:00:00

Episode 214 – CI/CD with Chris Short (1/2)

We're joined by Chris Short from Red Hat to discuss the ins and outs of Continuous Integration and Continuous Deployment, often abbreviated to CI/CD. With his years of experience to draw on, we let loose with any and all questions we could think of! https://youtu.be/3dGESuSggW0 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-10-13 14:00:00

Episode 213 – Roaring News

A hard task it was, but we actually found some technology news to discuss! It really feels like the world has more or less come to a bit of a standstill, holding its breath while we wait for the current troubles to go away. But we prevailed and have 30 minutes of discussion on whether you should build or buy your tech and whether Kubernetes really is already last year's news in favor of Serverless Compute, and we end with a quick look at a nice visualization of "Data Fallacies". https://youtu.be/yxUoOTQNnbU Try or Buy? Buying your technology "off the shelf" will likely give you a faster route to deployment. But if you deploy the same tech as your competition, how can you still offer differentiation to your discerning customers? Kubernetes is yesterday's tech already? After spending quite some time discussing containers and Kubernetes in the last few episodes, it would seem that it is already bypassed, at least in promise, by "serverless compute". We discuss our take on the hype vs reality of this "next level" of computing. Data Fallacies - Visualisation Happened upon this reading my twitter feed today and felt we should share it with our little part of the world. As a data scientist, these should not come as a surprise, but they may be useful in explaining to the project leader why a certain batch of data should be discarded. :) Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-10-06 14:00:00

Episode 212 – Intro to Containers and Kubernetes (6/6)

The end is in sight! After a heroic voyage across the plains of containers, we reach the mountain called Doom Kubernetes! Join us for this last foray in this epic saga where we look at Kubernetes, some of its alternatives and offer our opinion on the matter of adopting Kubernetes and the micro services architectures, or not... https://youtu.be/j8JmZIkS1n4 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-09-29 14:00:00

Episode 211 – Intro to Containers and Kubernetes (5/6)

Almost at the end of our journey, deep into the realms of containers and Kubernetes, we finally reach the destination that is Kubernetes. After this, we'll round off this series of episodes with our closing summary next week. https://youtu.be/Wz8qKFmIUXQ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-09-22 14:00:00

Episode 210 – Intro to Containers and Kubernetes (4/6)

Returning to our series on containers and Kubernetes, we discuss the concept of Micro Services in this episode. Weaving our way through software best practices and the advent of the Big Web 2.0 companies, we end up in an ever more complex world of Micro Architectures. https://youtu.be/sPCh-ogNlHk Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-09-15 14:00:00

Episode 209 – Roaring News

Taking a brief hiatus from our series on containers and micro-service architectures, we have a news episode for you! We discuss whether or not it is smart to move back to on premise once you have moved to the cloud, take a closer look at the state of A.I. in law enforcement today and highlight a new project from Netflix for monitoring your streaming environment. https://youtu.be/MmyLhVLQOMg Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-09-08 14:00:00

Episode 208 – Intro to Containers and Kubernetes (3/6)

And this is already installment 3 in the ongoing saga on Containers and Kubernetes, and this time we finally end up actually talking about containers! We promised we'd get there, and here it is. But this is still far from the end of the story: in the next couple of episodes, we'll discuss orchestration and micro services architectures, again covering the good and bad, and throw in a bit of history, courtesy of Dave! https://youtu.be/F_-zmhHROwA Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-09-01 14:00:00

Episode 207 – Intro to Containers and Kubernetes (2/6)

We started this saga on Kubernetes and containers last week during KubeCon and we're picking up where we left off then, continuing to dig deeper into the subject of Virtualization, jails and containers. We still have much more to talk about, so catch us next week for the continuation of this epic story! https://youtu.be/4_hHgYqzBKw In this part, we move on from Virtual Machines and into a deep discussion of Linux Jails and LXC containers, the precursors to today's container concepts. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-08-25 14:00:00

Episode 206 – Intro to Containers and Kubernetes (1/6)

It's the week of KubeCon and while everybody is doing their best to present the greatest and latest of everything surrounding this micro services architecture approach, we're going back to basics with an intro to Containers & Kubernetes, explaining what this stuff is, where it came from and why people feel it is a good idea. With great knowledge comes great power, use it wisely! When we were recording this, we found that there was really a lot to talk about, so we will be running this subject over multiple episodes over the next month or so. https://youtu.be/fUcrSVl6j2k In this part, we actually talk about Virtual Machines because many people know what they are, and the recent surge of containers everywhere is largely an attempt at solving the problems that are inherent to virtualization. For that reason it makes sense to discuss the good and the bad of virtual machines first. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-08-18 14:00:00

Episode 205 – KubeCon Europe 2020 Preview (part 2)

In a couple of weeks we expect to virtually meet a lot of you at KubeCon/CloudNativeCon Europe. As is quickly becoming the new normal, this conference also shifted to a virtual, online-only event, but that may actually be a blessing! With everything being online (and we expect mostly prerecorded) it should be a lot easier to catch parallel sessions! In this episode, we go over the schedule as it exists at the time of recording and discuss our top picks for every time slot. https://youtu.be/Wb3WFiUiL50 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-08-11 14:00:00

Episode 204 – KubeCon Europe 2020 Preview (part 1)

In a couple of weeks we expect to virtually meet a lot of you at KubeCon/CloudNativeCon Europe. As is quickly becoming the new normal, this conference also shifted to a virtual, online-only event, but that may actually be a blessing! With everything being online (and we expect mostly prerecorded) it should be a lot easier to catch parallel sessions! In this episode, we go over the schedule as it exists at the time of recording and discuss our top picks for every time slot. https://youtu.be/Uedeo3tQz-Y Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-08-04 14:00:00

Episode 203 – Roaring News

Another fortnight, another Roaring News episode looking at an acquisition by SUSE, some privacy shattering interviews and a PSA about unsecured databases! https://youtu.be/kJJ-IVcUReY Another startup bites the dust? We've discussed how a lot of startups' business strategy is to get noticed, get popular and cash out. Is Rancher doing this now, or is this acquisition by SUSE more interesting? Do laws make things better, or worse? In another attempt at using unproven technology to solve perceived deficiencies in human, intuition-based tasks, legislation conceived to protect the individual may just be making things worse. Nah, leave it with the default password. It'll be fine! Though Dave feels this isn't worth writing about, Jhon believes that this cannot be repeated often enough: please secure your data stores! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-07-28 14:00:00

Episode 202 – Data Quality: Your place or mine?

Data Quality is a subject that comes up a lot now that more organizations are moving towards data-driven decision making. Obviously, good decisions need to be based on good quality data, and getting that QA stamp is quite a bit harder than it should be... https://www.youtube.com/watch?v=9Qk7DIcFrnA Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-07-21 14:00:00

Episode 201 – Roaring News

In this first "live action" News episode, we take a look at the state of public clouds and how their business models are often more complex than you would think. We end up with a short discussion about Fujitsu's "digital annealer" and how you only need a decent pair of binoculars to eavesdrop on your neighbors... https://www.youtube.com/watch?v=HzRuWrUoz1I Public Cloud as a side-job Every time we look at the state of public cloud today, we are stunned by how "coincidental" the clouds are for 2 out of the 3 major players, where the massive public cloud is dwarfed by the organization's real business endeavors. Not really Quantum Fujitsu is the latest player to enter the Quantum playing field with an offering that is decidedly NOT Quantum... If you thought it was bad that the walls have ears... With privacy concerns on the rise, this article detailing a rather simple to execute eavesdropping technique won't do much to make you sleep better at night... At least the lights are out then. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-07-14 14:00:00

Episode 200 – Bi-Centennial Podcast

For our 200th episode, we're taking a stroll down memory lane, looking back on how the podcast evolved over time. We discuss the content related changes we made over the years, and also talk about some of the technical choices we made. And not to spoil anything, there is definitely something strange happening in this one.... ;) https://www.youtube.com/watch?v=NAisApoDUWw Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-07-07 14:00:00

Episode 199 – Azure Data Engineering with Richard Nuckolls

This episode, we go deep into everything data related in the public cloud, specifically the Microsoft Azure cloud. Fortunately we don't have to do this on our own since we're joined by Richard Nuckolls, author of the book "Azure Data Engineering", published by Manning Publications under their MEAP program. Free eBook raffle Manning Publications has been kind enough to give us a couple of download codes for a free eBook version of "Azure Data Engineering". As always, our Patreons get a first chance to get their hands on one of the codes. If you are a Roaring V.I.P. (or higher), you can head over to our Patreon Page now where you will find a post containing all the information required. If you become a Patreon now, you immediately get access to that post! 😉 Listen to the episode to find out how you can enter the raffle! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-06-30 14:00:00

Episode 198 – Roaring News

This week we tackle some "software is used by the military" controversy, we have a light discussion on Quantum Computing and marvel at the apparent absence of common sense when it comes to putting information on public services. As the final comment on LinkedIn says: "SkyNet is pleased with your progress." When something is open source, even though everybody hopes it will only be used for "good", these moral and ethical questions are very deep and complicated... Let's kill it before it spreads! Here's me thinking the world is trying its utmost to make quantum computing "a thing" and these people are already trying to break it! Who needs a password manager if you have GitHub? A good process and good governance of what you put on GitHub, or any other public service for that matter, are not a luxury, they're a necessity! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-06-23 14:00:00

Episode 197 – Exploring Redis with Kyle Davis Part 2

We're joined by Kyle Davis, head of Developer Advocacy at Redis Labs, to discuss the ins and outs of Redis. It's the in-memory data store that everybody probably uses, whether you know it or not, and Kyle does a great job of discussing the pros and cons of deploying Redis in the many use cases where it can add tremendous value. This is the final part of our interview with Kyle. Redis Microservices for Dummies Make sure to check out the free "Redis Microservices for Dummies" e-book by Kyle Davis with Loris Cro on the Redis Labs website! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-06-16 14:00:00

Episode 196 – Roaring News

Today we discuss the need to keep your software up to date and how A.I. was able to recreate the game of Pac-Man. We finish off with a look at the new enterprise GPU announcement from NVidia. Who needs enemies with friends like these? And we do not mean the good folks behind HaveIBeenPwned here but rather the people that install good software and then never look at upgrading it when vulnerabilities are patched! Now the A.I. is also making them? A bit of fun looking at how an A.I. recreated the game of Pac-Man, courtesy of NVidia. We also mentioned another YouTube video demonstrating how an adversarial A.I. teaches itself: https://www.youtube.com/watch?v=9-8nXKmJMmI The more things change, the more they stay the same... It looks like we're on the next cycle of the distribute->consolidate->distribute->consolidate merry-go-round where NVidia is going back to its Silicon Graphics roots and has decided to base their new Ampere product on the principle of combining many "simple" components in an intelligent fashion to create something that can do "more". Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-06-09 14:00:00

Episode 195 – Exploring Redis with Kyle Davis – Part 1

We're joined by Kyle Davis, head of Developer Advocacy at Redis Labs, to discuss the ins and outs of Redis. It's the in-memory data store that everybody probably uses, whether you know it or not, and Kyle does a great job of discussing the pros and cons of deploying Redis in the many use cases where it can add tremendous value. This is the first part of the interview with Kyle. Redis Microservices for Dummies Make sure to check out the free "Redis Microservices for Dummies" e-book by Kyle Davis with Loris Cro on the Redis Labs website! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-06-02 14:00:00

Episode 194 – Roaring News

This week we first discuss the importance of good documentation using an article that actually did not follow its own good advice. Next up is a discussion around "Big Irons" IBM and HPE having seen the cloud-first light. Documentation! Documentation! Documentation! Just use your imagination to add the sweaty overweight bald gentleman, will you? Cloud: the miracle cure! For years, we've been telling everyone willing to listen that cloud is not a "solution". It is a tool you can use for good, or for bad, but in and of itself it does not have any inherent 'miracle juice'. Big Iron organizations IBM and HPE however seem to think that cloud will save them from failing business practices... Maybe they should start listening to the Roaring Elephant... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-05-26 14:00:00

Episode 193 – Bite-Sized Big Python

We are joined by John T. Wolohan, writer of the book "Mastering Large Datasets with Python" published by Manning Publications, to talk about the business impact of living the Big Data life and how Python has become the programming language darling of Big Data programmers and Data Scientists alike. Free eBook raffle Manning Publications has been kind enough to give us a couple of download codes for a free eBook version of “Mastering Large Datasets with Python“. As always, our Patreons get a first chance to get their hands on one of the codes. If you are a Roaring V.I.P. (or higher), you can head over to our Patreon Page now where you will find a post containing all the information required. If you become a Patreon now, you immediately get access to that post! 😉 Listen to the episode to find out how you can enter the raffle! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-05-19 14:00:00

Episode 192 – Roaring News

In this news inspired news episode, we talk about A.I.s filing for patents, Framework SDKs causing major software to crash on mobiles and how the army is teaching the world how not to overspend! Enjoy! And so the revolution starts... Of course it's rather obvious that today, having patents filed in an A.I.'s name would not be logical, practical or simply sane. However, a couple of inventors decided to put international patent law to the test and tried to do just that. It took a while, but courts, quite rightly so, rejected the applications. It does however open up an interesting discussion on who owns what when we, mere mortals, don't really understand how an A.I. neural network comes to its conclusions... What did I tell you about accepting candy from strangers? Using frameworks and templates is the way most software is written today, but in that lies the danger of dependency on unknown and therefore unchecked sources of code. In theory, one would expect all software projects to exercise due diligence when it comes to these "borrowed" libraries. But in practice, it turns out that this isn't actually 100% possible to do when those upstream sources can change their code at any given time... How much regression and unit testing should a software project do? Sure, of all organizations, the army has "the magic" How easy is project management in the world of agile and scrum? And can the army, of all organizations, claim to be the one that can do it better than others? Dave sure doesn't think so! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-05-12 14:00:00

Episode 191 – The Dawn of False Hope

Following on from episode 189 where we looked at all things pronounced "dead", we now go in search of all things that will make sure the future is bright and cheerful. Most confusingly, most things pronounced dead also seem to be the future? Along the way, we discuss the Open Source mentality and all things Machine Learning & A.I. Privacy and something called "converged security" round out the episode. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-05-05 14:00:00

Episode 190 – Roaring News

Today we talk about the dangers of relying on cloud services, how technology is failing us in these times of crisis and how the new world of social distancing is shaping the future for tech conferences. Cloud/SaaS for the win! :s The world creates more and more data and simultaneously relies more and more on rather ephemeral cloud based storage and services. One of the consequences that is becoming more and more apparent is that we are no longer in control of our data and while sometimes there is no material investment, sometimes there is and it can be significant... How can we change and adapt to cope with this brave new reality? https://www.theguardian.com/commentisfree/2020/mar/10/yahoo-deleted-all-my-emails-and-theres-nothing-i-can-do-about-it In the words of Boy George: Popularity breeds contempt! We've tried to steer clear of the pandemic that is affecting everybody's life but alas, as we turn towards technology to make the new reality work for us, it would seem that technology just isn't up to the challenge. Among the many disappointments we think it is fair to say that the poster child is the voice chat app Zoom, which has shown that being more popular all of a sudden isn't always a good thing... https://theweek.com/articles/910342/pandemic-revealing-how-bad-technology-really https://www.bbc.com/news/technology-52033217 The cure for jet-lag! Not only have a lot of live conferences been cancelled due to the isolation and social distancing that has been established, some companies are going one step further and completely re-inventing themselves as online only services. A rather noteworthy one, especially in the Big Data environment, is O'Reilly, who has decided to disband the business entity that is, or rather was, organizing the well known and respected Strata events. Now these will continue in a virtual format, but the most important question remains: where do we get our laptop stickers now? https://www.oreilly.com/conferences/from-laura-baldwin.html Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-04-28 14:00:00

Episode 189 – It’s Darkest before Dawn

Maybe it's the dark days we live in, maybe it's just us being us... But everywhere we looked, we found somebody telling us something was dead or dying... Speaking about technology, of course! We are a tech podcast after all! Anyway, we decided to dive deep into the gloom and doom on the internet and see what has been pronounced deceased and how we felt about that. Don't let this episode get you down though! The title is definitely a hint for things to come! :D We found a lot (and we do mean a lot) of content on the internet about things "being dead" so we decided to group things into topics and we ended up with about four of them. To start with, we tackle the end of privacy, followed by the demise of the internet and open source. Next we discuss the obituary for the startup and we end with a fond farewell and remembrance of boredom: we knew it well... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-04-21 14:00:00

Episode 188 – Roaring News

Artificial Intelligence will save us from all that ails us! The Big Cloud Providers should pay all the bills! Old stuff is old so you really have to buy new stuff now! Really! Really? When preparing for this week's episode, we felt a real dissatisfaction with the quality of "the news" so we've selected a handful of articles and tried to elevate them to a semblance of serviceability. A good Title, a good article does not make... We take our first shot at this article, which does posit a valid point in the title, but then immediately devolves into a bunch of hype words and wishful thinking. In our commentary, we try to add a semblance of intelligence to the article. Well... at least as far as we are able to, obviously. But in all seriousness, AI, ML, Cloud or whatever buzzword is on today's page of the "learn a new word a day" calendar will NOT save you: tools can help, but people always make the real difference! https://www.techradar.com/news/becoming-technology-led-not-just-technology-enabled A Good Idea at first sight... While we definitely understand and even agree with the sentiment behind this article, we cannot see a world where this can realistically be implemented. It should also be totally unneeded: if the move to the public cloud was done for the right reasons and in the right way, these companies would be mostly insulated against the immediate financial fallout of the current troubles by the flexibility that their cloud deployments now offer them. https://techcrunch.com/2020/03/26/cloud-relief/ The infomercial is overpowering in this one! Stating obvious truths does not make a good article and in the end it turns out it's a commercial for "big software"... As mentioned at the top of this blog, "news journalism" really is taking a nose-dive... The fact that this "article" is actually a commercial for one of the largest, if not "the" largest, software companies in the world only makes this even more vexing. They of all companies really should not need to stoop to this kind of underhanded tactic! No link on this one because we really want you to give this one a pass! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-04-14 14:00:00

Episode 187 – Talking Business Agility with Caroline of Estelia (Part 2)

Agile Development is quite popular in the world of software development, but the concepts are applicable to a much broader business environment. Very often, however, the actual implementation of these rather excellent concepts is plagued by faulty expectations and by simply not knowing what you are trying to do. Find out how you can benefit, both as an organization and personally, when doing Agile "right" in this two part interview with Caroline from Estelia. Estelia Find out more about Agile Coaching and reach out to Caroline at http://estelia.uk The Agile Manifesto Find more information about the Agile Manifesto at https://agilemanifesto.org/ The Agile Onion During the interview, Caroline mentioned the Agile Onion. There are a number of places on the internet where you can find more information and here is one: https://www.adventureswithagile.com/2016/08/10/what-is-agile/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-04-07 14:00:00

Episode 186 – Roaring News

To counter the current horrible news around the world, we turned to The Register as today's Guest Publication and tried to find a couple of articles that have a more positive tone. Sadly, even at The Register, things are looking bad for NASA and GitHub, and even the British military is looking for help... from Facebook? Weird that the concept of gravity would be a stranger at NASA... large amounts of data come with a lot of gravity. Data gravity, that is. At NASA, this was apparently overlooked for a new project where hundreds of petabytes of data are stored on AWS and made available for download by the public. https://www.theregister.co.uk/2020/03/19/nasa_cloud_data_migration_mess/ This is why the overentitled humans cannot have anything nice... Open source is good. Open source is great. Everything should be open source! Unfortunately, due to everyday economics, people still want to get paid. So even companies like GitHub that work within the open source environment need to be able to charge for some of their add-on services. https://www.theregister.co.uk/2020/03/18/github_mobile_app_closed_source/ Facebook now truly rules the world. We're hoping this article is pulling things a bit out of context and proportion, but I find it quite worrisome that the military, with all the defense budget it has available, goes for what surely must be the least secured chat app out there? It would be great to hear from someone involved how WhatsApp was chosen for this... https://www.theregister.co.uk/2020/03/18/army_adopts_whatsapp_orders_coronavirus/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-03-31 14:30:00

Episode 185 – Talking Business Agility with Caroline of Estelia (Part 1)

Agile Development is quite popular in the world of software development, but the concepts are applicable to a much broader business environment. Very often, however, the actual implementation of these rather excellent concepts is plagued with faulty expectations and simply not knowing what you are trying to do. Find out how you can benefit, both as an organization and personally, when doing Agile "right" in this two-part interview with Caroline from Estelia. Estelia Find out more about Agile Coaching and reach out to Caroline at http://estelia.uk The Agile Manifesto Find more information about the Agile Manifesto at https://agilemanifesto.org/ The Agile Onion During the interview, Caroline mentioned the Agile Onion. There are a number of places on the internet where you can find more information and here is one: https://www.adventureswithagile.com/2016/08/10/what-is-agile/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-03-24 15:30:00

Episode 184 – Roaring News

A Law&Order themed news episode this week where we look at a collection of articles about the many legal implications that come with the ever increasing usage of machine learning and neural networks in the world's courts. We start with the first traces of "Minority Report" PreCrime, follow on to privacy rights of convicted people and finish with the realities of A.I. based facial recognition. "We can't risk peoples' lives on automated apps that save money." Driven by a shortage of human resources, machine learning and artificial intelligence are slowly infiltrating the court system. From our doubts about speech-to-text systems causing problems with future decision reversals to our reservations regarding the accuracy of Neural Network "Recommendation systems" playing a role in deciding the measures of penalty for convicted wrong-doers, we're definitely not convinced that this is the ideal way forward. On the other hand, we humans also do not have a squeaky clean track record when it comes to decision making, so maybe it's all for the best after all? "Good" isn't only for the "good"… While this article tries to inflame tensions a little and unfortunately seems to be taking this as a chance to take some pot-shots at Google, we did feel the idea of privacy rights of convicted criminals is worth a discussion. To be sure, we definitely don't claim to have all the wisdom and these are just our opinions, but articles like this do seem to have a "hidden agenda" and hopefully clearer minds prevail when this subject matter is discussed. Maybe so, but this genie is not going back in the bottle... ClearView's credibility has pretty much been destroyed at this point and the whole image recognition technology also doesn't fare much better! But hang on, it's easy to use, easy to set up and sounds cool? Let's do it anyway! Yay! Not.... Sigh... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-03-17 15:00:00

Episode 183 – Taking Apache Cassandra for a spin

In this episode we have an in-depth discussion on Apache Cassandra with Patrick McFadin, VP Developer Relations at DataStax. We had a great conversation with Patrick about topics ranging from basic NoSQL topics to more in-depth applicability of Apache Cassandra. Apache Cassandra really is one of the most used NoSQL solutions out there and this information should really be useful for anybody working in Big Technology! Apache Cassandra More information on Apache Cassandra can be found on the Apache Cassandra website and on the website of DataStax. DataStax Accelerate Meet the creators of Apache Cassandra at a DataStax Accelerate event near you. They will be in San Diego starting May 11th and in London from June 2nd. We do have discount codes for our listeners who want to attend these events. Use the promo code ELEPHANT20 for a 20% discount on the ticket price! Unfortunately, we've been informed that these DataStax Accelerate events have been cancelled due to Covid-19. More information is available at https://www.datastax.com/accelerate Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-03-10 15:00:00

Episode 182 – Roaring News

It has been a little while, but here is a regular Roaring News episode where we discuss three articles. First off is an Ozone benchmark followed by an A.I. directed dungeon crawl and we end with a bit of an ethical discussion about some things that scientists get up to these days.... Size matters, but we expected more... This Cloudera blog with benchmark information about the still in pre-beta Apache Ozone felt rather underwhelming. No big speed increase, quite a bit of slowdown on not so large data-sets. Having only limited details on the results also does not inspire a lot of trust nor engender much enthusiasm. Of course, being pre-beta, there is still a way to go, but this project has been in development for quite a while now... Does it still make sense in an age of cloud computing? The monster in the corner uses its TensorFlow model to confuse you... Text based adventure games are nothing new or that interesting, except when the story itself is "live generated" by an A.I. linguistic neural network. Now rebased on the GPT-2 model, it became a bit of a hit with enthusiasts and left the creator with some monetary difficulties. This article discusses how the original approach was remodeled towards a micro services architecture, avoiding financial ruin. We can do this, let's not complicate things with ethics, right? From time to time we encounter an article about some scientists doing something really... disturbing, I guess. We'll let you make up your own mind about what side of "the line" this subject lands, but for us it was worthy of some cogitation... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-03-03 15:00:00

Episode 181 – DBIR 2019: So, are we doomed yet?

Verizon has published their "Data Breach Investigations Report" for some time now and with the news of data leaks, ransomware attacks and more of that kind of horror dominating much of the technology conversation, we gratefully borrow this report to dedicate a full episode to this subject. As ever, we never shy away from deviating from the subject a little (or a lot) but this time we did try to keep close to the subject matter. If you're in any way connected to cyber security, and honestly, everybody in the tech world should be, you owe it to yourself to download the DBIR or, even better, listen to this episode! You can download the full report, or an executive summary directly from the Verizon website. Here are direct links to the DBIR PDF and executive summary. And kudos to Verizon for not putting this document behind a "leave your email here" or other form of "paywall"! Just click the "view only" button if you prefer not to share your details! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-02-25 15:00:00

Episode 180 – KubeCon/CloudNativeCon Preview

Not a regular news episode this time. Instead, we are starting our KubeCon/CloudNativeCon Amsterdam coverage and have co-chairs Vicki Cheung and Constance Caramanolis on as guests to tell us all about these conferences. If you've never attended one of these, this discussion will give a good idea of what to expect and for seasoned attendees, there is a little bit of a behind-the-scenes look at how these events take form. Get Your Ticket here! What is definitely newsworthy though is that the Roaring Elephant is giving away a free ticket to the upcoming conference in Amsterdam next month. Make sure to listen to the episode to find out how to get a chance at winning this ticket! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-02-18 15:00:00

Episode 179 – Network Virtualization with Nicolas Thomas (part 2)

In this episode we welcome Nicolas Thomas, an NFV pioneer at Fortinet, to educate us on everything related to Software Defined Networking (SDN) and Network Function Virtualization (NFV). We look at the changes the industry has gone through and how these upcoming new standards are changing how corporations and consumers experience their globally connected environment. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-02-11 15:00:00

Episode 178 – Roaring News

It's been a little while, but here is another full-fledged Roaring News Episode! For starters we are happy to announce that the Roaring Elephant is now an official media partner of KubeCon|CloudNativeCon and we will be covering the Amsterdam event in a couple of months. Keep your ear to this podcast to find out how you can win a free ticket to the convention! And in other news, we are discussing how Open Source has shaped the past decade, or not.... The more things change, the more they stay the same? This article looks at the past 10 years and talks about how Open Source, in its many guises, has changed the world. We go over the areas covered in the article and give our views. https://www.infoworld.com/article/3481661/how-open-source-changed-everything-again.html Politics to the rescue! As a follow-on to the previous article, this NY Times article pretty much sums up the current state of the license wars.... https://www.nytimes.com/2020/01/17/technology/antitrust-hearing-boulder-colorado.html Roaring Elephant Roadshow is ON! We've been hinting at this for a little while now but all the pieces have fallen into place and we can now officially say we're a KubeCon|CloudNativeCon media partner! As part of this, we will be doing some in-depth coverage of the convention and we'll be giving away a free ticket in an upcoming episode! As always, our Patreons will get first dibs so if you want to improve your odds, and help out your favorite podcast, have a look at our Patreon Page! https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-02-04 15:00:00

Episode 177 – Network Virtualization with Nicolas Thomas (part 1)

In this episode we welcome Nicolas Thomas, an NFV pioneer at Fortinet, to educate us on everything related to Software Defined Networking (SDN) and Network Function Virtualization (NFV). We look at the changes the industry has gone through and how these upcoming new standards are changing how corporations and consumers experience their globally connected environment. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-01-28 15:00:00

Episode 176 – Let’s do the time-warp again! (Part 2)

For the first Roaring News episode of the year, we decided to look back at 2019 to discuss the stuff we found noteworthy and perhaps take a chance at predicting what the next year will bring. We actually had so much fun recording this, we went way over time, so we decided to split this one into two parts. Episode 174 had the first part of the discussion and here is part two, bringing it home. We've meandered over a large number of subjects in this episode, so no summary for this one! :D Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-01-21 15:00:00

Episode 175 – Stackoverflow Developer Survey Results 2019

A few episodes ago, we discovered the Stackoverflow Developer survey from 2019. Now even though it's been a while since the results were published, we found some of the results quite interesting and timeless enough to dedicate an episode to it. In case you are wondering about the untimeliness of the "cyber-truck with unbreakable windows" reference at 25:50, we originally intended to release this episode a couple of months ago in 2019, but then we recorded the truly excellent series with Rodolphe Dutel on remote working and decided to publish those episodes first. This inevitably meant delaying this episode to, well, now. :) In this hour-long episode, we pick and choose the survey results that we found surprising, remarkable, weird or otherwise worth mentioning and give our view on the how and why. And of course, as you are by now accustomed to, your co-hosts definitely did not meander and go off topic... At all... Honestly... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-01-14 15:00:05

Episode 174 – Let’s do the time-warp again! (Part 1)

For the first Roaring News episode of the year, we decided to look back at 2019 to discuss the stuff we found noteworthy and perhaps take a chance at predicting what the next year will bring. We actually had so much fun recording this, we went way over time, so we decided to split this one into two parts. So here is part one and in two weeks' time, part two will reach you in episode 176. We've meandered over a large number of subjects in this episode, so no summary for this one! :D Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2020-01-07 15:00:24

Episode 173 – Remote Working: The way of the Future? (Part 3)

In the effort to hire the best candidates, more and more companies are transforming into distributed organizations where remote working is the way to go. As with any emerging technology, many businesses are still trying to figure out how to make this work and on the other side of the fence, potential employees are looking for guidance on what to expect. Your co-hosts have been working remotely for some time and we are joined by Rodolphe Dutel, founder at Remotive.io, for a really deep and extensive discussion on everything related to remote working and distributed companies. In this three-part series with Rodolphe, we are trying to get to the bottom of the whole "is remote working good or bad" issue. For people who are not working remotely, the prospect can be exciting or just downright scary. For businesses that want, or need, to embrace the distributed paradigm, it can be equally daunting. Rodolphe has been "helping Tech professionals go remote" for over five years now and we are very happy that he agreed to be on the podcast to share his extensive knowledge and expertise with this subject. You can find Rodolphe at Remotive.io Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-12-31 15:00:44

Episode 172 – Roaring Special: KubeCon+CloudNativeCon & AWS re:Invent

Our roving world-traveling reporter Dave spent some time at KubeCon+CloudNativeCon and AWS re:Invent and in today's News special, we're discussing the differences and similarities between these events. We're not going to cover any of the sessions, but rather talk about the conferences. For creators... Taking place twice a year, once in the US and once in Europe, we feel this conference is a good fit for people that want to find out what is new in the world of cloud and, actually, open source in general. We briefly compare this event with the pure open source conferences like FOSDEM and conclude that KubeCon+CloudNativeCon could just be the preeminent open source conference today. Though definitely not without its faults (you'll need to listen to the episode to find out more), we do feel we can recommend attending this conference if you are active in this environment. For Consumers... Compared to KubeCon+CloudNativeCon, this conference is absolutely massive with over 60,000 attendees, sprawling over multiple monstrosities on the Las Vegas Strip. With an obvious emphasis on AWS we feel this event is much more geared toward those that are deploying tech on public cloud infrastructure. Dave felt that attendees take advantage of this event to interact with AWS. Considering many of the visitors are spending significant amounts of money with Amazon, this is not really surprising, but it is still remarkable. Controversy... Never a stranger to (may we say needless?) controversy, AWS appears to have controlled the media exposure with a heavy hand. There is an article on CloudPro.co.uk discussing how multi-cloud and the Pentagon JEDI contract were not to be discussed. But when we looked up the article to discuss it on this show, the article had mysteriously vanished from their site with the original URL now resulting in a "permission denied" error... Fortunately, Google cache still has a copy and for when that cache finally expires, we've decided to keep a PDF version for future reference. ;) Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-12-24 15:00:47

Episode 171 – Remote Working: The way of the Future? (Part 2)

In the effort to hire the best candidates, more and more companies are transforming into distributed organizations where remote working is the way to go. As with any emerging technology, many businesses are still trying to figure out how to make this work and on the other side of the fence, potential employees are looking for guidance on what to expect. Your co-hosts have been working remotely for some time and we are joined by Rodolphe Dutel, founder at Remotive.io, for a really deep and extensive discussion on everything related to remote working and distributed companies. In this three-part series with Rodolphe, we are trying to get to the bottom of the whole "is remote working good or bad" issue. For people who are not working remotely, the prospect can be exciting or just downright scary. For businesses that want, or need, to embrace the distributed paradigm, it can be equally daunting. Rodolphe has been "helping Tech professionals go remote" for over five years now and we are very happy that he agreed to be on the podcast to share his extensive knowledge and expertise with this subject. You can find Rodolphe at Remotive.io Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-12-17 15:00:24

Episode 170 – Roaring News

This week, we have the shocking news that Docker Enterprise has been gobbled up by a rather unknown entity. (Shocking not because it happened, but because it took so long.) Also, a new open source license attempting to stem cloud predation has appeared on the scene and Microsoft has all the orchestration you would ever want, in the cloud no less, no really! How the Mighty have Fallen... Not only has Docker now completely ended as an entity, we wonder what kind of "synergy" can be achieved with this acquisition by Mirantis. And apart from the open source Docker format, usually deployed via Kubernetes, does Docker still matter? https://techcrunch.com/2019/11/13/mirantis-acquires-docker-enterprise/ "Eventually open source": the new abandonware? In the ongoing OSS license wars, a new mongrel has appeared with Sentry borrowing the "Business Source License" and adding an "eventual open source" clause. Not sure if serious..? https://blog.sentry.io/2019/11/06/relicensing-sentry Azure stack did not work out for you then, did it? Microsoft is jumping more firmly on the hybrid cloud bandwagon with a new orchestration that will rule them all, from the clouds no less. So is this just an answer to Google's Anthos or is it actually interesting? https://www.itprotoday.com/hybrid-cloud/azure-arc-microsofts-latest-enterprise-multi-cloud-play Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-12-10 15:00:29

Episode 169 – Remote Working: The way of the Future? (Part 1)

In the effort to hire the best candidates, more and more companies are transforming into distributed organizations where remote working is the way to go. As with any emerging technology, many businesses are still trying to figure out how to make this work and on the other side of the fence, potential employees are looking for guidance on what to expect. Your co-hosts have been working remotely for some time and we are joined by Rodolphe Dutel, founder at Remotive.io, for a really deep and extensive discussion on everything related to remote working and distributed companies. In this three-part series with Rodolphe, we are trying to get to the bottom of the whole "is remote working good or bad" issue. For people who are not working remotely, the prospect can be exciting or just downright scary. For businesses that want, or need, to embrace the distributed paradigm, it can be equally daunting. Rodolphe has been "helping Tech professionals go remote" for over five years now and we are very happy that he agreed to be on the podcast to share his extensive knowledge and expertise with this subject. You can find Rodolphe at Remotive.io Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-12-03 15:00:55

Episode 168 – Roaring News

This week's news is full of spying smart devices and the perceived sprawl of IoT. We finish off discussing the use and non-use of big monolithic government contracts in technology. It's not because you're paranoid they're not out to get you... Privacy is foremost in many people's minds, but on the other hand we keep bringing so called "smart" devices into our privacy sanctums... And now we need to start thinking about how this trend affects our surroundings... https://lifehacker.com/are-you-morally-obligated-to-tell-house-guests-about-yo-1839174119 We've outsourced all the people? OK, now let's outsource all the bytes! Red Hat is reporting on the apparent trend of outsourcing IoT, but to be honest, how pervasive is IoT in everyday life and is there really a need for this? https://datafloq.com/read/red-hat-report-iot-outsourcing-trend-accelerating/7063 All your bases, they are belong to... Amazon? We don't really care about how the Pentagon spends the taxpayers' money and this article wrongly predicts Amazon to win the much discussed and maligned US Pentagon JEDI contract anyway, but do these monster deals still make sense in the world of hybrid cloud and micro-services? https://www.technologyreview.com/s/614487/meet-americas-newest-military-giant-amazon/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-11-26 15:00:39

Episode 167 – Year 4: definitely no longer a Toddler!

Please join us as we celebrate four years of podcasting on Big Data and Big Tech. After 166 episodes, we've somehow reached our fourth anniversary and as is by now tradition, we invite guests we had over the year back for this episode to see what they have been up to. The internet: an amazing place to find interesting imagery! :D Big thanks to our guests today: Jean-Georges Perrin https://www.linkedin.com/in/jgperrin/ John Mertic https://www.linkedin.com/in/jmertic/ Mark Phillips https://www.linkedin.com/in/markphillips/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-11-19 15:00:34

Episode 166 – Roaring News

SpaceX will solve the world's Internet woes, courtesy of Elon Musk, Quantum Supremacy may still be a little while off (like a couple of decades), courtesy of IBM and from now on, the "don't call us, we'll call you" response to your job interview will be courtesy of an artificial intelligence (or lack thereof). Duck! 40,000 Satellites passing through! Satellite Internet service has definitely not been without its issues, but Elon Musk's Starlink has now successfully made (at least) twitter available to the masses! https://arstechnica.com/information-technology/2019/10/elon-musk-sends-tweet-via-spacexs-starlink-satellite-broadband/ The Technology wars of 2020 have started already... After Big Data washing and A.I. washing, prepare for the inevitable supremacy of Quantum washing? Google and IBM are going for round one! https://www.technologyreview.com/s/614604/quantum-supremacy-from-google-not-so-fast-says-ibm/ I for one welcome our new masters... An AI performing job interviews, cutting down the average time to hire a new employee from a couple of weeks to a couple of days cannot be bad, surely? Personally, Dave and I don't agree, thank you very much! https://www.washingtonpost.com/technology/2019/10/22/ai-hiring-face-scanning-algorithm-increasingly-decides-whether-you-deserve-job/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-11-12 15:00:11

Episode 165 – Best Practices for Machine Learning

A little while ago we came across a blog by Martin Zinkevich about the best practices for ML Engineering at Google. We have had this one on the back burner for a while now and we decided to go ahead and do an episode on this subject, using this article as a rough guide. Rules of Machine Learning: Best Practices for ML Engineering The blog post by Martin is definitely worth a thorough read! https://developers.google.com/machine-learning/guides/rules-of-ml/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-11-05 15:00:05

Episode 164 – Roaring News

Believe it or not, Hadoop is dead, again. In other news, Facebook's Libra coin is getting more and more boxed in and Microsoft thinks the real world is really just too ugly to look at during your daily commute. Enjoy! How much deader can it get? When Arun Murthy, "father of Yarn" and therefore quite an authority on Hadoop, declares said Hadoop to be dead (yet again), we take notice. Now this was published a little while ago but for some reason, we kept putting off talking about it. However, today is the day we tackle this subject, yet again, hopefully for the very last time... https://medium.com/@acmurthy/hadoop-is-dead-long-live-hadoop-f22069b264ac Facebook will "Libra" your wallet When Facebook revealed details about their cryptocurrency Libra, the world certainly noticed. While we shy away from discussing bitcoin variants, this news has been interesting enough, especially the reaction of the world towards this newcomer on the financial markets. https://www.cnbc.com/2019/10/22/facebook-ceo-mark-zuckerbergs-prepared-remarks-before-congress.html Be gone, ugly reality Is this an occasion of Virtual Reality and Augmented reality going too far, off the proverbial deep-end or is Microsoft on to something here? Apart from an in-depth discussion on the stylistic merits (or lack thereof) of this contraption, we also discuss the recent history of these types of reality altering technologies. https://arstechnica.com/gadgets/2019/10/microsofts-dreamwalker-vr-turns-your-daily-commute-into-a-totally-different-one/# Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-10-29 15:00:55

Episode 163 – The Cloudpod weather report Part 2

It cannot be denied that cloud, particularly public cloud, has become ever more prevalent across the "Big Tech" universe. In order to give this subject the attention it deserves, we went out and found some podcasting colleagues who specialize in this subject and put them in front of our "question firing squad". Many thanks to our Special Guests: Peter Roosako Twitter: @FoghornCTO Peter is the CTO of Foghorn Consulting, a cloud consulting and DevOps engineering company based in San Francisco. Justin Brodley Twitter: @jbrodley Justin Brodley is an IT Technologist who has worked with cloud computing in the SaaS computing space. He has built out MSP AWS partner organizations, led IT operations for Fortune 500 companies and is the founder and CEO of SaaS 11, a consulting firm specializing in SaaS Software providers. Jonathan Baker Twitter: @cloudkickoff Jonathan Baker is a career problem solver and serverless evangelist. He is currently focused on developing cloud based platform services, and recently won first prize at the AWS re:Invent 2018 NPO Hackathon. He enjoys his family, Lego and making stained glass in his down time. The CloudPod Our three guests are the hosts of The CloudPod podcast where they break down the big cloud announcements, what they mean for devops/SRE/IT teams and how to think about the cloud market each week. Make sure to check out their content! https://www.thecloudpod.net/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-10-22 14:00:22

Episode 162 – Roaring News

For this News episode we are joined by Niels and Carlo from Kuori.io and we discuss the need that some people have to build solutions using technologies that are not really suited for the purpose, while perfectly valid technology is readily available. In the second part, we discuss a whitepaper on "cloud native" and how it seems this drive towards the cloud is still very often an exercise in marketing... If you build it, they will come? Not saying that this is a bad blog, not at all, it discusses the subject matter in clear detail and all in all is a pretty good read. However, the first (and arguably only valid) question that popped into my mind when I read the title was: WHY? Why would anyone want to do something like this? Are there good reasons for trying to build something using tools that are clearly not suited for the purpose? https://yokota.blog/2019/09/23/building-a-relational-database-using-kafka/ Back to the Future? A new whitepaper by Weaveworks begs the question: are we still talking about this as a "new" thing? Are there still organizations for whom this is new? Taking a deeper look at the whitepaper, it turns out it is full of marketing buzzwords and platitudes... So how far is cloud native really accepted and in use today and is a micro service architecture really something you need to go for to solve all your problems..? https://www.weave.works/blog/6-reasons-to-start-the-cloud-native-transformation Why don't you take it easy? Final thanks to our guests Niels and Carlo from Kuori.io! If you're looking for a carefree data infrastructure, be sure to have a look at their website! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-10-15 14:00:02

Episode 161 – The CloudPod weather report Part 1

It cannot be denied that cloud, particularly public cloud, has become ever more prevalent across the "Big Tech" universe. In order to give this subject the attention it deserves, we went out and found some podcasting colleagues who specialize in this subject and put them in front of our "question firing squad". Many thanks to our Special Guests: Peter Roosako Twitter: @FoghornCTO Peter is the CTO of Foghorn Consulting, a cloud consulting and DevOps engineering company based in San Francisco. Justin Brodley Twitter: @jbrodley Justin Brodley is an IT Technologist who has worked with cloud computing in the SaaS computing space. He has built out MSP AWS partner organizations, led IT operations for Fortune 500 companies and is founder and CEO of SaaS 11, a consulting firm specializing in SaaS Software providers. Jonathan Baker Twitter: @cloudkickoff Jonathan Baker is a career problem solver and serverless evangelist. He is currently focused on developing cloud-based platform services, and recently won first prize at the AWS re:Invent 2018 NPO Hackathon. He enjoys his family, Lego and making stained glass in his down time. The CloudPod Our three guests are the hosts of The CloudPod podcast where they break down the big cloud announcements, what they mean for devops/SRE/IT teams and how to think about the cloud market each week. Make sure to check out their content! https://www.thecloudpod.net/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-10-08 14:00:44

Episode 160 – Roaring News

Before we dive into the News, we want to really thank you, our audience, for helping us reach our YouTube subscriber goal. We are now the proud owners of the www.youtube.com/RoaringElephant URL and we could not have done this without you! For News, we talk about boring one-man companies, something hybrid cloudy at Google and a follow-up on the open-source David versus the public cloud Goliath story. It's not because it's boring that it's not Cool! The author of this article is the sole employee of his own company "Listen Notes" and here he explains in pretty deep detail how he was able to set up an internet service, by himself, using mostly open source software and good development practices. https://broadcast.listennotes.com/the-boring-technology-behind-listen-notes-56697c2e347b You need Hybrid Cloud. Even Google says so! As probably the last of the big three public cloud providers, Google has joined the select group that offers a hybrid cloud environment. Does it make sense? Does it not? Listen and find out what we think! https://bravenewgeek.com/whats-going-on-with-gke-and-anthos/ Open Core Companies Unite! Now that the Roaring Elephant discussed the situation, a large number of notable open source and open core organizations decided they need to unionize under a common goal and fight for a sustainable open source/open core business model. Said model is yet to be defined though... https://www.theregister.co.uk/2019/09/20/open_source_companies_cloud/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-10-01 14:00:48

Episode 159 – OCMAWDIAM Part 2?

Today we are joined by Mark Phillips, product marketing manager and, more interestingly, heavily involved with Ansible at Red Hat. Now, rather than making this episode specifically about Ansible, Mark shares his extensive expertise on the subjects of orchestration, config management and automation. Mark shares his 25 years of experience on various questions covering the usual "what, why and who". A bit of history is included and of course the difference in approach for cloud versus on-premise also comes up. Specific terminology is explained and we cover the usual excuses for not using things like config management. For those that are still on the fence, the information shared by Mark should give you a firm grasp of the concepts and deployment methods and help you get started. During the interview, Mark mentioned a number of blogs and other online resources: Why failure should not be celebrated in the startup world; "Migrating the runbook - from legacy to DevOps" at IPExpo London 2015; As work gets more complex, 6 rules to simplify - TED talk; Puppet vs Chef vs Ansible; Mark Phillips (Ansible) - Go Agentless! at #DOXLON; ANSIBLE BEST PRACTICES: THE ESSENTIALS; Ansible Tips & Tricks; Ansible as automation glue. Many thanks to Mark for spending time with us and sharing his extensive knowledge and experience with our audience! Since the interview did run a bit long, this will be a two-parter and we will probably follow this up with a more in-depth discussion on Ansible. Let us know if you have any questions you want to see answered! Title image background designed by starline / Freepik. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-09-24 14:00:35

Episode 158 – Roaring News

The main topic for this news episode is a revisiting of the Multi-Cloud subject we touched on last time. Next we take a look at an article about the state of the Docker project and we end on an excellent post-mortem by Monzo about some trouble they had over the summer. The Multi-Cloud article we wanted to discuss… When we discussed the subject of Multi-cloud on the last News Episode, we did it from an article we thought wasn't very good. As luck or fate has it, we came across a different article that actually gave us a better start to that particular discussion and we're not above doing just that! In this pretty good article, we follow the author along the 5 talking points he finds relevant and important, adding our views. And don't fear, with a subject as broad as multi-cloud, there is plenty to talk about! Community does matter in Open Source! Go figure… When I got the title for this article from Dave, my first thought was "I didn't know Docker went away". And the second thought I had was "whatever... *shrug*". Now the article is a little inflammatory and we won't be joining in on that particular pastime, but it did get us talking about Docker and open source project management in general. So if good journalism is supposed to make people stop and think, the article certainly achieved that and so we're going to be covering it in this episode. Who wants to get in on Monzo's post mortem? And now for something completely different. Something delightful and unexpected! A post mortem by Chris Evans at Monzo regaling us with an excellent write-up of something that happened in July, how they approached it and finally fixed it. It is quite rare to see this kind of transparency, even when open source is a large part of the operating environment, and this write-up, warts and all, is truly a delight to read and discuss. We often say it's great to hear about things that work, but we really learn when we hear about things that did not go so well. 
Many thanks to Chris and his team for this blog post! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-09-17 14:04:03

Episode 157 – Orchestration, config management, automation, what does it all mean?

Today we are joined by Mark Phillips, product marketing manager and, more interestingly, heavily involved with Ansible at Red Hat. Now, rather than making this episode specifically about Ansible, Mark shares his extensive expertise on the subjects of orchestration, config management and automation. Mark shares his 25 years of experience on various questions covering the usual "what, why and who". A bit of history is included and of course the difference in approach for cloud versus on-premise also comes up. Specific terminology is explained and we cover the usual excuses for not using things like config management. For those that are still on the fence, the information shared by Mark should give you a firm grasp of the concepts and deployment methods and help you get started. During the interview, Mark mentioned a number of blogs and other online resources: Why failure should not be celebrated in the startup world; "Migrating the runbook - from legacy to DevOps" at IPExpo London 2015; As work gets more complex, 6 rules to simplify - TED talk; Puppet vs Chef vs Ansible; Mark Phillips (Ansible) - Go Agentless! at #DOXLON; ANSIBLE BEST PRACTICES: THE ESSENTIALS; Ansible Tips & Tricks; Ansible as automation glue. Many thanks to Mark for spending time with us and sharing his extensive knowledge and experience with our audience! Since the interview did run a bit long, this will be a two-parter and we will probably follow this up with a more in-depth discussion on Ansible. Let us know if you have any questions you want to see answered! Title image background designed by starline / Freepik. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-09-10 14:00:27

Episode 156 – Roaring News

In this Roaring News episode we start by debunking an "85% of companies use multi cloud" statement, look at the future of Data Engineering and are completely astounded at the amount of tracking that happens on the world wide web. We close off with a deeper look at the Cloud Native Computing Foundation open sourcing the Kubernetes audit and go another round on "Smart Data" versus "Big Data". Multi-Cloud Confusion With this first article, we want to do a little bit of debunking of what recently has become one of the more hyped statements in the Big Technology space. If you believe "them", almost every company out there is now operating in multiple clouds! Now, this does not necessarily need to be a false statement, it just depends on how you define "cloud". We discuss the different views and offer some advice on how a successful multi-cloud strategy should work. And it immediately illustrates how difficult a real multi-cloud deployment actually is... Data Engineering of Future Past This article touches on a few "hot topics" that, according to the author, will be what Data Engineers will concentrate on in the coming years. What struck us when reading this article was that we kind of accepted these topics to be done and finished today! In our admittedly distorted view, real-time systems and decoupled, bus-centric architectures are the way everybody does things today, not something that is still on the horizon. The last part of the article also discusses how the Data Engineer is getting more and more attached to the less technical business processes. If you build it, they will track you Not a new topic by any stretch but a very detailed article that treats the subject with a lot of integrity, not eschewing some self-criticism. It's a revealing story with some really excellent imagery that should make you stop and think about how "free" the internet really is. The old adage "if you're not paying for it, you're not the customer, you're the product" is clearly still very much true. 
Open source really goes everywhere In episode 153 we briefly mentioned how the CNCF has open sourced the Kubernetes security audit and in this blog post they give access to the full audit document. They also aim to repeat this process for other Cloud Native projects, giving priority to the graduated projects, as one should expect. The importance of something like this cannot be overstated, so we spend some time discussing this. Make sure to check out episode 153 too! Common Sense Anyone? This Forbes article excels in stating one of the most obvious facts that exist in Big Data today. However, it does not hurt to repeat it from time to time since at the beginning of the Data revolution, and actually still continuing today, a lot of projects place way too much importance on the gathering of data when they should pay more attention to what that data is supposed to deliver for the project. Simply gathering data will not make you or your organization faster or more productive, unless you have the means to get value out of the data. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-09-03 14:00:34

Episode 155 – NoSQL: You keep using that word…

For a podcast on Big Data, we were amazed that we never covered the subject of NoSQL. So we're correcting this today. Not by listing and comparing all the NoSQL solutions out there, but rather by going over the differences between the two paradigms. This way we hope to offer enough insight so you can feel comfortable deciding if you should or should not deploy NoSQL in your environment. There are definitely a number of really important benefits to using a NoSQL solution in your environment and your co-hosts are big fans of the technology. However, make sure to carefully consider the positive and negative consequences. Make sure you are going for NoSQL for the right reasons. The discussion in this episode should give you a good basis for that decision, but do let us know if there is something we missed! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-08-27 14:00:44

Episode 154 – Roaring News

In this News installment, coming to you courtesy of Dave and Engadget, we talk about how the Osaka track is paving the way for data free flow across borders and take a look at the alleged problem with the UK's facial recognition system. Ending on a "high note", we discuss how Facebook's and Google's (and a lot of others') MLPerf benchmark is going to change the way we look at our machine learning setups. Or not... Data without frontiers! Have world leaders woken up and smelled the coffee that is reality or do they still believe data flow is something that stops at a border check point? The Osaka track that was initiated by Japanese Prime Minister Shinzo Abe seems to indicate enlightenment is at hand. However, real concrete information is rather hard to find so we engage in a bout of theory-crafting on this subject in the hopes of coming to a useful conclusion. Wish us luck! I'm sure I've seen that face before... This article on how some people claim the UK's facial recognition system is quite bad and other people say it's wonderful is a perfect excuse to talk about how context and full understanding of the underlying technology is often a "good" thing when reading up on statistics and benchmarks. It feels like every time somebody writes up an evaluation, they use (and abuse) the data such that it makes the point they are looking for. And can you blame them? Of course you can, though that does not excuse us from getting educated on a subject before buying in to the rhetoric, no matter how fit-for-purpose it seems to be. So yeah: "lies, damn lies and statistics" is still very much alive it would seem... On the subject of lies, damn lies and benchmarks... Following SPEC and TPC, machine learning experts decided they need a proper benchmark suite of their own, thank you very much indeed! 
Now the actual value to the end consumer of Machine Learning Goodness(TM) is more often than not negligible, but "industry standard" benchmarking has definitely earned its place in the world of RFIs and RFPs... This endeavor in benchmarking does seem to be a very ambitious project encompassing all of hardware, software and services (think cloud SaaS). Does this make MLPerf brilliant, or doomed to failure? And do we need benchmarking after all? All good questions and the answers can probably not be found in our discussion on the matter... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-08-20 14:00:54

Episode 153 – How Secure is the Future of Open Source?

The way open source software is being consumed has changed drastically: originally found on the fringes, open source technology has now become a core part of many organizations of all sizes. We take a look at the confusion and sometimes vocal irritation that has accompanied this adoption by "Big Business" and ask the question if the future of Open Source is in danger. We have been playing with the idea to give our view on this subject for a while now, but we wanted to make sure not to add to a very flammable situation. Rather, we try to share useful information and stay as close to an unbiased narrative as possible. We end the conversation on a positive note, being hopeful that the inherent openness and transparency that imbues Open Source will prevail and a new equilibrium will be found. We are not basing this discussion on any specific article, but here is a list of articles that we reference during the discussion. The CBInsights article is what we consider the most FUD-less of them all, containing a lot of useful information, and the infographics we talk about come from this CBInsights article. Open-Source Software Has Changed The Way Software Is Developed. Here’s Where The $33B Industry Is Headed; The sustainability of open source for the long term; Amazon Has Gone From Neutral Platform to Cutthroat Competitor, Say Open Source Developers; Captain, we've detected a disturbance in space-time. It's coming from Earth. Someone audited the Kubernetes source. Information on Open Source Licenses: Free software license on Wikipedia; Why We’re Relicensing CockroachDB; Top 10 Open Source Licenses in 2018: Trends and Predictions. Shout-Out to Manning Publications! Have a look at our Patreon page to find out why we like Manning! :) Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-08-13 14:00:05

Episode 152 – Roaring News

Another fortnight, another roaring news episode covering this time: de-anonymizing anonymized data is reportedly easy, Kubernetes is easier than Big Data, Big Data is hard, and hard-to-understand Kafka can be made easier using Factorio visualization. It's not because you're paranoid, they're not out to get you! Not totally surprising to your co-hosts, this article discusses how easy it is to recombine previously anonymized data to regain the ability to identify a person, based on the data sets. Now this does involve combining multiple datasets and this is something legislators have warned against in the past. GDPR specifically has a clause that addresses this and data owners need to exercise care to avoid this from happening. That being the case, though, there are bound to be entities that are not bound by privacy legislation, or that simply choose not to follow it. So long story short (too late!), take care about what information you share where! Going on a little tangent, we discuss how bad this loss of privacy actually is. Is it really Dangerous (with the capital 'D') or merely an inconvenience where it is more the lack of control that poses the most irritation? Kubernetes is less complex than Hadoop! For now? This first article in a sequence of three is more than a little click-baity, but the comparison of complexity between Kubernetes and the Hadoop ecosystem intrigued us enough to warrant a discussion. The inherent unfairness of comparing a scheduler (Kubernetes) with a full big data platform including all the higher level applications does make the premise of the article a bit hard to take. So rather than discuss the article itself, we try to predict the future of Kubernetes and how they will avoid the complexity pitfall. As a bonus, here is an older article that goes a little deeper "On complexity in big data". I'm not Gaming, I'm studying up on my Kafka skills! 
Following on from the previous article, this little gem uses an indie game, Factorio, to explain the rather abstract concepts used by Kafka. The advantages of distributed systems deployed in a de-coupled fashion are highlighted and the concepts of topics and partitions are also demonstrated in what we find a very clear way. Now this is one of those articles that suffers from the visually-impaired nature of mp3 podcasting, so do pull up the article while you're listening to this section, if at all possible. And if the visuals intrigue, feel free to check out Factorio by Wube Software and let us know what you think! :) Open Source to the rescue of Cloudera (stock price)! For the final article in this episode, we take a look at how Cloudera, pretty much the only remaining Hadoop distribution vendor out there, is betting the farm on Open Source. When Hortonworks was still a separate entity, one of the important differentiators between them and Cloudera was that the Hortonworks Data Platform was 100% open source where Cloudera's Data Hub was open core with proprietary add-ons you pretty much needed to get if you wanted to have a somewhat livable user experience. Now Cloudera has announced that it will open source everything in their new distribution. This is definitely remarkable and open source enthusiasts will welcome this move. However the question remains why Cloudera is going down this path now and what they think they will gain from it. Hortonworks, near the end of their existence, added proprietary add-ons on top of the open distribution in an attempt to increase revenue (or so we guess) and as the article mentions near the end, predatory behavior by the likes of Amazon has caused popular open source projects to move away from 100% open source, in order to protect their sustainability... Only the future can tell how this will turn out. 
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
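
For listeners who cannot pull up the Factorio visuals, the topics-and-partitions idea from the Kafka item above can also be sketched in a few lines of Python. This is a toy, in-memory model and not the Kafka client API: the `Topic` class, its method names and the use of CRC32 as the key hash are all assumptions made for illustration (Kafka's own default partitioner works differently). What it shows is the general principle that records with the same key land on the same partition, and that ordering is kept per partition rather than across the whole topic.

```python
import zlib


class Topic:
    """Toy in-memory topic: a fixed number of append-only partitions.

    Hypothetical illustration only; this is not the Kafka client API.
    """

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # A stable hash keeps every record with the same key on one partition.
        # Assumption: CRC32 is used here for simplicity, not Kafka's partitioner.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        # Append the record and return where it ended up.
        index = self.partition_for(key)
        self.partitions[index].append((key, value))
        offset = len(self.partitions[index]) - 1
        return index, offset

    def consume(self, partition, offset=0):
        # A consumer reads one partition in order, starting at an offset.
        return self.partitions[partition][offset:]


if __name__ == "__main__":
    belt = Topic("iron-plates", num_partitions=3)
    for number in range(4):
        print(belt.produce("furnace-a", f"plate-{number}"))
    print(belt.produce("furnace-b", "plate-0"))
```

Because each partition is an independent append-only list, several consumers could each read a different partition in parallel, which is the de-coupling benefit the article illustrates with conveyor belts.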

2019-08-06 14:00:31

Episode 151 – Do you only need 6 principles?

A little while ago, Dave came across an article by Francesca Lazzeri titled "The Data Science Mindset: Six Principles to Build Healthy Data-Driven Organizations" and in this episode we're giving our view and expanding on those principles. Is it really possible to define a successful data science organization following 6 concrete principles? Are these principles a step by step, one after the other plan you can follow on the road to success, or are these principles something you need to keep in mind from the start up until the end of days? 1. Understand the Business and Decision-Making Process We're pretty much agreeing with this one and expanding on it, we talk about the benefits of doing this exercise on streamlining the organization and security. However, to achieve the C-level support which we agree is needed, some free-form experimentation needs to take place to get to a position where you actually have something that can be shown in a clear and concise way to said C-level. However, when the step to production is made, higher management approval, or actually, real active support, will be a primary requirement for the future health of your project. 2. Establish Performance Metrics Basically this principle seems to go back to making your project S.M.A.R.T.: make sure you have a set goal in mind and a way to measure your success, or failure. Going through this exercise is probably mandatory before you go to the C-level and ask for their support since a good CEO or CFO isn't going to give you a pile of cash if you cannot state your success criteria and intended goals... Think R.O.I. here people! As a little bonus, there is a nice "make machine learning algorithms understandable via simple questions" list here: take a look at the article, principle 2 under paragraph 3 "Define the success metrics". 3. 
Architect the End-to-End Solution Dave has questions around the amount of detail (down to the product level) that is required at this step in the process, but the choice of products will have a large impact on the financial picture of the environment and therefore, your budget will influence your end-to-end architecture. However, going back to our initial questions around whether these principles are a step by step guide to follow or a bunch of guidelines to keep in mind, we feel this principle needs to be revisited a couple of times along the road... As a bonus, we discuss in a little more depth how the choice of big data tools and products can and will influence your spend. 4. Build Your Toolbox of Data Science Tricks This is where you could see a refinement of the broad strokes that are defined in the previous principle. On the other hand, since this talks specifically of "data science" tricks, this could also be seen as attached to the Data Scientist role specifically. In this section we also cover how you should avoid "reinventing the wheel" over and over again and how standardizing on a set of technologies can really help and accelerate your project. Important to understand is that in this modern age, standardizing does not mean it is set in stone forever. Quite the opposite, agile methodology includes updating and adapting your standards to new realities all the time! The DevOps practices around CI/CD will almost always require some forethought when new tools are introduced and this can work well with a level of standard enforcement. 5. Unify Your Organization’s Data Science Vision We both agreed here that this step is way late in the 6 steps, if the steps are supposed to be gone through in sequence. More likely, this principle should be well defined from the start and be a primary principle across the whole thing. 
Apart from that, we completely support the idea that you need a common vision well established in order to make your data environment successful. This is also a principle that we feel is really not a good fit for the "step by step" approach but should really be part...

2019-07-30 14:00:59

Episode 150 – Roaring News

In this news episode, we use a nice little article on how you can help keep open source sustainable as a structure for a broader discussion on this subject. The second subject this time goes another round on the "data engineers are not data scientists" (and the reverse) subject. Ask not what Open Source can do for you, ask what YOU can do for Open Source! Many organizations, commercial and not, are using open source software so heavily, they are becoming dependent on open source for their own survival. So when you look at how you can support open source, it is not an entirely altruistic project, but makes just good business sense. Using this article for structure and inspiration, we go over the different ways everybody, including YOU, can help keep the open source movement sustainable. Donating some hard cash, employing open source committers and just being an open source advocate are just a few possibilities. On a related note: do you want to keep this little open source focused podcast sustainable? Please head over to our Patreon page! Let's do compare some apples and oranges! We've covered this subject before, but this article is a great write-up of this still relevant issue in the marketplace. Apart from covering the subject in more than a cursory way, it also presents the reader with a number of practical examples of what the possible results are when these two quite different functions get confused. This confusion is further caused by all possible parties: the scientist and the engineer themselves, but also the HR department and you and me. There's plenty of blame to go around. At the end of the day, having the right person in the right place really is where you want to be! (See what I did there? :D) Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-07-23 14:00:57

Episode 149 – The State of Developer Ecosystem 2019

When friend of the show Ward Bekker sent us a link to the recent survey write-up on the State of Developer Ecosystem in 2019 by JetBrains, we immediately set up a recording date with him to go over all the facts and figures... DevOps appears to be quite rare The first thing we picked up on is how many organizations are still surviving without any kind of DevOps. Even though everybody is talking about DevOps and config management, it would appear, at least according to this survey, that these tools are still far from prevalent in the development environments. After discussing the different facts and figures contained in the webpages on the JetBrains website, we were left wondering how generic the target group was. Since this survey was conducted by JetBrains, it would definitely make sense that the respondent population was taken from their customer base and this could skew the results towards smaller, "Indie" development environments. The sense and non-sense of Multi Cloud deployment We then take a bit of a detour to talk about multi-cloud deployments. While it may be a good idea in theory, it is often less than ideal in practice. You will either limit yourself to the lowest common denominator of available services on the different clouds, or you will need to take on quite a bit of glue layer development... While container-based deployments will allow some extra freedom, the better way to look at multi-cloud deployment could be to land different workloads on different clouds, where they are best suited. But this no longer avoids the much feared "vendor lock-in" which is quite often the reason why one would look at multi-cloud deployment in the first place. Lies, damned lies, and statistics? Circling back to the main topic, we tried to take a deeper look at the raw numbers, which at the time of writing were not yet available. 
So we needed to try and extrapolate meaning from the available information and kind of concluded that the raw data is really required before real conclusions can be drawn... That old "Lies, damned lies, and statistics" mantra does come to mind. And for once not missing a chance to plug our merchandise store, we've had a quite relevant bumper sticker available for a while now: :D Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
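
The "lowest common denominator" trade-off from the multi-cloud detour above can be made concrete with a small Python sketch. Everything in it is hypothetical: the cloud names, the service names and the two helper functions are made up for illustration and do not describe any real provider's catalog. It only shows the reasoning: the portable set is the intersection of what every cloud offers, and anything you need outside that set becomes glue-layer work on the clouds that lack it.

```python
def portable_services(catalogs):
    """Return the services offered by every cloud in `catalogs`."""
    service_sets = [set(services) for services in catalogs.values()]
    if not service_sets:
        return set()
    return set.intersection(*service_sets)


def glue_needed(catalogs, required):
    """Map each cloud to the required services it does not offer natively."""
    return {
        cloud: sorted(set(required) - set(services))
        for cloud, services in catalogs.items()
    }


if __name__ == "__main__":
    # Hypothetical catalogs, purely for illustration.
    catalogs = {
        "cloud_a": {"object_storage", "managed_kubernetes", "message_queue"},
        "cloud_b": {"object_storage", "managed_kubernetes", "graph_database"},
    }
    print(sorted(portable_services(catalogs)))
    print(glue_needed(catalogs, ["message_queue", "object_storage"]))
```

Landing each workload on the cloud where it fits best, as suggested in the discussion, amounts to dropping the intersection requirement, which is exactly why it no longer protects against lock-in.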

2019-07-16 14:00:10

Episode 148 – Roaring News

With Summer starting and news drying up a little in the heat, we managed to find some interesting things happening at the Apache Software Foundation and we try to find correlations with the Cloud Native Computing Foundation. After that, we discover that Robots actually won't be taking all our jobs... Who would have thought... The more things change, the more they stay the same..? While the ranks have closed and the messaging is "everything continuing as usual, nothing to see here", things are apparently happening at the ASF with some top level people moving on. Since only the future can tell how (and even if) this will have any noticeable effect, we have a little discussion about software foundations in general. Aside from the ASF, we talk about the Linux Foundation and the CNCF who also have their role to play. One is still glad to be of service! Over the years, there has been more than a little bit of fear mongering going on about how robots and technology in general will destroy a lot of jobs. Institutions like the World Economic Forum have released their views over the years and recently, ManpowerGroup released a pretty pdf with their conclusions. Looking over it, we are really puzzled by certain conclusions, or rather predictions about what the geographical impact will be, and have some fun about the types of jobs that will be affected. Fortunately, on the whole, the consensus seems to be that rather than decreasing the number of jobs in the world, technological advances and automation will actually increase the number of people required to keep it all running smoothly. But "reskilling" and "upskilling" will be ever more important if you want to remain part of the "in demand" crowd! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-07-09 14:00:00

Episode 147 – Alex Zeltov on MLOps with mlflow, kubeflow and other tools (part 2)

In this episode, Global Black Belt and Technical Architect in the Big Data and Advanced Analytics Team at Microsoft, Alex Zeltov, is our guest and he explains the ins and outs of MLOps through various tools like mlflow and kubeflow. In this second part, we go into more depth on the practical consequences of implementing MLOps and the various tools that are available. We also go on a bit of a tangent discussing why traditional enterprises are still having a hard time seeing machine learning models as something that requires and benefits from things like model management, version control and periodic updating of models. For more from Alex on MLOps and mlflow, check out his presentation at the Washington DC DataWorks Summit a couple of weeks ago. The slides are now available on SlideShare and the video is available on YouTube: https://www.youtube.com/watch?v=Ns82mJjJgto

MLOps

Just like DataOps follows on from DevOps, one may say that MLOps continues after DataOps. While there is a Wikipedia page on the subject, there is not that much "prior art" available just yet. The main advantages that MLOps can deliver, according to Alex, are a much improved move to production of trained algorithms, even allowing for CI/CD, and a more structured approach to training models where multiple data scientists can work together to achieve better results.

mlflow

One of the main tools emerging at the moment is the Databricks-backed mlflow project. Though not an Apache project, it has been open sourced under the Apache License and shows much promise. In the episode, Alex explains how mlflow integrates with your data science notebooks to allow for reliable model management with minimal disruption.

Kubeflow

(Image taken from https://medium.com/@amina.alsherif/how-to-get-started-with-kubeflow-187792f3e99)

The second contender to reach for the MLOps crown is Kubeflow. Even though less mature than mlflow, it is backed by the very popular Kubernetes framework and that brings a large community together working on this project.

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-07-02 14:00:58

Episode 146 – Roaring News

A new function is being called into being by Forrester called the "Data Hunter", which sounded interesting enough to us to spend some time on. Then we cover a nice guest blog on the Cloudera site and we finish off with some rambling on the changes in the HPC world. Enjoy!

Loincloths and spears to the ready: the Data Hunter is born!

Dave found a small article on the Forrester site that points to a paid webinar about Data Hunting. Now we did not pony up the $300 they charge for the webinar, but we found the concept quite compelling and looked at the three "audience questions" that were included in the article.

The "Small File Problem" and a little "You're Doing it Wrong"...?

This guest blog on the Cloudera web site actually has some practical information that can be useful when you need to consolidate your incremental upload files to reduce the number of files your Hive queries need to traverse. The additional complexity here was that this had to happen on a live production environment without service interruption, while keeping all data available and sane. We do however need to remark that the author of this article was making life quite difficult for himself, since his "data estate" really does not seem to justify the use of any kind of Big Data technology. We fully agree with his own summary where he states that using a standard RDBMS would most likely be a better solution...

Should "HPC" now be spelled "HPE"???

With the Enterprise branch of HP gobbling up Cray, after doing the same with whatever remained of Silicon Graphics way back in 2016, they now represent a large percentage of what could be considered "traditional HPC". Of course, IBM is still in there too, but not many of the old supercomputer firms are still around. Then again, the whole HPC world is undergoing a major redesign towards GPUs (and to a lesser extent FPGAs), so it does make sense that the ecosystem is changing...?

And that's all we have for this episode. See you next week! Don't forget to subscribe to our YouTube channel and consider becoming a Patreon to support your favorite podcast!

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
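For readers who want a feel for the consolidation idea behind the "Small File Problem" item, here is a minimal sketch of the planning step only: grouping many small incremental files into batches that can each be rewritten as one larger file. It is not the method from the guest blog, which these notes do not describe in detail. The function name `plan_compaction`, the file names and the size threshold are all hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple


def plan_compaction(files: List[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Group files, in the order given, into batches of at most target_bytes.

    Hypothetical helper for illustration only. A single file larger than
    target_bytes ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Close the current batch if adding this file would overshoot the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical incremental upload files as (name, size in bytes).
uploads = [("part-0001", 40), ("part-0002", 40), ("part-0003", 40),
           ("part-0004", 200), ("part-0005", 10)]
plan = plan_compaction(uploads, target_bytes=100)
print(plan)
```

Keeping the files in arrival order is a deliberate choice in this sketch, so each merged file covers a contiguous run of uploads. Actually rewriting the data, and doing so on a live system without interruption, is the hard part the episode talks about and is not covered here.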

2019-06-25 14:00:48

Episode 145 – Alex Zeltov on MLOps with mlflow, kubeflow and other tools (part 1)

In this episode, Global Black Belt and Technical Architect in the Big Data and Advanced Analytics Team at Microsoft, Alex Zeltov, is our guest and he explains the ins and outs of MLOps through various tools like mlflow and kubeflow. In this first episode, Alex talks on a more theoretical level about MLOps and the benefits it can deliver. For more from Alex on MLOps and mlflow, check out his presentation at the Washington DC DataWorks Summit a couple of weeks ago. The slides are now available on SlideShare and the video is available on YouTube: https://www.youtube.com/watch?v=Ns82mJjJgto

MLOps

Just like DataOps follows on from DevOps, one may say that MLOps continues after DataOps. While there is a Wikipedia page on the subject, there is not that much "prior art" available just yet. The main advantages that MLOps can deliver, according to Alex, are a much improved move to production of trained algorithms, even allowing for CI/CD, and a more structured approach to training models where multiple data scientists can work together to achieve better results.

mlflow

One of the main tools emerging at the moment is the Databricks-backed mlflow project. Though not an Apache project, it has been open sourced under the Apache License and shows much promise. In the episode, Alex explains how mlflow integrates with your data science notebooks to allow for reliable model management with minimal disruption.

Kubeflow

(Image taken from https://medium.com/@amina.alsherif/how-to-get-started-with-kubeflow-187792f3e99)

The second contender to reach for the MLOps crown is Kubeflow. Even though less mature than mlflow, it is backed by the very popular Kubernetes framework and that brings a large community together working on this project.

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-06-18 14:00:14

Episode 144 – Roaring news

In the past week, trouble at Cloudera really stood out and, in the context of similar problems at MapR and (somewhat less related to Big Data) Pivotal, we are devoting the entire episode to this. (Image taken from https://media.thinknum.com/articles/is-hadoop-hype-wearing-off-the-answer-may-lie-in-startups-data/) As this is a Roaring News Episode, we will discuss this story based on a number of articles we found.

Cloudera has a "bad" day...

The combination of some bad quarterly results and both CEO Tom Reilly and chief strategy officer and co-founder Mike Olson leaving the company has had a dramatic effect on the stock price. Now this could be an isolated incident, quickly forgotten, but in the light of similar issues at MapR (which is not a public company) and Pivotal, there does seem to be something more fundamental happening in these Open Source, venture capital fueled companies.

Looking at job listings over the years

The second article we discuss (from which we also took the image above because Dave really liked it) has some interesting statistics on job listings by Cloudera & Hortonworks. Even though the content of the article itself was not that interesting, we try to divine some conclusions by studying the data from the graphs in this article.

Hadoop is dead (again)?

From there, we go to this article on TheNextPlatform, where one of the possible reasons for this phenomenon is laid at the feet of the big public clouds. It's hard to deny that the rise of the Amazon, Azure and Google clouds is, at least partially, responsible for the fragmenting of the "Hadoop ecosphere", and the traditional "pure play" companies will need to adapt and conform to the new reality. (Unless they have a really unexpected trick up their sleeves...)

Our Thoughts

In the end, time will tell how this will evolve. Even though there does seem to be a kind of rationalization going on, the big data players still have time to re-adjust and evolve to overcome the challenges they are facing. The question is whether or not these companies have retained their open source flexibility DNA while transforming themselves into enterprise businesses, and how successful they will be in communicating their roles and ambitions to their customers and the public in general. And no, we don't think Hadoop is dead (again), or ever will be: "Hadoop" (the big data environment, not the specific Apache project) and the commercial use-cases it engendered are here to stay, but in ever changing shapes and forms.

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-06-11 14:00:59

Episode 143 – Spark in Action with author Jean-Georges Perrin (Part 2)

And now for something completely different: a book review! Not something we have done before, but when Jean-Georges Perrin contacted us with the suggestion of taking a deeper look at the "Spark in Action" book he is currently writing, we certainly did not say no! However, in all honesty, we talked about much, much more...

Free eBook raffle

Manning Publications has been kind enough to give us a couple of download codes for a free eBook version of "Spark in Action". As always, our Patreons get a first chance to get their hands on one of the codes. If you are a Roaring V.I.P. (or higher), you can head over to our Patreon Page now, where you will find a post containing all the information required. If you become a Patreon now, you immediately get access to that post! ;) After one week, if there are any codes left, there will be a tweet about what you can do to get a free code, even if you are not a Patreon.

A book review on Spark in Action, second edition with author Jean-Georges Perrin

In the second part we go deeper into the book, going over the available chapters and appendices. We cover a number of topics and concepts like the layout of a typical data lake, the four pillars of Apache Spark and more. We end the interview with a discussion on what it's like to write a technical book like Spark in Action. Our thanks to Jean-Georges for spending quite a bit of time with us talking about Apache Spark and to Manning Publications for the free eBook codes! Find out more about Jean-Georges at his blog: https://jgp.net/

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-06-04 14:00:27

Episode 142 – Roaring News – KubeCon 2019 Report

A little over a week ago, KubeCon and CloudNativeCon happened and our independent Roaring Roving Reporter Rubik Dave came back from Barcelona with a comprehensive report.

Kubernetes

As the kubernetes.io webpage tells us: "Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications." As we discuss in the episode, Kubernetes forms a kind of middleware layer that performs orchestration of lightweight Docker containers. To be sure, you can use other container technologies, but Docker (and its companion project Moby) is what is most often used with Kubernetes. The biggest advantage of Kubernetes, I believe, is how it has standardized the way a microservices framework based on Docker container instances can be deployed and managed. There have been a myriad of other approaches that tried to solve that problem (and Dave gives a rather exhaustive list in the episode), but Kubernetes has emerged as the best supported by the community.

KubeCon

And that is where KubeCon comes in: there are other, more developer-oriented conferences, but KubeCon is perhaps the largest event for Kubernetes consumers. Details on this year's event are available at the KubeCon | CloudNativeCon Europe 2019 website. If you missed this year's installment, take note that next year's Europe event will be in Amsterdam, March 30th to April 2nd. And if the American continent is more practical, you can join the community at the San Diego venue, November 18th to 21st.

CloudNativeCon

KubeCon has run together with CloudNativeCon for as far back as I can figure out, and since Kubernetes is one of the larger "CNCF graduated" projects, that is not surprising. It also makes sense since microservices architectures are an excellent fit for cloud-based deployments, so a lot of the Kubernetes community is likely to also be a member of the "cloud crowd". Now, reading the CloudNative website, their charter in particular, it does seem to see its purpose in a similar vein as the Apache Foundation does. However, the CloudNative folk recommend that the projects under its wing use the Apache 2.0 license, so they certainly don't appear to be in any kind of direct competition here... I think I feel a future podcast episode announcing itself! :D

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-05-28 14:00:29

Episode 141 – Spark in Action with author Jean-Georges Perrin (Part 1)

And now for something completely different: a book review! Not something we have done before, but when Jean-Georges Perrin contacted us with the suggestion of taking a deeper look at the "Spark in Action" book he is currently writing, we certainly did not say no! However, in all honesty, we talked about much, much more...

Free eBook raffle

Manning Publications has been kind enough to give us a couple of download codes for a free eBook version of "Spark in Action". As always, our Patreons get a first chance to get their hands on one of the codes. If you are a Roaring V.I.P. (or higher), you can head over to our Patreon Page now, where you will find a post containing all the information required. If you become a Patreon now, you immediately get access to that post! ;) After one week, if there are any codes left, there will be a tweet about what you can do to get a free code, even if you are not a Patreon.

A book review on Spark in Action, second edition with author Jean-Georges Perrin

In this first part of the interview, we meet the author and talk about Apache Spark and Open Source in general. We also cover the MEAP system used by Manning Publications to get books like these into the hands of readers as soon as possible while allowing early readers to help shape the book. Our thanks to Jean-Georges for spending quite a bit of time with us talking about Apache Spark and to Manning Publications for the free eBook codes! Find out more about Jean-Georges at his blog: https://jgp.net/

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-05-21 14:00:39

Episode 140 – Roaring News

Another week, another feed of roaring news articles, starting with apparent changes at MapR and the release of Red Hat Enterprise Linux 8. We go in depth on the open sourcing of the Databricks-developed Delta Lake and finish with some SQL-generated fractals. Big thanks to our Roaring Patreons making this podcast possible!

DataWorks Summit free ticket raffle.

Final week for our DataWorksSummit Washington DC free ticket giveaway! Get your free ticket now!

The Roaring Elephant on YouTube.

The Roaring Elephant YouTube channel has launched! Will you help us reach 100 subscribers (modest goals are a good start!) so we can claim our personalized URL on YouTube? Every time a new episode is published, you will find a video uploaded to the channel as well. There won't be any real video yet though, only a still image as you can see in the thumbnails. But as soon as we reach the related goal on our Patreon, this is where our video content will appear. In case you are wondering, when we start recording actual videos, the regular mp3s on the podcast feed will remain exactly as they are now. So if you prefer not to look at our mugs while enjoying the podcast, that should remain possible.

Interactive DWS-DC session dashboard https://aka.ms/DWS2019DC

As I've been doing for a while now, I've again launched a session statistics dashboard for this event. It can be found at https://aka.ms/DWS2019DC and, as usual, this Power BI dashboard is interactive. Simply click on the different elements to filter or drill down. There are only 58 sessions listed at the moment. I will be updating it from time to time, so keep an eye out for some tweets from @jhonmasschelein if you want to get notified!

R.I.P. MapR?

Our first bit of news is more of a rumor for now: we were pointed towards some messages on LinkedIn that seem to indicate some reorganising is happening there: https://www.linkedin.com/feed/update/urn:li:activity:6532418505361416192 https://www.linkedin.com/feed/update/urn:li:activity:6532352941800595456 We will be following how this develops in the next few weeks. Best of luck to anyone who is affected!

RHEL version 8 is out!

Red Hat Opens the Linux Experience to Every Enterprise, Every Cloud and Every Workload with Red Hat Enterprise Linux 8

It's been a while coming, but even though RHEL 7 is still around for a few years, Red Hat has released the next version of their popular Linux distro. Notwithstanding Dave's horror at the new logo, we're very excited about this and personally, I am eagerly awaiting the CentOS 8 release that should appear in a couple of months.

Delta Lake Open-Sourced.

Open Sourcing Delta Lake
Databricks claims its new product Delta is the missing link to enterprise AI

A press release from the good folks at Databricks informs the world that their proprietary data lake storage layer called "Delta Lake" has now been open sourced. Delta Lake was released by Databricks at the end of 2017 and was only available on their managed service offerings in the public clouds, but now anyone can download and deploy it. However, all is not well: we're having some serious issues with the content of the press release and, quite frankly, we're scratching our heads to find exactly what problem Delta Lake is trying to solve and if it actually does that...

Fractals, SQL-Style!

Generating Fractals with Postgres: Escape-Time Fractals

Just to make Dave happy, we finish this episode off with some great fractal visualizations made with SQL. Euch... What? Yes, SQL. That's right! Click the link to see how the apparently Turing Complete SQL is able to do that.

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-05-14 14:00:49

Episode 139 – Interview on DataOps with Chris Bergh of DataKitchen.io (Part 2)

DataKitchen.io's Chris Bergh takes us down the path towards successful DataOps implementation. If you have not heard of the DataOps concept yet and data is a big part of your environment (and really, it should be), we're sure you will find more than a couple of takeaways here!

Christopher Bergh (@ChrisBergh), CEO & Head Chef, DataKitchen
The DataOps Cookbook
DataOps is NOT Just DevOps for Data

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-05-07 14:00:32

Episode 138 – Roaring News

The biggest news is of course the launch of our Patreon! Hop over to https://www.patreon.com/roaringelephant and see if you want to help us thrive and grow! On the technical front, we have a blog on Machine Learning Model Management, Apache turning 20 and Google breeding aggressive A.I.! And we also have a side-conversation on NGINX...

Apache Software Foundation Continues to Grow Open Source Software: https://www.eweek.com/development/the-apache-software-foundation-continues-to-grow-open-source-software
Frameworks for Machine Learning Model Management: https://www.inovex.de/blog/machine-learning-model-management/
Google's AI Has Learned to Become "Highly Aggressive" in Stressful Situations: https://www.sciencealert.com/google-deep-mind-has-learned-to-become-highly-aggressive-in-stressful-situations

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-04-30 14:00:09

Episode 137 – Interview on DataOps with Chris Bergh of DataKitchen.io (Part 1)

DataKitchen.io's Chris Bergh takes us down the path towards successful DataOps implementation. If you have not heard of the DataOps concept yet and data is a big part of your environment (and really, it should be), we're sure you will find more than a couple of takeaways here!

Christopher Bergh (@ChrisBergh), CEO & Head Chef, DataKitchen
The DataOps Cookbook
DataOps is NOT Just DevOps for Data

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-04-23 14:00:57

Episode 136 – Temet Nosce

Breaking with tradition, this News Episode does not have any Big Data-related articles. Instead, this episode is all about our plans for the future of this podcast... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-04-16 14:00:14

Episode 135 – Big Data in Cybersecurity with Saad Ayad, featuring Apache Metron (Part 2)

Data leaks and the resulting attacks on our privacy have been a major news item in recent months. Big data tools like Apache Metron, built on top of Hadoop, can be instrumental in detecting and preventing intrusions. In this episode, we are joined by Saad Ayad, who was General Manager Security Operations at Telstra and is currently a Director at Digital Fortress Services in Melbourne, Australia. Saad has been active in the cybersecurity world for a long time and we are grateful he was willing to spend some time with us and share his knowledge and experience.

Saad Ayad (@saadayad_), Cyber Security, Big Data Analytics & Operations
http://www.digitalfortress.services @DigFortServ

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-04-09 14:00:06

Episode 134 – Roaring News: Dataworks Summit Lightning Interviews

A special edition of Big Data News featuring a number of quick interviews at the booths in the community expo hall. A big thank you to the brave people there who were willing to face the Roving Roaring Mike at the Barcelona DataWorks Summit a couple of weeks ago.

03:04 Attunity https://www.attunity.com/
07:41 Cloudera Fast Forward Labs https://www.cloudera.com/products/fast-forward-labs-research.html
11:09 DataVard https://www.datavard.com
17:19 Cazena https://www.cazena.com/
22:39 Syncsort https://www.syncsort.com
26:22 Accenture https://www.accenture.com
30:44 Unravel Data https://unraveldata.com

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-04-02 14:00:24

Episode 133 – Big Data in Cybersecurity with Saad Ayad, featuring Apache Metron (Part 1)

Data leaks and the resulting attacks on our privacy have been a major news item in recent months. Big data tools like Apache Metron, built on top of Hadoop, can be instrumental in detecting and preventing intrusions. In this episode, we are joined by Saad Ayad, who was General Manager Security Operations at Telstra and is currently a Director at Digital Fortress Services in Melbourne, Australia. Saad has been active in the cybersecurity world for a long time and we are grateful he was willing to spend some time with us and share his knowledge and experience.

Saad Ayad (@saadayad_), Cyber Security, Big Data Analytics & Operations
http://www.digitalfortress.services @DigFortServ

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-03-26 15:00:45

Episode 132 – Roaring DataWorks Summit Barcelona, ft. John Mertic

DataWorks Summit 2019 Barcelona has come and gone... Recording live from my hotel room, we give our view on the highs and lows of the event and talk about the things we learned. This episode also includes a short interview with John Mertic from the Linux Foundation, who talked to us about Data Governance and ODPi Egeria.

John Mertic, Director of Program Management for ODPi, R Consortium, and Open Mainframe Project
https://www.linkedin.com/in/jmertic/

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-03-22 04:02:09

Episode 131 – Dataworks Summit 2019 Barcelona Session Preview

With the DataWorks Summit in Barcelona coming up next week, we take a look at the agenda with the available sessions and take you through our best picks and honorable mentions.

Session statistics dashboards:
DataWorks Summit 2019 in Barcelona: https://aka.ms/DWS2019BA
DataWorks Summit 2018 Berlin: https://aka.ms/DWS2018

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-03-12 15:00:25

Episode 130 – Roaring News

In this episode of Bite Sized Big Data news, we cover the merging of Data Artisans and Alibaba forming the new Ververica entity, AI-related challenges and a BBC cookbook for visualizations in R. Dave had some issues recording his side; our apologies for the rather bad quality of Dave's audio track on this episode.

Data Artisans, which was recently purchased by Alibaba, has been renamed Ververica: https://www.ververica.com/blog/introducing-our-new-name https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions
The challenges to tackle before you start with AI: http://www.ronaldvanloon.com/the-challenges-to-tackle-before-you-start-with-ai/
Create data visualisations like BBC news with the BBC R Cook Book: https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-03-05 15:00:05

Episode 129 – DataWorks Summit Barcelona Track Chair Interviews

In this episode we have interviews with Niels Basjes and Aljoscha Krettek, respectively track chairs for Big Compute & Storage and Internet of Things. We talk with them about what being a track lead means, the sessions in their tracks and of course about what they are doing themselves with Big Data and Advanced Analytics.

Niels Basjes, Lead IT-Architect Scalable Solutions at Bol.com
Bol.com Techlabs: https://techlab.bol.com/ https://techlab.bol.com/author/nbasjes/
Bol.com on YouTube: https://www.youtube.com/results?search_query=bol.com+berlinbuzzwords
Bol.com is looking for you! https://careers.bol.com/

Aljoscha Krettek, Co-Founder, Software Engineer at Data Artisans
Data Artisans / Ververica Blogs: https://www.ververica.com/blog
Join a world-class team at Ververica: https://www.ververica.com/careers

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-02-26 15:00:43

Episode 128 – Roaring News

In this Deep Learning-heavy edition of Big Data News, we have articles about how to get into the Data Scientist life, how and where to get the skills and how you eventually may end up beating pro-gamers at their thing.

The DataWorks Summit Barcelona is coming up soon and we have a free entry ticket to raffle off to a lucky Big Data Winner! Send an email to DWS19BARCELONA at roaringelephant.org to enter the raffle!

What’s Driving Data Science Hiring in 2019: https://www.datanami.com/2019/01/30/whats-driving-data-science-hiring-in-2019/
Practical Deep Learning for Coders 2019: https://www.fast.ai/2019/01/24/course-v3/ https://course.fast.ai/
Deep Learning vs Classical Machine Learning: https://towardsdatascience.com/deep-learning-vs-classical-machine-learning-9a42c6d48aa
Top Machine Learning Algorithms for Predictions. A Short Overview: https://www.aisoma.de/top-machine-learning-algorithms-for-predictions-a-short-overview/
An AI crushed two human pros at Starcraft but it wasn’t a fair fight: https://arstechnica.com/gaming/2019/01/an-ai-crushed-two-human-pros-at-starcraft-but-it-wasnt-a-fair-fight/

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-02-19 15:00:49

Episode 127 – Sparkling Water with H2O.AI (part 2)

We recently sat down with Kuba and Pavel from H2O to discuss how you can easily lift your Spark notebooks to the next level by adding some H2O to them using their open source Sparkling Water project. In this second part of the interview, we go deeper into the technical details of Sparkling Water and how you can deploy and use it in your environment. We end the conversation with a look at the roadmap and anything else the future may bring.

Jakub "Kuba" Hava, Senior Software Engineer at H2O.ai
Pavel Pscheidl, Machine learning engineer at H2O.ai, Software engineer, Writer

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-02-12 15:00:53

Episode 126 – Roaring News

The second news episode for 2019 is almost entirely devoted to practical AI with some tutorial notebooks and finding a parking space. We end this show with dire warnings of the impending Big Data induced Apocalypse!

Practical AI Workshop: https://blog.revolutionanalytics.com/2019/01/notebooks-from-the-practical-ai-workshop.html
Snagging Parking Spaces with Mask R-CNN and Python: https://medium.com/@ageitgey/snagging-parking-spaces-with-mask-r-cnn-and-python-955f2231c400
Head of Russian Orthodox Church Warns Big Data Will Usher in the Antichrist: https://gizmodo.com/head-of-russian-orthodox-church-warns-big-data-will-ush-1831598967

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-02-05 15:00:51

Episode 125 – Sparkling Water with H2O.AI (Part 1)

We recently sat down with Kuba and Pavel from H2O to discuss how you can easily lift your Spark notebooks to the next level by adding some H2O to them using their open source Sparkling Water project. In this first part of the interview, we cover the conceptual principles behind Sparkling Water and discuss some existing use case implementations.

Jakub "Kuba" Hava, Senior Software Engineer at H2O.ai
Pavel Pscheidl, Machine learning engineer at H2O.ai, Software engineer, Writer

H2O World San Francisco

Find out more at the upcoming H2O World conference in San Francisco on February 4-5, 2019.

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-01-29 15:00:54

Episode 124 – Roaring News

The Hortonworks-Cloudera merger has been finalized and the new CDP (Cloudera Data Platform) has been announced. We also talk about data mining bias, the good and bad of Hackathons and end on a rant about data sizes.

Cloudera Unveils CDP, Talks Up ‘Enterprise Data Cloud’: https://www.datanami.com/2019/01/10/cloudera-unveils-cdp-talks-up-enterprise-data-cloud/?_lrsc=718d30ff-51ed-40c5-bba9-750a82009aaf
Cloudera and Hortonworks' merger closes; quo vadis Big Data? https://www.zdnet.com/article/cloudera-and-hortonworks-merger-closes-quo-vadis-big-data/
Welcome to a brand-new Cloudera: https://hortonworks.com/blog/welcome-brand-new-cloudera/
The Exaggerated Promise of So-Called Unbiased Data Mining: https://www.wired.com/story/the-exaggerated-promise-of-data-mining/
On Hackathons : Lessons Learned, Experience, Advice: https://www.knoyd.com/blog/2019/1/10/on-hackathons-lessons-learned-experience-advice
Big Insights Not Big Data: Why We Should Stop Talking About File Size: https://www.forbes.com/sites/kalevleetaru/2019/01/09/big-insights-not-big-data-why-we-should-stop-talking-about-file-size

Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-01-22 15:00:53

Episode 123 – Infrastructure and Data Lifecycle (part 2)

In episode 121 we discussed the first part of this story and now we conclude with a discussion of the data life-cycle considerations that apply to a Big Data and Advanced Analytics environment. The primary inspiration for this episode: The Big Data Lifecycle explained https://www.pinkelephantasia.com/big-data-lifecycle/ Additional Inspiration: 7 phases of a data life cycle https://www.bloomberg.com/professional/blog/7-phases-of-a-data-life-cycle/ Thinking Beyond Traditional Data Life Cycle Management https://hortonworks.com/article/thinking-beyond-traditional-data-life-cycle-management/ Understanding the Big Data Life-Cycle https://www.linkedin.com/pulse/four-keys-big-data-life-cycle-kurt-cagle/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-01-15 15:00:21

Episode 122 – Roaring news

In this first Big Data News episode of 2019, we cover how A.I. will nudge you to a happier (work)life and the new Hive Data Warehouse connector. We end the episode with unstable artificial intelligence and how you can have a chance at a one million Euro prize! Can an AI keep you happy at work? Ex-Google team reveal software that 'nudges' workers with messages throughout the day https://www.dailymail.co.uk/sciencetech/article-6545051/The-AI-happy-work-Ex-Google-team-reveal-software-nudges-workers.html https://humu.com/ Apache Hive Warehouse Connector Use-Cases https://hortonworks.com/blog/hive-warehouse-connector-use-cases/ https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html http://www.russellspitzer.com/2017/05/19/Spark-Sql-Thriftserver/ In January, the EU starts running Bug Bounties on Free and Open Source Software https://juliareda.eu/2018/12/eu-fossa-bug-bounties/ AI has a probability problem https://go.forrester.com/blogs/artificial-intelligence-has-a-probability-problem/ Apache Kafka 58.000,00 € 07/01/2019 15/08/2019 HackerOne https://www.zdnet.com/article/eu-to-fund-bug-bounty-programs-for-14-open-source-projects-starting-january-2019/ https://juliareda.eu/2016/07/eu-audits-keepass-apache/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-01-08 15:00:08

Episode 121 – Infrastructure and Data Lifecycle (part 1)

Does the standard Dev-Test-Prod cycle make sense in a Big Data environment or should you approach this subject a little differently? In this episode, we sum up our experiences and best practice tips regarding the infrastructure part; the Data Lifecycle will be featured in the next topic episode. Planning on attending the Melbourne @DataWorksSummit? Send email to DWS18APAC@roaringelephant.org for a free ticket to the Melbourne event in February! Big thanks to @DataWorksSummit & @hortonworks for sponsoring this giveaway! Dataworks Summit Barcelona is also rapidly approaching. You can find my dynamic sessions statistics dashboard here: https://aka.ms/DWS2019BA Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2019-01-01 15:00:21

Episode 120 – Roaring News

Merry Big Data News Christmas! Since it's the 25th of December, we're investigating how Big Data is changing the operations at the North Pole using a couple of blog posts from Splunk. Christmas 2020. Will big data and IOT change things for Father Christmas? Part I https://www.splunk.com/blog/2014/12/17/christmas-2020-part1.html Christmas 2020. Will big data and IOT change things for Father Christmas? Part II https://www.splunk.com/blog/2014/12/18/christmas-2020-part2.html Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-12-25 15:00:07

Episode 119 – Knowage: The Open Source Business Analytics Suite

This time we are joined by Paolo from Knowage who gives us a high level overview of Knowage: a totally open source suite for Business Analytics. The Knowage suite is composed of several modules, each one conceived for a specific analytical domain. They can be used individually or combined with one another to ensure full coverage of users' requirements, allowing you to build a tailored product. Thank you to our guest: Paolo Raineri Business Developer (linkedin) https://www.knowage-suite.com Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-12-18 15:00:44

Episode 118 – Roaring News

In this Big Data News episode, we use an article on how some disgruntled open source projects tried to force the "net giants" to give back as an excuse to talk about open source ethics. The second article for today comes from the hand of Noel Sharkey about possible deception in modern robotics. Time for Net Giants to Pay Fairly for the Open Source on Which They Depend https://www.linuxjournal.com/content/time-net-giants-pay-fairly-open-source-which-they-depend Mama Mia It's Sophia: A Show Robot Or Dangerous Platform To Mislead? https://www.forbes.com/sites/noelsharkey/2018/11/17/mama-mia-its-sophia-a-show-robot-or-dangerous-platform-to-mislead Artificial Intelligence: A Modern Approach (Third edition) by Stuart Russell and Peter Norvig http://aima.cs.berkeley.edu/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-12-11 15:00:09

Episode 117 – Big Data Disaster Recovery

When Big Data projects mature from R&D projects to business critical components, it becomes important to look at how your environment can survive and recover from catastrophic failures. Considering the significant cost of a good Disaster Recovery plan, it pays to take a close look at your deployment and carefully weigh the pros and cons on a granular level. Here is the link to the slideshare presentation by Carlos Izquierdo at Big Data Spain 2017: Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-12-04 15:00:17

Episode 116 – Roaring News

This Machine Learning heavy edition of Big Data News covers Boston School Bus schedules and Model interpretation using LIME. As a bonus, we have a great source of NiFi knowledge for you! What the Boston School Bus Schedule can Teach US About AI https://www.wired.com/story/joi-ito-ai-and-bus-routes/ Understanding model predictions with LIME https://towardsdatascience.com/understanding-model-predictions-with-lime-a582fdff3a3b Introduction to Local Interpretable Model-Agnostic Explanations (LIME) https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime Locally Interpretable Models and Effects based on Supervised Partitioning (LIME-SUP) https://arxiv.org/abs/1806.00663 Best of NiFi https://pierrevillard.com/best-of-nifi/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-11-27 16:00:13

Episode 115 – Anniversary three: I guess we’re in it for the long run now!

It's been three years since we started this podcast and, as we've done in previous years, we invited the wonderful people who were guests on our show in the past twelve months and made our little podcast so much better for our listeners! Our thanks to the guests who celebrated our three year anniversary with us: Ward Bekker (Linkedin) Pre-Sales Solutions Engineer II at Hortonworks Talking about Apache Metron Rohit Jain (linkedin) Chief Technology Officer at Esgyn Talking about Esgyn, Trafodion and cloud vs on-premise vs hybrid. Sanjeev Kulkarni (Linkedin) Co-Founder at Streamlio Talking about Apache Pulsar Phillip Radley (Linkedin) Chief Data Architect at BT Talking about future predictions made years ago Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-11-20 16:00:27

Episode 114 – Roaring News

In this serving of bite-sized Big Data News we talk about the IBM takeover of Red Hat, a new Botnet going for unprotected Hadoop nodes and a somewhat disappointing Cloudera blog post. IBM To Acquire Red Hat https://investors.redhat.com/news-and-events/press-releases/2018/10-28-2018-184027500 https://newsroom.ibm.com/2018-10-28-IBM-To-Acquire-Red-Hat-Completely-Changing-The-Cloud-Landscape-And-Becoming-Worlds-1-Hybrid-Cloud-Provider New DDoS botnet goes after Hadoop enterprise servers https://www.zdnet.com/article/new-ddos-botnet-goes-after-hadoop-enterprise-servers/ (remember Dr.Who ? https://medium.com/@neerajsabharwal/hadoop-yarn-hack-9a72cc1328b6 ) New in Cloudera Enterprise 6: Apache Hive 2.1 (By the Cloudera Hive Team) http://blog.cloudera.com/blog/2018/10/new-in-cloudera-enterprise-6-apache-hive-2-1/ https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_601_unsupported_features.html#hive_c6_unsupported_features https://hive.apache.org/downloads.html https://issues.apache.org/jira/browse/HIVE-17129 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-11-13 16:00:19

Episode 113 – H2OAIWorld London 2018 Roaring Report

Here is our H2O.ai World conference London Roaring Report. We had a blast and we hope that this episode can give you a good taste of what was going on. The sessions are now available online: https://www.youtube.com/playlist?list=PLNtMya54qvOHh9LaA08hkusynWVStNEhm Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-11-06 16:00:50

Episode 112 – Roaring News

In this last Big Data news episode for the month of November, we look forward to the H2O World event next week in London and we have articles on BI Maturity and the upcoming Apache Ozone project that will supplant HDFS in future Hadoop clusters soon(TM). BI Maturity: You can’t get there from here! http://makingdatameaningful.com/bi-maturity/ Introducing Apache Hadoop Ozone: An Object Store for Apache Hadoop https://hortonworks.com/blog/introducing-apache-hadoop-ozone-object-store-apache-hadoop/ Katacoda example down on this page https://hadoop.apache.org/ozone Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-10-30 16:00:38

Episode 111 – How Public Cloud changed Big Data

No interview this time, just Dave and Jhon talking about how public cloud changed Big Data. Current news has brought this topic back to the foreground and we thought it was a good idea to give our views on this subject. Along the way, we go over the different deployment strategies for Hadoop across on premise, private and public cloud and of course, hybrid environments. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-10-23 16:00:16

Episode 110 – Roaring News

Another week, another Big Data News episode. After going over all the event ticket giveaways that are currently going on, we have an article that goes over the basics on ETL vs ELT and have some fun with R graphs by the XKCD web comic. We finish with an in depth article on columnar data stores and a quick shout-out to Apache Nifi. Breaking News Our thanks to our guest from H2O.ai: John Spooner Director of Solution Engineering, h2o.ai Dave: XKCD Curve Fitting in R http://blog.revolutionanalytics.com/2018/09/curve-fitting.html Artificial intelligence, data will be the differentiator in the marketplace https://www.information-age.com/artificial-intelligence-data-123475102/ Jhon: Scaling ETL: How data pipelines evolve as your business grows https://bytes.grubhub.com/scaling-etl-how-data-pipelines-evolve-as-your-business-grows-72ff6c744e6e The design and implementation of modern column-oriented database systems https://blog.acolyer.org/2018/09/26/the-design-and-implementation-of-modern-column-oriented-database-systems/ Apache NiFi In Depth https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html?es_p=7695258 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us

2018-10-16 16:00:28

Episode 109 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 2

In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and were delighted when she agreed to do an interview with us. In this second part, we discuss the ins and outs of good data stewardship and how companies can adopt, implement and contribute. Mandy Chessell Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineering https://www.linkedin.com/in/mandy-chessell-a4989722/ ODPi Blog post on Egeria: First Release of ODPi Egeria is Here ODPi github projects: Egeria - Open Metadata and Governance https://github.com/odpi/egeria Data-governance companion project https://github.com/odpi/data-governance Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-10-09 16:00:14

Episode 108 – Roaring News

Another episode of Big Data News and not just another episode, but an episode packed and packed with items. Before we do our regular article reviews, we are doing raffles for not one, not two but three different events! And as if that was not enough, our friends from Pulsar dropped in with their big Apache top-level project announcement. So not very bite sized this time, but smack full of delicious Big Data news! Breaking News Our thanks to our guests: Solix Empower Sai Gundavelli Founder/CEO, Solix Technologies Streamlio Sanjeev Kulkarni Co-Founder at Streamlio Sijie Guo Co-Founder at Streamlio Free Big Data Event ticket giveaways: DataWorks Summit Asia Pacific Singapore Oct 11, 2018 - Tokyo Oct 16, 2018 - Melbourne Feb 06, 2018 To enter the raffle, send email to dws18apac@roaringelephant.org Tell us what event you want to attend! (Singapore, Tokyo, Melbourne) Solix Empower New York 2018 New York November 01, 2018 To enter the raffle, send email to SolixEmpower18@roaringelephant.org H2O AI World London London October 29-30, 2018 To enter the raffle, send email to h2oLondon18@roaringelephant.org Please note that we are giving away discount codes that will give you access to the events for free. You still need to arrange your own travel and lodging! News articles: The Apache Software Foundation Announces Apache® Pulsar™ as a Top-Level Project https://blogs.apache.org/foundation/entry/the-apache-software-foundation-announces39 https://github.com/apache/pulsar Who wrote that anonymous NYT op-ed? Text similarity analyses with R http://blog.revolutionanalytics.com/2018/09/anonymous-nyt-op-ed.html Beyond Interactive: Notebook Innovation at Netflix https://medium.com/netflix-techblog/notebook-innovation-591ee3221233 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-10-02 16:00:44

Episode 107 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 1

In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and were delighted when she agreed to do an interview with us. In this first part, the focus is more on Mandy herself and we lay the groundwork for the second part that will go live in episode 109. Mandy Chessell Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineering https://www.linkedin.com/in/mandy-chessell-a4989722/ ODPi Blog post on Egeria: First Release of ODPi Egeria is Here ODPi github projects: Egeria - Open Metadata and Governance https://github.com/odpi/egeria Data-governance companion project https://github.com/odpi/data-governance Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-09-25 16:00:30

Episode 106 – Roaring News

In this edition of Big Data News, we take the pulse of Machine learning adoption and talk about Big Data Online Learning by IBM on Coursera and by Columbia University on Edx. We round the episode off with a look at MR3 and the evil that are benchmarks. Breaking News Data Science Professional Certificate https://cognitiveclass.ai/blog/data-science-professional-certificate/ Taking the pulse of machine learning adoption https://www.zdnet.com/article/taking-the-pulse-of-machine-learning-adoption/ Performance Comparison of HDP LLAP, Presto, SparkSQL, Hive on Tez, and Hive on MR3 using the TPC-DS Benchmark https://mr3.postech.ac.kr/blog/2018/08/15/comparison-llap-presto-spark-mr3/ Join Jhon on Artificial Intelligence (AI) & Robotics by ColumbiaX on Edx https://www.edx.org/micromasters/columbiax-artificial-intelligence https://www.edx.org/course/robotics-columbiax-csmm-103x-4 https://www.edx.org/course/artificial-intelligence-ai-columbiax-csmm-101x-4 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-09-18 16:00:17

Episode 105 – Big Data at British Telecom with Phillip Radley

In this episode we welcome Phil Radley, Chief Data Architect at BT to talk about the Big Data deployment at BT. Phillip Radley (Linkedin) Chief Data Architect @ BT https://home.bt.com/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-09-11 16:00:11

Episode 104 – Roaring News

In this Big Data News episode, we discuss an article with guidelines on how you should arrange your data gathering projects with the customer in mind. Dave brings a matrix of visualization products. Breaking News The five Cs: Five framing guidelines to help you think about building data products. https://www.oreilly.com/ideas/the-five-cs?utm_medium=social&utm_source=twitter.com&utm_campaign=awareness&utm_content=radar+content The Chartmaker Directory http://chartmaker.visualisingdata.com/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-09-04 16:00:41

Episode 103 – Apache Pulsar version 2.0 with Matteo and Sijie from Streamlio

Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about so we cut the interview in two parts, the first of which was published in episode 101. Here is the second part with information on version 2.0 and the future of the Apache Pulsar project. The first subject taken on by Sijie is Pulsar Functions, followed by Matteo talking about the new schema registry and Topic Compaction. With a new major version being released, users will probably want to upgrade so we asked the guys about the upgrade path. For the rest of the episode, Matteo and Sijie share what they can regarding the future Pulsar Roadmap. Matteo Merli (https://www.linkedin.com/in/matteomerli/) Co-Founder - Software Engineer Sijie Guo (https://www.linkedin.com/in/samuelguo/) Co-Founder Apache Pulsar (incubating) https://pulsar.apache.org/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-08-28 16:00:12

Episode 102 – Roaring News

Big Data News at the end of the summer is not easy to find, but we did end up with three topics to discuss: from isolating GPUs in Hadoop 3.x to replicating big data (to the cloud) and quick tips from Adam's blog. Breaking News First Class GPUs support in Apache Hadoop 3.1, YARN & HDP 3.0 https://hortonworks.com/blog/gpus-support-in-apache-hadoop-3-1-yarn-hdp-3/ Replicating big datasets in the cloud https://medium.com/hotels-com-technology/replicating-big-datasets-in-the-cloud-c0db388f6ba2 https://dataworkssummit.com/berlin-2018/session/tools-and-approaches-for-migrating-big-datasets-to-the-cloud/ https://www.slideshare.net/Hadoop_Summit/tools-and-approaches-for-migrating-big-datasets-to-the-cloud Quick Tip: The easiest way to grab data out of a web page in Python https://medium.com/@ageitgey/quick-tip-the-easiest-way-to-grab-data-out-of-a-web-page-in-python-7153cecfca58 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-08-21 16:00:52

Episode 101 – Apache Pulsar update with Matteo and Sijie from Streamlio

Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about so we cut the interview in two parts and here is the first part where they introduce Apache Pulsar, go in depth on the correct deployment scaling of a stable Pulsar cluster and clarify Pulsar's "at least once vs exactly once" strategy. Part two will go in more depth on what's new. Stay tuned! Matteo Merli (https://www.linkedin.com/in/matteomerli/) Co-Founder - Software Engineer Sijie Guo (https://www.linkedin.com/in/samuelguo/) Co-Founder Apache Pulsar (incubating) https://pulsar.apache.org/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-08-14 16:00:36

Episode 100 – Celebrating our Centennial with the history of Hadoop

100 Big Data episodes! We made it, in no small part thanks to our audience: you are the ones who keep us going! In this episode we celebrate our centennial by going over the history of Hadoop releases, highlighting the most noteworthy events along the way. Join us down the twisty paths of our memory lanes! The blockchain related Linkedin post Jhon liked The sources for this episode: http://hadoop.apache.org/releases.html https://en.wikipedia.org/wiki/Apache_Hadoop Debate over which company had contributed more to Hadoop: http://hortonworks.com/blog/reality-check-contributions-to-apache-hadoop/ Thank you for being part of the ride and now on to episode 200! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-08-07 16:00:07

Episode 99 – The State of Big Data at Codemotion Amsterdam

The Roaring Elephant podcast was a guest at the Codemotion conference in Amsterdam a little while ago. This episode contains the audio of the talk we did on the State of Big Data. Our talk was definitely light on slideware, but if you want to see the video cast of our presentation, you can find it on the Codemotion youtube channel: Codemotion Amsterdam 2018: The State of Big Data by Roaring Elephant podcast Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-07-31 16:00:47

Episode 98 – Roaring news

In this episode of Big Data Roaring News, Dave laments another announcement of Hadoop's demise and exposes A.I. imposters. Jhon has articles comparing Ranger with Sentry and Apache Nifi reaching the ripe age of 1.7 with a Minifi charged practical demo to prove the point. Breaking News Hadoop’s star dims in the era of cloud object data storage and stream computing https://siliconangle.com/blog/2018/07/09/hadoops-star-dims-era-cloud-object-data-storage-stream-computing/ The rise of “pseudo-ai” how tech firms quietly use humans to do bots work https://www.theguardian.com/technology/2018/jul/06/artificial-intelligence-ai-humans-bots-tech-companies Apache Ranger Vs Sentry https://www.linkedin.com/pulse/apache-ranger-vs-sentry-mythily-rajavelu/ How to build an IIoT system using Apache NiFi, MiNiFi, C2 Server, MQTT and Raspberry Pi https://medium.freecodecamp.org/building-an-iiot-system-using-apache-nifi-mqtt-and-raspberry-pi-ce1d6ed565bc Apache Nifi Version 1.7.0 released: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-07-24 16:00:42

Episode 97 – ODPi: A new world for data governance

In this episode, we welcome back John Mertic one more time. It was quite obvious that John had lots more to talk about at the end of our last interview with him. ODPi has recently reinvented itself, moving away from a strict distribution standards body towards data governance and reference specifications. John Mertic Director of Program Management for ODPi, R Consortium, and Open Mainframe Project https://www.linkedin.com/in/jmertic/ ODPi website links: https://www.odpi.org/ https://www.odpi.org/blog/2018/04/04/the-state-of-open-source-and-big-data-three-years-later https://www.odpi.org/projects/data-governance-pmc https://www.odpi.org/events Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-07-17 16:00:20

Episode 96 – Roaring news

In this edition of Roaring news, Ward Bekker returns to discuss what is happening in the world of Big Data. Ward brings news on GPUs in supercomputers and how Big Data could be wrong about you. Dave and Jhon found articles on Big data growth visualizations and GDPR. Breaking News 10 Charts that will change your perspective of Big Data’s Growth https://www.forbes.com/sites/louiscolumbus/2018/05/23/10-charts-that-will-change-your-perspective-of-big-datas-growth/#1ea595702926 New GPU-Accelerated Supercomputers Change the Balance of Power on the TOP500 https://www.top500.org/news/new-gpu-accelerated-supercomputers-change-the-balance-of-power-on-the-top500/ GDPR: A Call to Remove Technical Debt from Data Science https://medium.com/@kjarmul/gdpr-a-call-to-remove-technical-debt-from-data-science-c103a01c3102 Everything big data claims to know about you could be wrong http://news.berkeley.edu/2018/06/18/big-data-flaws/ Our thanks to Ward for adding some variety to this News episode. Ward Bekker (Linkedin) Pre-Sales Solutions Engineer II @ Hortonworks Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-07-10 16:00:22

Episode 95 – DataWorks Summit in San Jose with Ward Bekker

Since neither Dave nor Jhon was able to attend the Dataworks Summit in San Jose a couple of weeks ago, we have a guest, Ward Bekker, who was happy to join and educate us on the subject. DataWorks Summit San Jose 2018 In this episode we discuss the daily keynotes and Ward's selection of sessions at the Summit, ranging from the new things in Yarn 3.0 to Materialized views in Hive and much more. Ward Bekker (Linkedin) Pre-Sales Solutions Engineer II @ Hortonworks Some of the sessions and topics discussed are: Apache Hadoop State of the union https://dataworkssummit.com/san-jose-2018/session/apache-hadoop-yarn-state-of-the-union-2/ What is new in Apache Hive https://dataworkssummit.com/san-jose-2018/session/what-is-new-in-apache-hive/ Running distributed tensorflow in production https://dataworkssummit.com/san-jose-2018/session/running-distributed-tensorflow-in-production-challenges-and-solutions-on-yarn-3-0-2/ Just the sketch: advanced streaming analytics in Apache Metron https://dataworkssummit.com/san-jose-2018/session/just-the-sketch-advanced-streaming-analytics-in-apache-metron/ Containers and Big Data https://dataworkssummit.com/san-jose-2018/session/containers-and-big-data/ Catch a hacker in realtime: Live visuals of bots and bad guys https://dataworkssummit.com/san-jose-2018/session/catch-a-hacker-in-realtime-live-visuals-of-bots-and-bad-guys/ HDFS tiered storage https://dataworkssummit.com/san-jose-2018/session/hdfs-tiered-storage/ Geospatial data platform at Uber https://dataworkssummit.com/san-jose-2018/session/geospatial-data-platform-at-uber/ What's the Hadoop-la about Kubernetes? https://dataworkssummit.com/san-jose-2018/session/whats-the-hadoop-la-about-kubernetes/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-07-03 16:00:50

Episode 94 – Roaring news

In this week's edition of Roaring Big Data News, Dave talks about modernizing Hadoop and a billion Java errors. Jhon has an article on improving your learning data sets. We finish with a discussion about the newly released HDP 2.6.5 with an emphasis on the deprecation notices and Yarn Containers. Breaking News Dave Modernizing Hadoop: Reaching the plateau of productivity https://www.zdnet.com/article/modernizing-hadoop-reaching-the-plateau-of-productivity/ 1 billion Java errors, here’s what causes 97% of them https://blog.takipi.com/we-crunched-1-billion-java-logged-errors-heres-what-causes-97-of-them/ https://blog.takipi.com/the-top-10-exceptions-types-in-production-java-applications-based-on-1b-events/ Jhon Why you need to improve your training data, and how to do it https://petewarden.com/2018/05/28/why-you-need-to-improve-your-training-data-and-how-to-do-it/amp/ Announcing the General Availability of Hortonworks Data Platform (HDP) 2.6.5, Apache Ambari 2.6.2 and SmartSense 1.4.5 https://hortonworks.com/blog/announcing-general-availability-hortonworks-data-platform-hdp-2-6-5-apache-ambari-2-6-2-smartsense-1-4-5/ Component Versions https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_release-notes/content/comp_versions.html Deprecation Notices https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_release-notes/content/deprecated_items.html YARN Containers Trying out Containerized Applications on Apache Hadoop YARN 3.1 https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/ Containerized Apache Spark on YARN in Apache Hadoop 3.1 https://hortonworks.com/blog/containerized-apache-spark-yarn-apache-hadoop-3-1/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-06-26 16:00:46

Episode 93 – Apache Kylin: Extreme OLAP Engine for Big Data

In this episode Apache PMC member Dong Li joins us to explain how Apache Kylin can deploy Analytical OLAP cubes in your Big Data environment. http://kylin.apache.org/ Dong Li Technical Partner & Senior Architect of Kyligence (linkedin) PMC Member of Apache Kylin http://en.kyligence.io/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-06-19 16:00:22

Episode 92 – Roaring news

Another week, another edition of Roaring Big Data News. This time, Dave talks about driving teens and Jhon takes a detailed look at an Eventbrite data pipeline article. Breaking News Dave Driver monitoring isn't just for teens; adults can benefit, too https://arstechnica.com/cars/2018/05/buicks-smart-driver-explains-why-my-gas-mileage-sucks-and-my-editors-doesnt/ Jhon Looking under the hood of the Eventbrite data pipeline! https://www.eventbrite.com/engineering/looking-under-the-hood-of-the-eventbrite-data-pipeline/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-06-12 16:00:16

Episode 91 – ODPi is back and better than ever!

In this episode, we welcome back John Mertic, director of Program Management for ODPi, R Consortium, and the Open Mainframe Project. It's been almost two years since we checked in with John and the ODPi initiative and as John mentions in the interview, a lot has changed in Hadoop... ODPi logo John Mertic Director of Program Management for ODPi, R Consortium, and Open Mainframe Project https://www.linkedin.com/in/jmertic/ ODPi website links: https://www.odpi.org/ https://www.odpi.org/blog/2018/04/04/the-state-of-open-source-and-big-data-three-years-later https://www.odpi.org/projects/data-governance-pmc https://www.odpi.org/events Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-06-05 16:00:26

Episode 90 – Roaring news

In this week's Roaring News episode, Dave brings up the resilience of Apache Community open source projects and plays some Doom. Jhon has some practical Apache NiFi guides and covers the emergence of multi-model NoSQL databases. Breaking News DataWorks Summit Berlin video recordings are up: https://www.youtube.com/user/HadoopSummit/playlists Find Dave on his Australian road-trip: http://bit.ly/aus-nz-ibm-hwx-tour Dave DataTorrent, Stream Processing Startup, Folds (Apache Apex) https://www.datanami.com/2018/05/08/datatorrent-stream-processing-startup-folds/ DOOM! https://arxiv.org/abs/1804.09154 https://www.technologyreview.com/s/611072/ai-generates-new-doom-levels-for-humans-to-play/ https://www.youtube.com/watch?v=K32FZ-tjQP4 Bonus doom news: https://www.rockpapershotgun.com/2018/03/28/dodge-fireballs-forever-in-a-neural-nets-doom-nightmare/ https://worldmodels.github.io/ Jhon Accessing Feeds from EtherDelta on Trades, Funds, Buys and Sells (Apache NiFi) https://community.hortonworks.com/articles/191146/accessing-feeds-from-etherdelta-on-trades-funds-bu.html?es_p=6741162 NiFi Processing and Flow with Couchbase Server https://blog.couchbase.com/nifi-processing-flow-couchbase-server/ The new era of the Multi-Model Database https://www.zdnet.com/article/the-new-era-of-the-multi-model-database/ Seven Databases in Seven Weeks, Second Edition - A Guide to Modern Databases and the NoSQL Movement https://pragprog.com/book/pwrdata/seven-databases-in-seven-weeks-second-edition Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-05-29 16:00:39

Episode 89 – DataWorks Summit San Jose Agenda Review

With the San Jose edition of the DataWorks Summit only a month away, we go over the sessions that are available in the agenda today and offer our top picks. If you're going, or if you will be watching the replays online, we hope to guide you on your selection of sessions. DataWorks Summit San Jose 2018 And here is the dashboard we created with statistics on the San Jose sessions, for your enjoyment: https://aka.ms/DWS2018SJ The agenda is still in flux so we will be updating the dashboard regularly. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-05-22 16:00:58

Episode 88 – Roaring News

Returning to our more regular schedule, we have a Roaring News episode today. Dave has articles on multi-cloud readiness, Big Data being a pariah, and Google Duplex and Jhon came up with Synthetic data, data engineers and scientists and a Neural Network sharing cake recipes. Breaking News Dave Less than 10% ready for multi cloud http://www.cloudpro.co.uk/cloud-essentials/hybrid-cloud/7451/idc-less-than-10-of-organisations-are-ready-for-multi-cloud Tech companies distancing themselves from Big Data https://qz.com/1262102/tech-companies-are-distancing-themselves-from-big-data/ Google Duplex https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html Jhon The Rise of Synthetic Data to Help Developers Create and Train AI Algorithms Quickly and Affordably https://insidebigdata.com/2018/05/08/rise-synthetic-data-help-developers-create-train-ai-algorithms-quickly-affordably/ Data engineers vs. data scientists https://www.oreilly.com/ideas/data-engineers-vs-data-scientists?utm_medium=social&utm_source=twitter.com&utm_campaign=awareness&utm_content=radar+content+datascience We asked a neural network to bake us a cake. The results were...interesting. https://www.popsci.com/neural-network-bakes-a-cake Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-05-15 16:00:54

Episode 87 – Druid: a high-performance, column-oriented, distributed data store – part 2

This is the second part of an interview with Fangjin Yang, co-founder and CEO at Imply and committer/PMC member for the Druid project. Druid is a high-performance, column-oriented, distributed data store which has entered the Hadoop environment with the recent integration with Apache and, since Druid has been around for a while, we are grateful to FJ for spending some time with our listeners. Fangjin Yang Cofounder and CEO at Imply (linkedin) Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-05-08 16:00:00

Episode 86 – Druid: a high-performance, column-oriented, distributed data store – part 1

This is the first part of an interview with Fangjin Yang, co-founder and CEO at Imply and committer/PMC member for the Druid project. Druid is a high-performance, column-oriented, distributed data store which has entered the Hadoop environment with the recent integration with Apache and, since Druid has been around for a while, we are grateful to FJ for spending some time with our listeners. Fangjin Yang Cofounder and CEO at Imply (linkedin) Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-05-01 16:00:03

Episode 85 – DataWorks Summit Community Showcase Exhibitor Soundbites

This is the final part of our coverage of the DataWorks Summit Berlin 2018. Normally we would not have had an episode this week, since we were in Berlin last week, but we had lightning interviews with the vendors in the Community Expo Area and used that coverage to make this episode. So less of "Dave & Jhon" and more "ecosystem tech" snippets this time. Even though this does stray a bit from our usual content, we still hope it is useful. This was recorded in a hotel room and on the expo floor, so the audio quality is not up to our usual standards; we hope you’ll forgive us! Here is a timestamped list of the lightning interviews: 02:41 Hortonworks https://hortonworks.com/ 06:28 Alation https://alation.com/ 08:45 Arcadia Data https://www.arcadiadata.com/ 11:12 Attunity https://www.attunity.com/ 13:10 BlueMetrix https://www.bluemetrix.com/ 15:27 BMW https://www.bmw.com 18:04 IBM https://www.ibm.com 19:54 Microsoft https://www.microsoft.com 22:15 Nutanix https://www.nutanix.com/ 23:26 Syncsort https://www.syncsort.com 24:54 Synerscope http://www.synerscope.com/ 27:05 Talend https://www.talend.com 27:59 Teradata https://www.teradata.com/ 29:02 -Interview End- Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-04-24 16:00:37

Episode 84 – DataWorks Summit Berlin – Day 2 Recap

And with the end of day two of the 2018 DataWorks Summit in Berlin comes the end of this year's Europe Summit. But never fear, we have an extra 90 minutes of DataWorks goodness for you to consume on your way home. No real editing on this one, and we recorded in a hotel room, so audio quality may not be up to our usual standards; we hope you'll forgive us! Enjoy! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-04-20 03:38:30

Episode 83 – DataWorks Summit Berlin – Day 1 Recap

Another year, another European Dataworks Summit, and yes, another daily recap show from Jhon and Dave. We walk through the keynotes and sessions we attended and give our thoughts and views. This should be useful for anyone who wasn't able to attend or those seeking to peek into sessions they couldn't make. No real editing on this one, recording in a hotel room so audio quality may not be up to our usual standards, we hope you'll forgive us! Enjoy! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-04-19 02:51:23

Episode 82 – DataWorks Summit Berlin 2018 Preview

Next week is DataWorks Summit Berlin week! Your two hosts will be in attendance and in this episode we go over the agenda and plan which sessions we want to attend and why. Peppered throughout we add further insights and experiences from previous years. Unfortunately, Dave's network was a little unstable and there are a couple of audio glitches in this episode. For some session statistics or if you can use some help deciding what sessions you want to attend, you can use the dashboard we created: Click the screenshot above or go to http://aka.ms/DWS2018 to access the dashboard. It is a dynamic report: clicking on graph elements (bars or pie slices) will apply filters on all the visualizations and the session list. Use control-click to combine filters. At some point the dashboard will disappear because it is no longer relevant. For future reference, here is a large version of the screenshot. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-04-10 16:00:00

Episode 81 – Roaring News

In this installment of Big Data News, we talk about the recent Facebook leak, how everybody is still doing it wrong (according to some at least) and installing Hadoop "the old-fashioned way". Also briefly covered is Elastic's X-Pack, now even more "open" than before, but still rather closed it would seem. Breaking News Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-04-03 16:00:27

Episode 80 – Big Data Tracking

Last June, Wolfie Christl published a 93-page report, Corporate Surveillance in Everyday Life, on big data tracking. Apart from the massive pdf that can be downloaded on the net, an extensive summary can be found on the Cracked Labs website. In this episode we go over the content and give our views on the subject. If you want to follow along with us while we are discussing the different points in the online article, here is the link: http://crackedlabs.org/en/corporate-surveillance Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-03-27 16:00:21

Episode 79 – Roaring News

Another Big Data news episode! This time we consider the big or small nodes conundrum based on an article that after close scrutiny doesn't really seem to test the real issue. Other things that get covered are LinkedIn's Dynamometer, Cloudera's full production architecture advice for a recommendation service and a really interesting visualization technique based on blobs. Breaking News Big Data, Small Nodes https://insidebigdata.com/2018/02/22/make-sense-big-data-small-nodes/ Dynamometer Release https://github.com/linkedin/dynamometer https://venturebeat.com/2018/02/08/linkedin-open-sources-dynamometer-for-hadoop-performance-testing-at-scale/ Cisco IoT predictions Aka someone somewhere trots out the old “data is the new oil” trope for one more circuit, please please please stop? https://www.networkworld.com/article/3257769/internet-of-things/7-transportation-iot-predictions-from-cisco.html Production Recommendation Systems with Cloudera http://blog.cloudera.com/blog/2018/02/production-recommendation-systems-with-cloudera/ A Day in the Life of Americans http://flowingdata.com/2015/12/15/a-day-in-the-life-of-americans/ Intercontinental Ballistic Microfinance (2006) https://vimeo.com/28413747 Understanding AI, Machine Learning & Predictive Analytics https://www.forcecast.com/blog/understanding-ai-machine-learning-predictive-analytics/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-03-20 16:00:04

Episode 78 – Apache Trafodion transactional SQL for Hadoop (Part 2)

This episode, a group of people from Esgyn join us to talk about the Apache Trafodion transactional SQL for Hadoop database engine. In this second part Rohit, Ken and Rao talk about the internal workings and best practices of Apache Trafodion. Rohit Jain Chief Technology Officer (linkedin) https://esgyn.com Ken Holt Chief Operating Officer and Co-Founder (linkedin) https://esgyn.com Rao Kakarlamudi VP of Pre-sales & Principal Architect (linkedin) https://esgyn.com In Search of Database Nirvana (oreilly) By Rohit Jain Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-03-13 16:00:47

Episode 77 – Roaring News

Another Roaring News episode where we cover recent Big Data News items we found interesting. This time we talk about Open Source turning 20 years old, the annoyances that come with Smart Homes and a big data divide in Germany. Additionally, we talk about some introductory guides to AI. Breaking News 20 years of open source + who contributes http://www.zdnet.com/article/open-source-turns-20/ https://www.infoworld.com/article/3253948/open-source-tools/who-really-contributes-to-open-source.html Smart home living is annoying as hell https://gizmodo.com/the-house-that-spied-on-me-1822429852 Big Data Divide https://www.politico.eu/article/to-protect-or-collect-germanys-big-data-divide/ The Art of Learning Data Science https://medium.com/@aparnack/the-art-of-learning-data-science-65b9f703f932 The Long Road To Become a Big Data Scientist - Infographic https://medium.com/@aparnack/sequel-to-the-art-of-learning-data-science-cb2e1f078e5a An executive’s guide to AI https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/an-executives-guide-to-ai?cid=other-soc-twi-mip-mck-oth-1802&kui=udT5IIoYx3yxUmZYJz7_2A Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-03-06 16:00:06

Episode 76 – Apache Trafodion transactional SQL for Hadoop (Part 1)

This episode, a group of people from Esgyn join us to talk about the Apache Trafodion transactional SQL for Hadoop database engine. In this first part Rohit, Ken and Rao talk about the history and goals behind the Apache Trafodion. Rohit Jain Chief Technology Officer (linkedin) https://esgyn.com Ken Holt Chief Operating Officer and Co-Founder (linkedin) https://esgyn.com Rao Kakarlamudi VP of Pre-sales & Principal Architect (linkedin) https://esgyn.com In Search of Database Nirvana (oreilly) By Rohit Jain Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-02-27 16:00:27

Episode 75 – Roaring News

In this Big Data News episode, we discuss the five-year anniversary of Hadoop Weekly, now Data Engineering Weekly, the Strava "data leak" and Twitter Wars, may the data be with you! Breaking News Five Years of Hadoop Weekly (Joe Crobak @joecrobak @Medium) https://medium.com/@joecrobak/five-years-of-hadoop-weekly-7aa8994f140b https://dataengweekly.com/ https://www.hadoopweekly.com/ How Strava's "anonymized" fitness tracking data spilled government secrets ([Nathan Ruser @Nrg8000] @zackwhittaker @ZDNet) http://www.zdnet.com/article/strava-anonymized-fitness-tracking-data-government-opsec/ http://www.abc.net.au/news/science/2018-01-29/strava-heat-map-shows-military-bases-and-supply-routes/9369490 Tweet Wars - The last data point (@basecamp_ai) http://www.knoyd.com/blog/the-last-data-point Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-02-20 16:00:26

Episode 74 – Hadoop sizing part 3: Compute sizing

As promised, in this final part of our Hadoop Sizing series, we round off the subject with sizing your compute and network resources. Undoubtedly we'll be revisiting this subject in the future, but the three parts of this series should give ample information on the subject for now. Hadoop Node Sizing Hadoop Data Node Density Tradeoff on HCC: https://community.hortonworks.com/content/kbentry/48878/hadoop-data-node-density-tradeoff.html Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-02-13 16:00:39

Episode 73 – Roaring News

In this edition of the Roaring News series, we talk about delivering business value and how to build an analytics team. For the machine learning aficionados, we cover the top ML algorithms and we round off with an article on sizing an Apache Flink cluster, which fits nicely with the previous and next episode! Breaking News Delivering Business Value with Big Data Projects https://www.techrepublic.com/article/4-tips-for-delivering-more-business-value-with-short-term-big-data-projects/ Sizing Flink (and other streaming?) https://data-artisans.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines Building The Analytics Team At Wish Part 1 — Rebuilding The Foundation Part 2 — Scaling Data Engineering Part 3 — Scaling Data Analysis Part 4 — Recruiting A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to c

2018-02-06 16:00:45

Episode 72 – Hadoop sizing part 2: Storage sizing

In this continuation of the Hadoop Sizing series we started last September, we move on from sizing your cluster to sizing the individual server chassis or virtual machines in your cluster. We did not finish the entire story just yet, concentrating mainly on the storage component. The final part 3, where we round off the subject with sizing your compute and network resources, is planned to be published in the next topic episode. Hadoop Node Sizing Hadoop Data Node Density Tradeoff post on HCC: https://community.hortonworks.com/content/kbentry/48878/hadoop-data-node-density-tradeoff.html Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-01-30 16:00:02

Episode 71 – Roaring News

This time Dave has prepared some articles for us to discuss. First we talk about something new on our radar: Apache Trafodion which is a transactional SQL on Hadoop. Next we spend some time on Artificial ignorance and we round off with some IoT predictions by IBM Breaking News Apache Trafodion - http://trafodion.apache.org/ goes TLP after 2.5 years… http://incubator.apache.org/projects/trafodion.html https://www.slideshare.net/mKrishnaKumar1/trafodion-an-enterprise-class-sql-based-on-hadoop Artificial ignorance: The 10 biggest AI failures of 2017 https://www.techrepublic.com/article/the-10-biggest-ai-failures-of-2017/ The Internet Of Things (IOT) Will Be Massive In 2018: Here Are The 4 Predictions From IBM https://www.forbes.com/sites/bernardmarr/2018/01/04/the-internet-of-things-iot-will-be-massive-in-2018-here-are-the-4-predictions-from-ibm Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-01-23 16:00:41

Episode 70 – 10 Facts about Hadoop, five years later

In this trip down memory lane, we go over an article from five years ago and discuss how Hadoop and Big Data have changed since then, or have they...? Time Machine Data tunnel Hadoop is 10 years old. Let's look back at public opinion just five years ago. (https://www.developer.com/db/10-facts-about-hadoop.html) Import/Export Data to and from HDFS Data Compression in HDFS Transformation in Hadoop Achieve Common Task Combining Large Volume Data Ways to Analyze High Volume Data Debugging in Hadoop World Easy to Control Hadoop System Scalable Persistence Data Read and Write in Hadoop Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-01-16 16:00:04

Episode 69 – Roaring News

The first news episode of 2018 has landed. We discuss the new Big Data architecture at CERN, a curious case of a broken benchmark and the future plans of the Apache Hadoop project. Breaking News The Architecture of the Next CERN Accelerator Logging Service https://databricks.com/blog/2017/12/14/the-architecture-of-the-next-cern-accelerator-logging-service.html The Curious Case of the Broken Benchmark: Revisiting Apache Flink® vs. Databricks Runtime https://data-artisans.com/blog/curious-case-broken-benchmark-revisiting-apache-flink-vs-databricks-runtime Hadoop 3.0 Ships, But What Does the Roadmap Reveal? https://www.datanami.com/2017/12/15/hadoop-3-0-ships-roadmap-reveal/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-01-09 16:00:06

Episode 68 – Future Predictions

Welcome to 2018! And welcome to our 110% fact based prediction show for 2018. As you may expect from your two hosts, everything in this episode is 110% sure to become reality in the next twelve months. And since 110% is not actually possible, our predictions might also be just a little bit off? But we have 365 days to bask in the glory of our predictions before we, as usual, are shot back down to earth. Nancy comic Dave The year of cloud first and hybrid cloud Many organisations will move from solely on prem to cloud or hybrid with new workloads seeking alternatives to their traditional on-prem. Edge computing for IoT With edge devices becoming more powerful and IoT workloads increasing (and bandwidth not getting cheaper at the same rate) we'll see more of the intelligence pushed further to the edge. GDPR will fundamentally change the face of data governance, collection, anon, retention in big data GDPR regulations start to arrive in the form of concrete plans being in place and many organisations will start to feel the governance pressure especially in their data lake(s). Big Data Cybersecurity adoption significantly increased A lot of work has gone into preparation, which I believe we'll start to see realised in 2018, not commonplace yet, but significantly increased deployments over 2017. AI and ML will replace "Big Data" in general parlance The phrase "big data" will become secondary and the focus will be on Artificial Intelligence and Machine Learning, regardless of the underlying big data platform. Jhon Cloud will become the de-facto choice for big data deployment. The decoupled data storage and ephemeral compute make it possible for small entities to go for it AND makes it possible for small companies to create scalable solutions for the big guys. In house development is too slow since good, reusable solutions become available cross industry. 
Because of cloud, we'll see a shift where big data solutions are no longer built in-house, but "ready" solutions are licensed/purchased from start-up like orgs. (Self service Big Data as a Service made possible by cloud.) Hadoop becomes invisible, a bit like programming frameworks. Data Science and Engineer jobs will change Data Scientists' domain knowledge becomes the main advantage and "math" becomes the base knowledge. The data engineer role will change a lot due to AI taking over a lot of the data prep work. It may even disappear or go back to simple data ingest. Still no good governance solution in 2018 Atlas will get better, with IBM behind it, but will still fail to offer the comprehensive governance and lineage solution businesses are looking for. And sadly, no other solution will fit the bill either. Chat bots (in social media) are here to stay. Example Blue bot: https://tweakers.net/nieuws/133099/klm-laat-bots-antwoorden-formuleren-op-helft-van-klantvragen-op-social-media.html Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2018-01-02 16:00:41

Episode 67 – Roaring News

It's here: the final news episode for 2017! We finish off the year talking about Apache Pulsar, Hadoop Delegation tokens (aka Kerberos), the Hadoop on Container hype (or is it?), the Apache Hadoop 3.0 release and all you need to know about Data Prepping (or at least all we can tell you in about 10 minutes, that is). Breaking News Jhon Comparing Pulsar and Kafka: unified queuing and streaming https://streaml.io/blog/pulsar-streaming-queuing/ Hadoop Delegation Tokens Explained http://blog.cloudera.com/blog/2017/12/hadoop-delegation-tokens-explained/ Hadoop and Containers Big Data and Container Orchestration with Kubernetes (K8s) https://www.bluedata.com/blog/2017/12/big-data-container-orchestration-kubernetes-k8s/ Spark on Kubernetes series https://banzaicloud.com/blog/spark-k8s/ https://banzaicloud.com/blog/scaling-spark-k8s/ https://banzaicloud.com/blog/zeppelin-spark-k8/ Data Prepping in the clouds Google Cloud Dataprep: Spreadsheet-Style Data Wrangling Powered by Google Cloud Dataflow https://medium.com/mark-rittman/google-cloud-dataprep-spreadsheet-style-data-wrangling-powered-by-google-cloud-dataflow-a48c405d81c Data Transformations “By Example” in the Azure Machine Learning Workbench https://blogs.technet.microsoft.com/machinelearning/2017/09/25/by-example-transformations-in-the-azure-machine-learning-workbench/ Dave Hadoop 3.0 Released on December 13th 2017 http://hadoop.apache.org/docs/r3.0.0/index.html http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/release/3.0.0/RELEASENOTES.3.0.0.html http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/release/3.0.0/CHANGES.3.0.0.html Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-12-26 16:00:39

Episode 66 – Past Predictions

It's that time of the year again where you can call us out on being totally rubbish at predicting much of anything, or can we..? Listen to the episode and find out! In any case, we unabashedly will be recording a new "future predictions" show in a couple of weeks so if you have any predictions you want us to consider, send them to us by tweet or email! Bart Simpson - Being Right Sucks Predictions: Fragmentation of ecosystem Scale of data-breaches gets larger and more IoT focused Chat-bots everywhere More options for self service big data platforms for SMB Commerce will muddy the waters - snake oil sales: call it big data and it will sell Cyber-security with big data becomes commonplace In-Memory and GPU will rule; commodity hardware will evolve into "big iron". Atlas repeat: it's here! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-12-19 16:00:50

Episode 65 – Roaring news

It's another Roaring News episode. Today Jhon talks about machine learning projects for beginners, data visualization and the new neural network hotness which is transfer learning. Dave covers the Dataworks Summit call for papers and Apache Impala reaching Top Level Project status. Breaking News Jhon 8 Fun Machine Learning Projects for Beginners https://elitedatascience.com/machine-learning-projects-for-beginners Data is Beautiful https://www.reddit.com/r/dataisbeautiful/ https://twitter.com/hashtag/dataisbeautiful Transfer Learning - Machine Learning's Next Frontier http://ruder.io/transfer-learning/index.html#whatistransferlearning Dave Apache Impala gains TLP https://blogs.apache.org/foundation/entry/the-apache-software-foundation-announces24 Dataworks Summit call for papers https://dataworkssummit.com/blog/dataworks-summit-berlin-call-for-papers-is-now-open/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-12-12 16:00:43

Episode 64 – Talking Apache Pulsar with Matteo and Sijie from Streamlio

A while ago, the all-knowing oracle that is twitter pointed out that we really did not do justice to the Apache Pulsar project when we covered it in our Roaring News episode. The good people at Streamlio reached out to us and here is the 80+ minute discussion we had with Matteo Merli and Sijie Guo, going in depth on the merits and technical details, setting the Roaring Pulsar record straight! Apache Pulsar logo Matteo Merli (https://www.linkedin.com/in/matteomerli/) Co-Founder - Software Engineer Sijie Guo (https://www.linkedin.com/in/samuelguo/) Co-Founder Apache Pulsar (incubating) https://pulsar.apache.org/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-12-05 16:00:42

Episode 63 – Roaring News

It's another news episode folks. This time Dave and Jhon talk about extracting telemetry from a PS3 steering wheel and pedal set, IBM sun-setting BigInsights and 6 things a budding Data Scientist should be aware of. Breaking News Dave Taking KSQL for a Spin Using Real-time Device Data https://www.rittmanmead.com/blog/2017/11/taking-ksql-for-a-spin-using-real-time-device-data/ Jhon IBM leads BigInsights for Hadoop out behind barn. Shots heard https://www.theregister.co.uk/2017/11/08/ibm_retires_biginsights_for_hadoop/ If you want to be a data scientist, you need to know about these 6 trends https://www.siliconrepublic.com/advice/data-scientist-trends Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-11-28 16:00:13

Episode 62 – Second Year Anniversary

Are there really two years' worth of Roaring Elephant podcasts out there? Well, since this is our second anniversary party, there must be! Join some of the guests we had on the podcast this year to reminisce about the months gone by. Due to the drop-in drop-out nature, this episode is a little rough but we hope you can enjoy being part of our little party! Discussion topics ranged from what our guests have been up to, Apache Kafka, Dremio, the effects of GDPR on the industry and how our guests see the future of Big Data. Our returning guests today are: Eduardo Barbaro Sr. Data Scientist at Mobiquity, Inc – Europe https://www.linkedin.com/in/edbarbaro/ Marcel-Jan Krijgsman Data Engineer at Open Circle Solutions B.V. https://www.linkedin.com/in/marcel-jankrijgsman/ Youen Chéné CTO @Saagie https://www.linkedin.com/in/youenchene/ Pitt Fagan Senior Data Analyst at Zendesk https://www.linkedin.com/in/pittfagan/ Big Data Madison Meetups: https://www.meetup.com/BigDataMadison/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-11-21 16:00:30

Episode 61 – Roaring News

In this episode of Roaring News, we talk about the seemingly inevitable block chain, Fraud detection in banking and a celebration of the DevOps engineer. Dave: The continued journey to understand enterprise usage of block-chain http://fortune.com/2017/10/17/blockchain-berners-lee/ https://www.hyperledger.org/blog/2017/10/17/qa-does-blockchain-alleviate-security-concerns-or-create-new-challenges Jhon: StreamING Machine Learning Models: How ING Adds Fraud Detection Models at Runtime with Apache Flink® https://data-artisans.com/blog/real-time-fraud-detection-ing-bank-apache-flink DevOps might be the key to your Big Data project success https://datahub.packtpub.com/big-data/devops-for-big-data-success/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-11-14 16:00:55

Episode 60 – Big Data Roles: Recruiting and hiring

In this entry in our "Roles in Big Data" series, we talk to Chuck Waygood, Global Director of Talent Acquisition at Hortonworks. Chuck has been in this space since 2013 and in this episode he talks about his experiences, what recruiters are looking for, how you can attract that perfect candidate and what you can do to improve your chances of landing that great career in Big Data. Chuck Waygood Director, Global Talent Acquisition at Hortonworks https://www.linkedin.com/in/chuckwaygood/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-11-07 16:00:15

Episode 59 – Roaring News

It's another installment of Roaring News! This time, we talk about the ensemble recommendation system allegedly used by Spotify, not-so-new kid-on-the-block-after-all Apache Pulsar, the ever so popular "Hadoop is dead" and end with a quick shout-out to the Tokyo Data Platform Conference. Dave Apache Pulsar https://pulsar.apache.org/ https://www.slideshare.net/ydn/october-2016-hug-pulsar-a-highly-scalable-low-latency-pubsub-messaging-system https://streaml.io/blog/apache-pulsar-geo-replication/ https://streaml.io/blog/geo-replication-patterns-practices/ https://news.ycombinator.com/item?id=12453080 Data Platform Conference Tokyo http://dataplatform.jp/ Jhon Spotify’s Discover Weekly: How machine learning finds your new music https://hackernoon.com/spotifys-discover-weekly-how-machine-learning-finds-your-new-music-19a41ab76efe Hadoop Was Hard to Find at Strata This Week https://www.datanami.com/2017/09/29/hadoop-hard-find-strata-week/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-10-31 16:00:46

Episode 58 – Big Data Roles: The data scientist

In this entry in our long-running "roles in Big Data" series, we talk to Eduardo Barbaro, a Sr. Data Scientist at Mobiquity. To say that the data scientist is a pivotal person in any big data or advanced analytics project is not an exaggeration and we are really grateful to Eduardo for spending some time on the podcast to give us his views and recount his experiences. Eduardo Barbaro Sr. Data Scientist at Mobiquity, Inc - Europe https://www.linkedin.com/in/edbarbaro/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-10-24 16:00:02

Episode 57 – Dataworks Summit Sydney recap by Dave – Part 2

In this second part of Dave's tale of the Sydney Dataworks Summit, the subjects range from Apache Metron, a talk by Telstra, Australia's leading mobile provider, Yarn 3.0 and Apache Zeppelin. Solving Cyber at Scale - Simon Ball https://www.slideshare.net/Hadoop_Summit/solving-cyber-at-scale-80187657 Implementing greenfield Apache Metron SOC – Telstra - Saad Ayad Slides not available :( Yarn past present future - Rohith Sharma KS - Sunil G https://www.slideshare.net/Hadoop_Summit/yarn-past-present-future Model as a service - Casey Stella https://www.slideshare.net/Hadoop_Summit/maas-model-as-a-service-modern-streaming-data-science-with-apache-metron-incubating Protecting your Critical Hadoop Clusters against Disasters - Jeff Sposetti / Sankar Hariappan https://www.slideshare.net/Hadoop_Summit/protecting-your-critical-hadoop-clusters-against-disasters Running Zeppelin in the Enterprise https://www.slideshare.net/Hadoop_Summit/running-zeppelin-in-enterprise-80378187 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-10-17 16:00:57

Episode 56 – Dataworks Summit Sydney recap by Dave – Part 1

Dave attended the Dataworks Summit in Sydney and we go over the different sessions he attended there. In this first of two episodes, the focus lies on the new goodness that Hadoop 3.0 will bring us soon. Hadoop 3.0 – Sanjay Radia https://www.slideshare.net/Hadoop_Summit/apache-hadoop-30-community-update-79999467 JDK 8+ Port number changes Class-path isolation HDFS – 3 node Namenode, intra data node balancer for balanced storage within a node, erasure coding 10TB node recovering in a few hours on a large cluster (3000 nodes) Erasure coding 2012, 2013, 2014 Erasure coding methods, blocks or stripes Surprisingly little performance difference for EC, what’s not shown is the network bandwidth cost, which is significantly higher Yarn 3.0 Scheduler, priorities within a queue Q – Inter queue priorities Long running services, dynamic container configuration, cpu and io easy, hard to do memory Service discovery in YARN via zookeeper, dns Elastic resource model, graceful decommissioning node managers Resource isolation with disk and network Yarn UI YARN federation SparkR best practices – Casey Stella https://www.slideshare.net/Hadoop_Summit/sparkr-best-practices-for-r-data-scientist Benefits pros/cons of R - Legions of academics have built R packages over the years Where Spark + R came from Data science workflow, data wrangling with spark and spark r Kerberos troubleshooting - Vipin Rathor https://www.slideshare.net/Hadoop_Summit/troubleshooting-kerberos-in-hadoop-taming-the-beast Why Kerberos, solves authentication Where is Kerberos used Enduser and service auth mechanism Hadoop delegation token Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-10-10 16:00:31

Episode 55 – Roaring News

In this edition of Roaring News, Dave covers the release of Apache Metron based HCP 1.3 and an HBase vs Cassandra benchmark battle. Jhon talks about some Spark tuning and scheduler inner-workings and finishes with a tale of a compliance kettle... Dave HCP 1.3 release https://hortonworks.com/blog/hortonworks-cybersecurity-platform-big-data-cybersecurity-solution/ https://docs.hortonworks.com/HDPDocuments/HCP1/HCP-1.3.0/bk_release-notes/content/ch01.html Battle of the Apache NoSQL heavyweights https://hortonworks.com/blog/hbase-cassandra-benchmark/ Jhon Spark Performance Tuning: A Checklist https://medium.com/zero-gravity-labs/spark-performance-tuning-a-checklist-abb3c80efb44 How the Spark Scheduler Work http://www.russellspitzer.com/2017/09/01/Spark-Locality/ A tale of a compliance kettle… https://cupfighter.net/2017/09/a-tale-of-a-compliance-kettle Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-10-03 16:00:50

Episode 54 – Hadoop sizing part 1: One big cluster, or many small ones

In this episode, we take an online article by Chris Riccomini and give our take on the debate over having a single big cluster versus many smaller ones. If you are architecting a Hadoop cluster and are faced with this choice, this episode should give you a lot of information on the subject. One big cluster, or many small ones? by Chris Riccomini https://medium.com/@criccomini/one-big-cluster-or-many-small-ones-5f3126ed7045 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-09-26 16:00:57

Episode 53 – Roaring News

In this episode of Roaring News, Dave brings up the newly released HDP 2.6.2, which incorporates IBM's move from their proprietary IOP to HDP. Jhon brings an update on the MLeap story for productionizing your Spark model. We finish off discussing the newly released Apache Atlas version 0.8.1. Dave HDP and IBM HDP 2.6.2 https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_release-notes/content/ch_relnotes.html Jhon MLeap: Providing (Near) Real-time Data Science with Apache Spark https://medium.com/rv-data/mleap-providing-near-real-time-data-science-with-apache-spark-c34e7df093ca The Apache Atlas team is happy to announce the release of Apache Atlas - version 0.8.1. https://lists.apache.org/thread.html/82337a63dd216dbfa4f4609f76ceaef30de79e68dcbf726a673539b9@%3Cannounce.apache.org%3E Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-09-19 16:00:40

Episode 52 – Big data in travel

Over the summer, when your hosts enjoyed a well-earned vacation (well, we like to think we earned it) we could not stop being Big-Data Nerds and in this episode we talk about the Hadoop opportunities we spotted. During this episode you will hear us talk about how Big data does, could or should improve many aspects of vacationing. We talk about review sites, preventive maintenance on rental cars, IoT tracking beer levels, the social media privacy issues and much, much more. We really tried to make this a "new-style" short episode, but clearly, we still need some training... Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-09-12 16:00:42

Episode 51 – Roaring News

In this news episode (our very first one), Dave is all-out on Artificial Intelligence and its use in naming "stuff"; for some subjects it apparently works very well, for other subjects not so much... Jhon brings a blog on deploying new Kerberos functionality and a tutorial for Kafka Connect for those that have not really looked at it. The ensuing discussion on Nifi vs kafka is purely coincidental. Dave AI naming Paint (May 2017) http://lewisandquark.tumblr.com/post/160776374467/new-paint-colors-invented-by-neural-network https://arstechnica.co.uk/information-technology/2017/05/ai-paint-colour-names/ Guinea Pigs (June 2017) http://gizmodo.com/this-is-what-happens-when-you-teach-an-ai-to-name-guine-1796172891 Improved Paint (July 2017) https://arstechnica.co.uk/information-technology/2017/07/ai-paint-colours-reprogrammed/ British sounding place names (July 2017) http://www.telegraph.co.uk/technology/2017/07/20/ai-trained-generate-incredibly-british-place-names/ Beer (August 2017) http://gizmodo.com/weve-run-out-of-beer-names-and-ai-is-here-to-help-1797480178 Jhon Accessing Secure Cluster from Web Applications http://blog.cloudera.com/blog/2017/08/accessing-secure-cluster-from-web-applications/ The Simplest Useful Kafka Connect Data Pipeline In The World https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-09-05 16:00:06

Episode 50 – Alan Gates Wrap Up (Part 4)

This is the final part of our long interview with Alan Gates. In this part, Alan talks more about ODPI, Cloud First, Apache Flink, Apache Pig and we finish off with a little bit of Philosophy. A big thank you to Alan for sharing his pearls of wisdom with us! [Image from Linux.com] 00:00 Recent events Our vacation is almost over but this episode too was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about 02:10 Alan Gates Wrap Up (Part 4) 34:37 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-08-29 16:00:48

Episode 49 – Thomas Henson on IoT architectures

In this episode we have an interview with Thomas Henson for you. Thomas is an Isilon Data Lake Evangelist at Dell/EMC, but in this episode he will talk about IoT architectures, related to his talk at the DataWorks Summit San Jose 2017. 00:00 Recent events Since both Dave and Jhon are still on vacation, this episode was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about. 02:14 Thomas Henson on IoT architectures You can find Thomas Henson's blog on Big Data at https://www.thomashenson.com/ 52:45 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-08-15 16:00:40

Episode 48 – Alan Gates on the DataWorks Summit (Part 3)

In this third part of our interview with Alan Gates, PMC member for various Apache projects including Apache Hive and co-founder of Hortonworks, we talk about his sessions at the DataWorks Summits and about the Summits in general. [Image taken from Linux.com] 00:00 Recent events Since both Dave and Jhon are still on vacation, this episode was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about. 02:38 Alan Gates on the DataWorks Summit (Part 3) Since this part of the interview goes public after the San Jose Summit, it is too late to submit abstracts for that particular summit. However, the Australian version is in a couple of months so please go to the DataWorks website for more information about that one. 35:35 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-08-01 16:00:50

Episode 47 – Deep dive into Kudu

We've been interested in Kudu for a while, but it's something that neither of your hosts has been exposed to very much. Apache Kudu went from incubation to top level project in record time, and now seemed like the right time to dig into this piece of antelope. Mike Percy, PMC member and committer on the Apache Kudu project and software engineer at Cloudera, was only too glad to come on the podcast and answer all our questions! 00:00 Recent events Since both Dave and Jhon are currently on vacation, this episode was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about. 01:40 Deep dive into Kudu Special guest today is Mike Percy, PMC member and committer on the Apache Kudu project. 01:11:54 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-07-18 16:00:58

Episode 46 – San Jose DataWorks Summit 2017 in Review

Dave joined our free ticket raffle winner Pitt at the DataWorks Summit in sunny San Jose last month and they came back with almost two hours' worth of exciting stories! Thanks again to Hortonworks for providing the free ticket to our raffle that Pitt won. San Jose DataWorks Summit 2017 in Review 00:01:20 Keynotes 00:31:20 Day 1 sessions 01:10:00 Day 2&3 sessions 01:54:55 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-07-04 16:00:41

Episode 45 – Modern Day Airships

Breaking up our series of insights from Alan Gates, we switch gears to another really interesting topic (and guest!) where we talk about the new visualisation features coming in Apache Zeppelin and we get it straight from the brains behind the new code, Bernhard Walter. Recent events 03:03 Jhon: Churn Prediction with Apache Spark Machine Learning by Carol McDonald (@caroljmcdonald) @mapr https://mapr.com/blog/churn-prediction-sparkml/ 12:12 Dave: HDFS Maintenance State by Manoj Govindassamy @cloudera https://blog.cloudera.com/blog/2017/05/hdfs-maintenance-state/ https://issues.apache.org/jira/browse/HDFS-7877 https://issues.apache.org/jira/browse/HDFS-6729 https://issues.apache.org/jira/browse/HDFS-7541 30:50 Modern Day Airships Bernhard Walter talks about the new visualisation options in Zeppelin with some of the what, why and how. 01:09:00 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-06-20 16:00:41

Episode 44 – Suicidal Spark

In this episode we're joined by Youen Chéné and Aurélien Vandel from Saagie who talk to us about their experiences deploying Spark Streaming workloads in production (based on their Dataworks Summit talk), what worked well, what didn't and what they'd recommend you might want to do if you follow in their footsteps. Enjoy! 00:00 Recent events Dave Big Data Videos http://www.kdnuggets.com/2017/05/top-recent-big-data-videos-youtube.html https://www.youtube.com/watch?v=RQ9czRAdmMs https://www.youtube.com/watch?v=hsoKlE67rTw Jhon InsightOut: The role of Apache Atlas in the open metadata ecosystem http://www.ibmbigdatahub.com/blog/insightout-role-apache-atlas-open-metadata-ecosystem https://www.youtube.com/watch?v=yQvmoDtGgbo Apache Atlas API Version 2 https://atlas.incubator.apache.org/api/v2/index.html Cloud giants 'ran out' of fast GPUs for AI boffins https://www.theregister.co.uk/2017/05/22/cloud_providers_ai_researchers/ Benchmark: Sub-Second Analytics with Apache Hive and Druid https://hortonworks.com/blog/sub-second-analytics-hive-druid/ 26:00 Spark Streaming and Suicidal Tendencies https://dataworkssummit.com/munich-2017/sessions/spark-streaming-and-suicidal-tendencies/ Video: https://www.youtube.com/watch?v=Us8kizlbJtc Slides: https://www.slideshare.net/HadoopSummit/spark-streaming-and-suicidal-tendencies Youen Chéné, CTO @Saagie https://www.linkedin.com/in/youenchene/ Aurélien Vandel, Data Engineer https://www.linkedin.com/in/aur%C3%A9lien-vandel-060b5b8a/ 01:11:17 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-06-06 16:00:22

Episode 43 – Alan Gates talks Hive (Part 2)

In this episode we discuss the maturity of the Hadoop ecosystem and how hard it currently still is to get the value out of data. In the main section, we will have the second part of the interview with Alan Gates, this time talking about the place Hive has in the ecosystem. We still have more from Alan so stay tuned for more Hive goodness in future episodes! 00:00 Recent events Dave PredictionIO 0.11 release https://github.com/apache/incubator-predictionio/blob/v0.11.0-incubating/RELEASE.md http://predictionio.incubator.apache.org/ http://predictionio.incubator.apache.org/start/ http://predictionio.incubator.apache.org/system/ http://predictionio.incubator.apache.org/gallery/template-gallery/ https://techcrunch.com/2016/02/19/salesforce-acquires-predictionio-to-build-up-its-machine-learning-muscle/ Jhon Ultra-fast OLAP Analytics with Apache Hive and Druid – Part 1 of 3 https://hortonworks.com/blog/apache-hive-druid-part-1-3/ Why Big Data Hasn’t Yet Made a Dent on Farms https://www.wsj.com/articles/why-big-data-hasnt-yet-made-a-dent-on-farms-1494813720 The Pentagon is hunting ISIS using big data and machine learning https://www.engadget.com/2017/05/15/the-pentagon-is-hunting-isis-using-big-data-and-machine-learning/ 34:00 Alan Gates talks Hive (Part 2) Here is the second part of our interview with Alan Gates where he talks about Apache Hive. 54:24 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-05-23 16:00:05

Episode 42 – Alan Gates talks Hive (Part 1)

Welcome to the "life, the universe and everything" episode of the Roaring Elephant Podcast. We talk some news and this episode got a little bit ranty... Apologies for that; to balance it out, we have a chat with Alan Gates about Hive for you. There was so much Alan Gates goodness that we've split it over a few sessions, and here's part one... 07:00 Recent events Dave Metron graduates to Apache TLP status https://blogs.apache.org/foundation/entry/apache-software-foundation-announces-apache https://hortonworks.com/blog/congratulations-apache-metron-tlp/ 2017 Big Data Landscape https://www.linkedin.com/pulse/firing-all-cylinders-2017-big-data-landscape-matt-turck You’re doing Hadoop and Spark wrong and they will probably fail https://www.theregister.co.uk/2017/02/21/hadoop_and_spark_risks_and_opportunities/ Jhon Apache Impala Leads Traditional Analytic Database http://blog.cloudera.com/blog/2017/04/apache-impala-leads-traditional-analytic-database/ Cloudera Data Science Workbench blogs http://blog.cloudera.com/blog/2017/04/use-your-favorite-python-library-on-pyspark-cluster-with-cloudera-data-science-workbench/ http://blog.cloudera.com/blog/2017/04/deep-learning-frameworks-on-cdh-and-cloudera-data-science-workbench/ http://blog.cloudera.com/blog/2017/04/bigdl-on-cdh-and-cloudera-data-science-workbench/ 34:00 Alan Gates talks Hive This is the first part of our interview with Alan Gates, where he talks about Apache Hive. 01:04:12 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-05-09 16:00:19

Episode 41 – News, news and some more news

In this episode, due to us blowing our recording space budget with the Dataworks Summit day by day episodes (39 and 40 if you've not listened yet, go and do so!) we're just bringing you a short episode this time with news, all the news that's new and approved by the Roaring Elephants! 05:10 Recent events Superset: benefits and limitations of the open source data visualization tool by Airbnb https://indatalabs.com/blog/data-strategy/open-source-data-visualization-tool-superset http://airbnb.io/superset/index.html Even artificial intelligence can acquire biases against race and gender http://www.sciencemag.org/news/2017/04/even-artificial-intelligence-can-acquire-biases-against-race-and-gender Building a cognitive data lake with ODPi-compliant Hadoop http://www.ibmbigdatahub.com/blog/building-cognitive-data-lake-odpi-compliant-hadoop Top 5 Performance Boosters with Apache Hive LLAP https://hortonworks.com/blog/top-5-performance-boosters-with-apache-hive-llap/ Integrate SparkR and R for Better Data Science Workflow https://hortonworks.com/blog/integrate-sparkr-and-r-for-better-data-science-workflow/ 33:16 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-04-25 16:00:50

Episode 40 – Dataworks Summit Europe – Day 2

In this episode of the Roaring Elephant podcast, Dave and I continue to share our Dataworks summit experience, meet yet more listeners, sit in on a few more sessions and give our overall view of the day and the summit as a whole! It will make you wish you were here. 00:00:00 Intro Roaring Elephant Roadshow Day 2 - The night after the party! 00:04:14 Session Discussions Our review of the sessions, what we liked, what we learned, what we'd recommend you go and check out afterwards: Keynote Meet HBase 2.0 Bridle your Flying Islands and Castles in the Sky HBase in Practice Solving Cyber at Scale Achieving Realtime Ingestion and Analysis of Security Events through Kafka and Metron Row/Column-Level Security in SQL for Apache Spark Apache Kafka Best Practices Mool - Automated Log Analysis using Data Science and ML Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and Spark Backup and Disaster Recovery in Hadoop 01:02:15 Wrap up Some final overall observations and looking forward to the next summit news from Dataworks San Jose! 01:07:51 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-04-07 04:42:25

Episode 39 – Dataworks Summit Europe – Day 1

In this episode of the Roaring Elephant podcast, Dave and I attend the Dataworks summit, meet listeners, sit in on sessions and give our overall view of the day! It's the next best thing to being here. If you ARE here, then look out for us, we'll exchange limited edition Roaring Elephant stickers for audio clips. 00:00 Intro Roaring Elephant Roadshow Day 1- Direct from Munich! 03:25 Session Discussions Our review of the sessions, what we liked, what we learned, what we'd recommend you go and check out afterwards: Keynote An Apache Hive Based Data Warehouse Interactive Analytics at Scale in Apache Hive using Druid Hadoop 3.0 in a Nutshell Running Services on YARN Streamline - Stream Analytics for Everyone (AKA SAM: Streaming Analytics Manager) Apache Atlas: Governance for your Data File Format Benchmark - Avro, JSON, ORC and Parquet An Approach for Multi-Tenancy through Apache Knox 01:27:00 Wrap up Some final overall observations and looking forward to day 2! 01:34:31 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-04-06 07:44:46

Episode 38 – Dataworks Summit 2017 – Preview

This week, your hosts go over what we consider to be our pick of the sessions that will be presented during the Dataworks Summit (formerly Hadoop Summit) in Munich next week. The Roaring Elephant will be in attendance, look out for the two guys in distinctive yellow fleeces with the Roaring Elephant logo on the back, we hope to see you there! 00:00 Recent events Dave DS Model Lifecycle https://www.svds.com/models-lab-factory/ Stitchfix Algorithm Tour http://algorithms-tour.stitchfix.com/ Cloudera Data Science Workbench http://vision.cloudera.com/cloudera-data-science-workbench-self-service-data-science-for-the-enterprise/ http://www.dbms2.com/2017/03/19/cloudera-data-science-workbench/ Jhon Yarn 3 Data Lake 3.0: The EZ button to deploy in minutes and cut TCO by half https://hortonworks.com/blog/data-lake-3-0-deploy-minutes-cut-tco-half/ Data Lake 3.0 Part 2 – A multi colored YARN https://hortonworks.com/blog/data-lake-3-0-part-2-multi-colored-yarn/ Data Lake 3.0 Part 3 – Distributed tensorflow assembly on apache hadoop YARN https://hortonworks.com/blog/distributed-tensorflow-assembly-hadoop-yarn/ Playing Catch-up: How To Set Up a Shared Amazon RDS as Your Hive Metastore http://blog.cloudera.com/blog/2017/02/how-to-set-up-a-shared-amazon-rds-as-your-hive-metastore/ YARN Node Labels: Label-based scheduling and resource isolation https://developer.ibm.com/hadoop/2017/03/10/yarn-node-labels/ Hadoop at Strata: Not Exactly ‘Failure,’ But It Is Complicated https://www.datanami.com/2017/03/20/hadoop-strata-not-exactly-failure-complicated Microsoft and Hortonworks empower Azure HDInsight customers to be first to benefit from innovation https://hortonworks.com/blog/microsoft-hortonworks-empower-azure-hdinsight-customers-first-benefit-innovation/ https://azure.microsoft.com/en-us/blog/announcing-new-capabilities-of-hdinsight-and-documentdb-at-strata/ http://www.zdnet.com/article/microsoft-moves-ahead-on-cloud-data-ai-fronts/ 40:05 Dataworks Summit 2017 - Day 1 Preview In this episode, we touch on most of the sessions that take place during the Dataworks Summit in Munich next week and tell you what our favorites are. https://dataworkssummit.com/munich-2017/agenda/#20170403 01:12:20 Dataworks Summit 2017 - Day 2 Preview After the sessions from the first day, we continue with the second day. 01:32:30 Dataworks Summit 2017 - Preview - General comments Now that we have gone over the entire two-day event, we have some general comments about it. 01:42:34 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-03-28 16:00:33

Episode 37 – Big Data Roles: The starter

In this episode, we start a new series on the different roles in Big Data. Purely by coincidence, it turns out that the winner of our raffle started a new job as a Data Engineer at the beginning of this month, so naturally we decided to invite Marcel-Jan on the show to talk about the how and why of his career move. 00:00 Recent events Dave It’s morphing time: Apache Ranger graduates to a Top Level Project https://hortonworks.com/blog/morphing-time-apache-ranger-graduates-top-level-project-part-1/ https://hortonworks.com/blog/morphing-time-apache-ranger-graduates-top-level-project-part-2/ Data-Driven User Engagement https://www.svds.com/data-driven-user-engagement/ Driving Product Engagement with User Behaviour Analytics https://www.svds.com/driving-product-engagement-user-behavior-analytics/ Jhon Using Apache Spark for large-scale language model training https://code.facebook.com/posts/678403995666478/using-apache-spark-for-large-scale-language-model-training/ Big data pioneer Cloudera reportedly planning an IPO later this year http://siliconangle.com/blog/2017/02/28/cloudera-reportedly-planning-ipo-later-year/ Google is acquiring data science community Kaggle https://techcrunch.com/2017/03/07/google-is-acquiring-data-science-community-kaggle/ https://techcrunch.com/2017/03/08/google-confirms-its-acquisition-of-data-science-community-kaggle/ How to use blockchain to build a database solution http://www.zdnet.com/article/blockchains-in-the-database-world-what-for-and-how/ 42:00 Big Data Roles: The starter Interview with Marcel-Jan Krijgsman (https://www.youtube.com/watch?v=fwBWlpm3Sx4), winner of our raffle and by pure coincidence starting a dedicated Hadoop role on the first of this month . 
Ratings for MOOCs https://www.class-central.com Hadoop safari : Hunting for vulnerabilities http://archive.hack.lu/2016/Wavestone%20-%20Hack.lu%202016%20-%20Hadoop%20safari%20-%20Hunting%20for%20vulnerabilities%20-%20v1.0.pdf Open Circle Solutions http://www.opencirclesolutions.nl/en/ 01:22:58 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-03-14 16:00:06

Episode 36 – Use-case: Single View

No guests today, just Dave and Jhon talking, so brace yourselves! This time we're actually going to explain what we mean by "single view of customer", go through an example use-case and discuss how you might implement such a thing. Enjoy. 00:00 Recent events Dave Faster spark! http://www.zdnet.com/article/spark-gets-faster-for-streaming-analytics/ If you’re interested in reading/watching more then check out the site for Spark Summit East, the sessions slides and videos appear to all be live now https://spark-summit.org/east-2016/schedule/ Getting Started with Deep Learning/Speech Recognition http://www.svds.com/getting-started-deep-learning/ http://svds.com/open-source-toolkits-speech-recognition/ Data Driven Depression http://rcharlie.com/2017-02-16-fitteR-happieR/ http://blog.revolutionanalytics.com/2017/02/finding-radioheads-most-depressing-song-with-r.html Jhon IoT Calamity: the Panda Monium http://www.verizonenterprise.com/resources/reports/rp_data-breach-digest-2017-sneak-peek_xg_en.pdf Connecting your own Hadoop or Spark to Azure Data Lake Store https://medium.com/azure-data-lake/connecting-your-own-hadoop-or-spark-to-azure-data-lake-store-93d426d6a5f4#.w5oktxaix IBM Platform to Help Clients Address Storage Challenges at Massive Scale http://www.storagenewsletter.com/rubriques/oemchanneldistribution/ibm-platform-to-help-clients-address-storage-challenges-at-massive-scale/ 28:10 - Use-case: Single View "Single View of an Entity"; what does it mean, how can you do it and why should you? 01:02:52 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-02-28 16:00:24

Episode 35 – What do people get wrong when deploying Hadoop? – Part 2

Paul Codding and Sheetal Dolas, both from Hortonworks, join us in this second part of a two part episode where they share their experience with what can go wrong when Hadoop is deployed. Listen to the tips and tricks these gentlemen share and double the throughput for your cluster. 00:00 Recent events Dave TensorKart: self-driving MarioKart with TensorFlow http://kevinhughes.ca/blog/tensor-kart What is Data Engineering? https://www.dataquest.io/blog/what-is-a-data-engineer/ Jhon Machine Learning is Fun (parts 1-6) https://medium.com/@ageitgey/machine-learning-is-fun-part-6-how-to-do-speech-recognition-with-deep-learning-28293c162f7a#.vv1lh5755 Performance comparison of different file formats and storage engines in the Hadoop ecosystem https://db-blog.web.cern.ch/blog/zbigniew-baranowski/2017-01-performance-comparison-different-file-formats-and-storage-engines How to write code using the Spark Dataframe API: a focus on composability and testing https://blog.godatadriven.com/structure-spark-df-api-code 38:00 What do people get wrong when deploying Hadoop? – Part 2 The second part of the interview with two guests from Hortonworks: Paul Codding Product Management Director at Hortonworks Sheetal Dolas Engineering Leader, Architect And Big Data Champion at Hortonworks 01:12:13 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-02-14 16:00:23

Episode 34 – What do people get wrong when deploying Hadoop? – Part 1

Paul Codding and Sheetal Dolas, both from Hortonworks, join us in this first part of a two part episode where they share their experience with what can go wrong when Hadoop is deployed. Listen to the tips and tricks these gentlemen share and double the throughput for your cluster. 00:00 Recent events Dave Apache Beam becomes a top level project! https://beam.apache.org/ https://beam.apache.org/get-started/beam-overview/ https://github.com/eljefe6a/beamexample/blob/master/BeamTutorial/slides.pdf https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective Four Types of Data Analytics http://insights.principa.co.za/4-types-of-data-analytics-descriptive-diagnostic-predictive-prescriptive MapR claims open source victory with patent http://www.cbronline.com/news/verticals/cio-agenda/mapr-claims-open-source-big-data-victory-patent-award/ Jhon Ransomware attacks on insecure Hadoop systems may be next, say security researchers http://www.itworldcanada.com/article/ransomware-attacks-on-insecure-hadoop-systems-may-be-next-say-security-researchers/389944 http://www.gdi.foundation/ Revenge of the DevOps Gangster: Open Hadoop Installs Wiped Worldwide http://www.threatgeek.com/2017/01/open-hadoop-installs-wiped-worldwide.html Making Big Data User Friendly For Small Businesses https://smallbiztrends.com/2017/01/big-data-and-small-business.html 30:15 What do people get wrong when deploying Hadoop? - Part 1 An interview with two guests from Hortonworks: Paul Codding Product Management Director at Hortonworks Sheetal Dolas Engineering Leader, Architect And Big Data Champion at Hortonworks 01:00:45 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-01-31 16:00:01

Episode 33 – Roaring News

This episode, we have an absolutely brilliant topic that we were going to cover after the news section... But the news section has us talking so much that it ran a bit long. Preferring not to give you a two hour episode, we're rescheduling the delivery of the intended topic to next episode and present you with our first (and probably last) "News only" episode. 00:00 Recent events Dave A pair of “trends to watch in 2017” http://www.techrepublic.com/article/6-big-data-trends-to-watch-in-2017/ http://www.datamation.com/applications/5-big-data-predictions-for-2017.html Learning from a Year of Security Breaches https://medium.com/starting-up-security/learning-from-a-year-of-security-breaches-ed036ea05d9b#.4r22rbfjh Failing to monetise your apps, big data can help http://www.techrepublic.com/article/failing-to-monetize-your-apps-big-data-can-help/ A Perfect Illustration of the Big Data Value Chain http://www.techrepublic.com/article/a-perfect-illustration-of-how-the-big-data-value-chain-works/ Jhon 24/7 Spark Streaming on YARN in Production https://www.inovex.de/blog/247-spark-streaming-on-yarn-in-production/ SparkSQL, Ranger,and LLAP via Spark thrift server for BI scenarios to provide row, column level security, and masking http://hortonworks.com/blog/sparksql-ranger-llap-via-spark-thrift-server-bi-scenarios-provide-row-column-level-security-masking/ The Data Dichotomy: Rethinking the Way We Treat Data and Services https://www.confluent.io/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/ 50:24 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-01-17 16:00:16

Episode 32 – The sense and non-sense of certifications

In this episode, we talk about the use and abuse of certifications, both the certifications you can achieve by passing an exam and the Industry ISV certifications that should help you make purchasing decisions. 00:00 Recent events Dave 5 enterprise uses of blockchain today http://www.pcworld.com/article/3149504/cloud-computing/5-enterprise-related-things-you-can-do-with-blockchain-technology-today.html Top 7 big data trends for 2017 https://datafloq.com/read/the-top-7-big-data-trends-for-2017/2493 How to discover the hidden value in your customer journey https://www.linkedin.com/pulse/how-discover-hidden-value-your-customer-journey-ronald-van-loon Jhon Achieving a 300% speedup in ETL with Apache Spark http://blog.cloudera.com/blog/2016/12/achieving-a-300-speedup-in-etl-with-spark/ The Rhythm of Food http://rhythm-of-food.net/ http://www.thefunctionalart.com/ Information is beautiful awards http://www.informationisbeautifulawards.com/news/188-2016-the-winners Making data personal: Big data made small http://blogs.sas.com/content/sgf/2016/12/13/making-data-personal-big-data-made-small/ 27:50 The sense and non-sense of certifications Educational certifications ISV Certifications 50:59 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2017-01-03 16:00:58

Episode 31 – Bold Predictions, Past and Future

In this episode, we go over the bold predictions for 2016 we made just before the start of the year. Find out how right we were, or indeed how bad we are at predicting the future of Big Data. Undeterred, we then happily put on our Nostradamus hats and proceed to make even more new bold predictions for 2017. Have a listen and let us know if you agree or disagree with our view on the world. 00:03 Bold predictions - reviewing past predictions for 2016 Apache Atlas Apache Nifi Apache Spark SQL BigInsights 28:50 Bold predictions - future predictions for 2017 Fragmentation Data breaches Chat bots Self service Big Data Snake-Oil Alert Cyber security In-Memory & GPU Apache Atlas BigInsights 01:07:07 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-12-20 16:00:10

Episode 30 – Apache Software Foundation

So many of the tools and projects we talk about and use every day are prefaced by 6 letters, A P A C H E... What does it mean to be an Apache project? What does the Apache Software Foundation (ASF) do for software? Are there other options? Let us tell you about the ASF! 00:00 Recent events Dave: How we caught the circle line rogue train with data https://blog.data.gov.sg/how-we-caught-the-circle-line-rogue-train-with-data-79405c86ab6a#.mhqs1mikx Black Friday 2016: Mobile vs Desktop User Behaviour http://appinstitute.com/black-friday-2016-mobile-vs-desktop-sales/ AI Machine Attempts to Understand Comic Books ... and Fails https://www.technologyreview.com/s/602973/ai-machine-attempts-to-understand-comic-books-and-fails/ https://arxiv.org/abs/1611.05118 https://arxiv.org/pdf/1611.05118v1.pdf Jhon: Paypal From Big Data to Fast Data in Four Weeks or How Reactive Programming is Changing the World Part 1 and Reactive programming manifesto http://www.reactivemanifesto.org/ https://www.paypal-engineering.com/2016/11/08/from-big-data-to-fast-data-in-four-weeks-or-how-reactive-programming-is-changing-the-world-part-1/ Part 2: How that change was followed by adding a Spark micro-batch (streaming) step to the workflow https://www.paypal-engineering.com/2016/11/18/from-big-data-to-fast-data-in-four-weeks-or-how-reactive-programming-is-changing-the-world-part-2/ Paypal And they are not only using Spark, here is one talking about how they use Storm for another real-time workflow. 
https://www.paypal-engineering.com/2016/11/15/carrier-payments-big-data-pipeline-using-apache-storm/ Managing Spark Partitions with Coalesce and Repartition A short write up on how spark does partitioning internally and some ways of improving the partition scheme https://medium.com/@mrpowers/managing-spark-partitions-with-coalesce-and-repartition-4050c57ad5c4#.s2l3yxemt Principa The Top Predictive Analytics Pitfalls to avoid http://insights.principa.co.za/the-top-predictive-analytics-pitfalls-to-avoid?utm_content=buffera2780&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer ODPi Publishes First Operations Specification To Provide Developers Consistency Across Application Management Tools As John talked about in our anniversary episode, the ODPI 2.0 released https://www.odpi.org/announcements/2016/11/14/odpi-publishes-first-operations-specification-to-provide-developers-consistency-across-application-management-tools 25:30 Apache Software Foundation The ASF http://apache.org/ Overview http://apache.org/foundation/ Process http://apache.org/foundation/how-it-works.html The Project List http://apache.org/index.html#projects-list Other Open Source Licence Options http://choosealicense.com/ https://opensource.org/licenses 01:02:08 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-12-06 16:00:32

Episode 29 – 1 Year anniversary

One year of elephants roaring has come and gone so we reminisce a little bit about what happened over the last year. And since we could not have done this podcast nearly as well without them, we asked the special guests we have had on the podcast over the previous year to call in on the Skype call and talk about what they have been up to. 00:00 One year of pod-casting... Dave and Jhon reminiscing about how the Podcast got started. 06:55 Fireside chats with guests over the year 07:56 Joe Witt, Senior Director of Engineering at Hortonworks, 22:40 Michele Lamarca, Team Lead Big Data at Bright Computing 43:00 John Mertic, Director of Program Management for ODPi 01:04:23 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-11-22 16:00:33

Episode 28 – Talking Datameer with Erik Stalpers

In this episode, Dave is stuck in a hotel basement in the middle of internet nowhere and Erik Stalpers from Datameer joins us to talk about the Datameer exploration and visualization tool. 00:00 Recent events Dave Machine learning vs AI http://www.wired.co.uk/article/machine-learning-ai-explained Machine Learning Data Cleansing https://gcn.com/articles/2016/10/19/activeclean-big-data.aspx https://activeclean.github.io/ Battle of the Data Science Venn Diagrams http://www.kdnuggets.com/2016/10/battle-data-science-venn-diagrams.html http://www.prooffreader.com/2016/09/battle-of-data-science-venn-diagrams.html (original doc 21 september 2016) Jhon How Vector Space Mathematics Helps Machines Spot Sarcasm https://www.technologyreview.com/s/602639/how-vector-space-mathematics-helps-machines-spot-sarcasm/ Straight talk about big data http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/straight-talk-about-big-data 25:10 Talking Datameer with Erik Stalpers Erik Stalpers, Solution Engineer at Datameer https://nl.linkedin.com/in/erikstalpers https://www.datameer.com/ 59:39 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-11-08 15:00:44

Episode 27 – Security 3: Encryption at rest and in motion

Rounding out our series on security in Hadoop, we finish with Encryption at rest and in motion. We go over the different approaches, do's and don'ts and mention some higher level application in this space. 00:00 News for the week! Dave: Executives Still Relying on Gut, Not Gigabytes in Planning for Future http://www.datadigestonline.com/2016/10/executives-still-relying-on-gut.html Rewriting SAS Programs for Financial Data Manipulation in R http://blog.revolutionanalytics.com/2016/09/rewriting-sas-in-r-for-finance.html Chris Surdak - Why so many Big Data projects fail http://surdak.com/innovation-vs-improvement/ Jhon: Apache Spark 2.0 Performance Improvements Investigated With Flame Graphs (14-Sep-2016) http://db-blog.web.cern.ch/blog/luca-canali/2016-09-spark-20-performance-improvements-investigated-flame-graphs SQL on Hadoop benchmarks get serious (14-Oct-2016) http://www.zdnet.com/article/sql-on-hadoop-benchmarks-get-serious/ WHERE IS APACHE HIVE GOING? TO IN-MEMORY COMPUTING. (06-Oct-2016) http://hortonworks.com/blog/apache-hive-going-memory-computing/ APACHE HIVE VS APACHE IMPALA QUERY PERFORMANCE COMPARISON (11-Oct-2016) http://hortonworks.com/blog/apache-hive-vs-apache-impala-query-performance-comparison/ Cloudera wants extra money from Intel to become a cloud provider? 
http://venturebeat.com/2016/08/30/cloudera-cloud-intel/ Four interesting things about IBM, Hadoop and open source (2 years old) http://www.ibmbigdatahub.com/infographic/four-interesting-things-about-ibm-hadoop-and-open-source Recovering from a database disk failure in Big SQL (20-oct-2016) https://developer.ibm.com/hadoop/2016/10/20/recovering-from-a-database-disk-failure-in-big-sql-worker-node-4-1fp2-and-4-2/ 37:20 Security 3: Encryption at rest and in motion Nice intro in the apache docs: https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html RPC Encryption: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_Security_Guide/content/ch_wire-rpc.html 57:53 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-10-25 15:00:18

Episode 26 – Security 2: Authorisation and audit

In this episode, we continue our coverage on Hadoop security. Where episode 24 dealt with the subject of authentication, we now delve deeper into the why and how of authorization and audit, and cover the major players in the arena. 00:00 Recent events Dave Beyond Privacy and Security in a Connected World http://www.svds.com/beyond-privacy-security-connected-world/ The broken promise of open-source Big Data software – and what might fix it http://siliconangle.com/blog/2016/09/27/the-broken-promise-of-open-source-big-data-software-and-what-might-fix-it-2/ Meet Apache Spot, a new open source project for cybersecurity http://www.csoonline.com/article/3124497/big-data/meet-apache-spot-a-new-open-source-project-for-cybersecurity.html SMEs advised to capitalise on ‘big data’ http://www.farminglife.com/news/farming-news/smes-advised-to-capitalise-on-big-data-1-7606523 Jhon What is hardcore data science—in practice? https://www.oreilly.com/ideas/what-is-hardcore-data-science-in-practice Hortonworks, IBM Collaborate to Offer Open Source Distribution on Power Systems http://www.prnewswire.com/news-releases/hortonworks-ibm-collaborate-to-offer-open-source-distribution-on-power-systems-300330299.html https://www-03.ibm.com/press/us/en/pressrelease/50553.wss Inside 'The Next Rembrandt': How JWT Got a Computer to Paint Like the Old Master The project leaders explain their brilliant, troubling masterpiece http://www.adweek.com/news/advertising-branding/inside-next-rembrandt-how-jwt-got-computer-paint-old-master-172257 https://www.nextrembrandt.com/ Strata+Hadoop World New York http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/grid/public/2016-09-28 http://hortonworks.com/blog/ http://community.cloudera.com/t5/News/ct-p/Welcome Cloudera Kudu 1.0.0 released http://community.cloudera.com/t5/Community-News-Release/ANNOUNCE-Apache-Kudu-1-0-0-released/m-p/45332 Audience Questions from Sampath @ Baltimore: http://www.infoignite.com/sentiment.html Azure HDInsight 3.5: 
https://azure.microsoft.com/en-gb/blog/new-security-performance-and-isv-solutions-build-on-azure-hdinsight-s-leadership-to-make-hadoop-enterprise-ready-for-the-cloud/ Azure Search: https://azure.microsoft.com/en-us/services/search/ 42:15 Security 2: Authorisation and audit The principles of auth reflected by the underlying organisation of your data Sync with AD/LDAP groups, don’t go user specific wherever possible. Use whatever tools are in your platform: Cloudera - Sentry https://sentry.apache.org/ Hortonwork - Ranger http://ranger.apache.org/ MapR - ??? https://www.mapr.com/hadoop-security-and-big-data-governance-mapr 01:10:32 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-10-11 15:00:48

Episode 25 – The pro’s and con’s of crafting your own distribution

When we talk about Big Data and Hadoop in particular, we generally have one of the existing distributions from Cloudera, Hortonworks or other Big Data companies in mind. But sometimes, a pre-built distro just does not meet the needs. In this episode, we have a guest on the show who explains why they made the choice to forgo the available distributions in favour of building their own. http://lod-cloud.net/ 00:00 Recent events Dave: Which tool should I use? http://brohrer.github.io/which_tool_should_i_use.html YaRrr! - The Pirate’s guide to R Blog: http://nathanieldphillips.com/thepiratesguidetor/ YaRrr! - Download the book: https://drive.google.com/file/d/0B4udF24Yxab0S1hnZlBBTmgzM3M/view Video tutorials to go with the above: https://www.youtube.com/playlist?list=PL9tt3I41HFS9gmeZFEuNrnu_7V_NFngfJ Listener Question from Sampath from Baltimore: When moving into a career in Big Data, is it better to pick a technology like Spark and try to build expertise on it versus having a broader knowledge on many tools. I registered for Edx courses and working towards getting Cloudera Certification. Please provide me any advice. Jhon: More accountability for big-data algorithms http://www.nature.com/news/more-accountability-for-big-data-algorithms-1.20653 The "doomsday" version: http://time.com/4471451/cathy-oneil-math-destruction/ 6 Illusions Execs Have About Big Data https://www.entrepreneur.com/article/281809 Michele: Hadoop release 3.0.0-alpha1 available http://hadoop.apache.org/releases.html#03+September%2C+2016%3A+Release+3.0.0-alpha1+available Running Spark on Alluxio with S3 https://www.oreilly.com/learning/running-spark-on-alluxio-with-s3 47:00 The pro's and con's of crafting your own distribution With our special guest Michele Lamarca (@nonfacciocip). Many thanks to Michele for being on the podcast with us and sharing his experiences! 
01:34:59 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-09-27 15:00:16

Episode 24 – Hadoop Summit Melbourne 2016 Preview

With Hadoop Summit Melbourne 2016 starting the day after we are recording this episode, we go over the published agenda and discuss the current state of the Big Data Technology ecosystem while we pick our favorite sessions. Wish we were there! 00:00 Recent events Dave Cloud Security Alliance release cloud and big data security guidelines http://siliconangle.com/blog/2016/08/28/the-cloud-security-alliance-publishes-its-best-practices-for-big-data-security/ https://cloudsecurityalliance.org/download/big-data-security-and- privacy-handbook/ Common Big Data Backup and Recovery myths http://www.networkworld.com/article/3113036/big-data-business-intelligence/debunking-the-most-common-big-data-backup-and-recovery-myths.html Big Data, Google, and the end of free will http://www.ft.com/cms/s/2/50bb4830-6a4c-11e6-ae5b-a7cc5dd5a28c.html Jhon SuperComputing now going to hadoop style systems https://techcrunch.com/2016/05/24/crays-latest-supercomputer-runs-openstack-and-open-source-big-data-tools/ The Home for Data Science https://www.kaggle.com/ 36:10 Hadoop Summit Melbourne 2016 Preview 01:07:33 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-09-13 15:00:15

Episode 23 – Security in Hadoop – Authentication

In this episode, we discuss this fortnight's interesting big data news that caught our eye and then go on to discuss the basics around authentication in Hadoop for what is the first in a series of episodes that we'll be doing over the next few months on the broad topic of security. 00:00 Recent events Dave: The new science behind customer loyalty http://insights.principa.co.za/the-new-science-behind-customer-loyalty http://insights.principa.co.za/infographic-creating-a-data-driven-customer-loyalty-strategy 5 great charts in 5 lines of R code http://blog.revolutionanalytics.com/2016/08/five-great-charts-in-5-lines-of-r-code-each.html Using big data to create value for customers, not just target them https://hbr.org/2016/08/use-big-data-to-create-value-for-customers-not-just-target-them Jhon: Linux turns 25 (25 August 1991 ) https://www.linux.com/news/linus-torvalds-reflects-25-years-linux http://web.archive.org/web/20100104211620/http://www.linux.org/people/linus_post.html Hadoop 2.7.3 a minor release in the 2.x.y release line, building upon the previous stable release 2.7.2 http://hadoop.apache.org/docs/r2.7.3/ Specification work related to the Hadoop Compatible Filesystem (HCFS) effort. Hadoop in the cloud/as a service getting a lot of attention lately http://hortonworks.com/blog/making-elephant-fly-cloud/ http://blog.cloudera.com/blog/2016/08/analytics-and-bi-on-amazon-s3-with-apache-impala-incubating/ https://vision.cloudera.com/analytic_database_in_cloud/ http://venturebeat.com/2016/08/25/sap-altiscale/ Facebook open sources image-recognition AI with live video in mind https://research.facebook.com/blog/learning-to-segment/ NoSQL Databases: a Survey and Decision Guidance https://medium.baqend.com/nosql-databases-a-survey-and-decision-guidance-ea7823a822d#.c037d5jbj Committer criteria from Apache https://hadoop.apache.org/committer_criteria.html Maybe they should just have referred to our podcast! 
:) Episode 11 - Interview with Community Award Winner Venkatesh Sellappa 40:20 Security in Hadoop - Authentication What is Authentication? Why is it important? When should I do it? Hadoop is insecure by default without strong Authentication Kerberos Active Directory, MIT Kerberos and FreeIPA 01:07:49 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-08-30 15:00:33

Episode 22 – Big Data in Small Business

The main subject in this episode features an answer to a listener question we received a couple of months ago: How can big data help small businesses? What ways can small business use big data? At the moment all the talk is about big data helping enterprise firms. And we are introducing a new section which we hope you will enjoy! 00:00 Recent events Working with a new team in sunny Cork, getting them up to speed Workshop with a global SI and a European tel-co about the upcoming phases of their big data journey Workshop with a customer who has been using Hadoop for a very long time, since Hadoop 0.2! Finally looking to migrate into the future Multi vendor workshop fraud analytics Object recognition and detection in images. 11:30 Our very own "New and Noteworthy" Dave http://blogs.teradata.com/international/streaming-analytics-story-many-tales/ http://www.datasciencecentral.com/m/blogpost?id=6448529%3ABlogPost%3A453888 http://research.ibm.com/cognitive-computing/ostp/rfi-response.shtml http://dataconomy.com/10-online-big-data-courses-2016/ Jhon Apache Spark 2.0 (July 28, 2016) http://spark.apache.org/releases/spark-release-2-0-0.html Unifying DataFrame and Dataset (RDD): In Scala and Java, DataFrame and Dataset have been unified, i.e. DataFrame is just a type alias for Dataset of Row. SparkSession: new entry point that replaces the old SQLContext and HiveContext for DataFrame and Dataset APIs. MLLib: The DataFrame-based API is now the primary API. The RDD-based API is entering maintenance mode. Spark 2.0 substantially improved SQL functionalities with SQL2003 support. 
Spark SQL can now run all 99 TPC-DS queries Ships the initial experimental release for Structured Streaming, a high level streaming API built on top of Spark SQL Databricks article: https://databricks.com/blog/2016/07/28/continuous-applications-evolving-streaming-in-apache-spark-2-0.html Apache Mesos 1.0 released https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces97 http://techblog.netflix.com/2016/07/distributed-resource-scheduling-with.html Apache Twill becomes top level project http://twill.apache.org/ https://blogs.apache.org/foundation/entry/apache_software_foundation_announces_apache1 44:40 Big Data for Small Business Define "small business" How can big data help small businesses What ways can small business use big data The problems a small business could face http://www.columnfivemedia.com/100-best-free-data-sources-infographic Our answers to those problems Some conclusions 01:32:35 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-08-16 15:00:00

Episode 21 – The Open Data Platform Initiative

This episode we have an interview with John Mertic about ODPi. There has been plenty of mystery and even some controversy about ODPi which we attempt to resolve for you. Big thanks to John for giving us some of his time for this interview! Sadly, this time the Skype Gods were not with us and we experienced some drops and hitches. We tried to smooth things over as much as possible, but we were not able to achieve our usual level of quality this time. 00:00 Recent events Vacation for Dave Study for Jhon 10:40 Interview with John Mertic @ ODPi https://www.odpi.org/ John Mertic, Director of Program Management for ODPi and Open Mainframe Project Find John on twitter: @jmertic If you're not familiar with the ODPi here are a few good links to get you started and interested in the area: Links to the ODPi Specifications: https://www.odpi.org/specifications Watch an interview with Alan Gates who discusses what the ODPi is trying to do to simplify the big data world: https://www.youtube.com/watch?v=Vogw33pbNOE Watch an interview with John Mertic who discusses how the ODPi compliance affects upstream Hadoop components: https://www.youtube.com/watch?v=siEkCutk_f8 56:30 Questions from our Listeners No questions this episode... ask us more questions and we'll answer them! 59:22 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-08-02 15:00:00

Episode 20 – Dave’s Hadoop Summit San Jose 2016 Retrospective – Part 2

In this second part, we discuss the sessions that Dave attended at the San Jose Hadoop Summit and we go in depth on some related topics. Since we ran over an hour with the main topic, and we did not want to make this a three-parter, we decided to forgo the questions from the audience just this one time... 00:00 Recent events Vacation time! Edx.Org Big Data Courses 04:00 Dave's Hadoop Summit San Jose 2016 Retrospective - Part 2 Session 1: End-to-End Processing of 3.7 Million Telemetry Events per Second Using Lambda Architecture, by Saurabh Mishra @ Hortonworks and Raghavendra Nandagopal @ Symantec Talking point: Hero-culture or why nobody wants to talk about failure anymore Session 2: Top Three - Big Data Governance Issues and How Apache ATLAS resolves it for the Enterprise, by Andrew Ahn @ Hortonworks Talking point: Guaranteed Governance, who certifies the certificate? Session 3: IoT, Streaming Analytics and Machine Learning: Delivering Real-Time Intelligence With Apache NiFi, by Paul Kent @ SAS and Dan Zaratsian @ SAS Talking point: Commercial solutions versus build your own in open source Session 4: Productionizing Spark on YARN for ETL at Petabyte Scale, by Ashwin Shankar and Nezih Yigitbasi @ Netflix Talking point: Is Hadoop still a low-cost commodity affair? Session 5: Analyzing Telecom Fraud at Hadoop Scale, by Sanjay Vyas @ Diyotta Talking Point: Do commercial, proprietary products have a place at Hadoop Summit or are they just marketing fluff? 01:06:28 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-07-19 15:00:00

Episode 19 – Dave’s Hadoop Summit San Jose 2016 Retrospective

Dave went to the Hadoop Summit 2016 in San Jose last week and came back with a riveting tale to tell. In this first part of the Summit coverage, join me when I ask Dave all about the keynotes and the general event. Join us next episode where Dave will talk about some of the sessions he attended! 00:00 Recent events Lift and shift to IaaS Hybrid Disaster Recovery Spark & ML goodness MOOC's San Jose Hadoop Summit 09:25 Dave went to the Hadoop Summit in San Jose! Record attendance, maybe a venue change in future Sponsor exhibition area including "interesting" story The Community Corner The keynotes Hadoop is 10 years old Microsoft on Machine Learning Hadoop Assemblies Hadoop fragmentation Cyber security Car insurance premiums "to measure" Ethics session 40:55 Questions from our Listeners Beefy feedback from Kris A listener wants to know if it is worth the trip to go to the US Summit or to just go to the "local" Summit, wherever that is. Nishant would like an episode about the entire ecosystem. What do you think? 48:24 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-07-05 15:00:00

Episode 18 – MLeap interview: Productionising Data Science – Part 2

In this episode, we have the second part of the interview with Hollin Wilkins and Mikhail Semeniuk, the driving forces behind the MLeap project, where they go into more technical details and give tips on deploying MLeap in your environment. If you are working with Spark, are deep into machine learning and are struggling to put those beautifully trained models into production, you definitely do not want to miss this episode! 00:00 Recent events Yet more telco security, again. RFI for European energy company followed by "the RFI rant" Metronnnnnnnnnnn Big Data Hackathon for an airline company predicting delays Preparing an IoT hackathon on predictive maintenance Spreading the word on MLeap at a couple of customers! 11:22 Interview on MLeap with Hollin Wilkins and Mikhail Semeniuk Part 2 http://combust.ml/ http://combust.ml/blog/2016/03/30/flexible-akka-clients-and-servers-part-1.html https://github.com/TrueCar/mleap https://github.com/TrueCar/mleap-demo 35:25 Questions from our Listeners Are there other technologies that allow machine learning models to be exposed as "web" APIs? Zeppelin multi tenant right now? 43:17 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-06-21 15:00:00

Episode 17 – MLeap interview: Productionising Data Science

In this episode, we have an interview with Hollin Wilkins and Mikhail Semeniuk, the driving forces behind the MLeap project. If you are working with Spark, are deep into machine learning and are struggling to put those beautifully trained models into production, you definitely do not want to miss this episode! 00:00 Recent events Machine Learning Hackathon on Azure Strata Europe Fighting with Kafka 09:30 Interview on MLeap with Hollin Wilkins and Mikhail Semeniuk Meet Hollin and Mikhail today (7-Jun-2016) at Spark Summit 2016 in San Francisco! https://spark-summit.org/2016/events/mleap-productionize-data-science-workflows-using-spark/ http://combust.ml/ http://combust.ml/blog/2016/03/30/flexible-akka-clients-and-servers-part-1.html https://github.com/TrueCar/mleap https://github.com/TrueCar/mleap-demo 40:50 Questions from our Listeners The Episode 12 mystery unraveled Nifi works well for prototyping, but what's your view on using Nifi in production in a normal DTAP (Development, testing, acceptance and production) environment? 54:00 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-06-07 15:00:00

Episode 16 – Interview part two with Sumeet Singh – Senior Director, Cloud and Big Data Platforms @ Yahoo!

Hopefully you enjoyed the first part of our interview with Sumeet. Here is part two, where we go into more detail about Yahoo's use of Hadoop, with lots of interesting topics coming up including the splintering of the ecosystem, governance and much much more. 00:00 Recent events Customer and partner adventures with Apache Nifi Jhon is settling in at Microsoft but is unfortunately quite jet-lagged. 08:15 Part two of our interview with Sumeet Singh - Senior Director, Cloud and Big Data Platforms @ Yahoo! 39:05 Questions from our Listeners Is Apache Atlas ready for production today? 46:35 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-05-24 15:00:00

Episode 15 – Interview with Sumeet Singh – Senior Director, Cloud and Big Data Platforms @ Yahoo!

Having met Sumeet at the Hadoop Summit we thought he'd make a great guest for the podcast, so here he is for your listening pleasure! 00:00 Recent events Louder! iTunes and the missing episode 12 Jhon's new role at Microsoft Hadoop as a Service A fortnight of SAS + Hadoop Metron teething troubles https://issues.apache.org/jira/browse/METRON-136 17:50 Interview with Sumeet Singh - Senior Director, Cloud and Big Data Platforms @ Yahoo! 42:50 Questions from our Listeners One data-lake for all workloads? Or separate clusters for each set of workloads? How large a team do I need to manage a Hadoop cluster? 1:00:56 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-05-10 15:00:00

Episode 14 – Hadoop Summit – Retrospective

After the last two special edition episodes where we quickly covered each Summit day in a "same-day" episode, we go over the full event in this episode, highlighting the sessions we enjoyed the most and sharing our general feelings about the 2016 Hadoop Summit in Dublin. 00:00 Recent events Summit! Sessions on YouTube Meetings and planning, Apache Metron https://cwiki.apache.org/confluence/display/METRON/Metron+Wiki https://community.hortonworks.com/articles/26047/apche-metron-tp1-blog-series.html Setting up a new podcast recording "studio" 09:00 Hadoop Summit - Retrospective Summit Schedule App Hortonworks emphasising Streaming ingest using Nifi, but the other talks did not so much Summit video sessions are starting to appear online https://www.youtube.com/channel/UCAPa-K_rhylDZAUHVxqqsRA/videos Next year: Munich Day one sessions: It's not the size of your cluster, It's how you use it Big Fish - David Darden & Don Smith Unified stream and batch processing with Apache Flink Artisans Gmbh - Ufuk Celebi Taming the Elephant Hortonworks - Paul Codding How To: A beginners guide to becoming an Apache contributor Teradata - Venkatesh On-Demand HDP Clusters using Cloudbreak and Ambari Symantec - Karthik Karuppaiya & Narendra Bidari Machine Learning in Big Data - Look Forward or be left behind Redpoint Global Inc - Bill Porto Past, Present, Future of Hadoop at LinkedIn LinkedIn - Carl Steinbach Migrating Hundreds of Pipelines in Docker Containers Spotify - Noa Resare Day two sessions: MLeap: Or how to Productionize Data science workflows using Spark Shift Technologies - Mikhail Semeniuk & TrueCar - Hollin Wilkins Scaling out to 10 Clusters, 1000 Users, and 10,000 Flows: The Dali Experience at LinkedIn Carl Steinbach, LinkedIn Hadoop Platform at Yahoo: A Year in Review Sumeet Singh, Yahoo!, Inc. Apache Hive 2.0 SQL Speed Scale Hortonworks - Alan Gates Telematics with Hadoop and Nifi Adam Morton, Admiral Insurance - Simon Elliston Ball, Hortonworks Apache Eagle - Monitor Hadoop in Real-Time eBay - Young Zang & Arun Manoharan 43:18 Questions from our Listeners Great question in from Rene about small businesses and Big Data which we’ll cover on a future episode! Also Rene's feedback has helped us tweak the feedback form so it’s easier to use. Is this a vendor podcast? No, we’re all community! :o) How do you record the podcast, what is your equipment? Skype-saurus: the original, expensive hardware solution. http://www.leoville.com/the-skypesaurus-story (Sadly, this no longer seems to be available.) Skype-o-saurus: a cheaper solution using an OS X aggregate sound device. https://drupalize.me/blog/201504/recording-podcasts-creating-skype-o-saurus 51:48 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-04-26 15:00:00

Episode 13 – Hadoop Summit Dublin 2016 – Day 2

Welcome to our second special edition podcast brought to you from day 2 of the Hadoop Summit. Breaking our normal fortnightly flow, we're delivering a fresh new podcast at the end of each day of the Hadoop Summit. In this episode we cover our impressions of the second day of keynotes and yet more sessions that we enjoyed. 00:00 Recent events Introduction to the Hadoop Summit Dublin 2016 from day 2 01:45 Hadoop Summit 2016 Dublin Day 2 Review Keynote/Session - Yahoo! - Sumeet Singh Keynote - Information is Beautiful - David McCandless http://www.informationisbeautiful.net/ MLeap - Mikhail Semeniuk (Shift Technologies) Hollin Wilkins (TrueCar) Admiral - Adam Morton (Admiral) and Simon Ball (Hortonworks) Hive - Alan Gates (Hortonworks) 37:47 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-04-15 05:37:00

Episode 12 – Hadoop Summit Dublin 2016 – Day 1

Welcome to our special edition podcast brought to you from day 1 of the Hadoop Summit. Breaking our normal fortnightly flow, we're delivering a fresh new podcast at the end of each day of the Hadoop Summit. In this episode we cover our impressions of the keynotes and some of the sessions we enjoyed during day 1. 00:00 Recent events Introduction to the Hadoop Summit episode for day 1 01:40 Main Topic Some comments from attendees as to what they're looking forward to at the event Conversation about the keynotes and the sessions we enjoyed 29:38 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-04-14 04:57:00

Episode 11 – Interview with Community Award Winner Venkatesh Sellappa

Venkatesh is a new contributor to Apache NiFi, and during his talk at the Hadoop Summit next week he takes a light-hearted look at his journey to becoming a contributor to an Apache project. Venkatesh is one of the Community Choice winners, so congratulations are in order and we are certain you will like this interview! Enjoy, and we look forward to seeing you at the Hadoop Summit in Dublin next week! 00:00 Recent events Easter Break Big Data Analytics Big Telco workshops/meetings and sessions stuff Domain Knowledge is important 05:40 Main Topic Interview with Venkatesh Sellappa 33:50 Questions from our Listeners: No questions this time but information on our activities during the upcoming Hadoop Summit. 37:18 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-04-05 15:00:00

Episode 10 – Preparing for the 2016 Hadoop Summit in Dublin

Next month, the European Hadoop Summit will take place in Dublin. Now that the agenda for the event has been nearly finalised, we take it upon ourselves to provide a virtual guide to the event. There are a lot of good things happening during the event, so we share with you what sessions we think we'll be attending and why. Enjoy, and looking forward to seeing you there! This is another long episode, going over an hour for the first time. We are really curious to know if you like these longer episodes, or if you would prefer it if we kept it under the original 30 to 35 minutes? 00:00 Recent events Hands on upgrading, express vs rolling upgrade Workshop at telecom company in Russia Nifi workshops Securing a Hadoop cluster 08:00 Main Topic Dave has assembled some statistics on the type of sessions available. What sessions we would attend and why. http://hadoopsummit.org/dublin/agenda/ General advice to visitors mixed in... 54:30 Questions from our Listeners: What else is going on during the summit dates? Should I visit the Hadoop Summit and if so, go to Europe, the US or Australia? How do I get a speaking slot at summit? https://hadoopsummit.uservoice.com/ What other events are comparable/useful to visit? 01:03:50 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-03-22 15:00:00

Episode 9 – SQL in Hadoop

SQL was one of the first data access methods added to vanilla Hadoop. Considering that many of the people working with Hadoop in the early days came from a database background, this is not surprising. Since then, the SQL ecosystem in Hadoop has grown considerably and in this episode we do a general overview of many of the available choices. This episode runs a bit longer than normal but we hope you'll find it worthwhile! 00:00 Recent events Spark masterclasses NiFi on trains MiFID II and the active archive Mobile World Congress 08:30 Main Topic SQL solutions: Apache Hive https://hive.apache.org/ Apache Spark SQL http://spark.apache.org/sql/ Apache Phoenix https://phoenix.apache.org/ Apache Impala (incubating) https://www.cloudera.com/products/apache-hadoop/impala.html Apache Hawq (incubating) http://hawq.incubator.apache.org/ Apache Drill https://drill.apache.org/ Presto https://prestodb.io/ Oracle Big Data SQL http://www.oracle.com/us/products/database/big-data-sql/overview/index.html IBM Big SQL http://www-01.ibm.com/software/data/infosphere/hadoop/big-sql.html Technology topics: JDBC/ODBC SQL syntax compliance Multi-user concurrency Benchmarks 46:40 Questions from our Listeners: How much storage overhead should I count on if I add SQL in my Hadoop workflow? How do I make my SQL faster? 53:38 End Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

2016-03-08 16:00:00
