It is 2020: there is no more time for large, expensive clusters.

These days, a modern Data Lake built in a Cloud environment should use Cloud Native, Serverless services as much as possible, to gain the full agility, elasticity, and efficiency provided by the Public Cloud paradigm.

In this note, I want to briefly highlight how such a Data Lake can be built using Oracle Cloud Infrastructure (OCI) and how we can use Oracle Autonomous Data Warehouse (ADWH) to provide SQL access to data stored in the Data Lake.

I’ll briefly describe the best option for storing the data and the services available to extract, transform, and load it. Then, I’ll address the steps needed to enable SQL access. …

Why Kaggle?

I started using Kaggle seriously a couple of months ago when I joined the SIIM-ISIC Melanoma Classification Competition.

The initial reason, I think, was that I wanted a serious way to test my Machine Learning (ML) and Deep Learning (DL) skills. At the time, I was studying for the Coursera AI4Medicine Specialization and I was intrigued (and still am) by what can be achieved by applying DL to Medicine. I was also reading the beautiful book by Eric Topol, Deep Medicine, which is full of interesting ideas about what could be done.

I had opened my Kaggle account several years earlier, but hadn’t yet done anything serious with it. Then I discovered the Melanoma Challenge, and it seemed a really good way to start working on a difficult task, with real data. …

Should we fear that automated decisions made by Machine Learning models are unfair?

(Photo by Deon Black on Unsplash)


I was preparing a webinar and thinking about which subjects could be interesting for it, among the many that came to mind.

I thought that, to succeed in a Machine Learning project, you should be keenly aware of the pitfalls: the points during the project where you can make serious mistakes.

One area of concern is certainly bias. For this reason, I decided to devote some of my time to it and prepared the following notes.

What is Bias and why does it matter?

Well, let’s start with a definition. I will take the one from Wikipedia: “Bias is a disproportionate weight in favor of or against an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair. … People may develop biases for or against an individual, a group, or a belief. In science and engineering, a bias is a systematic error.” …

How do companies like Netflix discover what we might like?



A fundamental element of a modern e-commerce site is the recommendation engine: the component of the system that suggests to customers the products or items most likely to interest them.

But how does a recommendation engine work, and how can one be built? Is there a single technique, or, as we might imagine, can we choose which “soul” to give our site by choosing which technique its recommendation engine is based on?

And how have the most famous companies, Netflix for example, built their recommendation engines? …
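One of the simplest techniques a recommendation engine can be based on is collaborative filtering: recommend to a user the items that similar users liked. As a toy, hedged sketch (the users, items, and ratings below are invented for illustration, and real systems like Netflix’s are vastly more sophisticated), it might look like:

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating from 1 to 5}
ratings = {
    "anna":  {"MatrixMovie": 5, "SpaceDoc": 4, "RomCom": 1},
    "bruno": {"MatrixMovie": 4, "SpaceDoc": 5, "Thriller": 4},
    "carla": {"RomCom": 5, "Thriller": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity between two users, on the items they both rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = sqrt(sum(u[i] ** 2 for i in common)) * sqrt(sum(v[i] ** 2 for i in common))
    return num / den

def recommend(user, ratings, n=1):
    """Score items the user hasn't seen, weighting other users' ratings
    by how similar those users are to this one."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        for item, r in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(recommend("anna", ratings))  # -> ['Thriller']
```

This is the user-based flavor; item-based and model-based (matrix factorization) variants follow the same idea of exploiting the ratings matrix.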

How to easily provision an environment with GPU, H2O4GPU, TensorFlow, PyTorch, on Oracle Cloud.



If you want to develop a Deep Learning model, or perhaps use a Machine Learning framework like H2O, you will almost certainly benefit greatly from an environment with a GPU.

But setting up such an environment is not always easy. You need to find the right versions of the Nvidia drivers, TensorFlow, and all the other Open Source packages and tools, and you risk spending a lot of time simply trying to figure out which versions are correct and compatible.

In a previous article, I described all the steps needed to set up a VM in Oracle Cloud using Ubuntu 18 LTS. …

Running on a multi-CPU VM


If you read an introductory book on Machine Learning (ML), developing a model can seem like an easy, interesting, pleasant task. The reality, in a real business case, is quite different.

In the real world, developing an effective model can be a difficult and rather long task. To achieve good predictive power, you need a lot of data (the more the better), and often many attempts with different algorithms to find the best values for the model’s hyper-parameters.

That is why you often need high computational power: to be able to run many iterations in a reasonable time. …
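To make the idea concrete, here is a minimal, hedged sketch (the “training” function is a toy stand-in for a real model fit, and the grid values are invented) of spreading a hyper-parameter search across all the cores of a multi-CPU VM, using only Python’s standard library:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def train_and_score(params):
    """Toy stand-in for one training run: returns a 'validation score'
    for a (learning_rate, depth) pair. A real version would fit a model."""
    lr, depth = params
    return -((lr - 0.1) ** 2) - ((depth - 6) ** 2)

# Every combination of the two hypothetical hyper-parameters.
grid = list(product([0.01, 0.1, 1.0], [2, 6, 10]))

if __name__ == "__main__":
    # One training run per CPU core: this is where a multi-CPU VM pays off.
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(train_and_score, grid))
    best = grid[scores.index(max(scores))]
    print(best)  # -> (0.1, 6), the combination closest to the toy optimum
```

Libraries like scikit-learn offer the same pattern built in (for example an `n_jobs` parameter), but the principle is identical: independent runs, executed in parallel.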


If you want to develop a Deep Learning model and want a good likelihood to be successful, you need two things:

  • lots of data, for the training of the model
  • enough computational power, in the form of GPUs

A Deep Learning model is commonly based on a Deep (many-layered) Neural Network. The algorithm used to train the model is usually some form of the “back-propagation” algorithm, and it requires many “tensor” operations. For this kind of operation, a GPU is much more effective than a CPU, since it can execute with a degree of parallelism that you can’t achieve even with a modern CPU. An Nvidia P100 has 3584 cores and is capable of 5.3 …
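To see what a “tensor operation” means here, consider the matrix multiply at the heart of a dense layer’s forward pass. The deliberately naive, pure-Python sketch below (toy sizes, hypothetical weights) makes the key point visible: every output cell is independent of the others, so a GPU with thousands of cores can compute them all at once.

```python
def matmul(a, b):
    """Naive matrix multiply. Each output cell is an independent dot
    product, which is exactly what a GPU parallelizes across its cores."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

inputs = [[1.0, 2.0]]            # one sample, two features
weights = [[0.5, -1.0, 0.0],     # 2 x 3 weight matrix: one dense layer
           [0.25, 0.5, 1.0]]

print(matmul(inputs, weights))   # -> [[1.0, 0.0, 2.0]]
```

Back-propagation repeats multiplications like this (and their gradients) millions of times per training run, which is why the GPU’s parallelism dominates.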

OML4Py: a Python extension to Oracle Advanced Analytics.


These days I’m working, with my colleague C. De Bari, on setting up training for customers and partners where we will talk about Oracle Machine Learning for Python. Cool, but what is it?

Oracle DB has long had an option called Oracle Advanced Analytics. It gives you access to many widely used Machine Learning algorithms and lets you run them without moving data out of the DB.

Many algorithms are available: for regression, classification, anomaly detection, clustering, feature extraction, and so on. The list is long.

Running these algorithms inside Oracle DB not only gives you the power of Machine Learning without moving data away from the DB, with all the positive implications for security, efficiency, and so on, but also lets you leverage the underlying parallel DB engine. …

Using the OCI Python SDK and the Autonomous DWH API, you can easily load data for your Data Science and ML work.



In a previous article, I explored how we can use Python and popular Open Source frameworks, like Pandas and scikit-learn, to analyze data stored in Oracle Autonomous Data Warehouse (ADWC). In this shorter story, I want to show you how you can easily load even big data files using the OCI Python SDK.

Automation “on steroids”

One of the big features of the Cloud is Automation. …
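For a big data file, the upload to Object Storage (from which ADWC can then load it) is best done in parts. As a hedged sketch (the file path, part size, and bucket name below are placeholders, and valid OCI credentials are assumed for the real call), the stdlib-only helper shows the chunking idea, while in practice the OCI SDK’s `UploadManager` handles the multipart upload for you:

```python
def iter_parts(path, part_size=128 * 1024 * 1024):
    """Yield (part_number, bytes) pairs for a file, part_size bytes at a time.
    This is the idea behind a multipart upload: independent, resumable parts."""
    with open(path, "rb") as f:
        number = 1
        while True:
            chunk = f.read(part_size)
            if not chunk:
                return
            yield number, chunk
            number += 1

# In a real script, the OCI Python SDK does the multipart work for you,
# roughly like this (names and bucket are hypothetical):
#
#   import oci
#   config = oci.config.from_file()  # reads ~/.oci/config credentials
#   client = oci.object_storage.ObjectStorageClient(config)
#   oci.object_storage.UploadManager(client).upload_file(
#       namespace, "my-bucket", "train.csv", "/data/train.csv")
```

Once the object is in the bucket, ADWC can ingest it with its own loading tools, with no intermediate server in between.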

How to use Open Source tools to analyze data managed through Oracle Autonomous Data Warehouse Cloud (ADWC).


Oracle Autonomous Database is the latest, modern evolution of Oracle Database technology: a technology that makes managing and analyzing large volumes of data in the Cloud easier, faster, and more powerful.

ADWC is the specialization of this technology for Data Warehousing and advanced Data Analysis. It simplifies uploading and transforming data, and makes advanced analytical tools accessible to Business Users and non-DBAs: the tools that are part of a Data Scientist’s toolkit, so to speak.

In this article, however, I do not want merely to examine the tools available in ADWC in depth; I want to take a slightly different point of view. …


Luigi Saetta

Born in the wonderful city of Naples, but living in Rome. Always curious about new technologies and new things. I work in a big Cloud Company.
