Database and Datascience: Development on Oracle Autonomous Database

Development on Oracle Autonomous Database

Author: Donatien MBADI OUM, Oracle Consultant

1. Introduction

In this section, we are going to talk about the recently released Autonomous JSON Database (AJD). Oracle makes it easy to develop JSON-centric applications. We will also talk about how you can create graph models from data in your database. You can perform graph analysis, develop graph visualizations and graph application and then share your results. We will show you how you can expand your data ecosystem by integrating the data warehouse with data lake object storage.

After that, we show you how to perform linguistic analysis on documents and search text using Oracle Text and get started with location intelligence analytics and mapping services using a spatial offerings.

2. Terms and concepts

ADB supports the JavaScript Notation, also known as JSON, natively in the database. It supports applications that use the Simple Oracle Document Access (SODA) API to store and retrieve JSON data or SQL queries to store and retrieve data stored in JSON formatted data. This concept is named as Oracle’s converged database. Oracle AJD is Oracle Autonomous Transaction Processing (ATP) but it’s designed for developing NoSQL-Style applications that use JSON documents.

You can promote an AJD service to ATP. Development of these NoSQL-Style document-centric applications is particularly flexible because the applications use schema list data; this lets you quickly react to changing application requirements. There is no need to normalize the data into relational tables and no impediment to changing the data structure or organization at any time. A JSON document has its own internal structure, but no relation is imposed on separate JSON document.

AJD give you the ability to create highly complex SQL-based queries for reporting and analysis purposes. It has built-in binary JSON storage type which is extremely efficient for searching and for updating. It also provide advanced indexing capabilities on the actual JSON data. It’s built on ADB so that gives you all of the self-driving capabilities for ADB.

3. Introduction of JSON Document

JSON stands for JavaScript Object Notation. It was originally developed as a human readable way of providing information interchange between different programs. A JSON document looks like this:

{

"id": "0001",

"type": "donut",

"name": "Cake",

"ppu": 0.55,

"batters":

{

"batter":

[

{ "id": "1001", "type": "Regular" },

{ "id": "1002", "type": "Chocolate" },

{ "id": "1003", "type": "Blueberry" },

{ "id": "1004", "type": "Devil's Food" }

]

"topping":

[

{ "id": "5001", "type": "None" },

{ "id": "5002", "type": "Glazed" },

{ "id": "5005", "type": "Sugar" },

{ "id": "5007", "type": "Powdered Sugar" },

{ "id": "5006", "type": "Chocolate with Sprinkles" },

{ "id": "5003", "type": "Chocolate" },

{ "id": "5004", "type": "Maple" }

]

}

A JSON document is set of Fields; each of these fields has a Value and those values can be of various data types. We can have simple Strings, Integers, Real numbers, Booleans, Date, Null values. Additionally, values can be Objects and objects are whole JSON documents embedded inside a document. There is no limit on the nesting. Finally, we can have Arrays and arrays can have a list of scalar data types or list of objects.

JSON documents are stored in something called Collections. Each document may have its own schema, its own layout, to the JSON. So this is mean that JSON document databases are schemaless but you can a check constraint to enforce a schema constant that you wish to introduce to your JSON data. You can also create indexes on a JSON collection and those indexes can be of various types:

- Unique index
- Single field
- Compound index
- Flexible Search index

4. Using AJD

Since Oracle database 12c, JSON documents can be stored, queried, searched, updated on generated in Oracle Database:

- As varchar, clob, blob data type in 12c

- As JSON data type with an efficient binary format in Cloud 19c

This allows you to combine schema flexibility of JSON with strengths of relational model in one database system. So you can go from one to the other and easily manipulate JSON as though it was a relational database or relational data even though it’s JSON data. Because you are storing the JSON in an Oracle database, you get the full benefits of the ACID transaction model on your JSON data like you get with your relational data.

A primary unit of storage for JSON is a Document. Each document has a key associated with it and we can use that key to do a CRUD (Create, Read, Update, Delete) operations. We can query the document using Query by Example language, so we don’t need to use or even know SQL to be able to query that document. A group of documents is called a collection, so documents live inside the collection. The collection is identified by a name and a group of collection is known as the database; within an Oracle relational model, that is represented by a schema. So each schema can represent one database. A document is like a Row and a collection is like a Table. A JSON document itself is represented by a row in an Oracle table:

We have there, its key and a few metadata columns like the version, the creation date and the last modified date.

AJD is built JSON-centric application document:
Accessed via document APIs
SQL foe cross-collection queries, reporting and analysis
Native JSON storage, in-memory optimizations and advanced indexing

AJD is based on the common ADB platform:

- Fully managed database, No DBA required
- Instant autos calling
- Lifecycle operations controlled via interface, APIs and CLIs
- One-button in-place upgrade to full ADB for multi-model workloads

AJD contains (virtually) ALL functionality of ATP and ATP contains ALL functionality of AJD. The difference between AJD and ATP are:

- AJD is limited to 20GB of storage outside of SODA JSON collections
- AJD cannot store a heterogeneous collection (i.e. JSON and non-JSON documents) via SODA
- AJD is lower in cost (priced same as ATP BYOL, but license is included)

So what is SODA?

It’s a set of NoSQL APIs and it’s provide for a number of different programming languages (Java, JavaScript/Node.js, Python, REST, PL/SQL, C). Through SODA, you can create collections, store documents to those collections, retrieve documents by key, or query them using a simple Query by Example syntax. And then, there is no need to know SQL in order to use SODA.

So what is SQLcl?

It’s a modern SQL Developer Command Line Interface for Oracle Database that provide inline editing, statement completion and command recall. It can also use SODA commands.

Let’s present you the solution architecture used by AJD:

On the top, we have the user view of the application and they might be running a web application, mobile application or any other simple user-friendly API.

In the middle, we have application code, which will be running in Oracle’s container engine for Kubernetes, also known as OKE. This allow applications to be built using Docker containers. Those containers are transferred over to OKE, which takes charge of them and provides full container management capabilities.

Now those applications running in OKE would talk to the JSON document store via SODA API.

To the right of that, we can see that you can also do SQL-based reporting which connect directly to the Oracle Database over SQL*Net.

So the development flow for developer is:

- Provision your database using the Web interface, Command Line or REST API

- Create local Docker container where you can do the bulk of development work and it’s local on your machine

- Develop in Docker connecting Autonomous document DB

- Deploy Docker container to Oracle Container for Kubernetes. OKE will then take over the maintenance of your Docker container. It will look after running multiple copies of it as is necessary. If any of those copies crash or go down for any reason, it will simply spool up new versions.

Example of application:

5. Using Oracle Text

Oracle database in not only a relational store but also a document store. You can load test and JSON assets along with your relational assets in a single database. Traditionally, Oracle has been all about SQL development, but SQL language does take some advanced knowledge to make the best of it. So SQL requires you to define your schema up front and making changes to that schema could be a little tricky and sometimes highly bureaucratic task. In contrast, JSON allows you to develop your schema as you go, by imposing less rigid requirements on the developer. It allows you to be more fluid and Agile development style.

Oracle text use standard SQL to index, search and analyze text and documents stored in the Oracle Database, in files and onto the web. It can perform linguistic analyses on documents as well as search text using a variety of strategies, including keyword searching, context queries, Boolean operations, pattern matching and mixed thematic queries like HTML/XML session searching and so on. It can also render search results in various formats including unformatted text, HTML with term highlighting and original document format.

Oracle text supports multiple languages and uses advanced relevance-ranking technology to improve search quality. It also offers advantage features like classification, clustering and support information visualization metaphors.

Finally, Oracle text provides:

- Indexing
- Word and theme searching
- View capabilities for text in query applications

- Document classification applications

To create a full text index, you create context index using the same syntax usually used, but add the phrase: Indextype is ctsys.context.

Examples:

create index ops_indx on customer(cust_comment) Indextype is ct sys.context;

So here, we are looking at “good” and “client” following each other:

select * from customer where contains(cust_comment,’good client’)>0;

For JSON column jtext, we can use textual attribute ‘description’

create index ops_indx on customer(jtext) for json;

select * from customer where json_textcontains(jtext,’$.description’,’good client’);

Oracle text is now enabled automatically in ADB. It provides full-text search capabilities over Test/XML/JSON content and all the power of Oracle database and a familiar development environment.

We can deal with text in many different places and many different types of text. So it is not just in the database but also outside the database. In the database, we can be looking a Varchar2 column, LOB column or binary LOB column such as PDF or Word. Outside of database, we might have a document on the file system or out on the web with URLs pointing out to the document. If they are on the file system, then we would have a file name stored in the database table and if they are on the web, then we should have a URL or partial URL stored in the database.

We can then fetch the data from the locations and index it in the term documents format. The basic forms we can deal with plain text are:

- HTML
- JSON
- XML
- Formatted documents like Word, PDF, PowerPoint documents
- Many others types of documents.

All of those are automatically handled by the system and then processed into the format indexing and there and we are not restricted by the English either here.

There are various stages in the index pipeline. A document start and taken through the different stages until reaches the index.

- Data Store is responsible for actually reaching the document.
- The filter is responsible for processing binary documents into indexable text
- The Sectioner is responsible for identifying things like paragraphs and sentences
- The Lexer divide the text into indexable words
- The index engine is responsible for laying out to the indexes on the disk

Storage, Word List and Stop List are some additional inputs there. Storage tells exactly how to layout the index on disk. Word List has special preferences like desegmentation and then Stop List is the word that se don’t want to index.

Most of these stages can be replaced by the user functions in PL/SQL, C code or Java code.

6. Spatial on ADB

Oracle spatial database is included in Autonomous Database, allowing developers and analysts to get started easily with location intelligence and mapping services using basic spatial search and analysis to advanced geospatial applications and Geographic Information Systems (GIS).

The common aspect for all spatial data is Universal Key. Location is a universal key relating otherwise unrelated entities.

Spatial features provide a schema and functions that facilitate the storage, retrieval, update and query of collections. Spatial consists of the following:

- A schema MDSYS that prescribes the storage, syntax and semantics of supported geometric data types

- A spatial indexing mechanism

- Operators, functions and procedures for performing area-of-interest queries. Spatial joins queries and other spatial operations

- Utility functions and procedures for validating, loading, extracting and working with spatial data

- GeoRaster, a feature that lets you store index, query, analyze and deliver GeoRaster data.

Spatial data are stored as points, lines, polygon and also imagery into ADB.

In high level of development with spatial, we have:

- Spatial Database APIs : SQL and PL/SQL are native in database for working with spatial but we also have Java (SDOAPI), Python (cx_oracle), Javascript (node-oracledb) and REST (ORDS)

- Mid-tier Spatial Component APIs: XML and REST (Mapping, Geocoding, Routing, Web Services [OGC] and Studio.

To load spatial data use the same tools as for other in ADB: Data Pump and Golden Gate. You can also use a spatial ETL tools like Safe Software FME or GDAL/OGR (Open source).

ADB natively handles JSON in general. In-database JSON is extended to support spatial operations. SDO_GEOMETRY constructors is extended to take JSON as input and spatial index and queries are extended to operate on JSON documents.

What is Spatial Studio?

Spatial Studio is a low-code self-service application that enables you to create interactive maps and perform spatial analysis on business data quickly and easily.

Users can visualize, explore and analyze geospatial data stored and managed by Oracle in the Cloud or on-premises.

Spatial studio include no additional cost with ADB, however, requires additional compute resource to deploy. It can run on free tier with a very small shape, just for evaluation or test. Spatial Studio is also available on the Oracle Cloud Marketplace. Yu can also use Spatial Studio to load spatial data into ADB. Once you have loaded your data or accessed that already exists in your ADB, if data does not already include native geometrics type, then you can prepare the data if it has addresses of if it has latitude and longitude coordinates as a part of the data. Once you have the data prepared, you can easily drag and drop and the start to visualize your data and start to ask spatial questions.

When you done some work, you can save your work in a project that you can return to later and you can also publish and share the work you have done.

In addition, with the work that you do when you perform spatial analysis using Spatial Studio, you are able to see the SQL that was generated by Spatial Studio. You also have a GeoJSON endpoint so you can access all the results programmatically and get the results back on JSON.

7. Graph on ADB

Graph is way of representing your data. It’s a collection of points (vertices) and lines between those points (edges). Vertices and edges can have properties.

For example, you have Melli, Jean and John who could account holders ate the bank. And every time there is cash transfer between them. You have what is called edges between them. So graph helps make explicit connection in your data entities and in the properly graph data model, vertices and edges can have properties.

Database and Datascience

lundi 22 août 2022

Development on Oracle Autonomous Database

Aucun commentaire:

Enregistrer un commentaire

oracle

oracle