Data trends

useful to set up proof-of-concept front ends very quickly, but comes with its own bag of problems that can make it unsuitable for storing AEM datasets. There are no backups, data duplication is an issue, there is no fixed latency (it can be very high sometimes), and billing transparency is inadequate, amongst a range of other factors, none of which make it ideal for a production backend. However, there is Google Cloud Bigtable, a cloud-native NoSQL wide-column store for large-scale, low-latency workloads that can fit well with AEM datasets. If we want insight from our data, it can also be plugged into BigQuery. However, there is an hourly minimum price, which means there will be mandatory minimum costs during exploration and development.

If we want to stay in Google Cloud but keep vendor lock-in low, kind of like DocumentDB in AWS, our best option would be MongoDB Atlas. This offering is from MongoDB and can be used on any cloud platform. It is an additional NoSQL database provided through Google Cloud Partner Services and comes with a free tier that can be utilised for testing out the service and for development. It also supports AWS and Azure apart from Google Cloud, making migration much easier for us.

Azure has Cosmos DB on offer from the NoSQL range. Cosmos DB takes pride in fast read and write times, with single-digit millisecond response times and 99.999 percent availability. As with AEM datasets we are mainly concerned with reads and writes to the database, this can be a very fast and reliable option. Testing and development can also be free, which will keep it in front of many managed database solutions.

Azure also has a managed instance option for Cassandra, a very popular database. However, this tool may be too grand for the task at hand unless you have your own data centres and want to approach a hybrid on-prem and cloud strategy. In hindsight, this would be a no vendor lock-in option, and might usefully be considered for a very large platform with thousands of AEM datasets.

Just picking a cloud vendor doesn't solve all our problems; we still must think about database tenancy options. For example, will all the datasets from all the users stay in the same database table? Or should we have separate tables for each user? Should datasets belonging to the same user stay in the same table? Should this be the case even if we do not need to query across datasets, let alone query datasets across users?

These concerns prompt questions about possible needs and benefits. Having a database per user makes accidental data sharing across users impossible. This is useful, as AEM data collection is a difficult task and there can be IP issues around data. Yes, it slows down engineering and requires a bit more skill and thoughtfulness to implement. But, for a sustainable product, security makes sense. Also, there is not a compelling need to make the datasets share a database.

After we split the database by user, the next question would be about how we store multiple datasets from the same user. When we are processing the data, running an inversion algorithm or wanting to map results, the processing stream is generally set up with a specific dataset in mind. There isn't a big need to run queries or do analysis across all datasets. Considering how big a dataset can be, storing several in the same table can lead to having massive documents without reaping significant benefits. We could think about partitioning, or we could simply have a table per dataset. In NoSQL terms, that will be just a new document every time a dataset gets created. We will have smaller, succinct documents, and reads and writes would be much faster. We can also achieve dataset isolation this way, and modifying existing documents becomes very infrequent.

At the end of the day, it will boil down to how much time and resources you want to spend on developing and maintaining a database solution. Deploying a NoSQL database on a Kubernetes cluster, achieving database multi-tenancy and a table per dataset, will require significant resource allocation. If you want to cut down on resources, the first thing to look at will be replacing the Kubernetes solution with a managed database, something like MongoDB Atlas, managed Cassandra or DocumentDB. This will enable you to achieve low cloud vendor lock-in without breaking your back on engineering. If there is significant in-house knowledge about a particular cloud platform, most other resources are already on the same cloud, and there is little organisational incentive to switch cloud platforms or have multi-cloud solutions now or in the near future, you could even go for a fully managed and integrated cloud database solution offered by one of the cloud platforms. This will keep the solution in the engineering comfort zone, while freeing up more resources to be used on the implementation of the database tables and applications.

Choosing a database solution for cloud applications presents a variety of options, all with their own pros and cons. It is important to evaluate organisational capability and product needs before making any decision. My personal pick of a solution for AEM datasets would be a NoSQL managed offering from one of the cloud platforms, but hey, the world is your oyster.

Shouv Sarker is a software engineer by training, and a passionate advocate of functional programming practices. He is currently working in the Mineral Resources team at CSIRO on an exploration and geophysical processing toolkit to assist with the interpretation and inversion of geophysical survey data. When he takes his programmer hat off, Shouv likes to write fiction, and get more screen time with video games.

FEBRUARY 2022 PREVIEW 40
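
The tenancy layout discussed in the article (a database per user, and a table per dataset inside it) can be illustrated with a small sketch. This is only an in-memory stand-in, not any vendor's API: the `TenantStore` class, the `_slug` helper, the `user_` and `dataset_` name prefixes and the sample records are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass, field


def _slug(name: str) -> str:
    """Lower-case a name and replace runs of other characters with '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


@dataclass
class TenantStore:
    """In-memory stand-in for a NoSQL service: {database: {table: [records]}}."""

    databases: dict = field(default_factory=dict)

    def table_for(self, user: str, dataset: str) -> list:
        # One database per user keeps tenants apart;
        # one table per dataset keeps each table small.
        db = self.databases.setdefault(f"user_{_slug(user)}", {})
        return db.setdefault(f"dataset_{_slug(dataset)}", [])

    def write(self, user: str, dataset: str, record: dict) -> None:
        self.table_for(user, dataset).append(record)

    def read(self, user: str, dataset: str) -> list:
        # A read is resolved inside the caller's own database only,
        # and does not create anything if the dataset is missing.
        db = self.databases.get(f"user_{_slug(user)}", {})
        return list(db.get(f"dataset_{_slug(dataset)}", []))


# Hypothetical users, dataset names and records.
store = TenantStore()
store.write("Alice", "Survey 2021", {"line": 1001, "fiducial": 1})
store.write("Alice", "Survey 2022", {"line": 2001, "fiducial": 1})
store.write("Bob", "Survey 2021", {"line": 3001, "fiducial": 1})
```

Because every lookup starts from the user's own database name, two users can hold datasets with the same name without ever seeing each other's records. In a real system the slug step would also need to guard against two different names mapping to the same identifier.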