Data trends

Choosing a cloud database solution for managing AEM survey datasets

Shouvojit Sarker
Senior Software Engineer
Mineral Resources, CSIRO
shouv.sarker@csiro.au

Airborne electromagnetics (AEM) is a method used for mineral exploration around the world, and is popular for its non-invasive nature and ability to provide extensive ground coverage (Figure 1). This popularity has led to a fair bit of investment over the years, and has created competing AEM systems. This means AEM datasets can originate from a variety of systems, use different naming conventions and work in different domains (time or frequency). This can make it challenging to run inversion algorithms and to store datasets at different stages of their lifecycles.

Figure 1. Collecting AEM data (from https://www.hgiworld.com/methods/airborne-electromagnetic-method-for-large-scale-geophysical-characterization-aem/)

To meet this challenge, and to be able to process datasets coming from most AEM systems, we can add additional steps to define the system and process the raw dataset. This will enable us to come up with a uniform, structured format for the data, but will also mean that we might have some leftovers from the processing that don't completely fit in with the format. We may want to keep these leftovers around for future use. In theory this data will be available in the raw dataset files anyway (as we should keep the old copy), but it is much more valuable and interpretable when it is associated with metadata, context and labels that provide additional insight. So, instead of discarding columns that we feel may not be required for the specific inversion we are running, we should allow the user to edit labels and add information before storing the entire dataset.

Figure 2. Processing AEM data

This leads us to datasets without a fixed number of columns. Also, row consistency may be impacted, as data may not be available for all the rows. We can prompt the user to rethink their input if this is done for required fields, but the same isn't true for optional ones. In some cases, we may even want to store single pieces of information about the dataset, which would be just one value instead of being an entire column (Figure 2).

There are some options to consider. We can have multiple tables for datasets: one for single values, one for the required columns and another for any additional data. We can even make all of them SQL, using dynamic columns for additional data so that we can accommodate any number of columns. If we want to consider efficiency, a multi-table approach may be better than having one dynamic table with nulls for missing values, given that most queries will be to retrieve required columns anyway. Or, we could look for a NoSQL solution.

With NoSQL we can create documents without declaring the structure upfront. While writing data, we can put checks in place to verify we have the required number of columns, but beyond that we will have full flexibility with our dataset. Also, if the purpose of storing these datasets is to run inversion algorithms in future, we do not really need cross-dataset querying capability. Retrieval of a single key-value pair is competitively fast. Presumably we will write the dataset once and there will be minimal modifications, as most have been done in the processing stage anyway. This allows us to afford a BASE consistency model instead of a more mutation-friendly ACID model.

Another important and decisive advantage NoSQL databases provide is the ability to scale horizontally (Figure 3). Each object is self-contained and independent, as it is not structured like an SQL database. So, they can be stored on multiple servers without being

FEBRUARY 2022 PREVIEW 38
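The write-time check and the schema-less document layout discussed in this article can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the required column names, the field names inside the document, the toy values in the usage note, and the use of a plain Python dictionary in place of a real NoSQL store. It is not a description of any particular database or of the author's implementation.

```python
import json

# Hypothetical required columns; a real system definition would supply these.
REQUIRED_COLUMNS = ("line", "easting", "northing")


def build_document(dataset_id, columns, single_values=None):
    """Assemble one self-contained document for a processed dataset.

    Required columns must all be present, equally long and free of gaps.
    Optional columns are stored as given and may contain None.
    """
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    n_rows = len(columns[REQUIRED_COLUMNS[0]])
    for name in REQUIRED_COLUMNS:
        values = columns[name]
        if len(values) != n_rows or any(v is None for v in values):
            raise ValueError(f"required column {name!r} is incomplete")

    return {
        "_id": dataset_id,
        "required": {name: list(columns[name]) for name in REQUIRED_COLUMNS},
        # Any number of leftover columns, with no structure declared upfront.
        "optional": {
            name: list(values)
            for name, values in columns.items()
            if name not in REQUIRED_COLUMNS
        },
        # Single pieces of information that are one value, not a column.
        "single_values": dict(single_values or {}),
    }


# A dictionary of JSON strings stands in for a key-value document store.
store = {}


def put(document):
    """Write the whole document once under its key."""
    store[document["_id"]] = json.dumps(document)


def get(dataset_id):
    """Retrieve a whole dataset by a single key lookup."""
    return json.loads(store[dataset_id])
```

With made-up values, `put(build_document("survey-001", {"line": [1, 1], "easting": [500000.0, 500010.0], "northing": [6500000.0, 6500000.0], "altimeter": [30.5, None]}, {"base_frequency_hz": 25.0}))` stores the dataset, and `get("survey-001")` returns it with the gap in the optional `altimeter` column preserved. Leaving out a required column raises `ValueError`, which is the point at which the user could be prompted to rethink their input.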