Data trends

Tools for getting to know Big Data

Tim Keeping
Associate Editor for geophysical data management and analysis
technical-standards@aseg.org.au

The Big Data/Machine Learning world likes big comma-separated text files (csv) as the simplest exchange format. The resultant spreadsheets can be too big for MS Excel and can crash when being imported into MS Access. The Geological Survey of South Australia offers files for Machine Learning, such as a 20 GB csv of geochemistry (https://dem-sdp.s3-ap-southeast-2.amazonaws.com/index.html), which is not easy for mere mortals to explore. The file has over 61 million lines (25 GB in RAM) of one chemical value per line, plus various other numbers and text. I wanted to use the csv file to generate a shapefile of point locations and associated chemical values.

The (free) Python programming language is a favourite tool in data science teaching, and the process described below used Pandas dataframes (the Python Data Analysis Library), similar to the Matlab Cell Array. After a week of trawling the StackOverflow forums I hope to introduce this process to the world of data wrangling.

First, I loaded the csv file and dropped half the columns, then saved and reloaded the file so that only 9 GB of RAM was required for processing (Figure 1). My intent was to flatten this very vertical dataset by moving each unique chemical name into its own column. The easiest way to do this is the pivot_table command, which converts the values in one column into separate columns and fills the rows with values from another column.
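As a minimal sketch of what pivot_table does here, using a hypothetical three-row miniature of the dataset (not taken from the real file):

import pandas as pd

# Hypothetical miniature of the geochemistry table: one chemical value per row
df = pd.DataFrame({'SITE_NO':   [1, 1, 2],
                   'CHEM_CODE': ['Ag', 'Au', 'Ag'],
                   'VALUE':     [0.3, 1.2, 0.5],
                   'UNIT':      ['ppm', 'ppb', 'ppm']})

# Each unique CHEM_CODE becomes its own column, filled from VALUE
t = df.pivot_table(index=['SITE_NO'], columns='CHEM_CODE',
                   values=['VALUE'], aggfunc='first')

# t now has one row per site and a two-layer column index (layout approximate):
#           VALUE
# CHEM_CODE    Ag   Au
# SITE_NO
# 1           0.3  1.2
# 2           0.5  NaN

The real call in Figure 1 does the same thing, only with four index columns at once.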
The parameters of interest are:

index = [SITE_NO, LONGITUDE_GDA94, LATITUDE_GDA94, DH_DEPTH_FROM]: combinations become row ids and appear in the new table
columns = [CHEM_CODE]: each unique name entry will have a column created
values = [VALUE] or [UNIT]: the numbers used to fill each new chemical column (or left empty)

I reassigned the numbers in the VALUE column to columns made with the names in CHEM_CODE, then ran the pivot again with UNIT. The resulting tables t1 and t2 are identical except that t2 has unit names where t1 has numbers.

The two new tables have the same chemical names for columns. Luckily the pivot operation creates layers of column names, and we can combine the unique chemical names with whatever name was used with values=column, in this case either VALUE or UNIT. Some pythonic wizardry is required to combine the name layers. Thanks to the Stackoverflow website (https://stackoverflow.com/questions/14507794/how-to-flatten-a-hierarchicalindex-in-columns) there is an answer looping backwards over the layered column names.

import pandas as pd
import csv

# Open large file, drop columns by name and save smaller file
fpath = 'D:\\sarig_rs_chem_exp.csv'
df = pd.read_csv(fpath, delimiter=',', header=0, low_memory=False)
cols = ['SAMPLE_SOURCE_CODE', ..., 'CHEM_METHOD_CODE']  # full list of dropped columns elided in the original figure
for col in cols:
    df = df.drop(col, axis=1)
fpath = 'D:\\chem_cutdown.csv'
df.to_csv(fpath, sep=',', header=True, quotechar='"')

# Reload the smaller dataset
fpath = 'D:\\chem_cutdown.csv'
df = pd.read_csv(fpath, delimiter=',', header=0, low_memory=False)

# Flatten the dataframe: one pivot for the values, one for the units
t1 = df.pivot_table(index=['SITE_NO', 'LONGITUDE_GDA94', 'LATITUDE_GDA94', 'DH_DEPTH_FROM'],
                    columns='CHEM_CODE', values=['VALUE'], aggfunc='first')
t2 = df.pivot_table(index=['SITE_NO', 'LONGITUDE_GDA94', 'LATITUDE_GDA94', 'DH_DEPTH_FROM'],
                    columns='CHEM_CODE', values=['UNIT'], aggfunc='first')

# Rename columns by combining the two layers, e.g. ('UNIT', 'Ag') becomes 'Ag_UNIT'
t1.columns = ['_'.join(col).rstrip('_') for col in [c[::-1] for c in t1.columns.values]]
t2.columns = ['_'.join(col).rstrip('_') for col in [c[::-1] for c in t2.columns.values]]

# Combine the 2 dataframes and sort columns alphabetically
t3 = pd.concat([t1, t2], axis=1)
t3.sort_index(axis=1, inplace=True)

# Separate the t3 dataframe into several csv files of ~50 columns each
fpath = 'D:\\'
sCols = t3.columns.tolist()
n = len(sCols)
nCols = 50
i = 0
j = i + nCols + 1
while i < n:
    sHeadings = sCols[i:j+1]
    fName = sHeadings[0] + '_' + sHeadings[-1] + '.csv'
    t3.to_csv(fpath + fName, sep=',', header=True, columns=sHeadings, quotechar='"')
    i = j + 1
    j = j + nCols
    if j > n - 1:
        j = n - 1

Figure 1. Python code to open the csv text file, flatten the data and split it into multiple output csv files.
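The stated end goal was a shapefile of point locations, and Figure 1 stops at the csv outputs. A minimal sketch of that last step, assuming the geopandas package and a hypothetical file name (neither appears in the original figure), and assuming EPSG:4283 (geographic GDA94) is the intended datum:

import pandas as pd
import geopandas as gpd

# Load one of the flattened csv outputs
# (hypothetical name; the real names come from the first and last column headings)
df = pd.read_csv('D:\\chem_flat.csv')

# Build point geometries from the GDA94 coordinate columns
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df['LONGITUDE_GDA94'], df['LATITUDE_GDA94']),
    crs='EPSG:4283')  # geographic GDA94; an assumption about the intended datum

gdf.to_file('D:\\chem_points.shp')  # write the shapefile

Note that to_csv writes the pivot's index columns (SITE_NO, the coordinates and depth) to file by default, so they come back as ordinary columns on reload.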