How To Query - NN-Analytics

How to query:

All queries in NNA are URL constructions that hit the endpoint /filter or /histogram.

There are several ways to construct an NNA query. The most basic type of queries follow the following format:

/filter?set=<files|dirs>&filters=[filter:filterOp:value]&sum=<sum>

Whereby set is refering to which set of INodes you wish to work with. Either the set of all files, or set of all directories.

filters refers to all the different combinations of filters you can then apply to that set. Note that some filters will not be accessible to you depending on which set you are working with. For example, running with &filters=fileSize:eq:0 will not work against the set of all directories as directories have no metadata regarding file size in HDFS. You can chain filters together as well.

Depending on which filter you pick, you can then use a filterOp to describe the conditional you are looking for. Some easy examples are eq being "is equal to", gt being "greater than", gte being "greater than or equal to", and then by extension, lt and lte.

sum will dictate what your final result represents. Once your INode set has been filtered down to the set of INodes you care about, the sum operation will then reduce the filtered set to a final value. For example, &sum=count will simply count the number of INodes in the filtered set. Or, &sum=diskspaceConsumed will sum the file sizes times their replication factors of every file in a filtered set of file INodes.

If you do not include &sum=<sum> in your query then /filter will instead dump the list of paths that comprise the filtered set.

On your instance of NNA you can always go to /sets, /filters, filterOps, and /sums to see the different options available to you.

There is another more advanced query in the following format:

/histogram?set=<files|dirs>&filters=[filter:filterOp:value]&type=<type>&sum=<sum>

The /histogram query is nearly identical to the /filter query above but it carries an extra parameter, &type. type in this case refers to the type of histogram and grouping you wish to achieve.

Take the following example histogram query: /histogram?set=files&filters=fileSize:eq:0&type=user&sum=count This query will take the set of all files, filter down to only empty files, and group the result by their owner, and sum the groupings by their number of empty files.

You can check /histograms to see the different values you can use for type. You can also assign a &histogramOutput=<csv|json> to get your histogram output as different formats.