Monday, June 25, 2012

Random Samples


** NOTE: See updated details in the Random Samples 1.1 post **

I have seen this done wrong a lot, so I decided to resurrect some code I wrote for randomly selecting samples during controls testing.


In NIST 800-53 testing, as well as a number of other tasks, I always seem to need to sample a set of like items based on some set of statistics. This brings a few challenges, one of which is how to select random items from a set efficiently and without duplicates.

The approach I used works like this:
  • Break the inventory into groups based on some predetermined baseline / group / set
  • Place each item of the group into an array (or ‘set’) associated with that group
  • Shuffle the set like a deck of cards
  • Deal out the first x items in the set, x being determined by the sampling math described below
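The shuffle-and-deal steps above can be sketched in Python as follows. This is not the original code from the post; it is a minimal illustration that assumes the hosts for one group are already in a list, and the function name `shuffle_and_deal` is hypothetical. The shuffle is a Fisher-Yates shuffle, which makes every ordering equally likely, so dealing the first x items yields a random sample with no duplicates in a single pass.

```python
import random

def shuffle_and_deal(items, count, rng=None):
    """Return `count` distinct items by shuffling a copy of `items`
    and dealing from the top, like a deck of cards."""
    if rng is None:
        rng = random.Random()
    deck = list(items)  # copy so the caller's inventory is left untouched
    # Fisher-Yates: walk from the end, swapping each position with a
    # randomly chosen position at or before it.
    for i in range(len(deck) - 1, 0, -1):
        j = rng.randint(0, i)
        deck[i], deck[j] = deck[j], deck[i]
    # Each position holds a different entry, so the first `count`
    # positions can never contain the same entry twice.
    return deck[:count]

# Example: deal 2 hosts from a group of 5 (seeded so the run is repeatable).
hosts = ["web01", "web02", "web03", "web04", "web05"]
print(shuffle_and_deal(hosts, 2, random.Random(42)))
```

Python's standard `random.shuffle` performs the same kind of shuffle; the loop is written out only to show what happens to the deck. If the selection should be hard to predict, `random.SystemRandom()` can be passed as `rng`.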

The math in the application supports both a sampling percentage and a minimum number of items to select.
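The post does not spell out the formula, so the following sketch assumes one common rule: take the sampling percentage of the group, rounded up, raise it to the minimum if it falls short, and cap it at the group size so a small group is simply tested in full. `sample_size` and its parameters are hypothetical names.

```python
import math

def sample_size(group_size, percent, minimum):
    """Assumed sizing rule: `percent` of the group rounded up, raised to
    `minimum` if it falls short, and capped at the group size."""
    by_percent = math.ceil(group_size * percent / 100)
    return min(group_size, max(minimum, by_percent))

print(sample_size(200, 10, 5))  # → 20  (10% of 200)
print(sample_size(30, 10, 5))   # → 5   (10% is 3, raised to the minimum)
print(sample_size(3, 10, 5))    # → 3   (small group is tested in full)
```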

As input you will need a two-column file (inventory.csv): the first column is the ‘Baseline’ and the second is a host identifier (hostname, IP address, asset tag number, serial number, etc.).
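Reading the two-column file into per-baseline sets might look like this. It is a sketch that assumes a plain comma-separated file and an optional header row whose first cell is `Baseline`; `load_inventory` is a hypothetical name. It accepts any iterable of lines, so it can be fed an open file or a list of strings.

```python
import csv
from collections import defaultdict

def load_inventory(lines):
    """Group hosts by baseline from two-column CSV lines: baseline, host id.

    `lines` can be an open file or any iterable of strings.
    """
    groups = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete lines
        baseline, host = row[0].strip(), row[1].strip()
        if baseline.lower() == "baseline":
            continue  # assumption: optional header row such as "Baseline,Host"
        groups[baseline].append(host)
    return dict(groups)

# Usage with the real file:
#   with open("inventory.csv", newline="") as f:
#       groups = load_inventory(f)
```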

The code imports the text file, breaks the inventory into sets, does the math, and prints the desired number of samples to the screen.
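Putting the pieces together, a small end-to-end script could look like the following. It is an illustrative sketch rather than the author's code: `PERCENT` and `MINIMUM` are hypothetical defaults, the sizing rule (percentage rounded up, at least the minimum, at most the group size) is an assumption, and the shuffle step uses the standard library's `random.shuffle`.

```python
import csv
import math
import random
from collections import defaultdict

PERCENT = 10  # hypothetical default sampling percentage
MINIMUM = 3   # hypothetical default minimum per baseline

def select_samples(groups, percent, minimum, rng=random):
    """For each baseline, shuffle a copy of its hosts and deal out the required count."""
    picks = {}
    for baseline, hosts in groups.items():
        # Assumed rule: percentage rounded up, at least `minimum`, at most all hosts.
        count = min(len(hosts), max(minimum, math.ceil(len(hosts) * percent / 100)))
        deck = list(hosts)
        rng.shuffle(deck)  # standard-library Fisher-Yates shuffle
        picks[baseline] = deck[:count]
    return picks

def main(path="inventory.csv", percent=PERCENT, minimum=MINIMUM):
    """Read the inventory, sample each baseline, and print the picks."""
    groups = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 2 or not row[0].strip():
                continue  # skip blank or incomplete lines
            if row[0].strip().lower() == "baseline":
                continue  # assumption: optional header row
            groups[row[0].strip()].append(row[1].strip())
    picks = select_samples(groups, percent, minimum)
    for baseline in sorted(picks):
        print(f"{baseline}: {len(picks[baseline])} of {len(groups[baseline])} hosts")
        for host in picks[baseline]:
            print(f"  {host}")
```

Calling `main("inventory.csv")` prints each baseline followed by its selected hosts. Because `select_samples` takes the random generator as a parameter, passing a seeded `random.Random(seed)` makes a selection reproducible, which can help when someone needs to re-derive the same sample later.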

As always, if you have improvements or changes to suggest, send them my way (I am not a coder by trade), and if this helps you out ... great!


