Reading half file starting from random byte point in Java

I had to write a method within a Java testing application that reads exactly half of a binary file full of integers (whatever its size) and stores them in an array of integers. The method has to calculate a random starting byte and the size of the array from the file size, which is not known in advance.

First, a File object is created and its size is determined:

    File dataFile = new File("/tmp/datastream.data");
    long data_size = dataFile.length();

One integer consists of 4 bytes, so integers start at byte offsets that are multiples of 4. Offset 0 is also a valid start, but the minimum value for the random offset is set to 4 here. The maximum value is set to half the size of the file counted in integers: for a raw byte stream the size would be divided by 2, but because one integer is 4 bytes it has to be divided by 8. Used as a byte offset this is a conservative bound, since reading half the file from any aligned offset up to half the file size would still stay inside the file.

    int minRandom = 4;
    // File.length() returns a long, so the result needs an explicit cast to int
    int maxRandom = (int) (data_size / 8);

With the boundaries set, the random byte offset can now be calculated. The computed value must point to the start of an integer within the file, which means it must be divisible by 4 with no remainder.

    int randomNum;
    do {
        randomNum = minRandom + (int)(Math.random() * ((maxRandom - minRandom) + 1));
    } while (randomNum % 4 != 0);
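The retry loop above can also be avoided by picking a random integer index and multiplying it by 4. The following is only a sketch of that alternative, not the method used in the original application; the class `AlignedOffset` and the helper `randomAlignedOffset` are hypothetical names, and both bounds are assumed to be non-negative byte offsets.

```java
import java.util.concurrent.ThreadLocalRandom;

class AlignedOffset {
    /**
     * Returns a random byte offset in [minOffset, maxOffset] that is a multiple of 4.
     * Hypothetical helper; assumes 0 <= minOffset and that both bounds are in bytes.
     */
    static long randomAlignedOffset(long minOffset, long maxOffset) {
        long firstSlot = (minOffset + 3) / 4; // smallest integer index at or after minOffset
        long lastSlot = maxOffset / 4;        // largest integer index at or before maxOffset
        if (lastSlot < firstSlot) {
            throw new IllegalArgumentException("no 4-byte aligned offset in range");
        }
        // nextLong(origin, bound) excludes the bound, hence the + 1
        long slot = ThreadLocalRandom.current().nextLong(firstSlot, lastSlot + 1);
        return slot * 4;
    }
}
```

With the values from this post, the call would be `AlignedOffset.randomAlignedOffset(minRandom, maxRandom)`. Every aligned offset in the range is equally likely and no candidate is ever thrown away.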

Once the number is calculated, it is assigned as the start seek point. Two variables hold the same value: one keeps the initial offset, while the other is advanced to identify the next seek byte in the file in order to read the next integer.

    seekByte = randomNum;
    initialSeekByte = randomNum;

Before reading the data, the array that will store it has to be created. Its length is half the size of the file counted in integers, that is, the file size divided by 8:

    int_array = new int[(int) (data_size / 8)];

Finally, the data can be fetched. Each integer is read at the previous seek byte plus 4, which keeps every read aligned to the start of an integer:

    RandomAccessFile rafIN = new RandomAccessFile("/tmp/datastream.data", "r");
    for (int i = 0; i < int_array.length; i++) {
        rafIN.seek(seekByte);            // position at the start of the next integer
        int_array[i] = rafIN.readInt();  // reads 4 bytes as a big-endian int
        seekByte += 4;
    }
    rafIN.close();
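For reference, the steps can be put together into one self-contained method. This is a minimal sketch rather than the code from the original application: the class `HalfFileReader` and the method `readHalf` are hypothetical names, the start is chosen as a random integer index anywhere that still leaves half the file to read (a wider range than the bounds used above), and the result is assumed to fit in memory.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ThreadLocalRandom;

class HalfFileReader {
    /** Reads half of the integers in the file, starting at a random 4-byte aligned offset. */
    static int[] readHalf(File dataFile) throws IOException {
        long totalInts = dataFile.length() / 4; // whole integers in the file
        long count = totalInts / 2;             // half of them
        if (count > Integer.MAX_VALUE) {
            throw new IOException("half of the file does not fit in an int array");
        }
        int[] result = new int[(int) count];
        // Any start index in [0, totalInts - count] leaves room for count integers.
        long startIndex = ThreadLocalRandom.current().nextLong(totalInts - count + 1);
        try (RandomAccessFile raf = new RandomAccessFile(dataFile, "r")) {
            raf.seek(startIndex * 4);
            for (int i = 0; i < result.length; i++) {
                result[i] = raf.readInt(); // big-endian; advances the file pointer by 4
            }
        }
        return result;
    }
}
```

Because `readInt()` moves the file pointer forward by 4 bytes on its own, a single `seek` before the loop is enough, and the try-with-resources block closes the file even if a read fails.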

3 thoughts on “Reading half file starting from random byte point in Java”

  1. A slightly more amusing idea would be to write a ‘streamable’ version of the same program. One that would be able to select a sufficiently random offset and keep reading as much data as possible.

    But then it might be necessary to keep too much data in memory.

    I’ll have to think about this a bit more to work out how much is “too much”, though 🙂

  2. panoskrt

    Hm.. two possible options off the top of my head, both using a while(true){ … } loop:

    1) Get the last modified information when the File object is created and then keep checking in the loop if that has changed.

    2) Keep checking the size of the file against the size of the array.

    I think in both cases two arrays will need to be used. The first one being the initial one and the second one a temporary one that is initialised with a new size every time new data is available. The first one can then refer to the new array. I think so…🙂

  3. I was thinking more of cases like:

    some-command | myfilter

    where ‘myfilter’ can keep track of how much data it has already slurped in, and select a random number of ‘initial bytes’ to throw away. The tricky bit is that this may fail if ‘some-command’ generates several TB of data :/
