Saturday 4 January 2014

Compressing a byte array in C# with GZipStream

In .NET 4.0 or later, you can compress a byte array with GZipStream, which implements the GZip algorithm. (GZipStream itself has been available since .NET 2.0, but the code below uses Stream.CopyTo, which was added in .NET 4.0.) The compressed output can be written to an array or to a file. The code below shows a wrapper class for compressing and decompressing a byte array, followed by a unit test that reads all the bytes of a text file, compresses them, decompresses them, and checks that the decompressed byte array matches the bytes read from the file. The compression and decompression code comes first:

using System;
using System.IO;
using System.IO.Compression;

namespace TestCompression
{
    
    /// <summary>
    /// Compresses or decompresses byte arrays using GZipStream
    /// </summary>
    public static class ByteArrayCompressionUtility
    {

        private const int BUFFER_SIZE = 64 * 1024; // 64 kB

        public static byte[] Compress(byte[] inputData)
        {
            if (inputData == null)
                throw new ArgumentNullException("inputData", "inputData must be non-null");

            using (var compressIntoMs = new MemoryStream())
            {
                using (var gzs = new BufferedStream(new GZipStream(compressIntoMs, 
                 CompressionMode.Compress), BUFFER_SIZE))
                {
                    gzs.Write(inputData, 0, inputData.Length);
                }
                return compressIntoMs.ToArray(); 
            }
        }

        public static byte[] Decompress(byte[] inputData)
        {
            if (inputData == null)
                throw new ArgumentNullException("inputData", "inputData must be non-null");

            using (var compressedMs = new MemoryStream(inputData))
            {
                using (var decompressedMs = new MemoryStream())
                {
                    using (var gzs = new BufferedStream(new GZipStream(compressedMs, 
                     CompressionMode.Decompress), BUFFER_SIZE))
                    {
                        gzs.CopyTo(decompressedMs);
                    }
                    return decompressedMs.ToArray(); 
                }
            }
        }

        //private static void Pump(Stream input, Stream output)
        //{
        //    byte[] bytes = new byte[4096];
        //    int n;
        //    while ((n = input.Read(bytes, 0, bytes.Length)) != 0)
        //    {
        //        output.Write(bytes, 0, n); 
        //    }
        //}
        


    }

}
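
Since the compressed output can also go to a file instead of an array, here is a minimal sketch of that variant. The class name GZipFileExample and its two methods are hypothetical names chosen for illustration and are not part of the wrapper above; the sketch simply swaps the MemoryStream for a FileStream and leaves out the BufferedStream for brevity.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the original wrapper class.
public static class GZipFileExample
{
    // Writes inputData to the file at path in GZip format.
    public static void CompressToFile(byte[] inputData, string path)
    {
        if (inputData == null)
            throw new ArgumentNullException("inputData");

        using (var fileStream = File.Create(path))
        using (var gzs = new GZipStream(fileStream, CompressionMode.Compress))
        {
            gzs.Write(inputData, 0, inputData.Length);
        } // disposing the GZipStream flushes the remaining compressed data to the file
    }

    // Reads a GZip file and returns the decompressed bytes.
    public static byte[] DecompressFromFile(string path)
    {
        using (var fileStream = File.OpenRead(path))
        using (var gzs = new GZipStream(fileStream, CompressionMode.Decompress))
        using (var decompressedMs = new MemoryStream())
        {
            gzs.CopyTo(decompressedMs);
            return decompressedMs.ToArray();
        }
    }
}
```

A file written this way should be readable by ordinary gzip tools, since GZipStream emits the standard GZip format.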


The code uses MemoryStream instances, and their ToArray() method produces the resulting byte arrays. A GZipStream has a compression mode of either Compress or Decompress. In both the Compress and Decompress methods, the GZipStream is wrapped in a BufferedStream with a buffer size of 64 kB, so that data passes through in larger chunks; this is intended to help with larger files. I have tested this code in a unit test with a generated lorem ipsum text file of about 5.5 MB. The unit test is shown next:

using System;
using NUnit.Framework;
using System.Text;
using System.IO;
using System.Linq;


namespace TestCompression.Test
{
    [TestFixture]
    public class UnitTest1
    {

        [Test]
        public void CompressAndUncompressString()
        {
            byte[] inputData = File.ReadAllBytes("Lorem1.txt");
            byte[] compressedData = ByteArrayCompressionUtility.Compress(inputData);
            byte[] decompressedData = ByteArrayCompressionUtility.Decompress(compressedData);

            Assert.IsNotEmpty(inputData);
            Assert.IsNotEmpty(decompressedData);
            Assert.IsTrue(inputData.SequenceEqual(decompressedData));

            Console.WriteLine("Compressed size: {0:F2}%", 
             100 * ((double)compressedData.Length / (double)decompressedData.Length));

            //string outputString = Encoding.UTF8.GetString(decompressedData);

        }

    }
}


Output of this unit test is shown next:

------ Test started: Assembly: TestCompression.Test.dll ------

Compressed size: 28,74%

1 passed, 0 failed, 0 skipped, took 18,87 seconds (NUnit 2.6.2).



To generate a lorem ipsum text file, you can use the lorem ipsum generator at http://loripsum.net
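
The unit test works on raw bytes, and its commented-out line hints at decoding the result as UTF-8 text. A minimal, self-contained sketch of a string round trip might look like the following. The class name StringCompressionExample and its methods are hypothetical, and UTF-8 is an assumption about the text encoding.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration; assumes the text is encoded as UTF-8.
public static class StringCompressionExample
{
    // Encodes the text as UTF-8 and returns the GZip-compressed bytes.
    public static byte[] CompressString(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var compressedMs = new MemoryStream())
        {
            using (var gzs = new GZipStream(compressedMs, CompressionMode.Compress))
            {
                gzs.Write(raw, 0, raw.Length);
            }
            // ToArray() is still valid after the MemoryStream has been closed.
            return compressedMs.ToArray();
        }
    }

    // Decompresses GZip bytes and decodes them as UTF-8 text.
    public static string DecompressString(byte[] compressed)
    {
        using (var compressedMs = new MemoryStream(compressed))
        using (var gzs = new GZipStream(compressedMs, CompressionMode.Decompress))
        using (var decompressedMs = new MemoryStream())
        {
            gzs.CopyTo(decompressedMs);
            return Encoding.UTF8.GetString(decompressedMs.ToArray());
        }
    }
}
```

Note that very short strings can come out larger after compression, because the GZip header and trailer add a fixed overhead.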

Wednesday 1 January 2014

RandomNumberGenerator in C#

To generate random numbers in C#, you can use the RandomNumberGenerator class in the System.Security.Cryptography namespace of .NET. This class is easier to use through a simple wrapper class. The wrapper class provided here returns either an integer or an unsigned integer. RandomNumberGenerator produces cryptographically strong random values, whereas the default generator of .NET, the Random class, is a seeded pseudorandom generator. For example, two Random instances created at almost the same time can emit the same sequence of values, because the default seed is derived from the system clock. The wrapper class looks like this:

using System;
using System.Security.Cryptography;

    public static class RandomGenerator
    {

        private static readonly RandomNumberGenerator generator;

        static RandomGenerator()
        {
            generator = RandomNumberGenerator.Create();
        }

        public static int GetNext()
        {
            byte[] rndArray = new byte[4];
            generator.GetBytes(rndArray);
            return BitConverter.ToInt32(rndArray, 0);
        }

        public static uint GetNextUnsigned()
        {
            byte[] rndArray = new byte[4];
            generator.GetBytes(rndArray);
            return BitConverter.ToUInt32(rndArray, 0);
        }



    }
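
As an aside before walking through the wrapper: the seeding behaviour of System.Random mentioned earlier can be illustrated deterministically by passing the same seed twice. This is a sketch under the assumption that identical clock-derived seeds are what cause the duplicate values on the .NET Framework; the class name SeedDemo and the seed value are hypothetical.

```csharp
using System;

// Hypothetical demo class, not part of the wrapper.
public static class SeedDemo
{
    // Returns the first `count` values produced by a Random created with `seed`.
    public static int[] Take(int seed, int count)
    {
        var random = new Random(seed);
        var values = new int[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = random.Next();
        }
        return values;
    }

    // True when two Random instances with the same seed yield the same sequence.
    public static bool SameSeedGivesSameSequence(int seed, int count)
    {
        int[] first = Take(seed, count);
        int[] second = Take(seed, count);
        for (int i = 0; i < count; i++)
        {
            if (first[i] != second[i])
                return false;
        }
        return true;
    }
}
```

Calling SeedDemo.SameSeedGivesSameSequence(12345, 10) returns true, which is the effect two Random instances show when they end up with the same clock-based seed.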

The RandomGenerator class is a static class with a static RandomNumberGenerator instance that is created in the static constructor. The methods that create a new random number use the GetBytes method to fill a four-byte array. We could of course fill longer arrays and create, for example, 64-bit integers, but here only a four-byte array is used. The two methods return a signed and an unsigned integer respectively. I have not bothered to refactor this simple class. The BitConverter class converts the byte array to a 32-bit int or unsigned int, starting at index 0. We could also return data types other than integers here. A simple unit test:

using System.Diagnostics;
using NUnit.Framework;

    [TestFixture]
    public class UnitTest1
    {

        [Test]
        public void GetNextInteger()
        {
            int random = RandomGenerator.GetNext();
            Debug.WriteLine(random);
        }

        [Test]
        public void GetNextUInteger()
        {
            uint random = RandomGenerator.GetNextUnsigned();
            Debug.WriteLine(random);
        }

    }

Sample output:

------ Test started: Assembly: TestRandomNumberGeneratorTest.dll ------

-1821995826

1013025195

2 passed, 0 failed, 0 skipped, took 0,42 seconds (NUnit 2.6.2).
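
The 64-bit variant mentioned earlier could be sketched as follows, by filling an eight-byte array instead of a four-byte one. The class name RandomGenerator64 and its method names are hypothetical and chosen only to keep the sketch separate from the wrapper above.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical 64-bit counterpart of the wrapper class.
public static class RandomGenerator64
{
    private static readonly RandomNumberGenerator generator = RandomNumberGenerator.Create();

    // Returns a random signed 64-bit integer built from eight random bytes.
    public static long GetNextLong()
    {
        byte[] rndArray = new byte[8];
        generator.GetBytes(rndArray);
        return BitConverter.ToInt64(rndArray, 0);
    }

    // Returns a random unsigned 64-bit integer built from eight random bytes.
    public static ulong GetNextUnsignedLong()
    {
        byte[] rndArray = new byte[8];
        generator.GetBytes(rndArray);
        return BitConverter.ToUInt64(rndArray, 0);
    }
}
```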


If you would like random numbers in a specified range, for example 0 to 99, you could take the unsigned integer and apply a modulo 100 operation, e.g. RandomGenerator.GetNextUnsigned() % 100. The modulus depends on the range you want. If a range from -20 to 20 is desired, you could do something like: -20 + (RandomGenerator.GetNextUnsigned() % 41). In C# this expression has type long, so cast it to int if you need an int. Note that the modulo operation introduces a slight bias, because 2^32 is not an exact multiple of most range sizes; with modulo 100, the values 0 to 95 are marginally more likely than 96 to 99. The bottom line is that if you need random integers, signed or unsigned, that are cryptographically strong rather than pseudorandom, you should not rely on System.Random but use the RandomNumberGenerator class in System.Security.Cryptography.
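
For ranges where the small bias of a plain modulo operation matters, one common alternative is rejection sampling: draw again whenever the value falls into the incomplete tail of the 32-bit range. The following is a minimal sketch of that technique, not part of the wrapper class above; the class name RandomRange and the method Next are hypothetical.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper showing rejection sampling on top of RandomNumberGenerator.
public static class RandomRange
{
    private static readonly RandomNumberGenerator generator = RandomNumberGenerator.Create();

    // Returns a uniformly distributed integer in [minInclusive, maxInclusive].
    public static int Next(int minInclusive, int maxInclusive)
    {
        if (minInclusive > maxInclusive)
            throw new ArgumentOutOfRangeException("minInclusive", "minInclusive must not exceed maxInclusive");

        // Number of distinct values in the range; long arithmetic avoids int overflow.
        ulong rangeSize = (ulong)((long)maxInclusive - minInclusive + 1);

        // Largest multiple of rangeSize that is not above 2^32.
        // Draws at or above this limit are rejected so every value is equally likely.
        ulong limit = (1UL << 32) / rangeSize * rangeSize;

        byte[] rndArray = new byte[4];
        uint draw;
        do
        {
            generator.GetBytes(rndArray);
            draw = BitConverter.ToUInt32(rndArray, 0);
        } while (draw >= limit);

        return (int)(minInclusive + (long)(draw % rangeSize));
    }
}
```

With this sketch, RandomRange.Next(-20, 20) covers the same range as the modulo 41 expression above, and RandomRange.Next(0, 99) the same range as modulo 100.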