Tutorial 7 - Parallel BLAS


In this tutorial you will develop an applet that measures how well the matrix multiply in the parallel BLAS implementation scales. The applet computes the rate of floating-point operations for thread counts from 1 to 16. The rate scales only on Java virtual machines that use native threads and run on an SMP platform that distributes threads across multiple processors. In the freeware version of OR-Objects, cache locality is best when matrix A is row-major and matrix B is column-major.
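
The layout remark above can be illustrated with a small sketch. It does not use OR-Objects; the class 'LayoutSketch' and its flat 'double[]' storage are assumptions made only for illustration. It shows that when A is stored row-major and B column-major, the inner loop of a dot product reads both arrays at consecutive indices, which is the access pattern caches handle best.

```java
class LayoutSketch {
    // Row-major storage: index of element (i, j) in a matrix with 'cols' columns.
    static int rowMajorIndex(int i, int j, int cols) {
        return i * cols + j;
    }

    // Column-major storage: index of element (i, j) in a matrix with 'rows' rows.
    static int columnMajorIndex(int i, int j, int rows) {
        return j * rows + i;
    }

    // Dot product of row i of A (m x k, row-major) and column j of B (k x n, column-major).
    // As p increases by one, both computed indices also increase by one.
    static double dot(double[] a, double[] b, int i, int j, int k) {
        double sum = 0.0;
        for (int p = 0; p < k; p++) {
            sum += a[rowMajorIndex(i, p, k)] * b[columnMajorIndex(p, j, k)];
        }
        return sum;
    }

    public static void main(String[] args) {
        int k = 3;
        double[] a = {1, 2, 3, 4, 5, 6}; // 2 x 3, row-major: rows (1,2,3) and (4,5,6)
        double[] b = {1, 0, 1, 0, 1, 0}; // 3 x 2, column-major: columns (1,0,1) and (0,1,0)
        System.out.println(dot(a, b, 0, 0, k)); // prints 4.0
        System.out.println(dot(a, b, 1, 1, k)); // prints 5.0
    }
}
```

With the opposite layouts, the same loop would step through memory in strides of k elements, so each access would be more likely to touch a different cache line.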

For simplicity, the applet details are skipped. To see all the details, view the complete source.


Tutorial 7 - Contents


Initialize - Build Matrices

The method 'buildMatrix' creates a matrix with 'm' rows and 'n' columns and initializes the elements with random values. The type of matrix constructed is determined by the argument 'type'.
    public ContiguousMatrixI
    buildMatrix(String type, int m, int n)
    {
        // Choose the storage layout from the 'type' string.
        ContiguousMatrixI A = null;
        if(type.equals("RowMajor")){
            A = new RowMajorMatrix(m,n);
        }
        else if(type.equals("ColumnMajor")){
            A = new ColumnMajorMatrix(m,n);
        }
        else{
            throw new IllegalArgumentException("Unknown matrix type: "+type);
        }
        // Fill the matrix with uniformly distributed random values.
        new UniformDistribution().setElements(A);
        return A;
    }
The first few lines of the method 'test' get the parameters and call 'buildMatrix' to create the three matrices.
    public void
    test()
    {
        int m = Integer.parseInt(mChoice.getSelectedItem());
        int n = Integer.parseInt(nChoice.getSelectedItem());
        int k = Integer.parseInt(kChoice.getSelectedItem());
        double alpha = Double.parseDouble(alphaChoice.getSelectedItem());
        double beta = Double.parseDouble(betaChoice.getSelectedItem());
        
        ContiguousMatrixI A = buildMatrix(typea.getSelectedItem(), m, k);
        ContiguousMatrixI B = buildMatrix(typeb.getSelectedItem(), k, n);
        ContiguousMatrixI C = buildMatrix(typec.getSelectedItem(), m, n);

Initialize - Create BLAS

These five lines from 'test()' clear the bar graph and create the matrix BLAS object, using the SMP implementation as the underlying algorithm. The SMP minimum work is set to '0' so that all 'maximumThreads' threads are used.
        values = null;
        paint(getGraphics());
        SmpBLAS3 smp = new SmpBLAS3();
        smp.getSmp().setMinWork(0);
        MatrixBLAS3 blas3 = new MatrixBLAS3(0, smp);
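
To make the idea of a thread-parallel multiply concrete, here is a minimal sketch in plain Java. It is not the OR-Objects implementation, whose internals are not shown in this tutorial; the class 'ParallelMultiplySketch', the 2-D array representation, and the row-band partitioning are all assumptions chosen for illustration. Each thread computes a contiguous band of rows of C, so no two threads write the same element.

```java
import java.util.Arrays;

class ParallelMultiplySketch {
    // Computes C = A * B, splitting the rows of C into one band per thread.
    // 'threads' must be at least 1.
    static double[][] multiply(final double[][] a, final double[][] b, int threads) {
        final int m = a.length, k = b.length, n = b[0].length;
        final double[][] c = new double[m][n];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            // Band boundaries; a band may be empty when threads > m.
            final int from = (int) ((long) m * t / threads);
            final int to = (int) ((long) m * (t + 1) / threads);
            workers[t] = new Thread(() -> {
                for (int i = from; i < to; i++) {
                    for (int j = 0; j < n; j++) {
                        double sum = 0.0;
                        for (int p = 0; p < k; p++) sum += a[i][p] * b[p][j];
                        c[i][j] = sum;
                    }
                }
            });
            workers[t].start();
        }
        // Wait for every band to finish before returning the result.
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // prints [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(Arrays.deepToString(multiply(a, b, 2)));
    }
}
```

On a machine with one processor, or a virtual machine without native threads, the bands run one after another and the extra threads add only overhead, which is the effect the applet's bar graph makes visible.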

Test - Call BLAS

The 'for' loop iterates over thread counts from 1 to 16. The 'while' loop repeats the test for each thread count until at least one second of run time has accumulated.
        double max=0, min=0;
        values = new double[maxBars];
        for(int i=0; i<maxBars; i++){
            smp.getSmp().setMaxThreads(i+1);
            msgLabel.setText("Threads - "+(i+1));
            int it = 0;
            long time = 0;
            while(time < 1000){
                Date t1 = new Date();
                try{blas3.dgemm(false, false, alpha, A, B, beta, C);}
                catch(BlasException e){e.printStackTrace();}  // report failures instead of hiding them
                Date t2 = new Date();
                time += t2.getTime() - t1.getTime();
                it++;
            }
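
The "repeat until at least one second has elapsed" pattern used above can be isolated as a small helper. This is a sketch only: the class 'TimedLoopSketch' and the method 'runAtLeast' are hypothetical names, and it uses 'System.nanoTime()' rather than the 'Date' objects in the applet, on the assumption that a finer-grained clock is available.

```java
class TimedLoopSketch {
    // Runs 'work' repeatedly until the accumulated run time reaches 'minMillis'.
    // Returns { iterations, elapsedNanos }. The work always runs at least once.
    static long[] runAtLeast(Runnable work, long minMillis) {
        long iterations = 0;
        long elapsedNanos = 0;
        do {
            long start = System.nanoTime();
            work.run();
            elapsedNanos += System.nanoTime() - start; // time only the work itself
            iterations++;
        } while (elapsedNanos < minMillis * 1_000_000L);
        return new long[] { iterations, elapsedNanos };
    }

    public static void main(String[] args) {
        final double[] sink = new double[1];
        long[] result = runAtLeast(() -> {
            double s = 0.0;
            for (int i = 0; i < 100_000; i++) s += i * 0.5;
            sink[0] = s; // keep the result so the loop is not optimized away
        }, 100);
        System.out.println("iterations = " + result[0]
                + ", elapsed ms = " + result[1] / 1_000_000);
    }
}
```

Accumulating only the time spent inside the call, as the applet does, keeps loop overhead out of the measurement; repeating the call matters because a single small multiply can finish in less time than the clock can resolve.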

Test - Compute Rate

The floating-point rate is computed in millions of floating-point operations per second (MFLOPS). The matrix multiplication computes the dot product of a row of 'A' and a column of 'B' for each element of 'C', which requires (k*m*n) additions and (k*m*n) multiplications. Each dot product is multiplied by 'alpha', which requires (m*n) multiplications. Each element of 'C' is multiplied by 'beta' and added to the scaled dot product, which requires (m*n) multiplications and (m*n) additions. The total for one call is therefore m*n*(2*k+3) operations. Finally, this total is multiplied by the number of times 'dgemm()' was called to extend the test past one second.
            double t = 1000.0 * (double)time;
            // Convert to double before multiplying so the product cannot overflow int.
            double mflops = ((double)m*n*(2*k+3)*it)/t;
            System.out.println(mflops);
            values[i] = mflops;
            if(i == 0 || mflops > max) max = mflops;
            if(i == 0 || mflops < min) min = mflops;
        }
        msgLabel.setText("MFLOPS:  max = "+(int)(max+0.5)+",  min = "+(int)(min+0.5));
        paint(getGraphics());
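
As a worked example of the rate calculation, the sketch below evaluates the same formula for concrete numbers. The class 'MflopsSketch' and its method names are hypothetical and are not part of the applet. The arithmetic is done in 'double' throughout, since the operation count can exceed the range of 'int' for large matrices or many iterations.

```java
class MflopsSketch {
    // Operations in one dgemm call: m*n*(2*k + 3), as derived in the text.
    static double flopsPerCall(int m, int n, int k) {
        return (double) m * n * (2.0 * k + 3.0);
    }

    // MFLOPS = total operations / (seconds * 1e6) = total operations / (millis * 1000).
    static double mflops(int m, int n, int k, int iterations, long elapsedMillis) {
        double totalFlops = flopsPerCall(m, n, k) * iterations;
        return totalFlops / (1000.0 * elapsedMillis);
    }

    public static void main(String[] args) {
        // 100 x 100 matrices: 100*100*203 = 2,030,000 operations per call.
        // 50 calls in 1015 ms: 101,500,000 / 1,015,000 = 100 MFLOPS.
        System.out.println(mflops(100, 100, 100, 50, 1015)); // prints 100.0
    }
}
```

For comparison, 10 calls with m = n = k = 500 amount to 2,507,500,000 operations, which is larger than the maximum 'int' value of 2,147,483,647.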

Copyright (C) 1997-2000 by DRA Systems all rights reserved.