INSIDE THIS ISSUE
Solving Large Sparse L1 problems
LP_QP_Solve
is our linear and quadratic programming product. The code implements a sparse primal-dual interior point algorithm. This article describes the performance of this code on a specific linear programming problem, namely the L1 fitting (regression) problem. This problem is also known as the method of Least Absolute Values, LAV. Statisticians have long known that the LAV method is more robust than the standard least squares method. There is a very nice comparison on the Web between Least Squares and L1 regression for straight-line fits. Go to www.barrodale.com and select "Interactive Data Fitting" for a discussion and graphical demonstration of the difference between Least Squares and LAV.The simplest fitting problem is to fit a straight line to data. Specifying a straight line requires the determination of two parameters (e.g. the slope and y-intercept). In the example below, we will demonstrate the robust nature of the L1 fit as compared to the Least Squares fit. The first graph contains the L1 and Least Squares fits as well as the six data points. The fourth data point (marked with a dot) is then increased by two and the fits are recomputed and plotted. Notice that the Least Squares fit swings much more than the L1 fit.
|
L1 versus Least Squares on 6 data points. L1 dashed line and Least Squares solid line |
|
|
|
Change the fourth data point |
|
|
This example illustrates an over-determined system since a straight line requires two parameters, but we were trying to fit six observations. Over-determined problems are common in many engineering settings. The typical case occurs when there are more observations (data points) than degrees of freedom (parameters) in a given system. In the simplest case where the relationships are linear, the following problem arises. Select the best solution, x*, to the equations Ax = b, where A is an m by n matrix. Since the typical problem in data fitting is one of reconciliation, the number, m, of rows of A is generally much larger than the number, n, of columns of A. In a mathematical sense, there is frequently no solution to the equation Ax = b. However, there is an x which minimizes the sum of errors to the pth power (generally p
³ 1),|(Ax - b)1|p + …+|(Ax-b)m|p.
The exponent p=2 reduces to the standard least squares regression problem. When p=1 this is the L1 regression problem. It has long been known that the larger the exponent p, the more a given outlier may affect the solution x*. Thus one motivation for L1 regression, is to de-emphasize outliers.
The standard method of converting the L1 fitting problem to a linear programming (LP) problem involves the augmentation of 2m additional variables. Let e+ and e- denote two (row) vectors of length m. Let I denote the m by m identity matrix and let e denote the (row) vector of length m consisting of all 1s. Then the (LP) problem becomes:
Min (0, e, e) (xT, e+, e-)T
Subject to: (A | I | -I) (xT, e+, e-)T = b
And e+
Notice that we have augmented the matrix A with a positive and negative m by m identity matrix. This means that the LP problem has m rows and n+2m columns. These problems can get large quickly.
LP_QP_SOLVE has a MATLAB interface that accepts the MATLAB sparse matrix data as input. This substantially eases our task of going from the matrix A to the augmented matrix (A | I | -I). To form this larger matrix the following two lines suffice:
sp_I = sparse(1:m, 1:m, ones(1,m));
augmented_a = [a, sp_I, -sp_I];
We have collected some timings to illustrate the impressive speed of this code. Again, using MATLAB, we randomly generate problems of size (m, m/10) with an average of 3 nonzeros per row. Thus the augmented matrix has 5m nonzeros. We pose and solve the L1 problem for randomly generated problems for m in the range 500 to 5000 and graph the resulting times in seconds.
Notice that even solving problems with 5000 rows and 10500 columns takes only about 10 minutes on a 166 MHz Pentium with 64 Megabytes of memory.
|
Performance of LP_QP_Solve on the L1 problem |
|
|
A companion problem to the L1 problem is the L
¥ problem. This problem arises as the limiting case as p® ¥ . Given an m by n matrix A and a vector b of length m, the L¥ fitting problem is to determine a vector x of length n minimizing the error Ax - b in the L¥ norm. Precisely, this means that we need to determine x that minimizesmax {|(Ax-b)k| : 1
£ k £ m}.The standard method of converting the L
¥ fitting problem to a linear programming (LP) problem involves the augmentation of one additional variable and doubling the number of rows. Let e denote the additional variable and f a column vector of length m, consisting of all ones. Then the (LP) problem becomes:Min (0, 1) (xT, e)T
Subject to: (A | -f ) (xT, e)T
(A | f ) (xT, e)T
³ b,And e
³ 0.LP_QP_Solve documentation is available on our Web site:
web.wt.net/~wti. MATLAB m-files are available on request which setup and solve the L1 and L¥ regression problems.
OSP Program Developed by WTI
Windward Technologies has developed OSP (Optimization of Sampling Protocols) exclusively for Francis Pitard Sampling Consultants (FPSC). The OSP program is easy-to-use and learn, powerful, and fully integrated. It features advanced nomographic representation of functions of three variables. The program is written in Visual Basic to operate in the Windows 95, 98, and NT environments. It uses two Active X controls from Visual Components, Formula One and First Impression. Formula One provides an Excel compatible workbook interface that OSP uses for its input data. First Impression is a graphics control that generates charts, such as variograms, for OSP. First Impression provides a host of runtime features that make it possible for OSP users to customize charts. For more information about Formula One and First Impression, visit the Visual Components Web site:
www.visualcomp.com. In addition we have added easy to use licensing which allows FPSC to send fully functional demo versions of OSP that have a specific expiration date and a specific trial duration.
Francis Pitard is a recognized international expert in all aspects of Total Quality Management, Sampling, Statistical Process Control, and the practical application of statistical methods for problem solving. For more information about Francis Pitard and OSP you can contact FPSC by phone: 303-451-7893, by email:
info@fpscsampling.com, or visit their Web site: www.fpscsampling.com.
What’s New at WTI
WTI
or its products can be found at the following Internet Web sites:Research Systems Inc. (CONSTRAINED_MIN / GRG2) www.rsinc.com
Visual Solutions (VisSim/OptimizePRO)
Mathworks (GRG2, BCLS, LP_QP_Solve, and RBFpack)
www.mathworks.com/connections/bcls_grg.shtml,

Presentations and Publications
Professor Donald Ramirez (University of Virginia), www.math.virginia.edu/~der/home.html, presented the paper Applications of Smoothed Monotone Regression Splines and Smoothed Bootstrapping in Survival Analysis to the CompStat 98 conference held in London this August. Phil Smith is a co-author of this paper, which features the WTI product ConFit. A demo version of ConFit can be downloaded from our Web site:
web.wt.net/~wti.Phil Smith attended the Purdue University Mathematics Advisory Council meeting held at Purdue University, in West Lafayette, Indiana, October 2-3. ![]()
Tom Aird will attend the Fall meeting of the Dean’s Advisory Council, School of Science, Purdue University, in West Lafayette, Indiana, Oct 24-26. ![]()
WTI products will be exhibited at Systems 98 (www.systems.de) in Munich, Oct 19-23 at the Friedrich Software Resources booth, Hall 2, Stand A2.250/14 in the USA Pavilion, New Munich Fairgrounds.
Contact us about guest tickets and come by the booth to see demos of our new products. ![]()
The Wind is a quarterly newsletter written and edited by Tom Aird and Phil Smith. Please let us know what your interests are and what type of articles you would like to see. If you would like to contribute an article to the newsletter, please contact us at WTI@aol.com.
Consulting and Program Development Services We offer services in the following areas: Parallel computing, MPI, DSP, C, C++, Visual Basic, Fortran, Assembler, Applied Math, Visualization, Optimization, and Training. If your organization would benefit by having access to a part-time mathematical software development team, contact us!
GUARANTEE We are so convinced in the quality of this service and our products in general that if you are not completely satisfied, we offer a 90 day money-back guarantee.
This is a quarterly newsletter of Windward Technologies, Inc. Please let us know what your interests are and what sort of articles you would like to see. Please contact us if you would like to have your name added to our newsletter mailing list or if you would like to contribute an article to the newsletter
!|
Phone: |
281-564-6523 |
|
Fax: |
281-754-4022 |
|
|
|
|
Mail: |
Windward Technologies, Inc. |
|
|
12039 Mulholland |
|
|
Meadows Place, TX 77477 |
|
|
|
|
E-mail: |
The Wind is a quarterly newsletter written and edited by Tom Aird and Phil Smith. Please let us know what your interests are and what type of articles you would like to see. If you would like to contribute an article to the newsletter, please contact us at
WTI@aol.com.Regards,
Tom Aird
and
Phil Smith