A Fitted Sparse-Group Lasso for Genome-Based Evaluations

In life sciences, high-throughput techniques typically lead to high-dimensional data, and often the number of covariates is much larger than the number of observations. This inherently comes with multicollinearity, which challenges statistical analysis in a linear regression framework. Penalization methods such as the lasso, ridge regression, the group lasso, and convex combinations thereof, which introduce additional conditions on regression variables, have proven effective. In this study, we introduce a novel approach by combining the lasso and the standardized group lasso, leading to a meaningful weighting of the predicted ("fitted") outcome, which is of primary importance, e.g., in breeding populations. This "fitted" sparse-group lasso was implemented as a proximal-averaged gradient descent method and is part of the R package "seagull" available on CRAN. For the evaluation of the novel method, we conducted an extensive simulation study. We simulated genotypes and phenotypes which resemble data of a dairy cattle population. Genotypes at thousands of genomic markers were used as covariates to fit a quantitative response. The proximity of markers on a chromosome determined grouping. In the majority of simulated scenarios, the new method revealed improved prediction abilities compared to other penalization approaches and was able to localize the signals of simulated features.


INTRODUCTION
In the framework of linear regression, an n-dimensional response vector and a p-dimensional vector of regressors are assumed to hold a linear relationship. Ordinary least squares aims to provide a solution for the p regression coefficients that minimizes the residual sum of squares. If p > n, this minimizer is no longer unique. Instead, the set of minimizers includes an infinite number of potential candidates, which all solve the initial regression problem equally well. In order to determine a meaningful one, different approaches were introduced in the past. One approach, called penalization or regularization, consists of imposing one or more additional conditions on the set of minimizers. The corresponding penalty function of such an approach belongs to one of two mutually distinct categories: it is either differentiable everywhere, such as in ridge regression (or Tikhonov regularization), or it is not (e.g., the lasso [1] and the group lasso (GL) [2]). Approaches from the second category are considerably harder to solve, but often implicitly perform variable selection, which increases the interpretability of the underlying model.
More complex methods can be built by applying several penalties at once. Examples are the elastic net (EN) [3], which combines the lasso and ridge regression, or the sparse-group lasso (SGL) [4], where the lasso and GL are applied simultaneously.
Moreover, p > n leads to multicollinearity among regressors. Dependencies between variables can be addressed by grouping together those that are strongly correlated with each other. A method which takes such a group structure into consideration is the GL. There are explicit formulas to solve the GL [2], some of which require orthogonality of the regressors. In order to guarantee a flawless mathematical application to non-orthogonalized data, the standardized group lasso was proposed [5]. Compared to the GL, where grouped regression coefficients form the penalty, the standardized group lasso incorporates a penalty on the grouped predicted ("fitted") outcome.
The objective of our study is to present a new penalty which combines the strong selective ability of the lasso and the mathematical adequacy of the standardized group lasso: the fitted sparse-group lasso (fitSGL). The solution to this mathematical problem is non-trivial, as the two distinct penalties act on different scales which cannot be merged by any transformation. Therefore, established computational tools are not appropriate. Here, we provide a solution via proximal-averaged gradient descent [6] with additional acceleration based on [7].
One field of application where highly correlated predictor variables may appear is the genetic evaluation of individual trait expression in a breeding population. Fitted values, also known as genomic breeding values (GBVs), are decisive for breeding decisions. Explicitly accounting for them in a grouping approach is expected to improve the prediction precision of GBVs and to help identify individuals with particularly high or low GBVs. Furthermore, the selection property of the lasso helps detect trait-associated sites on the genome. Thus, for evaluation purposes, we applied the novel method to a large variety of simulated scenarios resembling data from a dairy cattle population. The scenarios differed in terms of sample size and features of causal variants influencing trait expression. The outcome was compared to that of other lasso-type penalization approaches.

Linear Regression Model
The underlying linear model consists of an n-dimensional response vector y, an n × p matrix of features X, and the corresponding vector of regression coefficients b. Then,

y = Xb + e, (1)

where e is a vector of i.i.d. normally distributed random variables. In order to estimate b, penalization is applied so that the linear regression minimization is altered by adding a penalty function f:

b̂ = argmin_b 1/2 ‖y − Xb‖₂² + λ f(b). (2)

In this study, we combine the lasso and the standardized group lasso [5] and refer to this as fitSGL. The corresponding optimization problem is

b̂ = argmin_b 1/2 ‖y − Xb‖₂² + αλ ‖b‖₁ + (1 − α)λ Σ_{l=1}^{L} √p_l ‖X^(l) b^(l)‖₂, (3)

where λ > 0 is the penalty parameter and α ∈ [0, 1] is the mixing parameter. The superscript (l) denotes group l. Therefore, b^(l) is a subvector of b, and X^(l) are the corresponding columns of X. Furthermore, p_l is the number of elements in group l.
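As a toy illustration of the fitSGL objective in (3), the two penalty terms can be evaluated as follows (a minimal Python sketch, not the seagull implementation; the function name and the group encoding as a dict of column indices are ours):

```python
import numpy as np

def fitsgl_objective(y, X, b, groups, lam, alpha):
    """Least-squares loss plus a convex combination of the lasso penalty
    on b and the standardized group-lasso penalty on the grouped fitted
    values X^(l) b^(l). `groups` maps a group label to column indices."""
    resid = y - X @ b
    lasso = np.sum(np.abs(b))                      # ||b||_1
    group = sum(np.sqrt(len(idx)) * np.linalg.norm(X[:, idx] @ b[idx])
                for idx in groups.values())        # sum_l sqrt(p_l) ||X^(l) b^(l)||_2
    return 0.5 * resid @ resid + alpha * lam * lasso + (1 - alpha) * lam * group
```

Setting alpha = 1 recovers the plain lasso objective, alpha = 0 the standardized group lasso.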

Proximal Gradient Descent Update
Solving (3) is not trivial, as both penalty terms are based on different coordinate systems, and there is no substitution available to overcome this issue. Therefore, we divide the initial penalization into two sub-problems: a lasso and a standardized group lasso problem. We compute a proximal gradient descent (PGD) [8] update for each of them. PGD is an iterative algorithm: starting with a guess b̂⁰, a sequence of updates b̂^(m+1) is computed over iterations (m = 0, 1, ...).
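For the lasso sub-problem, one PGD step is a gradient step on the least-squares term followed by soft-thresholding. A minimal sketch (illustrative only; the helper name is ours, not part of seagull):

```python
import numpy as np

def pgd_lasso_update(b, X, y, t, lam):
    """One PGD step for the lasso sub-problem with step width t:
    gradient step, then the soft-thresholding proximal operator."""
    r = b - t * X.T @ (X @ b - y)                         # gradient step
    return np.sign(r) * np.maximum(np.abs(r) - t * lam, 0.0)  # prox of t*lam*||.||_1
```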
After each iteration, the updates are merged according to their respective weights α and 1 − α; this process is called proximal averaging (PA) [9]. Finally, the algorithm is stopped once a convergence criterion is met. Unlike for the lasso, the PGD update for the standardized group lasso, which is

b̂ = argmin_b 1/2 ‖y − Xb‖₂² + λ Σ_{l=1}^{L} √p_l ‖X^(l) b^(l)‖₂, (4)

has not been published yet. To solve this, we followed the technique described in [5] and adapted it to fit the PGD framework, resulting in a joint update of the regression coefficients with respect to step width t. This approach consists, at its core, of (compact) singular value decompositions of the matrices X^(l) and a subsequent transformation into orthogonal variables. The transformation leads to the GL with PGD update according to [8], [10]. Eventually, a back transformation yields the desired update in iteration m + 1. Assuming that each X^(l) has full column rank, the inverse matrix in (6) exists; otherwise, a modified approach becomes necessary, which is described in the next section. A proper penalty parameter λ is commonly determined via a grid search. All of the penalty functions considered in this paper share the property that if λ exceeds a certain threshold, say λ_max, then the estimates of the regression coefficients no longer change after a single iteration, i.e., b̂¹ = b̂² = .... Thus, the implemented grid search is based on a logarithmic scale from λ_max to 0.001 · λ_max, where the upper value λ_max was determined from the corresponding update formula by substituting b̂⁰ = b̂¹ = 0 and solving for λ.
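The averaging step and the logarithmic penalty grid can be sketched as follows (a simplified illustration with names of our choosing; the actual seagull internals may differ):

```python
import numpy as np

def proximal_average(prox_lasso, prox_sgl, alpha):
    """Merge the two sub-problem updates according to their
    respective weights alpha and 1 - alpha (proximal averaging)."""
    return alpha * prox_lasso + (1.0 - alpha) * prox_sgl

def lambda_grid(lam_max, n_points=50, ratio=1e-3):
    """Logarithmically spaced penalty grid from lam_max down
    to ratio * lam_max (here 0.001 * lam_max, as in the text)."""
    return np.geomspace(lam_max, ratio * lam_max, n_points)
```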

Alteration for Rank Deficiencies
In the case that for at least one group l the matrix X^(l) does not have full column rank, the matrix X^(l)ᵀ X^(l) is not invertible. In [5], the authors proposed regularization by a positive value δ²_l, so that X^(l)ᵀ X^(l) + δ²_l I^(l) replaces X^(l)ᵀ X^(l). Here, I^(l) is the identity matrix with dimensions p_l × p_l. This treatment alters the initial optimization problem (4) to a modified problem (7), where the degrees of freedom of group l are

df_l = Σᵢ d²_{l,i} / (d²_{l,i} + δ²_l), (8)

with d_{l,i} being the i-th singular value of X^(l). Note that (7) coincides with (4) if δ_l = 0. By introducing the augmented matrices X̃^(l) = (X^(l)ᵀ, δ_l I^(l))ᵀ with dimensions (n + p_l) × p_l, equation (7) can be rewritten as (9). Since the introduced matrices X̃^(l) have full column rank, the PGD update of (9) is achieved by (5) and (6) with substitution of the corresponding components.
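Assuming the augmentation takes the standard stacked form described above, the construction and the degrees-of-freedom weight of (8) can be sketched as (illustrative helper names, not seagull code):

```python
import numpy as np

def augment_group(X_l, delta_l):
    """Stack delta_l * I under X^(l); the result has full column
    rank for delta_l > 0 even when X^(l) itself does not."""
    p_l = X_l.shape[1]
    return np.vstack([X_l, delta_l * np.eye(p_l)])

def degrees_of_freedom(X_l, delta_l):
    """df_l = sum_i d_i^2 / (d_i^2 + delta_l^2), with d_i the singular
    values of X^(l); equals p_l when delta_l = 0 and X^(l) has full rank."""
    d = np.linalg.svd(X_l, compute_uv=False)
    return np.sum(d**2 / (d**2 + delta_l**2))
```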

Final Algorithm for fitSGL
We implemented the proximal-averaged gradient descent scheme for the fitSGL by incorporating an acceleration (in terms of u) based on [7] (step 7 in Algorithm 1). We used warm starts to perform a grid search over consecutive values of the penalty parameter λ. The final algorithm is termed proximal-averaged accelerated gradient descent (PA-AGD).

4: Calculate the j-th element of P_t f₁ (the proximal gradient descent update of the lasso): [P_t f₁]_j = sign(r^m_j)(|r^m_j| − tλ)₊
5: Calculate P_t f₂ (the proximal gradient descent update of the standardized group lasso) via equation (5)
6: Merge the two updates via proximal averaging: b̂^(m+1) = α P_t f₁ + (1 − α) P_t f₂

Application to Genome-Based Association Analysis
Trait expression is often associated with genetics. The link between genetic and phenotypic variation can be elucidated with genome-based association analysis, for which molecular markers provide useful information. The most common form of a molecular marker is the single nucleotide polymorphism (SNP). Such a marker bears only two variants, leading to three different genotypes at each site in a diploid organism. Then, in a genome-wide regression model as in (1), the phenotype y is regressed onto the observed genotype at p SNPs distributed over the whole genome. Hence, x_ij ∈ {0, 1, 2} for individual i = 1, ..., n and SNP j = 1, ..., p. Linkage and linkage disequilibrium (LD) between markers can cause extremely high correlation among predictor variables, which typically exhibit a block structure. Thus, the challenge of a genome-wide regression approach is to identify the causal sites in genomic regions of high LD. Based on simulated data, we compared the performance of fitSGL to the lasso, GL, SGL, and EN, which are often used in animal and plant breeding (e.g., [11], [12]).

Simulation Study
We conducted an extensive simulation study to evaluate characteristics of the proposed fitSGL. The data resembled a dairy cattle population for which genome-based evaluations drive the breeding success.
To generate a realistic amount of LD between SNPs, genotype data were simulated with the software AlphaSim version 1.05, a software suited to breeding populations and now integrated in the R package "AlphaSimR" [13]. We let the software simulate two chromosomes with a length of 100 centimorgan each. Each of the chromosomes consisted of 1,660 SNPs, giving a total of p = 3,320. As proposed by default, 6 consecutive generations were simulated. Each generation consisted of 200,000 individuals, half of which were females. After random mating in generations 1-3, 200 males with high performance were mated to 1,000 dams in each of the generations 4 and 5. This mating scheme led to a half-sib family structure which is typical in livestock. We then used the data of the last two generations to set up 100 experiments for a comprehensive statistical analysis. For each experiment, we randomly picked 10 out of the 200 sires from generation 5. The corresponding 10,000 offspring in generation 6 were split into training and validation sets. In scenario (A), where p > n, the training data consisted of 1,000 individuals (100 progeny of each sire). The remaining 9,000 half sibs formed the validation data. In scenario (B), where p < n, the roles of training and validation data were reversed, i.e., the training data consisted of 9,000 individuals, whereas the validation data were formed by the remaining 1,000 offspring. Further, an additional set of 10,000 half sibs was simulated for each experiment. This set was split into 9,000 and 1,000 individuals as well, which served as independent test sets for scenarios (A) and (B), respectively.
In order to simulate the vector of features b, we assumed that genetic effects appear in groups of highly correlated SNPs. We first picked all 200,000 individuals of generation 5 and clustered the genotypes with respect to LD using the R package "BALD" version 0.2.1, as in [14]. This led to a total of 106 and 268 groups for the first and second chromosome, respectively. We then randomly picked groups from both chromosomes, either 1, 3, or 9 from each. These groups were allowed to harbor quantitative trait loci (QTL) with non-zero effect on y. The features corresponding to all remaining groups were set to 0. Furthermore, we divided the scenarios according to the proportion of simulated QTL, i.e., we allowed either 1/3, 2/3, or all of the SNPs inside the QTL groups to have a non-zero effect. The effects were sampled either from a Gamma distribution with shape parameter 0.42, rate (0.42 · n_QTL)^(1/2) (with n_QTL the total number of simulated QTL), and randomly drawn sign, or from a Normal distribution with mean 0 and variance (0.99 · n_QTL)^(-1).
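The Gamma-based effect sampling can be sketched as follows (a hedged illustration; the helper name is ours, and note that numpy parameterizes the Gamma by scale, the reciprocal of the rate):

```python
import numpy as np

def sample_qtl_effects(n_qtl, rng=None):
    """Sample QTL effects from Gamma(shape=0.42, rate=sqrt(0.42*n_qtl))
    and attach a randomly drawn sign, as described in the text."""
    rng = np.random.default_rng(rng)
    rate = np.sqrt(0.42 * n_qtl)
    effects = rng.gamma(shape=0.42, scale=1.0 / rate, size=n_qtl)
    signs = rng.choice([-1.0, 1.0], size=n_qtl)
    return effects * signs
```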
An individual's GBV was determined by its genotype and the vector of features, i.e., a = Xb. We then simulated the phenotypes of each offspring by adding a residual to the GBV. The ratio of the variance of the GBV (σ²_a) to the phenotypic variance (σ²_y) constitutes heritability (h²). For a range of h² ∈ {0.1, 0.3, 0.5}, the variance of the error term e was determined by σ²_e = σ²_a (1 − h²)/h². Due to the simulated family stratification, we applied a family-wise centering of the genotypic and phenotypic data in each experiment prior to the evaluation.
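The implied residual variance follows directly from h² = σ²_a / (σ²_a + σ²_e); as a one-line sketch (hypothetical helper name):

```python
def residual_variance(var_gbv, h2):
    """Residual variance implied by heritability h2 = var_a / var_y
    with var_y = var_a + var_e, i.e. var_e = var_a * (1 - h2) / h2."""
    return var_gbv * (1.0 - h2) / h2
```

For example, a heritability of 0.1 requires a residual variance nine times the genetic variance.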

Evaluation Criteria
Since EN, SGL, and fitSGL are all convex combinations of two penalties, the mixing parameter α needed to be specified in advance; it was set to 0.5. Each regularization path consisted of estimated features alongside 50 values of the penalization parameter λ. The solutions were estimated using the training data. The criterion to select one of the 50 solutions was the minimal mean squared error (MSE) of predicted GBVs in the corresponding validation data. The performance of the methods was evaluated in terms of the precision of the predicted GBVs X b̂, i.e., the correlation of predicted GBVs and simulated phenotypic values within the independent test set.
Additionally, we assessed the quality of fitSGL and comparative methods with respect to the ability to detect trait-associated sites on the genome. Once the solution with minimal MSE was determined for each experiment and in each scenario, we calculated the sensitivity, the specificity, the positive predictive value (PPV), the negative predictive value (NPV), and the accuracy (ACC). All these measures were based on the determination of true and false positives (TP, FP), and true and false negatives (TN, FN).
Due to the proximity of SNPs, locally strong correlations among predictor variables can lead to a site being identified as putatively causal merely because the SNP was in high LD with a simulated QTL. The computation of the binary measures (TP, FP, TN, FN) needs to account for this association phenomenon. Two options are obvious, similar to [14]. First, the proximity-associated SNP: We considered a small interval around every QTL, i.e., the QTL itself and two SNPs to the left and to the right. If an algorithm identified any SNP from inside this window as causal, we counted this as a true positive. Second, the group-associated SNP: Since the simulation of QTL was already based on prior grouping of SNPs, we defined a result as a true positive if an algorithm correctly identified any SNP of the group in which a QTL was located. Fig. 1 gives a visual impression of both approaches.
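Both decision rules can be sketched as simple predicates (hypothetical helper names; the two-SNP window on each side is taken from the text):

```python
def proximity_true_positive(selected, qtl_index, window=2):
    """Proximity-associated rule: a QTL counts as a true positive if
    any selected SNP index lies within `window` positions of it."""
    return any(abs(j - qtl_index) <= window for j in selected)

def group_true_positive(selected, qtl_group):
    """Group-associated rule: a true positive if any SNP of the
    group harboring the QTL was selected."""
    return any(j in qtl_group for j in selected)
```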
At last, we compared the ability of all methods to correctly identify the best performing individuals. For that we took the top 10% performing individuals based on the simulated and predicted GBV and determined the intersection of both sets.
The analyses were performed with R version 4.1.0 [15].

Impact of Mixing Parameter
We selected a single simulation scenario to investigate the influence of the choice of α on performance. This particular setting closely resembled the dairy trait "milk protein percentage" with QTL information available from the Cattle QTL database [16]. The QTL distribution of this trait approximately corresponded to the simulated setting where 3 groups of QTL were present per chromosome and 1/3 of the SNPs within such QTL groups had a simulated non-zero effect. As heritability we chose 0.3. We let α ∈ {0.1, 0.5, 0.9}.

Scalability
We analyzed real dairy cattle data from [17], retrieved from Dryad https://doi.org/10.5061/dryad.cs133, to demonstrate the scalability of "seagull" version 1.1.0 and how its computation times compare to the established R package "glmnet" version 2.0-18. The dataset consisted of marker genotypes at p = 164,312 sites distributed over 29 chromosomes for n = 1,092 individuals. As phenotype we used the fat-percentage average from day 1 to day 305 of lactation. All methods except the lasso and EN required grouping of the genotypic data. We calculated the squared correlation between SNP genotypes, i.e., r²_ij = corr(x_·i, x_·j)² for i, j = 1, ..., p, and used this as a measure of similarity among SNPs. Grouping was performed using the R package "adjclust" version 0.6.3, which is the follow-up implementation of "BALD". We selected L = 200 groups per chromosome, similar to our simulation study with 106 and 268 groups of SNPs on the first and second chromosome, respectively, in the generation of ancestors (Section 2.6). Additionally, we reduced the number of groups by an order of magnitude to examine its impact on computation time.
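The similarity measure itself (independent of the clustering step) can be sketched as follows (illustrative only; not the internal code of "adjclust"):

```python
import numpy as np

def ld_similarity(X):
    """Squared pairwise Pearson correlation between SNP genotype
    columns of X (individuals x SNPs), used as similarity among SNPs."""
    return np.corrcoef(X, rowvar=False) ** 2
```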

RESULTS
The presented results are based on the 27 scenarios where effects were simulated via a Gamma distribution. The results based on a Normal distribution were mainly similar. Among all algorithms, the GL was most sensitive to this change of effect sampling. In particular, the GL showed high precision of predicted GBVs if effects were sampled from a Normal distribution and when every SNP within QTL groups had a non-zero effect.

Precision of Prediction
With respect to the precision of GBV prediction, the novel method fitSGL outperformed the lasso in all 27 settings for scenario (A) and in 20 settings for scenario (B). The maximum improvement of the mean correlation between simulated phenotypic values and predicted GBVs over 100 experiments was 2.81% and 0.20% for (A) and (B), respectively. However, compared to the lasso, the fitSGL lost its advantage in scenario (B) with increasing heritability.
The fitSGL outperformed EN in all 27 settings for (A) and in 26 settings for (B), with maximum average improvements of 2.34% and 0.25%, respectively. In (B), EN performed better than fitSGL on average only in the case where a single group of simulated QTL was present and the QTL coverage within this group was 100%.
The fitSGL delivered improved results compared to GL in 24 settings for (A) and 27 settings for (B). The respective improvements in mean correlation went up to 10.76% for (A) and 3.82% for (B). The scenarios where GL performed better than fitSGL had either 3 or 9 groups of simulated QTL per chromosome, or a heritability of 10%.
Compared to SGL, the fitSGL had higher correlations in 21 settings for both (A) and (B), with maximum average improvements of 5.79% and 2.09%, respectively. We found a gradual decrease of fitSGL's advantage if either the heritability decreased or the total number of simulated QTL increased. Fig. 2 shows means and standard errors of the deviation (in percent) of the correlations between simulated phenotypic value and predicted GBV of the lasso, GL, SGL, and EN from those of fitSGL for 9 out of 27 simulation settings. A negative value indicates an improvement in the precision of prediction with fitSGL compared to another method; thus, the displayed means reflect tendencies. Fig. 3 shows average values of sensitivity, specificity, PPV, NPV, and ACC over 100 experiments for 3 different settings in each category (A) and (B), using proximity-based measures; the larger any of these values, the better. We chose radar plots to visualize these measures, as the total area covered within such a plot gives an impression of a method's overall performance. We noticed that, even though EN was not superior on any single criterion, its overall performance was very good, as indicated by the area it covered within each radar plot of Fig. 3. Based on proximity measures, the area covered by EN was larger than that of any other method in the vast majority of cases, although the differences between EN and the lasso were marginal. If the covered area was based on group measures, however, the lasso was the peak performer compared to all other methods.

Binary Statistical Measures
In general, by comparing average TP, FN, TN, and FP, we observed a superiority of fitSGL for TP and FN but an inferiority with respect to TN and FP. With proximity-based measures, fitSGL showed both the highest means for TP and the lowest means for FN in 25 settings of category (A) and in 23 settings of category (B). When group-based measures were used, these values changed to 24 and 22, respectively. Since the sensitivity is calculated from TP and FN, the respective results were very similar.
Furthermore, we observed that the smaller the proportion of simulated QTL was, the higher the chances for another method were to perform just as well as fitSGL.
Though results for NPV were very similar to those of sensitivity, we observed different patterns for specificity, PPV, and ACC. In the case of proximity-based measures, the lasso performed better than any other method. With group-based measures, best results were obtained with the lasso and SGL.

Identification of Best Performing Individuals
In scenario (A), fitSGL correctly identified most of the 10% best performing individuals based on their predicted GBV in 18 out of 27 settings. In 15 out of these 18 settings either only a single group or 3 groups of simulated QTL were present on each chromosome. SGL outperformed all other methods in 8 settings, and the lasso in a single setting. The two best performing methods were SGL and fitSGL, where the average overlap of fitSGL was at least 2.2% greater than that of SGL in settings with less than 9 groups of simulated QTL per chromosome. In settings with 9 groups of QTL both methods performed almost equally well. Similarly, fitSGL performed better than any other method in 13 out of 27 settings in scenario (B). In only 2 out of these 13 settings, 9 groups of QTL were simulated on each chromosome. The lasso reached peak performance in 9 settings and SGL in 5. All of SGL's best performances were found in settings with 9 groups of QTL per chromosome but the average overlap of fitSGL and SGL was the same. In settings with either 1 or 3 groups of simulated QTL per chromosome, the lasso and fitSGL performed best and with equal overlap. The proportion of correctly assigned top 10% performing individuals ranged from 63.5% to 95.2% (lasso), from 65.6% to 93.1% (SGL), and from 63.7% to 95.3% (fitSGL).
Figures displaying the average performance for the remaining settings are provided in Supplementary Files 2 and 3.

Impact of Mixing Parameter
FitSGL outperformed EN and SGL with respect to MSE and correlations of predicted GBVs independently of the choice of α. However, the relative difference between methods gradually diminished as increasing α approached the lasso penalty. For example, in scenario (A) with α = 0.1, fitSGL performed on average 1.92% and 4.48% better than EN and SGL, respectively. With α = 0.5, these values changed to 0.56% and 4.16%, and finally to 0.09% and 0.31% for α = 0.9. With respect to the binary statistical measures, we noticed the tendency that as α increased, the numbers for TN and FN also increased, whereas the numbers for TP and FP decreased; hence, the sensitivity decreased and the specificity increased. These observations were independent of the n/p ratio and of whether proximity- or group-based measures were considered. Furthermore, we observed that the identification of the best performing individuals improved with increasing α. In scenario (A), for instance, the average overlap of correctly identified individuals for EN started at 67.9%, increased to 73.0%, and further to 73.8% for the largest α. The respective values for SGL were 64.9%, 67.4%, and 73.2%; for fitSGL, 70.8% were observed for α = 0.1 and 74.0% for both α = 0.5 and α = 0.9. However, in scenario (A), we found the peak performance of fitSGL with respect to the MSE and the correlation of predicted GBVs at α = 0.5.

Scalability
The lasso was the only algorithm available in both packages seagull and glmnet, allowing a direct comparison. The time to calculate the full regularization path was 24 seconds for glmnet and 2 hours and 27 minutes for seagull. EN from glmnet also required 24 seconds. The remaining methods from seagull, i.e., GL, SGL, and fitSGL, needed 1 h 50 min, 2 h 6 min, and 1 h 3 min, respectively, when 200 groups of SNPs were present per chromosome. However, with 20 groups, the respective numbers changed to 1 h 53 min, 2 h 8 min, and 10 h 14 min. Thus, the computational time of GL and SGL apparently did not depend on the number of groups per chromosome, whereas fitSGL was highly sensitive to it. FitSGL relies strongly on matrix algebra within groups, but there is another major factor influencing the speed of calculations within the seagull package: acceleration was implemented for fitSGL (step 7 in Algorithm 1) but not for the lasso, GL, and SGL algorithms. This explains, first, why the fitSGL with 200 groups per chromosome ran faster than the other methods from seagull and, second, why the other algorithms from seagull ran slower than the lasso and EN from glmnet. Further examples of the scalability of seagull in real data applications can be found in [18].

DISCUSSION
We introduced fitSGL as a novel penalization approach for estimating the vector of features b. FitSGL was designed for correlated predictor variables which need to be grouped in advance. As an example, we verified its ability to predict genetic values of not-yet phenotyped individuals and to detect genomic regions associated with trait expression. We inspected its performance based on simulated data and observed that fitSGL was a competitive approach in many respects.
Just like EN and SGL regularization, the penalty of fitSGL is a convex combination of two terms, which are linked via the parameter α. Based on [19], we set this parameter to 0.5 for all of these methods to balance between the estimation and prediction error. We investigated the impact of this mixing parameter using a representative setting. If high accuracy in predicting the future performance of an individual and high rates of correctly identified best performing individuals were desired, then larger values of α were favorable, i.e., α ≥ 0.5.
To substantiate our objectives, fitSGL was evaluated with respect to the two major criteria: (i) its precision of predicting the individuals' performance and, consequently, its potential to identify selection candidates for breeding objectives, and (ii) the ability to detect the trait-associated sites. The first criterion would favor a larger weight on the penalty that harbors Xb, whereas for the second criterion a larger weight on the penalty for b would be advisable. By setting a to 0.5 for the fitSGL to support both perspectives equally, a strong competition is introduced between sparsity on the level of b and a more subtle sparsity on the level of Xb. As a direct consequence, the false positive rate is tremendously increased compared to methods which introduce sparsity only on the level of b.
We observed that the new method outperformed all other methods with respect to the above mentioned evaluation criteria (i) and (ii), whenever the simulated signal, i.e., the number of causal variants, was very sparse, indicating a strong dependence on genetic architecture. The sensitivity of the estimation process towards single signals can be adjusted through a. However, this aspect requires more research in the future.
Another parameter to be mindful of is the regularization parameter d l from (7). In [5], it was suggested to set this parameter so that the degrees of freedom in (8) are equal among groups. However, in fitSGL this parameter is solely required for groups with rank deficiency. The above suggestion might not fit perfectly. It would result in the same weight for every such group, and thus potentially lead to poor interpretability when compared to any group with full rank. Instead, since the columns of X were initially standardized to gain independence of the scale of X, we suggest d l ¼ 1 as a scale-free solution.
Furthermore, it is necessary to specify the step width t. This parameter determines the width between consecutive iterations. If too large a value is chosen, there is a chance that at some point the algorithm jumps over the optimal solution. If chosen too small, the changes of the solution from one iteration to the next might be small enough to indicate convergence, even though the solution is still far from optimal. During preliminary investigations, we tested different values of t and found that values ≥ 1 caused unstable behaviour of the MSE. Thus, we propose reducing t by an order of magnitude, i.e., t = 0.1.
GL, SGL, and fitSGL require additional grouping information, which could be obtained with one of various clustering algorithms. In genome-based evaluations, the ordering of predictor variables is determined by the physical coordinates of markers on a chromosome. Hence, we applied adjclust, a hierarchical clustering procedure that allows only adjacent clusters to merge. The outcome is a tree structure, but we selected a fixed number of groups per chromosome in the real data analysis only to demonstrate the scalability of the methods. In a practical application, however, the optimal number of groups should be selected based on an objective criterion, such as the gap statistic (as implemented in the BALD package) or slope heuristics ("capushe", also available in adjclust).

CONCLUSION
We implemented a lasso-type penalization approach, which not only accounts for sparsity of signals but also of fitted values. As fitted values are of particular importance for breeding applications, we validated the new approach "fitSGL" for its use in a genome-wide regression analysis. If only few regions associated with trait expression exist on the genome, our method proved beneficial, especially if p > n. The lower the impact of regressors on trait expression is, the more difficult it is to identify the causal signals per se. FitSGL performed best under such circumstances. In other investigated scenarios, the novel method was still competitive to the other penalization approaches, often being closest in its performance to the sparse-group lasso. We extended our R package "seagull" (available at CRAN) to include fitSGL.