
Performance indicators associated with match outcome within the United Rugby Championship

Open Access. Published: December 02, 2022. DOI: https://doi.org/10.1016/j.jsams.2022.11.006

      Abstract

      Objectives

      The aims of this study were to: i) identify performance indicators associated with match outcomes in the United Rugby Championship; ii) compare the efficacy of isolated and relative datasets to predict match outcome; and iii) investigate whether reduced statistical models can reproduce predictive accuracy.

      Design

      Retrospective analysis of key performance indicators in the United Rugby Championship.

      Methods

      Twenty-seven performance indicators were selected from 96 matches (2020–21 United Rugby Championship). Random forest classification was completed on isolated and relative datasets, using a binary match outcome (win/lose). Maximum relevance, minimum redundancy feature selection was used to reduce the models. In addition, the models were tested on 53 matches from the 2021–22 season to ascertain prediction accuracy.

      Results

      Within the 2020–21 datasets, the full models correctly classified 83% of match performances for the relative dataset and 64% for the isolated dataset; the equivalent reduced models classified 85% and 66%, respectively. The reduced relative model correctly predicted 90% of match performances in the 2021–22 season, and five of its performance indicators were significant: kicks from hand, metres made, clean breaks, turnovers conceded and scrum penalties.

      Conclusions

      Relative performance indicators were more effective in predicting match outcomes than isolated data. Reducing the features used in random forest classification did not degrade prediction accuracy, whilst also simplifying interpretation for practitioners. Increased kicks from hand, metres made and clean breaks compared to the opposition, as well as fewer scrum penalties and turnovers conceded, were all indicators of winning match outcomes within the United Rugby Championship.

      Abbreviations:

      MDA (mean decrease accuracy), MRMR (maximum relevance, minimum redundancy), OOB (out of bag), PI (performance indicator), RFC (random forest classification)

      Keywords

      Practical implications

      • An effective kicking approach is essential to winning performances, including an understanding of the opposition's strategy.
      • Datasets with context of the opposition should be used to interpret performance post-match.
      • Identifying a smaller set of uncorrelated PIs using feature selection methods can be as efficient for monitoring as using a full dataset.

      1. Introduction

      When quantifying success within Rugby Union, performance indicators (PIs) can be used to investigate and infer the key processes that underpin winning performances. This approach has been studied in Rugby Union, including within the English Premiership and at international level, with most studies focussed on the predictive ability of PIs without consideration of the opposition's performance [Colomer C.M.E., Pyne D.B., Mooney M., et al., Performance analysis in rugby union: a critical systematic review].
      In these studies, numerical techniques such as supervised machine learning [Bennett M., Bezodis N., Shearer D.A., et al., Descriptive conversion of performance indicators in rugby union; Bennett M., Bezodis N., Shearer D.A., et al., Predicting performance at the group-phase and knockout-phase of the 2015 Rugby World Cup; Mosey T.J., Mitchell L.J.G., Key performance indicators in Australian sub-elite rugby union] and a range of hypothesis tests [Bunker R.P., Spencer K., Performance indicators contributing to success at the group and play-off stages of the 2019 rugby world cup; Hughes A., Barnes A., Churchill S.M., et al., Performance indicators that discriminate winning and losing in elite men’s and women’s rugby union; Bishop L., Barnes A., Performance indicators that discriminate winning and losing in the knockout stages of the 2011 rugby World Cup] were used to compare PIs between winning and losing team performances. For example, winning teams in World Cup matches tend to win opposition lineouts [Hughes A., Barnes A., Churchill S.M., et al., Performance indicators that discriminate winning and losing in elite men’s and women’s rugby union], gain more metres [Bunker R.P., Spencer K., Performance indicators contributing to success at the group and play-off stages of the 2019 rugby world cup], kick out of hand more [Bishop L., Barnes A., Performance indicators that discriminate winning and losing in the knockout stages of the 2011 rugby World Cup] and concede most of their penalties between their opposition's 50 and 22 m [Bishop L., Barnes A., Performance indicators that discriminate winning and losing in the knockout stages of the 2011 rugby World Cup]. In contrast, losing teams carry less and have lower lineout success [Bunker R.P., Spencer K., Performance indicators contributing to success at the group and play-off stages of the 2019 rugby world cup].
      Other studies have also developed more complex models, using methods such as principal component and discriminant analyses to interpret PIs and other areas of performance such as training effects and physical markers [Ortega E., Villarejo D., Palao J.M., Differences in game statistics between winning and losing rugby teams in the Six Nations Tournament; Pino-Ortega J., Rojas-Valverde D., Gómez-Carmona C.D., et al., Training design, performance analysis, and talent identification—a systematic review about the most relevant variables through the principal component analysis in Soccer, Basketball, and Rugby].
      Additionally, Ortega et al. reported results similar to those of studies using random forest [Bennett M., Bezodis N., Shearer D.A., et al., Descriptive conversion of performance indicators in rugby union; Bennett M., Bezodis N., Shearer D.A., et al., Predicting performance at the group-phase and knockout-phase of the 2015 Rugby World Cup; Mosey T.J., Mitchell L.J.G., Key performance indicators in Australian sub-elite rugby union], whilst also reporting higher average values for mauls won and line breaks in winning teams [Ortega E., Villarejo D., Palao J.M., Differences in game statistics between winning and losing rugby teams in the Six Nations Tournament].
      Other factors that have been reported to affect game performance include match location [Vaz L., Carreras D., Kraak W., Analysis of the effect of alternating home and away field advantage during the Six Nations Rugby Championship] and the stage of competition [Bishop L., Barnes A., Performance indicators that discriminate winning and losing in the knockout stages of the 2011 rugby World Cup].
      Recently, several articles have focussed on performance indicators that take the opposition into account, with team PIs relativised to reflect the differences between the two teams within a match. For example, if one team made 100 passes and the opposition made 150 passes, the relativised passes would be −50 and 50 for each team, respectively [Bennett M., Bezodis N., Shearer D.A., et al., Descriptive conversion of performance indicators in rugby union].
      Adopting this simple mathematical process, Bennett et al. [Bennett M., Bezodis N., Shearer D.A., et al., Descriptive conversion of performance indicators in rugby union; Bennett M., Bezodis N., Shearer D.A., et al., Predicting performance at the group-phase and knockout-phase of the 2015 Rugby World Cup] reported that many relative variables had significant relationships with match outcome in both the English Premiership and the 2015 World Cup. These included kicks from hand, clean breaks and average carry distance, and similar outcomes were established by Mosey and Mitchell within sub-elite Australian Rugby [Mosey T.J., Mitchell L.J.G., Key performance indicators in Australian sub-elite rugby union].
      Interestingly, Bennett et al. [Bennett M., Bezodis N., Shearer D.A., et al., Descriptive conversion of performance indicators in rugby union] identified a clear improvement in match prediction when relative data were used in place of standard PIs, whereas Mosey and Mitchell [Mosey T.J., Mitchell L.J.G., Key performance indicators in Australian sub-elite rugby union] reported no clear improvement.
      To date, feature selection methods such as maximum relevance, minimum redundancy (MRMR) have not been used to develop performance prediction models, particularly in the United Rugby Championship (URC) or its predecessors (PRO14 and PRO12). Consequently, previous studies report large groups of PIs, which can be complex to interpret and difficult to implement within training or tactical strategies. Statistically optimising PI selection with reduced datasets has the potential to simplify dissemination and maximise uptake by sporting practitioners.
      The primary aims of this study were to: i) identify performance indicators associated with match outcomes in the United Rugby Championship; ii) compare the efficacy of isolated data and data relative to the opposition in predicting match outcome; and iii) investigate whether reduced PI statistical models can reproduce predictive accuracy.

      2. Methods

      2.1 Data selection

      Within the 2020–21 URC season, PIs for 96 regular matches were downloaded from OPTA (www.optaprorugby.com). Finals and knockout matches (n = 1 in this season, owing to the competition structure) were excluded, as they may differ from regular matches; previous research has analysed different stages of competition separately [Bennett M., Bezodis N., Shearer D.A., et al., Predicting performance at the group-phase and knockout-phase of the 2015 Rugby World Cup; Bunker R.P., Spencer K., Performance indicators contributing to success at the group and play-off stages of the 2019 rugby world cup].
      There has been no published analysis of the reliability of OPTA data for Rugby Union, but OPTA data within football have been shown to have high reliability, with kappa values of 0.92–0.94 [Liu H., Hopkins W., Gómez A.M., et al., Inter-operator reliability of live football match statistics from OPTA Sportsdata].
      Rugby Union data from OPTA are used by major clubs and broadcasters worldwide. The following 27 PIs were tabulated from each team's match summaries: carries, metres made, defenders beaten, offloads, passes, tackles, missed tackles, turnovers conceded, kicks from hand, clean breaks, turnovers won, lineouts won, lineouts lost, scrums won, scrums lost, rucks won, rucks lost, penalties conceded, free kicks, scrum penalties, lineout penalties, tackle/ruck/maul penalties, general play penalties, control penalties, yellow cards, red cards and home/away status. These PIs were selected as they formed the match report statistics at the time of download, thereby encompassing many areas of the game.
      The 27 PIs formed the isolated data, whereas the relative data were calculated as the difference in each PI between the two teams in each match. For example, if team A made 200 m and team B made 400 m, the relative metres made for each team would be −200 and 200, respectively. A naming convention was used to identify the group to which each feature belongs: PII indicates a PI in its isolated form and PIR a PI in its relative form. For example, CarriesI refers to isolated carries and CarriesR to relative carries.
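      The relativisation step is a per-match subtraction, so the two teams' relative values always sum to zero. The following is a minimal Python sketch, assuming each match is held as a dictionary that maps the two team names to their isolated PI values; the data layout and the function name `relativise` are hypothetical, not the authors' code.

```python
def relativise(match: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Convert isolated PIs for one match into relative PIs.

    `match` maps exactly two team names to {PI name: isolated value}
    (a hypothetical layout). Each relative PI is the team's value minus
    the opposition's value. Relative PI names get an "R" suffix,
    following the PII / PIR naming convention.
    """
    if len(match) != 2:
        raise ValueError("a match must contain exactly two teams")
    (team_a, pis_a), (team_b, pis_b) = match.items()
    return {
        team_a: {f"{pi}R": pis_a[pi] - pis_b[pi] for pi in pis_a},
        team_b: {f"{pi}R": pis_b[pi] - pis_a[pi] for pi in pis_b},
    }


# The two worked examples from the text: metres made and passes.
example = {
    "Team A": {"Metres Made": 200, "Passes": 100},
    "Team B": {"Metres Made": 400, "Passes": 150},
}
print(relativise(example))
# {'Team A': {'Metres MadeR': -200, 'PassesR': -50}, 'Team B': {'Metres MadeR': 200, 'PassesR': 50}}
```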

      2.2 Approach

      Random forest classification (RFC) [Breiman L., Random forests] was completed on the full dataset for both isolated and relative data to classify match performances as wins or losses. Each of the 27 PIs represents a feature in the RFC; the combination of all PIs across the matches considered forms the feature space of the algorithm. The RFC process interrogates this feature space to generate ensemble decisions that classify each performance into one of the binary win/lose outcomes.
      The method builds an ensemble of classification trees, each grown on a new training set drawn with replacement from the original sample [Friedman J., Hastie T., Tibshirani R., Random forests, chapter 15]. Each such training set contains, on average, about two thirds of the distinct observations in the full dataset; the remaining observations form the out-of-bag (OOB) test set for that tree, and the tree was then tested on its OOB set [Friedman J., Hastie T., Tibshirani R., Random forests, chapter 15]. From this set, the error rate (the number of incorrect predictions divided by the total number of predictions) was recorded [Breiman L., Random forests]. This was repeated for every tree built, and the results were combined across trees to give the OOB error for the random forest model [Breiman L., Random forests].
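      The two-thirds figure follows from sampling with replacement: the chance that a given observation is never drawn in n draws is (1 − 1/n)^n, which approaches e^−1 ≈ 0.368, so roughly one third of observations are out of bag for each tree. A short illustrative Python sketch (not the randomForest internals), using n = 192 match performances (96 matches, two teams each):

```python
import math
import random


def expected_oob_fraction(n: int) -> float:
    """Probability that one observation is never drawn in n draws with replacement."""
    return (1 - 1 / n) ** n


def bootstrap_split(n: int, rng: random.Random) -> tuple[list[int], list[int]]:
    """Draw one bootstrap training sample; return (in-bag indices, OOB indices)."""
    in_bag = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]  # never drawn -> OOB test set
    return in_bag, oob


n = 192  # 96 matches x 2 team performances
rng = random.Random(42)
fractions = [len(bootstrap_split(n, rng)[1]) / n for _ in range(1000)]
print(f"expected OOB fraction: {expected_oob_fraction(n):.3f}")  # about 0.367
print(f"simulated OOB fraction: {sum(fractions) / len(fractions):.3f}")
print(f"limit e^-1: {math.exp(-1):.3f}")  # 0.368
```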
      The mean decrease accuracy (MDA) was used as the measure of importance [Breiman L., Random forests]. MDA represents how much model accuracy decreases when the information in a PI is removed by randomly permuting its values, with high values indicating that the PI is relatively more important. For each tree, the OOB prediction error was recorded before and after permuting each PI; the difference was then averaged over all trees and normalised by the standard deviation of the differences [Friedman J., Hastie T., Tibshirani R., Random forests, chapter 15]. The z-scores of the MDA values were then calculated to determine significance. Partial dependence plots (see Fig. 2) were also used to monitor the relationship between match outcome and the features used in modelling [Breiman L., Cutler A., Liaw A., et al., randomForest: Breiman and Cutler’s Random Forest for Classification and Regression. R Package].
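      The permutation logic behind MDA can be sketched in plain Python. This is a simplified illustration under stated assumptions, not randomForest's implementation: each "tree" is represented by a hypothetical prediction function together with its own OOB rows and labels, and the scaled score is the mean per-tree accuracy drop divided by its standard deviation.

```python
import random
import statistics
from typing import Callable, Sequence

Row = list[float]
Tree = tuple[Callable[[Row], int], list[Row], list[int]]  # (predict, OOB rows, OOB labels)


def accuracy(predict: Callable[[Row], int], rows: Sequence[Row], labels: Sequence[int]) -> float:
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def scaled_mda(trees: Sequence[Tree], feature: int, seed: int = 0) -> tuple[float, float]:
    """Return (mean accuracy drop, mean / SD) for one feature.

    For each tree, OOB accuracy is measured before and after randomly
    permuting the feature's column within that tree's OOB rows.
    """
    rng = random.Random(seed)
    drops = []
    for predict, rows, labels in trees:
        base = accuracy(predict, rows, labels)
        column = [r[feature] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the outcome
        permuted = []
        for r, v in zip(rows, column):
            new_row = list(r)
            new_row[feature] = v
            permuted.append(new_row)
        drops.append(base - accuracy(predict, permuted, labels))
    mean = statistics.fmean(drops)
    sd = statistics.stdev(drops) if len(drops) > 1 else 0.0
    return mean, (mean / sd if sd > 0 else 0.0)


# Toy example (hypothetical): the outcome depends only on feature 0; feature 1 is noise.
def rule(row: Row) -> int:
    return 1 if row[0] > 0 else 0


data_rng = random.Random(1)
trees = []
for _ in range(5):
    rows = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(20)]
    trees.append((rule, rows, [rule(r) for r in rows]))

print(scaled_mda(trees, 0))  # clearly positive accuracy drop
print(scaled_mda(trees, 1))  # (0.0, 0.0): permuting noise changes nothing
```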
      Maximum relevance, minimum redundancy (MRMR) was used to simplify the dataset. The PI with the highest mutual information with match outcome is selected first; each successive feature is then selected to maximise its mutual information with match outcome, whilst minimising the mutual information it shares with the features already selected [Sakar C.O., Kursun O., Gurgern F., A feature selection method based on kernel canonical correlation analysis and the minimum redundancy-maximum relevance filter method].
      The mutual information was calculated as:
      $$I(x, y) = -\frac{1}{2}\ln\left(1 - \rho(x, y)^{2}\right)$$


      where I represents the mutual information and ρ the correlation coefficient between features x and y. The Pearson correlation was used between two continuous features, Cramér's V between two binary features, and Somers' Dxy between a continuous and a binary feature [Sakar C.O., Kursun O., Gurgern F., A feature selection method based on kernel canonical correlation analysis and the minimum redundancy-maximum relevance filter method].
      The score q_j for maximising MRMR at each jth step was calculated as:
      $$q_j = I(x_j, y) - \frac{1}{|S|}\sum_{x_k \in S} I(x_j, x_k)$$


      where S is the set of features already selected (those chosen at the steps prior to j) and x_k ranges over the features in S.
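      A compact Python sketch of the greedy MRMR selection defined above. For simplicity it uses the Pearson correlation for every pair, whereas the study used Cramér's V and Somers' Dxy for binary features; the toy data and function names are hypothetical, and this is not the mRMRe package's implementation.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mutual_info(rho: float, eps: float = 1e-12) -> float:
    """I = -1/2 ln(1 - rho^2), clipped so that |rho| = 1 stays finite."""
    return -0.5 * math.log(max(1.0 - rho * rho, eps))


def mrmr(features: dict[str, Sequence[float]], target: Sequence[float], k: int) -> list[str]:
    """Greedy MRMR: relevance to the target minus mean redundancy with selected features."""
    relevance = {name: mutual_info(pearson(x, target)) for name, x in features.items()}
    selected = [max(relevance, key=relevance.get)]  # most relevant feature first
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, -math.inf
        for name, x in features.items():
            if name in selected:
                continue
            redundancy = sum(mutual_info(pearson(x, features[s])) for s in selected) / len(selected)
            score = relevance[name] - redundancy  # q_j
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


outcome = [0, 0, 0, 0, 1, 1, 1, 1]  # lose / win
toy = {
    "a": [1, 2, 1, 2, 5, 6, 5, 6],  # strongly related to the outcome
    "b": [1, 2, 1, 2, 5, 6, 5, 7],  # near-duplicate of "a" (redundant)
    "c": [1, 5, 2, 6, 3, 7, 4, 8],  # weaker, but less redundant with "a"
}
print(mrmr(toy, outcome, 2))  # ['a', 'c']
```

      The redundancy term is what rejects "b": it is almost as relevant as "a", but its near-perfect correlation with "a" gives it a large penalty.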
      An optimisation loop was created to maximise the model accuracy in predicting matches, whilst minimising the features used in modelling. A summary of the steps taken can be seen in Fig. 1.
      Fig. 1. Flow diagram outlining the steps taken within the optimisation loop to maximise model OOB accuracy whilst minimising the number of features used in modelling. A similar loop was used to optimise random forest parameters.
      Once a unique set of features had been chosen for each dataset, the RFC parameters were optimised. The selected features were retained, and only the number of features considered at each split was changed at each iteration. Values from one to the maximum number of features in the model were tested at one-step intervals, and a value was chosen using the same metric as described in Fig. 1. The number of trees was optimised similarly, with only the number of trees changed in each iteration; values between 50 and 2500 were tested at 50-tree intervals. Because the number of trees affects MDA values, the initially optimised number of trees was kept the same in both models to allow comparison when the models were permuted. Modelling and analysis were completed using the following packages in R: randomForest [Breiman L., Cutler A., Liaw A., et al., randomForest: Breiman and Cutler’s Random Forest for Classification and Regression. R Package], rfUtilities [Evans J.S., Murphy M.A., rfUtilities: Random Forest Model Selection and Performance Evaluation. R Package], mRMRe [De Jay N., Papillon-Cavanagh S., Olsen C., et al., mRMRe: an R package for parallelized mRMR ensemble feature selection] and rfPermute [Archer E., rfPermute: Estimate Permutation p-Values for Random Forest Importance metrics. R Package].
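      The parameter search can be sketched as an exhaustive grid over the ranges given above (one to the number of model features for the features tried per split; 50 to 2500 trees in steps of 50). Fig. 1's exact selection rule is not reproduced in the text, so this Python sketch assumes hypothetical OOB accuracies and breaks ties in favour of the simplest candidate (fewest features or trees); it illustrates the loop structure only.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def first_best(candidates: Iterable[T], score: Callable[[T], float]) -> tuple[T, float]:
    """Return the first candidate with the highest score.

    Candidates are assumed to be ordered from simplest to most complex, so the
    strict '>' keeps the simpler option when scores tie.
    """
    best, best_score = None, float("-inf")
    for candidate in candidates:
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score


def mtry_grid(n_features: int) -> range:
    """Features tried at each split: 1 up to the number of model features, step 1."""
    return range(1, n_features + 1)


NTREE_GRID = range(50, 2501, 50)  # 50 to 2500 trees in steps of 50

# Hypothetical OOB accuracies for models using the top-m MRMR-ranked features.
oob_accuracy = {1: 0.70, 2: 0.78, 3: 0.85, 4: 0.85, 5: 0.84}
print(first_best(sorted(oob_accuracy), lambda m: oob_accuracy[m]))  # (3, 0.85)
print(len(NTREE_GRID), list(mtry_grid(7)))  # 50 [1, 2, 3, 4, 5, 6, 7]
```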

      2.3 Model evaluation

      After a final model had been established for each dataset, data were sourced for the opening rounds of the 2021–22 season. These comprised 53 matches up to and including round 10 of the competition, excluding matches postponed due to COVID-19.
      The models were then used to predict the outcomes of these matches. McNemar's test was used throughout for model comparison [De Jay N., Papillon-Cavanagh S., Olsen C., et al., mRMRe: an R package for parallelized mRMR ensemble feature selection].
      The test statistic can be calculated as:
      $$\chi^{2} = \frac{(B - C)^{2}}{B + C}$$


      where B represents the number of outcomes correctly identified by the first model only, and C the number of outcomes correctly identified by the second model only [De Jay N., Papillon-Cavanagh S., Olsen C., et al., mRMRe: an R package for parallelized mRMR ensemble feature selection].
      A 5% significance level was utilised for p-values and 95% confidence intervals were used throughout this study.
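      The McNemar statistic needs only the two discordant counts B and C. Below is a minimal Python sketch with a chi-square (1 df) p-value. The optional continuity correction is included because R's `mcnemar.test` applies one by default (`correct = TRUE`), whereas the formula above is the uncorrected form; which variant was used is not stated here, and the function names are illustrative.

```python
import math
from typing import Sequence


def discordant_counts(correct_1: Sequence[bool], correct_2: Sequence[bool]) -> tuple[int, int]:
    """B: outcomes only model 1 got right; C: outcomes only model 2 got right."""
    b = sum(x and not y for x, y in zip(correct_1, correct_2))
    c = sum(y and not x for x, y in zip(correct_1, correct_2))
    return b, c


def mcnemar(b: int, c: int, continuity: bool = False) -> tuple[float, float]:
    """Return (chi-square statistic, p-value) with 1 degree of freedom."""
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree
    diff = abs(b - c) - (1 if continuity else 0)
    stat = diff ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Per-performance correctness of two hypothetical models.
model_1 = [True, True, True, False, True, False]
model_2 = [True, False, False, False, True, True]
b, c = discordant_counts(model_1, model_2)
print(b, c)  # 2 1
print(mcnemar(10, 2))  # statistic 64 / 12, about 5.33; p < 0.05
```

      Only disagreements between the models contribute; performances both models classify correctly (or both misclassify) carry no information about which model is better.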

      3. Results

      The initial RFC for the 2020–21 season was completed on both datasets. The full isolated model correctly classified 122 of 192 match performances (two per match across 96 matches), giving an accuracy of 64% with a 95% confidence interval (CI) of (56%, 70%). Within this, 66% of wins were correctly classified, compared with 61% of losses.
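      The method behind these accuracy intervals is not stated in this section. As a hedged illustration, an exact (Clopper-Pearson) binomial interval can be computed with the Python standard library by bisection on the binomial CDF; for 122 of 192 it gives roughly (0.56, 0.70), in line with the reported interval, although that agreement does not confirm which method was used.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, lo: float, hi: float, iters: int = 100) -> float:
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for a binomial proportion k / n."""
    p_hat = k / n
    # Lower bound solves P(X >= k) = alpha / 2; P(X >= k) increases with p.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2, 0.0, p_hat)
    # Upper bound solves P(X <= k) = alpha / 2; P(X <= k) decreases with p.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p), p_hat, 1.0)
    return lower, upper


lo, hi = clopper_pearson(122, 192)
print(f"accuracy {122 / 192:.1%}, 95% CI ({lo:.1%}, {hi:.1%})")
```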
      The full relative model correctly classified 159 of 192 match performances (83%, CI (77%, 88%)), with 82% of wins and 83% of losses correctly classified. McNemar's test confirmed that the full relative model outperformed the full isolated model (χ2 = 16.00, p < 0.05).
      The full models for both the isolated and relative sets were then reduced with feature selection, and the RFC parameters were optimised. For the isolated data, the optimal number of features was six: Metres MadeI, Kicks from HandI, Turnovers ConcededI, Scrum PenaltiesI, Turnovers WonI and Lineouts LostI. Using this reduced feature set, the optimal number of trees was 1650 and the optimal number of features tried at each split was five. With these parameters and features, the reduced isolated model correctly classified 126 of 192 match performances (66%, CI (58%, 72%)), including 69% of wins and 63% of losses.
      Within the relative set, optimisation selected seven features for the reduced relative model: Kicks from HandR, Metres MadeR, Scrum PenaltiesR, Scrums LostR, Control PenaltiesR, Turnovers ConcededR and Clean BreaksR. The optimal number of features tried at each split was one. To allow MDA to be compared between models, the number of trees was set to 1650 to match the reduced isolated model. The reduced relative model correctly classified 163 of 192 match performances (85%, CI (79%, 90%)), correctly identifying 84% of wins and 85% of losses. McNemar's test (χ2 = 16.40, p < 0.05) indicated that the reduced relative model outperformed the reduced isolated model.
      Reducing the models did not significantly change their performance: McNemar's test gave χ2 = 0.25 (p > 0.05) for the full versus reduced isolated models and χ2 = 0.75 (p > 0.05) for the full versus reduced relative models.
      Both reduced models were used in prediction on the 2021–22 datasets for URC matches that had been completed at the time of analysis (n = 53). The reduced isolated model accurately predicted 76 out of 106 match performances (72%, CI (62%, 80%)), including 79% of wins and 64% of losses. With the reduced relative model, 95 match performances out of 106 were correctly predicted (90%, CI (82%, 95%)), with 89% of wins and 91% of losses. In prediction, the reduced relative model outperformed the reduced isolated model based on McNemar's test (χ2 = 10.62, p < 0.05).
      Both full models were also used in prediction on the 2021–22 datasets for URC matches. The full isolated model accurately predicted 77 out of 106 match performances (73%, CI (63%, 81%)), including 74% of wins and 72% of losses. With the full relative model, 96 match performances out of 106 were correctly predicted (91%, CI (83%, 95%)), with 91% of wins and 91% of losses. In prediction, the full relative model outperformed the full isolated model based on McNemar's test (χ2 = 24.60, p < 0.05). When the full and reduced models were compared in prediction, there was no evidence of significant differences in performance (χ2 = 0 in both cases, p > 0.05).
      The MDA z-scores for each feature in the reduced models are summarised in Table 1, along with the corresponding p-values. Within the reduced isolated model, only five features were significant at the 5% level: Metres MadeI, Turnovers WonI, Kicks from HandI, Scrum PenaltiesI and Turnovers ConcededI, with MDA z-scores ranging from 21.7 to 12.5.
      Table 1. Mean decrease accuracy (MDA) z-scores and associated p-values for the features of the reduced isolated and relative models.
      Feature                  MDA z-score    p-value
      Reduced isolated model
      Metres MadeI             21.7           0.01
      Kicks from HandI         18.2           0.01
      Turnovers ConcededI      17.3           0.02
      Turnovers WonI           15.9           0.03
      Scrum PenaltiesI         12.5           0.04
      Lineouts LostI           4.5            0.15
      Reduced relative model
      Kicks from HandR         51.6           0.01
      Metres MadeR             25.3           0.01
      Clean BreaksR            25.2           0.01
      Turnovers ConcededR      24.8           0.01
      Scrum PenaltiesR         17.7           0.01
      Control PenaltiesR       1.9            0.36
      Scrums LostR             0.3            0.46
      Within the reduced relative model, five features were significant: Kicks from HandR, Clean BreaksR, Scrum PenaltiesR, Metres MadeR and Turnovers ConcededR. The range of MDA z-scores for significant features was much larger in the relative set (51.6–17.7), primarily because of the magnitude of the MDA z-score for Kicks from HandR.
      Partial dependence plots for all statistically significant features within the reduced relative model, based on MDA z-scores, are presented in Fig. 2. Plots A–C illustrate the partial dependence across the range of Kicks from HandR, Metres MadeR and Clean BreaksR, respectively, all of which were positively associated with winning. Plots D and E show the partial dependence for Turnovers ConcededR and Scrum PenaltiesR, respectively, both of which were negatively associated with winning. The probability of winning does not increase further beyond 10 relative kicks from hand (Fig. 2A) or beyond approximately 300 relative metres made (Fig. 2B). Similarly, there is no further increase beyond 10 additional clean breaks (Fig. 2C), whereas for Scrum PenaltiesR there is no further increase in winning below −5 relative penalties (Fig. 2E).
      Fig. 2. Partial dependence plots for significant relative features (based on mean decrease accuracy z-scores). The plots show the marginal effect of relative kicks from hand (A), relative metres made (B), relative clean breaks (C), relative turnovers conceded (D) and relative scrum penalties (E) on classification of match outcome. The x axis contains the range of values for each of the previously named PIs, and the y axis contains the partial dependence. Partial dependence values range from −1 to 1, with negative values leading to increased likelihood of a match performance being classified as a loss and positive values leading to increased likelihood of a performance being classified as a win.
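      Partial dependence averages a model's prediction over the observed data while one feature is held fixed at each grid value. A minimal Python sketch of this generic definition on the probability scale, assuming a hypothetical `predict_win_prob` function and toy data; according to its documentation, randomForest's `partialPlot` instead averages a centred log of class vote fractions for classification, so its y-axis is not on the probability scale used here.

```python
from typing import Callable, Sequence

Row = list[float]


def partial_dependence(
    predict_win_prob: Callable[[Row], float],
    rows: Sequence[Row],
    feature: int,
    grid: Sequence[float],
) -> list[float]:
    """Average predicted win probability with `feature` fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value  # overwrite only the feature of interest
            total += predict_win_prob(modified)
        curve.append(total / len(rows))
    return curve


# Toy model (hypothetical): win probability rises when either feature is positive.
def toy_model(row: Row) -> float:
    return 0.5 * (row[0] > 0) + 0.5 * (row[1] > 0)


observed = [[-1.0, 1.0], [-1.0, -1.0], [2.0, 1.0], [3.0, -2.0]]
print(partial_dependence(toy_model, observed, feature=0, grid=[-5.0, 5.0]))  # [0.25, 0.75]
```

      A flat stretch in such a curve, like those described for Fig. 2, means that moving the feature further in that direction no longer changes the averaged prediction.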

      4. Discussion

      The focus of this study was to investigate which PIs were associated with match outcome within the URC, to compare the efficacy of isolated and relative data in predicting match outcome, and to investigate whether feature selection methods could be used in PI statistical models to reproduce accuracy with smaller datasets. The current results indicate that kicks from hand, metres made, clean breaks, turnovers conceded and scrum penalties were key PIs in differentiating between winning and losing performances within URC matches. Furthermore, the current study corroborates what has been recognised in the literature: team performance data are considerably more effective at predicting match outcome when expressed relative to the opposition's performance [Bennett M., Bezodis N., Shearer D.A., et al., Descriptive conversion of performance indicators in rugby union].
      This suggests that team performance should not be analysed in isolation, but in the context of the opposition. The study also demonstrated that using feature selection methods to reduce datasets does not negatively affect model effectiveness in this context. The ability to generate smaller datasets with methods such as MRMR, whilst maintaining high predictive accuracy, is valuable when ensuring that results can be successfully translated into practical application. This, in turn, assists with the development of tactical planning, informs elements of coaching, and simplifies monitoring processes.
      Kicking was significant within both the isolated and relative modelling datasets; however, when data were relativised, kicks from hand became a more effective differentiator between match outcomes. In the reduced relative model, kicks from hand had an MDA z-score of 51.6, the highest of any feature, demonstrating its power in differentiating between wins and losses. Over time, the nature of kicking has changed within Rugby Union [Stats Perform, Revolutionising rugby – a statistical analysis on how the game has evolved], and kicks can be used in search of territorial or tactical advantage. The outcomes of this study suggest that tactics allowing a team to make more kicks than its opposition may be beneficial to match success. Box kicks in vulnerable positions, kicks for touch, kick-chase tactics and “winning the kicking battle” are all areas where teams may be able to make additional kicks relative to their opposition. The latter refers to periods of play in which teams exchange many kicks in a row; starting and finishing these exchanges could further increase relative kicks from hand within a match. The introduction of the 50:22 law, which was in effect during the 2021–22 test season, is likely to drive additional kicks within winning teams [Stats Perform, Revolutionising rugby – a statistical analysis on how the game has evolved].
      These findings are consistent with those reported in both elite and sub-elite Rugby Union [Bennett M., Bezodis N., Shearer D.A., et al., Descriptive conversion of performance indicators in rugby union; Bennett M., Bezodis N., Shearer D.A., et al., Predicting performance at the group-phase and knockout-phase of the 2015 Rugby World Cup; Mosey T.J., Mitchell L.J.G., Key performance indicators in Australian sub-elite rugby union].
      Attacking metrics, including metres made and clean breaks, also ranked highly within the reduced relative model; because these are relative measures, they reflect both an effective attack and, conversely, a strong defence. This corroborates what has been reported within international and sub-elite Rugby Union [Bennett M., Bezodis N., Shearer D.A., et al., Descriptive conversion of performance indicators in rugby union; Mosey T.J., Mitchell L.J.G., Key performance indicators in Australian sub-elite rugby union].
      Increased relative metres made can indicate gainline success as well as the ability to deny it to the opposition, which explains why this metric can indicate a winning performance over the entirety of a match. Clean breaks featured only in the reduced relative model, where they were the third most important feature based on MDA. This suggests that making clean breaks alone is not key to match success, but making more than the opposition is. This could be achieved by completing more clean breaks or potentially by preventing the opposition from executing them.
      Another key area is the breakdown, with turnovers conceded a significant feature within both models, in line with what has been reported in the literature for men's Rugby Union [Mosey T.J., Mitchell L.J.G., Key performance indicators in Australian sub-elite rugby union].
      This suggests that conceding fewer turnovers than the opposition, or alternatively forcing more turnovers from the opposition, is key to winning matches.
      Set-piece discipline, in the form of scrum penalties, also appears important based on MDA. Over time, the approach to scrums has changed, with packs getting heavier [Hill N.E., Rilstone S., Stacey M.J., et al., Changes in northern hemisphere male international rugby union players’ body mass and height between 1955 and 2015], law changes, and substitution of the entire front row typical in every match. Stolen scrums are uncommon in professional Rugby, averaging 0.47 per match in the 2020–21 URC season. Other methods are therefore needed to force turnovers at the scrum, and hence scrum penalties become more important to the game. A team's ability to control both its own scrum and the opposition's is key to forcing the opposition to concede scrum penalties. Teams can then use awarded penalties either to kick for points or to gain a tactical or territorial advantage. Scrum penalties have not previously been identified as a contributor to winning performances, but many studies have identified total penalties conceded as a key indicator of match success [Bennett M., Bezodis N., Shearer D.A., et al., Descriptive conversion of performance indicators in rugby union; Bishop L., Barnes A., Performance indicators that discriminate winning and losing in the knockout stages of the 2011 rugby World Cup; Vaz L., Hendricks S., Kraak W., Statistical review and match analysis of rugby world cups finals].
      It is possible that breaking down penalties in different areas of the game may be useful, especially in practical implementation of research.
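      Because the rankings above rest on MDA, a minimal sketch of permutation-based mean decrease in accuracy may help. It is a simplified held-out version, not the randomForest implementation used in the study (which permutes each tree's out-of-bag samples); the `predict` function stands in for a fitted random forest and is a hypothetical placeholder.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Row], int]  # returns 1 (win) or 0 (lose)


def accuracy(predict: Predictor, X: Sequence[Row], y: Sequence[int]) -> float:
    # Proportion of match performances whose outcome is classified correctly.
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def mean_decrease_accuracy(predict: Predictor, X: Sequence[Row], y: Sequence[int],
                           feature: int, n_repeats: int = 20, seed: int = 0) -> float:
    """Average drop in accuracy after randomly permuting one feature column.

    Simplified held-out version of MDA: randomForest instead permutes the
    out-of-bag samples of each tree and averages over trees.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        # Rebuild rows with only the chosen column shuffled; other PIs untouched.
        X_perm = [row[:feature] + [value] + row[feature + 1:] for row, value in zip(X, column)]
        drops.append(baseline - accuracy(predict, X_perm, y))
    return sum(drops) / n_repeats
```

      With a hypothetical stand-in such as `lambda row: 1 if row[0] > 0 else 0`, permuting column 1 (never read by the predictor) gives an MDA of exactly zero, while permuting column 0 lowers accuracy; a larger drop marks a more important indicator.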
      Some key differences were found between the URC and other competitions. In Rugby World Cup group and knockout matches, missed tackles and tackle ratio were both reported as significantly different between winning and losing teams (Hughes et al., 2017). This suggests that missed tackles may have a different impact in higher-stakes matches such as World Cup group and knockout games. It is not clear whether this relationship is specific to the international level, as knockout matches were not analysed in this study. Within the Six Nations, many features were common to both studies, such as line breaks, turnovers won and possessions kicked (Ortega et al., 2009).
      Additionally, mauls won was highlighted as a key feature in the Six Nations but was not identified in the URC, which could suggest that mauls are more effective for winning teams in the Six Nations. Research in Premiership Rugby, arguably the competition closest in style to the URC, has reported many key features in common with this paper, as previously discussed (Bennett et al., 2019). However, winning teams in the Premiership tend to have a larger advantage in metres per carry over their opposition. This was not assessed in the current study, as average metres per carry was not available in this dataset, although it could be calculated in future analyses.
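      Relative metres per carry could be derived from totals along the following lines. This sketch assumes a count of carries per team is available, which was not part of the present dataset; the function and parameter names are hypothetical.

```python
def metres_per_carry(metres_made: float, carries: int) -> float:
    # Assumes carries > 0.
    return metres_made / carries


def relative_metres_per_carry(team_metres: float, team_carries: int,
                              opp_metres: float, opp_carries: int) -> float:
    """Team average gain per carry minus the opposition's (hypothetical inputs)."""
    return metres_per_carry(team_metres, team_carries) - metres_per_carry(opp_metres, opp_carries)
```

      For example, 450 metres from 120 carries (3.75 m per carry) against 380 metres from 95 carries (4.0 m per carry) gives -0.25 m per carry, even though relative metres made is +70. The two measures can therefore point in opposite directions.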
      Whilst random forest modelling is a recognised and popular method in Rugby Union performance analysis (Colomer et al., 2020), feature selection has not been used in key literature, and its possible application has only been discussed briefly (Bunker and Thabtah, 2017). Using MRMR has allowed the current model to target the key features driving successful performances whilst removing highly correlated features. For example, defenders beaten and clean breaks are highly correlated because they tend to occur together in matches, so retaining both adds little information. This is useful both for reducing noise in the modelling and for practical application: within a professional Rugby environment, practitioners can use the reduced feature set to focus training on a manageable number of parameters. Dimensionality reduction methods such as Principal Component Analysis and Partial Least Squares regression can also reduce the number of features. However, rather than selecting a subset of the original features, these methods construct new features as linear combinations of the originals, which adds complexity when interpreting and applying results (Jolliffe, 2002), as seen in research in Rugby League (Parmar et al., 2018). This interpretation cost may outweigh the benefit of model simplification; MRMR avoids it by retaining features in their original form (Evans and Murphy, 2019). The benefit of combining MRMR feature selection with other machine learning methods is unknown, and future studies may wish to investigate this.
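      The greedy MRMR idea can be sketched as below: at each step, pick the feature with the highest relevance to match outcome minus its mean redundancy with the features already chosen. This is an illustration under stated assumptions, not the mRMRe implementation used in the study. Mutual information is approximated from Pearson correlation as -0.5 ln(1 - r^2), which assumes roughly Gaussian variables, and the difference form of the criterion is used. Feature names in the usage example are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(x: Sequence[float], y: Sequence[float]) -> float:
    """Mutual information assuming bivariate normality: -0.5 * ln(1 - r^2)."""
    r = pearson(x, y)
    r2 = min(r * r, 1 - 1e-12)  # cap so perfectly correlated pairs do not give log(0)
    return -0.5 * math.log(1 - r2)


def mrmr(features: Dict[str, Sequence[float]], outcome: Sequence[int], k: int) -> List[str]:
    """Greedy mRMR using the difference criterion: relevance - mean redundancy."""
    relevance = {name: gaussian_mi(col, outcome) for name, col in features.items()}
    # Start from the single most relevant PI.
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(features)):
        def score(name: str) -> float:
            redundancy = sum(gaussian_mi(features[name], features[s]) for s in selected)
            return relevance[name] - redundancy / len(selected)
        remaining = [name for name in features if name not in selected]
        selected.append(max(remaining, key=score))
    return selected
```

      If a hypothetical `defenders_beaten` column exactly duplicates `clean_breaks`, the two are equally relevant, but once one is chosen the other is heavily penalised for redundancy. A less relevant yet largely independent indicator is therefore selected ahead of it, which mirrors the behaviour described above.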
      The reduced relative model was effective in prediction, correctly classifying 90% of match performances in the 2021–22 season. The majority of errors (9 out of 11) came from matches decided by six points or fewer, suggesting that close matches may be more difficult to predict. This is notable because, in the URC, a team that loses by seven points or fewer is awarded a losing bonus point (United Rugby Championship, Summary of rules).
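      As a rough consistency check, assume each of the 53 test matches from the 2021–22 season contributes two match performances (one per team) and that the 11 errors are counted over all of them. The reported figures then imply an accuracy of about 90%, with roughly four in five errors in matches decided by six points or fewer, which is inside the seven-point losing bonus margin:

```latex
\text{accuracy} = \frac{2 \times 53 - 11}{2 \times 53} = \frac{95}{106} \approx 0.896,
\qquad
\frac{9}{11} \approx 0.82
```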
      Close matches have previously been studied separately, with cluster analysis, rather than bonus-point rules, used to define what constitutes a close game (Vaz et al., 2010; Vaz et al., 2011).
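      A cluster-based definition of a close game can be sketched with one-dimensional k-means on final points difference. This is a generic Lloyd's k-means written for illustration, not necessarily the procedure of the cited studies; the choice of k, the quantile initialisation and the example margins are assumptions.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Plain Lloyd's k-means on one variable (e.g. absolute final points difference).

    Centroids start at evenly spaced positions in the sorted data, which keeps
    the sketch deterministic.
    """
    data = sorted(values)
    n = len(data)
    centroids = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each match to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep an empty cluster's centroid where it was.
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new
    return centroids, labels
```

      The cluster with the smallest centroid can then be read as "close games". For hypothetical margins of 1, 2, 3, 5, 6, 12, 14, 15, 25, 30 and 32 points with k = 3, that cluster contains the margins of six points or fewer. A data-driven cut-off need not coincide with the seven-point bonus rule, however.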
      Further research is needed to understand why close matches are harder to classify and whether, if at all, this can be addressed in future studies. Previous studies have highlighted the importance of match location for winning and losing teams (Vaz et al., 2012); however, the current study did not find this to be the case. Because of the COVID-19 pandemic, all matches in the 2020/21 dataset were played without crowds, which may have diminished the influence of home advantage.

      5. Conclusions

      Indicators of winning performances within the URC can be simplified to five key features: kicks from hand, metres made, clean breaks, turnovers conceded and scrum penalties. Kicking has been highlighted as a key driver of match success, with kicking more than the opposition associated with an increased probability of winning. This study has also demonstrated the effectiveness of using data relative to the opposition, and that simplified datasets can be used to understand the drivers of match outcome in Rugby Union. MRMR has allowed a small set of PIs to be highlighted, yielding results that are manageable in practice.

      Funding information

      This work was supported by a jointly funded PhD scholarship from Ospreys Rugby and EPSRC (EP/T517987/1).

      Confirmation of ethical compliance

      Ethical approval was given by Swansea University College of Engineering Research Ethics and Governance under application number GS_31-08-21.

      CRediT authorship contribution statement

      Georgia A. Scott: Data curation, Formal analysis, Writing – original draft, Software, Visualization. Neil Bezodis: Supervision, Writing – review & editing. Mark Waldron: Supervision, Writing – review & editing. Mark Bennett: Methodology, Supervision, Writing – review & editing. Simon Church: Conceptualization, Supervision. Liam P. Kilduff: Conceptualization, Methodology, Supervision, Writing – review & editing. M. Rowan Brown: Conceptualization, Methodology, Software, Supervision, Writing – review & editing.

      Declaration of interest statement

      All authors confirm they have no conflicts or competing interests in publishing this work.

      Acknowledgements

      The authors would like to thank the Ospreys and EPSRC for supporting this research.

      References

        • Colomer C.M.E.
        • Pyne D.B.
        • Mooney M.
        • et al.
        Performance analysis in rugby union: a critical systematic review.
        Sports Med Open. 2020; 6: 4. https://doi.org/10.1186/s40798-019-0232-x
        • Bennett M.
        • Bezodis N.
        • Shearer D.A.
        • et al.
        Descriptive conversion of performance indicators in rugby union.
        J Sci Med Sport. 2019; 22: 330-334. https://doi.org/10.1016/j.jsams.2018.08.008
        • Bennett M.
        • Bezodis N.
        • Shearer D.A.
        • et al.
        Predicting performance at the group-phase and knockout-phase of the 2015 Rugby World Cup.
        Eur J Sport Sci. 2021; 21: 312-320. https://doi.org/10.1080/17461391.2020.1743764
        • Mosey T.J.
        • Mitchell L.J.G.
        Key performance indicators in Australian sub-elite rugby union.
        J Sci Med Sport. 2020; 23: 35-40. https://doi.org/10.1016/j.jsams.2019.08.014
        • Bunker R.P.
        • Spencer K.
        Performance indicators contributing to success at the group and play-off stages of the 2019 rugby world cup.
        J Hum Sport Exerc. 2021; 17. https://doi.org/10.14198/jhse.2022.173.18
        • Hughes A.
        • Barnes A.
        • Churchill S.M.
        • et al.
        Performance indicators that discriminate winning and losing in elite men’s and women’s rugby union.
        Int J Perform Anal Sport. 2017; 17: 534-544. https://doi.org/10.1080/24748668.2017.1366759
        • Bishop L.
        • Barnes A.
        Performance indicators that discriminate winning and losing in the knockout stages of the 2011 rugby World Cup.
        Int J Perform Anal Sport. 2013; 13: 149-159. https://doi.org/10.1080/24748668.2013.11868638
        • Ortega E.
        • Villarejo D.
        • Palao J.M.
        Differences in game statistics between winning and losing rugby teams in the Six Nations Tournament.
        J Sports Sci Med. 2009; 8: 523
        • Pino-Ortega J.
        • Rojas-Valverde D.
        • Gómez-Carmona C.D.
        • et al.
        Training design, performance analysis, and talent identification—a systematic review about the most relevant variables through the principal component analysis in Soccer, Basketball, and Rugby.
        Int J Environ Res Public Health. 2021; 18: 2642
        • Vaz L.
        • Carreras D.
        • Kraak W.
        Analysis of the effect of alternating home and away field advantage during the Six Nations Rugby Championship.
        Int J Perform Anal Sport. 2012; 12: 593-607. https://doi.org/10.1080/24748668.2012.11868621
        • Liu H.
        • Hopkins W.
        • Gómez A.M.
        • et al.
        Inter-operator reliability of live football match statistics from OPTA Sportsdata.
        Int J Perform Anal Sport. 2013; 13: 803-821. https://doi.org/10.1080/24748668.2013.11868690
        • Stats Perform
        A collaborative ecosystem.
        (Available at:)
        • Breiman L.
        Random forests.
        Mach Learn. 2001; 45: 5-32
        • Friedman J.
        • Hastie T.
        • Tibshirani R.
        Random forests, chapter 15.
        in: The Elements of Statistical Learning. 2nd ed. Springer, New York, 2009
        • Breiman L.
        • Cutler A.
        • Liaw A.
        • et al.
        randomForest: Breiman and Cutler’s Random Forest for Classification and Regression. R Package.
        2022 (4.7-1)
        • Archer E.
        rfPermute: Estimate Permutation p-Values for Random Forest Importance metrics. R Package.
        2022 (2.5.1)
        • Sakar C.O.
        • Kursun O.
        • Gurgen F.
        A feature selection method based on kernel canonical correlation analysis and the minimum redundancy-maximum relevance filter method.
        Expert Syst Appl. 2012; 39: 3432-3437. https://doi.org/10.1016/j.eswa.2011.09.031
        • Evans J.S.
        • Murphy M.A.
        rfUtilities: Random Forest Model Selection and Performance Evaluation. R Package.
        2019 (2.1-5)
        • De Jay N.
        • Papillon-Cavanagh S.
        • Olsen C.
        • et al.
        mRMRe: an R package for parallelized mRMR ensemble feature selection.
        Bioinformatics. 2013; 29: 2365-2368. https://doi.org/10.1093/bioinformatics/btt383
        • Stats Perform
        Revolutionising rugby – a statistical analysis on how the game has evolved.
        (Available at)
        • Hill N.E.
        • Rilstone S.
        • Stacey M.J.
        • et al.
        Changes in northern hemisphere male international rugby union players’ body mass and height between 1955 and 2015.
        BMJ Open Sport Exerc Med. 2018; 4: e000459. https://doi.org/10.1136/bmjsem-2018-000459
        • Vaz L.
        • Hendricks S.
        • Kraak W.
        Statistical review and match analysis of rugby world cups finals.
        J Hum Kinet. 2019; 66: 247-256. https://doi.org/10.2478/hukin-2018-0061
        • Bunker R.P.
        • Thabtah F.
        A machine learning framework for sports result prediction.
        Appl Comput Inform. 2017; 15: 27-33. https://doi.org/10.1016/j.aci.2017.09.005
        • Jolliffe I.T.
        Rotation and interpretation of principal components, chapter 11.
        in: Principal Component Analysis. 2nd ed. Springer, New York, 2002
        • Parmar N.
        • James N.
        • Hearne G.
        • et al.
        Using principal component analysis to develop performance indicators in professional rugby league.
        Int J Perform Anal Sport. 2018; 18: 938-949. https://doi.org/10.1080/24748668.2018.1528525
        • United Rugby Championship
        Summary of rules.
        (Available at:)
        • Vaz L.
        • Rooyen M.V.
        • Sampaio J.
        Rugby game-related statistics that discriminate between winning and losing teams in Irb and Super twelve close games.
        J Sports Sci Med. 2010; 9: 51-55
        • Vaz L.
        • Mouchet A.
        • Carreras D.
        • et al.
        The importance of rugby game-related statistics to discriminate winners and losers at the elite level competitions in close and balanced games.
        Int J Perform Anal Sport. 2011; 11: 130-141