Enhancing Consumer Insights: Integrating Television And Print Audience Currencies With Consumer Behavior Data

Dr Jim Collins, Mediamark Research & Intelligence

Pete Doe, The Nielsen Company

Worldwide Readership Research Symposium Valencia 2009 Session 5.3

Introduction

During the last year, The Nielsen Company (Nielsen) and Mediamark Research & Intelligence (MRI) have brought to market fusions of two of the most important media research databases in the United States: the Nielsen National People Meter Television panel (NPM) and MRI’s print readership study, The Survey of the American Consumer. For a variety of reasons, the development, introduction and utilization of these media ratings fusions signify an important moment in the evolution of media measurement, planning and buying in the United States:

  1. Validates fusion in the United States as an appropriate methodology for the development of integrated media ratings data sources
  2. Incorporates consumer behavioral and attitudinal targeting, as distinct from and in addition to demographic targeting, into the television planning and transactional processes.
  3. Develops an integrated multi-media database sourced from currency-quality media measurements.

Given the significance of this development in the United States media market, a fuller and more formal elaboration of the fusion processes, validation procedures, commercial utility and continuing development is appropriate. With that said, we should also acknowledge the larger worldwide context into which this work and its product fit, with respect to both its incorporation of well-developed fusion/data integration techniques and the ways in which it elaborates on them.

The Two Fusions: Purposes and Methods

Initial deliberations, first Nielsen and MRI jointly, and subsequently including several common television and advertising agency clients, identified the need to develop two Nielsen/MRI fused data sources. This need arises from the distinct ways in which the various television planning, buying and selling constituencies approach their respective tasks, and in particular the sorts of analyses they conduct and the software analysis tools employed in these endeavors. Broadly, these purposes and their related analyses can be characterized in each of two ways: 1) the need to analyze large numbers of potential target audiences with respect to a relatively narrow set of television measures, and 2) the need to analyze extensive television viewing behavior with respect to a refined target. In each case the availability of extensive currency quality television metrics and consumer behavior and attitudinal measures is essential but for somewhat different reasons:

  1. The integration of NPM television measures (programs, networks and dayparts) into MRI for the purposes of target evaluation, development and selection against copious consumer behavior and attitudinal measures available in MRI.
  2. The integration of MRI consumer, attitudinal, print and other media measures into NPM for the purpose of television analysis against these extensive non-demographic targets.

In short, the NPM into MRI fusion supports target analysis, selection and development against the extensive, currency quality television measures available from NPM, with the MRI into NPM fusion supporting complete television analysis against previously identified consumer-related targets integrated from MRI.

NPM into MRI Fusion

Broadly, the NPM into MRI fusion involves:

  1. The integration of approximately 13,000 individual television programs
  2. The integration of approximately 1,200 individual network specific dayparts
  3. Live Plus 7-Day Viewing Minutes are the television metric integrated from NPM into MRI
  4. As MRI is an Adult (Age 18 and Older) survey, only viewing data from the Age 18 and Older portion of the NPM are integrated, and only with MRI respondents having a television in their households
  5. The integration occurs monthly using the most recent calendar month of NPM television viewing data
  6. NPM panelists are projection weighted using their average daily weight for the month, thus accounting for their proportionality in the population and the number of days for which they are active in the panel throughout the month.
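The average daily weight described in the last item can be illustrated with a minimal sketch. It assumes that days on which a panelist is inactive contribute a weight of zero; the function name and the figures are hypothetical and are not taken from the NPM weighting specification.

```python
def average_daily_weight(daily_weights, days_in_month):
    """Average a panelist's daily projection weights over the whole month.

    daily_weights holds one weight per ACTIVE day. Inactive days count as
    zero (an assumption), so a panelist active on fewer days receives a
    proportionally smaller monthly weight.
    """
    return sum(daily_weights) / days_in_month


# Hypothetical panelist: active on 20 of 30 days, weight 9,000 on each active day.
monthly_weight = average_daily_weight([9000.0] * 20, days_in_month=30)
print(monthly_weight)  # 6000.0
```

The monthly weight therefore reflects both the panelist's share of the population and the share of the month for which that panelist reported.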

Insofar as MRI’s survey includes a relatively extensive battery of television related metrics, in addition to demographics and other media measures, there is an extensive range of common linking variables available to support the fusion. These linking variables can be classified into five categories:

  1. Explicit Linking Variables – metrics on which MRI respondents and NPM panelists must match
    1. Sex (Male/Female)
    2. Age (18-34, 35-54, 55 and Older)
    3. Race (African-American/non-African-American)
    4. Spanish/English Language Use
    5. Cable/non-Cable
    6. Socio-Economic Status (A factor analyzed measure derived from Education and Household Income and ranked into three groups)
    7. Presence of Young Children
  2. Additional Personal and Household Demography
    1. Personal
      1. Education
      2. Employment Status
      3. Principal Shopper
    2. Household
      1. Household Income
      2. Presence of Children by Age
      3. Own a Cat or Dog
      4. Own a PC and Access Internet
      5. Geography
  3. Household Television Characteristics
    1. Number of Television Sets
    2. Cable
      1. Digital
      2. Pay TV
    3. DVR and DVD
  4. Approximately 50 Internet Sites
  5. Approximately 200 Television Programs across multiple Networks and Dayparts

Given the large number of commonly available linking variables (approximately 300) a primary challenge of the fusion process is the development of a strategy to prioritize the matching. The necessity of developing such a strategy is predicated on the assumptions that:

  1. Not all records from each dataset can be perfectly matched
  2. With respect to television viewing behaviors and their relationships to consumer (and other) measures, where perfect matches are not possible it is more important to match on some measures than on others.

Given the large number of common linking measures the obviousness of the first assumption is overwhelming (as is common with most fusions) and hence, the import of addressing the second assumption.

The strategy adopted in the NPM to MRI fusion to develop a matching precedence is twofold. The first aspect is the adoption of a set of explicit matching criteria (e.g. Sex, Age, etc.) on which MRI respondents and NPM panelists must match. This set of explicit matching criteria, while grounded somewhat in the conventions of media targeting (e.g. Women 18-54, etc.), directly and/or indirectly relates to how television buyers and sellers broadly understand television viewing and consumer behaviors and attitudes – the very sorts of relationships these fusions are designed to make more explicit and complete. (While these explicit matching measures are enforced throughout the fusion process, at the end there is inevitably a small percentage of mismatch – generally less than 5% – but matching on Sex is always strictly enforced. The minimal level of mismatch is largely due to the similar population projection weighting targets employed for NPM and MRI and the relatively consistent levels of television and cable penetration between the two services.) Hence, by ensuring MRI to NPM matching on these broad demographic and television (cable/non-cable) measures, measures which are understood to be important with respect to the relevant behaviors, the general character of the television to consumer relationships is established.

With that said, if the extent of the television to consumer relationships were completely explained through the explicit control measures then there would be little reason to undertake the fusion. Ultimately, however, the explanatory power of these explicit control measures is incomplete – the nuances of viewing and consumer behaviors are not completely explained by broad demography, as valuable as it is. Hence the utility of using 1) a more encompassing set of demographics and 2) an extensive repertoire of television measures, both generic and program specific, in the matching process to ensure appropriate integration with respect to demographics, television viewing, consumer behavioral and attitudinal measures.

The importance of employing measures directly related to the phenomena being fused, in this case television viewing, is emphasized by Susanne Rassler in Statistical Matching:

“The importance of the common variables forming a link between the specific variables of the donor and recipient sample was emphasized for conducting an efficient statistical match…”[Rassler 48]

“Within media and consuming data the typical demographic and socioeconomic variables will surely not completely explain media exposure and consuming behavior. Variables already concerning media exposure and consuming behavior have to be asked as well. Thus, the common variables also have to contain variables concerning television and consuming behaviors…Roberts (1994) reports better results using such ‘specific common’ variables than the usual demographic and socioeconomic issues alone.” [Rassler 35]

The abundance of common matching measures available in NPM and MRI imposes a challenge on the fusion procedure itself. With a relatively limited set of common measures available, matching between two (or more) data sources is relatively straightforward insofar as most records will find close matches among the limited set of matching criteria. In circumstances involving a superabundance of possible matching criteria, such as these NPM / MRI fusions, the challenge for the fusion procedure is to determine which among the plethora of possible matching measures are the most significant with respect to the behaviors being integrated. A similar challenge was posed by the J.D. Power & Associates / MRI (JDP/MRI) fusion reported on at the 2007 Worldwide Readership Research Symposium [Collins and Pingitore, 2007].

The strategy developed for JDP/MRI employed at its core a classification tree wherein ownership of a particular automobile make/model was the dependent variable and the plethora of common matching variables were the independent ones. Employing the make/model specific classification tree, J.D. Power and MRI respondents were 1) classified into the discriminating, relatively discrete and homogeneous nodes of the tree and 2) matched within common nodes. This technique ensured that the fusion process matched J.D. Power and MRI respondents on the common measures that discriminate most strongly with respect to automobile make/model buying decisions.

Adapted to the conditions of the NPM into MRI fusion the process proceeds as follows:

    1. Separate the MRI and NPM databases into subsets defined by the explicit linking variables (e.g. 18-34 years old, non-African-American, English Speaking Males in Households with Cable).
    2. Using the NPM data (the television ratings currency) develop a classification tree with Total Viewing Minutes as the dependent variable and the common linking variables (demographics, television, internet, etc.) as the independent ones. Total Viewing Minutes is used as the dependent variable insofar as it is a broad measure of television viewing behavior, naturally representing the magnitude of television consumption.
    3. Using the classification tree model, classify NPM panelists and MRI respondents into discrete homogeneous nodes of isomorphic classification trees, one for NPM and the other for MRI.
    4. Match NPM panelists with MRI respondents from corresponding nodes of the isomorphic trees, using split-weighting to preserve the incidence levels of the fused measures (e.g. television viewing behaviors from NPM with consumer measures from MRI).
    5. If there are any remaining NPM panelists and/or MRI respondents rebuild the classification tree model specifying a slightly smaller number of terminal nodes and repeat Steps #3 and #4.

In certain population subsets, as defined by the explicit linking variables, not all NPM panelists and MRI respondents are able to be matched (<5%), in which case these explicit constraints are loosened somewhat, with Sex matching always being maintained.
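Step #4 can be illustrated with a minimal sketch of split-weighting inside a single terminal node. It assumes the classification tree has already placed the records in the node and that both lists are in a comparable order. The function name `split_weight_match`, the record ids and the weights are hypothetical, and rescaling donor weights to the recipient total is an assumption rather than a documented step of the NPM/MRI procedure.

```python
def split_weight_match(recipients, donors):
    """Pair recipients and donors within one node, splitting weights as needed.

    recipients, donors: non-empty lists of (record_id, projection_weight).
    Returns a list of (recipient_id, donor_id, fused_weight) rows.
    """
    if not recipients or not donors:
        return []
    # Assumption: rescale donors so both sides project to the same total.
    r_total = sum(w for _, w in recipients)
    d_total = sum(w for _, w in donors)
    donors = [(d, w * r_total / d_total) for d, w in donors]

    eps = 1e-9 * r_total          # tolerance for floating-point residue
    pairs = []
    i = j = 0
    r_left, d_left = recipients[0][1], donors[0][1]
    while i < len(recipients) and j < len(donors):
        used = min(r_left, d_left)            # weight shared by this pairing
        if used > eps:
            pairs.append((recipients[i][0], donors[j][0], used))
        r_left -= used
        d_left -= used
        if r_left <= eps:                      # recipient fully allocated
            i += 1
            if i < len(recipients):
                r_left = recipients[i][1]
        if d_left <= eps:                      # donor fully allocated
            j += 1
            if j < len(donors):
                d_left = donors[j][1]
    return pairs


# Hypothetical node: two MRI respondents (recipients), two NPM panelists (donors).
pairs = split_weight_match(
    [("mri_1", 3000.0), ("mri_2", 5000.0)],
    [("npm_1", 4000.0), ("npm_2", 4000.0)],
)
for row in pairs:
    print(row)
```

In this sketch each record's fused weights sum to its original projection weight, which is the property that preserves incidence levels on both sides of the fusion.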

MRI into NPM Fusion

The fusion of MRI consumer, attitudinal, print and other media (e.g. radio) measures into NPM, while broadly similar, involves important differences with respect to processes and linking data. The MRI into NPM fusion involves largely the same set of common matching variables. One important difference through Q2 2009 is that television networks, dayparts and program genre are used for matching rather than individual television programs. (Beginning with Q3 2009, it is anticipated that individual television programs will be used in a fashion similar to the NPM into MRI fusion.)

It is with respect to the fusion processes that the differences between the MRI into NPM and the NPM into MRI fusions are most substantial. Many of the differences with respect to the fusion processes are driven by certain constraints imposed by the analysis of NPM data for television program ratings. While a complete discussion is beyond the scope of this work, generally NPM panelists have non-zero projection weights only for those days for which they are active in the panel, and their weights will be different day-to-day depending on the composition of the entire NPM panel on any particular day. Because of the dynamic quality of this projection weighting scheme, it is not feasible to implement a split-weighting fusion as is performed in the NPM into MRI fusion.

Alternatively, the MRI into NPM fusion involves whole respondent (as opposed to split-weight) matching. The particular steps of the first stage of the MRI into NPM fusion matching process are virtually identical to those outlined above for NPM into MRI, with the exception that in Step #4 rather than employing split-weighting, the matching attempts to select MRI respondents and NPM panelists having the most similar projection weights. The result of this control on projection weights in the matching process is that the incidence levels for the MRI measures fused into the NPM database are generally within 10% of their native MRI levels.
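As a rough illustration of whole-respondent matching on similar projection weights, the sketch below pairs each NPM panelist in a node with the unused MRI respondent whose weight is closest. The greedy order, the one-use-per-respondent rule, the function name and the data are all assumptions made for illustration; the actual selection rule is not specified here.

```python
def whole_record_match(panelists, respondents):
    """Pair each NPM panelist with one MRI respondent of similar weight.

    Both arguments are lists of (record_id, projection_weight) from one node.
    Returns (pairs, unmatched_panelists); pairs holds (npm_id, mri_id).
    """
    available = dict(respondents)      # mri_id -> weight, not yet used
    pairs, unmatched = [], []
    # Heaviest panelists choose first (an arbitrary, assumed ordering).
    for npm_id, npm_w in sorted(panelists, key=lambda rec: -rec[1]):
        if not available:
            unmatched.append(npm_id)   # left for a later, coarser tree
            continue
        mri_id = min(available, key=lambda rid: abs(available[rid] - npm_w))
        pairs.append((npm_id, mri_id))
        del available[mri_id]
    return pairs, unmatched


matched, leftover = whole_record_match(
    [("npm_1", 9000.0), ("npm_2", 4000.0), ("npm_3", 6500.0)],
    [("mri_1", 4200.0), ("mri_2", 8800.0), ("mri_3", 6100.0)],
)
print(matched)   # [('npm_1', 'mri_2'), ('npm_3', 'mri_3'), ('npm_2', 'mri_1')]
print(leftover)  # []
```

Because each panelist keeps its own weight and receives a whole MRI record, the fused incidence levels only approximate the native MRI levels, which is what motivates a subsequent adjustment.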

While the level discrepancies are relatively modest for the MRI measures when fused into NPM, a second phase is employed to bring them into (almost) perfect coincidence. This second phase proceeds as follows within each of six Age/Sex breaks (Male/Female separately by Age 18-34, 35-54 and 55 or Older):

  1. For each MRI measure develop a logistic regression model in the MRI dataset. The dependent variable is the individual MRI measure and the independent variables are the common linking variables (demographics, television, etc.).
  2. If the incidence level for the specific fused MRI variable is higher than the native level, use the logistic regression model to unassign this measure from the least likely NPM panelists who have received it via the first phase of the fusion.
  3. Conversely, if the incidence level for the specific fused MRI variable is lower than the native level, use the logistic regression model to assign this measure for the most likely NPM panelists who have not received it via the first phase of the fusion.
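The second phase can be sketched as follows for a single MRI measure within one Age/Sex break. The sketch assumes the logistic regression has already been fitted and that each panelist carries the resulting probability as `score`; the function name, the dictionary keys and the rule of stopping before the target is crossed are assumptions for illustration only.

```python
def adjust_incidence(panelists, target_weighted_count):
    """Move a fused 0/1 measure toward its native level using model scores.

    panelists: list of dicts with keys 'weight', 'has_measure' (bool) and
    'score' (modelled probability of having the measure, assumed given).
    Returns adjusted copies; the input records are not modified.
    """
    result = [dict(p) for p in panelists]
    current = sum(p["weight"] for p in result if p["has_measure"])
    if current > target_weighted_count:
        # Fused level too high: unassign from the least likely holders first.
        holders = sorted((p for p in result if p["has_measure"]),
                         key=lambda p: p["score"])
        for p in holders:
            if current - p["weight"] < target_weighted_count:
                break
            p["has_measure"] = False
            current -= p["weight"]
    else:
        # Fused level too low: assign to the most likely non-holders first.
        candidates = sorted((p for p in result if not p["has_measure"]),
                            key=lambda p: -p["score"])
        for p in candidates:
            if current + p["weight"] > target_weighted_count:
                break
            p["has_measure"] = True
            current += p["weight"]
    return result


# Hypothetical break: fused level 3,000 versus a native MRI level of 2,000.
sample = [
    {"weight": 1000.0, "has_measure": True, "score": 0.9},
    {"weight": 1000.0, "has_measure": True, "score": 0.2},
    {"weight": 1000.0, "has_measure": True, "score": 0.6},
    {"weight": 1000.0, "has_measure": False, "score": 0.1},
]
adjusted = adjust_incidence(sample, target_weighted_count=2000.0)
print([p["has_measure"] for p in adjusted])  # [True, False, True, False]
```

Here the panelist with the lowest modelled probability (0.2) loses the measure, bringing the weighted level down to the target.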

This second phase of the MRI into NPM fusion process strongly resembles various fusion-on-the-fly or just-in-time fusion processes as developed by Gilles Santini and Roland Soong among others. Advantages of using this two phase technique – whole record/respondent fusion (Phase #1) followed by a model-based item-specific adjustment (Phase #2) – include:

  1. The interrelationships among the MRI data items are largely retained by the use of whole-record fusion and projection weight controls embodied in Phase #1
  2. The native incidence levels for each of the individual MRI measures are retained within each of the six Age/Sex groups through Phase #2 adjustment
  3. Whatever Phase #2 item adjustment that occurs does so in a manner which accounts for the relationships between the individual items and the large pool of common linking / independent variables.

Integrating Internet Site Measures

Independent of the two MRI/NPM fusions, MRI and Nielsen individually integrate Nielsen Online’s NetView internet site measures into MRI’s Survey of the American Consumer and NPM respectively. These fused products have been available for several years and, while they involve somewhat different fusion techniques, their ambitions are broadly consonant – integrate extensive and granular internet site information with their respective media and consumer measures to support cross-media (television/internet and print/internet) analysis. Consequently, with the development of the NPM/MRI fusions the capability exists to develop tripartite integrations involving print (and consumer measures) from MRI, television from NPM and internet from Nielsen Online.

Given the somewhat different techniques employed in these various fusions and the constraints imposed by the resulting databases, two different approaches have been adopted to develop the tripartite fusions.

The integration of the MRI/NetView fusion with the NPM into MRI fused database is relatively simple and straightforward – in short, a split-weighted match-merge. The NPM into MRI and the NetView into MRI fusions employ very similar fusion processes; each uses split-weighted matching utilizing an extensive variety of demographic and media-specific (television and internet respectively) measures. Each relevant MRI respondent is separately split-weight matched with one or more (zero in the case of no relevant television and/or internet behavior) NPM and NetView panelists. In short, each MRI respondent has complete NPM and NetView integrated data as a result of the separate MRI/NPM and MRI/NetView fusions, with the only difference being the sets of projection weights each MRI respondent has as a result of these separate split-weight fusions. Consequently, the final MRI/NPM/NetView database is simply the result of a split-weight match/merge of the MRI respondents across the separate MRI/NPM and MRI/NetView fusions.

Because the MRI/NPM and MRI/NetView fusions separately employ extensive, demographic and media-specific matching variables the resulting tripartite database manifests relatively robust and genuine print/television/internet/consumer relationships. Moreover, because split-weighting is employed throughout – both in the separate fusions and the final match/merge – the incidence levels for the various metrics intrinsic to each of the databases are retained.

While the details of the NPM/NetView fusion are somewhat different from those of the MRI/NetView one, the development of the tripartite NPM/MRI/NetView database is relatively comparable, i.e. the MRI and NetView data are naturally attached to the NPM panelists.

Briefly, the NPM/NetView fusion employs a three-stage process:

    1. Fuse Home and Work Internet usage panels
    2. Impute TV viewing probabilities onto this fused Internet database, using a single source database. Nielsen’s @Plan database, which measures claimed product and media usage (36,000 interviews annually), has been used to date, and we are about to transition to Nielsen’s Convergence Panel (3,000 persons), which provides metered TV and Internet use.
    3. Fuse the Home/Work/TV probability database onto the National People Meter panel, using demographics, internet access and TV viewing

Comparison of the fused data with the single source Nielsen Convergence Panel demonstrates that the fused results are very reliable: comparing duplications between TV networks and their related websites (where we might expect the fusion model to be most challenged) reveals just 2% regression to mean – a very good performance.

Because both the MRI into NPM and NetView into NPM fusions result in the relevant data being fused to the NPM panel, the NPM/NetView/MRI fusion naturally arises – the relevant NPM panelists separately are assigned MRI and NetView data.

However, initial evaluation demonstrated that certain expected relationships, primarily at the individual media brand level, were not evident in the resulting NPM/NetView/MRI database. For example, the ESPN brand has significant presences in the television, internet and magazine channels, but the expected strong relationship among the three was not initially evident. In the NPM/NetView/MRI fusion this weakness was most pronounced with respect to ESPN’s magazine and internet site audience duplication. The root of this problem was founded in the fact that the initial fusion of MRI data into NPM had no internet-related matching control. Thus, the relatively strong relationship within the MRI database between reading ESPN Magazine and visiting ESPN’s internet sites was not articulated as a matching component of the MRI into NPM fusion. When additional internet controls were added to this fusion the expectedly strong relationships emerged.

The table below compares results from four sources: MRI’s standalone Spring 2008 data; the MRI/NetView fusion for October 2008; the MRI/NPM fusion for October 2008; and finally, the MRI/NPM fusion from February 2009. Both MRI/NPM fusions use ESPN.com data from the linked NPM/NetView fusion, but the February fusion also uses these data in the NPM/MRI fusion process itself. The magazine and online duplication figures show a clear difference, with the February data giving credible and usable duplication information. By using the internet data in the fusion, we have extended the analytic capability of the fusion from TV/print to TV/print/online.

ESPN Summary (Adults 18+)

                                         MRI Survey Data   MRI/NetView Fusion   MRI/NPM (October 2008:   MRI/NPM (February 2009:
                                         (Spring 2008)     (October 2008)       no internet hooks)       with internet hooks)
                                         %                 %                    %                        %
Penetrations (%) ESPN Magazine           6.4               6.6                  6.5                      6.5
Penetrations (%) ESPN Online             4.6               8.7                  10.1                     7.4
Penetrations (%) Magazine and Online     1.6               1.5                  0.9                      1.3
Random Duplication Magazine and Online   0.3               0.6                  0.7                      0.5
Diff from Random Magazine and Online     1.3               0.9                  0.3                      0.9
Index on Random Magazine and Online      534               261                  138                      279
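The derived rows of the ESPN table can be reproduced approximately from the penetrations. The sketch below assumes the usual definition of random duplication as the product of the two penetrations under independence; that definition is not stated explicitly in the paper, although the published figures are consistent with it. The function name is hypothetical.

```python
def duplication_stats(pen_a, pen_b, pen_both):
    """Random duplication, difference from random, and index on random.

    All penetrations are percentages of the same population.
    """
    random_dup = pen_a * pen_b / 100.0       # expected overlap if independent
    diff = pen_both - random_dup             # observed minus expected
    index = 100.0 * pen_both / random_dup    # 100 = no better than random
    return random_dup, diff, index


# February 2009 MRI/NPM column: magazine 6.5%, online 7.4%, both 1.3%.
random_dup, diff, index = duplication_stats(6.5, 7.4, 1.3)
print(round(random_dup, 3), round(diff, 3), round(index))
```

With these rounded inputs the sketch gives a random duplication of about 0.5, a difference of about 0.8 and an index of about 270, against the published 0.5, 0.9 and 279; the small gaps presumably arise because the published figures were computed from unrounded penetrations.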

The comparable issue did not arise in the MRI/NPM/NetView fusion as the separate NPM/MRI and NetView/MRI fusions 1) made extensive use of media brand specific measures in the fusion processes and 2) the relationships among the three relevant media channels for the ESPN brand were nicely pronounced in the standard MRI database into which the television and internet metrics were integrated.

Validation Work

The fusion of such a wide-ranging media and product data database with the TV currency data offers extensive validation challenges, but also extensive validation opportunities, given that the MRI database contains a wide variety of TV data and is therefore in itself a single source TV/media/product database.

Our validation program covered the following:

  1. Print currency variation due to TV panel reporting discontinuity
  2. Fused MRI profile analyses
  3. TV/media/product interactions

The methods and findings are discussed below; the overall conclusion is that the integrated NPM/MRI Fusion 1 database is reliable and robust.

Print Currency Variation

The NPM is a daily reporting database. The homes and persons sample changes daily, as new homes begin reporting, old homes leave and existing homes change reporting status based on reporting eligibility criteria. Each panel individual therefore changes weights each day, and because the weighting does not (indeed, cannot) control for multiple readership and product variables simultaneously, the panel’s representation of these variables changes on a daily basis. On average, the figures match the MRI data (restricted to TV households) over a calendar quarter since the fusion uses respondents’ average daily weights across the quarter, but a key issue for the utility of the data is the extent of the variation of results on a daily and monthly basis.

In fact the data are acceptably stable, even at the daily level. Quarterly results are shown below for eight titles, ranging from very large (People, with a 19.3 AIR) to very niche (Barron’s, with AIR of 0.5). We looked at the range of results on a daily and monthly basis that were delivered by the NPM weights: inevitably the variability of penetrations is greater on a daily basis than monthly and percentage wise the variability is greater for lower AIR, but the bottom line is that the AIR estimates are preserved to three decimal places for each publication, except People which differs by just 0.001% on 1 day in the three month period.

Comparison of MRI and NPM/MRI Fusion Readership Estimates – Q4 2008

Adults 18+ (MRI 224.48M, NPM UE 223.98M)

Magazine                               People  Woman’s Day   Ebony  Men’s Fitness  OK! Weekly  Official Xbox Mag  Ducks Unlimited  Barron’s
MRI (000)                              42,836       22,110  11,685          8,026       6,194              4,969            2,820     1,207
AIR (%)                                  19.3         10.0     5.3            3.6         2.8                2.2              1.3       0.5
NPM/MRI Fusion Ave 000’s               42,077       21,758  11,460          7,837       6,050              4,953            2,806     1,194
Index                                    98.2         98.4    98.1           97.6        97.7               99.7             99.5      98.9

Fusion Results Variation
Daily Max across 3 months              42,702       22,084  11,759          8,157       6,204              5,234            2,979     1,290
Daily Min across 3 months              41,497       21,286  10,990          7,418       5,861              4,760            2,635     1,054
Daily Range (Max – Min)                 1,205          798     769            739         343                475              344       236
Daily Range (% of 000s)                     3            4       7              9           6                 10               12        20
Month 1 Average                        42,197       21,795  11,577          7,631       6,032              5,055            2,869     1,224
Month 2 Average                        41,920       21,739  11,535          7,922       6,096              4,929            2,816     1,173
Month 3 Average                        42,149       21,744  11,244          7,943       6,013              4,877            2,728     1,189
Monthly Range (Max – Min)                 277           56     333            312          82                178              141        51
Monthly Range (% of 000s)                   1            0       3              4           1                  4                5         4
Daily Range % of 18+ (AIR Points)       0.001        0.000   0.000          0.000       0.000              0.000            0.000     0.000
Monthly Range % of 18+ (AIR Points)     0.000        0.000   0.000          0.000       0.000              0.000            0.000     0.000

Fused MRI Profile and Penetration Analysis

This analysis assessed how well MRI profiles and penetrations were preserved in the fusion process.

The penetration analysis shows good correspondence between fused and standalone MRI data: over a quarter the penetrations, by design, are identical; over a month the average profile differs by less than 0.2% points and 2% overall.

The profile analysis works as follows: if, for example, a product skews upscale and towards younger, larger homes (e.g. a luxury SUV) in the MRI database, we would hope and expect to see the same profile in the fused database when assessed against the NPM sample characteristics. Not only is this an important consideration for the credibility of any fusion analysis, it also gets to the heart of the fusion principle – the assumption of conditional independence that implies that the fusion linking variables are perfect predictors of the interactions between TV viewing and print and product use.

Comparing MRI and NPM/MRI for each magazine and product across sample characteristics generates a very large number of profile points; to summarize these data we make use of correlation and regression to mean statistics.

    1. Profile skews are calculated as per this example: the fusion has reproduced the skew to older persons of users of a particular high end credit card.

High-end Credit Card Users Profile Differences

              MRI     NPM
Age 18-24    -7.3    -4.5
Age 25-34    -1.0    -3.7
Age 35-44     0.6     1.9
Age 45-54     1.3     1.1
Age 55-64     5.0     3.7
Age 65+       1.3     1.5

In this case the correlation of the differences is 0.89. For our data fusion evaluation, the correlation of the differences across all the linking variables is calculated. In addition the absolute average skew is calculated, as low levels of skew can lead to low but non-problematic correlation figures.

    2. The absolute values of differences are used to assess regression to the mean across the linking variables: in the example above, MRI shows an absolute skew difference of 16.6, the fusion a skew of 16.5. This is a reduction of 0.1, which is 0.4% of 16.6, meaning that the profile skew shows 0.4% regression to the mean.
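Both summary statistics can be recomputed from the credit card example. The sketch below uses the rounded profile differences printed in the table, so its results differ slightly from the figures quoted in the text; the function names are hypothetical.

```python
from math import sqrt

# Profile differences by age band (18-24 ... 65+), as printed in the table.
mri_skew = [-7.3, -1.0, 0.6, 1.3, 5.0, 1.3]
fused_skew = [-4.5, -3.7, 1.9, 1.1, 3.7, 1.5]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def regression_to_mean(native, fused):
    """Percentage reduction in total absolute skew from native to fused."""
    native_abs = sum(abs(v) for v in native)
    fused_abs = sum(abs(v) for v in fused)
    return 100.0 * (native_abs - fused_abs) / native_abs


r = pearson(mri_skew, fused_skew)
rtm = regression_to_mean(mri_skew, fused_skew)
print(round(r, 2), round(rtm, 1))
```

From the rounded table values the correlation comes out at about 0.88 and the absolute skews at 16.5 and 16.4, giving roughly 0.6% regression to the mean. The published 0.89, 16.6, 16.5 and 0.4% were presumably calculated from unrounded differences.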

Overall results calculated across 7132 fields in the first fusion produced (Q1 2008) were 0.76 for correlation and 22% regression to mean. The minimum correlation across 468 groups was 0.33, for hybrid vehicle ownership – explained by the very low penetration (0.84%).

These results are encouraging given the large number of linking variables that feature in the fusion.

TV/Media/Product Interactions

The acid test of the fusion is the extent to which interactions between TV and other media and products are accurately captured. The best assessment of this is given by comparison of MRI-based results with the fusion results. When doing this comparison, we need to remember that the TV results from MRI are based on claimed behavior rather than the metered data in the fusion database. To make direct comparison more feasible, we edit the respondent-level NPM viewing minutes for networks and programs so that the metered penetration data tie in with the MRI penetration. This is achieved by classifying respondents with the lowest number of minutes viewing as non-viewers (e.g. if a Network has 20% claimed viewing on MRI, we classify the heaviest twenty percentage points of NPM network viewers as MRI-comparable viewers and the rest as non-viewers).
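The editing rule for metered viewing can be sketched as a weighted cut-off on viewing minutes. The function name, the tuple layout and the choice to stop just before the claimed penetration would be exceeded are assumptions made for illustration.

```python
def comparable_viewers(panelists, claimed_penetration):
    """Flag the heaviest viewers until the claimed penetration is reached.

    panelists: list of (panelist_id, weight, viewing_minutes) for one network.
    claimed_penetration: MRI claimed viewing as a fraction (e.g. 0.20).
    Returns the set of panelist ids treated as MRI-comparable viewers.
    """
    total = sum(w for _, w, _ in panelists)
    target = claimed_penetration * total
    viewers, covered = set(), 0.0
    # Work down from the heaviest viewer; zero-minute panelists never qualify.
    for pid, w, minutes in sorted(panelists, key=lambda rec: -rec[2]):
        if minutes <= 0 or covered + w > target + 1e-9:
            break
        viewers.add(pid)
        covered += w
    return viewers


# Hypothetical network with 40% claimed viewing on MRI.
panel = [("p1", 1000.0, 300), ("p2", 1000.0, 0), ("p3", 1000.0, 45),
         ("p4", 1000.0, 120), ("p5", 1000.0, 10)]
viewers = comparable_viewers(panel, claimed_penetration=0.40)
print(sorted(viewers))  # ['p1', 'p4']
```

Panelists p3 and p5 have metered viewing but fall below the cut-off, so they are treated as non-viewers for the purpose of this comparison only.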

For this analysis, we compared MRI standalone and fused interactions between 58 networks and 11,291 MRI print and product categories – two sets of 654,878 values, over 1.3M in total.

There were a number of tests applied to the data:

  1. Correlation and Regression to Mean statistics.

Overall correlation of the differences from random duplication shown by the two data sets (MRI and fused) was 0.7 and regression to mean was 29%. Given the definitional differences between fused and actual viewing this is probably as good as we might expect.

  2. Statistical differences

This statistical approach has been developed by Nielsen and applied for various data fusion validation tasks in the last few years, where comparable single source data are available. For each pair of real and fused values we calculate Z-scores for the difference, then assess the spread of Z-scores and compare this against the distribution that we would expect given two independent samples – i.e., 1% showing significant differences at the 99% level of confidence, 5% at the 95% level of confidence and so on. If we see more than this, we can reasonably assume that the extra variability is due to fusion bias.

In fact the distribution is as follows:

1 Standard Error: 84% of values (expected = 68%)

1.96 Standard Errors: 96% of values (expected = 95%)

2.58 Standard Errors: 99% of values (expected = 99%)

So most of the values are statistically closer together than would be expected from two independent samples, and there are no more outliers than one would expect. This is very encouraging and suggests that the fusion has been successful.
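The Z-score comparison can be sketched as below. The standard error used here is the simple random sampling formula for two independent proportions with given (effective) sample sizes; the formula Nielsen actually applies is not given in the paper, so this is an assumption, as are the function names and the example values.

```python
from math import sqrt


def z_score(p_real, n_real, p_fused, n_fused):
    """Z-score for the difference between two independent proportions."""
    se = sqrt(p_real * (1 - p_real) / n_real
              + p_fused * (1 - p_fused) / n_fused)
    return (p_real - p_fused) / se


def share_within(z_scores, threshold):
    """Percentage of Z-scores whose absolute value is within the threshold."""
    return 100.0 * sum(abs(z) <= threshold for z in z_scores) / len(z_scores)


# Hypothetical (real proportion, real n, fused proportion, fused n) pairs.
value_pairs = [(0.20, 1000, 0.21, 1200),
               (0.05, 800, 0.08, 900),
               (0.50, 1500, 0.49, 1500)]
z_scores = [z_score(*pair) for pair in value_pairs]
for limit in (1.0, 1.96, 2.58):
    print(limit, share_within(z_scores, limit))
```

Comparing the three shares with 68%, 95% and 99% gives the same kind of check as the distribution reported above.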

Going further, we can look at the data by MRI product category – there are 837 in total. The majority of categories are very reliable – 692 of the 837 (82%) have 99% of values within 2.58 standard errors. 829 of the 837 (99%) have 90% of values within 2.58 standard errors. Categories where the fusion performs slightly less well are satellite ownership (though most of these fields are native on the NPM anyway so these fusion fields are less important than others), some financial fields including insurance, and female products including cosmetics.

  3. Index consistency

Another useful assessment is given by assessing the two samples’ consistency in terms of indices – if a particular magazine indexes high for a particular network’s viewers in MRI, we should hope to see a similar pattern in the fused data.

We classified the interaction data into four groups (index of 120+, 100-120, 80-100, less than 80) and assessed the consistency of these classifications.

The table below summarizes the results for all interactions, showing good correspondence between the standalone MRI and fused data.

Index Consistency (% of all interactions)

                         MRI
Fusion        Total   >120   100-120   80-100   <80
Total           100     28        22       24    26
>120             29     18         5        3     2
100-120          28      7        10        7     3
80-100           26      2         5       11     8
<80              16      1         1        3    11

Based on the Z-score analysis, the majority of the off-diagonal results are not significantly different, being smaller penetration products and/or networks.
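The index consistency table is a cross-tabulation of index bands. In the sketch below the band boundaries (120 and above, 100 up to 120, 80 up to 100, below 80) are an assumed reading of the four groups, and the function names and example indices are hypothetical.

```python
from collections import Counter

BANDS = ["120+", "100-120", "80-100", "<80"]


def band(index):
    """Assign an index to one of the four bands (boundaries are assumed)."""
    if index >= 120:
        return "120+"
    if index >= 100:
        return "100-120"
    if index >= 80:
        return "80-100"
    return "<80"


def consistency_table(mri_indices, fused_indices):
    """Percentage of interactions in each (fusion band, MRI band) cell."""
    counts = Counter((band(f), band(m))
                     for m, f in zip(mri_indices, fused_indices))
    n = len(mri_indices)
    return {(fb, mb): 100.0 * counts[(fb, mb)] / n
            for fb in BANDS for mb in BANDS}


# Four hypothetical interactions: MRI index versus fused index.
table = consistency_table([150, 110, 90, 60], [130, 95, 85, 70])
diagonal = sum(table[(b, b)] for b in BANDS)
print(diagonal)  # 75.0
```

The diagonal share is the proportion of interactions placed in the same band by both sources; summing the diagonal of the published table (18 + 10 + 11 + 11) gives 50%.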

These validation data give us an indication of the success of the process, and also serve as benchmark data for subsequent waves while giving us a means of assessing the value of refinements to the methodology.

Conclusion

The integration of TV and Print currency data, along with the detailed product information contained within the MRI database, calls for many issues to be addressed simultaneously: preservation of currency levels, ease of reporting and faithful reflection of product and print interactions with TV are the three key areas. The wealth of common variables available to us means that we have a great opportunity, albeit a fairly complicated statistical task, to use this information in as effective a manner as possible. Equally, this wealth of data offers us extensive validation capabilities, and the work we have done to date suggests that the fusion databases are accurate and actionable.

Bibliography

Rassler, Susanne (2002) Statistical Matching: A Frequentist Theory, Practical Applications, and Alternative Bayesian Approaches. Springer-Verlag, New York.

Collins, James and Gina Pingitore, “Dynamic Segmentation Fusion”, Worldwide Readership Research Symposium, Vienna, 2007.