W2VPCA: A Machine Learning Method for Measuring Attitudes With Natural Language

Company strategy influences many decisions in freight transportation. Behavioral models of company decision-making therefore could benefit from including strategy variables. However, strategy is difficult to observe and quantify. Attitudinal surveys of company executives can be used to collect measurements of latent strategy for use in quantitative models. However, surveys are costly and burdensome. Text mining methods to collect measurements overcome these issues somewhat, but typically require manual intervention and ignore the context of words, which can be problematic. This study introduces a new machine learning method to generate strategy measurement data from existing big text data. The new method, called W2VPCA, combines Natural Language Processing and Principal Components Analysis. W2VPCA produces measurement data that serve as quantitative indicators of latent strategy in behavioral models. W2VPCA is unsupervised, data-driven, and uses information on word context. We apply W2VPCA to generate measurements of latent strategies using readily available, large-scale text data: annual company reports. The empirical measurements are used successfully to associate two latent strategies, one focusing on distribution and the other on products, with truck fleet and distribution center outsourcing decisions. The main empirical outcome is that the W2VPCA measurements outperform Bag-of-Words measurements in a psychometric analysis of latent firm strategies. While this study focuses on freight behavioral models, W2VPCA may also have applications in behavioral modeling in other domains.

Unlike readily observable firm data such as revenue, which researchers can easily obtain and analyze (e.g., from Fortune magazine [2]), there is no database of firm strategy that researchers can simply download. Due to this data gap, freight behavioral models to date rarely include strategy variables, and among forecasting platforms that simulate the freight transportation system (e.g., [3], [4]), none are sensitive to strategy.
This gap is unfortunate since strategy is a key determinant of firm behavior [5] and has important implications for freight transportation. Although freight behavioral models lack strategy variables, studies in other areas show that strategy affects firm decision-making regarding assets (e.g., fleet ownership), sourcing or supplier selection, and more [6]. For example, production strategies determine the supply of conventional fuel and alternative powertrain vehicles in the marketplace [7]. Retailer strategies inform when to deliver from brick-and-mortar locations versus from upstream distribution facilities [8]. These applications show that strategy impacts transportation decisions, such as vehicle powertrain choice and production and attraction patterns. This gap in data, therefore, is an issue for freight transportation behavioral modeling, and it limits our ability to model the impact of freight transportation on the environment, congestion, profitability, and other outcomes [6].
Strategy data, however, are hard to obtain because companies often treat this information as proprietary. Even when companies are more forthcoming, strategy is articulated verbally, not quantitatively. To address these challenges, researchers often collect quantitative measurements of the latent (unobserved) strategies or other latent constructs using surveys of individuals, such as company executives (e.g., in Hambrick [9] and Golob and Regan [10]). One challenge with surveys is question design: question wording relies mainly on the judgment of the survey developers, or "conventional wisdom" [11]. Surveys also are problematic since individuals are often busy and reluctant to participate [9]. Obtaining sufficient sample sizes is difficult. Consequently, addressing the gap in strategy data with surveys has significant challenges. Alternatively, existing text sources such as annual reports contain a wealth of strategy information [12]. In recent years, researchers developed machine learning (ML) methods to derive strategy measurement data from such sources. However, existing ML approaches for extracting strategy measurements are lacking in two ways. First, many existing methods rely on the Bag-of-Words (BOW) assumption: that the presence of a word or the frequency of its appearance indicates its relevance [13]. In doing so, BOW ignores the contexts of the words that are being studied. This is problematic in latent strategy analysis since words can imply different things [14]. For instance, a firm that focuses strategically on product reliability may devote more resources to manufacturing quality while outsourcing transportation functions, while a firm that focuses on reliable delivery might own a truck fleet so that it can ensure customers receive their goods on time. As such, companies have different goals, and they have different strategies to support their goals. Like Kabanoff and Brown [6], we hypothesize that individual organizations use certain words differently (for example, in different contexts) depending on what matters to that organization. Second, to account for context, many ML efforts including Kabanoff and Brown use supervision, or human intervention, to transform the ML output into usable results. In addition to being labor-intensive [15], supervised text analysis typically also involves human judgment, which raises more issues since judgment can be flawed or inconsistent [14].
To summarize, several methods that generate measurements of latent strategies are available (Table I). Surveys elicit measurements about strategies that are of particular interest to the analyst. However, surveys are costly and burdensome. Surveys also rely on human judgment for designing questions with an optimal selection of words. Applying BOW to existing text is less resource intensive, but requires the BOW assumption, which can be problematic. Supervised BOW methods mitigate this issue, but like surveys are resource intensive.
Our objective is to develop a method for generating strategy measurement data that leverages readily available text, is unsupervised, and is sensitive to word context. In doing so, the new method also aims to provide improved measurement data to freight behavioral models compared to a BOW approach. We seek empirical evidence to confirm whether strategy indeed informs the usage of various words by individual companies.
The main contribution of this paper is a new ML approach to generate strategy measurement data from existing text. Our method involves modifying the Natural Language Processing (NLP) word2vec algorithm [16], then applying Principal Components Analysis (PCA) (Pearson [17] and Hotelling [18]). The new method, W2VPCA, produces quantitative data that can serve as latent variable measurements in behavioral models. The method is unsupervised, which makes it relatively inexpensive and data-driven, thus avoiding many issues related to human judgment. Moreover, in contrast with BOW approaches, context is a key input in our approach.
The second contribution of our work is empirical. We demonstrate our method and develop insights into firm strategies and their real-world impacts on transportation decisions (fleet ownership and distribution center (DC) control) using publicly available annual reports from 245 Fortune 500 companies. We find that companies use many words (such as "reliability") differently depending on whether the company outsources these two freight transportation functions. This finding suggests that W2VPCA measurements reflect latent company strategies that are related to transportation. We then apply Confirmatory Factor Analysis (CFA) to the measurements and find evidence of two underlying strategies, which we name "Product Focus" and "Delivery Focus". To the authors' knowledge, ours is the first application to relate firm strategies to strategic transportation decisions using measurement data generated by unsupervised NLP-based processes.
Finally, empirical results from the new method perform better than BOW results in detecting latent strategies. Specifically, the W2VPCA-based CFA model fit is 0.917, which exceeds the recommended minimum threshold (0.9). The comparable BOW-based CFA is rejected with a fit of 0.849. As such, a key contribution is that the new method can provide new insights into transportation decisions that were not previously possible using ML approaches. However, this finding is based only on the empirical work conducted in this study. Additional work is needed to demonstrate this finding conclusively.
The remainder of this paper is structured as follows. Section II reviews measurement data, their use in behavioral models, and existing methods for collecting measurements. Section III presents the proposed new approach to generating measurement data. Section IV describes the data that we analyze with the new method. Section V presents the empirical measurement results from W2VPCA and a BOW method. Section VI investigates differences in measurements and company strategies depending on whether companies outsource transportation functions. Section VII summarizes the work, elaborates on its limitations, and identifies extensions.

II. BACKGROUND
This section discusses existing approaches to collect data that measure latent constructs, and briefly covers their use in behavioral models. The traditional method is to use attitudinal surveys. More recently, text mining and NLP have been applied. This section presents the main approaches and outlines their advantages and disadvantages.

A. The Role of Measurement Data in Latent Variable Analysis
As Section I discusses, firm strategy can be difficult to observe and quantify. Earlier methodological developments overcome this by treating strategy as a latent (unseen) construct that can be measured quantitatively using one or more manifest (observed) variables. The observable variables (measurements) are treated as quantitative metrics of the latent construct. Behavioral analysts use the resulting measurements in models with psychometric inputs, such as factor analysis, structural equation modeling [19], or hybrid choice models [20], to better understand decision-making in transportation or other areas.
In passenger transportation modeling, which has a rich history of analyzing latent constructs, each latent construct is referred to as an attitudinal variable or perception of the individual traveler that affects her/his decisions [20]. When the organization is treated as a decision-making unit, latent variables are a natural device for representing organizational strategy, as Danneels [21] and Ben-Akiva [22] discuss. We adopt this perspective in our empirical analysis. Similarly, we treat company strategy as the business analogue of a personal attitude, and we use the same history of psychometric modeling to inform our approach.

B. Attitudinal Surveys
Attitudinal surveys collect measurement data by posing questions about perceptions or strategies to the respondent (for example, in Golob and Regan [10]). Respondents answer each question based on how much they agree with the question, e.g., using a Likert scale. These responses are the measurements. Endeavors including Bollen [19] and Ben-Akiva et al. [20] provide theoretical support for this approach.
Although attitudinal surveys have a rich theoretical and empirical history, their use has several drawbacks (as noted in [20], [23], [24]). Attitudinal surveys can be costly to administer and suffer from typical survey issues related to respondent burden, sample size, and self-selection. Attitudinal questions and answers are highly subjective and are limited to the content in the questionnaire. Measurements are constrained by pre-imposed scales and bounds (e.g., 1 to 5 may represent "Agree" to "Disagree"). This can create issues in comparing measurements between individuals or organizations, or between questions. Pre-imposed integer scales and bounds can be analytically cumbersome and are a form of data censoring. Lastly, attitudinal surveys of executives are constrained to one individual's perceptions, which may not capture the strategy of the organization more broadly. Shaw and Mokhtarian [25] overcome these drawbacks by developing an innovative method to transfer attitudinal data from one survey dataset to another, but it is limited to the attitudes covered in the first survey and thus may not cover all purposes. Consequently, the ability to collect data more easily would make strategy analysis more accessible in freight behavioral models.

C. Text Mining and Natural Language Processing
As Forrester [12] suggests, surveys are not the only source of strategy data: natural language text sources, such as company reports and letters to shareholders, also contain strategy information. Such texts reflect the collective view of the company's executive management team and so likely represent company strategy better than survey data from a single executive. These sources already exist and are readily available, which is a major advantage over attitudinal surveys. But unlike attitudinal surveys, which may contain ten or twenty questions with pre-defined answer selections, natural language text consists of open-ended writing that may be thousands of pages long and cannot be analyzed in its raw form.
Computational power and analytic methods now make it possible to process and analyze large text sources. Reviews (e.g., in Pollach [26] and Tausczik and Pennebaker [14]) describe how advancements in ML are used to study strategy and other latent constructs. In particular, text mining extracts information, such as word counts, from unstructured text [27], while NLP is similar but also uses linguistic analysis to achieve "human-like language processing for a range of particular tasks or applications" [28].
To date, most methods to generate measurements from natural language text use an unsupervised BOW approach. Ramirez-Esparza et al. [15] applies factor analysis to study depression based on word counts in Internet posts. In a freight transportation example, company strategy is analyzed by applying structural equation modeling to word counts from annual company reports [29]. However, psychological experts consider unsupervised BOW to be "quite crude" because it ignores word context [14]. For words with multiple possible meanings, the true meaning of the word is not known without examining its context. For example, in the Linguistic Inquiry and Word Count (LIWC) engine [14], the word "mad" is coded as an indicator of anger, although this word can also denote insanity. Consequently, the main disadvantage of unsupervised BOW for strategy measurement is that a simple count of the number of occurrences of a word in a document may be misleading as an indicator of strategy.
To overcome this drawback, the Computer-Aided Text Analysis (CATA) [26] application supplements BOW with supervision consisting of manual assessments of word contexts. CATA is used successfully in the business domain to measure latent firm constructs (e.g., in McKenny et al. [30] and Pandey and Pandey [31]) based on company documents such as annual reports. CATA is used to uncover latent strategies including Product focus, Customer service, and Research and development [6]. However, CATA has two drawbacks. First, the required manual intervention relies on human judgment and subject matter expertise, which can be inconsistent and can require multiple reviews to improve consistency [14]. Second, manual intervention requires labor and time, which are limited resources in most studies.
Other text mining and NLP methods, including clustering and text classification, are used to study firms [32]. However, the output from these methods cannot be used as latent strategy measurements. More recently, Baburajan et al. [24] applies the NLP topic modeling method [33] to infer attitudes (in the form of topics) from open-ended survey responses. The study finds that the inferred topics correlate with Likert scale responses, and that the topic models found topics that were not deliberately included in the questionnaire. The implication for our work is that NLP methods are a promising potential substitute for attitudinal surveys.

D. Word Embeddings and PCA
The NLP basis of this study relies on methods that quantify symbolic data (e.g., words) by representing each symbol as a distribution. Each feature of the distribution represents a unique concept (Barlow [34] and Hinton [35]). "Word embeddings" refers to the treatment of words as vectors that have a real value in each dimension. Until recently, for a set of words (vocabulary) $V$, this was achieved using a $|V| \times 1$ "one-hot" vector to represent each unique word $v_i \in V$. The one-hot vector for $v_i$ has 1 in the $i$th place and zeroes everywhere else. Analysis with one-hot vectors is computationally challenging for large vocabularies, and there is no easy way to acknowledge similarities between similar words. Recent NLP studies by Bengio et al. [36] and Mikolov et al. [16] address these drawbacks by representing words in a much denser $N$-dimensional vector space, where $N \ll |V|$ and $N$ is specified by the user (e.g., [37], [38]). Fig. 1 illustrates how four words may be represented as a mix of "alive" and "mobile" concepts as vectors in a two-dimensional space instead of their four-dimensional one-hot forms: (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0) and (0, 0, 0, 1).
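To make the contrast concrete, the following toy sketch (ours, not from the paper) compares one-hot and dense representations for four words in the spirit of Fig. 1; the dense values are hypothetical.

```python
# A toy illustration (not from the paper) of one-hot versus dense word vectors,
# mirroring Fig. 1's idea of representing words as mixtures of concepts.
import numpy as np

vocab = ["dog", "car", "tree", "rock"]   # |V| = 4
one_hot = np.eye(len(vocab))             # each row is a |V| x 1 one-hot vector

# Hypothetical 2-dimensional dense embeddings (N = 2 << |V| = 4),
# with dimensions loosely interpretable as "alive" and "mobile".
dense = np.array([
    [0.9, 0.8],   # dog: alive and mobile
    [0.1, 0.9],   # car: not alive, mobile
    [0.8, 0.1],   # tree: alive, not mobile
    [0.1, 0.1],   # rock: neither
])
print(one_hot[vocab.index("dog")])  # [1. 0. 0. 0.]
print(dense[vocab.index("dog")])    # [0.9 0.8]
```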
The word2vec algorithm [16] is founded on the idea that similar words are used in similar contexts. It produces quantitative representations of each word based on how it is used in context. We select word2vec over the similar GloVe method [39] due to the convenience of its Python libraries. Conveniently, word embeddings can be manipulated algebraically. In a famous example, the vector arithmetic "King − Man + Woman" generates a vector that is very close to the "Queen" vector [40]. Applications of word embedding models include machine translation and question-and-answer tools [41].
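The analogy can be reproduced with gensim's pretrained vectors; the sketch below is illustrative only and assumes the downloadable "word2vec-google-news-300" model rather than anything trained in this study.

```python
# A minimal sketch of the "King - Man + Woman ~ Queen" analogy using gensim.
# Assumes the pretrained "word2vec-google-news-300" vectors, which gensim's
# downloader API can fetch (a large one-time download).
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # returns a KeyedVectors object
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # expected to rank "queen" first
```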
PCA is commonly used to visualize individual word vectors and their algebraic relationships. For example, two-dimensional PCA is used to show how word2vec uncovers relationships between countries and their capital cities, as Mikolov et al. illustrate [42]. PCA is also used for strategy analysis [6], but that application used PCA to identify strategies rather than to generate measurement data for subsequent input to behavioral models. Our approach, in contrast, uses PCA as an integral part of the W2VPCA method. To our knowledge, no earlier studies have combined word2vec and PCA to generate latent variable measurement data for psychometric analysis.
In this paper, we exploit NLP and PCA to address the firm strategy data gap. We develop a method, W2VPCA, that links word2vec and PCA to produce context-based strategy measurements. As such, W2VPCA generates measurements that can be used in models that use psychometric inputs, such as factor analysis and hybrid choice models. The method is unsupervised and can be applied to existing, open-ended text sources such as company reports.
III. APPROACH

Fig. 2 illustrates our approach to generating strategy measurement data. First, we preprocess the input data by tagging each keyword with a company-specific tag (for a pre-selected set of keywords that the analyst chooses). Second, we apply word2vec to this preprocessed input data. Using the modified input data, word2vec now produces word embeddings that are specific to each company for the selected keywords. In other words, the preprocessing step makes the word2vec algorithm sensitive to differences in word use by individual companies. Then, for each keyword, we use PCA to find the first principal component of the keyword vectors, then project each vector onto this component. This transforms the vector-valued measurements into real numbers that can be input to psychometric models. The resulting keyword measurements are specific to each company.
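A minimal sketch of the tagging step follows, using our own naming conventions (the paper does not publish code): each occurrence of a keyword in company d's document is replaced by a company-specific token.

```python
# A minimal sketch (our reading of the preprocessing step, not the authors'
# code) of company-specific keyword tagging.
def tag_keywords(tokens, keyword, company_id):
    """Replace each occurrence of `keyword` with a company-specific token."""
    tagged = f"{keyword}_{company_id}"
    return [tagged if tok == keyword else tok for tok in tokens]

doc = ["we", "emphasize", "quality", "and", "service"]
print(tag_keywords(doc, "quality", "d42"))
# ['we', 'emphasize', 'quality_d42', 'and', 'service']
```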

A. The word2vec Algorithm
The word2vec algorithm of Mikolov et al. [16] produces word embeddings (vectors) by analyzing the context of each "target" word. The context consists of words that immediately precede and follow the target word within the same sentence. The analyst specifies how many context words to include (e.g., up to three words on each side of the target word). For simplicity, we first focus on the case where each target word has only one context word (e.g., for the sentence "Children play.", "play" is the context word for the target word "children", and vice versa). Our mathematical description of word2vec follows that in Rong [46], adopting the Continuous Bag of Words (CBOW) solution approach. First, we denote $V$ as the vocabulary or set of unique words, $v_l$ as a word in $V$, $|V|$ (indices $i$, $j$, $l$) as the vocabulary size, $C$ (indexed by $c$) as the set of context words around the target word, $N$ (index $n$) as the dense vector space dimension that the analyst selects, and $D$ (index $d$) as the set of documents to be analyzed. Since there is one document per company, $d$ also represents the company. For each target word, let each $c \in \{1, 2, \ldots, C\}$ correspond to a unique position in $V$ (e.g., the first context word $v_c$, $c = 1$, is equivalent to some $v_i$ at the $i$th position of $V$ without loss of generality). The following vectors and matrices, with their dimensions noted in parentheses, are defined:
• $x_c$ ($|V| \times 1$): one-hot input vector for context word $v_c$
• $\{x_c\}$: set of one-hot vectors for context words $v_1, v_2, \ldots, v_c, \ldots, v_C$ surrounding the target word $v_j$
• $x$ ($|V| \times 1$): CBOW input vector, formed using $\{x_c\}$
• $y$ ($|V| \times 1$): output vector of the predicted target word
• $h$ ($N \times 1$): compact vector representation of $x$
• $W^X$ ($|V| \times N$): weight matrix that transforms $x$ into $h$
• $W^Y$ ($N \times |V|$): weight matrix that transforms $h$ into $y$
• $w^X_i$ ($1 \times N$): $i$th row vector in $W^X$, associated with word $v_i$
• $w^Y_j$ ($N \times 1$): $j$th column vector in $W^Y$, associated with word $v_j$

The objective of word2vec is to correctly predict an unknown target word given a set of context words. This goal helps the algorithm to learn, or infer, the values that constitute each word vector. The problem is set up as a two-layer neural network [46]. Fig. 3 illustrates the neural network for one input word $v_i$ and one output word $v_j$ in word2vec. For our two-word example, the input $x$ has a 1 at position $i$ of the input node set and 0 elsewhere. Each neuron or dimension $n$ in the hidden layer represents one symbolic concept. $W^X$ transforms each input vector $x$ into $h$ (the "hidden layer" in Fig. 3) as follows:

$$h = (W^X)^\top x \qquad (1)$$

Since $x$ has 1 in the $i$th row and 0 elsewhere, $h$ is equivalent to $(w^X_i)^\top$, the transpose of the $i$th row of $W^X$. Consequently, $w^X_i$ can be used as the dense representation of word $v_i$. As the following steps illustrate, a second weight matrix, $W^Y$, transforms $h$ into $y$, which is lastly compared to the one-hot vector of the observed target word $v_j$.
The values of $W^X$, $W^Y$ and $h$ (typically initialized with random values) are estimated as follows. Each element of $y$ is computed as:

$$y_j = p(v_j \mid v_1, v_2, \ldots, v_C) = \frac{\exp(u_j)}{\sum_{l=1}^{|V|} \exp(u_l)} \qquad (2)$$

where $u_j$ is referred to as the score and is computed as:

$$u_j = (w^Y_j)^\top h \qquad (3)$$

The expression with exponentiation in Equation 2 is called the softmax function and is a log-linear classification model. It produces the posterior, multinomial distribution of word predictions. The rest of the word2vec algorithm involves maximizing the probability that the predicted word is the actual observed target word, conditional on observing context words $v_1, v_2, \ldots, v_c, \ldots, v_C$. Mathematically, this is achieved through minimizing the loss function $E$:

$$E = -\log p(v_j \mid v_1, v_2, \ldots, v_C) \qquad (4)$$

At its minimum, this expression equals zero when the probability is one. We train the model using Gensim's default CBOW solution method with negative sampling, which computes a loss function for the correct word and a sample of other words. When only one context word exists, then $x = x_c$. When multiple context words are input, CBOW computes $x$ as:

$$x = \frac{1}{C} \sum_{c=1}^{C} x_c \qquad (5)$$

Although $x$ now consists of fractions and zeroes that add up to one, the problem is essentially the same. The typical solution method at this point is Stochastic Gradient Descent (SGD) with modifications to expedite the process. The estimated output is first compared to the observed one-hot target word vector, then the elements of $W^X$ and $W^Y$ are updated using SGD to better match the observed data. For the sake of brevity, we omit details of SGD and refer the reader to Rong [46]. Word2vec iterates through the entire set of target word positions in the document (or a concatenated set of documents) one time per solution iteration to learn $W^X$ and $W^Y$. For any word that appears more than once, the algorithm updates the weight matrices each time the word is encountered. After word2vec processes all the input text data, the learned representation of each word has converged to a single vector, represented by a row in $W^X$. It follows that the vector for a single keyword represents the average use of the keyword across all input documents or companies. Therefore, this vector cannot be used to study latent preferences of individual companies.
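In practice, this training stage can be run with gensim's Word2Vec class. The sketch below assumes a variable `tagged_docs` holding the tokenized, keyword-tagged sentences from the preprocessing step (our naming), with hyperparameters matching the values used later in this study (CBOW with negative sampling, N = 100, window of 5).

```python
# A minimal sketch of the word2vec training stage using gensim (4.x API).
from gensim.models import Word2Vec

model = Word2Vec(
    sentences=tagged_docs,  # list of token lists, one per sentence
    vector_size=100,        # N, the dense embedding dimension
    window=5,               # up to 5 context words on each side
    sg=0,                   # 0 selects CBOW (gensim's default)
    negative=5,             # negative sampling
    min_count=1,            # keep rare company-specific tagged tokens
)
vec = model.wv["quality_d42"]  # company-specific embedding for a tagged keyword
```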

B. W2VPCA
We now present the details of W2VPCA. We discuss our extension to the original word2vec and show how the outputs integrate with PCA. Algorithm 1 presents pseudocode for W2VPCA.

Algorithm 1 W2VPCA Pseudocode
Input: set of documents $D$ and set of keywords $K$
Output: measurements $y^*_{k,d} \; \forall v_k \in K, d \in D$
1: for each keyword $v_k$ in $K$ do
2:   // construct modified document set $D_k$ with vocabulary $V_k$
3:   for each document $d$ in $D$ do
     ...
25:   $W^{X_N}_k \leftarrow W^X_k$ normalized to have zero mean for each column
26:   calculate $A$, the covariance matrix of $W^{X_N}_k$
27:   apply PCA to solve $A\epsilon = \lambda\epsilon$, finding $\lambda$ and $\epsilon$
28:   $\epsilon^{(1)} \leftarrow$ the eigenvector with the greatest value of $\lambda_i$
29:   calculate the set of strategy measurements $y^*_{k,d}$ (Equation 7)
30: end for

W2VPCA (lines 3 to 29) is run one time for each keyword $v_k$ in the set of keywords $K$. All other words, including the remaining keywords, are not modified but are still included in the analysis. This makes use of all available information and allows all other words to anchor the keyword of interest in vector space.
For each keyword $v_k$, word2vec estimates the average use of the word across all companies, generating a single row vector $w^X_k$ in $W^X$. In contrast, in W2VPCA, the goal is to produce a company-specific vector $w^X_{k,d}$ for each keyword $v_k$, for a total of $|D|$ vectors. We accomplish this through the following steps. For keyword $v_k$, we form a new, unique token $v_k^d$ for each document (company) $d$ and replace $v_k$ with $v_k^d$ everywhere in $d$. It follows that the context words around $v_k^d$, which are input to W2VPCA, are now document specific. In other words, in word2vec the final estimate of $w^X_k$ is based on context words that surround $v_k$ from all the documents; in contrast, when W2VPCA estimates $w^X_{k,d}$, it uses only the context words that company $d$ uses. The full word2vec algorithm is then run using the modified document set $D_k$, which now has vocabulary $V_k = (V \setminus \{v_k\}) \cup \{v_k^1, v_k^2, \ldots, v_k^{|D|}\}$. We denote the new weight matrices $W^X_{V_k}$ and $W^Y_{V_k}$, using subscript $V_k$ to emphasize that the new weight matrices are estimated using $V_k$.
As the vocabulary is extended from $V$ to $V_k$, the various indices, vector sizes, and matrix sizes that are based on it change accordingly: the vocabulary size becomes $|V_k| = |V| - 1 + |D|$, since $v_k$ is replaced by $|D|$ company-specific tokens, and the one-hot vectors and weight matrices are resized to match. It follows that for keyword $v_k$, W2VPCA produces $|D|$ company-specific row vectors $w^X_{k,1}, w^X_{k,2}, \ldots, w^X_{k,|D|}$ in $W^X_{V_k}$, where $w^X_{k,d}$ is the row vector that is associated with keyword $v_k$ and document $d$. In other words, the row vectors $w^X_{k,1}, w^X_{k,2}, \ldots, w^X_{k,|D|}$ represent the contexts of the keyword for companies $d = 1, 2, \ldots, |D|$, respectively. The rest of W2VPCA focuses on these vectors, that is, the $|D| \times N$ submatrix of $W^X_{V_k}$, denoted $W^X_k$, that contains the company-specific vectors for keyword $v_k$.
After the word2vec stage completes, we transform the dense representation of each keyword from its company-specific, vector-valued quantity $w^X_{k,d}$ to a real-valued measurement. Drawing inspiration from earlier word2vec applications that used PCA for data visualization, we apply PCA to the $|D|$ company-specific vectors for each keyword. However, unlike earlier efforts, our goal is not simply visualization. Instead, we aim to obtain real-valued strategy measurements. In keeping with Fig. 2, this involves finding the main way that keyword $v_k$ is used among companies, then measuring the difference in its use among companies. We achieve the former by finding the direction of greatest variance in $W^X_k$. Recall that each dimension $n \in N$ represents a concept. It is unlikely that any single dimension $n$ captures the direction of greatest variance in $W^X_k$. Therefore, we apply the classic PCA method [47], which allows us to find the direction of greatest variance (the first principal component of the $|D|$ vectors) and to measure differences among individual companies along this direction.
The PCA stage of W2VPCA uses the following computations. The input, $W^X_k$, is the $|D| \times N$ matrix of company-specific word vectors $(w^X_{k,1}, w^X_{k,2}, \ldots, w^X_{k,|D|})$ for keyword $v_k$. We normalize $W^X_k$ to have zero mean in each column $n \in N$. The normalized matrix, denoted $W^{X_N}_k$, has an $N \times N$ covariance matrix $A$. Solving the eigenvalue equation $A\epsilon = \lambda\epsilon$ yields the eigenvalues $\lambda$ and their associated eigenvectors $\epsilon$ for this covariance matrix. The eigenvectors, which are orthogonal, represent directions of variance in the data. The first principal component is the eigenvector with the greatest eigenvalue $\lambda_i$. The principal eigenvector, $\epsilon^{(1)}$, is a column vector. Taking the dot product of $\epsilon^{(1)}$ and each of the $|D|$ vectors $w^{X_N}_{k,1}, w^{X_N}_{k,2}, \ldots, w^{X_N}_{k,|D|}$ transforms the normalized input data into a $|D| \times 1$ vector $Y^*_k$ containing strategy measurements:

$$Y^*_k = W^{X_N}_k \, \epsilon^{(1)}, \quad \text{i.e.,} \quad y^*_{k,d} = w^{X_N}_{k,d} \cdot \epsilon^{(1)} \qquad (7)$$

Each entry $y^*_{k,d} \in Y^*_k$ is a real-valued quantity that is specific to company or document $d$. We treat the entries $y^*_{k,d}$ as the company-specific strategy measurements for keyword $v_k$. This concludes the W2VPCA algorithm.
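The PCA stage reduces to a few lines of numpy; the following sketch uses our own variable names and random placeholder embeddings, not the study's data.

```python
# A minimal sketch of the PCA stage, assuming `W_k` is the |D| x N numpy array
# of company-specific embeddings for one keyword (our naming, not the authors').
import numpy as np

def w2vpca_measurements(W_k):
    """Return a |D|-vector of real-valued strategy measurements for one keyword."""
    W_norm = W_k - W_k.mean(axis=0)        # zero-mean each column
    A = np.cov(W_norm, rowvar=False)       # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(A)   # eigh: A is symmetric
    e1 = eigvecs[:, np.argmax(eigvals)]    # first principal component
    return W_norm @ e1                     # y*_{k,d} = w_{k,d} . e1  (Eq. 7)

rng = np.random.default_rng(0)
W_k = rng.normal(size=(245, 100))          # 245 companies, N = 100 dimensions
y_star = w2vpca_measurements(W_k)
print(y_star.shape)                        # (245,)
```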
In keeping with the theoretical foundation of symbolic data representation, we hypothesize that the resulting measurements represent a mixture of underlying concepts. We interpret this as an organizational emphasis on different latent strategies. While this provides intuitive justification for our approach,
we suggest this as an area for future theoretical and empirical study. The resulting measurements are continuous and not constrained to pre-imposed bounds.

C. Empirical Applications and Comparison With BOW
Two empirical applications are conducted to demonstrate the usefulness and performance of the novel attitudinal measurements. The first application (Section V) examines differences in group means, focusing on the mean measurements across companies depending on whether they operate their own fleets or distribution centers. Statistical t-tests are used. We also evaluate effect size using Cohen's D [48], which represents the distance between the two group means relative to their pooled standard deviation. For example, an effect size of 0.2 means that the difference between the means is 0.2 standard deviations. By convention, 0.2 is considered small, 0.5 medium, and 0.8 or larger large. Effect size is assessed independently of statistical significance. In other words, a difference in means may be statistically significant, but the actual difference between the two means may be negligible. For Groups 1 and 2 with means $\mu_1, \mu_2$ and standard deviations $s_1, s_2$, Cohen's D is computed as:

$$D = \frac{\mu_1 - \mu_2}{s}, \qquad s = \sqrt{\frac{s_1^2 + s_2^2}{2}}$$

We further test our method in a post hoc, exploratory analysis in a second application (Section VI) that detects underlying firm strategies using factor analysis (references include [19], [49]). This study uses exploratory factor analysis (EFA) followed by confirmatory factor analysis (CFA). We use the empirical measurements to hypothesize latent factors, or strategies, across organizations. We use factor analysis because CFA, in particular, "enables researchers to find evidence for validity of instruments" [50].
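A minimal sketch of the effect size computation follows, using the pooled standard deviation form implied by the definition above (our implementation, not the authors'; the group data are toy examples).

```python
# Cohen's D: difference in means relative to the pooled standard deviation.
import numpy as np

def cohens_d(group1, group2):
    """Effect size for the difference between two group means."""
    mu1, mu2 = group1.mean(), group2.mean()
    s1, s2 = group1.std(ddof=1), group2.std(ddof=1)
    s_pooled = np.sqrt((s1**2 + s2**2) / 2.0)
    return (mu1 - mu2) / s_pooled

g1 = np.array([0.4, 0.9, 1.1, 0.2])   # e.g., companies with a private fleet
g2 = np.array([-0.3, 0.1, 0.5, 0.0])  # e.g., companies without one
print(cohens_d(g1, g2))
```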
Within each application, we compare measurements (or EFA/CFA results) from W2VPCA with measurements (or EFA/CFA results) that are generated using BOW. We refer to the BOW approach used here as Bag-of-Words with Simple Scaling (SS-BOW) to distinguish it from other BOW approaches (LDA, CATA, etc.). Following the description in other research [29], SS-BOW counts the total number of appearances, $n_{k,d}$, of each keyword $v_k$ in a document $d$ that is prepared by a company. Again, each company prepares exactly one document, so $d$ is effectively the index for either the document or its authoring company. The total number of keyword counts in document $d$ is $n_d = \sum_k n_{k,d}$. The SS-BOW measurement $m_{k,d}$ for each keyword is a normalized, company-specific word frequency:

$$m_{k,d} = 100 \cdot \frac{n_{k,d}}{n_d}$$

Because of the normalization, $\sum_k m_{k,d} = 100 \; \forall d$. SS-BOW measurements are therefore based on relative frequencies of word use between companies, with the premise that more frequent use of a word is correlated with preference for some underlying strategy. The source of text data for each company is its annual report from the year 2017. Similar to previous works [51], this study uses the US Securities and Exchange Commission (SEC) 10-K report [52] because it broadly discusses the company's strategies for finding success in light of operational, geopolitical, and other real-world challenges. The 10-K document format also is standardized, making it less prone to bias than documents like shareholder letters. Also following previous research [6], our study selects specific words to analyze for detecting underlying strategies that drive the company's discussion. For expedience, we select keywords based on judgment and experience with prior freight studies. We discuss this limitation and potential ways to address it in the last section.
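The SS-BOW computation is straightforward to sketch; the token list and keyword set below are toy examples, not the study's data.

```python
# A minimal sketch of the SS-BOW measurement: each keyword count is scaled
# so that a company's measurements sum to 100.
from collections import Counter

def ss_bow(tokens, keywords):
    """Return {keyword: m_{k,d}}, the normalized keyword frequencies (percent)."""
    counts = Counter(tok for tok in tokens if tok in keywords)
    total = sum(counts.values())  # n_d, total keyword occurrences in document d
    return {k: 100.0 * counts[k] / total for k in keywords} if total else {}

tokens = ["cost", "value", "service", "cost", "quality"]
m = ss_bow(tokens, {"cost", "value", "service", "quality", "reliability"})
print(sorted(m.items()))
# [('cost', 40.0), ('quality', 20.0), ('reliability', 0.0),
#  ('service', 20.0), ('value', 20.0)]
```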
IV. DATA

Data sources include texts that refer to company strategies, data on private fleets, and data on distribution centers. This section discusses these data and the keywords that are selected for analysis. The following terms are used. A private fleet is a truck fleet that is owned by the company that uses it (compared to using a for-hire carrier). A company that owns or leases distribution centers, in contrast to outsourcing this function, is said to have "DC control."

A. Preliminaries: Data Development and Keyword Selection
Company-specific inputs are obtained from 10-K reports filed in 2017 with the US Securities and Exchange Commission (SEC) [52]. Publicly owned, US-based companies file a 10-K report annually, providing a comprehensive overview of the company and its operations. We extract all sentences and words from each *.html 10-K report, then pre-process the text before analysis. This involves converting all words to lowercase, breaking sentences into words, and removing punctuation. Irrelevant "stop words" such as "the" and "are" are removed. We use a window size of 5 words in W2VPCA, meaning that up to 5 preceding words and up to 5 words following the target word are used as the context words, provided they are also in the same sentence.
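A minimal sketch of this preprocessing follows, with an illustrative stop-word list (the paper does not specify which list was used).

```python
# A minimal sketch of the preprocessing: lowercasing, sentence splitting,
# punctuation stripping, and stop-word removal. The stop-word set is a toy
# placeholder, not the list used in the study.
import re

STOP_WORDS = {"the", "are", "a", "an", "and", "of", "to", "in", "is", "we"}

def preprocess(text):
    """Return a list of sentences, each a list of cleaned lowercase tokens."""
    sentences = re.split(r"[.!?]+", text.lower())
    cleaned = []
    for sent in sentences:
        tokens = re.findall(r"[a-z']+", sent)   # strip punctuation and digits
        tokens = [t for t in tokens if t not in STOP_WORDS]
        if tokens:
            cleaned.append(tokens)
    return cleaned

print(preprocess("We operate a fleet of service vehicles. Quality matters."))
# [['operate', 'fleet', 'service', 'vehicles'], ['quality', 'matters']]
```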
Neither W2VPCA nor SS-BOW is computationally intensive for our dataset of 245 companies (Section IV). The processing time (including reading in all documents, preparing the words dataset, and generating measurements) was about three hours for W2VPCA and less than two hours for SS-BOW using a single 1.6 GHz dual-core Intel Core i5 processor with 4 GB of memory.
Approximately 30 keywords are selected for this analysis. Consistent with attitudinal survey design, judgment and experience guided the choice of keywords. This process can be made more elaborate in future studies [53]. Keywords are selected based on their possible reflection of underlying company strategy, particularly as it may relate to products, services, or logistics strategies. Two sets of keywords are tested empirically in this study (Table II). One set has 12 words that focus on subjective attitudinal words (e.g., "quality"), and the other contains a mix of 22 attitudinal and logistics- and geography-related words (e.g., "storage").

B. Company Data
The methods are applied in an empirical demonstration to study the strategies of 245 companies in freight-intensive sectors from the year 2017 Fortune 500 list [2]. Freight-intensive sectors (raw materials, manufacturing, and retail/wholesale) are selected because their supply chains require transportation assets for shipping goods. This supports our empirical application, which relates latent strategies to observed strategic decisions regarding physical distribution capabilities, i.e., whether firms outsource or dedicate internal resources to these functions.
Data regarding private fleet ownership [54] for these companies are from FleetOwner magazine [55]. Distribution center data are from CoStar commercial real estate data on US properties [56]. This study identifies distribution centers as: (1) all properties with a "Distribution" use; and (2) all "Refrigeration/Cold Storage," "Light Distribution," and "Warehouse" properties that are 20,000, 100,000, and 150,000 SF or larger, respectively. Thresholds for category (2) are determined based on visual inspection of aerial images for roughly 100 properties. Geospatial data are integrated for use in further studies but are not used in the current study.
In our final sample of 245 companies, 208 have DC control and 61 have a private fleet. Each company has one report for year 2017. Thus, 245 documents (one for each company) are converted to words and then analyzed. Most of the reports contain fewer than 50,000 words, with a mode of about 30,000 words. The total count of attitudinal keywords is less than 1,000 in approximately 95 percent of the documents, with an average of about 500 attitudinal keyword counts per document.
Several 10-K reports were reviewed to find anecdotal examples of keyword use. These examples suggest that, for at least some keyword occurrences, the selected keywords are used differently by companies depending on their logistics controls. Examples from the 10-K documents include:
• AutoNation (no fleet, no DCs): ". . . product quality, affordability and innovation . . ."
• Aramark (has both private fleet and DCs): ". . . We operate a fleet of service vehicles . . ."

V. RESULTS
This section discusses measurement results for the baseline method (SS-BOW) and our new method (W2VPCA). Table II shows the mean and standard deviation of each measurement for all 245 companies. Mean SS-BOW measurements range from zero to 100, as expected due to the normalization process. Mean W2VPCA measurements are centered at zero, as expected.

A. SS-BOW Measurements
Based on the SS-BOW attitudinal keyword measurement statistics in Table II, "cost," "value" and "service" are the most frequently used terms but have considerable variation in relative frequency among firms. Other keywords, such as "efficiency" and "reliability," are used less frequently. Transportation-related keywords such as "ship" and "distribution" also have considerable variability in frequency of use.
It is important to assess whether these results can support strategy-related analysis. One way to investigate this is to evaluate whether the group means of the measurements differ in accordance with the decision to outsource transportation and logistics functions. If a difference is found, then it supports the hypothesis that the measurements represent latent strategies. For the SS-BOW results, differences in group means are first explored visually in Fig. 4. The figure indicates that companies appear to use certain words with different relative frequencies depending on whether they exercise DC control. For example, companies with DC control mention cost more and service less than other companies. We statistically evaluate differences in group means later in this work to test these differences more conclusively.

B. W2VPCA Measurements
This subsection demonstrates the remarkable empirical result that companies appear to use words differently, depending on the types of transportation and logistics controls that they exercise. This finding is presented and discussed from many viewpoints.
First, we observe that, due to the methodology, the W2VPCA standard deviations that are reported in Table II are correlated with the range of contexts in which a given word appears. Interestingly, just as the attitudinal keywords "cost," "value" and "service" stand out as the most frequently used terms (in SS-BOW), in the W2VPCA results these words stand out due to having considerable variation in contextual uses among firms.
Since the W2VPCA measurements are based on vectors with 100 dimensions, the results are best visualized using one- and two-dimensional plots. Our first visual comparison (Fig. 5) analyzes the average keyword embedding for groups of companies that have the same logistics control status. To achieve this, W2VPCA is applied as outlined in Fig. 2, but is modified so that the keyword tagging is not company-specific. Instead, keywords are appended with "_DC" for all companies that control their distribution centers (left side of figure) and with "_F" for all companies that have a private fleet (right side of figure). Keywords for companies without the respective control are not appended. Using the notation of Section III, the results are generated using a vocabulary of size $|V| + 1$ for each case (in the first case, DC control/No DC control is analyzed, and Fleet/No Fleet is analyzed in the second case). The measurements are generated using 100 dimensions in word2vec and two-dimensional PCA (rather than using only the first principal component) to enhance the visual clarity of the outcomes. We observe the following:
• The embeddings for each keyword pair (e.g., "reliability" and "reliability_F") are located in similar areas of each graph, meaning that the two groups use each word similarly;
• For each keyword, there is a small difference in embedding location depending on the control status. This means that while keyword usages are similar, they are not identical; and
• In some cases, the difference appears to be quite small, while in other cases it is larger. A larger difference represents a larger difference in contextual use of the word between the two groups.
The remaining figures and tables in this subsection are developed using company-specific word embeddings. Fig. 6 illustrates the measurements for all keywords in the attitudinal set. Symbology is based on a combination of the two logistics factors, with "T0" ("T1") denoting companies without (with) a private fleet and "D0" ("D1") denoting companies without (with) DC control. A "jitter" function is used to shift each point slightly away from its true location in a random direction, which improves the visual distinction between points.
Fig. 6 illustrates several results. First, it demonstrates the achievement of the key objective of this study: the generation of context-sensitive, quantitative strategy measurements for each company. Second, the figure illustrates the spread in attitudinal measurements for each keyword. For instance, the measurement for "cost" ranges from about −3 to 4, while the measurement for "innovative" ranges from about −1 to 1. This implies that companies use the word "innovative" in a more limited range of contexts (compared to the word "cost"). Remarkably, the range of the measurements, or differences in usage, comes from natural language that is written by the companies, with no pre-imposed scale from the analyst. Third, Fig. 6 suggests that, for at least some words, measurements differ by company strategy regarding logistics control. For instance, a handful of companies (with DC control but no private fleet, as indicated by gray triangles) on average use the word "value" quite differently from other companies, with measurements of about −4 to −5. This is similar to the point that is illustrated in Fig. 5, but Fig. 6 shows differences in measurements across individual companies rather than groups of companies.
Fourth, the right-hand side of the plot displays the percentage of variance in $W^X_k$ that is captured by the first principal component (e.g., 0.30 is 30 percent). For example, the measurement for "value" represents 30 percent of the total variance across embeddings of the word "value" among the companies. Although the implications of this result need further explanation, this study suggests that this may be a measure of "noise" in the W2VPCA measurements. Ideally, the W2VPCA algorithm would capture all the difference in word use between companies, which would generate a first-principal-component share of 1.0. Refining the word or context selection process may improve this fit. We recommend future theoretical and empirical studies on this topic.

VI. COMPARISON OF METHODS
We now conduct a t-test and evaluate Cohen's D [48] for each keyword to explore whether companies with different logistics strategies appear to use the keyword in different ways. Factor analysis is also conducted with the empirical measurements.

A. Statistical Comparison of Methods
The t-test results enable statistical comparisons of group means such as those shown in Fig. 4. As such, the test results indicate which keywords may help identify underlying logistics control strategies. The hypothesis test uses Equations 10 and 11. The null hypothesis $H_0$ is that the mean measurement of keyword $v_k$ is the same for Group 1 and Group 2 (denoted with subscripts 1 and 2). The alternative hypothesis $H_{ALT}$ is that the mean measurements are not equal. A two-tailed test is used since the difference in means can be positive or negative. Groups are based on logistics control status; e.g., the test measures whether there is a statistically significant difference between measurements depending on private fleet ownership.
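The test itself can be run with scipy; the sketch below uses synthetic placeholder data with the paper's group sizes (61 companies with private fleets out of 245) and assumes Welch's unequal-variance form, since the paper does not state which variant was used.

```python
# A minimal sketch of the two-tailed, two-sample t-test used to compare
# group means of one keyword's measurements (synthetic data, our naming).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
with_fleet = rng.normal(0.3, 1.0, size=61)      # companies with private fleets
without_fleet = rng.normal(0.0, 1.0, size=184)  # the remaining companies

t_stat, p_value = stats.ttest_ind(with_fleet, without_fleet, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")   # two-tailed by default
```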
Tests are conducted for mean keyword measurement differences for each keyword set and each method. Some words are present in both keyword sets. In general, a keyword measurement depends partly on set membership. Therefore, a separate t-test must be used. Table II summarizes the t-test results. A total of 136 t-tests are conducted, with four tests (based on two methods and two types of logistics control) conducted for each of the 34 keywords (from the 12- and 22-keyword sets). Each row in the table shows the results for four hypothesis tests depending on which method and type of control is being tested. For example, within the attitudinal keyword set, the SS-BOW difference in means for the keyword "affordability" is 0.02 for the two DC control groups. This is statistically significant at the 91 percent level, providing evidence that "affordability" on average really is used more or less frequently by companies depending on whether they have DC control or not. In addition, Table II shows the effect size using Cohen's D. Using the conventional thresholds [48], the stars are filled in halfway for relatively small differences in means (0.2 < Cohen's D < 0.5) and fully for both moderate (0.5 < Cohen's D < 0.8) and large differences (Cohen's D > 0.8).
A lower p-value, or higher t-statistic, for group mean differences suggests that the associated keyword detects differences in logistics control strategies relatively well. Using a 70 percent significance threshold, or p-value less than 0.30 (denoted with light gray shading in the table), the W2VPCA (SS-BOW) method produces nine (12) statistically significant differences when the Attitudinal keyword set is used to distinguish differences between companies with varying logistics controls. W2VPCA (SS-BOW) produces 28 (24) statistically significant differences in means when the Mixed keyword set is used. A similar pattern holds for effect sizes over 0.2.
These findings suggest the following. First, although both methods are still in the empirical development stage and are not theoretically proven, these results lend empirical support to their underlying assumptions. In other words, these results convey that companies use many words differently, and more or less often, depending on their strategies. Second, we also would like to compare the performance of each method. Since the methods are new, a formal procedure to make this comparison is not yet available. For now, in an ad hoc fashion, we observe that SS-BOW yields a greater number of significant differences when the Attitudinal keyword set is used, while W2VPCA yields more with the Mixed set. This finding, while interesting, does not immediately suggest that either method is superior. This is recommended as an area to explore in future studies.

B. Empirical Application: Factor Analysis
This section provides further insights into the performance of various keywords and methods. We present an empirical application using the strategy measurements in traditional psychometric applications containing latent variables: EFA and CFA [50]. Factor analysis is a method that helps identify latent variables (factors) among a set of observed indicators (measurements). Statistically, this is achieved by relating the covariance of measurements to the hypothesized factors. EFA helps identify the number of factors present in the data. It also helps identify the factor structure, or the set of measurements that are associated with each factor. EFA estimates factor loadings (or path coefficients), which reflect the correlation between factors and the various measurements. All measurements typically are included in this exploratory analysis. In contrast, CFA is used to evaluate, or confirm, the statistical validity of a particular factor structure that is specified by the analyst.
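For readers working in Python rather than R, the factor_analyzer package offers a rough analogue of the EFA step; the DataFrame below uses random placeholder data standing in for the real measurements, so results will not reproduce Table III.

```python
# A hedged sketch of an EFA in Python (the paper used R's factanal, which
# fits by maximum likelihood; factor_analyzer's defaults differ).
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(0)
keywords = ["efficiency", "innovative", "quality", "reliability",
            "security", "service", "technology"]
df = pd.DataFrame(rng.normal(size=(245, len(keywords))), columns=keywords)

fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(df)
print(pd.DataFrame(fa.loadings_, index=keywords).round(2))  # loading matrix
print(fa.get_factor_variance())  # variance explained per factor
```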
After examining the results from Section VI-A, we created four sets of keywords to test initially in EFA. Each keyword set is named based on (a) whether it contains attitudinal keywords only or a mix of keywords and (b) the number of words in the set:
• Attitudinal (7) set: efficiency, innovative, quality, reliability, security, service, technology
• … (9) plus customer, delivery, efficiency, environmental, provide, reliability, safety, security, ship, standard

Each of the four keyword sets is input to an EFA, with EFA performed for $x$ latent factors (where $x$ is one, two or three). The R factanal command is used [57]. The resulting statistical estimates are shown in Table III. Each row of the table shows the fit results of one EFA. The EFA Parameters column notes the number of factors (Factors) used in the EFA and the resulting degrees of freedom (DOF) [49]. The null hypothesis in each case is that $x$ factors are sufficient. The null hypothesis is rejected with more confidence with higher Chi-Squared statistics (and lower p-values). As such, a lower Chi-Squared statistic (and higher p-value) suggests that $x$ factors are sufficient for capturing the covariance among keywords. We use a minimum p-value of 0.05 to identify statistical support for the number of factors tested.

The EFA yields two outcomes. First, we observe that EFA with W2VPCA results outperforms EFA with SS-BOW in all but the Mixed (19) EFA, where neither performs well. Based on the criteria, one factor is sufficient (with a p-value of 0.048) with the Attitudinal (7) set and W2VPCA measurements, although using two factors is much better statistically. Second, we proceed to CFA using the EFA outcomes, which indicate that the Attitudinal (7) set is sufficient to detect two underlying factors. The EFA findings indicate that "innovative" and "security" provide a good foundation for one factor, with "efficiency," "reliability" and "technology" as the foundation for the second. Based on additional experimentation, "quality" is also assigned to the former and "service" to the latter. The factors are named Product Focus and Delivery Focus since they are believed to represent the companies' underlying strategies regarding different emphases on product development and innovation versus customer service in terms of shipping and delivery to the customer.
The results for these factors, shown in Table IV, statistically demonstrate that the W2VPCA measurements perform better than the SS-BOW measurements in the CFA. The SS-BOW- and W2VPCA-based models have a Comparative Fit Index (CFI) of 0.849 and 0.917, respectively. Using the recommended minimum threshold of 0.9 [49], the SS-BOW-based model is rejected. In other words, W2VPCA detects latent strategies that SS-BOW does not conclusively detect in this empirical example. Z-values and p-values, which have the usual interpretation here, show that all W2VPCA keyword variables are statistically significant at about the 90 percent or higher level, with six of the seven variables being significant at the 99 percent level. The SS-BOW keywords are generally not as statistically significant, with most of them being significant at the 80 percent level or better. Many of the SS-BOW keywords would therefore be rejected as insignificant when a high threshold for statistical significance (such as 95 percent) is used. CFA estimates (or loadings) represent the correlation between the keyword measurement and the underlying strategy, or factor. Therefore, another interpretation of the estimated parameters in Table IV is that the W2VPCA-based estimates generally are more correlated with the underlying factors (with five of the estimates exceeding 0.4) than the SS-BOW-based estimates (which yield only two well-correlated estimates).
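A comparable CFA can be sketched in Python with the semopy package (the paper does not state which CFA software was used); the factor structure below is the one described above, `df` is assumed to hold one column per keyword measurement as in the EFA sketch, and the output will not reproduce Table IV.

```python
# A hedged sketch of the CFA step using semopy's lavaan-style model syntax.
import semopy

desc = """
ProductFocus =~ innovative + security + quality
DeliveryFocus =~ efficiency + reliability + technology + service
"""

model = semopy.Model(desc)
model.fit(df)                     # df: one column per keyword measurement
stats = semopy.calc_stats(model)  # fit indices, including the CFI
print(stats)
```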
These empirical results indicate that W2VPCA produces latent strategy measurements that can be used in behavioral models with psychometric inputs. While these empirical results do not constitute proof that either method is superior to the other, they do suggest that W2VPCA measurements outperform SS-BOW measurements in at least some instances.

VII. SUMMARY AND EXTENSIONS
This study develops a new method for generating latent strategy measurements using existing text instead of surveys. W2VPCA computes measurements for psychometric models by analyzing differences in word usage by different organizations. We compare the new method with BOW using company reports containing thousands of words. In this empirical analysis, both methods identify differences in company strategies. However, W2VPCA measurements outperform SS-BOW measurements in a psychometric analysis of firm strategies.
Limitations of this work and potential extensions include the following. The empirical analysis can be extended to include smaller firms. Researchers can apply W2VPCA in other contexts to strengthen empirical evidence of its value. As an example, text from company websites can be analyzed to infer company strategies regarding sustainability. Measurements for words associated with sustainability can be analyzed in conjunction with other company attributes (e.g., industry sector) to infer, for example, the effect of companies' environmental strategies on fleet electrification decisions. Our keywords were manually selected; meta-analysis methods can be developed to improve the keyword selection process. Future work can further explore the theoretical validity of W2VPCA. W2VPCA measurements can be tested in other types of models. A comparison of W2VPCA results to measurements from attitudinal surveys would provide more insights into the strengths and weaknesses of each measurement method. Company executives can be interviewed to confirm whether the inferred strategies match their real-world strategies. Procedures to statistically compare the SS-BOW and W2VPCA methods can be further developed.

TABLE I: COMPARISON OF METHODS TO GENERATE STRATEGY MEASUREMENTS

TABLE II: DESCRIPTIVE STATISTICS AND DIFFERENCES IN MEANS (STATISTICAL SIGNIFICANCE AND EFFECT SIZE)

TABLE III: EFA RESULTS FOR THREE OR FEWER FACTORS