Geographic Data Science, Theory, and Black Boxes:

Ensuring Evaluations do not 'Speak for Themselves'

Sam Stehle, Penn State / @ThHigherThFewer

AAG 2017: Geographic Data Science, April 9, 2017

GeoVISTA PSU

many methods do not generate a single, optimal, ‘gold standard’ result

“The whole point in performing unsupervised methods in data mining is to find previously unknown knowledge”

infinite parameter combinations

generative components

partial grouping assignments

Färber et al 2010, 2
“Models which achieve better predictive perplexity often have less interpretable latent spaces”
Chang et al 2009, 2

Interestingness Measures

Measure                        Classification  Description
Conciseness                    Objective       pattern contains few attribute-value pairs
Generality/Coverage            Objective       pattern characterises greater proportion of inputs
Diversity                      Objective       pattern contents are significantly different from one another
Peculiarity                    Objective       pattern is far from other patterns as per a distance measure
Reliability                    Objective       pattern occurs in high percentage of applicable cases
Novelty                        Subjective      pattern contains information not previously known or inferrable
Unexpectedness/Surprisingness  Subjective      pattern contradicts existing expectations
Utility                        Semantic        pattern contributes toward reaching a goal
Actionability                  Semantic        pattern enables decision making

An illustration: Latent Dirichlet Allocation

Semi-classification of text
[Figure: LDA diagram]
3 critical parameters
  • k - number of topics/clusters
  • alpha - prior distribution of documents-to-topics
  • minimum term frequency-inverse document frequency (tf-idf) - defines terms in the vocabulary
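
The third parameter can be illustrated with a minimal sketch of vocabulary filtering. The slides do not state the exact tf-idf formula, so this assumes a mean tf-idf per term (length-normalised term frequency times log2 inverse document frequency); the function names `mean_tfidf` and `filter_vocabulary` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def mean_tfidf(docs):
    """Return {term: mean tf-idf over the documents that contain the term}.

    Assumed formula (not taken from the slides):
    tf = count / document length, idf = log2(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    totals = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)                     # within-document frequency
            idf = math.log2(n_docs / doc_freq[term])  # rarer terms score higher
            totals[term] += tf * idf
    return {term: totals[term] / doc_freq[term] for term in totals}

def filter_vocabulary(docs, min_tfidf):
    """Keep only terms whose mean tf-idf reaches the minimum threshold."""
    scores = mean_tfidf(docs)
    return {term for term, score in scores.items() if score >= min_tfidf}

# Toy corpus: "vote" occurs in every document, so its idf (and tf-idf) is 0.
docs = [
    ["vote", "independence", "spain"],
    ["vote", "recipe", "rice"],
    ["vote", "football", "goal"],
]
vocab = filter_vocabulary(docs, min_tfidf=0.1)
```

Raising `min_tfidf` shrinks the vocabulary; terms that appear everywhere drop out first.
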
Model Sensitivity Analysis

4 levels of k: 10, 25, 50, 100

3 levels of minimum tf-idf

4 alpha prior distributions
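
A rough sketch of how the design could be enumerated, assuming the three factors are fully crossed (the slides do not say so explicitly). Only the k levels come from the slide; the tf-idf and alpha entries are placeholder labels, not the actual values used.

```python
from itertools import product

K_LEVELS = [10, 25, 50, 100]             # from the slide
TFIDF_LEVELS = ["low", "mid", "high"]    # placeholder labels, not the real thresholds
ALPHA_LEVELS = ["a1", "a2", "a3", "a4"]  # placeholder labels, not the real priors

def parameter_grid(k_levels, tfidf_levels, alpha_levels):
    """Return one dict per parameter combination (full factorial design)."""
    return [
        {"k": k, "min_tfidf": t, "alpha": a}
        for k, t, a in product(k_levels, tfidf_levels, alpha_levels)
    ]

grid = parameter_grid(K_LEVELS, TFIDF_LEVELS, ALPHA_LEVELS)
print(len(grid))  # 4 * 3 * 4 = 48 model runs
```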

Catalonian parliamentary elections

de facto referendum on independence from Spain

21,688 English-language news articles

August-November 2015

local, regional, national, international news sources

variety of specific sections, where available

Conciseness

pattern contains few attribute-value pairs

fewer topics fit better into user's knowledge base

Generality/Coverage

pattern characterises greater proportion of inputs

higher minimum tf-idf reduces size of vocabulary

documents consisting of no terms in vocabulary cannot be coded

measured as percent of documents coded
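
The coverage measure described above can be sketched as follows. This is a minimal illustration, assuming each document is a list of tokens and that a document is "coded" when at least one of its tokens is in the vocabulary; `coverage_percent` and the toy data are hypothetical.

```python
def coverage_percent(docs, vocabulary):
    """Percent of documents that contain at least one vocabulary term."""
    if not docs:
        return 0.0
    coded = sum(1 for doc in docs if any(term in vocabulary for term in doc))
    return 100.0 * coded / len(docs)

docs = [
    ["vote", "independence"],
    ["rice", "squid"],
    ["goal"],            # no term in the vocabulary: cannot be coded
    ["vote", "goal"],
]
vocabulary = {"vote", "independence", "rice"}
pct = coverage_percent(docs, vocabulary)  # 3 of 4 documents can be coded
```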

min tf-idf  reduction in vocabulary  generality  reduction from full coverage
0.3         18.43 %                  99.26 %     0.74 %
0.56        50.7 %                   99.05 %     0.95 %
0.8         70.44 %                  92.54 %     7.46 %
Diversity

pattern contents are significantly different from one another

topic disparity yields new, separate insights

measured as variance from even distribution of each document to every topic
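
One possible reading of this measure, sketched under assumptions: each row of a document-topic matrix holds one document's topic weights (summing to 1), and diversity is the mean squared deviation of those weights from the even share 1/k. The exact formula used in the talk is not given, and `diversity` is a hypothetical name.

```python
def diversity(doc_topic):
    """Mean squared deviation of document-topic weights from the even share 1/k."""
    total, n = 0.0, 0
    for row in doc_topic:
        even = 1.0 / len(row)  # weight each topic would get if spread evenly
        for weight in row:
            total += (weight - even) ** 2
            n += 1
    return total / n if n else 0.0

# Every document spread evenly over 4 topics: no deviation at all.
even_model = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
# Every document assigned wholly to one topic: maximal deviation for k = 4.
peaked_model = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
```

Under this reading, a higher value means documents are assigned more decisively to distinct topics.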

Peculiarity

pattern is far from other patterns as per a distance measure

measured as inverse of proportion of terms shared by topics in a pair of patterns

number of topics and minimum tf-idf influence potentially shared terms
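
A hedged sketch of the shared-term idea: here each pattern is a list of topics, each topic a set of its top terms, and "inverse" is read as the complement (1 minus the shared proportion) so the value stays in [0, 1]. Both that reading and the pooling of terms across topics are assumptions; `pattern_terms` and `peculiarity` are hypothetical names.

```python
def pattern_terms(topics):
    """Pool the top terms of every topic in one pattern into a single set."""
    return set().union(*topics)

def peculiarity(topics_a, topics_b):
    """1 minus the proportion of pooled terms shared by two patterns."""
    terms_a, terms_b = pattern_terms(topics_a), pattern_terms(topics_b)
    union = terms_a | terms_b
    if not union:
        return 0.0
    shared = len(terms_a & terms_b) / len(union)
    return 1.0 - shared

pattern_a = [{"vote", "spain"}, {"rice", "squid"}]
pattern_b = [{"vote", "goal"}, {"rice", "pan"}]
score = peculiarity(pattern_a, pattern_b)  # 2 shared terms out of 6 pooled terms
```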

[Figure panels: alpha = 0.006, alpha = 0.029]

[Figure panels: minimum tf-idf = 0.03, minimum tf-idf = 0.056]

Reliability

pattern occurs in high percentage of applicable cases

track topics and terms for individual documents across patterns

take random sample of documents
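
The sampling and tracking step might look like the sketch below. It assumes each pattern supplies a document-topic matrix over the same documents, and it records only each sampled document's dominant topic per pattern; no reliability score is computed, since the slides do not define one. `dominant_topic`, `track_sample`, and the toy matrices are hypothetical.

```python
import random

def dominant_topic(weights):
    """Index of the topic with the largest weight for one document."""
    return max(range(len(weights)), key=lambda i: weights[i])

def track_sample(doc_topic_by_model, n_sample, seed=0):
    """Draw a reproducible random sample of documents and record each one's
    dominant topic under every model."""
    n_docs = len(next(iter(doc_topic_by_model.values())))
    sample = sorted(random.Random(seed).sample(range(n_docs), n_sample))
    return {
        doc: {name: dominant_topic(theta[doc])
              for name, theta in doc_topic_by_model.items()}
        for doc in sample
    }

models = {
    "k2": [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
    "k3": [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.7, 0.2, 0.1]],
}
tracked = track_sample(models, n_sample=3)
```

Topic indices are not comparable across models, so in practice the tracked topics would be compared through their defining terms rather than their numbers.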

Novelty

pattern contains information not previously known or inferrable

subjective evaluation of patterns via their topic definitions

increased range of terms (lower minimum tf-idf)

--and--

increased number of topics

increase the range of information available

Novelty

pattern contains information not previously known or inferrable

10 topic model: 'glencor' 'casilla' 'aspa' 'volkswagen' 'emiss' 'pet' 'deulofeu' 'elch'

25 topic model: 'dish' 'chicken' 'pan' 'rice' 'dice' 'squid' 'nadal' 'prawn'

50 topic model: 'abus' 'nosecessionist' 'insult' 'shakira' 'pop' 'summon' 'racist' 'fling'

100 topic model: 'pet' 'vet' 'microchip' 'vaccine' 'refuge' 'fenc' 'foncubierta' 'martinez'

Unexpectedness/Surprisingness

pattern contradicts existing expectations

special case of novelty

Utility/Actionability

pattern contributes toward reaching a goal/taking an action

I combine goals and actions in this evaluation

select patterns to map semantic spaces of topics onto geographic space

Summary
Measure                        Recommendation                            Confidence
Conciseness                    low k                                     high
Generality/Coverage            low minimum tf-idf                        high
Diversity                      low alpha, mid tf-idf                     medium
Peculiarity                    low k: low alpha, high k: high tf-idf     low
Reliability                    *                                         *
Novelty                        high topics, low minimum tf-idf           high
Unexpectedness/Surprisingness  high topics, medium tf-idf, medium alpha  low
Utility/Actionability          *                                         *
Final thoughts

  • evaluation, not validation
  • consider what evaluations actually measure
  • parameterization
  • approaching general theory on using black box methods
  • evaluative measures are independent

Sam Stehle

samstehle@psu.edu

http://personal.psu.edu/sks5122/Presentations/AAG2017_BlackBoxes/#/

@thHigherthFewer