How organizations deal with outliers, those data points that occur where they’re not expected, provides useful insight into the culture and data maturity of those organizations. Outliers in simple frequency graphs could be blips at the extreme ends of the normal curve. In e-discovery, outliers can be documents flagged by analytics software […]

Selection bias occurs when data are selected for analysis in a way that does not give every object being evaluated an equal chance of selection. This results in samples that are not representative of the entire population. An extreme example would be predicting the presidential race by sampling only New York City or Los Angeles, or predicting all […]

Simpson’s Paradox is a kind of statistical brain teaser that provides lessons on text analytics and choosing the best tools to work with enterprise content. The “paradox” is that sometimes trends that seem apparent when data are analyzed as separate groups become reversed or disappear when the groups are combined. An example of Simpson’s Paradox […]
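
The reversal described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the example from the original post: the option names `A` and `B`, the group names, and the counts are all assumptions chosen only to exhibit the effect. Option A has the higher success rate within each group, yet option B has the higher rate once the groups are combined, because the two options are spread very unevenly across the groups.

```python
from fractions import Fraction

# Illustrative (successes, trials) counts per option and group.
# These figures are assumptions chosen to exhibit the reversal.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def group_rates(option):
    """Success rate of one option within each group, as exact fractions."""
    return {g: Fraction(s, t) for g, (s, t) in counts[option].items()}


def combined_rate(option):
    """Success rate of one option after pooling all of its groups."""
    successes = sum(s for s, _ in counts[option].values())
    trials = sum(t for _, t in counts[option].values())
    return Fraction(successes, trials)


# A beats B inside every group...
a_wins_each_group = all(
    group_rates("A")[g] > group_rates("B")[g] for g in counts["A"]
)
# ...but B beats A when the groups are pooled.
b_wins_combined = combined_rate("B") > combined_rate("A")

for option in ("A", "B"):
    per_group = {g: float(r) for g, r in group_rates(option).items()}
    print(option, per_group, "combined:", float(combined_rate(option)))
print("A wins in each group:", a_wins_each_group)
print("B wins when combined:", b_wins_combined)
```

Exact fractions are used for the comparisons so that the result does not depend on floating-point rounding; the conversion to `float` is only for display.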

Implicit biases – those that we form and use without explicit consideration – can wreak havoc on achieving critical goals. One such bias, confirmation bias, is especially damaging when designing file classification systems. It is the “…tendency to search for, interpret, favor, and recall information in a way that confirms one’s preexisting […]

The Data-Information-Knowledge-Wisdom (“DIKW”) model is useful for examining how well an organization is doing in deriving value from its unstructured content. In his book, Too Big to Know,* David Weinberger credits Russell Ackoff, a leading organizational theorist, with making a pyramid-shaped depiction of the DIKW model in a 1988 address to the International Society for […]

The central theme of David Weinberger’s book Everything is Miscellaneous* is that no single method of classification serves all purposes, a concept worth considering when designing classification schemes for enterprise content management (“ECM”). One example of a classification scheme that he uses is the well-known periodic table, which arranges basic elements in […]

In Everything is Miscellaneous, David Weinberger points out that no single classification system will necessarily best serve all those who use the classified content, and he describes several tools that popular websites use to let individual users create and share what they consider to be significant information. Many of those tools could be applied to improve the […]
