Friday, January 20, 2017
"Predictive Linguistics Interview 1/19/2017"
Clif High Interview 1/19/2017
"About Predictive Linguistics And Our Methods"
"Predictive Linguistics is the process of using computer software to aggregate vast amounts of written text from the internet, sorted into categories delineated by the emotional content of the words, and using the result to make forecasts based on the emotional 'tone' changes within the larger population. A form of 'collective sub-conscious expression' is a good way to think of it. Predictive Linguistics can be used to forecast trends at many different levels, from the detail of sales to individuals, all the way up to forecasts about emerging global population trends. It is this last that concerns us here at halfpasthuman.com.
We invented the 'emotive reduction algorithm(s)' employed here in 1997, as well as much of the emerging science behind deep data mining for emotional content over these past decades. Predictive Linguistics uses emotional qualifiers and quantifiers, expressed as numeric values, for each and every word/phrase discovered/filtered in the aggregation process. Over 80% of all the words gathered will be discarded for one or more reasons.
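The filtering step described above can be imagined roughly as follows. This is a minimal, hypothetical sketch only: the toy lexicon, the value scales, and the threshold are invented for illustration, since the actual 'emotive reduction algorithm(s)' are not public.

```python
# Toy lexicon: word -> (emotional intensity, duration), both on an
# invented 0-9 scale. Real values and dimensions are proprietary.
TOY_LEXICON = {
    "fear": (8, 5),
    "tension": (7, 4),
    "release": (6, 3),
    "the": (0, 0),
    "and": (0, 0),
}

def emotive_reduce(words, min_intensity=1):
    """Keep only words whose emotional intensity clears the threshold;
    everything else is discarded, as most gathered words are."""
    kept = []
    for word in words:
        intensity, duration = TOY_LEXICON.get(word.lower(), (0, 0))
        if intensity >= min_intensity:
            kept.append((word, intensity, duration))
    return kept
```

On a sample phrase such as "the fear and tension", only 'fear' and 'tension' survive the reduction, illustrating how the bulk of gathered words can be dropped.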
Predictive Linguistics works because NO conscious expressions are processed through the software. Rather, the contexts discussed within the report, in the form of entities and linguistic structures, are read up in the various intake software programs, and the emotional sums of the language found at that time are retrieved. Words that are identified within my system as 'descriptors' are passed through the processing as well. These descriptor words, in the main, are those words and phrases that provide us with the detail sets within the larger context sets.
As an example, the word 'prophecy' may be read up by our software at a sports-oriented forum. In that case, perhaps, due to the emotional sums around the context, and the emotional values of the word itself within the lexicon, it would be put into the contextual 'bin' within the database as a 'detail word'. Note that the context of the use of the word in the sports forum is lost in the process and is of no use to us in these circumstances. What occurs is that the word is picked up as being atypical in its context, and therefore of high potential 'leakage of future' value. The way this works is that most sports forum language about future events would be statistically more likely to use words such as 'bet', as in 'I bet this XXX will be the outcome', or 'I predict', or 'I think that XXX will happen'. So it is the context, plus emotional values, plus rarity of use within the context that flags words for inclusion in the detail level of the database. Further, it is worth noting that most detail level words are encountered in our processing mere days before their appearance: primarily within the IM data, and then within the ST data next. But a preponderance are discovered within the IM time period. Perhaps this is an artifact of our processing; if so, it is one not explored due to lack of time (cosmic joke noted).
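The flagging rule sketched in that example, context plus emotional value plus rarity of use, might be illustrated like this. The emotion scores and thresholds below are invented placeholders, not values from the actual system.

```python
from collections import Counter

# Invented emotion scores (0-9) for a handful of words.
TOY_EMOTION = {"prophecy": 9, "bet": 2, "win": 4, "predict": 3}

def flag_detail_words(context_words, rarity_ceiling=0.01, min_emotion=5):
    """Promote a word to 'detail word' when it is both rare within its
    context AND emotionally loaded; typical context words are ignored."""
    counts = Counter(w.lower() for w in context_words)
    total = len(context_words)
    flagged = []
    for word, n in counts.items():
        if n / total <= rarity_ceiling and TOY_EMOTION.get(word, 0) >= min_emotion:
            flagged.append(word)
    return flagged
```

In a simulated sports forum where 'bet' and 'win' dominate the chatter and 'prophecy' appears once, only 'prophecy' is flagged: it is atypical in that context and carries high emotional value.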
Words are linked by their array values back to the lexicon using our set theory model, and the language used within the interpretation (detail words excepted) derives from the lexicon and its links to the changing nature of contexts as they are represented within our model.
Predictive Linguistics is a field that I pioneered in 1993. The software and lexicon have been in continual change/update mode since. This is due to the constantly changing nature of language and human expression. Predictive Linguistics works to predict future language about (perhaps) future events, due to the nature of humans. It is my operating assumption that all humans are psychic, though the vast majority do nothing to cultivate it as a skill, and are likely unaware of it within themselves. In spite of this, the universe and human nature have it that they 'leak' prescient information out continuously in their choice of language. My software processing collects these leaks and aggregates them against a model of a timeline, and that information is provided in this report.
The ALTA report is an interpretation of the asymmetric trends that are occurring even this instant as millions of humans are typing billions of words on the internet. The trends are provided in the form of a discussion of the larger collections of data (dubbed entities), down to the smallest aspect/attribute swept up from daily discussions within that context. Within the ALTA report format, detail words are provided as noted below. Phrases and idiomatic expressions are also provided as details. In the main, geographic references are merely summed, and if deemed pertinent, the largest bag in the collection is discussed as a 'probable', or 'possible', location for the events being referenced within the details. In our discussions, the interpretation is provided in a nested, set theory (fuzzy logic) pattern.
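The nested, set-within-set structure described above can be pictured as a simple containment hierarchy: an entity holding aspect/attribute sets, which in turn hold detail words and summed geographic references. All names and values below are invented for illustration.

```python
# Hypothetical nesting: entity -> aspects/attributes -> details.
report = {
    "GlobalPop": {                    # entity: the master set
        "aspects": {
            "tension": {              # aspect/attribute: supporting set
                "details": ["release", "prophecy"],  # detail words/phrases
                "location": "possible: coastal",     # summed geographic refs
            },
        },
    },
}

def details_for(report, entity, aspect):
    """Walk the nesting from the entity down to its detail words."""
    return report[entity]["aspects"][aspect]["details"]
```

Interpretation then proceeds from the broad entity label inward, which is why the report discusses the largest sets first and detail words last.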
• Aspects/Attributes are: collections of data that lie within our broader linguistic structures and are the 'supporting' sets that provide our insight into future developments. The Aspect/Attribute sets can be considered the 'brought along' serendipitous future forecasts, by way of links between words in these sets and the lexicon.
• Entities are: the 'master sets' at the 'top' of our nested linguistic structures, and they contain all references that center on the very broad labels that identify the entity: Markets and GlobalPop, as examples.
• Lexicon is: at its core level, the lexicon is a digital dictionary of words in multiple languages/alphabets, stripped of definitions other than such technical elements as 'parts of speech' identifiers. The lexicon is quite large and is housed in a SQL database heavily populated with triggers and other executable code for maintenance and growth (human language expands continuously, so the lexicon must as well). Conceptually, at the Prolog software engine processing level, the lexicon is a predicate assignment of a complex, multidimensional array of integers to 'labels', each of which is a word within the lexicon. The integers within the 8x8x10 level array structure are composed of emotional qualifiers, which are assigned numeric representations of the intensity, duration, impact, and other values of the emotional components given by humans to that word, and also contain emotional quantifiers, which are assigned numeric representations of each cell's level of 'emotional assignment'.
• Spyders are: software programs that, once executed, are self-directing within programmed limits (thus called 'bots'), and within these constraints are allowed to make choices as to which linguistic trails to explore on the internet. The job of the spyders is to search, retrieve, and pre-process (part of the exclusions process that will see 90% of all returned data eliminated from consideration in our model) the 'linguistic bytes' (2048 words/phrases in multibyte character format) which are aggregated into our modelspace when processing is complete."
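The lexicon entry described in the list above, a label mapped to an 8x8x10 array of integers, can be sketched as a plain data structure. The dimensions follow the text; what each cell actually encodes is not public, so the assignments below are invented placeholders.

```python
# Hypothetical sketch of one lexicon entry: word label -> 8x8x10 integers.

def empty_cells():
    """Build a zeroed 8 x 8 x 10 integer array for one word label."""
    return [[[0] * 10 for _ in range(8)] for _ in range(8)]

lexicon = {}

def assign_word(label, cells_to_set):
    """cells_to_set: iterable of (i, j, k, value) emotional assignments,
    standing in for the qualifier/quantifier values the text describes."""
    cells = empty_cells()
    for i, j, k, value in cells_to_set:
        cells[i][j][k] = value
    lexicon[label] = cells

# Invented example values for the word 'release'.
assign_word("release", [(0, 0, 0, 7), (3, 2, 9, 4)])
```

A relational layer (the SQL database the text mentions) would persist these arrays, while the in-memory form above corresponds to the conceptual label-to-array mapping.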
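The spyder pre-processing step described above, cutting retrieved text into 2048-word 'linguistic bytes' and eliminating most of them, might look roughly like this. The exclusion rule here is an invented stand-in for the real (unpublished) filters.

```python
# Hypothetical sketch of spyder pre-processing.
BYTE_SIZE = 2048  # words per 'linguistic byte', per the text

def linguistic_bytes(words, keep=lambda chunk: True):
    """Split a retrieved word stream into fixed-size chunks, keeping
    only those that pass the exclusion test; the rest are discarded."""
    chunks = [words[i:i + BYTE_SIZE] for i in range(0, len(words), BYTE_SIZE)]
    return [c for c in chunks if keep(c)]
```

A stream of 5,000 words yields three chunks (the last one partial); a stricter `keep` predicate would then eliminate most of them before aggregation into modelspace, mirroring the 90% exclusion rate the text cites.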