STOCHASTIC QUERY OPTIMIZATION FOR LARGE SCALE TEXT SEARCH
Searching social media for an entity or concept can be challenging when its name is ambiguous. Identifying the right search terms is time-consuming, because users have to examine the search results in order to adjust the query. Our project provides an algorithm that constructs search queries automatically to help solve this problem.
Our industrial partner, Legendary Entertainment, is a leading US media company working in film, television, digital media, and comics. Legendary cares about what the public says about its products, and it tries to find out by running queries on social media platforms such as Twitter.
The problem they encounter is that the entity they want to query can be ambiguous and is therefore hard to search for. Legendary therefore needs to combine text tokens using simple logical operators so that the returned tweets form a highly pure set, that is, a set in which the majority of the tweets are about the entity they are curious about.
Coming up with a query by hand that returns a highly pure set of tweets is time-consuming and error-prone. A person first runs a simple query that contains only the entity name, then examines the returned tweets to find a text token to add to the original query. Repeating this process many times eventually produces a satisfactory query.
One example is "Fargo", a name shared by a film, a television series, and a city in North Dakota.
Every year Legendary has about 50 projects, for which it needs to generate over 100 queries to search on social media. It takes a person about 2 to 3 hours to come up with one satisfactory query. Moreover, these manually generated queries are not perfect, since people can misread or misjudge the returned tweets. We are therefore motivated to create an automated query-generation model. Our goal is to generate queries for the entities Legendary specifies, especially entities with ambiguous meanings. The tweets returned by our queries should be fairly pure, meaning that relatively few of the returned tweets should be false positives (tweets that are not about the intended entity).
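To make the notion of purity concrete, the following is a minimal sketch of how one might score a candidate query against a small hand-labeled sample. It is not the project's algorithm. The `Query` class, its AND/NOT form, the whitespace tokenization, the `purity` function, and the sample tweets are all assumptions made for illustration, and the sample assumes the intended meaning of "Fargo" is the television show.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Query:
    """Hypothetical query form: every `required` token AND no `excluded` token."""
    required: frozenset
    excluded: frozenset = field(default_factory=frozenset)

    def matches(self, text: str) -> bool:
        # Naive whitespace tokenization; a real system would need more care.
        tokens = set(text.lower().split())
        return self.required <= tokens and not (self.excluded & tokens)


def purity(query, labeled_tweets):
    """Fraction of matched tweets labeled on-topic, or None if nothing matches."""
    matched = [on_topic for text, on_topic in labeled_tweets if query.matches(text)]
    if not matched:
        return None
    return sum(matched) / len(matched)


# Made-up tweets for illustration only: (text, is_about_the_show).
SAMPLE = [
    ("watching fargo season finale tonight", True),
    ("fargo episode was amazing", True),
    ("driving to fargo for the weekend", False),
    ("snow storm hits fargo again", False),
]

broad = Query(required=frozenset({"fargo"}))
refined = Query(required=frozenset({"fargo"}),
                excluded=frozenset({"driving", "snow"}))

print(purity(broad, SAMPLE))    # 0.5: half of the matches are off-topic
print(purity(refined, SAMPLE))  # 1.0: excluded tokens filter the off-topic tweets
```

On this toy sample, the bare entity name yields a purity of 0.5, and excluding two tokens raises it to 1.0. This mirrors the manual refinement loop described above, with the caveat that purity alone ignores how many relevant tweets a refined query may discard.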