Text Analytics 

Analyzing user-generated data (e.g., customer reviews, social media posts, blog posts) can yield valuable business insights that drive innovation and revenue growth. Existing text analytics tools provide only a coarse view of the data and leave detailed hypothesis testing to manual reading, e.g., what do people not like about the sushi at this restaurant? SocoDB provides a scalable platform that lets you find precise evidence for your hypotheses across millions of rows in days instead of weeks.

Who needs it? 


Consulting firm


Product innovation team


Data scientist

Text analytics is hard

Limited Keyword Matching

Existing tools depend on combinations of keywords to filter the data. Keyword matching cannot handle the wide range of natural-language variation, especially in noisy real-world user data, which leads to inaccurate analysis.

Manual Aspect Extraction

A restaurant has a bad rating, but is it because of the food or because of the service? Existing tools cannot directly surface this kind of fine-grained information that completes the story, so the only option left is to read the data manually.

Sloppy Quantification

It’s crucial to quantify accurately how widespread a phenomenon is in the data. Calculating the proportion of each opinion is often hand-wavy or labor-intensive, which undermines the reliability of the final results.

SocoDB makes it easy

Natural Language Understanding

SocoDB is built on top of large-scale pretrained models that learn from billions of text samples. It understands that “stressed” and “exhausted” are similar, so it can filter the data with one query rather than 40 different keywords.

Automatic Aspect Aggregation

SocoDB supports semantic aggregation, which finds groups of answers to specific questions, e.g., what do people not like about it? SocoDB also explains how it arrived at its results, allowing analysts to trace each finding back to the original source.

Accurate Quantification

SocoDB analyzes the data and finds the semantic relationships between any two data points. This allows analysts to obtain accurate, real-time estimates of how much of the data supports any hypothesis.