So… you think all your data in a particular field are a string type, but when you try to run your query, you get some errors. Doing more investigation, it looks like you have some int and undefined types as well. Bummer…
Despair not! We can actually work around this (without data prep 😉). To recap, in our first blog, we created an integration with MongoDB on Rockset, so Rockset can read and [update] the data coming in MongoDB. Once the data is in Rockset, we can run SQL on schemaless and unstructured data.
We are interested in the
Essentially, once you turn your string formatted date into a date object, you can then extract the year.
At first glance, this seems pretty easy to solve— if you wanted to order all the movie titles by the release year, you can write something like this:
SELECT t.title, t.release_date FROM commons.TwtichMovies t ORDER BY EXTRACT( YEAR FROM DATE_PARSE(t.release_date, '%Y-%m-%d') ) DESC ;
When running this query, we get a timestamp parsing error:
Timestamp parse error:
This could mean you’re working with other data types that are not strings. To inspect, you can write something like this:
SELECT t.title, TYPEOF(t.release_date) FROM commons.TwtichMovies t WHERE TYPEOF(t.release_date) != 'string' ;
This is what we get back:
Now, that we know what’s causing the error, we can re-write the query to discard anything that’s not a string type— right 🤗?
SELECT t.title, t.release_date FROM commons.TwtichMovies t WHERE TYPEOF(t.release_date) = 'string'; ORDER BY EXTRACT( YEAR FROM DATE_PARSE(t.release_date, '%Y-%m-%d') )DESC ;
WRONG 🥺! This actually returns a timestamp parsing error as well:
Timestamp parse error
You’re probably saying to yourself, “what the heck.” One case we didn’t take into consideration earlier is that there could be empty strings 🤯- If we run the following query:
SELECT DATE_PARSE('', '%Y-%m-%d');
We get the same timestamp parsing error back:
Timestamp parse error
How do we actually write this query to avoid the timestamp parsing errors? Here, we can actually check the LENGTH() of the string and filter out everything that doesn’t meet the length requirement— so something like this:
WHERE LENGTH(t.release_date) = 10
We can also TRY_CAST()
t.release_date to a string. If the field value can’t be turned into a string, a
null value is returned (i.e. it won’t error out). Putting this all together, we can technically write something like this:
SELECT t.title, t.release_date FROM commons.TwtichMovies t WHERE TRY_CAST(t.release_date AS string) is not null AND LENGTH(TRY_CAST(t.release_date AS string)) = 10 ORDER BY EXTRACT( YEAR FROM DATE_PARSE(t.release_date, '%Y-%m-%d') ) ;
Voila! it works!
During the stream, I actually wrote a more complicated version of this query. The above query and the query in the stream are equivalent. We also wrote queries that aggregate! You can catch the full breakdown of the session below:
TLDR: you can find all the resources you need to get started on Rockset in the developer corner.